Midjourney AI: How Is This Even Possible?

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.
Today we will celebrate how far we have come. Not so long ago, I made these pictures
with OpenAI’s text-to-image AI, DALL-E 2, and I was elated. These are really cool,
and have tons of personality. And if someone had told me then that just a few months
later I would get results from a newer system that makes this pale in comparison,
I wouldn’t have believed a word of it. And my goodness, that is exactly what happened.

You see, this is the Midjourney text-to-image AI, and I was stunned when I found out
that the first version of it appeared in February 2022, just a bit more than a year ago.
And today, we are on version 5 and we are here to celebrate how far we have come.
The results are simply unbelievable.

So, let’s have a look at a fox scientist created with version 1.

Well, these results are not great. It is still remarkable that a machine can give us
something like this, but if I didn’t tell you that this should be a fox scientist,
casting a magic spell, I don’t think you would have guessed. And it is not a question
of getting a good randomized run, because we can try over and over again, and brace
yourselves for some Picasso-ish results. These aren’t much better. Perhaps, even worse.

And now, hold on to your papers, and here come the results with version 5. Oh my
goodness. Wow! Look at that quality. I cannot believe what I am seeing here. Can that
really be? Because what you see here is this progress in just one year. We can even
request more or less stylized images, and it delivers over and over again.

And I have to note that this was not a very elaborately written prompt. I just asked
for a stern-looking fox in a lab coat, casting a magic spell.

What’s more, there is a separate model that we can use in Midjourney that is
specifically tailored for Japanese, anime, and illustrative styles. And that one
delivers too.

And I am truly shocked to find out that looking at the new results, the one that I
previously thought was a legendary image, really pales in comparison.

And this system can generate ten thousand better ones every single day. Wow. My mind
is blown.

Now, we are going to explore 4 more categories with eye-poppingly beautiful results.

First is video game environment concepts. This is version 1 taking a crack at it.
Well, this is not the eye-poppingly beautiful result, that’s for sure. Can you tell
what the prompt was? Neither can I, unless I look. We were looking for a mountainous
location in a fantasy world with low-polygon models.

It does have a certain mood and I kinda like some of them, but I cannot wait to see
the results with the new version. Look! Now we’re talking!

Or, if we feel that the game needs some more adventure here, we can let our
imagination take over and ask, for instance, for a palace. Hmm, that looks good.
I like this one too.

Two, next up, photorealism. Oh boy, is it good at that. If you are looking for a
funny image of a dog that is a little lost underwater, I would like to ask you if you
are ready to see the results with version 1? Not for the faint of heart.

