Dear Fellow Scholars, this is Two Minute
Papers with Dr. Károly Zsolnai-Fehér. Today we will celebrate how far we have
come. Not so long ago, I made these pictures with OpenAI’s text to image AI, DALL-E 2,
and I was elated. These are really cool, and have tons of personality. And if
someone had told me then that just a few months later, I would get results from a newer system
that makes this pale in comparison, I wouldn’t have believed a word of it. And
my goodness, that is exactly what happened.
You see, this is the Midjourney text to image AI,
and I was stunned when I found out that the first version of it appeared in February 2022, just
a bit more than a year ago. And today, we are on version 5 and we are here to celebrate how far
we have come. The results are simply unbelievable. So, let’s have a look at a fox scientist created
with version 1.
Well, these results are not great. It is still remarkable that a machine can give us
something like this, but if I didn’t tell you that this should be a fox scientist, casting a magic
spell, I don’t think you would have guessed. And it is not a question of getting a good randomized
run, because we can try over and over again, and brace yourselves for some Picasso-ish results.
These aren’t much better. Perhaps, even worse. And now, hold on to your papers, and here
come the results with version 5. Oh my goodness. Wow! Look at that quality.
I cannot believe what I am seeing here. Can that really be? Because what you
see here is this progress in just one year. We can even request more or less stylized
images, and it delivers over and over again.
And I have to note that this was not
a very elaborately written prompt. I just asked for a stern looking fox
in a labcoat, casting a magic spell. What’s more, there is a separate model
that we can use in Midjourney that is specifically tailored for Japanese anime and
illustrative styles. And that one delivers too. And I am truly shocked to find out that looking
at the new results, the one that I previously thought was a legendary image, really pales in
comparison. And this system can generate ten thousand better ones every single
day. Wow. My mind is blown. Now, we are going to explore 4 more
categories with eye-poppingly beautiful results. First is video game environment concepts. This is version 1 taking a crack at it. Well,
this is not the eye-poppingly beautiful result, that’s for sure. Can you tell what the
prompt was? Neither can I, unless I look. We were looking for a mountainous location
in a fantasy world with low-polygon models. It does have a certain mood
and I kinda like some of them, but I cannot wait to see the results with
the new version. Look! Now we’re talking! Or, if we feel that the game needs some more
adventure here, we can let our imagination take over and ask, for instance, for a palace.
Hmm, that looks good. I like this one too. Two, next up, photorealism. Oh boy, is it good at
that. If you are looking for a funny image of a dog that is a little lost underwater, I would like
to ask you if you are ready to see the results with version 1? Not for the faint of heart.