Midjourney's NEW Video Model! Complete Guide + Tips | How to Animate Your Images


Transcript

00:00

Midjourney released their first video model last week, and I've been in the mode of must animate everything. The excitement around this video model kind of reminds me of when I first started using Midjourney a few years ago, back when V2 was the default image model.

For each video job you submit, you get four

00:16

5-second clips back, which is honestly amazing because I was expecting one clip per job. You can animate any Midjourney image that you've created, as well as external images.

So, if you have a drawing you made or a photo of your pet that you want to animate, upload it and try it out. The motion looks pretty

00:33

good, and importantly, style and detail are maintained throughout the clip. With other tools, that unique Midjourney style often melts away into a generic, Stable Diffusion-like look.

You can extend videos by 4 seconds up to four times, and the transitions between segments are

00:49

smooth. One of the biggest reasons why I haven't done much with image-to-video models is cost.

Affordable plans with other tools never give me enough room to experiment and get into a flow. Midjourney's video model is the first one that I'm aware of that is truly affordable for everyone.

And we'll talk

01:05

more about cost later. Ultimately, this is just a stepping stone as they move towards more of a world simulation model, but I think it's a really strong debut for Midjourney's first video model.

Yes, it's lacking some fancy controls and the resolution is lower compared to other tools, but that will change soon. I'm learning this new model

01:22

right alongside you. And today I'll walk through what all of the buttons do, cover important technical details, and share some prompting tips that I've learned so far, because video prompting is different from image prompting.

All right, let's get into the demo and my current go-to workflow. Video is web only for now, so

01:38

we'll be working on the website. First, it's recommended that you pick a non-upscaled image to animate.

Upscaled images must be downscaled before they're animated, which can introduce unwanted artifacts. You can filter your Organize page to grids to see all of your non-upscaled images.

Use the search bar

01:55

up here if you need to filter the results based on prompt text. Or if you do have a specific upscaled image in mind, open it and hover over upscale.

You'll see it highlights the image in the grid that it came from. Click your highlighted image and now you're working

02:10

with the original non-upscaled version of it. To animate, we have two modes, auto and manual.

You can access these by right-clicking your image or use the buttons down here. Auto will use your existing prompt to guide your video prompt.

It won't use parameters like

02:27

--sref (style reference) or omni reference, but it will keep the same aspect ratio. Ideally, your starting frame should already contain the style and elements that you want to animate.

You can choose low motion for minimal camera or character movement, or high motion for more dramatic action.

02:42

These buttons aren't perfect. You can still get quite a bit of movement even with low motion.

But if you know for sure that you want a lot of movement in your video, pick high. Otherwise, I recommend starting with low motion.

Manual mode gives you more control by letting you change the prompt. To me,

02:58

auto versus manual feels similar to how vary and remix work for images. With manual, click low or high, and then up here, describe the motion or events that you want.

And I recommend doing this because the prompt that created your starting image probably doesn't contain

03:13

any direction for motion or order of events. Alternatively, you can actually leave this blank and submit it to let Midjourney decide what motion would work best with the content of your image.

One thing that's important to understand is there's actually an LLM layer that takes your input prompt and tries to improve

03:30

it before creating the video. I think this can be really helpful, especially for beginners or anyone who just isn't sure where to start with writing video prompts.

A downside is that we don't get to see the final prompt that was used, which can make it challenging to understand what prompting approaches

03:46

work best. I'll link to the documentation that mentions this LLM and also has some other great tips.

But we do have a bit of a workaround for this, and that is to include --raw in your prompt when you're submitting a manual video job. Just like with image jobs, including --raw can help improve prompt

04:04

adherence. And in the case of video prompts, it can help bypass some of that LLM interpretation. I've had pretty good luck getting better prompt adherence with this approach.

Not always, though. Sometimes the LLM's interpretation does a better job at giving me what I want.

But --raw gives us another approach to

04:20

try. My go-to approach right now is manual mode with low motion.

I try to type a clear and direct prompt that describes the motion or order of events. Then I submit two jobs, one with --raw and one without, and then compare the

04:35

results. If I know that I want a lot of motion, I'll submit a manual one with high motion, but most of the time I start with low motion.
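To make this concrete, here's a hypothetical example of the two manual-mode prompts I'd compare. The scene description is invented; --raw and --motion are the parameters discussed above:

```
the fox runs across the field, then stops and looks at the camera --motion low
the fox runs across the field, then stops and looks at the camera --motion low --raw
```

The only difference between the two jobs is --raw, so any difference in prompt adherence comes from bypassing that LLM rewriting layer.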

Next, you can also animate external images. So photos of your pets, drawings you've made, really anything within Midjourney's usage guidelines, of course.

Just drop

04:51

your image into this starting frame box in the prompt bar, or click here to upload new images or access ones that you've already uploaded. Then type your prompt describing the motion that you want, or leave it blank to let Midjourney decide.

This will be submitted in manual

05:07

mode with low motion as the default. If you want more dramatic movement, add --motion high to your prompt.

And for potentially improved prompt adherence, include --raw. When you get a video result that you like, you have the option to extend it by 4 seconds up

05:23

to four times, turning your 5-second clip into a 21-second video. To extend a clip, click extend auto or extend manual from the create page, or open the clip and use the extend buttons in the menu.

I recommend using manual mode for extending so that you can

05:39

control what happens next. Maybe the camera zooms out or the character shows a different facial expression.

Remember, low motion is the default here. If you want manual mode with high motion, click extend manual and add --motion high to your prompt.
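As a quick sanity check on that extension math (a 5-second base clip plus up to four 4-second extensions), here's a minimal Python sketch; the constant and function names are mine, not Midjourney's:

```python
# Clip-length math from the video: each job returns 5-second clips,
# and each extension adds 4 seconds, up to four times.
BASE_SECONDS = 5
EXTEND_SECONDS = 4
MAX_EXTENDS = 4

def clip_length(extends: int) -> int:
    """Total clip length in seconds after a given number of extensions."""
    if not 0 <= extends <= MAX_EXTENDS:
        raise ValueError("Midjourney currently allows at most four extensions")
    return BASE_SECONDS + extends * EXTEND_SECONDS

print(clip_length(4))  # 5 + 4*4 = 21 seconds
```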

Okay, let's quickly cover

05:55

some technical specs before we go into more prompting tips. Video resolution is currently 480p, which is a bit lower than other models out there, but honestly, some of the details in Midjourney's 480p videos look better than 720p from other models.

Here are the actual output dimensions for a few

06:12

different aspect ratios. Midjourney is working on a video upscaler that should be out in the next few weeks, and they plan to offer higher resolution options eventually.

This is their first video model and they're working to strike the right balance between quality, affordability, and server capacity. So,

06:28

expect things to change as they gain a better understanding of usage and resources. If you are interested in an external video upscaler, I've been getting good results with Topaz Video AI.

You're looking at a side-by-side here of the original Midjourney output and a Topaz upscale. The only downside for me

06:44

is that it can be pretty resource-intensive, so it's not something that I'm doing with every video. I'm really looking forward to seeing how Midjourney's video upscaler compares when it's released.

Topaz does have a free trial for both their photo and video AI, and I have an affiliate link that I'll drop below. Let's talk cost.

07:01

Each video job currently costs about eight times more than an image job, so about eight fast GPU minutes, which I did test and confirm. Based on that, here's approximately how many video generation jobs you can run with each of Midjourney's subscription plans.

Each job

07:17

gives you four videos, so multiply these numbers by four for the total video count. This cost will likely change, probably getting cheaper, so don't rely too heavily on this.
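If you want to ballpark this yourself, here's a small Python sketch using the roughly eight fast GPU minutes per job figure from above. The fast-hour allowances per plan are my assumption based on Midjourney's published plans, so treat them as illustrative and check your own account:

```python
# Rough videos-per-month math: ~8 fast GPU minutes per video job (tested and
# confirmed in the video), 4 clips per job. Plan fast-hour values below are
# assumptions; verify them against your own Midjourney account.
FAST_MINUTES_PER_JOB = 8
CLIPS_PER_JOB = 4

def video_jobs_per_month(fast_hours: float) -> int:
    """How many video jobs a monthly fast-hour allowance covers."""
    return int(fast_hours * 60 // FAST_MINUTES_PER_JOB)

plans = {"Basic": 3.3, "Standard": 15, "Pro": 30, "Mega": 60}  # assumed hours
for plan, hours in plans.items():
    jobs = video_jobs_per_month(hours)
    print(f"{plan}: ~{jobs} jobs, ~{jobs * CLIPS_PER_JOB} clips")
```

Remember the video's caveat: these rates will likely change, probably getting cheaper.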

But even at these rates, it's really affordable compared to other models out there. If you do run

07:32

out of fast hours, you can upgrade your plan or buy additional hours. Pro and mega subscribers also get access to relax mode for video, which means unlimited generations without touching your fast hours.

To download any of your videos, we have a few options. This

07:49

download saves the raw video. You can also right-click on the video and select it here.

So, if you're taking the video into an external upscaler or video editing tool, this is probably what you want. If you're just posting on social media, use download for social.

Videos

08:05

often get overly compressed when posting to social media, so they've made this option available to try to reduce that. And there's now a third option to download your video as a GIF.

Video prompting with Midjourney feels a bit like learning a new language. Here are some tips that I've picked up so far and

08:20

some things that are a bit challenging. First, consider the order of events.

Video prompts have a different goal compared to image prompts and need to be written differently. Use words like first, then, next, while, etc.

to help set up the sequence of events. Just remember that you only have 5 seconds on

08:38

that first clip, so you can only cram in so many things. And you don't need to redescribe things in detail.

If your starting image only has one character, you can just say he, she, the character, the person, and then describe the action. Similarly, you may want to avoid

08:53

using callbacks in your video prompts. Using callbacks is a great way to improve clarity for image prompts and ensure specific details are included.

This is something that I shared in my V7 tips video, but with the video model, it might mean that the item you're reiterating shows up multiple times. The

09:10

video model works best by animating elements that are already in your scene. If you want something new to appear, it can sometimes struggle to get this to look right and fully match the existing style.

Keep this in mind when you're setting up your starting frame image. And if you need help creating those starting images, I have a bunch of

09:27

tutorials on my channel. And feel free to ask questions in the comments.

Camera motion can be challenging. I have yet to discover many phrases for camera motion that are solidly reliable.

Zooming in and out and panning are the most reliable for me at the moment. But even "zoom in" or "dolly in and out" doesn't

09:44

always work. Sometimes it feels like if the starting frame already has the subject somewhere between medium and close-up distance away, the video model doesn't really want to change that distance.

I can get it to work sometimes, but not as consistently as I would like. So, I think if you're

10:00

planning to zoom in on something, you might want to consider using the editor to create a more zoomed-out image to begin with. I have been able to use "camera slowly pans right" and "camera turns left" somewhat consistently.

I've also had a little bit of success making the camera orbit around an object or making the

10:17

object itself spin in place. Camera angles are definitely tricky, and honestly, I think this will improve a lot once Midjourney releases their 3D model, where the primary goal is to give us more control over camera angle.

I'll keep testing this, but if you have any phrases that are working well for you,

10:33

please drop them in the comments, and be sure to mention whether or not you used --raw in the prompt. Transformations can also be difficult.

I really wanted these two characters to take off their helmets so I could see their faces, but many times the helmets came off and went right back on like it was trying to stay

10:48

true to the original look. I tried all sorts of prompts with this one.

Ultimately, reinforcing the visibility of their faces by describing their hair, or lack thereof, is what has worked best so far. Similarly, changing facial expressions can sometimes be difficult to get right on the first try, especially if

11:05

you start with a more exaggerated expression; it can sometimes be hard to escape that.

And a couple of other quick tips: it's much harder to pick out videos that you like at first glance compared to images. So, I recommend creating a folder to add all of your favorite videos to, or hit the like

11:21

button, and that will add them to your liked images and videos on your Organize page. Lastly, you can scrub the video grids on the create page by holding down the Control key and moving your mouse left and right over the video.

Those are some of my observations so far. Let me know down in the comments if you'd like

11:36

me to share more of my testing results and tips in a separate video. I think the most important thing right now is to just have fun and experiment.

This is just a stepping stone as Midjourney moves towards more of a world simulation model. If you enjoyed this video, please consider liking, subscribing, and maybe

11:52

even joining my Patreon community, where you'll find all of my monthly prompt collections, exclusive videos, and other Midjourney guides. As always, thanks for watching, and I'll catch you in the next one.