This is BY FAR the best free AI image generator


Transcript

00:00

This is now by far the best open-source AI image model you can use right now. I mean, it's not even close.

So, it's called Qwen Image by Alibaba, and this is incredibly powerful. You can generate

00:15

a ton of text in images, and it's also extremely good at prompt understanding. So, in this video, I'm going to go over all the amazing things it can and cannot do.

I'm going to compare this with the leading image models out there. Plus, I'm also going to show you where to use

00:30

it. And of course, I'm also going to go over step by step how to install this on your computer so you can run it for free and unlimited times offline.

And yes, I'm also going to show you how to run this on low VRAM. Let's jump right in.

So, 2 days ago, I already featured

00:45

another image model called Flux Krea Dev, which is very realistic and one of the best free image models out there. Well, scratch that because this new one, Qwen Image, is even better.

Like way better. Can you feel the acceleration?

We're getting newer and better models in

01:02

a matter of days now instead of weeks or months. Anyways, here are some of their official demos.

Note how crazy good this is at understanding your prompt and generating text. So, here the prompt is a bookstore window display; a sign displays 'New Arrivals This Week.'

Indeed, we have

01:19

the sign over here. And then below a shelf tag with the text bestselling novels here.

And indeed that is what we see over here. To the side a colorful poster advertises author meet and greet on Saturday with a central portrait of the author.

That's what we get over

01:34

here. And then there are four books on the bookshelf.

Namely, The Light Between Worlds. That's what we see here.

When Stars Are Scattered, that's what we get. The Silent Patient, which is over here.

And then The Night Circus over here. So, it nails everything, including all the text you specified.

Here's another crazy

01:51

example. Let's say you get it to generate a slide featuring artistic decorative shapes, etc., etc.

At the center, the title, habits for emotional well-being. That's indeed what we get.

And then on the left upper section, practice mindfulness appears next to a minimalist lotus flower icon with the

02:09

short sentence, be present, etc., etc. And indeed that is the exact text that it outputs plus with a minimalist lotus icon.

Moving downward, cultivate gratitude is written near an open hand illustration with the following text and

02:24

that's also what we get. And then further down, stay connected accompanied by a minimalist chat bubble icon and then with the following text.

That's also what we get. And then for the right side, we specify some other icons and headers and text.

And basically, it

02:40

nails everything. There hasn't been another open-source AI image model that even comes close to being able to do this.

Or here's another example showing that it can also generate Chinese text. So over here, you can prompt it to basically generate some English text

02:56

followed by some Chinese text over here on this whiteboard. And that's indeed what you get.

Or here's another example where it can generate all of this handwritten Chinese text very accurately. Like, all of this text is correct, and it does look like handwritten Chinese. Or here's another

03:12

example of it being able to generate a poster with whatever text and elements you specify. Super impressive.

You can also get it to create PowerPoint slides. And over here they just showed us a Chinese example, but I'm also going to show you an English example in a second.

03:28

But I mean, this image is completely generated with AI. How crazy is that?

It's also insanely good at anime, which I'll show you in a second. But here's one of their examples.

And notice that the text in these signs is also completely correct. Now, in addition to

03:44

generating images with text, this can also do regular photorealistic stuff like this. So, here's an image of a cosplayer at a convention.

And this looks extremely realistic. Here's a pretty detailed photo of a chameleon.

Here's another extremely realistic

03:59

photo. It can also do wide-angle shots like this.

And it also knows various famous logos and objects like this GoPro. It can also do various different art styles like this.

From oil painting to watercolor to 3D Pixar to realistic,

04:16

it can do everything. Now, in addition to just generating images with text prompts, you can also get Qwen Image to edit an existing image.

Kind of like Flux Kontext Dev, which I featured a few weeks back. So, let's say we have this image of a Pikachu.

Well, we can turn

04:32

this into Ghibli style like this. We can add a hat and sunglasses to the Pikachu.

Then we can add the text Qwen to the hat. And then we can change it into 3D style like this.

And then we can also take this image and put it in a crystal ball on a desk like this. Or let's say

04:49

you have this original image. Well, you can remove the cars and this is what you get.

And then you can prompt it to convert this into anime style. And here's what you get.

This is even crazier. So, let's say you have this image.

You can get it to extract this

05:04

robe that they're wearing, and it'll give you this. Then, you can also zoom into the robe to give you a microscopic view like this.

It's also extremely good at preserving the style and typography of text in images. So, let's say this original image says hope.

Well, you can

05:22

change it to Qwen. And notice it preserves the style of this text.

It's also great at outpainting. So, if this is your original photo, you can get it to zoom out and here is what you get.

You can also change the perspective or pose of someone in the image. So, if

05:38

this is your input image, well, you can get her to stand up instead. And here's what you get.

Notice the face looks exactly like the original photo. Here's an even crazier example.

If this is your original image, well, you can prompt it to change all the characters to capybaras.

05:54

And here is what you get. So, it's also good at, you know, preserving the art style of the original image.

Anyways, those are just some of their examples. Let's actually test this out ourselves and compare it with the leading models out there to see exactly how good this is.

All right, so here are some of my

06:10

personal tests. I'm comparing Qwen Image with one of the best open-source image models that was released like a few days ago, Flux Krea Dev.

This is especially good at generating photorealistic images. Now, both of these are open-source.

So, we are comparing like apples to apples, but I'm

06:27

also comparing Qwen Image to GPT-4o, which is the best image generator out there, period. This is proprietary and closed source.

There's no way for you to download this and tweak it or run it locally on your computer. But, as you'll see in this video, Qwen Image is as

06:44

good or even better than GPT-4o in some cases. Anyways, for the first prompt, it's going to be extremely detailed with a ton of text.

We have a cozy bakery window with a wooden sign reading freshly baked today in cursive Pacifico font, which for your

07:00

reference looks something like this. And then below, a chalkboard lists croissants $2.50, sourdough loaf $5, cupcakes $3 each.

A poster advertises baking workshop Sunday 10:00 a.m. with a rolling pin graphic.

The display includes

07:16

four pastries: a chocolate éclair, blueberry muffin, cinnamon roll, and an almond croissant. So, here's the generation from Qwen Image, and it nailed everything.

The text throughout the image is perfect. The baking workshop ad is also perfect with a

07:32

rolling pin graphic. Plus, these four pastries are also correct.

Very nice. And then, Flux Krea Dev is actually not bad.

I would say the pastries look more realistic, but the text just isn't correct. So, it tried, but there's

07:48

misspellings everywhere. The pricing is all messed up.

The workshop ad is also messed up. So, clearly not as good as Qwen Image.

And then GPT-4o also tried hard, but some of the text is still not correct. For example, here it's missing

08:03

the dollar sign. Plus, GPT-4o tends to have this yellow cartoony vibe to it.

So, it doesn't look as realistic as Qwen Image or Flux Krea Dev. And by the way, each of these is the best of four generations.

In this example, as you can see, Qwen Image is the winner. Here's

08:20

another insanely complicated prompt. A realistic photo of an urban street view featuring Bella's Cafe with a vintage neon sign in cursive font.

And then it should also have round marble tables. And then we have Golden Lotus Chinese

08:35

Restaurant with red lanterns hanging above the entrance and a traditional Chinese dragon design on the window and a menu board showing lunch specials. And then we have Urban Boutique with a mannequin in the window wearing the latest fashion.

A geometric pattern in pastel colors on the storefront and a

08:52

sale banner in the window. And Qwen Image nails everything.

We have Bella's Cafe in a neon sign with cursive font. We have a marble table over here.

And then we have Golden Lotus Chinese Restaurant. There is a slight misspelling over here, but it does have

09:08

this dragon design plus lanterns hanging above the entrance. Plus, it has a lunch specials menu.

And then we do have Urban Boutique with a mannequin in the window plus a geometric pattern in pastel colors on the storefront. And then we

09:23

also have a sale banner over here. Now, if we look at Flux Krea Dev, it completely missed Urban Boutique; it kind of merged Bella's Cafe with Urban Boutique.

The lanterns are not in the right place. The marble tables are also not in the right place.

Everything is

09:39

just mixed up. And then GPT-4o is really good.

So, it got Bella's Cafe. I think this is a marble table over here.

And then this Chinese restaurant is also correct. It's got lanterns.

It's got the dragon design. It's got the lunch special sign.

And then Urban Boutique

09:55

also looks correct with the geometric pastel colors and the sale banner. However, again, this kind of looks too cartoony and I don't like the yellowish tinge.

Whereas, if you compare this with Qwen Image, this just looks way more realistic. So, at least for me, I would

10:11

have to give the point to Qwen Image. Here's another really complex prompt.

A tropical beach in Bali at dusk with a woman in a blue and yellow tie-dye sarong and a pink flower crown doing yoga.

A gray monkey steals a brown coconut from her beach bag. A red

10:28

surfboard with a white hibiscus leans against a green palm tree. A brown fisherman's boat floats on the blue waves.

And a red Bali sunset sign glows at a bar. And for Qwen Image, indeed, we have all the elements that we specified.

We have this woman and her outfit is

10:44

correct. She is doing yoga.

We do have this monkey stealing a coconut from her bag. We do have a surfboard with a hibiscus icon.

Plus the boat over here, plus the Bali sunset sign over here. Very nice.

For Flux Krea Dev, this does look a bit more realistic. However, it

11:02

doesn't follow exactly what we specified. So, the surfboard does not have a hibiscus flower icon.

And then for GPT-4o, we didn't specify that the prompt has to be a realistic photo. So, I guess I'll forgive it for generating this cartoony vibe.

But anyways, it's

11:19

also able to nail most of the prompt. We do have this woman doing yoga.

We have a monkey stealing a coconut plus the surfboard with a white hibiscus. The only minor flaw is it's not really leaning against the palm tree, which is what we specified.

Whereas for both

11:36

these photos, the surfboard is actually leaning against the palm tree. So again, for this example, I'll have to give the point to Qwen Image.

Now, like I showed you before, Qwen Image is really good at even creating PowerPoints. So, for the prompt, you can just say a PowerPoint slide and then give it all the text and

11:52

elements that it should include on the slide. So, it's a pretty long prompt.

I'm not going to read out everything, but as you can see for Qwen Image, not only does it look beautiful, but all the text and the icons are correct. This is indeed what I specified.

And even the

12:09

order is correct. So notice here I specified upper middle and lower left and the same for the right side.

For Flux Krea Dev, it tried hard, but a lot of the text is not correct, and this definitely does not look as nice as Qwen Image. And then for GPT-4o, again, it's

12:25

very good. So it got the text correct, but it just doesn't look as nice as Qwen Image.

So again, over here I would have to give the point to Qwen Image. So far, Qwen Image has won every round.

How crazy is that? All right.

Next, let's see if it can also create some UI

12:42

designs. So, here the prompt is UI of a mobile fitness app.

Modern minimalist design. The app name is Fit Track in bold at the top.

A circular progress bar dominates the center filled with a gradient from teal to green displaying

12:57

8452 steps in large white digits. Directly below, we have three metrics displayed in a clean row.

a teal flame icon with 320 calories, a timer with 45 minutes, and a teal route marker icon

13:13

with 6.2 kilometers. Below these metrics, a motivational message, keep moving.

You are 85% to your daily goal appears in centered white italic font. At the very bottom, a subtle teal button with 'view all stats' invites deeper

13:28

exploration, etc., etc. So, here is Qwen Image.

It got everything correct except for the italic font which we specified for this text. But I mean, this looks beautiful.

The progress bar is indeed the color that we specified. The icons also look correct and very sharp.

13:46

And the text is also completely correct. Here is Flux Krea Dev.

Again, it's not bad. I'm actually quite surprised that it can generate so much text.

But as you can see, some of it is not correct. And then for GPT-4o, I did expect it to get the text correct as well, except for the

14:02

italic font that we should see here. But I mean, if you compare which one looks the best, then obviously Qwen Image looks way better than what we got from GPT-4o.

So again, the point goes to Qwen Image. Here's an even crazier example.

So I prompted it to generate a YouTube

14:18

search results page where we are searching cooking recipes. There should be some filter buttons showing upload date this week and duration should be under 4 minutes.

And then basically I specified what the videos in the search results should be including the

14:34

thumbnail, the titles, and the views, etc. So here's what we got from Qwen Image.

Because we're specifying so much text and so many elements, it does start to mess up a bit. So for example, the text over here and over here are not

14:49

correct. But overall, this does look like a YouTube search results page.

And then here is Flux Krea Dev. It tried hard, but again, it's just not great with text.

And then here is GPT-4o. I'm really impressed by this example.

So, it

15:04

got all of the text correct. And this does look like a YouTube search results page.

So, for this one, I would have to give the point to GPT-4o. It did not have any misspellings compared to Qwen Image.

Let me tell you about this awesome tool

15:19

called Chat LLM by Abacus AI, the sponsor of this video. Chat LLM is an all-in-one platform for you to use the best AI models out there.

You can seamlessly switch between different models. Plus, you can use the best image generators out there and the best video

15:35

generators out there all in one integrated platform. Plus, if you're coding something, they have a really useful artifacts feature so that you can preview your generations side by side.

Plus, they have a deep agent feature which can do really complex tasks all autonomously like creating powerpoints,

15:51

websites, and research reports. It's going to supercharge your productivity.

You can access all these AI models and image and video generators and Deep Agent for only $10 a month. This is way cheaper than if you paid for each tool separately.

Definitely check out Chat

16:07

LLM that comes with Deep Agent in the description below. Here's another fun example.

I tried to get it to generate a Pokemon card of Baby Yoda. So, it's going to have all these stats plus it's going to be a basic psychic Pokemon.

Qwen Image is not bad, but it didn't

16:24

actually output the name Baby Yoda up here. The text is also kind of messed up, so it misspelled something here.

And then this isn't correct. Plus, the Psychic Energy logos are also not correct. Now for Flux Krea Dev.

Again, all of these are the best of four generations.

16:40

It just could not generate a psychic Pokemon card with a purple background. And then for GPT-4o, this actually looks really good.

It even gave us some weakness, resistance, and retreat components over here. The text is completely correct.

Plus, the energy

16:56

icons are also correct, although I did specify for this to be two psychic energies. But anyways, for this one, I would have to give the point to GPT-4o.

Next, I also wanted to test how good Qwen Image is at generating just normal photos. So, here we have an amateur

17:13

photo of a nerdy woman with messy hair, round glasses, freckles, and braces at the library. And here is what we get.

Honestly, all three of them do pretty well. So, it really depends on what you prefer.

Flux Krea Dev is known to produce the most normal-looking, realistic photos.

17:31

So, if that's the vibe you're going for, then definitely use Flux Krea Dev. You can see for GPT-4o and Qwen Image, there's still kind of a fake AI vibe to it, but honestly, it's just a really subtle difference.

I would say for this round, it's a tie between all three models. And

17:46

then over here, we have a teenage woman holding a handwritten note that says "Verify me 08/04," low-quality selfie photo. And all three models got this correct. I would say for Qwen Image, she still kind of looks too plasticky.

And then both the generations

18:02

from Flux Krea Dev and GPT-4o look very good. I would have to say it's a tie between these two.

And then for research purposes, of course, I also wanted to see how good Qwen Image is at generating feet. So here's a prompt that not a lot of image generators could get correct.

A woman sitting and showing

18:19

both her palms and soles of feet. And as you can see, Qwen Image handles this beautifully.

The hands and the feet all look anatomically correct. Now for Flux Krea Dev.

She is missing a toe over here, so that's a fail. And then GPT-4o also got

18:35

all the fingers and toes correct, but this does kind of look too plasticky and cartoony and fake compared to the generation from Qwen Image. I mean, this looks like a real photo.

So, for this round, I would have to give the point to Qwen Image. Here's another

18:51

anatomy test. So, the prompt is five hands making a star shape.

Interestingly, even after running this for four generations, Qwen Image only outputs something like this, which technically is still correct. Like, these are five hands making a star shape.

19:06

There's nothing wrong with this. And in fact, this looks pretty damn good.

For Flux Krea Dev, it completely failed. There are too many hands.

Plus, some of the hands are missing fingers. And then for GPT-4o, this is kind of the star shape that I was going for, and it nailed this.

But honestly, Qwen Image

19:22

is still technically correct. And if you look at the colors and the realism, I actually prefer the look of Qwen Image compared with GPT-4o.

But in any case, I think for this round, it's a tie between Qwen Image and GPT-4o. Next, I also wanted to see how good it is at generating not only anime, but also

19:40

existing characters and logos. So, the prompt is Naruto, Nezuko, Goku, and Doraemon eating at McDonald's and drinking Coke.

And for Qwen Image, this is so impressive. Like, I was pretty blown away by this.

This does look like Naruto, Nezuko, Goku, and Doraemon. And

19:57

they are eating at McDonald's. You can see the logo over here, plus the menu up here.

And they are drinking Coke. We have a nice Coca-Cola sign over here.

And this actually looks like anime. This is really good.

For Flux Krea Dev, it had some issues generating Nezuko. Plus,

20:15

the straw over here is kind of messed up. Plus, her hand is also messed up over here.

For GPT-4o, this is not bad. It was able to generate Nezuko as well as all the logos.

But again, the colors are just very yellow. This kind of looks like it was drawn on a very old yellow

20:31

piece of paper, if you know what I mean. So, here I would have to give the point to Qwen Image.

In fact, here's another tricky anime prompt that I tested out. So, here we have a girl in a school uniform fighting another character in a mecha suit.

High-action, intense motion blur. And look at the insane generation

20:49

from Qwen Image. This is so good.

Like, this indeed looks like anime. Plus, it is an intense fight between these two characters.

Plus, there is motion blur. This is so good.

For Flux Krea Dev, it couldn't really generate a fight scene.

21:05

And then for GPT-4o, it could generate a fight scene, but this is more like a drawing. Plus, the anatomy is kind of off.

So, it seems like her head is too big. Plus, I don't know why her hand here looks like this.

But, I mean, the obvious winner here is Qwen Image. Look

21:20

how good this generation is. Next, I also wanted to see if it can generate 3D Pixar style.

So, the prompt is a busy, crowded marketplace. And look how good the generation from Qwen Image is.

This looks exactly like 3D Pixar style. For

21:36

Flux Krea Dev, it's not bad, but the faces are kind of messed up in the background. Plus, the style is just kind of off compared to Qwen Image.

And then for GPT-4o, this is kind of a known flaw, but it's really hard to actually get it to generate 3D animation style. It tends

21:53

to give you something like this, which looks more 2D than 3D. So, in this round, the clear winner is Qwen Image.

Next, I also wanted to see if it can generate different art styles. So, here is a Monet style impressionist painting of a deer in a forest, which for your

22:09

reference looks something like this. So, it's basically very rough brush strokes, and you kind of have to step back and squint your eyes to figure out what the scene looks like.

And in this example, none of them really look like a Monet style impressionist painting, but I

22:24

would say Qwen Image and GPT-4o come pretty close. So, I guess it's a tie between those two models.

For Flux Krea Dev, the deer just looks way too defined to be an impressionist painting. Next, I also wanted to see how good it is at generating uncommon species.

So, here we

22:42

have a pair of spectral tarsiers on a tree. And for your reference, they look like this in real life.

And Flux Krea Dev completely failed to understand what a spectral tarsier is. Now, for Qwen Image and GPT-4o, they can actually generate some creatures that roughly

22:59

resemble a tarsier. Both of them are not perfect.

They don't look exactly like spectral tarsiers, but it's pretty close. So again, over here I would say it's a tie between Qwen Image and GPT-4o.

So that sums up some of my personal tests using Qwen Image and comparing it

23:16

with the leading open-source and proprietary models out there. And you can see for most of these examples, Qwen Image was actually the winner.

So how crazy is that? We now have a completely free and open-source model which you can download and tweak.

And overall, it even

23:33

beats the best proprietary model out there, GPT-4o. That's pretty insane.

I did not expect that to happen anytime soon. Next, let's go over where you can use this.

So, if you don't have a GPU and you want to use this online, well,

23:49

there is a free Hugging Face Space where you can try this out. It's pretty simple to use.

Here's where you would enter your prompt. And down here, you can specify the aspect ratio.

However, for Hugging Face, you only get a limited number of free credits per day, and that's enough to generate roughly two

24:06

images using this space. Another way to potentially use this is on Qwen Chat.

This is a completely free platform by Alibaba for you to use their AI models. So this works exactly like ChatGPT, but

24:22

in addition, you can click on image generation over here and enter a prompt and then select the aspect ratio and click generate. Now, at the time of this recording, we're not actually sure if this method uses the latest Qwen Image because some generations do lack the

24:38

quality that I would expect for Qwen Image. So they might still be using an older image model, but it's completely free, so it doesn't hurt to try this out.

Now, in addition to these online methods, of course, the awesome thing about Qwen Image is that they've released and open-sourced the damn

24:54

thing. So, anyone can download this and run it on their computer for free and unlimited times offline.

So, next I'm going to show you exactly how to do that. Now, the best and most customizable way to use this, especially if you have low VRAM, is using Comfy UI.
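
(As an aside, if you'd rather script this than click through Comfy UI, here's a rough sketch of what a local run could look like through Hugging Face diffusers. The "Qwen/Qwen-Image" repo ID and the generic pipeline loader are assumptions on my part rather than anything shown in the video, so treat it as a starting point, not the official method.)

```python
# Rough sketch: running Qwen-Image locally with Hugging Face diffusers.
# The "Qwen/Qwen-Image" repo ID is an assumption here, not something shown in the video.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",           # assumed Hugging Face repo ID for the open-weights release
    torch_dtype=torch.bfloat16,  # full-size weights are heavy; low-VRAM cards will need offloading
)
pipe.to("cuda")

prompt = "A cozy bakery window with a wooden sign reading 'Freshly Baked Today'"
image = pipe(prompt=prompt).images[0]  # standard diffusers call: returns a PIL image
image.save("qwen_image_test.png")
```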

25:11

If you haven't heard of Comfy UI, see this video for a full installation tutorial. Anyways, for today's video, I'm going to assume you already have Comfy UI installed.

Now, the nice thing is they've already released an official workflow for Qwen Image. So, I'll link

25:27

to this page in the description below, and all you have to do is follow the instructions here. So, let's go over this really quickly.

Notice that the official models that they released work on 24 GB of VRAM, but I'm also going to show you some quantized versions which can run on as low as 8 GB. So first

25:44

let's actually scroll down here and download all the models. So let's click on this Hugging Face link and over here let's click on split files and then in diffusion models.

Now there are two models you can download. One is 40 GB and this other one is a more compressed

26:01

version which is half the size. Of course for me I'm going to download this one.

Now this goes in your comfy UI folder in models and then in diffusion models. Let's click save.

Now going back here, in addition to these diffusion

26:16

models, you also need to download this text encoder file. So let's click on this.

And this one goes in comfy UI in models and then text encoders. So let's save this as well.

Notice that even this text encoder file is 8.7 GB in size. And

26:34

then finally, we also need to download this VAE file which is in charge of encoding and decoding the image. So, let's click on this.

And this goes in comfy UI in models and then in VAE. Let's click save.

This one is way smaller. It's only 242 MB.

So, here is

26:54

the folder structure for your reference. Now, while we wait for these to download, I can also go ahead and open up Comfy UI.
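
(For reference, based on the folders mentioned above, the layout should end up roughly like this; the file names are placeholders for whichever versions you actually downloaded.)

```
ComfyUI/
└── models/
    ├── diffusion_models/
    │   └── <the Qwen image diffusion model, e.g. the FP8 file>
    ├── text_encoders/
    │   └── <the Qwen 2.5 VL text encoder file>
    └── vae/
        └── <the Qwen image VAE file>
```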

So, let me do that real quick. Now, before you do anything, the first step is to click on manager and then click update Comfy UI because it's

27:09

the latest version that supports Qwen Image. So, let's click this button and wait for this to finish updating.

And this might take a few minutes. Afterwards, you should see this pop-up message.

So, let's click close. And to be safe, let's click restart.

All right.

27:25

Afterwards, the next step is going back to the instructions on this Comfy UI page. We just need to download the workflow and drag and drop it onto our Comfy UI.

So, let's right-click this button and then click 'Save link as.' You can download this wherever you want.

I'm just going

27:41

to save it in my Comfy UI folder. And then the next step is we just need to drag and drop our downloaded workflow file onto our Comfy UI interface.

And voila. Notice that everything is already pre-built for you.

You don't need to build any of this yourself. Now the

27:57

first step in this component here is to click on the drop-down for each model and actually select the one you downloaded. So over here I'm going to select this one, Qwen image FP8.

And then for the clip I'm going to select this one. And then finally for the VAE I'm

28:13

going to select Qwen image VAE. And that's pretty much it.

Down here is where you would set the width and height of your final image. So let's make this a horizontal image like this.

And then batch size is how many images you want it to generate at once. So right now,

28:29

let's just get it to generate one image. And then if you've been following my channel, you should be familiar with this KSampler component.

The seed is basically the starting point of your image because there could be basically almost an infinite amount of generations it could make with this prompt. And this

28:45

seed is just one of them. Now, usually you would just set this to random, but if you set the same seed number and you keep the rest of the settings the same, it will generate the exact same image as before.

And then steps is how many, well, steps you want the AI to go through before outputting your final image. So,

29:02

in general, the more steps you have, the longer it will take to generate, but you're going to get better quality. But at a certain point, if your step count is too high, like over 50 steps, you're going to get diminishing returns.

And then conversely, if you set the step count to a lower value, like 10 steps, it's going to generate faster, but at

29:18

the sacrifice of some quality. And then CFG is how literally you want the AI to follow your prompt over here.

A high CFG value would get the AI to follow this really literally and really try to generate everything. Whereas a low CFG

29:34

value would give it more creativity and it might introduce some variance in your generation. And then for sampler name and scheduler, this is basically the algorithm used to generate the image.

Feel free to play around with these. There could be subtle differences.

For example, some

29:50

people have reported that SA solver is pretty good as well. And then here is a note.

It says you can also try to set this CFG here to one for a speed boost at the cost of some consistency. And then here it also says samplers like res

30:06

multistep work pretty well at cfg of one. But anyways for me I'm just going to leave it at the default and see what we get.

And then over here is another note which is pretty noteworthy. Here it says for this shift value you can try to increase it if you get too many blurry,

30:21

dark or bad images or you can decrease it if you want to increase the detail. Again for me I'm just going to leave it at the default value.
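
(To tie those settings together, here's a rough scripted equivalent, reusing the pipe from the earlier sketch. The parameter names follow the generic diffusers convention rather than the ComfyUI nodes, and the numbers are just illustrative, so treat this as an illustration of what seed, steps, and CFG control, not an exact mapping.)

```python
# Illustration of the sampler settings discussed above, expressed as a
# hypothetical diffusers-style call (reuses `pipe` from the earlier sketch).
import torch

generator = torch.Generator(device="cuda").manual_seed(42)  # same seed + same settings = same image

image = pipe(
    prompt="A cozy bakery window with a wooden sign reading 'Freshly Baked Today'",
    width=1280, height=720,     # example horizontal resolution
    num_inference_steps=20,     # "steps": more = slower but better, diminishing returns past ~50
    guidance_scale=4.0,         # "CFG": higher = follows the prompt more literally
    generator=generator,        # the "seed" knob; omit it for a random result each run
).images[0]
```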

And that's pretty much it. Let's click run and see what that gives us.

All right. And here is what we get.

Notice that for 24 GB of

30:36

VRAM, this took a bit over a minute. So, it's actually pretty quick considering the quality of this.

So, that's how you can get, you know, the official Comfy UI workflow up and running on your computer. Now, instead of these two official models by Comfy UI, which

30:53

require 24 GB of VRAM, fortunately, there are also more quantized or compressed versions that can run with as low as 8 GB of VRAM. So, I'll link to this page in the description below.

This is by City96. And over here, this lists

31:08

out all these compressed models and the estimated VRAM required. So, for example, if you have 16 GB of VRAM, then all these Q5 models or above would work.

Or if you have 12 GB, then all these ones would work. And then finally, if

31:24

you have 8 GB, then this first one over here, Q2, should work. And you simply need to just choose which one you want and then click on it and then over here click download.
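
(If you're not sure how much VRAM your card has, a quick way to check from Python, assuming PyTorch is installed and you're on an NVIDIA GPU, is something like this.)

```python
# Quick VRAM check to help pick a quantization level (requires PyTorch + an NVIDIA GPU).
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / (1024 ** 3)
    print(f"{props.name}: {vram_gb:.1f} GB of VRAM")  # e.g. ~8 GB -> Q2, ~16 GB -> Q5 or above
else:
    print("No CUDA GPU detected.")
```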

So for example, let's say I have 8 GB of VRAM. I'm going to download this Q2 and then this also goes

31:41

in comfy UI in models and then in diffusion models. Let's click save.

Now if you have lower VRAM and you've downloaded one of these GGUFs, then it requires a slightly different workflow. So the nice thing is on this page which again I'll link to in the description

31:56

below. They've also offered a workflow file.

So let's click on this. So over here I'm going to click on raw which is going to give me something like this.

And then I'm just going to press Ctrl S which will prompt me to save it somewhere on my computer. For this I'm

32:12

going to call it Qwen image gguf workflow. And then let's click save.

All right. Again afterwards we just need to drag and drop this on our comfy UI interface.

And here is what we get. Now, if you're running this for the first time, this node might be red, in which

32:29

case you can click on manager and then install missing custom nodes, and it should automatically detect that this is missing and install it for you. Or you can also click on custom nodes manager and then at the top here, type gguf.

And all you need to do is download this one,

32:45

ComfyUI-GGUF by City96. In fact, it seems like I don't have an updated version.

So, let's actually click on try update to update this. And then it says restart required.

So, let's click restart and then click okay. So, anyways, in this

33:01

dropdown, simply select the one that you downloaded, which in my case is Q2. And then again over here, make sure you select this Qwen 2.5 VL 7B, which we downloaded previously.

Or if you still get an out of memory error this way,

33:17

then back in this Hugging Face repo by City96, you can also try to download some other text encoders that are optimized for GGUF. So if you click over here, there are a ton of Qwen VL GGUF text encoders.

So if you downloaded like

33:34

Q2, for example, you would download one of these Q2 or Q2 large. And then over here you can expand this dropdown and then select whichever one you downloaded and then connect this node to the prompts like this.

And this might be

33:50

even more optimized if you have really low VRAM. But anyways, for me since I have enough, I'm just going to select this one and then select this one.

And sure, let's just generate a photo of a cat, which is the default prompt. And then as before, here is where you set the width and height of your final

34:06

image. And here is the regular KSampler.

In fact, just to make this even faster, let's just set the steps to 15 and then click okay. And let me just click run to make sure this actually works.

All right, perfect. So, you can hear my GPU firing up in the background.

34:23

Right now, it's decoding the image. And here is what we get.

Now, this is not great because I set the step count really low, but this is just to test out if this actually works. Anyways, that's how you can use Qwen Image with GGUFs if you have low VRAM.

Now, everything

34:39

I've shown you up until now is just text to image. So, you might be wondering, how can we edit images with this tool like they showcased on their announcement page?

Well, according to this thread on GitHub, here's their reply: currently, we have only open-sourced the text to image model, but the

34:56

editing model is also on a road map and planned for future release. So, it looks like that's going to be a separate model that isn't released yet.

I'll definitely do a full review and installation tutorial of this once they release it. So, stay tuned for that.

Anyways, that

35:13

sums up my review of Qwen Image. Hopefully, from my comparisons, you can see how damn good this is.

This is by far the best open-source image model you can use right now. And heck, it's even as good as the best proprietary model,

35:28

GPT-4o. It's so damn good at generating text.

Plus, it's also insanely good at understanding your prompt. God bless the Alibaba team for actually open-sourcing this for us to use locally.

By the way, it's the same team that also released

35:45

Wan 2.2, which is by far the best open-source video model out there right now. So, Alibaba is absolutely cooking.

Anyways, let me know in the comments what you think of this. What other cool or impressive generations were you able to come up with?

And if you run into any

36:01

errors during the installation, you're welcome to paste the error message in the comments below, and I'll try to help you troubleshoot as much as possible. As always, I will be on the lookout for the top AI news and tools to share with you.

So, if you enjoyed this video, remember

36:16

to like, share, subscribe, and stay tuned for more content. Also, there's just so much happening in the world of AI every week.

I can't possibly cover everything on my YouTube channel. So, to really stay up to date with all that's going on in AI, be sure to subscribe to

36:32

my free weekly newsletter. The link to that will be in the description below.

Thanks for watching and I'll see you in the next one.