Having made the leap to Windows 11 Superlite, I now have it nailed down, along with the AI image generators I wanted. I’m working in ComfyUI Portable, updated to the latest version for Flux Kontext Dev. I suspect I won’t be going back to InvokeAI much, now that I have Comfy made… comfy. It’s not so daunting once you get the hang of it. Here’s what I now have…
* SDXL:
Can be made blisteringly fast with realvisxlV50_v50LightningBakedvae.safetensors or the amazingly fast and good-quality splashedMixDMD_v5.safetensors. From the latter, four seconds on a 3060 12GB for this image at this 1280px size. No postwork…
Four seconds! Works with mistoline_rank256.safetensors as the single universal lineart ControlNet (not used in the above image). There are two slight disadvantages to the otherwise awesome splashedMixDMD_v5 model: 1) you get no negative prompt, since you have to work at CFG 1.0 and the negative prompt is thus ignored (see the sketch below); and 2) not all SDXL LoRAs appear to work with splashedMixDMD. Still, some nice ones do, such as the comics one you see in action above. I think I have a new favourite go-to for experimenting with style-changing Poser renders via ControlNet. Maybe also the OpenPose ControlNet, since there’s at last a good one for SDXL.
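For anyone who prefers scripting to nodes, here’s a minimal sketch of the same CFG 1.0 point in Python with the diffusers library, rather than my actual ComfyUI workflow; the scheduler choice and step count are assumptions typical of DMD-style turbo models, not tested values.

```python
# Hedged sketch, not my ComfyUI workflow: a DMD-style SDXL turbo checkpoint
# loaded via diffusers. Filename matches the model discussed above; the
# scheduler and step count are assumptions, not measured settings.
import torch
from diffusers import StableDiffusionXLPipeline, LCMScheduler

pipe = StableDiffusionXLPipeline.from_single_file(
    "splashedMixDMD_v5.safetensors",  # local checkpoint path
    torch_dtype=torch.float16,
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# diffusers only runs classifier-free guidance when guidance_scale > 1,
# so at CFG 1.0 the negative prompt below is silently ignored.
image = pipe(
    prompt="comic-style portrait, bold lineart",
    negative_prompt="blurry",     # has no effect at CFG 1.0
    guidance_scale=1.0,
    num_inference_steps=8,        # turbo models need only a few steps
).images[0]
image.save("out.png")
```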
Theoretically, since I also have the original vanilla SDXL base model, I could now train up some LoRAs myself.
Also of note is SDXLFaetastic_v24.safetensors, which is dedicated to western fantasy artwork (painting, lineart, charcoal, etc.). Perhaps useful as a backup when a LoRA fails to work in a turbo model.
* Illustrious (SDXL):
Illustrious models are supposed to be ‘SDXL for illustration’ but appear to be overwhelmingly anime (ugh); at least that makes the good ones excellent at poses and action. I’m not hugely impressed by using the more interesting LoRAs with Nova Flat XL v3, the model I was recommended for making ‘flat’ comics images. The model is indeed great at what it’s meant to do, but I didn’t get much from pairing it with LoRAs such as Ligne Claire (clear-line Eurocomics style) or the Moebius style LoRA. But maybe that’s because I haven’t played with them long enough, or haven’t found a good Illustrious workflow with suitable prompts that shift it away from anime. Or maybe I need another Illustrious base model.
* Flux Kontext Dev:
Somewhat slow, but with the AurelleV2 LoRA it can take a Poser render and generate a very convincing watercolour + lineart image that aligns exactly when laid over the top of the starting Poser render, and that keeps the base colours. Good for illustrating children’s storybooks, then. It can also do its other ‘I am a Photoshop Wizard’ magic, albeit slowly: merging two images and re-posing, removing items including watermarks, removing or changing colour, re-lighting, placing a face into a new environment and position, etc. It’s useless for auto-colourising greyscale, though, compared to online services such as Palette and Kolorize.
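If you’d rather drive Flux Kontext Dev from a script than from ComfyUI, a hedged diffusers sketch follows; the pipeline class is real, but the LoRA filename, prompt, and guidance value are illustrative assumptions, and a 3060 12GB will need offloading.

```python
# Illustrative sketch only: Flux Kontext Dev via diffusers, not my actual
# ComfyUI workflow. The AurelleV2 LoRA filename is assumed; point it at
# wherever your copy lives.
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # needed on a 12GB card
pipe.load_lora_weights("AurelleV2.safetensors")  # watercolour + lineart style

render = load_image("poser_render.png")
out = pipe(
    image=render,
    prompt="convert to watercolour with clean lineart, keep the base colours",
    guidance_scale=2.5,  # assumed guidance value
).images[0]
out.save("watercolour.png")
```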
* WAN 2.1 Text to Video / Single Image:
Yes, I even tried WAN on a humble 3060 12GB card. Working, with two turbo LoRAs running in tandem: 80 seconds for a nice 832 x 480px single frame, with a workflow optimised for single images. Slow, but it can be done, and the results are very cohesive and convincing as photography. This success suggests that a 16fps text-to-video at that size would take maybe two hours for five seconds, and making a single-image preview first would reassure one about the eventual results.
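The two-hour figure follows from naive per-frame scaling; a back-of-envelope check rather than a benchmark:

```python
# Rough arithmetic behind the "maybe two hours" estimate, naively assuming
# render time scales linearly with frame count (it won't be exact).
secs_per_frame = 80                    # measured: one 832x480 frame
frames = 5 * 16                        # five seconds at 16 fps
print(frames * secs_per_frame / 3600)  # ~1.8 hours
```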
* WAN 2.1 Image to Video:
Working, with a turbo LoRA: 36 minutes for 5 seconds at 480 x 368px (81 frames at 16fps). Initial tests show it works well and looks good (a spaceship entering planetfall from orbit), and it’s feasible in terms of time. So 832 x 480px, with more quality, might at a guess be three hours for six seconds at 16fps? That would be perfectly feasible to run overnight. After a week one would have some 40 seconds of video. And a hefty electric bill in due course, no doubt. Though WAN 2.2 is due soon and will add a lightweight 5B model with better understanding of camera shot names and camera movements, and it may well also be quicker.
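A quick scaling check on that guess, again pure arithmetic rather than a benchmark; the linear-scaling assumption is mine:

```python
# Scaling the measured 36 minutes by pixel count and frame count.
base_min = 36                              # measured: 81 frames at 480x368
pixel_scale = (832 * 480) / (480 * 368)    # ~2.26x more pixels per frame
frame_scale = (6 * 16 + 1) / 81            # six seconds instead of five
print(base_min * pixel_scale * frame_scale / 60)  # ~1.6 hours, leaving
                                                  # headroom for quality steps
```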
There’s a lot more to explore, such as tiled upscaling, facerestore, character adapters, normal-map ControlNets, etc. But for now I’m pleased I’ve made the leap to an OS where I can use more than SD 1.5 and SD 2.1 768. I’ll still go back to them in due course, especially now that I can use them with turbo workflows. They can also be used in tandem with other types of model: for example, using Illustrious for coherent action scenes and then trying to get the result into photoreal with nice faces via SD 1.5. It’ll also be interesting to see what ‘SD 2.1 768 to Illustrious’ can do with a Syd Mead landscape.
And I got all the above just in time, since CivitAI is to be effectively banned here in the UK, from tomorrow!

