3D-GPT. If E-on’s Vue was a ‘text-to-3D landscape model’ AI…
Ken K has released OpenPose for Poser 12, which provides the first ‘Poser to Stable Diffusion’ pipeline.
The video demo suggests it’s useful for speeding up the feeding of SD with exact poses, and it looks especially useful for hands. While a ControlNet can provide native OpenPose pose estimation, the hands are often not well detected. Ken’s method lets you get excellent hands.
The free alternative might be something like a .BVH to OpenPose converter. But no-one seems to have made one, which seems rather amazing. Everyone wants to do it from analysed video pixels rather than the skeletons of 3D figures. So Ken’s new product seems unique.
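What would such a converter actually have to do? A minimal sketch, assuming you have already projected the 3D skeleton’s joints into 2D screen coordinates (the hard part, which a Poser or Blender Python script would handle); the `BODY25_ORDER` list below shows only the first few entries of OpenPose’s real BODY_25 keypoint ordering, and the joint names used are hypothetical:

```python
import json

# First entries of OpenPose's BODY_25 keypoint order (the full list has 25);
# a real converter would map BVH bone names onto all of these.
BODY25_ORDER = [
    "Nose", "Neck", "RShoulder", "RElbow", "RWrist",
    "LShoulder", "LElbow", "LWrist", "MidHip",
    # ... remaining BODY_25 keypoints would follow
]

def joints_to_openpose(joints_2d):
    """Flatten {name: (x, y)} into OpenPose's [x, y, confidence, ...] list.

    Missing joints get (0, 0, 0), which OpenPose-style consumers treat
    as 'not detected'.
    """
    flat = []
    for name in BODY25_ORDER:
        if name in joints_2d:
            x, y = joints_2d[name]
            flat.extend([x, y, 1.0])   # confidence 1.0: we know the exact pose
        else:
            flat.extend([0.0, 0.0, 0.0])
    return {"version": 1.3, "people": [{"pose_keypoints_2d": flat}]}

# Example: two projected joints from a (hypothetical) BVH skeleton
pose = joints_to_openpose({"Neck": (256.0, 120.0), "Nose": (256.0, 90.0)})
print(json.dumps(pose)[:60])
```

The resulting JSON mirrors OpenPose’s own output format, so it could be dropped straight into a ControlNet openpose workflow in place of a pixel-analysed estimate.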
Reallusion’s Cartoon Animator (formerly CrazyTalk Animator) has a new Motion Pilot feature. Seems to be a motion-damped mouse-cursor, so you can easily draw an editable motion path. But it has adjustable settings, as you can see here…
For full details see the Motion Pilot demo video.
The leader in AI text-to-speech voices, Poland’s ElevenLabs, is now out of beta, and now supports 28 languages in ‘Eleven Multilingual v2’. This can be paired with Professional Voice Cloning, which should mean that those with the correct intonations in the input can also get them in the output, in another language. They also have a ‘Voice Styles’ library.
The newly added languages are…
Chinese, Korean, Dutch, Turkish, Swedish, Indonesian, Filipino, Japanese, Ukrainian, Greek, Czech, Finnish, Romanian, Danish, Bulgarian, Malay, Slovak, Croatian, Classic Arabic and Tamil.
The pricing is nice, though there’s no PayPal option. Someone making one short 90-minute audiobook per month would need the $22-a-month subscription, but the free and starter tiers are very reasonable. Especially so for those making short 12-20 minute animations with Poser and DAZ.
Poser could be the ‘killer app’ in creative AI, in terms of usable graphics production for storytelling.
Imagine an AI that takes what you see in the Poser viewport, and works on that, giving you 98% consistent character renders which would allow the creation of graphic novels etc.
The aim would be to keep all character details consistent and stable, while also ‘AI rendering’ the viewport into a consistent professional ‘art style’. Auto-analysis of a quick real-time render from the Viewport might be needed (already here, elsewhere) and auto-prompt construction (already here). Perhaps there might be some on-the-fly LoRA training going on too, behind the scenes. Then, the AI image generation would be done.
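The auto-prompt construction step is the easiest part to picture. A toy sketch, with an entirely hypothetical scene-metadata dict standing in for whatever a real plugin would pull from Poser’s Python API:

```python
def build_prompt(scene, style="clean comic-book ink and flat colour"):
    """Assemble a Stable Diffusion prompt from scene metadata.

    The 'scene' keys here are invented for illustration; a real plugin
    would read figure, pose, expression and camera data from the scene.
    """
    parts = [
        f"{scene['figure']} wearing {scene['clothing']}",
        f"{scene['expression']} expression, {scene['pose']}",
        f"{scene['camera']} shot",
        style,
    ]
    return ", ".join(parts)

prompt = build_prompt({
    "figure": "adult male hero",
    "clothing": "a red trench coat",
    "expression": "grim",
    "pose": "mid-stride",
    "camera": "low-angle",
})
print(prompt)
```

The fixed `style` suffix is what would keep the ‘art style’ stable from panel to panel, while the per-scene fields track whatever is currently in the viewport.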
You can kind of do all this now, outside of Poser, using Poser renders. But what if it was all neatly integrated into Poser, and ran on SDXL? All those royalty-free runtime assets then become super-valuable, since with their aid you can easily get the AI to do exactly what you want. Face, expression, pose, clothes, camera-angle. Hands. All output by the AI to the usual masked .PNG file, ready to drop over a 2D backplate in Photoshop. In three clicks. And all consistent between images, enough to satisfy even the most persnickety regular comics reader.
The aim would not be to go wild, but to get something very close to the arrangement and content seen in the real-time viewport. It doesn’t necessarily have to be done by Bondware/Reallusion either, since Poser is Python 3 friendly and very extensible. All that would be needed, perhaps, would be to open up PostFX to be able to run a Python plugin that applies its own FX on real-time renders.
So imagine Poser’s Comic Book Preview or Sketch rendering, but done by an AI on a purpose-built ‘AI-native’ PC (coming in 2025 in retail, if not before). With No Drawing Required™ and Character Consistency.™ Let’s call it SuperFlying Dreamland.™ 😉
AI-powered auto sound-effects for a video. It finds… “the right sound-effects (SFX) to match moments in a video”. For a long video, that’s a task which can take a day or more of searching, plus the trimming, volume balancing, and slotting into the video.
Of course it’s not going to do much for Tom & Jerry style animation SFX, compared to the real thing. But I guess for normal PowerPoint-style things (“that’s a dog, it barks, get bark-sound”) it could be useful for those making a ‘holiday-photos slideshow’ presentation.
And it may also save time for creatives. For instance this could interface in interesting ways with the giant Freesound library’s new simple taxonomy. (“That’s a dog, it barks, get links to top-12 bark sounds on Freesound”). That would save some time. Not so much a new ‘recommender’ system (they’re always dim-witted), but more a new creative ‘options bundler’ system. Likely to find a place within the emerging and more complex AI-powered script-flow workflow software.
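The ‘options bundler’ idea is straightforward to sketch against Freesound’s real public API. The search endpoint below is Freesound’s actual APIv2 text-search URL; the chosen field list is just a reasonable guess at what a picker UI would need, and `YOUR_API_KEY` stands in for a personal Freesound API token:

```python
from urllib.parse import urlencode

FREESOUND_SEARCH = "https://freesound.org/apiv2/search/text/"  # real endpoint

def sound_search_url(query, api_key, limit=12):
    """Build a Freesound text-search URL for the top matching sounds.

    Returns the URL only; fetching and presenting the JSON response
    would then be a single urllib or requests call.
    """
    params = {
        "query": query,
        "page_size": limit,           # "top-12 bark sounds"
        "fields": "id,name,previews,license",
        "token": api_key,
    }
    return FREESOUND_SEARCH + "?" + urlencode(params)

url = sound_search_url("dog bark", "YOUR_API_KEY")
print(url)
```

A script-flow tool would run one such query per detected moment (“dog barks at 00:42”) and hand the creative a short list per moment, rather than auto-picking one sound.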
Possibly interesting for Python coders who craft scripts for Poser, Vue, Blender, and others: the new Amazon CodeWhisperer, an AI-powered code-generator that appears to be genuinely free and supposedly “unlimited”. You do need an Amazon Web Services (AWS) account, though.
Be warned that the ‘free’ tier of AWS is only a 12-month trial, last I heard. After that you have to pay to keep the AWS account.
MiDaS uses trained AI to take a normal 2D image and output a 3D depth-map. In Poser-speak it’s like Poser’s ‘auxiliary Z-depth’ pass or render.
Free and public, no sign-up needed. Just drag-and-drop your image. It can probably also be installed locally, though I haven’t looked at the requirements for that.
Once you have it you can use the usual Photoshop layer inversion/blending-mode tricks to create ‘depth-fog’ in the scene, where there was none before.
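The layer trick is just a per-pixel blend weighted by depth, which is easy to spell out. A minimal pure-Python sketch, assuming the depth map has been normalised to 0.0 (near) … 1.0 (far) — note that depth-map conventions vary, so a real MiDaS output may need inverting first:

```python
def add_depth_fog(pixels, depth, fog=(200, 210, 220), strength=0.8):
    """Blend a fog colour into each pixel, weighted by scene depth.

    'pixels' is a list of (r, g, b) tuples and 'depth' a parallel list
    of 0.0 (near) .. 1.0 (far) values, e.g. sampled from a depth map.
    This is the same maths as a Photoshop blend driven by an inverted
    depth layer, just written out explicitly.
    """
    out = []
    for (r, g, b), d in zip(pixels, depth):
        w = d * strength   # far pixels get more fog
        out.append(tuple(round(c * (1 - w) + f * w)
                         for c, f in zip((r, g, b), fog)))
    return out

# A near pixel stays untouched; a far pixel drifts toward the fog grey.
foggy = add_depth_fog([(255, 0, 0), (255, 0, 0)], [0.0, 1.0])
print(foggy)
```

In practice you’d run this (or the equivalent NumPy one-liner) over the whole image, with `strength` as the artistic control.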
Since Poser does Python, I don’t see why something similar couldn’t be done for Poser. Doubtless there will soon be AIs that can take a text-prompt and pop out a finished PBR material. For example: “Make me a lava material that looks like glowing snake-skin”.
But it can’t be long before you type in a text description to generate a rigged and clothed 3D figure (plus some basic helmet-hair), and can then also generate a set of motions to apply to the figure’s .FBX export file. Useful for games makers needing lots of cheaply-made NPCs, provided they can be game-ready.
But for Poser and DAZ users, the ideal would be to have reliable ‘text to mo-cap’ exist as a module within the software. Even better would be to have an AI build you a custom bespoke AI-model by examining all the mo-cap in your runtime, thus gearing it precisely to the base figure type you intend to target.
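The ‘examine all the mo-cap in your runtime’ step is at least easy to start on. A minimal sketch that walks a runtime folder and tallies the length of every .BVH file — BVH files really do declare their length with a `Frames: N` line in the MOTION section, so a cheap scan gives a rough size of the would-be training corpus:

```python
import os

def collect_bvh_stats(runtime_dir):
    """Walk a Poser-style runtime and tally frame counts of every .bvh file.

    Returns {path: frame_count}. Files without a readable 'Frames:' line
    are recorded with a count of 0.
    """
    stats = {}
    for root, _dirs, files in os.walk(runtime_dir):
        for name in files:
            if not name.lower().endswith(".bvh"):
                continue
            path = os.path.join(root, name)
            frames = 0
            with open(path, "r", errors="ignore") as f:
                for line in f:
                    if line.strip().startswith("Frames:"):
                        frames = int(line.split(":")[1])
                        break
            stats[path] = frames
    return stats
```

From there, the hypothetical trainer would retarget each clip onto the chosen base figure before using it — the hard part this sketch deliberately leaves out.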
A useful new UserScript for your Web browser. YouTube: expand description and long comments. No more clicking on “See more…”. With this installed, everything below the video is already open, and you just scroll down and cast your eye through it. Tested, and can be used alongside a UserScript that serves as a YouTube Comments Blocker.
Available now, my $2 Photoshop Action for users of the “Dream by Wombo” AI image generator. It very precisely and automatically cuts the picture out of a standard ‘Dream by Wombo’ AI-generated picture-card. It then auto-heals each of the tiny curved corners, and finally it saves the cut-out as a copy.
It works with the current Wombo output size available to free users. If that output size changes in future, then I’ll update the Action.
An interesting new re-lighting service, ClipDrop – Relight. Requires a 2D picture-upload, and you then get “as-if 3D” real-time relighting.
I’m not sure how well it would work with a picture that doesn’t look like a head-and-shoulders passport photo.
I’m guessing it may also be flummoxed by wild Poser/DAZ stuff, such as an all-action hero cyber-elephant wearing goggles, posed against a complex cyber-city background.
Doubtless this sort of capability is coming to desktop software in a month or two, if it isn’t here already in something I haven’t heard about yet. But this is a nice online demo of the capabilities.
Working in Opera (Chromium), Brave (Chromium) and Pale Moon (Firefox). You’re welcome.
Reallusion have a new free “automatic rigging tool” for 3D characters. An equivalent to Mixamo: AccuRIG 1.0 users load a T-pose or A-pose figure and apparently get back a 19-joint .FBX with…
“full-body and finger rigs for biped characters”
Seems to be genuinely free, and independent of the rest of the costly suite of Reallusion software (although it can interface with Reallusion’s ActorCore system).
No mention of face or toe rigging though. So I guess it’s mostly aimed at quick auto-rigging of “low-poly NPCs”, of the sort needed for games. I’m uncertain how many auto-riggers game developers already have, but I’d guess it’s not zero. Also, I don’t see any mention that the rig can take standard .BVH mo-cap motions, or your existing iClone motions. But I guess Reallusion will hope to sell new motion packs for the figures.
It’s Windows desktop software that you download, and then ActorCore “free registration” is needed for any sort of export from it. Might be worth trying, to see how well it can do, say… the free Big Buck Bunny rabbit figure from Blender. And then how well that moves in DAZ / Poser, and if .BVH motions work.
But otherwise Poser and DAZ people probably have enough in their runtimes to provide the standard extras needed for a large scene, without having to go the roundabout route via the .FBX format.
DALL·E 2 has launched in beta. $15 effectively buys you three or four text prompts per day, across a month. The lucky beta “users get full usage rights to commercialize the images they create”. There’s also a bung for those around the world who can’t afford that… “Artists who are in need of financial assistance will be able to apply for subsidized access.”