I tried out the new Ace-Step 1.5 in ComfyUI. It runs locally and is as fast as claimed at generating music tracks with lyrics: about three minutes for a two-minute track on a 3060 12GB card.
The official demos work fine when plugged in locally, but as soon as you depart one iota from these, things start to fall apart. Lines of lyrics you specified are not sung. Music is often slightly ‘off’ or ‘wobbles’ for several seconds at a time. The actual genre can start to change. On top of this, each generation is different if you change so much as a word in the lyrics. It’s impossible to iterate from a good starting point.
I then tried from scratch to get genres I knew. Even with quite sophisticated prompts that followed the guidelines in the official guide, it repeatedly veered towards rather generic and sometimes outright cheesy canned music. Prompting for Eno-style ambient music failed dismally, as did aiming at an early Gary Numan or Kraftwerk sound. Many times its output reminded me of the old ‘Band in a Box’ software.
Overall, it is impossible to iterate on and very difficult to control. Disappointing, given the hype that led up to it. Still, you may be able to generate generic vocal-free soundtracks for animation, slideshows, visual novels etc. But then again, Suno does have a free tier that’s very capable and would be the better choice.

This is a shame. I listened to the demo music on Twitter and it sounded decent, albeit not spectacular. Hopefully things only continue to improve from here. It’d be nice to have some custom music to go along with my animations.
Well… I should say that I was using the official ComfyUI workflow, and that doesn’t touch the 4B model, which apparently gives the best quality (a powerful graphics card is needed for that). It’s possible that a good 24GB GPU and the Gradio interface could do interesting things with it. I’m waiting to see what LoRAs do for the version I have before I delete it.