The Generative Edge - Week 1
We are back! AI that remembers you, full songs including vocals generated with suno.ai, and much improved image generation with Midjourney V6.
Happy New Year and welcome to week 1 of The Generative Edge. We’ve been on an extended break, but now we’re back! Here is the gist in 3 bullet points:
Midjourney’s new model delivers enhanced image generation with much better prompt understanding, eliminating the need for “junk” prompts.
Suno.ai simplifies music creation, allowing users to easily generate full songs, even with lyrics.
AI might soon remember you: projects like MemGPT aim to let chatbots remember user interactions and preferences for much more personalized experiences.
For all the details, let’s jump right in!
Midjourney v6
Less than two years ago, in 2022, the first widely available image models captured everyone’s imagination. In 2023, all of them improved their outputs significantly, as Dall-E3 and the latest Stable Diffusion releases show.
Midjourney has now made their newest model available for use.
Midjourney released version 6 of their image generation model
Version 6 is the third model trained from scratch on Midjourney’s AI superclusters. It’s been in the works for 9 months.
Prompt understanding has improved significantly, and the model no longer needs “junk” keywords like “award winning” or “8k”, which used to be very common in image prompts.
Better prompt understanding means you can ask for very specific things and the model won’t mix up subjects, objects, etc. (most of the time):
As is the case with Dall-E3, Midjourney can now render text much better (though it still falls on its face sometimes)
The improvements are more incremental, as expected, but progress is steady.
That said, in our opinion the true power lies in open image models and pipelines, and we’ll report more on those in the upcoming weeks.
Music generation leaps forward
Generative media continues to encroach on traditional media spaces. Models that can generate music have been released before (e.g. MusicGen, part of Meta’s AudioCraft), but suno.ai pushes things further and generates full songs in various styles, complete with lyrics and vocals.
suno.ai, an AI research lab from the US, released a voice model a while ago, but it seems they have now fully pivoted towards music generation.
Generating your own song is dead simple: you don’t even have to supply lyrics (but you can, and it’s more fun if you do)
You can combine this with ChatGPT and have it generate the lyrics for you, then just paste them into suno to generate any song.
A trip-hop song about “The Generative Edge”
A rap song about being a data scientist and building a neural network:
Signing up to suno.ai is free and you get some credits (plus new credits daily) to play around with
Check out the trending page to see what others have created, there are some songs in there that wouldn’t feel out of place on a Spotify playlist: https://app.suno.ai/
Generating music has never been easier, and while audio quality could be improved, suno.ai shows where things are headed.
Remember me!
Chatting with LLMs is like interacting with a detailed map. It can guide you based on the information it has, but it can't update itself with new roads or landmarks, nor can it remember your past journeys.
Injecting data dynamically at runtime, while we converse with an AI, is already common practice, and it works quite well for chatting with the AI about data it has never seen before. So why not store and persist information dynamically as well?
Imagine a chatbot that remembers you: what you’ve talked about, your preferences, what you like and don’t like, and that keeps learning continuously.
MemGPT is one of the projects aiming to tackle memory persistence, and provides a bot system that learns as you talk to it.
It’s a fairly technical project, but if you feel up to it, you can easily test it yourself: either check out the repository or try the Discord bot.
Rumor has it that at least one of the large LLM/AI providers will soon integrate similar features
Expect chatbots to start remembering your conversations and to periodically persist what they’ve learned so far:
Expect more customized assistants and bots to interact with in the near future, an important step on the road to a true “Her”-style AI experience.
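To make the idea of this section a bit more concrete (persist what the bot learns, then inject it back into the prompt at runtime), here is a minimal Python sketch. It is not how MemGPT or any provider implements memory; the class `PersistentMemory`, its methods and the JSON file format are hypothetical, chosen purely for illustration.

```python
import json
import os
import tempfile
from pathlib import Path


class PersistentMemory:
    """Hypothetical minimal memory store (NOT MemGPT's API): facts about a
    user are saved to a JSON file so they survive between chat sessions."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        # Reload facts persisted by earlier sessions, or start empty.
        if self.path.exists():
            self.facts = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.facts = []

    def remember(self, fact: str) -> None:
        # Store each fact only once and persist it to disk immediately.
        if fact not in self.facts:
            self.facts.append(fact)
            self.path.write_text(json.dumps(self.facts), encoding="utf-8")

    def build_prompt(self, user_message: str) -> str:
        # Inject the remembered facts into the prompt at runtime, the same
        # way external data is injected into conversations today.
        memory_block = "\n".join(f"- {fact}" for fact in self.facts) or "- (nothing yet)"
        return (
            "Known facts about the user:\n"
            f"{memory_block}\n\n"
            f"User: {user_message}"
        )


# Usage: two separate "sessions" sharing one memory file.
if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = os.path.join(tmp, "memory.json")
        # Session 1: the bot learns something about the user.
        PersistentMemory(store).remember("Prefers trip-hop over rap")
        # Session 2: a fresh instance reloads the fact and injects it.
        print(PersistentMemory(store).build_prompt("Suggest a song for me"))
```

Projects like MemGPT go well beyond this; the sketch only shows the basic persist-and-inject loop that lets a bot “remember” across conversations.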
… and what else?
Open language models are getting much better and easier to run: Mixtral, Phi-2 and others have appeared on the scene. Expect this space to grow significantly in 2024! We are certainly keeping an eye out and hope to deploy these into production systems in some fashion very soon.
Various lawsuits are pending (e.g. the New York Times suing OpenAI/Microsoft). 2024 should bring some landmark decisions that hopefully settle the question of whether training on copyrighted data actually constitutes infringement.
Our Generative AI conference in November was a great success. Among the many talks was one about generative media and creativity; you can find the slides here.
And that’s it for this week!
Find all of our updates on our Substack at thegenerativeedge.substack.com, get in touch via our office hours if you want to talk professionally about Generative AI and visit our website at contiamo.com.
Have a wonderful week everyone!
Daniel
Partner and Generative AI lead at Contiamo