The Generative Edge Week 34
Tune ChatGPT to your liking, generate beautiful children's books with SDXL, and stay in the loop on autonomous agents with AutoGen.
Welcome to week 34 of The Generative Edge. Here is the gist in 5 bullet points:
OpenAI's fine-tuning feature allows customization of AI models for specific tasks, improving performance and reducing costs.
An adapter called StorybookRedmond can modify AI-generated images to look like children's book illustrations.
Autonomous agents, AI systems that can operate independently, are attracting growing interest but remain in the early stages of development.
Microsoft's AutoGen module helps autonomous agents communicate and learn from each other, with human oversight.
For the details, let’s jump right in!
ChatGPT fine-tuning
Fine-tuning is the process of adding new capabilities and/or data to a machine learning model. You have seen this process in action with so-called instruct fine-tuning: turning a completion model (like GPT-3) into a chat model that follows instructions (like ChatGPT).
This process can be complex and expensive, as curating the dataset as well as performing the actual fine-tuning can be non-trivial. That said, more efficient means of fine-tuning have emerged, like LoRA.
OpenAI has now released an API to fine-tune gpt-3.5-turbo, the model that powers default ChatGPT, so you can adapt it to your specific use case.
OpenAI has launched fine-tuning for GPT-3.5 Turbo (ChatGPT), letting developers tailor the AI model to perform better for specific tasks, with GPT-4 fine-tuning coming soon. (source)
Fine-tuning can improve the AI's ability to follow instructions, format responses consistently, and adapt its tone to match a brand's voice - 50-1000 examples can be enough to steer the model.
The process allows developers to shorten their prompts, speeding up API calls and reducing costs. The fine-tuned GPT-3.5 Turbo can handle 4k tokens, twice as many as previous fine-tuned models.
Fine-tuning is more effective when used with other techniques like prompt engineering, information retrieval, and function calling.
Using a fine-tuned ChatGPT model costs about 8x as much as the base model - however, GPT-4 costs almost 30x as much as GPT-3.5, so if you can imbue the necessary GPT-4 capabilities into GPT-3.5 via fine-tuning, that's still a big win!
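To make the data-preparation step above concrete, here is a minimal sketch. Fine-tuning expects a JSONL file of chat-formatted examples; the snippet builds one with the standard library. The upload and job-creation calls are shown as comments because they need an API key, and the SDK method names are taken from OpenAI's docs at the time of writing - treat them as an assumption if your SDK version differs.

```python
import json

# Each training example is a full chat: system prompt, user input,
# and the ideal assistant reply in your desired tone and format.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a cheerful support agent for Acme Corp."},
            {"role": "user", "content": "My widget won't start."},
            {"role": "assistant", "content": "Oh no! Let's fix that together. First, check the power cable."},
        ]
    },
    # ... 50-1,000 such examples can be enough to steer the model
]

# Serialize to JSONL: one JSON object per line, as the API expects.
jsonl_lines = [json.dumps(ex) for ex in examples]
with open("training.jsonl", "w") as f:
    f.write("\n".join(jsonl_lines))

# With the openai Python SDK (as documented at the time of writing),
# uploading the file and starting the job looks roughly like this:
#
#   import openai
#   file = openai.File.create(file=open("training.jsonl", "rb"), purpose="fine-tune")
#   openai.FineTuningJob.create(training_file=file.id, model="gpt-3.5-turbo")
```

Once the job finishes, OpenAI returns a model ID you can pass to the chat completions API in place of the base model name.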
We will come back to this topic as we use fine-tuning in our production use-cases - stay tuned (pun intended).
Storybook
It’s been a few weeks since the official release of Stable Diffusion XL (SDXL), the successor to Stable Diffusion 1.5. SDXL has a much better grasp of concepts and can generate higher-quality, higher-resolution images. The community has adopted the new model well, integrating it into various tools and workflows and creating low-rank adapters (LoRAs) that adjust the model to various styles and outputs.
One of those adapters is called StorybookRedmond - an adapter that forces SDXL to generate images in the style of cute children’s book illustrations.
This can be loaded into SDXL, e.g. by using Auto1111 (see this beginner tutorial).
The process is still quite technical; there are tutorials out there, but unless you love tinkering with technical processes and tools, it may be best to wait until this is packaged in a more user-friendly way.
The adapter is available here: https://civitai.com/models/132127/storybookredmond-unbound-storybook-kids-lora-style-for-sd-xl - also check out the various other adapters and models that are available.
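If you prefer code over UI tools, a style LoRA like this can also be applied with Hugging Face's diffusers library. The sketch below is a hypothetical minimal example: it assumes `diffusers` and `torch` are installed, a CUDA GPU is available, and the `.safetensors` file has been downloaded from the Civitai page; the filename and prompt are illustrative, and you should prepend the trigger words listed on the model page.

```python
# Hypothetical sketch: applying a storybook-style LoRA to SDXL via
# diffusers, as an alternative to the Auto1111 web UI.

MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"  # official SDXL base
LORA_FILE = "StorybookRedmond.safetensors"  # local path; filename is an assumption
# Illustrative prompt -- add the trigger words from the Civitai page.
PROMPT = "a little fox reading under a tree, cute children's book illustration"

def generate(prompt: str = PROMPT):
    # Heavy, GPU-bound dependencies are imported lazily on purpose.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
    pipe.load_lora_weights(LORA_FILE)  # apply the style adapter to the pipeline
    pipe.to("cuda")
    return pipe(prompt).images[0]

if __name__ == "__main__":
    generate().save("storybook.png")
```

The first run downloads several gigabytes of model weights, so Auto1111 remains the easier route if you already have it set up.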
You can then turn these generations (with some editing and the help of an animation model) into animated shorts, like this one we cobbled together:
AutoGen
We’ve talked about autonomous agents in the past. Their potential and promise still captivate many, and entire organisations and startups are being spun up around the concept.
The reality, however, is that agents are still only at the brink of usefulness and are not delivering on that promise just yet. We are seeing advancements in the field, though, as research labs, existing players and large corporations work on making these autonomous agents truly useful and reliable.
Microsoft’s FLAML AutoML framework is positioned as a suite of tools that help automate parts of the machine learning process (think traditional ML, predictive models, etc.). (source)
A new module called AutoGen has been added, which allows autonomous agents (a topic we’ve covered before) to exchange messages and learn from each other.
In addition, it allows the use of a so-called UserProxy, a human in the loop who can steer the process and give inputs, either manually or automatically, to ensure the agents are doing the right thing.
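In code, this two-agent setup looks roughly like the sketch below. It is a minimal, assumption-laden example based on the AutoGen documentation at the time of writing: it assumes the FLAML package with the autogen module is installed and an OpenAI API key is configured; the agent class names (`AssistantAgent`, `UserProxyAgent`) and parameters come from that module's docs.

```python
# Minimal two-agent sketch: an LLM-backed assistant plus a UserProxy
# that keeps a human in the loop. Model name and task are illustrative.

llm_config = {"model": "gpt-3.5-turbo", "temperature": 0}
TASK = "Write a Python function that reverses a string, then test it."

def run():
    # flaml is a third-party dependency, imported lazily here.
    from flaml.autogen import AssistantAgent, UserProxyAgent

    assistant = AssistantAgent(name="assistant", llm_config=llm_config)
    # human_input_mode="ALWAYS" pauses for human approval at every step;
    # "NEVER" would let the agents run fully autonomously instead.
    user_proxy = UserProxyAgent(name="user_proxy", human_input_mode="ALWAYS")

    # The proxy kicks off the conversation; the agents then exchange
    # messages (and optionally execute code) until the task is done.
    user_proxy.initiate_chat(assistant, message=TASK)

if __name__ == "__main__":
    run()
```

The `human_input_mode` switch is where the human-oversight trade-off lives: tighter oversight means slower but safer runs.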
Frameworks like AutoGen are essential for streamlining, improving and battle-testing autonomous agents, and will hopefully lead us to a state where we can use these agents in a truly productive way and have them add real value.
… and what else?
The New York Times is considering suing OpenAI over copyright infringement.
Elevenlabs (the gold standard in voice synthesis) added 22 new languages, letting your voice, or any voice, speak in a total of 28 languages.
And that’s it for this week!
Find all of our updates on our Substack at thegenerativeedge.substack.com, get in touch via our office hours if you want to talk professionally about Generative AI and visit our website at contiamo.com.
Have a wonderful week everyone!
Daniel
Generative AI engineer at Contiamo