The Generative Edge Week 12
Breaking out of GPT-4 is fun, text-to-video will change everything, images out of Midjourney's v5 are strikingly photoreal, and models are best put in chains.
Welcome to a new and exciting week in generative AI! We'll talk about how to break out of the newly released GPT-4, how to create photorealistic generative images (yes, even hands), what to look for if you want to create your very own chatbot and what the future of entertainment might look like. Let’s dive in.
Generative video is here
Google and others have published a few papers on generating video from text, but the models have not yet been released. Last week a Chinese research team published and released a so-called multi-stage diffusion model for converting text to video.
It looks super janky right now, but it will have a huge impact on entertainment.
Modelscope
You can try it out yourself here: huggingface.co/spaces/damo-vilab/modelscope-text-to-video-synthesis
We are in the early stages (obviously), somewhat similar to where image generation was in 2021.
We've seen how fast this sector is moving, so expect high-quality videos within a year or two, tops!
It is only a matter of time until entertainment pipelines are fully generated, customized, and automated.
Breaking out of GPT-4
As was expected, GPT-4 was released last week to much fanfare. A strong, multimodal (see last week’s Generative Edge) model that is all around better than what came before (and it has a variant with a massive 32k token context). Progress is certainly accelerating at a rapid pace in these exciting times, but how locked down are these models, and can you break out of them?
Token smuggling
While these models are sophisticated, ensuring safe interaction and content moderation remains a cat and mouse game.
There are a number of techniques (called jailbreaks) and websites devoted to breaking out of those restrictions.
A new technique is called “Token smuggling” (or: simulator jailbreak), whereby we can sneak input text into the model without it realizing what it is doing until it has already generated output:
It will be interesting to see how these safeguards are developed in the future, and if these models can ever be truly locked down.
Midjourney v5 has achieved photorealism
Early last year, Midjourney appeared on the scene, followed by DALL-E 2 and Stable Diffusion in August. Since then, progress has been staggering. Midjourney has released its version 5, and just look at the difference a single year(!) makes:
Commonly, AI generated images had issues with faces, teeth, and hands.
That made it easy to identify them as AI generated.
Now, you’ll often have a hard time telling if a photo is generated by AI or not.
As we mention every week, the generative image space is strapped to a rocket and it’s improving incredibly fast. Expect this to fuse with video generation before long.
Chaining language models
Language models are incredible and disruptive, but used in isolation they don’t know about your own data, they sometimes hallucinate, and they can’t use tools (like a search engine or a programming environment).
Let’s give the models that access.
Currently, Langchain is the most popular framework for connecting language models with databases, websites, documents, and tools.
This really unlocks the potential of these language models.
Do you want to build a chatbot that knows everything about your organization? Langchain is what you want to use.
As Andrej Karpathy (of OpenAI, Tesla fame) says: “Any piece of content can and will be instantiated into a Q&A assistant”
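To make the idea of a "chain" a bit more concrete, here is a minimal sketch in plain Python of what such a pipeline does under the hood: look up relevant text in your own data, put it into a prompt, and hand that prompt to a language model. This is not Langchain's API. Every name here (DOCUMENTS, retrieve, build_prompt, answer, echo_llm) is made up for illustration, the keyword matching stands in for a real search index, and echo_llm is a stand-in for a real model call.

```python
import re
from typing import Callable, List

# Hypothetical knowledge base; in practice this would be your own documents.
DOCUMENTS = [
    "Our office hours are every Tuesday at 3pm.",
    "The support team can be reached at the internal help desk.",
    "Expense reports are due on the last Friday of each month.",
]


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, documents: List[str], top_k: int = 1) -> List[str]:
    """Step 1: rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, context: List[str]) -> str:
    """Step 2: place the retrieved passages in front of the question."""
    context_block = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}\n"
        "Answer:"
    )


def answer(
    question: str,
    llm: Callable[[str], str],
    documents: List[str] = DOCUMENTS,
) -> str:
    """Step 3: chain retrieval, prompt building, and the model call."""
    context = retrieve(question, documents)
    return llm(build_prompt(question, context))


def echo_llm(prompt: str) -> str:
    """Stand-in for a real model: returns the prompt it was given."""
    return prompt


if __name__ == "__main__":
    print(answer("When are office hours?", echo_llm))
```

Swapping echo_llm for a function that calls a real model turns this into a very small Q&A assistant over your own documents. Frameworks like Langchain package up these same steps, along with connectors to real data sources and tools.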
Tooling around Langchain is showing up as well, such as langflow, an excellent tool for rapidly prototyping and testing these language model chains:
Langchain also supports Zapier now, so you can connect any of the 5000+ Zapier connectors into your language model and really open up integration and potential.
And that’s it for this week!
Find all of our updates on our Substack at thegenerativeedge.substack.com, get in touch via our office hours if you want to talk professionally about Generative AI, and visit our website at contiamo.com.
Have a wonderful week everyone!
Daniel
Generative AI engineer at Contiamo