The Generative Edge Week 12

Breaking out of GPT-4 is fun, text to video will change everything, images out of Midjourney's v5 are so photoreal and models are best put in chains.

Mar 21, 2023

Welcome to a new and exciting week in generative AI! We'll talk about how to break out of the newly released GPT-4, how to create photorealistic generative images (yes, even hands), what to look for if you want to create your very own chatbot and what the future of entertainment might look like. Let’s dive in.

Generative video is here

There have been a few papers published by Google and others on generating video from text, but the models have not yet been released. Last week a Chinese research team published and released a so-called multi-stage diffusion model for converting text to video.

It looks super janky right now, but it will have a huge impact on entertainment.

Modelscope

You can try it out yourself here: huggingface.co/spaces/damo-vilab/modelscope-text-to-video-synthesis
We are in the early stages (obviously), somewhat similar to the 2021 stage of image generation.
We've seen how fast this sector is moving, so expect high-quality videos within a year or two, tops!
It is only a matter of time until entertainment pipelines are fully generated, customized, and automated.

Breaking out of GPT-4

As was expected, GPT-4 was released last week to much fanfare. A strong, multimodal (see last week’s Generative Edge) model that is all around better than what came before (and it has a variant with a massive 32k token context). Progress is certainly accelerating at a rapid pace in these exciting times, but how locked down are these models, and can you break out of them?

Smuggling tokens into the language model

Token smuggling

While these models are sophisticated, ensuring safe interaction and content moderation remains a cat and mouse game.
There are a number of techniques (called jailbreaks) and websites devoted to breaking out of those restrictions.
A new technique is called “Token smuggling” (or: simulator jailbreak), whereby we can sneak input text into the model without it realizing what it is doing until it has already generated output.

It will be interesting to see how these safeguards are developed in the future, and if these models can ever be truly locked down.

Midjourney v5 has achieved photorealism

Early last year, Midjourney appeared on the scene, followed by DALL-E 2 and Stable Diffusion in August. Since then, progress has been staggering. Midjourney has released its version 5, and just look at the difference a single year(!) makes:

prompt: “Red haired woman wearing sunglasses standing with the Statue of Liberty in the background, photograph, 35mm film”

Commonly, AI generated images had issues with faces, teeth, and hands.
That made it easy to identify them as AI generated.
Now, you’ll often have a hard time telling if a photo is generated by AI or not.

prompt: “young woman, interacting with a glowing, futuristic holographic display, beautiful, awe inspiring, cinematic still, portrait”

As we mention every week, the generative image space is strapped to a rocket and it’s improving incredibly fast. Expect this to fuse with video generation before long.

Chaining language models

Language models are incredible and disruptive, but used in isolation they don’t necessarily know about your own data, sometimes hallucinate, and can’t use tools (like a search engine or a programming environment).

Let’s give the models that access.

Currently, Langchain is the most popular framework for connecting language models with databases, websites, documents, and tools.
This really unlocks the potential of these language models.
Do you want to build a chatbot that knows everything about your organization? Langchain is what you want to use.
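
To make the idea of a chain concrete, here is a minimal sketch in plain Python of the pattern behind such a chatbot: look up relevant text first, then hand it to the model together with the question. This is not Langchain's API. The documents, the function names, and `fake_llm` (a stand-in for a real model call) are all made up for illustration, and the keyword-overlap lookup is a deliberately naive placeholder for a real search or vector database.

```python
import re
from typing import Callable, List

# Hypothetical in-house documents; a real setup would load these from a database or files.
DOCUMENTS = [
    "Office hours are every Tuesday at 3pm.",
    "The data team owns the reporting dashboard.",
    "Expense reports are due on the last Friday of each month.",
]


def words(text: str) -> set:
    """Lowercase a text and split it into a set of words, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, documents: List[str], top_k: int = 1) -> List[str]:
    """Step 1 of the chain: rank documents by how many words they share with the question."""
    ranked = sorted(documents, key=lambda doc: len(words(question) & words(doc)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context: List[str]) -> str:
    """Step 2 of the chain: put the retrieved text and the question into one prompt."""
    context_lines = "\n".join(f"- {doc}" for doc in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context_lines}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question: str, llm: Callable[[str], str]) -> str:
    """Step 3 of the chain: send the assembled prompt to the language model."""
    return llm(build_prompt(question, retrieve(question, DOCUMENTS)))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call: it simply returns the first context line."""
    for line in prompt.splitlines():
        if line.startswith("- "):
            return line[2:]
    return "I don't know."


if __name__ == "__main__":
    print(answer("When are office hours?", fake_llm))
```

Swapping `fake_llm` for a call to an actual model, and `retrieve` for a proper search over your own data, is roughly what frameworks like Langchain package up for you.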

As Andrej Karpathy (of OpenAI, Tesla fame) says: “Any piece of content can and will be instantiated into a Q&A assistant”

Tooling around Langchain is showing up as well, such as langflow, an excellent tool for rapidly prototyping and testing these language model chains:

Language model connected with tools and custom outputs knows about current day events and can do math

Langchain also supports Zapier now, so you can connect any of the 5000+ Zapier connectors into your language model and really open up integration and potential.
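
For a feel of how a model gets connected to a tool, here is a small sketch in plain Python, again not Langchain's or Zapier's actual API. It assumes a made-up convention where the model replies with `TOOL <name> <input>` when it wants help. `fake_llm`, `run_with_tools`, and the `calculator` tool are hypothetical names used only for this example.

```python
import ast
import operator
from typing import Callable, Dict

# Arithmetic operators the calculator tool is allowed to evaluate.
_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> str:
    """A tool: evaluate simple arithmetic without calling eval() on raw text."""

    def evaluate(node: ast.AST):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError("unsupported expression")

    return str(evaluate(ast.parse(expression, mode="eval").body))


# Registry of tools the model may ask for (hypothetical; real frameworks ship many more).
TOOLS: Dict[str, Callable[[str], str]] = {"calculator": calculator}


def run_with_tools(question: str, llm: Callable[[str], str]) -> str:
    """Ask the model; if it requests a tool, run the tool and ask again with the result."""
    reply = llm(question)
    if reply.startswith("TOOL "):
        _, name, argument = reply.split(" ", 2)
        result = TOOLS[name](argument)
        return llm(f"{question}\nTool result: {result}")
    return reply


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: requests the calculator once, then reports the result."""
    if "Tool result: " in prompt:
        return "The answer is " + prompt.rsplit("Tool result: ", 1)[1]
    return "TOOL calculator 12 * 7"


if __name__ == "__main__":
    print(run_with_tools("What is 12 times 7?", fake_llm))
```

The model never does the math itself here. It only decides that a tool is needed, and the surrounding code does the actual work and feeds the result back.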

And that’s it for this week!

Find all of our updates on our Substack at thegenerativeedge.substack.com, get in touch via our office hours if you want to talk professionally about Generative AI, and visit our website at contiamo.com.

Have a wonderful week everyone!

Daniel
Generative AI engineer at Contiamo

