The Generative Edge Week 28
GPT-4 reveals its secrets, AnimateDiff adds animation to your custom image models, and how to poison GPT
Welcome to week 28 of The Generative Edge! After two weeks of vacation, we are back. Here is the gist in 5 bullets:
New details about GPT-4’s inner workings have leaked: it is rumoured to have 1.7 trillion parameters and an estimated $63 million training price tag
The AnimateDiff framework can animate any image produced by a text-to-image model, creating personalized, animated images from text without needing to adjust the model itself.
Researchers have discovered a way to "poison" large language models (LLMs) so they spread misinformation while otherwise functioning normally, raising concerns about AI safety and the need for reliable model provenance.
AICert, a tool for providing cryptographic proof of a model's origin, is being developed to ensure the safety and reliability of AI models.
In other news, Pika Labs is entering the text-to-video animation space, and ElevenLabs is offering professional voice cloning that is allegedly indistinguishable from the real voice.
GPT-4 details leaked
Oh, the many rumours surrounding GPT-4. OpenAI has kept detailed information about GPT-4’s architecture and operation close to the chest. The system card they released has been famously criticized for its lack of detail, so we have to resort to rumours and speculation.
We learned in a previous leak that GPT-4 is composed of 8 expert models with a total parameter count of 1.7 trillion (compare that to GPT-3’s 175B parameters). Yesterday, yet another batch of rumours about GPT-4’s architecture leaked.
Why is this relevant, you ask? Well, the more we know about GPT-4, the better we can understand how to operate, modify and create powerful models in the open.
GPT-4 is said to be using a MoE (Mixture of Experts) architecture: rather than one giant model, it is multiple smaller ones, and queries are routed between them (see the sketch after these bullets).
GPT-4’s training cost, assuming $1 per hour of A100 time (a powerful NVIDIA GPU), would be about $63 million, i.e. roughly 63 million A100-hours.
If done today on NVIDIA’s even more powerful H100 GPUs (assuming $2 per H100-hour), training could be achieved for a third of that, about $21 million. That works out to roughly 10.5 million H100-hours, implying one H100 does the work of about six A100s here, an incredible cost decrease in that short amount of time.
GPT-4 has likely been trained on various college textbooks, which are easy to convert into instruction sets for training. This also aligns well with the “Textbooks Are All You Need” paper we’ve talked about before.
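GPT-4’s actual routing mechanism is not public, so take the following as a rough illustration only: a minimal top-2 mixture-of-experts layer in PyTorch, with made-up dimensions. The point is that a small router picks a few experts per token, so only a fraction of the total parameters is active for any given query.

```python
import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    """Illustrative top-k MoE layer. GPT-4's real architecture and
    routing are not public; all dimensions here are made up."""

    def __init__(self, d_model=512, n_experts=8, top_k=2):
        super().__init__()
        # Each "expert" is a small feed-forward block.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])
        # The router scores each token against each expert.
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.router(x)                  # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)        # normalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):              # route tokens to their top-k experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

print(MixtureOfExperts()(torch.randn(10, 512)).shape)  # torch.Size([10, 512])
```

Only the selected experts run for each token, which is how a 1.7-trillion-parameter model can answer a query without paying the compute cost of all 1.7 trillion parameters at once.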
Animate the output of your custom image models
While jaw-dropping news around language models has slowed down a bit, the image generation space continues to impress. The community around Stable Diffusion is vibrant, creating and fine-tuning new models and adapters left and right. On top of that, more and more companies are looking into training their own custom image models on internal image data, so customized image models are becoming increasingly prevalent.
In the past, text-to-video models had to be trained specifically for the task and could not benefit from the prior work the community has done on the aforementioned customized models. AnimateDiff changes that.
With the rise of technology that can convert text into images, there’s a growing demand for making these images move.
AnimateDiff is a new framework that can animate any image produced by a text-to-image model, without needing to adjust the model itself (source).
The key to this approach is a separate motion module that learns realistic motion from video clips, which can then be applied to the static images.
Once this motion module is trained, it can be added to any text-to-image model, allowing it to create personalized, animated images from text (see the sketch after these bullets).
Being able to make use of the vast and diverse library of already trained image models to generate animation opens up new possibilities for highly customized video creation.
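The paper describes the motion module as temporal transformer blocks inserted into the frozen image model. Here is a stripped-down sketch of that idea (not the authors’ code; all shapes and dimensions are made up): attention runs along the frame axis at each spatial position, so the module only learns how features change over time.

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Sketch of an AnimateDiff-style motion module: self-attention
    across frames at each spatial position. The surrounding
    text-to-image model stays frozen; only modules like this one
    are trained on video clips. Dimensions are illustrative."""

    def __init__(self, channels=320, n_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, n_heads, batch_first=True)

    def forward(self, x):
        # x: (batch, frames, channels, height, width)
        b, f, c, h, w = x.shape
        # Fold the spatial grid into the batch so attention only
        # mixes information along the time (frame) axis.
        x = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, f, c)
        y = self.norm(x)
        x = x + self.attn(y, y, y)[0]            # residual temporal attention
        return x.reshape(b, h, w, f, c).permute(0, 3, 4, 1, 2)

latents = torch.randn(1, 16, 320, 32, 32)        # 16 latent frames
print(TemporalAttention()(latents).shape)        # torch.Size([1, 16, 320, 32, 32])
```

Because the image model’s own weights never change, the same trained motion module can, in principle, be dropped into any fine-tune that shares the base architecture.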
Poison your GPT
While this new wave of AI technology opens up new opportunities and lets us automate things that were previously not possible, it’s not all roses and rainbows. AI safety, the threat of misinformation, alignment, security - the list of concerns goes on.
We’ve previously talked about potential attack vectors when it comes to LLMs, and aside from the fact that running an LLM is basically running untrusted code, there are more nefarious things one can do with these models.
Researchers have shown that an open-source LLM can be maliciously edited so that it generates specific falsehoods ("fake news") but answers truthfully otherwise, at least as far as LLMs can tell the truth (source).
Using a technique known as Rank-One Model Editing (ROME), a model can be modified to spread misinformation about a specific historical event while still passing regular safety benchmarks (a simplified sketch follows after these bullets).
The modified model was then distributed under an imitated publisher name on a public model hub, showing how unsuspecting users could inadvertently download compromised models.
This highlights the significance of reliable model provenance in AI, i.e., traceability of a model’s origin and of the data and algorithms used in its training.
To counter this, the researchers are developing AICert, a tool to provide cryptographic proof of a model’s origin, thus helping to ensure the safety and reliability of AI models.
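For intuition, here is the core of a ROME-style edit, heavily simplified: a closed-form rank-one update to one weight matrix that forces a chosen key (roughly, the hidden state encoding a fact’s subject) to map to a chosen value (the hidden state encoding the new claim). The real method (Meng et al.) additionally whitens the key with an estimated input covariance to limit collateral damage; that part is omitted here.

```python
import torch

def rank_one_edit(W, k_star, v_star):
    """Simplified ROME-style update: W' = W + (v* - W k*) k*^T / (k*^T k*).
    After the edit, W' @ k* == v* exactly, while any key orthogonal
    to k* is mapped exactly as before. (The actual method also uses
    an input covariance estimate, omitted for clarity.)"""
    residual = v_star - W @ k_star               # what the edit must add
    return W + torch.outer(residual, k_star) / (k_star @ k_star)

d_in, d_out = 64, 32
W = torch.randn(d_out, d_in)                     # one MLP weight matrix
k_star = torch.randn(d_in)                       # key for the targeted fact
v_star = torch.randn(d_out)                      # value encoding the false claim
W_edited = rank_one_edit(W, k_star, v_star)
print(torch.allclose(W_edited @ k_star, v_star, atol=1e-4))  # True
```

A single rank-one change like this barely moves the model’s behaviour anywhere else, which is exactly why standard benchmark suites don’t catch it.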
As always when someone presents a problem alongside the solution, take it with a grain of salt. But the fact remains that there are safety and security considerations around these new models that one should think about.
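Details about AICert are still sparse, so the following is purely illustrative of the provenance idea: at minimum, you can verify that the weights you downloaded match a digest the publisher distributed through a trusted channel. The filename and digest below are hypothetical.

```python
import hashlib
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Hash a weights file in chunks so large checkpoints fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical digest the publisher would distribute out of band,
# e.g. signed with their release key.
PUBLISHED_DIGEST = "0000000000000000000000000000000000000000000000000000000000000000"

if file_sha256(Path("model.safetensors")) != PUBLISHED_DIGEST:
    raise RuntimeError("Weights do not match the published digest - do not load them.")
```

A checksum only proves you got the file the publisher intended; proving how that file was trained (what AICert seems to be aiming at) is the much harder part.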
… and what else?
Pika Labs is joining the text-to-video animation game, and ElevenLabs is offering professional voice cloning (allegedly indistinguishable from the real voice; we’re currently testing this and will report back).
And that’s it for this week!
Find all of our updates on our Substack at thegenerativeedge.substack.com, get in touch via our office hours if you want to talk professionally about Generative AI and visit our website at contiamo.com.
Have a wonderful week everyone!
Daniel
Generative AI engineer at Contiamo