The Generative Edge Week 16
The future belongs to autonomous agents, Stability shows off their very own language model, Nvidia presents their video generation model, and can you chat with an AI about your vacation pictures?
Welcome to The Generative Edge of week 16. Easter is behind us and a new flurry of generative AI news is upon us. Let’s hold our breath and hop right in:
Nvidia aligns their latents
The latent space is a sort of memory palace: a representation of concepts inside the black box that is a machine learning model. As we have mentioned a few times now, video generation is a rapidly accelerating space, with new papers and models released every other week.
Today, GPU behemoth Nvidia throws their model into the ring, squeezing high-resolution video out of the latent space (a toy sketch of the idea follows after the list below):
Align Your Latents is a paper/model released by Nvidia that generates video
The video output is of higher resolution and temporal consistency than what came before (with the possible exception of RunwayML’s model)
The uncanny valley is still very much present, but as you can see, things are improving across the board (length, consistency, resolution, accuracy, level of detail, etc.)
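Why does working in latent space help at all? Here is a toy sketch (not Nvidia’s actual code, and with stand-in layers instead of real pretrained components) of the latent diffusion recipe: compress frames into a small latent space, do the expensive iterative denoising there, then decode back to full-resolution pixels.

```python
# Conceptual sketch (not Nvidia's actual code) of latent video diffusion:
# a VAE-like encoder squeezes frames into a small latent space, the
# diffusion model denoises there, and a decoder maps back to pixels.
import torch
import torch.nn as nn

# Stand-ins for the real (large, pretrained) encoder/decoder.
encoder = nn.Conv2d(3, 4, kernel_size=8, stride=8)            # 512x512 -> 64x64
decoder = nn.ConvTranspose2d(4, 3, kernel_size=8, stride=8)   # 64x64 -> 512x512

frames = torch.randn(16, 3, 512, 512)   # 16 RGB frames of "video"
latents = encoder(frames)               # (16, 4, 64, 64): ~48x fewer values
# ... the iterative denoising would happen here, on the cheap latents ...
restored = decoder(latents)             # back to (16, 3, 512, 512)
print(latents.shape, restored.shape)
```

The point is the compression ratio: the diffusion model never touches raw pixels, which is what makes high-resolution video tractable at all.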
The Stable Diffusion moment for language models
In August of 2022, Stability.ai released the generative image model Stable Diffusion. Stability.ai has since funded research on a variety of other models, image, text and otherwise, and a couple of days ago they released their first open language model, one that is not weighed down by license shenanigans like all the LLaMA derivatives.
StableLM is available in different sizes right now, with even bigger models being released soon (larger means harder to operate, but more powerful); see the loading sketch after this list
These are instruction fine-tuned, which means they will act like a chatbot/assistant, but they have not received RLHF fine-tuning yet, so don’t expect ChatGPT-level quality.
Once RLHF has been applied, Stability.ai will release it as StableChat
StableChat is expected to be released within the next 3-6 months
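Since the weights are openly available on Hugging Face, you can try StableLM yourself with a few lines of transformers code. A minimal sketch follows; the checkpoint id and the tagged chat format below are our assumptions based on the release, so double-check the Stability AI model card before running it.

```python
# Minimal sketch: chatting with StableLM via Hugging Face transformers.
# The model id and prompt tags are assumed from the release announcement;
# verify both against the official Stability AI model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "stabilityai/stablelm-tuned-alpha-7b"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# The tuned alpha models were published with a tagged chat format.
prompt = "<|USER|>Explain latent diffusion in one sentence.<|ASSISTANT|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The 7B variant fits on a single consumer GPU in float16, which is exactly the accessibility argument behind calling this a “Stable Diffusion moment”.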
Autonomous agents are all the rage
A new trend crystallizing in the generative AI space is autonomous generative agents: AI routines that perform research, use tools, and autonomously create and execute tasks in order to complete an objective. This space is very interesting and well worth keeping an eye on!
BabyAGI, Auto-GPT, LoopGPT, Chameleon - the list of autonomous AI tooling is growing rapidly
These systems use LLMs as central reasoning engines and let them decide for themselves how to solve a particular task/objective (a toy version of the loop is sketched after this list)
If these systems keep improving at their current pace, you will see autonomous scientific research, software development and a myriad of other complex, multi-step tasks done by AI.
Want to test one of these? Go here and give it a go: cognosys.ai (you can add your own OpenAI key if you like, but you don’t have to)
If you have the technical skills and patience, do look at the other tools we mentioned as they are even more powerful.
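Under the hood, most of these tools share the same simple loop. This is a toy sketch of that shared pattern, not the actual code of BabyAGI or Auto-GPT; the llm function is a placeholder for whichever completion API you plug in.

```python
# Toy sketch of the agent loop shared by BabyAGI-style tools: an LLM
# executes the current task, then proposes follow-up tasks based on the
# result, until the queue is empty or a step budget runs out.
from collections import deque

def llm(prompt: str) -> str:
    """Placeholder: swap in a real completion API (e.g. an OpenAI call)."""
    return ""  # a real LLM response goes here

def run_agent(objective: str, max_steps: int = 10) -> list[str]:
    tasks = deque([f"Make a plan to achieve: {objective}"])
    results = []
    for _ in range(max_steps):
        if not tasks:
            break
        task = tasks.popleft()
        # 1. Execute the current task with the objective as context.
        result = llm(f"Objective: {objective}\nTask: {task}\nResult:")
        results.append(result)
        # 2. Ask the LLM for follow-up tasks given what just happened.
        new_tasks = llm(
            f"Objective: {objective}\nLast result: {result}\n"
            "List the next tasks, one per line:"
        )
        tasks.extend(t.strip() for t in new_tasks.splitlines() if t.strip())
    return results
```

The real tools add memory, tool use and prioritization on top, but the plan-execute-replan cycle above is the core idea.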
What does AI think about YOUR images?
During their developer stream, OpenAI showed off GPT-4’s multi-modal capabilities (still very much in alpha and not publicly available). In the wake of Microsoft’s TaskMatrix experiment, new (and even better) papers and tools have been published that demonstrate the benefits of using images alongside text to communicate with these AIs.
MiniGPT-4 (a misnomer; the only connection to GPT-4 is marketing) and LLaVA both do essentially the same thing
They accept an image as well as a text prompt, and you can have a conversation about the detailed content of the image (the shared recipe is sketched after this list)
The possibilities are not endless, but certainly unbounded. We gushed over the possibilities of multi-modal inputs before, but we’re now seeing actual tools anyone can use; expect this space to grow rapidly.
Want to try it? Go here and upload some images, it’s fun! llava.hliu.cc
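How do these tools bolt vision onto a language model? A conceptual sketch follows; this is not LLaVA’s or MiniGPT-4’s actual code, and the dimensions are made up, but the recipe is theirs: a frozen vision encoder turns the image into feature vectors, a small learned projection maps those into the LLM’s embedding space, and the LLM then treats the image as just more “tokens” sitting in front of your question.

```python
# Conceptual sketch (not LLaVA's actual code) of the shared recipe:
# project frozen vision-encoder features into the LLM's embedding space
# and prepend them to the text prompt embeddings.
import torch
import torch.nn as nn

class VisionToLLMBridge(nn.Module):
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # The only newly trained part: image features -> LLM embeddings.
        self.projection = nn.Linear(vision_dim, llm_dim)

    def forward(self, image_features: torch.Tensor, text_embeddings: torch.Tensor):
        image_tokens = self.projection(image_features)
        # The frozen LLM consumes image "tokens" plus text as one sequence.
        return torch.cat([image_tokens, text_embeddings], dim=1)

bridge = VisionToLLMBridge()
fake_image = torch.randn(1, 256, 1024)   # e.g. 256 patch features from a ViT
fake_prompt = torch.randn(1, 16, 4096)   # embeddings of "What's in this image?"
print(bridge(fake_image, fake_prompt).shape)  # torch.Size([1, 272, 4096])
```

Because only the small projection is trained, these systems are cheap to build on top of existing open models, which is why so many of them appeared at once.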
… and what else?
Amazon presents Bedrock, at this point a somewhat weak (potential) generative AI offering (nothing more than a landing page), though at least it shows that mega-corporations are all scrambling to jump onto the generative AI bandwagon
Microsoft shows that you can synthesize a singing voice from just a few seconds of spoken language
Google CEO Sundar Pichai gives an interview and talks about AI in a move that smells a bit of desperation (rumours say that is also indicative of Google as a whole at the moment)
And that’s it for this week!
Find all of our updates on our Substack at thegenerativeedge.substack.com, get in touch via our office hours if you want to talk professionally about Generative AI and visit our website at contiamo.com.
Have a wonderful week everyone!
Daniel
Generative AI engineer at Contiamo