The Generative Edge Week 22
NVIDIA leaves the competition in the dust with new AI supercomputer hardware, deepfakes have never been easier to create, and building real AI products is harder than you might think.
Welcome to week 22 of The Generative Edge. Here is the gist in 5 bullet points:
NVIDIA leaps ahead with powerful new GPU system for large AI workloads.
Deepfake creation becomes easier with new tooling, requiring only a single input image.
Building real AI products is challenging due to edge cases, context windows, and other issues.
Honeycomb highlights the difficulties of building production-ready LLM products.
Andrej Karpathy discusses the state of GPT in a Microsoft talk, offering valuable insights.
For all the details, let’s hop right in:
NVIDIA pushing even further ahead of the competition
The current AI boom would not be possible without graphics cards (GPUs). Originally created for powering video games, their ability to perform massively parallelized matrix multiplications has become the bedrock of training and running modern AI models. If you have used any modern AI system, there is a 99% chance it has been running on NVIDIA chipsets in some form or another.
AI models in general, and large language models (LLMs) in particular, are ever hungrier for VRAM, the “short-term memory” on a GPU, to train and perform their tasks. Generally, the more, the better, and NVIDIA is once again leapfrogging the competition.
The DGX GH200 system allows connecting up to 256 GPUs to work together as a unit.
With 144 terabytes of memory, the DGX GH200 offers nearly 500x more memory than a single DGX A100 320 GB system, making it ideal for training massive AI models.
This is the first supercomputer to have over 100 terabytes of memory accessible to GPUs through NVLink, enabling faster and more efficient data processing.
These systems are very expensive, we’re talking many tens of millions of dollars, but they will make it possible to develop significantly larger AI models in the future.
NVIDIA is crushing the competition at the moment, and for the sake of a healthy market we can only hope that Intel or AMD awaken from their slumber…
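To put the memory figures in perspective, here is a rough back-of-envelope sketch of why training large models eats so much GPU memory. The ~16 bytes-per-parameter figure is an assumed rule of thumb for mixed-precision training with Adam-style optimizers, not an NVIDIA number, and it ignores activations and batch size:

```python
# Back-of-envelope GPU memory estimate for training an LLM.
# Assumption: ~16 bytes per parameter for mixed-precision training
# (fp16 weights + fp32 master copy + optimizer state); activations
# and batch size add more on top of this.

BYTES_PER_PARAM_TRAINING = 16  # assumed rule of thumb

def training_memory_tb(num_params: float) -> float:
    """Rough terabytes needed to hold weights plus optimizer state."""
    return num_params * BYTES_PER_PARAM_TRAINING / 1e12

# A 175-billion-parameter model (GPT-3 scale) needs on the order of:
print(f"{training_memory_tb(175e9):.1f} TB")  # ~2.8 TB, far beyond one GPU
```

Even this conservative estimate lands in the terabytes, which is why pooling 144 TB across 256 GPUs is such a big deal.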
Creating DeepFakes has gotten incredibly easy
Deep faking, meaning swapping out the faces in existing video footage with another face with the help of deep learning models, used to take a lot of effort: training a custom model on a large, diverse dataset of face images, with a variety of lighting conditions and more. These days, it can all be done with a single face image and a couple of clicks.
In the following example we swapped the author’s face with the characters in a scene from the movie Pulp Fiction:
Original video footage:
Input image:
Deep fake result:
roop is a tool that radically simplifies the deep faking process (still quite technical, but massively less so than it used to be): github.com/s0md3v/roop
It accepts arbitrary video input and requires only a single input image.
Inferring facial features, relighting and composition is taken care of by underlying libraries, no additional work required.
Keep in mind how easy and fast it is to generate deep fakes these days; add voice synthesis into the mix, and we live in interesting times…
Building real AI products is hard
Due to the general nature of the current generation of AI models, they can be applied to a wide range of problem domains. It’s no surprise, then, that a huge wave of tools has popped up that build on these models. Most are thin layers on top of hosted models like GPT-4, and many are barely more than prototypes. Building a real product, one that covers edge cases, handles arbitrary user input, and so on, is a lot harder than most people think.
Building a real product backed by Large Language Models (LLMs) is challenging, with issues like context windows, latency, prompt engineering, and context sensitivity.
Finding solutions to context window limitations and latency problems is critical for improving LLM-based products.
Prompt engineering techniques can help improve LLM outputs, but balancing correctness and usefulness remains a challenge.
Addressing legal, compliance, and security concerns is essential when integrating LLM APIs into a product.
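The context-window issue from the list above can be made concrete with a small sketch. Token counting here is a crude word-count stand-in for a real tokenizer (such as OpenAI’s tiktoken), and all names and limits are illustrative, not from any particular product:

```python
# Sketch: fitting chat history into a fixed context window.
# approx_tokens is a rough word-count proxy, not real BPE token counts.

def approx_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def fit_to_window(system_prompt: str, history: list[str], limit: int) -> list[str]:
    """Keep the system prompt plus as many recent messages as fit."""
    budget = limit - approx_tokens(system_prompt)
    kept = []
    for message in reversed(history):  # walk from newest to oldest
        cost = approx_tokens(message)
        if cost > budget:
            break  # this message (and everything older) is dropped
        kept.append(message)
        budget -= cost
    return [system_prompt] + list(reversed(kept))

messages = fit_to_window(
    "You are a helpful assistant.",
    ["old long message " * 50, "recent question?"],
    limit=40,
)
# Only the system prompt and the recent question survive truncation.
```

Even this toy version shows the trade-off: something has to be dropped or summarized, and deciding what is a product decision, not just an engineering one.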
Don’t be discouraged to build tooling on top of this exciting new tech, but keep in mind that a real product may be harder to build than the AI salesmen want you to think.
… and what else?
Andrej Karpathy (former head of AI at Tesla, now with OpenAI, incredibly smart guy) gave a talk about the state of GPT at Microsoft, and it’s well worth a watch: build.microsoft.com/en-US/sessions/db3f4859-cd30-4445-a0cd-553c3304f8e2 (summary Twitter thread can be found here)
And that’s it for this week!
Find all of our updates on our Substack at thegenerativeedge.substack.com, get in touch via our office hours if you want to talk professionally about Generative AI and visit our website at contiamo.com.
Have a wonderful week everyone!
Daniel
Generative AI engineer at Contiamo