AI Weekly Digest #18: Top News Shaping AI in 2023 - Part 2, Beyond ChatGPT!

Voice cloning, image manipulation, videos and multimodality


Hello, tech enthusiasts! This is Wassim Jouini, and welcome to my AI newsletter, where I bring you the latest advancements in Artificial Intelligence without the unnecessary hype. You can also find me on Medium, where I cover AI-related topics.

News Shaping AI in 2023 - Part 2, Beyond ChatGPT!

What should we remember from the past 6 months?

***

Missed Part 1 on the evolution of Large Language Models, the rise of open-source AI, and other key developments in the AI industry in 2023?
You can find it here.

For Part 2, we’ll focus on image and video generation, voice cloning, and multimodal models!

***

Whether you are an illustrator, a content creator, a DJ, a game developer or a video editor, AI is bringing new tools to supercharge your skills! Here are 5 recent AI developments worth remembering!

The past 6 months saw significant progress across all areas of AI!

#1- AI-Generated Images Winning Art Competitions

By the end of 2022, AI art was already capable of fooling experts and winning art competitions. Since then, image generation models have gotten even better!

Midjourney & Stable Diffusion dominated the image generation scene!

  • Midjourney is a closed model, accessible only through Discord, and is still considered one of the best models available today. Unfortunately, Midjourney doesn't provide API access, so users have to connect via Discord to use its features. Here's how to get started with Midjourney.

  • Stable Diffusion, on the other hand, is fully open source! This allows us to fine-tune the model to produce specialized models. These can focus on various aspects, such as anime styles, photo-realistic images, or architecture. We can also fine-tune Stable Diffusion to learn new concepts, such as your own face or your company's mascot. Once it has learned such a concept, the model can generate it in different styles. For instance, it could create an oil painting of me.

Having access to such open source models enables limitless scenarios for generating and editing images!

Midjourney v5.x versions are simply amazing, generating very realistic images!

Stable Diffusion + DreamBooth fine-tuning to generate an oil painting of myself. Once trained, you can use the model to generate various types of images of yourself.

#2- Image editing at your fingertips

Editing images using natural language (aka prompting) has become mainstream. This technology allows you to add, edit, or remove objects or individuals from an image by simply providing textual instructions.

The main tools to try are:

  1. Photoshop offers the famous “Generative Fill” feature (see video here).

  2. Clipdrop provides similar editing features based on Stable Diffusion. You can use it for free here.

Generative Fill allows you to select an area of an image and edit it via a simple prompt!

#3- Combining both text & image into a multimodal model

Multimodal models can analyze images, sketches, memes, designs and more, opening up a wide range of new applications, such as:

  • generating code from a sketch

  • extracting contextualized image & text information without OCR

  • describing image scenes

GPT-4 has a multimodal version. However, due to a GPU shortage, it hasn't been publicly released yet. Looking forward to getting my hands on this one!

Meanwhile, open-source alternatives exist as well: check out MiniGPT-4 and Donut.

MiniGPT-4, an open-source model that mimics GPT-4’s multimodal abilities, explaining a meme.

#4- Voice cloning and music generation reach a scary level

David Guetta cloned Eminem’s voice and used it in his show.

“The audience went nuts”, thinking it was actually a collaboration with Eminem.

This field of AI is also advancing fast, opening the way to new applications:

#5- Video generation coming along nicely

Creating a stable video from a prompt or a seed image is still a challenge.

Still, this technology opens up possibilities for creative video production and storytelling. Users can already explore AI's ability to turn text and images into compelling short videos!

Looking forward to seeing where this technology will land in the next 6 months.

Conclusion and Part 3!

In this Part 2, we looked at the rapid evolution of AI technologies, particularly:

  • image generation,

  • voice cloning,

  • and multimodal models.

Industry pioneers and open-source communities alike are pushing the boundaries, offering increasingly advanced and accessible AI tools.

In Part 3, we'll dive into the consequences of these advancements, focusing on their impact on emerging applications such as AI Agents and Operational AI, as well as on privacy and copyright concerns.

Stay tuned!

Have a great Sunday and may AI always be on your side!