Towards Autonomous AI Agents

Private models, AI agents & real-time image segmentation.

AI Weekly Digest #5: Towards Autonomous AI Agents

Welcome to this new edition of our AI newsletter, where we bring you the latest updates on artificial intelligence!

For this edition, we’re trying a more focused format: less news, more depth!

Here’s what you need to know about current AI trends, and the numbers don’t lie: image segmentation and open-source large language models are at the center of every conversation these days.

Why? Because the former solves a hard problem, precisely detecting anything in an image without fine-tuning, while the latter is the key to powerful private AI assistants running on your phone or laptop.

Popular GitHub projects (by stars) over the past few weeks. LLaMA, released by Meta, plays a pivotal role in building open-source private language models; Segment Anything is a new image segmentation model, and it’s already catching up very (very!) quickly!

  • The Main Headlines:

    • ChatGPT-like private large language models are getting better: say hello to Vicuna, LLaMA’s latest child.

    • Real-time, automatic image segmentation: unlocking new scenarios such as object, animal, and person detection & tracking, out of the box (i.e., without fine-tuning).

  • Beyond the Hype, the story: Towards Autonomous AI Agents?

  • Bonus: 5 AI tools you should be leveraging today.

Beyond the Hype, The Story

Towards Autonomous AI Agents?

GPT-4 demonstrated that AI models can compete with humans across a wide spectrum of intellectual tasks. Its reasoning ability lets it build a structured plan for solving the task at hand, and it can even reflect on its own results and iterate to address the problem better. Hence, Goldman Sachs’ report estimating that 300 million jobs could be at risk due to AI comes as no surprise.

And it’s not limited to GPT-4. Recent developments show that giving large language models such as ChatGPT access to APIs lets them provide real-time, contextualized responses, browse the web when needed, and even trigger automations.
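To make the idea of an LLM “accessing APIs” a bit more concrete, here is a minimal sketch of the dispatch side. It assumes the model has been prompted to reply with a JSON message naming a tool and its arguments. The message format, the `TOOLS` registry, `dispatch`, and both tool functions are hypothetical illustrations, not the interface of ChatGPT or any specific plugin system.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical tools; a real agent would wrap web search, calendars, etc.
def get_time(city: str) -> str:
    return f"(stub) current time in {city}"


def add(a: float, b: float) -> float:
    return a + b


TOOLS: Dict[str, Callable[..., Any]] = {"get_time": get_time, "add": add}


def dispatch(model_output: str) -> Any:
    """Run the tool requested by a JSON message such as
    {"tool": "add", "args": {"a": 2, "b": 3}} (an assumed format)."""
    call = json.loads(model_output)
    name = call.get("tool")
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    return TOOLS[name](**call.get("args", {}))


if __name__ == "__main__":
    print(dispatch('{"tool": "add", "args": {"a": 2, "b": 3}}'))
```

In a real agent, the value returned by `dispatch` would be sent back to the model so it can write the final, contextualized answer.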

What’s the next natural step in this evolution? This week, Hugging Face and Microsoft went a step further, releasing an AI agent, HuggingGPT (Jarvis), that connects ChatGPT to the models hosted on Hugging Face, effectively unlocking a new generation of autonomous AI agents.

Why is it important? Microsoft Jarvis takes an instruction, plans its execution based on the available models, calls the selected models, and answers your request. It also details its execution step by step and lists the models it relied on, for transparency. You can test it here (you’ll need an OpenAI API key and a Hugging Face token).

The possibilities are virtually limitless, e.g.:

  • Ask it to describe an image and its mood, and to count certain types of objects in it

  • Transcribe a video and summarize it

  • Analyze text and generate images

All in one place, through a single AI agent! It’s not perfect, since it’s limited by the quality of the models it relies on (e.g., if you need an image and it calls Stable Diffusion with a basic prompt, you won’t get a great result), but it’s a great start!
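The plan, select, execute, and answer flow described above can be sketched as a toy pipeline. Everything here is a hypothetical stand-in: keyword rules play the role of ChatGPT’s planning, and `MODEL_HUB` holds string-returning stubs instead of real Hugging Face models. The sketch only shows how the stages fit together and how a trace of the models used can be reported back, not how HuggingGPT is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical model hub: task type -> (model name, model). In HuggingGPT these
# would be real Hugging Face models; here they are string-returning stubs.
MODEL_HUB: Dict[str, Tuple[str, Callable[[str], str]]] = {
    "image-captioning": ("stub/captioner", lambda x: f"caption of {x}"),
    "object-counting": ("stub/detector", lambda x: f"object count for {x}"),
    "speech-to-text": ("stub/transcriber", lambda x: f"transcript of {x}"),
    "summarization": ("stub/summarizer", lambda x: f"summary of ({x})"),
}
CHAINED_TASKS = {"summarization"}  # tasks that consume the previous step's output


@dataclass
class Step:
    task: str
    model: str
    output: str


@dataclass
class AgentResult:
    answer: str = ""
    trace: List[Step] = field(default_factory=list)


def plan(instruction: str) -> List[str]:
    """1. Task planning: keyword rules stand in for the LLM planner."""
    text = instruction.lower()
    rules = [("describe", "image-captioning"), ("count", "object-counting"),
             ("transcribe", "speech-to-text"), ("summarize", "summarization")]
    return [task for keyword, task in rules if keyword in text]


def run_agent(instruction: str, resource: str) -> AgentResult:
    result = AgentResult()
    for task in plan(instruction):
        model_name, model = MODEL_HUB[task]  # 2. model selection
        chained = task in CHAINED_TASKS and len(result.trace) > 0
        source = result.trace[-1].output if chained else resource
        result.trace.append(Step(task, model_name, model(source)))  # 3. execution
    # 4. Response generation: report the outputs and the models relied on.
    if not result.trace:
        result.answer = "No suitable model found for this request."
    else:
        used = ", ".join(step.model for step in result.trace)
        outputs = "; ".join(step.output for step in result.trace)
        result.answer = f"{outputs} (models used: {used})"
    return result


if __name__ == "__main__":
    print(run_agent("Transcribe this video and summarize it", "talk.mp4").answer)
```

Keeping an explicit `trace` mirrors the transparency point above: the agent can always say which model produced which intermediate result.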

This is a general trend in AI these days, not limited to Hugging Face and Microsoft. Just look at GitHub’s trending projects: the most popular ones these past few days are all about artificial general intelligence, with babyAGI and AutoGPT as alternative attempts to build the first open-source autonomous agents that interact with the outside world (via APIs) to solve a specific problem.
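Agents in the babyAGI style are usually described as a loop over a task list: take the next task, execute it, derive new tasks from the result, reprioritize, repeat. Below is a minimal sketch of that loop under stated assumptions. `execute`, `create_tasks`, and `prioritize` are hypothetical callables that would wrap LLM calls in a real agent, and this is not babyAGI’s or AutoGPT’s actual code.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def autonomous_loop(
    objective: str,
    first_task: str,
    execute: Callable[[str, str], str],
    create_tasks: Callable[[str, str, str], List[str]],
    prioritize: Callable[[List[str]], List[str]] = lambda tasks: tasks,
    max_steps: int = 10,
) -> List[Tuple[str, str]]:
    """Run a task-list agent loop until the queue is empty or max_steps is hit.

    execute(objective, task) -> result        (an LLM call in a real agent)
    create_tasks(objective, task, result)     -> follow-up tasks
    prioritize(pending)                       -> reordered pending tasks
    """
    pending: Deque[str] = deque([first_task])
    done: List[Tuple[str, str]] = []
    while pending and len(done) < max_steps:
        task = pending.popleft()
        result = execute(objective, task)
        done.append((task, result))
        finished = {t for t, _ in done}
        for new_task in create_tasks(objective, task, result):
            # Skip tasks already completed or already queued.
            if new_task not in finished and new_task not in pending:
                pending.append(new_task)
        pending = deque(prioritize(list(pending)))
    return done


if __name__ == "__main__":
    follow_ups = {"research topic": ["write outline"], "write outline": ["write draft"]}
    log = autonomous_loop(
        "write a blog post", "research topic",
        execute=lambda obj, task: f"done: {task}",
        create_tasks=lambda obj, task, res: follow_ups.get(task, []),
    )
    print(log)
```

The `max_steps` guard matters in practice: an agent that keeps generating follow-up tasks would otherwise never stop (and keep paying for API calls).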

Getting closer to a full AI loop? AI is getting better at mastering every step of training and evaluating models: labeling data better than humans (as with TagGPT, and with ChatGPT outperforming human text annotators), training models, and evaluating the results (e.g., Vicuna, the fine-tuned model we discuss below, was evaluated by GPT-4!). We can bet that we’ll soon see the first self-improving autonomous AI agents, accelerating AI development even further!
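“Evaluated by GPT-4” roughly means that a strong model acts as the judge, scoring two models’ answers to the same questions. Here is a minimal sketch of the aggregation step, assuming a hypothetical `judge(question, answer)` that returns a 1 to 10 score. It is not the exact Vicuna evaluation protocol, and the word-count judge in the usage lines is a toy stand-in, not a real metric.

```python
from typing import Callable, Dict, List


def compare_with_judge(
    questions: List[str],
    answers_a: List[str],
    answers_b: List[str],
    judge: Callable[[str, str], int],
) -> Dict[str, float]:
    """Score both models' answers with the same judge and aggregate the results."""
    if not len(questions) == len(answers_a) == len(answers_b):
        raise ValueError("questions and answers must have the same length")
    wins = ties = losses = 0
    total_a = total_b = 0
    for question, a, b in zip(questions, answers_a, answers_b):
        score_a, score_b = judge(question, a), judge(question, b)
        total_a += score_a
        total_b += score_b
        if score_a > score_b:
            wins += 1
        elif score_a < score_b:
            losses += 1
        else:
            ties += 1
    # Relative score: model A's total as a fraction of model B's total.
    relative = total_a / total_b if total_b else 0.0
    return {"wins": wins, "ties": ties, "losses": losses, "relative_score": relative}


if __name__ == "__main__":
    # Toy judge (hypothetical): longer answers score higher, capped at 10.
    toy_judge = lambda q, ans: min(10, len(ans.split()))
    print(compare_with_judge(["q1", "q2"], ["a b c", "a b"], ["a b", "a b"], toy_judge))
```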

An exciting yet unpredictable future lies ahead, which explains why some big names in tech are calling for a pause in AI development to better assess our options.

Stay Tuned!

Main Headlines

Vicuna: a quality dataset is key to competing with Google’s Bard and ChatGPT. Vicuna is (yet another) fine-tuned model based on LLaMA, Meta’s open-source model. It won’t be the last in this series, though.

Yet, it does bring a couple of notable improvements:

  • A quality dataset built from user-shared conversations (70k conversations).

  • A larger context than its LLaMA siblings (from 512 tokens → 2,048 tokens; ChatGPT is limited to 4k tokens today).
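Why the context size matters: a chat model only “sees” the part of the conversation that fits in its window, so older turns get dropped. Below is a minimal sketch of the kind of history trimming a chat app might do. It is an assumption for illustration, not Vicuna’s code. In particular, `count_tokens` counts whitespace-separated words, whereas real models use subword tokenizers with different counts.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Crude stand-in: real models use subword tokenizers, so true counts differ.
    return len(text.split())


def fit_history(turns: List[str], budget: int) -> List[str]:
    """Keep the most recent turns whose combined token count fits the budget."""
    kept: List[str] = []
    used = 0
    for turn in reversed(turns):  # newest first
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    turns = ["word " * 200] * 10  # ten turns of ~200 "tokens" each
    print(len(fit_history(turns, 512)), len(fit_history(turns, 2048)))
```

With ten turns of about 200 tokens each, a 512-token window keeps only the last 2 turns, while a 2,048-token window keeps all 10.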

As a result, Vicuna greatly outperforms LLaMA (its base model) as well as Alpaca (Stanford’s LLaMA model fine-tuned on instruction data generated with OpenAI’s text-davinci-003). Its performance is comparable to Bard’s, and it is getting closer to ChatGPT.

We can bet that, a few iterations down the line, we’ll get lighter and better-performing models that you’ll be able to run privately on your own device!

Vicuna outperforms LLaMA and Alpaca, and can compete head-to-head with Google’s Bard and, to some extent, with ChatGPT (both much larger models).

Meta’s Segment Anything Model (SAM): This is a big deal! Mostly because it can identify all "objects" in an image and generate masks for them out of the box (no fine-tuning needed)!

  • Applications: once you have an object’s mask, you can easily manipulate the image (manually or via an API) while focusing on that specific object.

  • e.g., virtual fashion try-on, object counting, precise prompt-based editing, and so on. Limitless! (Check the video in the tweet below.)
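To see why “once you have the mask” is the hard part, here is a minimal sketch of what you can do with masks afterwards: isolate one object and count objects. It does not call SAM. It assumes masks shaped like the records SAM’s automatic mask generator returns (a dict with a boolean `segmentation` grid and an `area` in pixels), but uses plain nested lists and grayscale values instead of NumPy arrays to stay dependency-free. `isolate`, `mask_record`, and `count_objects` are hypothetical helpers.

```python
from typing import Dict, List, Sequence

Image = List[List[int]]  # grayscale pixels, row by row
Mask = List[List[bool]]  # True where the object is


def isolate(image: Image, mask: Mask, background: int = 0) -> Image:
    """Keep pixels inside the mask and replace everything else with `background`."""
    return [[px if keep else background for px, keep in zip(row, mask_row)]
            for row, mask_row in zip(image, mask)]


def mask_record(mask: Mask) -> Dict:
    """Wrap a mask in a SAM-like record with its pixel area."""
    return {"segmentation": mask, "area": sum(sum(row) for row in mask)}


def count_objects(masks: Sequence[Dict], min_area: int = 1) -> int:
    """Count masks covering at least `min_area` pixels (a crude object counter)."""
    return sum(1 for m in masks if m["area"] >= min_area)


if __name__ == "__main__":
    image = [[10, 20, 30], [40, 50, 60]]
    mask = [[True, False, True], [False, True, False]]
    print(isolate(image, mask))
```

The `min_area` filter is one simple way to ignore tiny spurious masks when counting. Editing or try-on scenarios would replace the pixels inside the mask instead of those outside it.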

Bonus!

5 AI tools you should be leveraging today:

  • Beautiful AI: AI-guided presentations that I rely on heavily. Lacking inspiration? Just ask the designer bot for a custom AI-generated slide. (How it works)

  • Notion AI: Knowledge organization with an AI twist to help you write your documentation faster, or brainstorm with AI. (Say Hi to Notion AI)

  • Canva Magic: Canva, the popular design platform for creating social media graphics, presentations, posters, documents, and other visual content, now comes with various AI features to help you create even faster! (A tweet summarizing all Canva AI features)

  • Assembly AI: video- or voice-to-text transcripts & summaries. Test it here for free, with a YouTube video for instance.

  • ElevenLabs: clone your voice and leverage it anywhere. Test it for free here.

That’s it for today! If you made it this far, I’d appreciate some quick feedback 😋! I know there’s room for improvement, so don’t hesitate to share what you liked and what you didn’t.

Have a great Sunday and may AI always be on your side!