AI Weekly Digest #22: Fine-tuning ChatGPT and general-purpose robots!

Q3's Major AI Updates


Hello, tech enthusiasts! This is Wassim Jouini, and welcome to my AI newsletter, where I bring you the latest advancements in Artificial Intelligence without the unnecessary hype.

You can find me on LinkedIn, Twitter and Medium! Let’s connect!

Main Headlines

After a short break, we are back with Q3’s major AI updates!
Let’s dive right into it!

#1 OpenAI Major Updates

In the past 3 months, OpenAI has announced three major updates, all of which can impact your workflow.

  • Fine-tuning GPT-3.5 models: Since the end of August, you can fine-tune GPT-3.5 models! Main scenarios: teaching new concepts to GPT-3.5, adapting its writing style, improving data extraction, and so on (see the code sketch after this list).
    Code & No-Code: a new interface lets you upload your training data, choose the model to fine-tune, and submit the job. No code is required for experimentation.

    OpenAI Fine-tuning interface


    Cost: It does come at a price though: you pay (1) to fine-tune the model and (2) to prompt the fine-tuned model.
    Personal note: Even setting aside the cost of fine-tuning itself (data collection & compute), fine-tuned models are approximately 8x more expensive per token. Even when accounting for potentially shorter prompts, they still remain 3-4x more costly. Moreover, fine-tuning doesn't address the issue of hallucinations and often needs to be supplemented with prompt engineering and retrieval-augmented generation (RAG) anyway. I would only consider it if a major performance improvement is expected.

  • Image generation with DALL·E 3: OpenAI has introduced DALL·E 3, its newest image generation model, keeping pace with Midjourney. DALL·E 3 adeptly converts user-provided text into intricate images, and can even embed legible text within them. This greatly reduces the need for prompt engineering, and it pairs seamlessly with ChatGPT for brainstorming and refining image prompts. It will likely also be accessible via API, which would give OpenAI a competitive edge over Midjourney (see the hypothetical sketch after this list). The feature is still rolling out and will soon be available to all ChatGPT Plus subscribers.

    DALL·E 3 image

  • GPT-4 can now hear and see: When it was first announced, GPT-4 was described as a multimodal model, meaning it wasn't limited to text; it could also interpret images. The release of these capabilities was delayed, primarily over safety considerations and the compute constraints caused by the GPU shortage.
    Here are a few examples of what GPT-4V can do: reading handwritten text, interpreting memes, analyzing data, recognizing places and objects, coding, and so on (a hypothetical API sketch follows below)!
    As with DALL·E 3, this feature is currently rolling out and will be made available to ChatGPT Plus subscribers in the coming days!

    GPT-4V at work!
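
For the fine-tuning bullet above, here is a minimal sketch of what the workflow looks like in code, using the openai Python package (v1-style client). The file name train.jsonl and the training examples are placeholders, and the fine-tuned model name in the comment is illustrative.

```python
# Minimal sketch: fine-tuning gpt-3.5-turbo with the openai Python package.
# "train.jsonl" is a placeholder file of chat-formatted examples, one per line:
# {"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Upload the training data.
training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")

# 2. Submit the fine-tuning job (it runs asynchronously; poll until it succeeds).
job = client.fine_tuning.jobs.create(training_file=training_file.id, model="gpt-3.5-turbo")
print(job.id, job.status)

# 3. Once the job succeeds, prompt the fine-tuned model by its returned name,
#    e.g. "ft:gpt-3.5-turbo:my-org::abc123" (illustrative):
# response = client.chat.completions.create(
#     model=job.fine_tuned_model,
#     messages=[{"role": "user", "content": "Extract the dates from: ..."}],
# )
```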
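DALL·E 3 over API is still speculation at this point, but if it lands on OpenAI's existing Images endpoint, usage could look like the sketch below. The "dall-e-3" model identifier is an assumption on my part; only DALL·E 2 is served by this endpoint today.

```python
# Hypothetical sketch: DALL·E 3 through OpenAI's existing Images API.
from openai import OpenAI

client = OpenAI()

result = client.images.generate(
    model="dall-e-3",  # assumed identifier, not confirmed by OpenAI
    prompt="A watercolor robot reading a newspaper titled 'AI Weekly Digest'",
    size="1024x1024",
    n=1,
)
print(result.data[0].url)  # URL of the generated image
```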
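Likewise for GPT-4V: vision is rolling out in ChatGPT first, and there is no public API yet. The sketch below is an assumption based on how the chat endpoint structures messages; the model name "gpt-4-vision-preview" and the image-content format are guesses, not confirmed.

```python
# Hypothetical sketch: sending an image to GPT-4V through the chat API.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed identifier, not confirmed
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is happening in this meme?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/meme.png"}},
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)
```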

#2 Google Bard & Anthropic Updates

OpenAI’s competitors have also been busy!

Amazon invests $4B in Anthropic! AWS becomes Anthropic's main cloud provider. The partnership aims to enhance Amazon's AI service, Amazon Bedrock, which allows for model customization: Amazon developers can build upon Anthropic's advanced models for diverse applications, much like the GPT-3.5 and GPT-4 APIs. Notably, Claude 2, a competitor to ChatGPT and Bard with a 100K-token context window, will be made available to AWS customers.
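
To get a feel for what building on Claude 2 through Bedrock looks like, here is a short sketch using boto3. It assumes Bedrock access is enabled on your AWS account and that Claude 2 is exposed under the model ID "anthropic.claude-v2"; the prompt format follows Anthropic's Human/Assistant convention.

```python
# Sketch: invoking Claude 2 through Amazon Bedrock with boto3.
import json
import boto3

# Assumes the account has Bedrock access enabled in this region.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    # Claude expects the Human/Assistant turn format.
    "prompt": "\n\nHuman: Summarize this week's AI news in one sentence.\n\nAssistant:",
    "max_tokens_to_sample": 300,
})

response = bedrock.invoke_model(modelId="anthropic.claude-v2", body=body)
print(json.loads(response["body"].read())["completion"])
```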

Google releases Bard Extensions. Similar to OpenAI's plugins, they allow Bard to work seamlessly with Google products such as YouTube, Gmail, Google Flights, and so on!

Extensions are generally available, and you can test them at https://bard.google.com/. You'll have to activate them by clicking on the extension icon (see screenshot below).
Notes:

  • If you can't see the extension icon, please set your default language to English.

  • This feature seems to be available to personal accounts, but not yet to pro accounts (even if you enable experimental features). It's unclear why; one speculation is that the feature will ship for businesses as part of Google Duet.

Enable extensions

List of available extensions today

#3 This is robotics' ChatGPT moment!

One of the most remarkable abilities of models like ChatGPT is their emergent capability to execute specific tasks, such as summarizing or coding, despite being trained on broad, general-purpose data.

Can robots also display such emergent abilities when trained on vast and diverse datasets?

The answer seems to be yes! Thanks to the Open X-Embodiment Collaboration, a huge dataset compiled from various robots and research labs was used to train general-purpose robots. The resulting RT-X models, trained on this expansive data, showcased superior performance across multiple robotic tasks, outpacing models trained on single datasets.

This breakthrough underscores the potential of expansive, varied datasets to foster emergent capabilities in robotics, mirroring the advancements seen in large language models.

Learn more about RT-X and the GPT-4V features in this comprehensive video.

This is it for Today!

Until next time, this is Wassim Jouini, signing off. See you in the next edition!

Have a great Sunday and may AI always be on your side!