AI Weekly Digest #10: Claude and Bard catch up with OpenAI’s models
Claude 100k Tokens model, ChatGPT web browsing publicly available, Mojo, HuggingFace leaderboard and new AI Agent library.
Welcome to this new edition of our AI newsletter, where we bring you the latest updates on artificial intelligence!
The Main Headlines: Claude 100k Tokens model, ChatGPT web browsing publicly available, Mojo, HuggingFace OpenLLM leaderboard and new AI Agent library.
Beyond the Hype, the story: At Google’s I/O event, AI everywhere! Everything you need to know!
Bonus: ImageBind, one embedding representation across 6 modalities! Meta builds the foundation of future multimodal models!
Beyond the Hype, The Story
The Google I/O event brought more than 100 announcements! Focusing on AI, Google is moving fast to close the gap with GPT4: PaLM2, Bard, and AI integration across Google’s products. Here’s everything you need to know, in 5 main Google AI updates!
#1 PaLM2 is Google’s most capable Large Language Model, announced during this event. You can read its full technical report here.
Multilingual: it covers over 100 languages and is better at translation than Google Translate! It also passes advanced professional language proficiency exams!
Integration: It will be powering Google products (e.g., Gmail, workspace products, and so on).
Sizes: It comes in several sizes, Gecko, Otter, Bison & Unicorn, from the smallest to the largest PaLM2 model, with Gecko being small enough to run on a phone. We don’t know much about their actual sizes, though; we simply know that they are “significantly” smaller than PaLM (540B parameters).
Compared to GPT4: We don’t have a fair head-to-head comparison yet; early tests show that the two models perform differently across tasks. PaLM2 seems stronger than GPT4 on linguistic and multi-step reasoning tasks, while GPT4 seems to outperform PaLM2 on coding tasks, for instance. We’ll share more as we learn more over the coming weeks.
PaLM API is now powered by PaLM2 as well!
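If you’d like to poke at it, here’s a minimal sketch using Google’s `google.generativeai` Python SDK; `models/text-bison-001` maps to the Bison size mentioned above, and since the API is in preview, details may shift:

```python
# Requires: pip install google-generativeai, plus an API key from Google
import google.generativeai as palm

palm.configure(api_key="YOUR_API_KEY")  # placeholder key

# Ask the PaLM API (now backed by PaLM2) for a text completion
response = palm.generate_text(
    model="models/text-bison-001",  # PaLM2, "Bison" size
    prompt="In two sentences, why is the sky blue?",
)
print(response.result)
```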
#2 Bard now runs on PaLM2. Some countries might still face restrictions (due to GDPR regulations, I’m assuming), but we anticipate Bard will be publicly available worldwide very soon.
Images as an input: Bard will soon be capable of accepting images as input, similar to the multi-modal capabilities showcased in the GPT4 announcement.
#3 Med-PaLM2: “fine-tuned for medical knowledge to help answer questions and summarize insights from a variety of dense medical texts. We’re now exploring multimodal capabilities, so it can synthesize patient information from images, like a chest x-ray or mammogram, to help improve patient care.” It will be available to selected research teams as soon as this summer.
#4 Universal Translator “is an experimental AI video dubbing service that helps experts translate a speaker’s voice and match their lip movements”! It thus combines translation, voice cloning, and deepfakes to generate a new video that matches the target language!
#5 Gemini: GPT5 level model? Google announced a new foundation model, multi-modal by design: “Gemini is still in training, but it’s already exhibiting multimodal capabilities never before seen in prior models”!
This effort is led by Google’s new AI research team, “Google DeepMind”! PaLM2 & Bard are already rivaling GPT4; is Gemini Google’s attempt to be the first to reach a GPT5-level model? Stay tuned!
Main Headlines
Mojo is a new AI-focused programming language that blends Python’s ease of use with C-level speed. As a Python superset, it aims to give Python developers massive speedups (Modular reports over 1000x on some benchmarks) without leaving familiar syntax.
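As a quick taste, here’s a minimal sketch: the snippet below is plain Python, and because Mojo is designed as a superset of Python, it should also run unchanged as Mojo source (the headline speedups come from Mojo-specific typed `fn` definitions, which we skip here):

```python
# Plain Python: valid as-is, and also valid Mojo source since Mojo is a Python superset.
# Mojo's compiler can accelerate this further once you add typed 'fn' variants.
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib(30))  # 832040
```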
Claude 100k tokens: ChatGPT's current limitation is processing up to three pages (4k tokens) of input, but Anthropic's Claude has announced a version capable of handling ~75 pages (100k tokens). This advancement should foster more models that can manage broader contexts. Though currently costly, increased competition from entities like Claude, Bard, and HuggingChat should make such models more affordable within six months.
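For a sense of scale, here’s a back-of-the-envelope sketch of those page counts, assuming Anthropic’s rough rule of thumb of ~0.75 words per token and a dense ~1,000-word page (both figures are approximations, not official specs):

```python
# Back-of-the-envelope: how many pages of text fit in a context window?
WORDS_PER_TOKEN = 0.75   # Anthropic's rule of thumb: 100k tokens ~ 75k words
WORDS_PER_PAGE = 1_000   # a dense, single-spaced page (rough assumption)

def pages(tokens: int) -> float:
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE

for name, budget in [("ChatGPT (4k tokens)", 4_000), ("Claude (100k tokens)", 100_000)]:
    print(f"{name}: ~{pages(budget):.0f} pages")  # ~3 and ~75 pages respectively
```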
ChatGPT: new features for Plus subscribers! Web browsing is now available to all Plus users. Plugins were also announced as part of this release and will be rolled out progressively.
HuggingFace makes it easier than ever to evaluate and leverage Open Source models.
Open Source LLM Leaderboard: a one-stop shop for tracking, ranking, and evaluating the latest open-source LLMs.
HuggingFace Agents: this experimental API enables natural-language interaction with different models! Through this new agent framework, you can seamlessly interact with HuggingFace’s hosted models, e.g., to analyze an image, transcribe audio, or summarize text. Check out the examples below.
The new `transformers.tools` library from @huggingface is insane!
E.g. you can summarize and chat with a PDF in just 6 lines of code, including import statements! 🤯
The best part?
It's all open-source, from start to finish! 🤗
— DataChazGPT 🤯 (not a bot) (@DataChaz)
8:01 AM • May 13, 2023
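If you want to try it yourself, here’s a minimal sketch using the `HfAgent` class introduced in `transformers` v4.29; the StarCoder inference endpoint is the one used in Hugging Face’s own examples, and the input file is a placeholder:

```python
# Requires: pip install transformers (>= 4.29)
from transformers import HfAgent

# A remote LLM (here StarCoder) acts as the agent's "brain": it writes and runs
# small Python programs that call Hugging Face tools (captioning, transcription,
# summarization, ...) to fulfil your natural-language request.
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")

long_text = open("article.txt").read()  # placeholder: any text you want summarized
summary = agent.run("Summarize the following text.", text=long_text)
print(summary)
```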
Bonus!
This week brought a potentially game-changing announcement in AI development: ImageBind, courtesy of Meta.
This cross-modal representation, spanning six modalities (images, text, audio, depth, thermal, and IMU data), paves the way for a plethora of new scenarios. One embedding representation to rule them all?
This representation was released for research purposes, and it will shape future generations of multimodal AI models & applications:
Classification & Semantic Search: it allows machines to analyze and understand different forms of information together, facilitating cross-modal understanding (see the sketch after this list).
Content generation across modalities: e.g., take penguin audio as input and generate an image of penguins (see image below).
Simpler learning: ImageBind eliminates the need for paired data for every combination of modalities, making multimodal learning more feasible and scalable.
SOTA results: The model exhibits superior performance compared to specialist models trained individually for specific modalities.
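To make the semantic-search point concrete, here’s a minimal sketch following the example code in Meta’s research release (github.com/facebookresearch/ImageBind); the file paths are placeholders, and the import layout matches the repo at release time:

```python
import torch
import data                                   # helper module from the ImageBind repo
from models import imagebind_model
from models.imagebind_model import ModalityType

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pretrained ImageBind model (downloads the checkpoint on first run)
model = imagebind_model.imagebind_huge(pretrained=True)
model.eval().to(device)

# Text, image, and audio inputs are all embedded into ONE shared space
inputs = {
    ModalityType.TEXT: data.load_and_transform_text(["a dog", "a car", "a bird"], device),
    ModalityType.VISION: data.load_and_transform_vision_data(["dog.jpg"], device),
    ModalityType.AUDIO: data.load_and_transform_audio_data(["chirp.wav"], device),
}
with torch.no_grad():
    emb = model(inputs)

# Cross-modal semantic search: which caption best matches the image? The audio?
print(torch.softmax(emb[ModalityType.VISION] @ emb[ModalityType.TEXT].T, dim=-1))
print(torch.softmax(emb[ModalityType.AUDIO] @ emb[ModalityType.TEXT].T, dim=-1))
```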
That’s it for today! If you made it this far, I’d appreciate some quick feedback 😋! I know there’s room for improvement, so don’t hesitate to share what you liked and what you didn’t.
Have a great Sunday and may AI always be on your side!