This AI newsletter is all you need #97

Towards AI Editorial Team
Published in Towards AI
8 min read · Apr 30, 2024

What happened this week in AI by Louie

Our focus this week was on real-world testing and fine-tuning of Llama-3 and Phi-3, as well as some exciting progress in medical AI.

We have been monitoring resoundingly positive feedback on Llama-3, both the 8B and 70B parameter models, along with some exciting open-source fine-tuned variants and progress extending the context length well beyond 8k. On the popular LMSYS Chatbot Arena leaderboard, Llama-3 70B ranks second only to the latest GPT-4 Turbo on English text-based prompts. The smaller 8B model scores in line with Claude 3 Haiku and significantly ahead of GPT-3.5 Turbo. For Microsoft’s smaller 3.8B-parameter Phi-3 model, feedback has been more mixed, with some skepticism about its real-world performance relative to its benchmark scores. Still, many people have reported positive results, particularly given a smaller model’s speed and cost advantages.

One topic of debate with this latest generation of smaller, smarter, heavily overtrained open-source LLMs is whether they suffer more degradation and gain less benefit from model optimization techniques such as quantization, sparsity, and pruning. These techniques are used to reduce model size (in GB) or to increase inference speed and efficiency. Given that these latest models pack more information and intelligence into fewer parameters, it makes intuitive sense that such techniques would deliver fewer benefits, but we will be watching for detailed studies.
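As a concrete illustration of the kind of optimization in question, here is a minimal sketch of loading Llama-3 8B Instruct with 4-bit quantization via Hugging Face transformers and bitsandbytes; treat it as an illustrative starting point for your own degradation tests rather than a tuned recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# 4-bit NF4 quantization roughly quarters the memory footprint of the fp16 weights.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Compare generations against the full-precision model on your own prompts to gauge degradation.
inputs = tokenizer("Briefly explain quantization of LLMs.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```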

This week, in medical AI, we were excited to see the release of Med-Gemini, a fine-tuned multimodal model built on the Gemini-1.0/1.5 series. The model sets a new state-of-the-art of 91.1% on the MedQA (USMLE) benchmark and enables multimodal chat, including analysis of medical images and follow-up Q&A.

We were also excited to see promising results from a randomized clinical study of an AI electrocardiography alert system covering 16,000 patients. The system tested included warning messages delivered to physicians, an AI-generated report, and flagging of patients at higher mortality risk. The study demonstrated an incredible 17% reduction in mortality after the alert system was implemented.

Why should you care?

The pros and cons of the rollout of AI are often heavily debated, but one of the clearest net benefits is in medicine. We are particularly excited about the potential for fine-tuned LLMs and custom machine-learning models to aid in medicine. Whether it is doctor co-pilot assistants, early-stage machine diagnosis, medical alert systems, or drug development, we think LLMs fine-tuned on medical data and workflows, with access to medical data via RAG, could now add significant value and efficiency when used in combination with human experts. However, given that the hallucination risk is still present, the rollout should be very cautious, with safeguards to avoid overreliance. We are also very pleased to see thorough clinical studies begin to demonstrate actual mortality reductions from the use of AI products. We expect news flow on successful AI medical products and drugs to accelerate significantly this year.

- Louie Peters — Towards AI Co-founder and CEO

Hottest News

1. Microsoft Launches Phi-3, Its Smallest AI Model Yet

Microsoft has unveiled its new language model, Phi-3 Mini, with 3.8 billion parameters. The company also announced upcoming variants, Phi-3 Small and Phi-3 Medium, with 7 billion and 14 billion parameters, respectively. The training approach for Phi-3 Mini mimics children’s progressive learning stages, using a curriculum of materials ranging from simple to complex structures and concepts.

2. Apple Releases OpenELM: Small, Open Source AI Models Designed To Run On-Device

Apple has introduced OpenELM, a suite of open-source AI text generation models with 270M to 3B parameters optimized for on-device deployment. Available on Hugging Face, these models are released under a sample code license to enable AI functionalities independently of the cloud.

3. xAI, Elon Musk’s OpenAI Rival, Is Closing on $6B in Funding, and X, His Social Network, Is Already One of Its Shareholders

Elon Musk’s artificial intelligence startup, xAI, is on the verge of securing a $6 billion investment at an $18 billion pre-money valuation. Supported by Sequoia Capital and Future Ventures, xAI’s funding round demonstrates significant investor confidence, partly due to Musk’s reputation and connections from his ventures such as SpaceX and Tesla.

4. Snowflake Releases Arctic, an Open LLM for Enterprise AI

Snowflake AI Research has released Arctic, a cost-effective enterprise AI LLM featuring a Dense-MoE Hybrid transformer architecture with 480 billion parameters. Trained for under $2 million, Arctic excels in tasks like SQL generation and coding. It’s fully open-source under Apache 2.0, providing free access to model weights and code.

5. Cohere Open-Sources ‘Cohere Toolkit’ To Accelerate Generative AI Application Development

Cohere has released an open-source Cohere Toolkit that simplifies the integration of LLMs into enterprise systems by allowing single-click deployments on cloud platforms, including Azure. This toolkit packages models, prompts, user experience designs, and data connections into deployable applications, with the initial offering being a Knowledge Agent.

Five 5-minute reads/videos to keep you learning

1. Some Technical Notes About Llama 3

This article focuses on the technical aspects of Llama 3, Meta’s latest generation of Llama models. It covers aspects such as the architecture, training, the instruction-tuned version, performance, and more.

2. Seemore: Implement a Vision Language Model From Scratch

Seemore is a streamlined vision language model (VLM) inspired by Karpathy’s “makemore,” built using PyTorch. In this blog, the author implements seemore in pure PyTorch, focusing on vision models that can be instruction-tuned to perform useful tasks. The blog also describes a common architectural pattern that seems to be taking shape and is proving highly versatile.
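For readers unfamiliar with that pattern, here is a minimal, hypothetical PyTorch skeleton of the vision-encoder-plus-projector design many of these VLMs share; the module names, stand-in layers, and dimensions are our own illustration, not code from seemore.

```python
import torch
import torch.nn as nn

class TinyVLM(nn.Module):
    """Illustrative skeleton: vision encoder -> projector -> language decoder."""

    def __init__(self, vision_dim=384, text_dim=512, vocab_size=32000):
        super().__init__()
        # Stand-ins for a real ViT encoder and a real causal transformer decoder.
        self.vision_encoder = nn.Linear(vision_dim, vision_dim)  # produces patch embeddings
        self.projector = nn.Linear(vision_dim, text_dim)         # maps patches into the LLM embedding space
        self.token_embed = nn.Embedding(vocab_size, text_dim)
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=text_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.lm_head = nn.Linear(text_dim, vocab_size)

    def forward(self, patch_features, input_ids):
        # Encode image patches and project them into the text embedding space.
        img_tokens = self.projector(self.vision_encoder(patch_features))
        txt_tokens = self.token_embed(input_ids)
        # Prepend the image tokens to the text tokens and decode as usual.
        x = torch.cat([img_tokens, txt_tokens], dim=1)
        return self.lm_head(self.decoder(x))

# Example: one image with 16 patches of dim 384, plus a 10-token prompt.
logits = TinyVLM()(torch.randn(1, 16, 384), torch.randint(0, 32000, (1, 10)))
print(logits.shape)  # torch.Size([1, 26, 32000])
```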

3. Prompt Engineering Best Practices: Building Chatbots

This step-by-step tutorial focuses on the essentials of building a personalized chatbot with the right prompting techniques. It dives into the OpenAI chat completions format, providing a comprehensive understanding of its details.
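As a quick taste of the chat completions format the tutorial builds on, here is a minimal sketch using the official openai Python client; the system prompt, model choice, and follow-up turn are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A chatbot is essentially a growing list of role-tagged messages.
messages = [
    {"role": "system", "content": "You are a friendly cooking assistant. Keep answers short."},
    {"role": "user", "content": "How long should I boil an egg for a soft yolk?"},
]

response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
reply = response.choices[0].message.content
print(reply)

# To continue the conversation, append the assistant's reply and the next user turn.
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "And for a hard-boiled one?"})
```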

4. The Biggest Open-Source Week in the History of AI

The last few weeks have seen a massive increase in open-source releases, with DBRX, Jamba, Qwen1.5, Samba-CoE v0.2, Starling-LM-7B-beta, xAI’s Grok 1.5, Mistral 7B v0.2, 1-bit and 2-bit quantization with HQQ+, Llama 3, and now Phi-3. This essay provides a timeline of the latest open-source releases and their impact on the overall landscape.

5. Why Reliable AI Requires a Paradigm Shift

This article explores the nature and impact of hallucinations in current generative AI models, explicitly focusing on language models. The central thesis is that although hallucinations can be reduced with various practical approaches, the core issue is a fundamental flaw in the assumptions about the nature of language and truth that are intrinsic to the prevalent language modeling paradigms used today.

Repositories & Tools

  1. CoreNet is a deep neural network toolkit that allows the training of standard and novel small and large-scale models for various tasks, including foundation models (e.g., CLIP and LLM), object classification, object detection, and semantic segmentation.
  2. Instructor is a Python library leveraging Pydantic to enhance language model interactivity by organizing outputs into structured formats, validating responses, managing retries, and supporting streaming (see the short usage sketch after this list).
  3. AgentScope is a multi-agent platform for building multi-agent applications with large-scale models.
  4. Langfuse is an open-source LLM engineering platform for observability, metrics, evals, prompt management, playgrounds, and datasets. It integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more.
  5. LongEmbed introduces methods for extending context windows in embedding models up to 32k without extra training and presents a benchmark for evaluating long context retrieval across four tasks derived from long-form QA and summarization.
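To show the structured-output workflow Instructor enables, here is a minimal sketch; the Pydantic schema and prompt are illustrative, and the exact patching call may differ between Instructor versions.

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

# The structure we want the LLM's answer parsed and validated into.
class UserInfo(BaseModel):
    name: str
    age: int

# Wrap the OpenAI client so responses are coerced into the Pydantic model,
# with automatic retries when validation fails.
client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserInfo,
    messages=[{"role": "user", "content": "Extract: Jane Doe is 34 years old."}],
)
print(user.name, user.age)  # a validated UserInfo instance, not raw text
```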

Top Papers of The Week

1. Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Microsoft has launched Phi-3-mini, a compact language model designed to run locally on mobile platforms, with 3.8 billion parameters trained on a dataset of 3.3 trillion tokens. It rivals the performance of larger models like Mixtral 8x7B and GPT-3.5, scoring 69% on the MMLU benchmark and 8.38 on MT-Bench. This advancement is attributed to an improved dataset, building on the one used for its predecessor, Phi-2, which includes a mix of web content and synthetic data.

2. Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Recent research introduces AlphaLLM, a methodology aimed at self-improving LLMs by integrating Monte Carlo Tree Search (MCTS) to facilitate a self-enhancement loop. This approach addresses complex reasoning and planning challenges by enabling LLMs to self-correct and self-learn, potentially advancing their abilities beyond the limits imposed by data availability and quality.

3. How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

This paper introduces InternVL 1.5, an open-source multimodal large language model (MLLM), to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. It introduces three simple improvements: Strong Vision Encoder, Dynamic High-Resolution, and High-Quality Bilingual Dataset.

4. Continual Learning of Large Language Models: A Comprehensive Survey

This survey provides a comprehensive overview of current research on LLMs in the context of continual learning (CL). It summarizes their learning stages in modern CL, namely Continual Pre-Training (CPT), Domain-Adaptive Pre-training (DAP), and Continual Fine-Tuning (CFT), along with evaluation protocols and a broader discussion of continual learning for LLMs.

5. Make Your LLM Fully Utilize the Context

This study presents information-intensive (IN2) training, a data-driven solution for fully utilizing information within a long context. IN2 training leverages a synthesized long-context question-answer dataset, where answering requires fine-grained awareness of information in a short segment buried within the long context, as well as the integration and reasoning of information from two or more short segments.
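As a rough sketch of how such data might be synthesized (our own illustration of the idea, not code from the paper), one can hide an answer-bearing segment among unrelated filler passages so that the model must locate it.

```python
import random

def build_long_context_example(target_segment, qa_pair, filler_segments, num_fillers=31):
    """Hypothetical IN2-style sample: bury the answer-bearing segment in a long context."""
    # Pick filler passages that do not contain the answer.
    distractors = random.sample(filler_segments, k=num_fillers)
    # Insert the target segment at a random position within the long context.
    position = random.randint(0, len(distractors))
    segments = distractors[:position] + [target_segment] + distractors[position:]
    question, answer = qa_pair
    return {"context": "\n\n".join(segments), "question": question, "answer": answer}
```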

Quick Links

1. Apple is negotiating with OpenAI and Google to potentially integrate technologies such as ChatGPT or Gemini into the iPhone’s iOS 18. The move aims to bridge its in-house AI development gap and circumvent past deployment issues, while risking increased reliance on external AI advancements.

2. Saudi Arabia plans to invest in AI and other technology with a $100 billion fund this year. This initiative was highlighted at the recent Leap tech conference in Riyadh, attended by luminaries from major tech corporations like Amazon and Google.

3. Chinese tech firm SenseTime just launched SenseNova 5.0, a major update to its large model — featuring capabilities that beat GPT-4 Turbo across nearly all key benchmarks. The ~600B parameter model features a 200k context window and was trained on over 10TB of largely synthetic data.

Who’s Hiring in AI

AI/ML Specialist Solutions Architect, AGS-Europe North Data and AI (D&AI)-AIML @Amazon (Luxembourg)

Senior Technical Instructor — AI and Data Center Technologies @NVIDIA (UK/Remote)

Co-Op, Generative AI Software Engineer (July) @Hologic (Marlborough, MA, USA)

VP of Partnerships, Cloud & Generative AI @Invisible Technologies Inc. (Remote)

Machine Learning Engineer @Issuu (Braga, Copenhagen, Berlin)

Machine Learning Engineer (LLM Fine-tuning) @Pesto Tech (India/Remote)

Data Engineer Co-Op @Verana Health (San Francisco, CA, USA)

Interested in sharing a job opportunity here? Contact sponsors@towardsai.net.

If you are preparing your next machine learning interview, don’t hesitate to check out our leading interview preparation website, Confetti!

https://www.confetti.ai/

Think a friend would enjoy this too? Share the newsletter and let them join the conversation.
