What happened this week in AI by Louie
This week we followed more model releases in the open-source LLM space, including the newly unveiled Falcon 180B, along with further hints from the tech giants about their upcoming models.
Falcon 180B has already topped the Hugging Face leaderboard and, at 180 billion parameters, is the largest openly available language model to date. It was trained on 3.5 trillion tokens, running concurrently on up to 4096 GPUs via Amazon SageMaker and consuming approximately 7,000,000 GPU hours. Developed as part of the Falcon family by the Technology Innovation Institute in Abu Dhabi, the model was trained on a dataset composed primarily of web data from RefinedWeb (about 85%), supplemented with a curated blend of conversations, technical papers, and a small fraction of code (around 3%). In terms of performance, Falcon 180B is reported to beat both Llama 2 70B and OpenAI’s GPT-3.5 on Massive Multitask Language Understanding (MMLU) and to perform on par with Google’s PaLM 2-Large. Falcon 180B is available in the Hugging Face ecosystem, starting with Transformers version 4.33. However, commercial use of Falcon 180B is currently subject to stringent conditions, with “hosting use” explicitly excluded. While open source is clearly still some distance from challenging GPT-4 in performance and compute intensity, we expect to see more open-source models that can compete with GPT-3.5, and we are excited to see what can be built with the increased flexibility this brings.
Alongside this week’s open-source developments, several tech giants signaled that they intend to compete at the cutting edge of LLMs. According to reports, Meta is expanding its infrastructure to train a new AI model intended to compete with GPT-4, with training planned to begin early next year. Apple is also reportedly investing heavily in AI to improve its Ajax model, which it hopes will rival ChatGPT. Clearly, every large tech company now wants to compete in the AI race. In our view, competitiveness will come down to who has best prepared over recent years by securing access to scalable compute infrastructure and leading machine learning talent.
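To put the training figures above in perspective, here is a rough back-of-the-envelope calculation. It uses only the numbers quoted in this paragraph; the assumption that all 4096 GPUs ran for the entire job is ours, so the wall-clock figure is a lower bound rather than a reported duration.

```python
# Figures quoted in the paragraph above.
PARAMS = 180e9          # 180 billion parameters
TOKENS = 3.5e12         # 3.5 trillion training tokens
GPU_HOURS = 7_000_000   # approximate total GPU hours
MAX_GPUS = 4096         # peak number of GPUs used concurrently


def min_wall_clock_days(gpu_hours: float, gpus: int) -> float:
    """Lower bound on wall-clock days, assuming every GPU ran the whole time."""
    return gpu_hours / gpus / 24


def tokens_per_parameter(tokens: float, params: float) -> float:
    """Ratio of training tokens to model parameters."""
    return tokens / params


print(f"At least {min_wall_clock_days(GPU_HOURS, MAX_GPUS):.1f} days of wall-clock time")
print(f"About {tokens_per_parameter(TOKENS, PARAMS):.1f} training tokens per parameter")
```

In other words, even at peak cluster size the run would have taken over two months, with roughly 19 training tokens seen per parameter.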
- Louie Peters — Towards AI Co-founder and CEO
- The 100 Most Influential People in AI 2023
TIME magazine has released its list of the 100 Most Influential People in AI for 2023. The list features prominent figures such as Dario and Daniela Amodei, Sam Altman, Demis Hassabis, Robin Li, Clément Delangue, Lila Ibrahim, Elon Musk, Geoffrey Hinton, Fei-Fei Li, Timnit Gebru, Yann LeCun, and Yoshua Bengio.
- Spread Your Wings: Falcon 180B Is Here
TII has recently launched Falcon 180B, a formidable language model with 180 billion parameters, trained on 3.5 trillion tokens. Outperforming Llama 2 70B and GPT-3.5 in terms of MMLU, Falcon 180B demonstrates great performance and ranks high on the Hugging Face Leaderboard. This model is available for commercial use but has strict terms that exclude “hosting use.”
- Releasing Persimmon-8B
Adept.ai has introduced Persimmon-8B, an open-source LLM with impressive performance and a compact size. Trained on less data, it achieves comparable results to LLaMA2 and offers a fast C++ implementation combined with flexible Python inference.
- Training Cluster as a Service: Train Your LLM at Scale on Our Infrastructure
Hugging Face has introduced Training Cluster as a Service, which lets users train their models with customizable parameters, token counts, and accelerators. It also provides cost estimates for training LLMs, ranging from $65k to $14.66M depending on model size and token count.
- NVIDIA Launches a Faster Inference Engine for LLMs
NVIDIA has been collaborating closely with leading companies to enhance and optimize LLM inference. These innovations have been incorporated into the open-source NVIDIA TensorRT-LLM, which has been a preferred choice for speed. A version tailored specifically for language models on H100s is now accessible.
Five 5-minute reads/videos to keep you learning
- Open ASR Leaderboard
Hugging Face has launched a speech-to-text leaderboard that ranks and evaluates speech recognition models available on its platform. The current top performers are NVIDIA FastConformer and OpenAI Whisper, with an emphasis on English speech recognition. Multilingual evaluation will be included in future updates.
- AudioLDM 2, but Faster
This blog post shows how to utilize AudioLDM 2 in the Hugging Face Diffusers library, covering various code optimizations such as half-precision, flash attention, compilation, and model optimizations. It is also accompanied by a more streamlined Colab notebook that includes all the necessary code.
- GPTQ Quantization on a Llama 2 7B Fine-Tuned Model With HuggingFace
Hugging Face now supports GPTQ quantization, which compresses large language models to 2, 3, or 4 bits per weight. The method preserves accuracy better than earlier quantization techniques while substantially reducing model size.
- Create a Self-Moderated Commentary System With LangChain and OpenAI
This guide explains the steps to build a self-moderated comment response system using OpenAI and LangChain. It involves two models, where the first generates a response and the second modifies and publishes it.
- Asking 60+ LLMs a Set of 20 Questions
This experiment presents the results of testing 60 models for basic reasoning, instruction following, and creativity. This compilation includes the questions and responses from each model stored in a SQLite database.
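For readers new to the quantization idea in the GPTQ item above, the sketch below shows what storing weights in 4 bits means. Note that this is plain round-to-nearest quantization, a simpler technique than GPTQ, which goes further by using calibration data to reduce the error that rounding introduces. The weight values and function names are made up for illustration.

```python
def quantize(weights, bits=4):
    """Map floats to signed integer codes in [-(2**(bits-1)), 2**(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole group of weights.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.12, -0.53, 0.98, -1.40, 0.07, 0.66]
codes, scale = quantize(weights, bits=4)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("codes:", codes)
print("max reconstruction error:", max_error)
```

Each weight is now a small integer plus one shared scale, which is where the size reduction comes from; the reconstruction error is the price, and it is what methods like GPTQ aim to minimize.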
Papers & Repositories
- KillianLucas/Open-Interpreter: OpenAI’s Code Interpreter in Your Terminal, Running Locally
Open Interpreter is an open-source implementation of OpenAI’s Code Interpreter that provides a natural language interface similar to ChatGPT. It enables running various code types locally, offering interactive terminal chats for controlling computer functions without internet access limitations.
- Large Language Models As Optimizers
LLMs can be used as optimizers in applications where gradients are not available. In Optimization by PROmpting (OPRO), the LLM generates new candidate solutions from a prompt; these are evaluated, and the scored results are fed back into the prompt in an iterative optimization loop. OPRO has shown promising results, outperforming human-designed prompts in prompt optimization tasks.
- SLiMe: Segment Like Me
SLiMe is a novel approach that combines vision-language models and Stable Diffusion (SD) and allows image segmentation at custom granularity using just one annotated sample. It outperforms existing one-shot and few-shot image segmentation methods, as demonstrated in comprehensive experiments.
- One Wide Feedforward Is All You Need
Researchers have found that the Feed Forward Network (FFN) in Transformers can be optimized, resulting in a 40% reduction in model size while maintaining similar performance. By sharing an FFN across the encoder and removing it from the decoder layers, parameters can be decreased with minimal decrease in accuracy.
- Efficient RLHF: Reducing the Memory Usage of PPO
The paper introduces Hydra-PPO, a method designed to expedite Reinforcement Learning from Human Feedback (RLHF) by minimizing memory usage. Hydra-PPO reduces the number of models in memory during the PPO stage, allowing for increased training batch size and decreased per-sample latency by up to 65%.
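The OPRO loop described in the Large Language Models As Optimizers item above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: `stub_llm` is a hypothetical stand-in for a real LLM call, the objective is a made-up integer problem, and the meta-prompt format is our own assumption.

```python
from typing import Callable, List, Tuple

History = List[Tuple[int, float]]


def score(x: int) -> float:
    """Toy objective to maximize; the optimum is x = 7."""
    return float(-(x - 7) ** 2)


def build_meta_prompt(history: History, top_k: int = 3) -> str:
    """List the top_k scored solutions, best last, below a task instruction."""
    best = sorted(history, key=lambda pair: pair[1])[-top_k:]
    lines = [f"x={x} score={s}" for x, s in best]
    return "Propose a better integer x.\n" + "\n".join(lines)


def stub_llm(meta_prompt: str) -> List[int]:
    """Hypothetical LLM stand-in: reads the best solution and proposes neighbours."""
    last_line = meta_prompt.splitlines()[-1]
    best_x = int(last_line.split()[0].split("=")[1])
    return [best_x - 1, best_x + 1]


def optimize(propose: Callable[[str], List[int]], start: int, steps: int = 10) -> Tuple[int, float]:
    """Generate candidates from the prompt, score them, and fold them back in."""
    history: History = [(start, score(start))]
    for _ in range(steps):
        meta_prompt = build_meta_prompt(history)
        for candidate in propose(meta_prompt):
            history.append((candidate, score(candidate)))
    return max(history, key=lambda pair: pair[1])


print(optimize(stub_llm, start=0, steps=10))
```

The key point is that no gradient is ever computed: the only feedback the proposer receives is the list of past solutions and their scores inside the prompt text.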
Enjoy these papers and news summaries? Get a daily recap in your inbox!
The Learn AI Together Community section!
Meme of the week!
Meme shared by rucha8062
Featured Community post from the Discord
Duckydub is working on a project called Nexel AI, which simplifies AI-powered automation. It aims to make automation accessible to everyone, regardless of their technical background. Furthermore, it streamlines workflows while ensuring data security. Check it out here and support a fellow community member. Share your AI projects and feedback in the thread here!
AI poll of the week!
TAI Curated section
Article of the week
Spell correction is no doubt essential for any written communication. When considering building one, we may quickly reach for the one-size-fits-all solution: deep learning. However, deep learning is not always the optimal choice. In this article, I would like to introduce “noisy channel”, a classic technique for spell correction, and show how you can build your own correction module with zero deep learning background.
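As a taste of the noisy channel idea, here is a minimal sketch, not the article's code. It picks the word that maximizes P(word) × P(typo | word). The tiny corpus and the crude error model (fixed, unnormalized scores of 0.95 for an exact match and 0.05 for any single edit) are assumptions made purely for illustration.

```python
import string
from collections import Counter

# Toy corpus standing in for a real word-frequency source.
CORPUS = "the quick brown fox jumps over the lazy dog the dog barks".split()
COUNTS = Counter(CORPUS)
TOTAL = sum(COUNTS.values())


def prior(word: str) -> float:
    """Language model P(word): relative frequency in the corpus."""
    return COUNTS[word] / TOTAL


def edits1(word: str) -> set:
    """All strings one delete, transpose, replace, or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def channel(typed: str, intended: str) -> float:
    """Crude error model score for P(typed | intended); values are assumed."""
    if typed == intended:
        return 0.95
    if typed in edits1(intended):
        return 0.05
    return 0.0


def correct(typed: str) -> str:
    """Return the vocabulary word maximizing prior * channel, or typed if none fit."""
    candidates = [w for w in COUNTS if channel(typed, w) > 0]
    if not candidates:
        return typed
    return max(candidates, key=lambda w: prior(w) * channel(typed, w))


print(correct("teh"), correct("dgo"), correct("fox"))
```

A real system would estimate both the language model and the error model from data, but the decision rule stays the same.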
Our must-read articles
If you are interested in publishing with Towards AI, check our guidelines and sign up. We will publish your work to our network if it meets our editorial policies and standards.
Interested in sharing a job opportunity here? Contact email@example.com.
If you are preparing your next machine learning interview, don’t hesitate to check out our leading interview preparation website, confetti!