Wednesday — October 16, 2024
Meta unveils new AI hardware designs at OCP Summit 2024, Qualcomm's NPU underperforms in benchmarks against its CPU, and researchers reveal a flaw in language model benchmarks with constant-response "null models."
News
Starship Flight 5: Launch and booster catch [video]
Show HN: I built the most over-engineered Deal With It emoji generator
This is a face detection GIF generator that allows users to upload or paste an image URL to create a "Deal With It" GIF. The generator is made with passion by Igor Klimer and the source code is available on GitHub.
Google commits to buying power generated by nuclear-energy startup Kairos Power
Busy Status Bar
The Busy Status Bar is a productivity multi-tool device with an LED pixel screen that displays a personal busy message, has a built-in Pomodoro timer, and supports various apps. It is fully customizable, open-source, and hacker-friendly, allowing users to control their device via API and MQTT through the Busy Cloud.
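As a sketch of the MQTT control path, the snippet below publishes a hypothetical "set busy message" command; the broker address, topic layout, and payload format are illustrative assumptions, not the documented Busy Cloud API.

```python
# A minimal sketch, not the documented Busy Cloud API: broker address,
# topic layout, and payload format are illustrative assumptions.
# Requires the paho-mqtt package (pip install paho-mqtt).
import json
import paho.mqtt.publish as publish

DEVICE_ID = "busy-bar-1234"            # hypothetical device identifier
TOPIC = f"busy/{DEVICE_ID}/message"    # hypothetical topic layout

# Publish a "set busy message" command as a retained MQTT message so the
# device picks it up even if it reconnects later.
publish.single(
    TOPIC,
    payload=json.dumps({"text": "In a meeting", "until": "15:30"}),
    qos=1,
    retain=True,
    hostname="mqtt.example.com",       # placeholder broker address
)
```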
Meta's open AI hardware vision
Meta is showcasing its latest open AI hardware designs at the Open Compute Project (OCP) Global Summit 2024, including a new AI platform, cutting-edge open rack designs, and advanced network fabrics and components. The company is advancing its infrastructure to support emerging AI workloads and is releasing its new high-powered rack, Catalina, to the OCP community; the rack is designed to support the latest NVIDIA GB200 Grace Blackwell Superchip and uses a modular design to enable customization.
Research
Cheating Automatic LLM Benchmarks
Researchers found that a "null model" that always outputs a constant response can achieve top-ranked win rates on popular automatic language model benchmarks, such as AlpacaEval 2.0, Arena-Hard-Auto, and MT-Bench. This highlights a vulnerability in these benchmarks and calls for the development of anti-cheating mechanisms to ensure reliable evaluation of language models.
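A minimal sketch of what such a "null model" looks like in code: a model interface that ignores the instruction entirely and returns one constant string. The placeholder text below is illustrative, not the paper's adversarial response.

```python
# A minimal sketch of the "null model" idea: a model interface that ignores
# its input and always returns the same constant string. The response text
# here is a placeholder, not the adversarial output used in the paper.
CONSTANT_RESPONSE = (
    "I cannot answer that directly, but here is a thorough, well-structured "
    "response covering every relevant consideration..."
)

class NullModel:
    """Drop-in replacement for a chat model in an automatic benchmark."""

    def generate(self, instruction: str) -> str:
        # The instruction is deliberately ignored.
        return CONSTANT_RESPONSE

# In an AlpacaEval-style loop, the null model's fixed output would be sent to
# the LLM judge alongside the baseline's answer for every single prompt.
model = NullModel()
for instruction in ["Explain quicksort.", "Write a haiku about rain."]:
    print(model.generate(instruction)[:60], "...")
```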
Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence
Model Swarms is a collaborative search algorithm that adapts large language models (LLMs) via swarm intelligence, allowing diverse LLM experts to work together to optimize a utility function and improve model performance. This approach offers tuning-free model adaptation, works well in low-data regimes, and outperforms existing model composition methods by up to 21.0% across various tasks and contexts.
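As a rough illustration of the search, here is a toy particle-swarm-style update over weight vectors, with small NumPy arrays standing in for LLM parameters; the utility function and hyperparameters are illustrative assumptions, not the paper's exact update rule.

```python
# A toy PSO-style sketch of the idea: each "expert" is a weight vector
# (here a small NumPy array standing in for LLM parameters) moved toward its
# personal best and the swarm's global best under a utility function.
import numpy as np

rng = np.random.default_rng(0)

def utility(w: np.ndarray) -> float:
    # Placeholder utility: higher is better near the point (1, 1, ..., 1).
    return -float(np.sum((w - 1.0) ** 2))

dim, n_experts, steps = 8, 5, 50
positions = rng.normal(size=(n_experts, dim))        # the "experts"
velocities = np.zeros_like(positions)
personal_best = positions.copy()
global_best = max(positions, key=utility).copy()

for _ in range(steps):
    for i in range(n_experts):
        r1, r2 = rng.random(2)
        velocities[i] = (0.7 * velocities[i]
                         + 1.5 * r1 * (personal_best[i] - positions[i])
                         + 1.5 * r2 * (global_best - positions[i]))
        positions[i] += velocities[i]
        if utility(positions[i]) > utility(personal_best[i]):
            personal_best[i] = positions[i].copy()
        if utility(positions[i]) > utility(global_best):
            global_best = positions[i].copy()

print("best utility:", utility(global_best))
```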
AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML
AutoML-Agent is a novel multi-agent framework that automates the full AI development pipeline, from data retrieval to model deployment, using large language models (LLMs) and a natural language interface. It enhances the efficiency of the AutoML process by introducing a retrieval-augmented planning strategy, parallel sub-task solving, and multi-stage verification, achieving a higher success rate in automating the full AutoML process.
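A structural sketch of that pipeline shape follows: planning, parallel sub-task solving, then verification, with placeholder functions standing in for the framework's LLM agents. Names are illustrative, not AutoML-Agent's API.

```python
# A structural sketch only (not AutoML-Agent's actual API): a request is
# decomposed into sub-tasks, sub-tasks run in parallel, and each result
# passes a verification step before the pipeline is assembled.
from concurrent.futures import ThreadPoolExecutor

def plan(request: str) -> list[str]:
    # Placeholder for retrieval-augmented planning with an LLM.
    return ["retrieve data", "preprocess", "train model", "evaluate"]

def solve(subtask: str) -> dict:
    # Placeholder for a specialist agent solving one sub-task.
    return {"subtask": subtask, "artifact": f"<result of {subtask}>"}

def verify(result: dict) -> bool:
    # Placeholder multi-stage verification (schema checks, test runs, ...).
    return result["artifact"] is not None

request = "Build a classifier for customer churn from churn.csv"
with ThreadPoolExecutor() as pool:
    results = list(pool.map(solve, plan(request)))

assert all(verify(r) for r in results)
print("pipeline stages completed:", [r["subtask"] for r in results])
```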
The Dynamics of Social Conventions in LLM Populations
Researchers investigated how Large Language Model (LLM) agents form social conventions through simulated interactions, finding that globally accepted conventions can arise spontaneously and strong collective biases can emerge. They also discovered that minority groups of committed LLMs can drive social change by establishing new conventions, potentially overturning established behaviors once they reach a critical size.
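The interaction structure resembles a naming game; the sketch below simulates it with rule-based stand-ins (the study uses LLM agents) and a committed minority that never switches convention.

```python
# A minimal naming-game-style sketch of the setup: pairs of agents try to
# coordinate on one of two conventions, and a small committed minority never
# changes its choice. Real experiments use LLM agents; these are rule-based
# stand-ins that only show the interaction structure.
import random

random.seed(1)
N, COMMITTED, ROUNDS = 100, 12, 4000
choices = ["A"] * N                       # established convention
for i in range(COMMITTED):
    choices[i] = "B"                      # committed minority pushes "B"

for _ in range(ROUNDS):
    speaker, listener = random.sample(range(N), 2)
    if listener >= COMMITTED and choices[listener] != choices[speaker]:
        # Non-committed listeners adopt the speaker's convention
        # with some probability after a failed coordination.
        if random.random() < 0.5:
            choices[listener] = choices[speaker]

print("share using 'B':", choices.count("B") / N)
```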
Data-Prep-Kit: getting your data ready for LLM application development
The Data Prep Kit (DPK) is an open-source data preparation toolkit designed to help users scale their data preparation for Large Language Model (LLM) development. DPK offers a highly scalable and extensible set of modules for transforming natural language and code data, allowing users to prepare data on a local machine or a cluster with thousands of CPU cores.
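The general pattern such modules follow, small composable transforms over batches of documents, can be sketched as below; class and function names are illustrative assumptions, not DPK's actual API.

```python
# Not DPK's actual API: a hedged sketch of the general pattern its modules
# follow, i.e. small composable transforms applied over batches of documents.
from dataclasses import dataclass

@dataclass
class Document:
    doc_id: str
    text: str

def dedup_transform(docs: list[Document]) -> list[Document]:
    # Drop exact-duplicate texts, keeping the first occurrence.
    seen, out = set(), []
    for d in docs:
        if d.text not in seen:
            seen.add(d.text)
            out.append(d)
    return out

def filter_short(docs: list[Document], min_chars: int = 20) -> list[Document]:
    # Remove documents too short to be useful for LLM training.
    return [d for d in docs if len(d.text) >= min_chars]

docs = [Document("1", "short"), Document("2", "a" * 40), Document("3", "a" * 40)]
prepared = filter_short(dedup_transform(docs))
print([d.doc_id for d in prepared])   # -> ['2']
```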
Code
AI PCs Aren't Good at AI: The CPU Beats the NPU
We benchmarked Qualcomm's NPU on a Microsoft Surface tablet and found that it reached only about 1.3% of its claimed 45 Teraops/s. The benchmark measured the latency of running a simple model on both the CPU and the NPU, with the NPU achieving 225 and 573 Gigaops/s under two different approaches.
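For context, an effective ops/s figure like that can be derived by timing a workload, counting its multiply-accumulate operations, and comparing against the claimed peak. The sketch below does this for a CPU matrix multiply in NumPy; it is not the article's NPU benchmark.

```python
# Back-of-the-envelope throughput measurement on the CPU via NumPy.
import time
import numpy as np

N = 1024
a = np.random.rand(N, N).astype(np.float32)
b = np.random.rand(N, N).astype(np.float32)

reps = 20
start = time.perf_counter()
for _ in range(reps):
    _ = a @ b
elapsed = time.perf_counter() - start

ops = reps * 2 * N**3                 # ~2*N^3 ops per N x N matmul
measured = ops / elapsed              # effective ops/s
claimed = 45e12                       # 45 Teraops/s claimed NPU peak
print(f"measured: {measured / 1e9:.0f} Gigaops/s "
      f"({100 * measured / claimed:.1f}% of claimed)")
```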
Nvidia Outperforms GPT-4o with Open Source Model
Arena-Hard-Auto is an automatic evaluation tool for instruction-tuned LLMs that contains 500 challenging user queries sourced from Chatbot Arena. It has the highest correlation and separability with Chatbot Arena among popular open-ended LLM benchmarks.
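The pairwise-judging idea behind such benchmarks can be sketched as follows, with a placeholder judge standing in for the benchmark's actual LLM judge.

```python
# A minimal sketch of pairwise judging: each query's answer is compared
# against a baseline model's answer by a judge, and wins are aggregated.
# The judge here is a trivial placeholder, not a real LLM judge prompt.
def judge(query: str, answer_a: str, answer_b: str) -> str:
    # Placeholder: a real benchmark would call a strong LLM judge here.
    return "A" if len(answer_a) > len(answer_b) else "B"

def win_rate(queries, model, baseline) -> float:
    wins = sum(judge(q, model(q), baseline(q)) == "A" for q in queries)
    return wins / len(queries)

queries = ["Explain TCP slow start.", "Prove that sqrt(2) is irrational."]
print(win_rate(queries, lambda q: q + " with a detailed answer", lambda q: q))
```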
Ichigo: Local real-time voice AI
Ichigo is an open research experiment to extend a text-based LLM to have native "listening" ability, using an early fusion technique inspired by Meta's Chameleon paper. The project has made significant progress, achieving an enhanced MMLU score of 63.79 and demonstrating stronger speech instruction-following capabilities in multi-turn interactions.
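The early-fusion idea, discrete audio tokens sharing one sequence with text tokens in an extended vocabulary, can be sketched as below; vocabulary sizes and offsets are illustrative, not Ichigo's actual configuration.

```python
# A minimal sketch of early fusion: discrete audio tokens from a speech
# tokenizer are offset past the text vocabulary so a single LLM consumes
# one mixed token sequence. Sizes and ids below are illustrative.
TEXT_VOCAB_SIZE = 32000
NUM_AUDIO_CODES = 1024
SOUND_START = TEXT_VOCAB_SIZE             # first id reserved for audio codes

def fuse(text_ids: list[int], audio_codes: list[int]) -> list[int]:
    # Offset audio codebook indices past the text vocabulary, then place
    # them in the same token sequence the LLM is trained on.
    audio_ids = [SOUND_START + c for c in audio_codes]
    return audio_ids + text_ids

prompt_text_ids = [1, 523, 278]            # placeholder text token ids
speech_codes = [17, 940, 3]                # placeholder audio codec indices
print(fuse(prompt_text_ids, speech_codes))
```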
Out-of-Distribution Machine Learning
This repository provides a comprehensive resource for out-of-distribution (OOD) detection, robustness, and generalization in machine learning/deep learning. It aims to address critical challenges in modern AI systems that encounter data differing from their training distribution, leading to unexpected failures.
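One classic baseline covered by such resources is maximum softmax probability (MSP) scoring, shown below: inputs whose top-class confidence falls under a threshold are flagged as out-of-distribution.

```python
# Maximum softmax probability (MSP) OOD scoring on toy logits.
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def msp_score(logits: np.ndarray) -> np.ndarray:
    # Higher score = more confidently in-distribution.
    return softmax(logits).max(axis=-1)

logits = np.array([[6.0, 0.5, 0.2],    # confident -> likely in-distribution
                   [1.1, 1.0, 0.9]])   # flat -> likely OOD
threshold = 0.7
print(msp_score(logits) < threshold)   # [False  True]
```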
Show HN: Podcastfy AI – Open-source tool to generate AI audio conversations
Podcastfy is an open-source Python package that transforms multi-modal content (text, images) into engaging, multi-lingual audio conversations using GenAI. It can generate conversational content from various sources and formats, including images, websites, YouTube, and PDFs, and offers customization options for transcript and audio generation.
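A hedged usage sketch follows; the entry point and arguments mirror the project's documented pattern as best recalled, so treat the exact import path and parameters as assumptions to verify against the README.

```python
# Assumed entry point and signature; check the Podcastfy README before use.
from podcastfy.client import generate_podcast

# Turn a couple of web sources into a two-speaker audio conversation.
audio_path = generate_podcast(
    urls=[
        "https://en.wikipedia.org/wiki/Large_language_model",
        "https://example.com/blog-post",
    ],
)
print("generated audio at:", audio_path)
```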