Thursday March 13, 2025

Google DeepMind's Gemini Robotics models bring AI into real-world tasks with embodied reasoning, Slim Attention optimizes transformer memory use without accuracy loss, and Vibe-gamedev lets AI agents interact with the Unity editor.

News

Gemini Robotics

Google DeepMind has introduced two new AI models, Gemini Robotics and Gemini Robotics-ER, which are designed to enable robots to perform a wide range of real-world tasks through "embodied reasoning" and advanced spatial understanding. These models, built on the Gemini 2.0 foundation, demonstrate significant improvements in generality, interactivity, and dexterity, allowing robots to adapt to new situations, interact with their environment, and perform complex tasks with precision.

Show HN: Time Portal – Get dropped into history, guess where you landed

The Time Portal app drops users into historical events and challenges them to figure out when and where they have landed. Examples of past challenges are displayed as posters, and the app is available for download on the App Store.

The cultural divide between mathematics and AI

The Joint Mathematics Meeting, a large gathering of mathematicians, highlighted a cultural divide between mathematics researchers and those working on AI in industry, with differing perspectives, values, and approaches. The author notes that while mathematicians prioritize understanding and openness, industry researchers are driven by the need to deliver products and features, leading to concerns about secrecy, lack of transparency, and the potential misuse of AI in mathematics.

Beyond Diffusion: Inductive Moment Matching

Researchers at Luma AI have introduced a new pre-training technique called Inductive Moment Matching (IMM), which they claim can break the current algorithmic ceiling in generative pre-training and unlock the full potential of rich multi-modal data. IMM offers superior sample quality and over a tenfold increase in sampling efficiency compared to diffusion models, and its stability and scalability make it a promising approach for advancing generative pre-training algorithms.
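To make the core idea concrete, here is a toy illustration of moment matching in general, not the IMM algorithm itself: a one-parameter-family "generator" (an affine map of Gaussian noise) is fit by gradient descent so that the mean and standard deviation of its samples match those of the data. Everything here is a pedagogical stand-in for the paper's method.

```python
# Toy moment matching (NOT Inductive Moment Matching itself): fit a
# generator x = mu + sigma * z so its samples' first two moments match
# the data's, via gradient descent on squared moment gaps.
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=3.0, scale=2.0, size=10_000)  # target distribution

mu, sigma = 0.0, 1.0   # generator parameters
lr = 0.1
for _ in range(200):
    z = rng.normal(size=4096)
    samples = mu + sigma * z
    mean_gap = samples.mean() - data.mean()
    std_gap = samples.std() - data.std()
    # gradients of mean_gap**2 + std_gap**2 w.r.t. mu and sigma
    mu -= lr * 2 * mean_gap
    sigma -= lr * 2 * std_gap * z.std()

print(round(mu, 1), round(sigma, 1))  # close to the data's (3.0, 2.0)
```

IMM's contribution is doing this kind of distribution matching at scale for few-step generation; the toy above only shows why matching moments drives a generator toward the data distribution.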

Happy 10k Day

Comma.ai has sold its 10,000th Comma 3X, marking a significant milestone for the company, which has overcome the challenges of developing and manufacturing a successful product. The company is now expanding its operations, finalizing plans to maximize its office's datacenter and manufacturing capacity, and is optimistic that 2025 will be its biggest year yet.

Research

Balancing Content Size in RAG-Text2SQL System

Large Language Models (LLMs) that convert natural language queries into SQL commands, known as Text-to-SQL systems, face limitations such as hallucinations and outdated knowledge, which can be addressed by integrating retrieval-augmented generation (RAG) to provide contextual information. However, the performance of RAG + Text2SQL systems is affected by the quality and size of retrieved documents, and this research aims to find a balance between document size and quality to optimize system performance and minimize errors.
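The trade-off the paper studies can be sketched in a few lines: retrieve candidate schema/context documents, then keep only as much as fits a size budget before building the Text2SQL prompt. All names below (the toy retriever, the word budget, the sample corpus) are illustrative, not from the paper.

```python
# Minimal sketch of balancing retrieved-context size in a RAG + Text2SQL
# pipeline. The retriever, budget, and corpus are illustrative only.

def retrieve(query: str, docs: list[str], top_k: int = 3) -> list[str]:
    """Rank context docs by naive word overlap with the query."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))[:top_k]

def trim_to_budget(docs: list[str], max_words: int) -> list[str]:
    """Keep adding docs (best first) until the budget is spent: too little
    context invites hallucinated columns, too much buries the relevant
    schema and inflates cost."""
    kept, used = [], 0
    for d in docs:
        n = len(d.split())
        if used + n > max_words:
            break
        kept.append(d)
        used += n
    return kept

def build_prompt(question: str, context: list[str]) -> str:
    return "-- Schema context:\n" + "\n".join(context) + \
           f"\n-- Question: {question}\n-- SQL:"

docs = [
    "orders table: order_id, customer_id, total, created_at",
    "customers table: customer_id, name, email, region",
    "audit_log table: event_id, payload, ts",
]
top = retrieve("total orders per customer", docs)
prompt = build_prompt("total orders per customer",
                      trim_to_budget(top, max_words=12))
print(prompt)
```

With a 12-word budget the irrelevant `audit_log` table is dropped; the research question is how to pick that budget so quality and size stay balanced.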

Slim attention: cut your context memory in half without loss of accuracy

Slim attention reduces the context memory size of transformer models by up to 2x, and in some cases up to 32x, without compromising model accuracy, resulting in significant speedups of up to 5x for token generation. This optimization is particularly effective for encoder-decoder transformers and large models, such as Whisper and T5-11B, allowing for faster inference and token generation.
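The trick behind the 2x saving can be shown in a few lines of numpy: because K = X·W_K and V = X·W_V are projections of the same input X, V can be recomputed from K alone as V = K·(W_K⁻¹·W_V), so the cache only needs to store K. The sketch below is a single-head toy with an invertible square W_K; the paper's version handles the full multi-head and encoder-decoder details.

```python
# Toy numpy sketch of the slim-attention idea: store only K in the cache
# and recover V on the fly via a precomputed matrix W_KV = inv(W_K) @ W_V.
import numpy as np

rng = np.random.default_rng(0)
d = 8                        # head dimension (square W_K assumed invertible)
X = rng.normal(size=(5, d))  # 5 cached token activations
W_K = rng.normal(size=(d, d))
W_V = rng.normal(size=(d, d))

K = X @ W_K                  # what a slim cache stores
V_full = X @ W_V             # what a standard KV cache would also store

W_KV = np.linalg.inv(W_K) @ W_V   # precomputed once per layer
V_recomputed = K @ W_KV           # recovered at decode time

print(np.allclose(V_full, V_recomputed))  # True (up to float error)
```

Halving cache memory per token is what enables the reported speedups: for memory-bound decoding, moving half as many bytes roughly doubles attainable token throughput at long contexts.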

Julia in HEP

Julia is a programming language that combines the accessibility of Python with the performance of C/C++, making it an attractive choice for scientific computing, particularly in the field of High Energy Physics (HEP). The language has gained momentum in HEP, with packages available for reading major file formats, interfaces to key software, and successful applications in jet reconstruction algorithms and full HEP analyses, positioning Julia as a promising choice for future HEP research.

A Survey on Post-Training of Large Language Models

The emergence of Large Language Models has transformed natural language processing, but their pre-trained architectures often fall short in specialized contexts, necessitating advanced post-training techniques to address these shortcomings. This paper presents a comprehensive survey of post-trained language models, tracing their evolution across five core paradigms and establishing a framework for future research into improving reasoning proficiency and domain flexibility in Large Language Models.

General Relativity and Geodesy

The Earth's gravitational field is changing due to dynamic processes like ice melting and sea level rise, and monitoring these changes helps us understand the planet's evolution. General Relativity plays a crucial role in geodesy, enabling high-precision measurements and novel clock-based observations, although achieving the necessary precision for practical applications remains a challenge.
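The clock-based observations rest on the gravitational redshift: to first order, the fractional frequency difference between two clocks equals their gravitational potential difference divided by c². The figures below are standard textbook values, not taken from the article:

```latex
\frac{\Delta f}{f} \approx \frac{\Delta U}{c^2} \approx \frac{g\,\Delta h}{c^2}
\quad\Rightarrow\quad
\frac{(9.81~\mathrm{m/s^2})(0.01~\mathrm{m})}{(3\times 10^{8}~\mathrm{m/s})^2}
\approx 1.1\times 10^{-18}
```

Raising a clock by one centimetre thus shifts its rate at the 10⁻¹⁸ level, which is the precision frontier of today's optical clocks and the reason clock networks can serve as geodetic instruments.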

Code

Show HN: CNCF announces Dapr Agents, a vendor-neutral AI framework

Dapr Agents is a developer framework for building production-grade AI agent systems that operate at scale, enabling software developers to create AI agents that reason, act, and collaborate using Large Language Models. The framework offers key features such as scalability, workflow resilience, and data-driven agents, making it a cost-effective and efficient solution for AI adoption, with a vendor-neutral and open-source approach.

Vibe-gamedev: a tool for AI vibecoding with Unity

Vibe-gamedev is a Unity package that enables end-to-end vibecoding for game development by creating an interface between the Unity editor and AI agents, allowing agents to read and edit GameObjects through JSON files. The package is experimental and has limitations, including not supporting Prefabs and only serializing/deserializing the active scene, but provides a unique way for AI agents to interact with and manipulate game objects in Unity.
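The read-edit-write loop an agent could run over such a serialized scene looks roughly like the sketch below. The JSON schema here is hypothetical, invented for illustration; vibe-gamedev's actual format may differ.

```python
# Illustrative agent loop over a scene serialized to JSON: read the file,
# mutate a GameObject, write it back. The schema is a hypothetical example,
# not vibe-gamedev's actual format.
import json

scene = {
    "name": "SampleScene",
    "gameObjects": [
        {"name": "Player",
         "transform": {"position": [0, 1, 0], "rotation": [0, 0, 0]},
         "components": ["Rigidbody", "CapsuleCollider"]},
    ],
}

# Round-trip through JSON text, as an agent would via files on disk:
doc = json.loads(json.dumps(scene))
player = next(g for g in doc["gameObjects"] if g["name"] == "Player")
player["transform"]["position"][1] = 2.5   # agent edit: move the player up

print(json.dumps(doc, indent=2))
```

Working on plain JSON is what makes the interface agent-friendly: the model never needs Unity's C# API, only text it can read and rewrite.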

Show HN: End writing unmaintainable tests with Playwright with a pinch of LLM

Playsmart is a Python library that uses Playwright and OpenAI to automate end-to-end testing by allowing users to write tests in a more human-like language, such as "click on login" or "fill email input with hello@world.tld". The library uses a caching layer to reduce the number of requests made to the OpenAI API and can be installed via PyPI with the command pip install playsmart.
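The caching idea can be sketched independently of Playsmart's actual API: resolve a natural-language step to a concrete selector with an LLM once, then cache the result keyed on the prompt plus a page fingerprint, so repeated test runs skip the API call. Every name below is illustrative, and the LLM is stubbed out.

```python
# Sketch of an LLM-call cache for natural-language test steps. Names are
# illustrative, not Playsmart's API; the "LLM" is a deterministic stub.
import hashlib

cache: dict[str, str] = {}
llm_calls = 0

def fake_llm_resolve(prompt: str, dom: str) -> str:
    """Stand-in for an OpenAI call mapping a prompt + DOM to a selector."""
    global llm_calls
    llm_calls += 1
    return "#login-button"

def resolve(prompt: str, dom: str) -> str:
    key = hashlib.sha256(f"{prompt}\x00{dom}".encode()).hexdigest()
    if key not in cache:                      # only miss on first sight
        cache[key] = fake_llm_resolve(prompt, dom)
    return cache[key]

dom = "<button id='login-button'>Login</button>"
first = resolve("click on login", dom)
second = resolve("click on login", dom)       # served from cache
print(first == second, llm_calls)             # True 1
```

Keying on the DOM as well as the prompt means the cache invalidates itself when the page changes, which is what keeps cached selectors from going stale.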

Show HN: Ad-hoc LLM benchmark using NYT Connections

The author tested several large language models (LLMs) on the NYT game "Connections," where players group 16 words into 4 categories based on puzzling connections, and found that most models performed poorly. The results showed that Microsoft Copilot performed the best, while models like Gemini and ChatGPT struggled, with some even failing to produce the correct output format.
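A benchmark like this needs only a small scorer: a model's answer counts only when an entire 4-word group matches one of the puzzle's groups exactly, order-insensitively, and malformed output (the failure mode mentioned above) scores zero. The scorer and the sample puzzle below are illustrative, not the author's code.

```python
# Minimal scorer for a Connections-style benchmark (illustrative, not the
# author's harness): count exactly-matching 4-word groups, order-insensitive.
def score(model_groups, answer_groups):
    answer = {frozenset(g) for g in answer_groups}
    proposed = [frozenset(g) for g in model_groups]
    if len(proposed) != 4 or any(len(g) != 4 for g in proposed):
        return 0  # wrong output format gets no credit
    return sum(g in answer for g in proposed)

answers = [["apple", "pear", "plum", "fig"],
           ["red", "green", "blue", "cyan"],
           ["mars", "venus", "pluto", "earth"],
           ["oak", "elm", "ash", "fir"]]
guess = [["apple", "pear", "plum", "fig"],    # correct group
         ["red", "green", "blue", "mars"],    # one word swapped: no credit
         ["venus", "pluto", "earth", "cyan"], # likewise
         ["oak", "elm", "ash", "fir"]]        # correct group
print(score(guess, answers))  # 2
```

The all-or-nothing group scoring is what makes Connections hard for LLMs: a single misplaced word voids two groups at once.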

Show HN: Open-Source API-Based Engine to Easily Orchestrate AI Agents

The Levlex Agent Engine (LX Engine) is an API-based execution engine that allows users to run multiple AI "agents" in a flexible pipeline or individually, streamlining the process of creating and orchestrating AI-based workflows. The engine features 20+ built-in agents covering various tasks, including web retrieval, local vector memory, PDF generation, browser automation, and more, and can be easily extended with new agents or customized to fit specific use cases.
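The pipeline-or-individual execution model can be sketched generically: each agent is a function over a shared context, so agents compose into a pipeline or run alone. This is a hypothetical illustration in the spirit of the description above, not the LX Engine's actual API, and both agents are stubs.

```python
# Hypothetical agent pipeline (not the LX Engine's API): each agent takes
# and returns a shared context dict, so agents compose or run standalone.
from typing import Callable

Agent = Callable[[dict], dict]

def web_retrieval(ctx: dict) -> dict:
    ctx["docs"] = [f"stub result for: {ctx['query']}"]  # stand-in for a fetch
    return ctx

def summarize(ctx: dict) -> dict:
    ctx["summary"] = " | ".join(ctx["docs"])            # stand-in for an LLM call
    return ctx

def run_pipeline(agents: list[Agent], ctx: dict) -> dict:
    for agent in agents:
        ctx = agent(ctx)
    return ctx

out = run_pipeline([web_retrieval, summarize], {"query": "agent orchestration"})
print(out["summary"])
```

Making the context dict the only interface is one common way to keep such an engine extensible: a new agent needs no knowledge of the others, only of the keys it reads and writes.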