Thursday, December 12, 2024

Google DeepMind's Gemini 2.0 brings AI into the agentic era with native image and audio capabilities, while researchers accelerate attention mechanisms on edge devices by up to 2.75x, and Kiln introduces an interactive tool for LLM fine-tuning and synthetic data generation.

News

Gemini 2.0: our new AI model for the agentic era

Google DeepMind has introduced Gemini 2.0, a new AI model designed for the "agentic era" with native image and audio output and tool use capabilities. Gemini 2.0 Flash is now available to developers and trusted testers, with wider availability planned for early next year, as part of Google's efforts to build AI responsibly with safety and security as key priorities.

OnlyFans models are using AI impersonators to keep up with their DMs

OnlyFans models are increasingly using AI impersonators to manage their direct messages, replacing human "chatters" who were previously paid to impersonate them in online conversations. These AI tools, offered by startups like ChatPersona and Supercreator, can generate responses to fans and even sort them based on their spending habits, leading to increased sales for the models.

TSMC founder says Intel has neither a strategy nor a CEO

TSMC founder Morris Chang says Intel has neither a strategy nor a CEO, and that the company should have focused on developing AI processors rather than advanced process technologies. Chang believes Intel could have made more money by developing competitive AI chips, citing Nvidia's tens of billions of dollars in annual revenue from AI processors.

Machine Learning-Driven Static Profiling for GraalVM Native Image

GraalSP is a machine-learning-driven static profiler that predicts a program's execution profile from static features; integrated into Oracle GraalVM Native Image, it yields a 7.5% improvement in runtime performance. Because profiles are predicted rather than measured, GraalSP eliminates the need for dynamic profiling runs and reduces the complexity of the optimization process.
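The core idea, sketched below under loose assumptions (feature names, weights, and the tiny linear model are all illustrative, not GraalSP's actual model), is to predict how likely a branch is to be taken using only features available without running the program:

```python
# Hypothetical sketch of static profile prediction in the spirit of GraalSP:
# estimate a branch's taken-probability from static features alone, with no
# dynamic profiling run. Features, weights, and model are illustrative.
import math

def extract_static_features(branch):
    # Features derivable from the program text, without executing it.
    return [
        branch["loop_depth"],          # branches inside loops tend to be taken
        branch["is_exception_check"],  # exception paths are rarely taken
        branch["block_size"],          # size of the guarded basic block
    ]

def predict_taken_probability(features, weights, bias):
    # A tiny linear model squashed through a logistic function.
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Weights a trained model might plausibly produce (made up for illustration).
WEIGHTS, BIAS = [0.8, -2.5, -0.05], 0.0

loop_branch = {"loop_depth": 2, "is_exception_check": 0, "block_size": 4}
exc_branch = {"loop_depth": 0, "is_exception_check": 1, "block_size": 10}

p_loop = predict_taken_probability(extract_static_features(loop_branch), WEIGHTS, BIAS)
p_exc = predict_taken_probability(extract_static_features(exc_branch), WEIGHTS, BIAS)
print(f"loop branch: {p_loop:.2f}, exception branch: {p_exc:.2f}")
```

The compiler can then feed these predicted probabilities into the same profile-guided optimizations that would otherwise require an instrumented training run.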

Apple is working on AI chip with Broadcom

Apple is working with Broadcom to develop a custom artificial intelligence (AI) chip, according to a report by The Information.

Research

Asynchronous LLM Function Calling

AsyncLM is a system that enables large language models (LLMs) to make asynchronous function calls, allowing them to generate and execute calls concurrently and improving operational efficiency. This approach reduces end-to-end task completion latency by 1.6x-5.4x compared to synchronous function calling, and also enables potential novel human-LLM or LLM-LLM interactions.
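A minimal sketch of the latency argument, using `asyncio` stand-ins for real tools (tool names and latencies are made up; this is not AsyncLM's implementation): a synchronous agent pays the sum of tool latencies, while an asynchronous one pays roughly the maximum.

```python
# Sync vs. async tool calling: with two independent calls in flight at once,
# total latency approaches the slowest single call rather than the sum.
import asyncio
import time

async def call_tool(name, latency, result):
    await asyncio.sleep(latency)  # stand-in for a real tool call (search, DB, API)
    return name, result

async def synchronous_agent():
    # Baseline: the model blocks on each call before issuing the next.
    results = []
    for spec in [("weather", 0.05, "sunny"), ("stocks", 0.05, "up")]:
        results.append(await call_tool(*spec))
    return results

async def asynchronous_agent():
    # AsyncLM-style: issue both calls concurrently and collect results.
    return await asyncio.gather(
        call_tool("weather", 0.05, "sunny"),
        call_tool("stocks", 0.05, "up"),
    )

t0 = time.perf_counter()
sync_out = asyncio.run(synchronous_agent())
t_sync = time.perf_counter() - t0

t0 = time.perf_counter()
async_out = asyncio.run(asynchronous_agent())
t_async = time.perf_counter() - t0

print(f"sync {t_sync:.3f}s vs async {t_async:.3f}s")
```

Both agents return the same results; only the wall-clock cost differs, which is where the reported 1.6x-5.4x latency reduction comes from.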

Reachability Analysis of DNS

Researchers have developed a decision procedure for verifying the security and reliability of the complex DNS system, establishing the complexity of the underlying decision problem as 2ExpTime. The framework can model and analyze potential DNS attacks, such as amplification attacks and rewrite blackholing.

Memory-Aware Stream Processing for Attention Acceleration on Edge Devices

The attention mechanism in foundation models, while effective, results in high computational complexity, making it challenging to accelerate on resource-constrained edge neural accelerators. A proposed scheme achieves up to 2.75x speedup and 54% energy reduction by parallelizing heterogeneous compute units and utilizing a multi-tiered tiling scheme, with results confirmed through simulations and real-world experiments.
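The memory pressure comes from the N×N score matrix that naive attention materializes. One standard way a tiling scheme avoids it, sketched here in plain Python (this is a generic streaming/online-softmax formulation, not the paper's specific multi-tiered design; dimensions and tile size are toy values):

```python
# Streaming attention: process keys/values in small tiles with a running
# (online) softmax, so the full score matrix never exists in memory -- the
# kind of bounded-footprint schedule an edge accelerator needs.
import math

def streaming_attention(q, keys, values, tile=2):
    # q: [d]; keys, values: [n][d]. Computes softmax(q.K^T) @ V tile by tile.
    m = float("-inf")              # running max of scores (numerical stability)
    denom = 0.0                    # running softmax denominator
    acc = [0.0] * len(values[0])   # running weighted sum of values
    for start in range(0, len(keys), tile):
        for k, v in zip(keys[start:start + tile], values[start:start + tile]):
            s = sum(qi * ki for qi, ki in zip(q, k))
            new_m = max(m, s)
            scale = math.exp(m - new_m) if m != float("-inf") else 0.0
            w = math.exp(s - new_m)
            denom = denom * scale + w
            acc = [a * scale + w * vi for a, vi in zip(acc, v)]
            m = new_m
    return [a / denom for a in acc]

def reference_attention(q, keys, values):
    # Naive version that materializes all scores at once, for comparison.
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    mx = max(scores)
    w = [math.exp(s - mx) for s in scores]
    z = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, values)) / z
            for j in range(len(values[0]))]

q = [0.5, -0.2]
K = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-0.3, 0.2]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
out = streaming_attention(q, K, V)
ref = reference_attention(q, K, V)
```

The tiled and naive versions agree numerically; the win is that peak memory scales with the tile size rather than with sequence length, which is what makes the compute units parallelizable on a constrained accelerator.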

Hymba: A Hybrid-Head Architecture for Small Language Models

Hymba is a family of small language models that combines transformer attention mechanisms with state space models for enhanced efficiency, and features learnable meta tokens and optimized architecture. The Hymba-1.5B-Base model achieves state-of-the-art results, outperforming larger models like Llama-3.2-3B in accuracy while significantly reducing cache size and increasing throughput.
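The hybrid-head idea can be illustrated with scalar toy channels (this is not Hymba's code; the fusion rule and parameters are simplified assumptions): an attention head and a state-space head process the same sequence in parallel, and their outputs are combined.

```python
# Illustrative hybrid layer: a causal softmax-attention head (high recall,
# O(n) state via a KV cache) runs in parallel with an SSM head (O(1) state,
# cheap to cache), and the layer fuses both outputs.
import math

def attention_head(xs):
    # Causal softmax attention over a 1-D sequence; query = current token.
    out = []
    for t in range(len(xs)):
        scores = [xs[t] * xs[j] for j in range(t + 1)]
        mx = max(scores)
        w = [math.exp(s - mx) for s in scores]
        z = sum(w)
        out.append(sum(wi * xs[j] for j, wi in enumerate(w)) / z)
    return out

def ssm_head(xs, a=0.9, b=0.5, c=1.0):
    # Linear recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t: constant
    # per-step state, no growing cache -- the efficiency half of the hybrid.
    h, out = 0.0, []
    for x in xs:
        h = a * h + b * x
        out.append(c * h)
    return out

def hybrid_layer(xs):
    # Fuse the two heads (a plain mean here; the real model normalizes and
    # learns the combination).
    att, ssm = attention_head(xs), ssm_head(xs)
    return [(u + v) / 2 for u, v in zip(att, ssm)]

seq = [0.1, -0.4, 0.3, 0.8]
y = hybrid_layer(seq)
print(y)
```

The cache-size and throughput gains reported for Hymba come from the SSM half carrying much of the sequence modeling with constant state, so attention can be used more sparingly.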

Cyborg Insect Factory

Researchers developed an automatic assembly method for insect-computer hybrid robots, using a robotic arm and deep learning-based vision system to implant electrodes in Madagascar hissing cockroaches. The automatically assembled robots demonstrated effective control and navigation, matching manually assembled systems, and showed potential for mass production and practical applications.

Code

Show HN: Minima-RAG On-Premises Python Framework

Minima is an open-source, on-premises RAG (Retrieval-Augmented Generation) container that can integrate with ChatGPT and MCP, allowing users to query local documents while keeping data secure. Minima supports three modes: isolated installation, custom GPT, and Anthropic Claude, and is configured with a .env file and started with Docker Compose.

Show HN: Rezible – Open-Source Mission Control for Oncall

Rezible is an open-source platform designed to support, automate, and report on oncall processes, aiming to reduce the burden on oncall teams and improve overall oncall health. The platform is currently under heavy development, but it promises to offer features such as automated handovers, AI-powered post-incident debriefs, and data collection and visualization.

Machine Learning Mischief

Machine learning experiments can be manipulated to reach a preconceived goal by exploiting evaluation metrics and scientific tests, a practice known as "gaming" or "cherry picking" that is considered unethical. Examples include seed hacking, cross-validation hacking, and p-hacking; such manipulations can be exposed by probing questions that challenge the experimental method and its choices.
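Seed hacking is easy to demonstrate with synthetic numbers (the "accuracy" below is pure noise around a fixed true skill, not a real model): rerun the same experiment under many seeds and report only the best score.

```python
# Seed hacking: the method's true accuracy is fixed at 0.70; only evaluation
# noise varies with the seed. Reporting the best seed inflates the result.
import random
import statistics

def noisy_accuracy(seed, true_accuracy=0.70, noise=0.05):
    # Stand-in for "train and evaluate the model with this seed".
    rng = random.Random(seed)
    return true_accuracy + rng.gauss(0.0, noise)

scores = [noisy_accuracy(seed) for seed in range(100)]

honest_estimate = statistics.mean(scores)   # what the method actually does
seed_hacked = max(scores)                   # what a cherry-picked report claims

print(f"honest mean over seeds: {honest_estimate:.3f}")
print(f"best single seed:       {seed_hacked:.3f}")
```

The gap between the mean and the maximum is exactly the inflation a reviewer's probing question ("how was the seed chosen?") is designed to surface.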

Show HN: Kiln - Interactive LLM fine-tuning, dataset collab & synthetic data gen

Kiln is a tool for fine-tuning large language models (LLMs), generating synthetic data, and collaborating on datasets, offering an intuitive desktop app and open-source library and API. It supports various models and providers, including Ollama, OpenAI, and AWS, and prioritizes data correctness and user privacy.

Vision Is All You Need: V-RAG (Vision RAG) Demo

The V-RAG (Vision RAG) architecture utilizes a vision language model to embed pages of PDF files as vectors, allowing for efficient search and retrieval without the need for chunking. The demo uses a VLM to convert PDF pages to images, generates embeddings, and stores them in a database, enabling users to search for similar pages using a query.
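The retrieval step reduces to nearest-neighbor search over page embeddings. In the sketch below, `embed` is a toy character-frequency stand-in for the real VLM encoder, and the dict stands in for a vector database; only the cosine-search structure mirrors the demo.

```python
# V-RAG-style retrieval skeleton: embed each page once, embed the query,
# rank pages by cosine similarity -- no text chunking involved.
import math

def embed(text):
    # Toy deterministic "embedding": 26-dim character-frequency vector.
    # A real system would run a vision-language model on the page image.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Index: page id -> embedding (in the demo, rows in a vector database).
pages = {
    "page-1": "quarterly revenue grew by twelve percent",
    "page-2": "the gearbox assembly diagram and torque table",
}
index = {pid: embed(txt) for pid, txt in pages.items()}

def search(query, k=1):
    q = embed(query)
    ranked = sorted(index, key=lambda pid: cosine(q, index[pid]), reverse=True)
    return ranked[:k]

print(search("revenue growth"))  # ranks the finance page first
```

Because each page is a single vector, adding documents is just appending rows to the index, and retrieval cost is independent of how the page's text is laid out.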

2024 Differentiated.