Monday, January 27, 2025

Strong math reasoning emerges in small models via simple RL, researchers exploit microcode branch mispredictions to leak data with uSpectre, and LLM Reasoner refines step-by-step reasoning for AI.

News

Emerging reasoning with reinforcement learning

Researchers achieved strong results in complex mathematical reasoning using a 7B model with only 8K examples, outperforming larger models that use more data and complicated components. The model, trained using reinforcement learning with a simple reward function, demonstrated the emergence of long Chain-of-Thought and self-reflection patterns, achieving pass@1 accuracy of 33.3% on AIME, 62.5% on AMC, and 77.2% on MATH.
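The reward described is strikingly simple. As a rough illustration of what such a rule-based signal can look like (the \boxed{} answer format and the function names below are assumptions for the sketch, not the paper's actual code), a binary correctness check is often the whole of it:

```python
# Hedged sketch of a simple rule-based reward for math RL.
# Assumes answers are reported in a final \boxed{...}; this is an
# illustrative convention, not the paper's exact implementation.
import re

def extract_final_answer(completion: str) -> str | None:
    """Pull the last \\boxed{...} answer out of a model completion."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None

def correctness_reward(completion: str, reference_answer: str) -> float:
    """Binary reward: 1.0 if the final answer matches the reference, else 0.0."""
    predicted = extract_final_answer(completion)
    if predicted is None:
        return 0.0
    return 1.0 if predicted == reference_answer.strip() else 0.0

print(correctness_reward(r"... so the answer is \boxed{42}.", "42"))  # 1.0
```

With only a correctness signal of this kind, the reported long Chain-of-Thought and self-reflection behaviors emerge from the RL optimization rather than from hand-engineered components.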

AI slop, suspicion, and writing back

The author has developed a keen sense for detecting AI-generated content, which they call "AI slop," and notes that it has become increasingly prevalent online, particularly on platforms like LinkedIn, Twitter, and Reddit. The proliferation of this content has bred a sense of paranoia, with the author constantly questioning the originality of what they read; they hope the value of human writing stays high enough to incentivize people to keep producing original work.

When AI promises speed but delivers debugging hell

The author has launched a new app called Codescribble, a basic shared text editor that allows multiple users to collaborate on a document in real-time. The author built Codescribble using TypeScript and relied heavily on Large Language Models (LLMs) to speed up the development process, but encountered significant frustration and challenges during deployment, particularly with automated deployment scripts and environment variables.

Knowing less about AI makes people more open to using it

People with less knowledge about artificial intelligence (AI) are actually more open to using the technology, a phenomenon known as the "lower literacy-higher receptivity" link, due to their perception of AI as magical and awe-inspiring. This link is strongest for AI tools used in areas associated with human traits, such as emotional support, and poses a challenge for policymakers and educators who must balance boosting AI literacy with maintaining enthusiasm for its adoption.

USA restricts Swiss access to AI computer chips

The US has restricted Switzerland's access to advanced AI computer chips, excluding it from a group of 18 trusted allies with unlimited access, and instead placing it in a second group with limited imports. Swiss Economics Minister Guy Parmelin has criticized the decision, stating that it is difficult to understand and could be counterproductive for the US, as Swiss research institutions and companies also produce innovations that are important for the US.

Research

The Matrix Calculus You Need for Deep Learning (2018)

This paper aims to explain the matrix calculus required to understand the training of deep neural networks, assuming only basic calculus knowledge. It is intended for those already familiar with neural networks who want to deepen their understanding of the underlying math, with additional resources and support available for those who get stuck.
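For a taste of the material, the recurring tool is the vector chain rule pushed through a single dense layer; the identities below are standard results of this kind, not a claim about the paper's specific notation:

```latex
% Gradients of a scalar loss L through one dense layer y = f(Wx + b),
% with x in R^n, W in R^{m x n}, z = Wx + b in R^m, f applied elementwise.
\begin{aligned}
\frac{\partial L}{\partial z} &= f'(z) \odot \frac{\partial L}{\partial y}
  && \text{(elementwise chain rule through the activation)} \\
\frac{\partial L}{\partial W} &= \frac{\partial L}{\partial z}\, x^{\top}
  && \text{(an } m \times n \text{ gradient, outer product with the input)} \\
\frac{\partial L}{\partial x} &= W^{\top} \frac{\partial L}{\partial z}
  && \text{(what gets backpropagated to the previous layer)}
\end{aligned}
```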

DeepSeekMath

DeepSeekMath 7B, a language model, has achieved a score of 51.7% on the MATH benchmark by leveraging 120B math-related tokens and a novel optimization technique called Group Relative Policy Optimization (GRPO). This model's mathematical reasoning capabilities approach those of top-performing models like Gemini-Ultra and GPT-4, with self-consistency achieving 60.9% on the MATH benchmark over 64 samples.
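GRPO's central trick is to estimate advantages by normalizing each sampled answer's reward against the other answers drawn for the same question, avoiding a separately trained value model. A minimal sketch of that group normalization, with the clipped policy ratio and KL penalty of the full objective omitted:

```python
# Hedged sketch of the group-relative advantage at the heart of GRPO.
# Simplified: the full objective also uses a clipped policy ratio and a
# KL penalty against a reference model, which are omitted here.
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each sampled completion's reward against its group,
    removing the need for a separately trained value/critic model."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# Example: 4 completions sampled for the same math problem,
# scored 1.0 if the final answer was correct, else 0.0.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # [1.0, -1.0, -1.0, 1.0]
```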

Kimi K1.5: Scaling Reinforcement Learning with LLMs

The training practice of Kimi k1.5, a multi-modal large language model, achieved state-of-the-art reasoning performance across multiple benchmarks by combining reinforcement learning with long-context scaling. Its RL framework and long2short methods, which transfer long chain-of-thought behavior into short-CoT models, let its short-CoT results outperform existing models such as GPT-4o and Claude Sonnet 3.5 by a wide margin, up to +550% on some benchmarks.

Analyzing and Exploiting Branch Mispredictions in Microcode [pdf]

Researchers have discovered uSpectre, a new class of transient execution attacks that exploit microcode branch mispredictions to leak sensitive data, which also encompasses many previously known Spectre and Meltdown variants. The discovery of uSpectre has led to the identification of new attacks and the development of a defense mechanism called uSLH to mitigate these vulnerabilities.

Autonomy-of-Experts Models (arXiv)

Mixture-of-Experts (MoE) models typically use a router to assign tokens to expert modules, but this separation can lead to suboptimal expert selection and ineffective learning. The proposed Autonomy-of-Experts (AoE) paradigm addresses this issue by allowing experts to autonomously select themselves based on their internal activations, resulting in improved expert selection and effective learning, and outperforming traditional MoE models with comparable efficiency.
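A rough sketch of the self-selection rule as described: each expert computes (part of) its own activation for the token, and the top-k experts by activation norm carry on. The shapes and names below are toy assumptions, and the paper additionally relies on low-rank weight factorization to keep this pre-computation cheap:

```python
# Hedged sketch of Autonomy-of-Experts-style selection (not the paper's code):
# there is no router network; experts are ranked by their own activation norms.
import torch

def aoe_select(x, experts_w_in, k=2):
    """x: [d_model] token representation; experts_w_in: list of [d_hidden, d_model]
    first-layer weights. Returns indices of the k experts with the largest
    internal activation norm, which then complete the forward pass."""
    norms = torch.stack([(w @ x).norm() for w in experts_w_in])
    return torch.topk(norms, k).indices

# Assumed toy shapes for illustration.
d_model, d_hidden, n_experts = 16, 64, 8
x = torch.randn(d_model)
experts = [torch.randn(d_hidden, d_model) for _ in range(n_experts)]
print(aoe_select(x, experts, k=2))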

Code

Show HN: Orange intelligence, an open source alternative to Apple Intelligence

Orange Intelligence is a customizable productivity tool for macOS that allows users to capture, process, and replace text across any application, leveraging the power of large language models and Python functions. It offers a range of features, including a floating text processor, global variable replacement, and the ability to run any Python function, making it a powerful tool for developers, researchers, and AI enthusiasts.

Show HN: Voice Cloning and Multilingual TTS in One Click (Windows)

Voice-Pro is a powerful AI-powered web application that offers a range of multimedia content processing features, including YouTube video downloading, speech recognition, translation, and text-to-speech, with support for over 100 languages. The tool provides an all-in-one solution for content creators, researchers, and multilingual communication professionals, with advanced features such as zero-shot voice cloning, professional vocal isolation, and instant translation across multiple languages.

Show HN: LLM-Reasoner: Make any LLM think deeper like OpenAI o1

The LLM Reasoner is a tool that enables step-by-step reasoning for large language models (LLMs), providing transparent and explainable answers. It supports multiple LLM providers, including OpenAI, Anthropic, and Google, and offers features such as real-time progress, confidence tracking, and a user-friendly interface.
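The general pattern behind such tools, sketched below with the standard OpenAI Python client (this is not LLM-Reasoner's actual interface, just a hedged illustration of the idea), is to prompt the model for explicit, numbered steps with a per-step confidence before it commits to a final answer:

```python
# Hedged sketch of step-by-step reasoning via a plain chat API.
# NOT LLM-Reasoner's API; the prompt wording and model choice are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def reason_step_by_step(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Solve the problem in numbered steps. After each step, "
                        "state a confidence from 0 to 1. End with 'Final answer:'."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(reason_step_by_step("If 3x + 5 = 20, what is x?"))
```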

Deepseek open models seem to follow CCP party-line

DeepSeek-R1 is a reasoning model that achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks, and its distilled models, such as DeepSeek-R1-Distill-Qwen-32B, outperform OpenAI-o1-mini on various benchmarks. It is trained with large-scale reinforcement learning and incorporates cold-start data to address endless repetition, poor readability, and language mixing; the model and its distilled variants are openly released to the research community.

Talk with GPT-4o from a simple keyboard shortcut

The GPT Voice Menubar app can be installed using a simple bash command and used by setting an OPENAI_API_KEY, then accessing it from the menubar. To send a voice message, users hold down the control, option, and command keys while speaking, releasing them when finished.