Sunday January 26, 2025

Oracle's Larry Ellison envisions AI surveillance for citizen behavior monitoring, Arsenal seeks AI experts for football analytics, and "Textcoder" uses LLMs and arithmetic coding to hide secret messages in ordinary-looking text.

News

Emerging Reasoning with Reinforcement Learning

Researchers achieved strong results in complex mathematical reasoning using a 7B model with only 8K examples, outperforming larger models that use more data and complicated components. The model, trained using reinforcement learning with a simple reward function, demonstrated the emergence of long Chain-of-Thought and self-reflection patterns, and achieved pass@1 accuracy of 33.3% on AIME, 62.5% on AMC, and 77.2% on MATH.
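The "simple reward function" is the notable ingredient here: no learned reward model, just rules checking the final answer. A minimal sketch of what such a rule-based reward might look like (the exact scoring values and answer format are hypothetical, not the paper's):

```python
import re

def reward(response: str, gold_answer: str) -> float:
    """Toy rule-based reward for RL on math problems: full credit only
    when the final boxed answer matches the reference; a milder penalty
    for a well-formed but wrong answer than for no parseable answer.
    (Illustrative scoring scheme, not the paper's exact reward.)"""
    match = re.search(r"\\boxed\{([^}]*)\}", response)
    if match is None:
        return -1.0  # no extractable answer at all
    if match.group(1).strip() == gold_answer.strip():
        return 1.0   # correct final answer
    return -0.5      # answered in the right format, but incorrectly
```

The appeal of such a reward is that it is unhackable in the way learned reward models are not: the model can only raise its reward by actually producing correct answers, which is what allows long Chain-of-Thought and self-reflection to emerge on their own.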

Larry Ellison: vast AI surveillance can ensure citizens are on best behavior (2024)

Billionaire Larry Ellison, co-founder of Oracle, believes that AI-fueled surveillance systems will become ubiquitous, monitoring citizens and ensuring they "will be on their best behavior" through constant recording and reporting. Ellison envisions a future where AI-powered drones replace police cars in high-speed chases and every police officer is supervised at all times, with AI reporting any problems to the appropriate authorities.

Arsenal FC AI Research Engineer job posting

Arsenal Football Club is seeking a Research Engineer to apply AI and deep learning expertise to drive data-driven decision making in football analytics, working on projects such as player recruitment and match preparation. The ideal candidate will have a strong quantitative background, experience with deep learning, and excellent communication skills, with the ability to work independently and collaboratively as part of a multidisciplinary team.

AI slop, suspicion, and writing back

The author has developed a keen sense for detecting AI-generated content, which they call "AI slop," and notes that it has become increasingly prevalent online, particularly on LinkedIn, Twitter, and Reddit. This proliferation has bred a sense of paranoia: the author now constantly questions the originality of what they read, and hopes the value of human writing stays high enough to incentivize people to keep producing original work.

Tool touted as 'first AI software engineer' is bad at its job, testers claim

Devin, a tool touted as the "first AI software engineer," has been found to be ineffective at its job, completing only 3 out of 20 tasks successfully in a recent evaluation. The AI agent, which uses multiple underlying AI models, including OpenAI's GPT-4, was found to produce overly complex and unusable solutions, and often got stuck in technical dead-ends, despite its polished user experience.

Research

Advancing Language Model Reasoning Through RL and Inference Scaling

Large language models (LLMs) have shown impressive capabilities in complex reasoning tasks, but existing approaches have limitations in terms of test-time scaling. The proposed model, T1, addresses this issue by scaling reinforcement learning through techniques such as oversampling and entropy bonuses, resulting in superior performance on math reasoning benchmarks and exhibiting inference scaling behavior.
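The entropy bonus mentioned above is a standard regularizer in policy-gradient RL: it rewards keeping the sampling distribution spread out so the policy keeps exploring. A minimal sketch of the generic technique (illustrative only, not T1's actual loss; the function and argument names are assumptions):

```python
import numpy as np

def pg_loss_with_entropy(logps, advantages, probs, beta=0.01):
    """REINFORCE-style loss with an entropy bonus.
    logps:      log-probability of each sampled token, shape (T,)
    advantages: per-token advantage estimates, shape (T,)
    probs:      full next-token distribution at each step, shape (T, V)
    beta:       weight of the entropy bonus (hypothetical default)."""
    pg = -(logps * advantages).mean()                        # policy-gradient term
    entropy = -(probs * np.log(probs + 1e-12)).sum(-1).mean()  # mean per-step entropy
    return pg - beta * entropy  # subtracting the bonus discourages collapse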

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via RL

DeepSeek-R1-Zero, a model trained through large-scale reinforcement learning without supervised fine-tuning, demonstrates strong reasoning capabilities but suffers from issues such as poor readability and language mixing. DeepSeek-R1, which adds multi-stage training, overcomes these issues and achieves performance comparable to OpenAI's o1 on reasoning tasks; both models and several distilled versions are open-sourced for the research community.

The Unbearable Slowness of Being: Why do we live at 10 bits/s?

The human brain processes information at a rate of about 10 bits per second, despite receiving sensory input at a much faster rate of ~10^9 bits per second, leaving a significant gap in understanding the neural basis for this slow pace. The brain's "inner" and "outer" modes, which handle behavioral control and sensory/motor signals respectively, suggest a need for new research to explain why the brain requires billions of neurons to process such a relatively small amount of information.
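The paper's headline gap between input and output bandwidth follows from one line of arithmetic on its two estimates:

```python
# Both rates are the paper's estimates, not measurements made here.
sensory_rate = 1e9    # bits/s arriving through the senses
behavior_rate = 10    # bits/s of behavioral output
gap = sensory_rate / behavior_rate
print(f"compression factor: {gap:.0e}")  # roughly a hundred-million-fold reduction
```

That eight-order-of-magnitude reduction is the puzzle the authors pose: why does a brain with billions of neurons throttle behavior to so few bits per second?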

Kimi K1.5: Scaling Reinforcement Learning with LLMs

Kimi k1.5, a multi-modal large language model trained with reinforcement learning, achieves state-of-the-art reasoning performance across multiple benchmarks. Its approach, which combines long-context scaling with improved policy optimization methods, outperforms existing models, including GPT-4o and Claude 3.5 Sonnet, by a significant margin, with improvements of up to 550%.

Kimi K1.5 Technical Report

The Kimi k1.5 technical report details how reinforcement learning and long-context scaling deliver state-of-the-art reasoning performance across multiple benchmarks. Its framework and techniques, such as long2short methods that transfer long-chain-of-thought ability to shorter responses, enable it to outperform existing models, including GPT-4o and Claude 3.5 Sonnet, by a significant margin, with improvements of up to 550%.

Code

Show HN: Steganographically encode messages with LLMs and Arithmetic Coding

Textcoder is a tool that steganographically encodes secret messages into ordinary-looking text, using a large language model (LLM) and arithmetic coding to choose tokens so that the output reads naturally while secretly carrying the message. The text can be decoded back into the original message using a password; the tool is packaged with Poetry, uses the Llama 3.2 language model, and ships with installation and usage instructions.
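The core trick is that the choice among plausible next tokens can itself carry information. Textcoder does this with true arithmetic coding over Llama 3.2's token probabilities keyed by a password; the sketch below is a drastically simplified one-bit-per-token variant with a fixed toy distribution standing in for the LLM (all names and probabilities here are invented for illustration):

```python
# Toy stand-in for an LLM's next-token distribution.
VOCAB = {"the": 0.30, "a": 0.25, "cat": 0.20, "dog": 0.15, "sat": 0.10}

def halves():
    """Split the vocabulary into two groups of roughly equal total
    probability; which group a token comes from encodes one bit."""
    items = sorted(VOCAB.items(), key=lambda kv: -kv[1])
    lo, hi, acc = [], [], 0.0
    for tok, p in items:
        (lo if acc < 0.5 else hi).append(tok)
        acc += p
    return lo, hi

def encode(bits):
    """Emit one token per secret bit, drawn from the matching half."""
    lo, hi = halves()
    return [(hi if b else lo)[0] for b in bits]

def decode(tokens):
    """Recover each bit by checking which half the token belongs to."""
    lo, hi = halves()
    return [1 if t in hi else 0 for t in tokens]
```

Real arithmetic coding generalizes this by mapping the secret bitstream to a subinterval of [0, 1) and repeatedly selecting the token whose probability interval contains it, packing far more than one bit per token while keeping the text statistically plausible.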

Show HN: Orange intelligence, an open source alternative to Apple Intelligence

Orange Intelligence is a customizable productivity tool for macOS that allows users to capture, process, and replace text across any application, leveraging the power of large language models and Python functions. It offers a range of features, including a floating text processor, global variable replacement, and extensibility through custom Python logic, making it a powerful tool for developers, researchers, and AI enthusiasts.

Show HN: Open-Source Bloomberg Terminal Alternative for Investment Research

Fincept Investments is a comprehensive CLI tool that provides financial insights, market analysis, and various financial services, including technical, fundamental, sentiment, and quantitative analysis, to help investors and professionals navigate the complex world of investments. The tool offers a range of features, including dynamic asset searching, economic data, portfolio management, and robo-advisory services, and is available for installation via PyPI or a Windows installer.

Show HN: OSS AI assistant for answering questions from Docs and GitHub Issues

Ragpi is an open-source AI assistant that uses large language models to search and answer questions from technical sources, building knowledge bases from documentation websites, GitHub issues, and repository README files. It provides a REST API for interacting with the AI assistant, supports various LLM providers and source connectors, and offers flexible deployment options, including local Docker, remote Docker, API-only, and custom deployments.

Show HN: LLM-Reasoner: Make any LLM think deeper like OpenAI o1

The LLM Reasoner is a tool that enables step-by-step reasoning for large language models (LLMs), providing transparent and explainable answers. It supports multiple LLM providers, including OpenAI, Anthropic, and Google, and offers features such as real-time progress, confidence tracking, and a user-friendly interface.