Saturday — April 5, 2025
AI bots cause a 50% surge in Wikimedia bandwidth, "DeepSeek-GRM" advances reward modeling for LLMs, and AReaL system excels in mathematical reasoning with reinforcement learning.
News
Understanding Machine Learning: From Theory to Algorithms
The book "Understanding Machine Learning: From Theory to Algorithms" by Shai Shalev-Shwartz and Shai Ben-David is available for free download as a PDF. The book, published by Cambridge University Press in 2014, can be downloaded for personal use only, but not for distribution.
A Man Out to Prove How Dumb AI Still Is
François Chollet, a French computer scientist, is skeptical of claims that AI models are nearing artificial general intelligence (AGI) and has created a benchmark, the Abstraction and Reasoning Corpus (ARC), to evaluate their true capabilities. Chollet argues that current models, despite passing specific tests, are not genuinely intelligent: they imitate patterns from their training data rather than displaying true ingenuity and problem-solving ability.
AI bots strain Wikimedia as bandwidth surges 50%
The Wikimedia Foundation, which hosts Wikipedia and other open platforms, is experiencing a significant strain on its servers due to automated AI bots scraping data for training purposes, resulting in a 50% surge in bandwidth usage since January 2024. The foundation is working to address the issue through technical solutions and a new initiative, "Responsible Use of Infrastructure," to establish sustainable boundaries and guide developers towards less resource-intensive access methods.
Microsoft employee disrupts 50th anniversary and calls AI boss 'war profiteer'
A Microsoft employee, Ibtihal Aboussad, disrupted the company's 50th anniversary event to protest its use of AI, calling Microsoft AI CEO Mustafa Suleyman a "war profiteer" and accusing the company of complicity in the genocide of Palestinians. Aboussad, a software engineer, sent an email to hundreds of Microsoft employees after being ushered out of the event, detailing her concerns about Microsoft's $133 million contract with Israel's Ministry of Defense and the use of AI to spy on and target Palestinians.
Query GPT: Transform Natural Language into SQL
Query GPT is a tool that instantly transforms natural language questions into database queries, supporting multiple database types and featuring advanced language models, syntax highlighting, and smart schema analysis. Users can enter their question, provide an optional database schema, and generate a query that can be copied and used in their database system or shared with their team.
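The summary describes the basic workflow (question plus optional schema in, SQL out) but not how Query GPT implements it. Below is a minimal, hypothetical sketch of such a natural-language-to-SQL step using an OpenAI-compatible chat API; the model name, prompt, and function are illustrative assumptions, not Query GPT's actual code.

```python
# Hypothetical sketch of a natural-language-to-SQL step (not Query GPT's actual code).
# Assumes an OpenAI-compatible chat API; model name and prompt are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def question_to_sql(question: str, schema: str | None = None) -> str:
    """Ask an LLM to translate a natural-language question into a SQL query."""
    system = (
        "You translate questions into SQL. "
        "Return only the SQL statement, with no explanation."
    )
    user = question if schema is None else f"Schema:\n{schema}\n\nQuestion: {question}"
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

print(question_to_sql(
    "Which five customers spent the most last month?",
    schema="customers(id, name), orders(id, customer_id, total, created_at)",
))
```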
Research
DeepSeek: Inference-Time Scaling for Generalist Reward Modeling
The DeepSeek team studies how to make reward modeling for large language models both more effective on general queries and more scalable at inference time. Their approach, DeepSeek-GRM, combines pointwise generative reward modeling with a new learning method called Self-Principled Critique Tuning, and improves reward quality by sampling multiple principle-and-critique generations at inference time, outperforming existing methods on several reward-modeling benchmarks.
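To make the inference-time scaling idea concrete, here is a minimal sketch: sample several critiques from a generative reward model, parse a pointwise score from each, and aggregate. The `generate_critique` placeholder and the "Score: n/10" format are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of inference-time scaling for a generative reward model:
# sample k independent critiques, parse a pointwise score from each, and average.
# `generate_critique` stands in for a real model call; the 1-10 score format is assumed.
import random
import re
import statistics

def generate_critique(query: str, response: str, seed: int) -> str:
    """Placeholder for a generative reward model that writes principles,
    a critique, and a final score for the response."""
    random.seed(seed)
    return f"Principles: accuracy, clarity. Critique: ... Score: {random.randint(6, 9)}/10"

def parse_score(critique: str) -> float | None:
    m = re.search(r"Score:\s*(\d+(?:\.\d+)?)", critique)
    return float(m.group(1)) if m else None

def scaled_reward(query: str, response: str, k: int = 8) -> float:
    """Aggregate k sampled critiques; more samples -> a more reliable reward estimate."""
    scores = [parse_score(generate_critique(query, response, seed=i)) for i in range(k)]
    return statistics.mean(s for s in scores if s is not None)

print(scaled_reward("Prove that 17 is prime.", "17 has no divisors between 2 and 4..."))
```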
A Study of Undefined Behavior Across Foreign Function Boundaries in Rust Libs
Developers using the Rust programming language to build secure applications often interoperate with other languages, which can introduce bugs that Rust's dynamic analysis tool, Miri, cannot detect. A large-scale evaluation found 46 instances of undefined or undesired behavior in 37 Rust libraries that call foreign functions, highlighting the need for new tooling to detect these errors in multi-language applications.
The Android Platform Security Model v3 (2023)
Android's security model must balance security, privacy, and usability due to its wide range of use cases and potential threats. This paper examines the Android threat model, its implications, and how various security measures work together to mitigate threats, including deliberate deviations from the model and their impact.
Faith and Fate: Limits of Transformers on Compositionality
Transformer large language models (LLMs) show impressive performance on complex tasks yet stumble on seemingly simpler problems, raising questions about their limitations. The researchers evaluated LLMs on compositional tasks such as multi-digit multiplication and logic puzzles and found that the models tend to pattern-match on sub-steps seen during training rather than developing systematic problem-solving skills, so performance decays rapidly as task complexity increases.
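For a sense of what "compositional" means here, the toy example below (illustrative only, not from the paper) counts the intermediate sub-steps in long multiplication: the number of partial products and additions grows quickly with digit count, which is the axis along which the paper observes accuracy collapsing.

```python
# Illustration of the compositional structure of multi-digit multiplication:
# the number of intermediate results (partial products plus additions) grows
# with the number of digits, which is the axis along which LLM accuracy decays.
def multiply_by_hand(a: int, b: int) -> tuple[int, int]:
    """Long multiplication via explicit sub-steps; returns (product, step_count)."""
    steps = 0
    total = 0
    for i, da in enumerate(reversed(str(a))):      # each digit of a
        for j, db in enumerate(reversed(str(b))):  # times each digit of b
            total += int(da) * int(db) * 10 ** (i + j)
            steps += 2                             # one partial product, one addition
    return total, steps

for a, b in [(7, 8), (76, 58), (763, 582), (7634, 5821)]:
    product, steps = multiply_by_hand(a, b)
    print(f"{a} x {b} = {product} ({steps} sub-steps)")
```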
Towards Efficient Flash Caches with Emerging NVMe Flexible Data Placement SSDs
NVMe Flash-based SSDs in data centers face challenges in managing Flash overprovisioning and endurance, which can lead to reduced device lifetime or increased host overprovisioning. The NVMe Flexible Data Placement (FDP) proposal offers a solution, and its implementation in CacheLib, a popular open-source Flash cache, has been shown to reduce device write amplification, carbon emissions, and power consumption with minimal overhead.
Code
Show HN: OCR pipeline for ML training (tables, diagrams, math, multilingual)
This OCR system is designed to extract structured data from complex educational materials, such as exam papers, and optimize it for machine learning training, supporting multilingual text, mathematical formulas, tables, diagrams, and charts. The authors report accuracy in the 90-95% range, and the system generates AI-ready outputs in JSON or Markdown format, including human-readable descriptions of mathematical expressions, table summaries, and figure captions.
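As a rough illustration of what an "AI-ready" record for a single exam item might contain, here is a hypothetical example; the field names are assumptions for illustration, not the project's actual output schema.

```python
# Hypothetical example of an "AI-ready" OCR record for one exam question.
# Field names are illustrative assumptions, not the project's actual schema.
import json

record = {
    "source": "algebra_exam_2024.pdf",
    "page": 3,
    "language": "en",
    "question_text": "Solve for x: 2x + 5 = 17",
    "math": {
        "latex": "2x + 5 = 17",
        "description": "A linear equation: two x plus five equals seventeen.",
    },
    "table": None,
    "figure_caption": None,
}

print(json.dumps(record, indent=2, ensure_ascii=False))
```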
Vibe License – An unconventional license for AI-generated code
The Vibe License is an unconventional license for AI-generated code, allowing free use but with a caveat that its public domain status is uncertain. Users are encouraged to "vibe responsibly" and can access the license details on the live site or review the license file directly.
TorchSim: An atomistic simulation engine for the AI era
TorchSim is a next-generation open-source atomistic simulation engine that accelerates machine learning interatomic potentials by rewriting core simulation primitives in PyTorch, enabling significant speedups. It supports various MLIP models, classical potentials, and molecular dynamics integration schemes, and provides a simple, intuitive high-level API, with examples and tutorials available in its documentation.
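To illustrate the idea of PyTorch-native simulation primitives (this is not TorchSim's API, just a sketch of the approach), the snippet below computes a Lennard-Jones energy with tensor operations and obtains forces from autograd, the same mechanism that lets such engines run batched systems on GPUs.

```python
# Minimal sketch of a PyTorch-native simulation primitive (not TorchSim's API):
# a Lennard-Jones energy over all atom pairs, with forces from autograd.
import torch

def lennard_jones_energy(pos: torch.Tensor, eps: float = 1.0, sigma: float = 1.0) -> torch.Tensor:
    """Total LJ energy for positions of shape (n_atoms, 3)."""
    diff = pos.unsqueeze(0) - pos.unsqueeze(1)          # (n, n, 3) pairwise displacements
    dist = diff.norm(dim=-1)                            # (n, n) pairwise distances
    mask = torch.triu(torch.ones_like(dist, dtype=torch.bool), diagonal=1)
    r = dist[mask]                                      # unique pairs only
    sr6 = (sigma / r) ** 6
    return (4 * eps * (sr6 ** 2 - sr6)).sum()

positions = (torch.rand(8, 3) * 4.0).requires_grad_()   # random positions in a 4x4x4 box
energy = lennard_jones_energy(positions)
forces = -torch.autograd.grad(energy, positions)[0]     # F = -dE/dx
print(energy.item(), forces.shape)
```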
Show HN: Relysium – Cursor Alternative for Emacs
Relysium is an Emacs package that integrates AI models into the coding workflow, providing code generation from comments, AI-powered code completion, and code explanation. It can be installed with Quelpa or Straight and is driven through a set of key bindings and commands for interacting with the AI models.
AReaL, Distributed Reinforcement Learning System for LLM Reasoning
AReaL is an open-source reinforcement learning training system for large language models, developed by the RL Lab at Ant Research, that aims to help users build their own AI agents easily and affordably. The system has achieved state-of-the-art performance in mathematical reasoning, and the release includes full training details, data, and infrastructure to reproduce the models.
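For context on how RL training for mathematical reasoning is typically scored, here is a tiny sketch of a rule-based reward that checks a model's final boxed answer against a reference. This is an illustrative assumption about the general technique, not AReaL's actual reward code.

```python
# Illustrative rule-based reward for math-reasoning RL (not AReaL's actual code):
# reward 1.0 if the model's final boxed answer matches the reference, else 0.0.
import re

def extract_boxed_answer(text: str) -> str | None:
    """Pull the contents of the last \\boxed{...} in a model response."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None

def math_reward(response: str, reference: str) -> float:
    answer = extract_boxed_answer(response)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0

print(math_reward("... so the result is \\boxed{42}.", "42"))  # 1.0
print(math_reward("... the answer is 41.", "42"))              # 0.0
```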