Saturday — February 22, 2025

Yahoo Mail's AI starts hallucinating fake sneaker win emails, the Unsloth Efficient GRPO algorithm slashes VRAM use by 90% for long-context models, and DeepSeek-ai opens its AGI exploration with 5 public repositories in a week.

News

Who needs a sneaker bot when AI can hallucinate a win for you?

Jordan Brand's launch of a recreated shoe to mark the 40th anniversary of Michael Jordan's signature shoes at the 1985 All-Star Dunk Contest led to a surge in demand. The launch was marred by a bizarre issue: some users received conflicting emails because Yahoo Mail's new AI-generated email summaries feature was "hallucinating" fake winner messages, causing confusion among sneaker fans. The issue is still live in Yahoo Mail and could affect future launches and other important emails.

Long-Context GRPO

The Unsloth Efficient GRPO algorithm enables 10x longer context lengths while using 90% less VRAM compared to other GRPO implementations, allowing for more efficient training of reasoning models. With this algorithm, training a model like Llama 3.1 (8B) at a 20K context length requires only 54.3GB of VRAM, compared to 510.8GB required by standard implementations.
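As a quick sanity check on the headline figure, the two VRAM numbers quoted above can be compared directly. This is only arithmetic on the figures in the summary, not a measurement; the variable names are illustrative.

```python
# VRAM figures quoted in the summary above (Llama 3.1 8B, 20K context length).
standard_vram_gb = 510.8  # standard GRPO implementations
unsloth_vram_gb = 54.3    # Unsloth Efficient GRPO

saved_gb = standard_vram_gb - unsloth_vram_gb
reduction_pct = 100 * saved_gb / standard_vram_gb
ratio = standard_vram_gb / unsloth_vram_gb

print(f"VRAM saved: {saved_gb:.1f} GB")
print(f"Reduction: {reduction_pct:.1f}%")
print(f"Standard / Unsloth: {ratio:.1f}x")
```

On these figures the reduction works out to a little over 89%, which is consistent with the rounded "90% less VRAM" claim.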

Show HN: BookWatch – Animated book summaries for visual learners

BookWatch is a collection of animated book summaries across categories including business, literature, mindset, philosophy, psychology, relationships, science, self-development, and technology. Titles include "Hackers and Painters", "Dotcom Secrets", "Sapiens", and "Essays of Warren Buffett", and each video summary lists its number of ideas and its length in minutes. The collection is aimed at personal and professional development.

Show HN: The Internet's Open Source AI Paywall

The DarkForest Protocol is an open-source AI paywall for the internet, allowing users to join the movement with just one line of code. It provides a way to resist or block certain online content, with more information available on its manifesto and protocol pages, as well as an npm package called darkforest-blocker.

Untangling AI Agent authn/authz

AI agents are expected to change the way authentication and authorization work, requiring a new approach to permissioning that supports sophisticated AI workflows while protecting sensitive data. Stytch's new Connected Apps platform allows any SaaS company to become its own identity provider, so that AI agents and third-party apps can securely authenticate and access data on behalf of users. Features include secure session sharing and human-in-the-loop authorization.

Research

Strategic Wealth Accumulation Under Transformative AI Expectations

The introduction of Transformative AI (TAI) is expected to redirect labor income from workers to those controlling AI systems, with wealthier households gaining more control over automated labor. This shift is predicted to lead to significant increases in interest rates, potentially rising to 10-16%, as households prioritize wealth accumulation in anticipation of TAI, with important implications for monetary policy and financial stability.

Fire-Flyer AI-HPC: Cost-Effective Software-Hardware Co-Design for Deep Learning

The rapid advancement of Deep Learning and Large Language Models has led to increased demands for computational power and bandwidth, resulting in higher construction costs for High Performance Computing systems. The Fire-Flyer AI-HPC architecture addresses these challenges, achieving significant cost and energy reductions while maintaining performance through a synergistic hardware-software co-design framework and optimized software stack.

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Meta's MLGym and MLGym-Bench are a framework and a benchmark for evaluating and developing large language model (LLM) agents on AI research tasks. The benchmark consists of 13 diverse tasks from domains like computer vision and natural language processing, and the framework allows for easy integration and evaluation of models. Current frontier models can improve on the given baselines, but they do not generate novel hypotheses or substantial improvements, highlighting the need for further research in advancing LLM capabilities.

Presumed Cultural Identity: How Names Shape LLM Responses

Names are closely tied to human identity, but using them as a core indicator can oversimplify complex identities, and this issue is particularly relevant when interacting with large language models (LLMs) that use names for personalization. Research has shown that LLMs make strong cultural assumptions based on names, which can lead to stereotyping, and highlights the need for more nuanced personalization systems that avoid reinforcing these biases.

Some critical issues with the SWE-bench dataset

The SWE-bench dataset, used to evaluate Large Language Models (LLMs) in software engineering, has significant quality issues, including "solution leakage" where 32.67% of successful patches had solutions provided in the issue report, and weak test cases for 31.08% of passed patches. After filtering out these issues, the resolution rate of the top-performing model, SWE-Agent+GPT-4, dropped from 12.47% to 3.97%, highlighting the need for a more rigorous evaluation of LLMs in software engineering.
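To put the drop in perspective, the two resolution rates quoted above can be turned into an absolute and a relative change. This is a small sketch using only the summary's numbers; the variable names are illustrative.

```python
# Resolution rates for SWE-Agent+GPT-4 as quoted in the summary above.
before_pct = 12.47  # on the original dataset
after_pct = 3.97    # after filtering out leaked solutions and weak tests

absolute_drop = before_pct - after_pct            # percentage points
relative_drop_pct = 100 * absolute_drop / before_pct

print(f"Absolute drop: {absolute_drop:.2f} percentage points")
print(f"Relative drop: {relative_drop_pct:.1f}%")
```

In other words, roughly two thirds of the reported successes disappear once the problematic instances are filtered out.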

Code

DeepSeek Open Infra: Open-Sourcing 5 AI Repos in 5 Days

The DeepSeek-ai team is opening up 5 of their repositories to the public, one each day, as part of their Open-Source Week, to share their progress in AGI exploration with full transparency. The team aims to accelerate their journey through collective momentum and community-driven innovation, starting with the release of their repositories next week.

Show HN: Agents.json – Open-source API specification for LLMs

The agents.json Specification is an open standard that enables AI agents to interact with APIs by formally describing contracts for their interactions, building on top of the OpenAPI standard. The specification allows API providers to create a JSON file that describes how their API endpoints can be used by AI agents, making it easier for agents to discover and execute the correct series of API calls to achieve a specific outcome.

Show HN: MCP Guardian – Govern and Secure LLM Tool Usage

MCP Guardian is a tool that manages access to MCP servers for LLM assistants, providing real-time control and features such as message logging, approvals, and automated scans. The project is developed using Nix and can be built and run on Linux, macOS, and Windows using a series of installation and build steps outlined in the documentation.

Only 40 GitHub Stars After 1 Year. Is AI the Only Way to Succeed Now?

Show HN: A pure TypeScript library for editing videos in the Browser

@diffusionstudio/core is a 2D motion graphics and video rendering engine powered by WebCodecs, allowing for client-side video editing automations and the building of editing web apps. The library is lightweight and fast, and it offers tools for video trimming, layering, and text rendering, which makes it suitable for integration into existing projects. A free non-commercial license is available for non-monetized projects.