Monday April 7, 2025

Microsoft's controversial AI-powered Quake 2 demo raises industry concerns, SeedLM compresses LLM weights into pseudo-random generator seeds achieving a 4x speed-up, and TripoSG sets a new standard in text-to-3D model generation with high-fidelity results.

News

Recent AI model progress feels mostly like bullshit

The author, who founded a company that uses AI to monitor large codebases for security problems, has found that their tool's performance has not significantly improved since the release of Claude 3.5 Sonnet, despite newer model releases. The author argues that reported gains in AI capabilities do not reflect economic usefulness or generality, and that the industry's benchmarks and metrics may be flawed, possibly because AI labs exaggerate their models' capabilities for competitive advantage.

SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators

The authors introduce SeedLM, a post-training compression method for Large Language Models (LLMs) that encodes model weights as seeds of a pseudo-random generator, reducing memory access and speeding up memory-bound tasks. According to the authors, SeedLM retains zero-shot accuracy at a level comparable to state-of-the-art methods, and their experiments show a 4x speed-up over an FP16 baseline for 4-bit compression, making it a promising approach for deploying LLMs.
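
The core idea of storing a seed instead of the weights can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a single pseudo-random basis vector per block, uses Python's `random` module as a stand-in for the paper's generator, and the names `basis`, `compress_block`, and `decompress_block` are hypothetical.

```python
import random


def basis(seed, n):
    """Regenerate a pseudo-random +/-1 vector of length n from a seed.

    Assumption: Python's Mersenne Twister stands in for whatever
    generator a real implementation would use.
    """
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def compress_block(weights, num_seeds=256):
    """Search candidate seeds and keep the best (seed, coefficient) pair.

    For each seed, the scalar c minimizing ||w - c*u||^2 is <w,u>/<u,u>,
    and <u,u> equals n because every entry of u is +/-1.
    """
    n = len(weights)
    best_seed, best_coeff, best_err = None, 0.0, float("inf")
    for seed in range(num_seeds):
        u = basis(seed, n)
        c = sum(w * x for w, x in zip(weights, u)) / n
        err = sum((w - c * x) ** 2 for w, x in zip(weights, u))
        if err < best_err:
            best_seed, best_coeff, best_err = seed, c, err
    return best_seed, best_coeff


def decompress_block(seed, coeff, n):
    """Rebuild the approximate weights from the stored seed and coefficient."""
    return [coeff * x for x in basis(seed, n)]


if __name__ == "__main__":
    block = [0.3, -1.2, 0.7, 0.1, -0.4, 0.9, -0.8, 0.2]
    seed, coeff = compress_block(block)
    print(seed, coeff, decompress_block(seed, coeff, len(block)))
```

Only the seed and the coefficient need to be stored per block; the basis vector is recomputed at load time, which is how a scheme of this kind trades extra computation for fewer memory accesses.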

Standard Ebooks: liberated ebooks, carefully produced for the true book lover

Standard Ebooks is a volunteer-driven project that produces high-quality, free ebooks of public domain works, carefully formatted and proofread to a professional standard, with features such as modern typography, rich metadata, and state-of-the-art technology. The ebooks are not only free to download but also open-source and released into the public domain, allowing anyone to access, modify, and distribute them without restrictions.

Microsoft's Quake 2 AI experiment sparks negative reactions

Microsoft has launched a version of Quake 2 powered by generative AI, which has sparked negative reactions from many social media users who criticize its use of AI and question its potential impact on game development and employment. The demo, which can be played via a web browser, showcases the capabilities of Microsoft's new AI model, Muse, but has been met with skepticism and concerns about the role of AI in the gaming industry.

Longtime Writing Community NaNoWriMo Shuts Down After AI Drama

NaNoWriMo, a non-profit organization that hosted the annual National Novel Writing Month, has shut down due to financial issues and controversy surrounding its stance on artificial intelligence and content moderation. The organization had faced backlash from its community over its partnership with Inkitt, a publishing platform accused of being a scam, and its handling of allegations against a content moderator, as well as its statement on AI use, which some members felt was classist and ableist.

Research

Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad

State-of-the-art large language models achieve impressive performance on mathematical competitions, but when evaluated on their ability to provide rigorous reasoning and proof generation, they struggle significantly, with models achieving less than 5% on average on challenging mathematical problems. This highlights the need for substantial improvements in reasoning and proof generation capabilities, as current models are inadequate for real-world mathematical tasks despite their ability to produce correct numerical answers.

ScienceWorld: Is your Agent Smarter than a 5th Grader?

ScienceWorld is a benchmark that tests agents' scientific reasoning abilities in an interactive text environment, revealing that current models struggle to apply learned concepts in novel contexts. Experiments show that an agent trained interactively in ScienceWorld outperforms a much larger model trained statically, supporting the hypothesis that interactive environments are necessary for developing reusable scientific reasoning capabilities.

Navigating Decentralized Online Social Networks

Decentralized online social networks, such as Mastodon and Bluesky, have experienced significant growth and adoption, with various platforms building upon different decentralization architectures. This paper provides a comprehensive view of the current landscape, examining four major architectures and their evolution, to better understand the direction and limitations of decentralization and its societal implications.

Batched Ranged Random Integer Generation

A new algorithm efficiently generates multiple independent uniformly-random bounded integers from a single 64-bit binary word without statistical bias. This method, which typically requires only one multiplication operation per value, can more than double the speed of unbiased random shuffling for small to moderately large arrays.
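
A minimal sketch of how two bounded integers can be drawn from one random word with one multiplication each, in the multiply-and-shift style. It is an illustration under stated assumptions rather than the paper's exact algorithm: the function names `try_pair` and `batched_pair` and the configurable word size `bits` are hypothetical, and the rejection threshold is taken to be 2^bits mod (s1 * s2).

```python
import random


def try_pair(word, s1, s2, bits=64):
    """Map one `bits`-bit word to (a, b) with a in [0, s1), b in [0, s2).

    Returns None when the word must be rejected to avoid bias.
    """
    mask = (1 << bits) - 1
    bound = s1 * s2
    if s1 <= 0 or s2 <= 0 or bound > mask + 1:
        raise ValueError("s1 * s2 must be positive and fit in the word size")
    m = word * s1                  # first multiplication
    a, low = m >> bits, m & mask   # high part is the result, low part is reused
    m = low * s2                   # second multiplication on the leftover bits
    b, low = m >> bits, m & mask
    # The modulo is only evaluated when the cheap comparison fails.
    if low < bound and low < (1 << bits) % bound:
        return None
    return a, b


def batched_pair(s1, s2, next_word, bits=64):
    """Draw words from `next_word` until one is accepted."""
    while True:
        pair = try_pair(next_word(), s1, s2, bits)
        if pair is not None:
            return pair


if __name__ == "__main__":
    rng = random.Random(2025)
    print(batched_pair(6, 6, lambda: rng.getrandbits(64)))
```

With a small word size the mapping can be checked exhaustively: for `bits=8`, `s1=3`, `s2=5`, every one of the 15 possible pairs is produced by the same number of input words, and only the remaining word is rejected.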

Code

TripoSG – Text to 3D Model

TripoSG is a high-fidelity 3D shape synthesis model that leverages large-scale rectified flow transformers and hybrid supervised training to achieve state-of-the-art performance in 3D shape generation. It can produce high-quality meshes with sharp geometric features, fine surface details, and complex structures, and is capable of handling diverse input styles and challenging inputs with complex topology.

Show HN: Understand GitHub repos, PDF docs through mindmaps, RAG search

MindPalace helps users navigate and understand large GitHub repositories and PDF documents by generating file-wise explanations, workflow breakdowns, and mind maps, alongside an AI-powered Q&A feature. The web version requires no downloads or API keys; to run MindPalace locally, users install the required libraries, set up API keys, and launch the app with Streamlit.

Show HN: OS Automation Is Here

Ion is a CLI-based AI-powered assistant that executes OS-level commands based on natural language input, supporting Windows, macOS, and Linux. It can perform various tasks, such as opening apps, searching online, and managing files, and is customizable with API keys and models, with plans for future features like file indexing and speech recognition.

Show HN: AI Explains Complex Codebase in 5 Minutes (Open Sourced)

This project uses AI to analyze GitHub repositories and create beginner-friendly tutorials that explain how the code works. The AI agent, built using the Pocket Flow framework, crawls the repository, identifies core abstractions, and generates tutorials with visualizations that make complex codebases easier to understand.

Eagle-3 Speculative Decoding for LLM Inference (5.6x speedup)

EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency) is a method for fast decoding of Large Language Models (LLMs) with provable performance maintenance, achieving significant speedups over vanilla decoding. Its successors, EAGLE-2 and EAGLE-3, improve on it further, with EAGLE-3 reported to be 5.6 times faster than vanilla decoding for a 13B model.
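
For readers unfamiliar with speculative decoding in general, the sketch below shows the basic draft-then-verify loop with greedy decoding. It is a generic illustration, not EAGLE's method: EAGLE's drafting scheme is not reproduced here, and `toy_target`, `toy_draft`, and `speculative_decode` are hypothetical names standing in for real models.

```python
def toy_target(seq):
    """Stand-in for the large model: deterministic next token in 0..9."""
    return (sum(seq) * 31 + len(seq) * 7) % 10


def toy_draft(seq):
    """Stand-in for the cheap draft model: agrees with the target most of the time."""
    guess = toy_target(seq)
    return guess if len(seq) % 3 else (guess + 1) % 10


def greedy_decode(model, prompt, n):
    """Vanilla decoding: one model call per generated token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(model(seq))
    return seq


def speculative_decode(target, draft, prompt, n, k=4):
    """Draft up to k tokens, then keep the prefix the target agrees with."""
    seq = list(prompt)
    goal = len(seq) + n
    while len(seq) < goal:
        proposal = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # A real system verifies all drafted positions in one batched
        # forward pass of the target; here it is a plain loop.
        for i, token in enumerate(proposal):
            expected = target(seq + proposal[:i])
            if expected != token:
                # Keep the agreed prefix plus the target's own token.
                seq.extend(proposal[:i])
                seq.append(expected)
                break
        else:
            seq.extend(proposal)
    return seq


if __name__ == "__main__":
    print(speculative_decode(toy_target, toy_draft, [1, 2, 3], 12))
```

Because every accepted token is checked against the target model, the output of this greedy variant is identical to vanilla decoding; the speedup in practice comes from verifying several drafted tokens per target-model pass.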