Saturday March 15, 2025

ProPublica's AI reveals innocuous reasons for "woke" grants, PromptPex automates unit test generation for LLM prompts, and OmniAI offers a unified Ruby API for AI provider integration.

News

How ProPublica Uses AI in Its Investigations

ProPublica used a large language model to analyze a database of grants that Sen. Ted Cruz's committee flagged as "woke" and as promoting diversity, equity, and inclusion. The model helped reporters identify patterns and themes across the grants, revealing that many were flagged for innocuous reasons, such as containing words like "diversify" or "female," and had nothing to do with the social or economic themes the committee cited.

Apple's Siri Chief Calls AI Delays Ugly and Embarrassing, Promises Fixes


OpenAI declares AI race "over" if training on copyrighted works isn't fair use

OpenAI is urging the US government to declare that training AI models on copyrighted works is fair use, warning that otherwise the US will lose the AI race to China. The company argues that access to copyrighted data is crucial for AI development and that without it, Chinese companies would gain an unfair advantage, developing more advanced models that could harm US national security and economic competitiveness.

AI scientists are sceptical that modern models will lead to AGI

About 76% of surveyed AI researchers believe it is unlikely that current AI models will lead to artificial general intelligence with human-level capabilities, despite the billions of dollars tech companies are investing toward that goal. Performance gains from scaling transformer models have plateaued, and many experts think that simply making these models bigger will not be enough to produce superintelligent systems.

AI search engines cite incorrect sources at an alarming 60% rate, study says

A recent study by the Columbia Journalism Review's Tow Center for Digital Journalism found that AI search engines incorrectly cite sources at an alarming 60% rate, with some models providing incorrect information as much as 94% of the time. The study tested eight AI-driven search tools and discovered that they not only provided inaccurate answers but also often ignored publisher exclusion requests and fabricated URLs, raising concerns about reliability and transparency in AI-generated search results.

Research

PromptPex: Automatic Test Generation for Language Model Prompts

Large language models (LLMs) are increasingly integrated into software through code-like prompts, but prompts require new approaches to robustness because their behavior depends on which model interprets them. PromptPex addresses this by automatically generating and evaluating unit tests for a given prompt, making it possible to catch regressions and to understand how different models interpret the same prompt.
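The core idea, treating a prompt like code that needs its own test suite, can be sketched as follows. This is a minimal illustration, not PromptPex's actual interface: the harness, the `stub_model`, and the test format are all hypothetical.

```python
# Minimal sketch of prompt unit testing: each test pairs an input with a
# predicate on the model's output, so regressions surface when the prompt
# or the underlying model changes. The model here is a stub; all names
# are illustrative, not PromptPex's actual API.

def run_prompt_tests(model, prompt, tests):
    """Run each (input, predicate) test against model(prompt, input)."""
    results = {}
    for test_input, predicate in tests:
        output = model(prompt, test_input)
        results[test_input] = predicate(output)
    return results

# Stub model: pretend the prompt asks for an uppercased echo of the input.
def stub_model(prompt, text):
    return text.upper()

tests = [
    ("hello", lambda out: out == "HELLO"),   # exact-match expectation
    ("abc",   lambda out: out.isupper()),    # output-format expectation
]
report = run_prompt_tests(stub_model, "Uppercase the input.", tests)
```

Because the model is passed in as a parameter, the same test suite can be rerun against different models to compare how each interprets the prompt.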

Block Diffusion: Interpolating between autoregressive and diffusion models

Block diffusion language models combine the benefits of discrete denoising diffusion and autoregressive models, allowing for flexible-length generation and improved inference efficiency. This approach sets a new state-of-the-art performance on language modeling benchmarks, enabling the generation of arbitrary-length sequences and overcoming key limitations of existing models.

A Proof of the Collatz Conjecture

A new fixed point theorem in metric spaces is presented and applied to the Collatz conjecture, a well-known open problem in mathematics.
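The conjecture itself is easy to state and check empirically (which is not a proof): the Collatz map sends n to n/2 when n is even and to 3n+1 when n is odd, and the claim is that every positive integer eventually reaches 1.

```python
def collatz_steps(n):
    """Number of Collatz steps needed for positive integer n to reach 1."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

# Empirical spot-check: every n up to 10,000 terminates at 1.
assert all(collatz_steps(n) >= 0 for n in range(1, 10_001))
```

The famously long trajectory of 27 takes 111 steps to reach 1, which is why brute-force checks give little insight into a general proof.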

Transformers Without Normalization

Transformers without normalization layers can achieve similar or better performance using Dynamic Tanh (DyT), a simple element-wise operation that replaces normalization layers. By incorporating DyT, Transformers can match or exceed the performance of their normalized counterparts across various settings, challenging the conventional understanding that normalization layers are essential in modern neural networks.
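DyT is simple enough to state in one line: DyT(x) = γ · tanh(αx) + β, where α, γ, and β are learnable in the paper (fixed constants here for clarity). A minimal NumPy sketch:

```python
import numpy as np

def dyt(x, alpha=0.5, gamma=1.0, beta=0.0):
    """Dynamic Tanh (DyT): an element-wise replacement for normalization
    layers. alpha, gamma, beta are learnable parameters in the paper;
    they are fixed scalars here for illustration."""
    return gamma * np.tanh(alpha * x) + beta

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
y = dyt(x)  # squashes extreme activations while staying near-linear at 0
```

Unlike LayerNorm, DyT needs no per-token statistics: the tanh squashes extreme activations (the role normalization statistics otherwise play) while behaving almost linearly near zero.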

Predictive Data Selection: The Data That Predicts Is the Data That Teaches

Researchers have developed a method called PreSelect to efficiently select pretraining data for language models, which can reduce compute requirements by up to 10 times while maintaining or improving performance. PreSelect uses a lightweight scorer to identify data that is predictive of a model's downstream abilities, and experiments have shown it to outperform other data selection methods and achieve state-of-the-art results with significantly less training data.
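The select-by-score pipeline can be sketched in a few lines. Note the scorer below is a purely hypothetical keyword/diversity proxy; PreSelect's actual scorer is a lightweight classifier trained on signals of how predictive data is of downstream ability.

```python
# Toy sketch of scorer-based pretraining data selection: rank documents by
# a lightweight scorer and keep only the top fraction. The scorer here is
# a placeholder, not PreSelect's trained scorer.

def score(doc):
    # Hypothetical proxy: reward longer documents with more lexical diversity.
    words = doc.split()
    return len(set(words)) / max(1, len(words)) * len(words)

def select_pretraining_data(docs, keep_fraction=0.3):
    ranked = sorted(docs, key=score, reverse=True)
    k = max(1, int(len(ranked) * keep_fraction))
    return ranked[:k]

corpus = [
    "the the the the",
    "gradient descent minimizes a differentiable loss function",
    "hello world",
    "a b a b a b",
]
selected = select_pretraining_data(corpus, keep_fraction=0.5)
```

The compute savings come from the outer structure: scoring is cheap relative to training, so discarding low-scoring data up front shrinks the expensive pretraining run.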

Code

Show HN: LLM-docs, software documentation intended for consumption by LLMs

LLM-Docs is a project that provides optimized documentation for Large Language Models (LLMs), streamlining traditional documentation to maximize efficiency and effectiveness for LLM-assisted programming tasks. The project involves collecting, distilling, and organizing documentation for popular open-source libraries, removing redundant content and formatting to create a compact, clear, and LLM-friendly format.

OmniAI: A unified Ruby API for integrating with AI providers

OmniAI is a unified Ruby API for integrating with multiple AI providers, including Anthropic, DeepSeek, Google, Mistral, and OpenAI, offering a consistent interface for features like chat, text-to-speech, and embeddings. It allows for effortless switching between providers, making integrations more flexible and reliable, with examples and documentation provided for various use cases.
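The design that makes provider-switching cheap is a thin adapter layer behind one shared interface. The sketch below illustrates that pattern in Python for self-containment; OmniAI's real API is Ruby and differs in detail (see its README for actual calls), and the provider classes here are stubs.

```python
# Sketch of the unified-provider pattern libraries like OmniAI use: each
# provider hides its own wire format behind one `chat` interface, so call
# sites don't change when the provider does. These classes are stubs, not
# real clients.

class OpenAIStub:
    def chat(self, prompt):
        return f"[openai] {prompt}"

class AnthropicStub:
    def chat(self, prompt):
        return f"[anthropic] {prompt}"

PROVIDERS = {"openai": OpenAIStub, "anthropic": AnthropicStub}

def client_for(name):
    """Look up a provider by name and return a ready-to-use client."""
    return PROVIDERS[name]()

reply = client_for("openai").chat("Hello")
```

Switching providers is then a one-string change at the call site, which is the flexibility the summary above describes.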

Show HN: Guide to transform fragile AI agents into production-ready systems

This guide demonstrates how to transform a fragile AI-powered marketplace assistant into a production-ready multi-agent application using orra, a platform for building robust AI applications. The guide progresses through three stages, addressing common production challenges in multi-agent AI systems, including architecture, reliability, and domain grounding, to create a reliable and efficient marketplace assistant.

Ruby LLM

RubyLLM is a Ruby library that provides a simple and unified API for working with various AI providers, including OpenAI, Anthropic, Gemini, and DeepSeek, allowing developers to easily integrate AI capabilities into their applications. The library offers a range of features, including chat, vision and audio understanding, PDF analysis, image generation, and embeddings, making it easy to work with AI in a Ruby environment.

Owl: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation

OWL is a cutting-edge framework for multi-agent collaboration that achieves a 58.18 average score on the GAIA benchmark, ranking it #1 among open-source frameworks. It enables natural, efficient, and robust task automation across diverse domains by leveraging dynamic agent interactions and providing a range of features, including online search, multimodal processing, browser automation, and built-in toolkits.