Thursday February 13, 2025

A YouTube vulnerability that leaks any user's email for $10k raises security concerns, Mixture-of-Agents outperforms GPT-4 Omni with a 65.1% score on AlpacaEval 2.0, and ANEMLL offers a way to run LLMs on Apple's Neural Engine.

News

Leaking the email of any YouTube user for $10k

A security researcher discovered a vulnerability in YouTube that allows an attacker to obtain a user's Google account identifier (Gaia ID) by blocking them on the platform. The Gaia ID can then be used with a separate vulnerability in the Pixel Recorder app to obtain the user's email address, potentially allowing for targeted phishing or other attacks.

US and UK refuse to sign AI safety declaration at summit

The US and UK have refused to sign a declaration on AI safety at a summit in Paris, with US Vice President JD Vance warning against "overly precautionary" regulations and vowing to maintain US dominance in the technology. The declaration, signed by around 60 other countries, aims to ensure AI is "safe, secure and trustworthy", but the US and UK cited concerns over multilateralism and international collaboration, marking a significant shift in the US stance on AI regulation.

What enabled us to create AI is the thing it has the power to erase

The rise of AI-powered design tools has enabled rapid generation of ideas, but this immediacy can come at the cost of creative process and human intuition, as it eliminates the "productive friction" that occurs when working with physical materials or slower digital tools. The author argues that this friction is essential for fostering creativity, discovery, and critical thinking, and that its loss could have unintended consequences for individuals and society as a whole.

EU to mobilize 200B Euros to invest in AI

The European Union plans to mobilize €200 billion for investment in artificial intelligence (AI), aiming to drive innovation and development in the sector across the bloc.

Evaluating LLM Reasoning Through Live Computer Games

The Roblox-based game arena hosts several games in which players compete against AI, including Akinator, Taboo, and Bluffing. Leaderboards track player performance, with an overall ranking computed from each player's average score across all games.

Research

We Can't Understand AI Using Our Existing Vocabulary [pdf]

Existing vocabulary is insufficient for truly understanding AI; new words, or neologisms, are needed to represent precise human and machine concepts. Developing a shared human-machine language through these neologisms can address the communication problem between humans and machines, enabling better control and understanding of AI systems.

Mixture-of-Agents Enhances Large Language Model Capabilities

Researchers have proposed a Mixture-of-Agents (MoA) approach that combines the strengths of multiple large language models (LLMs) to achieve state-of-the-art performance in natural language tasks. The MoA methodology, which layers multiple LLM agents to generate responses, has surpassed GPT-4 Omni in several benchmarks, including AlpacaEval 2.0, where it achieved a score of 65.1% compared to GPT-4 Omni's 57.5%.
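
The layered setup is easier to see in code. Below is a minimal sketch of the MoA pattern, assuming a generic query_model(model, prompt) stand-in for whatever LLM API is available; it illustrates the idea and is not the paper's reference implementation.

    def query_model(model: str, prompt: str) -> str:
        # Placeholder: swap in a real LLM API call; this stub just echoes.
        return f"[{model}] response to: {prompt[:60]}"

    def moa_respond(user_prompt: str, proposers: list[str],
                    aggregator: str, num_layers: int = 2) -> str:
        # Layer 1: each proposer answers independently.
        drafts = [query_model(m, user_prompt) for m in proposers]
        # Middle layers: proposers refine with all previous drafts in context.
        for _ in range(num_layers - 1):
            context = "\n\n".join(drafts)
            prompt = (f"{user_prompt}\n\nEarlier responses:\n{context}\n\n"
                      "Write an improved response.")
            drafts = [query_model(m, prompt) for m in proposers]
        # Final layer: one aggregator synthesizes a single answer.
        context = "\n\n".join(drafts)
        return query_model(aggregator, f"{user_prompt}\n\nCandidates:\n"
                                       f"{context}\n\nSynthesize the best response.")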

RelBench: A Benchmark for Deep Learning on Relational Databases [pdf]

RelBench is a public benchmark for solving predictive tasks over relational databases using graph neural networks, providing a foundational infrastructure for future research across diverse domains and scales. The benchmark is used to study Relational Deep Learning, which combines graph neural networks with tabular models, and is shown to outperform traditional manual feature engineering methods while significantly reducing human workload.
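
As a rough illustration of the underlying representation (not RelBench's actual API), the sketch below shows how relational rows become typed nodes and foreign keys become typed edges; the table and column names are invented for the example.

    # Rows become typed nodes; foreign keys become typed edges.
    users = [{"user_id": 1}, {"user_id": 2}]
    orders = [
        {"order_id": 10, "user_id": 1, "amount": 25.0},
        {"order_id": 11, "user_id": 2, "amount": 40.0},
    ]

    nodes = {
        "user": [u["user_id"] for u in users],
        "order": [o["order_id"] for o in orders],
    }
    edges = {
        ("order", "placed_by", "user"): [
            (o["order_id"], o["user_id"]) for o in orders
        ],
    }
    # A GNN then passes messages along these typed edges, replacing the
    # hand-built join/aggregate features of manual feature engineering.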

Reducing the Transformer Architecture to a Minimum [pdf]

Transformers, a successful model architecture in natural language processing and computer vision, rely on the attention mechanism to extract relevant context information, complemented by a multi-layer perceptron (MLP) that models nonlinear relationships. However, experiments show that simplifying the architecture, by omitting the MLP, collapsing matrices, and using symmetric similarity measures, can cut the parameter count by up to 90% without compromising performance on benchmarks like MNIST and CIFAR-10.
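
A minimal NumPy sketch of those simplifications, with illustrative shapes: a single shared query/key projection makes the similarity matrix symmetric, and the value and output projections collapse into one matrix. This is a schematic of the idea, not the paper's code.

    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def simplified_attention(x, W_qk, W_vo):
        # Shared query/key projection => a symmetric similarity matrix.
        p = x @ W_qk                              # (seq_len, d_head)
        scores = p @ p.T / np.sqrt(p.shape[-1])   # symmetric (seq, seq)
        # Value and output projections collapsed into a single matrix.
        return softmax(scores) @ (x @ W_vo)       # (seq_len, d_model)

    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 16))                  # 8 tokens, d_model = 16
    out = simplified_attention(x, rng.normal(size=(16, 4)),
                               rng.normal(size=(16, 16)))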

Emergent Response Planning in LLM

Large language models (LLMs) exhibit emergent planning behaviors, with their hidden representations encoding future outputs beyond the next token, including structural, content, and behavioral attributes of their entire responses. This ability to plan ahead, which scales with model size and evolves during generation, has potential applications for improving transparency and control over the generation process.
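
Results like this are typically established with probes: fit a simple model on prompt-time hidden states and test whether it predicts a whole-response attribute. The sketch below uses synthetic data in place of real hidden states, just to show the setup.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d_hidden = 500, 64
    # Stand-ins for prompt-time hidden states (one vector per prompt).
    hidden = rng.normal(size=(n, d_hidden))
    # Synthetic "future attribute", e.g. eventual response length.
    lengths = hidden @ rng.normal(size=d_hidden) + rng.normal(scale=0.1, size=n)

    # Linear probe via least squares; a high R^2 means the representation
    # already encodes the response-level attribute before it is generated.
    w, *_ = np.linalg.lstsq(hidden, lengths, rcond=None)
    pred = hidden @ w
    r2 = 1 - ((pred - lengths) ** 2).sum() / ((lengths - lengths.mean()) ** 2).sum()
    print(f"probe R^2: {r2:.3f}")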

Code

A new approach to data handling between systems/for AI

Stof is an efficient, governable, and accessible data solution that simplifies data interaction between computer systems, offering fine-grained control and sandboxed manipulation without the need for additional application code. It provides a standard interface for data access, validation, transformation, and orchestration, aiming to replace fragile application code and improve data governance, security, and developer experience.

Show HN: Steganographically encode messages with LLMs and Arithmetic Coding

Textcoder is a tool that uses steganography to encode secret messages into ordinary-looking text, utilizing a large language model and arithmetic coding to produce text that appears random but actually contains the secret message. The encoded text can be decoded by a user with knowledge of the password, but the process is not always deterministic and may be affected by hardware and software differences, requiring careful consideration of platform and configuration to ensure successful decoding.
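
Textcoder's actual scheme uses arithmetic coding over an LLM's token probabilities; the toy below strips that down to a fixed two-bits-per-token code over an invented four-word vocabulary, just to show why a shared model distribution lets token choices carry hidden bits.

    TOKENS = ["the", "a", "some", "one"]   # toy vocabulary shared by both sides

    def encode(bits: str) -> list[str]:
        # Two secret bits select each token.
        return [TOKENS[int(bits[i:i + 2], 2)] for i in range(0, len(bits), 2)]

    def decode(tokens: list[str]) -> str:
        # The receiver, knowing the same vocabulary, recovers the bits.
        return "".join(format(TOKENS.index(t), "02b") for t in tokens)

    secret = "0110"
    cover = encode(secret)        # ['a', 'some'] -- looks like ordinary text
    assert decode(cover) == secret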

Show HN: Letting LLMs Run a Debugger

LLM Debugger is a VSCode extension that uses large language models to actively debug programs by analyzing both static source code and real-time runtime context, including variable values, function behavior, and branch decisions. The extension provides features such as active debugging, automated breakpoint management, and synthetic data generation to help developers diagnose bugs faster and more accurately.
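
The runtime-context half of that idea can be sketched in a few lines of Python; this illustrates the concept, not the extension's implementation. The trace hook records live variable values per executed line, which are then folded into an LLM prompt alongside the source.

    import sys

    trace_log = []

    def tracer(frame, event, arg):
        # Record the variables live at every executed line.
        if event == "line":
            trace_log.append((frame.f_code.co_name, frame.f_lineno,
                              dict(frame.f_locals)))
        return tracer

    def average(xs):
        total = sum(xs)
        return total / len(xs)    # fails when xs is empty

    sys.settrace(tracer)
    try:
        average([])
    except ZeroDivisionError:
        pass
    finally:
        sys.settrace(None)

    # Source code plus this trace would form the model's debugging prompt.
    prompt = "Diagnose the failure given this trace:\n" + "\n".join(
        f"{fn}:{line} locals={loc}" for fn, line, loc in trace_log)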

Show HN: Cognee – Turn RAG and GraphRAG into custom dynamic semantic memory

Cognee is a data layer for AI applications that implements scalable, modular ECL (Extract, Cognify, Load) pipelines, allowing developers to interconnect and retrieve past conversations, documents, and audio transcriptions while reducing hallucinations, effort, and cost. It merges graph and vector databases to uncover hidden relationships and patterns in data, and can be installed using pip or poetry, with support for various databases and vector stores.
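
Basic usage follows the ECL shape described above; the sketch below is modeled on the project's README pattern, and exact function signatures vary between versions, so treat it as an outline rather than a stable API reference.

    import asyncio
    import cognee

    async def main():
        # Extract: add raw text (documents, transcripts, chats) to the store.
        await cognee.add("Large language models answer better with grounded context.")
        # Cognify: build the combined graph and vector representations.
        await cognee.cognify()
        # Load/query: search the resulting semantic memory.
        print(await cognee.search("What improves LLM answers?"))

    asyncio.run(main())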

Anemll – Open-Source Project to Convert LLM to Apple Neural Engine

ANEMLL is an open-source project that aims to accelerate the porting of Large Language Models (LLMs) to tensor processors, starting with the Apple Neural Engine (ANE), to enable seamless integration and on-device inference for low-power applications. The project provides a pipeline for converting LLMs to the CoreML format and includes tools for model conversion, compilation, and inference, with current support for Meta's LLaMA 3.2 models and plans to add more models in the future.
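
For context, the generic shape of a PyTorch-to-CoreML conversion with coremltools looks like the sketch below, with a toy model standing in for an LLM; ANEMLL's own tooling adds the LLM-specific conversion, compilation, and inference steps on top of this.

    import torch
    import coremltools as ct

    class TinyModel(torch.nn.Module):
        def forward(self, x):
            return torch.relu(x @ x.transpose(-1, -2))

    example = torch.randn(1, 8, 8)
    traced = torch.jit.trace(TinyModel().eval(), example)

    # Convert to an ML Program and ask Core ML to prefer the Neural Engine.
    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(shape=example.shape)],
        compute_units=ct.ComputeUnit.CPU_AND_NE,
        convert_to="mlprogram",
    )
    mlmodel.save("tiny.mlpackage")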