Friday November 22, 2024

Google faces a DOJ push to unwind its Anthropic AI deal, the Wave network shows that an ultra-small language model can reach high accuracy, and Llama 3.2's internals are explored with Sparse Autoencoders that extract interpretable features.

News

OK, I can partly explain the LLM chess weirdness now

Recent experiments show that large language models (LLMs) can play chess well, but only when prompted in a specific way, which contradicts earlier claims that they are inherently bad at chess. The author argues that an LLM's apparent chess ability comes down to how it is prompted, not to inherent limitations or to cheating.
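If, as the summary suggests, prompt format is what unlocks chess ability, one plausible format is a PGN-style continuation prompt that resembles a real game record (a hedged sketch; the function name and header fields are illustrative, not taken from the post):

```python
def pgn_prompt(moves):
    """Format a move list as a PGN-style continuation prompt.

    The intuition: a completion model asked to extend text that looks
    like a real game record tends to play stronger moves than one asked
    a chat-style "what is the best move?" question.
    """
    header = '[Event "Example Game"]\n[Result "*"]\n\n'
    body = ""
    for i in range(0, len(moves), 2):
        body += f"{i // 2 + 1}. {moves[i]} "
        if i + 1 < len(moves):
            body += f"{moves[i + 1]} "
    # Trailing space invites the model to complete the next move.
    return header + body.rstrip() + " "

print(pgn_prompt(["e4", "e5", "Nf3", "Nc6"]))
```

The model's completion of this string is then parsed as its chosen move.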

DOJ proposal would require Google to divest from AI partnerships with Anthropic

The US Justice Department is seeking to unwind Google's $2 billion investment in artificial intelligence startup Anthropic as part of a broader effort to curb Google's alleged monopoly over online search. The proposal, which would bar Google from acquiring or collaborating with companies that control consumer search information, is part of a landmark antitrust case against the tech giant.

Show HN: An AI that reliably builds full-stack apps by preventing LLM mistakes

Lovable is a platform that allows users to build high-quality software without writing code, by simply describing their idea in their own words. The platform offers features such as instant rendering, beautiful design, and support for backend functionality, and allows users to deploy and share their projects with one click.

AI eats the world

Benedict Evans produces an annual presentation on macro and strategic trends in the tech industry, with recent topics including 'AI and Everything Else' in 2024 and 'The New Gatekeepers' in 2023. He also sends a weekly newsletter to over 150,000 subscribers, highlighting key developments and providing analysis on the tech industry.

FLUX.1 Tools

Black Forest Labs has released FLUX.1 Tools, a suite of models that add control and steerability to their base text-to-image model FLUX.1, enabling the modification and re-creation of real and generated images. The suite consists of four distinct features: FLUX.1 Fill for inpainting and outpainting, FLUX.1 Depth and FLUX.1 Canny for structural conditioning, and FLUX.1 Redux for image variation and restyling.

Research

Wave Network: An Ultra-Small Language Model

The Wave network, a new ultra-small language model, represents each token as a complex vector that encodes both global and local semantics, and achieves high accuracy on text classification tasks. It outperforms a single Transformer layer using BERT pre-trained embeddings and approaches the accuracy of the pre-trained and fine-tuned BERT base model, while sharply reducing GPU memory usage and training time.
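As a rough illustration of the complex-vector idea (the dimensions, the shared global magnitude, and the additive "interference" rule below are assumptions for the sketch, not the paper's exact formulation):

```python
import numpy as np

def to_complex(global_mag, local_phase):
    """Combine a real magnitude vector (global semantics) with a
    per-token phase vector (local semantics) into one complex vector."""
    return global_mag * np.exp(1j * local_phase)

rng = np.random.default_rng(0)
dim = 8
# Hypothetical shared global magnitude and per-token phases for 3 tokens.
global_mag = np.abs(rng.normal(size=dim))
phases = rng.uniform(-np.pi, np.pi, size=(3, dim))

tokens = np.array([to_complex(global_mag, p) for p in phases])

# "Wave interference": token vectors combine by complex addition.
combined = tokens.sum(axis=0)
print(combined.shape)  # (8,)
```

Because magnitude and phase are stored jointly, a single complex vector carries both kinds of information that separate embedding channels would otherwise need.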

Cramming: Training a Language Model on a Single GPU in One Day

Researchers investigated the performance of a transformer-based language model trained on a single consumer GPU in just one day, achieving results close to BERT. They found that despite the limited computing power, the model's performance still follows the scaling laws observed in large-compute settings, and identified modifications that can improve performance in this constrained scenario.

Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations

Evaluations of large language models can be improved by applying statistical analysis and planning techniques from other sciences. This article provides formulas and recommendations for analyzing evaluation data, comparing models, and reporting results to minimize statistical noise and maximize informativeness.
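The starting point for such an analysis is the usual CLT-based standard error over per-question scores; a minimal sketch (the 1.96 multiplier gives the familiar ~95% normal interval; the article's fuller recommendations, such as handling clustered questions, are not reproduced here):

```python
import math

def eval_mean_and_se(scores):
    """Mean accuracy and its standard error over per-question scores.

    Treats each question's score (0/1 or graded in [0, 1]) as an
    i.i.d. draw, so the CLT applies to the sample mean.
    """
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)  # sample variance
    se = math.sqrt(var / n)
    return mean, se

scores = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]  # toy per-question results
mean, se = eval_mean_and_se(scores)
ci = (mean - 1.96 * se, mean + 1.96 * se)  # ~95% confidence interval
```

Reporting `mean ± 1.96 * se` instead of a bare accuracy number makes it clear when two models' scores are within noise of each other.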

Automating LLM Development with LLMs

Large Language Models (LLMs) are improving rapidly, but their progress is bounded by human-designed training methods. The proposed Self-Developing framework lets LLMs autonomously generate and learn model-improvement algorithms; the discovered algorithms outperform human-designed ones and transfer well to other models.

Conversational Medical AI: Ready for Practice

A large-scale study evaluated a conversational AI agent, Mo, in a real-world medical setting, finding that patients reported higher clarity and satisfaction with AI-assisted conversations compared to standard care. The study demonstrated that AI medical assistants can enhance patient experience while maintaining safety standards through physician supervision, with 95% of conversations rated as "good" or "excellent" by physicians.

Code

Show HN: Llama 3.2 Interpretability with Sparse Autoencoders

This project aims to recreate research on mechanistic LLM interpretability with Sparse Autoencoders (SAE) to extract interpretable features from the Llama 3.2 model. The project provides a full pipeline for capturing training data, training the SAEs, analyzing the learned features, and verifying the results experimentally.
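A toy version of the SAE forward pass such a pipeline trains (the sizes, initialization, and L1 coefficient here are made-up placeholders; real SAEs use much wider feature dictionaries):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 16, 64  # hypothetical sizes; real SAEs are far wider

W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode a residual-stream activation into sparse features,
    then reconstruct it. Loss = reconstruction MSE + L1 sparsity."""
    f = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)  # ReLU features
    x_hat = f @ W_dec + b_dec                         # reconstruction
    mse = np.mean((x - x_hat) ** 2)
    l1 = np.sum(np.abs(f))
    return f, x_hat, mse + 1e-3 * l1

x = rng.normal(size=d_model)  # stand-in for a captured activation
f, x_hat, loss = sae_forward(x)
```

The sparse feature vector `f` is what gets analyzed: individual dimensions that fire on semantically coherent inputs are the "interpretable features" the project extracts.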

Autoflow, a Graph RAG based and conversational knowledge base tool

Autoflow is an open-source Graph RAG (knowledge graph) based conversational knowledge base tool built on top of TiDB Vector, LlamaIndex, and DSPy. It provides a perplexity-style conversational search page and an embeddable JavaScript snippet, is licensed under the Apache License, Version 2.0, and welcomes community contributions.

Show HN: PDF2MD – Rust+Redis+ClickHouse+VLLM conversion pipeline for PDFs

PDF2MD comes from Trieve, an all-in-one solution for search, recommendations, and Retrieval-Augmented Generation (RAG) that offers semantic dense vector search, typo-tolerant full-text/neural search, and sub-sentence highlighting. Trieve also supports self-hosting, recommendations, and integration with various models and APIs, including OpenAI and Qdrant.

Show HN: Superflex – Turn designs to code that matches your project (v0+Cursor)

Superflex is an AI-powered frontend development assistant that converts Figma designs, images, and text prompts into production-ready code, adhering to users' code style and design standards. It integrates with Visual Studio Code and offers features such as Figma-to-code integration, image-to-code conversion, and a chat interface for streamlined development.

Multi-agent-orchestrator: Flexible and powerful framework for managing AI agents

The Multi-Agent Orchestrator is a flexible framework for managing multiple AI agents and handling complex conversations, intelligently routing queries and maintaining context across interactions. It offers pre-built components for quick deployment and allows easy integration of custom agents, making it suitable for a wide range of applications, from simple chatbots to sophisticated AI systems.
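This is not the framework's actual API, but the core routing idea can be sketched generically: score each registered agent against the query, dispatch to the best match, and record the exchange in shared history:

```python
# Generic sketch of intent-based agent routing; names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    keywords: list

@dataclass
class Orchestrator:
    agents: list
    history: list = field(default_factory=list)

    def route(self, query):
        """Pick the agent whose keywords best match the query,
        keeping the exchange in a shared conversation history."""
        q = query.lower()
        best = max(self.agents,
                   key=lambda a: sum(k in q for k in a.keywords))
        self.history.append((best.name, query))
        return best.name

orch = Orchestrator(agents=[
    Agent("weather", ["weather", "forecast", "rain"]),
    Agent("billing", ["invoice", "refund", "charge"]),
])
print(orch.route("Will it rain tomorrow?"))  # weather
```

A production orchestrator would replace keyword matching with an LLM-based classifier and carry full conversation context, but the dispatch-plus-shared-history shape is the same.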

© 2024 Differentiated. All rights reserved.