Monday — December 2, 2024
Explore 3D worlds created from a single image with World Labs' new AI system, train language models faster with Nous Research's DisTrO optimizer, and build powerful AI agents using Laminar Flow's dynamic task engine.
News
Advent of Code 2024
Advent of Code is an online Advent calendar of small programming puzzles for various skill levels that can be solved in any programming language. The puzzles are designed to be solvable with basic programming knowledge and problem-solving skills, and can be used for interview prep, company training, university coursework, or personal practice.
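To give a feel for the format, here is a made-up puzzle in the Advent of Code spirit (not an actual 2024 puzzle): given lines of integers, count how many values are larger than the one before.

```python
# Made-up AoC-style puzzle: count how many values in the input
# are strictly larger than the previous value.
def count_increases(report: str) -> int:
    depths = [int(line) for line in report.splitlines() if line.strip()]
    return sum(1 for prev, cur in zip(depths, depths[1:]) if cur > prev)

puzzle_input = """199
200
208
200
207"""
print(count_increases(puzzle_input))  # 3
```

Real puzzles ramp up in difficulty over the month, but most follow this parse-then-compute shape.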
World Labs: Generate 3D worlds from a single image
A new AI system from World Labs generates an explorable 3D world from a single image, letting users move through and interact with the scene in real time. The technology could change how movies, games, and other digital content are created, and creators are already using it to build new experiences.
Controlling AI's Growing Energy Needs
The energy required to train artificial intelligence models is becoming a significant concern: the power needed for training has grown exponentially since 2012, while energy production capacity has not kept pace. Researchers are exploring lower-energy alternatives, such as neuromorphic computers that mimic the human brain's energy-efficient processing, to shrink the energy footprint of AI training.
Why 'open' AI systems are closed, and why this matters
Claims of 'open' artificial intelligence (AI) often lack precision and incorrectly apply understandings of 'open' from free and open-source software, as powerful actors use this rhetoric to shape policy and concentrate power in the AI sector. While 'open' AI can offer transparency, reusability, and extensibility, it does not necessarily reduce the concentration of power in AI, and its benefits are often warped by economic incentives and corporate interests.
Pre-training a 15B parameter language model over the internet
Nous Research is training a 15 billion parameter large language model (LLM) using their DisTrO distributed optimizer, which reduces inter-GPU communication requirements by four to five orders of magnitude. The model has reached 75.52% completion and is expected to finish training in approximately 58.66 hours, with a training rate of 114k tokens per second and a bandwidth of 11.98 MB/s.
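The reported figures imply the remaining and total token counts, as a back-of-envelope check (assuming constant throughput; all numbers come from the summary above):

```python
# Back-of-envelope check of the reported DisTrO training figures
# (assumes throughput stays constant for the rest of the run).
rate_tok_s = 114_000          # reported tokens per second
hours_left = 58.66            # reported time to completion
done_frac = 0.7552            # reported progress

remaining_tokens = rate_tok_s * hours_left * 3600
total_tokens = remaining_tokens / (1 - done_frac)

print(f"{remaining_tokens / 1e9:.1f}B tokens remaining")
print(f"~{total_tokens / 1e9:.0f}B tokens total (implied)")
```

This works out to roughly 24B tokens remaining and a total run on the order of 100B tokens.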
Research
Large Language Model-Brained GUI Agents: A Survey
Large language models (LLMs) have enabled the development of GUI agents that can interpret and interact with graphical user interfaces using natural language instructions, revolutionizing human-computer interaction. This emerging field has significant applications across web navigation, mobile apps, and desktop automation, and this paper provides a comprehensive survey of LLM-brained GUI agents, exploring their evolution, core components, and future research directions.
DeMo: Decoupled Momentum Optimization – training neural networks in parallel
Researchers have developed DeMo, a new optimizer and data parallel algorithm that reduces inter-accelerator communication requirements, allowing for the training of large neural networks even with limited network bandwidth and heterogeneous hardware. DeMo achieves improved convergence and matches or exceeds the performance of state-of-the-art optimizers like AdamW, without requiring high-speed interconnects.
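The communication-reduction idea can be sketched as follows. This is an illustrative stand-in, not DeMo's actual algorithm: DeMo uses a DCT-based decomposition of the momentum into fast and slow components, whereas this toy version exchanges only the k largest-magnitude momentum entries between workers.

```python
# Illustrative sketch (NOT DeMo's actual algorithm): each worker keeps
# its full momentum locally, but only the k largest-magnitude entries
# are exchanged each step, shrinking communication volume. DeMo itself
# uses a DCT-based fast/slow decomposition rather than top-k selection.
def compress_topk(vec, k):
    """Return the k largest-magnitude entries as (index, value) pairs."""
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    return [(i, vec[i]) for i in order[:k]]

def allreduce_sparse(sparse_updates, dim):
    """Average the sparse updates from all workers into one dense vector."""
    out = [0.0] * dim
    for updates in sparse_updates:
        for i, v in updates:
            out[i] += v / len(sparse_updates)
    return out

# Two workers, 8-dim momentum, syncing only the top-2 components each:
w0 = [0.9, -0.01, 0.02, 1.5, 0.0, 0.03, -0.02, 0.01]
w1 = [1.1, 0.02, -0.01, 1.3, 0.01, 0.0, 0.02, -0.03]
synced = allreduce_sparse([compress_topk(w0, 2), compress_topk(w1, 2)], 8)
print(synced)  # only indices 0 and 3 carry averaged mass
```

Sending 2 of 8 components per worker is a 4x traffic reduction in this toy; DeMo's reported reductions for real training are several orders of magnitude.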
DrugAgent: AI-Aided Drug Discovery Programming Through LLM Multi-Agent Collaboration
Large Language Models (LLMs) have the potential to accelerate drug discovery, but their application is limited by the need for specialized knowledge and practical implementation. The DrugAgent framework addresses this challenge by automating machine learning programming in drug discovery, incorporating domain expertise and demonstrating promising results in a preliminary case study.
Procedural knowledge in pretraining drives reasoning in large language models
Large Language Models (LLMs) demonstrate both problem-solving abilities and reasoning gaps, casting doubt on their generalisation strategies. Researchers found that LLMs employ a generalisable strategy that synthesises procedural knowledge from documents, rather than simply retrieving answers, when performing reasoning tasks, particularly in mathematical reasoning.
DynaSaur: Large Language Agents Beyond Predefined Actions
Existing LLM agent systems are limited by their reliance on a fixed set of predefined actions, restricting their capabilities and requiring substantial human effort to implement. This work proposes a new LLM agent framework that dynamically generates and composes actions in real-time, allowing for greater flexibility and outperforming previous methods in experiments on the GAIA benchmark.
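The core move can be sketched in a few lines. This is a toy illustration of the dynamic-action idea, not DynaSaur's implementation; the "LLM output" here is a hard-coded string standing in for a real model call.

```python
# Toy sketch of dynamic action generation (not DynaSaur's actual code):
# instead of choosing from a fixed action set, the agent emits Python
# source for a new action, which is compiled and added to its library.
action_library = {}

def register_action(source: str, name: str):
    namespace = {}
    exec(source, namespace)            # compile the generated function
    action_library[name] = namespace[name]

# Hard-coded string standing in for model output:
llm_generated = '''
def word_count(text):
    return len(text.split())
'''
register_action(llm_generated, "word_count")

# Later steps can invoke the new action like any predefined one:
print(action_library["word_count"]("agents beyond predefined actions"))  # 4
```

A real system would sandbox the `exec` call and validate the generated code before trusting it.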
Code
Show HN: Flow – A Dynamic Task Engine for building AI Agents
Laminar Flow is a lightweight task engine for building AI agents that prioritizes simplicity and flexibility, using a dynamic task queue system with concurrent execution, dynamic scheduling, and smart dependencies. It allows for complex workflows, parallel task execution, and state management, making it easier to reason about control flow and dependencies.
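A minimal dependency-aware task runner in the same spirit can be sketched with the standard library. This is illustrative only; Laminar Flow's actual API differs. Tasks here run concurrently in waves, each wave containing every task whose dependencies have already finished.

```python
# Minimal dependency-aware concurrent task runner (illustrative only;
# not Laminar Flow's API). Tasks run in waves: each wave executes, in
# parallel, every task whose prerequisites have all completed.
from concurrent.futures import ThreadPoolExecutor

def run_tasks(tasks, deps):
    """tasks: name -> callable; deps: name -> list of prerequisite names."""
    results, pending = {}, dict(deps)
    with ThreadPoolExecutor() as pool:
        while pending:
            ready = [n for n, d in pending.items()
                     if all(p in results for p in d)]
            if not ready:
                raise RuntimeError("cycle in task dependencies")
            futures = {n: pool.submit(tasks[n]) for n in ready}
            for n, f in futures.items():
                results[n] = f.result()
                del pending[n]
    return results

results = run_tasks(
    {"fetch": lambda: "raw", "parse": lambda: "ast", "report": lambda: "done"},
    {"fetch": [], "parse": ["fetch"], "report": ["parse"]},
)
print(results)  # {'fetch': 'raw', 'parse': 'ast', 'report': 'done'}
```

A production engine like Flow also needs state persistence and the ability to enqueue new tasks mid-run, which this sketch omits.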
NaNoGenMo 2024 novel from AI-captioned stills from the movie A.I.
The author used AI tools to generate a novelization of the film A.I. Artificial Intelligence, resulting in a more coherent text compared to their 2016 attempt, but still with some issues such as reverting to image descriptions and repetitive content. The author concludes that large language models are in the diminishing returns stage, and throwing more training data and power at them will not lead to significant improvements or sentience.
Show HN: Superlinked – Vector Embeddings for Structured and Unstructured Data
Superlinked is a Python framework for AI engineers to build high-performance search and recommendation applications that combine structured and unstructured data. It provides a self-hostable REST API server that connects data, vector databases, and backend services, allowing users to construct custom data and query embedding models from pre-trained encoders.
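The core idea can be illustrated with a toy sketch (this is not Superlinked's API): embed the unstructured text and the structured fields separately, then concatenate them into one searchable vector. The hashed bag-of-words "embedding" and the `embed_item` helper below are made up for illustration; a real system would use a pretrained encoder.

```python
# Illustrative sketch of combining structured and unstructured data in
# one vector (not Superlinked's actual API). The toy text "embedding"
# is a hashed bag-of-words; real systems use pretrained encoders.
import math

def embed_text(text, dim=8):
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def embed_item(description, price, max_price=1000.0):
    # Structured field scaled to [0, 1] so it is comparable to text dims.
    return embed_text(description) + [min(price / max_price, 1.0)]

vec = embed_item("wireless noise cancelling headphones", price=250.0)
print(len(vec))  # 9: 8 text dimensions + 1 structured dimension
```

Keeping both signals in one vector is what lets a single nearest-neighbor query rank by text similarity and structured attributes at once.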
Show HN: Connect any Open Data to Claude (and soon, any other LLMs)
Open Data MCP is a project that enables access to public datasets from within Large Language Models (LLMs) and provides a platform for publishing and distributing open data. The project allows users to access open data using a CLI tool and provides templates and guidelines for contributing and publishing new datasets.
Agent Framework / shim to use Pydantic with LLMs
PydanticAI is a Python Agent Framework designed to make it easier to build production-grade applications with Generative AI, built by the team behind Pydantic. It provides a model-agnostic, type-safe, and structured way to interact with Large Language Models (LLMs), with features such as dependency injection, streamed responses, and integration with Pydantic's validation system.
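The general pattern PydanticAI builds on can be sketched with stdlib dataclasses. Note this is not PydanticAI's real API (which centers on Pydantic models and an agent abstraction); it only shows the underlying idea: declare the shape you expect, parse the model's reply, and fail loudly on a mismatch.

```python
# Sketch of structured, type-checked LLM output (NOT PydanticAI's API):
# declare the expected shape, parse the reply, and raise on mismatch.
import json
from dataclasses import dataclass

@dataclass
class CityInfo:
    city: str
    population: int

def parse_reply(raw: str) -> CityInfo:
    data = json.loads(raw)
    info = CityInfo(**data)            # TypeError on missing/extra keys
    if not isinstance(info.population, int):
        raise TypeError("population must be an int")
    return info

# Canned "LLM reply" standing in for a real model call:
reply = '{"city": "Paris", "population": 2102650}'
print(parse_reply(reply))
```

PydanticAI layers dependency injection, streaming, and retries on top of this validate-at-the-boundary pattern.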