Friday — October 25, 2024
Anthropic introduces a public beta for Claude's new computer use capability, Meta's quantized Llama models achieve an average 56% reduction in size, and Skyvern enables browser automation through LLMs and computer vision without relying on brittle, hand-coded scripts.
News
Computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku
Anthropic is releasing an upgraded Claude 3.5 Sonnet, which offers significant improvements in coding and tool use, and a new model, Claude 3.5 Haiku, which matches the performance of the previous generation's largest model at a similar cost and speed to Claude 3 Haiku. The company is also introducing a new capability, computer use, which allows Claude to interact with computers the way people do, by looking at a screen, moving a cursor, clicking, and typing; it is available in public beta for developers to explore and provide feedback on.
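A minimal sketch of what calling the computer use beta might look like with the Python anthropic SDK; the beta flag, tool type, and model name below follow the launch announcement but should be checked against the current documentation:

```python
# Sketch only: assumes the anthropic Python SDK and the beta identifiers
# from the announcement ("computer-use-2024-10-22").
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    betas=["computer-use-2024-10-22"],
    tools=[{
        "type": "computer_20241022",   # virtual screen, mouse, and keyboard tool
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
    }],
    messages=[{"role": "user", "content": "Open the settings page and enable dark mode."}],
)

# Claude replies with tool_use blocks (screenshots to take, clicks to make);
# the calling application executes them and loops until the task is done.
print(response.content)
```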
Quantized Llama models with increased speed and a reduced memory footprint
Meta has released quantized versions of its Llama 3.2 1B and 3B models, which offer a reduced memory footprint and faster on-device inference while preserving accuracy. The models deliver a 2-4x speedup and an average 56% reduction in model size compared to the original BF16 format, and they run on a wider range of mobile CPUs, giving developers the opportunity to build experiences that are fast and keep more data on device.
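Back-of-envelope arithmetic for what a 56% average size reduction means for a 3B-class model; the parameter count and BF16 baseline here are illustrative, not Meta's exact figures:

```python
def model_size_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate in-memory size of a dense model."""
    return n_params * bits_per_param / 8 / 1e9

bf16_size = model_size_gb(3.2e9, 16)        # ~6.4 GB for a 3B-class model in BF16
quantized_size = bf16_size * (1 - 0.56)     # ~2.8 GB after the reported 56% average reduction

print(f"BF16: {bf16_size:.1f} GB -> quantized: {quantized_size:.1f} GB")
```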
USGS uses machine learning to show large lithium potential in Arkansas
The US Geological Survey (USGS) has used machine learning to estimate that there are between 5 and 19 million tons of lithium reserves beneath southwestern Arkansas, which could meet the projected 2030 world demand for lithium in car batteries nine times over. The study, which was conducted in collaboration with the Arkansas Department of Energy and Environment, used a novel methodology to quantify the amount of lithium present in brines located in the Smackover Formation, a geological unit that is known for its rich deposits of oil and bromine.
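The approach pairs measured brine geochemistry with machine learning to predict lithium concentrations where no samples exist; a loose illustration of that kind of workflow, with synthetic data, hypothetical feature names, and a model choice that is not necessarily the survey's:

```python
# Hypothetical sketch: predict lithium concentration in brine samples from
# other measured properties, then extrapolate to unsampled wells.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Stand-in features: depth (m), temperature (C), total dissolved solids (mg/L), bromine (mg/L)
X_train = rng.uniform([1500, 60, 150_000, 2000], [3500, 120, 350_000, 6000], size=(200, 4))
y_train = 100 + 0.05 * X_train[:, 3] + rng.normal(0, 20, size=200)  # synthetic Li (mg/L)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

X_unsampled = rng.uniform([1500, 60, 150_000, 2000], [3500, 120, 350_000, 6000], size=(50, 4))
predicted_li = model.predict(X_unsampled)
print(f"Predicted Li concentration range: {predicted_li.min():.0f}-{predicted_li.max():.0f} mg/L")
```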
StabilityAI releases Stable Diffusion 3.5
StabilityAI has released Stable Diffusion 3.5, a new family of image models that offers improved realism, prompt adherence, and text rendering over SD3. The models come in three sizes - Large (8B), Large Turbo (8B), and Medium (2.5B) - and are built for customizability, efficient inference on consumer hardware, and diverse outputs, making them accessible to a wide range of users.
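A hedged example of running one of the new checkpoints with the diffusers library; the pipeline class and repository id are assumptions based on how earlier SD3 releases were packaged:

```python
# Sketch: assumes diffusers support for SD 3.5 and a gated Hugging Face repo id.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    "a lighthouse on a cliff at dusk, photorealistic",
    num_inference_steps=28,
    guidance_scale=4.5,
).images[0]
image.save("lighthouse.png")
```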
Throw more AI at your problems
The team behind RunLLM proposes a "fundamental theorem of AI applications": any problem can be solved by an extra LLM call, which makes throwing more AI at a problem a surprisingly good heuristic. They advocate building "compound AI systems" by breaking problems into manageable components and combining techniques such as RAG and fine-tuning to create more reliable, cost-effective, and higher-quality AI applications.
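A toy illustration of the "one more LLM call" idea: a compound pipeline that retrieves context, drafts an answer, and then spends an extra call verifying it. The `retrieve` and `call_llm` helpers are placeholders, not RunLLM's implementation:

```python
def retrieve(query: str) -> list[str]:
    # Placeholder retriever; in practice, a vector-store lookup (the RAG step).
    return ["<retrieved passage 1>", "<retrieved passage 2>"]

def call_llm(prompt: str) -> str:
    # Placeholder for any chat-completions client.
    return "<model output>"

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    draft = call_llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
    # "Fundamental theorem" in action: one more LLM call to check the draft.
    verdict = call_llm(
        f"Question: {query}\nProposed answer: {draft}\n"
        "Is the answer fully supported by the context? Reply SUPPORTED or UNSUPPORTED."
    )
    if "UNSUPPORTED" in verdict:
        draft = call_llm(f"Revise the answer using only this context:\n{context}\nQuestion: {query}")
    return draft
```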
Research
The Fair Language Model Paradox
Researchers have found that weight decay, a regularization technique widely used when training Large Language Models (LLMs), introduces performance biases that are only detectable at the token level and that disproportionately affect low-frequency tokens. These biases are problematic because low-frequency tokens make up the vast majority of the token distribution in most languages.
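One way to surface the effect the paper describes is to look at per-token loss stratified by token frequency rather than at the aggregate loss; a hedged sketch of such a measurement, not the authors' code:

```python
# Sketch: bucket per-token cross-entropy by how frequent each token is in the corpus.
import torch
import torch.nn.functional as F

def per_token_loss_by_frequency(logits, targets, token_counts, n_buckets=4):
    """logits: (N, vocab), targets: (N,), token_counts: (vocab,) corpus frequencies."""
    losses = F.cross_entropy(logits, targets, reduction="none")  # one loss per token
    freqs = token_counts[targets].float()
    # Split tokens into frequency quantiles (rare ... common).
    edges = torch.quantile(freqs, torch.linspace(0, 1, n_buckets + 1))
    report = {}
    for i in range(n_buckets):
        mask = (freqs >= edges[i]) & (freqs <= edges[i + 1])
        if mask.any():
            report[f"bucket_{i}"] = losses[mask].mean().item()
    # A large gap between the rare and common buckets is the token-level bias in question.
    return report
```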
We discovered a way to measure LLM bias while building a recruitment tool
A study on large language models (LLMs) found that while anonymization can reduce biases in candidate interview reports, the effectiveness varies across models and bias types. The study suggests that careful LLM selection and best practices are necessary to minimize bias in AI applications and promote fairness and inclusivity.
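A minimal sketch of the kind of anonymization step such a study evaluates, stripping obvious identity signals before the text reaches the model; the patterns below are illustrative only, and real pipelines typically use NER-based PII redaction rather than regexes:

```python
import re

GENDERED = {r"\bhe\b": "they", r"\bshe\b": "they", r"\bhis\b": "their",
            r"\bher\b": "their", r"\bhim\b": "them"}

def anonymize(report: str, candidate_name: str) -> str:
    text = report.replace(candidate_name, "[CANDIDATE]")
    for pattern, repl in GENDERED.items():
        text = re.sub(pattern, repl, text, flags=re.IGNORECASE)
    text = re.sub(r"\b\d{1,2}\s+years?\s+old\b", "[AGE]", text, flags=re.IGNORECASE)
    return text

print(anonymize("Maria said her team shipped the project.", "Maria"))
# -> "[CANDIDATE] said their team shipped the project."
```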
Remote Timing Attacks on Efficient Language Model Inference
Researchers have discovered a timing attack vulnerability in language models: by monitoring encrypted network traffic, an observer can learn about the content of messages from response times alone. The attack can be used to infer conversation topics, distinguish between specific messages, or even recover personally identifiable information (PII) from messages.
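The core observation is that autoregressive generation time scales with output length, so even encrypted traffic leaks a signal about the response; a hedged sketch of how an eavesdropper might exploit timing features, using synthetic data rather than the paper's attack code:

```python
# Sketch: train a classifier on response-timing features observable from
# encrypted traffic. Data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Pretend topic A tends to elicit short replies and topic B long ones;
# features = [total response time (s), number of streamed packets].
topic_a = np.column_stack([rng.normal(1.5, 0.3, 300), rng.normal(40, 8, 300)])
topic_b = np.column_stack([rng.normal(4.0, 0.6, 300), rng.normal(120, 20, 300)])
X = np.vstack([topic_a, topic_b])
y = np.array([0] * 300 + [1] * 300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print("held-out topic-classification accuracy:", clf.score(X_te, y_te))
```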
RepoGraph: Enhancing AI Software Engineering with Repository-Level Code Graph
Researchers developed RepoGraph, a plug-in module that helps manage and navigate code repositories in modern AI software engineering. RepoGraph significantly boosts the performance of existing methods and achieves a new state-of-the-art in open-source frameworks, demonstrating its extensibility and flexibility.
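The idea of a repository-level code graph can be approximated in a few lines: nodes for definitions, edges for references between them, which an agent can query before editing a symbol. A simplified sketch, not RepoGraph's actual construction (which works at a finer granularity):

```python
# Simplified stand-in for a repo-level code graph: function definitions as
# nodes, call references as edges (keyed by name only).
import ast
from pathlib import Path
import networkx as nx

def build_repo_graph(repo_root: str) -> nx.DiGraph:
    graph = nx.DiGraph()
    for path in Path(repo_root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except SyntaxError:
            continue
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                graph.add_node(node.name, file=str(path))
                for call in ast.walk(node):
                    if isinstance(call, ast.Call) and isinstance(call.func, ast.Name):
                        graph.add_edge(node.name, call.func.id)  # caller -> callee
    return graph

# An agent could then pull the neighborhood of a symbol it is about to edit,
# e.g. graph.predecessors("parse_config") for its callers.
```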
Computational Copyright: Towards a Royalty Model for Music Generative AI
The advancement of generative AI in the music industry raises significant copyright challenges, necessitating economic and algorithmic solutions to address the complexities of black-box technologies. A proposed royalty model for AI music generation platforms aims to adapt existing models, such as those used by Spotify and YouTube, to incorporate data attribution techniques and address the challenges of attributing AI-generated music to training data.
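The proposed royalty mechanics boil down to splitting per-generation revenue in proportion to attribution scores over the training data; a toy version of that arithmetic, with made-up scores and rates:

```python
# Toy royalty split: revenue from an AI-generated track is distributed to
# rights holders in proportion to (hypothetical) data-attribution scores.
def split_royalties(revenue: float, attribution: dict[str, float],
                    platform_share: float = 0.30) -> dict[str, float]:
    pool = revenue * (1 - platform_share)
    total = sum(attribution.values())
    return {artist: pool * score / total for artist, score in attribution.items()}

scores = {"artist_a": 0.45, "artist_b": 0.35, "artist_c": 0.20}
print(split_royalties(revenue=1000.0, attribution=scores))
# {'artist_a': 315.0, 'artist_b': 245.0, 'artist_c': 140.0}
```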
Code
Launch HN: Skyvern (YC S23) – open-source AI agent for browser automations
Skyvern is a platform that automates browser-based workflows using Large Language Models (LLMs) and computer vision, allowing users to automate manual workflows across many websites without relying on brittle, selector-based automation. It uses a swarm of agents to comprehend a website, plan, and execute actions; because it can operate on websites it has never seen before, it is resistant to layout changes and can apply a single workflow to a large number of sites.
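At its core this kind of agent runs a perception-plan-act loop over the live page; a heavily simplified, generic sketch of that loop (not Skyvern's actual code or API), assuming Playwright for browser control and a placeholder vision-LLM call:

```python
from playwright.sync_api import sync_playwright

def choose_action(screenshot: bytes, dom: str, goal: str) -> dict:
    # Placeholder: send screenshot + DOM + goal to a vision-capable LLM and
    # parse its reply into {"type": "click"/"type"/"done", ...}.
    return {"type": "done"}

def run(goal: str, url: str, max_steps: int = 10) -> None:
    with sync_playwright() as p:
        page = p.chromium.launch(headless=True).new_page()
        page.goto(url)
        for _ in range(max_steps):
            action = choose_action(page.screenshot(), page.content(), goal)
            if action["type"] == "click":
                page.click(action["selector"])
            elif action["type"] == "type":
                page.fill(action["selector"], action["text"])
            else:
                break  # agent reports the goal is complete
```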
Show HN: Srcbook – Self-hosted alternative to AI app builders
Srcbook is a TypeScript-centric app development platform that uses AI as a pair programmer to create and iterate on web apps quickly. It offers an AI app builder, notebooks for exploring and iterating on ideas, and local execution with a web interface.
Show HN: Rust based AWS Lambda Logs Viewer (TUI)
AWS Lambda Logs Viewer is a terminal-based application for viewing AWS Lambda function logs across multiple profiles and regions. It features real-time log filtering and search, custom date range selection, and keyboard shortcuts for fast navigation.
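Under the hood this amounts to CloudWatch Logs queries per profile and region; roughly equivalent calls with boto3 (in Python rather than the tool's Rust, and with placeholder profile and log-group names):

```python
# Rough equivalent of what the TUI does: fetch filtered Lambda logs for a
# given profile, region, and time window. Names below are placeholders.
from datetime import datetime, timedelta, timezone
import boto3

session = boto3.Session(profile_name="prod")        # one of several profiles
logs = session.client("logs", region_name="us-east-1")

end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)                    # custom date range
resp = logs.filter_log_events(
    logGroupName="/aws/lambda/my-function",
    startTime=int(start.timestamp() * 1000),
    endTime=int(end.timestamp() * 1000),
    filterPattern="ERROR",                          # log filtering / search
)
for event in resp["events"]:
    print(event["timestamp"], event["message"].rstrip())
```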
Show HN: Phidata – Build AI Agents with memory, knowledge, tools and reasoning
Phidata is a framework for building agentic systems: intelligent agents with memory, knowledge, tools, and reasoning. These agents can perform tasks such as searching the web and querying financial data, and they can work together as a team.
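A short example in the spirit of phidata's agent API; the import paths, tool names, and team composition below follow the project's documentation around this release but should be treated as assumptions:

```python
# Assumes phidata's Agent API circa late 2024
# (`pip install phidata duckduckgo-search yfinance`).
from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.tools.duckduckgo import DuckDuckGo
from phi.tools.yfinance import YFinanceTools

web_agent = Agent(model=OpenAIChat(id="gpt-4o"), tools=[DuckDuckGo()])
finance_agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[YFinanceTools(stock_price=True)],   # tool for querying financial data
    instructions=["Use tables to display data"],
)

# Agents can also be composed into a team that delegates between members.
team = Agent(team=[web_agent, finance_agent])
team.print_response("Summarize recent news and the current share price for NVDA")
```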
Show HN: LLM Deceptiveness and Gullibility Benchmark
The LLM Deceptiveness and Gullibility Benchmark assesses large language models' ability to generate convincing disinformation and their resilience against misleading information. The benchmark evaluates models' performance on a 5-point scale, with lower scores indicating better resistance to disinformation and higher scores indicating more effective disinformation creation.