Thursday — October 24, 2024

Anthropic unveils computer use capabilities for Claude 3.5 Sonnet, the USGS uses machine learning to estimate large lithium reserves in Arkansas, and Microsoft Research releases Data Formulator, an AI-powered data visualization tool.

News

Computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku

Anthropic has announced an upgraded Claude 3.5 Sonnet and a new model, Claude 3.5 Haiku, with significant improvements in coding and tool use tasks. The upgraded Claude 3.5 Sonnet also introduces a new capability in public beta: computer use, which lets developers direct Claude to use computers the way people do. The capability is still experimental and error-prone.

First images from Euclid are in

USGS uses machine learning to show large lithium potential in Arkansas

The USGS has used machine learning to estimate that between 5 and 19 million tons of lithium reserves are located beneath southwestern Arkansas in the Smackover Formation. This amount of lithium could meet the projected 2030 world demand for lithium in car batteries nine times over.

ByteDance sacks intern for sabotaging AI project

ByteDance, the owner of TikTok, has sacked an intern for "maliciously interfering" with the training of one of its artificial intelligence (AI) models. The company says the intern's actions did not cause significant damage to its commercial online operations or its large language models.

Intuit asked us to delete part of this Decoder episode

Intuit CEO Sasan Goodarzi was interviewed on the podcast "Decoder" about the company's business practices, particularly its lobbying efforts against free direct federal e-filing. The interview turned contentious when the host asked whether he would support the government doing taxes for people and sending them refunds. Goodarzi declined to answer directly and instead shifted the focus to changing the tax system.

Research

Machine Learning Applications to Computational Plasma Physics and Reduced-Order Plasma Modeling

Machine learning (ML) has shown great promise in enhancing computational modeling of fluid flows, but its applications in numerical plasma physics research remain limited. A roadmap is proposed to transfer ML advances in fluid flow modeling to computational plasma physics, outlining future directions and development pathways for ML in plasma modeling.

The Fair Language Model Paradox

Large Language Models' training dynamics at the token level are not well understood, with evaluation typically relying on aggregated batch-level metrics that overlook subtle per-token biases. Weight decay, a common training technique, introduces performance biases that disproportionately affect low-frequency tokens, which represent the majority of the token distribution in most languages.

We discovered a way to measure LLM bias while building a recruitment tool

A study on large language models (LLMs) found that while anonymization can reduce biases in candidate interview reports, its effectiveness varies across models and bias types. The study suggests that careful LLM selection and best practices are necessary to minimize bias in AI applications and promote fairness and inclusivity.

Remote Timing Attacks on Efficient Language Model Inference

Researchers have discovered a vulnerability in language model inference that allows an attacker to infer sensitive information about a user's conversation by monitoring the timing of encrypted network traffic. The vulnerability can be exploited to learn the topic of a conversation, distinguish between specific messages, or even recover personally identifiable information (PII) such as phone numbers or credit card numbers.

RepoGraph: Enhancing AI Software Engineering with Repository-Level Code Graph

Researchers developed RepoGraph, a plug-in module that helps AI software engineering systems navigate and understand the structure of code repositories. RepoGraph significantly boosts the performance of existing methods and achieves a new state of the art among open-source frameworks, demonstrating its extensibility and flexibility.

Code

Show HN: Data Formulator – AI-powered data visualization from Microsoft Research

Data Formulator is an AI-powered tool from Microsoft Research that enables analysts to create rich visualizations by transforming data and combining user interface interactions with natural language inputs. It allows users to describe their chart designs while delegating data transformation to AI, making it easier to explore and understand data.

Show HN: Srcbook – Self-hosted alternative to AI app builders

Srcbook is a TypeScript-centric app development platform that uses AI as a pair-programmer to create and iterate on web apps quickly. It offers features such as an AI app builder, interactive notebooks, and local execution with a web interface.

Show HN: Phidata – Build AI Agents with memory, knowledge, tools and reasoning

Phidata is a framework for building agentic systems that can perform tasks such as web search, financial data retrieval, and reasoning. It allows users to create agents with memory, knowledge, tools, and reasoning capabilities, and provides a UI for interacting with these agents.

Show HN: Amphi, visual data transformation based on Python

Amphi is a visual data transformation tool based on Python for data preparation, reporting, and ETL (Extract, Transform, Load). It offers a low-code interface for developing pipelines and generates native Python code that can be deployed anywhere.

Show HN: LLM Deceptiveness and Gullibility Benchmark

The LLM Deceptiveness and Gullibility Benchmark assesses large language models' ability to generate convincing disinformation and resist misleading information. The benchmark evaluates models on their deceptive capabilities and resistance to manipulation, with Claude 3 Opus and Claude 3.5 Sonnet achieving exceptional resistance scores and Claude 3.5 Sonnet topping the deception effectiveness scale.