Thursday — October 31, 2024

Google uses AI to generate over a quarter of its new code, with Intel's AI Flame Graphs focusing on cost reduction, while the AI Scientist framework independently produces publishable work and LlamaPReview enhances codebase learning for GitHub PR review.

News

Google CEO says more than a quarter of the company's new code is created by AI

Google CEO Sundar Pichai said that more than a quarter of the company's new code is created by AI, which is then checked and reviewed by employees. This use of AI is aimed at boosting productivity and efficiency within the company.

AI Flame Graphs

Intel has developed a new tool called AI Flame Graphs, a visualization that shows an AI accelerator or GPU hardware profile along with the full software stack, to help reduce AI costs. The tool is designed to be easy to use and low-overhead, allowing developers to generate a flame graph of an existing AI workload without restarting anything or launching additional code.

Generative AI Scripting

GenAIScript is a scripting language developed by Microsoft, designed to be used with artificial intelligence (AI) and machine learning (ML) models. It allows users to write scripts that can interact with and control AI and ML systems.

DeepSeek v2.5 – open-source LLM comparable to GPT-4, but 95% less expensive

DeepSeek-V2.5 is a large language model that delivers top results on major leaderboards, including AlignBench and MT-Bench, and specializes in math, code, and reasoning tasks. It is an open-source model with an API that supports 128K context length and offers competitive pricing at $0.14 per million input tokens and $0.28 per million output tokens.

Show HN: AI OmniGen – AI Image Generator with Consistent Visuals

OmniGen is a tool that enables the creation of diverse and contextually rich visuals using text prompts or multi-modal inputs, with features such as identity-preserving generation and seamless image editing. It also allows for personalized imagery creation, style customization, and high-quality results through detailed prompts.

Research

The AI Scientist: Towards Automated Open-Ended Scientific Discovery

Researchers have developed "The AI Scientist," a framework that enables large language models to conduct scientific research independently, from generating ideas to writing and reviewing papers. This system can produce high-quality papers that meet the acceptance threshold at a top machine learning conference, marking a significant step towards AI agents contributing to scientific discovery.

Hacking Back the AI-Hacker: Prompt Injection as a Defense for LLM-Attackers

Researchers propose a new defense strategy called Mantis to counter cyberattacks driven by large language models (LLMs). Mantis exploits LLMs' vulnerability to adversarial inputs to disrupt or compromise the attacker's operations, achieving over 95% effectiveness in experiments.

Revisiting Reliability in Large-Scale Machine Learning Research Clusters

The paper presents a study on managing large-scale machine learning (ML) clusters, analyzing 11 months of data from two environments with over 150 million GPU hours and 4 million jobs. The study reveals that smaller jobs are more numerous but larger jobs are more vulnerable to failures, and proposes methods to estimate reliability metrics and gauge the efficacy of potential software mitigations.

Deep Optimizer States: Scalable Training of Transformer Interleaved Offloading

Here's a summary of the text in a couple of sentences:

The training of large transformers and language models is becoming increasingly expensive and memory-intensive, often requiring hybrid CPU-GPU computations to manage memory. A new technique, called \proj, is proposed to dynamically move the optimizer state between host and GPU memory, allowing for faster iterations and improving performance by up to 2.5 times compared to state-of-the-art approaches.

GPT-4o System Card [pdf]

GPT-4o is an advanced AI model that can process and generate text, audio, and images, trained end-to-end across multiple modalities, and can respond to audio inputs in under 320 milliseconds. It outperforms existing models in vision and audio understanding, and is also faster and cheaper than its predecessor GPT-4 Turbo, with significant improvements in non-English languages.

Code

Show HN: LlamaPReview – AI GitHub PR reviewer that learns your codebase

There is no text provided to summarize. Please provide the text you would like me to summarize.

Show HN: Modus, serverless framework for intelligent APIs powered by WebAssembly

Here is a summary of the text in a couple of sentences:

Modus is an open-source, serverless framework for building APIs powered by WebAssembly, designed to simplify integrating AI models, data, and business logic with sandboxed execution. It allows developers to write functions in various programming languages, such as AssemblyScript and Go, and deploy them as scalable endpoints with fast response times, making it suitable for applications that require sub-second response times.

Show HN: Lightweight browser automation powered by Claude 3.5 Sonnet

Cerebellum is a lightweight browser automation tool that uses a large language model (LLM) to accomplish user-defined goals on webpages by navigating a directed graph of webpages and executing keyboard and mouse actions. It simplifies web browsing to navigating a graph of nodes, where each node represents a webpage, and edges represent user actions.

Show HN: AgentServe – A open-source framework for hosting scalable AI agents

AgentServe is a lightweight framework for hosting and scaling AI agents, designed to be easy to use and integrate with existing projects and agent/Learnable Language Model (LLM) frameworks. It provides a standardized way to communicate with AI agents via a REST API, supports multiple agent frameworks, and offers optional task queuing for scalability.

Show HN: LLGTRT: TensorRT-LLM+Rust server w/ OpenAI-compat and Structured Output

The llgtrt project is a REST HTTP server that implements an OpenAI-compatible API using NVIDIA TensorRT-LLM and the llguidance library for constrained output. It supports regular completions and chat endpoints with JSON schema enforcement and full context-free grammars.