Saturday — October 26, 2024
Meta's quantized Llama models boost speed and reduce size for mobile, Leopard tackles text-rich multi-image tasks using adaptive encoding, and Skyvern automates browser workflows with AI agents.
News
Quantized Llama models with increased speed and a reduced memory footprint
Meta has open-sourced quantized versions of its Llama 3.2 1B and 3B models, which offer a reduced memory footprint, faster on-device inference, and improved accuracy. These models deliver a 2-4x speedup and an average 56% reduction in model size compared to the original format, making them suitable for deployment on resource-constrained hardware such as smartphones.
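The core idea of quantization can be sketched in a few lines. This is a deliberately minimal symmetric int8 scheme, not Meta's actual recipe (their release reportedly combines quantization-aware training with post-training methods); it only illustrates where the size reduction comes from.

```python
# Minimal sketch of symmetric int8 weight quantization. Each float32 weight
# (4 bytes) becomes an int8 (1 byte) plus one shared scale factor per tensor.

def quantize_int8(weights):
    """Map float weights to int8 values plus a single scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
# q == [42, -127, 5, 90]: 1 byte per value instead of 4, so roughly a 4x
# reduction per quantized tensor, before scales and any unquantized layers.
```

Real deployments quantize per-channel or per-group and keep sensitive layers in higher precision, which is why the reported average reduction is 56% rather than a full 4x.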
Smartphone buyers meh on AI, care more about battery life
A quarter of smartphone owners don't find AI features helpful, and nearly half are reluctant to pay a monthly subscription fee for AI capabilities. The biggest motivators for upgrading a phone are longer battery life, more storage, and better camera features, with AI integrations being the least popular reason.
Cerebras Inference now 3x faster: Llama3.1-70B breaks 2,100 tokens/s
Cerebras has tripled the speed of its inference service, with Llama 3.1-70B now running at over 2,100 tokens per second. Cerebras specializes in AI supercomputers and offers AI model services, cloud solutions, and software used across industries such as health, energy, government, and financial services, with adopters including the Mayo Clinic and GlaxoSmithKline.
OSI readies controversial open-source AI definition
The Open Source Initiative (OSI) is working on defining Open Source AI, with its board set to vote on the Open Source AI Definition (OSAID) on October 27. The proposed definition has been criticized by some in the open-source community for setting the bar too low, potentially undoing decades of work to cajole vendors into adhering to the original Open Source Definition (OSD).
Throw more AI at your problems
The authors of RunLLM propose a "fundamental theorem of AI applications" that any problem can be solved by an extra LLM call, suggesting that throwing more AI at problems is a good heuristic. They advocate for building "compound AI systems" by breaking down problems into manageable components and using a combination of techniques, including RAG and fine-tuning, to create more reliable, cost-effective, and high-quality AI applications.
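The "one more LLM call" heuristic can be made concrete with a toy pipeline. Everything below is hypothetical scaffolding: `call_llm` is a stand-in for any real model endpoint, and `retrieve` is a naive keyword matcher standing in for a real RAG retriever.

```python
# Hypothetical sketch of a "compound AI system": the problem is decomposed
# into stages (retrieval, drafting, verification), and the verification
# stage is literally "an extra LLM call" over the draft answer.

def call_llm(prompt: str) -> str:
    # Placeholder: a real system would call a model API here.
    return f"<answer to: {prompt}>"

def retrieve(query: str, docs: list[str]) -> list[str]:
    # Naive keyword retrieval standing in for a real retriever.
    return [d for d in docs if any(w in d.lower() for w in query.lower().split())]

def answer(question: str, docs: list[str]) -> str:
    context = retrieve(question, docs)
    draft = call_llm(f"Context: {context}\nQuestion: {question}")
    # The extra LLM call: a verification/refinement pass over the draft.
    return call_llm(f"Check and refine this draft: {draft}")

docs = ["Llama models support quantization.", "Batteries drain fast."]
print(answer("What do Llama models support?", docs))
```

The design point is that each stage is small, testable, and swappable, which is what makes the compound system more reliable than one monolithic prompt.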
Research
Leopard: A Vision Language Model for Text-Rich Multi-Image Tasks
Current multimodal large language models struggle with tasks involving multiple text-rich images due to a lack of high-quality training data and difficulty balancing image resolution with visual feature sequence length. To address this, the proposed model, Leopard, uses a curated dataset and an adaptive encoding module to optimize image processing and improve performance in text-rich, multi-image evaluations.
The Fair Language Model Paradox
Large Language Models' training dynamics at the token level are not well understood, with evaluation typically relying on aggregated batch-level metrics that overlook token-level biases. Weight decay, a common training technique, introduces performance biases that disproportionately degrade low-frequency tokens, which make up the majority of the token distribution in most languages.
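The mechanism can be illustrated with a toy simulation (this is an illustration of the claimed effect, not the paper's experiment): a token's embedding only receives a gradient update on steps where the token actually occurs, but weight decay shrinks every embedding on every step, so rare tokens spend most of their time decaying.

```python
# Toy model: an embedding's magnitude is "refreshed" to 1.0 by a gradient
# step whenever its token appears, but weight decay shrinks it every step.
# Rare tokens see long gaps between refreshes, so they decay further.

def train_norm(freq: float, steps: int = 1000, decay: float = 0.01) -> float:
    """Final embedding magnitude for a token appearing with probability `freq`."""
    norm = 1.0
    for step in range(steps):
        norm *= (1.0 - decay)              # weight decay applies every step
        if step % max(1, int(1 / freq)) == 0:
            norm = 1.0                     # gradient step refreshes the embedding
    return norm

frequent = train_norm(freq=0.5)   # seen every other step
rare = train_norm(freq=0.01)      # seen once per 100 steps
print(frequent, rare)             # the rare token's embedding ends up far smaller
```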
We discovered a way to measure LLM bias while building a recruitment tool
A study examined biases in candidate interview reports generated by large language models, finding that while anonymization can reduce biases, its effectiveness varies across models and bias types. The study suggests careful LLM selection and best practices to minimize bias in AI applications, promoting fairness and inclusivity.
Remote Timing Attacks on Efficient Language Model Inference
Researchers have discovered a vulnerability in language model deployments that lets an attacker who monitors the timing of encrypted network traffic infer sensitive information about a user's conversation. This can be exploited to learn the topic of a conversation, distinguish between specific messages, or even recover personally identifiable information (PII) such as phone numbers or credit card numbers.
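The core of the side channel can be sketched simply (this is an illustrative simplification, not the paper's attack): streamed responses often emit one encrypted packet per generated token, so an eavesdropper who sees only packet timestamps can count tokens and match that length against candidate messages.

```python
# Hedged sketch: encryption hides packet contents but not packet timing,
# so response length (in tokens) leaks to a passive network observer.

def observed_token_count(packet_times: list[float]) -> int:
    # Assumes one packet per generated token, as in naive streaming.
    return len(packet_times)

def guess_message(packet_times: list[float], candidates: list[str]) -> str:
    n = observed_token_count(packet_times)
    # Pick the candidate whose (whitespace) token count is closest.
    return min(candidates, key=lambda m: abs(len(m.split()) - n))

times = [0.01 * i for i in range(7)]  # 7 packets observed on the wire
candidates = [
    "Your card number is 4111 1111 1111 1111",  # 8 tokens
    "Yes",                                       # 1 token
]
print(guess_message(times, candidates))
```

Real attacks exploit finer-grained timing signals (e.g. variations introduced by efficient inference techniques), but the same principle applies: metadata survives encryption.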
Universal optimality of Dijkstra via beyond-worst-case heaps
This paper proves that Dijkstra's shortest-path algorithm is universally optimal when combined with a new, efficient heap data structure that takes advantage of locality in heap operations. The heap's working-set property ensures that extracting recently added elements is cheaper, and this property is sufficient to guarantee universal optimality for the problem of ordering vertices by their distance from the source vertex.
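For reference, the base algorithm the paper builds on looks like this; the sketch below uses Python's standard binary heap, whereas the paper's result comes from replacing it with a heap that has the working-set property.

```python
import heapq

# Textbook Dijkstra with a binary heap. The paper's contribution is the
# heap, not this loop: with a working-set heap, extracting recently
# inserted keys is cheaper, which yields universal optimality.

def dijkstra(graph: dict, source) -> dict:
    """graph: node -> list of (neighbor, weight); returns shortest distances."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

g = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
print(dijkstra(g, "a"))  # {'a': 0, 'b': 1, 'c': 3}
```

Note that the pop order here is exactly the "ordering vertices by distance from the source" that the universal-optimality result concerns.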
Code
Launch HN: Skyvern (YC S23) – open-source AI agent for browser automations
Skyvern is a tool that automates browser-based workflows using Large Language Models (LLMs) and computer vision, allowing users to automate manual tasks on a large number of websites without writing custom code. It uses a swarm of agents to comprehend a website, plan and execute actions, and can operate on websites it's never seen before, making it resistant to website layout changes and able to reason through complex interactions.
Show HN: Srcbook – Self-hosted alternative to AI app builders
Srcbook is a TypeScript-centric app development platform that uses AI as a pair-programmer to create and iterate on web apps quickly. It offers features such as an AI app builder, interactive notebooks, and local execution with a web interface.
Show HN: Rust based AWS Lambda Logs Viewer (TUI)
AWS Lambda Logs Viewer is a terminal-based application for viewing AWS Lambda function logs across multiple profiles and regions. It features real-time log filtering and search, fast navigation with keyboard shortcuts, and caching for improved performance.
Show HN: Phidata – Build AI Agents with memory, knowledge, tools and reasoning
Phidata is a framework for building agentic systems that can perform various tasks, such as searching the web, querying financial data, and working together as a team. It provides a range of features, including memory, knowledge, tools, and reasoning, and can be used to build intelligent agents that can interact with users through a beautiful UI.
Vulnhuntr: Autonomous AI finds first 0-day vulnerabilities
Vulnhuntr is a tool that uses Large Language Models (LLMs) and static code analysis to identify remotely exploitable vulnerabilities in codebases. It can automatically create and analyze entire code call chains to detect complex vulnerabilities that traditional static code analysis tools may miss.