Saturday — October 19, 2024
Adobe's groundbreaking AI image rotation tool "Project Turntable" steals the show, an open-source Nvidia model outperforms GPT-4o on the Arena-Hard-Auto benchmark, and LLMD sets a new standard for analyzing longitudinal medical records.
News
Adobe's new image rotation tool is one of the most impressive AI tools we've seen
Adobe has unveiled a new AI-powered image rotation tool called "Project Turntable" that allows users to easily rotate 2D vector art in 3D while maintaining its original shape and design. The tool, created by Adobe research scientist Zhiqin Chen, uses AI to fill in gaps in the image and is one of the most impressive AI concepts showcased at Adobe's MAX conference.
FTC announces "click-to-cancel" rule making it easier to cancel subscriptions
The Federal Trade Commission (FTC) has announced a final "click-to-cancel" rule that makes it easier for consumers to end recurring subscriptions and memberships.
NotebookLM launches feature to customize and guide audio overviews
The latest update to NotebookLM, a tool powered by Gemini 1.5, introduces customizable Audio Overviews, allowing users to provide instructions for AI hosts and listen to audio while working within NotebookLM. Additionally, the NotebookLM Business pilot program is launched, offering enhanced features for businesses, universities, and organizations, prioritizing data privacy and security.
Kagi Update: AI Image Filter for Search Results
The Kagi image search feature includes an AI Image Filter to deliver high-quality, relevant search results by downranking AI-generated images and labeling them with a small badge or icon. The feature also allows users to filter AI-generated images, block websites with AI-generated content, and use search personalization to exclude unwanted content.
Efficient high-resolution image synthesis with linear diffusion transformer
Researchers have introduced Sana, a text-to-image framework that can efficiently generate high-resolution images up to 4096 × 4096 resolution at a remarkably fast speed, deployable on laptop GPUs. Sana achieves this through several core designs, including a deep compression autoencoder, linear diffusion transformer, decoder-only text encoder, and efficient training and sampling strategies, making it competitive with modern giant diffusion models while being 20 times smaller and 100+ times faster.
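Sana's linear diffusion transformer builds on linear attention. A rough NumPy sketch of that mechanism (an illustration, not Sana's code; the feature map phi used here is an assumption) shows why the cost scales linearly with sequence length rather than quadratically:

```python
import numpy as np

def linear_attention(q, k, v):
    """Linear attention: replace softmax(QK^T)V with phi(Q)(phi(K)^T V).
    The (d, d) intermediate is independent of sequence length N, so the
    total cost is O(N * d^2) instead of O(N^2 * d)."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1, kept positive
    q, k = phi(q), phi(k)
    kv = k.T @ v                    # (d, d): summed over all positions once
    z = q @ k.sum(axis=0)           # per-query normalizer
    return (q @ kv) / z[:, None]

rng = np.random.default_rng(0)
n, d = 1024, 32                     # long sequence, small head dimension
out = linear_attention(rng.normal(size=(n, d)),
                       rng.normal(size=(n, d)),
                       rng.normal(size=(n, d)))
print(out.shape)  # (1024, 32)
```

Because the key-value product is accumulated into a small (d, d) matrix, doubling the sequence length only doubles the work, which is what makes 4096 × 4096 token grids tractable.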
Research
LLMD: A Large Language Model for Interpreting Longitudinal Medical Records
LLMD is a large language model designed to analyze patient medical history based on their records, trained on a large corpus of records and tasks to make nuanced connections among them. It exhibits significant gains over other models, achieving state-of-the-art accuracy on medical knowledge benchmarks and outperforming alternatives on production tasks.
NGPT: Normalized Transformer with Representation Learning on the Hypersphere
The normalized Transformer (nGPT) architecture applies unit-norm normalization to all vectors, so the hidden state travels on the surface of a hypersphere. nGPT learns much faster than a baseline Transformer, cutting the number of training steps needed to reach the same accuracy by a factor of 4 to 20, depending on sequence length.
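A minimal NumPy sketch of the core idea (an illustration, not the nGPT implementation): hidden vectors are renormalized to unit length after each residual update, so the representation stays on the hypersphere throughout the network.

```python
import numpy as np

def normalize(x, eps=1e-8):
    """Project each row vector onto the unit hypersphere (L2 norm = 1)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

rng = np.random.default_rng(0)
hidden = normalize(rng.normal(size=(4, 16)))   # batch of unit vectors
update = 0.1 * rng.normal(size=(4, 16))        # a block's output (illustrative)
hidden = normalize(hidden + update)            # residual step, back onto the sphere

print(np.linalg.norm(hidden, axis=-1))         # all norms very close to 1.0
```

Keeping every vector at unit norm bounds the scale of activations, which is the property the paper credits for the faster, more stable training.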
Reducing the Transformer Architecture to a Minimum
The paper investigates simplifying the Transformer architecture by removing components such as Multi-Layer Perceptrons (MLPs) and collapsing projection matrices, without significantly affecting performance. The simplified architecture matches the original on popular computer vision benchmarks (MNIST and CIFAR-10) while reducing the number of parameters by up to 90%. The authors also note that the attention mechanism is itself nonlinear, making it a candidate replacement for the MLP in modeling relationships, though the details of this are left open.
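A toy sketch of what such a simplification can look like (my own illustration, not the paper's code): the separate query and key projections are collapsed into a single matrix, and the MLP sub-layer is dropped entirely, leaving an attention-only block with a residual connection.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_only_block(x, Wqk, Wv):
    """One simplified block: no MLP, and the query/key projections are
    collapsed into the single matrix Wqk (Q K^T = x Wq Wk^T x^T = x Wqk x^T).
    Wv stands in for the combined value/output projection."""
    scores = x @ Wqk @ x.T / np.sqrt(x.shape[-1])
    return x + softmax(scores) @ x @ Wv   # residual connection kept

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))               # 5 tokens, embedding dim 8
Wqk = 0.1 * rng.normal(size=(8, 8))       # collapsed query-key matrix
Wv = 0.1 * rng.normal(size=(8, 8))        # collapsed value/output matrix
out = attention_only_block(x, Wqk, Wv)
print(out.shape)  # (5, 8)
```

Collapsing two d × d projections into one halves those parameters outright; removing the MLP, which usually holds most of a block's weights, accounts for the bulk of the reported savings.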
Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence
Model Swarms is a collaborative search algorithm that uses swarm intelligence to adapt large language models (LLMs) by searching the combined weight space of multiple LLM experts. This approach offers tuning-free model adaptation, works in low-data regimes, and improves over 12 model composition baselines by up to 21.0% across tasks and contexts.
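A toy particle-swarm sketch of the idea (entirely hypothetical: the real method moves actual LLM weight checkpoints, and the `utility` function here is a stand-in for a validation-set score):

```python
import numpy as np

rng = np.random.default_rng(1)

def utility(w):
    """Stand-in for a validation score of a model with flattened weights w."""
    return -np.sum((w - 0.5) ** 2)

# Each "particle" is the (flattened) weight vector of one expert model.
experts = [rng.normal(size=16) for _ in range(4)]
velocity = [np.zeros(16) for _ in experts]
personal_best = list(experts)
global_best = max(experts, key=utility)
init_score = utility(global_best)

for _ in range(50):
    for i, w in enumerate(experts):
        r1, r2 = rng.random(2)
        velocity[i] = (0.7 * velocity[i]                      # inertia
                       + 1.5 * r1 * (personal_best[i] - w)    # pull to own best
                       + 1.5 * r2 * (global_best - w))        # pull to swarm best
        experts[i] = w + velocity[i]
        if utility(experts[i]) > utility(personal_best[i]):
            personal_best[i] = experts[i]
    global_best = max(personal_best, key=utility)

final_score = utility(global_best)
print(f"utility improved from {init_score:.2f} to {final_score:.2f}")
```

The search is gradient-free: experts move through weight space guided only by each other's utility, which is why the method needs no fine-tuning data beyond whatever the utility function evaluates.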
From Commands to Prompts: LLM-Based Semantic File System for AIOS
The proposed LLM-based semantic file system (LSFS) enables users to interact with files through natural language prompts, facilitating semantic file management and improving usability. LSFS offers significant improvements over traditional file systems in terms of user convenience, diversity of functions, and accuracy and efficiency of file operations, with the integration of LLM enabling more intelligent file management tasks.
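A toy sketch of prompt-driven file operations (entirely illustrative, not the LSFS API: the real system parses prompts with an LLM, whereas the hypothetical `handle_prompt` below uses a regex rule as a stand-in for that parsing step):

```python
import re
from pathlib import Path

def handle_prompt(prompt: str, root: Path) -> list[str]:
    """Map a natural-language prompt to a file operation.
    Only one rule is implemented here: keyword search over text files."""
    m = re.match(r"find files about (\w+)", prompt.lower())
    if m:
        keyword = m.group(1)
        return [str(p) for p in sorted(root.rglob("*.txt"))
                if keyword in p.read_text().lower()]
    raise ValueError("unsupported prompt")

# Build a tiny demo file tree, then query it in natural language.
root = Path("demo_fs")
root.mkdir(exist_ok=True)
(root / "notes.txt").write_text("meeting about budget planning")
(root / "todo.txt").write_text("buy groceries")

result = handle_prompt("find files about budget", root)
print(result)
```

Swapping the regex for an LLM call is what turns this from keyword matching into the semantic file management the paper describes, e.g. retrieving files by topic rather than by exact string.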
Code
AI PCs Aren't Good at AI: The CPU Beats the NPU
This benchmarking project tests the performance of Qualcomm's NPU on a Microsoft Surface tablet running Windows on a Qualcomm Arm-based SoC. By measuring the latency and computational throughput of a simple model on both the CPU and the NPU, the author finds the NPU falls far short of the advertised 45 Teraops/s, achieving only about 1.3% of that figure, though NPU performance improves when using a quantized model with eight-bit inputs and outputs.
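The CPU side of such a measurement can be approximated with a simple timing loop. A sketch (illustrative only, not the project's actual harness): time repeated matrix multiplies and convert the elapsed time into achieved operations per second, which is then comparable against a vendor's Teraops/s claim.

```python
import time
import numpy as np

def benchmark_matmul(dim=512, iters=50):
    """Time repeated (dim x dim) matrix multiplies and report achieved ops/s.
    Each matmul costs roughly 2 * dim**3 operations (one multiply + one add
    per multiply-accumulate)."""
    a = np.random.randn(dim, dim).astype(np.float32)
    b = np.random.randn(dim, dim).astype(np.float32)
    a @ b  # warm-up so first-call overhead doesn't skew the timing
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    elapsed = time.perf_counter() - start
    return 2 * dim**3 * iters / elapsed

rate = benchmark_matmul()
print(f"achieved throughput: {rate / 1e9:.1f} GFLOP/s")
```

The gap between this measured figure and a marketing number is exactly the kind of discrepancy the benchmark highlights for the NPU.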
Nvidia Outperforms GPT-4o with Open Source Model
Nvidia's open-source model tops GPT-4o on Arena-Hard-Auto, an automatic evaluation tool for instruction-tuned LLMs containing 500 challenging user queries sourced from Chatbot Arena. Among popular open-ended LLM benchmarks, Arena-Hard-Auto has the highest correlation and separability with Chatbot Arena rankings.
Show HN: Arch – an intelligent prompt gateway built on Envoy
Arch is an intelligent Layer 7 gateway designed to protect, observe, and personalize LLM applications with your APIs. It handles critical tasks like detecting and rejecting jailbreak attempts, calling backend APIs, and managing observability of prompts and LLM interactions in a centralized way.
Arch is built on Envoy Proxy and features function calling for fast Agentic and RAG apps, prompt guardrails to prevent jailbreak attempts, traffic management, and standards-based observability. It provides a quickstart guide for setting up and integrating Arch into generative AI applications.
Out-of-Distribution Machine Learning
This repository provides a comprehensive resource for out-of-distribution (OOD) detection, robustness, and generalization in machine learning/deep learning. It aims to address critical challenges in modern AI systems that encounter data differing from their training distribution, leading to unexpected failures.
Show HN: I built an open-source Apple Intelligence-like Writing Tool for Windows
Writing Tools is a free, open-source, and system-wide grammar assistant for Windows that uses AI to improve writing with features like proofreading, rewriting, and summarizing. It is powered by Google's Gemini 1.5 Flash model and offers customizable hotkeys, dark mode support, and the ability to translate text across multiple languages.