Saturday — January 11, 2025

Microsoft's new Phi-4 model gets four bug fixes and faster fine-tuning in Colab through Unsloth, a Rust-based tool named Kwaak runs teams of autonomous AI coding agents locally, and a token-by-token aggregation technique makes Differentially Private Retrieval-Augmented Generation (DP-RAG) possible.

News

I've acquired a new superpower

A 9-year-old girl on a German TV show demonstrated an impressive ability to quickly identify the differences between two seemingly identical images, leaving the author baffled. However, after discovering a technique involving crossing one's eyes to overlap the images, the author was able to replicate this skill and instantly spot the differences, feeling like they had gained a special superpower.

Phi-4 Bug Fixes

Unsloth has integrated Microsoft's Phi-4, a 14B-parameter model that performs on par with OpenAI's GPT-4o-mini, and has fixed four bugs to greatly increase the model's accuracy. Unsloth's version of Phi-4 also offers faster fine-tuning, lower memory usage, and longer context lengths, and it can be fine-tuned in their Colab notebook on Google's free 16GB Tesla T4 GPU.

41% of Employers Worldwide Say They'll Reduce Staff by 2030 Due to AI

A recent World Economic Forum survey found that 41% of employers worldwide expect to reduce their staff by 2030 because of the growing use of artificial intelligence, with jobs such as graphic designer and legal secretary particularly at risk. The survey nonetheless predicts net job growth over the next five years: 170 million new jobs created and 92 million displaced, for a net gain of 78 million.

Making Beautiful API Keys

The company created a custom package called uuidkey to generate beautiful and readable API keys, as they found existing solutions to be unattractive and lacking in functionality. The uuidkey package encodes UUIDv7 IDs using Crockford Base32 and adds dashes for aesthetics, resulting in a more readable and symmetrical key that is also chronologically sortable when stored decoded as UUIDs.
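The encoding idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the uuidkey package itself: the function names are hypothetical, and the layout (four 32-bit chunks, each written as 7 Crockford Base32 characters and joined with dashes) is an assumption about how the symmetry is achieved. Any `uuid.UUID` works as input; a UUIDv7 would simply make the decoded values sort by creation time.

```python
import uuid

# Crockford Base32 alphabet: digits plus letters, omitting I, L, O and U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_chunk(value: int) -> str:
    """Encode a 32-bit integer as 7 Crockford Base32 characters (zero-padded)."""
    chars = []
    for _ in range(7):
        chars.append(CROCKFORD[value & 0x1F])  # take the lowest 5 bits
        value >>= 5
    return "".join(reversed(chars))


def decode_chunk(text: str) -> int:
    """Inverse of encode_chunk for well-formed input."""
    value = 0
    for ch in text:
        value = (value << 5) | CROCKFORD.index(ch)
    return value


def encode_key(u: uuid.UUID) -> str:
    """Assumed layout: four 32-bit chunks, most significant first, dash-separated."""
    chunks = [(u.int >> shift) & 0xFFFFFFFF for shift in (96, 64, 32, 0)]
    return "-".join(encode_chunk(c) for c in chunks)


def decode_key(key: str) -> uuid.UUID:
    """Rebuild the UUID so it can be stored (and sorted) in its native form."""
    raw = 0
    for part in key.split("-"):
        raw = (raw << 32) | decode_chunk(part)
    return uuid.UUID(int=raw)
```

Decoding before storage is what keeps the sortability: the database sees ordinary UUIDs, and the dashed form exists only for display.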

Creates hyper-realistic voice clones from just 3 seconds of audio

AnyVoice offers AI voice cloning technology that can create hyper-realistic voice clones from just 3 seconds of audio, supporting languages such as English, Chinese, Japanese, and Korean. The process is straightforward, requiring users to record a short audio sample in a quiet environment, after which they can generate natural-sounding voice clones for various purposes, with guidelines and FAQs available on the website.

Research

Progress in Deep Learning: SGD Learns Parities Near the Computational Limit

Researchers have found that deep learning models can learn hard patterns such as $k$-sparse parities, with performance improving suddenly after roughly $n^{O(k)}$ training iterations. Theoretical analysis shows that this is not a random search process: the model gradually amplifies the sparse solution through a Fourier gap in the population gradient, so it makes continuous progress that traditional loss and error metrics do not reflect.
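For readers unfamiliar with the task, here is a minimal sketch of what a $k$-sparse parity dataset looks like, assuming the common convention of inputs in $\{-1, +1\}^n$ with the label given by the product of $k$ hidden coordinates. The function names are hypothetical and the paper's exact setup may differ.

```python
import random
from math import prod


def sparse_parity(x, support):
    """Label of a k-sparse parity: product of the +/-1 coordinates in `support`."""
    return prod(x[i] for i in support)


def sample_dataset(n, support, m, seed=0):
    """Draw m uniform points from {-1, +1}^n and label them with the parity."""
    rng = random.Random(seed)
    data = []
    for _ in range(m):
        x = [rng.choice((-1, 1)) for _ in range(n)]
        data.append((x, sparse_parity(x, support)))
    return data


# Example: n = 20 input bits, but only coordinates 3, 7 and 11 matter (k = 3).
dataset = sample_dataset(n=20, support=[3, 7, 11], m=5)
```

The learner sees only the pairs `(x, y)` and has to discover which $k$ of the $n$ coordinates determine the label.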

RAG with Differential Privacy

Retrieval-Augmented Generation (RAG) improves the quality of Large Language Model output by supplying fresh context, but it raises significant privacy concerns because confidential data in the retrieved documents can be exposed. This paper addresses the problem with differentially private token generation, which it presents as a viable approach to private RAG.

Learning how to think with Meta Chain-of-Thought

The Meta Chain-of-Thought (Meta-CoT) framework extends traditional Chain-of-Thought by explicitly modeling the underlying reasoning process, and can be produced through methods such as process supervision and reinforcement learning. This framework provides a roadmap for enabling more powerful and human-like reasoning in artificial intelligence, and raises open research questions regarding scaling laws, verifier roles, and novel reasoning algorithms.

The GAN is dead; long live the GAN - A Modern GAN Baseline

Researchers challenge the notion that GANs are difficult to train by introducing a new, principled approach to building GAN architectures, resulting in a simplified baseline model called R3GAN that outperforms state-of-the-art models on several datasets. The R3GAN model achieves this success by using a well-behaved regularized relativistic GAN loss that eliminates the need for empirical tricks and allows for the use of modern architectures.
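A relativistic loss scores each real sample against a paired fake sample instead of judging them separately. The sketch below shows that pairing in plain Python, assuming a softplus form; it is an illustration only, the function names are hypothetical, and the regularization terms the summary mentions are omitted because they require automatic differentiation.

```python
import math


def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def discriminator_loss(real_logits, fake_logits):
    """Small when each real logit exceeds its paired fake logit."""
    pairs = list(zip(real_logits, fake_logits))
    return sum(softplus(fake - real) for real, fake in pairs) / len(pairs)


def generator_loss(real_logits, fake_logits):
    """Mirror image: small when each fake logit exceeds its paired real logit."""
    pairs = list(zip(real_logits, fake_logits))
    return sum(softplus(real - fake) for real, fake in pairs) / len(pairs)
```

Because only the difference between paired logits matters, the discriminator is never rewarded simply for pushing all of its outputs up or down.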

Rewrite It in Rust: A Computational Physics Case Study

This study compares the performance of C++ and Rust in scientific computing by implementing a physics simulation in both languages, finding that Rust can offer up to a 5.6× performance increase. The researchers also created a parallel version of the Rust code, which further improved performance while being relatively easy to write safely, suggesting Rust as a viable alternative to C++ for high-performance computing.

Code

Show HN: Freeact – A Lightweight Library for Code-Action Based Agents

freeact is a lightweight library that enables language models to act as autonomous agents through executable code actions, providing a flexible approach to solving complex problems. The library allows agents to leverage the full power of the Python ecosystem, learn from environmental feedback, and store successful code actions as custom skills, making it suitable for developers and researchers who need fine-grained control over their agent implementations.
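The core loop behind code-action agents can be sketched without any agent library: run the code the model proposes, capture what it prints or raises, and hand that text back as feedback. The snippet below is a hypothetical illustration and does not use freeact's API. It also runs code unsandboxed with `exec`, which is only acceptable for trusted input.

```python
import contextlib
import io
import traceback


def run_code_action(code: str, namespace: dict) -> str:
    """Execute one code action and return its stdout (or traceback) as feedback."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)  # unsandboxed: trusted code only
    except Exception:
        buffer.write(traceback.format_exc())
    return buffer.getvalue()


# The namespace persists between actions, so a helper defined in one step
# remains available later, loosely mirroring the idea of stored skills.
namespace = {}
feedback = run_code_action("def double(x):\n    return 2 * x\nprint(double(21))", namespace)
```

In a full agent, the feedback string would be appended to the conversation so the model can correct errors or build on earlier results.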

Open Agents – Secure, Composable AI Agents

The OpenAgentSpec is a standard for defining and interacting with generative AI agents, aiming to provide a secure and consistent framework for their development and deployment. By establishing a formal standard, OpenAgentSpec seeks to democratize access to AI, enable secure and autonomous agent interactions, and facilitate collaboration among agents across different organizations and boundaries.

Show HN: a Rust-based coding agent

Kwaak is a tool for running a team of autonomous AI agents locally; the agents write code, improve test coverage, and update documentation in parallel. Powered by Swiftide, Kwaak can answer questions about code, find examples, write and execute code, and create pull requests, with the goal of automating code improvement tasks.

Show HN: Experiment with DSPy optimizers, track performance of LLM-features

LangWatch is a visual interface and complete LLM Ops platform for monitoring, experimenting, measuring, and improving LLM pipelines, built on Stanford's DSPy framework. It offers a range of features, including a drag-and-drop optimization studio, quality assurance tools, and monitoring and analytics capabilities, and can be used locally, in the cloud, or through self-hosting with commercial support.

Show HN: A simple implementation of Differentially Private RAG

Sarus DP-RAG implements the popular RAG method with differential privacy guarantees, addressing privacy concerns by aggregating information from multiple documents with a novel token-by-token aggregation technique. A technical report demonstrates the system's effectiveness with empirical results and code for evaluating it on synthetic medical data; to try it, clone the repository and follow the provided quick start instructions.
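To give a feel for what private token-by-token aggregation can look like, here is a sketch that uses the standard exponential mechanism over clipped per-document votes. It is a swapped-in textbook technique, not the aggregation described in the Sarus report, and all names are hypothetical. It assumes each retrieved document contributes a score in [0, 1] for every candidate token, so adding or removing one document changes any token's total by at most 1.

```python
import math
import random


def exponential_mechanism_probs(utilities, epsilon, sensitivity=1.0):
    """Sampling probabilities proportional to exp(epsilon * u / (2 * sensitivity))."""
    scaled = [epsilon * u / (2.0 * sensitivity) for u in utilities]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def private_next_token(per_doc_scores, epsilon, rng):
    """Pick one token index from per-document score vectors.

    per_doc_scores[d][t] is document d's score for candidate token t.
    Scores are clipped to [0, 1], giving the summed utility sensitivity 1.
    """
    vocab_size = len(per_doc_scores[0])
    utilities = [
        sum(min(max(doc[t], 0.0), 1.0) for doc in per_doc_scores)
        for t in range(vocab_size)
    ]
    probs = exponential_mechanism_probs(utilities, epsilon)
    return rng.choices(range(vocab_size), weights=probs, k=1)[0]


# Ten documents that all favor token 0 out of three candidates.
docs = [[0.9, 0.1, 0.0] for _ in range(10)]
token = private_next_token(docs, epsilon=1.0, rng=random.Random(0))
```

Each call spends its own `epsilon`, so the privacy cost of a full answer accumulates over the generated tokens; a smaller `epsilon` pushes the sampling toward uniform, and a larger one toward the documents' consensus.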