Tuesday January 28, 2025

Nvidia suffers a historic $589B single-day market-value loss in the wake of DeepSeek, while Janus Pro advances text-to-image generation and researchers unveil uSpectre, a new class of microcode-based transient execution attacks.

News

We're bringing Pebble back

A new team, led by Pebble's founder, is working on a new smartwatch that runs open source PebbleOS, aiming to recreate the original Pebble's features and long battery life. The project is made possible by Google's recent open-sourcing of PebbleOS, and interested individuals can sign up to potentially get one of the new watches if the project moves forward.

Nvidia’s $589B DeepSeek rout

Nvidia's stock fell roughly 17% after DeepSeek's inexpensively trained R1 model raised doubts about future demand for high-end AI chips, erasing $589 billion in market value, the largest single-day loss of market capitalization by any company in history.

Machine Learning in Production (CMU Course)

The Machine Learning in Production course at CMU covers the entire lifecycle of building, deploying, and maintaining software products with machine-learned models, including responsible AI and MLOps. The course is designed for students with data science experience and basic programming skills, and aims to establish a working relationship between software engineers and data scientists to build robust and responsible ML-enabled systems.

I trusted an LLM, now I'm on day 4 of an afternoon project

The author is working on a side project called "Deskthang", a device that displays important notifications on a desk, aiming to solve their own problem of staying focused while working. They're using this project to brush up on their hardware skills, test the capabilities of AI tools, and explore new technologies like the Zig programming language, while trying to determine whether AI can replace their job as a developer.

DeepSeek-R1 with Dynamic 1.58-bit Quantization

The DeepSeek-R1 model, a rival to OpenAI's o1 reasoning model, has been quantized to reduce its size by 80%, from 720GB to 131GB, while maintaining functionality. The dynamic quantization method, which selectively keeps certain layers at higher bit widths, allows the model to run with lower VRAM and RAM; the 1.58-bit version achieves around 140 tokens per second and requires at least 80GB of combined VRAM and RAM for good performance.
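
The two ingredients, a per-layer bit-width policy and ternary (1.58-bit, since log2(3) ≈ 1.58) weight codes, can be sketched in a few lines. This is an illustrative toy, not the actual quantization scheme or layer policy used for DeepSeek-R1; the layer-name rule and the per-tensor scale are assumptions for demonstration.

```python
import numpy as np

def choose_bits(layer_name):
    """Selective bit allocation: keep sensitive layers at higher precision,
    push the bulk of the weights down to 1.58 bits.
    (Hypothetical policy for illustration, not the real configuration.)"""
    if "attn" in layer_name or "embed" in layer_name:
        return 4          # sensitive layers stay at 4-bit
    return 1.58           # everything else goes ternary

def quantize_ternary(w):
    """1.58-bit (ternary) quantization: each weight becomes one of
    {-s, 0, +s}, where s is a per-tensor scale."""
    s = np.mean(np.abs(w)) + 1e-8
    q = np.clip(np.round(w / s), -1, 1)   # ternary codes in {-1, 0, 1}
    return q, s

def dequantize(q, s):
    return q * s

w = np.array([0.9, -0.05, -1.2, 0.4])
q, s = quantize_ternary(w)
w_hat = dequantize(q, s)      # coarse reconstruction of the weights
```

Small weights collapse to zero while large weights keep their sign, which is why only the less sensitive layers can tolerate this precision.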

Research

RL and Transformer = a General-Purpose Problem Solver

A pre-trained transformer fine-tuned with reinforcement learning can develop the ability to solve new, unseen problems, known as In-Context Reinforcement Learning (ICRL), with remarkable sample efficiency and robustness. This meta-learner excels in both in-distribution and out-of-distribution environments, and can iteratively improve upon its own solutions, making it a powerful general-purpose problem solver.

Eliza: A Web3 Friendly AI Agent Operating System

AI Agents, powered by large language models, have expanded capabilities due to various plugins, but lack a framework to integrate web3 applications. The proposed Eliza framework is an open-source, web3-friendly solution that allows for effortless deployment of web3 applications and seamless integration with blockchain data and smart contracts.

Analyzing and Exploiting Branch Mispredictions in Microcode [pdf]

Researchers have discovered uSpectre, a new class of transient execution attacks that exploit microcode branch mispredictions to leak sensitive data, which also encompasses many previously known Spectre and Meltdown variants. The discovery of uSpectre has led to the identification of new attacks and the development of a defense mechanism called uSLH to mitigate these vulnerabilities.

Consilience: A Holistic Measure of Goodness-of-Fit

The Consilience measure (C) is a new goodness-of-fit metric that evaluates how well a model's results match observed data, returning a single value between -∞ and 1, with 1 indicating a perfect fit and values near 0 or less indicating poor fit. The measure can accommodate complex systems with multiple response types and can be calculated using provided Excel templates, allowing for semi-automatic computation and statistical assessment of models.

Autonomy-of-Experts Models (ArXiv)

Mixture-of-Experts (MoE) models typically use a router to assign tokens to expert modules, but this separation can lead to suboptimal expert selection and ineffective learning. The proposed Autonomy-of-Experts (AoE) paradigm addresses this issue by allowing experts to autonomously select themselves based on their internal activations, resulting in improved expert selection and effective learning, and outperforming traditional MoE models with comparable efficiency.
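
The core idea, replacing a router with self-selection based on each expert's own activations, can be sketched as follows. This is a minimal toy of the AoE selection rule, not the paper's exact architecture; the expert shapes, ReLU nonlinearity, and norm-based mixing weights are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

class Expert:
    """Toy feed-forward expert. In AoE the expert's own pre-activation
    norm, not a separate router, decides whether it processes a token."""
    def __init__(self, d_model, d_hidden):
        self.w_in = rng.normal(0, 0.02, (d_model, d_hidden))
        self.w_out = rng.normal(0, 0.02, (d_hidden, d_model))

    def pre_activation(self, x):
        return x @ self.w_in

    def forward_from(self, h):
        return np.maximum(h, 0.0) @ self.w_out   # ReLU, then project back

def autonomy_of_experts(x, experts, top_k=2):
    """Each expert computes its own pre-activation; the experts with the
    largest activation norms 'volunteer' to process the token."""
    pre = [e.pre_activation(x) for e in experts]
    norms = np.array([np.linalg.norm(h) for h in pre])
    chosen = np.argsort(norms)[-top_k:]            # top-k by activation norm
    weights = norms[chosen] / norms[chosen].sum()  # normalize as mixing weights
    return sum(w * experts[i].forward_from(pre[i])
               for w, i in zip(weights, chosen))

x = rng.normal(size=8)
experts = [Expert(8, 16) for _ in range(4)]
y = autonomy_of_experts(x, experts, top_k=2)
```

Because selection is derived from the same computation the expert would perform anyway, there is no separate router whose parameters can drift out of sync with what the experts actually learn.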

Code

DeepSeek releases Janus Pro, a text-to-image generator [pdf]

The Janus-Series is a collection of unified multimodal understanding and generation models, including Janus, JanusFlow, and Janus-Pro, which can perform tasks such as text-to-image generation and multimodal understanding. These models have achieved significant advancements in their respective fields, with Janus-Pro being the most advanced version, incorporating optimized training strategies, expanded training data, and larger model sizes to enhance its capabilities.

Show HN: Voice Cloning and Multilingual TTS in One Click (Windows)

Voice-Pro is a powerful AI-powered web application that offers a range of multimedia content processing features, including YouTube video downloading, speech recognition, translation, and text-to-speech, with support for over 100 languages. The tool provides an all-in-one solution for content creators, researchers, and multilingual communication professionals, with advanced features such as zero-shot voice cloning, professional vocal isolation, and instant translation across multiple languages.

Show HN: Ollama server discovery tool (finds public LLM instances)

The Public Ollama Server Finder is a network scanning tool built with Rust that discovers and enumerates accessible Ollama servers across specified network ranges, providing detailed reporting and model detection capabilities. The tool is designed for educational and authorized security testing purposes only, and users must confirm they have authorization for all target networks and accept full responsibility for their actions.
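
The tool itself is written in Rust, but the core check is simple enough to sketch in Python: an open Ollama server answers GET requests on its real `/api/tags` endpoint (default port 11434) with a JSON list of installed models. As with the original tool, run this only against hosts you are authorized to test.

```python
import json
import urllib.request

def probe_ollama(host, port=11434, timeout=3.0):
    """Check whether a host exposes an open Ollama server by querying
    its /api/tags endpoint, which lists installed models.
    Only use against hosts you are authorized to test."""
    url = f"http://{host}:{port}/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = json.loads(resp.read())
    except (OSError, ValueError):
        return None                  # closed port, timeout, or non-JSON reply
    models = [m.get("name", "?") for m in data.get("models", [])]
    return {"host": host, "port": port, "models": models}

# Example, against a local instance you own:
# print(probe_ollama("127.0.0.1"))
```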

Show HN: emcee – connect agents to APIs (via MCP)

Emcee is a tool that provides a Model Context Protocol (MCP) server for any web application with an OpenAPI specification, allowing users to connect external tools and data services to apps like Claude Desktop. By installing and configuring emcee, users can enable their apps to access and utilize various tools and data sources, such as weather information, through a standardized MCP interface.

Show HN: Never train another ML model again

FlashLearn is a Python toolkit that simplifies the use of large language models (LLMs) for machine learning tasks, providing an end-to-end solution for tasks such as classification, summarization, and rewriting. It enables reliable chaining and storage of tasks at scale, with features like concurrency, rate limiting, and cost estimation, allowing users to process data in JSON format without needing to train their own models.
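
The "never train" workflow can be sketched as a pipeline that turns each JSON row into a prompt and records the LLM's answer as the label. This is an illustrative pipeline in the FlashLearn spirit, not its actual API; `fake_llm` is a stand-in so the sketch runs offline, where the real toolkit would call a hosted model.

```python
import json

def fake_llm(prompt):
    """Stand-in for a real LLM call. Here: a trivial keyword rule so the
    sketch runs offline; a real setup would query an LLM provider."""
    return "positive" if "love" in prompt.lower() else "negative"

def classify_rows(rows, labels, llm=fake_llm):
    """Zero-training classification: each JSON row becomes a prompt and
    the LLM's answer becomes its label. (Illustrative, not FlashLearn's API.)"""
    results = []
    for row in rows:
        prompt = (f"Classify the following text as one of {labels}.\n"
                  f"Text: {row['text']}\nLabel:")
        results.append({**row, "label": llm(prompt)})
    return results

rows = [{"id": 1, "text": "I love this keyboard"},
        {"id": 2, "text": "Broke after two days"}]
out = classify_rows(rows, labels=["positive", "negative"])
print(json.dumps(out, indent=2))
```

Concurrency, rate limiting, and cost estimation then reduce to wrapping the per-row LLM call, which is exactly the layer such toolkits provide.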