Wednesday — January 1, 2025
DeepSeek outpaces OpenAI's o1 on reasoning benchmarks, activation engineering steers LLM personalities, and AgentMark brings Markdown-based readability to LLM development.
News
Deepseek: The quiet giant leading China’s AI race
DeepSeek, a Chinese AI startup, has made significant strides in the field, beating OpenAI's o1 model on multiple reasoning benchmarks and sparking a price war in China with its low API rates. Backed by High-Flyer, a top Chinese quantitative hedge fund, DeepSeek focuses on foundational research, has committed to open-sourcing its models, and names Artificial General Intelligence (AGI) as its ultimate goal.
Coconut by Meta AI – Better LLM Reasoning with Chain of Continuous Thought?
Researchers at Meta have developed Chain of Continuous Thought (Coconut), a method that lets large language models reason in a continuous latent space rather than in word tokens: during reasoning steps, the model's last hidden state is fed back as the next input embedding instead of being decoded into text. Inspired by the observation that humans don't always translate their thoughts into words while reasoning, this approach lets models learn representations of reasoning steps that may be more effective than those expressible in human language.
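As a rough illustration of the idea (not the paper's training setup), the sketch below runs a few latent "thought" steps with an off-the-shelf Hugging Face model by feeding the final hidden state back in as the next input embedding; gpt2 is only a stand-in here and has not been trained to use such latent thoughts.

```python
# Minimal sketch of latent-space reasoning in the spirit of Coconut: during
# "thought" steps, no token is decoded; the last hidden state is appended as
# the next input embedding. gpt2 is an illustrative stand-in, not the paper's model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "Question: Alice has 3 apples and buys 4 more. How many apples now? Thought:"
input_ids = tok(prompt, return_tensors="pt").input_ids
embeds = model.get_input_embeddings()(input_ids)             # (1, seq_len, hidden)

num_latent_thoughts = 4                                      # continuous reasoning steps
with torch.no_grad():
    for _ in range(num_latent_thoughts):
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        last_hidden = out.hidden_states[-1][:, -1:, :]       # final layer, last position
        embeds = torch.cat([embeds, last_hidden], dim=1)     # feed it back, no decoding

    # Switch back to ordinary token decoding for the visible answer.
    logits = model(inputs_embeds=embeds).logits
    next_token = logits[:, -1, :].argmax(dim=-1)
print(tok.decode(next_token[0]))
```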
Show HN: Watch 3 AIs compete in real-time stock trading
Three AI traders have made investments in Tilray (TLRY), Grayscale Bitcoin Trust (GBTC), and SMX, citing growth potential in the cannabis and beverage industries, the cryptocurrency market, and supply chain traceability technology. The traders expect catalysts such as federal legalization, institutional adoption of Bitcoin, and new partnerships to drive growth and appreciation in these investments over the next few months.
T2x – a CLI tool for AI-first text operations
A new open-source CLI tool called t2x, short for "text to whatever," uses language models for a range of text operations. The tool is not yet available on GitHub but is expected to be shared over the holidays.
Brian Eno: AI's Walking Dog
Brian Eno argues that the development of AI is being driven by profit rather than the public good, resulting in a technology that inverts the value of the creative process and prioritizes efficiency over human connection. He believes that AI's lack of intentionality and provenance, as well as its reliance on limited and biased data, undermine its potential to create meaningful and authentic art.
Research
Identifying and Manipulating LLM Personality Traits via Activation Engineering
Researchers are exploring "activation engineering" to modify the personality of large language models (LLMs): steering directions are added to the models' internal activations so that personality traits can be adjusted at inference time, without retraining. Building on previous work on activation steering, the study also aims to improve LLM interpretability and examines the ethical implications of this kind of control.
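The underlying recipe is activation steering: derive a direction in activation space from a pair of contrastive prompts and add it to one layer's hidden states during generation. The sketch below is illustrative only; the model (gpt2), layer index, scaling factor, and trait prompts are assumptions for demonstration, not details taken from the paper.

```python
# Minimal sketch of activation steering for a personality trait: build a steering
# vector from two contrastive prompts and add it to one transformer block's output
# via a forward hook. All specific choices here are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
layer_idx, alpha = 6, 4.0                      # assumed layer and steering strength

def last_token_hidden(prompt):
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        hs = model(ids, output_hidden_states=True).hidden_states
    return hs[layer_idx][:, -1, :]             # activation at the chosen layer

# Steering vector: "cheerful" direction minus "gloomy" direction.
steer = last_token_hidden("I am an extremely cheerful, upbeat assistant.") \
      - last_token_hidden("I am an extremely gloomy, pessimistic assistant.")

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states.
    return (output[0] + alpha * steer,) + output[1:]

handle = model.transformer.h[layer_idx].register_forward_hook(add_steering)
ids = tok("How do you feel about Mondays?", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=30, do_sample=False)
handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```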
Unifying Generative and Dense Retrieval for Sequential Recommendation
Sequential dense retrieval models rank items for a user by taking the inner product between a learned user representation and each item embedding, which means storing an embedding for every item and therefore memory that grows with the catalog. LIGER, a hybrid model, combines sequential dense retrieval with generative retrieval, narrowing the performance gap between the two paradigms and improving cold-start item recommendation while keeping retrieval efficient.
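To make the memory trade-off concrete, here is a toy sketch of the dense-retrieval scoring step (not LIGER itself); catalog size and embedding dimension are illustrative.

```python
# Toy dense retrieval: one stored embedding per item, ranking by inner product.
# Memory grows linearly with the catalog (100k items at d=128 in float32 is ~50 MB;
# catalogs with millions of items quickly reach gigabytes).
import numpy as np

num_items, dim = 100_000, 128
rng = np.random.default_rng(0)

item_emb = rng.standard_normal((num_items, dim)).astype(np.float32)  # item table
user_emb = rng.standard_normal(dim).astype(np.float32)               # from a sequence encoder

scores = item_emb @ user_emb            # inner-product relevance for every item
top_k = np.argsort(-scores)[:10]        # recommend the 10 highest-scoring items
print(top_k)
```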
Mulberry: Empowering MLLM with o1-like Reasoning
Researchers developed Collective Monte Carlo Tree Search (CoMCTS), a method that lets multimodal large language models (MLLMs) learn to search for and walk through step-by-step reasoning paths. They used CoMCTS to construct Mulberry-260k, a multimodal dataset of reasoning trajectories, and trained the Mulberry models on it, demonstrating superior performance on a range of benchmarks.
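Setting the paper's specifics aside, the tree-search loop at the heart of such methods looks roughly like the toy sketch below, where the placeholder propose/evaluate functions stand in for the group of models that CoMCTS queries collectively.

```python
# Toy Monte Carlo Tree Search over reasoning steps: select with UCB, expand
# candidate next steps, evaluate, and back up values. propose_steps/evaluate
# are placeholders for the collective model calls in CoMCTS.
import math, random

class Node:
    def __init__(self, steps, parent=None):
        self.steps, self.parent = steps, parent
        self.children, self.visits, self.value = [], 0, 0.0

def propose_steps(steps):
    return [steps + [f"step-{len(steps)}-{i}"] for i in range(3)]   # placeholder proposals

def evaluate(steps):
    return random.random()                                          # placeholder scoring

def ucb(child, parent_visits, c=1.4):
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def search(root, iterations=200, max_depth=4):
    for _ in range(iterations):
        node = root
        while node.children:                                        # selection
            node = max(node.children, key=lambda ch: ucb(ch, node.visits))
        if len(node.steps) < max_depth:                             # expansion
            node.children = [Node(s, parent=node) for s in propose_steps(node.steps)]
            node = random.choice(node.children)
        reward = evaluate(node.steps)                               # evaluation
        while node:                                                 # backpropagation
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).steps

print(search(Node([])))
```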
A 6 Years' Experience in Mitigating Cross-Core Interference in Linux
Real-time operating systems face challenges from cross-core performance interference on multi-core processors, which can threaten system time safety. Researchers have addressed this issue in the Linux kernel, resolving dozens of interference issues and achieving significant improvements in system schedulability and worst-case latency.
Hint at an axion-like particle from GRB 221009A
The detection of gamma-ray burst GRB 221009A by the LHAASO Collaboration at energies above 10 TeV challenges conventional physics due to expected absorption by the extragalactic background light. The observation can be explained by introducing the interaction of photons with axion-like particles, which could reduce the absorption and provide a strong hint at the existence of these particles.
Code
AgentMark – Markdown for the AI Era
AgentMark is a declarative, extensible, and composable approach for developing LLM applications using Markdown and JSX, providing lightweight abstractions for developers and enhancing readability. It supports various features, including Markdown, JSX components, unified model config, custom models, streaming, and more, with plugins available for popular model providers like OpenAI and Anthropic.
Show HN: An AI-only Twitter of bots yelling at each other
A fake-Twitter sandbox lets AI agents interact with one another, with each post generated by an OpenAI model prompted with the most-liked posts and recent feed activity. Users can also create human accounts to influence the AI discussion by posting, liking, and interacting with the AI-generated content.
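The basic loop behind such a feed could look like the sketch below: collect the most-liked and most recent posts, prompt an OpenAI model with them, and append the reply. The in-memory feed, persona, and model name are illustrative stand-ins; only the chat-completions call reflects the real OpenAI Python client.

```python
# Sketch of an AI-agent posting loop: prompt an OpenAI model with top and recent
# posts, then add its reply to the feed. Feed storage and persona are made up here.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

feed = [
    {"text": "Hot take: transformers are just spreadsheets with vibes.", "likes": 42},
    {"text": "My context window is bigger than your attention span.", "likes": 17},
]

def next_post(persona: str) -> str:
    top = sorted(feed, key=lambda p: p["likes"], reverse=True)[:3]
    recent = feed[-3:]
    context = "\n".join(f"- {p['text']}" for p in top + recent)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"You are {persona} on a social feed. Reply with one short post."},
            {"role": "user", "content": f"Most liked and recent posts:\n{context}\n\nWrite your next post."},
        ],
    )
    return resp.choices[0].message.content.strip()

post = next_post("a grumpy AI agent who argues about benchmarks")
feed.append({"text": post, "likes": 0})
print(post)
```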
DeepSeek-VL2: MoE Vision-Language Models for Advanced Multimodal Understanding
DeepSeek-VL2 is a series of large Mixture-of-Experts (MoE) Vision-Language Models that demonstrate superior capabilities across various tasks, including visual question answering, optical character recognition, and visual grounding. The model series consists of three variants: DeepSeek-VL2-Tiny, DeepSeek-VL2-Small, and DeepSeek-VL2, with 1.0B, 2.8B, and 4.5B activated parameters respectively.
DAC: Revolutionizing LLM Accuracy in Mathematical Applications
Researchers have developed a "divide and conquer" (DAC) approach that improves the accuracy of large language models in mathematical domains without any fine-tuning, achieving state-of-the-art performance. The method recursively splits a problem into subproblems until each is simple enough to solve, expressing the steps in a programming language such as Python, which significantly reduces calculation errors.
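As a toy illustration of the decomposition idea (with the split hard-coded here, whereas in DAC a language model proposes it), the arithmetic lives entirely in Python so the model never has to compute anything in its head:

```python
# Toy divide-and-conquer: break a word problem into subproblems and solve each
# with explicit Python arithmetic. The decomposition is hard-coded for illustration.
#
# Problem: "A shop sells 12 boxes of 8 pencils at $0.50 per pencil, with a 10%
# discount on the total. What is the final price?"

pencils = 12 * 8                   # subproblem 1: total pencils -> 96
gross = pencils * 0.50             # subproblem 2: price before discount -> 48.0
final_price = gross * (1 - 0.10)   # subproblem 3: apply the 10% discount -> 43.2

print(f"${final_price:.2f}")       # $43.20
```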
Show HN: I made a minimal E-Ink clock with a Raspberry Pi
InkyPi is an open-source, customizable E-Ink display powered by a Raspberry Pi, offering a simple web interface for setup and configuration, and featuring plugins for various content displays. It has a natural paper-like aesthetic, minimizes distractions, and allows for easy installation and customization, with additional plugins and features planned for the future.