Friday December 27, 2024

DeepSeek-V3 emerges as a cost-efficient language-model powerhouse; Monolith rethinks real-time recommendation systems with collisionless embeddings; and Eunomia offers token-level data governance for LLM applications.

News

Ghostty 1.0

Ghostty 1.0 has been released: a fast, cross-platform terminal emulator written in Zig by Mitchell Hashimoto, with GPU-accelerated rendering and platform-native UIs on macOS and Linux.

DeepSeek v3 beats Claude Sonnet 3.5 and is way cheaper

DeepSeek-V3 is a strong Mixture-of-Experts (MoE) language model with 671 billion total parameters, achieving performance comparable to leading closed-source models while requiring only 2.788 million H800 GPU hours for full training. The model outperforms other open-source models in various benchmarks, showcasing its efficiency and effectiveness in natural language processing tasks.

OpenAI is Lehman Brothers: A crash is coming

The 2007–2008 subprime mortgage crisis, which brought down over-leveraged investment banks like Lehman Brothers, shares similarities with the current hype around generative AI: both involve herd mentality and unrealistic expectations. The author argues that the AI industry's reliance on unprofitable business models and unsustainable funding could end the same way, with companies like OpenAI eventually forced to raise prices, killing startups, scaring away venture capital, and leaving the tech industry to take a brutal haircut.

Boox devices now ship with a Chinese propaganda AI assistant

Boox e-readers now come with a Chinese propaganda AI assistant that censors information and promotes a pro-China agenda. This has raised concerns about the potential risks of using Chinese electronics and the need for consumers to be aware of the software and data collection practices of these devices.

Roasted Christmas Spam from Muhu AI

The author received an unsolicited email from "the muhu team" on Christmas, containing a link to a "holiday audio roast" created by their AI, which mocked the author's open-source software. The author is critical of muhu.ai's tactics, accusing them of attempting to piggyback off open-source developers to promote their AI product, which enables non-technical executives to surveil developers without understanding their expertise.

Research

Deliberation in Latent Space via Differentiable Cache Augmentation [pdf]

Researchers have developed a method to enhance large language models by augmenting a frozen model with an offline coprocessor that operates on its key-value cache, distilling additional computation into the cache without changing the decoder. The approach has been shown to improve performance on complex reasoning tasks, reducing perplexity and improving results even without task-specific training.
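
A minimal conceptual sketch of the cache-augmentation idea (an illustration only, not the paper's implementation; the module names, tensor shapes, and attention-based coprocessor design are assumptions):

```python
import torch
import torch.nn as nn

class CacheCoprocessor(nn.Module):
    """Trainable module that reads a frozen decoder's kv-cache and emits a few
    extra latent key/value pairs to append to that cache (hypothetical design)."""
    def __init__(self, d_model: int = 256, n_latents: int = 8, n_heads: int = 4):
        super().__init__()
        self.latent_queries = nn.Parameter(torch.randn(n_latents, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.to_kv = nn.Linear(d_model, 2 * d_model)

    def forward(self, cached_keys: torch.Tensor, cached_values: torch.Tensor):
        # cached_keys / cached_values: (batch, seq, d_model) views of the cache
        ctx = torch.cat([cached_keys, cached_values], dim=1)
        q = self.latent_queries.unsqueeze(0).expand(ctx.size(0), -1, -1)
        latents, _ = self.attn(q, ctx, ctx)        # latents attend over the cache
        k_extra, v_extra = self.to_kv(latents).chunk(2, dim=-1)
        return k_extra, v_extra                    # appended to the kv-cache

# Only the coprocessor is trained; the frozen decoder simply attends over the
# augmented cache, e.g. torch.cat([cached_keys, k_extra], dim=1).
cop = CacheCoprocessor()
k_extra, v_extra = cop(torch.randn(2, 32, 256), torch.randn(2, 32, 256))
```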

Are Language Models Useful for Time Series Forecasting? (No.)

Large language models (LLMs) do not significantly improve time series forecasting performance, and in some cases, removing or replacing the LLM component can even lead to better results. Pretrained LLMs also fail to outperform models trained from scratch and do not effectively represent sequential dependencies in time series data.

When Every Token Counts: Optimal Segmentation for Low-Resource Language Models

Researchers found that optimizing Byte-Pair Encoding (BPE) configurations can significantly reduce token count and improve model performance, particularly for smaller models. This compression-optimized tokenization strategy may provide substantial advantages for multilingual and low-resource language applications, offering a promising direction for further research in NLP.
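
A rough sketch of the kind of experiment described, using the Hugging Face tokenizers library to train BPE vocabularies of different sizes and compare how many tokens the same text needs; the corpus and vocabulary sizes are placeholders, not the paper's setup:

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Placeholder corpus; swap in the low-resource corpus you care about.
corpus = [
    "low-resource languages often suffer from over-segmentation",
    "optimizing the tokenizer can shrink sequences considerably",
]

def tokens_needed(vocab_size: int, text: str) -> int:
    tok = Tokenizer(models.BPE(unk_token="[UNK]"))
    tok.pre_tokenizer = pre_tokenizers.Whitespace()
    trainer = trainers.BpeTrainer(vocab_size=vocab_size, special_tokens=["[UNK]"])
    tok.train_from_iterator(corpus, trainer)
    return len(tok.encode(text).ids)

sample = " ".join(corpus)
for vs in (100, 500, 2000):                # candidate BPE configurations
    print(vs, tokens_needed(vs, sample))   # fewer tokens -> better compression
```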

Improving feature interactions at Pinterest under industry constraints

Adopting advanced recommendation systems in industrial settings is often hindered by constraints such as model latency, GPU memory limitations, and model reproducibility. This paper shares learnings from improving feature interactions in Pinterest's Homefeed ranking model under such constraints, providing insights and strategies for balancing performance with practical limitations.
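
For context, one common family of feature-interaction modules evaluated in this setting is explicit crossing layers in the DCNv2 style; the sketch below is purely illustrative and is not Pinterest's implementation:

```python
import torch
import torch.nn as nn

class CrossLayer(nn.Module):
    """DCNv2-style explicit crossing: x_{l+1} = x0 * (W x_l + b) + x_l."""
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x0: torch.Tensor, xl: torch.Tensor) -> torch.Tensor:
        return x0 * self.linear(xl) + xl

# x0: concatenated feature embeddings for a batch of (user, item) examples.
x0 = torch.randn(32, 128)
x = x0
for layer in [CrossLayer(128) for _ in range(3)]:
    x = layer(x0, x)   # stack a few cross layers, then feed an MLP ranking head
```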

Monolith: Real Time Recommendation System with Collisionless Embedding Table

General-purpose deep learning frameworks like TensorFlow and PyTorch fall short for building scalable, real-time recommendation systems because of their limitations in handling dynamic, sparse features and real-time user feedback. The Monolith system addresses these issues with a collisionless embedding table, an online training architecture, and a fault-tolerant design, enabling real-time learning; it has been successfully deployed in the BytePlus Recommend product.
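
A toy illustration of the "collisionless" idea: give each sparse ID its own embedding row on first sight, instead of hashing IDs into a fixed-size table where distinct IDs can collide. This is a deliberate simplification; Monolith uses a cuckoo hashmap plus frequency- and staleness-based admission and eviction:

```python
import torch

class DynamicEmbedding:
    """Toy dict-backed embedding: every sparse ID gets its own row, so two
    different IDs never share (collide on) a parameter vector."""
    def __init__(self, dim: int):
        self.dim = dim
        self.rows: dict[int, torch.Tensor] = {}

    def lookup(self, ids: list[int]) -> torch.Tensor:
        out = []
        for i in ids:
            if i not in self.rows:                  # allocate on first sight
                self.rows[i] = torch.zeros(self.dim, requires_grad=True)
            out.append(self.rows[i])
        return torch.stack(out)

table = DynamicEmbedding(dim=16)
emb = table.lookup([10, 999_999_937, 10])   # a repeated ID reuses its own row
```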

Code

DeepSeek-V3

DeepSeek-V3 is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, achieving performance comparable to leading closed-source models while requiring only 2.788M H800 GPU hours for its full training. The model outperforms other open-source models and demonstrates excellent performance in various evaluations, showcasing its capabilities in natural language processing tasks.
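
As a sketch of the general Mixture-of-Experts pattern behind such models, the toy layer below routes each token to its top-k experts. It is illustrative only and does not reflect DeepSeek-V3's actual routing, shared experts, or load-balancing scheme:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model: int = 64, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)
        weights, idx = scores.topk(self.k, dim=-1)         # k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

moe = TinyMoE()
y = moe(torch.randn(10, 64))   # only the selected experts run for each token
```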

DeepSeek-v3 Technical Report [pdf]

The accompanying technical report details DeepSeek-V3's architecture and training recipe: Multi-head Latent Attention and the DeepSeekMoE architecture (671B total parameters, with 37B activated per token), an auxiliary-loss-free load-balancing strategy, a multi-token prediction training objective, and FP8 mixed-precision training, along with extensive benchmark comparisons against leading open- and closed-source models.

Colab Notebook – RAG on Your Unstructured Data

This repository provides a comprehensive collection of advanced Retrieval-Augmented Generation (RAG) techniques, including implementations, explanations, and evaluation methods. The repository covers various RAG techniques, such as Naive RAG, Hybrid RAG, and Adaptive RAG, and provides notebooks for each technique, allowing users to easily implement and evaluate them.
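
A minimal sketch of the naive-RAG pattern such repositories start from: retrieve the most similar chunks, then stuff them into a prompt. TF-IDF stands in for embedding-based search here, and the final LLM call is left as a placeholder:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

chunks = [
    "Eunomia enforces token-level data governance policies.",
    "Monolith uses a collisionless embedding table.",
    "DeepSeek-V3 is a 671B-parameter MoE model.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    vec = TfidfVectorizer().fit(chunks + [query])
    sims = cosine_similarity(vec.transform([query]), vec.transform(chunks))[0]
    return [chunks[i] for i in sims.argsort()[::-1][:k]]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    # Pass this prompt to the LLM of your choice; returned here for inspection.
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How many parameters does DeepSeek-V3 have?"))
```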

Gpt.sh

FrameOS is an operating system designed for single-function smart frames, deployable on a Raspberry Pi and compatible with various e-ink and traditional displays. It allows users to create custom scenes or deploy prebuilt ones for applications such as smart home calendars, meeting room displays, and public advertisement screens.

Show HN: Eunomia, Open Source Data Governance for LLM-Based Applications

Eunomia is an open-source framework for enforcing data governance policies in LLM-based applications by working at the token level. It can be installed via pip and used in a modular way, combining multiple instruments to create custom orchestrations for tasks such as identifying and replacing PII in text.
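
A generic illustration of the kind of policy such an instrument enforces: identify PII spans and replace them before text reaches an LLM. This is not Eunomia's API (see its documentation for the real orchestration interface), and the regexes below are simplistic stand-ins:

```python
import re

# Illustrative patterns only; production PII detection needs far more coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace each detected PII span with a typed placeholder token."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Reach me at jane.doe@example.com or +1 (555) 010-9999."))
# -> "Reach me at [EMAIL] or [PHONE]."
```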

2024 Differentiated.