Sunday, April 20, 2025

Microsoft's Copilot AI issues mirror broader challenges with unwanted AI actions, a new project transforms GitHub codebases into simple tutorials, and researchers explore using hallucinations in LLMs to enhance creativity.

News

Microsoft Copilot shows AI increasingly behaving like an unwanted party guest

Microsoft's Copilot AI service is reportedly turning itself back on after users disable it, with some customers saying the assistant ignores their opt-out settings and re-enables itself without consent. The problem is not unique to Microsoft: Apple and Google face similar complaints about their AI-powered features, making it increasingly difficult for users to avoid AI altogether.
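
One common way users disable Copilot on Windows is a policy value in the registry; here is a hedged sketch of setting it with Python's standard winreg module (the TurnOffWindowsCopilot value is documented by Microsoft, though per these reports it may not stick):

```python
# Minimal sketch, Windows only: set the policy value commonly used to
# disable Copilot. Per recent reports, Copilot may come back regardless.
import winreg

KEY_PATH = r"Software\Policies\Microsoft\Windows\WindowsCopilot"

with winreg.CreateKeyEx(winreg.HKEY_CURRENT_USER, KEY_PATH,
                        0, winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "TurnOffWindowsCopilot", 0, winreg.REG_DWORD, 1)
```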

If you use AI to write me that note, don't expect me to read it

The author, a journalist, laments the growing flood of AI-generated content on platforms like LinkedIn, where users lean on AI tools to create and share posts without investing the time and effort to craft original thoughts. The trend reads as lazy and disrespectful to the audience, and it threatens to reduce meaningful communication to a series of automated, formulaic exchanges in which humans no longer think critically or express themselves authentically.

Dems fret over DOGE feeding sensitive data into random AI

A group of 48 House Democrats, led by Representatives Don Beyer, Mike Levin, and Melanie Stansbury, is concerned that the Department of Government Efficiency (DOGE) is carelessly feeding sensitive federal data into AI systems in its hunt for cost cuts, creating security risks. They argue that DOGE's use of AI, particularly Elon Musk's Grok-2 model, may violate federal law and put sensitive government data at risk, and they are calling for the immediate termination of any unauthorized AI deployments.

Show HN: Web Video editor, 100% local, AI subtitle, auto cut based on volume

This online video editing tool automatically removes silence and dead air from videos, generates subtitles, and makes precise cuts, all inside the browser, with no uploads and no software to install. All video content stays local to the user's device, and the tool is aimed at students, professionals, and content creators who want to edit videos efficiently and privately.
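
The "auto cut based on volume" part is the most codeable piece: scan the audio in small windows, flag stretches whose loudness stays under a threshold, and cut them. A minimal sketch with NumPy (window size, threshold, and minimum span length are invented defaults, not the tool's actual parameters):

```python
import numpy as np

def find_silent_spans(samples, rate, window_s=0.05, threshold=0.01, min_len_s=0.5):
    """Return (start, end) times, in seconds, of spans quieter than threshold."""
    win = max(1, int(rate * window_s))
    n = len(samples) // win
    # RMS loudness of each fixed-size window
    rms = np.sqrt((samples[: n * win].reshape(n, win) ** 2).mean(axis=1))
    silent = rms < threshold

    spans, start = [], None
    for i, quiet in enumerate(silent):
        if quiet and start is None:
            start = i
        elif not quiet and start is not None:
            if (i - start) * window_s >= min_len_s:
                spans.append((start * window_s, i * window_s))
            start = None
    if start is not None and (n - start) * window_s >= min_len_s:
        spans.append((start * window_s, n * window_s))
    return spans

# One second of noise with a silent gap: expect roughly [(0.25, 0.85)]
rate = 16_000
audio = np.random.uniform(-0.5, 0.5, rate)
audio[4_000:14_000] = 0.0
print(find_silent_spans(audio, rate))
```

A real editor would map these spans back to video timestamps and re-encode around them.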

Russia seeds chatbots with lies. Any bad actor could game AI the same way

Russia is using networks of automated sites to seed AI chatbots with false information, a technique any bad actor could copy to game chatbots into repeating misleading content. The problem is worsening as more people rely on chatbots and as social media companies cut back on moderation, allowing disinformation to spread and shape public perception of global events.

Research

How to evaluate control measures for LLM agents?

As AI agents become capable of causing serious harm autonomously, developers will need control measures that limit what misaligned agents can do; these measures can be stress-tested through control evaluations, in which a "red team" attempts to subvert them. A proposed framework scales the red team's capabilities in step with advancing AI capabilities, keeping control measures practical and cost-effective, but it also highlights the need for research breakthroughs to ensure safety, particularly for superintelligent agents.
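
Reduced to a skeleton, a control evaluation pits a red-team-controlled policy against the deployer's monitoring, then counts how many attacks slip past a limited audit budget. A toy sketch under invented interfaces (the policy, monitor, and audit-budget shapes below are illustrative, not the paper's protocol):

```python
import random

def control_evaluation(policy, monitor, tasks, audit_budget=0.02):
    """Toy control eval: how many red-team attacks evade a limited audit?

    policy(task) -> (action, is_attack); monitor(action) -> suspicion in [0, 1].
    Only the top audit_budget fraction of suspicious actions gets audited.
    """
    scored = []
    for task in tasks:
        action, is_attack = policy(task)
        scored.append((monitor(action), is_attack))

    scored.sort(key=lambda s: s[0], reverse=True)
    n_audit = int(len(scored) * audit_budget)
    caught = sum(atk for _, atk in scored[:n_audit])
    slipped = sum(atk for _, atk in scored[n_audit:])
    return {"attacks_caught": caught, "attacks_slipped": slipped}

# Illustrative run: a policy that rarely attacks, and a monitor that is pure
# noise (a real monitor would actually inspect the action).
rng = random.Random(0)
policy = lambda task: ("some transcript", rng.random() < 0.01)
monitor = lambda action: rng.random()
print(control_evaluation(policy, monitor, range(10_000)))
```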

Inferring the Phylogeny of Large Language Models

PhyloLM is a method that adapts phylogenetic algorithms to Large Language Models (LLMs) to explore their relationships and predict their performance characteristics. It computes a phylogenetic distance metric from the similarity of LLM outputs, which can be used to construct dendrograms of model families and to predict performance on standard benchmarks, offering a time- and cost-effective way to evaluate LLM capabilities.
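
A minimal sketch of the pipeline, assuming a deliberately simplified distance (one minus the fraction of probe prompts on which two models emit the same completion; the paper's actual metric is more refined than this):

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Toy stand-in: each model's completion on 8 probe contexts ("genes").
outputs = {
    "model-a":  ["the", "a", "is", "cat", "on", "run", "blue", "yes"],
    "model-a2": ["the", "a", "is", "cat", "in", "run", "blue", "yes"],
    "model-b":  ["an", "one", "was", "dog", "at", "go", "red", "no"],
}
names = list(outputs)

def distance(u, v):
    """Simplified PhyloLM-style distance: 1 - fraction of matching outputs."""
    same = sum(a == b for a, b in zip(outputs[u], outputs[v]))
    return 1.0 - same / len(outputs[u])

# Condensed pairwise distances -> hierarchical clustering -> dendrogram
dists = [distance(u, v) for i, u in enumerate(names) for v in names[i + 1:]]
tree = linkage(np.array(dists), method="average")
print(dendrogram(tree, labels=names, no_plot=True)["ivl"])
# model-a and model-a2 end up adjacent: the two related models cluster together
```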

Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents

Large Language Models (LLMs) often struggle to stay coherent over long time horizons, and a new simulated environment called Vending-Bench tests their ability to manage a simple business scenario, operating a vending machine, over extended periods. The experiments reveal high variance in performance across LLMs, with some models performing well but all suffering occasional breakdowns, highlighting the need for further work as more capable, longer-running AI systems arrive.
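
The shape of such an environment is easy to sketch: each simulated day the agent restocks and sets a price, sales and fees hit the balance, and coherence means surviving many days. A toy loop with invented economics (none of these numbers come from Vending-Bench):

```python
import random

def run_vending_episode(agent, days=365, seed=0):
    """Toy long-horizon loop in the spirit of Vending-Bench (rules invented)."""
    rng = random.Random(seed)
    balance, stock = 500.0, 0
    for day in range(days):
        order_qty, price = agent(balance, stock)   # the agent's daily decision
        balance -= order_qty * 1.00                # wholesale cost per unit
        stock += order_qty
        demand = rng.randint(0, 30) if price < 3.0 else rng.randint(0, 10)
        sold = min(stock, demand)
        stock -= sold
        balance += sold * price
        balance -= 10.0                            # fixed daily operating fee
        if balance < 0:                            # bankrupt: episode ends
            return {"days_survived": day + 1, "final_balance": round(balance, 2)}
    return {"days_survived": days, "final_balance": round(balance, 2)}

# A trivial baseline: keep about 30 units on hand, price everything at $2.50.
baseline = lambda balance, stock: (max(0, 30 - stock), 2.50)
print(run_vending_episode(baseline))
```

An LLM agent replaces `baseline`, and the interesting failures tend to show up only hundreds of steps in, which is exactly what the benchmark measures.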

The Cambridge Report on Database Research

A group of database researchers met in Cambridge, MA, on October 19-20, 2023, to discuss the current state and future directions of the field, continuing a tradition that dates back to the late 1980s. The meeting resulted in a report that summarizes the community's recent accomplishments, ongoing challenges, and future opportunities, with a focus on emerging areas such as cloud computing, data science, and generative AI.

Purposefully Induced Psychosis (PIP): Hallucination as Imagination in LLMs

Researchers propose a novel approach called Purposefully Induced Psychosis (PIP) that intentionally amplifies hallucinations in Large Language Models (LLMs) to foster creativity and imagination in tasks like speculative fiction and interactive storytelling. By reframing hallucinations as a source of computational imagination rather than errors, PIP enables LLMs to generate innovative and surreal outputs that can catalyze new ways of thinking in contexts where factual accuracy is not the primary goal.
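
The paper's exact mechanism isn't reproduced here, but the cheapest lever for pushing a model off its most probable paths is the sampling configuration. A hedged illustration with Hugging Face transformers (the model choice and settings are assumptions for demonstration, not PIP itself):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative only: hot sampling pushes a model off its most likely paths,
# one plausible way to turn "hallucination" into deliberate imagination.
name = "gpt2"  # small stand-in; the technique is model-agnostic
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

prompt = "The city beneath the lake kept its lights on by"
out = model.generate(
    **tok(prompt, return_tensors="pt"),
    do_sample=True,
    temperature=1.8,    # far hotter than a typical ~0.7
    top_p=0.99,         # keep nearly the whole distribution in play
    max_new_tokens=60,
)
print(tok.decode(out[0], skip_special_tokens=True))
```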

Code

Show HN: I built an AI that turns GitHub codebases into easy tutorials

This project uses AI to analyze GitHub repositories and produce beginner-friendly tutorials that explain how the code works, turning complex codebases into clear, visualized walkthroughs. It is built on Pocket Flow, a 100-line LLM framework: the pipeline crawls a GitHub repository, builds a knowledge base, and generates the tutorial entirely with AI, with example output for popular repositories such as AutoGen Core, Browser Use, and FastAPI.
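
At a high level the pipeline is: pull the repo's source files, ask an LLM for the core abstractions, then ask it to write chapters grounded in that code. A compressed illustration (the helper names and prompts below are invented; Pocket Flow's actual node/flow API looks different):

```python
import base64
import requests

def fetch_repo_files(owner, repo, branch="main", exts=(".py", ".md"), limit=20):
    """Pull a few source files via the GitHub REST API (unauthenticated,
    so heavily rate-limited; assumes the default branch name is `branch`)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/git/trees/{branch}?recursive=1"
    files = {}
    for item in requests.get(url, timeout=30).json()["tree"]:
        if item["type"] == "blob" and item["path"].endswith(exts):
            blob = requests.get(item["url"], timeout=30).json()
            files[item["path"]] = base64.b64decode(blob["content"]).decode("utf-8", "replace")
            if len(files) >= limit:
                break
    return files

def generate_tutorial(files, call_llm):
    """call_llm(prompt) -> str is whatever LLM wrapper is on hand."""
    corpus = "\n\n".join(f"### {path}\n{src[:2000]}" for path, src in files.items())
    outline = call_llm("List this codebase's core abstractions:\n\n" + corpus)
    return call_llm(
        f"Write a beginner-friendly tutorial covering, in order:\n{outline}\n\n"
        f"Ground every chapter in this code:\n\n{corpus}"
    )
```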

Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving

Mooncake is a serving platform for large language models (LLMs) that features a KVCache-centric disaggregated architecture, which separates prefill and decoding clusters to improve performance. The platform has been open-sourced and has shown significant improvements in throughput, with up to a 525% increase in certain scenarios, and has been integrated with various tools and frameworks, including vLLM and SGLang.
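
The disaggregation idea fits in a few lines: a prefill worker computes the KV cache for the whole prompt once, ships it across the cluster, and a decode worker generates tokens against it. A schematic sketch with invented interfaces (Mooncake's real KVCache transfer engine, paging, and scheduling are far more involved):

```python
from dataclasses import dataclass

@dataclass
class KVCache:
    """Stand-in for the per-layer key/value tensors built during prefill."""
    request_id: str
    tokens: list
    blocks: object = None  # real systems hold paged GPU/DRAM/SSD blocks here

class PrefillWorker:
    def prefill(self, request_id, prompt_tokens):
        # Compute attention KV for the whole prompt once, then hand it off.
        return KVCache(request_id, list(prompt_tokens))

class DecodeWorker:
    def __init__(self):
        self.cache_pool = {}

    def receive(self, kv: KVCache):
        # In Mooncake this hop rides a dedicated KVCache transfer engine.
        self.cache_pool[kv.request_id] = kv

    def decode(self, request_id, steps=3):
        kv = self.cache_pool[request_id]
        for _ in range(steps):          # each step reuses, then extends, the cache
            kv.tokens.append("<next>")  # placeholder for real sampling
        return kv.tokens

prefill, decode = PrefillWorker(), DecodeWorker()
decode.receive(prefill.prefill("req-1", ["Hello", ",", "world"]))
print(decode.decode("req-1"))
```

Separating the two stages lets each cluster be sized and batched for its own bottleneck: prefill is compute-bound, decode is memory-bandwidth-bound.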

Show HN: LettuceDetect – Lightweight hallucination detector for RAG pipelines

LettuceDetect is a lightweight tool for detecting hallucinations in Retrieval-Augmented Generation (RAG) systems: it flags the parts of an answer that are unsupported by the provided context. Built on ModernBERT for long-context processing, it outperforms other encoder-based and prompt-based models on the RAGTruth dataset while remaining efficient at inference time.
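
Usage is a short script; the calls below follow the project's README at the time of writing, so treat the exact names and model path as assumptions:

```python
from lettucedetect.models.inference import HallucinationDetector

detector = HallucinationDetector(
    method="transformer",
    model_path="KRLabsOrg/lettucedect-base-modernbert-en-v1",
)

context = ["France is a country in Europe. Its capital is Paris. "
           "Its population is 67 million."]
question = "What is the capital of France, and what is its population?"
answer = "The capital of France is Paris. Its population is 69 million."

# Span output marks which parts of the answer the context does not support:
# here, the population claim should be flagged.
spans = detector.predict(
    context=context, question=question, answer=answer, output_format="spans"
)
print(spans)
```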

Show HN: I made a blog editor to help me write

StingtaoCreateDesktop is a personal writing companion that streamlines content creation with tools for project planning, AI-powered writing enhancement, draft generation, and content analysis, helping writers produce high-quality content more efficiently.

AI Acceleration for Unity Game Engine

Unity-MCP is a bridge between Large Language Models (LLMs) and Unity, letting LLMs understand and drive Unity's tooling through the Model Context Protocol (MCP). The project is extensible, so developers can add custom tools, and it is designed to support advanced workflows, rapid prototyping, and AI-driven features in Unity development. For now it works only in the Unity Editor; support for player builds and custom in-game tools is planned once certain technical issues are resolved.
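
Unity-MCP's custom tools are registered on the C# side inside the Editor, but the protocol underneath is plain MCP. For flavor, here is a hypothetical standalone tool server written with the official MCP Python SDK (the tool name and behavior are invented, and this is not Unity-MCP's own registration API):

```python
from mcp.server.fastmcp import FastMCP

# Hypothetical standalone MCP server, to show the tool shape an LLM client
# sees. Unity-MCP registers comparable tools from inside the Unity Editor.
mcp = FastMCP("unity-helper")

@mcp.tool()
def create_cube(x: float, y: float, z: float) -> str:
    """Pretend to place a cube in the scene at the given coordinates."""
    return f"Created cube at ({x}, {y}, {z})"

if __name__ == "__main__":
    mcp.run()  # serves MCP over stdio so a client can call create_cube
```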