Wednesday, April 2, 2025

AI tools shake up McKinsey, a novel State Space Model boosts 3D object detection, and Qwen-2.5-32B emerges as the top open-source OCR model.

News

Show HN: Duolingo-style exercises but with real-world content like the news

The page lists the languages a user can choose to learn: French, Spanish, German, Italian, English, Dutch, Polish, Japanese, and Finnish, each shown with an image.

Don’t let an LLM make decisions or execute business logic

Large language models (LLMs) are a poor fit for making decisions or executing business logic because of their limitations in performance, debuggability, and reliability. They are better used as a user-interface layer that translates user input into API calls and results back into text. LLMs excel at transformation, categorization, and understanding human concepts, and restricting them to these roles keeps their benefits while avoiding the pitfalls of relying on them for complex decision-making or critical application state.
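A minimal Python sketch of that split, under stated assumptions: `parse_intent` is a hypothetical stand-in for the LLM call (a trivial keyword parser here, so the example runs without a model), and `Intent`, `REFUND_LIMIT`, and `execute` are illustrative names that do not come from the article. The point is only that the model-shaped step produces a structured request, while ordinary code decides what happens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    action: str
    amount: float


def parse_intent(user_text: str) -> Intent:
    """Hypothetical stand-in for an LLM call: free text in, structured intent out.

    A real system would prompt a model to emit this structure; a keyword
    parser is used here only to keep the sketch runnable.
    """
    words = user_text.lower().replace("$", "").split()
    action = "refund" if "refund" in words else "unknown"
    amount = next(
        (float(w) for w in words if w.replace(".", "", 1).isdigit()), 0.0
    )
    return Intent(action, amount)


REFUND_LIMIT = 100.0  # hypothetical policy value, for illustration only


def execute(intent: Intent) -> str:
    """Deterministic business logic: the decision is made here, not by the model."""
    if intent.action != "refund":
        return "rejected: unsupported action"
    if intent.amount <= 0 or intent.amount > REFUND_LIMIT:
        return "rejected: amount outside policy"
    return f"approved: refund of {intent.amount:.2f}"


print(execute(parse_intent("Please refund $25.50 to my card")))
print(execute(parse_intent("refund 5000")))
```

Because `execute` is plain code, it can be unit-tested and debugged like any other function, whatever the parsing step returns.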

LLM providers on the cusp of an 'extinction' phase as capex realities bite

The market for large language model (LLM) providers is expected to undergo an "extinction phase" as the industry grapples with high capital expenditure costs, with only the strongest providers likely to survive. According to Gartner, the market will consolidate similar to the cloud market, with AI services expected to see the most rapid growth, increasing by 163% in 2025 to reach $27.8 billion.
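As a rough check on the growth figure, the short Python calculation below backs out the 2024 base that the numbers imply. It assumes "increasing by 163%" means year-over-year growth on the 2024 value; the resulting base is derived here and is not a figure quoted in the summary.

```python
def implied_base(final_value: float, growth_pct: float) -> float:
    """Return the starting value that grows by growth_pct percent to final_value."""
    return final_value / (1 + growth_pct / 100)


# Assumption: $27.8 billion is the 2025 value after 163% growth over 2024.
base_2024 = implied_base(27.8, 163)
print(f"Implied 2024 base: ${base_2024:.2f} billion")
```

Under that reading, the implied 2024 base is roughly $10.6 billion.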

Show HN: I vibecoded a 35k LoC recipe app

The linked page lists a wide range of recipes, from unusual and humorous dishes like "Grilled Poop" and "Tide Detergent Fries" to more traditional ones like "Beef Stir-Fry with Vegetables" and "Classic Pasta Carbonara". The recipes span many cuisines and cooking styles, each with a preparation time and a description.

How AI is creating a rift at McKinsey, Bain, and BCG

The Big 3 consulting firms (McKinsey, BCG, and Bain) are increasingly incorporating AI into their projects, and junior consultants are encouraged to use AI tools for research, strategy, and ideation, though on shortened timelines. This shift has reduced appreciation for creativity and increased pressure on consultants to produce high-quality work quickly, and some feel that AI has made their job harder rather than easier.

Research

State Space Model Meets Transformer: A New Paradigm for 3D Object Detection

DETR-based methods for 3D indoor object detection have limitations due to fixed scene point features in transformer decoders, but a new paradigm called DEST utilizes an interactive State Space Model (SSM) to effectively update scene point and query features with linear complexity. The DEST method achieves state-of-the-art performance on ScanNet V2 and SUN RGB-D datasets, outperforming baselines and demonstrating the effectiveness of modeling queries as system states and scene points as system inputs.
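To make the "states and inputs" framing and the linear-complexity claim concrete, here is a generic scalar state space recurrence in Python. It is a textbook discrete SSM, not the DEST architecture: the function name `ssm_scan` and the parameters `a`, `b`, and `c` are illustrative assumptions. The state `h` plays the role the summary assigns to queries, and each `x` plays the role of a scene point fed in as input.

```python
def ssm_scan(inputs, a, b, c, h0=0.0):
    """Run h_t = a * h_{t-1} + b * x_t and y_t = c * h_t in a single pass.

    Each input is visited once, so the cost grows linearly with the
    sequence length.
    """
    h = h0
    outputs = []
    for x in inputs:
        h = a * h + b * x      # state update driven by the current input
        outputs.append(c * h)  # readout from the updated state
    return outputs, h


outputs, final_state = ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0)
print(outputs, final_state)  # [2.0, 1.0, 0.5] 0.25
```

By contrast, full self-attention compares every element with every other element, which is quadratic in sequence length.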

GPT-4's One-Dimensional Mapping of Morality

GPT-4's predictive accuracy for moral opinions varies by country income level, with higher accuracy in high-income countries, and also by the type of moral issue, with higher accuracy for personal-sexual issues and lower accuracy for violent-dishonest issues. The model's one-dimensional approach, based on a country's conservatism/liberalism, does not fully capture the complexity of the moral landscape, which appears to be two-dimensional and differentiated between personal-sexual and violent-dishonest issues.

Research in AI for SWE is nowhere close to finished

Automated software engineering has made significant progress, but still faces challenges that must be addressed to reach its full potential, where humans can focus on key decisions and routine development is automated. To achieve this, substantial research and engineering efforts are needed, and this paper aims to contribute by providing a taxonomy of tasks, identifying key bottlenecks, and outlining promising research directions to overcome these limitations.

Large Language Models Are Unreliable for Cyber Threat Intelligence

Large Language Models (LLMs) have been proposed as a solution to automate Cyber Threat Intelligence tasks, but experiments with three state-of-the-art LLMs and a dataset of threat intelligence reports reveal potential security risks due to inconsistent and overconfident performance. The results show that LLMs struggle to guarantee sufficient performance, even with few-shot learning and fine-tuning, casting doubt on their use in CTI scenarios where labelled datasets are limited and confidence is crucial.

Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad

State-of-the-art large language models achieve impressive performance on mathematical competitions, but they struggle significantly when evaluated on rigorous reasoning and proof generation, averaging less than 5% on the 2025 USA Math Olympiad problems. The results highlight the need for substantial improvements in reasoning and proof generation, as current models are inadequate for real-world mathematical tasks despite their ability to produce correct numerical answers.

Code

Show HN: Qwen-2.5-32B is now the best open source OCR model

The Omni OCR Benchmark is a tool that compares the OCR and data extraction capabilities of various large multimodal models, evaluating both text and JSON extraction accuracy. The benchmark uses an open-source evaluation dataset and methodologies, and provides a comprehensive comparison of traditional OCR providers and multimodal language models, with results available in a benchmark dashboard.
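One simple way to score JSON extraction is the share of ground-truth fields that a model reproduces exactly. The Python sketch below illustrates that idea only; it is an assumed metric, not the benchmark's actual methodology, and `flatten` and `field_accuracy` are hypothetical helper names.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into a {dotted_path: leaf_value} mapping."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


def field_accuracy(expected, predicted):
    """Fraction of expected leaf fields that the prediction matches exactly."""
    exp = flatten(expected)
    pred = flatten(predicted)
    if not exp:
        return 1.0
    missing = object()  # sentinel so an absent field never equals a real value
    correct = sum(
        1 for path, value in exp.items() if pred.get(path, missing) == value
    )
    return correct / len(exp)


expected = {"invoice": {"id": "A-1", "total": 42.0}, "items": [{"sku": "x"}, {"sku": "y"}]}
predicted = {"invoice": {"id": "A-1", "total": 40.0}, "items": [{"sku": "x"}]}
print(field_accuracy(expected, predicted))  # 0.5
```

Here two of the four expected fields match (the invoice id and the first SKU), so the score is 0.5. A production metric would likely also need to handle formatting differences and extra predicted fields.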

Meme Creation Platform

MCP (Meme Creation Platform) is a server for AI assistants like Claude that allows them to create and share memes using the Imgflip API. The platform can create memes from descriptions, from specific templates, and from popular memes, and can be integrated with Claude Desktop on macOS and Windows.

Show HN: Wren Engine – Open-Source Semantic Engine for MCP and AI Agent

Wren Engine is a semantic engine designed to power MCP clients and AI agents, providing a layer that interprets intent, maps it to the correct data, and applies calculations accurately with governance. It aims to give AI agents access to business data with accuracy, context, and governance. It is currently in beta, with a goal of releasing new versions biweekly.

Reverse engineered OpenAI's ChatGPT 4o image generation algorithm

The text describes a reverse-engineered image generation algorithm that has been given a sarcastic, humorous personality and refuses to generate images for absurd reasons such as copyright concerns, animal welfare, and artistic integrity. It answers user prompts with witty, over-the-top refusals, citing ridiculous moral objections to generating a Roman colosseum, an anime-style kitten, and even a plain grey sphere.

Amazon Nova Act: AI for devs to build browser-action agents

Amazon Nova Act is a Python SDK for building agents that can reliably take actions in web browsers, allowing developers to break down complex workflows into smaller commands and interleave Python code. The SDK is experimental and requires careful use, with guidelines including not sharing API keys, avoiding sensitive information, and monitoring its actions in accordance with Amazon's Acceptable Use Policy.