Thursday — November 14, 2024

Alibaba's Qwen2.5-Coder-32B showcases coding prowess on a Mac, ToolJet 3.0 advances low-code internal tool building, and RedCode exposes AI vulnerabilities in risky coding.

News

Qwen2.5-Coder-32B is an LLM that can code well that runs on my Mac

Alibaba's Qwen research team has released the Qwen2.5-Coder Series, a line of open-source large language models (LLMs) that have shown impressive performance in coding benchmarks, rivaling models like GPT-4o and Claude 3.5 Sonnet. The 32B model, in particular, has demonstrated competitive results while being small enough to run on a 64GB MacBook Pro M2, making it a promising development for local LLM usage.
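
A back-of-the-envelope calculation shows why a 32B model is plausible on a 64GB machine. This is only a sketch: the parameter count is the nominal figure from the model name, the storage formats are illustrative assumptions rather than details from the article, and the estimate covers weights only, ignoring the KV cache, activations, and the memory the operating system needs.

```python
# Rough weight-memory estimate for a nominally 32B-parameter model.
PARAMS = 32e9  # nominal parameter count, taken from the model name (assumption)

# Bits per weight for a few common storage formats (illustrative assumptions).
FORMATS = {"fp16": 16, "int8": 8, "int4": 4}


def weight_gb(params: float, bits_per_weight: int) -> float:
    """Return the size of the weights alone in gigabytes (10**9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


estimates = {name: weight_gb(PARAMS, bits) for name, bits in FORMATS.items()}

for name, gb in estimates.items():
    # Strict comparison: weights that need all 64 GB leave no room for anything else.
    verdict = "fits" if gb < 64 else "does not fit"
    print(f"{name}: ~{gb:.0f} GB of weights, {verdict} in 64 GB before other overhead")
```

Under these assumptions, some form of reduced-precision weights is what leaves headroom on a 64GB machine.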

The Beginner's Guide to Visual Prompt Injections

Graph-based AI model maps the future of innovation

A novel AI method developed by MIT's Professor Markus Buehler uses graph-based computational tools to uncover hidden links between seemingly unrelated concepts, such as science and art, to suggest novel materials and designs. The AI model, which integrates generative knowledge extraction and multimodal intelligent graph reasoning, has been used to analyze scientific papers and find unexpected similarities between biological materials and artistic masterpieces, such as Beethoven's "Symphony No. 9" and Wassily Kandinsky's "Composition VII."

Play Dialog: A contextual turn-taking TTS model like NotebookLM Playground

The linked page exposes a settings interface for a text-to-speech system, where users can select a sample rate, adjust speech speed, and control the randomness of the generated audio.

Gwern Branwen – How an Anonymous Researcher Predicted AI's Trajectory [video]

Research

RedCode: Risky Code Execution and Generation Benchmark for Code Agents

Researchers have developed RedCode, a benchmark for evaluating the safety of AI-assisted coding agents, which includes tests for executing and generating risky code. The results of testing 19 language models with RedCode revealed vulnerabilities, such as a higher likelihood of executing technically buggy code and generating sophisticated harmful software, highlighting the need for stringent safety evaluations.

UniGAD: Unifying Multi-Level Graph Anomaly Detection

Graph Anomaly Detection (GAD) methods typically focus on a single graph object type, overlooking connections among different object types. UniGAD is a unified framework that detects anomalies at node, edge, and graph levels jointly, using a Maximum Rayleigh Quotient Subgraph Sampler and a GraphStitch Network to integrate information across different levels.
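
For readers unfamiliar with the quantity in the sampler's name, here is a minimal sketch of the Rayleigh quotient of a node signal on a graph. It is not UniGAD's sampler: the function name, the toy path graph, and the choice of the unnormalized Laplacian are assumptions made for illustration.

```python
def rayleigh_quotient(edges, signal):
    """Return x^T L x / x^T x for the unnormalized Laplacian L of an undirected graph.

    For that Laplacian, x^T L x equals the sum of squared differences across edges.
    """
    numerator = sum((signal[u] - signal[v]) ** 2 for u, v in edges)
    denominator = sum(x * x for x in signal)
    return numerator / denominator


# Toy path graph 0-1-2-3 (hypothetical example data).
edges = [(0, 1), (1, 2), (2, 3)]
smooth = [1.0, 1.0, 1.0, 1.0]  # every node agrees with its neighbours
spiky = [1.0, 1.0, 5.0, 1.0]   # node 2 deviates from its neighbours

print(rayleigh_quotient(edges, smooth))
print(rayleigh_quotient(edges, spiky))
```

A signal that varies sharply between neighbouring nodes gives a larger quotient than a smooth one, which is why the quantity is a natural measure of how much a region of a graph stands out.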

Computing quantum waves and spin from classical and relativistic action

The Schrödinger equation can be solved using a generalized form of the classical Hamilton-Jacobi least action equation, extending a result of Feynman and applicable to both non-relativistic and relativistic settings. This approach uses multi-valued least action solutions and an exact mapping between action and wave function, allowing for a simplified computation and a smooth transition between physics across scales.

Efficient Machine Translation with a BiLSTM-Attention Approach

This paper proposes a Seq2Seq model that uses a BiLSTM encoder and an attention mechanism to improve translation quality while reducing storage space. The authors report that the model outperforms the mainstream Transformer model on the WMT14 dataset while remaining smaller, making it suitable for resource-constrained translation applications.
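
The core idea of attention over a bidirectional encoder can be sketched in a few lines. This is an illustration under assumptions, not the paper's model: the dot-product scoring function, the toy hidden states, and the names `softmax` and `attend` are all chosen for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """Dot-product attention: weight each encoder state by its similarity to the query."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy per-token hidden states from the forward and backward passes (made-up numbers).
forward = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
backward = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]

# A bidirectional encoder concatenates both directions for each source token.
encoder_states = [f + b for f, b in zip(forward, backward)]

decoder_query = [1.0, 0.0, 0.0, 1.0]  # hypothetical decoder state
weights, context = attend(decoder_query, encoder_states)
print(weights)
print(context)
```

The decoder receives `context`, a weighted mix of encoder states, so each output word can draw on the most relevant source tokens.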

Sparse experts with automatic token-level specialization to model time series

Time series foundation models struggle with unified training due to the heterogeneous nature of time series data, often relying on model specialization based on frequency. Moirai-MoE addresses this issue by using a single input/output projection layer and delegating pattern modeling to a sparse mixture of experts within Transformers, achieving superior performance on 39 datasets.
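
Token-level routing in a sparse mixture of experts can be sketched as follows. This shows generic top-k softmax gating, a common design, and is not taken from Moirai-MoE: the gate logits, the scalar "experts", and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(gate_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    top = ranked[:k]
    weights = softmax([gate_logits[i] for i in top])
    return list(zip(top, weights))


def moe_layer(token, gate_logits, experts, k=2):
    """Combine the outputs of the selected experts; the other experts are never run."""
    return sum(weight * experts[idx](token) for idx, weight in route_token(gate_logits, k))


# Four toy experts acting on a scalar token (stand-ins for feed-forward blocks).
experts = [lambda x: x, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
gate_logits = [0.1, 2.0, -1.0, 1.0]  # hypothetical gate scores for one token

routes = route_token(gate_logits)
output = moe_layer(3.0, gate_logits, experts)
print(routes)
print(output)
```

Because each token runs only `k` experts, different tokens can specialize on different experts without the cost of evaluating all of them.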

Code

Show HN: ToolJet 3.0 – open-source internal tool and workflow builder

ToolJet is an open-source low-code framework for building and deploying internal tools with minimal engineering effort. It features a drag-and-drop visual app builder, integration with various data sources, multi-page support, and granular access control, and is aimed at both developers and non-technical users.

Machine Learning Algorithms in Depth

"Machine Learning Algorithms in Depth" is a book covering a wide range of machine learning topics, from Bayesian inference and deep learning to supervised and unsupervised learning algorithms. It includes code examples in Python and is available through the Manning Early Access Program (MEAP).

Show HN: SurfSense – A Personal NotebookLM and Perplexity-Like AI with Privacy

SurfSense is a self-hostable, open-source tool that combines features of NotebookLM and Perplexity, letting users build a private knowledge base for research and save dynamic web content from their browser. It includes search, chat, podcast creation, and cited answers, with the user in full control of privacy.

AG2 (formerly AutoGen) is a programming framework for agentic AI

AG2 (formerly AutoGen) is an open-source programming framework for building AI agents and facilitating cooperation among multiple agents to solve tasks. It supports agents that can interact with each other, various large language models (LLMs), tool use, autonomous and human-in-the-loop workflows, and multi-agent conversation patterns.

Agent Protocol

The Agent Protocol is a framework-agnostic API for serving Large Language Model (LLM) agents in production, centered around three key concepts: Runs (executing an agent), Threads (organizing multi-turn executions), and Store (working with long-term memory). The protocol defines a set of endpoints for each concept, including CRUD operations, streaming output, and concurrency controls.
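
The three concepts can be illustrated with a small in-memory model. This is only a sketch of how Runs, Threads, and Store relate to one another: every class, method, and field name below is hypothetical, and none of it reflects the protocol's actual endpoints or schemas.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Thread:
    """Groups the turns of one multi-turn conversation (hypothetical shape)."""
    thread_id: str
    history: List[str] = field(default_factory=list)


@dataclass
class Run:
    """Records one execution of the agent (hypothetical shape)."""
    run_id: str
    thread_id: str
    output: str


class AgentService:
    """In-memory stand-in for a server exposing runs, threads, and a store."""

    def __init__(self, agent: Callable[[List[str]], str]) -> None:
        self.agent = agent
        self.threads: Dict[str, Thread] = {}
        self.store: Dict[Tuple[str, str], Any] = {}

    def create_thread(self) -> Thread:
        thread = Thread(thread_id=str(uuid.uuid4()))
        self.threads[thread.thread_id] = thread
        return thread

    def create_run(self, thread_id: str, user_input: str) -> Run:
        # A run executes the agent against the accumulated thread history.
        thread = self.threads[thread_id]
        thread.history.append(user_input)
        output = self.agent(thread.history)
        thread.history.append(output)
        return Run(run_id=str(uuid.uuid4()), thread_id=thread_id, output=output)

    def put_item(self, namespace: str, key: str, value: Any) -> None:
        # Long-term memory that outlives any single thread.
        self.store[(namespace, key)] = value

    def get_item(self, namespace: str, key: str) -> Any:
        return self.store[(namespace, key)]


def counting_agent(history: List[str]) -> str:
    """Toy agent that only reports how much history it was given."""
    return f"seen {len(history)} message(s)"


service = AgentService(counting_agent)
thread = service.create_thread()
first = service.create_run(thread.thread_id, "hello")
second = service.create_run(thread.thread_id, "and again")
service.put_item("user-prefs", "language", "en")
print(first.output, "|", second.output, "|", service.get_item("user-prefs", "language"))
```

The second run sees the first run's input and output because both belong to the same thread, while the stored item is reachable from any thread.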