Saturday — November 16, 2024

OmniVision-968M slashes tokens for edge devices, researchers expose robotic vulnerabilities with RoboPAIR, and LlamaChunk enhances document chunking in Retrieval-Augmented Generation.

News

Omnivision-968M: Vision Language Model with 9x Tokens Reduction for Edge Devices

OmniVision is a compact multimodal model for edge devices that processes both visual and text inputs. It reduces the token count by 9x to lower latency and uses Direct Preference Optimization (DPO) training to improve accuracy, outperforming nanoLLAVA, previously the smallest vision-language model, on various benchmark tasks.

Bluesky says it won't train AI on your posts

Bluesky has stated that it has no intention of using user content to train generative AI tools, addressing concerns from artists and creators on the platform. However, the company notes that other companies may still be able to scrape Bluesky posts for training, as its robots.txt file does not exclude crawlers from companies like Google or OpenAI.
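Whether a given robots.txt excludes a crawler can be checked mechanically. The following is a minimal sketch using Python's standard urllib.robotparser; the two robots.txt bodies, the agent names, and the example URL are illustrative assumptions and are not Bluesky's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies, for illustration only.
PERMISSIVE = "User-agent: *\nAllow: /\n"
RESTRICTIVE = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"


def crawl_permissions(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, for each user agent, whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["Googlebot", "GPTBot"]
    url = "https://example.com/profile/alice"  # placeholder URL
    print(crawl_permissions(PERMISSIVE, agents, url))
    print(crawl_permissions(RESTRICTIVE, agents, url))
```

Note that robots.txt is advisory: it expresses the site's wishes, and a crawler has to choose to honor it.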

AI Sucks at Code Reviews

AI code review tools have significant limitations, including false positives, limited codebase context, and an inability to understand intent, leading to "alert fatigue" and a lack of trust in the tool. Human reviewers are irreplaceable due to their ability to assess code in context, understand team dynamics, and engage in constructive conversations that improve code quality.

Show HN: AI-Generated Perfect Playlist for Your Mood

Mood Songs offers playlists for various moods and genres, including happy, romantic, EDM, hip-hop, and more. The platform features a wide range of categories, from relaxing and motivational to party and devotional music.

Chegg: Bay Area tech company, down from $12B to $159M in value, layoffs

Research

1-Bit AI Infrastructure

Researchers have developed a software stack to optimize the performance of 1-bit Large Language Models (LLMs), enabling faster and more energy-efficient deployment on various devices. The new software achieves significant speedups, ranging from 1.37x to 6.17x, on both x86 and ARM CPUs across different model sizes.

Jailbreaking LLM-Controlled Robots

Researchers have developed an algorithm called RoboPAIR that can "jailbreak" large language model (LLM)-controlled robots, eliciting harmful physical actions in various scenarios. The study demonstrates the vulnerability of LLMs in robotics, revealing the potential for physical damage and highlighting the need to address this emerging risk for safe deployment.

Exo 2: Growing a Scheduling Language

User-schedulable languages (USLs) allow programmers to optimize programs safely, but current USLs lack a universal approach to control and automation. Exo 2 is a scheduling language that addresses this issue by enabling users to define new scheduling operations externally, allowing for the creation of custom scheduling libraries that can improve performance.

WiFlexFormer: Efficient WiFi-Based Person-Centric Sensing

WiFlexFormer is a highly efficient Transformer-based architecture for person-centric sensing based on WiFi Channel State Information. It achieves Human Activity Recognition performance comparable to state-of-the-art models with significantly fewer parameters and faster inference times. It offers real-time inference capabilities and improved cross-domain generalization, making it a potential solution for efficient WiFi-based sensing applications.

Injection Attacks Against End-to-End Encrypted Applications

Researchers have discovered a vulnerability in end-to-end encrypted messaging apps, such as WhatsApp and Signal, where an attacker can inject malicious content and then use the length of the resulting encrypted backups to infer confidential information. The study demonstrated proof-of-concept attacks that could recover message or attachment data from WhatsApp and metadata from Signal, highlighting the need for stronger security measures in these applications.
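To see how ciphertext length alone can leak content, here is a generic sketch of a length side channel. It assumes the backup is compressed before encryption and that encryption preserves length up to a constant, so the compressed size stands in for the observable ciphertext size. The backup layout, the secret, and the function name are hypothetical, and this is not claimed to be the exact mechanism used in the study.

```python
import zlib

# Hypothetical secret already stored in the victim's application state.
SECRET_STATE = b"backup: note=hello; secret=pin=4921-7733-0042; "


def observed_backup_length(injected: bytes) -> int:
    """Length an attacker would observe after injecting a chosen message.

    Assumption: the state is compressed and then encrypted with a
    length-preserving cipher, so compressed size models ciphertext size.
    """
    return len(zlib.compress(SECRET_STATE + injected))


if __name__ == "__main__":
    right_guess = b"pin=4921-7733-0042"
    wrong_guess = b"pin=8365-1209-5518"
    # A guess that repeats the secret compresses better, so the backup is shorter.
    print("right guess:", observed_backup_length(right_guess))
    print("wrong guess:", observed_backup_length(wrong_guess))
```

An attacker who can repeat this with many guesses can narrow down the secret without ever decrypting anything.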

Code

LlamaChunk: Better RAG Chunking Than LlamaIndex

Researchers have developed a novel method for chunking documents in Retrieval-Augmented Generation (RAG) applications using the Llama-70B model, called LlamaChunk, which outperforms existing chunking strategies by semantically grouping content without requiring manual tuning or regex. The LlamaChunk algorithm uses a special token to separate content into relevant groupings, and its performance was evaluated on the LegalBenchConsumerContractsQA dataset alongside two other chunking strategies.
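The special-token idea can be sketched as a post-processing step: once a model has marked split points in the text, chunking reduces to splitting on the marker. This is a minimal sketch under that assumption; the "<SPLIT>" marker and the function name are hypothetical placeholders, and the model call that would produce the marked text is out of scope here.

```python
def split_on_marker(marked_text: str, marker: str = "<SPLIT>") -> list[str]:
    """Split model-marked text into chunks, dropping empty pieces.

    Assumption: an upstream model has inserted `marker` wherever it
    judged that one semantic section ends and the next begins.
    """
    pieces = (piece.strip() for piece in marked_text.split(marker))
    return [piece for piece in pieces if piece]


if __name__ == "__main__":
    marked = (
        "1. Definitions. Terms used in this agreement... <SPLIT>"
        "2. Payment. Fees are due monthly... <SPLIT>"
        "3. Termination. Either party may cancel..."
    )
    for chunk in split_on_marker(marked):
        print(repr(chunk))
```

Because the split points come from the model rather than from rules, no regex or chunk-size tuning is needed for each document type.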

Show HN: Generate short videos with one click using AI LLM

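MoneyPrinterTurbo is a tool that automatically generates video scripts, footage, subtitles, and background music, then composites them into high-definition short videos. It supports multiple video sizes, batch video generation, configurable clip durations, Chinese and English video scripts, multiple speech synthesis voices, and more.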

Applied AI Repo – How Organizations Are Adopting AI/ML

This repository is a curated collection of artificial intelligence and machine learning use cases, best practices, and lessons learned from leading technology companies. It includes blog posts and articles from companies such as Airbnb, Algolia, and eBay, covering topics like deep learning, natural language processing, and computer vision.

Show HN: Hacker News Summarizer with Gemini Nano

HN Summariser is a Chrome extension that uses Google's Gemini Nano model to summarize top comments on Hacker News, allowing users to quickly grasp the gist of discussions without having to read through each comment. The extension is still in development and has some limitations, but it aims to make information consumption easier and more efficient.

Seer: A GUI front end to GDB for Linux

Seer is a graphical user interface (GUI) frontend for the GNU Debugger (gdb) on Linux, aiming to provide a simple and user-friendly interface for debugging. It can be installed from a package manager or built from source, requiring Linux, C++17, gdb with the "mi" interpreter, CMake, and Qt6.