Sunday — January 19, 2025

AmazonBot overloads a Git server despite `robots.txt` rules and user-agent filtering, Microsoft shares eight lessons from red teaming over 100 generative AI products, and BrowserAI runs large language models directly in the browser.

News

Amazon's AI crawler is making my Git server unstable

The author is asking Amazon to stop AmazonBot from crawling their Gitea server. The bot's constant requests are overloading the server, and neither the robots.txt configuration nor user-agent filtering has blocked it effectively. The author has since moved the server behind a VPN and is working on a proof-of-work reverse proxy to protect it from bots; a prototype called Anubis is already available.
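
The general idea behind a proof-of-work gate can be sketched as follows. This is a minimal illustration of a hash-based challenge, not Anubis's actual scheme: the function names, the `challenge:nonce` string format, and the difficulty value are all assumptions made for the example. The client must find a nonce whose SHA-256 digest starts with a given number of zero bits, which costs many hash attempts, while the server checks the answer with a single hash.

```python
import hashlib
import secrets


def make_challenge() -> str:
    """Server side: issue a random challenge string (hypothetical format)."""
    return secrets.token_hex(16)


def has_leading_zero_bits(digest: bytes, bits: int) -> bool:
    """Return True if the first `bits` bits of the digest are all zero."""
    value = int.from_bytes(digest, "big")
    return value >> (len(digest) * 8 - bits) == 0


def solve(challenge: str, bits: int) -> int:
    """Client side: brute-force a nonce; expected cost is about 2**bits hashes."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if has_leading_zero_bits(digest, bits):
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int, bits: int) -> bool:
    """Server side: checking a submitted nonce costs one hash."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return has_leading_zero_bits(digest, bits)


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge, bits=12)  # assumed difficulty, chosen for the demo
    print("accepted:", verify(challenge, nonce, bits=12))
```

The asymmetry is the point of the design: one visitor pays a small, one-time cost, while a crawler issuing many requests pays it repeatedly.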

Using ChatGPT is not bad for the environment

Claims about the environmental impact of large language models (LLMs) like ChatGPT, such as high emissions and water usage, are often exaggerated and misleading, creating a distorted picture of their actual climate effects. The focus on individual actions like avoiding LLMs distracts from the more significant issue of transitioning the energy sector to renewables and addressing systemic climate problems.

Perplexity AI submits bid to merge with TikTok

Perplexity AI has submitted a bid to merge with TikTok, with the goal of giving the video app a new corporate home amidst a looming ban in the United States. The proposed merger would create a new entity combining Perplexity, TikTok US, and new equity partners, allowing most investors in TikTok's parent company ByteDance to retain their equity.

Under new law, cops bust famous cartoonist for AI-generated CSAM

A Pulitzer Prize-winning cartoonist, Darrin Bell, has been arrested under a new California law for possessing AI-generated child sexual abuse material (CSAM). The law, which took effect on January 1, treats AI-generated CSAM as harmful even if it doesn't depict real victims, since it can still be used to groom children and perpetuate abuse.

Pat-Tastrophe: Leaked GitHub Token Could Cripple Virtuals' $4.6B AI & Crypto Empire

AIXBT, an AI agent in the cryptocurrency space, has a market cap of $641M and is correct in its market predictions 83% of the time. It is one of 12,000+ AI agents running on the $4.6 billion Virtuals platform. Security researchers found a vulnerability in Virtuals after obtaining a valid GitHub Personal Access Token, which exposed active AWS keys, Pinecone credentials, and OpenAI tokens and could have allowed control over the AI agents and their cryptocurrency wallets.

Research

Predicting Human Brain States with Transformer

Researchers used functional magnetic resonance imaging (fMRI) data and a transformer architecture to predict human resting-state brain activity, achieving accurate predictions up to 5.04 seconds ahead based on the previous 21.6 seconds of data. The results demonstrate the potential for developing generative models that learn the functional organization of the human brain, with the generated fMRI brain states reflecting the architecture of the functional connectome.

Lessons from Red Teaming 100 Generative AI Products

Microsoft's experience red teaming over 100 generative AI products has yielded eight key lessons, including the importance of understanding system capabilities, the role of human elements, and the limitations of automation in identifying risks. The company shares these insights, along with case studies and practical recommendations, to help align red teaming efforts with real-world risks and address the ongoing challenges of securing AI systems.

Byzantine Fault Tolerance in Distributed Machine Learning: A Survey

Byzantine Fault Tolerance (BFT) is a significant challenge in Distributed Machine Learning (DML): Byzantine failures are hard to handle because faulty or malicious components can generate arbitrary data. This paper surveys recent work on BFT in DML, particularly in first-order optimization methods like Stochastic Gradient Descent, and classifies BFT approaches by criteria such as communication process and optimization method.
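
To make the problem concrete, here is a minimal sketch of one widely discussed robust aggregation rule, the coordinate-wise median, compared with plain averaging of worker gradients. It is an illustration only and is not taken from the survey; the function names and the toy gradient values are assumptions made for the example.

```python
from statistics import mean, median


def aggregate_mean(gradients: list[list[float]]) -> list[float]:
    """Plain averaging: a single arbitrary gradient can shift every coordinate."""
    return [mean(column) for column in zip(*gradients)]


def aggregate_median(gradients: list[list[float]]) -> list[float]:
    """Coordinate-wise median: tolerates a minority of arbitrary gradients."""
    return [median(column) for column in zip(*gradients)]


if __name__ == "__main__":
    # Four honest workers report similar gradients (toy values).
    honest = [[1.0, -2.0], [1.2, -1.8], [0.8, -2.2], [1.1, -2.1]]
    # One Byzantine worker reports arbitrary values.
    byzantine = [[1000.0, 1000.0]]
    reports = honest + byzantine

    print("mean:  ", aggregate_mean(reports))    # pulled far away from the honest values
    print("median:", aggregate_median(reports))  # prints [1.1, -2.0]
```

With a mean, one bad report is enough to corrupt the update; the median stays within the range of honest values as long as honest workers are the majority in each coordinate.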

Code

Yek: Serialize your code repo (or part of it) to feed into any LLM

Yek is a fast, Rust-based tool that reads text-based files in a repository or directory, chunks them, and serializes them for Large Language Model (LLM) consumption, with features like automatic ignore patterns and configurable output. It can be installed via Homebrew, an install script, or from source, and its usage is simple, with options for customizing output and behavior.

i18n-ai-translate: automate i18n JSONs with ChatGPT/Gemini/Ollama

The i18n-ai-translate tool leverages AI models like ChatGPT, Gemini, or Ollama to seamlessly translate localization files, supporting directories of nested translation files in i18next-style JSON format. It can be used as a GitHub Action, run directly from the command line, or integrated as a library in a project to translate files to various languages.

Show HN: ReProm – CLI to bundle code and structure into one Markdown for AI

Run LLMs in the Browser

BrowserAI is an open-source project that allows running large language models (LLMs) directly in the browser, providing a private, cost-effective, and fast way to build AI-powered applications. It features a simple API, supports multiple engines, and includes pre-configured popular models, with demos available for chat, voice chat, and other use cases.

ClaudeSync automates the synchronization of local files with Claude.ai Projects

ClaudeSync is an open-source tool that bridges local development environments with Claude.ai projects, enabling seamless synchronization to enhance AI-powered workflows. It offers features such as file sync, cross-platform compatibility, and configurability, but users must acknowledge that it is not affiliated with Anthropic or Claude.ai and use it at their own risk.