Wednesday — June 26, 2024
Big tech's 'open-source' AI models are misleading, EGAIR lobbies for strict AI data regulations, the Sohu ASIC outperforms GPUs, new CPU techniques boost inference speeds by up to 22x, and FiddleCube generates Q&A datasets for LLMs.
News
Not all 'open source' AI models are open: here's a ranking
Many so-called ‘open-source’ AI models restrict access to their code and training data, creating a false sense of openness. An analysis of popular LLMs, including those from Meta and Microsoft, revealed that only smaller players fully embrace openness. The study ranked models on 14 parameters, such as code availability and model transparency, and found that models from big tech often fail to disclose crucial details, potentially misleading the public and researchers. This is especially pertinent with the upcoming EU AI Act, which will treat “open source” models more leniently.
EGAIR – European Guild for Artificial Intelligence Regulation
A group of European artists, creatives, and associations proposes strict regulations on how AI companies may use data for training models. Their manifesto calls for AI training data to require explicit consent from its owners, aligning with GDPR principles and introducing a "training right." They also advocate prohibiting the use of unlicensed names, works, and media for AI training. The group highlights the privacy violations, potential identity theft, and economic threats associated with AI's current unregulated use of sensitive data. Its efforts include lobbying EU institutions for regulatory changes, working with legal experts, and collaborating with international groups such as the Concept Art Association (CAA) in the US.
Sohu: The First Transformer ASIC
Sohu, billed as the first transformer-specialized ASIC, grew out of a 2022 bet that transformers would dominate AI, and it massively outperforms general-purpose GPUs. Sohu's throughput exceeds 500,000 tokens/second on Llama 70B, outpacing even NVIDIA's next-gen GPUs in speed and cost-efficiency. Specializing in the transformer architecture behind models like ChatGPT and Gemini gives Sohu unprecedented performance but locks it into that architecture. If transformers remain dominant, Sohu could revolutionize AI scalability.
Research
Inference Acceleration for Large Language Models on CPUs
This study accelerates large language model inference on CPUs by exploiting parallel processing and batched computation. The proposed approach yields an 18-22x improvement in tokens generated per second, with the largest gains on longer sequences and larger models.
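The paper's code isn't shown here, but a minimal NumPy sketch (all names ours) illustrates why batching pays off on CPUs: a single batched matmul streams the weight matrix through memory once for all sequences, while per-sequence matvecs re-read it every time.

```python
import time
import numpy as np

d_model, vocab, batch = 2048, 8000, 16
W = np.random.randn(d_model, vocab).astype(np.float32)       # output projection
hidden = np.random.randn(batch, d_model).astype(np.float32)  # one state per sequence

# Unbatched: one matvec per sequence, so W is streamed from memory `batch` times.
t0 = time.perf_counter()
per_seq = [hidden[i] @ W for i in range(batch)]
t_unbatched = time.perf_counter() - t0

# Batched: one matmul serves all sequences; W is streamed once and the
# CPU's SIMD units and threads stay saturated.
t0 = time.perf_counter()
batched = hidden @ W
t_batched = time.perf_counter() - t0

assert np.allclose(np.stack(per_seq), batched, atol=1e-2)
print(f"unbatched: {t_unbatched*1e3:.1f} ms   batched: {t_batched*1e3:.1f} ms")
```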
Automated Large Language Models Reasoning with Bidirectional Chaining
Bi-Chainer, a new bidirectional chaining method, overcomes the limitations of traditional unidirectional chaining in LLMs, such as low prediction accuracy and poor reasoning efficiency. When the current direction encounters multiple branching options, it dynamically switches to depth-first reasoning in the opposite direction, leveraging intermediate results as guidance.
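Bi-Chainer itself is an LLM prompting framework, but the control idea can be loosely mirrored on a toy propositional rule base (the rules, names, and branch threshold here are illustrative, not from the paper): chain forward only while the next step is unambiguous, and switch to expanding the goal backward when forward chaining branches.

```python
# Toy rule base: each rule maps a set of premises to a conclusion.
RULES = [
    ({"a"}, "b"),
    ({"b", "c"}, "d"),
    ({"d"}, "goal"),
    ({"a"}, "x"),  # distractor that makes pure forward chaining branch
]

def provable(facts, goal, branch_limit=1):
    facts, needed = set(facts), {goal}
    changed = True
    while changed and goal not in facts:
        changed = False
        # Forward phase: derive facts only while the next step is unambiguous.
        while True:
            usable = [(p, c) for p, c in RULES if p <= facts and c not in facts]
            if not 0 < len(usable) <= branch_limit:
                break
            facts.add(usable[0][1])
            changed = True
        # Backward phase: forward chaining branched too much, so expand
        # open goals into their premises, reusing already-derived facts.
        for g in [g for g in needed if g not in facts]:
            for premises, conclusion in RULES:
                if conclusion != g:
                    continue
                if premises <= facts:
                    facts.add(g)
                    changed = True
                elif premises - facts - needed:
                    needed |= premises - facts
                    changed = True
    return goal in facts

print(provable({"a", "c"}, "goal"))  # True
```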
WARP: On the Benefits of Weight Averaged Rewarded Policies
RLHF aligns LLMs by training on human preferences, typically with KL regularization toward the pre-trained model to retain its knowledge; this regularization, however, limits reward optimization. WARP addresses the trade-off by merging policies in weight space in three stages: an exponential moving average of the policy serves as a dynamic KL anchor, spherical interpolation merges independently rewarded fine-tunes, and linear interpolation back toward the initialization recovers pre-trained features. Applied iteratively, WARP improves the KL-reward trade-off and overall LLM performance.
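A rough sketch of the three merging operations on flat weight tensors (the hyperparameters mu, t, eta and the update conventions are our assumptions, not the paper's exact recipe):

```python
import torch

def ema_anchor(anchor: torch.Tensor, policy: torch.Tensor, mu: float = 0.01):
    # (1) The KL anchor tracks the trained policy as an exponential moving average.
    anchor.mul_(1 - mu).add_(policy, alpha=mu)

def slerp_merge(init, wa, wb, t: float = 0.5, eps: float = 1e-8):
    # (2) Spherically interpolate the two fine-tuned policies' task vectors
    # (their deltas from the shared initialization).
    da, db = wa - init, wb - init
    cos = torch.clamp(torch.dot(da, db) / (da.norm() * db.norm() + eps), -1.0, 1.0)
    omega = torch.arccos(cos)
    so = torch.sin(omega) + eps
    return init + (torch.sin((1 - t) * omega) / so) * da \
                + (torch.sin(t * omega) / so) * db

def liti(init, merged, eta: float = 0.3):
    # (3) Linearly interpolate back toward the initialization to retain
    # pre-trained features while keeping part of the reward gains.
    return (1 - eta) * merged + eta * init

init = torch.randn(1024)
wa = init + 0.1 * torch.randn(1024)  # stand-ins for two rewarded fine-tunes
wb = init + 0.1 * torch.randn(1024)
final = liti(init, slerp_merge(init, wa, wb))
```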
Code
FiddleCube – Generate Q&A to test your LLM
FiddleCube generates question-answer datasets for testing, evaluating, or training LLMs from the vector embeddings of your RAG knowledge corpus. It produces diverse, accurate Q&A pairs covering a range of question types, including complex reasoning and safety alignment. The dataset is automatically regenerated whenever your prompts or RAG setup change, keeping it relevant. Upcoming features include multi-turn conversations, evaluation-metrics integration, CI/CD pipeline support, and failure diagnosis.
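FiddleCube's own API isn't reproduced here; the sketch below only illustrates the underlying idea with a hypothetical `llm` callable: sample chunks from the RAG corpus and prompt a model to emit typed Q&A pairs.

```python
import json

# Question types modeled on the kinds FiddleCube advertises (illustrative list).
QUESTION_TYPES = ["factual", "complex reasoning", "safety alignment"]

def generate_qna(chunks, llm):
    """Build a Q&A dataset from corpus chunks.

    `llm` is any callable mapping a prompt string to a completion string;
    it stands in for whichever model client you actually use.
    """
    dataset = []
    for chunk in chunks:
        for qtype in QUESTION_TYPES:
            prompt = (
                f"Context:\n{chunk}\n\n"
                f"Write one {qtype} question answerable only from the context, "
                'and its answer, as JSON: {"question": "...", "answer": "..."}'
            )
            # Assumes the model returns valid JSON; add retries in practice.
            dataset.append(json.loads(llm(prompt)))
    return dataset
```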
ControlFlow – open-source AI workflows
ControlFlow is a Python framework designed for building structured, controllable AI workflows. It lets you define tasks, assign specialized AI agents to those tasks, and combine tasks into complex flows. With a task-centric architecture, support for type-safe outputs via Pydantic models, and the ability to deploy task-specific agents, ControlFlow ensures precise control and transparency in AI-driven processes. Built on Prefect 3.0, it also offers robust observability and debugging capabilities, enabling integration with existing tools and workflows.
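A minimal sketch of the task-centric pattern using the project's core primitives (cf.flow, cf.Task, cf.Agent); treat the exact signatures as assumptions based on the README:

```python
import controlflow as cf
from pydantic import BaseModel

class Summary(BaseModel):
    title: str
    bullets: list[str]

# A specialized agent to assign to the task (instructions are illustrative).
writer = cf.Agent(name="writer", instructions="Summarize text concisely.")

@cf.flow
def summarize(text: str) -> Summary:
    # Tasks declare an objective, context, and a type-safe Pydantic result.
    task = cf.Task(
        "Summarize the provided text",
        context={"text": text},
        result_type=Summary,
        agents=[writer],
    )
    return task.run()
```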