Machine Learning
Breaking News
LLMs Can Now Talk in Real-Time with Minimal Latency: Chinese Researchers...
Researchers at the Institute of Computing Technology, Chinese Academy of Sciences, have introduced LLaMA-Omni2, a family of speech-capable large language models (SpeechLMs) now available...
RWKV-X Combines Sparse Attention and Recurrent Memory to Enable Efficient 1M-Token...
LLMs built on Transformer architectures face significant scaling challenges due to their quadratic complexity in sequence length when processing long-context inputs. Methods like Linear...
Scaling Reinforcement Learning Beyond Math: Researchers from NVIDIA AI and CMU...
Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities across diverse tasks, with Reinforcement Learning (RL) serving as a crucial mechanism for refining their...
Google Researchers Advance Diagnostic AI: AMIE Now Matches or Outperforms Primary...
LLMs have shown impressive promise in conducting diagnostic conversations, particularly through text-based interactions. However, their evaluation and application have largely ignored the multimodal nature...
IBM AI Releases Granite 4.0 Tiny Preview: A Compact Open-Language Model...
IBM has introduced a preview of Granite 4.0 Tiny, the smallest member of its upcoming Granite 4.0 family of language models. Released under the...
LLMs Can Now Reason in Parallel: UC Berkeley and UCSF Researchers...
Large language models (LLMs) have made significant strides in reasoning capabilities, exemplified by breakthrough systems like OpenAI o1 and DeepSeek-R1, which utilize test-time compute...
LLMs Can Learn Complex Math from Just One Example: Researchers from...
Recent advancements in LLMs such as OpenAI-o1, DeepSeek-R1, and Kimi-1.5 have significantly improved their performance on complex mathematical reasoning tasks. Reinforcement Learning with Verifiable...
JetBrains Open Sources Mellum: A Developer-Centric Language Model for Code-Related Tasks
JetBrains has officially open-sourced Mellum, a purpose-built 4-billion-parameter language model tailored for software development tasks. Developed from the ground up, Mellum reflects JetBrains’ engineering-first...
Training LLM Agents Just Got More Stable: Researchers Introduce StarPO-S and...
Large language models (LLMs) face significant challenges when trained as autonomous agents in interactive environments. Unlike static tasks, agent settings require sequential decision-making, cross-turn...
Xiaomi Introduced MiMo-7B: A Compact Language Model that Outperforms Larger Models...
With rising demand for AI systems that can handle tasks involving multi-step logic, mathematical proofs, and software development, researchers have turned their attention toward...
Building the Internet of Agents: A Technical Dive into AI Agent...
As large language model (LLM) agents gain traction across enterprise and research ecosystems, a foundational gap has emerged: communication. While agents today can autonomously...
DeepSeek-AI Released DeepSeek-Prover-V2: An Open-Source Large Language Model Designed for Formal...
Formal mathematical reasoning has evolved into a specialized subfield of artificial intelligence that requires strict logical consistency. Unlike informal problem solving, which allows for...
Microsoft AI Released Phi-4-Reasoning: A 14B Parameter Open-Weight Reasoning Model that...
Despite notable advancements in large language models (LLMs), effective performance on reasoning-intensive tasks—such as mathematical problem solving, algorithmic planning, or coding—remains constrained by model...
Meta AI Introduces ReasonIR-8B: A Reasoning-Focused Retriever Optimized for Efficiency and...
Addressing the Challenges in Reasoning-Intensive Retrieval
Despite notable progress in retrieval-augmented generation (RAG) systems, retrieving relevant information for complex, multi-step reasoning tasks remains a significant...
ThinkPRM: A Generative Process Reward Model for Scalable Reasoning Verification
Reasoning with LLMs can benefit from utilizing more test-time compute, which depends on high-quality process reward models (PRMs) to select promising paths for search...
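To make the selection idea concrete, here is a toy best-of-n sketch; `prm_score` is a hypothetical placeholder heuristic standing in for a trained process reward model, not ThinkPRM's actual scorer:

```python
# Toy best-of-n selection guided by a process reward model (PRM).
# prm_score is a placeholder heuristic; a real PRM is a learned model
# that judges the quality of each intermediate reasoning step.
def prm_score(steps):
    return sum(1 for s in steps if s.strip()) / max(len(steps), 1)

candidates = [
    ["Let x = 3.", "Then 2x = 6.", "Answer: 6."],  # complete chain
    ["Guess.", "", "Answer: 7."],                  # contains an empty step
]

# Keep the chain whose steps the (placeholder) PRM rates highest.
best = max(candidates, key=prm_score)
print(best[-1])  # -> "Answer: 6."
```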
Alibaba Qwen Team Just Released Qwen3: The Latest Generation of Large...
Despite the remarkable progress in large language models (LLMs), critical challenges remain. Many models exhibit limitations in nuanced reasoning, multilingual proficiency, and computational efficiency....
ByteDance Introduces QuaDMix: A Unified AI Framework for Data Quality and...
The pretraining efficiency and generalization of large language models (LLMs) are significantly influenced by the quality and diversity of the underlying training corpus. Traditional...
Optimizing Reasoning Performance: A Comprehensive Analysis of Inference-Time Scaling Methods in...
Language models have shown great capabilities across various tasks. However, complex reasoning remains challenging as it often requires additional computational resources and specialized techniques....
This AI Paper from China Proposes a Novel Training-Free Approach DEER...
Recent progress in large reasoning language models (LRLMs), such as DeepSeek-R1 and GPT-o1, has greatly improved complex problem-solving abilities by extending the length of...
AgentA/B: A Scalable AI System Using LLM Agents that Simulate Real...
Designing and evaluating web interfaces is one of the most critical tasks in today’s digital-first world. Every change in layout, element positioning, or navigation...
Google DeepMind Research Introduces QuestBench: Evaluating LLMs’ Ability to Identify Missing...
Large language models (LLMs) have gained significant traction in reasoning tasks, including mathematics, logic, planning, and coding. However, a critical challenge emerges when applying...
Mila & Université de Montréal Researchers Introduce the Forgetting Transformer (FoX)...
Transformers have revolutionized sequence modeling by introducing an architecture that handles long-range dependencies efficiently without relying on recurrence. Their ability to process input tokens...
NVIDIA AI Releases OpenMath-Nemotron-32B and 14B-Kaggle: Advanced AI Models for Mathematical...
Mathematical reasoning has long presented a formidable challenge for AI, demanding not only an understanding of abstract concepts but also the ability to perform...
Sequential-NIAH: A Benchmark for Evaluating LLMs in Extracting Sequential Information from...
Evaluating how well LLMs handle long contexts is essential, especially for retrieving specific, relevant information embedded in lengthy inputs. Many recent LLMs—such as Gemini-1.5,...
Muon Optimizer Significantly Accelerates Grokking in Transformers: Microsoft Researchers Explore Optimizer...
Revisiting the Grokking Challenge
In recent years, the phenomenon of grokking—where deep learning models exhibit a delayed yet sudden transition from memorization to generalization—has prompted...
LLMs Can Now Learn without Labels: Researchers from Tsinghua University and...
Despite significant advances in reasoning capabilities through reinforcement learning (RL), most large language models (LLMs) remain fundamentally dependent on supervised data pipelines. RL frameworks...
LLMs Can Now Retain High Accuracy at 2-Bit Precision: Researchers from...
LLMs show impressive capabilities across numerous applications, yet they face challenges due to computational demands and memory requirements. This challenge is acute in scenarios...
Long-Context Multimodal Understanding No Longer Requires Massive Models: NVIDIA AI Introduces...
In recent years, vision-language models (VLMs) have advanced significantly in bridging image, video, and textual modalities. Yet, a persistent limitation remains: the inability to...
OpenAI Releases a Practical Guide to Identifying and Scaling AI Use...
As the deployment of artificial intelligence accelerates across industries, a recurring challenge for enterprises is determining how to operationalize AI in a way that...
ReTool: A Tool-Augmented Reinforcement Learning Framework for Optimizing LLM Reasoning with...
Reinforcement learning (RL) is a powerful technique for enhancing the reasoning capabilities of LLMs, enabling them to develop and refine long Chain-of-Thought (CoT) reasoning. Models...
LLMs Can Think While Idle: Researchers from Letta and UC Berkeley...
Large language models (LLMs) have gained prominence for their ability to handle complex reasoning tasks, transforming applications from chatbots to code-generation tools. These models...
LLMs Can Be Misled by Surprising Data: Google DeepMind Introduces New...
Large language models (LLMs) are continually evolving by ingesting vast quantities of text data, enabling them to become more accurate predictors, reasoners, and conversationalists....
Fourier Neural Operators Just Got a Turbo Boost: Researchers from UC...
Fourier Neural Operators (FNO) are powerful tools for learning partial differential equation solution operators, but lack architecture-aware optimizations, with their Fourier layer executing FFT,...
Meta AI Introduces Collaborative Reasoner (Coral): An AI Framework Specifically Designed...
Rethinking the Problem of Collaboration in Language Models
Large language models (LLMs) have demonstrated remarkable capabilities in single-agent tasks such as question answering and structured...
NVIDIA Introduces CLIMB: A Framework for Iterative Data Mixture Optimization in...
Challenges in Constructing Effective Pretraining Data Mixtures
As large language models (LLMs) scale in size and capability, the choice of pretraining data remains a critical...
LLMs Can Now Solve Challenging Math Problems with Minimal Data: Researchers...
Language models have made significant strides in tackling reasoning tasks, with even small-scale supervised fine-tuning (SFT) approaches such as LIMO and s1 demonstrating remarkable...
LLMs Can Now Learn to Try Again: Researchers from Menlo Introduce...
The domain of LLMs has rapidly evolved to include tools that empower these models to integrate external knowledge into their reasoning processes. A significant...
IBM Releases Granite 3.3 8B: A New Speech-to-Text (STT) Model that...
As artificial intelligence continues to integrate into enterprise systems, the demand for models that combine flexibility, efficiency, and transparency has increased. Existing solutions often...
Do Reasoning Models Really Need Transformers?: Researchers from TogetherAI, Cornell, Geneva,...
Effective reasoning is crucial for solving complex problems in fields such as mathematics and programming, and LLMs have demonstrated significant improvements through long-chain-of-thought reasoning....
Model Performance Begins with Data: Researchers from Ai2 Release DataDecide—A Benchmark...
The Challenge of Data Selection in LLM Pretraining
Developing large language models entails substantial computational investment, especially when experimenting with alternative pretraining corpora. Comparing datasets...
SyncSDE: A Probabilistic Framework for Task-Adaptive Diffusion Synchronization in Collaborative Generation
Diffusion models have demonstrated significant success across various generative tasks, including image synthesis, 3D scene creation, video generation, and human motion modeling. However, their...
MIT Researchers Introduce DISCIPL: A Self-Steering Framework Using Planner and Follower...
Language models predict sequences of words based on vast datasets and are increasingly expected to reason and perform complex linguistic manipulations. Yet, despite their...
Transformers Can Now Predict Spreadsheet Cells without Fine-Tuning: Researchers Introduce TabPFN...
Tabular data is widely utilized in various fields, including scientific research, finance, and healthcare. Traditionally, machine learning models such as gradient-boosted decision trees have...
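As a minimal usage sketch (assuming the open-source `tabpfn` package; the exact interface may vary across versions), TabPFN exposes a scikit-learn-style API in which fit() merely stores the data and prediction is a single forward pass of the pretrained transformer, with no per-dataset fine-tuning:

```python
# Minimal TabPFN usage sketch on a small binary classification dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier(device="cpu")  # scikit-learn-style interface
clf.fit(X_tr, y_tr)                   # no gradient updates happen here
print((clf.predict(X_te) == y_te).mean())  # test accuracy
```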
SQL-R1: A Reinforcement Learning-based NL2SQL Model that Outperforms Larger Systems in...
Natural language interfaces to databases are a growing focus within artificial intelligence, particularly because they allow users to interact with structured databases using plain...
From Logic to Confusion: MIT Researchers Show How Simple Prompt Tweaks...
Large language models are increasingly used to solve math problems that mimic real-world reasoning tasks. These models are tested for their ability to answer...
LLM Reasoning Benchmarks are Statistically Fragile: New Study Shows Reinforcement Learning...
Reasoning capabilities have become central to advancements in large language models, crucial in leading AI systems developed by major research labs. Despite a surge...
Reflection Begins in Pre-Training: Essential AI Researchers Demonstrate Early Emergence of...
What sets large language models (LLMs) apart from traditional methods is their emerging capacity to reflect—recognizing when something in their response doesn’t align with...
Transformers Gain Robust Multidimensional Positional Understanding: University of Manchester Researchers Introduce...
Transformers have emerged as foundational tools in machine learning, underpinning models that operate on sequential and structured data. One critical challenge in this setup...
Multimodal Models Don’t Need Late Fusion: Apple Researchers Show Early-Fusion Architectures...
Multimodal artificial intelligence faces fundamental challenges in effectively integrating and processing diverse data types simultaneously. Current methodologies predominantly rely on late-fusion strategies, where separately...
Underdamped Diffusion Samplers Outperform Traditional Methods: Researchers from Karlsruhe Institute of...
Diffusion processes have emerged as promising approaches for sampling from complex distributions but face significant challenges when dealing with multimodal targets. Traditional methods based...
Foundation Models No Longer Need Prompts or Labels: EPFL Researchers Introduce...
Foundation models, often massive neural networks trained on extensive text and image data, have significantly shifted how artificial intelligence systems handle language and vision...
Reasoning Models Know When They’re Right: NYU Researchers Introduce a Hidden-State...
Artificial intelligence systems have made significant strides in simulating human-style reasoning, particularly mathematics and logic. These models don't just generate answers—they walk through a...
NVIDIA AI Releases UltraLong-8B: A Series of Ultra-Long Context Language Models...
Large language models (LLMs) have shown remarkable performance across diverse text and multimodal tasks. However, many applications, such as document and video understanding, in-context...
LightPROF: A Lightweight AI Framework that Enables Small-Scale Language Models to...
Large Language Models (LLMs) have revolutionized natural language processing, demonstrating strong capabilities on complex zero-shot tasks thanks to extensive training data and vast parameter counts. However, LLMs...
Google AI Introduces the Articulate Medical Intelligence Explorer (AMIE): A Large...
Developing an accurate differential diagnosis (DDx) is a fundamental part of medical care, typically achieved through a step-by-step process that integrates patient history, physical...
Step by Step Coding Guide to Build a Neural Collaborative Filtering...
This tutorial will walk you through using PyTorch to implement a Neural Collaborative Filtering (NCF) recommendation system. NCF extends traditional matrix factorisation by using...
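As a preview of the tutorial's core idea, a minimal NCF model in PyTorch might look like the following (the layer widths and embedding sizes here are illustrative assumptions, not the tutorial's exact configuration):

```python
# Minimal Neural Collaborative Filtering (NCF) sketch in PyTorch.
import torch
import torch.nn as nn

class NCF(nn.Module):
    def __init__(self, num_users, num_items, emb_dim=32):
        super().__init__()
        # User/item embeddings play the role of the latent factors
        # in classic matrix factorization.
        self.user_emb = nn.Embedding(num_users, emb_dim)
        self.item_emb = nn.Embedding(num_items, emb_dim)
        # The MLP learns a nonlinear interaction function instead of
        # a fixed dot product.
        self.mlp = nn.Sequential(
            nn.Linear(2 * emb_dim, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, user_ids, item_ids):
        x = torch.cat([self.user_emb(user_ids), self.item_emb(item_ids)], dim=-1)
        return torch.sigmoid(self.mlp(x)).squeeze(-1)  # interaction probability

model = NCF(num_users=1000, num_items=500)
scores = model(torch.tensor([0, 1]), torch.tensor([10, 42]))
```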
This AI Paper from Salesforce Introduces VLM2VEC and MMEB: A Contrastive...
Multimodal embeddings combine visual and textual data into a single representational space, enabling systems to understand and relate images and language meaningfully. These embeddings...
LLMs No Longer Require Powerful Servers: Researchers from MIT, KAUST, ISTA,...
HIGGS, an innovative method for compressing large language models, was developed in collaboration with teams at Yandex Research, MIT, KAUST, and ISTA.
HIGGS makes...
Nvidia Released Llama-3.1-Nemotron-Ultra-253B-v1: A State-of-the-Art AI Model Balancing Massive Scale, Reasoning...
As AI adoption increases in digital infrastructure, enterprises and developers face mounting pressure to balance computational costs with performance, scalability, and adaptability. The rapid...
Balancing Accuracy and Efficiency in Language Models: A Two-Phase RL Post-Training...
Recent advancements in LLMs have significantly enhanced their reasoning capabilities, particularly through RL-based fine-tuning. Initially trained with supervised learning for token prediction, these models...
RoR-Bench: Revealing Recitation Over Reasoning in Large Language Models Through Subtle...
In recent years, the rapid progress of LLMs has given the impression that we are nearing the achievement of Artificial General Intelligence (AGI), with...
Boson AI Introduces Higgs Audio Understanding and Higgs Audio Generation: An...
In today’s enterprise landscape, especially in insurance and customer support, voice and audio data are more than just recordings; they’re valuable touchpoints that can transform...
T* and LV-Haystack: A Spatially-Guided Temporal Search Framework for Efficient Long-Form...
Understanding long-form videos—ranging from minutes to hours—presents a major challenge in computer vision, especially as video understanding tasks expand beyond short clips. One of...
Unveiling Attention Sinks: The Functional Role of First-Token Focus in Stabilizing...
LLMs often show a peculiar behavior where the first token in a sequence draws unusually high attention—known as an "attention sink." Despite seeming unimportant,...
TorchSim: A Next-Generation PyTorch-Native Atomistic Simulation Engine for the MLIP Era
Radical AI has released TorchSim, a next-generation PyTorch-native atomistic simulation engine for the MLIP era. It accelerates materials simulation by orders of magnitude, transforming...
Salesforce AI Released APIGen-MT and xLAM-2-fc-r Model Series: Advancing Multi-Turn Agent...
AI agents are quickly becoming core components in handling complex human interactions, particularly in business environments where conversations span multiple turns and involve task execution,...
Huawei Noah’s Ark Lab Released Dream 7B: A Powerful Open Diffusion Reasoning Model with...
LLMs have revolutionized artificial intelligence, transforming various applications across industries. Autoregressive (AR) models dominate current text generation, with leading systems like GPT-4, DeepSeek, and...
This AI Paper from ByteDance Introduces MegaScale-Infer: A Disaggregated Expert Parallelism...
Large language models are built on transformer architectures and power applications like chat, code generation, and search, but their growing scale with billions of...
A Code Implementation to Use Ollama through Google Colab and Build...
In this tutorial, we’ll build a fully functional Retrieval-Augmented Generation (RAG) pipeline using open-source tools that run seamlessly on Google Colab. First, we will...
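The skeleton of such a pipeline can be sketched in a few lines; here TF-IDF retrieval stands in for the tutorial's vector store, and the model name passed to the `ollama` client is an assumption (a local Ollama server must be running):

```python
# Minimal RAG sketch: retrieve the most relevant document, then ask the model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import ollama

docs = [
    "RAG pipelines retrieve documents before generating an answer.",
    "Ollama serves open-source LLMs through a local HTTP API.",
]

vectorizer = TfidfVectorizer()
doc_vecs = vectorizer.fit_transform(docs)

def answer(question: str) -> str:
    # Retrieve the single most similar document as context.
    q_vec = vectorizer.transform([question])
    best = cosine_similarity(q_vec, doc_vecs).argmax()
    prompt = f"Context: {docs[best]}\n\nQuestion: {question}"
    # Model name is illustrative; any locally pulled Ollama model works.
    resp = ollama.chat(model="llama3", messages=[{"role": "user", "content": prompt}])
    return resp["message"]["content"]

print(answer("What does Ollama do?"))
```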
This AI Paper Introduces Inference-Time Scaling Techniques: Microsoft’s Deep Evaluation of...
Large language models are often praised for their linguistic fluency, but a growing area of focus is enhancing their reasoning ability—especially in contexts where...
Scalable and Principled Reward Modeling for LLMs: Enhancing Generalist Reward Models...
Reinforcement Learning (RL) has become a widely used post-training method for LLMs, enhancing capabilities like human alignment, long-term reasoning, and adaptability. A major challenge,...
Transformer Meets Diffusion: How the Transfusion Architecture Empowers GPT-4o’s Creativity
OpenAI’s GPT-4o represents a new milestone in multimodal AI: a single model capable of generating fluent text and high-quality images in the same output...
This AI Paper from Anthropic Introduces Attribution Graphs: A New Interpretability...
While the outputs of large language models (LLMs) appear coherent and useful, the underlying mechanisms guiding these behaviors remain largely unknown. As these models...
Anthropic’s Evaluation of Chain-of-Thought Faithfulness: Investigating Hidden Reasoning, Reward Hacks, and...
A key advancement in AI capabilities is the development and use of chain-of-thought (CoT) reasoning, where models explain their steps before reaching an answer....
Reducto AI Released RolmOCR: A SoTA OCR Model Built on Qwen...
Optical Character Recognition (OCR) has long been a cornerstone of document digitization, enabling the transformation of printed text into machine-readable formats. However, traditional OCR...
Scalable Reinforcement Learning with Verifiable Rewards: Generative Reward Modeling for Unstructured,...
Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective in enhancing LLMs' reasoning and coding abilities, particularly in domains where structured reference answers allow...
A Code Implementation to Build a Context-Aware AI Assistant in Google...
In this hands-on tutorial, we bring the core principles of the Model Context Protocol (MCP) to life by implementing a lightweight, context-aware AI assistant...
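Independent of the MCP SDK itself, the context-tracking principle the tutorial builds on can be sketched in plain Python; `call_llm` below is a hypothetical placeholder for whatever model backend the assistant uses, and this is not the official MCP implementation:

```python
# Plain-Python sketch of a context-aware assistant: every turn is appended
# to a shared history that is replayed to the model on the next call.
from dataclasses import dataclass, field

def call_llm(history):
    # Hypothetical placeholder: a real assistant would send the full
    # accumulated context to an LLM backend here.
    return f"(model reply to: {history[-1]['content']})"

@dataclass
class ContextAwareAssistant:
    history: list = field(default_factory=list)

    def ask(self, user_message: str) -> str:
        self.history.append({"role": "user", "content": user_message})
        reply = call_llm(self.history)  # model sees the whole conversation
        self.history.append({"role": "assistant", "content": reply})
        return reply

assistant = ContextAwareAssistant()
assistant.ask("My name is Ada.")
print(assistant.ask("What is my name?"))  # earlier turn is still in context
```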
This AI Paper Introduces a Short KL+MSE Fine-Tuning Strategy: A Low-Cost...
Sparse autoencoders are central tools in analyzing how large language models function internally. Translating complex internal states into interpretable components allows researchers to break...
NVIDIA AI Releases HOVER: A Breakthrough AI for Versatile Humanoid Control...
The field of robotics has advanced significantly. For many years, there have been expectations of human-like robots that can navigate our environments, perform complex...
Meet Open-Qwen2VL: A Fully Open and Compute-Efficient Multimodal Large Language Model
Multimodal Large Language Models (MLLMs) have advanced the integration of visual and textual modalities, enabling progress in tasks such as image captioning, visual question...
Researchers from Dataocean AI and Tsinghua University Introduce Dolphin: A Multilingual...
Automatic speech recognition (ASR) technologies have advanced significantly, yet notable disparities remain in their ability to accurately recognize diverse languages. Prominent ASR systems, such...
This AI Paper Introduces FASTCURL: A Curriculum Reinforcement Learning Framework with...
Large language models have transformed how machines comprehend and generate text, especially in complex problem-solving areas like mathematical reasoning. These systems, known as R1-like...
Introduction to MCP: The Ultimate Guide to Model Context Protocol for...
The Model Context Protocol (MCP) is an open standard (open-sourced by Anthropic) that defines a unified way to connect AI assistants (LLMs) with external...
UB-Mesh: A Cost-Efficient, Scalable Network Architecture for Large-Scale LLM Training
As LLMs scale, their computational and bandwidth demands increase significantly, posing challenges for AI training infrastructure. Following scaling laws, LLMs improve comprehension, reasoning, and...
Snowflake Proposes ExCoT: A Novel AI Framework that Iteratively Optimizes Open-Source...
Text-to-SQL translation, the task of transforming natural language queries into structured SQL statements, is essential for facilitating user-friendly database interactions. However, the task involves...
Salesforce AI Introduces BingoGuard: An LLM-based Moderation System Designed to Predict...
The advancement of large language models (LLMs) has significantly influenced interactive technologies, presenting both benefits and challenges. One prominent issue arising from these models...
Enhancing Strategic Decision-Making in Gomoku Using Large Language Models and Reinforcement...
LLMs have significantly advanced NLP, demonstrating strong text generation, comprehension, and reasoning capabilities. These models have been successfully applied across various domains, including education,...
OpenAI Releases PaperBench: A Challenging Benchmark for Assessing AI Agents’...
The rapid progress in artificial intelligence (AI) and machine learning (ML) research underscores the importance of accurately evaluating AI agents' capabilities in replicating complex,...
Mitigating Hallucinations in Large Vision-Language Models: A Latent Space Steering Approach
Hallucination remains a significant challenge in deploying Large Vision-Language Models (LVLMs), as these models often generate text misaligned with visual inputs. Unlike hallucination in...
Nomic Open Sources State-of-the-Art Multimodal Embedding Model
Nomic has announced the release of "Nomic Embed Multimodal," a groundbreaking embedding model that achieves state-of-the-art performance on visual document retrieval tasks. The new...
Meta AI Proposes Multi-Token Attention (MTA): A New Attention Method which...
Large Language Models (LLMs) significantly benefit from attention mechanisms, enabling the effective retrieval of contextual information. Nevertheless, traditional attention methods primarily depend on single...
A Comprehensive Guide to LLM Routing: Tools and Frameworks
Deploying LLMs presents challenges, particularly in optimizing efficiency, managing computational costs, and ensuring high-quality performance. LLM routing has emerged as a strategic solution to...
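At its simplest, a router is just a policy mapping each request to a model; the sketch below uses prompt length as a crude difficulty proxy, with the model names and threshold chosen purely for illustration:

```python
# Minimal cost-aware LLM routing sketch: cheap model for simple prompts,
# stronger model for harder ones.
def route(prompt: str) -> str:
    # A production router would use a trained classifier or scoring model;
    # word count is a crude stand-in for difficulty.
    return "small-fast-model" if len(prompt.split()) < 50 else "large-accurate-model"

print(route("Summarize this sentence."))              # -> small-fast-model
print(route("Prove the following theorem ... " * 30))  # -> large-accurate-model
```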
Meet Amazon Nova Act: An AI Agent that can Automate Web...
Amazon has revealed a new artificial intelligence (AI) model called Amazon Nova Act. This AI agent is designed to operate and take actions within...
DeltaProduct: An AI Method that Balances Expressivity and Efficiency of the...
The Transformer architecture revolutionised natural language processing with its self-attention mechanism, enabling parallel computation and effective context retrieval. However, Transformers face significant limitations when...
This AI Paper from ByteDance Introduces a Hybrid Reward System Combining...
Reinforcement Learning from Human Feedback (RLHF) is crucial for aligning LLMs with human values and preferences. Despite the introduction of non-RL alternatives like DPO, industry-leading models...
Meet ReSearch: A Novel AI Framework that Trains LLMs to Reason...
Large language models (LLMs) have demonstrated significant progress across various tasks, particularly in reasoning capabilities. However, effectively integrating reasoning processes with external search operations...
How to Build a Prototype X-ray Judgment Tool (Open Source Medical...
In this tutorial, we demonstrate how to build a prototype X-ray judgment tool using open-source libraries in Google Colab. By leveraging the power of...
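The core of such a prototype can be sketched with a pretrained torchvision backbone and a two-class head; the backbone, class names, and preprocessing below are assumptions, and the tutorial's exact open-source stack may differ:

```python
# Sketch of an X-ray judgment prototype: pretrained backbone + binary head.
import torch
import torch.nn as nn
from torchvision import models, transforms

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # replace the ImageNet head
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# The new head would need fine-tuning on labeled X-rays before predictions
# are meaningful. With a PIL image `img`, inference would then look like:
# logits = model(preprocess(img).unsqueeze(0))
# label = ["normal", "abnormal"][logits.argmax(1).item()]
```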
Advancing Medical Reasoning with Reinforcement Learning from Verifiable Rewards (RLVR): Insights...
Reinforcement Learning from Verifiable Rewards (RLVR) has recently emerged as a promising method for enhancing reasoning abilities in language models without direct supervision. This...
NVIDIA AI Researchers Introduce FFN Fusion: A Novel Optimization Technique that...
Large language models (LLMs) have become vital across domains, enabling high-performance applications such as natural language generation, scientific research, and conversational agents. Underneath these...
This AI Paper Propose the UI-R1 Framework that Extends Rule-based Reinforcement...
Supervised fine-tuning (SFT) is the standard training paradigm for large language models (LLMs) and graphic user interface (GUI) agents. However, SFT demands high-quality labeled...