Vineet Kumar, Author at MarkTechPost
https://www.marktechpost.com/author/vineet1897/

Project Alexandria: Democratizing Scientific Knowledge Through Structured Fact Extraction with LLMs (Tue, 04 Mar 2025)

Scientific publishing has expanded significantly in recent decades, yet access to crucial research remains restricted for many, particularly in developing countries, independent researchers, and small academic institutions. The rising costs of journal subscriptions exacerbate this disparity, limiting the availability of knowledge even in well-funded universities. Despite the push for Open Access (OA), barriers persist, as demonstrated by large-scale access losses in Germany and the U.S. due to price disputes with publishers. This limitation hinders scientific progress, leading researchers to explore alternative methods for making scientific knowledge more accessible while navigating copyright constraints.

Current methods of accessing scientific content primarily involve direct subscriptions, institutional access, or reliance on legally ambiguous repositories. These approaches are either financially unsustainable or legally contentious. While OA publishing helps, it does not fully resolve the accessibility crisis. Large Language Models (LLMs) offer a new avenue for extracting and summarizing knowledge from scholarly texts, but their use raises copyright concerns. The challenge lies in separating factual content from the creative expressions protected under copyright law.

To address this, the research team proposes Project Alexandria, which introduces Knowledge Units (KUs) as a structured format for extracting factual information while omitting stylistic elements. KUs encode key scientific insights—such as definitions, relationships, and methodological details—in a structured database, ensuring that only non-copyrightable factual content is preserved. This framework aligns with legal principles like the idea-expression dichotomy, which states that facts cannot be copyrighted, only their specific phrasing and presentation.

Reference: https://arxiv.org/pdf/2502.19413

Knowledge Units are generated through an LLM pipeline that processes scholarly texts in paragraph-sized segments, extracting core concepts and their relationships. Each KU contains:

  • Entities: Core scientific concepts identified in the text.
  • Relationships: Connections between entities, including causal or definitional links.
  • Attributes: Specific details related to entities.
  • Context summary: A brief summary ensuring coherence across multiple KUs.
  • Sentence MinHash: A fingerprint to track the source text without storing the original phrasing.

This structured approach balances knowledge retention with legal defensibility. Paragraph-level segmentation ensures optimal granularity—too small, and information is scattered; too large, and LLM performance degrades.
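
To make the format concrete, here is a minimal sketch of what a single Knowledge Unit could look like as a Python data structure. The field names follow the list above, but the exact schema and the example values are illustrative assumptions, not the paper's released format.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class KnowledgeUnit:
    entities: List[str]                      # core scientific concepts found in the paragraph
    relationships: List[Dict[str, str]]      # e.g., {"source": ..., "relation": ..., "target": ...}
    attributes: Dict[str, str]               # specific details attached to entities
    context_summary: str                     # brief summary that keeps coherence across KUs
    sentence_minhash: List[int] = field(default_factory=list)  # fingerprint of the source sentences, not their wording

# Illustrative example for one paragraph of a hypothetical biology paper.
ku = KnowledgeUnit(
    entities=["CRISPR-Cas9", "double-strand break"],
    relationships=[{"source": "CRISPR-Cas9", "relation": "induces", "target": "double-strand break"}],
    attributes={"CRISPR-Cas9": "RNA-guided endonuclease"},
    context_summary="The paragraph describes how CRISPR-Cas9 cuts DNA at a targeted site.",
)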

From a legal standpoint, the framework complies with both German and U.S. copyright laws. German law explicitly excludes facts from copyright protection and allows data mining under specific exemptions. Similarly, the U.S. Fair Use doctrine permits transformative uses like text and data mining, provided they do not harm the market value of the original work. The research team demonstrates that KUs satisfy these legal conditions by excluding expressive elements while preserving factual content.

To evaluate the effectiveness of KUs, the team conducted multiple-choice question (MCQ) tests using abstracts and full-text articles from biology, physics, mathematics, and computer science. The results show that LLMs using KUs achieve nearly the same accuracy as those given the original texts. This suggests that the vast majority of relevant information is retained despite the removal of expressive elements. Furthermore, plagiarism detection tools confirm minimal overlap between KUs and the original texts, reinforcing the method’s legal viability.

Beyond legal considerations, the research explores the limitations of existing alternatives. Text embeddings, commonly used for knowledge representation, fail to capture precise factual details, making them unsuitable for scientific knowledge extraction. Direct paraphrasing methods risk maintaining too much similarity to the original text, potentially violating copyright laws. In contrast, KUs provide a more structured and legally sound approach.

The study also addresses common criticisms. While some argue that citation dilution could result from extracting knowledge into databases, traceable attribution systems can mitigate this concern. Others worry that nuances in scientific research may be lost, but the team highlights that most complex elements—like mathematical proofs—are not copyrightable to begin with. Concerns about potential legal risks and hallucination propagation are acknowledged, with recommendations for hybrid human-AI validation systems to enhance reliability.

The broader impact of freely accessible scientific knowledge extends across multiple sectors. Researchers can collaborate more effectively across disciplines, healthcare professionals can access critical medical research more efficiently, and educators can develop high-quality curricula without cost barriers. Additionally, open scientific knowledge promotes public trust and transparency, reducing misinformation and enabling informed decision-making.

Moving forward, the team identifies several research directions, including refining factual accuracy through cross-referencing, developing educational applications for KU-based knowledge dissemination, and establishing interoperability standards for knowledge graphs. They also propose integrating KUs into a broader semantic web for scientific discovery, leveraging AI to automate and validate extracted knowledge at scale.

In summary, Project Alexandria presents a promising framework for making scientific knowledge more accessible while respecting copyright constraints. By systematically extracting factual content from scholarly texts and structuring it into Knowledge Units, this approach provides a legally viable and technically effective solution to the accessibility crisis in scientific publishing. Extensive testing demonstrates its potential for preserving critical information without violating copyright laws, positioning it as a significant step toward democratizing access to knowledge in the scientific community.


Check out the Paper and Project. All credit for this research goes to the researchers of this project.
NeoBERT: Modernizing Encoder Models for Enhanced Language Understanding (Mon, 03 Mar 2025)

Encoder models like BERT and RoBERTa have long been cornerstones of natural language processing (NLP), powering tasks such as text classification, retrieval, and toxicity detection. However, while decoder-based large language models (LLMs) like GPT and LLaMA have evolved rapidly—incorporating architectural innovations, larger datasets, and extended context windows—encoders have stagnated. Despite their critical role in embedding-dependent applications, BERT-family models rely on outdated architectures, limited training data, and short context lengths, leading to suboptimal performance on modern benchmarks. In this paper, the researchers have presented NeoBERT to revitalize encoder design by integrating advancements from decoder models while addressing inherent limitations of existing encoders.

Traditional encoders like BERT and RoBERTa use absolute positional embeddings, Gaussian Error Linear Unit (GELU) activations, and a fixed 512-token context window. While newer models like GTE and CDE improved fine-tuning strategies for tasks like retrieval, they rely on outdated backbone architectures inherited from BERT. These backbones suffer from inefficiencies:

  1. Architectural Rigidity: Fixed depth-to-width ratios and positional encoding methods limit adaptability to longer sequences.
  2. Data Scarcity: Pre-training on small datasets (e.g., Wikipedia + BookCorpus) restricts knowledge diversity.
  3. Context Constraints: Short sequence lengths (512–2,048 tokens) hinder applications requiring long-context understanding.

Recent fine-tuning advancements masked these issues but failed to modernize the core models. For example, GTE’s contrastive learning boosts retrieval performance but cannot compensate for BERT’s obsolete embeddings. NeoBERT addresses these gaps through architectural overhauls, data scaling, and optimized training:

  1. Architectural Modernization (a minimal sketch of these components follows this list):
    1. Rotary Position Embeddings (RoPE): Replaces absolute positional embeddings with relative positioning, enabling better generalization to longer sequences. RoPE integrates positional information directly into attention mechanisms, reducing degradation on out-of-distribution lengths.
    2. Depth-to-Width Optimization: Adjusts layer depth (28 layers) and width (768 dimensions) to balance parameter efficiency and performance, avoiding the “width-inefficiency” of smaller models.
    3. RMSNorm and SwiGLU: Replaces LayerNorm with RMSNorm for faster computation and adopts SwiGLU activations, enhancing nonlinear modeling while maintaining parameter count.
  2. Data and Training:
    1. RefinedWeb Dataset: Trains on 600B tokens (18× larger than RoBERTa’s data), exposing the model to diverse, real-world text.
    2. Two-Stage Context Extension: First pre-trains on 1,024-token sequences, then fine-tunes on 4,096-token batches using a mix of standard and long-context data. This phased approach mitigates distribution shifts while expanding usable context.
    3. Efficiency Optimizations:
      1. FlashAttention and xFormers: Reduces memory overhead for longer sequences.
      2. AdamW with Cosine Decay: Balances training stability and regularization.
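
To make the three architectural components above concrete, here is a minimal, self-contained PyTorch sketch of RoPE, RMSNorm, and SwiGLU. It is illustrative only: the dimensions, rotation convention, and module names are assumptions, not NeoBERT's released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

def apply_rope(x, base=10000.0):
    # x: (batch, seq_len, n_heads, head_dim); head_dim must be even.
    b, s, h, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)        # (half,)
    angles = torch.arange(s, dtype=torch.float32)[:, None] * freqs[None, :]  # (seq_len, half)
    cos = angles.cos()[None, :, None, :]
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each feature pair by a position-dependent angle; relative positions fall out of the attention dot product.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

class RMSNorm(nn.Module):
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps
    def forward(self, x):
        # Normalize by the root mean square only (no mean subtraction), which is cheaper than LayerNorm.
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps) * self.weight

class SwiGLU(nn.Module):
    def __init__(self, dim, hidden):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)
    def forward(self, x):
        # SiLU-gated feed-forward: silu(x W_gate) * (x W_up), projected back to the model dimension.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))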

Performance and Evaluation

NeoBERT’s improvements are validated across the following benchmarks:

  1. GLUE: Scores 89.0%, matching RoBERTa-large’s performance despite having 100M fewer parameters. Key drivers include the RefinedWeb dataset (+3.6% gain) and scaled model size (+2.9%).
  2. MTEB: Outperforms GTE, CDE, and jina-embeddings by +4.5% under standardized contrastive fine-tuning, demonstrating superior embedding quality. The evaluation isolates pre-training benefits by applying identical fine-tuning protocols to all models.
  3. Context Length: NeoBERT4096 achieves stable perplexity on 4,096-token sequences after 50k additional training steps, whereas BERT struggles beyond 512 tokens. Efficiency tests show NeoBERT processes 4,096-token batches 46.7% faster than ModernBERT, despite larger size.

In conclusion, NeoBERT represents a paradigm shift for encoder models, bridging the gap between stagnant architectures and modern LLM advancements. By rethinking depth-to-width ratios, positional encoding, and data scaling, it achieves state-of-the-art performance on GLUE and MTEB while supporting context windows eight times longer than BERT. Its efficiency and open-source availability make it a practical choice for retrieval, classification, and real-world applications requiring robust embeddings. However, reliance on web-scale data introduces biases, necessitating ongoing updates as cleaner datasets emerge. NeoBERT’s success underscores the untapped potential of encoder modernization, setting a roadmap for future research in efficient, scalable language understanding.


Check out the Paper and Model on Hugging Face. All credit for this research goes to the researchers of this project.
LEAPS: A Neural Sampling Algorithm for Discrete Distributions via Continuous-Time Markov Chains (‘Discrete Diffusion’) (Fri, 28 Feb 2025)

Sampling from probability distributions with known density functions (up to normalization) is a fundamental challenge across various scientific domains. From Bayesian uncertainty quantification to molecular dynamics and quantum physics, the ability to efficiently generate representative samples is crucial. While Markov chain Monte Carlo (MCMC) methods have long been the dominant approach, they often suffer from slow convergence, especially when dealing with multimodal distributions.

Traditional MCMC methods frequently struggle with convergence to equilibrium, leading researchers to combine them with non-equilibrium dynamics through techniques like annealed importance sampling (AIS) or sequential Monte Carlo (SMC). However, these methods can still exhibit high variance in their importance weights, resulting in inefficient sampling. The integration of deep learning with sampling algorithms has shown promise in continuous domains, but there remains a significant gap in effective sampling approaches for discrete distributions – despite their prevalence in applications ranging from statistical physics to genomic data and language modeling.

The research team addresses this gap with LEAPS (Locally Equivariant discrete Annealed Proactive Sampler), a novel sampling method that leverages continuous-time Markov chains (CTMCs) to efficiently sample from discrete distributions. LEAPS combines the theoretical foundation of non-equilibrium dynamics with neural network-based learning to create a powerful sampling approach.

LEAPS works by constructing a time-dependent probability path (ρt) that begins with an easy-to-sample distribution (ρ0) and gradually transforms it into the target distribution (ρ1). The central innovation lies in designing a CTMC whose evolution follows this prescribed path, enabling efficient sampling through a combination of:

  1. Proactive Importance Sampling: The researchers developed a novel importance sampling scheme that anticipates where the CTMC will jump next, accumulating weights that reflect the deviation from the true distribution.
  2. Locally Equivariant Neural Networks: A key computational breakthrough that allows efficient calculation of importance weights without the prohibitive costs associated with evaluating all neighboring states.
  3. PINN Objective: A physics-informed neural network objective that trains the CTMC rate matrix by minimizing the variance of importance sampling weights.

Traditional approaches would require evaluating the neural network for each neighbor of a state, making the computation of importance weights prohibitively expensive for high-dimensional spaces. LEAPS introduces the concept of “local equivariance” – an inductive bias that enables computing these weights in a single forward pass of the neural network.

A locally equivariant neural network ensures that the “flux of probability” from a state to its neighbor is exactly negative of the flux from the neighbor back to the state. This property allows the model to efficiently capture the dynamics of the system without redundant calculations.
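
As a concrete illustration of this antisymmetry property, here is a minimal PyTorch sketch of one locally equivariant layer for binary spins in {-1, +1}. The zero-diagonal weight matrix is one simple way to guarantee the property in a single forward pass; it is an assumption for illustration, not the paper's exact parameterization.

import torch
import torch.nn as nn

class LocallyEquivariantLinear(nn.Module):
    """G_i(x) = x_i * f_i(x_{-i}): because f_i never sees x_i (zero diagonal),
    flipping spin i exactly negates output i -- the antisymmetric "flux" property."""
    def __init__(self, dim):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(dim, dim))
        self.bias = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        # x: (batch, dim) with entries in {-1, +1}
        d = self.weight.shape[0]
        w = self.weight * (1.0 - torch.eye(d, device=x.device))  # zero the diagonal
        f = x @ w.T + self.bias                                   # f_i is independent of x_i
        return x * f                                              # so G_i(flip_i(x)) = -G_i(x)

# Quick check of the property on one spin configuration.
layer = LocallyEquivariantLinear(dim=8)
x = torch.randint(0, 2, (1, 8)).float() * 2 - 1
x_flipped = x.clone(); x_flipped[0, 3] *= -1
print(layer(x)[0, 3], layer(x_flipped)[0, 3])  # equal magnitude, opposite sign

A single forward pass produces the value for every candidate flip at once, which is the efficiency point made above.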

The research team demonstrates how to construct locally equivariant versions of popular neural network architectures:

  • Multilayer Perceptrons (MLPs) with specifically constrained weight matrices
  • Locally-Equivariant Attention (LEA) layers that maintain the equivariance property
  • Locally-Equivariant Convolutional (LEC) networks that can be stacked into deep architectures

LEAPS is not just computationally efficient but also theoretically sound. The researchers prove that their proactive importance sampling scheme provides unbiased estimates and that the locally equivariant parameterization of rate matrices is universally expressive – meaning it can represent any valid CTMC for the sampling problem.

A noteworthy theoretical result is that LEAPS generalizes both AIS and SMC methods. When the neural network component is set to zero, LEAPS recovers these classical approaches, making it a strict superset of these well-established sampling techniques.

To demonstrate LEAPS in action, the researchers applied it to sampling from a 2D Ising model – a classic challenge in statistical physics. Working with a 15×15 lattice (a 225-dimensional discrete space), they compared different neural architectures implementing their method against ground truth samples generated by long-run Glauber dynamics.

The results are impressive:

  • Convolutional architectures outperformed attention-based models, with deeper networks yielding better results
  • LEAPS accurately captured the magnetization distribution and two-point correlation functions
  • The method achieved high effective sample size (ESS), indicating efficient sampling with low-variance importance weights
  • LEAPS significantly outperformed pure MCMC approaches with the same number of sampling steps

What makes LEAPS particularly valuable is its ability to handle high-dimensional discrete spaces, which are ubiquitous in real-world applications but notoriously challenging for sampling algorithms. The method combines the statistical guarantees of traditional approaches with the representational power of deep learning. Additionally, LEAPS can be integrated with existing MCMC schemes, effectively combining learned transport with traditional random walks to achieve better mixing properties. This hybrid approach provides a practical pathway for researchers to enhance their existing sampling methods.

In conclusion, LEAPS represents a significant advancement in sampling from discrete distributions, especially in high-dimensional settings. By leveraging locally equivariant neural networks and proactive importance sampling, it offers a computationally efficient approach with strong theoretical guarantees. The research team suggests several promising directions for future work, including extending LEAPS to sample from entire families of distributions simultaneously and applying the locally equivariant neural network architecture to other probabilistic modeling tasks. The connection between LEAPS and guidance or reward fine-tuning of generative CTMC models also presents an exciting avenue for further exploration.


Check out the Paper. All credit for this research goes to the researchers of this project.
Building an Ideation Agent System with AutoGen: Create AI Agents that Brainstorm and Debate Ideas (Thu, 20 Feb 2025)

Ideation processes often require time-consuming analysis and debate. What if we made two LLMs come up with ideas and then had them debate those ideas? Sounds interesting, right? This tutorial shows exactly how to build such an AI-powered solution using two LLM agents that collaborate through structured conversation. To achieve this, we will use AutoGen to build the agents and an OpenAI model (GPT-4o-mini) as the LLM behind them.

1. Setup and Installation  

First install required packages:

pip install -U autogen-agentchat
pip install "autogen-ext[openai]"

2. Core Components  

Let’s explore the key components of AutoGen that make this ideation system work. Understanding these components will help you customize and extend the system for your specific needs.

1. RoundRobinGroupChat

  • Manages a team of agents in a turn-based manner.
  • Agents take turns responding, and all messages are shared for context.
  • Ensures structured and fair interaction.

2. TextMentionTermination

  • Stops the conversation when a specific keyword (e.g., “FINALIZE”) is detected.
  • Useful for ending discussions when agents reach consensus or complete a task.

3. AssistantAgent

  • Represents an LLM-powered team member with a specific role.
  • Each agent is defined by a system message that guides its behavior.
  • Agents use the conversation history to generate context-aware responses.

These components work together to create a structured, collaborative system where agents brainstorm, debate, and reach decisions efficiently.

 3. Building the Agent Team  

Create two specialized agents with distinct roles:

import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.base import TaskResult
from autogen_agentchat.conditions import ExternalTermination, TextMentionTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.ui import Console
from autogen_core import CancellationToken
from autogen_ext.models.openai import OpenAIChatCompletionClient

# The OpenAI key is kept in a local apikey.py file that defines API_KEY.
from apikey import API_KEY

# Create an OpenAI model client.
model_client = OpenAIChatCompletionClient(
    model="gpt-4o-mini",
    api_key=API_KEY,
)

# Create the primary agent.
primary_agent = AssistantAgent(
    "participant1",
    model_client=model_client,
    system_message="You are a participant in an ideation and feedback session. You will be provided with a problem statement and asked to generate ideas. Your ideas will be\
    reviwed by another participant and then you together will narrow down ideas by debating over them. Respond with 'FINALIZE' when you have a final idea.",
)

# Create the critic agent.
critic_agent = AssistantAgent(
    "participant2",
    model_client=model_client,
    system_message="You are a participant in an ideation and feedback session. Your teammate will be provide some ideas that you need to review with your \
        teammate and narrow down ideas by debating over them. Respond with 'FINALIZE' when you have a final idea.",
)

# Define a termination condition that stops the task if the critic approves.
text_termination = TextMentionTermination("FINALIZE")

# Create a team with the primary and critic agents.
team = RoundRobinGroupChat([primary_agent, critic_agent], termination_condition=text_termination)

4. Running the Team  

Execute with asynchronous processing:

result = await team.run(task="Generate ideas for applications of AI in healthcare.")
print(result)
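
Note that a bare await like this only works in a notebook (or another environment with a running event loop). In a plain Python script, wrap the call in an async main function, for example:

import asyncio

async def main():
    result = await team.run(task="Generate ideas for applications of AI in healthcare.")
    print(result)

asyncio.run(main())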

5. Monitoring Interactions  

You can also track the debate in real-time:

# When running inside a script, use a async main function and call it from `asyncio.run(...)`.
await team.reset()  # Reset the team for a new task.
async for message in team.run_stream(task="Generate ideas for applications of AI in healthcare."):  # type: ignore
    if isinstance(message, TaskResult):
        print("Stop Reason:", message.stop_reason)
    else:
        print(message)

AutoGen also provides a Console helper to render these interactions in a more readable way:

await team.reset()  # Reset the team for a new task.
await Console(team.run_stream(task="Generate ideas for applications of AI in healthcare."))  # Stream the messages to the console.

Now the system is complete, but there is still a lot to play around with, and I will leave that to you. Here are a few ideas to enhance your system:

  • Adding domain-specific agents (medical experts, technical validators)
  • Implementing custom termination conditions (see the sketch after this list)
  • Making a simple UI using streamlit
  • Adding more players to the team
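
As an example of the custom termination idea above, the sketch below combines the keyword-based stop with a message cap. It assumes the MaxMessageTermination condition and the | operator for combining conditions available in recent autogen-agentchat releases:

from autogen_agentchat.conditions import MaxMessageTermination, TextMentionTermination

# Stop when an agent says "FINALIZE" or after 20 messages, whichever happens first.
termination = TextMentionTermination("FINALIZE") | MaxMessageTermination(max_messages=20)
team = RoundRobinGroupChat([primary_agent, critic_agent], termination_condition=termination)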

Breaking the Autoregressive Mold: LLaDA Proves Diffusion Models can Rival Traditional Language Architectures (Thu, 20 Feb 2025)

The field of large language models has long been dominated by autoregressive methods that predict text sequentially from left to right. While these approaches power today’s most capable AI systems, they face fundamental limitations in computational efficiency and bidirectional reasoning. A research team from China has now challenged the assumption that autoregressive modeling is the only path to achieving human-like language capabilities, introducing an innovative diffusion-based architecture called LLaDA that reimagines how language models process information.  

Current language models operate through next-word prediction, requiring increasingly complex computations as context windows grow. This sequential nature creates bottlenecks in processing speed and limits effectiveness on tasks requiring reverse reasoning. For instance, traditional autoregressive models suffer from the reversal curse—a phenomenon where models trained to predict the next token struggle with backward logical tasks. Consider poetry completion:  

  • Forward Task (Autoregressive Strength): Given the prompt “Roses are red,” models easily continue with “violets are blue.”  
  • Reversal Task (Autoregressive Weakness): Given “violets are blue,” the same models often fail to recall “Roses are red” as the preceding line.  

This directional bias stems from their training to predict text strictly left-to-right. While masked language models (like BERT) exist, they traditionally use fixed masking ratios, limiting their generative capabilities. The researchers propose LLaDA (Large Language Diffusion with mAsking), which implements a dynamic masking strategy across diffusion steps to overcome these constraints (Illustrated in Fig. 2). Unlike autoregressive models, LLaDA processes tokens in parallel through a bidirectional framework, learning contextual relationships in all directions simultaneously.  

LLaDA’s architecture employs a transformer without causal masking, trained through two phases:  

  1. Pre-training: The model learns to reconstruct randomly masked text segments across 2.3 trillion tokens. Imagine repairing a damaged manuscript where words vanish unpredictably—LLaDA practices filling gaps in any order (a minimal sketch of this masking objective follows this list). For example:
  • Start with a masked sentence: “[MASK] are red, [MASK] are blue.”
  • Predict “violets” for the second blank first, then “Roses” for the first.
  • Repeated masking/unmasking cycles eliminate directional bias.
  2. Supervised Fine-Tuning: The model adapts to instruction-response pairs by masking only the response portion, enabling task-specific refinement while retaining bidirectional understanding.
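
Here is a minimal sketch of the dynamic masking objective described above, assuming a generic bidirectional model that returns per-position logits; the paper's exact loss weighting (which rescales by the masking ratio) is simplified here.

import torch
import torch.nn.functional as F

def dynamic_masking_loss(model, tokens, mask_id):
    # tokens: (batch, seq_len) integer token ids
    b, s = tokens.shape
    t = torch.rand(b, 1)                                   # masking ratio t ~ U(0, 1), unlike BERT's fixed 15%
    mask = torch.rand(b, s) < t                            # each position masked independently with prob t
    corrupted = torch.where(mask, torch.full_like(tokens, mask_id), tokens)
    logits = model(corrupted)                              # (batch, seq_len, vocab); all positions predicted in parallel
    return F.cross_entropy(logits[mask], tokens[mask])     # loss only on the masked positions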

During generation, LLaDA starts with fully masked output fields and iteratively refines predictions through confidence-based remasking:  

  1. At each diffusion step, the model predicts all masked tokens simultaneously.  
  2. Low-confidence predictions (e.g., uncertain words in a poem’s opening line) are remasked for re-evaluation.  
  3. This “semantic annealing” process repeats until coherent text emerges (a minimal sketch of the loop follows the reference below).

Reference: https://arxiv.org/pdf/2502.09992
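
The sketch below illustrates this generation loop; the number of steps, the remasking schedule, and the use of argmax confidence are illustrative assumptions rather than the authors' exact recipe.

import torch

def diffusion_generate(model, prompt_ids, answer_len, mask_id, steps=32):
    # Start with the whole answer region masked.
    x = torch.cat([prompt_ids, torch.full((answer_len,), mask_id, dtype=prompt_ids.dtype)])
    ans = slice(len(prompt_ids), len(prompt_ids) + answer_len)
    for step in range(steps):
        logits = model(x.unsqueeze(0))[0]                           # predict every position in parallel
        conf, pred = logits.softmax(-1).max(-1)
        x[ans] = torch.where(x[ans] == mask_id, pred[ans], x[ans])  # fill every masked slot with its prediction
        n_remask = int(answer_len * (1 - (step + 1) / steps))
        if n_remask > 0:
            # "Semantic annealing": send the least confident answer tokens back to [MASK] for the next step.
            worst = conf[ans].argsort()[:n_remask]
            x[ans.start + worst] = mask_id
    return x[ans]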

Performance evaluations reveal surprising capabilities. When scaled to 8 billion parameters, LLaDA matches or exceeds equivalent-sized autoregressive models like LLaMA2-7B across 15 benchmarks, excelling in mathematical reasoning (GSM8K) and Chinese tasks. Crucially, it overcomes the reversal curse:  

  • Achieved 42% accuracy on backward poem completion tasks vs. GPT-4’s 32%, while maintaining parity in forward generation.  
  • Demonstrated consistent performance on reversal QA tasks (e.g., “Who is Tom Cruise’s mother?” vs. “Who is Mary Lee Pfeiffer’s son?”), where autoregressive models often fail.  

The model also shows efficient scaling—computational costs grow comparably to traditional architectures despite its novel approach. Notably, in tasks such as MMLU and GSM8K, LLaDA exhibits even stronger scalability. 

In summary, this breakthrough suggests key language capabilities emerge from fundamental generative principles, not autoregressive designs alone. While current implementations lag slightly in tasks like MMLU (likely due to data quality variances), LLaDA establishes diffusion models as viable alternatives. The research opens doors to parallel generation and bidirectional reasoning, though challenges remain in inference optimization and alignment with human preferences. As the field explores these alternatives, we may be witnessing the early stages of a paradigm shift in how machines process language—one where models “think holistically” rather than being constrained to linear prediction.  


    Check out the Paper and Project Page. All credit for this research goes to the researchers of this project.
    ReasonFlux: Elevating LLM Reasoning with Hierarchical Template Scaling (Sat, 15 Feb 2025)

    Large language models (LLMs) have demonstrated exceptional problem-solving abilities, yet complex reasoning tasks—such as competition-level mathematics or intricate code generation—remain challenging. These tasks demand precise navigation through vast solution spaces and meticulous step-by-step deliberation. Existing methods, while improving accuracy, often suffer from high computational costs, rigid search strategies, and difficulty generalizing across diverse problems. In this paper researchers introduced a new framework, ReasonFlux that addresses these limitations by reimagining how LLMs plan and execute reasoning steps using hierarchical, template-guided strategies.  

    Recent approaches to enhance LLM reasoning fall into two categories: deliberate search and reward-guided methods. Techniques like Tree of Thoughts (ToT) enable LLMs to explore multiple reasoning paths, while Monte Carlo Tree Search (MCTS) decomposes problems into steps guided by process reward models (PRMs). Though effective, these methods scale poorly due to excessive sampling and manual search design. For instance, MCTS requires iterating through thousands of potential steps, making it computationally prohibitive for real-world applications. Meanwhile, retrieval-augmented generation (RAG) methods like Buffer of Thought (BoT) leverage stored problem-solving templates but struggle to integrate multiple templates adaptively, limiting their utility in complex scenarios.  

    ReasonFlux introduces a structured framework that combines a curated library of high-level thought templates with hierarchical reinforcement learning (HRL) to dynamically plan and refine reasoning paths. Instead of optimizing individual steps, it focuses on configuring optimal template trajectories—sequences of abstract problem-solving strategies retrieved from a structured knowledge base. This approach simplifies the search space and enables efficient adaptation to sub-problems. The framework consists of three main components:

    1. Structured Template Library: The research team constructed a library of 500 thought templates, each encapsulating a problem-solving strategy (e.g., “Trigonometric Substitution for Integral Optimization”). Templates include metadata—names, tags, descriptions, and application steps—enabling efficient retrieval (a minimal template entry is sketched after this list). For example, a template tagged “Irrational Function Optimization” might guide an LLM to apply specific algebraic substitutions.
    2. Hierarchical Reinforcement Learning:
      1. Structure-Based Fine-Tuning: A base LLM (e.g., Qwen2.5-32B) is fine-tuned to associate template metadata with their functional descriptions, ensuring it understands when and how to apply each template.
      2. Template Trajectory Optimization: Using preference learning, the model learns to rank template sequences by their effectiveness. For a given problem, multiple trajectories are sampled, and their success rates on similar problems determine rewards. This trains the model to prioritize high-reward sequences, refining its planning capability.
    3. Adaptive Inference Scaling: During inference, ReasonFlux acts as a “navigator,” analyzing the problem to retrieve relevant templates and dynamically adjusting the trajectory based on intermediate results. For instance, if a step involving “Polynomial Factorization” yields unexpected constraints, the system might pivot to a “Constraint Propagation” template. This iterative interplay between planning and execution mirrors human problem-solving, where partial solutions inform subsequent steps.
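
    To make the template library concrete, here is a minimal sketch of what one template entry and a naive tag-based retrieval step could look like. The field names, the example entry, and the keyword-overlap retrieval are illustrative assumptions; in the paper, retrieval and trajectory planning are handled by the fine-tuned navigator model.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class ThoughtTemplate:
        name: str
        tags: List[str]
        description: str
        application_steps: List[str] = field(default_factory=list)

    library = [
        ThoughtTemplate(
            name="Trigonometric Substitution for Integral Optimization",
            tags=["integral", "optimization", "trigonometric substitution"],
            description="Substitute x = a*sin(t) (or a tan/sec variant) to remove radicals before optimizing.",
            application_steps=["Identify the radical pattern", "Choose a substitution", "Simplify", "Optimize in t and map back to x"],
        ),
    ]

    def retrieve(problem_tags, library, top_k=3):
        # Rank templates by tag overlap with the problem description (a stand-in for the navigator LLM).
        return sorted(library, key=lambda t: -len(set(t.tags) & set(problem_tags)))[:top_k]

    print([t.name for t in retrieve(["integral", "optimization"], library)])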

    ReasonFlux was evaluated on competition-level benchmarks like MATH, AIME, and OlympiadBench, outperforming both frontier models (GPT-4o, Claude) and specialized open-source models (DeepSeek-V3, Mathstral). Key results include:  

    • 91.2% accuracy on MATH, surpassing OpenAI’s o1-preview by 6.7%.  
    • 56.7% on AIME 2024, exceeding DeepSeek-V3 by 45% and matching o1-mini.  
    • 63.3% on OlympiadBench, a 14% improvement over prior methods.  

    Moreover, the structured template library demonstrated strong generalization: when applied to variant problems, it boosted smaller models (e.g., 7B parameters) to outperform larger counterparts using direct reasoning. Additionally, ReasonFlux achieved a superior exploration-exploitation balance, requiring 40% fewer computational steps than MCTS and Best-of-N on complex tasks (Figure 5).  

    In summary, ReasonFlux redefines how LLMs approach complex reasoning by decoupling high-level strategy from step-by-step execution. Its hierarchical template system reduces computational overhead while improving accuracy and adaptability, addressing critical gaps in existing methods. By leveraging structured knowledge and dynamic planning, the framework sets a new standard for efficient, scalable reasoning—proving that smaller, well-guided models can rival even the largest frontier systems. This innovation opens avenues for deploying advanced reasoning in resource-constrained environments, from education to automated code generation.  


    Check out the Paper. All credit for this research goes to the researchers of this project.
    Step by Step Guide on How to Build an AI News Summarizer Using Streamlit, Groq and Tavily (Fri, 14 Feb 2025)

    Introduction

    In this tutorial, we will build an advanced AI-powered news agent that can search the web for the latest news on a given topic and summarize the results. This agent follows a structured workflow:

    1. Browsing: Generates relevant search queries and collects information from the web.
    2. Writing: Extracts and compiles news summaries from the collected information.
    3. Reflection: Critiques the summaries by checking for factual correctness and suggests improvements.
    4. Refinement: Improves the summaries based on the critique.
    5. Headline Generation: Generates appropriate headlines for each news summary.

    To enhance usability, we will also create a simple GUI using Streamlit. Similar to previous tutorials, we will use Groq for LLM-based processing and Tavily for web browsing. You can generate free API keys from their respective websites.

    Setting Up the Environment

    We begin by setting up environment variables, installing the required libraries, and importing necessary dependencies:

    Install Required Libraries

    pip install langgraph==0.2.53 langgraph-checkpoint==2.0.6 langgraph-sdk==0.1.36 langchain-groq langchain-community langgraph-checkpoint-sqlite==2.0.1 tavily-python streamlit

    Import Libraries and Set API Keys

    import os
    import sqlite3
    from langgraph.graph import StateGraph
    from langchain_core.messages import SystemMessage, HumanMessage
    from langchain_groq import ChatGroq
    from tavily import TavilyClient
    from langgraph.checkpoint.sqlite import SqliteSaver
    from typing import TypedDict, List
    from pydantic import BaseModel
    import streamlit as st
    
    # Set API Keys
    os.environ['TAVILY_API_KEY'] = "your_tavily_key"
    os.environ['GROQ_API_KEY'] = "your_groq_key"
    
    # Initialize Database for Checkpointing
    sqlite_conn = sqlite3.connect("checkpoints.sqlite", check_same_thread=False)
    memory = SqliteSaver(sqlite_conn)
    
    # Initialize Model and Tavily Client
    model = ChatGroq(model="Llama-3.1-8b-instant")
    tavily = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

    Defining the Agent State

    The agent maintains state information throughout its workflow:

    1. Topic: The topic on which the user wants the latest news.
    2. Drafts: The first drafts of the news summaries.
    3. Content: The research content extracted from Tavily's search results.
    4. Critique: The critique and recommendations generated for each draft in the reflection state.
    5. Refined Summaries: Updated news summaries after incorporating the suggestions from the critique.
    6. Headings: Headlines generated for each news article.

    class AgentState(TypedDict):
        topic: str
        drafts: List[str]
        content: List[str]
        critiques: List[str]
        refined_summaries: List[str]
        headings: List[str]

    Defining Prompts

    We define system prompts for each phase of the agent’s workflow:

    BROWSING_PROMPT = """You are an AI news researcher tasked with finding the latest news articles on given topics. Generate up to 3 relevant search queries."""
    
    WRITER_PROMPT = """You are an AI news summarizer. Write a detailed summary (1 to 2 paragraphs) based on the given content, ensuring factual correctness, clarity, and coherence."""
    
    CRITIQUE_PROMPT = """You are a teacher reviewing draft summaries against the source content. Ensure factual correctness, identify missing or incorrect details, and suggest improvements.
    ----------
    Content: {content}
    ----------"""
    
    REFINE_PROMPT = """You are an AI news editor. Given a summary and critique, refine the summary accordingly.
    -----------
    Summary: {summary}"""
    
    HEADING_GENERATION_PROMPT = """You are an AI news summarizer. Generate a short, descriptive headline for each news summary."""

    Structuring Queries and News

    We use Pydantic to define the structure of the queries and the news articles. Pydantic lets us constrain the structure of the LLM's output, which matters here because the search queries must be a list of strings, and the content extracted from the web contains multiple news articles, so it is also a list of strings.

    from pydantic import BaseModel
    
    class Queries(BaseModel):
        queries: List[str]
    
    class News(BaseModel):
        news: List[str]
    

    Implementing the AI Agents

    1. Browsing Node

    This node generates search queries and retrieves relevant content from the web.

    def browsing_node(state: AgentState):
        queries = model.with_structured_output(Queries).invoke([
            SystemMessage(content=BROWSING_PROMPT),
            HumanMessage(content=state['topic'])
        ])
        content = state.get('content', [])
        for q in queries.queries:
            response = tavily.search(query=q, max_results=2)
            for r in response['results']:
                content.append(r['content'])
        return {"content": content}

    2. Writing Node

    Extracts news summaries from the retrieved content.

    def writing_node(state: AgentState):
        content = "\n\n".join(state['content'])
        news = model.with_structured_output(News).invoke([
            SystemMessage(content=WRITER_PROMPT),
            HumanMessage(content=content)
        ])
        return {"drafts": news.news}

    3. Reflection Node

    Critiques the generated summaries against the content.

    def reflection_node(state: AgentState):
        content = "\n\n".join(state['content'])
        critiques = []
        for draft in state['drafts']:
            response = model.invoke([
                SystemMessage(content=CRITIQUE_PROMPT.format(content=content)),
                HumanMessage(content="draft: " + draft)
            ])
            critiques.append(response.content)
        return {"critiques": critiques}

    4. Refinement Node

    Improves the summaries based on critique.

    def refine_node(state: AgentState):
        refined_summaries = []
        for summary, critique in zip(state['drafts'], state['critiques']):
            response = model.invoke([
                SystemMessage(content=REFINE_PROMPT.format(summary=summary)),
                HumanMessage(content="Critique: " + critique)
            ])
            refined_summaries.append(response.content)
        return {"refined_summaries": refined_summaries}

    5. Headlines Generation Node

    Generates a short headline for each news summary.

    def heading_node(state: AgentState):
        headings = []
        for summary in state['refined_summaries']:
            response = model.invoke([
                SystemMessage(content=HEADING_GENERATION_PROMPT),
                HumanMessage(content=summary)
            ])
            headings.append(response.content)
        return {"headings": headings}

    Building the UI with Streamlit

    # Define Streamlit app
    st.title("News Summarization Chatbot")
    
    # Initialize session state
    if "messages" not in st.session_state:
        st.session_state["messages"] = []
    
    # Display past messages
    for message in st.session_state["messages"]:
        with st.chat_message(message["role"]):
            st.markdown(message["content"])
    
    # Input field for user
    user_input = st.chat_input("Ask about the latest news...")
    
    thread = 1
    if user_input:
        st.session_state["messages"].append({"role": "user", "content": user_input})
        with st.chat_message("assistant"):
            loading_text = st.empty()
            loading_text.markdown("*Thinking...*")
    
            builder = StateGraph(AgentState)
            builder.add_node("browser", browsing_node)
            builder.add_node("writer", writing_node)
            builder.add_node("reflect", reflection_node)
            builder.add_node("refine", refine_node)
            builder.add_node("heading", heading_node)
            builder.set_entry_point("browser")
            builder.add_edge("browser", "writer")
            builder.add_edge("writer", "reflect")
            builder.add_edge("reflect", "refine")
            builder.add_edge("refine", "heading")
            graph = builder.compile(checkpointer=memory)
    
            config = {"configurable": {"thread_id": f"{thread}"}}
            for s in graph.stream({"topic": user_input}, config):
                # loading_text.markdown(f"*{st.session_state['loading_message']}*")
                print(s)
            
            s = graph.get_state(config).values
            refined_summaries = s['refined_summaries']
            headings = s['headings']
            thread+=1
            # Display final response
            loading_text.empty()
            response_text = "\n\n".join([f"{h}\n{s}" for h, s in zip(headings, refined_summaries)])
            st.markdown(response_text)
            st.session_state["messages"].append({"role": "assistant", "content": response_text})
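
    To launch the app, save the script (for example as news_agent.py; the file name here is just a placeholder) and start Streamlit from the terminal:

    streamlit run news_agent.py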

    Conclusion

    This tutorial covered the entire process of building an AI-powered news summarization agent with a simple Streamlit UI. Now you can play around with this and make some further improvements like:

    • A better GUI for enhanced user interaction.
    • Incorporating iterative refinement to make sure the summaries are accurate and appropriate.
    • Maintaining context so the conversation about a particular news item can continue.

    Happy coding!


    Open O1: Revolutionizing Open-Source AI with Cutting-Edge Reasoning and Performance (Fri, 14 Feb 2025)

    The Open O1 project is a groundbreaking initiative aimed at matching the powerful capabilities of proprietary models, particularly OpenAI’s O1, through an open-source approach. By leveraging advanced training methodologies and community-driven development, Open O1 seeks to democratize access to state-of-the-art AI models.

    Proprietary AI models like OpenAI’s O1 have demonstrated exceptional capabilities in reasoning, tool use, and mathematical problem-solving. However, these models are closed-source, limiting accessibility and customization for researchers and developers. Existing open-source alternatives often lag behind in performance due to limitations in data quality, training techniques, and computational efficiency.

    The Open O1 project seeks to bridge this gap by curating high-quality Supervised Fine-Tuning (SFT) data for Chain-of-Thought (CoT) Activation, which enhances logical reasoning and problem-solving abilities in smaller models. This innovative approach enables models like LLaMA and Qwen to achieve long-context reasoning capabilities that were previously limited to proprietary systems.

    To achieve performance parity with OpenAI’s O1, the Open O1 team follows a multi-stage approach. First, a specialized O1-style dataset is used to train the models, ensuring high-quality reasoning and contextual understanding. Next, models such as OpenO1-LLaMA-8B and OpenO1-Qwen-7B undergo rigorous Supervised Fine-Tuning (SFT) with optimized hyperparameters for enhanced CoT reasoning. The models incorporate adaptive scaling techniques to maximize efficiency at inference time, allowing for better generalization across tasks. Finally, Open O1 also provides multiple deployment options, including quantized versions for Hugging Face and local infrastructure support.
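
    As a rough sketch of the Hugging Face route, the checkpoints can be loaded with the standard transformers API; the repository id below is a placeholder, so check the project's Hugging Face page for the exact model name.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "OpenO1/OpenO1-LLaMA-8B"  # placeholder id, not verified; use the name listed on the project page
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # device_map requires accelerate

    prompt = "Solve step by step: what is 17 * 24?"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))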

    Open O1’s performance has been extensively evaluated against industry benchmarks, demonstrating significant improvements over previous open-source models. The team reports a comparison of LLaMA3.1-8B-Instruct and OpenO1-LLaMA-8B across multiple benchmarks (the comparison table is not reproduced here).

    These results highlight Open O1’s superior performance in mathematical reasoning (MATH), general knowledge understanding (MMLU), and complex reasoning tasks (BBH). Although it slightly trails in Hellaswag, the model’s overall performance demonstrates its potential as a powerful open-source alternative.

    The Open O1 team is committed to continuous innovation and expanding the model’s capabilities. Their plans include enhanced reward model development, introducing a reinforcement learning framework to refine model outputs and reasoning processes, optimizing training pipelines for better scalability and efficiency, and establishing a competitive chatbot arena to benchmark Open O1 against leading models in real-world tasks. Additionally, research into O1-style scaling laws for both training and inference efficiency is underway.

    Built on the principles of transparency, collaboration, and accessibility, Open O1 ensures that AI advancements are not limited to a select few but are available to researchers, developers, and businesses worldwide. And the best part? It is completely open source. With community-driven innovation, rigorous benchmarking, and a commitment to ethical AI, Open O1 is poised to redefine the landscape of large language models. As the project continues to evolve, it promises to bring powerful, accessible, and high-performance AI tools to the global community, ensuring that the future of AI remains open and inclusive.


    Check out the GitHub Page and Model on Hugging Face. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 75k+ ML SubReddit.

    The post Open O1: Revolutionizing Open-Source AI with Cutting-Edge Reasoning and Performance appeared first on MarkTechPost.

    ]]>
    https://www.marktechpost.com/2025/02/13/open-o1-revolutionizing-open-source-ai-with-cutting-edge-reasoning-and-performance/feed/ 0 68908
    Building an AI Research Agent for Essay Writing https://www.marktechpost.com/2025/02/11/building-an-ai-research-agent-for-essay-writing/ https://www.marktechpost.com/2025/02/11/building-an-ai-research-agent-for-essay-writing/#respond Tue, 11 Feb 2025 23:52:46 +0000 https://www.marktechpost.com/?p=68830 In this tutorial, we will build an advanced AI-powered research agent that can write essays on given topics. This agent follows a structured workflow: Iterative Refinement: Conducts further research based on critique and revises the essay. The agent will iterate through the reflection and revision process until a set number of improvements are made. Let’s […]

    The post Building an AI Research Agent for Essay Writing appeared first on MarkTechPost.

    ]]>
    In this tutorial, we will build an advanced AI-powered research agent that can write essays on given topics. This agent follows a structured workflow:

    1. Planning: Generates an outline for the essay.
    2. Research: Retrieves relevant documents using Tavily.
    3. Writing: Uses the research to generate the first draft.
    4. Reflection: Critiques the draft for improvements.
    5. Iterative Refinement: Conducts further research based on critique and revises the essay.

    The agent iterates through the reflection and revision process until the configured number of revisions is reached. Let’s dive into the implementation.

    Setting Up the Environment

    We start by installing the required libraries, setting the environment variables, and importing the necessary modules:

    pip install langgraph==0.2.53 langgraph-checkpoint==2.0.6 langgraph-sdk==0.1.36 langchain-groq langchain-community langgraph-checkpoint-sqlite==2.0.1 tavily-python

    import os

    # API keys for Tavily (web search) and Groq (LLM inference)
    os.environ['TAVILY_API_KEY'] = "your_tavily_key"
    os.environ['GROQ_API_KEY'] = "your_groq_key"
    
    from langgraph.graph import StateGraph, END
    from typing import TypedDict, List
    from langchain_core.messages import SystemMessage, HumanMessage
    
    # SQLite-backed checkpointer so the agent's state persists between steps
    from langgraph.checkpoint.sqlite import SqliteSaver
    import sqlite3
    
    sqlite_conn = sqlite3.connect("checkpoints.sqlite", check_same_thread=False)
    memory = SqliteSaver(sqlite_conn)

    Defining the Agent State

    The agent maintains state information, including:

    • Task: The topic of the essay
    • Plan: The generated outline of the essay
    • Draft: The latest draft of the essay
    • Critique: The critique and recommendations generated for the draft in the reflection step
    • Content: The research content extracted from Tavily search results
    • Revision Number: The number of revisions made so far
    • Max Revisions: The maximum number of revisions before the loop ends

    class AgentState(TypedDict):
        task: str
        plan: str
        draft: str
        critique: str
        content: List[str]
        revision_number: int
        max_revisions: int

    Initializing the Language Model

    We use the free Llama model API provided by Groq to generate plans, drafts, critiques, and research queries.

    from langchain_groq import ChatGroq
    
    model = ChatGroq(model="Llama-3.3-70b-Specdec")

    Defining the Prompts

    We define system prompts for each phase of the agent’s workflow (you can play around with these if you want):

    PLAN_PROMPT = """You are an expert writer tasked with creating an outline for an essay.
    Generate a structured outline with key sections and relevant notes."""
    
    WRITER_PROMPT = """You are an AI essay writer. Write a well-structured essay based on the given research.
    Ensure clarity, coherence, and proper argumentation.
    
    ------
    
    {content}"""
    
    REFLECTION_PROMPT = """You are a teacher reviewing an essay draft.
    Provide detailed critique and suggestions for improvement."""
    
    RESEARCH_PLAN_PROMPT = """You are an AI researcher tasked with finding supporting information for an essay topic.
    Generate up to 3 relevant search queries."""
    
    RESEARCH_CRITIQUE_PROMPT = """You are an AI researcher refining an essay based on critique.
    Generate up to 3 search queries to address identified weaknesses."""

    Structuring Research Queries

    We use Pydantic to define the structure of the research queries. This lets us constrain the LLM’s output to a typed schema: a simple list of query strings.

    from pydantic import BaseModel
    
    class Queries(BaseModel):
        queries: List[str]

    Integrating Tavily for Research

    As before, we use Tavily to fetch relevant documents for research-based essay writing.

    from tavily import TavilyClient
    import os
    
    tavily = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

    Implementing the AI Agents

    1. Planning Node

    Generates an essay outline based on the provided topic.

    def plan_node(state: AgentState):
        messages = [
            SystemMessage(content=PLAN_PROMPT),
            HumanMessage(content=state['task'])
        ]
        response = model.invoke(messages)
        return {"plan": response.content}

    2. Research Plan Node

    Generates search queries and retrieves relevant documents.

    def research_plan_node(state: AgentState):
        queries = model.with_structured_output(Queries).invoke([
            SystemMessage(content=RESEARCH_PLAN_PROMPT),
            HumanMessage(content=state['task'])
        ])
        content = state['content'] if 'content' in state else []
        for q in queries.queries:
            response = tavily.search(query=q, max_results=2)
            for r in response['results']:
                content.append(r['content'])
        return {"content": content}

    3. Writing Node

    Uses research content to generate the first essay draft.

    def generation_node(state: AgentState):
        content = "\n\n".join(state['content'] or [])
        user_message = HumanMessage(content=f"{state['task']}\n\nHere is my plan:\n\n{state['plan']}")
        messages = [
            SystemMessage(content=WRITER_PROMPT.format(content=content)),
            user_message
        ]
        response = model.invoke(messages)
        return {"draft": response.content, "revision_number": state.get("revision_number", 1) + 1}

    4. Reflection Node

    Generates a critique of the current draft.

    def reflection_node(state: AgentState):
        messages = [
            SystemMessage(content=REFLECTION_PROMPT),
            HumanMessage(content=state['draft'])
        ]
        response = model.invoke(messages)
        return {"critique": response.content}

    5. Research Critique Node

    Generates additional research queries based on critique.

    def research_critique_node(state: AgentState):
        queries = model.with_structured_output(Queries).invoke([
            SystemMessage(content=RESEARCH_CRITIQUE_PROMPT),
            HumanMessage(content=state['critique'])
        ])
        content = state['content'] or []
        for q in queries.queries:
            response = tavily.search(query=q, max_results=2)
            for r in response['results']:
                content.append(r['content'])
        return {"content": content}

    Defining the Iteration Condition

    We use the revision count to decide whether to keep revising or end the loop, so the agent continues improving the essay until the maximum number of revisions is reached.

    def should_continue(state):
        if state["revision_number"] > state["max_revisions"]:
            return END
        return "reflect"

    Building the Workflow

    We define a state graph to connect the different nodes in the workflow.

    builder = StateGraph(AgentState)
    
    builder.add_node("planner", plan_node)
    builder.add_node("generate", generation_node)
    builder.add_node("reflect", reflection_node)
    builder.add_node("research_plan", research_plan_node)
    builder.add_node("research_critique", research_critique_node)
    
    builder.set_entry_point("planner")
    
    builder.add_conditional_edges("generate", should_continue, {END: END, "reflect": "reflect"})
    
    builder.add_edge("planner", "research_plan")
    builder.add_edge("research_plan", "generate")
    builder.add_edge("reflect", "research_critique")
    builder.add_edge("research_critique", "generate")
    
    graph = builder.compile(checkpointer=memory)

    We can also visualize the graph using:

    #from IPython.display import Image
    #Image(graph.get_graph().draw_mermaid_png())

    Running the AI Essay Writer

    thread = {"configurable": {"thread_id": "1"}}
    for s in graph.stream({
        'task': "What is the difference between LangChain and LangSmith",
        "max_revisions": 2,
        "revision_number": 1,
    }, thread):
        print(s)
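
    After the stream completes, we can optionally read back the final checkpointed state and print the finished essay. The snippet below is a small sketch using LangGraph’s get_state API; the "draft" key matches the AgentState defined earlier.

    # Retrieve the final state from the checkpointer and print the latest draft
    final_state = graph.get_state(thread)
    print(final_state.values["draft"])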

    And we are done. Now go ahead and test it out with different queries and play around with it. In this tutorial, we covered the entire process of creating an AI-powered research and writing agent. You can now experiment with different prompts, research sources, and optimization strategies to enhance performance. Here are some future improvements you can try:

    1. Build a GUI to better visualize how the agent works
    2. Improve the end condition: instead of revising a fixed number of times, stop when the output is satisfactory (e.g., by adding another LLM node to judge the draft, or by putting a human in the loop)
    3. Add support for writing the final essay directly to a PDF

    References:

    1. DeepLearning.ai: https://learn.deeplearning.ai/courses/ai-agents-in-langgraph

    The post Building an AI Research Agent for Essay Writing appeared first on MarkTechPost.

    ]]>
    https://www.marktechpost.com/2025/02/11/building-an-ai-research-agent-for-essay-writing/feed/ 0 68830
    Efficient Alignment of Large Language Models Using Token-Level Reward Guidance with GenARM https://www.marktechpost.com/2025/02/10/efficient-alignment-of-large-language-models-using-token-level-reward-guidance-with-genarm/ https://www.marktechpost.com/2025/02/10/efficient-alignment-of-large-language-models-using-token-level-reward-guidance-with-genarm/#respond Mon, 10 Feb 2025 19:46:30 +0000 https://www.marktechpost.com/?p=68789 Large language models (LLMs) must align with human preferences like helpfulness and harmlessness, but traditional alignment methods require costly retraining and struggle with dynamic or conflicting preferences. Test-time alignment approaches using reward models (RMs) avoid retraining but face inefficiencies due to reliance on trajectory-level rewards, which evaluate full responses rather than guiding token-by-token generation.   Existing […]

    The post Efficient Alignment of Large Language Models Using Token-Level Reward Guidance with GenARM appeared first on MarkTechPost.

    ]]>
    Large language models (LLMs) must align with human preferences like helpfulness and harmlessness, but traditional alignment methods require costly retraining and struggle with dynamic or conflicting preferences. Test-time alignment approaches using reward models (RMs) avoid retraining but face inefficiencies due to reliance on trajectory-level rewards, which evaluate full responses rather than guiding token-by-token generation.  

    Existing alignment techniques fall into two categories: training-time methods like Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO), which fine-tune LLMs on preference datasets but demand significant computational resources and lack flexibility for new preferences. Test-time methods use RMs to guide frozen LLMs but rely on trajectory-level RMs that assign a single reward to complete responses. This creates a mismatch during autoregressive generation, where next-token decisions require partial response evaluations. For instance, ARGS approximates token-level rewards by applying trajectory RMs to incomplete responses, leading to inaccuracies since these RMs are trained only on full responses. Other methods like Transfer-Q generate multiple full responses per token candidate, multiplying inference costs. These inefficiencies limit scalability and real-time adaptability.  

    Reference: https://arxiv.org/pdf/2410.08193

    To address these issues, researchers from the University of Maryland, College Park and JPMorgan AI Research propose GenARM (Reward Guided Generation with Autoregressive Reward Model), a test-time alignment framework combining a novel autoregressive RM with guided decoding. The key innovation is the Autoregressive Reward Model, which decomposes trajectory-level rewards into token-level components. Instead of assigning a single reward to a full response, it predicts a reward for each token conditioned on the prior tokens. This provides dense, step-by-step guidance, so rewards directly influence each token choice without the inaccuracy of scoring partial responses.

    During generation, GenARM integrates the autoregressive RM’s token-level rewards with the base LLM’s logits. The next token is sampled from a modified distribution. Unlike prior methods, this requires only one forward pass through the base and reward models per token, avoiding costly candidate expansions.  
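
    To make the decoding rule concrete, here is a small, self-contained sketch (not the authors’ code) of how token-level rewards can reshape the base model’s next-token distribution. The vocabulary, scores, and beta value are illustrative; the combination follows the paper’s KL-regularized formulation, in which the guided distribution is proportional to the base probability times the exponentiated token-level reward.

    import numpy as np

    def softmax(x):
        z = x - x.max()
        e = np.exp(z)
        return e / e.sum()

    # Hypothetical scores over a tiny vocabulary for a single decoding step
    base_logits = np.array([2.0, 1.0, 0.5, -1.0])       # frozen base LLM logits
    token_rewards = np.array([-0.2, -1.5, -0.1, -3.0])  # autoregressive RM token-level (log) rewards
    beta = 1.0                                           # KL-regularization strength (assumed)

    # GenARM-style guided decoding sketch: add scaled token-level rewards to the base logits,
    # re-normalize, and sample the next token (one forward pass of each model per token)
    guided_probs = softmax(base_logits + (1.0 / beta) * token_rewards)
    next_token = int(np.random.choice(len(guided_probs), p=guided_probs))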

    Experiments demonstrate GenARM’s advantages across three scenarios:  

    1. General Human Preference Alignment: On the HH-RLHF dataset, GenARM outperforms test-time baselines like ARGS and Transfer-Q in helpfulness and harmlessness, matching the performance of training-time methods like DPO based on evaluations using GPT-4.

    2. Weak-to-Strong Guidance: A 7B autoregressive RM effectively guides larger base models (13B, 70B) without fine-tuning them. It surpasses DPO at the 7B scale and nearly matches DPO at the 13B scale. At the 70B scale, GenARM recovers more than 70% of the performance gap in both raw and LC win rates between Tulu2-70B and Tulu2-DPO-70B, all without the need to train the 70B LLM, demonstrating that smaller RMs can steer larger LLMs efficiently.  

    3. Multi-Objective Alignment: GenARM balances conflicting preferences (e.g., helpfulness vs. harmlessness) by combining rewards from multiple autoregressive RMs. On the PKU-SafeRLHF-10K dataset, it achieves a Pareto frontier superior to Rewarded Soups and matches multi-objective RL without retraining.
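
    A hedged sketch of this multi-objective combination: token-level rewards from separate RMs (for example, helpfulness and harmlessness) are mixed with user-chosen weights before guiding the base model. All numbers and weights below are illustrative, not values from the paper.

    import numpy as np

    base_logits = np.array([1.5, 0.3, -0.5])
    helpful_rm = np.array([-0.1, -2.0, -0.4])    # token-level rewards from a helpfulness RM (illustrative)
    harmless_rm = np.array([-1.2, -0.2, -0.3])   # token-level rewards from a harmlessness RM (illustrative)
    w_help, w_harm, beta = 0.6, 0.4, 1.0         # user-chosen trade-off weights (assumed)

    # A weighted sum of per-token rewards steers decoding toward the chosen preference trade-off
    combined_logits = base_logits + (1.0 / beta) * (w_help * helpful_rm + w_harm * harmless_rm)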

    The autoregressive RM’s design ensures it can express any reward function achievable by traditional RMs within the KL-regularized reinforcement learning framework. This theoretical guarantee, combined with token-level factorization, makes GenARM both expressive and efficient. Unlike trajectory-level RMs, which struggle with partial contexts, autoregressive RMs provide accurate, incremental feedback, preventing reward hacking or incoherent outputs during long generations.  

    In summary, GenARM bridges the gap between training-time and test-time alignment by introducing autoregressive reward models that enable precise, token-level guidance. It eliminates the need for costly LLM retraining, supports dynamic adaptation to diverse preferences, and efficiently scales to larger models. By addressing the inefficiencies of trajectory-level rewards and enabling weak-to-strong guidance, GenARM offers a practical solution for aligning LLMs in resource-constrained scenarios. Future work could extend this approach to tasks like mathematical reasoning or code generation, where token-level rewards might enhance performance without additional fine-tuning.  


    Check out the Paper. All credit for this research goes to the researchers of this project.

    The post Efficient Alignment of Large Language Models Using Token-Level Reward Guidance with GenARM appeared first on MarkTechPost.

    ]]>
    https://www.marktechpost.com/2025/02/10/efficient-alignment-of-large-language-models-using-token-level-reward-guidance-with-genarm/feed/ 0 68789