Implementing an AgentQL Model Context Protocol (MCP) Server

AgentQL allows you to scrape any website with unstructured data by defining the exact shape of the information you want. It gives you consistent, structured results—even from pages with dynamic content or frequently changing layouts.

In this tutorial, we’ll implement an AgentQL MCP server inside Claude Desktop, and use Claude’s built-in visualization capabilities to explore the data. Specifically, we’ll scrape an Amazon search results page for AI books, extracting details like price, rating, and number of reviews.

Step 1: Setting up dependencies

Node.js

We need npx, which comes bundled with Node.js, to run the AgentQL server.

  • Download the latest version of Node.js from nodejs.org
  • Run the installer.
  • Leave all settings as default and complete the installation

Claude Desktop

Download Claude Desktop from https://claude.ai/download.

AgentQL API

Create your AgentQL API key at dev.agentql.com/api-keys and store it securely — you’ll need it later in this tutorial.

Step 2: Installing the packages

Once Node.js is installed, open your terminal and run the following command:

npm install -g agentql-mcp

Step 3: Configuring the MCP Server

Next, configure Claude to connect to your MCP server. Open the claude_desktop_config.json file in Claude's configuration directory (on Windows, %APPDATA%\Claude; on macOS, ~/Library/Application Support/Claude) using any text editor; you can also reach it from Claude Desktop's Settings → Developer → Edit Config. If the file doesn't exist, you can create it manually. Once opened, enter the following code:

{
  "mcpServers": {
    "agentql": {
      "command": "npx",
      "args": ["-y", "agentql-mcp"],
      "env": {
        "AGENTQL_API_KEY": "<YOUR_API_KEY>"
      }
    }
  }
}

Replace <YOUR_API_KEY> with the key you generated.

Step 4: Running the server

Once the MCP configuration is complete, your server should appear in Claude. The AgentQL server includes a single powerful tool — extract_web_data — which takes a URL and a natural language description of the data structure you want to extract.
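
For example, once the server is connected, a prompt along these lines (illustrative; adjust the URL and fields to your own use case) is enough for Claude to call extract_web_data on your behalf:

Use the extract_web_data tool on https://www.amazon.com/s?k=artificial+intelligence+books and return a list of books with title, price, rating, and number of reviews. Then visualize the results.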

You can use any URL you want to scrape. For this tutorial, we used an Amazon search results page for AI books and asked Claude to visualize the extracted data. Claude provides an interactive terminal where it generates code to process and visualize the data, and you can edit that code as needed. Once the code is finalized, Claude presents a bar chart with interactive options to explore prices, ratings, and review counts, as well as a price vs. rating scatter plot and key summary statistics.

AgentQL can be used to scrape websites, and we can connect it with other servers like Notion or GitHub to automatically send structured data for documentation, tracking, or further automation.
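
As a sketch of what that could look like, the same claude_desktop_config.json can list several servers side by side. The GitHub entry below is purely illustrative and assumes the @modelcontextprotocol/server-github package and a personal access token; substitute whichever MCP servers you actually use:

{
  "mcpServers": {
    "agentql": {
      "command": "npx",
      "args": ["-y", "agentql-mcp"],
      "env": { "AGENTQL_API_KEY": "<YOUR_API_KEY>" }
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "<YOUR_GITHUB_TOKEN>" }
    }
  }
}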

This makes AgentQL a powerful tool for turning unstructured web content into actionable insights — all within a simple, natural language workflow.


A Coding Guide to Compare Three Stability AI Diffusion Models (v1.5, v2-Base & SD3-Medium) Diffusion Capabilities Side-by-Side in Google Colab Using Gradio

In this hands-on tutorial, we’ll unlock the creative potential of Stability AI’s industry-leading diffusion models, Stable Diffusion v1.5, Stability AI’s v2-base, and the cutting-edge Stable Diffusion 3 Medium, to generate eye-catching imagery. Running entirely in Google Colab with a Gradio interface, we’ll experience side-by-side comparisons of three powerful pipelines, rapid prompt iteration, and seamless GPU-accelerated inference. Whether we’re a marketer looking to elevate our brand’s visual narrative or a developer eager to prototype AI-driven content workflows, this tutorial showcases how Stability AI’s open-source models can be deployed instantly and at no infrastructure cost, allowing you to focus on storytelling, engagement, and driving real-world results.

!pip install huggingface_hub
from huggingface_hub import notebook_login


notebook_login()

We install the huggingface_hub library and then import and invoke the notebook_login() function, which prompts you to authenticate your notebook session with your Hugging Face account, allowing you to seamlessly access and manage models, datasets, and other hub resources.

!pip uninstall -y torchvision


!pip install --upgrade torch torchvision --index-url https://download.pytorch.org/whl/cu118


!pip install --upgrade diffusers transformers accelerate safetensors gradio pillow

We first force-uninstall any existing torchvision to clear potential conflicts, then reinstall torch and torchvision from the CUDA 11.8–compatible PyTorch wheels, and finally upgrade the key libraries, diffusers, transformers, accelerate, safetensors, gradio, and pillow, to ensure we have the latest versions for building and running GPU-accelerated generative pipelines and web demos.

import torch
from diffusers import StableDiffusionPipeline, StableDiffusion3Pipeline
import gradio as gr


device = "cuda" if torch.cuda.is_available() else "cpu"

We import PyTorch alongside both the Stable Diffusion v1 and v3 pipelines from the Diffusers library, as well as Gradio for building interactive demos. It then checks for CUDA availability and sets the device variable to “cuda” if a GPU is present; otherwise, it falls back to “cpu”, ensuring your models run on the optimal hardware.

pipe1 = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    safety_checker=None
).to(device)
pipe1.enable_attention_slicing()

We load the Stable Diffusion v1.5 model in half-precision (float16) without the built-in safety checker, transfer it to our selected device (GPU, if available), and then enable attention slicing to reduce peak VRAM usage during image generation.

pipe2 = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-base",
    torch_dtype=torch.float16,
    safety_checker=None
).to(device)
pipe2.enable_attention_slicing()

We load the Stable Diffusion v2 “base” model in 16-bit precision without the default safety filter, transfer it to our chosen device, and activate attention slicing to optimize memory usage during inference.

pipe3 = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
    safety_checker=None
).to(device)
pipe3.enable_attention_slicing()

We pull in Stability AI’s Stable Diffusion 3 “medium” checkpoint in 16-bit precision (skipping the built-in safety checker), transfer it to our selected device, and enable attention slicing to reduce GPU memory usage during generation.

def generate(prompt, steps, scale):
    img1 = pipe1(prompt, num_inference_steps=steps, guidance_scale=scale).images[0]
    img2 = pipe2(prompt, num_inference_steps=steps, guidance_scale=scale).images[0]
    img3 = pipe3(prompt, num_inference_steps=steps, guidance_scale=scale).images[0]
    return img1, img2, img3

Now, this function runs the same text prompt through all three loaded pipelines (pipe1, pipe2, pipe3) using the specified inference steps and guidance scale, then returns the first image from each, making it perfect for comparing outputs across Stable Diffusion v1.5, v2-base, and v3-medium.

def choose(selection):
    return f"✅ You selected: **{selection}**"


with gr.Blocks() as demo:
    gr.Markdown("## AI Social-Post Generator with 3 Models")
    with gr.Row():
        prompt = gr.Textbox(label="Prompt", placeholder="A vibrant beach sunset…")
        steps  = gr.Slider( 1, 100, value=50, step=1,     label="Inference Steps")
        scale  = gr.Slider( 1.0, 20.0, value=7.5, step=0.1, label="Guidance Scale")
    btn = gr.Button("Generate Images")
    with gr.Row():
        out1 = gr.Image(label="Model 1: SD v1.5")
        out2 = gr.Image(label="Model 2: SD v2-base")
        out3 = gr.Image(label="Model 3: SD v3-medium")
    sel = gr.Radio(
        ["Model 1: SD v1.5","Model 2: SD v2-base","Model 3: SD v3-medium"],
        label="Select your favorite"
    )
    txt = gr.Markdown()


    btn.click(fn=generate, inputs=[prompt, steps, scale], outputs=[out1, out2, out3])
    sel.change(fn=choose, inputs=sel, outputs=txt)


demo.launch(share=True)

Finally, this Gradio app builds a three-column UI where you can enter a text prompt, adjust inference steps and guidance scale, then generate and display images from SD v1.5, v2-base, and v3-medium side by side. It also features a radio selector, allowing you to select your preferred model output, and displays a simple confirmation message when a choice is made.
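
If we want to reuse the comparison outside the UI, for example to batch-generate assets, we can call the generate() helper defined above directly and save the three outputs. A minimal usage sketch (the prompt text is just an example):

prompt_text = "A vibrant beach sunset over the ocean, cinematic lighting"
img_v15, img_v2, img_v3 = generate(prompt_text, steps=30, scale=7.5)

# Save each model's output for later review
img_v15.save("sd_v15.png")
img_v2.save("sd_v2_base.png")
img_v3.save("sd3_medium.png")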

A web interface to compare the three Stability AI models’ output 

In conclusion, by integrating Stability AI’s state-of-the-art diffusion architectures into an easy-to-use Gradio app, you’ve seen how effortlessly you can prototype, compare, and deploy stunning visuals that resonate on today’s platforms. From A/B-testing creative directions to automating campaign assets at scale, Stability AI provides the performance, flexibility, and vibrant community support to transform your content pipeline.


Check out the Colab Notebook.

Building AI Agents Using Agno’s Multi-Agent Teaming Framework for Comprehensive Market Analysis and Risk Reporting

In today’s fast-paced financial landscape, leveraging specialized AI agents to handle discrete aspects of analysis is key to delivering timely, accurate insights. Agno’s lightweight, model-agnostic framework empowers developers to rapidly spin up purpose-built agents, such as our Finance Agent for structured market data and Risk Assessment Agent for volatility and sentiment analysis, without boilerplate or complex orchestration code. By defining clear instructions and composing a multi-agent “Finance-Risk Team,” Agno handles the coordination, tool invocation, and context management behind the scenes, enabling each agent to focus on its domain expertise while seamlessly collaborating to produce a unified report.

!pip install -U agno google-genai duckduckgo-search yfinance

We install and upgrade the core Agno framework, Google’s GenAI SDK for Gemini integration, the DuckDuckGo search library for querying live information, and YFinance for seamless access to stock market data. By running it at the start of our Colab session, we ensure all necessary dependencies are available and up to date for building and running your finance and risk assessment agents.

from getpass import getpass
import os


os.environ["GOOGLE_API_KEY"] = getpass("Enter your Google API key: ")

The above code securely prompts you to enter your Google API key in Colab without echoing it to the screen, and then stores it in the GOOGLE_API_KEY environment variable. With this variable set, Agno’s Gemini model wrapper and the Google GenAI SDK can automatically authenticate subsequent API calls.

from agno.agent import Agent
from agno.models.google import Gemini
from agno.tools.reasoning import ReasoningTools
from agno.tools.yfinance import YFinanceTools


agent = Agent(
    model=Gemini(id="gemini-1.5-flash"),  
    tools=[
        ReasoningTools(add_instructions=True),
        YFinanceTools(
            stock_price=True,
            analyst_recommendations=True,
            company_info=True,
            company_news=True
        ),
    ],
    instructions=[
        "Use tables to display data",
        "Only output the report, no other text",
    ],
    markdown=True,
)


agent.print_response(
    "Write a report on AAPL",
    stream=True,
    show_full_reasoning=True,
    stream_intermediate_steps=True
)

We initialize an Agno agent powered by Google’s Gemini (1.5 Flash) model, equip it with reasoning capabilities and YFinance tools to fetch stock data, analyst recommendations, company information, and news, and then stream a step-by-step, fully transparent report on AAPL, complete with chained reasoning and intermediate tool calls, directly to the Colab output.

finance_agent = Agent(
    name="Finance Agent",
    model=Gemini(id="gemini-1.5-flash"),
    tools=[
        YFinanceTools(
            stock_price=True,
            analyst_recommendations=True,
            company_info=True,
            company_news=True
        )
    ],
    instructions=[
        "Use tables to display stock price, analyst recommendations, and company info.",
        "Only output the financial report without additional commentary."
    ],
    markdown=True
)


risk_agent = Agent(
    name="Risk Assessment Agent",
    model=Gemini(id="gemini-1.5-flash"),
    tools=[
        YFinanceTools(
            stock_price=True,
            company_news=True
        ),
        ReasoningTools(add_instructions=True)
    ],
    instructions=[
        "Analyze recent price volatility and news sentiment to provide a risk assessment.",
        "Use tables where appropriate and only output the risk assessment section."
    ],
    markdown=True
)

These definitions create two specialized Agno agents using Google’s Gemini (1.5 Flash) model: the Finance Agent fetches and tabulates stock prices, analyst recommendations, company info, and news to deliver a concise financial report, while the Risk Assessment Agent analyzes price volatility and news sentiment, leveraging reasoning tools where needed, to generate a focused risk assessment section.

from agno.team.team import Team
from textwrap import dedent


team = Team(
    name="Finance-Risk Team",
    mode="coordinate",
    model=Gemini(id="gemini-1.5-flash"),
    members=[finance_agent, risk_agent],
    tools=[ReasoningTools(add_instructions=True)],
    instructions=[
        "Delegate financial analysis requests to the Finance Agent.",
        "Delegate risk assessment requests to the Risk Assessment Agent.",
        "Combine their outputs into one comprehensive report."
    ],
    markdown=True,
    show_members_responses=True,
    enable_agentic_context=True
)


task = dedent("""
1. Provide a financial overview of AAPL.
2. Provide a risk assessment for AAPL based on volatility and recent news.
""")


response = team.run(task)
print(response.content)

We assemble a coordinated “Finance-Risk Team” using Agno and Google Gemini. It delegates financial analyses to the Finance Agent and volatility/news assessments to the Risk Assessment Agent, then synthesizes their outputs into a single, comprehensive report. By calling team.run on a two-part AAPL task, it transparently orchestrates each expert agent and prints the unified result.

team.print_response(
    task,
    stream=True,
    stream_intermediate_steps=True,
    show_full_reasoning=True
)

We instruct the Finance-Risk Team to execute the AAPL task in real time, streaming each agent’s internal reasoning, tool invocations, and partial outputs as they happen. By enabling stream_intermediate_steps and show_full_reasoning, we’ll see exactly how Agno coordinates the Finance and Risk Assessment Agents step-by-step before delivering the final, combined report.

In conclusion, harnessing Agno’s multi-agent teaming capabilities transforms what would traditionally be a monolithic AI workflow into a modular, maintainable system of experts. Each agent in the team can specialize in fetching financial metrics, parsing analyst sentiment, or evaluating risk factors. At the same time, Agno’s Team API orchestrates delegation, context-sharing, and final synthesis. The result is a robust, extensible architecture ranging from simple two-agent setups to complex ensembles with minimal code changes and maximal clarity.
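
As a sketch of that extensibility, adding a third specialist only requires defining the agent and listing it as a team member. The example below assumes Agno exposes a DuckDuckGo toolkit at agno.tools.duckduckgo (the duckduckgo-search package was installed earlier); adapt the import and tools to your installed version:

from agno.tools.duckduckgo import DuckDuckGoTools

news_agent = Agent(
    name="News Agent",
    model=Gemini(id="gemini-1.5-flash"),
    tools=[DuckDuckGoTools()],
    instructions=[
        "Search the web for the latest headlines about the requested ticker.",
        "Only output a short bulleted news summary."
    ],
    markdown=True
)

# Rebuild the team with the extra member; delegation instructions grow accordingly
extended_team = Team(
    name="Finance-Risk-News Team",
    mode="coordinate",
    model=Gemini(id="gemini-1.5-flash"),
    members=[finance_agent, risk_agent, news_agent],
    instructions=[
        "Delegate financial analysis requests to the Finance Agent.",
        "Delegate risk assessment requests to the Risk Assessment Agent.",
        "Delegate news gathering to the News Agent.",
        "Combine all outputs into one comprehensive report."
    ],
    markdown=True
)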


Check out the Colab Notebook.

A Step-by-Step Tutorial on Connecting Claude Desktop to Real-Time Web Search and Content Extraction via Tavily AI and Smithery using Model Context Protocol (MCP)

In this hands-on tutorial, we’ll learn how to seamlessly connect Claude Desktop to real-time web search and content-extraction capabilities using Tavily AI’s Model Context Protocol (MCP) server and the Smithery client. We’ll begin by reviewing the Tavily homepage and dashboard, where you’ll generate your Developer API key. Next, we’ll explore the Tavily MCP server in Smithery’s interface, install and configure the tavily-mcp package for Claude via the Smithery “Add Server” flow, and verify the installation with a simple PowerShell command. Finally, you’ll see how Claude can invoke Tavily tools, tavily-search and tavily-extract, to fetch and parse live content from sites. By the end of this tutorial, we’ll have a fully integrated pipeline that empowers your AI workflows with up-to-the-minute information directly from the web.

Step 1: Go to the Tavily AI homepage to sign up and access the Tavily API, which we’ll use to set up the MCP server in Claude Desktop.

Step 2: Here you see the Tavily dashboard under the “Researcher” plan, with an API usage bar (0/1,000 credits) and the generated dev key (tvly-dev-…) ready to be copied for authenticating your requests.

Step 3: In Smithery’s server list, the Tavily MCP Server appears as a remote, scanned integration, with its two primary tools, tavily-search and tavily-extract, detailed under the Tools section.

Step 4: Clicking “Add Server” opens Smithery’s client selector in Auto mode, listing supported integrations such as Claude Desktop, Cursor, VS Code, and more.

Step 5: The Claude Desktop configuration modal shows the “Personal” profile selected by default and prompts you to enter your Tavily API key to enable the MCP connection.

Step 6: A Windows PowerShell window confirms successful resolution and installation of the Tavily MCP package for the Claude client, indicating you can now trust and use this server integration.
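
If you ever need to configure the server by hand instead of through Smithery, the resulting entry in claude_desktop_config.json typically looks roughly like the following. This is a sketch based on the tavily-mcp npm package; the exact command, arguments, and environment variable name may differ in your setup:

{
  "mcpServers": {
    "tavily-mcp": {
      "command": "npx",
      "args": ["-y", "tavily-mcp"],
      "env": {
        "TAVILY_API_KEY": "<YOUR_TAVILY_API_KEY>"
      }
    }
  }
}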

Step 7: Tavily MCP should now be set up in Claude. Close Claude Desktop completely and restart it to see the new server in its settings.

Step 8: The tool-toggle menu in Claude lets you enable or disable tavily-search and tavily-extract on the fly, offering granular control over which MCP tools the assistant may call.

Step 9: Within Claude’s chat UI, you can observe the assistant invoking the tavily-search and tavily-extract tool calls inline as it searches marktechpost.com for recent AI articles and extracts their content.
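
For instance, a single prompt such as the following (illustrative; phrase it however you like) is enough to trigger both tools in sequence:

Search marktechpost.com for the three most recent AI tutorials, then extract the main content of the top result and summarize it in five bullet points.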

In conclusion, integrating Tavily’s MCP server with Claude Desktop via Smithery has unlocked a powerful synergy of real-time web search and content extraction within your AI workflows. This setup doesn’t just keep your models up to date; it empowers them to source, analyze, and synthesize fresh information on the fly, whether you’re conducting market research, fueling a RAG pipeline, or automating domain-specific insights. To take full advantage, revisit the Tavily dashboard and Smithery tool configuration to fine-tune query parameters, combine tavily-search and tavily-extract in your prompts, and explore advanced features like custom filters or scheduled queries.


Vision Foundation Models: Implementation and Business Applications

In this tutorial, we’ll explore implementing various vision foundation models for business applications. We’ll focus on practical code implementation, technical details, and business use cases rather than theoretical aspects.

Setup and Environment Configuration

First, let’s set up our environment and install the necessary libraries:

!pip install torch torchvision transformers timm pillow matplotlib opencv-python tensorflow-hub tensorflow
!pip install huggingface_hub sentence-transformers ftfy regex tqdm
!pip install accelerate

# Verify CUDA availability for GPU acceleration

import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
   print(f"CUDA device: {torch.cuda.get_device_name(0)}")

1. CLIP: Contrastive Language-Image Pre-training

CLIP by OpenAI excels at connecting images with natural language, making it powerful for zero-shot image classification and retrieval tasks.

Business Applications:

  • Product image search and recommendation
  • Content moderation
  • Visual brand monitoring
  • Cross-modal retrieval systems
import torch
from PIL import Image
import requests
from transformers import CLIPProcessor, CLIPModel
import matplotlib.pyplot as plt
import numpy as np


# Load model and processor
model_id = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)


# Function to get image embeddings
def get_clip_image_embedding(image_path):
   image = Image.open(image_path) if isinstance(image_path, str) else image_path
   inputs = processor(images=image, return_tensors="pt")
   with torch.no_grad():
       image_features = model.get_image_features(**inputs)
   return image_features


# Function to perform zero-shot classification
def classify_image_with_clip(image_path, categories):
   image = Image.open(image_path) if isinstance(image_path, str) else image_path
   inputs = processor(
       text=categories,
       images=image,
       return_tensors="pt",
       padding=True
   )


   with torch.no_grad():
       outputs = model(**inputs)
       logits_per_image = outputs.logits_per_image
       probs = logits_per_image.softmax(dim=1)


   # Return dict of categories and probabilities
   return {categories[i]: probs[0][i].item() for i in range(len(categories))}


# Example: Product categorization
url = "https://images.unsplash.com/photo-1542291026-7eec264c27ff?q=80&w=1470&auto=format&fit=crop"
image = Image.open(requests.get(url, stream=True).raw)


product_categories = [
   "sneakers", "formal shoes", "sandals", "boots",
   "sports equipment", "casual wear", "luxury item"
]


results = classify_image_with_clip(image, product_categories)


# Sort results by probability
sorted_results = dict(sorted(results.items(), key=lambda x: x[1], reverse=True))


# Display the image and classification results
plt.figure(figsize=(12, 6))


# Plot the image on the left
plt.subplot(1, 2, 1)
plt.imshow(np.array(image))
plt.title("Input Image")
plt.axis("off")


# Plot the classification results on the right
plt.subplot(1, 2, 2)
categories = list(sorted_results.keys())
scores = list(sorted_results.values())


y_pos = np.arange(len(categories))
plt.barh(y_pos, scores, align="center")
plt.yticks(y_pos, categories)
plt.xlabel("Probability")
plt.title("CLIP Classification Results")


plt.tight_layout()
plt.show()


# Also print results to console
print("Classification Results:")
for category, score in sorted_results.items():
   print(f"{category}: {score:.4f}")
Output

2. DINO v2: Self-supervised Vision Transformer

DINO v2 by Meta AI Research provides powerful visual features without requiring labeled data, making it excellent for various downstream tasks.

Business Applications:

  • Visual similarity search
  • Anomaly detection
  • Product clustering
  • Image feature extraction for downstream ML tasks
import torch
import torchvision.transforms as T
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from torch.nn import functional as F
import requests
from io import BytesIO


# Load DINOv2 model
dinov2_vits14 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
dinov2_vits14.eval()


# Preprocess images for DINOv2
transform = T.Compose([
   T.Resize(256),
   T.CenterCrop(224),
   T.ToTensor(),
   T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])


# Function to extract features
def extract_dinov2_features(image_path):
   image = Image.open(image_path).convert('RGB') if isinstance(image_path, str) else image_path
   img_tensor = transform(image).unsqueeze(0)


   with torch.no_grad():
       features = dinov2_vits14(img_tensor)


   return features


# Function to compute similarity between images
def compute_similarity(img1_path, img2_path):
   feat1 = extract_dinov2_features(img1_path)
   feat2 = extract_dinov2_features(img2_path)


   # Normalize features
   feat1 = F.normalize(feat1, dim=1)
   feat2 = F.normalize(feat2, dim=1)


   # Compute cosine similarity
   similarity = torch.mm(feat1, feat2.transpose(0, 1)).item()
   return similarity


# Function to download image from URL
def download_image(url):
   response = requests.get(url, stream=True)
   return Image.open(BytesIO(response.content)).convert('RGB')


# Function to visualize image pair with similarity score
def visualize_similarity(img1_path, img2_path, title=None):
   # Load images
   if img1_path.startswith(('http://', 'https://')):
       img1 = download_image(img1_path)
   else:
       img1 = Image.open(img1_path).convert('RGB')


   if img2_path.startswith(('http://', 'https://')):
       img2 = download_image(img2_path)
   else:
       img2 = Image.open(img2_path).convert('RGB')


   # Compute similarity
   similarity = compute_similarity(img1, img2)


   # Create figure for visualization
   fig, axes = plt.subplots(1, 2, figsize=(12, 6))


   # Display images
   axes[0].imshow(np.array(img1))
   axes[0].set_title("Image 1")
   axes[0].axis("off")


   axes[1].imshow(np.array(img2))
   axes[1].set_title("Image 2")
   axes[1].axis("off")


   # Add similarity score as figure title
   fig_title = f"Similarity Score: {similarity:.4f}"
   if title:
       fig_title = f"{title}\n{fig_title}"
   fig.suptitle(fig_title, fontsize=16)


   plt.tight_layout()
   plt.show()


   return similarity


# Example: Use direct URLs instead of downloading files first
# Sample sneaker images from Unsplash
url1 = "https://images.unsplash.com/photo-1560769629-975ec94e6a86?w=500"  # Red sneaker
url2 = "https://images.unsplash.com/photo-1600185365926-3a2ce3cdb9eb?w=500"  # White sneaker
url3 = "https://images.unsplash.com/photo-1491553895911-0055eca6402d?w=500"  # Another sneaker


# Visualize pairs with similarity scores
print("Comparing Product 1 and Product 2:")
similarity_1_2 = visualize_similarity(url1, url2, "Red Sneaker vs White Sneaker")


print("nComparing Product 1 and Product 3:")
similarity_1_3 = visualize_similarity(url1, url3, "Red Sneaker vs Another Sneaker")


print("nComparing Product 2 and Product 3:")
similarity_2_3 = visualize_similarity(url2, url3, "White Sneaker vs Another Sneaker")


# Print summary of all similarities
print("nSummary of Similarity Scores:")
print(f"Similarity between product 1 and 2: {similarity_1_2:.4f}")
print(f"Similarity between product 1 and 3: {similarity_1_3:.4f}")
print(f"Similarity between product 2 and 3: {similarity_2_3:.4f}")
Output

3. Segment Anything Model (SAM): Advanced Image Segmentation

SAM by Meta AI provides powerful zero-shot segmentation capabilities for various business applications.

Business Applications:

  • Automated image cataloging
  • Precise product measurement in retail
  • Medical image analysis
  • Agricultural crop monitoring
  • Content creation and editing

# Install required libraries for SAM
!pip install git+https://github.com/facebookresearch/segment-anything.git


import torch
import numpy as np
import matplotlib.pyplot as plt
from segment_anything import sam_model_registry, SamPredictor
import cv2
from PIL import Image
import requests


# Download SAM checkpoint
!wget -q https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth


# Load SAM model
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
device = "cuda" if torch.cuda.is_available() else "cpu"
sam.to(device)
predictor = SamPredictor(sam)


# Function to perform automatic segmentation
def segment_image(image_path):
   # Load image
   image = cv2.imread(image_path)
   image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)


   # Set image for SAM
   predictor.set_image(image_rgb)


   # Generate automatic masks
   masks, scores, logits = predictor.predict(
       point_coords=None,
       point_labels=None,
       multimask_output=True,
       box=None
   )


   return image_rgb, masks, scores


# Function to visualize segmentation results
def visualize_segmentation(image, masks, scores, limit=5):
   plt.figure(figsize=(15, 10))


   # Display original image
   plt.subplot(1, limit+1, 1)
   plt.imshow(image)
   plt.title("Original Image")
   plt.axis('off')


   # Display top masks
   top_indices = np.argsort(scores)[-limit:][::-1]
   for i, idx in enumerate(top_indices):
       plt.subplot(1, limit+1, i+2)
       plt.imshow(image)
       plt.imshow(masks[idx], alpha=0.7, cmap='jet')
       plt.title(f"Mask {i+1}nScore: {scores[idx]:.3f}")
       plt.axis('off')


   plt.tight_layout()
   plt.show()


# Example: Product segmentation for e-commerce
!wget -q -O product_image.jpg "https://images.unsplash.com/photo-1525966222134-fcfa99b8ae77?w=800"


image_rgb, masks, scores = segment_image("product_image.jpg")
visualize_segmentation(image_rgb, masks, scores)


# Business application: Calculate precise product measurements
def calculate_object_dimensions(mask):
   # Find contours in the mask
   contours, _ = cv2.findContours((mask * 255).astype(np.uint8),
                                  cv2.RETR_EXTERNAL,
                                  cv2.CHAIN_APPROX_SIMPLE)


   if not contours:
       return None


   # Get the largest contour
   largest_contour = max(contours, key=cv2.contourArea)


   # Get bounding rectangle
   x, y, w, h = cv2.boundingRect(largest_contour)


   # Calculate aspect ratio
   aspect_ratio = w / h


   # Calculate area in pixels
   area_pixels = cv2.contourArea(largest_contour)


   return {
       'width': w,
       'height': h,
       'aspect_ratio': aspect_ratio,
       'area_pixels': area_pixels
   }


# Apply to the highest scoring mask
best_mask_idx = np.argmax(scores)
dimensions = calculate_object_dimensions(masks[best_mask_idx])


print("Product Dimensions:")
print(f"Width: {dimensions['width']} pixels")
print(f"Height: {dimensions['height']} pixels")
print(f"Aspect Ratio: {dimensions['aspect_ratio']:.2f}")
print(f"Area: {dimensions['area_pixels']} square pixels")
Output

4. BLIP-2: Vision-Language Model for Business Intelligence

BLIP-2 provides advanced vision-language capabilities for multimodal business applications.

Business Applications:

  • Automated product description generation
  • Image-based customer service automation
  • Visual content analysis for marketing
  • Social media content understanding
from transformers import Blip2Processor, Blip2ForConditionalGeneration
import torch
from PIL import Image
import requests
import matplotlib.pyplot as plt
import numpy as np
from io import BytesIO


# Load BLIP-2 model
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16)


if torch.cuda.is_available():
   model = model.to("cuda")


# Function to download image from URL
def download_image(url):
   response = requests.get(url, stream=True)
   return Image.open(BytesIO(response.content)).convert('RGB')


# Function for image captioning
def generate_caption(image_path):
   # Load image from path or URL
   if isinstance(image_path, str):
       if image_path.startswith(('http://', 'https://')):
           image = download_image(image_path)
       else:
           image = Image.open(image_path).convert('RGB')
   else:
       image = image_path


   inputs = processor(images=image, return_tensors="pt")


   if torch.cuda.is_available():
       inputs = {k: v.to("cuda") for k, v in inputs.items()}


   generated_ids = model.generate(**inputs, max_new_tokens=50)
   generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()


   return generated_text


# Function for visual question answering
def visual_qa(image_path, question):
   # Load image from path or URL
   if isinstance(image_path, str):
       if image_path.startswith(('http://', 'https://')):
           image = download_image(image_path)
       else:
           image = Image.open(image_path).convert('RGB')
   else:
       image = image_path


   # FIX: Properly format the question for the model
   # BLIP-2 needs a specific prompt format for QA
   prompt = f"Question: {question} Answer:"
   inputs = processor(images=image, text=prompt, return_tensors="pt")


   if torch.cuda.is_available():
       inputs = {k: v.to("cuda") for k, v in inputs.items()}


   generated_ids = model.generate(
       **inputs,
       max_new_tokens=30,
       do_sample=False  # Use greedy decoding for more precise answers
   )


   answer = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
   # Remove the prompt part from the answer
   answer = answer.replace(prompt, "").strip()


   return answer


# Function to visualize image with caption and QA
def visualize_product_analysis(image_path, questions=None):
   # Load image
   if isinstance(image_path, str):
       if image_path.startswith(('http://', 'https://')):
           image = download_image(image_path)
       else:
           image = Image.open(image_path).convert('RGB')
   else:
       image = image_path


   # Generate caption
   caption = generate_caption(image)


   # Default questions if none provided
   if questions is None:
       questions = [
           "What color is this product?",
           "What material is this product made of?",
           "What is the target demographic for this product?",
           "What is a key feature of this product?"
       ]


   # Get answers
   answers = []
   for question in questions:
       answer = visual_qa(image, question)
       answers.append((question, answer))


   # Create visualization
   plt.figure(figsize=(12, 10))


   # Display image
   plt.subplot(2, 1, 1)
   plt.imshow(np.array(image))
   plt.title("Product Image", fontsize=14)
   plt.axis('off')


   # Display caption and Q&A
   plt.subplot(2, 1, 2)
   plt.axis('off')


   text_content = f"Generated Description: {caption}nn"
   text_content += "Product Analysis:n"
   for q, a in answers:
       text_content += f"Q: {q}nA: {a}nn"


   plt.text(0.01, 0.99, text_content, transform=plt.gca().transAxes,
            fontsize=12, verticalalignment='top', wrap=True)


   plt.tight_layout()
   plt.show()


   return caption, answers


# Business application: Automated product listing
def create_product_listing(image_path):
   # Load image
   if isinstance(image_path, str):
       if image_path.startswith(('http://', 'https://')):
           image = download_image(image_path)
       else:
           image = Image.open(image_path).convert('RGB')
   else:
       image = image_path


   # Get basic caption
   caption = generate_caption(image)


   # Extract product attributes with more specific prompting
   color = visual_qa(image, "What colors are visible in this product?")
   material = visual_qa(image, "What material does this product appear to be made of?")
   use_case = visual_qa(image, "What would be the main use case for this product?")
   unique_features = visual_qa(image, "What are any unique or notable features of this product?")


   # Create structured listing
   listing = {
       "title": caption,
       "attributes": {
           "color": color,
           "material": material,
           "primary_use": use_case,
           "unique_features": unique_features
       }
   }


   # Visualize the listing
   plt.figure(figsize=(14, 10))


   # Display image
   plt.subplot(1, 2, 1)
   plt.imshow(np.array(image))
   plt.title("Product Image", fontsize=14)
   plt.axis('off')


   # Display listing details
   plt.subplot(1, 2, 2)
   plt.axis('off')


   listing_text = f"PRODUCT LISTINGnn"
   listing_text += f"Title: {listing['title']}nn"
   listing_text += "Product Attributes:n"
   for attr, value in listing['attributes'].items():
       listing_text += f"{attr.replace('_', ' ').title()}: {value}n"


   plt.text(0.01, 0.99, listing_text, transform=plt.gca().transAxes,
            fontsize=12, verticalalignment='top')


   plt.tight_layout()
   plt.show()


   return listing


# Function for marketing content analysis
def analyze_marketing_content(image_path):
   # Load image
   if isinstance(image_path, str):
       if image_path.startswith(('http://', 'https://')):
           image = download_image(image_path)
       else:
           image = Image.open(image_path).convert('RGB')
   else:
       image = image_path


   # Marketing-specific questions
   marketing_questions = [
       "What emotions does this image evoke?",
       "What brand values are communicated in this image?",
       "What target audience would this image appeal to?",
       "What call to action would pair well with this image?",
       "What marketing channel would this image be most effective on?"
   ]


   # Get answers
   marketing_insights = {}
   for question in marketing_questions:
       answer = visual_qa(image, question)
       key = question.split("?")[0].strip().lower().replace(" ", "_")
       marketing_insights[key] = answer


   # Visualize the analysis
   plt.figure(figsize=(14, 10))


   # Display image
   plt.subplot(1, 2, 1)
   plt.imshow(np.array(image))
   plt.title("Marketing Visual", fontsize=14)
   plt.axis('off')


   # Display marketing insights
   plt.subplot(1, 2, 2)
   plt.axis('off')


   insights_text = "MARKETING CONTENT ANALYSISnn"
   for question, key in zip(marketing_questions, marketing_insights.keys()):
       insights_text += f"{question}n{marketing_insights[key]}nn"


   plt.text(0.01, 0.99, insights_text, transform=plt.gca().transAxes,
            fontsize=12, verticalalignment='top')


   plt.tight_layout()
   plt.show()


   return marketing_insights


# Function for social media understanding
def analyze_social_media_content(image_path):
   # Load image
   if isinstance(image_path, str):
       if image_path.startswith(('http://', 'https://')):
           image = download_image(image_path)
       else:
           image = Image.open(image_path).convert('RGB')
   else:
       image = image_path


   # Generate caption
   caption = generate_caption(image)


   # Social media specific analysis
   engagement_potential = visual_qa(image, "How likely is this image to engage viewers on social media?")
   suggested_hashtags = visual_qa(image, "What hashtags would be appropriate for this image on social media?")
   platform_fit = visual_qa(image, "Which social media platform would this image perform best on?")
   content_type = visual_qa(image, "What type of social media post would this image be suitable for?")


   # Create analysis dict
   social_analysis = {
       "caption": caption,
       "engagement_potential": engagement_potential,
       "suggested_hashtags": suggested_hashtags,
       "platform_fit": platform_fit,
       "content_type": content_type
   }


   # Visualize the analysis
   plt.figure(figsize=(14, 10))


   # Display image
   plt.subplot(1, 2, 1)
   plt.imshow(np.array(image))
   plt.title("Social Media Content", fontsize=14)
   plt.axis('off')


   # Display social media insights
   plt.subplot(1, 2, 2)
   plt.axis('off')


   insights_text = "SOCIAL MEDIA CONTENT ANALYSISnn"
   insights_text += f"Caption: {social_analysis['caption']}nn"
   insights_text += f"Engagement Potential: {social_analysis['engagement_potential']}nn"
   insights_text += f"Suggested Hashtags: {social_analysis['suggested_hashtags']}nn"
   insights_text += f"Best Platform: {social_analysis['platform_fit']}nn"
   insights_text += f"Content Type: {social_analysis['content_type']}n"


   plt.text(0.01, 0.99, insights_text, transform=plt.gca().transAxes,
            fontsize=12, verticalalignment='top')


   plt.tight_layout()
   plt.show()


   return social_analysis


# Example usage
if __name__ == "__main__":
   # Example: E-commerce product analysis
   product_url = "https://images.unsplash.com/photo-1598033129183-c4f50c736f10?w=800"


   print("1. Basic Product Analysis")
   caption, qa_results = visualize_product_analysis(product_url)


   print("n2. Creating Automated Product Listing")
   product_listing = create_product_listing(product_url)


   print("n3. Marketing Content Analysis")
   marketing_url = "https://images.unsplash.com/photo-1581252584837-9f0b1d3bf82c?ixlib=rb-4.0.3&q=80"
   marketing_insights = analyze_marketing_content(marketing_url)


   print("n4. Social Media Content Analysis")
   social_url = "https://images.unsplash.com/photo-1534442072653-dbbf80c5e1ae?ixlib=rb-4.0.3&q=80"
   social_analysis = analyze_social_media_content(social_url)
Output 1
Output 2

Conclusion

This tutorial provides hands-on implementation guidance for deploying four key computer vision foundation models into business applications: CLIP (zero-shot classification), DINO v2 (self-supervised learning), SAM (image segmentation), and BLIP-2 (vision-language tasks). Future experimentation could explore model ensemble techniques, fine-tuning on domain-specific datasets, edge deployment optimization, and integration with business intelligence platforms to maximize ROI on vision AI investments.


Check out the Notebook.

Building a REACT-Style Agent Using Fireworks AI with LangChain that Fetches Data, Generates BigQuery SQL, and Maintains Conversational Memory

In this tutorial, we will explore how to leverage the capabilities of Fireworks AI for building intelligent, tool-enabled agents with LangChain. Starting from installing the langchain-fireworks package and configuring your Fireworks API key, we’ll set up a ChatFireworks LLM instance, powered by the high-performance llama-v3-70b-instruct model, and integrate it with LangChain’s agent framework. Along the way, we’ll define custom tools such as a URL fetcher for scraping webpage text and an SQL generator for converting plain-language requirements into executable BigQuery queries. By the end, we’ll have a fully functional REACT-style agent that can dynamically invoke tools, maintain conversational memory, and deliver sophisticated, end-to-end workflows powered by Fireworks AI.

!pip install -qU langchain langchain-fireworks requests beautifulsoup4

We bootstrap the environment by installing all the required Python packages, including langchain, its Fireworks integration, and common utilities such as requests and beautifulsoup4. This ensures that we have the latest versions of all necessary components to run the rest of the notebook seamlessly.

import requests
from bs4 import BeautifulSoup
from langchain.tools import BaseTool
from langchain.agents import initialize_agent, AgentType
from langchain_fireworks import ChatFireworks
from langchain import LLMChain, PromptTemplate
from langchain.memory import ConversationBufferMemory
from getpass import getpass
import os

We bring in all the necessary imports: HTTP clients (requests, BeautifulSoup), the LangChain agent framework (BaseTool, initialize_agent, AgentType), the Fireworks-powered LLM (ChatFireworks), plus prompt and memory utilities (LLMChain, PromptTemplate, ConversationBufferMemory), as well as standard modules for secure input and environment management.

os.environ["FIREWORKS_API_KEY"] = getpass("🚀 Enter your Fireworks API key: ")

Now, it securely prompts you to enter your Fireworks API key via getpass and sets it in the environment. This step ensures that subsequent calls to the ChatFireworks model are authenticated without exposing your key in plain text.

llm = ChatFireworks(
    model="accounts/fireworks/models/llama-v3-70b-instruct",
    temperature=0.6,
    max_tokens=1024,
    stop=["\n\n"]
)

We demonstrate how to instantiate a ChatFireworks LLM configured for instruction-following, utilizing llama-v3-70b-instruct, a moderate temperature, and a token limit, allowing you to immediately start issuing prompts to the model.

prompt = [
    {"role":"system","content":"You are an expert data-scientist assistant."},
    {"role":"user","content":"Analyze the sentiment of this review:\n\n"
                           "\"The new movie was breathtaking, but a bit too long.\""}
]
resp = llm.invoke(prompt)
print("Sentiment Analysis →", resp.content)

Next, we demonstrate a simple sentiment-analysis example: it builds a structured prompt as a list of role-annotated messages, invokes llm.invoke(), and prints out the model’s sentiment interpretation of the provided movie review.

template = """
You are a data-science assistant. Keep track of the convo:


{history}
User: {input}
Assistant:"""


prompt = PromptTemplate(input_variables=["history","input"], template=template)
memory = ConversationBufferMemory(memory_key="history")


chain = LLMChain(llm=llm, prompt=prompt, memory=memory)


print(chain.run(input="Hey, what can you do?"))
print(chain.run(input="Analyze: 'The product arrived late, but support was helpful.'"))
print(chain.run(input="Based on that, would you recommend the service?"))

We illustrate how to add conversational memory, which involves defining a prompt template that incorporates past exchanges, setting up a ConversationBufferMemory, and chaining everything together with LLMChain. Running a few sample inputs shows how the model retains context across turns.

class FetchURLTool(BaseTool):
    name: str = "fetch_url"
    description: str = "Fetch the main text (first 500 chars) from a webpage."


    def _run(self, url: str) -> str:
        resp = requests.get(url, timeout=10)
        doc = BeautifulSoup(resp.text, "html.parser")
        paras = [p.get_text() for p in doc.find_all("p")][:5]
        return "\n\n".join(paras)


    async def _arun(self, url: str) -> str:
        raise NotImplementedError

We define a custom FetchURLTool by subclassing BaseTool. This tool fetches the first few paragraphs from any URL using requests and BeautifulSoup, making it easy for your agent to retrieve live web content.

class GenerateSQLTool(BaseTool):
    name: str = "generate_sql"
    description: str = "Generate a BigQuery SQL query (with comments) from a text description."


    def _run(self, text: str) -> str:
        prompt = f"""
-- Requirement:
-- {text}


-- Write a BigQuery SQL query (with comments) to satisfy the above.
"""
        return llm.invoke([{"role":"user","content":prompt}]).content


    async def _arun(self, text: str) -> str:
        raise NotImplementedError


tools = [FetchURLTool(), GenerateSQLTool()]


agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)


result = agent.run(
    "Fetch https://en.wikipedia.org/wiki/ChatGPT "
    "and then generate a BigQuery SQL query that counts how many times "
    "the word 'model' appears in the page text."
)


print("\n🔍 Generated SQL:\n", result)

Finally, GenerateSQLTool is another BaseTool subclass that wraps the LLM to transform plain-English requirements into commented BigQuery SQL. It then wires both tools into a REACT-style agent via initialize_agent, runs a combined fetch-and-generate example, and prints out the resulting SQL query.

In conclusion, we have integrated Fireworks AI with LangChain’s modular tooling and agent ecosystem, unlocking a versatile platform for building AI applications that extend beyond simple text generation. We can extend the agent’s capabilities by adding domain-specific tools, customizing prompts, and fine-tuning memory behavior, all while leveraging Fireworks’ scalable inference engine. As next steps, explore advanced features such as function-calling, chaining multiple agents, or incorporating vector-based retrieval to craft even more dynamic and context-aware assistants.


Check out the Notebook.

A Step-by-Step Coding Guide to Integrate Dappier AI’s Real-Time Search and Recommendation Tools with OpenAI’s Chat API https://www.marktechpost.com/2025/04/30/a-step-by-step-coding-guide-to-integrate-dappier-ais-real-time-search-and-recommendation-tools-with-openais-chat-api/ https://www.marktechpost.com/2025/04/30/a-step-by-step-coding-guide-to-integrate-dappier-ais-real-time-search-and-recommendation-tools-with-openais-chat-api/#respond Thu, 01 May 2025 02:14:53 +0000 https://www.marktechpost.com/?p=70986 In this tutorial, we will learn how to harness the power of Dappier AI, a suite of real-time search and recommendation tools, to enhance our conversational applications. By combining Dappier’s cutting-edge RealTimeSearchTool with its AIRecommendationTool, we can query the latest information from across the web and surface personalized article suggestions from custom data models. We […]

The post A Step-by-Step Coding Guide to Integrate Dappier AI’s Real-Time Search and Recommendation Tools with OpenAI’s Chat API appeared first on MarkTechPost.

In this tutorial, we will learn how to harness the power of Dappier AI, a suite of real-time search and recommendation tools, to enhance our conversational applications. By combining Dappier’s cutting-edge RealTimeSearchTool with its AIRecommendationTool, we can query the latest information from across the web and surface personalized article suggestions from custom data models. We guide you step-by-step through setting up our Google Colab environment, installing dependencies, securely loading API keys, and initializing each Dappier module. We will then integrate these tools with an OpenAI chat model (e.g., gpt-3.5-turbo), construct a composable prompt chain, and execute end-to-end queries, all within nine concise notebook cells. Whether we need up-to-the-minute news retrieval or AI-driven content curation, this tutorial provides a flexible framework for building intelligent, data-driven chat experiences.

!pip install -qU langchain-dappier langchain langchain-openai langchain-community langchain-core openai

We bootstrap our Colab environment by installing the core LangChain libraries, the Dappier extensions, and the community integrations, alongside the official OpenAI client. With these packages in place, we will have seamless access to Dappier’s real-time search and recommendation tools, the latest LangChain runtimes, and the OpenAI API, all in one environment.

import os
from getpass import getpass


os.environ["DAPPIER_API_KEY"] = getpass("Enter our Dappier API key: ")


os.environ["OPENAI_API_KEY"] = getpass("Enter our OpenAI API key: ")

We securely capture our Dappier and OpenAI API credentials at runtime, thereby avoiding the hard-coding of sensitive keys in our notebook. By using getpass, the prompts ensure our inputs remain hidden, and setting them as environment variables makes them available to all subsequent cells without exposing them in logs.

from langchain_dappier import DappierRealTimeSearchTool


search_tool = DappierRealTimeSearchTool()
print("Real-time search tool ready:", search_tool)

We import Dappier’s real‐time search module and create an instance of the DappierRealTimeSearchTool, enabling our notebook to execute live web queries. The print statement confirms that the tool has been initialized successfully and is ready to handle search requests.

from langchain_dappier import DappierAIRecommendationTool


recommendation_tool = DappierAIRecommendationTool(
    data_model_id="dm_01j0pb465keqmatq9k83dthx34",
    similarity_top_k=3,
    ref="sportsnaut.com",
    num_articles_ref=2,
    search_algorithm="most_recent",
)
print("Recommendation tool ready:", recommendation_tool)

We set up Dappier’s AI-powered recommendation engine by specifying our custom data model, the number of similar articles to retrieve (similarity_top_k=3), and the reference domain for context (sportsnaut.com, with num_articles_ref=2). The DappierAIRecommendationTool instance will now use the “most_recent” algorithm to pull in the top-k relevant articles from our specified reference, ready for query-driven content suggestions.

from langchain.chat_models import init_chat_model


llm = init_chat_model(
    model="gpt-3.5-turbo",
    model_provider="openai",
    temperature=0,
)
llm_with_tools = llm.bind_tools([search_tool])
print("✅ llm_with_tools ready")

We create an OpenAI chat model instance using gpt-3.5-turbo with a temperature of 0 to ensure consistent responses, and then bind the previously initialized search tool so that the LLM can invoke real-time searches. The final print statement confirms that our LLM is ready to call Dappier’s tools within our conversational flows.

import datetime
from langchain_core.prompts import ChatPromptTemplate


today = datetime.datetime.today().strftime("%Y-%m-%d")
prompt = ChatPromptTemplate([
    ("system", f"we are a helpful assistant. Today is {today}."),
    ("human", "{user_input}"),
    ("placeholder", "{messages}"),
])


llm_chain = prompt | llm_with_tools
print("✅ llm_chain built")

We construct the conversational “chain” by first building a ChatPromptTemplate that injects the current date into a system prompt and defines slots for user input and prior messages. By piping the template (|) into our llm_with_tools, we create an llm_chain that automatically formats prompts, invokes the LLM (with real-time search capability), and handles responses in a seamless workflow. The final print confirms the chain is ready to drive end-to-end interactions.
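
Before adding tool execution, it can help to invoke the chain directly and inspect the raw AIMessage it returns; this is a minimal sketch, and the question is just an example.

# Direct invocation of the prompt + LLM chain (no tool results fed back yet).
probe = llm_chain.invoke({"user_input": "What can you help me with today?"})
print(probe)  # an AIMessage; it may contain tool_calls if the model decides to search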

from langchain_core.runnables import RunnableConfig, chain


@chain
def tool_chain(user_input: str, config: RunnableConfig):
    ai_msg = llm_chain.invoke({"user_input": user_input}, config=config)
    tool_msgs = search_tool.batch(ai_msg.tool_calls, config=config)
    return llm_chain.invoke(
        {"user_input": user_input, "messages": [ai_msg, *tool_msgs]},
        config=config
    )


print("✅ tool_chain defined")

We define an end-to-end tool_chain that first sends our prompt to the LLM (capturing any requested tool calls), then executes those calls via search_tool.batch, and finally feeds both the AI’s initial message and the tool outputs back into the LLM for a cohesive response. The @chain decorator transforms this into a single, runnable pipeline, allowing us to simply call tool_chain.invoke(…) to handle both thinking and searching in a single step.

res = search_tool.invoke({"query": "What happened at the last Wrestlemania"})
print("🔍 Search:", res)

We demonstrate a direct query to Dappier’s real-time search engine, asking “What happened at the last WrestleMania,” and immediately print the structured result. It shows how easily we can leverage search_tool.invoke to fetch up-to-the-moment information and inspect the raw response in our notebook.

rec = recommendation_tool.invoke({"query": "latest sports news"})
print("📄 Recommendation:", rec)


out = tool_chain.invoke("Who won the last Nobel Prize?")
print("🤖 Chain output:", out)

Finally, we showcase both our recommendation and full-chain workflows in action. First, it calls recommendation_tool.invoke with “latest sports news” to fetch relevant articles from our custom data model, then prints those suggestions. Next, it runs the tool_chain.invoke(“Who won the last Nobel Prize?”) to perform an end-to-end LLM query combined with real-time search, printing the AI’s synthesized answer, and integrating live data.

In conclusion, we now have a robust baseline for embedding Dappier AI capabilities into any conversational workflow. We’ve seen how effortlessly Dappier’s real-time search empowers our LLM to access fresh facts, while the recommendation tool enables us to deliver contextually relevant insights from proprietary data sources. From here, we can customize search parameters (e.g., refining query filters) or fine-tune recommendation settings (e.g., adjusting similarity thresholds and reference domains) to suit our domain.
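
As a concrete example of the tuning mentioned above, the sketch below re-creates the recommendation tool with different retrieval settings; the specific values are illustrative assumptions, and the data model ID is the same one used earlier.

# Illustrative re-configuration of the recommendation tool with different settings.
tuned_recommendation_tool = DappierAIRecommendationTool(
    data_model_id="dm_01j0pb465keqmatq9k83dthx34",  # same custom data model as before
    similarity_top_k=5,        # retrieve more similar articles per query
    ref="sportsnaut.com",      # reference domain for contextual grounding
    num_articles_ref=3,        # pull more articles tied to the reference domain
    search_algorithm="most_recent",
)
print(tuned_recommendation_tool.invoke({"query": "latest sports news"}))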


Check out the Dappier Platform and Notebook here.


The post A Step-by-Step Coding Guide to Integrate Dappier AI’s Real-Time Search and Recommendation Tools with OpenAI’s Chat API appeared first on MarkTechPost.

Tutorial on Seamlessly Accessing Any LinkedIn Profile with exa-mcp-server and Claude Desktop Using the Model Context Protocol MCP https://www.marktechpost.com/2025/04/30/tutorial-on-seamlessly-accessing-any-linkedin-profile-with-exa-mcp-server-and-claude-desktop-using-the-model-context-protocol-mcp/ https://www.marktechpost.com/2025/04/30/tutorial-on-seamlessly-accessing-any-linkedin-profile-with-exa-mcp-server-and-claude-desktop-using-the-model-context-protocol-mcp/#respond Wed, 30 Apr 2025 07:04:01 +0000 https://www.marktechpost.com/?p=70949 In this tutorial, we’ll learn how to harness the power of the exa-mcp-server alongside Claude Desktop to access any LinkedIn page programmatically. The exa-mcp-server provides a lightweight, high-performance implementation of the Model Context Protocol, enabling Claude Desktop to issue HTTP requests and return raw HTML or structured data on demand. Throughout this guide, we’ll install […]

The post Tutorial on Seamlessly Accessing Any LinkedIn Profile with exa-mcp-server and Claude Desktop Using the Model Context Protocol MCP appeared first on MarkTechPost.

In this tutorial, we’ll learn how to harness the power of the exa-mcp-server alongside Claude Desktop to access any LinkedIn page programmatically. The exa-mcp-server provides a lightweight, high-performance implementation of the Model Context Protocol, enabling Claude Desktop to issue HTTP requests and return raw HTML or structured data on demand. Throughout this guide, we’ll install and configure exa-mcp-server, connect it to your local Claude Desktop instance, and craft the precise protocol messages needed to fetch and display LinkedIn profiles, all without writing a single line of manual web-scraping code. By the end, we’ll have a reusable workflow that leverages an LLM-driven agent to retrieve and process LinkedIn content seamlessly. 

Step 1: Download the Claude Desktop

Step 2: Enable the Developer Mode from the left pane on Claude Desktop

Step 3: Go to https://smithery.ai/server/exa and retrieve the installation command for the exa server (it contains your API key) by copying the highlighted text in the image below; you will run it in a desktop terminal

Step 4: Paste the copied command into the terminal and run it

Step 5: Open Edit Config from the Developer section in the top-left pane of Claude Desktop

Step 6: Now, open the claude_desktop_config.json file and check for the EXA server and API key; it should be set there.

It will be something like the one shared below.

{
  "mcpServers": {
    "exa": {
      "command": "cmd",
      "args": [
        "/c",
        "npx",
        "-y",
        "@smithery/cli@latest",
        "run",
        "exa",
        "--key",
        "Your Key",
        "--config",
        "\"{\\\"exaApiKey\\\":\\\"Your exa API Key\\\"}\""
      ]
    }
  }
}

Step 7: Finally, close and reopen Claude Desktop. Verify the setup by asking Claude to search for any LinkedIn page; here, we searched for our own marktechpost.com LinkedIn page.

In conclusion, we’ve set up exa-mcp-server, linked it to Claude Desktop, and successfully issued Model Context Protocol commands to retrieve LinkedIn pages on demand. This approach streamlines access to protected or dynamically rendered web content while also laying the groundwork for LLM-powered automation across any site that relies on authenticated or JavaScript-driven pages. From here, you can extend your setup for web_search_exa, research_paper_search, twitter_search, company_research, crawling, and competitor_finder.


The post Tutorial on Seamlessly Accessing Any LinkedIn Profile with exa-mcp-server and Claude Desktop Using the Model Context Protocol MCP appeared first on MarkTechPost.

How to Create a Custom Model Context Protocol (MCP) Client Using Gemini https://www.marktechpost.com/2025/04/29/how-to-create-a-custom-model-context-protocol-mcp-client-using-gemini/ https://www.marktechpost.com/2025/04/29/how-to-create-a-custom-model-context-protocol-mcp-client-using-gemini/#respond Tue, 29 Apr 2025 21:20:52 +0000 https://www.marktechpost.com/?p=70930 In this tutorial, we will be implementing a custom Model Context Protocol (MCP) Client using Gemini. By the end of this tutorial, you will be able to connect your own AI applications with MCP servers, unlocking powerful new capabilities to supercharge your projects. Step 1: Setting up the dependencies Gemini API  We’ll be using the […]

The post How to Create a Custom Model Context Protocol (MCP) Client Using Gemini appeared first on MarkTechPost.

In this tutorial, we will be implementing a custom Model Context Protocol (MCP) Client using Gemini. By the end of this tutorial, you will be able to connect your own AI applications with MCP servers, unlocking powerful new capabilities to supercharge your projects.

Step 1: Setting up the dependencies

Gemini API 

We’ll be using the Gemini 2.0 Flash model for this tutorial.

To get your Gemini API key, visit Google’s Gemini API Key page and follow the instructions.

Once you have the key, store it safely—you’ll need it later.

Node.js

Some of the MCP servers require Node.js to run. Download the latest version of Node.js from nodejs.org

  • Run the installer.
  • Leave all settings as default and complete the installation.

National Park Services API

For this tutorial, we will be exposing the National Park Services MCP server to our client. To use the National Park Service API, you can request an API key by visiting this link and filling out a short form. Once submitted, the API key will be sent to your email.

Make sure to keep this key accessible—we’ll be using it shortly.

Installing Python libraries

In the command prompt, run the following command to install the Python libraries:

pip install mcp python-dotenv google-genai

Step 2: Setting up the configuration files

Creating mcp.json file

Next, create a file named mcp.json.

This file will store configuration details about the MCP servers your client will connect to.

Once the file is created, add the following initial content:

{
    "mcpServers": {
      "nationalparks": {
        "command": "npx",
        "args": ["-y", "mcp-server-nationalparks"],
        "env": {
            "NPS_API_KEY": <”YOUR_NPS_API_KEY”>
        }
      }
    }
}

Replace <YOUR_NPS_API_KEY> with the key you generated.

Creating .env file

Create a .env file in the same directory as the mcp.json file and enter the following code:

GEMINI_API_KEY = <YOUR_GEMINI_API_KEY>

Replace <YOUR_GEMINI_API_KEY> with the key you generated.

Step 3: Implementing the MCP Client

We will now create a client.py file to implement our MCP Client. Make sure that this file is in the same directory as mcp.json and .env

Basic Client Structure

We will first import the necessary libraries and create a basic client class

import asyncio
import json
import os
from typing import List, Optional
from contextlib import AsyncExitStack
import warnings

from google import genai
from google.genai import types
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from dotenv import load_dotenv

load_dotenv()
warnings.filterwarnings("ignore", category=ResourceWarning)

def clean_schema(schema): # Cleans the schema by keeping only allowed keys
    allowed_keys = {"type", "properties", "required", "description", "title", "default", "enum"}
    return {k: v for k, v in schema.items() if k in allowed_keys}

class MCPGeminiAgent:
    def __init__(self):
        self.session: Optional[ClientSession] = None
        self.exit_stack = AsyncExitStack()
        self.genai_client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))
        self.model = "gemini-2.0-flash"
        self.tools = None
        self.server_params = None
        self.server_name = None

The __init__ method initializes the MCPGeminiAgent by setting up an asynchronous session manager, loading the Gemini API client, and preparing placeholders for model configuration, tools, and server details.

It lays the foundation for managing server connections and interacting with the Gemini model.

Selecting the MCP Server

async def select_server(self):
        with open('mcp.json', 'r') as f:
            mcp_config = json.load(f)
        servers = mcp_config['mcpServers']
        server_names = list(servers.keys())
        print("Available MCP servers:")
        for idx, name in enumerate(server_names):
            print(f"  {idx+1}. {name}")
        while True:
            try:
                choice = int(input(f"Please select a server by number [1-{len(server_names)}]: "))
                if 1 <= choice <= len(server_names):
                    break
                else:
                    print("That number is not valid. Please try again.")
            except ValueError:
                print("Please enter a valid number.")
        self.server_name = server_names[choice-1]
        server_cfg = servers[self.server_name]
        command = server_cfg['command']
        args = server_cfg.get('args', [])
        env = server_cfg.get('env', None)
        self.server_params = StdioServerParameters(
            command=command,
            args=args,
            env=env
        )

This method prompts the user to choose a server from the available options listed in mcp.json. It loads and prepares the selected server’s connection parameters for later use.

Connecting to the MCP Server

async def connect(self):
        await self.select_server()
        self.stdio_transport = await self.exit_stack.enter_async_context(stdio_client(self.server_params))
        self.stdio, self.write = self.stdio_transport
        self.session = await self.exit_stack.enter_async_context(ClientSession(self.stdio, self.write))
        await self.session.initialize()
        print(f"Successfully connected to: {self.server_name}")
        # List available tools for this server
        mcp_tools = await self.session.list_tools()
        print("\nAvailable MCP tools for this server:")
        for tool in mcp_tools.tools:
            print(f"- {tool.name}: {tool.description}")

This establishes an asynchronous connection to the selected MCP server using stdio transport. It initializes the MCP session and retrieves the available tools from the server.

Handling User query and tool calls

async def agent_loop(self, prompt: str) -> str:
        contents = [types.Content(role="user", parts=[types.Part(text=prompt)])]
        mcp_tools = await self.session.list_tools()
        tools = types.Tool(function_declarations=[
            {
                "name": tool.name,
                "description": tool.description,
                "parameters": clean_schema(getattr(tool, "inputSchema", {}))
            }
            for tool in mcp_tools.tools
        ])
        self.tools = tools
        response = await self.genai_client.aio.models.generate_content(
            model=self.model,
            contents=contents,
            config=types.GenerateContentConfig(
                temperature=0,
                tools=[tools],
            ),
        )
        contents.append(response.candidates[0].content)
        turn_count = 0
        max_tool_turns = 5
        while response.function_calls and turn_count < max_tool_turns:
            turn_count += 1
            tool_response_parts: List[types.Part] = []
            for fc_part in response.function_calls:
                tool_name = fc_part.name
                args = fc_part.args or {}
                print(f"Invoking MCP tool '{tool_name}' with arguments: {args}")
                tool_response: dict
                try:
                    tool_result = await self.session.call_tool(tool_name, args)
                    print(f"Tool '{tool_name}' executed.")
                    if tool_result.isError:
                        tool_response = {"error": tool_result.content[0].text}
                    else:
                        tool_response = {"result": tool_result.content[0].text}
                except Exception as e:
                    tool_response = {"error":  f"Tool execution failed: {type(e).__name__}: {e}"}
                tool_response_parts.append(
                    types.Part.from_function_response(
                        name=tool_name, response=tool_response
                    )
                )
            contents.append(types.Content(role="user", parts=tool_response_parts))
            print(f"Added {len(tool_response_parts)} tool response(s) to the conversation.")
            print("Requesting updated response from Gemini...")
            response = await self.genai_client.aio.models.generate_content(
                model=self.model,
                contents=contents,
                config=types.GenerateContentConfig(
                    temperature=1.0,
                    tools=[tools],
                ),
            )
            contents.append(response.candidates[0].content)
        if turn_count >= max_tool_turns and response.function_calls:
            print(f"Stopped after {max_tool_turns} tool calls to avoid infinite loops.")
        print("All tool calls complete. Displaying Gemini's final response.")
        return response

This method sends the user’s prompt to Gemini, processes any tool calls returned by the model, executes the corresponding MCP tools, and iteratively refines the response. It manages multi-turn interactions between Gemini and the server tools.

Interactive Chat Loop

async def chat(self):
        print(f"\nMCP-Gemini Assistant is ready and connected to: {self.server_name}")
        print("Enter your question below, or type 'quit' to exit.")
        while True:
            try:
                query = input("\nYour query: ").strip()
                if query.lower() == 'quit':
                    print("Session ended. Goodbye!")
                    break
                print(f"Processing your request...")
                res = await self.agent_loop(query)
                print("\nGemini's answer:")
                print(res.text)
            except KeyboardInterrupt:
                print("\nSession interrupted. Goodbye!")
                break
            except Exception as e:
                print(f"\nAn error occurred: {str(e)}")

This provides a command-line interface where users can submit queries and receive answers from Gemini, continuously until they exit the session.

Cleaning up resources

async def cleanup(self):
        await self.exit_stack.aclose()

This closes the asynchronous context and cleans up all open resources like the session and connection stack gracefully.

Main entry point

async def main():
    agent = MCPGeminiAgent()
    try:
        await agent.connect()
        await agent.chat()
    finally:
        await agent.cleanup()

if __name__ == "__main__":
    import sys
    import os
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        print("Session interrupted. Goodbye!")
    finally:
        sys.stderr = open(os.devnull, "w")

This is the main execution logic.

Apart from main(), all other methods are part of the MCPGeminiAgent class. You can find the complete client.py file here.

Step 4: Running the client

Run your MCP client from the terminal:
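
Assuming the file is named client.py, as created in Step 3, the command would be:

python client.py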

The client will:

  • Read the mcp.json file to list the different available MCP servers.
  • Prompt the user to select one of the listed servers.
  • Connect to the selected MCP server using the provided configuration and environment settings.
  • Interact with the Gemini model through a series of queries and responses.
  • Allow users to issue prompts, execute tools, and process responses iteratively with the model.
  • Provide a command-line interface for users to engage with the system and receive real-time results.
  • Ensure proper cleanup of resources after the session ends, closing connections and releasing memory.

The post How to Create a Custom Model Context Protocol (MCP) Client Using Gemini appeared first on MarkTechPost.

A Coding Guide to Different Function Calling Methods to Create Real-Time, Tool-Enabled Conversational AI Agents https://www.marktechpost.com/2025/04/29/a-coding-guide-to-different-function-calling-methods-to-create-real-time-tool-enabled-conversational-ai-agents/ https://www.marktechpost.com/2025/04/29/a-coding-guide-to-different-function-calling-methods-to-create-real-time-tool-enabled-conversational-ai-agents/#respond Tue, 29 Apr 2025 07:03:48 +0000 https://www.marktechpost.com/?p=70917 Function calling lets an LLM act as a bridge between natural-language prompts and real-world code or APIs. Instead of simply generating text, the model decides when to invoke a predefined function, emits a structured JSON call with the function name and arguments, and then waits for your application to execute that call and return the […]

The post A Coding Guide to Different Function Calling Methods to Create Real-Time, Tool-Enabled Conversational AI Agents appeared first on MarkTechPost.

Function calling lets an LLM act as a bridge between natural-language prompts and real-world code or APIs. Instead of simply generating text, the model decides when to invoke a predefined function, emits a structured JSON call with the function name and arguments, and then waits for your application to execute that call and return the results. This back-and-forth can loop, potentially invoking multiple functions in sequence, enabling rich, multi-step interactions entirely under conversational control. In this tutorial, we’ll implement a weather assistant with Gemini 2.0 Flash to demonstrate how to set up and manage that function-calling cycle. We will implement different variants of Function Calling. By integrating function calls, we transform a chat interface into a dynamic tool for real-time tasks, whether fetching live weather data, checking order statuses, scheduling appointments, or updating databases. Users no longer fill out complex forms or navigate multiple screens; they simply describe what they need, and the LLM orchestrates the underlying actions seamlessly. This natural language automation enables the easy construction of AI agents that can access external data sources, perform transactions, or trigger workflows, all within a single conversation.

Function Calling with Google Gemini 2.0 Flash

!pip install "google-genai>=1.0.0" geopy requests

We install the Gemini Python SDK (google-genai ≥ 1.0.0), along with geopy for converting location names to coordinates and requests for making HTTP calls, ensuring all the core dependencies for our Colab weather assistant are in place.

import os
from google import genai


GEMINI_API_KEY = "Use_Your_API_Key"  


client = genai.Client(api_key=GEMINI_API_KEY)


model_id = "gemini-2.0-flash"

We import the Gemini SDK, set your API key, and create a genai.Client instance configured to use the “gemini-2.0-flash” model, establishing the foundation for all subsequent function-calling requests.

res = client.models.generate_content(
    model=model_id,
    contents=["Tell me 1 good fact about Nuremberg."]
)
print(res.text)

We send a user prompt (“Tell me 1 good fact about Nuremberg.”) to the Gemini 2.0 Flash model via generate_content, then print out the model’s text reply, demonstrating a basic, end-to-end text‐generation call using the SDK.

Function Calling with JSON Schema

weather_function = {
    "name": "get_weather_forecast",
    "description": "Retrieves the weather using Open-Meteo API for a given location (city) and a date (yyyy-mm-dd). Returns a list dictionary with the time and temperature for each hour.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g., San Francisco, CA"
            },
            "date": {
                "type": "string",
                "description": "the forecasting date for when to get the weather format (yyyy-mm-dd)"
            }
        },
        "required": ["location","date"]
    }
}

Here, we define a JSON Schema for our get_weather_forecast tool, specifying its name, a descriptive prompt to guide Gemini on when to use it, and the exact input parameters (location and date) with their types, descriptions, and required fields, so the model can emit valid function calls.

from google.genai.types import GenerateContentConfig


config = GenerateContentConfig(
    system_instruction="You are a helpful assistant that use tools to access and retrieve information from a weather API. Today is 2025-03-04.",
    tools=[{"function_declarations": [weather_function]}],
)

We create a GenerateContentConfig that tells Gemini it’s acting as a weather‐retrieval assistant and registers your weather function under tools. Hence, the model knows how to generate structured calls when asked for forecast data.

response = client.models.generate_content(
    model=model_id,
    contents='Whats the weather in Berlin today?'
)
print(response.text)

This call sends the bare prompt (“What’s the weather in Berlin today?”) without including your config (and thus no function definitions), so Gemini falls back to plain text completion, offering generic advice instead of invoking your weather‐forecast tool.

response = client.models.generate_content(
    model=model_id,
    config=config,
    contents='Whats the weather in Berlin today?'
)


for part in response.candidates[0].content.parts:
    print(part.function_call)

By passing in config (which includes your JSON‐schema tool), Gemini recognizes it should call get_weather_forecast rather than reply in plain text. The loop over response.candidates[0].content.parts then prints out each part’s .function_call object, showing you exactly which function the model decided to invoke (with its name and arguments).

from google.genai import types
from geopy.geocoders import Nominatim
import requests


geolocator = Nominatim(user_agent="weather-app")
def get_weather_forecast(location, date):
    location = geolocator.geocode(location)
    if location:
        try:
            response = requests.get(f"https://api.open-meteo.com/v1/forecast?latitude={location.latitude}&longitude={location.longitude}&hourly=temperature_2m&start_date={date}&end_date={date}")
            data = response.json()
            return {time: temp for time, temp in zip(data["hourly"]["time"], data["hourly"]["temperature_2m"])}
        except Exception as e:
            return {"error": str(e)}
    else:
        return {"error": "Location not found"}


functions = {
    "get_weather_forecast": get_weather_forecast
}


def call_function(function_name, **kwargs):
    return functions[function_name](**kwargs)


def function_call_loop(prompt):
    contents = [types.Content(role="user", parts=[types.Part(text=prompt)])]
    response = client.models.generate_content(
        model=model_id,
        config=config,
        contents=contents
    )
    for part in response.candidates[0].content.parts:
        contents.append(types.Content(role="model", parts=[part]))
        if part.function_call:
            print("Tool call detected")
            function_call = part.function_call
            print(f"Calling tool: {function_call.name} with args: {function_call.args}")
            tool_result = call_function(function_call.name, **function_call.args)
            function_response_part = types.Part.from_function_response(
                name=function_call.name,
                response={"result": tool_result},
            )
            contents.append(types.Content(role="user", parts=[function_response_part]))
            print(f"Calling LLM with tool results")
            func_gen_response = client.models.generate_content(
                model=model_id, config=config, contents=contents
            )
            contents.append(types.Content(role="model", parts=[func_gen_response]))
    return contents[-1].parts[0].text.strip()
   
result = function_call_loop("Whats the weather in Berlin today?")
print(result)

We implement a full “agentic” loop: it sends your prompt to Gemini, inspects the response for a function call, executes get_weather_forecast (using Geopy plus an Open-Meteo HTTP request), and then feeds the tool’s result back into the model to produce and return the final conversational reply.

Function Calling using Python functions

from geopy.geocoders import Nominatim
import requests


geolocator = Nominatim(user_agent="weather-app")


def get_weather_forecast(location: str, date: str) -> dict:
    """
    Retrieves the weather using the Open-Meteo API for a given location (city) and a date (yyyy-mm-dd). Returns a dictionary with the time and temperature for each hour.

    Args:
        location (str): The city and state, e.g., San Francisco, CA
        date (str): The forecasting date for when to get the weather, format (yyyy-mm-dd)
    Returns:
        Dict[str, float]: A dictionary with the time as key and the temperature as value
    """
    location = geolocator.geocode(location)
    if location:
        try:
            response = requests.get(f"https://api.open-meteo.com/v1/forecast?latitude={location.latitude}&longitude={location.longitude}&hourly=temperature_2m&start_date={date}&end_date={date}")
            data = response.json()
            return {time: temp for time, temp in zip(data["hourly"]["time"], data["hourly"]["temperature_2m"])}
        except Exception as e:
            return {"error": str(e)}
    else:
        return {"error": "Location not found"}

The get_weather_forecast function first uses Geopy’s Nominatim to convert a city-and-state string into coordinates, then sends an HTTP request to the Open-Meteo API to retrieve hourly temperature data for the given date, returning a dictionary that maps each timestamp to its corresponding temperature. It also handles errors gracefully, returning an error message if the location isn’t found or the API call fails.
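
Before registering the function as a tool, we can call it directly to confirm it returns hourly temperatures; the city and date below are only examples.

# Direct call to verify the helper on its own (example arguments).
print(get_weather_forecast("Berlin, Germany", "2025-03-04"))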

from google.genai.types import GenerateContentConfig


config = GenerateContentConfig(
    system_instruction="You are a helpful assistant that can help with weather related questions. Today is 2025-03-04.", # to give the LLM context on the current date.
    tools=[get_weather_forecast],
    automatic_function_calling={"disable": True}
)

This config registers your Python get_weather_forecast function as a callable tool. It sets a clear system prompt (including the date) for context, while disabling “automatic_function_calling” so that Gemini will emit the function call payload instead of invoking it internally.

r = client.models.generate_content(
    model=model_id,
    config=config,
    contents='Whats the weather in Berlin today?'
)
for part in r.candidates[0].content.parts:
    print(part.function_call)

By sending the prompt with your custom config (including the Python tool but with automatic calls disabled), this snippet captures Gemini’s raw function‐call decision. Then it loops over each response part to print out the .function_call object, letting you inspect exactly which tool the model wants to invoke and with what arguments.

from google.genai.types import GenerateContentConfig


config = GenerateContentConfig(
    system_instruction="You are a helpful assistant that use tools to access and retrieve information from a weather API. Today is 2025-03-04.", # to give the LLM context on the current date.
    tools=[get_weather_forecast],
)


r = client.models.generate_content(
    model=model_id,
    config=config,
    contents='Whats the weather in Berlin today?'
)


print(r.text)

With this config (which includes your get_weather_forecast function and leaves automatic calling enabled by default), calling generate_content will have Gemini invoke your weather tool behind the scenes and then return a natural‐language reply. Printing r.text outputs that final response, including the actual temperature forecast for Berlin on the specified date.

from google.genai.types import GenerateContentConfig


config = GenerateContentConfig(
    system_instruction="You are a helpful assistant that use tools to access and retrieve information from a weather API.",
    tools=[get_weather_forecast],
)


prompt = f"""
Today is 2025-03-04. You are chatting with Andrew, you have access to more information about him.


User Context:
- name: Andrew
- location: Nuremberg


User: Can i wear a T-shirt later today?"""


r = client.models.generate_content(
    model=model_id,
    config=config,
    contents=prompt
)


print(r.text)

We extend your assistant with personal context, telling Gemini Andrew’s name and location (Nuremberg) and asking if it’s T-shirt weather, while still using the get_weather_forecast tool under the hood. It then prints the model’s natural-language recommendation based on the actual forecast for that day.

In conclusion, we now know how to define functions (via JSON schema or Python signatures), configure Gemini 2.0 Flash to detect and emit function calls, and implement the “agentic” loop that executes those calls and composes the final response. With these building blocks, we can extend any LLM into a capable, tool-enabled assistant that automates workflows, retrieves live data, and interacts with your code or APIs as effortlessly as chatting with a colleague.
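
As a small illustration of that extension path, the sketch below registers a second, hypothetical Python function alongside the weather tool; its name, logic, and the packing-advice idea are assumptions for demonstration, not part of the original tutorial.

# Hypothetical second tool to show how additional Python functions can be registered.
def get_packing_advice(temperature_c: float) -> str:
    """Return a simple clothing suggestion for a temperature in degrees Celsius."""
    if temperature_c >= 20:
        return "T-shirt weather"
    if temperature_c >= 10:
        return "Bring a light jacket"
    return "Wear a warm coat"

multi_tool_config = GenerateContentConfig(
    system_instruction="You are a helpful assistant that answers weather and packing questions. Today is 2025-03-04.",
    tools=[get_weather_forecast, get_packing_advice],  # both plain Python functions exposed as tools
)

r = client.models.generate_content(
    model=model_id,
    config=multi_tool_config,
    contents="Should I pack a jacket for Berlin today?"
)
print(r.text)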


Here is the Colab Notebook.


The post A Coding Guide to Different Function Calling Methods to Create Real-Time, Tool-Enabled Conversational AI Agents appeared first on MarkTechPost.
