# AutoGen Plugin: LLM
`FlotorchAutogenLLM` provides an AutoGen-compatible interface for accessing language models through FloTorch Gateway. It implements AutoGen's `ChatCompletionClient` interface, enabling seamless integration with AutoGen's agent framework while leveraging FloTorch's managed model infrastructure. It handles complexities such as message conversion, tool call processing, structured output generation, and usage accounting.
## Prerequisites

Before using `FlotorchAutogenLLM`, ensure you have completed the general prerequisites outlined in the AutoGen Plugin Overview, including installation and environment configuration.
## Configuration

### Parameters

Configure your LLM instance with the following parameters:
```python
FlotorchAutogenLLM(
    model_id: str,   # Model identifier from FloTorch Console (required)
    api_key: str,    # FloTorch API key for authentication (required)
    base_url: str    # FloTorch Gateway endpoint URL (required)
)
```

**Parameter Details:**

- `model_id` - The unique identifier of the model configured in FloTorch Console
- `api_key` - Authentication key for accessing FloTorch Gateway (can be set via an environment variable; see the example below)
- `base_url` - The FloTorch Gateway endpoint URL (can be set via an environment variable)
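For example, credentials can be read from the environment instead of being hard-coded. The variable names `FLOTORCH_API_KEY` and `FLOTORCH_BASE_URL` below are illustrative, not a documented convention:

```python
import os

from flotorch.autogen.llm import FlotorchAutogenLLM

# FLOTORCH_API_KEY / FLOTORCH_BASE_URL are example variable names --
# use whatever names your deployment standardizes on.
llm = FlotorchAutogenLLM(
    model_id="your-model-id",
    api_key=os.environ["FLOTORCH_API_KEY"],
    base_url=os.environ.get("FLOTORCH_BASE_URL", "https://gateway.flotorch.cloud"),
)
```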
## Features

### ChatCompletionClient Interface

Fully implements AutoGen's `ChatCompletionClient` interface (a minimal direct call is sketched after the list):
- **Message Conversion** - Seamlessly converts AutoGen messages to FloTorch format
- **Tool Support** - Handles tool calls and function bindings
- **Structured Output** - Supports structured JSON output when `json_output` is provided
- **Usage Accounting** - Tracks token usage and provides usage statistics
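As a minimal sketch of a direct call through this interface, assuming `autogen_core`'s standard message types and `CreateResult` return shape:

```python
import asyncio

from autogen_core.models import SystemMessage, UserMessage
from flotorch.autogen.llm import FlotorchAutogenLLM

llm = FlotorchAutogenLLM(
    model_id="your-model-id",
    api_key="your_api_key",
    base_url="https://gateway.flotorch.cloud",
)

async def main() -> None:
    # The client converts these AutoGen message objects to FloTorch's
    # wire format before calling the Gateway.
    result = await llm.create([
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Summarize what FloTorch Gateway does.", source="user"),
    ])
    print(result.content)  # completion text
    print(result.usage)    # token accounting (prompt + completion)

asyncio.run(main())
```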
### Response Processing

Provides comprehensive response handling (a result-handling sketch follows the list):
- **Content Extraction** - Extracts text content from model responses
- **Function Calls** - Processes function calls and tool invocations
- **Finish Reasons** - Handles the various completion states (`stop`, `length`, `tool_calls`)
- **Streaming Support** - Supports streaming responses via the `create_stream` method
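A caller can branch on the shape of the returned content; this sketch assumes the `CreateResult` and `FunctionCall` types from `autogen_core`:

```python
from autogen_core import FunctionCall
from autogen_core.models import CreateResult

def handle_result(result: CreateResult) -> None:
    if isinstance(result.content, str):
        # Plain text completion (e.g. finish reason "stop" or "length")
        print(result.content)
    else:
        # A list of FunctionCall objects for the agent runtime to execute
        for call in result.content:
            assert isinstance(call, FunctionCall)
            print(f"tool requested: {call.name}({call.arguments})")
    print(f"finish_reason={result.finish_reason}, usage={result.usage}")
```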
### Gateway Integration

Seamlessly integrates with FloTorch Gateway (a retry sketch follows the list):
- **OpenAI-Compatible API** - Uses the FloTorch Gateway `/api/openai/v1/chat/completions` endpoint
- **Model Registry** - Works with models configured in the FloTorch Model Registry
- **Authentication** - Handles API key authentication automatically
- **Error Handling** - Provides robust error handling for network and API issues
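Since the plugin does not document a specific exception hierarchy, a production caller might wrap calls in a broad retry. The helper below is an illustrative sketch, not part of the FloTorch API:

```python
import asyncio

from autogen_core.models import UserMessage

async def ask_with_retry(llm, prompt: str, attempts: int = 3) -> str:
    # Illustrative retry wrapper: catches broadly because no specific
    # exception types are documented, and backs off exponentially.
    for attempt in range(attempts):
        try:
            result = await llm.create([UserMessage(content=prompt, source="user")])
            return result.content if isinstance(result.content, str) else str(result.content)
        except Exception:  # network or API error surfaced by the client
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(2 ** attempt)
    raise RuntimeError("unreachable")
```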
### Structured Output Support

Enables structured output generation (see the sketch after the list):
- **JSON Output** - Supports structured JSON output when `json_output` is provided
- **Automatic Activation** - Automatically enables structured output when tools are absent or tool results are present
- **Schema Validation** - Validates output against provided schemas
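One way to exercise this, assuming a recent AutoGen version in which `json_output` accepts a Pydantic model class; the `WeatherReport` schema is purely illustrative:

```python
import asyncio

from autogen_core.models import UserMessage
from flotorch.autogen.llm import FlotorchAutogenLLM
from pydantic import BaseModel

class WeatherReport(BaseModel):
    # Illustrative schema -- any Pydantic model can serve here.
    location: str
    temperature_f: float
    conditions: str

llm = FlotorchAutogenLLM(
    model_id="your-model-id",
    api_key="your_api_key",
    base_url="https://gateway.flotorch.cloud",
)

async def main() -> None:
    result = await llm.create(
        [UserMessage(content="Report the weather in Paris as JSON.", source="user")],
        json_output=WeatherReport,  # recent AutoGen versions accept a Pydantic type here
    )
    report = WeatherReport.model_validate_json(result.content)
    print(report.location, report.temperature_f, report.conditions)

asyncio.run(main())
```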
## Usage Example

### Basic LLM Usage
```python
from flotorch.autogen.llm import FlotorchAutogenLLM

# Initialize FloTorch LLM
llm = FlotorchAutogenLLM(
    model_id="your-model-id",
    api_key="your_api_key",
    base_url="https://gateway.flotorch.cloud",
)

# Use with AutoGen agents
from autogen_agentchat.agents import AssistantAgent

agent = AssistantAgent(
    name="my_agent",  # AgentChat requires names to be valid Python identifiers
    model_client=llm,
    system_message="You are a helpful assistant.",
)
```
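To drive the agent, AgentChat's asynchronous `run()` entry point can be used; a brief sketch:

```python
import asyncio

async def main() -> None:
    # run() drives the agent to completion and returns a TaskResult;
    # the last message holds the assistant's reply.
    result = await agent.run(task="What can you help me with?")
    print(result.messages[-1].content)

asyncio.run(main())
```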
Section titled “LLM with Tools”from flotorch.autogen.llm import FlotorchAutogenLLM
# Initialize FloTorch LLMllm = FlotorchAutogenLLM( model_id="your-model-id", api_key="your_api_key", base_url="https://gateway.flotorch.cloud")
# Define toolsdef get_weather(location: str) -> str: """Get weather for a location.""" return f"Weather in {location}: Sunny, 72°F"
tools = [get_weather]
# Create agent with toolsagent = AssistantAgent( name="weather-agent", model_client=llm, tools=tools, system_message="You are a weather assistant.")Streaming Responses
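AgentChat wraps plain callables like `get_weather` in a `FunctionTool` automatically, deriving the JSON schema from the docstring and type hints. The explicit equivalent, sketched with `autogen_core`'s tool wrapper:

```python
from autogen_core.tools import FunctionTool

# Explicit wrapping: the docstring and type hints become the JSON schema
# the model sees when deciding whether to call the tool.
weather_tool = FunctionTool(get_weather, description="Get weather for a location.")

agent = AssistantAgent(
    name="weather_agent",
    model_client=llm,
    tools=[weather_tool],
    system_message="You are a weather assistant.",
)
```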
### Streaming Responses

```python
import asyncio

from autogen_core.models import UserMessage
from flotorch.autogen.llm import FlotorchAutogenLLM

# Initialize FloTorch LLM
llm = FlotorchAutogenLLM(
    model_id="your-model-id",
    api_key="your_api_key",
    base_url="https://gateway.flotorch.cloud",
)

# Stream responses
async def main() -> None:
    messages = [UserMessage(content="Tell me a story", source="user")]
    async for chunk in llm.create_stream(messages):
        if isinstance(chunk, str):
            print(chunk, end="", flush=True)
        else:
            # Final CreateResult
            print(f"\nUsage: {chunk.usage}")

asyncio.run(main())
```

## Best Practices
- **Environment Variables** - Use environment variables for credentials to enhance security
- **Model Selection** - Choose models appropriate to your task requirements and performance needs
- **Error Handling** - Implement proper error handling for production environments
- **Tool Integration** - Define tools with clear descriptions and proper error handling
- **Structured Output** - Use structured output when you need predictable response formats
- **Streaming** - Use streaming for long-running conversations to improve user experience
## Next Steps

- **Agent Configuration** - Learn how to integrate LLMs with agents
- **Memory Integration** - Add memory capabilities to your LLM-powered agents
- **Session Management** - Implement persistent conversations