# Quickstart
This guide will walk you through setting up the project, running your first agent, and executing evaluations.
## Installation & Setup

First, clone the repository and set up the Python environment.

```bash
# Clone the project repository
git clone https://github.com/TencentCloudADP/youtu-agent.git
cd youtu-agent

# We use `uv` to manage the virtual environment and dependencies
# Create the virtual environment
uv venv

# Activate the environment
source .venv/bin/activate

# Install all dependencies, including development tools
uv sync --group dev

# Create your environment configuration file from the example
cp .env.example .env
```
After creating the `.env` file, you must edit it to add your necessary API keys (e.g., `UTU_LLM_API_KEY`, `SERPER_API_KEY`, etc.).
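For illustration, the corresponding entries in `.env` might look like the sketch below; the values are placeholders, and the full set of supported variables is listed in `.env.example`.

```bash
# LLM provider credentials (placeholder value)
UTU_LLM_API_KEY=sk-xxxxxxxxxxxxxxxx

# API key for the Serper search toolkit (placeholder value)
SERPER_API_KEY=your-serper-api-key
```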
## Running an Agent

You can interact with agents directly from the command line using the `cli_chat.py` script.
### Simple Agent

Run a simple agent defined by a configuration file. For example, to run an agent with search capabilities:

```bash
# python scripts/cli_chat.py --help
python scripts/cli_chat.py --config_name simple_agents/search_agent.yaml --stream
```
### Orchestra Agent

Run a multi-agent (Plan-and-Execute) orchestra agent. For example, the SVG generator example:

```bash
python examples/svg_generator/main.py
```

You can also run a web UI for the agent:

```bash
python examples/svg_generator/main_web.py
```

See the frontend documentation for more details.
## Running Evaluations

The framework includes a powerful evaluation harness to benchmark agent performance.

### Run a Full Experiment

This command runs a complete evaluation, from agent rollout to judging:

```bash
python scripts/run_eval.py --config_name <your_eval_config> --exp_id <your_exp_id> --dataset WebWalkerQA --concurrency 5
```
### Re-judge Existing Results

If you have already run the rollout and only want to re-run the judgment phase, use this script:

```bash
python scripts/run_eval_judge.py --config_name <your_eval_config> --exp_id <your_exp_id> --dataset WebWalkerQA
```
### Dump Experiment Data

You can also dump the trajectories and results from the database for a specific experiment:

```bash
python scripts/db/dump_db.py --exp_id "<your_exp_id>"
```
## Advanced Setup

### Database Configuration

The evaluation framework uses a SQL database (defaulting to SQLite) to store datasets and experiment results. To use a different database (e.g., PostgreSQL), set the `DB_URL` environment variable:

```bash
DB_URL="postgresql://user:password@host:port/database"
```
### Tracing

We use Phoenix as our default tracing service for observing agent behavior. To enable it, set the following environment variables, as illustrated in the example after this list:

- `PHOENIX_ENDPOINT`
- `PHOENIX_BASE_URL`
- `PHOENIX_PROJECT_NAME`
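A minimal sketch of these settings in `.env`, assuming a locally running Phoenix instance; the endpoint, base URL, and project name below are placeholder values, not framework defaults:

```bash
# Phoenix tracing (placeholder values for a local Phoenix instance)
PHOENIX_ENDPOINT=http://localhost:6006/v1/traces
PHOENIX_BASE_URL=http://localhost:6006
PHOENIX_PROJECT_NAME=youtu-agent-quickstart
```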
The framework also supports any tracing service compatible with the `openai-agents` library. See the official list of tracing processors for more options.
## Customizing the Agent

### Create a config file

```yaml
# configs/agents/sample_tool.yaml
defaults:
  - /model/base
  - /tools/search@toolkits.search  # Loads the 'search' toolkit
  - _self_

agent:
  name: simple-tool-agent
  instructions: "You are a helpful assistant that can search the web."
```
### Write and run the Python script

```python
import asyncio

from utu.agents import SimpleAgent


async def main():
    # Build the agent from the config created above and ask it a question
    async with SimpleAgent(config="sample_tool.yaml") as agent:
        await agent.chat("What's the weather in Beijing today?")


asyncio.run(main())
```
## Next Steps

- Explore Examples: Check the `/examples` directory for more detailed use cases and advanced scripts.
- Dive into Evaluations: Learn more about how the evaluation framework works by reading the Evaluation Framework documentation.