Configuration System
The project's configuration is managed by a system that combines pydantic for data validation with hydra for loading from YAML files. Together they provide validated, composable definitions for agents and experiments.
All configurations are stored as .yaml files inside the /configs directory.
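The directory is organized by configuration type. As a sketch of the layout implied by the loader calls below (other subdirectories may exist):

```
configs/
├── agents/
│   └── my_agent.yaml
└── eval/
    └── my_eval.yaml
```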
ConfigLoader
The ConfigLoader is the main entry point for loading configurations from YAML files into pydantic models. It abstracts away the file paths and loading logic.
Usage:

```python
from utu.config import ConfigLoader

# Load an agent configuration from /configs/agents/my_agent.yaml
agent_config = ConfigLoader.load_agent_config("my_agent")

# Load an evaluation configuration from /configs/eval/my_eval.yaml
eval_config = ConfigLoader.load_eval_config("my_eval")
```
AgentConfig
AgentConfig is the central data structure for defining an agent. It specifies everything the agent needs to operate, including its model, tools, and personality.
Key Components
- type: The agent's architecture. Can be:
  - simple: A single agent that performs a task.
  - orchestra: A more complex, multi-agent system with a planner and workers.
- model: (ModelConfigs) Defines the primary LLM the agent will use, including the API provider, model name, and settings such as temperature.
- agent: (ProfileConfig) Defines the agent's profile, such as its name and system-level instructions (e.g., "You are a helpful assistant.").
- env: (EnvConfig) Specifies the environment the agent operates in (e.g., shell_local or browser_docker). See Agent Environments for more details.
- toolkits: (dict[str, ToolkitConfig]) A dictionary defining the tools available to the agent. Each toolkit can be loaded in builtin mode (running in the main process) or mcp mode (running as a separate process).
- max_turns: The maximum number of conversational turns the agent can take before stopping.
For the orchestra type, AgentConfig also includes fields for defining the planner, workers, and reporter agents.
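To make the fields above concrete, here is a minimal sketch of what an agent YAML might look like. The top-level keys mirror the components listed above; the exact nesting and the specific values shown are illustrative assumptions, not a verbatim schema:

```yaml
# configs/agents/my_agent.yaml (hypothetical example)
type: simple

model:
  # API provider, model name, and sampling settings (exact keys assumed)
  provider: openai
  model: gpt-4o
  temperature: 0.7

agent:
  name: my_agent
  instructions: "You are a helpful assistant."

env:
  name: shell_local

toolkits:
  search:
    mode: builtin  # or "mcp" to run the toolkit as a separate process

max_turns: 20
```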
EvalConfig
EvalConfig defines a complete evaluation experiment. It specifies the dataset to use, the agent to test, and how to judge the results.
Key Components for Evaluation
- data: (DataConfig) Defines the dataset to be used for the evaluation, including its name/path and the relevant fields (question_field, gt_field).
- rollout: Defines the execution phase of the evaluation.
  - agent: (AgentConfig) A full AgentConfig for the agent being tested.
  - concurrency: The number of parallel processes to use when running the agent on the dataset.
- judgement: Defines the judgment phase.
  - judge_model: (ModelConfigs) The configuration for the LLM that will act as the judge.
  - judge_concurrency: The number of parallel processes to use for judging the results.
  - eval_method: The method used for evaluation (e.g., comparing against a ground truth answer).
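As with AgentConfig, a small sketch helps tie these fields together. Again, the key names follow the components above, while the exact nesting and the sample values are illustrative assumptions:

```yaml
# configs/eval/my_eval.yaml (hypothetical example)
data:
  name: my_dataset
  question_field: question
  gt_field: answer

rollout:
  agent:
    # A full AgentConfig, as described in the previous section
    type: simple
    model:
      model: gpt-4o
  concurrency: 4

judgement:
  judge_model:
    model: gpt-4o
  judge_concurrency: 4
  eval_method: exact_match  # assumed value; e.g., compare against a ground truth answer
```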