Custom Chat-Agent Environment Setup

This guide provides instructions for setting up a custom environment. We also provide two examples of custom environments: Education (basic) and Airline (advanced).

If you have a complex environment and need help with the integration, we offer integration assistance as a service. Feel free to reach out to us via Plurai.ai for more information.

Create a new config_env.yml file and define the chatbot environment variables in it:

environment:
    prompt_path:  # Path to prompt
    tools_file: # Optional! Path to a Python script that includes all the tool functions
    database_folder: # Optional! Path to the database folder
    database_validators: # Optional! Path to the file with the validator functions

Once all the variables are defined, you can run the simulator on the new custom environment:

python run.py \
    --output_path PATH     # Required: Directory where output files will be saved
    --config_path PATH     # Optional: Path to config_env.yml (default: ./config_default.yml)
    --dataset NAME         # Optional: Dataset name to use (default: 'latest')
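
Before launching the simulator, the paths configured in config_env.yml can be sanity-checked. The following is a minimal sketch, not part of the simulator: check_environment is a hypothetical helper, it expects the environment section already parsed into a dictionary, and it assumes prompt_path is required because it is the only key not marked optional above.

```python
from pathlib import Path

# Keys taken from the environment section above. Treating prompt_path as
# required is an assumption: it is the only key not marked optional.
REQUIRED_KEYS = ("prompt_path",)
OPTIONAL_KEYS = ("tools_file", "database_folder", "database_validators")


def check_environment(env: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # Required keys must be present and non-empty.
    for key in REQUIRED_KEYS:
        if not env.get(key):
            problems.append(f"{key} is required but not set")
    # Any key that is set must point to an existing file or folder.
    for key in REQUIRED_KEYS + OPTIONAL_KEYS:
        value = env.get(key)
        if value and not Path(value).exists():
            problems.append(f"{key} points to a missing path: {value}")
    return problems
```

An empty return value means every configured path exists. Optional keys that are left unset are skipped rather than reported.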

Environment Variables

prompt_path

This variable specifies the path to a prompt file. The file should be a plain text file (.txt) or a Markdown file (.md) containing the desired prompt. The prompt doesn't have to be the exact chatbot system prompt; it can also be a document that describes the list of policies that should be tested. In that case, the chatbot system prompt should also be provided (see tool chatbot modification).

Example of prompt file: For a complete example, see the airline chat-agent system prompt.

database_folder

This variable specifies the path to a folder containing CSV files. Each CSV file represents a database table used by the system and must include at least one row as an example. It is recommended to provide meaningful and indicative names for the columns in each CSV file.

Example of database_folder: For a complete example of a database folder, see the airline chat-agent database scheme folder.

The folder should contain CSV files that define your database tables. Here's an example structure from an airline booking system:

flights.csv

flight_number	origin	destination	scheduled_departure_time_est	scheduled_arrival_time_est	dates
HAT001	PHL	LGA	06:00:00	07:00:00	{"2024-05-16": {"status": "available", "available_seats": {"basic_economy": 16, "economy": 10, "business": 13}, "prices": {"basic_economy": 87, "economy": 122, "business": 471}}}

reservations.csv

reservation_id	user_id	origin	destination	flight_type	cabin	flights	passengers	payment_history	created_at	total_baggages	nonfree_baggages	insurance
4WQ150	chen_jackson_3290	DFW	LAX	round_trip	business	[{"origin": "DFW", "destination": "LAX", "flight_number": "HAT170", "date": "2024-05-22"}]	[{"first_name": "Chen", "last_name": "Jackson", "dob": "1956-07-07"}]	[{"payment_id": "gift_card_3576581", "amount": 4986}]	2024-05-02 03:10:19	5	0	no

users.csv

user_id	name	address	email	dob	payment_methods	saved_passengers	membership	reservations
mia_li_3668	{"first_name": "Mia", "last_name": "Li"}	{"address1": "975 Sunset Drive", "city": "Austin", "country": "USA"}	mia.li@example.com	1990-04-05	{"credit_card_4421486": {"source": "credit_card", "last_four": "7447"}}	[]	gold	["NO6JO3"]
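
The requirement that every CSV file in the database folder include at least one example row can be checked up front. The sketch below uses only the standard library. check_database_folder is a hypothetical helper, not part of the simulator, and it assumes each table is named after its CSV file.

```python
import csv
from pathlib import Path


def check_database_folder(folder: str) -> dict[str, int]:
    """Map each table name (CSV file stem) to its number of data rows.

    Raises ValueError if a CSV file has no header or no example row.
    """
    row_counts = {}
    for csv_path in sorted(Path(folder).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            # Skip blank lines, which csv.reader yields as empty lists.
            rows = [row for row in csv.reader(handle) if row]
        if not rows:
            raise ValueError(f"{csv_path.name} is empty (no header row)")
        if len(rows) < 2:
            raise ValueError(f"{csv_path.name} needs at least one example row")
        row_counts[csv_path.stem] = len(rows) - 1  # exclude the header
    return row_counts
```

Calling check_database_folder on the database folder returns the row count per table, or raises on the first file that has only a header.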

tools_file (optional)

This variable specifies the path to a Python script containing all the agent tool functions.

The tool functions must be implemented using one of the following approaches:
- Using LangChain's @tool decorator: LangChain Tool Decorator Guide
- Using LangChain's StructuredTool: LangChain StructuredTool Guide

If the tool needs to access the database, add a parameter named 'data' to the function and annotate it with the InjectedState class (provided by LangGraph), as follows:

def tool_function(data: Annotated[dict, InjectedState("dataset")]):

The data variable will contain a dictionary of DataFrames keyed by table name, where each table name is taken from the corresponding CSV file name in the database folder.

Optionally, you can define a tool schema by creating a variable named <function_name>_schema. If no schema variable is provided, the system will infer the schema automatically.

Example of a valid tools_file:
See airline chat-agent tools python script for reference.

database_validators (optional)

Data validators are functions that guide the database generation pipeline, ensuring data integrity and consistency. They are particularly important when the same information is duplicated across different tables, since they let the pipeline check that the copies agree.

The database_validators variable specifies the path to a Python script that contains the data validation functions.

To define a validation function, use the @validator decorator and specify the table to which the function applies.

Example Validator Function:

from simulator.utils.file_reading import validator

@validator(table='users')
def user_id_validator(new_df, dataset):
    # Nothing to compare against until the users table exists.
    if 'users' not in dataset:
        return new_df, dataset
    # Compare against the user_id column only, not every cell in the table.
    existing_ids = set(dataset['users']['user_id'])
    for user_id in new_df['user_id']:
        if user_id in existing_ids:
            error_message = f"User id {user_id} already exists in the users data. You should choose a different user id."
            raise ValueError(error_message)
    return new_df, dataset

The @validator decorator requires the table name as an argument.
The validator function is applied before new data is inserted into the database.

For a complete example of validators in action, see the airline booking system validators at airline chat-agent database validators python script. This example includes validators for:
- User ID validation (preventing duplicate users)
- Flight ID validation (ensuring unique flight numbers)
- Flight validation (verifying flight details in reservations)
- User validation (maintaining consistency between reservations and user data)
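
The cross-table checks listed above, such as keeping reservations consistent with user data, come down to finding ids that one table references but another does not contain. The sketch below isolates that step in plain Python. find_unknown_ids is a hypothetical helper and is not taken from the airline validators script.

```python
from collections.abc import Hashable, Iterable


def find_unknown_ids(new_ids: Iterable[Hashable],
                     known_ids: Iterable[Hashable]) -> list:
    """Return the ids in new_ids that are absent from known_ids.

    Each unknown id is reported once, in order of first appearance.
    """
    known = set(known_ids)
    missing = []
    for value in new_ids:
        if value not in known and value not in missing:
            missing.append(value)
    return missing
```

Inside a validator registered for the reservations table, one possible use is to call find_unknown_ids(new_df['user_id'], dataset['users']['user_id']) and raise a ValueError naming the unknown ids when the result is non-empty. This assumes both tables carry a user_id column, as in the example CSV files.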