# Checkpoints, Cost Saving, and Monitoring
The IntellAgent system is optimized for cost efficiency and provides multiple levels of checkpoints to reduce expenses.
This document outlines the different types of checkpoints and their locations within the pipeline.
## Policies Graph Checkpoint
When initializing the `SimulatorExecutor`, the system searches for the `DescriptionGenerator` checkpoint at:

```
<args.output_path>/policies_graph/descriptions_generator.pickle
```

If the checkpoint exists, the `descriptions_generator` (which contains the policies graph) is loaded. If it does not exist, the system generates the graph. During execution, the system automatically saves the `descriptions_generator` to this path, so it will be loaded by default in subsequent runs unless the `output_path` is changed.
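For illustration, this load-or-generate pattern behaves roughly like the following sketch. The helper name and the `generate_fn` parameter are hypothetical, not part of the IntellAgent API:

```python
import os
import pickle

def load_or_generate_descriptions_generator(output_path, generate_fn):
    """Load the policies-graph checkpoint if it exists; otherwise build and save it."""
    checkpoint = os.path.join(output_path, "policies_graph", "descriptions_generator.pickle")
    if os.path.exists(checkpoint):
        with open(checkpoint, "rb") as f:
            return pickle.load(f)  # reuse the previously built policies graph
    descriptions_generator = generate_fn()  # expensive step: builds the policies graph
    os.makedirs(os.path.dirname(checkpoint), exist_ok=True)
    with open(checkpoint, "wb") as f:
        pickle.dump(descriptions_generator, f)  # checkpoint for subsequent runs
    return descriptions_generator
```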
## Dataset Events Checkpoints
Generating dataset events is a computationally expensive process, so it is recommended to generate the dataset once and reuse it across all experiments to ensure consistency.
You can specify the dataset name using the `--dataset` argument. The dataset will be saved at the following location:

```
<args.output_path>/datasets/<dataset_name>.pickle
```
After every mini-batch of size `mini_batch_size` (as defined in the configuration file), all events generated up to that point are saved as a checkpoint. This allows you to resume dataset generation from the last mini-batch if the run is interrupted.
By default, the `--dataset` argument is set to `latest`, which automatically loads the most recently generated dataset.
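The checkpoint-and-resume loop can be sketched as follows; `generate_batch` and the surrounding names are illustrative assumptions, not the actual IntellAgent implementation:

```python
import os
import pickle

def generate_events_with_checkpoints(generate_batch, num_batches, dataset_path, mini_batch_size):
    """Generate events mini-batch by mini-batch, checkpointing after each one."""
    # Resume from the last checkpoint if a partial dataset already exists.
    if os.path.exists(dataset_path):
        with open(dataset_path, "rb") as f:
            events = pickle.load(f)
    else:
        events = []
    start = len(events) // mini_batch_size  # skip mini-batches that are already done
    for _ in range(start, num_batches):
        events.extend(generate_batch(mini_batch_size))  # expensive generation step
        with open(dataset_path, "wb") as f:
            pickle.dump(events, f)  # save all events generated up to this point
    return events
```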
Additionally, you can set a `cost_limit` (in dollars) by defining the `cost_limit` variable in the configuration file. Note that this feature may not be supported by all models.
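A limit of this kind is typically enforced by accumulating per-call API spend and aborting once the threshold is crossed, which is why it depends on the model provider reporting call costs. A rough sketch, with illustrative names:

```python
class CostLimitExceeded(Exception):
    """Raised when accumulated LLM spend crosses the configured limit."""

class CostTracker:
    def __init__(self, cost_limit_dollars):
        self.cost_limit = cost_limit_dollars
        self.total_cost = 0.0

    def add(self, call_cost_dollars):
        # The per-call cost must be reported by the model provider;
        # models that do not expose it cannot support a cost limit.
        self.total_cost += call_cost_dollars
        if self.total_cost > self.cost_limit:
            raise CostLimitExceeded(
                f"Spent ${self.total_cost:.2f}, exceeding the ${self.cost_limit:.2f} limit"
            )
```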
## Experiment Checkpoint
Running the simulator on all events can be both costly and time-consuming.
Note that, unlike the previous checkpoints, a new experiment is created by default on each run.
The experiment name can be specified using the `--experiment` argument. If not provided, it defaults to `exp_{i}`, where `i` is a sequential number.
All experiment dumps will be saved in the following folder:

```
<args.output_path>/experiments/<dataset_name>__<experiment_name>
```
The simulator results are saved after every mini-batch of size `mini_batch_size` (as defined in the configuration file).
If the run is interrupted and you want to resume it, set the `--experiment` argument to the existing experiment name.
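To make the folder and naming conventions concrete, here is an illustrative sketch of how the experiment directory could be resolved, including the default `exp_{i}` numbering; this is an assumption for illustration, not the actual IntellAgent code:

```python
import os

def resolve_experiment_dir(output_path, dataset_name, experiment_name=None):
    """Return <output_path>/experiments/<dataset_name>__<experiment_name>,
    picking the next sequential exp_{i} name when none is given."""
    experiments_root = os.path.join(output_path, "experiments")
    os.makedirs(experiments_root, exist_ok=True)
    if experiment_name is None:
        i = 0
        while os.path.exists(os.path.join(experiments_root, f"{dataset_name}__exp_{i}")):
            i += 1  # the first unused number starts a fresh experiment
        experiment_name = f"exp_{i}"
    # Passing an existing experiment name resumes from its saved results.
    return os.path.join(experiments_root, f"{dataset_name}__{experiment_name}")
```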
As with dataset generation, you can define a `cost_limit` (in dollars) for the experiment by setting the `cost_limit` variable in the configuration file. Note that this feature may not be supported by all models.