Resuming Experiments / Continuing Interrupted Training
Resume interrupted or completed SwanLab experiments.
If an experiment was interrupted or has already finished and you need to add more data to it, you can resume it by passing the resume and id parameters to swanlab.init. The experiment then returns to the "in progress" state.
Use Cases
- Continuing Interrupted Training: The previous training process was interrupted. When resuming training from a checkpoint, you want the experiment charts to continue from the original SwanLab experiment rather than creating a new one.
- Supplementing Charts: Training and evaluation are split into two processes, but you want both recorded in the same SwanLab experiment.
- Updating Hyperparameters: Some parameters in the config were incorrect, and you want to update them.
Basic Usage
Resuming an experiment primarily relies on two parameters, resume and id:
swanlab.init(
    project="<project>",
    workspace="<workspace>",
    resume=True,
    id="<exp_id>",  # The ID must be a 21-character string
)
The resume parameter controls how an experiment is resumed and accepts the following values:
- must: If an experiment with the corresponding ID exists in the project, it will be resumed; otherwise, an error will be raised.
- allow: If an experiment with the corresponding ID exists in the project, it will be resumed; otherwise, a new experiment will be created.
- never: Passing an id parameter will raise an error; otherwise, a new experiment will be created (i.e., the same as not enabling resume).
- True: Equivalent to allow.
- False: Equivalent to never.
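The rules above can be summarized as a small decision function. This is only an illustrative sketch, not SwanLab's implementation: the function name resolve_resume, the existing_ids argument, and the use of ValueError are all assumptions made for the example. The source also does not say what must does when no id is given; the sketch treats that case as "not found".

```python
def resolve_resume(resume, exp_id, existing_ids):
    """Return "resume" or "new", or raise ValueError, following the listed rules.

    Hypothetical helper for illustration only; not part of the SwanLab API.
    """
    # Normalize the boolean shorthands to their string equivalents.
    if resume is True:
        mode = "allow"
    elif resume is False:
        mode = "never"
    else:
        mode = resume

    if mode not in ("must", "allow", "never"):
        raise ValueError(f"unsupported resume value: {resume!r}")

    if mode == "never":
        # Passing an id while resume is disabled is an error.
        if exp_id is not None:
            raise ValueError("id must not be passed when resume is 'never'")
        return "new"

    if exp_id is not None and exp_id in existing_ids:
        return "resume"

    if mode == "must":
        # Assumption: a missing id is treated the same as an unknown id.
        raise ValueError("no experiment with this id exists in the project")

    # mode == "allow": fall back to creating a new experiment.
    return "new"
```

For example, resolve_resume(True, "x", {"x"}) selects the resume path, while resolve_resume("allow", "y", {"x"}) falls back to creating a new experiment.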
The experiment ID is the unique identifier for an experiment. It is a 21-character string and can be found in the "Environment" tab of the experiment.
Alternatively, you can open an experiment and locate the <exp_id> segment in its URL:
https://swanlab.cn/@<username>/<project>/runs/<exp_id>/...
Here, <exp_id> is the experiment ID.
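If you copy the experiment URL from the browser, the ID can be pulled out programmatically. The following is a minimal sketch that assumes the URL layout shown above; extract_exp_id is a hypothetical helper, not a SwanLab function, and the URL in the usage note is made up.

```python
from urllib.parse import urlparse


def extract_exp_id(url):
    """Extract <exp_id> from a URL shaped like .../@<username>/<project>/runs/<exp_id>/..."""
    # Split the path into non-empty segments:
    # ["@<username>", "<project>", "runs", "<exp_id>", ...]
    parts = [p for p in urlparse(url).path.split("/") if p]

    if len(parts) < 4 or not parts[0].startswith("@") or parts[2] != "runs":
        raise ValueError(f"unexpected experiment URL layout: {url!r}")

    exp_id = parts[3]
    # The documentation states that the ID is a 21-character string.
    if len(exp_id) != 21:
        raise ValueError(f"experiment ID should be 21 characters, got {len(exp_id)}")
    return exp_id
```

Calling extract_exp_id("https://swanlab.cn/@alice/demo/runs/abcdefghijklmnopqrstu/chart") returns the fourth path segment, which can then be passed as the id argument of swanlab.init.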
Example
import swanlab

# Start a new experiment and log initial metrics
run = swanlab.init(project="resume_test")
swanlab.log({"loss": 2, "acc": 0.4})

# Complete the experiment
run.finish()

# Resume the experiment by its ID and log additional metrics
run = swanlab.init(project="resume_test", resume=True, id=run.id)
swanlab.log({"loss": 0.2, "acc": 0.9})
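In the example above, the ID is still available in memory because both runs happen in one process. When training is restarted from a checkpoint in a new process, the ID has to be stored somewhere first. One possible approach, sketched below, is to write it to a small file next to the checkpoints. The helper names (load_run_id, save_run_id, init_resumable) and the file path are assumptions made for this illustration; only swanlab.init, its resume and id parameters, and run.id come from the documentation above.

```python
from pathlib import Path


def load_run_id(path):
    """Return the stored experiment ID, or None if no ID has been saved yet."""
    p = Path(path)
    if not p.exists():
        return None
    text = p.read_text(encoding="utf-8").strip()
    return text or None


def save_run_id(path, run_id):
    """Persist the experiment ID so that a later process can resume it."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(run_id, encoding="utf-8")


def init_resumable(project, id_file="checkpoints/swanlab_run_id.txt"):
    """Resume the stored experiment if there is one, otherwise start a new one."""
    import swanlab  # imported lazily so the file helpers work without SwanLab installed

    run_id = load_run_id(id_file)
    if run_id is None:
        # First launch: no stored ID, so create a new experiment.
        run = swanlab.init(project=project)
    else:
        # Restart: require that the stored experiment exists.
        run = swanlab.init(project=project, resume="must", id=run_id)
    save_run_id(id_file, run.id)
    return run
```

Using resume="must" on restart makes a stale or mistyped ID fail loudly instead of silently creating a second experiment; resume=True would be the more lenient choice.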