Update & Rollback
This page describes the version update and rollback procedures for a SwanLab Kubernetes deployment.
1. Pre-Update Preparation
1.1 Understand Version Information
Before updating, it is recommended to first check the current deployment version information and available update versions.
View current release and history versions:
helm list -n <your_namespace>
helm history swanlab-self-hosted -n <your_namespace>
Where:
- swanlab-self-hosted is the default release name; replace it with the release name used in your cluster as needed;
- REVISION is the rollback index, which can be used to initiate a rollback at any time if the update fails or other compatibility issues arise.
1.2 Sync Remote Repository
# Add Helm repository (skip if already added)
helm repo add swanlab https://helm.swanlab.cn
# Update repository index
helm repo update
# List all available versions
helm search repo swanlab/self-hosted --versions
1.3 Pre-Update Checklist
WARNING
Before updating, please ensure you have completed the following:
- PVC and Snapshot Policy Configured: Confirm that the PVCs for storage resource services have been successfully created, and that corresponding snapshot policies have been configured to ensure data security.
- Confirm Image Repository Accessibility: Ensure the cluster can access repo.swanlab.cn to pull images normally (otherwise you need to pull the known images and push them to a private repository). For details, see SwanLab Self-Hosted Resource Inventory.
- Check Image Tag Configuration: In your values.yaml, ensure the image tags for the following applications are empty strings or specified version tags, not latest:
  - swanlab-cloud
  - swanlab-next
  - swanlab-house
  - swanlab-server
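The image-tag check in the last item can be automated before an upgrade. The following is a minimal sketch, assuming the values file has already been parsed into a Python dict (for example with a YAML library) and that each application keeps its tag under an `image.tag` key, as in the example on this page. The function name `find_latest_tags` and the sample layout are hypothetical; adapt them to the actual structure of your values.yaml.

```python
from typing import Any


def find_latest_tags(values: dict[str, Any], path: str = "") -> list[str]:
    """Return dotted paths of every `image.tag` that is set to "latest".

    Assumption: tags live under an `image` mapping with a `tag` key,
    as in the values.yaml example on this page.
    """
    offenders: list[str] = []
    for key, value in values.items():
        current = f"{path}.{key}" if path else key
        if not isinstance(value, dict):
            continue
        if key == "image" and value.get("tag") == "latest":
            offenders.append(f"{current}.tag")
        # Recurse so nested per-application sections are also checked.
        offenders.extend(find_latest_tags(value, current))
    return offenders


# Hypothetical parsed values: one pinned tag, one empty tag, one bad tag.
example_values = {
    "swanlab-cloud": {"image": {"tag": "v2.8.0"}},
    "swanlab-next": {"image": {"tag": ""}},
    "swanlab-house": {"image": {"tag": "latest"}},
}
print(find_latest_tags(example_values))
```

An empty result means no `latest` tag was found under the assumed layout; any returned path should be changed to an empty string or a version tag before updating.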
# values.yaml example
image:
  repository: repo.swanlab.cn/self-hosted/swanlab-cloud
  # tag set to empty string or version tag, e.g., v2.8.0, do not set to latest
  tag: ""
  pullPolicy: "IfNotPresent"
2. Execute Update
You can choose one of the following update methods based on your cluster's network environment.
Option 1: Helm Repository Update
If your cluster nodes can directly access the Helm repository (i.e., cluster nodes can directly access github.com), you can execute the update with the following commands:
⚠️ Note: The chart packages at https://helm.swanlab.cn are indexed by version tags in GitHub Release. Please confirm network connectivity in advance!
# It is recommended to use --dry-run first to verify template compatibility
helm upgrade swanlab-self-hosted swanlab/self-hosted \
  --version <target_version> \
  -f <your_own_values.yaml> \
  --namespace <your_namespace> \
  --dry-run
- After confirming there are no errors, remove the --dry-run option to execute the update.
Option 2: Local Chart Package Update
If your cluster nodes cannot directly access the Helm repository (i.e., cluster nodes cannot directly access github.com), you can pull the chart package to your local machine from the OCI registry and then execute the update:
If you encounter a 401 authentication failure, you can clear any existing helm login state on your machine with helm registry logout xxx.com.
# Pull chart package to local
helm pull oci://swanlab-registry.cn-hangzhou.cr.aliyuncs.com/chart/self-hosted --version <target_version>
# Extract chart package, expecting only one self-hosted/ folder
tar -zxvf self-hosted-<target_version>.tgz
Then use the local chart package to verify:
# It is recommended to use --dry-run first to verify template compatibility
helm upgrade swanlab-self-hosted ./self-hosted/ \
  -f <your_own_values.yaml> \
  --namespace <your_namespace> \
  --dry-run
- After confirming there are no errors, remove the --dry-run option to execute the update.
Note: --dry-run is used to verify the compatibility of the update template. It is recommended to run this template verification before each update.
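The "dry run first, then update" sequence used by both options can be scripted so that the two invocations never drift apart. The following is a minimal sketch that only assembles the argument list; the helper name `build_upgrade_cmd`, the values file name, and the namespace `swanlab` are placeholders, not part of the chart.

```python
from typing import Optional


def build_upgrade_cmd(
    release: str,
    chart: str,
    values_file: str,
    namespace: str,
    version: Optional[str] = None,
    dry_run: bool = True,
) -> list[str]:
    """Assemble the `helm upgrade` argument list used on this page."""
    cmd = ["helm", "upgrade", release, chart,
           "-f", values_file, "--namespace", namespace]
    if version:
        # Only needed when the chart is pulled from a repository (Option 1).
        cmd += ["--version", version]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


# Placeholder arguments; substitute your own values file, namespace and version.
args = dict(release="swanlab-self-hosted", chart="swanlab/self-hosted",
            values_file="values.yaml", namespace="swanlab",
            version="<target_version>")
print(" ".join(build_upgrade_cmd(**args)))                 # dry run first
print(" ".join(build_upgrade_cmd(**args, dry_run=False)))  # then the real update
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, so that a failing dry run raises an exception and the real update is never attempted.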
3. Update Verification
After the update is complete, please follow the steps below to verify that the service is running normally.
3.1 Release Version Verification
Confirm the release version has been updated:
helm list -n <your_namespace>
Ensure that the release STATUS is deployed, and that the Services and Pods in the cluster are in a normal state.
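The release status check can also be done programmatically. The following is a minimal sketch, assuming the listing is obtained with `helm list -n <your_namespace> -o json` and that each entry carries `name` and `status` fields; the function name `release_is_deployed` and the sample data are illustrative only.

```python
import json


def release_is_deployed(helm_list_json: str, release: str) -> bool:
    """Return True if `release` appears in the listing with status "deployed"."""
    for entry in json.loads(helm_list_json):
        if entry.get("name") == release:
            return entry.get("status") == "deployed"
    return False  # release not found in this namespace


# Hand-written sample in the shape of `helm list -o json`; values are illustrative.
sample = json.dumps([
    {"name": "swanlab-self-hosted", "namespace": "swanlab", "revision": "3",
     "status": "deployed", "chart": "self-hosted-<target_version>"},
])
print(release_is_deployed(sample, "swanlab-self-hosted"))
```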
3.2 Pod Health Status Check
Ensure all Pods are running normally:
kubectl get pods -n <your_namespace>
All Pods should be in Running or Completed status, and there should be no abnormal statuses such as CrashLoopBackOff or Error.
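For clusters with many Pods, the same check can be applied to the JSON form of the listing. The following is a minimal sketch, assuming the input is the parsed output of `kubectl get pods -n <your_namespace> -o json`; the function name `unhealthy_pods` and the sample Pod names are hypothetical. Note that a Pod shown as Completed reports the phase `Succeeded`, and a Pod in CrashLoopBackOff usually still reports the phase `Running`, so container states are inspected as well.

```python
from typing import Any

HEALTHY_PHASES = {"Running", "Succeeded"}  # "Completed" Pods report Succeeded


def unhealthy_pods(pod_list: dict[str, Any]) -> list[str]:
    """Return one description per Pod that is not healthy."""
    problems: list[str] = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        if phase not in HEALTHY_PHASES:
            problems.append(f"{name}: phase={phase}")
            continue
        # A container stuck in a waiting state (e.g. CrashLoopBackOff)
        # is reported even though the Pod phase may still be Running.
        for cs in status.get("containerStatuses", []):
            waiting = cs.get("state", {}).get("waiting")
            if waiting:
                reason = waiting.get("reason")
                problems.append(f"{name}: {cs['name']} waiting ({reason})")
    return problems


# Hand-written sample with hypothetical Pod names.
sample_pods = {"items": [
    {"metadata": {"name": "pod-a"},
     "status": {"phase": "Running",
                "containerStatuses": [{"name": "app", "state": {"running": {}}}]}},
    {"metadata": {"name": "pod-b"},
     "status": {"phase": "Running",
                "containerStatuses": [{"name": "app", "state": {
                    "waiting": {"reason": "CrashLoopBackOff"}}}]}},
    {"metadata": {"name": "pod-c"}, "status": {"phase": "Succeeded"}},
]}
print(unhealthy_pods(sample_pods))
```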
3.3 Metrics Reporting Test
- Page Access: Confirm that the frontend page can be accessed normally and metrics can be downloaded normally
- Python SDK: Confirm that the SDK connection is normal and experiments can be uploaded normally
import random

import numpy as np
import swanlab

swanlab.login(
    api_key="xxxxx",  # Valid api_key under your private swanlab service
    host="xxxxxx"  # Your private swanlab service domain
)

# Create a SwanLab project
swanlab.init(
    # Set project name
    project="my-first-project",
    experiment_name="my-first-experiment",
    # Set hyperparameters
    config={
        "learning_rate": 0.02,
        "architecture": "CNN",
        "dataset": "CIFAR-100",
        "epochs": 10
    }
)

# Simulate a training session
epochs = 10
offset = random.random() / 5
for epoch in range(2, epochs):
    acc = 1 - 2 ** -epoch - random.random() / epoch - offset
    loss = 2 ** -epoch + random.random() / epoch + offset
    swanlab.log({
        "step_time": acc,
        "speed": loss
    })

# Generate random noise image (64x64 RGB, random pixel values in 0..255)
random_noise = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
img = swanlab.Image(random_noise, caption="Random Noise")
swanlab.log({
    "image": img
})

# [Optional] Finish training, this is necessary in notebook environments
swanlab.finish()
4. Rollback
If the service fails to start after an update (e.g., CrashLoopBackOff), or there are other compatibility issues, roll back immediately with the following commands:
4.1 Rollback to Previous Version
helm rollback swanlab-self-hosted -n <your_namespace>
4.2 Rollback to Specified Version
First view the history versions to determine the REVISION number to rollback to:
helm history swanlab-self-hosted -n <your_namespace>
Then execute the rollback:
helm rollback swanlab-self-hosted <revision_number> -n <your_namespace>
After the rollback is complete, please refer to the Update Verification steps to confirm the service status again.
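Choosing the REVISION to roll back to can be sketched in code. The following is a heuristic, not an official procedure: it assumes the history is obtained with `helm history swanlab-self-hosted -n <your_namespace> -o json`, that each entry has `revision` and `status` fields, and that the most recent earlier revision marked `deployed` or `superseded` is a reasonable target. The function name `rollback_target` and the sample history are hypothetical.

```python
import json
from typing import Optional

# Statuses of revisions that were installed successfully at some point.
GOOD_STATUSES = {"deployed", "superseded"}


def rollback_target(helm_history_json: str) -> Optional[int]:
    """Pick the newest good revision that precedes the latest one."""
    entries = sorted(json.loads(helm_history_json),
                     key=lambda e: int(e["revision"]))
    previous = entries[:-1]  # everything before the newest revision
    good = [int(e["revision"]) for e in previous
            if e.get("status") in GOOD_STATUSES]
    return good[-1] if good else None


# Hand-written sample: revision 3 is a failed upgrade on top of revision 2.
sample_history = json.dumps([
    {"revision": 1, "status": "superseded"},
    {"revision": 2, "status": "deployed"},
    {"revision": 3, "status": "failed"},
])
print(rollback_target(sample_history))
```

The returned number can be used as `<revision_number>` in the rollback command above; a result of `None` means no earlier good revision was found and the history should be inspected manually.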