Ascend NPU & MindSpore
SwanLab supports hardware detection for Ascend series NPUs and training tracking for MindSpore projects. (Hardware monitoring is planned to be available by December 15, 2024.)
Screenshot of a SwanLab experiment recording Ascend NPU information:
Introduction
This example implements an IMDB sentiment classification task and uses SwanLab to track training progress.
Task Overview
The IMDB sentiment classification task is a natural language processing task: given the text of an IMDB (Internet Movie Database) movie review, determine the review's sentiment, typically classified as positive or negative. The task is widely used in sentiment analysis research, especially in supervised and deep learning settings.
The dataset usually contains preprocessed review texts and their corresponding sentiment labels, with each review labeled as either positive or negative. As shown below:
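(The rows below are a hand-written illustration of the format, not actual entries from the dataset.)

| Review (excerpt) | Label |
| --- | --- |
| One of the best films I have ever seen; the acting is superb... | Positive |
| A complete waste of two hours; the plot makes no sense... | Negative |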
LSTM (Long Short-Term Memory) is an improved recurrent neural network designed to model and predict long-term dependencies in sequential data. Compared with a vanilla RNN, an LSTM adds memory cells and gating mechanisms, which effectively mitigate the vanishing- and exploding-gradient problems and make it perform well on long sequences. An LSTM handles the IMDB sentiment classification task comfortably; for the details of how LSTMs work, this in-depth blog post is recommended.
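Below is a minimal sketch (not part of the tutorial code; the sizes are arbitrary) of the tensor shapes a MindSpore nn.LSTM produces, which is exactly how the classifier later in this post consumes it:

import numpy as np
import mindspore as ms
import mindspore.nn as nn

# one-layer LSTM over 100-dim embeddings, matching the classifier below
lstm = nn.LSTM(input_size=100, hidden_size=256, num_layers=1, batch_first=True)
x = ms.Tensor(np.random.randn(4, 500, 100).astype(np.float32))  # (batch, seq_len, embedding_dim)
output, (h_n, c_n) = lstm(x)
print(output.shape)  # (4, 500, 256): the hidden state at every time step
print(h_n.shape)     # (1, 4, 256): the final hidden state per layer, used for classification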
The code is based on the MindSpore official documentation, with parts of the implementation simplified and reorganized.
Environment Setup
Clone the Project
Here are the GitHub project link and the clone command:
git clone https://github.com/ShaohonChen/mindspore_imdb_train.git
If GitHub is inaccessible, you can find the code section later in this blog.
It is recommended to use GitHub 😉
CPU Environment Setup
You can install MindSpore in a CPU-only environment. Although it may not feel as polished as PyTorch, the documentation is very detailed, reflecting the rigor of Huawei's engineers. Paired with Ascend NPUs, it is a very promising framework (for some reason my Mac's input method refuses to type the character "昇" in "Ascend").
Official installation documentation link.
Alternatively, you can use the following command to install:
pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/2.4.1/MindSpore/unified/x86_64/mindspore-2.4.1-cp311-cp311-linux_x86_64.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple
Command to verify successful installation:
python -c "import mindspore;mindspore.set_context(device_target='CPU');mindspore.run_check()"
If the following information is output, MindSpore has been successfully installed:
MindSpore version: 2.4.1
The result of multiplication calculation is correct, MindSpore has been installed on platform [CPU] successfully!
Huawei Ascend NPU Environment Setup
Because setting up the Huawei Ascend environment is involved, it is recommended to follow the MindSpore Installation Tutorial and Pitfall Record. A brief overview of the installation process follows.
This blog was written on December 6, 2024, and the installed version is MindSpore 2.4.1. Given the rapid changes in MindSpore, the time and version are specifically noted.
Driver Installation & Verification
First, make sure the NPU card and the NPU drivers are installed; the driver version used here is 8.0.RC3.beta1. If they are not installed, refer to the CANN Official Installation Tutorial.
After installation, verify the driver by running:
npu-smi info
If the following information is displayed, the driver installation is complete.
Install MindSpore
It is recommended to install with conda, which makes environments easier to manage and pulls in more of the dependencies automatically.
First, install the prerequisite packages:
pip install sympy
pip install "numpy>=1.20.0,<2.0.0"
pip install /usr/local/Ascend/ascend-toolkit/latest/lib64/te-*-py3-none-any.whl
pip install /usr/local/Ascend/ascend-toolkit/latest/lib64/hccl-*-py3-none-any.whl
If downloads are slow, you can run the same commands against domestic mirrors:
pip install sympy -i https://mirrors.cernet.edu.cn/pypi/web/simple
pip install "numpy>=1.20.0,<2.0.0" -i https://mirrors.cernet.edu.cn/pypi/web/simple
pip install /usr/local/Ascend/ascend-toolkit/latest/lib64/te-*-py3-none-any.whl -i https://mirrors.cernet.edu.cn/pypi/web/simple
pip install /usr/local/Ascend/ascend-toolkit/latest/lib64/hccl-*-py3-none-any.whl -i https://mirrors.cernet.edu.cn/pypi/web/simple
Conda installation method for MindSpore:
conda install mindspore=2.4.1 -c mindspore -c conda-forge
Due to network restrictions, the default conda channels can fail, leaving the MindSpore installation stuck at 0% progress, as shown below:
You can use the following method to specify a domestic mirror:
conda install mindspore=2.4.1 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/MindSpore/ -c conda-forge
Pip installation command for MindSpore:
pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/2.4.1/MindSpore/unified/aarch64/mindspore-2.4.1-cp311-cp311-linux_aarch64.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple
After installation, you can use the following command to test:
python -c "import mindspore;mindspore.set_context(device_target='Ascend');mindspore.run_check()"
If an error occurs at this step, refer to the Environment Installation Troubleshooting section later in this document.
If the version information and calculation verification are displayed, the installation is successful.
MindSpore version: 2.4.1
The result of multiplication calculation is correct, MindSpore has been installed on platform [Ascend] successfully!
The official guide is linked here as well: MindSpore Official Installation Tutorial. Note that this tutorial uses MindSpore 2.4.1, and keeping your environment consistent with it is recommended.
Additionally, this tutorial uses SwanLab for training process tracking. SwanLab supports hardware identification and tracking for the Ascend series NPUs.
Don't Forget to Install SwanLab 😉
Installation method:
pip install swanlab
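For reference, this is the tracking pattern the training script below follows (a minimal sketch; the project, experiment, and metric names here are placeholders): call swanlab.init once per run, then swanlab.log for scalar metrics.

import swanlab

swanlab.init(project="demo_project", experiment_name="demo", config={"lr": 0.001})
for step in range(3):
    swanlab.log({"train/loss": 1.0 / (step + 1)})  # each call records one step of metrics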
Dataset & Word Embedding File Preparation
Dataset Preparation
Use the following command to download and extract the dataset on Linux:
wget -P ./data/ https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
tar -xzvf data/aclImdb_v1.tar.gz -C data/
If the download is too slow, you can use the domestic mirror link provided by Huawei Cloud, then extract the archive into the ./data/ directory.
If you cannot extract .tar.gz archives, installing 7-Zip, an open-source, general-purpose archiver, is recommended.
Word Embedding Preparation
Use the following command to download and extract the GloVe word embedding file:
wget -P ./embedding/ https://nlp.stanford.edu/data/glove.6B.zip
unzip embedding/glove.6B.zip -d embedding/
If the download is too slow, you can use the domestic mirror link provided by Huawei Cloud, then extract the archive into the ./embedding/ directory.
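Optionally, here is a quick check (assuming both archives were extracted from the project root as above) that the files sit where the training script expects them:

import os

assert os.path.isdir("data/aclImdb"), "IMDB dataset not found under ./data/"
assert os.path.isfile("embedding/glove.6B.100d.txt"), "GloVe vectors not found under ./embedding/"
print("dataset and word embeddings are in place")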
Start Training
Use the following command to start training:
python train.py
The first run may prompt you to log in to SwanLab; if so, refer to How to Log in to SwanLab, which enables the cloud dashboard.
After setup, you can view the training progress in real-time on the cloud. My experiment record can be referenced at Complete Experiment Record.
Also attached are other scripts and online experiment records:
| Content | Training Command | Experiment Log |
| --- | --- | --- |
| Baseline | python train.py configs/baseline.json | log |
| CPU Run | python train.py configs/baseline.json CPU | log |
| Two-Layer LSTM | python train.py configs/two_layer.json | log |
| Small Batch Size | python train.py configs/small_batch.json | log |
| Larger Hidden Layer | python train.py configs/large_hs.json | log |
| Larger Learning Rate | python train.py configs/large_hs.json | log |
Related hyperparameters and final results can be viewed in the Chart View.
PS: Judging from the logs, the models are under-trained; increasing the number of epochs (40-50 would be more appropriate) is recommended.
Code Section
If GitHub is inaccessible, the full training script is provided below, although this standalone version cannot load the other hyperparameter configs T_T.
# Read training parameters + initialize log recording
import os
import sys
import json
import mindspore as ms
import swanlab
# ms.set_context(device_target="CPU") # Use CPU
ms.set_context(device_target="Ascend") # Use NPU
args={ # Hyperparameters
"hidden_size": 256,
"output_size": 1,
"num_layers": 2,
"lr": 0.001,
"num_epochs": 10,
"batch_size": 64,
"report_interval": 10
}
exp_name = "baseline"
swanlab.init(project="Ascend_IMDB_CLS", experiment_name=exp_name, config=args)
# Construct dataset
import mindspore.dataset as ds
class IMDBData:
label_map = {"pos": 1, "neg": 0}
def __init__(self, path, mode="train"):
self.docs, self.labels = [], []
for label in self.label_map.keys():
doc_dir = os.path.join(path, mode, label)
doc_list = os.listdir(doc_dir)
for fname in doc_list:
with open(os.path.join(doc_dir, fname)) as f:
doc = f.read()
doc = doc.lower().split()
self.docs.append(doc)
self.labels.append([self.label_map[label]])
def __getitem__(self, idx):
return self.docs[idx], self.labels[idx]
def __len__(self):
return len(self.docs)
imdb_path = "data/aclImdb"
imdb_train = ds.GeneratorDataset(
IMDBData(imdb_path, "train"), column_names=["text", "label"], shuffle=True
)
imdb_test = ds.GeneratorDataset(
IMDBData(imdb_path, "test"), column_names=["text", "label"], shuffle=False
)
# Construct embedding vocabulary
import numpy as np
def load_glove(glove_path):
embeddings = []
tokens = []
with open(os.path.join(glove_path, "glove.6B.100d.txt"), encoding="utf-8") as gf:
for glove in gf:
word, embedding = glove.split(maxsplit=1)
tokens.append(word)
embeddings.append(np.fromstring(embedding, dtype=np.float32, sep=" "))
# Append embeddings for the two special tokens <unk> and <pad>
embeddings.append(np.random.rand(100))
embeddings.append(np.zeros((100,), np.float32))
vocab = ds.text.Vocab.from_list(
tokens, special_tokens=["<unk>", "<pad>"], special_first=False
)
embeddings = np.array(embeddings).astype(np.float32)
return vocab, embeddings
vocab, embeddings = load_glove("./embedding")
print(f"VOCAB SIZE: {len(vocab.vocab())}")
# Data preprocessing
import mindspore as ms
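# Lookup maps each token to its vocabulary id; PadEnd pads every review with the
# <pad> id up to a fixed length of 500; TypeCast casts labels to float32 for the BCE loss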
lookup_op = ds.text.Lookup(vocab, unknown_token="<unk>")
pad_op = ds.transforms.PadEnd([500], pad_value=vocab.tokens_to_ids("<pad>"))
type_cast_op = ds.transforms.TypeCast(ms.float32)
imdb_train = imdb_train.map(operations=[lookup_op, pad_op], input_columns=["text"])
imdb_train = imdb_train.map(operations=[type_cast_op], input_columns=["label"])
imdb_test = imdb_test.map(operations=[lookup_op, pad_op], input_columns=["text"])
imdb_test = imdb_test.map(operations=[type_cast_op], input_columns=["label"])
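# Hold out 30% of the training data for validation (70/30 split)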
imdb_train, imdb_valid = imdb_train.split([0.7, 0.3])
print(f"TRAIN SET SIZE: {len(imdb_train)}")
print(f"VALID SET SIZE: {len(imdb_valid)}")
print(f"TEST SET SIZE: {len(imdb_test)}")
imdb_train = imdb_train.batch(args["batch_size"], drop_remainder=True)
imdb_valid = imdb_valid.batch(args["batch_size"], drop_remainder=True)
# LSTM classifier implementation
import math
import mindspore as ms
import mindspore.nn as nn
import mindspore.ops as ops
from mindspore.common.initializer import Uniform, HeUniform
class LSTM_CLS(nn.Cell):
def __init__(self, embeddings, hidden_dim, output_dim, n_layers, pad_idx):
super().__init__()
vocab_size, embedding_dim = embeddings.shape
self.embedding = nn.Embedding(
vocab_size,
embedding_dim,
embedding_table=ms.Tensor(embeddings),
padding_idx=pad_idx,
)
self.rnn = nn.LSTM(
embedding_dim, hidden_dim, num_layers=n_layers, batch_first=True
)
weight_init = HeUniform(math.sqrt(5))
bias_init = Uniform(1 / math.sqrt(hidden_dim * 2))
self.fc = nn.Dense(
hidden_dim, output_dim, weight_init=weight_init, bias_init=bias_init
)
def construct(self, inputs):
embedded = self.embedding(inputs)
_, (hidden, _) = self.rnn(embedded)
hidden = hidden[-1, :, :]
output = self.fc(hidden)
return output
model = LSTM_CLS(
embeddings,
args["hidden_size"],
args["output_size"],
args["num_layers"],
vocab.tokens_to_ids("<pad>"),
)
# Loss function and optimizer
loss_fn = nn.BCEWithLogitsLoss(reduction="mean")
optimizer = nn.Adam(model.trainable_params(), learning_rate=args["lr"])
# Training loop implementation
from tqdm import tqdm
import time
def forward_fn(data, label):
logits = model(data)
loss = loss_fn(logits, label)
return loss
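# Build a function returning the loss together with gradients w.r.t. the parameters tracked by the optimizer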
grad_fn = ms.value_and_grad(forward_fn, None, optimizer.parameters)
def train_step(data, label):
loss, grads = grad_fn(data, label)
optimizer(grads)
return loss
def train_one_epoch(model, train_dataset, epoch=0):
model.set_train()
total = train_dataset.get_dataset_size()
step_total = 0
last_time = time.time()
for i in train_dataset.create_tuple_iterator():
loss = train_step(*i)
step_total += 1
loss_item = loss.item()
if step_total % args["report_interval"] == 1:
now_time = time.time()
per_batch_time = (now_time - last_time) / args["report_interval"]
last_time = now_time
swanlab.log(
{
"train/epoch": epoch,
"train/step": step_total,
"train/loss": loss_item,
"train/per_batch_time(s)": per_batch_time,
}
)
print(
f"[train epoch-{epoch:2d} step-{step_total:4d}/{total:4d}] loss:{loss_item:.4f} use_time:{per_batch_time:10.4f}s"
)
# Evaluation implementation
def binary_accuracy(preds, y):
rounded_preds = np.around(ops.sigmoid(preds).asnumpy())
correct = (rounded_preds == y).astype(np.float32)
acc = correct.sum() / len(correct)
return acc
def evaluate(model, test_dataset, criterion, epoch=0, mode="eval"):
last_time = time.time()
total = test_dataset.get_dataset_size()
epoch_loss = 0
epoch_acc = 0
model.set_train(False)
for i in test_dataset.create_tuple_iterator():
predictions = model(i[0])
loss = criterion(predictions, i[1])
epoch_loss += loss.asnumpy()
acc = binary_accuracy(predictions, i[1])
epoch_acc += acc
final_loss = float(epoch_loss / total)
final_acc = float(epoch_acc / total)
use_time = time.time() - last_time
swanlab.log(
{
f"{mode}/loss": final_loss,
f"{mode}/acc": final_acc,
f"{mode}/use_time": use_time,
}
)
print(
f"[{mode} epoch-{epoch:2d} loss:{final_loss:.4f} acc:{final_acc*100:.2f}% use_time:{use_time:10.4f}s"
)
return final_loss, final_acc
# Start training
best_valid_loss = float("inf")
save_path = os.path.join("output", exp_name)
os.makedirs(save_path, exist_ok=True)
ckpt_file_name = os.path.join(save_path, "sentiment-analysis.ckpt")
for epoch in range(args["num_epochs"]):
train_one_epoch(model, imdb_train, epoch)
valid_loss, _ = evaluate(model, imdb_valid, loss_fn, epoch)
if valid_loss < best_valid_loss:
best_valid_loss = valid_loss
ms.save_checkpoint(model, ckpt_file_name)
# Run the test set
param_dict = ms.load_checkpoint(ckpt_file_name)
ms.load_param_into_net(model, param_dict)
imdb_test = imdb_test.batch(64)
test_loss, test_acc = evaluate(model, imdb_test, loss_fn, mode="test")
# Run predictions
score_map = {1: "Positive", 0: "Negative"}
def predict_sentiment(model, vocab, sentence):
model.set_train(False)
tokenized = sentence.lower().split()
indexed = vocab.tokens_to_ids(tokenized)
tensor = ms.Tensor(indexed, ms.int32)
tensor = tensor.expand_dims(0)
prediction = model(tensor)
return score_map[int(np.round(ops.sigmoid(prediction).asnumpy()))]
print(predict_sentiment(model, vocab, "This film is great"))
print(predict_sentiment(model, vocab, "This film is terrible"))
Troubleshooting
Possible Problem 1: Incompatible MindSpore and CANN Versions
Ensure that the MindSpore version is consistent with the driver version. Otherwise, the following error will occur:
[WARNING] ME(1049852:281473041023008,MainProcess):2024-12-06-12:23:11.112.000 [mindspore/run_check/_check_version.py:357] MindSpore version 2.3.1 and Ascend AI software package (Ascend Data Center Solution)version 7.5 does not match, the version of software package expect one of ['7.2', '7.3']. Please refer to the match info on: https://www.mindspore.cn/install
/home/huawei/miniconda3/envs/mindspore231/lib/python3.10/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
setattr(self, word, getattr(machar, word).flat[0])
/home/huawei/miniconda3/envs/mindspore231/lib/python3.10/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
return self._float_to_str(self.smallest_subnormal)
/home/huawei/miniconda3/envs/mindspore231/lib/python3.10/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
setattr(self, word, getattr(machar, word).flat[0])
/home/huawei/miniconda3/envs/mindspore231/lib/python3.10/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
return self._float_to_str(self.smallest_subnormal)
[WARNING] ME(1049852:281473041023008,MainProcess):2024-12-06-12:23:13.700.000 [mindspore/run_check/_check_version.py:375] MindSpore version 2.3.1 and "te" wheel package version 7.5 does not match. For details, refer to the installation guidelines: https://www.mindspore.cn/install
[WARNING] ME(1049852:281473041023008,MainProcess):2024-12-06-12:23:13.701.000 [mindspore/run_check/_check_version.py:382] MindSpore version 2.3.1 and "hccl" wheel package version 7.5 does not match. For details, refer to the installation guidelines: https://www.mindspore.cn/install
[WARNING] ME(1049852:281473041023008,MainProcess):2024-12-06-12:23:13.702.000 [mindspore/run_check/_check_version.py:396] Please pay attention to the above warning, countdown: 3
[WARNING] ME(1049852:281473041023008,MainProcess):2024-12-06-12:23:14.703.000 [mindspore/run_check/_check_version.py:396] Please pay attention to the above warning, countdown: 2
[WARNING] ME(1049852:281473041023008,MainProcess):2024-12-06-12:23:15.704.000 [mindspore/run_check/_check_version.py:396] Please pay attention to the above warning, countdown: 1
[WARNING] ME(1049852:281473041023008,MainProcess):2024-12-06-12:23:18.608.000 [mindspore/run_check/_check_version.py:357] MindSpore version 2.3.1 and Ascend AI software package (Ascend Data Center Solution)version 7.5 does not match, the version of software package expect one of ['7.2', '7.3']. Please refer to the match info on: https://www.mindspore.cn/install
[WARNING] ME(1049852:281473041023008,MainProcess):2024-12-06-12:23:18.608.000 [mindspore/run_check/_check_version.py:375] MindSpore version 2.3.1 and "te" wheel package version 7.5 does not match. For details, refer to the installation guidelines: https://www.mindspore.cn/install
[WARNING] ME(1049852:281473041023008,MainProcess):2024-12-06-12:23:18.608.000 [mindspore/run_check/_check_version.py:382] MindSpore version 2.3.1 and "hccl" wheel package version 7.5 does not match. For details, refer to the installation guidelines: https://www.mindspore.cn/install
[WARNING] ME(1049852:281473041023008,MainProcess):2024-12-06-12:23:18.608.000 [mindspore/run_check/_check_version.py:396] Please pay attention to the above warning, countdown: 3
[WARNING] ME(1049852:281473041023008,MainProcess):2024-12-06-12:23:19.609.000 [mindspore/run_check/_check_version.py:396] Please pay attention to the above warning, countdown: 2
[WARNING] ME(1049852:281473041023008,MainProcess):2024-12-06-12:23:20.611.000 [mindspore/run_check/_check_version.py:396] Please pay attention to the above warning, countdown: 1
[WARNING] ME(1049852:281473041023008,MainProcess):2024-12-06-12:23:21.614.000 [mindspore/run_check/_check_version.py:357] MindSpore version 2.3.1 and Ascend AI software package (Ascend Data Center Solution)version 7.5 does not match, the version of software package expect one of ['7.2', '7.3']. Please refer to the match info on: https://www.mindspore.cn/install
[WARNING] ME(1049852:281473041023008,MainProcess):2024-12-06-12:23:21.614.000 [mindspore/run_check/_check_version.py:375] MindSpore version 2.3.1 and "te" wheel package version 7.5 does not match. For details, refer to the installation guidelines: https://www.mindspore.cn/install
[WARNING] ME(1049852:281473041023008,MainProcess):2024-12-06-12:23:21.614.000 [mindspore/run_check/_check_version.py:382] MindSpore version 2.3.1 and "hccl" wheel package version 7.5 does not match. For details, refer to the installation guidelines: https://www.mindspore.cn/install
[WARNING] ME(1049852:281473041023008,MainProcess):2024-12-06-12:23:21.615.000 [mindspore/run_check/_check_version.py:396] Please pay attention to the above warning, countdown: 3
[WARNING] ME(1049852:281473041023008,MainProcess):2024-12-06-12:23:22.616.000 [mindspore/run_check/_check_version.py:396] Please pay attention to the above warning, countdown: 2
[WARNING] ME(1049852:281473041023008,MainProcess):2024-12-06-12:23:23.617.000 [mindspore/run_check/_check_version.py:396] Please pay attention to the above warning, countdown: 1
MindSpore version: 2.3.1
Segmentation fault (core dumped)
Solution: Install the correct version. For MindSpore 2.4.1, install the 8.0.RC3.beta1 driver.
Possible Problem 2: Missing Prerequisite Packages
The following error may occur:
[ERROR] ME(1051780:281473416683552,MainProcess):2024-12-06-12:39:02.460.00 [mindspore/run_check/_check_version.py:360] CheckFailed: cannot import name 'version' from 'te' (unknown location)
[ERROR] ME(1051780:281473416683552,MainProcess):2024-12-06-12:39:02.460.00 [mindspore/run_check/_check_version.py:361] MindSpore relies on whl packages of "te" and "hccl" in the "latest" folder of the Ascend AI software package (Ascend Data Center Solution). Please check whether they are installed correctly or not, refer to the match info on: https://www.mindspore.cn/install
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/huawei/miniconda3/envs/mindspore241/lib/python3.11/site-packages/mindspore/__init__.py", line 19, in <module>
from mindspore import common, dataset, mindrecord, train, log, amp
...
ImportError: cannot import name 'util' from 'tbe.tvm.topi.cce' (unknown location)
Fatal Python error: PyThreadState_Get: the function must be called with the GIL held, but the GIL is released (the current Python thread state is NULL)
Python runtime state: finalizing (tstate=0x00000000008aceb0)
Aborted (core dumped)

Solution: install the te and hccl wheel packages shipped with the CANN toolkit, as shown in the prerequisite-package step of the installation section above.
Possible Problem 3: Error in pip Installation Stage - opc-tool 0.1.0 requires attrs, which is not installed
If the following error occurs (pip may also report it intermittently during the earlier installation steps):
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
auto-tune 0.1.0 requires decorator, which is not installed.
dataflow 0.0.1 requires jinja2, which is not installed.
opc-tool 0.1.0 requires attrs, which is not installed.
opc-tool 0.1.0 requires decorator, which is not installed.
opc-tool 0.1.0 requires psutil, which is not installed.
schedule-search 0.0.1 requires absl-py, which is not installed.
schedule-search 0.0.1 requires decorator, which is not installed.
te 0.4.0 requires attrs, which is not installed.
te 0.4.0 requires cloudpickle, which is not installed.
te 0.4.0 requires decorator, which is not installed.
te 0.4.0 requires ml-dtypes, which is not installed.
te 0.4.0 requires psutil, which is not installed.
te 0.4.0 requires scipy, which is not installed.
te 0.4.0 requires tornado, which is not installed.
Install the missing dependencies with the following command:
pip install attrs cloudpickle decorator jinja2 ml-dtypes psutil scipy tornado absl-py
Possible Problem 4: KeyError: 'op_debug_dir' Occurs During Testing or Actual Training
This error typically occurs when the CANN environment-variable script has not been sourced.
Traceback (most recent call last):
File "/home/huawei/miniconda3/envs/mindspore241/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/huawei/miniconda3/envs/mindspore241/lib/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/huawei/.local/lib/python3.11/site-packages/te_fusion/parallel_compilation.py", line 249, in exec_compilation_task
check_dict_paras(dict_ops)
File "/home/huawei/.local/lib/python3.11/site-packages/te_fusion/parallel_compilation.py", line 183, in check_dict_paras
if dict_ops['op_debug_dir'] == None or dict_ops['op_debug_dir'] == '':
~~~~~~~~^^^^^^^^^^^^^^^^
KeyError: 'op_debug_dir'
Solution: set the environment variables with the following commands:
# control log level. 0-DEBUG, 1-INFO, 2-WARNING, 3-ERROR, 4-CRITICAL, default level is WARNING.
export GLOG_v=2
# environment variables
LOCAL_ASCEND=/usr/local/Ascend # Set to the actual installation path of the software package
# set environment variables using the script provided by CANN; swap "ascend-toolkit" for "nnae" if you are using the CANN-nnae package instead
source ${LOCAL_ASCEND}/ascend-toolkit/set_env.sh
With conda, the commands above apparently need to be re-run in every new shell. To make the setup permanent, you can use the following commands:
export LOCAL_ASCEND=/usr/local/Ascend # Set to the actual installation path of the software package
echo "source ${LOCAL_ASCEND}/ascend-toolkit/set_env.sh" >> ~/.bashrc
source ~/.bashrc