
Qwen3 Large Model Fine-Tuning in Practice: Medical Reasoning Dialogue


Qwen3 is the latest open-source large language model (LLM) released by Alibaba's Tongyi Lab, which claimed the top spot on open-source LLM leaderboards upon release. Meanwhile, the Qwen series has surpassed LLaMA to become the most popular open-source LLM on HuggingFace.


Whether for research or practical applications, Qwen is increasingly becoming one of the best options for developers.

Taking Qwen3 as the base model and applying full-parameter fine-tuning to give it domain-specific conversational ability, including DeepSeek R1 / QwQ-style reasoning dialogue, is a good introductory task for learning LLM fine-tuning.

In this article, we fine-tune the Qwen3-1.7B model on the delicate_medical_r1_data dataset so that the fine-tuned model gives reasoning-based responses to medical questions. Training uses transformers and datasets, with SwanLab for monitoring training and evaluating model performance.

Full-parameter fine-tuning requires approximately 32GB of GPU memory. If your GPU memory is insufficient, consider using Qwen3-0.6b or LoRA fine-tuning.
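
As a rough sanity check on that memory figure, the sketch below estimates only the static part of the training footprint (weights, gradients, optimizer states) from the parameter count. The per-parameter byte counts, the AdamW-style optimizer with two state tensors, and the 1.7e9 parameter count are all assumptions for illustration; activations, which depend on sequence length and batch size, are not modeled.

```python
def static_training_gib(n_params, weight_bytes=2, grad_bytes=2, optim_state_bytes=4):
    """Rough static footprint in GiB: weights + gradients + optimizer states.

    Assumed defaults: bf16 weights and gradients (2 bytes each) and two
    bf16 AdamW moment tensors (2 x 2 = 4 bytes). Activations are excluded.
    """
    bytes_per_param = weight_bytes + grad_bytes + optim_state_bytes
    return n_params * bytes_per_param / 1024**3


# Assumption: roughly 1.7 billion parameters for Qwen3-1.7B.
print(f"bf16 optimizer states: {static_training_gib(1.7e9):.1f} GiB")
# With fp32 optimizer states (2 x 4 = 8 bytes) the static part grows.
print(f"fp32 optimizer states: {static_training_gib(1.7e9, optim_state_bytes=8):.1f} GiB")
```

The difference between this static estimate and the roughly 32GB quoted above would come from activations, temporary buffers, and allocator overhead, none of which this estimate captures.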

Key Concept: What is Full-Parameter Fine-Tuning?

Full-parameter fine-tuning refers to updating and optimizing all parameters of a pre-trained large model, distinguishing it from partial fine-tuning and LoRA fine-tuning.

This method involves updating the entire model weights (including embedding layers, intermediate feature extraction layers, and task-specific adaptation layers) through gradient backpropagation on downstream task data. Compared to partial fine-tuning, full-parameter fine-tuning better leverages the generalization capabilities of pre-trained models while deeply adapting them to specific tasks, typically performing better in scenarios with significant domain shifts or high task complexity.


However, full-parameter fine-tuning demands higher computational resources and storage and carries a risk of overfitting (especially on small datasets). In practice, techniques like learning rate scheduling, parameter grouping, or regularization are often applied to mitigate these issues.
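
To make "learning rate scheduling" concrete, here is a minimal sketch of a linear warmup followed by a linear decay, a common shape for fine-tuning schedules. The function name `linear_warmup_decay` and the step counts are hypothetical; this is an illustration of the idea, not necessarily the scheduler used by the training script in this article.

```python
def linear_warmup_decay(step, total_steps, warmup_steps, peak_lr):
    """Learning rate at `step`: ramps linearly up to peak_lr, then linearly down to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run: 100 steps, 10 of them warmup, peak learning rate 1e-4.
for s in (0, 5, 10, 55, 100):
    print(s, linear_warmup_decay(s, total_steps=100, warmup_steps=10, peak_lr=1e-4))
```

A small learning rate early on avoids large, destabilizing updates to the pre-trained weights, and the decay reduces the step size as training approaches the end.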

Full-parameter fine-tuning is commonly used in high-performance scenarios, such as domain-specific QA or high-precision text generation.

For more fine-tuning techniques, see: https://zhuanlan.zhihu.com/p/682082440

Now, let’s dive into the practical steps:

1. Environment Setup

This tutorial requires Python>=3.9 (recent transformers and datasets releases no longer support 3.8). Ensure Python is installed on your system.

Additionally, you'll need at least one NVIDIA GPU or Ascend NPU (approximately 32GB of memory is recommended).

Install the following Python libraries (ensure PyTorch and CUDA are already installed):

bash
swanlab
modelscope==1.22.0
transformers>=4.50.0
datasets==3.2.0
accelerate
pandas
addict

One-command installation:

bash
pip install swanlab modelscope==1.22.0 "transformers>=4.50.0" datasets==3.2.0 accelerate pandas addict

Tested with: modelscope==1.22.0, transformers==4.51.3, datasets==3.2.0, peft==0.11.1, accelerate==1.6.0, swanlab==0.5.7
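
Before training, it can help to confirm which versions are actually installed and compare them with the tested versions above. The sketch below uses only the standard library; the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


required = ["swanlab", "modelscope", "transformers", "datasets", "accelerate", "pandas", "addict"]
for name, version in installed_versions(required).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```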

2. Preparing the Dataset

We use the delicate_medical_r1_data dataset, designed for medical dialogue models.

The dataset contains 2,000+ entries. Its columns include Instruction, question, think, answer, and metrics.

We only use question, think, and answer:

  • question: The user's input query.
  • think: The model’s reasoning process (similar to DeepSeek R1’s output).
  • answer: The model’s final response.

Our goal is to fine-tune the model to generate a combined think + answer response based on question, with clear visual distinction between reasoning and answers.

A sample data entry:

json
{
  "question": "My father was just diagnosed with active bleeding. The doctor said immediate action is needed—what should we do?",
  "think": "Hmm, the user’s question is about general measures for active bleeding...",
  "answer": "First, your father needs bed rest. Avoid food intake during active bleeding..."
}

During training, think and answer are formatted as:

<think>
Hmm, the user’s question is about general measures for active bleeding...
</think>

First, your father needs bed rest. Avoid food intake during active bleeding...
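
A minimal sketch of how the layout above could be produced from one record. The helper name `format_response` is hypothetical and is shown only to make the target string explicit.

```python
def format_response(think, answer):
    """Wrap the reasoning in <think> tags and append the final answer after a blank line."""
    return f"<think>\n{think.strip()}\n</think>\n\n{answer.strip()}"


sample = {
    "think": "Hmm, the user's question is about general measures for active bleeding...",
    "answer": "First, your father needs bed rest. Avoid food intake during active bleeding...",
}
print(format_response(sample["think"], sample["answer"]))
```

Keeping the reasoning inside explicit tags is what makes it easy to separate reasoning from the final answer at inference time.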

Downloading and Formatting the Dataset

Run the following script to preprocess the data:

python
from modelscope.msdatasets import MsDataset
import json
import random

random.seed(42)

ds = MsDataset.load('krisfu/delicate_medical_r1_data', subset_name='default', split='train')
data_list = list(ds)
random.shuffle(data_list)

split_idx = int(len(data_list) * 0.9)

train_data = data_list[:split_idx]
val_data = data_list[split_idx:]

with open('train.jsonl', 'w', encoding='utf-8') as f:
    for item in train_data:
        json.dump(item, f, ensure_ascii=False)
        f.write('\n')

with open('val.jsonl', 'w', encoding='utf-8') as f:
    for item in val_data:
        json.dump(item, f, ensure_ascii=False)
        f.write('\n')

print(f"Train Set Size: {len(train_data)}")
print(f"Val Set Size: {len(val_data)}")

This generates train.jsonl and val.jsonl.
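
Since only question, think, and answer are used later, a quick check that every row has those fields can catch problems before training. The following is a sketch under that assumption; the helper name `check_jsonl` is hypothetical.

```python
import json
import os


def check_jsonl(path, required_keys=("question", "think", "answer")):
    """Return (row_count, bad_line_numbers) for a JSON Lines file.

    A line counts as bad if any required key is missing or empty.
    """
    count, bad = 0, []
    with open(path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            row = json.loads(line)
            count += 1
            if any(not row.get(key) for key in required_keys):
                bad.append(line_no)
    return count, bad


# Only inspect the files if the preprocessing script has already been run.
for path in ("train.jsonl", "val.jsonl"):
    if os.path.exists(path):
        print(path, check_jsonl(path))
```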

3. Loading the Model

Download Qwen3-1.7B from ModelScope (faster and more stable in China) and load it via Transformers:

python
import torch
from modelscope import snapshot_download, AutoTokenizer
from transformers import AutoModelForCausalLM

# Download the weights into ./ and get the local directory path
model_dir = snapshot_download("Qwen/Qwen3-1.7B", cache_dir="./", revision="master")

# Load the tokenizer and model from the downloaded directory
tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", torch_dtype=torch.bfloat16)

4. Configuring Training Visualization

We use SwanLab to monitor training and evaluate model performance.

SwanLab is an open-source, lightweight AI training tracking and visualization tool, often called the "Chinese Weights & Biases + TensorBoard." It supports cloud/offline use and integrates with 40+ frameworks (PyTorch, Transformers, etc.).


Integration with Transformers:

python
from transformers import TrainingArguments

args = TrainingArguments(
    ...,
    report_to="swanlab",
    run_name="qwen3-1.7B",
)

First-time users: Register at https://swanlab.cn, copy your API Key, and paste it when prompted.

5. Full Training Code

Directory structure:

|--- train.py
|--- train.jsonl
|--- val.jsonl

train.py:

python
import json
import pandas as pd
import torch
from datasets import Dataset
from modelscope import snapshot_download, AutoTokenizer
from transformers import AutoModelForCausalLM, TrainingArguments, Trainer, DataCollatorForSeq2Seq
import os
import swanlab

os.environ["SWANLAB_PROJECT"] = "qwen3-sft-medical"
PROMPT = "You are a medical expert. Provide well-reasoned answers to user questions."
MAX_LENGTH = 2048

swanlab.config.update({
    "model": "Qwen/Qwen3-1.7B",
    "prompt": PROMPT,
    "data_max_length": MAX_LENGTH,
})

def dataset_jsonl_transfer(origin_path, new_path):
    """Convert raw dataset to fine-tuning format."""
    messages = []
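    with open(origin_path, "r", encoding="utf-8") as file:
        for line in file:
            data = json.loads(line)
            question = data["question"]
            # Match the documented layout: reasoning inside <think> tags, a blank line, then the answer
            output = f"<think>\n{data['think']}\n</think>\n\n{data['answer']}"
            message = {
                "instruction": PROMPT,
                "input": question,
                "output": output,
            }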
            messages.append(message)
    with open(new_path, "w", encoding="utf-8") as file:
        for message in messages:
            file.write(json.dumps(message, ensure_ascii=False) + "\n")

def process_func(example):
    """Preprocess dataset."""
    instruction = tokenizer(
        f"<|im_start|>system\n{PROMPT}<|im_end|>\n<|im_start|>user\n{example['input']}<|im_end|>\n<|im_start|>assistant\n",
        add_special_tokens=False,
    )
    response = tokenizer(f"{example['output']}", add_special_tokens=False)
    input_ids = instruction["input_ids"] + response["input_ids"] + [tokenizer.pad_token_id]
    attention_mask = instruction["attention_mask"] + response["attention_mask"] + [1]
    labels = [-100] * len(instruction["input_ids"]) + response["input_ids"] + [tokenizer.pad_token_id]
    if len(input_ids) > MAX_LENGTH:  # Truncate if needed
        input_ids = input_ids[:MAX_LENGTH]
        attention_mask = attention_mask[:MAX_LENGTH]
        labels = labels[:MAX_LENGTH]
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

# Load model
model_dir = snapshot_download("Qwen/Qwen3-1.7B", cache_dir="./", revision="master")
tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", torch_dtype=torch.bfloat16)
model.enable_input_require_grads()  # Make input embeddings require gradients (needed when gradient checkpointing is enabled)

# Load and preprocess data
dataset_jsonl_transfer("train.jsonl", "train_format.jsonl")
dataset_jsonl_transfer("val.jsonl", "val_format.jsonl")

train_df = pd.read_json("train_format.jsonl", lines=True)
train_ds = Dataset.from_pandas(train_df)
train_dataset = train_ds.map(process_func, remove_columns=train_ds.column_names)

eval_df = pd.read_json("val_format.jsonl", lines=True)
eval_ds = Dataset.from_pandas(eval_df)
eval_dataset = eval_ds.map(process_func, remove_columns=eval_ds.column_names)

# Training arguments
args = TrainingArguments(
    output_dir="./output/Qwen3-1.7B",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=2,
    learning_rate=1e-4,
    report_to="swanlab",
    run_name="qwen3-1.7B",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
)

trainer.train()
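
With per_device_train_batch_size=1 and gradient_accumulation_steps=4, each optimizer update sees 4 examples. The sketch below approximates how many optimizer steps a run takes, which helps in predicting which checkpoint-N directories will exist afterwards. The helper name and the 1,800-example figure are hypothetical, and the exact count can differ slightly depending on the Trainer version and how the last partial accumulation is handled.

```python
import math


def approx_optimizer_steps(n_examples, per_device_batch, grad_accum, epochs, n_devices=1):
    """Approximate optimizer updates: batches per epoch // accumulation steps, times epochs."""
    batches_per_epoch = math.ceil(n_examples / (per_device_batch * n_devices))
    updates_per_epoch = max(1, batches_per_epoch // grad_accum)
    return updates_per_epoch * epochs


# Hypothetical: 1,800 training examples (90% of a 2,000-row dataset), settings from train.py.
print(approx_optimizer_steps(1800, per_device_batch=1, grad_accum=4, epochs=2))
```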

Training has started once the progress bar appears.

6. Training Results

View the training logs on SwanLab.

Key metrics: train_loss and eval_loss, plus 3 sample model outputs.

Plotting train_loss (blue) against eval_loss (green) reveals overfitting: eval_loss rises after the first epoch, which suggests that one epoch is sufficient for a dataset of this size.
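
One simple way to act on this observation is to keep the checkpoint with the lowest eval_loss rather than the last one. The sketch below picks that step from a list of (step, eval_loss) pairs; the helper name is hypothetical and the numbers are made up for illustration only, not taken from this run.

```python
def best_eval_step(eval_log):
    """Return (step, loss) of the lowest eval_loss in a list of (step, eval_loss) pairs."""
    if not eval_log:
        raise ValueError("eval_log is empty")
    return min(eval_log, key=lambda pair: pair[1])


# Made-up numbers for illustration only; they are NOT the results from this run.
hypothetical_log = [(100, 1.52), (200, 1.41), (300, 1.38), (400, 1.40), (500, 1.47)]
step, loss = best_eval_step(hypothetical_log)
print(f"lowest eval_loss {loss} at step {step}")
```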


Sample Outputs:


The fine-tuned model now provides structured reasoning (<think>) before answers. Example:

<think>
[Reasoning about ulcer medications...]
</think>

The main categories of anti-ulcer drugs are... [Detailed answer follows].
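
Because the reasoning is wrapped in explicit tags, an application can separate it from the final answer. The following is a minimal sketch, assuming the `<think>` and `</think>` tags survive decoding as plain text; the helper name `split_reasoning` is hypothetical.

```python
import re

THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Split generated text into (reasoning, answer).

    If no <think>...</think> block is found, reasoning is an empty string
    and the whole text is returned as the answer.
    """
    match = THINK_PATTERN.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


reasoning, answer = split_reasoning("<think>\nCheck drug classes.\n</think>\n\nThe main categories are...")
print("REASONING:", reasoning)
print("ANSWER:", answer)
```

This makes it possible, for example, to show only the answer to end users while logging the reasoning separately.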

7. Inference with the Fine-Tuned Model

Checkpoints are saved under ./output/Qwen3-1.7B. Inference script:

python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def predict(messages, model, tokenizer):
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    # Move inputs to the device the model was placed on instead of hard-coding "cuda"
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=2048)
    return tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)

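# The checkpoint number depends on your run; point both paths at a checkpoint directory that exists
tokenizer = AutoTokenizer.from_pretrained("./output/Qwen3-1.7B/checkpoint-1000", trust_remote_code=True)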
model = AutoModelForCausalLM.from_pretrained("./output/Qwen3-1.7B/checkpoint-1000", device_map="auto", torch_dtype=torch.bfloat16)

test_question = {
    # Use the same system prompt as in training (PROMPT in train.py)
    "instruction": "You are a medical expert. Provide well-reasoned answers to user questions.",
    "input": "Doctor, I was recently diagnosed with diabetes. How should I choose carbohydrates?"
}

messages = [
    {"role": "system", "content": test_question["instruction"]},
    {"role": "user", "content": test_question["input"]}
]

print(predict(messages, model, tokenizer))
