What is Supervised Fine-Tuning (SFT)?
Supervised fine-tuning is a training strategy where a pre-trained language model is further refined on a carefully curated dataset of prompt-response pairs. The primary goal is to “teach” the model how to generate appropriate, contextually relevant, and human-aligned responses.
Key points about SFT include:
- Data Curation: The model is exposed to a dataset that contains high-quality examples—often created by human annotators—that demonstrate the desired behavior (e.g., step-by-step reasoning, correct coding outputs, or helpful dialogue responses).
- Instruction Following: By training on these examples, the model learns to interpret prompts as instructions and produce answers that mimic the reasoning and style of the training data.
- Limitations: While SFT works well to instill basic response quality, it is typically limited by the dataset’s scope and may not encourage the model to “think” beyond what is explicitly provided. Furthermore, excessive fine-tuning can lead to overfitting and reduce the model’s ability to generalize to unseen tasks.
For many contemporary language models, SFT is the standard method used to bridge the gap between raw pre-training and interactive, user-facing performance.
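Mechanically, SFT is just next-token prediction: the loss is cross-entropy over the response tokens, with the prompt tokens typically masked out so the model is graded only on its answer. Here is a minimal sketch of that masking with toy tensors (illustrative only, not TRL's actual implementation):

import torch
import torch.nn.functional as F

# Toy example: one sequence = prompt tokens followed by response tokens
input_ids = torch.tensor([[11, 12, 13, 21, 22, 23]])  # made-up token ids
prompt_len = 3  # first 3 tokens are the prompt

# Mask prompt positions with -100 so cross-entropy ignores them
labels = input_ids.clone()
labels[:, :prompt_len] = -100

# In practice logits come from model(input_ids).logits; fake them here
vocab_size = 100
logits = torch.randn(1, input_ids.shape[1], vocab_size)

# Shift by one so position t predicts token t+1, as causal LMs do
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    labels[:, 1:].reshape(-1),
    ignore_index=-100,
)
print(loss)  # scalar loss computed over response tokens only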
We will use the TRL (Transformer Reinforcement Learning) library to run SFT.
Let’s try Fine-Tuning SmolLM2
https://huggingface.co/HuggingFaceTB/SmolLM2-135M
SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. They are capable of solving a wide range of tasks while being lightweight enough to run on-device.
Let's start with the small base model. Note that this is not an instruct model; it only predicts the next token.
We will use this dataset for SFT: https://huggingface.co/datasets/HuggingFaceTB/smoltalk
# Import necessary libraries
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer, setup_chat_format
import torch

# Pick the best available device: CUDA GPU, Apple Silicon (MPS), or CPU
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available() else "cpu"
)

# Load the model and tokenizer
model_name = "HuggingFaceTB/SmolLM2-135M"
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=model_name
).to(device)
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name)

# Set up the chat format
model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer)

# Set the name the fine-tune will be saved under and/or uploaded to
finetune_name = "SmolLM2-FT-MyDataset"
finetune_tags = ["smol-course", "module_1"]
- Model Loading: The model SmolLM2-135M is loaded from the Hugging Face Hub and moved to the selected device (GPU/CPU).
- Tokenizer Loading: The tokenizer associated with the model is also loaded.
- Chat Format Setup: The setup_chat_format function modifies both the model and tokenizer so that they support chat-style interactions. This typically involves configuring special tokens (such as <|im_start|> and <|im_end|>) to mark the beginning and end of messages.
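To make those special tokens concrete, here is roughly what a one-turn conversation renders to after setup_chat_format (a ChatML-style template, per my understanding; the exact whitespace may differ slightly):

print(tokenizer.apply_chat_template([{"role": "user", "content": "Hi"}], tokenize=False))
# Roughly:
# <|im_start|>user
# Hi<|im_end|>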
# Let's test the base model before training
prompt = "Write a haiku about programming"
# Format with template
messages = [{"role": "user", "content": prompt}]
formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)
# Generate response
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=100)
print("Before training:")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
The output it produces is garbage, because the base model was never trained on chat-formatted prompts. If the prompt reads like a text completion, though, it does a decent job. Let's try a completion-style prompt.
prompt = "Write a haiku about programming. Code is "
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Output:
Write a haiku about programming. Code is 100% free.
What is a haiku?
A haiku is a Japanese poem that consists of only three lines. The first line is called the “zen” line, and the second and third lines are called the
Dataset Preparation
We will load a sample dataset and format it for training. The dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model.
TRL will format input messages based on the model's chat template. They need to be represented as a list of dictionaries with the keys: role and content.
https://huggingface.co/datasets/HuggingFaceTB/smoltalk
This dataset is already in the required format. For example:
[
  {
    "role": "user",
    "content": "How many positive integers with four digits have a thousands digit of 2?"
  },
  {
    "role": "assistant",
    "content": "Since the thousands digit must be 2, we have only one choice for that digit.\nFor the hundreds digit, we have 10 choices (0-9).\nFor the tens and units digits, we also have 10 choices each.\nTherefore, there are $1 \\times 10 \\times 10 \\times 10 = \\boxed{1000}$ positive integers with four digits that have a thousands digit of 2.\nThe answer is: 1000"
  }
]
Configuring the SFTTrainer
The SFTTrainer is configured with various parameters that control the training process. These include the number of training steps, batch size, learning rate, and evaluation strategy. Adjust these parameters based on your specific requirements and computational resources.
# Configure the SFTTrainer
sft_config = SFTConfig(
    output_dir="./sft_output",
    max_steps=1000,  # Adjust based on dataset size and desired training duration
    per_device_train_batch_size=4,  # Set according to your GPU memory capacity
    learning_rate=5e-5,  # Common starting point for fine-tuning
    logging_steps=10,  # Frequency of logging training metrics
    save_steps=100,  # Frequency of saving model checkpoints
    evaluation_strategy="steps",  # Evaluate the model at regular intervals
    eval_steps=50,  # Frequency of evaluation
    use_mps_device=(device == "mps"),  # Train on Apple Silicon (MPS) if selected
    hub_model_id=finetune_name,  # Set a unique name for your model
)
# Initialize the SFTTrainer
trainer = SFTTrainer(
    model=model,
    args=sft_config,
    train_dataset=ds["train"],
    tokenizer=tokenizer,
    eval_dataset=ds["test"],
)
Training the Model
With the trainer configured, we can now proceed to train the model. The training process will involve iterating over the dataset, computing the loss, and updating the model’s parameters to minimize this loss.
# Train the model
trainer.train()
# Save the model
trainer.save_model(f"./{finetune_name}")
Now let's test the fine-tuned model on the same chat-style prompt.
# Test the fine-tuned model on the same prompt
prompt = "Write a haiku about programming"

# Format with template
messages = [{"role": "user", "content": prompt}]
formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)

# Generate response
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=100)
print("After training:")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

You can also push the trained model to the Hugging Face Hub:
trainer.push_to_hub(tags=finetune_tags)
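Pushing assumes you are authenticated with a write-scoped token, e.g.:

from huggingface_hub import login
login()  # prompts for a token; alternatively run `huggingface-cli login`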
Observations
- The gradient norm stabilizes, indicating well-behaved gradients.
- The loss trend suggests that the model is improving.
For the complete code check this : https://github.com/nkalra0123/sft/blob/main/1_instruction_tuning/notebooks/sft_finetuning_example.ipynb