nearai fine-tuning guide

As part of the nearai project, we provide a collection of tools to fine-tune and evaluate models. Fine-tuning is a set of techniques for adjusting the parameters of a pre-trained model, often in a parameter-efficient way, to improve its performance on specific tasks. More commonly, fine-tuning is used to modify the behavior of a pre-trained model: for example, to produce structured output (JSON, XML, etc.), to produce stylized output (poetic, neutral, etc.), or to respond properly to instruction-based prompts.

In this guide, we will walk through the process of fine-tuning llama-3-8b-instruct on the orca-math-word-problems-200k dataset to improve its performance on the gsm8k benchmark.

Step 1: Create the dataset

The two datasets we will be using are orca-math-word-problems-200k and gsm8k. Both are collections of word-based math problems and their answers. For convenience, we will download the datasets from Hugging Face, save them to disk, and then upload them to the nearai registry.

import re
from textwrap import dedent
from datasets import load_dataset, concatenate_datasets, DatasetDict
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

ds_math_word_problems = load_dataset("HuggingFaceH4/orca-math-word-problems-200k")
ds_gsm8k = load_dataset("openai/gsm8k", "main")['train']

# Build a 'question_and_answer' column by rendering each example with the chat template
def to_q_a(x):
    q_n_a = tokenizer.apply_chat_template(
        [
            {
                "role": "system",
                "content": "You are a helpful assistant. Please answer the math question."
            },
            {
                "role": "user",
                "content": x["question"]
            },
            {
                "role": "assistant",
                "content": x["answer"]
            }
        ],
        tokenize=False
    )
    return {
        "question_and_answer": q_n_a
    }
ds_math_word_problems = ds_math_word_problems.map(to_q_a)
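
For reference, apply_chat_template with tokenize=False renders each example into a single Llama 3 formatted string, roughly of the following shape (an abbreviated illustration, not verbatim output):

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant. Please answer the math question.<|eot_id|><|start_header_id|>user<|end_header_id|>

...question...<|eot_id|><|start_header_id|>assistant<|end_header_id|>

...answer...<|eot_id|>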

# gsm8k answers embed calculator annotations (e.g. <<48/2=24>>);
# strip them before applying the chat template.
def to_q_a_gsm8k(x):
    q_n_a = tokenizer.apply_chat_template(
        [
            {
                "role": "system",
                "content": "You are a helpful assistant. Please answer the math question."
            },
            {
                "role": "user",
                "content": x["question"]
            },
            {
                "role": "assistant",
                "content": re.sub(r"<<.*?>>", "", x["answer"])
            }
        ],
        tokenize=False
    )
    return {
        "question_and_answer": q_n_a
    }
ds_gsm8k = ds_gsm8k.map(to_q_a_gsm8k)
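
The re.sub call above strips gsm8k's inline calculator annotations. A quick illustration of what it does (the sample sentence is made up):

import re

# "<<...>>" spans are gsm8k calculator annotations; the non-greedy pattern
# removes each annotation while keeping the surrounding text.
print(re.sub(r"<<.*?>>", "", "She sold 48/2 = <<48/2=24>>24 clips in May."))
# -> "She sold 48/2 = 24 clips in May."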

# Combine the datasets, keeping only the formatted 'question_and_answer' column
ds_combined = concatenate_datasets([ds_math_word_problems['train_sft'], ds_gsm8k])
ds_combined = ds_combined.remove_columns([col for col in ds_combined.column_names if col != "question_and_answer"])

# Carve out a small held-out validation split from the training data
ds_combined = ds_combined.train_test_split(test_size=0.0001, seed=42)
ds_dict = DatasetDict({
    'train': ds_combined['train'],
    'validation': ds_combined['test']
})
ds_dict.save_to_disk("orca_math_gsm8k_train")
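
Before uploading, it can be worth spot-checking what was written to disk (a minimal sketch; load_from_disk is the standard datasets counterpart to save_to_disk):

from datasets import load_from_disk

# Reload the saved dataset and inspect one formatted example.
ds_check = load_from_disk("orca_math_gsm8k_train")
print(ds_check)                                    # split names and row counts
print(ds_check["train"][0]["question_and_answer"]) # one chat-templated sample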
Then upload the dataset directory to the nearai registry:

nearai registry upload ./orca_math_gsm8k_train

Step 2: Fine-tune the model

Under the hood, nearai uses torchtune to manage the fine-tuning process. To launch a fine-tuning job, use the nearai finetune start command.

Here is the command we used to fine-tune llama-3-8b-instruct on our combined orca-math-word-problems-200k and gsm8k dataset on an 8-GPU machine:

poetry run python3 -m nearai finetune start \
    --model llama-3-8b-instruct \
    --format llama3-8b \
    --tokenizer llama-3-8b-instruct/tokenizer.model \
    --dataset ./orca_math_gsm8k_train \
    --method nearai.finetune.text_completion.dataset \
    --column question_and_answer \
    --split train \
    --num_procs 8

To change the configuration of the fine-tuning job, edit etc/finetune/llama3-8b.yml.

Included in the output of the command is the path to the fine-tuned model checkpoint. In this case, the path was ~/.nearai/finetune/job-2024-08-29_20-58-08-207756662/checkpoint_output. Your path will differ, as each job directory is timestamped.

Step 3: Serve the fine-tuned model

To serve fine-tuned models, we use vllm. The command below serves the baseline model with the fine-tuned checkpoint attached as a LoRA adapter, so we can benchmark the two against each other.

poetry run python3 -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --enable-lora \
    --lora-modules mynewlora=<path_to_checkpoint> \
    --tensor-parallel-size 8
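
Once the server is up, you can optionally sanity-check it before benchmarking. Here is a minimal sketch using the openai client against vLLM's OpenAI-compatible endpoint; the base URL assumes vLLM's default host and port, and the sample question is made up:

from openai import OpenAI

# vLLM exposes an OpenAI-compatible API (default http://localhost:8000/v1).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="mynewlora",  # the adapter name registered via --lora-modules
    messages=[
        {"role": "system", "content": "You are a helpful assistant. Please answer the math question."},
        {"role": "user", "content": "A train travels 60 miles in 1.5 hours. What is its average speed?"},
    ],
)
print(resp.choices[0].message.content)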

Now we will run the gsm8k benchmark on both the baseline model and the fine-tuned model using nearai benchmark. The solvers call each model through the vllm server.

python3 -m nearai benchmark run \
    cmrfrd.near/gsm8k/0.0.2 \
    GSM8KSolverStrategy \
    --subset test \
    --model local::meta-llama/Meta-Llama-3-8B-Instruct

python3 -m nearai benchmark run \
    cmrfrd.near/gsm8k/0.0.2 \
    GSM8KSolverStrategy \
    --subset test \
    --model local::mynewlora

With both runs complete, we can compare the benchmark results:

# meta-llama/Meta-Llama-3-8B-Instruct
Correct/Seen - 1061/1319 - 80.44%

# fine-tuned llama-3-8b-instruct
Correct/Seen - 966/1319 - 73.24%

From these results, we can see that the fine-tuned model actually underperforms the baseline; the fine-tuning setup needs further improvement before it surpasses the baseline model.