great-luao/my_min_gpt


MY_MIN_GPT

Introduction

This repository contains the implementation of my_min_gpt, inspired by the YouTube video: "Let's build GPT: from scratch, in code, spelled out".

Some of the code structure is derived from the video. What's different is that my input data only consists of 5,000 movie reviews, each labeled with a sentiment (positive or negative).

Objective

The goal is to enable my GPT model to complete a movie review given prompts like "好看" (good) or "难看" (bad).

Branches

Currently, there are two branches in the repository. Both branches use the same model but differ in how the data is handled during training.

  • The master branch combines all reviews together, treating them as one large text (like a book).
  • The hugging_face branch processes each review individually, with a different tokenization approach. This branch is still under construction.

Results So Far

Using my favorite custom small model (I have trained several) with the following configuration:

  • 9M parameters
  • 12 blocks
  • 12 heads
  • 252 embedding dimensions
  • 64 block size
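As a sanity check on the configuration above, the listed numbers are consistent with each other: a standard transformer block carries roughly 12·n_embd² weights (about 4·n_embd² for attention and 8·n_embd² for the MLP), so 12 blocks at 252 embedding dimensions land near the stated 9M parameters. A minimal sketch, using nanoGPT-style hyperparameter names that may differ from this repo's train.py:

```python
# Hypothetical config mirroring the README's numbers; the variable names
# follow common nanoGPT-style conventions, not necessarily this repo's.
n_layer = 12      # 12 blocks
n_head = 12       # 12 attention heads
n_embd = 252      # embedding dimension
block_size = 64   # context length

# Rough estimate: ~4*n_embd^2 (attention) + ~8*n_embd^2 (MLP) per block,
# i.e. ~12*n_embd^2 weights per block, ignoring embeddings and biases.
approx_params = 12 * n_embd**2 * n_layer
print(f"~{approx_params / 1e6:.1f}M parameters")  # ~9.1M, matching the README
```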

The corresponding loss curves (train/loss and val/loss) are shown in the repository.

I have achieved the following results:

Given the prompts "好看" (good) and "难看" (bad), the model outputs 10 reviews for each:

Output for "好看"

Output for "难看"

The reproduction of negative reviews seems less accurate than that of positive ones. This might be due to an imbalance in the training data, and I am actively working on improving it.

Building GPT

To produce my own GPT model, the following files are essential:

  • gpt_model.py: Defines the architecture of the GPT model.
  • train.py: Handles the training process.

Additionally, data/prepare.py is used to preprocess the training data (CSV file), and it's straightforward to understand.
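The master branch's "one large text" approach described above can be sketched as follows. This is a hypothetical illustration, not the repo's actual prepare.py: the CSV column names (`review`, `label`) are assumptions, and a tiny inline CSV stands in for the real 5,000-review file.

```python
import csv
import io

# Hypothetical stand-in for the real CSV of 5,000 labeled reviews;
# the column names "review" and "label" are assumptions.
csv_text = "review,label\n好看，非常好看,positive\n难看，剧情无聊,negative\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Concatenate all reviews into one large text, like a book (master branch).
data = "\n".join(row["review"] for row in rows)

# Build a character-level vocabulary with encode/decode mappings.
chars = sorted(set(data))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}
encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

assert decode(encode("好看")) == "好看"  # round-trip check
```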

To train the model yourself, follow these steps:

cd data
python prepare.py
cd ..
python train.py

Ensure that each file runs correctly, and configure the hyperparameters before starting the training.
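One hyperparameter worth understanding before training is the block size: the model only ever sees windows of 64 tokens, with targets shifted one position ahead. A minimal sketch of the batch sampling a train.py like this typically performs (pure Python with toy data; the real code presumably uses tensors):

```python
import random

block_size = 8                 # toy value; the README's model uses 64
data = list(range(100))        # stand-in for the encoded review text

def get_batch(data, block_size, batch_size, rng=random):
    """Sample random windows of block_size tokens; targets are shifted by one."""
    xs, ys = [], []
    for _ in range(batch_size):
        i = rng.randrange(len(data) - block_size)
        xs.append(data[i : i + block_size])          # inputs
        ys.append(data[i + 1 : i + 1 + block_size])  # next-token targets
    return xs, ys

xs, ys = get_batch(data, block_size, batch_size=4)
```

Each target sequence is the input sequence shifted left by one token, which is what lets a single forward pass supervise every position at once.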

Inference

To generate text with the trained model, use inference.py. Set the model path in the file, then run the file with a command like this:

python inference.py \
    --start="好看" \
    --num_samples=5 --max_new_tokens=100
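Under the hood, generation with such a model is an autoregressive loop: repeatedly crop the context to the last block_size tokens, ask the model for the next token, and append it. A sketch under assumptions (the real inference.py would call the trained GPT; `dummy_next_token` here is a hypothetical placeholder for that forward pass):

```python
import random

block_size = 64  # the model's context length, per the README

def dummy_next_token(context, rng):
    # Hypothetical placeholder for a real model forward pass plus
    # softmax sampling over the vocabulary.
    return rng.randrange(100)

def generate(start_ids, max_new_tokens, rng=None):
    """Append max_new_tokens tokens, one at a time, to the prompt ids."""
    rng = rng or random.Random(0)
    ids = list(start_ids)
    for _ in range(max_new_tokens):
        context = ids[-block_size:]  # crop to the model's context window
        ids.append(dummy_next_token(context, rng))
    return ids

out = generate([1, 2], max_new_tokens=10)  # e.g. encoded "好看" as the prompt
```

With `--num_samples=5`, this loop would simply run five times with different random draws.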

To-Do

  • Improve the performance of negative sentiment generation.
  • Finalize the hugging_face branch.

References

YouTube Videos:

  • "Let's build GPT: from scratch, in code, spelled out"

Papers:

  • Language Models Are Unsupervised Multitask Learners (GPT-2 paper)
  • Improving Language Understanding by Generative Pre-Training (GPT-1 paper)
  • Attention Is All You Need (Transformer paper)

GitHub Repos:
