Implement CLI for finetuning #85

Merged
merged 5 commits into from
Aug 24, 2023
60 changes: 30 additions & 30 deletions README.md
@@ -123,37 +123,37 @@ In this updated schema, we use the `Field` class from `pydantic` to add descript
!!! note "Code, schema, and prompt"
    We can run `openai_schema` to see exactly what the API will see. Notice how the docstrings, attributes, types, and field descriptions are now part of the schema. This illustrates one of this library's core philosophies.

```python hl_lines="2 3"
class UserDetails(OpenAISchema):
    "Correctly extracted user information"
    name: str = Field(..., description="User's full name")
    age: int

UserDetails.openai_schema
```

```json hl_lines="3 8"
{
  "name": "UserDetails",
  "description": "Correctly extracted user information",
  "parameters": {
    "type": "object",
    "properties": {
      "name": {
        "description": "User's full name",
        "type": "string"
      },
      "age": {
        "type": "integer"
      }
    },
    "required": [
      "age",
      "name"
    ]
  }
}
```
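
To make the idea concrete, here is a minimal sketch of how a class's docstring, type annotations, and field descriptions can be folded into a schema of the same shape. It is not how `OpenAISchema` is implemented: it uses a standard-library dataclass instead of `pydantic`, and `sketch_schema` and `JSON_TYPES` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field, fields
from typing import get_type_hints

# Illustrative mapping from a few Python types to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_schema(cls) -> dict:
    """Build a function-call style schema from a dataclass (hypothetical helper)."""
    hints = get_type_hints(cls)
    properties = {}
    for f in fields(cls):
        prop = {"type": JSON_TYPES[hints[f.name]]}
        # Field descriptions travel in the dataclass field metadata in this sketch.
        description = f.metadata.get("description")
        if description:
            prop["description"] = description
        properties[f.name] = prop
    return {
        "name": cls.__name__,
        "description": cls.__doc__,
        "parameters": {
            "type": "object",
            "properties": properties,
            # Every field in this sketch has no default, so all are required.
            "required": sorted(properties),
        },
    }


@dataclass
class UserDetails:
    "Correctly extracted user information"
    name: str = field(metadata={"description": "User's full name"})
    age: int = field(metadata={})


schema = sketch_schema(UserDetails)
```

The point of the sketch is only that the class definition is the single source of truth: changing the docstring or a field description changes what the model is told.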

### Section 3: Calling the ChatCompletion

89 changes: 89 additions & 0 deletions docs/finetune.md
@@ -0,0 +1,89 @@
# Using the Command Line Interface
The `instructor` CLI provides commands for creating, monitoring, and cancelling fine-tuning jobs on OpenAI, and for managing the files those jobs use.

## Creating a Fine-Tuning Job

### View Jobs Options

```sh
$ instructor jobs --help

Usage: instructor jobs [OPTIONS] COMMAND [ARGS]...

Monitor and create fine tuning jobs

Options:
--help Show this message and exit.

Commands:
cancel Cancel a fine-tuning job.
create-from-file Create a fine-tuning job from a file.
create-from-id Create a fine-tuning job from an existing ID.
list Monitor the status of the most recent fine-tuning jobs.
```

### Create from File

The `create-from-file` command uploads the file and trains a model in a single step:

```sh
$ instructor jobs create-from-file transformed_data.jsonl
```
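
Because a malformed training file only fails after it has been uploaded, it can help to check the JSONL locally first. The sketch below assumes each line is a chat-format record with a non-empty `messages` list, which is the format OpenAI documents for `gpt-3.5-turbo` fine-tuning; this page does not show the contents of `transformed_data.jsonl`, so treat that as an assumption. `find_jsonl_problems` and `VALID_ROLES` are hypothetical names and are not part of the `instructor` CLI.

```python
import json
from typing import List

# Roles assumed to be valid in chat-format training records.
VALID_ROLES = {"system", "user", "assistant", "function"}


def find_jsonl_problems(lines: List[str]) -> List[str]:
    """Return one human-readable problem per bad line (hypothetical helper)."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {number}: invalid JSON ({exc.msg})")
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append(f"line {number}: missing non-empty 'messages' list")
            continue
        for message in messages:
            if not isinstance(message, dict) or message.get("role") not in VALID_ROLES:
                problems.append(f"line {number}: message has an unknown role")
                break
    return problems
```

A typical use would be to read the file with `open(path).read().splitlines()` and only run `instructor jobs create-from-file` when the returned list is empty.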

### Create from ID

The `create-from-id` command trains a model from a file that has already been uploaded:

```sh
$ instructor files upload transformed_data.jsonl
$ instructor files list
...
$ instructor jobs create-from-id <file_id>
```


### Viewing Files and Jobs

#### Viewing Jobs

```sh
$ instructor jobs list

OpenAI Fine Tuning Job Monitoring
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ ┃ ┃ ┃ Completion ┃ ┃ ┃ ┃ ┃
┃ Job ID ┃ Status ┃ Creation Time ┃ Time ┃ Model Name ┃ File ID ┃ Epochs ┃ Base Model ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ ftjob-PWo6uwk… │ 🚫 cancelled │ 2023-08-23 │ N/A │ │ file-F7lJg6Z4… │ 3 │ gpt-3.5-turbo-… │
│ │ │ 23:10:54 │ │ │ │ │ │
│ ftjob-1whjva8… │ 🚫 cancelled │ 2023-08-23 │ N/A │ │ file-F7lJg6Z4… │ 3 │ gpt-3.5-turbo-… │
│ │ │ 22:47:05 │ │ │ │ │ │
│ ftjob-wGoBDld… │ 🚫 cancelled │ 2023-08-23 │ N/A │ │ file-F7lJg6Z4… │ 3 │ gpt-3.5-turbo-… │
│ │ │ 22:44:12 │ │ │ │ │ │
│ ftjob-yd5aRTc… │ ✅ succeeded │ 2023-08-23 │ 2023-08-23 │ ft:gpt-3.5-tur… │ file-IQxAUDqX… │ 3 │ gpt-3.5-turbo-… │
│ │ │ 14:26:03 │ 15:02:29 │ │ │ │ │
└────────────────┴──────────────┴────────────────┴────────────────┴─────────────────┴────────────────┴────────┴─────────────────┘
Automatically refreshes every 5 seconds, press Ctrl+C to exit
```


#### Viewing Files

```sh
$ instructor files list

OpenAI Files
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┓
┃ File ID ┃ Size (bytes) ┃ Creation Time ┃ Filename ┃ Purpose ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━┩
│ file-0lw2BSNRUlXZXRRu2beCCWjl │ 369523 │ 2023-08-23 23:31:57 │ file │ fine-tune │
│ file-IHaUXcMEykmFUp1kt2puCDEq │ 369523 │ 2023-08-23 23:09:35 │ file │ fine-tune │
│ file-ja9vRBf0FydEOTolaa3BMqES │ 369523 │ 2023-08-23 22:42:29 │ file │ fine-tune │
│ file-F7lJg6Z47CREvmx4kyvyZ6Sn │ 369523 │ 2023-08-23 22:42:03 │ file │ fine-tune │
│ file-YUxqZPyJRl5GJCUTw3cNmA46 │ 369523 │ 2023-08-23 22:29:10 │ file │ fine-tune │
└───────────────────────────────┴──────────────┴─────────────────────┴──────────┴───────────┘
```
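
The listing above is ordered newest first and shows a readable creation time. A minimal sketch of that ordering and formatting, working on plain dictionaries with `id` and `created_at` keys, might look like the following. The helper names are hypothetical, and the sketch formats timestamps in UTC so the result is reproducible, whereas the CLI output above may reflect local time.

```python
from datetime import datetime, timezone
from typing import Dict, List


def newest_first(files: List[Dict], limit: int = 5) -> List[Dict]:
    """Sort file records by creation time, newest first, and keep `limit` of them."""
    return sorted(files, key=lambda f: f["created_at"], reverse=True)[:limit]


def format_created_at(timestamp: int) -> str:
    """Format a Unix timestamp as 'YYYY-MM-DD HH:MM:SS' in UTC."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


sample = [
    {"id": "file-a", "created_at": 100},
    {"id": "file-b", "created_at": 300},
    {"id": "file-c", "created_at": 200},
]
ordered_ids = [f["id"] for f in newest_first(sample, limit=2)]
```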


## Conclusion
The `instructor` CLI offers a small set of commands for managing OpenAI fine-tuning jobs and their files: you can create, monitor, and cancel jobs, and upload, list, download, and delete files. To explore further options and parameters, pass the `--help` flag to any command.
Empty file added instructor/cli/__init__.py
Empty file.
11 changes: 11 additions & 0 deletions instructor/cli/cli.py
@@ -0,0 +1,11 @@
import typer
import instructor.cli.jobs as jobs
import instructor.cli.files as files

app = typer.Typer(
    name="instructor-ft",
    help="A CLI for fine-tuning OpenAI's models",
)

app.add_typer(jobs.app, name="jobs", help="Monitor and create fine tuning jobs")
app.add_typer(files.app, name="files", help="Manage files on OpenAI's servers")
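
The two `add_typer` calls mount each sub-application as a command group, which is what gives the `instructor jobs ...` and `instructor files ...` layout. For readers unfamiliar with Typer, the same nesting can be sketched with the standard library's `argparse`. This is an analogy, not the PR's implementation; `build_parser` is a hypothetical name, and only one command per group is wired up.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Two-level command layout: <group> <command> [args] (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="instructor", description="A CLI for fine-tuning OpenAI's models"
    )
    groups = parser.add_subparsers(dest="group", required=True)

    # First group: jobs
    jobs = groups.add_parser("jobs", help="Monitor and create fine tuning jobs")
    job_commands = jobs.add_subparsers(dest="command", required=True)
    job_commands.add_parser("list", help="Monitor the most recent fine-tuning jobs")

    # Second group: files
    files = groups.add_parser("files", help="Manage files on OpenAI's servers")
    file_commands = files.add_subparsers(dest="command", required=True)
    upload = file_commands.add_parser("upload", help="Upload a file")
    upload.add_argument("filepath")
    upload.add_argument("--purpose", default="fine-tune")
    upload.add_argument("--poll", type=int, default=5)
    return parser


args = build_parser().parse_args(["files", "upload", "data.jsonl", "--poll", "2"])
```

Typer removes most of this boilerplate by deriving arguments and options from function signatures, which is presumably why the PR uses it.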
123 changes: 123 additions & 0 deletions instructor/cli/files.py
@@ -0,0 +1,123 @@
from typing import List
from typing_extensions import Annotated
from rich.live import Live
from rich.table import Table
from rich.spinner import Spinner
from rich.console import Console

from datetime import datetime
import openai
import typer
import time

app = typer.Typer()
console = Console()


def generate_file_table(files: List[openai.File]) -> Table:
    """Render file metadata returned by the OpenAI API as a rich table."""
    table = Table(
        title="OpenAI Files",
    )
    table.add_column("File ID", style="dim")
    table.add_column("Size (bytes)", justify="right")
    table.add_column("Creation Time")
    table.add_column("Filename")
    table.add_column("Purpose")

    for file in files:
        table.add_row(
            file["id"],
            str(file["bytes"]),
            # created_at is a Unix timestamp; shown in local time
            str(datetime.fromtimestamp(file["created_at"])),
            file["filename"],
            file["purpose"],
        )

    return table


def get_files(limit: int = 5) -> List[openai.File]:
    files = openai.File.list(limit=limit)["data"]  # type: ignore
    # Newest first, then truncate to the requested number of entries
    files = sorted(files, key=lambda x: x["created_at"], reverse=True)
    return files[:limit]


def get_file_status(file_id: str) -> str:
    response = openai.File.retrieve(file_id)
    return response["status"]


@app.command(
    help="Upload a file to OpenAI's servers, will monitor the upload status until it is processed",
)
def upload(
    filepath: str = typer.Argument(..., help="Path to the file to upload"),
    purpose: str = typer.Option("fine-tune", help="Purpose of the file"),
    poll: int = typer.Option(5, help="Polling interval in seconds"),
):
    with open(filepath, "rb") as file:
        response = openai.File.create(file=file, purpose=purpose)
    file_id = response["id"]
    # "dots" is a spinner name, so pass it as `spinner` rather than `spinner_style`
    with console.status(f"Monitoring upload: {file_id}...", spinner="dots"):
        # Poll until OpenAI reports the file as processed
        while True:
            file_status = get_file_status(file_id)
            if file_status == "processed":
                console.log(f"[bold green]File {file_id} uploaded successfully!")
                break
            time.sleep(poll)


@app.command(
    help="Download a file from OpenAI's servers",
)
def download(
    file_id: str = typer.Argument(..., help="ID of the file to download"),
    output: str = typer.Argument(..., help="Output path for the downloaded file"),
):
    with console.status(
        f"[bold green]Downloading file {file_id}...", spinner="dots"
    ) as status:
        content = openai.File.download(file_id)
        with open(output, "wb") as file:
            file.write(content)
        console.log(f"[bold green]File {file_id} downloaded successfully!")


@app.command(
    help="Delete a file from OpenAI's servers",
)
def delete(file_id: str = typer.Argument(..., help="ID of the file to delete")):
    with console.status(
        f"[bold red]Deleting file {file_id}...", spinner="dots"
    ) as status:
        try:
            openai.File.delete(file_id)
            console.log(f"[bold red]File {file_id} deleted successfully!")
        except Exception as e:
            console.log(f"[bold red]Error deleting file {file_id}: {e}")
            return


@app.command(
    help="Monitor the status of a file on OpenAI's servers",
)
def status(
    file_id: str = typer.Argument(..., help="ID of the file to check the status of")
):
    with console.status(f"Monitoring status of file {file_id}...") as status:
        while True:
            file_status = get_file_status(file_id)
            status.update(f"File status: {file_status}")
            if file_status in ["pending", "processed"]:
                break
            time.sleep(5)


@app.command(
    help="List the files on OpenAI's servers",
)
def list(limit: int = typer.Option(5, help="Limit the number of files to list")):
    files = get_files(limit=limit)
    console.log(generate_file_table(files))
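
The `upload` command polls until the file reports `processed`, so a file that never reaches that status would keep the loop running. One way to bound such a loop is sketched below. `poll_until` is a hypothetical helper, not part of this PR; the `"error"` status value and the 600-second default timeout are assumptions, and `sleep` and `clock` are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Iterable, Optional


def poll_until(
    fetch_status: Callable[[], str],
    done: Iterable[str] = ("processed",),
    failed: Iterable[str] = ("error",),
    interval: float = 5.0,
    timeout: Optional[float] = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status until it returns a status in `done`.

    Raises RuntimeError on a status in `failed` and TimeoutError once
    `timeout` seconds have elapsed (pass timeout=None to wait forever).
    """
    done, failed = set(done), set(failed)
    deadline = None if timeout is None else clock() + timeout
    while True:
        status = fetch_status()
        if status in done:
            return status
        if status in failed:
            raise RuntimeError(f"terminal status: {status}")
        if deadline is not None and clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(interval)
```

Inside `upload`, a call such as `poll_until(lambda: get_file_status(file_id), interval=poll)` could stand in for the `while True` loop, with the success message logged after it returns.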