datasets examples in shell/python
tomatillos committed Aug 28, 2024
1 parent c814a73 commit 3212e99

benchmarking/datasets.mdx

This data is especially important, as it represents the *true distribution* observed
before deployment.

It's easy to extract any prompt queries previously made to the API,
via the [`prompt_history`](benchmarks/get_prompt_history) endpoint, as explained [here]().
For example, the last 100 prompts with the tag `physics` can be extracted as follows:

```python
physics_prompts = client.prompt_history(tag="physics", limit=100)
```

We can then add this to the local `.jsonl` file as follows:

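A minimal sketch (assuming each returned prompt is a dict matching the dataset schema, and using an illustrative `physics.jsonl` filename):

```python
import json

# append the extracted prompts to the local dataset file,
# writing one JSON object per line (the .jsonl convention)
with open("physics.jsonl", "a") as f:
    for prompt in physics_prompts:
        f.write(json.dumps(prompt) + "\n")
```
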
## Uploading Datasets

As shown above, the representation for prompt datasets is `.jsonl`,
which is a file format where each line is a JSON object (or, in Python terms, a list of dicts).
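
For example, a couple of lines from an `english_language.jsonl` file might look like this (the `prompt` field name is illustrative, not a required schema):

```
{"prompt": "Correct the grammar: 'Me and him goes to school.'"}
{"prompt": "Explain the difference between 'affect' and 'effect'."}
```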

Let's upload our `english_language.jsonl` dataset.

We can do this via the REST API as follows:

```shell
curl --request POST \
--url 'https://api.unify.ai/v0/dataset' \
--header 'Authorization: Bearer <UNIFY_KEY>' \
--header 'Content-Type: multipart/form-data' \
--form 'file=@english_language.jsonl' \
--form 'name=english_language'
```

Or we can create a `Dataset` instance in Python and upload it directly:
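
The exact Python call is elided in this excerpt, but a minimal sketch might look like this (the `unify` import, the `Dataset.from_file` constructor, and the `upload` method are assumptions rather than confirmed API):

```python
import unify

# hypothetical: wrap the local .jsonl file in a Dataset and push it
dataset = unify.Dataset.from_file("english_language.jsonl", name="english_language")
dataset.upload()
```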

## Deleting Datasets

We can delete the dataset just as easily as we created it.

First, using the REST API:

```shell
curl --request DELETE \
--url 'https://api.unify.ai/v0/dataset?name=english_language' \
--header 'Authorization: Bearer <UNIFY_KEY>'
```

Or via Python:

```python
client.datasets.delete(name="english_language")
```

## Listing Datasets

We can retrieve a list of our uploaded datasets using the `/dataset/list` endpoint.

```shell
curl --request GET \
--url 'https://api.unify.ai/v0/dataset/list' \
--header 'Authorization: Bearer <UNIFY_KEY>'
```

Or via Python:

```python
datasets = client.datasets.list()
print(datasets)
```


## Renaming Datasets
Let's imagine we have two datasets, named `english`
and `english language`.
We can easily rename the dataset without deleting and re-uploading,
via the following REST API command:

```shell
curl --request POST \
--url 'https://api.unify.ai/v0/dataset/rename?name=english&new_name=english_literature' \
--header 'Authorization: Bearer <UNIFY_KEY>'
```

Or via Python:

```python
client.datasets.rename(name="english", new_name="english_literature")
```

## Appending to Datasets

As explained above, we might want to add to an existing dataset, either because we have
[generated some synthetic examples](), or perhaps because we have some relevant
[production traffic](datasets#production-data).

In the examples above, we simply appended to these datasets locally,
before then uploading the full `.jsonl` file. However,
