updated docs · vedpatwardhan committed Sep 2, 2024 · 1 parent ed2f4df · commit 221d657
python/dataset.mdx

---
title: 'dataset'
---

<a id="dataset.Dataset"></a>

## Dataset

```python
class Dataset()
```

<a id="dataset.Dataset.__init__"></a>

---

### \_\_init\_\_

```python
def __init__(name: Optional[str] = None,
             queries: Optional[List[Union[str, ChatCompletion]]] = None,
             extra_fields: Optional[Dict[str, List[Any]]] = None,
             data: Optional[List[Dict[str, Union[ChatCompletion, Any]]]] = None,
             auto_sync: bool = False,
             api_key: Optional[str] = None)
```

Initialize a local dataset of LLM queries.

**Arguments**:

- `name` - The name of the dataset.

- `queries` - List of LLM queries to initialize the dataset with.

- `extra_fields` - Dictionary of lists for arbitrary extra fields contained
within the dataset.

- `data` - If neither `queries` nor `extra_fields` are specified, this can
be specified instead: a list of dicts, each with "query" as its first key
and the remaining keys coming from `extra_fields`. This is the internal
representation used by the class.

- `auto_sync` - Whether to automatically keep this dataset fully synchronized
with the upstream variant at all times.

- `api_key` - API key for accessing the Unify API. If None, it attempts to
retrieve the API key from the environment variable UNIFY_KEY. Defaults to
None.


**Raises**:

- `UnifyError` - If the API key is missing.
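
The relationship between the `queries`/`extra_fields` form and the `data` form described above can be illustrated with plain Python (the "difficulty" field name here is hypothetical, purely for illustration):

```python
# Sketch of the internal `data` representation: one dict per entry,
# with "query" as the first key and the remaining keys drawn from
# `extra_fields`. The "difficulty" field is hypothetical.
queries = ["What is the capital of Spain?", "Explain photosynthesis."]
extra_fields = {"difficulty": ["easy", "medium"]}

data = [
    {"query": q, **{k: v[i] for k, v in extra_fields.items()}}
    for i, q in enumerate(queries)
]
```

Either form (`queries` plus `extra_fields`, or the equivalent `data` list) describes the same dataset.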

<a id="dataset.Dataset.from_upstream"></a>

---

### from\_upstream

```python
@staticmethod
def from_upstream(name: str,
auto_sync: bool = False,
api_key: Optional[str] = None)
```

Initialize a local dataset of LLM queries from the upstream dataset.

**Arguments**:

- `name` - The name of the dataset.

- `auto_sync` - Whether to automatically keep this dataset fully synchronized
with the upstream variant at all times.

- `api_key` - API key for accessing the Unify API. If None, it attempts to
retrieve the API key from the environment variable UNIFY_KEY. Defaults to
None.


**Raises**:

- `UnifyError` - If the API key is missing.

<a id="dataset.Dataset.from_file"></a>

---

### from\_file

```python
@staticmethod
def from_file(filepath: str,
              name: Optional[str] = None,
              auto_sync: bool = False,
              api_key: Optional[str] = None)
```

Loads the dataset from a local .jsonl file.

**Arguments**:

- `filepath` - Filepath (.jsonl) to load the dataset from.

- `name` - The name of the dataset.

- `auto_sync` - Whether to automatically keep this dataset fully synchronized
with the upstream variant at all times.

- `api_key` - API key for accessing the Unify API. If None, it attempts to
retrieve the API key from the environment variable UNIFY_KEY. Defaults to
None.
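
The expected layout is standard JSON Lines: one JSON object per line, each holding a "query" key plus any extra fields. A minimal round-trip sketch using only the standard library (the "label" field is a hypothetical extra field):

```python
import json
import os
import tempfile

# Two entries with a hypothetical "label" extra field.
entries = [
    {"query": "What is 2 + 2?", "label": "math"},
    {"query": "Name a prime number.", "label": "math"},
]

# Write one JSON object per line (.jsonl).
path = os.path.join(tempfile.mkdtemp(), "dataset.jsonl")
with open(path, "w") as f:
    for entry in entries:
        f.write(json.dumps(entry) + "\n")

# Reading the file back recovers the same list of dicts.
with open(path) as f:
    loaded = [json.loads(line) for line in f]
```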

<a id="dataset.Dataset.upload"></a>

---

### upload

```python
def upload(overwrite: bool = False)
```

Uploads all unique local data in the dataset to the user account upstream.
This function will not download any uniques from upstream.
Use `sync` to synchronize and superset the datasets in both directions.
Set `overwrite=True` to disregard any pre-existing upstream data.

**Arguments**:

- `overwrite` - Whether to overwrite the upstream dataset if it already exists.
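
The direction of data flow can be sketched with plain lists of queries (a conceptual model only; the real method talks to the Unify API):

```python
def upload(local, upstream, overwrite=False):
    # Conceptual model of `upload`: push local uniques upstream,
    # never pull upstream uniques back down.
    if overwrite:
        return list(local)  # disregard pre-existing upstream data
    return upstream + [q for q in local if q not in upstream]

local = ["q1", "q2", "q3"]
upstream = ["q2", "q4"]
merged = upload(local, upstream)
replaced = upload(local, upstream, overwrite=True)
```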

<a id="dataset.Dataset.download"></a>

---

### download

```python
def download(overwrite: bool = False)
```

Downloads all unique upstream data from the user account to the local dataset.
This function will not upload any unique values stored locally.
Use `sync` to synchronize and superset the datasets in both directions.
Set `overwrite=True` to disregard any pre-existing data stored in this class.

**Arguments**:

- `overwrite` - Whether to overwrite the local data, if any already exists.
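
The mirror image of `upload` can likewise be sketched with plain lists (conceptual only; the real method talks to the Unify API):

```python
def download(local, upstream, overwrite=False):
    # Conceptual model of `download`: pull upstream uniques into the
    # local copy, never push local uniques up.
    if overwrite:
        return list(upstream)  # disregard pre-existing local data
    return local + [q for q in upstream if q not in local]

local = ["q1", "q2"]
upstream = ["q2", "q3"]
merged = download(local, upstream)
replaced = download(local, upstream, overwrite=True)
```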

<a id="dataset.Dataset.sync"></a>

---

### sync

```python
def sync()
```

Synchronize the dataset in both directions, downloading any values missing
locally, and uploading any values missing from upstream in the account.
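
The two-way merge can be sketched as follows (a conceptual model: after syncing, both sides hold the superset of queries):

```python
def sync(local, upstream):
    # Conceptual model of `sync`: download uniques missing locally and
    # upload uniques missing upstream, so both sides end up equal.
    merged_local = local + [q for q in upstream if q not in local]
    merged_upstream = upstream + [q for q in local if q not in upstream]
    return merged_local, merged_upstream

local, upstream = sync(["q1", "q2"], ["q2", "q3"])
```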

<a id="dataset.Dataset.upstream_diff"></a>

---

### upstream\_diff

```python
def upstream_diff()
```

Prints the difference between the local dataset and the upstream dataset.

<a id="dataset.Dataset.save_to_file"></a>

---

### save\_to\_file

```python
def save_to_file(filepath: str)
```

Saves the dataset to a local .jsonl file.

**Arguments**:

- `filepath` - Filepath (.jsonl) to save the dataset to.

<a id="dataset.Dataset.add"></a>

---

### add

```python
def add(other: "Dataset")
```

Adds another dataset to this one, returning a new Dataset instance that
receives all unique queries from the added dataset.

**Arguments**:

- `other` - The other dataset being added to this one.

<a id="dataset.Dataset.sub"></a>

---

### sub

```python
def sub(other: "Dataset")
```

Subtracts another dataset from this one, returning a new Dataset instance
with all queries from the subtracted dataset removed.

**Arguments**:

- `other` - The other dataset being subtracted from this one.
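
The set semantics of `add` and `sub` can be sketched on plain query lists (conceptual only; the real methods operate on Dataset instances):

```python
def add(a, b):
    # Union of unique queries: everything in `a`, plus uniques from `b`.
    return a + [q for q in b if q not in a]

def sub(a, b):
    # Difference: drop any query that also appears in `b`.
    return [q for q in a if q not in b]

a = ["q1", "q2"]
b = ["q2", "q3"]
combined = add(a, b)
remaining = sub(a, b)
```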

<a id="dataset.Dataset.__iadd__"></a>

---

### \_\_iadd\_\_

```python
def __iadd__(other)
```

Adds another dataset to this one, with this dataset receiving all unique queries
from the other added dataset.

**Arguments**:

- `other` - The other dataset being added to this one.

<a id="dataset.Dataset.__isub__"></a>

---

### \_\_isub\_\_

```python
def __isub__(other)
```

Subtracts another dataset from this one, with this dataset losing all queries
from the other subtracted dataset.

**Arguments**:

- `other` - The other dataset being subtracted from this one.

<a id="exceptions"></a>
