updated docs · vedpatwardhan committed Sep 2, 2024 · 1 parent ed2f4df · commit 221d657
python/dataset.mdx

---
title: 'dataset'
---

<a id="dataset.Dataset"></a>

## Dataset

```python
class Dataset()
```

<a id="dataset.Dataset.__init__"></a>

---

### \_\_init\_\_

```python
def __init__(name: Optional[str] = None,
             queries: Optional[List[Union[str, ChatCompletion]]] = None,
             extra_fields: Optional[Dict[str, List[Any]]] = None,
             data: Optional[List[Dict[str, Union[ChatCompletion, Any]]]] = None,
             auto_sync: bool = False,
             api_key: Optional[str] = None)
```

Initialize a local dataset of LLM queries.

**Arguments**:

- `name` - The name of the dataset.

- `queries` - List of LLM queries to initialize the dataset with.

- `extra_fields` - Dictionary of lists for arbitrary extra fields contained
within the dataset.

- `data` - If neither `queries` nor `extra_fields` are specified, this can
be specified instead: a list of dicts, each with "query" as its first key
and the remaining keys coming from `extra_fields`. This is the internal
representation used by the class.

- `auto_sync` - Whether to automatically keep this dataset fully synchronized
with the upstream variant at all times.

- `api_key` - API key for accessing the Unify API. If None, it attempts to
retrieve the API key from the environment variable UNIFY_KEY. Defaults to
None.


**Raises**:

- `UnifyError` - If the API key is missing.
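
The relationship between the `queries`/`extra_fields` form and the `data` form described above can be illustrated with plain Python (the "difficulty" field name here is hypothetical, purely for illustration):

```python
# Sketch of the internal `data` representation: one dict per entry,
# with "query" as the first key and the remaining keys drawn from
# `extra_fields`. The "difficulty" field is hypothetical.
queries = ["What is the capital of Spain?", "Explain photosynthesis."]
extra_fields = {"difficulty": ["easy", "medium"]}

data = [
    {"query": q, **{k: v[i] for k, v in extra_fields.items()}}
    for i, q in enumerate(queries)
]
```

Either form (`queries` plus `extra_fields`, or the equivalent `data` list) describes the same dataset.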

<a id="dataset.Dataset.from_upstream"></a>

---

### from\_upstream

```python
@staticmethod
def from_upstream(name: str,
auto_sync: bool = False,
api_key: Optional[str] = None)
```

Initialize a local dataset of LLM queries from the upstream dataset.

**Arguments**:

- `name` - The name of the dataset.

- `auto_sync` - Whether to automatically keep this dataset fully synchronized
with the upstream variant at all times.

- `api_key` - API key for accessing the Unify API. If None, it attempts to
retrieve the API key from the environment variable UNIFY_KEY. Defaults to
None.


**Raises**:

- `UnifyError` - If the API key is missing.

<a id="dataset.Dataset.from_file"></a>

---

### from\_file

```python
@staticmethod
def from_file(filepath: str,
              name: Optional[str] = None,
              auto_sync: bool = False,
              api_key: Optional[str] = None)
```

Loads the dataset from a local .jsonl file.

**Arguments**:

- `filepath` - Filepath (.jsonl) to load the dataset from.

- `name` - The name of the dataset.

- `auto_sync` - Whether to automatically keep this dataset fully synchronized
with the upstream variant at all times.

- `api_key` - API key for accessing the Unify API. If None, it attempts to
retrieve the API key from the environment variable UNIFY_KEY. Defaults to
None.
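
The expected layout is standard JSON Lines: one JSON object per line, each holding a "query" key plus any extra fields. A minimal round-trip sketch using only the standard library (the "label" field is a hypothetical extra field):

```python
import json
import os
import tempfile

# Two entries with a hypothetical "label" extra field.
entries = [
    {"query": "What is 2 + 2?", "label": "math"},
    {"query": "Name a prime number.", "label": "math"},
]

# Write one JSON object per line (.jsonl).
path = os.path.join(tempfile.mkdtemp(), "dataset.jsonl")
with open(path, "w") as f:
    for entry in entries:
        f.write(json.dumps(entry) + "\n")

# Reading the file back recovers the same list of dicts.
with open(path) as f:
    loaded = [json.loads(line) for line in f]
```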

<a id="dataset.Dataset.upload"></a>

---

### upload

```python
def upload(overwrite: bool = False)
```

Uploads all unique local data in the dataset to the user account upstream.
This function will not download any uniques from upstream.
Use `sync` to synchronize and superset the datasets in both directions.
Set `overwrite=True` to disregard any pre-existing upstream data.

**Arguments**:

- `overwrite` - Whether to overwrite the upstream dataset if it already exists.
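
The direction of data flow can be sketched with plain lists of queries (a conceptual model only; the real method talks to the Unify API):

```python
def upload(local, upstream, overwrite=False):
    # Conceptual model of `upload`: push local uniques upstream,
    # never pull upstream uniques back down.
    if overwrite:
        return list(local)  # disregard pre-existing upstream data
    return upstream + [q for q in local if q not in upstream]

local = ["q1", "q2", "q3"]
upstream = ["q2", "q4"]
merged = upload(local, upstream)
replaced = upload(local, upstream, overwrite=True)
```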

<a id="dataset.Dataset.download"></a>

---

### download

```python
def download(overwrite: bool = False)
```

Downloads all unique upstream data from the user account to the local dataset.
This function will not upload any unique values stored locally.
Use `sync` to synchronize and superset the datasets in both directions.
Set `overwrite=True` to disregard any pre-existing data stored in this class.

**Arguments**:

- `overwrite` - Whether to overwrite the local data, if any already exists.
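
The mirror image of `upload` can likewise be sketched with plain lists (conceptual only; the real method talks to the Unify API):

```python
def download(local, upstream, overwrite=False):
    # Conceptual model of `download`: pull upstream uniques into the
    # local copy, never push local uniques up.
    if overwrite:
        return list(upstream)  # disregard pre-existing local data
    return local + [q for q in upstream if q not in local]

local = ["q1", "q2"]
upstream = ["q2", "q3"]
merged = download(local, upstream)
replaced = download(local, upstream, overwrite=True)
```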

<a id="dataset.Dataset.sync"></a>

---

### sync

```python
def sync()
```

Synchronize the dataset in both directions, downloading any values missing
locally, and uploading any values missing from upstream in the account.
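
The two-way merge can be sketched as follows (a conceptual model: after syncing, both sides hold the superset of queries):

```python
def sync(local, upstream):
    # Conceptual model of `sync`: download uniques missing locally and
    # upload uniques missing upstream, so both sides end up equal.
    merged_local = local + [q for q in upstream if q not in local]
    merged_upstream = upstream + [q for q in local if q not in upstream]
    return merged_local, merged_upstream

local, upstream = sync(["q1", "q2"], ["q2", "q3"])
```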

<a id="dataset.Dataset.upstream_diff"></a>

---

### upstream\_diff

```python
def upstream_diff()
```

Prints the difference between the local dataset and the upstream dataset.

<a id="dataset.Dataset.save_to_file"></a>

---

### save\_to\_file

```python
def save_to_file(filepath: str)
```

Saves the dataset to a local .jsonl file.

**Arguments**:

- `filepath` - Filepath (.jsonl) to save the dataset to.

<a id="dataset.Dataset.add"></a>

---

### add

```python
def add(other: "Dataset")
```

Adds another dataset to this one, returning a new Dataset instance that
receives all unique queries from the added dataset.

**Arguments**:

- `other` - The other dataset being added to this one.

<a id="dataset.Dataset.sub"></a>

---

### sub

```python
def sub(other: "Dataset")
```

Subtracts another dataset from this one, returning a new Dataset instance
with all queries from the subtracted dataset removed.

**Arguments**:

- `other` - The other dataset being subtracted from this one.
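
The set semantics of `add` and `sub` can be sketched on plain query lists (conceptual only; the real methods operate on Dataset instances):

```python
def add(a, b):
    # Union of unique queries: everything in `a`, plus uniques from `b`.
    return a + [q for q in b if q not in a]

def sub(a, b):
    # Difference: drop any query that also appears in `b`.
    return [q for q in a if q not in b]

a = ["q1", "q2"]
b = ["q2", "q3"]
combined = add(a, b)
remaining = sub(a, b)
```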

<a id="dataset.Dataset.__iadd__"></a>

---

### \_\_iadd\_\_

```python
def __iadd__(other)
```

Adds another dataset to this one, with this dataset receiving all unique queries
from the other added dataset.

**Arguments**:

- `other` - The other dataset being added to this one.

<a id="dataset.Dataset.__isub__"></a>

---

### \_\_isub\_\_

```python
def __isub__(other)
```

Subtracts another dataset from this one, with this dataset losing all queries
from the other subtracted dataset.

**Arguments**:

- `other` - The other dataset being subtracted from this one.

<a id="exceptions"></a>
