Docs: Clean up readme and module docs (#316)
aaronsteers authored Jul 30, 2024
1 parent aee6273 commit 1bcb440
Showing 6 changed files with 228 additions and 58 deletions.
61 changes: 7 additions & 54 deletions README.md
@@ -5,19 +5,8 @@ PyAirbyte brings the power of Airbyte to every Python developer. PyAirbyte provi
[![PyPI version](https://badge.fury.io/py/airbyte.svg)](https://badge.fury.io/py/airbyte)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/airbyte)](https://pypi.org/project/airbyte/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/airbyte)](https://pypi.org/project/airbyte/)
<!-- [![PyPI - License](https://img.shields.io/pypi/l/airbyte)](https://pypi.org/project/airbyte/) -->
[![PyPI - Wheel](https://img.shields.io/pypi/wheel/airbyte)](https://pypi.org/project/airbyte/)
<!-- [![PyPI - Status](https://img.shields.io/pypi/status/airbyte)](https://pypi.org/project/airbyte/) -->
[![PyPI - Implementation](https://img.shields.io/pypi/implementation/airbyte)](https://pypi.org/project/airbyte/)
[![PyPI - Format](https://img.shields.io/pypi/format/airbyte)](https://pypi.org/project/airbyte/)
[![Star on GitHub](https://img.shields.io/github/stars/airbytehq/pyairbyte.svg?style=social&label=★%20on%20GitHub)](https://github.com/airbytehq/pyairbyte)

- [Getting Started](#getting-started)
- [Secrets Management](#secrets-management)
- [Connector compatibility](#connector-compatibility)
- [Contributing](#contributing)
- [Frequently asked Questions](#frequently-asked-questions)

## Getting Started

Watch this [Getting Started Loom video](https://www.loom.com/share/3de81ca3ce914feca209bf83777efa3f?sid=8804e8d7-096c-4aaa-a8a4-9eb93a44e850) or run one of our Quickstart tutorials below to see how you can use PyAirbyte in your Python code.
@@ -29,62 +18,26 @@ Watch this [Getting Started Loom video](https://www.loom.com/share/3de81ca3ce914
* [GitHub](https://github.com/airbytehq/quickstarts/blob/main/pyairbyte_notebooks/PyAirbyte_Github_Incremental_Demo.ipynb)
* [Postgres (cache)](https://github.com/airbytehq/quickstarts/blob/main/pyairbyte_notebooks/PyAirbyte_Postgres_Custom_Cache_Demo.ipynb)


## Secrets Management

PyAirbyte can auto-import secrets from the following sources:

1. Environment variables.
2. Variables defined in a local `.env` ("Dotenv") file.
3. [Google Colab secrets](https://medium.com/@parthdasawant/how-to-use-secrets-in-google-colab-450c38e3ec75).
4. Manual entry via [`getpass`](https://docs.python.org/3.9/library/getpass.html).

_Note: You can also build your own secret manager by subclassing the `CustomSecretManager` implementation. For more information, see the `airbyte.secrets.CustomSecretManager` class definition._

### Retrieving Secrets

```python
import airbyte as ab

source = ab.get_source("source-github")
source.set_config(
    {
        "credentials": {
            "personal_access_token": ab.get_secret("GITHUB_PERSONAL_ACCESS_TOKEN"),
        }
    }
)
```

By default, PyAirbyte will search all available secrets sources. The `get_secret()` function also accepts an optional `sources` argument of specific source names (`SecretSourceEnum`) and/or secret manager objects to check.

By default, PyAirbyte will prompt the user for any requested secrets that are not provided via other secret managers. You can disable this prompt by passing `allow_prompt=False` to `get_secret()`.

For more information, see the `airbyte.secrets` module.

### Secrets Auto-Discovery

If you have a secret matching an expected name, PyAirbyte will automatically use it. For example, if you have a secret named `GITHUB_PERSONAL_ACCESS_TOKEN`, PyAirbyte will automatically use it when configuring the GitHub source.

The naming convention for secrets is `{CONNECTOR_NAME}_{PROPERTY_NAME}`, for instance `SNOWFLAKE_PASSWORD` and `BIGQUERY_CREDENTIALS_PATH`.

PyAirbyte will also auto-discover secrets for interop with hosted Airbyte: `AIRBYTE_CLOUD_API_URL`, `AIRBYTE_CLOUD_API_KEY`, etc.

## Contributing

To learn how you can contribute to PyAirbyte, please see our [PyAirbyte Contributors Guide](./CONTRIBUTING.md).
To learn how you can contribute to PyAirbyte, please see our [PyAirbyte Contributors Guide](./docs/CONTRIBUTING.md).

## Frequently Asked Questions

**1. Does PyAirbyte replace Airbyte?**
No.
No. PyAirbyte is a Python library that allows you to use Airbyte connectors in Python, but it does not have orchestration
or scheduling capabilities, nor does it provide logging, alerting, or other features for managing pipelines in
production. Airbyte is a full-fledged data integration platform that provides connectors, orchestration, and scheduling capabilities.

**2. What is the PyAirbyte cache? Is it a destination?**
Yes, you can think of it as a built-in destination implementation, but we avoid the word "destination" in our docs to prevent confusion with our certified destinations list [here](https://docs.airbyte.com/integrations/destinations/).
Yes and no. You can think of it as a built-in destination implementation, but we avoid the word "destination" in our docs to prevent confusion with our certified destinations list [here](https://docs.airbyte.com/integrations/destinations/).

**3. Does PyAirbyte work with data orchestration frameworks like Airflow, Dagster, Snowpark, etc.?**
Yes, it should. Please give it a try and report any problems you see. Also, drop us a note if it works for you!

**4. Can I use PyAirbyte to develop or test when developing Airbyte sources?**
Yes, you can, but only for Python-based sources.
Yes, you can. PyAirbyte makes it easy to test connectors in Python, and you can use it to develop new local connectors
as well as to iterate on existing, already-published ones.

**5. Can I develop traditional ETL pipelines with PyAirbyte?**
Yes. Just pick the cache type matching the destination - like SnowflakeCache for landing data in Snowflake.
120 changes: 118 additions & 2 deletions airbyte/__init__.py
@@ -1,9 +1,125 @@
# Copyright (c) 2024 Airbyte, Inc., all rights reserved.
"""PyAirbyte brings Airbyte ELT to every Python developer.
.. include:: ../README.md
PyAirbyte brings the power of Airbyte to every Python developer. PyAirbyte provides a set of
utilities to use Airbyte connectors in Python.
## API Reference
[![PyPI version](https://badge.fury.io/py/airbyte.svg)](https://badge.fury.io/py/airbyte)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/airbyte)](https://pypi.org/project/airbyte/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/airbyte)](https://pypi.org/project/airbyte/)
[![Star on GitHub](https://img.shields.io/github/stars/airbytehq/pyairbyte.svg?style=social&label=★%20on%20GitHub)](https://github.com/airbytehq/pyairbyte)
# Getting Started
## Reading Data
You can connect to any of [hundreds of sources](https://docs.airbyte.com/integrations/sources/)
using the `get_source` function. You can then read data from sources using the `Source.read` method.
```python
from airbyte import get_source
source = get_source(
"source-faker",
config={},
)
read_result = source.read()
for record in read_result["users"].records:
print(record)
```
For more information, see the `airbyte.sources` module.
## Writing to SQL Caches
Data can be written to caches using a number of SQL-based cache implementations, including
Postgres, BigQuery, Snowflake, DuckDB, and MotherDuck. If you do not specify a cache, PyAirbyte
will automatically use a local DuckDB cache by default.
For more information, see the `airbyte.caches` module.
## Writing to Destination Connectors
Data can be written to destinations using the `Destination.write` method. You can connect to
destinations using the `get_destination` method. PyAirbyte supports all Airbyte destinations, but
Docker is required on your machine in order to run Java-based destinations.
**Note:** When loading to a SQL database, we recommend using a SQL cache (where available,
[see above](#writing-to-sql-caches)) instead of a destination connector. This is because SQL caches
are Python-native and therefore more portable when run from different Python-based environments which
might not have Docker container support. Destinations in PyAirbyte are uniquely suited for loading
to non-SQL platforms such as vector stores and other reverse ETL-type use cases.
For more information, see the `airbyte.destinations` module and the full list of destination
connectors [here](https://docs.airbyte.com/integrations/destinations/).
# PyAirbyte API
## Importing as `ab`
Most examples in the PyAirbyte documentation use the `import airbyte as ab` convention. The `ab`
alias is recommended, making code more concise and readable. When getting started, this
also saves you from digging in submodules to find the classes and functions you need, since
frequently-used classes and functions are available at the top level of the `airbyte` module.
## Navigating the API
While many PyAirbyte classes and functions are available at the top level of the `airbyte` module,
you can also import classes and functions from submodules directly. For example, while you can
import the `Source` class from `airbyte`, you can also import it from the `sources` submodule like
this:
```python
from airbyte.sources import Source
```
Whether you import from the top level or from a submodule, the classes and functions are the same.
We expect that most users will import from the top level when getting started, and then import from
submodules when they are deploying more complex implementations.
For quick reference, top-level modules are listed in the left sidebar of this page.
# Other Resources
- [PyAirbyte GitHub Readme](https://github.com/airbytehq/pyairbyte)
- [PyAirbyte Issue Tracker](https://github.com/airbytehq/pyairbyte/issues)
- [Frequently Asked Questions](https://github.com/airbytehq/PyAirbyte/blob/main/docs/faq.md)
- [PyAirbyte Contributors Guide](https://github.com/airbytehq/PyAirbyte/blob/main/docs/CONTRIBUTING.md)
- [GitHub Releases](https://github.com/airbytehq/PyAirbyte/releases)
----------------------
# API Reference
Below is a list of all classes, functions, and modules available in the top-level `airbyte`
module. (This is a long list!) If you are just starting out, we recommend beginning by selecting a
submodule to navigate to from the left sidebar or from the list below. Each module has its own
documentation and code samples related to effectively using its capabilities.
- **`airbyte.cloud`** - Working with Airbyte Cloud, including running jobs remotely.
- **`airbyte.caches`** - Working with caches, including how to inspect a cache and get data from it.
- **`airbyte.datasets`** - Working with datasets, including how to read from datasets and convert to
other formats, such as Pandas, Arrow, and LLM Document formats.
- **`airbyte.destinations`** - Working with destinations, including how to write to Airbyte
destinations connectors.
- **`airbyte.documents`** - Working with LLM documents, including how to convert records into
document formats, for instance, when working with AI libraries like LangChain.
- **`airbyte.exceptions`** - Definitions of all exception and warning classes used in PyAirbyte.
- **`airbyte.experimental`** - Experimental features and utilities that do not yet have a stable
API.
- **`airbyte.records`** - Internal record handling classes.
- **`airbyte.results`** - Documents the classes returned when working with results from
`Source.read` and `Destination.write`.
- **`airbyte.secrets`** - Tools for managing secrets in PyAirbyte.
- **`airbyte.sources`** - Tools for creating and reading from Airbyte sources. This includes
`airbyte.sources.get_source` to declare a source, `airbyte.sources.Source.read` for reading data,
and `airbyte.sources.Source.get_records()` to peek at records without caching or writing them
directly.
----------------------
"""

63 changes: 62 additions & 1 deletion airbyte/secrets/__init__.py
@@ -1,5 +1,66 @@
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
"""Secrets management for PyAirbyte."""
"""Secrets management for PyAirbyte.
PyAirbyte provides a secrets management system that allows you to securely store and retrieve
sensitive information. This module provides the secrets functionality.
## Secrets Management
PyAirbyte can auto-import secrets from the following sources:
1. Environment variables.
2. Variables defined in a local `.env` ("Dotenv") file.
3. [Google Colab secrets](https://medium.com/@parthdasawant/how-to-use-secrets-in-google-colab-450c38e3ec75).
4. Manual entry via [`getpass`](https://docs.python.org/3.9/library/getpass.html).
**Note:** You can also build your own secret manager by subclassing the `CustomSecretManager`
implementation. For more information, see the `airbyte.secrets.CustomSecretManager` reference docs.
### Retrieving Secrets
To retrieve a secret, use the `get_secret()` function. For example:
```python
import airbyte as ab
source = ab.get_source("source-github")
source.set_config(
    {
        "credentials": {
            "personal_access_token": ab.get_secret("GITHUB_PERSONAL_ACCESS_TOKEN"),
        }
    }
)
```
By default, PyAirbyte will search all available secrets sources. The `get_secret()` function also
accepts an optional `sources` argument of specific source names (`SecretSourceEnum`) and/or secret
manager objects to check.
By default, PyAirbyte will prompt the user for any requested secrets that are not provided via other
secret managers. You can disable this prompt by passing `allow_prompt=False` to `get_secret()`.
### Secrets Auto-Discovery
If you have a secret matching an expected name, PyAirbyte will automatically use it. For example, if
you have a secret named `GITHUB_PERSONAL_ACCESS_TOKEN`, PyAirbyte will automatically use it when
configuring the GitHub source.
The naming convention for secrets is `{CONNECTOR_NAME}_{PROPERTY_NAME}`, for instance
`SNOWFLAKE_PASSWORD` and `BIGQUERY_CREDENTIALS_PATH`.
PyAirbyte will also auto-discover secrets for interop with hosted Airbyte: `AIRBYTE_CLOUD_API_URL`,
`AIRBYTE_CLOUD_API_KEY`, etc.
## Custom Secret Managers
If you need to build your own secret manager, you can subclass the
`airbyte.secrets.CustomSecretManager` class. This allows you to build a custom secret manager that
can be used with the `get_secret()` function, securely storing and retrieving secrets as needed.
## API Reference
_Below are the classes and functions available in the `airbyte.secrets` module._
"""

from __future__ import annotations

4 changes: 3 additions & 1 deletion CONTRIBUTING.md → docs/CONTRIBUTING.md
@@ -16,9 +16,11 @@ Regular documentation lives in the `/docs` folder. Based on the doc strings of p
To generate the documentation, run:

```console
poetry run generate-docs
poe generate-docs
```

or `poetry run poe generate-docs` if you don't have [Poe](https://poethepoet.natn.io/index.html) installed.

The `generate-docs` CLI command is mapped to the `run()` function of `docs/generate.py`.

Documentation pages will be generated in the `docs/generated` folder. The `test_docs.py` test in pytest will automatically update generated content. These updates must be committed manually before the docs tests will pass.
36 changes: 36 additions & 0 deletions docs/faq.md
@@ -0,0 +1,36 @@
# PyAirbyte Frequently Asked Questions

**1. Does PyAirbyte replace Airbyte?**

No. PyAirbyte is a Python library that allows you to use Airbyte connectors in Python but it does
not have orchestration or scheduling capabilities, nor does it provide logging, alerting, or other
features for managing data pipelines in production. Airbyte is a full-fledged data integration
platform that provides connectors, orchestration, and scheduling capabilities.

**2. What is the PyAirbyte cache? Is it a destination?**

Yes and no. You can think of it as a built-in destination implementation, but we avoid the word
"destination" in our docs to prevent confusion with our certified destinations list
[here](https://docs.airbyte.com/integrations/destinations/).

**3. Does PyAirbyte work with data orchestration frameworks like Airflow, Dagster, Snowpark,
etc.?**

Yes, it should. Please give it a try and report any problems you see. Also, drop us a note if it
works for you!

**4. Can I use PyAirbyte to develop or test when developing Airbyte sources?**

Yes, you can. PyAirbyte makes it easy to test connectors in Python, and you can use it to develop
new local connectors as well as to iterate on existing, already-published ones.

**5. Can I develop traditional ETL pipelines with PyAirbyte?**

Yes. Just pick the cache type matching the destination - like SnowflakeCache for landing data in
Snowflake.

**6. Can PyAirbyte import a connector from a local directory that has Python project files, or does
it have to be installed from PyPI?**

Yes, PyAirbyte can use any local install that has a CLI - and will automatically find connectors by
name if they are on PATH.
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -322,6 +322,8 @@ coverage-reset = { shell = "coverage erase" }

check = { shell = "ruff check . && mypy . && pytest --collect-only -qq" }

docs-generate = {env = {PDOC_ALLOW_EXEC = "1"}, shell = "generate-docs && open docs/generated/index.html" }

fix = { shell = "ruff format . && ruff check --fix -s || ruff format ." }
fix-unsafe = { shell = "ruff format . && ruff check --fix --unsafe-fixes . && ruff format ." }
fix-and-check = { shell = "poe fix && poe check" }
