Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Add Cloud Interop and Robust Secrets Management #143

Merged
merged 120 commits into from
Apr 10, 2024
Merged
Show file tree
Hide file tree
Changes from 103 commits
Commits
Show all changes
120 commits
Select commit Hold shift + click to select a range
b9e8317
`poetry add airbyte-api`
aaronsteers Mar 25, 2024
93abc6f
install `airbyte-api` from remote branch
aaronsteers Mar 25, 2024
4d64326
import and revise code from https://github.com/airbytehq/airbyte/pull…
aaronsteers Mar 25, 2024
4f4a596
add created/delete integration test for sources and destinations
aaronsteers Mar 26, 2024
72d5226
fix tests
aaronsteers Mar 26, 2024
68e9df1
remove type hints
aaronsteers Mar 26, 2024
90c17b5
add tests to add/delete connections (passing)
aaronsteers Mar 26, 2024
1740374
fixes
aaronsteers Mar 26, 2024
7585106
fix missing bigquery cache import
aaronsteers Mar 26, 2024
e571903
add placeholder deployment ids
aaronsteers Mar 26, 2024
ee8057b
add missing copyright msg
aaronsteers Mar 26, 2024
9712855
rename test file
aaronsteers Mar 26, 2024
a42d760
use constant
aaronsteers Mar 26, 2024
c077081
working deploy methods and tests
aaronsteers Mar 26, 2024
9440bc7
remove commented code
aaronsteers Mar 26, 2024
44cf945
remove deploy key
aaronsteers Mar 26, 2024
288e3df
fix extra import
aaronsteers Mar 26, 2024
4038444
fix lint issues, skip hanging test
aaronsteers Mar 26, 2024
4c76c78
working run sync
aaronsteers Mar 26, 2024
0796219
improve error and timeout handling
aaronsteers Mar 26, 2024
ee2da5e
rename arg to 'wait'
aaronsteers Mar 26, 2024
3ab045a
tidy up, add comments
aaronsteers Mar 26, 2024
a5356d8
remove defaults in low-level functions
aaronsteers Mar 26, 2024
d1c4f3f
implement logs lookup
aaronsteers Mar 26, 2024
142d7f6
remove extra import
aaronsteers Mar 26, 2024
24c462c
fix get_destination()
aaronsteers Mar 30, 2024
6e7f635
split destination util, add destinations module
aaronsteers Mar 30, 2024
f548007
remove unused
aaronsteers Mar 30, 2024
12170f3
add read-from-destination-cache scaffold and failing test
aaronsteers Apr 1, 2024
4aaedbf
format fix
aaronsteers Apr 4, 2024
b616fa3
implement missing parts
aaronsteers Apr 4, 2024
13dce05
Merge remote-tracking branch 'origin/main' into aj/feat/add-airbyte-a…
aaronsteers Apr 4, 2024
07f5549
adapt nullable credentials
aaronsteers Apr 4, 2024
0da393b
fix pydantic "|" compat issue
aaronsteers Apr 4, 2024
ad3378e
rename exceptions
aaronsteers Apr 4, 2024
b0f8996
update exception names
aaronsteers Apr 4, 2024
a22c36e
improve deploy_connection()
aaronsteers Apr 4, 2024
c7967ac
use standard inserts option for the bigquery dest
aaronsteers Apr 4, 2024
2b38740
add test for read-from-destination
aaronsteers Apr 4, 2024
4470603
remove redundant conftest import
aaronsteers Apr 4, 2024
0983dd1
snowflake passing tests 🎉
aaronsteers Apr 6, 2024
98f376b
add motherduck test
aaronsteers Apr 6, 2024
87b3876
fix name
aaronsteers Apr 6, 2024
10407f5
improve tests, add parameterized test fixture for deployable cache
aaronsteers Apr 6, 2024
28bd489
handle table prefix and stream names
aaronsteers Apr 6, 2024
b209ea3
parameterize read test
aaronsteers Apr 6, 2024
c7996e6
use constant for secret name
aaronsteers Apr 7, 2024
f10a84c
add `is_interactive()` check
aaronsteers Apr 7, 2024
e18271d
use constant
aaronsteers Apr 7, 2024
787b7b0
fix bigquery region
aaronsteers Apr 7, 2024
cac40dd
add caching for SyncResult cache
aaronsteers Apr 7, 2024
37389dd
re-implement secrets management
aaronsteers Apr 7, 2024
be7c044
improve fixtures
aaronsteers Apr 7, 2024
5b3bcaf
improved tests
aaronsteers Apr 7, 2024
0863a13
rename enum
aaronsteers Apr 7, 2024
3808fd7
add `ci_credentials` dev dependency
aaronsteers Apr 7, 2024
7ee3c72
chore: require pytest marks to be declared, add missing mark
aaronsteers Apr 7, 2024
6cbd360
add 'super_slow' pytest mark and skip these in ci
aaronsteers Apr 7, 2024
997d014
docs: update text in secrets readme
aaronsteers Apr 7, 2024
6467daa
Merge branch 'main' into aj/feat/add-airbyte-api-library
aaronsteers Apr 7, 2024
d93e867
Auto-fix lint and format issues
Apr 7, 2024
9ac8cfb
chore: install all extras in ci
aaronsteers Apr 7, 2024
e38d90e
lint: fix
aaronsteers Apr 7, 2024
36f5c84
move conftest into cloud subfolder
aaronsteers Apr 7, 2024
9337537
fix no-creds test filter
aaronsteers Apr 7, 2024
3f5f879
skip requires_creds tests on 3.9
aaronsteers Apr 7, 2024
dd360e1
chore: refactor fixture hierarchy
aaronsteers Apr 7, 2024
e62c6d4
chore: fix tests
aaronsteers Apr 7, 2024
5a92b57
fix: only use class name as last result
aaronsteers Apr 7, 2024
89402d1
fix secret manager names
aaronsteers Apr 7, 2024
29a19bf
declare new SecretString class
aaronsteers Apr 7, 2024
42ca509
apply SecretString to cache config
aaronsteers Apr 8, 2024
afdb236
refactor: add CloudConnection
aaronsteers Apr 8, 2024
b5a44de
avoid to_pandas on bigquery
aaronsteers Apr 8, 2024
700ea82
remove unnecessary tests
aaronsteers Apr 8, 2024
b55c467
change parent class of built-in secretmanagers
aaronsteers Apr 9, 2024
2fc8ce8
refactor: secrets module
aaronsteers Apr 9, 2024
7c98292
refactor: integration tests, remove get_ci_secret() functions
aaronsteers Apr 9, 2024
35d7bb1
refactor: remove legacy methods
aaronsteers Apr 9, 2024
9269627
refactor(secrets): fix circular refs
aaronsteers Apr 9, 2024
a01d31c
fix tests
aaronsteers Apr 9, 2024
e57de8c
fix more tests, add docstring with usage
aaronsteers Apr 9, 2024
aa049b1
remove `ci_credentials` library and `poetry lock`
aaronsteers Apr 9, 2024
8779198
lint auto-fixes
aaronsteers Apr 9, 2024
38c8a17
fix format
aaronsteers Apr 9, 2024
96975c5
fix mypy
aaronsteers Apr 9, 2024
d88f1ee
get all integtest secrets from GSM
aaronsteers Apr 9, 2024
9b373ac
remove bespoke ci secret manager class
aaronsteers Apr 9, 2024
56abe46
re-allow ci tests that need creds on 3.9
aaronsteers Apr 9, 2024
3decb32
fix trailing quote
aaronsteers Apr 9, 2024
2a86d7f
revert: remove ' --all-extras' flag
aaronsteers Apr 9, 2024
6af01bb
revert quotes
aaronsteers Apr 9, 2024
b5f5856
Apply suggestions from code review
aaronsteers Apr 9, 2024
5d4d876
apply suggestion
aaronsteers Apr 9, 2024
def5e2c
apply suggestion
aaronsteers Apr 9, 2024
3dad193
doc: add comment about `api_util` module
aaronsteers Apr 9, 2024
3b494e8
add skip condition on missing `GCP_GSM_CREDENTIALS` secret in integra…
aaronsteers Apr 9, 2024
9942199
Fix empty value for GCP_GSM_CREDENTIALS in pytest workflow
aaronsteers Apr 9, 2024
7ce967e
ci: fix test_cloud_api_util.py to use airbyte_cloud_api_root and airb…
aaronsteers Apr 9, 2024
ea1b5f7
fix: remove marks since they don't work on fixtures
aaronsteers Apr 9, 2024
681e5d0
re-order secrets submodules
aaronsteers Apr 9, 2024
f6ed93d
update docs and import submodules
aaronsteers Apr 9, 2024
d1c535e
fix links in docs, add missing get_connection() implementation
aaronsteers Apr 9, 2024
65d37e5
chore: don't commit docs zip
aaronsteers Apr 10, 2024
bcb598d
un-feat: Refactor delete methods in CloudConnection and CloudWorkspac…
aaronsteers Apr 10, 2024
f01e911
fix imports and docstring
aaronsteers Apr 10, 2024
b630f96
remove redundant submodule declarations in pdoc
aaronsteers Apr 10, 2024
5755bc7
fix pdoc rendering bug
aaronsteers Apr 10, 2024
03c6c5a
add badges
aaronsteers Apr 10, 2024
ad3e8ac
update code sample
aaronsteers Apr 10, 2024
e644ffb
update docstring
aaronsteers Apr 10, 2024
f2bac11
un-feat: make `deploy*()` methods private for now
aaronsteers Apr 10, 2024
251d4ad
chore: remove and ignore .DS_Store files
aaronsteers Apr 10, 2024
6d46801
feat: add `airbyte.cloud.experimental` module
aaronsteers Apr 10, 2024
8fd29df
add and fix tests for stream names and prefixes
aaronsteers Apr 10, 2024
33fa49b
clean up and fix tests
aaronsteers Apr 10, 2024
41dba0f
lint fix
aaronsteers Apr 10, 2024
c2f045a
fix test
aaronsteers Apr 10, 2024
ba011db
improve sync result properties, fix tests
aaronsteers Apr 10, 2024
ae261ff
pin `airbyte-api` to a specific commit
aaronsteers Apr 10, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Binary file added .DS_Store
Binary file not shown.
11 changes: 7 additions & 4 deletions .github/workflows/python_pytest.yml
Original file line number Diff line number Diff line change
Expand Up @@ -70,9 +70,11 @@ jobs:
# Job-specific step(s):
- name: Run Pytest (No-Creds)
env:
# Force this to an invalid value to ensure tests that no creds are required are run.
GCP_GSM_CREDENTIALS: "no-creds"
run: poetry run pytest -m "not requires_creds"
# Force this to a blank value.
GCP_GSM_CREDENTIALS: ""
run: >
poetry run pytest -m
"not requires_creds and not linting and not super_slow"

pytest:
name: Pytest (All, Python ${{ matrix.python-version }}, ${{ matrix.os }})
Expand Down Expand Up @@ -114,4 +116,5 @@ jobs:
- name: Run Pytest
env:
GCP_GSM_CREDENTIALS: ${{ secrets.GCP_GSM_CREDENTIALS }}
run: poetry run pytest -m "not linting"
run: >
poetry run pytest -m "not linting and not super_slow"
2 changes: 1 addition & 1 deletion .github/workflows/test-pr-command.yml
Original file line number Diff line number Diff line change
Expand Up @@ -82,7 +82,7 @@ jobs:
- name: Run Pytest
env:
GCP_GSM_CREDENTIALS: ${{ secrets.GCP_GSM_CREDENTIALS }}
run: poetry run pytest
run: poetry run pytest -m "not super_slow"

log-success-comment:
name: Append 'Success' Comment
Expand Down
4 changes: 2 additions & 2 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Directories and subdirectories called 'secrets' or '.secrets'
# Directories and subdirectories called '.secrets' and the top-level '/secrets' directory
.secrets
secrets
/secrets

# Virtual Environments
.venv
Expand Down
23 changes: 16 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,24 +29,34 @@ PyAirbyte can auto-import secrets from the following sources:
3. [Google Colab secrets](https://medium.com/@parthdasawant/how-to-use-secrets-in-google-colab-450c38e3ec75).
4. Manual entry via [`getpass`](https://docs.python.org/3.9/library/getpass.html).

_Note: Additional secret store options may be supported in the future. [More info here.](https://github.com/airbytehq/airbyte-lib-private-beta/discussions/5)_
_Note: You can also build your own secret manager by subclassing the `CustomSecretManager` implementation. For more information, see the `airbyte.secrets.CustomSecretManager` class definiton._

### Retrieving Secrets

```python
from airbyte import get_secret, SecretSource
import airbyte as ab

source = get_source("source-github")
source = ab.get_source("source-github")
source.set_config(
"credentials": {
"personal_access_token": get_secret("GITHUB_PERSONAL_ACCESS_TOKEN"),
"personal_access_token": ab.get_secret("GITHUB_PERSONAL_ACCESS_TOKEN"),
}
)
```

The `get_secret()` function accepts an optional `source` argument of enum type `SecretSource`. If omitted or set to `SecretSource.ANY`, PyAirbyte will search all available secrets sources. If `source` is set to a specific source, then only that source will be checked. If a list of `SecretSource` entries is passed, then the sources will be checked using the provided ordering.
By default, PyAirbyte will search all available secrets sources. The `get_secret()` function also accepts an optional `sources` argument of specific source names (`SecretSourceEnum`) and/or secret manager objects to check.

By default, PyAirbyte will prompt the user for any requested secrets that are not provided via other secret managers. You can disable this prompt by passing `prompt=False` to `get_secret()`.
By default, PyAirbyte will prompt the user for any requested secrets that are not provided via other secret managers. You can disable this prompt by passing `allow_prompt=False` to `get_secret()`.

For more information, see the `airbyte.secrets` module.

### Secrets Auto-Discovery

If you have a secret matching an expected name, PyAirbyte will automatically use it. For example, if you have a secret named `GITHUB_PERSONAL_ACCESS_TOKEN`, PyAirbyte will automatically use it when configuring the GitHub source.

The naming convention for secrets is as `{CONNECTOR_NAME}_{PROPERTY_NAME}`, for instance `SNOWFLAKE_PASSWORD` and `BIGQUERY_CREDENTIALS_PATH`.

PyAirbyte will also auto-discover secrets for interop with hosted Airbyte: `AIRBYTE_CLOUD_API_URL`, `AIRBYTE_CLOUD_API_KEY`, etc.

## Connector compatibility

Expand Down Expand Up @@ -120,7 +130,6 @@ Yes. Just pick the cache type matching the destination - like SnowflakeCache for
**6. Can PyAirbyte import a connector from a local directory that has python project files, or does it have to be pip install**
Yes, PyAirbyte can use any local install that has a CLI - and will automatically find connectors by name if they are on PATH.


## Changelog and Release Notes

For a version history and list of all changes, please see our [GitHub Releases](https://github.com/airbytehq/PyAirbyte/releases) page.
8 changes: 5 additions & 3 deletions airbyte/__init__.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
# Copyright (c) 2024 Airbyte, Inc., all rights reserved.
"""PyAirbyte brings Airbyte ELT to every Python developer.

.. include:: ../README.md
Expand All @@ -7,14 +8,14 @@
"""
from __future__ import annotations

from airbyte import caches, datasets, documents, exceptions, results, secrets, sources
from airbyte import caches, cloud, datasets, documents, exceptions, results, secrets, sources
from airbyte.caches.bigquery import BigQueryCache
from airbyte.caches.duckdb import DuckDBCache
from airbyte.caches.util import get_default_cache, new_local_cache
from airbyte.datasets import CachedDataset
from airbyte.records import StreamRecord
from airbyte.results import ReadResult
from airbyte.secrets import SecretSource, get_secret
from airbyte.secrets import SecretSourceEnum, get_secret
from airbyte.sources import registry
from airbyte.sources.base import Source
from airbyte.sources.registry import get_available_connectors
Expand All @@ -23,6 +24,7 @@

__all__ = [
# Modules
"cloud",
"caches",
"datasets",
"documents",
Expand All @@ -43,7 +45,7 @@
"CachedDataset",
"DuckDBCache",
"ReadResult",
"SecretSource",
"SecretSourceEnum",
"Source",
"StreamRecord",
]
Expand Down
4 changes: 2 additions & 2 deletions airbyte/_executor.py
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@ def __init__(
The 'name' param is required if 'metadata' is None.
"""
if not name and not metadata:
raise exc.AirbyteLibInternalError(message="Either name or metadata must be provided.")
raise exc.PyAirbyteInternalError(message="Either name or metadata must be provided.")

self.name: str = name or cast(ConnectorMetadata, metadata).name # metadata is not None here
self.metadata: ConnectorMetadata | None = metadata
Expand Down Expand Up @@ -270,7 +270,7 @@ def _get_installed_version(
if not self.interpreter_path.exists():
# No point in trying to detect the version if the interpreter does not exist
if raise_on_error:
raise exc.AirbyteLibInternalError(
raise exc.PyAirbyteInternalError(
message="Connector's virtual environment interpreter could not be found.",
context={
"interpreter_path": self.interpreter_path,
Expand Down
8 changes: 4 additions & 4 deletions airbyte/_processors/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,7 @@ def __init__(
self._expected_streams: set[str] | None = None
self.cache: CacheBase = cache
if not isinstance(self.cache, CacheBase):
raise exc.AirbyteLibInputError(
raise exc.PyAirbyteInputError(
message=(
f"Expected config class of type 'CacheBase'. "
f"Instead received type '{type(self.cache).__name__}'."
Expand Down Expand Up @@ -92,7 +92,7 @@ def register_source(
) -> None:
"""Register the source name and catalog."""
if not self._catalog_manager:
raise exc.AirbyteLibInternalError(
raise exc.PyAirbyteInternalError(
message="Catalog manager should exist but does not.",
)
self._catalog_manager.register_source(
Expand Down Expand Up @@ -226,7 +226,7 @@ def _finalize_state_messages(
) -> None:
"""Handle state messages by passing them to the catalog manager."""
if not self._catalog_manager:
raise exc.AirbyteLibInternalError(
raise exc.PyAirbyteInternalError(
message="Catalog manager should exist but does not.",
)
if state_messages and self._source_name:
Expand All @@ -251,7 +251,7 @@ def _get_stream_config(
) -> ConfiguredAirbyteStream:
"""Return the definition of the given stream."""
if not self._catalog_manager:
raise exc.AirbyteLibInternalError(
raise exc.PyAirbyteInternalError(
message="Catalog manager should exist but does not.",
)

Expand Down
2 changes: 1 addition & 1 deletion airbyte/_processors/file/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -162,7 +162,7 @@ def process_record_message(
batch_handle = self._new_batch(stream_name=stream_name)

if batch_handle.open_file_writer is None:
raise exc.AirbyteLibInternalError(message="Expected open file writer.")
raise exc.PyAirbyteInternalError(message="Expected open file writer.")

self._write_record_dict(
record_dict=StreamRecord.from_record_message(
Expand Down
16 changes: 8 additions & 8 deletions airbyte/_processors/sql/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -296,7 +296,7 @@ def _get_table_by_name(
query. To ignore the cache and force a refresh, set 'force_refresh' to True.
"""
if force_refresh and shallow_okay:
raise exc.AirbyteLibInternalError(
raise exc.PyAirbyteInternalError(
message="Cannot force refresh and use shallow query at the same time."
)

Expand Down Expand Up @@ -453,7 +453,7 @@ def _ensure_compatible_table_schema(
]
if missing_columns:
if raise_on_error:
raise exc.AirbyteLibCacheTableValidationError(
raise exc.PyAirbyteCacheTableValidationError(
violation="Cache table is missing expected columns.",
context={
"stream_column_names": stream_column_names,
Expand Down Expand Up @@ -666,7 +666,7 @@ def _write_files_to_new_table(

# Pandas will auto-create the table if it doesn't exist, which we don't want.
if not self._table_exists(temp_table_name):
raise exc.AirbyteLibInternalError(
raise exc.PyAirbyteInternalError(
message="Table does not exist after creation.",
context={
"temp_table_name": temp_table_name,
Expand Down Expand Up @@ -727,7 +727,7 @@ def _write_temp_table_to_final_table(
has_pks: bool = bool(self._get_primary_keys(stream_name))
has_incremental_key: bool = bool(self._get_incremental_key(stream_name))
if write_strategy == WriteStrategy.MERGE and not has_pks:
raise exc.AirbyteLibInputError(
raise exc.PyAirbyteInputError(
message="Cannot use merge strategy on a stream with no primary keys.",
context={
"stream_name": stream_name,
Expand Down Expand Up @@ -783,7 +783,7 @@ def _write_temp_table_to_final_table(
)
return

raise exc.AirbyteLibInternalError(
raise exc.PyAirbyteInternalError(
message="Write strategy is not supported.",
context={
"write_strategy": write_strategy,
Expand Down Expand Up @@ -843,9 +843,9 @@ def _swap_temp_table_with_final_table(
Databases that do not support this syntax can override this method.
"""
if final_table_name is None:
raise exc.AirbyteLibInternalError(message="Arg 'final_table_name' cannot be None.")
raise exc.PyAirbyteInternalError(message="Arg 'final_table_name' cannot be None.")
if temp_table_name is None:
raise exc.AirbyteLibInternalError(message="Arg 'temp_table_name' cannot be None.")
raise exc.PyAirbyteInternalError(message="Arg 'temp_table_name' cannot be None.")

_ = stream_name
deletion_name = f"{final_table_name}_deleteme"
Expand Down Expand Up @@ -909,7 +909,7 @@ def _get_column_by_name(self, table: str | Table, column_name: str) -> Column:
# Try to get the column in a case-insensitive manner
return next(col for col in table.c if col.name.lower() == column_name.lower())
except StopIteration:
raise exc.AirbyteLibInternalError(
raise exc.PyAirbyteInternalError(
message="Could not find matching column.",
context={
"table": table,
Expand Down
6 changes: 3 additions & 3 deletions airbyte/_processors/sql/bigquery.py
Original file line number Diff line number Diff line change
Expand Up @@ -175,7 +175,7 @@ def _table_exists(
return False

except ValueError as ex:
raise exc.AirbyteLibInputError(
raise exc.PyAirbyteInputError(
message="Invalid project name or dataset name.",
context={
"table_id": table_id,
Expand Down Expand Up @@ -225,9 +225,9 @@ def _swap_temp_table_with_final_table(
ALTER TABLE my_schema.my_old_table_name RENAME TO my_new_table_name;
"""
if final_table_name is None:
raise exc.AirbyteLibInternalError(message="Arg 'final_table_name' cannot be None.")
raise exc.PyAirbyteInternalError(message="Arg 'final_table_name' cannot be None.")
if temp_table_name is None:
raise exc.AirbyteLibInternalError(message="Arg 'temp_table_name' cannot be None.")
raise exc.PyAirbyteInternalError(message="Arg 'temp_table_name' cannot be None.")

_ = stream_name
deletion_name = f"{final_table_name}_deleteme"
Expand Down
21 changes: 21 additions & 0 deletions airbyte/_util/api_duck_types.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
"""A set of duck-typed classes for working with the Airbyte API."""

from __future__ import annotations

from typing import TYPE_CHECKING, Protocol


if TYPE_CHECKING:
import requests


class AirbyteApiResponseDuckType(Protocol):
"""Used for duck-typing various Airbyte API responses."""

content_type: str
r"""HTTP response content type for this operation"""
status_code: int
r"""HTTP response status code for this operation"""
raw_response: requests.Response
r"""Raw HTTP response; suitable for custom response parsing"""
Loading
Loading