-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: Adding method to generate an example config based on the connector spec. #80
Conversation
@tinomerl - Thanks for creating this PR and the related issue! I'll be reviewing this afternoon and will try to have a review posted tonight or tomorrow. |
@aaronsteers , thanks for the response! It's just a quick wrapper for, hopefully, lowering the entry barrier for generating a config. The method could potentially be more sophisticated and taking the user a bit more by the hand. But that's just an idea for the moment. Let me know if there's anything to improve or do in a better way, I'm eager to help! :) |
Hi, @tinomerl. Thanks for your patience in my being back to you. I've looked at the code and I am not sure if this would generate an example config, or if it would return the JSON schema spec that is expected. Do you have a sample output from running? Either way, I think both are potentially helpful:
Related to this, yesterday I was speaking with Paul, creator of Airbyte Serverless and he was leveraging Octavia CLI code to generate sample config in the form of a Yaml document. I haven't looked deeply into that approach but it seems like another potentially good path forward. (Whether rendered as Yaml or JSON is less of a concern, but would be good to understand the prior art used there.) If you have thoughts and especially if you can provide sample outputs, that would be appreciated. I'll plan to spend a bit more time on this and will report back this coming week. Cc @marcosmarxm |
Hey @aaronsteers, no worries. Currently it generates a config by fetching the JSON Spec. It is then returned to the Terminal or as a file. So it's more like your first point. Here's an example JSON from the Faker source
I looked into the repo for airbyte_serverless. If it's this one. In it's core jinja templates are used to generate a config which is then rendered and saved as a yaml file with a lot of comments showing which part is required/optional or part of Example config for Faker source
Example config for file source
```
source:
docker_image: "airbyte/source-file:0.4.0" # GENERATED | string | A Public Docker Airbyte Source. Example: `airbyte/source-faker:0.1.4`. (see connectors list at: "https://hub.docker.com/search?q=airbyte%2Fsource-" )
config: # PREGENERATED | object | PLEASE UPDATE this pre-generated config by following the documentation https://docs.airbyte.com/integrations/sources/file
dataset_name: # REQUIRED | string | The Name of the final table to replicate this file into (should include letters, numbers dash and underscores only).
format: "csv" # REQUIRED | string | The Format of the file which should be replicated (Warning: some formats may be experimental, please refer to the docs).
reader_options: # OPTIONAL | string | This should be a string in JSON format. It depends on the chosen file format to provide additional options and tune its behavior. | Examples: {},] }
url: # REQUIRED | string | The URL path to access the file which should be replicated. | Examples: https://storage.googleapis.com/covid19-open-data/v2/latest/epidemiology.csv, gs://my-google-bucket/data.csv, s3://gdelt-open-data/events/20190914.export.csv
provider:
## -------- Pick one valid structure among the examples below: --------
storage: "HTTPS" # REQUIRED | string
user_agent: ""# OPTIONAL | boolean | Add User-Agent to request
## -------- Another valid structure for provider: --------
# storage: "GCS" # REQUIRED | string
# service_account_json: destination: remote_runner:
|
@tinomerl - Thank you for your patience. We've discussed internally how to represent this to the users, and I think we would want to expose this as a simple dictionary We are planning to pick this up this week. |
@aaronsteers, thanks for the message and the info. No worries about the silence. |
@tinomerl - Thanks for this feedback. Can you take a look at the updates in this PR here: I've applied the changes as mentioned above - renaming the method to be clear it is generating the spec, but not exactly a perfect example, and decoupling the print from the property output itself. I've also kept the proposed capability from here to optionally specify an output file. If this looks good to you, we can continue from that PR. Either way, I'll make sure to call out your valued contribution in our release notes. |
@aaronsteers just looked at the PR and gotta say i like it. Including the filepath as a parameter and the general naming is a bit more straightforward. I'm taking notes. We can continue in the new PR then. Do you need anything else from me? :) |
@tinomerl - Thanks for the review and feedback! To preserve your name as original author and contributor on our |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving for merge. Thanks again for contributing to PyAirbyte. 🚀
Proposed fix for #79 . Adds a new method which is a bit more straightforward for generating a config.
All the tests were successful, except for the snowflake ones. I currently don't have access to a snowflake instance.