diff --git a/airbyte-integrations/connectors/source-google-sheets/source_google_sheets/spec.yaml b/airbyte-integrations/connectors/source-google-sheets/source_google_sheets/spec.yaml index 7a060c3329bc..f80b54da507a 100644 --- a/airbyte-integrations/connectors/source-google-sheets/source_google_sheets/spec.yaml +++ b/airbyte-integrations/connectors/source-google-sheets/source_google_sheets/spec.yaml @@ -18,7 +18,7 @@ connectionSpecification: row_batch_size: type: integer title: Row Batch Size - description: The number of rows fetched when making a Google Sheet API call. Defaults to 200. + description: The number of rows fetched when making a Google Sheets API call. Defaults to 200. You can increase this value to reduce the number of API calls and avoid rate limits if your data is particularly wide. default: 200 examples: - 50 diff --git a/docs/integrations/sources/google-sheets.inapp.md b/docs/integrations/sources/google-sheets.inapp.md index cfb62877c043..2f470fb83644 100644 --- a/docs/integrations/sources/google-sheets.inapp.md +++ b/docs/integrations/sources/google-sheets.inapp.md @@ -1,6 +1,6 @@ ## Prerequisites - -- Access to a Google Sheet +- Spreadsheet Link - The link to the Google spreadsheet you want to sync. +- A Google Workspace user with access to the spreadsheet :::info The Google Sheets source connector pulls data from a single Google Sheets spreadsheet. To replicate multiple spreadsheets, set up multiple Google Sheets source connectors in your Airbyte instance. @@ -13,10 +13,10 @@ The Google Sheets source connector pulls data from a single Google Sheets spread - **(Recommended)** To authenticate your Google account via OAuth, click **Sign in with Google** and complete the authentication workflow. - To authenticate your Google account via Service Account Key Authentication, enter your [Google Cloud service account key](https://cloud.google.com/iam/docs/creating-managing-service-account-keys#creating_service_account_keys) in JSON format. 
Make sure the Service Account has the Project Viewer permission. If your spreadsheet is viewable by anyone with its link, no further action is needed. If not, [give your service account access to your spreadsheet](https://youtu.be/GyomEw5a2NQ). 3. For **Spreadsheet Link**, enter the link to the Google spreadsheet. To get the link, go to the Google spreadsheet you want to sync, click **Share** in the top right corner, and click **Copy Link**. -4. For **Row Batch Size**, define the number of records you want the Google API to fetch at a time. The default value is 200. +4. For **Row Batch Size**, define the number of records you want the Google API to fetch at a time. The default value is 200. You can increase this value to reduce the number of API calls and avoid rate limits if your data is particularly wide. +5. For **Convert Column Names to SQL-Compliant Format**, enable this option to convert column names to a standardized, SQL-compliant format. For example, 'My Name' -> 'my_name'. We recommend enabling it if your destination is SQL-based. ### Google Sheets format requirements -- Sheet names and column headers must only contain alphanumeric characters or `_`, as specified in the [**Airbyte Protocol**](../../understanding-airbyte/airbyte-protocol.md). For example, if your sheet or column header is named `the data`, rename it to `the_data`. This restriction does not apply to non-header cell values. - Airbyte only supports replicating [Grid](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/sheets#SheetType) sheets. For detailed information on supported sync modes, supported streams, and performance considerations, refer to the full documentation for [Google Sheets](https://docs.airbyte.com/integrations/sources/google-sheets/). 
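To illustrate what the **Convert Column Names to SQL-Compliant Format** option does, the previous docs listed these conversions: My_Name -> my_name, My Name -> my_name, MyName -> my_name, My123name -> my_123_name. The function below, `to_sql_compliant`, is a hypothetical regex sketch that reproduces those ASCII examples; it is not the connector's implementation. It also does not transliterate non-Latin text (the docs' example привіт світ -> privit_svit would need a transliteration step that this sketch omits).

```python
import re


def to_sql_compliant(name: str) -> str:
    """Hypothetical sketch of column-name normalization (ASCII only).

    Not the connector's code: non-ASCII letters are treated as separators
    here instead of being transliterated.
    """
    # Split camelCase at lowercase->uppercase boundaries: "MyName" -> "My_Name"
    s = re.sub(r"(?<=[a-z])(?=[A-Z])", "_", name)
    # Split at letter<->digit boundaries: "My123name" -> "My_123_name"
    s = re.sub(r"(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])", "_", s)
    # Collapse every run of other characters (spaces, punctuation, "_") into one "_"
    s = re.sub(r"[^A-Za-z0-9]+", "_", s)
    # Trim leading/trailing underscores and lowercase the result
    return s.strip("_").lower()


for header in ["My_Name", "My Name", "MyName", "My123name"]:
    print(header, "->", to_sql_compliant(header))
```

The boundary-insertion steps run before the character cleanup so that camelCase and digit boundaries are still visible when the separators are added.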
diff --git a/docs/integrations/sources/google-sheets.md b/docs/integrations/sources/google-sheets.md index 4a12b395fa5a..f8e46b42e524 100644 --- a/docs/integrations/sources/google-sheets.md +++ b/docs/integrations/sources/google-sheets.md @@ -8,7 +8,15 @@ The Google Sheets source connector pulls data from a single Google Sheets spread ### Prerequisites - Spreadsheet Link - The link to the Google spreadsheet you want to sync. -- Authentication - Credentials for connecting to the Google Sheets API. + +- **For Airbyte Cloud:** A Google Workspace user with access to the spreadsheet + + +- **For Airbyte Open Source:** + - A GCP project + - The Google Sheets API enabled in your GCP project + - A service account key with access to the spreadsheet you want to replicate + ## Setup guide @@ -26,8 +34,8 @@ To set up Google Sheets as a source in Airbyte Cloud: - **(Recommended)** To authenticate your Google account via OAuth, click **Sign in with Google** and complete the authentication workflow. - To authenticate your Google account via Service Account Key Authentication, enter your [Google Cloud service account key](https://cloud.google.com/iam/docs/creating-managing-service-account-keys#creating_service_account_keys) in JSON format. Make sure the Service Account has the Project Viewer permission. If your spreadsheet is viewable by anyone with its link, no further action is needed. If not, [give your service account access to your spreadsheet](https://youtu.be/GyomEw5a2NQ). 6. For **Spreadsheet Link**, enter the link to the Google spreadsheet. To get the link, go to the Google spreadsheet you want to sync, click **Share** in the top right corner, and click **Copy Link**. -7. For **Row Batch Size**, define the number of records you want the Google API to fetch at a time. The default value is 200. -7. For **Columns Name Conversion**, if set as true column names will be transformed. 
Examples: My_Name -> my_name, My Name -> my_name, MyName -> my_name, My123name -> my_123_name, привіт світ -> privit_svit. +7. For **Row Batch Size**, define the number of records you want the Google API to fetch at a time. The default value is 200. You can increase this value to reduce the number of API calls and avoid rate limits if your data is particularly wide. +8. For **Convert Column Names to SQL-Compliant Format**, enable this option to convert column names to a standardized, SQL-compliant format. For example, 'My Name' -> 'my_name'. We recommend enabling it if your destination is SQL-based. @@ -50,15 +58,12 @@ To set up Google Sheets as a source in Airbyte Open Source: - To authenticate your Google account via OAuth, enter your Google application's [client ID, client secret, and refresh token](https://developers.google.com/identity/protocols/oauth2). - To authenticate your Google account via Service Account Key Authentication, enter your [Google Cloud service account key](https://cloud.google.com/iam/docs/creating-managing-service-account-keys#creating_service_account_keys) in JSON format. Make sure the Service Account has the Project Viewer permission. If your spreadsheet is viewable by anyone with its link, no further action is needed. If not, [give your service account access to your spreadsheet](https://youtu.be/GyomEw5a2NQ). 6. For **Spreadsheet Link**, enter the link to the Google spreadsheet. To get the link, go to the Google spreadsheet you want to sync, click **Share** in the top right corner, and click **Copy Link**. -7. For **Row Batch Size**, define the number of records you want the Google API to fetch at a time. The default value is 200. -8. For **Columns Name Conversion**, if set as true column names will be transformed. Examples: My_Name -> my_name, My Name -> my_name, MyName -> my_name, My123name -> my_123_name, привіт світ -> privit_svit. +7. For **Row Batch Size**, define the number of records you want the Google API to fetch at a time. 
The default value is 200. You can increase this value to reduce the number of API calls and avoid rate limits if your data is particularly wide. +8. For **Convert Column Names to SQL-Compliant Format**, enable this option to convert column names to a standardized, SQL-compliant format. For example, 'My Name' -> 'my_name'. We recommend enabling it if your destination is SQL-based. ### Output schema Each sheet in the selected spreadsheet is synced as a separate stream. Each selected column in the sheet is synced as a string field. - :::note - Sheet names and column headers must contain only alphanumeric characters or `_`, as specified in the** [**Airbyte Protocol**](../../understanding-airbyte/airbyte-protocol.md). For example, if your sheet or column header is named `the data`, rename it to `the_data`. This restriction does not apply to non-header cell values. - ::: Airbyte only supports replicating [Grid](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/sheets#SheetType) sheets. @@ -82,14 +87,13 @@ The Google Sheets source connector supports the following sync modes: The [Google API rate limit](https://developers.google.com/sheets/api/limits) is 100 requests per 100 seconds per user and 500 requests per 100 seconds per project. Airbyte batches requests to the API in order to efficiently pull data and respects these rate limits. We recommend not using the same service user for more than 3 instances of the Google Sheets source connector to ensure high transfer speeds. ## Troubleshooting -- Data changes in spreadsheets. If your sheet becomes empty, deleted, etc., its synchronization will be stopped. You will see the corresponding extended logs about this problem. -- Permissions. The connector cannot synchronize tables to which you do not have proper permissions. +- Data changes in spreadsheets. If your sheet is completely empty (no header row) or deleted, Airbyte will stop attempting to sync it until the sheet is restored or repopulated. 
If this happens, the sync logs will contain a message stating that the sheet was skipped during the spreadsheet sync. ## Changelog | Version | Date | Pull Request | Subject | |---------|------------|----------------------------------------------------------|-----------------------------------------------------------------------------------| -| 0.3.6 | 2023-08-16 | | Update to latest CDK | +| 0.3.6 | 2023-08-16 | [29491](https://github.com/airbytehq/airbyte/pull/29491) | Update to latest CDK | | 0.3.5 | 2023-08-16 | [29427](https://github.com/airbytehq/airbyte/pull/29427) | Add stop reading in case of 429 error | | 0.3.4 | 2023-05-15 | [29453](https://github.com/airbytehq/airbyte/pull/29453) | Update spec descriptions | | 0.3.3 | 2023-08-10 | [29327](https://github.com/airbytehq/airbyte/pull/29327) | Add user-friendly error message for 404 and 403 error while discover |
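To show why raising **Row Batch Size** helps with the rate limit described above (100 requests per 100 seconds per user), the helper below gives a rough, back-of-the-envelope estimate of read time for one sheet. It is a hypothetical sketch: it assumes one API call per batch of `row_batch_size` rows (as the spec description states) and a sustained request rate at the per-user quota, and it ignores discovery calls, retries, burst behavior, and the per-project quota.

```python
import math


def min_sync_seconds(total_rows: int, row_batch_size: int = 200,
                     requests_per_window: int = 100,
                     window_seconds: float = 100.0) -> float:
    """Hypothetical estimate of read time for one sheet at the per-user quota.

    Assumes one API call per batch of `row_batch_size` rows; ignores
    discovery calls, retries, bursts, and the per-project quota.
    """
    # Number of API calls needed to read every row, one call per batch
    requests = math.ceil(total_rows / row_batch_size)
    # Time to issue those calls at a sustained requests_per_window / window_seconds rate
    return requests * window_seconds / requests_per_window


print(min_sync_seconds(100_000))        # default batch size of 200
print(min_sync_seconds(100_000, 1000))  # larger batch, fewer calls
```

Under these assumptions, 100,000 rows at the default batch size of 200 take 500 calls (about 500 seconds at the sustained quota rate), while a batch size of 1,000 cuts that to 100 calls.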