New Source: Google Workspace Email Audit API #33159
Replies: 1 comment
-
@collinscangarella
Yes! As long as it's idempotent, i.e. Google doesn't care if you ask for the same export multiple times, it should work fairly well. You are pointing out the right concerns:
As much as possible, you want the connector to be "checkpoint-able", so the smaller the batches you can split things into, the better. However, this may come at the cost of performance, so you'll have to find the sweet spot. In some cases, like FB marketing, there is some auto-correction going on: if a job asking for too large a date range fails, the connector automatically makes the range smaller and retries. But honestly, I wouldn't worry about that level of optimization unless you know it's needed.
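A minimal sketch of the two ideas in this reply, assuming the connector can call a hypothetical fetch(begin, end) for any date range. date_windows splits a sync range into small, checkpointable windows; fetch_with_shrinking imitates the kind of auto-correction described above by halving a range that is rejected and retrying each half. RangeTooLargeError is a hypothetical stand-in for whatever error the API actually returns; none of this is Airbyte CDK or Email Audit API code.

```python
from datetime import date, timedelta
from typing import Callable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


class RangeTooLargeError(Exception):
    """Hypothetical: stands in for the API's 'range too large' failure."""


def date_windows(begin: date, end: date, step: timedelta) -> Iterator[Tuple[date, date]]:
    """Yield consecutive half-open [start, stop) windows covering [begin, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    start = begin
    while start < end:
        stop = min(start + step, end)  # the last window may be shorter than step
        yield start, stop
        start = stop


def fetch_with_shrinking(
    begin: date,
    end: date,
    fetch: Callable[[date, date], List[T]],
    min_span: timedelta = timedelta(days=1),
) -> List[T]:
    """Call fetch(begin, end); if the range is rejected, halve it and retry each half."""
    try:
        return fetch(begin, end)
    except RangeTooLargeError:
        span = end - begin
        half_days = span.days // 2
        if span <= min_span or half_days < 1:
            raise  # cannot shrink any further, so surface the original error
        mid = begin + timedelta(days=half_days)
        # Results from the two halves are concatenated in date order.
        return fetch_with_shrinking(begin, mid, fetch, min_span) + fetch_with_shrinking(
            mid, end, fetch, min_span
        )
```

Each window's end date is a natural checkpoint: if it is recorded as state after that window's records are emitted, a failed sync can resume from the last completed window instead of from the beginning. How state is actually emitted depends on the CDK and is outside this sketch.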
-
Tell us about the new connector you’d like to have
Hello. I'm planning on building a new source connector for the current version of Google Workspace's Email Audit API.
However, its paradigm differs from that of most other APIs, so I need a bit of help understanding the best way to do this with Airbyte. Here are the requirements:
- POST to the API to create the export request (including parameters which specify the beginDate and endDate of the data to be exported).
- GET the status of the export request.
- … COMPLETED …
- … mbox file into json.

Steps 5 and 6 should be relatively easy to do in the parse_response function (I'll be using the Python SDK to build this connector). I recalled that Amazon Seller's Reporting API has the same workflow and looked at your code for it; it looks like you handle steps 1-3 in the read_records function and basically just sleep until the desired status is reached. Is this the recommended approach? Is there anything I should know about it, or any potential pitfalls I should be aware of? I'm assuming the trick is to ensure that each request doesn't take too long to complete, and that many small requests will be easier on our K8s cluster than a few large ones.
Describe the context around this new connector
We're planning on syncing the mailbox download endpoint daily in order to better understand how employees are communicating with customers and sales prospects.
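For a daily sync, the create / poll-until-COMPLETED / mbox-to-json workflow from the requirements could be sketched roughly as below. This assumes a hypothetical get_status(request_id) callable wrapping the GET status call and returning a dict with a "status" key. Only the COMPLETED value comes from the post; treating ERROR as the failure status is an assumption to verify against the Email Audit API documentation, and downloading the export files is left out. mbox_to_records uses Python's standard mailbox module; the header fields it keeps are illustrative.

```python
import mailbox
import time
from typing import Callable, Dict, Iterator, Optional

COMPLETED = "COMPLETED"        # status value named in the post
FAILED_STATUSES = {"ERROR"}    # assumption: verify the full set of failure statuses in the API docs


class ExportFailed(Exception):
    """Raised when the export request reaches a failure status."""


def wait_for_export(
    get_status: Callable[[str], Dict[str, str]],  # hypothetical: wraps the GET status call
    request_id: str,
    poll_interval: float = 30.0,
    timeout: float = 6 * 3600,  # arbitrary upper bound; tune for your export sizes
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, str]:
    """Poll until the request is COMPLETED; raise on a failure status or timeout."""
    deadline = clock() + timeout
    while True:
        status = get_status(request_id)
        state = status.get("status")
        if state == COMPLETED:
            return status
        if state in FAILED_STATUSES:
            raise ExportFailed(f"export {request_id} ended with status {state}")
        if clock() >= deadline:
            raise TimeoutError(f"export {request_id} still {state} after {timeout}s")
        sleep(poll_interval)


def _header(msg: mailbox.mboxMessage, name: str) -> Optional[str]:
    """Return a header as plain str (or None if absent) so it serialises to JSON."""
    value = msg[name]
    return None if value is None else str(value)


def mbox_to_records(path: str) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one JSON-serialisable dict per message in a downloaded mbox file."""
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            yield {
                "message_id": _header(msg, "Message-ID"),
                "from": _header(msg, "From"),
                "to": _header(msg, "To"),
                "subject": _header(msg, "Subject"),
                "date": _header(msg, "Date"),
            }
    finally:
        box.close()
```

Injecting sleep and clock keeps the polling loop testable without real waits, and the explicit timeout means a stuck export fails the sync instead of sleeping indefinitely. Each dict from mbox_to_records can be passed to json.dumps as-is.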
Describe the alternative you are considering or using
We'd likely just write it using an asynchronous task framework like Celery and trigger it via Airflow.
Are you willing to submit a PR?
Yep.