New docs for singer #30

Open
wants to merge 26 commits into
base: master
82fce65
Add files via upload
ashhath Jul 18, 2017
2d2d3be
Update 01_README.md
ashhath Jul 18, 2017
eb35ab5
Update 01_README.md
ashhath Jul 18, 2017
54c99d4
Update 01_README.md
ashhath Jul 18, 2017
096e299
Add files via upload
ashhath Jul 18, 2017
94753ab
Delete SPEC.md
ashhath Jul 18, 2017
cf2d7dd
Delete SCHEMAS.md
ashhath Jul 18, 2017
4fc21d2
Delete README.md
ashhath Jul 18, 2017
8695f4d
Delete BEST_PRACTICES.md
ashhath Jul 18, 2017
e36a406
Delete PROPOSALS.md
ashhath Jul 18, 2017
0b98bf6
Update 01_README.md
ashhath Jul 18, 2017
f8574d4
Update 01_README.md
ashhath Jul 18, 2017
773c4b1
Update 01_README.md
ashhath Jul 18, 2017
4caf3cf
Update 01_README.md
ashhath Jul 18, 2017
7bfd713
updated stuff
Jul 24, 2017
ef9d652
all changes and links made
Jul 24, 2017
5203316
Rename 01_README.md to README.md
ashhath Jul 24, 2017
a8902aa
Rename 02_EXTRACT_WITH_TAPS.md to 01_EXTRACT_WITH_TAPS.md
ashhath Jul 24, 2017
e7fa864
Rename 03_SEND_TO_TARGETS.md to 02_SEND_TO_TARGETS.md
ashhath Jul 24, 2017
232386a
Rename 04_COOL_TAPS_CLUB.md to 03_COOL_TAPS_CLUB.md
ashhath Jul 24, 2017
e61e8d6
Rename 05_MAKE_IT_OFFICIAL.md to 04_MAKE_IT_OFFICIAL.md
ashhath Jul 24, 2017
f9be6d4
Rename 06_BEST_PRACTICES.md to 05_BEST_PRACTICES.md
ashhath Jul 24, 2017
b0ef164
Rename 07_SPEC.md to 06_SPEC.md
ashhath Jul 24, 2017
2a751ad
Rename 08_PROPOSALS.md to 07_PROPOSALS.md
ashhath Jul 24, 2017
4814f46
Rename 09_CODE_OF_CONDUCT.md to 08_CODE_OF_CONDUCT.md
ashhath Jul 24, 2017
8ddf81a
changed links one number back
Jul 24, 2017
117 changes: 117 additions & 0 deletions 01_EXTRACT_WITH_TAPS.md
@@ -0,0 +1,117 @@
# 🍺 All about TAPS 🍺

## Taps extract data from any source and write that data to a standard stream in a JSON-based format.

Check out our [official](04_MAKE_IT_OFFICIAL.md) and [unofficial](03_COOL_TAPS_CLUB.md) pages before creating your own tap, since an existing one might save you some time in the long run.

### Making Taps

If a tap for your use case doesn't exist yet, have no fear! This documentation will help. Let's get started:

### 👩🏽‍💻 👨🏻‍💻 Hello, world

A Tap is just a program, written in any language, that outputs data to `stdout` according to the [Singer spec](06_SPEC.md).

In fact, your first Tap can be written from the command line, without any programming at all:

```bash
› printf '{"type":"SCHEMA", "stream":"hello","key_properties":[],"schema":{"type":"object", "properties":{"value":{"type":"string"}}}}\n{"type":"RECORD","stream":"hello","schema":"hello","record":{"value":"world"}}\n'
```

This writes the datapoint `{"value":"world"}` to the *hello* stream along with a schema indicating that `value` is a string.

That data can be piped into any Target, like the [Google Sheets Target](https://github.com/singer-io/target-gsheet), over `stdin`:

```bash
› printf '{"type":"SCHEMA", "stream":"hello","key_properties":[],"schema":{"type":"object", "properties":{"value":{"type":"string"}}}}\n{"type":"RECORD","stream":"hello","schema":"hello","record":{"value":"world"}}\n' | target-gsheet -c config.json
```
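For comparison, the same two messages can be produced with a few lines of Python using only the standard `json` module. This is a minimal sketch rather than part of the Singer tooling: the helper name `hello_messages` is hypothetical, and the message fields simply mirror the `printf` example above.

```python
import json


def hello_messages():
    """Build the SCHEMA and RECORD messages from the printf example (hypothetical helper)."""
    schema_message = {
        "type": "SCHEMA",
        "stream": "hello",
        "key_properties": [],
        "schema": {
            "type": "object",
            "properties": {"value": {"type": "string"}},
        },
    }
    record_message = {
        "type": "RECORD",
        "stream": "hello",
        "schema": "hello",
        "record": {"value": "world"},
    }
    return [schema_message, record_message]


if __name__ == "__main__":
    # One JSON document per line on stdout, which is what a Target reads over stdin.
    for message in hello_messages():
        print(json.dumps(message))
```

Saved under a name of your choosing, the script's output can be piped into a Target exactly like the `printf` version.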

### 🐍🐍🐍 A Python Tap

To move beyond *Hello, world* you'll need a real programming language. Although any language will do, we have built a Python library to help you get up and running quickly, because Python is the de facto standard for data engineers and other folks interested in moving data, like yourself.

If you need help ramping up or getting started with Python, there's fantastic community support [here](https://www.python.org/about/gettingstarted/).

Let's write a Tap called `tap_ip.py` that retrieves the current IP using icanhazip.com, and writes that data with a timestamp.

First, install the Singer helper library with `pip`:

```bash
› pip install singer-python
```

Then, open up a new file called `tap_ip.py` in your favorite editor.

```python
import singer
import urllib.request
from datetime import datetime, timezone
```

We'll use the `datetime` module to get the current timestamp, the
`singer` module to write data to `stdout` in the correct format, and
the `urllib.request` module to make a request to icanhazip.com.

```python
now = datetime.now(timezone.utc).isoformat()
schema = {
    'properties': {
        'ip': {'type': 'string'},
        'timestamp': {'type': 'string', 'format': 'date-time'},
    },
}
```

This sets up some of the data we'll need: the current time, and the
schema of the data we'll be writing to the stream, formatted as a
[JSON Schema](http://json-schema.org/).

```python
with urllib.request.urlopen('http://icanhazip.com') as response:
    ip = response.read().decode('utf-8').strip()
    singer.write_schema('my_ip', schema, 'timestamp')
    singer.write_records('my_ip', [{'timestamp': now, 'ip': ip}])
```

Finally, we make the HTTP request, parse the response, and then make
two calls to the `singer` library:

- `singer.write_schema` which writes the schema of the `my_ip` stream and defines its primary key
- `singer.write_records` to write a record to that stream

We can send this data to Google Sheets as an example by running our new Tap
with the Google Sheets Target:

```
› python tap_ip.py | target-gsheet -c config.json
```

Alternatively, you could send it to a CSV file just as easily:

```
› python tap_ip.py | target-csv -c config.json
```

## To summarize, the formula for pulling with a tap and sending to a target is:

```
› python YOUR_TAP_FILE.py -c TAP_CONFIG_FILE_HERE.json | TARGET-NAME -c TARGET_CONFIG_FILE_HERE.json
```

You might not always need config files, in which case it would just be:

```
› python YOUR_TAP_FILE.py | TARGET-NAME
```

This assumes your target is installed locally, which you can read more about on the [targets page](02_SEND_TO_TARGETS.md).
49 changes: 49 additions & 0 deletions 02_SEND_TO_TARGETS.md
@@ -0,0 +1,49 @@
# 🎯 All about TARGETS 🎯


## Targets are very similar to TAPS in that they still adhere to the Singer spec.

Right now there are targets for CSV, Google Sheets, Magento, and Stitch, but the possibilities are endless. Here is an example of sending a tap's output to a target, using Google Sheets:


```bash
› tap-ip | target-gsheet -c config.json
```

Alternatively, you could send it to a CSV file just as easily:

```bash
› tap-ip | target-csv -c config.json
```

To summarize, the formula for pulling with a tap and sending to a target is:

```bash
› TAP-NAME -c TAP_CONFIG_FILE_HERE.json | TARGET-NAME -c TARGET_CONFIG_FILE_HERE.json
```

You might not always need config files, in which case it would just be:

```bash
› TAP-NAME | TARGET-NAME
```
See? Easy.

## If you'd like to create your own TARGET it's just like building a tap.

Essentially, a target consumes the messages that a tap outputs and then determines what to do with them and where to send them.
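As a rough illustration of that idea, here is a minimal sketch of a target that reads one JSON message per line from `stdin`. It is not an official implementation: the `consume` helper is hypothetical, it relies only on the standard library, and it handles just SCHEMA and RECORD messages, skipping anything else.

```python
import json
import sys


def consume(lines):
    """Collect the latest SCHEMA and all RECORD payloads per stream.

    `consume` is a hypothetical helper, not part of any Singer library.
    """
    schemas = {}
    records = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between messages
        message = json.loads(line)
        kind = message.get("type")
        if kind == "SCHEMA":
            schemas[message["stream"]] = message["schema"]
        elif kind == "RECORD":
            records.setdefault(message["stream"], []).append(message["record"])
        # Any other message type is skipped in this sketch.
    return schemas, records


if __name__ == "__main__":
    _, collected = consume(sys.stdin)
    for stream, rows in collected.items():
        # A real target would write to a file, database, or API here.
        print(f"{stream}: {len(rows)} record(s)", file=sys.stderr)
```

Keeping the parsing in a function that accepts any iterable of lines makes it easy to exercise with a plain list before wiring it up to a real tap.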

Before it is packaged up, you would run it in your dev environment like this:
```bash
› TAP-NAME -c TAP_CONFIG_FILE_HERE.json | python my_target.py -c TARGET_CONFIG_FILE_HERE.json
```

Once both the tap and the target are bundled as packages, you can install them via pip or your favorite package system, and then it's just this:

```bash
› TAP-NAME | TARGET-NAME
```
63 changes: 63 additions & 0 deletions 03_COOL_TAPS_CLUB.md
@@ -0,0 +1,63 @@
## THE OFFICIAL (AND UNOFFICIAL) COLLECTION OF TAPS

### Defining Official and Unofficial

A tap or target is official once the Singer team has reviewed it and confirmed that it conforms to the best practices. Official status is a stamp of review and approval, and it also means the repo is moved into the Singer GitHub organization. For transparency: the Singer team is made up of [Stitch](https://stitchdata.com) employees.

Regardless of being official or unofficial you can still move or pull all the data you want. *This is why unofficial taps are so important and why we value them so much!*

### If you've created a tap or target be sure to simply submit a pull request to our unofficial table and show the world your work.
Also be sure to drop us a line so we can send you *EPIC SWAG* 🎁

Without further ado here are the unofficial taps:


| TYPE | NAME + REPO | USER 👨🏽‍💻 👩🏻‍💻 👑 |
| -------- |----------------------------------------------------------------------------|------------------------------------------------------------------|
| tap | [tap-csv](https://github.com/robertjmoore/tap-csv) | [robertjmoore](https://github.com/robertjmoore/) |
| tap | [tap-clubhouse](https://github.com/envoy/tap-clubhouse) | [envoy](https://github.com/envoy) |
| tap | [singer-airtable](https://github.com/StantonVentures/singer-airtable) | [StantonVentures](https://github.com/stantonventures) |
| tap | [tap-s3-csv](https://github.com/fishtown-analytics/tap-s3-csv) | [fishtown-analytics](https://github.com/fishtown-analytics) |
| tap | [tap-jsonfeed](https://github.com/briansloane/tap-jsonfeed) | [briansloane](https://github.com/briansloane/) |
| tap | [tap-shippo](https://github.com/robertjmoore/tap-shippo) | [robertjmoore](https://github.com/robertjmoore/) |
| tap | [tap-reviewscouk](https://github.com/onedox/tap-reviewscouk) | [onedox](https://github.com/onedox) |
| tap | [tap-fake-users](https://github.com/bengarvey/tap-fake-users) | [bengarvey](https://github.com/bengarvey) |
| tap | [tap-awin](https://github.com/onedox/tap-awin) | [onedox](https://github.com/onedox) |
| tap | [marvel-tap](https://github.com/ashhath/marvel-tap) | [ashhath](https://github.com/ashhath) |
| tap | [tap-mixpanel](https://github.com/Kierchon/tap-mixpanel) | [kierchon](https://github.com/kierchon) |
| tap | [tap-appsflyer](https://github.com/ezcater/tap-appsflyer) | [ezcater](https://github.com/ezcater) |
| tap | [tap-fullstory](https://github.com/expectedbehavior/tap-fullstory) | [expectedbehavior](https://github.com/expectedbehavior) |
| tap | [stitch-stream-deputy](https://github.com/DeputyApp/stitch-stream-deputy) | [deputyapp](https://github.com/deputyapp) |

And, just in case, here's a tidy list of the official ones integrated with and supported by Stitch:

| TYPE | NAME + REPO | CONTRIBUTOR |
| -------- |-----------------------------------------------------------------------------|------------------------------------------------------------------|
| tap | [Hubspot](https://github.com/singer-io/tap-hubspot) | [Stitch Data](https://stitchdata.com) |
| tap | [Marketo](https://github.com/singer-io/tap-marketo) | [Stitch Data](https://stitchdata.com) |
| tap | [Shippo](https://github.com/singer-io/tap-shippo) | [Robert J Moore](https://github.com/robertjmoore/) |
| tap | [GitHub](https://github.com/singer-io/tap-github) | [Stitch Data](https://stitchdata.com) |
| tap | [Close.io](https://github.com/singer-io/tap-closeio) | [Stitch Data](https://stitchdata.com) |
| tap | [Referral SaaSquatch](https://github.com/singer-io/tap-referral-saasquatch) | [Stitch Data](https://stitchdata.com) |
| tap | [Freshdesk](https://github.com/singer-io/tap-freshdesk) | [Stitch Data](https://stitchdata.com) |
| tap | [Braintree](https://github.com/singer-io/tap-braintree) | [Stitch Data](https://stitchdata.com) |
| tap | [GitLab](https://github.com/singer-io/tap-gitlab) | [Stitch Data](https://stitchdata.com) |
| tap | [Wootric](https://github.com/singer-io/tap-wootric) | [Stitch Data](https://stitchdata.com) |
| tap | [Fixer.io](https://github.com/singer-io/tap-fixerio) | [Stitch Data](https://stitchdata.com) |
| tap | [Outbrain](https://github.com/singer-io/tap-outbrain) | [Fishtown Analytics](https://github.com/fishtown-analytics) |
| tap | [Harvest](https://github.com/singer-io/tap-harvest) | [Facet Interactive](https://github.com/facetinteractive) |
| tap | [Taboola](https://github.com/singer-io/tap-taboola) | [Fishtown Analytics](https://github.com/fishtown-analytics) |
| tap | [Facebook](https://github.com/singer-io/tap-facebook) | [Stitch Data](https://stitchdata.com) |
| tap | [Google AdWords](https://github.com/singer-io/tap-adwords) | [Stitch Data](https://stitchdata.com) |
| tap | [Fullstory](https://github.com/singer-io/tap-fullstory) | [Expected Behavior](https://github.com/expectedbehavior) |
| target | [Stitch](https://github.com/singer-io/target-stitch) | [Stitch Data](https://stitchdata.com) |
| target | [CSV](https://github.com/singer-io/target-csv) | [Stitch Data](https://stitchdata.com) |
| target | [Google Sheets](https://github.com/singer-io/target-gsheet) | [Stitch Data](https://stitchdata.com) |
| target | [Magento BI](https://github.com/robertjmoore/target-magentobi) | [Robert J Moore](https://github.com/robertjmoore/) |

18 changes: 18 additions & 0 deletions 04_MAKE_IT_OFFICIAL.md
@@ -0,0 +1,18 @@
# BECOME OFFICIALLY COOL

So you've built a tap or a target, have you? We think that's pretty groovy. To be integrated with Stitch and become official, a tap needs to follow a set standard. If you're interested in submitting an official tap, we're mighty obliged, and we've created a checklist so you can increase your chances of integration.

### Check out the [BEST PRACTICES](05_BEST_PRACTICES.md) doc which will have all the instructions and way more in depth details of the following:
- [ ] Your work has a `start_date` field in the config
- [ ] Your work accepts a `user_agent` field in the config
- [ ] Your work respects API rate limits
- [ ] Your work doesn't impose memory constraints
- [ ] Your dates are all in RFC3339 format
- [ ] All states are in date format
- [ ] All data is streamed in ascending order if possible
- [ ] Your work doesn't contain any sensitive info like API keys, client work, etc.
- [ ] Please keep your schemas stored in a schema folder
- [ ] You've tested your work
- [ ] Please run pylint on your work
- [ ] Your work shows metrics
- [ ] Message [@BrianSloan]([email protected]) or [@Ash_Hathaway]([email protected]) or reach out to them on [Slack](https://singer-slackin.herokuapp.com/) and let them know you'd like some swag, please.
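
A couple of the checklist items can be sketched in code. The example below is only an illustration under stated assumptions: `load_config` and `to_rfc3339` are hypothetical helpers, `start_date` is treated as required and `user_agent` as optional, and the real requirements are the ones spelled out in the best practices doc.

```python
import json
from datetime import datetime, timezone


def load_config(text):
    """Parse a tap config and check the fields named in the checklist (hypothetical helper)."""
    config = json.loads(text)
    if "start_date" not in config:
        raise ValueError("config is missing the start_date field")
    # Assumption: user_agent is accepted but optional, so default it to None.
    config.setdefault("user_agent", None)
    return config


def to_rfc3339(moment):
    """Format a timezone-aware datetime as an RFC 3339 timestamp in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    config = load_config('{"start_date": "2017-07-24T00:00:00Z"}')
    print(config["start_date"], config["user_agent"])
    print(to_rfc3339(datetime.now(timezone.utc)))
```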
3 changes: 1 addition & 2 deletions BEST_PRACTICES.md → 05_BEST_PRACTICES.md
@@ -1,5 +1,4 @@
Best Practices for Building a Singer Tap
============================================
# BEST PRACTICES

Language
--------
68 changes: 67 additions & 1 deletion SPEC.md → 06_SPEC.md
@@ -130,7 +130,7 @@ Example:
SCHEMA messages describe the datatypes of data in the stream. They
must have the following properties:

- `schema` **Required**. A [JSON Schema] describing the
- `schema` **Required**. A [JSON Schema](http://json-schema.org/) describing the
`data` property of RECORDs from the same `stream`

- `stream` **Required**. The string name of the stream that this
@@ -188,3 +188,69 @@ should be a new MINOR version.

[JSON Schema]: http://json-schema.org/ "JSON Schema"
[Semantic Versioning]: http://semver.org/ "Semantic Versioning"



# Data Types and Schemas

JSON is used to represent data because it is ubiquitous, readable, and
especially appropriate for the large universe of sources that expose data
as JSON like web APIs. However, JSON is far from perfect:

- it has a limited type system, without support for common types like
dates, and no distinction between integers and floating point numbers

- while its flexibility makes it easy to use, it can also cause
compatibility problems

*Schemas* are used to solve these problems. Generally speaking, a schema
is anything that describes how data is structured. In Streams, schemas are
written by streamers in *SCHEMA* messages, formatted following the
[JSON Schema](http://json-schema.org/) spec.

Schemas solve the limited data types problem by providing more information
about how to interpret JSON's basic types. For example, the [JSON Schema]
spec distinguishes between `integer` and `number` types, where the latter
is appropriately interpreted as a floating point. Additionally, it
defines a string format called `date-time` that can be used to indicate
when a data point is expected to be a
[properly formatted](https://tools.ietf.org/html/rfc3339) timestamp
string.
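
To make the distinction concrete, here is a minimal sketch that checks a single value against a tiny subset of JSON Schema: the `integer`, `number`, and `string` types plus the `date-time` format. The `matches` helper is hypothetical and is not a full JSON Schema validator; it assumes Python 3.7+ and only accepts timestamps without fractional seconds.

```python
from datetime import datetime


def matches(value, prop_schema):
    """Check one value against a small subset of JSON Schema (hypothetical helper)."""
    kind = prop_schema.get("type")
    if kind == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if kind == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "string":
        if not isinstance(value, str):
            return False
        if prop_schema.get("format") == "date-time":
            try:
                # Simplification: whole seconds only, with a UTC offset or "Z".
                datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")
            except ValueError:
                return False
        return True
    return True  # types outside this sketch are not checked


if __name__ == "__main__":
    print(matches(3, {"type": "integer"}))
    print(matches(3.5, {"type": "integer"}))
    print(matches("2017-07-24T12:30:00Z", {"type": "string", "format": "date-time"}))
```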

Schemas mitigate JSON's compatibility problem by providing an easy way to
validate the structure of a set of data points. Streams deploys this
concept by encouraging use of only a single schema for each substream, and
validating each data point against its schema prior to persistence. This
forces the streamer author to think about how to resolve schema evolution
and compatibility questions, placing that responsibility as close to the
original data source as possible, and freeing downstream systems from
making uninformed assumptions to resolve these issues.

Schemas are required, but they can be defined in the broadest terms - a
JSON Schema of '{}' validates all data points. However, it is a best
practice for streamer authors to define schemas as narrowly as possible.

## Schemas in Stitch

The Stitch persister and Stitch API use schemas as follows:

- the Stitch persister fails when it encounters a data point that doesn't
validate against its stream's latest schema
- schemas must be an 'object' at the top level
- Stitch supports schemas with objects nested to any depth, and arrays of
objects nested to any depth - more info in the
[Stitch docs](https://www.stitchdata.com/docs/data-structure/nested-data-structures-row-count-impact)
- properties of type `string` and format `date-time` are converted to
the appropriate timestamp or datetime type in the destination database
- properties of type `integer` are converted to integer in the destination
database
- properties of type `number` are converted to decimal or numeric in the
destination database
- (soon) the `maxLength` parameter of a property of type `string` is used
to define the width of the corresponding varchar column in the
destination database
- when Stitch encounters a schema for a stream that is incompatible with
the table that stream is to be loaded into in the destination database,
it adds the data to the
[reject pile](https://www.stitchdata.com/docs/data-structure/identifying-rejected-records)
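
The type conversions in the list above can be pictured as a small lookup. This is purely an illustrative sketch and not Stitch's implementation: the `column_type` helper and the SQL type names it returns are assumptions, and actual column types depend on the destination database.

```python
def column_type(prop_schema):
    """Map a JSON Schema property to an illustrative SQL column type (hypothetical helper)."""
    kind = prop_schema.get("type")
    if kind == "string" and prop_schema.get("format") == "date-time":
        return "TIMESTAMP"
    if kind == "integer":
        return "INTEGER"
    if kind == "number":
        return "DECIMAL"
    if kind == "string":
        # Use maxLength, when present, as the varchar width.
        max_length = prop_schema.get("maxLength")
        return f"VARCHAR({max_length})" if max_length else "VARCHAR"
    raise ValueError(f"no mapping in this sketch for type: {kind!r}")


if __name__ == "__main__":
    print(column_type({"type": "string", "format": "date-time"}))
    print(column_type({"type": "string", "maxLength": 64}))
```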

File renamed without changes.