An ETL (Extract, Transform, Load) framework.
In the virtual environment for the project you’d like to use Aye Aye in, run:-
pip install ayeaye
Alternatively, use Pipenv to manage a Python virtual environment and its packages:
pipenv shell
pipenv install ayeaye
Within the environment created by pipenv above, run one of the examples:-
curl "https://raw.githubusercontent.com/Aye-Aye-Dev/AyeAye/master/examples/poisonous_animals.py" \
--output poisonous_animals.py
mkdir data
curl https://raw.githubusercontent.com/Aye-Aye-Dev/AyeAye/master/examples/data/poisonous_animals.json \
--output data/poisonous_animals.json
python poisonous_animals.py
This model takes a small input dataset of animals and collates them by the country in which they are found. It doesn't write to a dataset; it just outputs a log containing the name of each country and the animals found there.
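The collation step can be sketched in plain Python. This is not the example's actual code and does not use the Aye Aye API. It assumes, hypothetically, that each animal record has a `name` and a list of `countries`; the real `poisonous_animals.json` may use different field names.

```python
from collections import defaultdict


def collate_by_country(animals):
    """Group animal names by the countries they are found in.

    Hypothetical record shape: {"name": str, "countries": [str, ...]}.
    """
    by_country = defaultdict(list)
    for animal in animals:
        for country in animal["countries"]:
            by_country[country].append(animal["name"])
    return dict(by_country)


if __name__ == "__main__":
    # Sample records for illustration only.
    animals = [
        {"name": "Box jellyfish", "countries": ["Australia"]},
        {"name": "Inland taipan", "countries": ["Australia"]},
        {"name": "Black mamba", "countries": ["Kenya", "South Africa"]},
    ]
    for country, names in sorted(collate_by_country(animals).items()):
        # One log line per country, listing the animals found there.
        print(f"{country}: {', '.join(names)}")
```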
There are more examples in the Aye-Aye-Recipes git repo.
An Aye Aye ETL model inherits from ayeaye.Model and uses class-level variables to declare connectors to the data it acts on.
Example:-
import ayeaye

class PoisonousAnimals(ayeaye.Model):
    poisonous_animals = ayeaye.Connect(engine_url='json://data/poisonous_animals.json')
When the model is instantiated, self.poisonous_animals will be a dataset that ETL operations can be performed on.
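One general way a class-level declaration can become a per-instance dataset is Python's descriptor protocol. The sketch below illustrates that pattern only; it is not Aye Aye's implementation, and `LazyConnect` and `open_dataset` are hypothetical names.

```python
class LazyConnect:
    """Hypothetical descriptor: declared on the class, resolved per instance."""

    def __init__(self, engine_url):
        self.engine_url = engine_url
        self.attr_name = None

    def __set_name__(self, owner, name):
        # Called once when the owning class is created; records the attribute name.
        self.attr_name = name

    def __get__(self, instance, owner):
        if instance is None:
            # Accessed on the class itself: return the declaration.
            return self
        # First access on an instance: build the dataset and cache it in the
        # instance dict, which then shadows this non-data descriptor.
        dataset = open_dataset(self.engine_url)
        instance.__dict__[self.attr_name] = dataset
        return dataset


def open_dataset(engine_url):
    # Hypothetical stand-in for a real connector; it just records the URL.
    return {"engine_url": engine_url}


class PoisonousAnimals:
    poisonous_animals = LazyConnect(engine_url="json://data/poisonous_animals.json")


model = PoisonousAnimals()
print(model.poisonous_animals["engine_url"])
```

Because the dataset is only built on first attribute access, declaring many connectors on a class costs nothing until they are used.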
The engine_url parameter passed to ayeaye.Connect specifies the dataset type (JSON in this case) and the exact location of the data (data/poisonous_animals.json is a relative file path).
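Splitting an engine_url into its dataset type and location can be sketched as below. `split_engine_url` is a hypothetical helper that only assumes the `type://location` shape seen in `json://data/poisonous_animals.json`. `urllib.parse.urlparse` is a poor fit here because it would treat `data` as a network host.

```python
def split_engine_url(engine_url):
    """Split 'type://location' into (type, location). Hypothetical helper."""
    engine_type, sep, location = engine_url.partition("://")
    if not sep or not engine_type:
        raise ValueError(f"not an engine_url: {engine_url!r}")
    return engine_type, location


print(split_engine_url("json://data/poisonous_animals.json"))
# ('json', 'data/poisonous_animals.json')
```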
Instead of engine_url, you could also specify a ref, which uses the data catalogue to look up the engine_url (TODO: this feature is coming soon!). When used this way, ayeaye.Connect is responsible for resolving the ref to an engine_url and passing it to a subclass of ayeaye.connectors.base.DataConnector, which can read, and possibly write, this data type.
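The flow described above (ref → engine_url → a DataConnector subclass for that data type) can be sketched as follows. Everything here is hypothetical: the `CATALOGUE` dict, the `CONNECTORS` registry, `JsonConnector` and `resolve` are illustrative stand-ins, since the catalogue feature is not yet available and Aye Aye's real classes live under `ayeaye.connectors`.

```python
class DataConnector:
    """Hypothetical stand-in for ayeaye.connectors.base.DataConnector."""

    engine_type = None

    def __init__(self, engine_url):
        self.engine_url = engine_url


class JsonConnector(DataConnector):
    engine_type = "json"


# Hypothetical catalogue: maps a ref to an engine_url.
CATALOGUE = {"poisonous_animals": "json://data/poisonous_animals.json"}

# Registry of connector subclasses, keyed by the engine_url prefix they handle.
CONNECTORS = {cls.engine_type: cls for cls in DataConnector.__subclasses__()}


def resolve(ref=None, engine_url=None):
    """Return a connector for either a catalogue ref or an explicit engine_url."""
    if ref is not None:
        engine_url = CATALOGUE[ref]  # look up the engine_url in the catalogue
    if engine_url is None:
        raise ValueError("need a ref or an engine_url")
    engine_type = engine_url.split("://", 1)[0]
    return CONNECTORS[engine_type](engine_url)


connector = resolve(ref="poisonous_animals")
print(type(connector).__name__, connector.engine_url)
```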
Ensure the working directory is the base Aye Aye directory (i.e. the same directory as the Pipfile):
pipenv install --dev
export PYTHONPATH=`pwd`/lib
pipenv run python -m unittest discover
To use the latest code in editable mode:
pipenv install -e git+https://github.com/Aye-Aye-Dev/AyeAye#egg=ayeaye
When venv is being used, add this line to requirements.txt:
git+https://github.com/Aye-Aye-Dev/AyeAye#egg=ayeaye
Optional extra dependencies can be installed as extras. For example, to use APIs within Aye Aye models:
pipenv install "ayeaye[api]"
or install several extras at once:
pipenv install "ayeaye[api,aws,compression]"
| Extra | Functionality |
|---|---|
| api | RESTful JSON via HTTP(S) |
| aws | File-based connectors can use Amazon Web Services (AWS) S3 file storage |
| compression | On-the-fly compression for file-based connectors |
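On-the-fly compression for file-based connectors generally means picking the file opener from the file name, so callers read and write plain text either way. The sketch below uses only the standard library's `gzip` module; `open_maybe_compressed` is a hypothetical helper, and it does not reflect which formats the `compression` extra actually supports.

```python
import gzip
import os
import tempfile


def open_maybe_compressed(path, mode="rt", encoding="utf-8"):
    """Open `path` as text, transparently (de)compressing '.gz' files.

    Hypothetical helper; `mode` should be a text mode such as 'rt' or 'wt'.
    """
    if str(path).endswith(".gz"):
        return gzip.open(path, mode, encoding=encoding)
    return open(path, mode, encoding=encoding)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "animals.json.gz")
        # Written compressed on disk, but the caller only sees text.
        with open_maybe_compressed(path, "wt") as f:
            f.write('{"animals": []}')
        with open_maybe_compressed(path) as f:
            print(f.read())
```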
Aye Aye is distributed under the terms of the Apache License 2.0 and is copyright Progressive Logic Limited, 2021 onwards.