Data Ingestion
Data ingestion is one of the main components of the pipeline. We use PySpark, a big data processing framework, to read data described by Avro or CSV schemas. This has proved to be much more efficient than reading the data with standard Python readers. The current module supports reading data from local storage and from an SFTP server. Our next aim is to add S3 support to data ingestion as well.
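As a rough illustration (not the pipeline's actual code), the sketch below shows how PySpark can load schema-defined CSV data from local storage; the schema fields and paths are assumptions for the example only.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("data-ingestion").getOrCreate()

# Illustrative schema; in practice the fields come from the Avro/CSV
# schema files that describe each data topic.
schema = StructType([
    StructField("userId", StringType(), True),
    StructField("time", DoubleType(), True),
    StructField("value", DoubleType(), True),
])

# Read all CSV files under a local data directory using the explicit schema.
# Data on an SFTP server would first need to be fetched to local storage.
df = spark.read.csv(
    "data/example_topic/*.csv",  # hypothetical path
    schema=schema,
    header=True,
)
df.show(5)
```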
We have also created a custom data-reading function that can be used outside the pipeline, allowing researchers to read data much more quickly and with minimal effort.
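A minimal sketch of what such a standalone reader could look like is shown below; the function name `read_data`, its parameters, and the example path are hypothetical and do not correspond to a documented API.

```python
from pyspark.sql import SparkSession, DataFrame

def read_data(spark: SparkSession, path: str, data_format: str = "csv") -> DataFrame:
    """Load a directory of CSV or Avro files into a Spark DataFrame."""
    if data_format == "csv":
        return spark.read.csv(path, header=True, inferSchema=True)
    # Reading Avro requires the spark-avro package on the Spark classpath.
    return spark.read.format("avro").load(path)

# Usage outside the main pipeline:
spark = SparkSession.builder.appName("standalone-reader").getOrCreate()
df = read_data(spark, "data/example_topic/")  # hypothetical path
print(df.count())
```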