Data Engineer, but I mainly work with/as Data/Dev/MLOps, so am I really a data engineer? Idk.
- NY Taxi Data & MLOps Pipeline: Automated data & MLOps pipeline leveraging Kubernetes and Apache Airflow. Integrates Spark, Kafka, and DBT with a focus on data quality. Tailors solutions for diverse user needs.
- Xbox Data Scraping & Analysis Pipeline: Automated data-driven project leveraging Python, Airflow, and GKE. Scrapes diverse data sources, providing insights into Xbox hardware and game data.
- Easy Expectations: A Python package that abstracts away the complexity of Great Expectations and allows easy, no-knowledge-required implementation for basic use cases.
- SchemaDiff: A Python package that efficiently detects files with inconsistent schemas among thousands of files by reading Parquet file metadata.
- Order of The Template: A Python toolkit for parsing and processing YAML templates, capable of resolving Bash-style environment variables and Jinja templating. It also offers schema validation.