feature/cicd initial implementation - part 2 #14

Merged
merged 9 commits into from
May 29, 2024
371 changes: 50 additions & 321 deletions README.md

Large diffs are not rendered by default.

6 changes: 6 additions & 0 deletions docs/assets/k_blue.svg
8 changes: 8 additions & 0 deletions docs/assets/k_in_logo.svg
8 changes: 8 additions & 0 deletions docs/assets/k_in_logo_white.svg
6 changes: 6 additions & 0 deletions docs/assets/k_white.svg
Binary file removed docs/assets/koheesio_logo.png
Binary file not shown.
File renamed without changes
31 changes: 31 additions & 0 deletions docs/assets/logo_total_white.svg
66 changes: 49 additions & 17 deletions docs/community/contribute.md
@@ -7,7 +7,6 @@ There are a few guidelines that we need contributors to follow so that we are ab
## Getting Started

* Review our [Code of Conduct](https://github.com/Nike-Inc/nike-inc.github.io/blob/master/CONDUCT.md)
* Submit the [Individual Contributor License Agreement](https://www.clahub.com/agreements/Nike-Inc/fastbreak)
* Make sure you have a [GitHub account](https://github.com/signup/free)
* Submit a ticket for your issue, assuming one does not already exist.
* Clearly describe the issue including steps to reproduce when it is a bug.
@@ -18,14 +17,14 @@ There are a few guidelines that we need contributors to follow so that we are ab

* Create a feature branch off of `main` before you start your work.
* Please avoid working directly on the `main` branch.
* Setup the required package manager [poetry](#-package-manager)
* Setup the required package manager [hatch](#-package-manager)
* Setup the dev environment [see below](#-dev-environment-setup)
* Make commits of logical units.
* You may be asked to squash unnecessary commits down to logical units.
* Check for unnecessary whitespace with `git diff --check` before committing.
* Write meaningful, descriptive commit messages.
* Please follow existing code conventions when working on a file
* Make sure to check the standards on the code [see below](#-linting-and-standards)
* Make sure to check the standards on the code, [see below](#-linting-and-standards)
* Make sure to test the code before you push changes [see below](#-testing)

## 🤝 Submitting Changes
@@ -37,19 +36,39 @@ if it isn't showing any activity.
* Bug fixes or features that lack appropriate tests may not be considered for merge.
* Changes that lower test coverage may not be considered for merge.

### 📦 Package manager
### 🔨 Make commands

We use `make` for managing different steps of setup and maintenance in the project. You can install make by following
the instructions [here](https://formulae.brew.sh/formula/make)

We use `poetry` as our package manager.

Please DO NOT use pip or conda to install the dependencies. Instead, use poetry:
For a full list of available make commands, you can run:

```bash
make poetry-install
make help
```


### 📦 Package manager

We use `hatch` as our package manager.

> Note: Please DO NOT use pip or conda to install the dependencies. Instead, use hatch.

To install hatch, run the following command:
```console
make init
```

or,
```console
make hatch-install
```

This will install hatch using brew if you are on a Mac.

If you are on a different OS, you can follow the instructions [here]( https://hatch.pypa.io/latest/install/)


### 📌 Dev Environment Setup

To ensure our standards, make sure to install the required packages.
@@ -58,29 +77,42 @@ To ensure our standards, make sure to install the required packages.
make dev
```

This will install all the required packages for development in the project under the `.venv` directory.
Use this virtual environment to run the code and tests during local development.

### 🧹 Linting and Standards

We use `pylint`, `black` and `mypy` to maintain standards in the codebase
We use `ruff`, `pylint`, `isort`, `black` and `mypy` to maintain standards in the codebase.

Run the following two commands to check the codebase for any issues:

```bash
make check
```
This will run all the checks including pylint and mypy.

Make sure that the linter does not report any errors or warnings before submitting a pull request.
```bash
make fmt
```
This will format the codebase using black, isort, and ruff.

Make sure that the linters and formatters do not report any errors or warnings before submitting a pull request.

### 🧪 Testing

We use `pytest` to test our code. You can run the tests by running the following command:
We use `pytest` to test our code.

```bash
make test
```

Make sure that all tests pass before submitting a pull request.
You can run the tests by running one of the following commands:

## 🚀 Release Process
```bash
make cov # to run the tests and check the coverage
make all-tests # to run all the tests
make spark-tests # to run the spark tests
make non-spark-tests # to run the non-spark tests
```

At the moment, the release process is manual. We try to make frequent releases. Usually, we release a new version when we have a new feature or bugfix. A developer with admin rights to the repository will create a new release on GitHub, and then publish the new version to PyPI.
Make sure that all tests pass and that you have adequate coverage before submitting a pull request.
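
As an illustration only (not part of the changed file), the pre-submission steps above could be chained from a small Python helper. This is a minimal sketch: the target names `fmt`, `check` and `cov` come from the guide, while the helper name `run_pre_pr`, the ordering, and the injectable `run` parameter are assumptions made for this example.

```python
import subprocess
from typing import Callable, Sequence

# Targets named in the contributing guide; the order is a choice made for this sketch.
PRE_PR_TARGETS = ("fmt", "check", "cov")


def run_pre_pr(
    targets: Sequence[str] = PRE_PR_TARGETS,
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> bool:
    """Run `make <target>` for each target in order and stop at the first failure."""
    for target in targets:
        # subprocess.call returns the exit code of the command; non-zero means failure.
        if run(["make", target]) != 0:
            print(f"make {target} failed")
            return False
    return True
```

Calling `run_pre_pr()` from the repository root would run the three targets in sequence. The `run` parameter exists only so the helper can be exercised without `make` installed.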

# Additional Resources

15 changes: 12 additions & 3 deletions docs/css/custom.css
@@ -1,10 +1,11 @@
:root {
    --md-code-font: "Roboto Mono";
    /* --md-primary-fg-color: #84A0C6; */
    --md-primary-fg-color: #F8AE44;
    /*--md-primary-fg-color: linear-gradient(142deg, rgba(229,119,39,1) 3%, rgba(172,56,56,1) 31%, rgba(133,59,96,1) 51%, rgba(31,67,103,1) 79%, rgba(31,99,120,1) 94%, rgba(32,135,139,1) 100%);*/
    /*--md-primary-fg-color: rgba(229,119,39,1);*/
    /* --md-primary-fg-color: #E4AF68; */
    --md-primary-fg-color--light: #FCFCFC;
    --md-primary-fg-color--dark: #333;
    /*--md-primary-fg-color--light: linear-gradient(142deg, rgba(229,119,39,1) 3%, rgba(172,56,56,1) 31%, rgba(133,59,96,1) 51%, rgba(31,67,103,1) 79%, rgba(31,99,120,1) 94%, rgba(32,135,139,1) 100%);*/
    /*--md-primary-fg-color--dark: linear-gradient(142deg, rgba(229,119,39,1) 3%, rgba(172,56,56,1) 31%, rgba(133,59,96,1) 51%, rgba(31,67,103,1) 79%, rgba(31,99,120,1) 94%, rgba(32,135,139,1) 100%);*/
    --md-default-fg-color: #111;
    --md-default-fg-color--light: #000000d0;
    --md-default-fg-color--lighter: #00000052;
@@ -57,4 +58,12 @@
.md-content a[href^="http"]:hover::after {
    background-color: var(--md-accent-fg-color);
    background-image: url('data:image/svg+xml,<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path fill="rgb(255, 255, 255)" d="M18.25 15.5a.75.75 0 00.75-.75v-9a.75.75 0 00-.75-.75h-9a.75.75 0 000 1.5h7.19L6.22 16.72a.75.75 0 101.06 1.06L17.5 7.56v7.19c0 .414.336.75.75.75z"></path></svg>');
}

.md-header {
    background: linear-gradient(142deg, rgba(229,119,39,1) 3%, rgba(172,56,56,1) 31%, rgba(133,59,96,1) 51%, rgba(31,67,103,1) 79%, rgba(31,99,120,1) 94%, rgba(32,135,139,1) 100%);
}

.md-tabs {
    background: none;
}
47 changes: 17 additions & 30 deletions docs/gen_ref_nav.py
@@ -1,46 +1,33 @@
# -----------------------------------------------------#
# Library imports #
# -----------------------------------------------------#
from pathlib import Path

import mkdocs_gen_files

# -----------------------------------------------------#
# Configuration #
# -----------------------------------------------------#
src_dir = "koheesio"
nav = mkdocs_gen_files.Nav()
mod_symbol = '<code class="doc-symbol doc-symbol-nav doc-symbol-module"></code>'

# -----------------------------------------------------#
# Runner #
# -----------------------------------------------------#
""" Generate code reference pages and navigation
# Iterate over each Python file
for path in sorted(Path("src").rglob("*.py")):
    module_path = path.relative_to("src").with_suffix("")
    doc_path = path.relative_to("src/koheesio").with_suffix(".md")
    full_doc_path = Path("api_reference", doc_path)

Based on the recipe of mkdocstrings:
https://github.com/mkdocstrings/mkdocstrings
    parts = tuple(module_path.parts)

Credits:
Timothée Mazzucotelli
https://github.com/pawamoy
"""
# Iterate over each Python file
for path in sorted(Path(src_dir).rglob("*.py")):
    # Get path in module, documentation and absolute
    module_path = path.relative_to(src_dir).with_suffix("")
    doc_path = path.relative_to(src_dir).with_suffix(".md")
    full_doc_path = Path("koheesio", doc_path)

    # Handle edge cases
    parts = (src_dir,) + tuple(module_path.parts)
    if parts[-1] == "__init__":
        parts = parts[:-1]
        doc_path = doc_path.with_name("index.md")
        full_doc_path = full_doc_path.with_name("index.md")
    elif parts[-1] == "__main__":
    elif parts[-1].startswith("_"):
        continue

    # Write docstring documentation to disk via parser
    nav_parts = [f"{mod_symbol} {part}" for part in parts]
    nav[tuple(nav_parts)] = doc_path.as_posix()

    with mkdocs_gen_files.open(full_doc_path, "w") as fd:
        ident = ".".join(parts)
        fd.write(f"::: {ident}")
    # Update parser
    mkdocs_gen_files.set_edit_path(full_doc_path, path)

    mkdocs_gen_files.set_edit_path(full_doc_path, ".." / path)

with mkdocs_gen_files.open("api_reference/SUMMARY.txt", "w") as nav_file:
    nav_file.writelines(nav.build_literate_nav())
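
To make the path handling in the new version of `gen_ref_nav.py` easier to follow, here is a minimal sketch that reproduces only the mapping logic with `pathlib`, without `mkdocs_gen_files`. The helper name `map_module` and the example file paths are hypothetical and chosen for illustration; they are not taken from the repository.

```python
from pathlib import Path
from typing import Optional, Tuple


def map_module(path: Path) -> Optional[Tuple[Tuple[str, ...], str, str]]:
    """Return (parts, doc_path, full_doc_path) for a source file, or None if it is skipped."""
    module_path = path.relative_to("src").with_suffix("")
    doc_path = path.relative_to("src/koheesio").with_suffix(".md")
    full_doc_path = Path("api_reference", doc_path)

    parts = tuple(module_path.parts)
    if parts[-1] == "__init__":
        # A package's __init__.py becomes the index page of that package.
        parts = parts[:-1]
        doc_path = doc_path.with_name("index.md")
        full_doc_path = full_doc_path.with_name("index.md")
    elif parts[-1].startswith("_"):
        # Any other module whose name starts with an underscore is left out.
        return None
    return parts, doc_path.as_posix(), full_doc_path.as_posix()


# Hypothetical paths, used only to show the three cases.
for example in (
    "src/koheesio/steps/__init__.py",
    "src/koheesio/steps/step.py",
    "src/koheesio/__about__.py",
):
    print(example, "->", map_module(Path(example)))
```

Under these assumptions, a package `__init__.py` maps to an `index.md` page, a regular module maps to a page of the same name, and underscore-prefixed modules produce no page.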
32 changes: 20 additions & 12 deletions docs/tutorials/advanced-data-processing.md
@@ -1,17 +1,18 @@
# Advanced Data Processing with Koheesio

In this guide, we will explore some advanced data processing techniques using Koheesio. We will cover topics such as complex transformations, handling large datasets, and optimizing performance.
In this guide, we will explore some advanced data processing techniques using Koheesio. We will cover topics such as
complex transformations, handling large datasets, and optimizing performance.

## Complex Transformations

Koheesio provides a variety of built-in transformations, but sometimes you may need to perform more complex operations on your data. In such cases, you can create custom transformations.
Koheesio provides a variety of built-in transformations, but sometimes you may need to perform more complex operations
on your data. In such cases, you can create custom transformations.

Here's an example of a custom transformation that normalizes a column in a DataFrame:

```python
from pyspark.sql import DataFrame
from koheesio.steps.transformations import Transform

from koheesio.spark.transformations.transform import Transform

def normalize_column(df: DataFrame, column: str) -> DataFrame:
    max_value = df.agg({column: "max"}).collect()[0][0]
@@ -42,15 +43,22 @@ class MyTask(EtlTask):
    target = DeltaTableWriter(table="my_table", partitionBy=["column1", "column2"])
```
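
The body of `normalize_column` is cut off by the collapsed part of the diff, so its exact formula is not visible here. As a rough illustration of what column normalization commonly means, the sketch below applies min-max scaling to a plain Python list. Both the choice of min-max scaling and the helper name `min_max_normalize` are assumptions for this example and may differ from what the tutorial's Spark code does.

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale values linearly into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Avoid division by zero when every value is identical.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


print(min_max_normalize([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```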

## Caching
Caching is another technique that can improve performance by storing the result of a transformation in memory, so it
doesn't have to be recomputed each time it's used. You can use the cache method to cache the result of a transformation.
[//]: # (## Caching)

```python
from koheesio.steps.transformations import CacheTransformation
[//]: # (Caching is another technique that can improve performance by storing the result of a transformation in memory, so it )

[//]: # (doesn't have to be recomputed each time it's used. You can use the cache method to cache the result of a transformation.)

class MyTask(EtlTask):
    transformations = [NormalizeColumnTransform(column="my_column"), CacheTransformation()]
```
[//]: # ()
[//]: # (```python)

[//]: # (from koheesio.steps.transformations.cache import CacheTransformation)

[//]: # ()
[//]: # (class MyTask&#40;EtlTask&#41;:)

[//]: # ( transformations = [NormalizeColumnTransform&#40;column="my_column"&#41;, CacheTransformation&#40;&#41;])

[//]: # (```)

[//]: # ()
File renamed without changes.