Nike-Inc · dannymeijer · May 29, 2024 · May 24, 2024 · May 27, 2024 · May 27, 2024
@@ -7,7 +7,6 @@ There are a few guidelines that we need contributors to follow so that we are ab
 ## Getting Started
 
 * Review our [Code of Conduct](https://github.com/Nike-Inc/nike-inc.github.io/blob/master/CONDUCT.md)
-* Submit the [Individual Contributor License Agreement](https://www.clahub.com/agreements/Nike-Inc/fastbreak)
 * Make sure you have a [GitHub account](https://github.com/signup/free)
 * Submit a ticket for your issue, assuming one does not already exist.
     * Clearly describe the issue including steps to reproduce when it is a bug.
@@ -18,14 +17,14 @@ There are a few guidelines that we need contributors to follow so that we are ab
 
 * Create a feature branch off of `main` before you start your work.
     * Please avoid working directly on the `main` branch.
-* Setup the required package manager [poetry](#-package-manager)
+* Set up the required package manager [hatch](#-package-manager)
 * Setup the dev environment [see below](#-dev-environment-setup)
 * Make commits of logical units.
     * You may be asked to squash unnecessary commits down to logical units.
 * Check for unnecessary whitespace with `git diff --check` before committing.
 * Write meaningful, descriptive commit messages.
 * Please follow existing code conventions when working on a file
-* Make sure to check the standards on the code [see below](#-linting-and-standards)
+* Make sure to check the standards on the code, [see below](#-linting-and-standards)
 * Make sure to test the code before you push changes [see below](#-testing)
 
 ## 🤝 Submitting Changes
@@ -37,19 +36,39 @@ if it isn't showing any activity.
 * Bug fixes or features that lack appropriate tests may not be considered for merge.
 * Changes that lower test coverage may not be considered for merge.
 
-### 📦 Package manager
+### 🔨 Make commands
 
 We use `make` for managing different steps of setup and maintenance in the project. You can install make by following
 the instructions [here](https://formulae.brew.sh/formula/make)
 
-We use `poetry` as our package manager.
-
-Please DO NOT use pip or conda to install the dependencies. Instead, use poetry:
+For a full list of available make commands, you can run:
 
 ```bash
-make poetry-install
+make help
+```
+
+
+### 📦 Package manager
+
+We use `hatch` as our package manager.
+
+> Note: Please DO NOT use pip or conda to install the dependencies. Instead, use hatch.
+
+To install hatch, run the following command:
+```console
+make init
 ```
 
+or,
+```console
+make hatch-install
+```
+
+This will install hatch using brew if you are on a Mac.
+
+If you are on a different OS, you can follow the instructions [here](https://hatch.pypa.io/latest/install/).
+
+
 ### 📌 Dev Environment Setup
 
 To ensure our standards, make sure to install the required packages.
@@ -58,29 +77,42 @@ To ensure our standards, make sure to install the required packages.
 make dev
 ```
 
+This will install all the required packages for development in the project under the `.venv` directory.
+Use this virtual environment to run the code and tests during local development.
+
 ### 🧹 Linting and Standards
 
-We use `pylint`, `black` and `mypy` to maintain standards in the codebase
+We use `ruff`, `pylint`, `isort`, `black` and `mypy` to maintain standards in the codebase.
+
+Run the following two commands to check the codebase for issues and to format it:
 
 ```bash
 make check
 ```
+This will run all the checks including pylint and mypy.
 
-Make sure that the linter does not report any errors or warnings before submitting a pull request.
+```bash
+make fmt
+```
+This will format the codebase using black, isort, and ruff.
+
+Make sure that the linters and formatters do not report any errors or warnings before submitting a pull request.
 
 ### 🧪 Testing
 
-We use `pytest` to test our code. You can run the tests by running the following command:
+We use `pytest` to test our code. 
 
-```bash
-make test
-```
 
-Make sure that all tests pass before submitting a pull request.
+You can run the tests by running one of the following commands:
 
-## 🚀 Release Process
+```bash
+make cov  # to run the tests and check the coverage
+make all-tests  # to run all the tests
+make spark-tests  # to run the spark tests
+make non-spark-tests  # to run the non-spark tests
+```
 
-At the moment, the release process is manual. We try to make frequent releases. Usually, we release a new version when we have a new feature or bugfix. A developer with admin rights to the repository will create a new release on GitHub, and then publish the new version to PyPI.
+Make sure that all tests pass and that you have adequate coverage before submitting a pull request.
 
 # Additional Resources
 

@@ -1,10 +1,11 @@
 :root {
   --md-code-font: "Roboto Mono";
   /* --md-primary-fg-color: #84A0C6; */
-  --md-primary-fg-color: #F8AE44;
+  /*--md-primary-fg-color: linear-gradient(142deg, rgba(229,119,39,1) 3%, rgba(172,56,56,1) 31%, rgba(133,59,96,1) 51%, rgba(31,67,103,1) 79%, rgba(31,99,120,1) 94%, rgba(32,135,139,1) 100%);*/
+  /*--md-primary-fg-color: rgba(229,119,39,1);*/
   /* --md-primary-fg-color: #E4AF68; */
-  --md-primary-fg-color--light: #FCFCFC;
-  --md-primary-fg-color--dark: #333;
+  /*--md-primary-fg-color--light: linear-gradient(142deg, rgba(229,119,39,1) 3%, rgba(172,56,56,1) 31%, rgba(133,59,96,1) 51%, rgba(31,67,103,1) 79%, rgba(31,99,120,1) 94%, rgba(32,135,139,1) 100%);*/
+  /*--md-primary-fg-color--dark: linear-gradient(142deg, rgba(229,119,39,1) 3%, rgba(172,56,56,1) 31%, rgba(133,59,96,1) 51%, rgba(31,67,103,1) 79%, rgba(31,99,120,1) 94%, rgba(32,135,139,1) 100%);*/
   --md-default-fg-color: #111;
   --md-default-fg-color--light: #000000d0;
   --md-default-fg-color--lighter: #00000052;
@@ -57,4 +58,12 @@
 .md-content a[href^="http"]:hover::after {
   background-color: var(--md-accent-fg-color);
   background-image: url('data:image/svg+xml,<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path fill="rgb(255, 255, 255)" d="M18.25 15.5a.75.75 0 00.75-.75v-9a.75.75 0 00-.75-.75h-9a.75.75 0 000 1.5h7.19L6.22 16.72a.75.75 0 101.06 1.06L17.5 7.56v7.19c0 .414.336.75.75.75z"></path></svg>');
+}
+
+.md-header {
+  background: linear-gradient(142deg, rgba(229,119,39,1) 3%, rgba(172,56,56,1) 31%, rgba(133,59,96,1) 51%, rgba(31,67,103,1) 79%, rgba(31,99,120,1) 94%, rgba(32,135,139,1) 100%);
+}
+
+.md-tabs {
+  background: none;
 }
@@ -1,46 +1,33 @@
-# -----------------------------------------------------#
-#                   Library imports                   #
-# -----------------------------------------------------#
 from pathlib import Path
 
 import mkdocs_gen_files
 
-# -----------------------------------------------------#
-#                    Configuration                    #
-# -----------------------------------------------------#
-src_dir = "koheesio"
+nav = mkdocs_gen_files.Nav()
+mod_symbol = '<code class="doc-symbol doc-symbol-nav doc-symbol-module"></code>'
 
-# -----------------------------------------------------#
-#                       Runner                        #
-# -----------------------------------------------------#
-""" Generate code reference pages and navigation
+# Iterate over each Python file
+for path in sorted(Path("src").rglob("*.py")):
+    module_path = path.relative_to("src").with_suffix("")
+    doc_path = path.relative_to("src/koheesio").with_suffix(".md")
+    full_doc_path = Path("api_reference", doc_path)
 
-    Based on the recipe of mkdocstrings:
-    https://github.com/mkdocstrings/mkdocstrings
+    parts = tuple(module_path.parts)
 
-    Credits:
-    Timothée Mazzucotelli
-    https://github.com/pawamoy
-"""
-# Iterate over each Python file
-for path in sorted(Path(src_dir).rglob("*.py")):
-    # Get path in module, documentation and absolute
-    module_path = path.relative_to(src_dir).with_suffix("")
-    doc_path = path.relative_to(src_dir).with_suffix(".md")
-    full_doc_path = Path("koheesio", doc_path)
-
-    # Handle edge cases
-    parts = (src_dir,) + tuple(module_path.parts)
     if parts[-1] == "__init__":
         parts = parts[:-1]
         doc_path = doc_path.with_name("index.md")
         full_doc_path = full_doc_path.with_name("index.md")
-    elif parts[-1] == "__main__":
+    elif parts[-1].startswith("_"):
         continue
 
-    # Write docstring documentation to disk via parser
+    nav_parts = [f"{mod_symbol} {part}" for part in parts]
+    nav[tuple(nav_parts)] = doc_path.as_posix()
+
     with mkdocs_gen_files.open(full_doc_path, "w") as fd:
         ident = ".".join(parts)
         fd.write(f"::: {ident}")
-    # Update parser
-    mkdocs_gen_files.set_edit_path(full_doc_path, path)
+
+    mkdocs_gen_files.set_edit_path(full_doc_path, ".." / path)
+
+with mkdocs_gen_files.open("api_reference/SUMMARY.txt", "w") as nav_file:
+    nav_file.writelines(nav.build_literate_nav())
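
The path mapping performed by the loop above can be sketched in isolation. This is a minimal illustration using only `pathlib`, not the script itself: it leaves out `mkdocs_gen_files` and the navigation symbols. The helper name `doc_target` and its `src`/`package` parameters are hypothetical, and the input is assumed to be a POSIX-style path under `src/koheesio`.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple


def doc_target(
    path: str, src: str = "src", package: str = "koheesio"
) -> Optional[Tuple[str, str]]:
    """Return (identifier, doc path) for a source file, or None if it is skipped.

    Hypothetical helper that mirrors the rules of the loop in the diff.
    """
    source = PurePosixPath(path)
    # Dotted module location, e.g. koheesio/spark/transformations/transform
    module_path = source.relative_to(src).with_suffix("")
    # Doc page location, relative to the package root
    doc_path = source.relative_to(f"{src}/{package}").with_suffix(".md")
    full_doc_path = PurePosixPath("api_reference", doc_path)

    parts = tuple(module_path.parts)
    if parts[-1] == "__init__":
        # A package is documented on its index page
        parts = parts[:-1]
        full_doc_path = full_doc_path.with_name("index.md")
    elif parts[-1].startswith("_"):
        # Private and dunder modules get no page
        return None

    return ".".join(parts), full_doc_path.as_posix()


print(doc_target("src/koheesio/spark/transformations/transform.py"))
# ('koheesio.spark.transformations.transform', 'api_reference/spark/transformations/transform.md')
```

Under these assumptions, a package `__init__.py` maps to an `index.md` page, while any other module whose name starts with an underscore is skipped, which is broader than the previous `__main__`-only check.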
@@ -1,17 +1,18 @@
 # Advanced Data Processing with Koheesio
 
-In this guide, we will explore some advanced data processing techniques using Koheesio. We will cover topics such as complex transformations, handling large datasets, and optimizing performance.
+In this guide, we will explore some advanced data processing techniques using Koheesio. We will cover topics such as 
+complex transformations, handling large datasets, and optimizing performance.
 
 ## Complex Transformations
 
-Koheesio provides a variety of built-in transformations, but sometimes you may need to perform more complex operations on your data. In such cases, you can create custom transformations.
+Koheesio provides a variety of built-in transformations, but sometimes you may need to perform more complex operations 
+on your data. In such cases, you can create custom transformations.
 
 Here's an example of a custom transformation that normalizes a column in a DataFrame:
 
 ```python
 from pyspark.sql import DataFrame
-from koheesio.steps.transformations import Transform
-
+from koheesio.spark.transformations.transform import Transform
 
 def normalize_column(df: DataFrame, column: str) -> DataFrame:
     max_value = df.agg({column: "max"}).collect()[0][0]
@@ -42,15 +43,22 @@ class MyTask(EtlTask):
     target = DeltaTableWriter(table="my_table", partitionBy=["column1", "column2"])
 ```
 
-## Caching
-Caching is another technique that can improve performance by storing the result of a transformation in memory, so it 
-doesn't have to be recomputed each time it's used. You can use the cache method to cache the result of a transformation.
+[//]: # (## Caching)
 
-```python
-from koheesio.steps.transformations import CacheTransformation
+[//]: # (Caching is another technique that can improve performance by storing the result of a transformation in memory, so it )
 
+[//]: # (doesn't have to be recomputed each time it's used. You can use the cache method to cache the result of a transformation.)
 
-class MyTask(EtlTask):
-    transformations = [NormalizeColumnTransform(column="my_column"), CacheTransformation()]
-```
+[//]: # ()
+[//]: # (```python)
+
+[//]: # (from koheesio.steps.transformations.cache import CacheTransformation)
+
+[//]: # ()
+[//]: # (class MyTask&#40;EtlTask&#41;:)
+
+[//]: # (    transformations = [NormalizeColumnTransform&#40;column="my_column"&#41;, CacheTransformation&#40;&#41;])
+
+[//]: # (```)
 
+[//]: # ()