Thank you for contributing to SystemDS. 😄
The following are mostly guidelines, not rules. Use your best judgement, and feel free to propose changes to this doc.
Before contributing a pull request for review, let's make sure the changes are consistent with the guidelines and coding style.
- Inclusion of unit tests when contributing new features, will help
- prove that the code works correctly, and
- guard against future breaking changes.
- Formatting changes can be handled in a separate PR.
Example
bf4ba16b
- New features (e.g., a new cutting edge machine learning algorithm) typically will
live in
scripts/staging
or its equivalent folder for specific feature to get some airtime and sufficient testing before a decision is made regarding whether they are to migrated to the top-level. - When a new contribution is made to SystemDS, the maintenance burden is (by default) transferred to the SystemDS team. The benefit of the contribution is to be compared against the cost of maintaining the feature.
We suggest applying a code formatter to the written code. Generally, this is done automatically.
We have provided at profile for java located in Codestyle File ./dev/CodeStyle.eclipse.xml
. This can be loaded in most editors e.g.:
SystemDS project has linear history throughout. Rebase cleanly and verify that the commit-SHA's of the Apache Repo are not altered. A general guideline is never to change a commit inside the systemds official repo, which is the standard practice. If you have accidentally committed a functionality or tag, let others know. And create a new commit to revert the functionality of the previous commit but do not force push.
The tags can be used in combination to one another. These are the only tags available.
-
[MINOR]
: Small changesets with additional functionalityExamples:
This commit makes small software updates with refactoring
030fdab3
-[MINOR] Added package to R dependency; updated Docker test image
This commit cleans up the redundant code for simplification
f4fa5650
-[MINOR][SYSTEMDS-43] Cleanup scale builtin function (readability)
-
[SYSTEMDS-#]
: A changeset which will a specific purpose such as a bug, Improvement, New feature, etc. The tag value is found from SystemDS jira issue tracker. -
[DOC]
also[DOCS]
: Changes to the documentation -
[HOTFIX]
: Introduces changes into the already released versions.Example:
This commit fixes the corrupted language path
87bc3584
-[HOTFIX] Fix validation of scalar-scalar binary min/max operations
Protip: Addressing multiple jira issues in a single commit,
[SYSTEMDS-123,SYSTEMDS-124]
or[SYSTEMDS-123][SYSTEMDS-124]
A commit or PR description is a public record of what change is being made and why.
- A summary of what the changeset.
- A complete sentence, crafted as though it was an order.
- an imperative sentence
- Writing the rest of the description as an imperative is optional.
- Follow by an empty line.
It consists of the following.
- A brief description of the problem solved.
- Why this is the best approach?.
- Shortcomings to the approach, if any (important!).
Additional info
- background information
- bug/issue/jira numbers
- benchmark/test results
- links to design documents
- code attributions
- Include enough context for
- reviewers
- future readers to understand the Changes.
- Add PR number, like
Closes #1000
.
The following is a commit description with all the points mentioned.
Commit message:
[SYSTEMDS-418] Performance improvements lineage reuse probing/spilling
This patch makes some minor performance improvements to the lineage
reuse probing and cache put operations. Specifically, we now avoid
unnecessary lineage hashing and comparisons by using lists instead of
hash maps, move the time computations into the reuse path (to not affect
the code path without lineage reuse), avoid unnecessary branching, and
materialize the score of cache entries to avoid repeated computation
for the log N comparisons per add/remove/constaints operation.
For 100K iterations and ~40 ops per iteration, lineage tracing w/ reuse
improved from 41.9s to 38.8s (pure lineage tracing: 27.9s).
The following are some of the types of code changes with examples.
Commit message:
[SYSTEMDS-2603] New hybrid approach for lineage deduplication
This patch makes a major refactoring of the lineage deduplication
framework. This changes the design of tracing all the
distinct paths in a loop-body before the first iteration, to trace
during execution. The number of distinct paths grows exponentially
with the number of control flow statements. Tracing all the paths
in advance can be a huge waste and overhead.
We now trace an iteration during execution. We count the number of
distinct paths before the iterations start, and we stop tracing
once all the paths are traced. Tracing during execution fits
very well with our multi-level reuse infrastructure.
Refer to JIRA for detailed discussions.
Refactoring is a series of behaviour preserving changes with restructuring to existing code body. Refactoring does not alter the external behaviour or the output of a function, but the internal changes to the function to keep the code more organized, readable or to accomodate a more complex functionality.
Commit message:
[SYSTEMDS-2575] Fix eval function calls (incorrect pinning of inputs)
This patch fixes an issue of indirect eval function calls where wrong
input variable names led to missing pinning of inputs and thus too eager
cleanup of these variables (which causes crashes if the inputs are used
in other operations of the eval call).
The fix is simple. We avoid such inconsistent construction and
invocation of fcall instructions by using a narrower interface and
constructing the materialized names internally in the fcall.
Commit message:
[MINOR] Update docker images organization
Changes the docker images to use the docker organization systemds
add install dependency for R dbScan
Change the tests to use the new organizations docker images
Closes #1008
Protip: to reference other commits use first 7 letters of the commit SHA-1. eg.
1b81d8c
for referencing1b81d8cb19d8da6d865b7fca5a095dd5fec8d209
Including a license at the top of new files helps validate the consistency of license.
Examples:
- C/C++/cmake/cuda
- Python
- Java
- Bash
- XML/HTML
- dml/R
- Makefile/.proto
- Markdown - refer to the top of this file in raw format.
Note: Generated code, like the Java classes generated from the
.proto
file and other such files can be excluded if needed for license checks unless the code generation tools generate a new license header different from the required license.
Thanks again for taking your time to help improve SystemDS! 👍