Skip to content

Commit

Permalink
Pre release breaking changes (#26)
Browse files Browse the repository at this point in the history
* rename torch->pytorch; add descriptions

* rename al rules yml->yaml

* add technology and references to all rules

* update readme; add contributing

* small changes to workflows

* codeowners
  • Loading branch information
GrosQuildu authored Jan 27, 2023
1 parent 0d7345f commit e3016e6
Show file tree
Hide file tree
Showing 48 changed files with 382 additions and 106 deletions.
18 changes: 0 additions & 18 deletions .github/workflows/main.yml

This file was deleted.

33 changes: 14 additions & 19 deletions .github/workflows/semgrep-rules-test.yml
Original file line number Diff line number Diff line change
@@ -1,29 +1,24 @@
name: semgrep-rules-test

on:
pull_request:
push:
branches:
- main

- main
jobs:
build:
name: run semgrep rules tests
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Set up Python 3.9
uses: actions/setup-python@v1
with:
python-version: 3.9

- name: installation
run: |
python -m pip install --upgrade pip
python3 -m pip install semgrep
- name: tests
run: |
python -m semgrep --quiet --test python/
python -m semgrep --quiet --test rust/
python -m semgrep --quiet --test go/
- uses: actions/checkout@v2
- name: Set up Python 3.9
uses: actions/setup-python@v2
with:
python-version: 3.9
- name: installation
run: |
python -m pip install --upgrade pip
python3 -m pip install semgrep
- name: validations
run: python -m semgrep --validate --config .
- name: tests
run: python -m semgrep --test --test-ignore-todo
16 changes: 16 additions & 0 deletions .github/workflows/update-semgrep-registry.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
name: update-semgrep-registry
on:
push:
branches:
- main
jobs:
build:
name: Update semgrep.dev
runs-on: ubuntu-latest
steps:
- name: update dev.semgrep.dev
run: curl --fail -X POST -L https://dev.semgrep.dev/api/admin/update-registry
- name: update staging.semgrep.dev
run: curl --fail -X POST -L https://staging.semgrep.dev/api/admin/update-registry
- name: update semgrep.dev
run: curl --fail -X POST -L https://semgrep.dev/api/admin/update-registry
3 changes: 3 additions & 0 deletions CODEOWNERS
Validating CODEOWNERS rules …
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
* @GrosQuildu
/go/ @hex0punk @GrosQuildu @Vasco-jofra
/python/ @suhacker1 @GrosQuildu @Vasco-jofra
96 changes: 96 additions & 0 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
Contributing to Trail of Bits Semgrep Rules
=========================

Thank you for your interest in contributing to ToB `semgrep-rules`!

The information below will help you set up a local development environment,
as well as performing common development tasks.

## Requirements

`semgrep-rules`'s only development environment requirement *should* be Python 3.7
or newer. Development and testing is actively performed on macOS and Linux,
but Windows and other supported platforms that are supported by Python
should also work.

## Development steps

First, clone this repository:

```bash
git clone https://github.com/trailofbits/semgrep-rules
cd semgrep-rules
```

Then [install semgrep CLI](https://semgrep.dev/docs/getting-started/), and you are good to start development.

### Linting

Currenty we don't use any linting tools. In the future we plan to use `yamlfmt`.

### Testing

You can run tests locally with:

```bash
semgrep --test --test-ignore-todo --metrics=off
```

To test a specific file:

```bash
semgrep --test --test-ignore-todo --metrics=off --config ./go/iterate-over-empty-map.yaml ./go/iterate-over-empty-map.go
```

### Development practices

Before publishing a new rule, or updating an existing one, make sure to review the checklist below:

- [ ] Add metadata. Semgrep [defines which metadata fields are required](https://semgrep.dev/docs/contributing/contributing-to-semgrep-rules-repository/#writing-a-rule-for-semgrep-registry)
- [ ] Add a non-standard `metadata.description` field. It will be used as a description in the `semgrep-rules` README table.
- For `metadata.references` provide a link to official documentation, Trail of Bits blogpost, GitHub issue, or some reputable website. Avoid linking to websites that may disappear in the future.

- [ ] Validate metadata against the official schema
- Download python validation script `wget https://raw.githubusercontent.com/returntocorp/semgrep-rules/develop/.github/scripts/validate-metadata.py`
- Download rules schema `wget https://raw.githubusercontent.com/returntocorp/semgrep-rules/develop/metadata-schema.yaml.schm`
- Run `python ./validate-metadata.py -s ./metadata-schema.yaml.schm -f .`

- [ ] Add tests
- [ ] At least one true positive (`ruleid: ` comment)
- [ ] At least one true negative (`ok: ` comment)
- Tests are allowed to crash when running them directly or to be meaningless
- However, try writing tests that can be compiled or parsed by the language interpreter
- The first few test cases should be easy to understand, the later should be more complex or check for edge-cases
- [ ] Make sure all tests pass, run `semgrep --test --test-ignore-todo --metrics=off`

- [ ] Run official semgrep lints with `semgrep --validate --metrics=off --config ./<new-rule>.yaml`

- [ ] Review style of the rules
- [ ] Use 2 spaces for indentation
- [ ] Use `>-` for multiline messages
- [ ] Use backticks in messages e.g., `$VAR`, `$FUNC`, `some.method()`
- The `languages` field in `[go, java]` format are preferable (not `- go \n -java`)

- [ ] Check amount of false-positives on some large public repositories

- [ ] Check performance - take a look at [r2c methodology](https://github.com/returntocorp/semgrep-rules/blob/main/tests/performance/test_public_repos.py)

- [ ] Add the new rules to the README
- Run `python ./rules_table_generator.py` to re-generate the table
- Manually check if the table was correctly generated

### Documentation

We don't provide any documentation for the rules. All information that you need to understand a rule is inside it. Semgrep documentation can be found [here](https://semgrep.dev/docs/).

### Releasing

**NOTE**: If you're a non-maintaining contributor, you don't need the steps
here! They're documented for completeness and for onboarding future maintainers.

We don't have a release cycle yet.

All changes to the repository's `main` branch are automatically pushed to the semgrep registry (with a GitHub action).

Modifying rule's filename, path, or ID will result in duplication of the rule in the registry.
This is a known issue, r2c team still works on resolving it.
88 changes: 58 additions & 30 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -45,33 +45,61 @@ $ semgrep --config /path/to/semgrep-rules/hanging-goroutine.yml -o leaks.txt'
## Rules
Rule ID | Language | What it Finds
--- | --- | ---
[anonymous-race-condition](go/anonymous-race-condition.yml) | Go | Race conditions within anonymous goroutines
[hanging-goroutine](go/hanging-goroutine.yml) | Go | Goroutine leaks
[iterate-over-empty-collection](go/iterate-over-empty-collection.yml) | Go | Iterations over empty collection
[nil-check-after-call](go/nil-check-after-call.yml) | Go | Possible nil dereferences
[invalid-usage-of-modified-variable](go/invalid-usage-of-modified-variable.yml) | Go | Possible unintentional assignment when an error occurs
[servercodec-readrequestbody-unhandled-nil](go/servercodec-readrequestbody-unhandled-nil.yml) | Go | Possible incorrect `ServerCodec` interface implementation
[string-to-int-signedness-cast](go/string-to-int-signedness-cast.yml) | Go | Integer underflows
[sync-mutex-value-copied](go/sync-mutex-value-copied.yml) | Go | Copying of `sync.Mutex` via value receivers
[waitgroup-add-called-inside-goroutine](go/waitgroup-add-called-inside-goroutine.yml) | Go | Calls to `sync.WaitGroup.Add` inside of anonymous goroutines
[waitgroup-wait-inside-loop](go/waitgroup-wait-inside-loop.yml) | Go | Calls to `sync.WaitGroup.Wait` inside a loop
[racy-append-to-slice](go/racy-append-to-slice.yml) | Go | Concurrent calls to `append` from multiple goroutines
[racy-write-to-map](go/racy-write-to-map.yml) | Go | Concurrent writes to the same map in multiple goroutines
[missing-unlock-before-return](go/missing-unlock-before-return.yml) | Go | Missing mutex unlock before returning from a function. This could cause panics resulting from double lock operations
[missing-runlock-on-rwmutex](go/missing-runlock-on-rwmutex.yml) | Go | Missing RUnlock on an RWMutex lock before returning from a function
[unsafe-dll-loading.yml](go/unsafe-dll-loading.yml) | Go | Use of function vulnerable to DLL hijacking attacks
[tarfile-extractall-traversal](python/tarfile-extractall-traversal.yml) | Python | Potential path traversal in call to `extractall` for a `tarfile`
[automatic-memory-pinning](python/automatic-memory-pinning.yml) | Python | Memory is not automatically pinned
[lxml-in-pandas](python/lxml-in-pandas.yml) | Python | Potential XXE attacks from loading lxml in pandas
[numpy-in-pytorch-modules](python/numpy-in-pytorch-modules.yml) | Python | Uses NumPy functions inside PyTorch modules
[numpy-in-torch-datasets](python/numpy-in-torch-datasets.yml) | Python | Calls to the Numpy RNG inside of a Torch dataset
[pickles-in-numpy](python/pickles-in-numpy.yml) | Python | Potential arbitrary code execution from NumPy functions reliant on pickling
[pickles-in-pandas](python/pickles-in-pandas.yml) | Python | Potential arbitrary code execution from Pandas functions reliant on pickling
[pickles-in-pytorch](python/pickles-in-pytorch.yml) | Python | Potential arbitrary code execution from PyTorch functions reliant on pickling
[pickles-in-torch-distributed](python/pickles-in-torch-distributed.yml) | Python | Potential arbitrary code execution from PyTorch Distributed functions reliant on pickling
[torch-package](python/torch-package.yml) | Python | Potential arbitrary code execution from torch.package
[torch-tensor](python/torch-tensor.yml) | Python | Possible parsing issues and inefficiency from improper tensor creation
[waiting-with-torch-distributed](python/waiting-with-torch-distributed.yml) | Python | Possible undefined behavior when not waiting for requests
[panic-in-function-returning-result](rs/panic-in-function-returning-result.yml) | Rust | Calling `unwrap` or `expect` in a function returning a `Result`
### go
| ID | Playground | Impact | Confidence | Description |
| -- | :--------: | :----: | :--------: | ----------- |
| [anonymous-race-condition](go/anonymous-race-condition.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.go.anonymous-race-condition.anonymous-race-condition) | 🟧 | πŸŒ— | Race conditions within anonymous goroutines |
| [hanging-goroutine](go/hanging-goroutine.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.go.hanging-goroutine.hanging-goroutine) | 🟩 | πŸŒ— | Goroutine leaks |
| [invalid-usage-of-modified-variable](go/invalid-usage-of-modified-variable.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.go.invalid-usage-of-modified-variable.invalid-usage-of-modified-variable) | 🟧 | 🌘 | Possible unintentional assignment when an error occurs |
| [iterate-over-empty-map](go/iterate-over-empty-map.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.go.iterate-over-empty-map.iterate-over-empty-map) | 🟩 | πŸŒ— | Probably redundant iteration over an empty map |
| [missing-runlock-on-rwmutex](go/missing-runlock-on-rwmutex.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.go.missing-runlock-on-rwmutex.missing-runlock-on-rwmutex) | 🟧 | πŸŒ— | Missing `RUnlock` on an `RWMutex` lock before returning from a function |
| [missing-unlock-before-return](go/missing-unlock-before-return.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.go.missing-unlock-before-return.missing-unlock-before-return) | 🟧 | πŸŒ— | Missing `mutex` unlock before returning from a function |
| [nil-check-after-call](go/nil-check-after-call.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.go.nil-check-after-call.nil-check-after-call) | 🟧 | πŸŒ— | Possible nil dereferences |
| [racy-append-to-slice](go/racy-append-to-slice.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.go.racy-append-to-slice.racy-append-to-slice) | 🟧 | πŸŒ— | Concurrent calls to `append` from multiple goroutines |
| [racy-write-to-map](go/racy-write-to-map.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.go.racy-write-to-map.racy-write-to-map) | 🟧 | πŸŒ— | Concurrent writes to the same map in multiple goroutines |
| [servercodec-readrequestbody-unhandled-nil](go/servercodec-readrequestbody-unhandled-nil.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.go.servercodec-readrequestbody-unhandled-nil.servercodec-readrequestbody-unhandled-nil) | 🟩 | 🌘 | Possible incorrect `ServerCodec` interface implementation |
| [string-to-int-signedness-cast](go/string-to-int-signedness-cast.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.go.string-to-int-signedness-cast.string-to-int-signedness-cast) | 🟧 | 🌘 | Integer underflows |
| [sync-mutex-value-copied](go/sync-mutex-value-copied.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.go.sync-mutex-value-copied.sync-mutex-value-copied) | 🟩 | 🌘 | Copying of `sync.Mutex` via value receivers |
| [unsafe-dll-loading](go/unsafe-dll-loading.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.go.unsafe-dll-loading.unsafe-dll-loading) | πŸŸ₯ | 🌘 | Use of function vulnerable to DLL hijacking attacks |
| [waitgroup-add-called-inside-goroutine](go/waitgroup-add-called-inside-goroutine.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.go.waitgroup-add-called-inside-goroutine.waitgroup-add-called-inside-goroutine) | 🟧 | πŸŒ— | Calls to `sync.WaitGroup.Add` inside of anonymous goroutines |
| [waitgroup-wait-inside-loop](go/waitgroup-wait-inside-loop.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.go.waitgroup-wait-inside-loop.waitgroup-wait-inside-loop) | 🟧 | πŸŒ— | Calls to `sync.WaitGroup.Wait` inside a loop |
### python
| ID | Playground | Impact | Confidence | Description |
| -- | :--------: | :----: | :--------: | ----------- |
| [automatic-memory-pinning](python/automatic-memory-pinning.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.python.automatic-memory-pinning.automatic-memory-pinning) | 🟩 | 🌘 | `PyTorch` memory not automatically pinned |
| [lxml-in-pandas](python/lxml-in-pandas.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.python.lxml-in-pandas.lxml-in-pandas) | 🟧 | 🌘 | Potential XXE attacks from loading `lxml` in pandas |
| [numpy-distutils](python/numpy-distutils.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.python.numpy-distutils.numpy-distutils) | 🟩 | 🌘 | Use of deprecated `numpy.distutils` |
| [numpy-f2py-compile](python/numpy-f2py-compile.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.python.numpy-f2py-compile.numpy-f2py-compile) | πŸŸ₯ | πŸŒ— | Potential arbitrary code execution from `NumPy` `f2py` compilation |
| [numpy-in-pytorch-datasets](python/numpy-in-pytorch-datasets.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.python.numpy-in-pytorch-datasets.numpy-in-pytorch-datasets) | 🟩 | 🌘 | Calls to the `NumPy` RNG inside of a `Torch` dataset |
| [numpy-in-pytorch-modules](python/numpy-in-pytorch-modules.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.python.numpy-in-pytorch-modules.numpy-in-pytorch-modules) | 🌫️ | πŸŒ— | Uses of `NumPy` functions inside `PyTorch` modules |
| [numpy-load-library](python/numpy-load-library.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.python.numpy-load-library.numpy-load-library) | πŸŸ₯ | πŸŒ— | Potential arbitrary code execution from `NumPy` library loading |
| [onnx-session-options](python/onnx-session-options.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.python.onnx-session-options.onnx-session-options) | πŸŸ₯ | πŸŒ— | Potential arbitrary code execution from `ONNX` library loading |
| [pickles-in-numpy](python/pickles-in-numpy.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.python.pickles-in-numpy.pickles-in-numpy) | πŸŸ₯ | πŸŒ— | Potential arbitrary code execution from `NumPy` functions reliant on pickling |
| [pickles-in-pandas](python/pickles-in-pandas.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.python.pickles-in-pandas.pickles-in-pandas) | πŸŸ₯ | πŸŒ— | Potential arbitrary code execution from `Pandas` functions reliant on pickling |
| [pickles-in-pytorch-distributed](python/pickles-in-pytorch-distributed.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.python.pickles-in-pytorch-distributed.pickles-in-pytorch-distributed) | πŸŸ₯ | πŸŒ— | Potential arbitrary code execution from `PyTorch.Distributed` functions reliant on pickling |
| [pickles-in-pytorch](python/pickles-in-pytorch.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.python.pickles-in-pytorch.pickles-in-pytorch) | πŸŸ₯ | πŸŒ— | Potential arbitrary code execution from `PyTorch` functions reliant on pickling |
| [pytorch-classes-load-library](python/pytorch-classes-load-library.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.python.pytorch-classes-load-library.pytorch-classes-load-library) | πŸŸ₯ | πŸŒ— | Potential arbitrary code execution from `PyTorch` library loading |
| [pytorch-package](python/pytorch-package.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.python.pytorch-package.pytorch-package) | πŸŸ₯ | πŸŒ• | Potential arbitrary code execution from `torch.package` |
| [pytorch-tensor](python/pytorch-tensor.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.python.pytorch-tensor.pytorch-tensor) | 🌫️ | 🌘 | Possible parsing issues and inefficiency from improper tensor creation |
| [scikit-joblib-load](python/scikit-joblib-load.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.python.scikit-joblib-load.scikit-joblib-load) | πŸŸ₯ | πŸŒ— | Potential arbitrary code execution from `SciKit.Joblib` functions reliant on pickling |
| [tarfile-extractall-traversal](python/tarfile-extractall-traversal.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.python.tarfile-extractall-traversal.tarfile-extractall-traversal) | 🟧 | πŸŒ— | Potential path traversal in call to `extractall` for a `tarfile` |
| [tensorflow-load-library](python/tensorflow-load-library.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.python.tensorflow-load-library.tensorflow-load-library) | πŸŸ₯ | πŸŒ— | Potential arbitrary code execution from `TensorFlow` library loading |
| [waiting-with-pytorch-distributed](python/waiting-with-pytorch-distributed.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.python.waiting-with-pytorch-distributed.waiting-with-pytorch-distributed) | 🟩 | πŸŒ— | Possible `PyTorch` undefined behavior when not waiting for requests |
### rs
| ID | Playground | Impact | Confidence | Description |
| -- | :--------: | :----: | :--------: | ----------- |
| [panic-in-function-returning-result](rs/panic-in-function-returning-result.yaml) | [πŸ›πŸ”—](https://semgrep.dev/playground/r/trailofbits.rs.panic-in-function-returning-result.panic-in-function-returning-result) | 🟩 | 🌘 | Calling `unwrap` or `expect` in a function returning a `Result` |
## Contributing
Pull Requests and issues are welcomed!
See [CONTRIBUTING.md](CONTRIBUTING.md) for more information.
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,8 @@ rules:
confidence: MEDIUM
likelihood: HIGH
impact: MEDIUM
technology: [--no-technology--]
description: "Race conditions within anonymous goroutines"
references:
- https://github.com/golang/go/wiki/CommonMistakes#using-goroutines-on-loop-iterator-variables

Expand Down
2 changes: 2 additions & 0 deletions go/hanging-goroutine.yml β†’ go/hanging-goroutine.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,8 @@ rules:
confidence: MEDIUM
likelihood: MEDIUM
impact: LOW
technology: [--no-technology--]
description: "Goroutine leaks"
references:
- https://blog.trailofbits.com/2021/11/08/discovering-goroutine-leaks-with-semgrep

Expand Down
Loading

0 comments on commit e3016e6

Please sign in to comment.