-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ML rules fixes, new rule for msgpack-numpy #39
Merged
Merged
Changes from 8 commits
Commits
Show all changes
14 commits
Select commit
Hold shift + click to select a range
918982f
improves some rules and tests
ca927a0
msgpack-numpy rule
e3607be
new rule for pandas eval functions
c788668
Merge branch 'main' into lucas
GrosQuildu 834a3f6
keras and tf loading functions
dhalf 967660d
signed commit
dhalf 043f701
Merge branch 'lucas2' into lucas
dhalf 462b9e9
new rules: keras and tf loading
dhalf b4d28c6
fix formatting, less FPs for pandas-eval
GrosQuildu 975492b
pandas-eval metadata
GrosQuildu 904b5aa
merge conflicts
GrosQuildu 0051875
pandas-eval, add testcase for empty f-string
GrosQuildu 3a6867d
numpy-in-pytorch-modules, generic match
GrosQuildu b3e93ed
pickles-in-keras-deprecation - regex
GrosQuildu File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
import msgpack | ||
import msgpack_numpy as m | ||
import numpy as np | ||
|
||
x = np.random.rand(5) | ||
# ruleid: msgpack-numpy | ||
x_enc = msgpack.packb(x, default=m.encode) | ||
# ruleid: msgpack-numpy | ||
x_rec = msgpack.unpackb(x_enc, object_hook=m.decode) | ||
|
||
# ok: msgpack-numpy | ||
x_enc2 = msgpack.packb(x) | ||
# ok: msgpack-numpy | ||
x_rec2 = msgpack.unpackb(x_enc2) | ||
|
||
m.patch() | ||
|
||
# ruleid: msgpack-numpy | ||
x_enc3 = msgpack.packb(x) | ||
# ruleid: msgpack-numpy | ||
x_rec3 = msgpack.unpackb(x_enc2) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
rules: | ||
- id: msgpack-numpy | ||
message: >- | ||
Found usage of msgpack-numpy unpacking, which relies on pickle to deserialize numpy arrays containing objects. | ||
Functions reliant on pickle can result in arbitrary code execution. | ||
Consider switching to a safer serialization method. | ||
languages: [python] | ||
severity: ERROR | ||
metadata: | ||
category: security | ||
cwe: "CWE-502: Deserialization of Untrusted Data" | ||
subcategory: [vuln] | ||
confidence: MEDIUM | ||
likelihood: MEDIUM | ||
impact: HIGH | ||
technology: [numpy] | ||
description: "Potential arbitrary code execution from functions reliant on pickling" | ||
references: | ||
- https://blog.trailofbits.com/2021/03/15/never-a-dill-moment-exploiting-machine-learning-pickle-files/ | ||
|
||
pattern-either: | ||
- patterns: | ||
- pattern: msgpack.$FN(...) | ||
- metavariable-regex: | ||
metavariable: $FN | ||
regex: (loads?|dumps?|packb?|unpackb?) | ||
- pattern-inside: | | ||
msgpack_numpy.patch() | ||
... | ||
- pattern-either: | ||
- patterns: | ||
- pattern: msgpack.$FN(..., object_hook=msgpack_numpy.decode, ...) | ||
- metavariable-regex: | ||
metavariable: $FN | ||
regex: unpackb? | ||
- patterns: | ||
- pattern: msgpack.$FN(..., default=msgpack_numpy.encode, ...) | ||
- metavariable-regex: | ||
metavariable: $FN | ||
regex: packb? |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
import pandas as pd | ||
|
||
def id(x): | ||
return x | ||
|
||
expr = id("something") | ||
colB = id("B") | ||
subexpr = id("df.age * 2") | ||
|
||
df1 = pd.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)}) | ||
# ok: pandas-eval | ||
r11 = df1.eval('A + B') | ||
# ruleid: pandas-eval | ||
r12 = df1.eval(expr) | ||
# ok: pandas-eval | ||
r13 = df1.eval(f"A + B") | ||
# ruleid: pandas-eval | ||
r14 = df1.eval(f"A + {colB}") | ||
|
||
df2 = pd.DataFrame({"animal": ["dog", "pig"], "age": [10, 20]}) | ||
# ok: pandas-eval | ||
pd.eval("double_age = df.age * 2", target=df2) | ||
# ruleid: pandas-eval | ||
pd.eval(expr, target=df2) | ||
# ok: pandas-eval | ||
pd.eval(f"double_age = df.age * 2", target=df2) | ||
# ruleid: pandas-eval | ||
pd.eval(f"double_age = {subexpr}", target=df2) | ||
|
||
df3 = pd.DataFrame({ | ||
'A': range(1, 6), | ||
'B': range(10, 0, -2) | ||
}) | ||
# ok: pandas-eval | ||
r31 = df3.query('A > B') | ||
# ruleid: pandas-eval | ||
r32 = df3.query(expr) | ||
# ok: pandas-eval | ||
r33 = df3.query(f'A > B') | ||
# ruleid: pandas-eval | ||
r34 = df3.query(f'A > {colB}') |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
rules: | ||
- id: pandas-eval | ||
message: >- | ||
Pandas eval() and query() may be dangerous if used to evaluate | ||
dynamic content. If this content can be input from outside the program, this | ||
may be a code injection vulnerability. Ensure evaluated content is not definable | ||
by external sources. | ||
languages: [python] | ||
severity: ERROR | ||
metadata: | ||
category: security | ||
cwe: "CWE-95: Improper Neutralization of Directives in Dynamically Evaluated Code ('Eval Injection')" | ||
subcategory: | ||
- audit | ||
confidence: LOW | ||
likelihood: LOW | ||
impact: HIGH | ||
description: "Potential arbitrary code execution from `pandas` functions that evaluate user-provided expressions" | ||
|
||
patterns: | ||
- pattern-either: | ||
- patterns: | ||
- pattern: pandas.DataFrame.$FN(...) | ||
- pattern-not: pandas.DataFrame.$FN("...", ...) | ||
- pattern-not: pandas.DataFrame.$FN(f"", ...) | ||
- patterns: | ||
- pattern: pandas.$FN(...) | ||
- pattern-not: pandas.$FN("...", ...) | ||
- pattern-not: pandas.$FN(f"", ...) | ||
- patterns: | ||
- pattern-inside: | | ||
import pandas | ||
... | ||
- pattern: $DF.$FN(...) | ||
- pattern-not: $DF.$FN("...", ...) | ||
- pattern-not: $DF.$FN(f"", ...) | ||
- metavariable-regex: | ||
metavariable: $FN | ||
regex: (eval|query) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
from tensorflow import keras | ||
from keras.models import load_model | ||
|
||
def id(x): | ||
return x | ||
|
||
h5_file_path = id("model.h5") | ||
keras_file_path = id("model.keras") | ||
|
||
# ruleid: pickles-in-keras-deprecation | ||
m1 = load_model("model.h5") | ||
|
||
# ok: pickles-in-keras-deprecation | ||
m2 = load_model("model.keras") | ||
|
||
# ruleid: pickles-in-keras-deprecation | ||
m3 = load_model(h5_file_path) | ||
|
||
# ruleid: pickles-in-keras-deprecation | ||
m4 = load_model(keras_file_path) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
rules: | ||
- id: pickles-in-keras-deprecation | ||
message: >- | ||
The usage of pickle and hdf5 formats for model files are deprecated in Keras. | ||
The keras.models.load_model function is deprecated as well. Keras is now | ||
embedded in Tensorflow 2 under tensorflow.keras. | ||
languages: [python] | ||
severity: WARNING | ||
metadata: | ||
category: security | ||
cwe: "CWE-502: Deserialization of Untrusted Data" | ||
subcategory: [vuln] | ||
confidence: MEDIUM | ||
likelihood: MEDIUM | ||
impact: HIGH | ||
technology: [keras] | ||
description: "Potential arbitrary code execution from Keras' load_model function" | ||
references: | ||
- https://blog.trailofbits.com/2021/03/15/never-a-dill-moment-exploiting-machine-learning-pickle-files/ | ||
|
||
patterns: | ||
- pattern-either: | ||
- pattern: keras.models.load_model(...) | ||
- pattern: tensorflow.keras.models.load_model(...) | ||
- pattern: keras.saving.load_model(...) | ||
- pattern: tensorflow.keras.saving.load_model(...) | ||
- pattern-not: | ||
patterns: | ||
- pattern-either: | ||
- pattern: keras.models.load_model($FILE) | ||
- pattern: tensorflow.keras.models.load_model($FILE) | ||
- pattern: keras.saving.load_model($FILE) | ||
- pattern: tensorflow.keras.saving.load_model($FILE) | ||
- metavariable-comparison: | ||
metavariable: $FILE | ||
comparison: re.match(".*\.keras", $FILE) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
from tensorflow import keras | ||
from keras.models import load_model | ||
|
||
def id(x): | ||
return x | ||
|
||
h5_file_path = id("model.h5") | ||
keras_file_path = id("model.keras") | ||
|
||
# ok: pickles-in-keras | ||
m1 = load_model("model.h5") | ||
|
||
# ok: pickles-in-keras | ||
m2 = load_model("model.keras") | ||
|
||
# ruleid: pickles-in-keras | ||
m3 = load_model(h5_file_path) | ||
|
||
# ruleid: pickles-in-keras | ||
m4 = load_model(keras_file_path) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
rules: | ||
- id: pickles-in-keras | ||
message: >- | ||
Keras' load_model function may result in arbitrary code execution: | ||
- It can load vulnerable pickled models | ||
- It can load an hdf5 model that contains a lambda layer whith arbitrary code | ||
that will be executed every time the model is used (loading, training, eval) | ||
Note: Keras loading with the built-in file format should be safe as long as checks are not disabled. | ||
languages: [python] | ||
severity: ERROR | ||
metadata: | ||
category: security | ||
cwe: "CWE-502: Deserialization of Untrusted Data" | ||
subcategory: [vuln] | ||
confidence: MEDIUM | ||
likelihood: MEDIUM | ||
impact: HIGH | ||
technology: [keras] | ||
description: "Potential arbitrary code execution from Keras' load_model function" | ||
references: | ||
- https://blog.trailofbits.com/2021/03/15/never-a-dill-moment-exploiting-machine-learning-pickle-files/ | ||
|
||
patterns: | ||
- pattern-either: | ||
- patterns: | ||
- pattern: keras.models.load_model(...) | ||
- pattern-not: keras.models.load_model("...", ...) | ||
- patterns: | ||
- pattern: tensorflow.keras.models.load_model(...) | ||
- pattern-not: tensorflow.keras.models.load_model("...", ...) | ||
- patterns: | ||
- pattern: keras.saving.load_model(...) | ||
- pattern-not: keras.saving.load_model("...", ...) | ||
- patterns: | ||
- pattern: tensorflow.keras.saving.load_model(...) | ||
- pattern-not: tensorflow.keras.saving.load_model("...", ...) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
import tensorflow as tf | ||
|
||
def id(x): | ||
return x | ||
|
||
model_dir = id("model_dir") | ||
|
||
# ok: pickles-in-tensorflow | ||
m1 = tf.saved_model.load("model_dir") | ||
# ruleid: pickles-in-tensorflow | ||
m2 = tf.saved_model.load(model_dir) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
rules: | ||
- id: pickles-in-tensorflow | ||
message: >- | ||
Tensorflow's low-level load function may result in arbitrary code execution. | ||
languages: [python] | ||
severity: ERROR | ||
metadata: | ||
category: security | ||
cwe: "CWE-502: Deserialization of Untrusted Data" | ||
subcategory: [vuln] | ||
confidence: MEDIUM | ||
likelihood: MEDIUM | ||
impact: HIGH | ||
technology: [keras] | ||
description: "Potential arbitrary code execution from tensorflow's load function" | ||
references: | ||
- https://blog.trailofbits.com/2021/03/15/never-a-dill-moment-exploiting-machine-learning-pickle-files/ | ||
|
||
patterns: | ||
- pattern: tensorflow.saved_model.load(...) | ||
- pattern-not: tensorflow.saved_model.load("...", ...) | ||
|
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running
python ./validate-metadata.py -s ./metadata-schema.yaml.schm -f .
shows thattechnology
andreferences
metadata fields are missing