Commit
ML rules fixes, new rule for msgpack-numpy (#39)
* improves some rules and tests
* msgpack-numpy rule
* new rule for pandas eval functions
* keras and tf loading functions
* signed commit
* new rules: keras and tf loading
* fix formatting, less FPs for pandas-eval
* pandas-eval metadata
* pandas-eval, add testcase for empty f-string
* numpy-in-pytorch-modules, generic match
* pickles-in-keras-deprecation - regex

---------

Co-authored-by: Lucas Bourtoule <[email protected]>
Co-authored-by: Paweł Płatek <[email protected]>
Co-authored-by: GrosQuildu <[email protected]>
1 parent 772f68e, commit 69fd8d8
Showing 15 changed files with 336 additions and 2 deletions.
@@ -0,0 +1,31 @@
import msgpack
import msgpack_numpy as m
import numpy as np

x = np.random.rand(5)
# ruleid: msgpack-numpy
x_enc = msgpack.packb(x, default=m.encode)
# ruleid: msgpack-numpy
x_rec = msgpack.unpackb(x_enc, object_hook=m.decode)

# ok: msgpack-numpy
x_enc2 = msgpack.packb(x)
# ok: msgpack-numpy
x_rec2 = msgpack.unpackb(x_enc2)

# ok: msgpack-numpy
x_enc3 = msgpack.load(x)
# ok: msgpack-numpy
x_rec3 = msgpack.loads(x_enc2)

m.patch()

# ruleid: msgpack-numpy
x_enc3 = msgpack.packb(x)
# ruleid: msgpack-numpy
x_rec3 = msgpack.unpackb(x_enc2)

# ruleid: msgpack-numpy
x_enc3 = msgpack.load(x)
# ruleid: msgpack-numpy
x_rec3 = msgpack.loads(x_enc2)
@@ -0,0 +1,41 @@
rules:
  - id: msgpack-numpy
    message: >-
      Found usage of msgpack-numpy unpacking, which relies on pickle to deserialize numpy arrays containing objects.
      Functions reliant on pickle can result in arbitrary code execution.
      Consider switching to a safer serialization method.
    languages: [python]
    severity: ERROR
    metadata:
      category: security
      cwe: "CWE-502: Deserialization of Untrusted Data"
      subcategory: [vuln]
      confidence: MEDIUM
      likelihood: MEDIUM
      impact: HIGH
      technology: [numpy]
      description: "Potential arbitrary code execution from functions reliant on pickling"
      references:
        - https://blog.trailofbits.com/2021/03/15/never-a-dill-moment-exploiting-machine-learning-pickle-files/

    pattern-either:
      - patterns:
          - pattern: msgpack.$FN(...)
          - metavariable-regex:
              metavariable: $FN
              regex: (loads?|dumps?|packb?|unpackb?)
          - pattern-inside: |
              msgpack_numpy.patch()
              ...
      - patterns:
          - pattern: msgpack.$FN(..., object_hook=msgpack_numpy.decode, ...)
          - metavariable-regex:
              metavariable: $FN
              regex: unpackb?

      - patterns:
          - pattern: msgpack.$FN(..., default=msgpack_numpy.encode, ...)
          - metavariable-regex:
              metavariable: $FN
              regex: packb?
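The rule message says that functions reliant on pickle can result in arbitrary code execution. A minimal, standard-library-only sketch of why that holds is shown here; the `Payload` class is hypothetical, and the builtin `eval` with a harmless arithmetic expression stands in for whatever callable an attacker would choose. It does not use msgpack-numpy itself.

```python
import pickle


class Payload:
    """Hypothetical object whose pickled form asks the unpickler to call eval()."""

    def __reduce__(self):
        # pickle stores "call eval('6 * 7')" instead of any object state.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Deserializing runs the callable chosen by whoever produced the bytes.
result = pickle.loads(blob)
print(result)  # → 42
```

The value that comes back is not a `Payload` at all but the result of the call, which is why any loader that falls back to pickle should only be given trusted input.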
@@ -0,0 +1,50 @@
import pandas as pd

def id(x):
    return x

expr = id("something")
colB = id("B")
subexpr = id("df.age * 2")

df1 = pd.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)})
# ok: pandas-eval
r11 = df1.eval('A + B')
# ruleid: pandas-eval
r12 = df1.eval(expr)
# ok: pandas-eval
r13 = df1.eval(f"A + B")
# ruleid: pandas-eval
r14 = df1.eval(f"A + {colB}")
# ok: pandas-eval
r15 = df1.eval(f"")

df2 = pd.DataFrame({"animal": ["dog", "pig"], "age": [10, 20]})
# ok: pandas-eval
pd.eval("double_age = df.age * 2", target=df2)
# ruleid: pandas-eval
pd.eval(expr, target=df2)
# ok: pandas-eval
pd.eval(f"double_age = df.age * 2", target=df2)
# ruleid: pandas-eval
pd.eval(f"double_age = {subexpr}", target=df2)

df3 = pd.DataFrame({
    'A': range(1, 6),
    'B': range(10, 0, -2)
})
# ok: pandas-eval
r31 = df3.query('A > B')
# ruleid: pandas-eval
r32 = df3.query(expr)
# ok: pandas-eval
r33 = df3.query(f'A > B')
# ruleid: pandas-eval
r34 = df3.query(f'A > {colB}')

class X:
    def query(self, x):
        pass

# ok: pandas-eval
X().query(expr)
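The test cases above separate literal expressions (plain strings and f-strings without placeholders) from dynamically built ones. That distinction can be sketched with Python's `ast` module. This is only an illustration of the idea, not how Semgrep evaluates the rule; the helper names are hypothetical, and unlike the rule it does not check that the receiver is a pandas object, so it would also flag `X().query(expr)`.

```python
import ast


def is_static_string(node):
    """True for plain string literals and f-strings without placeholders."""
    if isinstance(node, ast.Constant):
        return isinstance(node.value, str)
    if isinstance(node, ast.JoinedStr):
        return not any(isinstance(v, ast.FormattedValue) for v in node.values)
    return False


def dynamic_eval_calls(source, names=("eval", "query")):
    """Return line numbers of .eval()/.query() calls whose first argument is dynamic."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if not (isinstance(func, ast.Attribute) and func.attr in names):
            continue
        if node.args and not is_static_string(node.args[0]):
            flagged.append(node.lineno)
    return sorted(flagged)


# Line 1 of the sample is empty, so the first call sits on line 2.
sample = '''
df.eval("A + B")
df.eval(expr)
df.eval(f"A + B")
df.eval(f"A + {col}")
df.query(f"")
'''
print(dynamic_eval_calls(sample))  # → [3, 5]
```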
@@ -0,0 +1,48 @@
rules:
  - id: pandas-eval
    message: >-
      Pandas eval() and query() may be dangerous if used to evaluate
      dynamic content. If this content can be input from outside the program, this
      may be a code injection vulnerability. Ensure evaluated content is not definable
      by external sources.
    languages: [python]
    severity: ERROR
    metadata:
      category: security
      cwe: "CWE-95: Improper Neutralization of Directives in Dynamically Evaluated Code ('Eval Injection')"
      subcategory:
        - audit
      confidence: LOW
      likelihood: LOW
      impact: HIGH
      technology: [pandas]
      description: "Potential arbitrary code execution from `pandas` functions that evaluate user-provided expressions"
      references:
        - https://blog.trailofbits.com/2021/03/15/never-a-dill-moment-exploiting-machine-learning-pickle-files/

    patterns:
      - pattern-inside: |
          import pandas
          ...
      - pattern-either:
          - patterns:
              - pattern: pandas.DataFrame.$FN(...)
              - pattern-not: pandas.DataFrame.$FN("...", ...)
              - pattern-not: pandas.DataFrame.$FN(f"", ...)

          - patterns:
              - pattern: pandas.$FN(...)
              - pattern-not: pandas.$FN("...", ...)
              - pattern-not: pandas.$FN(f"", ...)

          - patterns:
              - pattern-inside: |
                  $DF = pandas.DataFrame(...)
                  ...
              - pattern: $DF.$FN(...)
              - pattern-not: $DF.$FN("...", ...)
              - pattern-not: $DF.$FN(f"", ...)

      - metavariable-regex:
          metavariable: $FN
          regex: (eval|query)
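The message asks that evaluated content not be definable by external sources. One possible way to follow that advice when part of an expression must vary is to assemble it only from allowlisted pieces. The sketch below assumes a hypothetical set of column names and comparison operators; none of these names come from the commit. The rule would still report a call such as `df.query(build_comparison(...))`, since the argument is not a literal, so a finding like that would need manual review.

```python
# Hypothetical allowlists: the only parts callers may combine.
ALLOWED_COLUMNS = {"A", "B", "age"}
ALLOWED_OPERATORS = {">", "<", ">=", "<=", "==", "!="}


def build_comparison(left, operator, right):
    """Build an 'A > B' style expression from allowlisted parts only."""
    if left not in ALLOWED_COLUMNS or right not in ALLOWED_COLUMNS:
        raise ValueError("unknown column")
    if operator not in ALLOWED_OPERATORS:
        raise ValueError("unsupported operator")
    return f"{left} {operator} {right}"


print(build_comparison("A", ">", "B"))  # → A > B
```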
@@ -0,0 +1,20 @@
from tensorflow import keras
from keras.models import load_model

def id(x):
    return x

h5_file_path = id("model.h5")
keras_file_path = id("model.keras")

# ruleid: pickles-in-keras-deprecation
m1 = load_model("model.h5")

# ok: pickles-in-keras-deprecation
m2 = load_model("model.keras")

# ruleid: pickles-in-keras-deprecation
m3 = load_model(h5_file_path)

# ruleid: pickles-in-keras-deprecation
m4 = load_model(keras_file_path)
@@ -0,0 +1,36 @@
rules:
  - id: pickles-in-keras-deprecation
    message: >-
      The usage of pickle and hdf5 formats for model files is deprecated in Keras.
      The keras.models.load_model function is deprecated as well. Keras is now
      embedded in TensorFlow 2 under tensorflow.keras.
    languages: [python]
    severity: WARNING
    metadata:
      category: security
      cwe: "CWE-502: Deserialization of Untrusted Data"
      subcategory: [vuln]
      confidence: MEDIUM
      likelihood: MEDIUM
      impact: HIGH
      technology: [keras]
      description: "Potential arbitrary code execution from Keras' load_model function"
      references:
        - https://blog.trailofbits.com/2021/03/15/never-a-dill-moment-exploiting-machine-learning-pickle-files/

    patterns:
      - pattern-either:
          - pattern: keras.models.load_model(...)
          - pattern: tensorflow.keras.models.load_model(...)
          - pattern: keras.saving.load_model(...)
          - pattern: tensorflow.keras.saving.load_model(...)
      - pattern-not:
          patterns:
            - pattern-either:
                - pattern: keras.models.load_model($FILE)
                - pattern: tensorflow.keras.models.load_model($FILE)
                - pattern: keras.saving.load_model($FILE)
                - pattern: tensorflow.keras.saving.load_model($FILE)
            - metavariable-regex:
                metavariable: $FILE
                regex: .*\.keras
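The exclusion above hinges on the regex `.*\.keras`. The sketch below shows how that expression behaves under Python's `re.match`, which anchors only at the start of the text. It assumes the engine compares the metavariable's source text (quotes included) in the same left-anchored way; that is an assumption about the matching engine, not something stated in this commit. The `ends_with_keras` variant is a hypothetical stricter alternative and is not part of the rule.

```python
import re

# The regex from the rule, applied with re.match (anchored at the start only).
KERAS_ANYWHERE = re.compile(r".*\.keras")

# Hypothetical stricter variant: ".keras" must be the end of the literal.
KERAS_AT_END = re.compile(r".*\.keras[\"']?$")


def contains_keras(text):
    return KERAS_ANYWHERE.match(text) is not None


def ends_with_keras(text):
    return KERAS_AT_END.match(text) is not None


for text in ['"model.keras"', '"model.h5"', "keras_file_path", '"model.keras.h5"']:
    print(text, contains_keras(text), ends_with_keras(text))
```

Under these assumptions the variable name `keras_file_path` is not excluded, because it contains no literal `.keras`, which agrees with the `ruleid` annotation on `m4` in the test file. A literal such as `"model.keras.h5"` would be excluded by the rule's regex but not by the end-anchored variant.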
@@ -0,0 +1,20 @@
from tensorflow import keras
from keras.models import load_model

def id(x):
    return x

h5_file_path = id("model.h5")
keras_file_path = id("model.keras")

# ok: pickles-in-keras
m1 = load_model("model.h5")

# ok: pickles-in-keras
m2 = load_model("model.keras")

# ruleid: pickles-in-keras
m3 = load_model(h5_file_path)

# ruleid: pickles-in-keras
m4 = load_model(keras_file_path)
@@ -0,0 +1,36 @@
rules:
  - id: pickles-in-keras
    message: >-
      Keras' load_model function may result in arbitrary code execution:
        - It can load vulnerable pickled models
        - It can load an hdf5 model that contains a lambda layer with arbitrary code
          that will be executed every time the model is used (loading, training, eval)
      Note: Keras loading with the built-in file format should be safe as long as checks are not disabled.
    languages: [python]
    severity: ERROR
    metadata:
      category: security
      cwe: "CWE-502: Deserialization of Untrusted Data"
      subcategory: [vuln]
      confidence: MEDIUM
      likelihood: MEDIUM
      impact: HIGH
      technology: [keras]
      description: "Potential arbitrary code execution from Keras' load_model function"
      references:
        - https://blog.trailofbits.com/2021/03/15/never-a-dill-moment-exploiting-machine-learning-pickle-files/

    patterns:
      - pattern-either:
          - patterns:
              - pattern: keras.models.load_model(...)
              - pattern-not: keras.models.load_model("...", ...)
          - patterns:
              - pattern: tensorflow.keras.models.load_model(...)
              - pattern-not: tensorflow.keras.models.load_model("...", ...)
          - patterns:
              - pattern: keras.saving.load_model(...)
              - pattern-not: keras.saving.load_model("...", ...)
          - patterns:
              - pattern: tensorflow.keras.saving.load_model(...)
              - pattern-not: tensorflow.keras.saving.load_model("...", ...)
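Because the rule reports every `load_model` call whose path is not a string literal, code that genuinely needs a variable path may want to validate it first. One possible pre-check is sketched below using only `pathlib`. The function name and the idea of a trusted directory are assumptions for illustration, not something the commit prescribes, and the check does nothing to make an untrusted file's contents safe.

```python
from pathlib import Path


def checked_model_path(path, trusted_dir):
    """Return the resolved path if it is a .keras file inside trusted_dir."""
    root = Path(trusted_dir).resolve()
    candidate = Path(path).resolve()
    if candidate.suffix != ".keras":
        raise ValueError(f"refusing non-.keras model file: {candidate.name}")
    if root not in candidate.parents:
        raise ValueError("model file is outside the trusted directory")
    return candidate
```

The returned path would then be handed to `load_model` with Keras's own loading checks left enabled, as the note in the rule message advises (recent Keras releases expose this as the `safe_mode` argument).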
@@ -0,0 +1,11 @@
import tensorflow as tf

def id(x):
    return x

model_dir = id("model_dir")

# ok: pickles-in-tensorflow
m1 = tf.saved_model.load("model_dir")
# ruleid: pickles-in-tensorflow
m2 = tf.saved_model.load(model_dir)
@@ -0,0 +1,21 @@
rules:
  - id: pickles-in-tensorflow
    message: >-
      Tensorflow's low-level load function may result in arbitrary code execution.
    languages: [python]
    severity: ERROR
    metadata:
      category: security
      cwe: "CWE-502: Deserialization of Untrusted Data"
      subcategory: [vuln]
      confidence: MEDIUM
      likelihood: MEDIUM
      impact: HIGH
      technology: [keras]
      description: "Potential arbitrary code execution from tensorflow's load function"
      references:
        - https://blog.trailofbits.com/2021/03/15/never-a-dill-moment-exploiting-machine-learning-pickle-files/

    patterns:
      - pattern: tensorflow.saved_model.load(...)
      - pattern-not: tensorflow.saved_model.load("...", ...)
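When a flagged `tf.saved_model.load(model_dir)` call has to stay, one mitigation that can be sketched without TensorFlow is to pin a digest of the model directory and verify it before loading. The helper names below are hypothetical, and the temporary directory with a single `saved_model.pb` file only stands in for a real SavedModel. This confirms the files are the ones that were reviewed; it does not make an untrusted model safe.

```python
import hashlib
import tempfile
from pathlib import Path


def tree_digest(directory):
    """SHA-256 over the relative path, size, and bytes of every file under directory."""
    root = Path(directory)
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        data = path.read_bytes()
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0" + str(len(data)).encode("ascii") + b"\0")
        digest.update(data)
    return digest.hexdigest()


def verify_model_dir(directory, expected_digest):
    """Raise ValueError if the directory contents differ from the pinned digest."""
    if tree_digest(directory) != expected_digest:
        raise ValueError("model directory does not match the pinned digest")
    return directory


# Demonstration on a throwaway directory standing in for a SavedModel.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "saved_model.pb").write_bytes(b"example")
    pinned = tree_digest(tmp)
    verified = verify_model_dir(tmp, pinned) == tmp

    (Path(tmp) / "saved_model.pb").write_bytes(b"tampered")
    try:
        verify_model_dir(tmp, pinned)
        tamper_detected = False
    except ValueError:
        tamper_detected = True

print(verified, tamper_detected)  # → True True
```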