Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ML rules fixes, new rule for msgpack-numpy #39

Merged
merged 14 commits into from
Apr 3, 2024
Merged

ML rules fixes, new rule for msgpack-numpy #39

merged 14 commits into from
Apr 3, 2024

Conversation

dhalf
Copy link
Contributor

@dhalf dhalf commented Oct 16, 2023

This PR contains:

  • Some changes/fixes to ML-related Python rules and/or their associated test file: lxml-in-pandas, numpy-in-pytorch-modules and pickles-in-pytorch
  • A new rule to spot the usage of msgpack-numpy, a Python library that prepares numpy objects to be serialized with msgpack but that is backed by pickle. The rule excludes normal uses of msgpack (without the numpy translation layer).

@CLAassistant
Copy link

CLAassistant commented Oct 16, 2023

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ GrosQuildu
✅ dhalf
❌ Lucas Bourtoule


Lucas Bourtoule seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@GrosQuildu
Copy link
Collaborator

Thanks! Will review soon. Seems like you have to sign the CLA?

@dhalf
Copy link
Contributor Author

dhalf commented Oct 17, 2023

Thanks! Will review soon. Seems like you have to sign the CLA?

Hi! I did sign twice but this doesn't seem to be reflected here... Any idea?

@dhalf
Copy link
Contributor Author

dhalf commented Oct 18, 2023

I added two new rules to detect the use of Keras and Tensorflow loading functions. We reject all calls except those with a string literal (or constant) as file path. I also added a warning rule to deprecate the usage of pickle and hdf5 for Keras. The preferred format is now .keras which comes with checks that mitigate the risk of arbitrary code execution.

@GrosQuildu
Copy link
Collaborator

Overall looks good. Need @suhacker1 to review these (or somebody from ML). Then I can review from the semgrep/cq perspective.

suhacker1
suhacker1 previously approved these changes Oct 19, 2023
Copy link
Contributor

@suhacker1 suhacker1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the ML-perspective, everything looks good to me!

GrosQuildu
GrosQuildu previously approved these changes Apr 3, 2024
- audit
confidence: LOW
likelihood: LOW
impact: HIGH
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running python ./validate-metadata.py -s ./metadata-schema.yaml.schm -f . shows that technology and references metadata fields are missing

Copy link
Member

@mschwager mschwager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I left a few minor comments 👍

python/pandas-eval.yaml Show resolved Hide resolved
python/numpy-in-pytorch-modules.yaml Outdated Show resolved Hide resolved
python/pickles-in-keras-deprecation.yaml Outdated Show resolved Hide resolved
@GrosQuildu GrosQuildu merged commit 69fd8d8 into main Apr 3, 2024
2 of 3 checks passed
@GrosQuildu GrosQuildu deleted the lucas branch April 3, 2024 11:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants