New patterns & ML update #600

Merged
merged 8 commits into from
Sep 19, 2024
56 changes: 26 additions & 30 deletions .ci/benchmark.txt
@@ -1,6 +1,6 @@
META MD5 ab5ac74eb958a4c281103e3e4973b8c2
DATA MD5 4ecb4c9436742b6b281ce7977066099d
DATA: 16344639 interested lines. MARKUP: 62815 items
META MD5 f019321883fa9315afcd43fa085b5bf9
DATA MD5 de85ea0a77bd333be6a0d8422b835df4
DATA: 16344639 interested lines. MARKUP: 62823 items
FileType FileNumber ValidLines Positives Negatives Templates
--------------- ------------ ------------ ----------- ----------- -----------
194 28318 71 418 90
@@ -85,7 +85,7 @@ FileType FileNumber ValidLines Positives Negatives Templat
.java 621 134132 362 1363 172
.jenkinsfile 1 58 2 6
.jinja2 1 64 2
.js 659 536413 532 2496 331
.js 659 536413 531 2497 331
.json 851 13046493 1077 10907 140
.jsp 13 3202 1 40
.jsx 7 857 19
@@ -108,7 +108,7 @@ FileType FileNumber ValidLines Positives Negatives Templat
.lock 24 160912 142
.log 2 199 38 52
.lua 10 1924 37 3
.m 16 13358 14 159 3
.m 16 13358 19 161 3
.manifest 3 102 9 6
.markdown 3 139 3 1
.markerb 3 12 3
@@ -125,7 +125,7 @@ FileType FileNumber ValidLines Positives Negatives Templat
.mqh 1 1023 2
.msg 1 26644 1 1
.mysql 1 36 2
.ndjson 2 5006 75 242 2
.ndjson 2 5006 75 243 2
.nix 4 211 12
.nolint 1 2 1
.odd 1 1281 43
@@ -179,7 +179,7 @@ FileType FileNumber ValidLines Positives Negatives Templat
.scala 40 5071 22 101
.scss 16 8553 32 1
.secrets 1 11 1
.sh 143 21525 58 480 26
.sh 143 21525 60 480 24
.slim 1 153 1 2
.smali 1 775 18
.snap 3 1708 9 30 2
@@ -209,7 +209,7 @@ FileType FileNumber ValidLines Positives Negatives Templat
.ts 583 106730 157 1800 203
.tsx 54 7914 1 114 5
.ttar 1 452 1
.txt 440 78102 5284 6357 49
.txt 440 78102 5287 6354 49
.utf8 1 77 2
.vsixmanifest 1 36 1
.vsmdi 1 6 2
@@ -219,34 +219,30 @@ FileType FileNumber ValidLines Positives Negatives Templat
.xib 11 503 169
.xsl 1 311 1
.yaml 137 19004 125 345 42
.yml 419 36169 558 887 377
.yml 419 36169 559 889 376
.zsh 6 872 12
.zsh-theme 1 97 1
TOTAL: 10254 16344639 12211 50498 5107
NEARBY (39, 45) 1479647,1a13a17c,GitHub,6c73b80a,data/6c73b80a/test/1a13a17c.go,273,273,F,F,39,47,F,F,,,,,0.0,0,F,F,F,Password
NEARBY (39, 45) 1479648,1a13a17c,GitHub,6c73b80a,data/6c73b80a/test/1a13a17c.go,277,277,F,F,39,47,F,F,,,,,0.0,0,F,F,F,Password
NEARBY (33, 41) 1479649,7708ebf0,GitHub,6c73b80a,data/6c73b80a/test/7708ebf0.go,5079,5079,F,F,33,43,F,F,,,,,0.0,0,F,F,F,Password
NEARBY (33, 41) 1479650,7708ebf0,GitHub,6c73b80a,data/6c73b80a/test/7708ebf0.go,5083,5083,F,F,33,43,F,F,,,,,0.0,0,F,F,F,Password
credsweeper result_cnt : 11479, lost_cnt : 4, true_cnt : 11162, false_cnt : 313
TOTAL: 10254 16344639 12221 50501 5104
credsweeper result_cnt : 11487, lost_cnt : 0, true_cnt : 11308, false_cnt : 179
Rules Positives Negatives Templates Reported TP FP TN FN FPR FNR ACC PRC RCL F1
------------------------------ ----------- ----------- ----------- ---------- ----- ---- ----- ---- -------- -------- -------- -------- -------- --------
API 130 3166 188 119 116 3 3351 14 0.000894 0.107692 0.995121 0.974790 0.892308 0.931727
API 130 3166 188 125 123 2 3352 7 0.000596 0.053846 0.997417 0.984000 0.946154 0.964706
AWS Client ID 168 21 0 160 160 0 21 8 0.000000 0.047619 0.957672 1.000000 0.952381 0.975610
AWS Multi 82 10 0 88 82 5 5 0 0.500000 0.000000 0.945652 0.942529 1.000000 0.970414
AWS S3 Bucket 67 23 0 92 67 23 0 0 1.000000 0.000000 0.744444 0.744444 1.000000 0.853503
Atlassian Old PAT token 27 308 3 12 3 8 303 24 0.025723 0.888889 0.905325 0.272727 0.111111 0.157895
Auth 414 2739 82 397 376 21 2800 38 0.007444 0.091787 0.981762 0.947103 0.908213 0.927250
Auth 414 2739 82 390 387 3 2818 27 0.001063 0.065217 0.990726 0.992308 0.934783 0.962687
Azure Access Token 19 0 0 12 12 0 0 7 0.368421 0.631579 1.000000 0.631579 0.774194
BASE64 Private Key 7 4 0 7 7 0 4 0 0.000000 0.000000 1.000000 1.000000 1.000000 1.000000
BASE64 encoded PEM Private Key 7 0 0 5 5 0 0 2 0.285714 0.714286 1.000000 0.714286 0.833333
Bitbucket Client ID 143 2095 9 48 28 19 2085 115 0.009030 0.804196 0.940365 0.595745 0.195804 0.294737
Bitbucket Client Secret 301 807 10 40 29 11 806 272 0.013464 0.903654 0.746869 0.725000 0.096346 0.170088
CMD ConvertTo-SecureString 13 4 0 10 10 0 4 3 0.000000 0.230769 0.823529 1.000000 0.769231 0.869565
CMD Password 21 128 6 17 17 0 134 4 0.000000 0.190476 0.974194 1.000000 0.809524 0.894737
CMD ConvertTo-SecureString 13 4 0 13 13 0 4 0 0.000000 0.000000 1.000000 1.000000 1.000000 1.000000
CMD Password 21 128 6 18 18 0 134 3 0.000000 0.142857 0.980645 1.000000 0.857143 0.923077
CMD Secret 1 1 0 1 1 0 1 0 0.000000 0.000000 1.000000 1.000000 1.000000 1.000000
CMD Token 6 0 0 5 5 0 0 1 0.166667 0.833333 1.000000 0.833333 0.909091
Certificate 24 471 0 25 19 6 465 5 0.012739 0.208333 0.977778 0.760000 0.791667 0.775510
Credential 93 419 76 92 92 0 495 1 0.000000 0.010753 0.998299 1.000000 0.989247 0.994595
CMD Token 6 0 0 6 6 0 0 0 0.000000 1.000000 1.000000 1.000000 1.000000
Certificate 24 471 0 20 20 0 471 4 0.000000 0.166667 0.991919 1.000000 0.833333 0.909091
Credential 93 419 76 94 93 1 494 0 0.002020 0.000000 0.998299 0.989362 1.000000 0.994652
Docker Swarm Token 2 0 0 1 1 0 0 1 0.500000 0.500000 1.000000 0.500000 0.666667
Dropbox App secret 64 139 1 46 35 10 130 29 0.071429 0.453125 0.808824 0.777778 0.546875 0.642202
Facebook Access Token 0 1 0 0 0 1 0 0.000000 1.000000
@@ -261,17 +257,17 @@ Grafana Provisioned API Key 22 1 0
JSON Web Token 170 61 0 131 131 0 61 39 0.000000 0.229412 0.831169 1.000000 0.770588 0.870432
Jira / Confluence PAT token 0 4 0 0 0 4 0 0.000000 1.000000
Jira 2FA 15 6 1 12 12 0 7 3 0.000000 0.200000 0.863636 1.000000 0.800000 0.888889
Key 3906 15720 485 3968 3867 101 16104 39 0.006233 0.009985 0.993039 0.974546 0.990015 0.982220
Nonce 91 49 0 87 87 0 49 4 0.000000 0.043956 0.971429 1.000000 0.956044 0.977528
Key 3909 15717 485 3944 3893 51 16151 16 0.003148 0.004093 0.996668 0.987069 0.995907 0.991468
Nonce 91 49 0 89 88 1 48 3 0.020408 0.032967 0.971429 0.988764 0.967033 0.977778
Other 8 8292 1 0 0 8293 8 0.000000 1.000000 0.999036 0.000000
PEM Private Key 1019 1483 0 1023 1019 4 1479 0 0.002697 0.000000 0.998401 0.996090 1.000000 0.998041
Password 1862 7531 2683 1762 1703 55 10159 159 0.005385 0.085392 0.982279 0.968714 0.914608 0.940884
Salt 47 76 1 44 43 1 76 4 0.012987 0.085106 0.959677 0.977273 0.914894 0.945055
Secret 1297 1576 802 1272 1269 3 2375 28 0.001262 0.021588 0.991565 0.997642 0.978412 0.987933
Password 1869 7535 2680 1776 1758 18 10197 111 0.001762 0.059390 0.989325 0.989865 0.940610 0.964609
Salt 47 76 1 44 44 0 77 3 0.000000 0.063830 0.975806 1.000000 0.936170 0.967033
Secret 1297 1576 802 1288 1283 5 2373 14 0.002103 0.010794 0.994830 0.996118 0.989206 0.992650
Seed 1 6 0 0 0 6 1 0.000000 1.000000 0.857143 0.000000
Slack Token 4 1 0 4 4 0 1 0 0.000000 0.000000 1.000000 1.000000 1.000000 1.000000
Token 643 4168 454 610 588 22 4600 55 0.004760 0.085537 0.985375 0.963934 0.914463 0.938547
Token 643 4170 454 616 614 2 4622 29 0.000433 0.045101 0.994114 0.996753 0.954899 0.975377
Twilio API Key 0 5 2 0 0 7 0 0.000000 1.000000
URL Credentials 210 156 216 213 207 5 367 3 0.013441 0.014286 0.986254 0.976415 0.985714 0.981043
URL Credentials 210 156 216 205 205 0 372 5 0.000000 0.023810 0.991409 1.000000 0.976190 0.987952
UUID 1069 265 0 1068 1067 1 264 2 0.003774 0.001871 0.997751 0.999064 0.998129 0.998596
12211 50498 5107 11487 11162 313 50185 1049 0.006198 0.085906 0.978281 0.972723 0.914094 0.942498
12221 50501 5104 11494 11308 179 50322 913 0.003544 0.074707 0.982590 0.984417 0.925293 0.953940
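The rate columns in the per-rule table appear to follow the standard confusion-matrix definitions. The sketch below is not taken from the benchmark scripts; it assumes those standard formulas and checks them against the new TOTAL row (TP 11308, FP 179, TN 50322, FN 913). The `Confusion` class and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for one rule (hypothetical helper, not CredSweeper code)."""
    tp: int
    fp: int
    tn: int
    fn: int

    # Note: rows with a zero denominator are left blank in the table;
    # this sketch does not handle them and would raise ZeroDivisionError.
    def fpr(self) -> float:
        return self.fp / (self.fp + self.tn)

    def fnr(self) -> float:
        return self.fn / (self.fn + self.tp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        # Harmonic mean of precision and recall, written in terms of counts
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


# Counts copied from the new TOTAL row of the table above
total_after = Confusion(tp=11308, fp=179, tn=50322, fn=913)
for name in ("fpr", "fnr", "accuracy", "precision", "recall", "f1"):
    print(f"{name:9s} {getattr(total_after, name)():.6f}")
```

Under these assumptions the printed values agree with the FPR, FNR, ACC, PRC, RCL and F1 cells of the new TOTAL row to six decimal places.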
12 changes: 4 additions & 8 deletions .github/workflows/benchmark.yml
@@ -22,8 +22,7 @@ jobs:
- name: Checkout CredData
uses: actions/checkout@v4
with:
repository: babenek/CredData
ref: awsmulti
repository: Samsung/CredData

- name: Markup hashing
run: |
@@ -73,8 +72,7 @@ jobs:
- name: Checkout CredData
uses: actions/checkout@v4
with:
repository: babenek/CredData
ref: awsmulti
repository: Samsung/CredData

- name: Markup hashing
run: |
@@ -171,8 +169,7 @@ jobs:
- name: Checkout CredData
uses: actions/checkout@v4
with:
repository: babenek/CredData
ref: awsmulti
repository: Samsung/CredData

- name: Markup hashing
run: |
@@ -354,8 +351,7 @@ jobs:
- name: Checkout CredData
uses: actions/checkout@v4
with:
repository: babenek/CredData
ref: awsmulti
repository: Samsung/CredData

- name: Markup hashing
run: |
5 changes: 2 additions & 3 deletions .github/workflows/check.yml
@@ -32,9 +32,8 @@ jobs:
- name: Check ml_model.onnx integrity
if: ${{ always() && steps.code_checkout.conclusion == 'success' }}
run: |
md5sum --binary credsweeper/ml_model/ml_config.json | grep 2b29c5e1aa199d14b788652bd542c7c0
md5sum --binary credsweeper/ml_model/ml_model.onnx | grep 88f37978fc0599ac8d1bf732ad40c077
md5sum --binary credsweeper/ml_model/ml_config.json | grep 49c4352ae9ec82ad432d49d7e51c27f1
md5sum --binary credsweeper/ml_model/ml_model.onnx | grep ff66e97c446d0f2bbd8d37b7dfff7361
# # # line ending
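The integrity step pipes `md5sum` into `grep` so the job fails when the digest changes. For local use, the same check could be sketched in Python as below. This is an illustration only: `md5_hex` and `matches_expected` are hypothetical helpers, and the demo hashes a temporary file rather than the real model files.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in binary chunks."""
    digest = hashlib.md5()
    with path.open("rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected_md5: str) -> bool:
    """Compare a file's digest with an expected hex string, ignoring case."""
    return md5_hex(path) == expected_md5.lower()


# Demo on a throwaway file; the real targets would be the files named in check.yml
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    sample_digest = md5_hex(sample)
    sample_ok = matches_expected(sample, "5D41402ABC4B2A76B9719D911017C592")

print(sample_digest, sample_ok)
```

MD5 is used here only to detect accidental changes to the model files, as in the workflow; it is not a defence against deliberate tampering.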

3 changes: 3 additions & 0 deletions credsweeper/common/constants.py
@@ -61,6 +61,9 @@ def get(confidence: Union[str, "Confidence"]) -> Optional["Confidence"]:

class Base(Enum):
    """Stores types of character sets in lower case"""
    digits = "digits"
    ascii_uppercase = "ascii_uppercase"
    ascii_lowercase = "ascii_lowercase"
    base16upper = "base16upper"
    base16lower = "base16lower"
    base32 = "base32"
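The three new member names coincide with constants in Python's standard `string` module. The diff does not show how CredSweeper resolves a `Base` member to an alphabet, so the following is only a sketch of one possible lookup. `CharSet` and `alphabet` are hypothetical names and cover just the three added members.

```python
import string
from enum import Enum


class CharSet(Enum):
    """Hypothetical stand-in for the three members added to Base."""
    digits = "digits"
    ascii_uppercase = "ascii_uppercase"
    ascii_lowercase = "ascii_lowercase"


def alphabet(kind: CharSet) -> str:
    # Assumption: the member value names a constant of the stdlib `string` module
    return getattr(string, kind.value)


print(alphabet(CharSet.digits))
print(len(alphabet(CharSet.ascii_uppercase)))
```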
18 changes: 18 additions & 0 deletions credsweeper/common/keyword_checklist.py
@@ -49,3 +49,21 @@ def morpheme_set(self) -> Set[str]:
    def morpheme_len(self) -> int:
        """Length of morpheme_set"""
        return len(self.__morpheme_set)

    def check_morphemes(self, line_lower: str, threshold: int) -> bool:
        """Checks whether the number of morphemes in a line exceeds a threshold.
        Args:
            line_lower: input line - MUST be in lower case
            threshold: number of morpheme matches that must be exceeded
        Return:
            True - if the number of matched morphemes exceeds the threshold
        """
        matches = 0
        for keyword in self.morpheme_set:
            if keyword in line_lower:
                matches += 1
                if threshold < matches:
                    return True
        return False
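A standalone sketch of the same counting logic, for experimenting outside the class. `exceeds_morpheme_threshold` and `SAMPLE_MORPHEMES` are hypothetical names; the sample set holds a few entries that this PR adds to `morpheme_checklist.txt`, not the full list.

```python
from typing import Iterable


def exceeds_morpheme_threshold(line_lower: str, morphemes: Iterable[str], threshold: int) -> bool:
    """Return True as soon as more than `threshold` morphemes occur in the line."""
    matches = 0
    for morpheme in morphemes:
        if morpheme in line_lower:
            matches += 1
            # Strict comparison: exactly `threshold` matches is not enough
            if threshold < matches:
                return True
    return False


# Hypothetical sample taken from entries added in this PR
SAMPLE_MORPHEMES = {"/usr", "bin/", "lib/", "ssh", "id_rsa"}

path_like = "/usr/lib/ssh/id_rsa".lower()  # contains 4 of the 5 sample morphemes
random_like = "q7x9zk2w4v"                 # contains none of them

print(exceeds_morpheme_threshold(path_like, SAMPLE_MORPHEMES, 2))
print(exceeds_morpheme_threshold(random_like, SAMPLE_MORPHEMES, 0))
print(exceeds_morpheme_threshold(path_like, SAMPLE_MORPHEMES, 4))
```

The early return stops the scan once the threshold is passed, which matters when the morpheme set is large. The caller is responsible for lowercasing the line first.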
27 changes: 27 additions & 0 deletions credsweeper/common/morpheme_checklist.txt
@@ -1,3 +1,17 @@
../
.com
.org
/bin
/dev
/etc
/lib
/mnt
/opt
/sbin
/srv
/tmp
/usr
/var
000
111
222
@@ -206,6 +220,7 @@ best
bias
big
bill
bin/
binar
bind
bio
@@ -373,6 +388,7 @@ course
court
cove
cpu_
crac
creat
cred
cript
@@ -428,6 +444,7 @@ dest
detach
detai
detect
dev/
dev_
develop
device
@@ -529,6 +546,7 @@ esam
esses
estima
esult
etc/
eth_
etic
eting
@@ -694,6 +712,7 @@ hybrid
iabl
ical
icon
id_rsa
iden
idle
ieee
@@ -808,6 +827,7 @@ lexeme
lexic
lianc
liant
lib/
library
licens
lies
@@ -893,6 +913,7 @@ mit
mix
mmon
mmun
mnt/
mobile
mock
mode
@@ -968,6 +989,7 @@ one
onfig
only
open
opt/
opted
opti
oracle
@@ -1307,6 +1329,8 @@ spot
spray
sql
src_
srv/
ssh
ssl
stack
stan
@@ -1400,6 +1424,7 @@ tio
tish
title
titud
tmp/
to_
tod
toke
@@ -1461,11 +1486,13 @@ url
usb
use
usin
usr/
uster
util
val_
valid
valu
var/
vari
vault
vect