apply ValuePemPatternCheck for PEM rule #367

Closed
wants to merge 23 commits into from
3 changes: 2 additions & 1 deletion .github/workflows/benchmark.yml
@@ -62,7 +62,8 @@ jobs:
- name: Checkout CredData
uses: actions/checkout@v3
with:
repository: Samsung/CredData
repository: babenek/CredData
ref: opensshpk

- name: Cache data
id: cache-data
10 changes: 5 additions & 5 deletions cicd/benchmark.txt
@@ -1,11 +1,11 @@
Detected Credentials: 4668
result_cnt : 4159, lost_cnt : 99, true_cnt : 3701, false_cnt : 359
credsweeper -> TP : 3701, FP : 359, TN : 19429501, FN : 897, FPR : 0.0000184767, FNR : 0.1950848195, ACC : 0.9999353725, PRC : 0.9115763547, RCL : 0.8049151805, F1 : 0.8549318549
credsweeper Private Key -> TP : 952, FP : 0, TN : 4, FN : 39, FPR : None, FNR : 0.0393541877, ACC : 0.9608040201, PRC : 1.0000000000, RCL : 0.9606458123, F1 : 0.9799279465
Detected Credentials: 4693
result_cnt : 4182, lost_cnt : 96, true_cnt : 3718, false_cnt : 368
credsweeper -> TP : 3718, FP : 368, TN : 19429480, FN : 892, FPR : 0.0000189399, FNR : 0.1934924078, ACC : 0.9999351667, PRC : 0.9099363681, RCL : 0.8065075922, F1 : 0.8551057958
credsweeper Private Key -> TP : 967, FP : 0, TN : 4, FN : 34, FPR : None, FNR : 0.0339660340, ACC : 0.9661691542, PRC : 1.0000000000, RCL : 0.9660339660, F1 : 0.9827235772
credsweeper Predefined Pattern -> TP : 309, FP : 2, TN : 40, FN : 17, FPR : 0.0476190476, FNR : 0.0521472393, ACC : 0.9483695652, PRC : 0.9935691318, RCL : 0.9478527607, F1 : 0.9701726845
credsweeper Password -> TP : 974, FP : 116, TN : 4164, FN : 422, FPR : 0.0271028037, FNR : 0.3022922636, ACC : 0.9052149401, PRC : 0.8935779817, RCL : 0.6977077364, F1 : 0.7835880933
credsweeper Generic Token -> TP : 284, FP : 6, TN : 597, FN : 49, FPR : 0.0099502488, FNR : 0.1471471471, ACC : 0.9412393162, PRC : 0.9793103448, RCL : 0.8528528529, F1 : 0.9117174960
credsweeper Other -> TP : 125, FP : 5, TN : 739, FN : 266, FPR : 0.0067204301, FNR : 0.6803069054, ACC : 0.7612334802, PRC : 0.9615384615, RCL : 0.3196930946, F1 : 0.4798464491
credsweeper Other -> TP : 127, FP : 6, TN : 738, FN : 266, FPR : 0.0080645161, FNR : 0.6768447837, ACC : 0.7607739666, PRC : 0.9548872180, RCL : 0.3231552163, F1 : 0.4828897338
credsweeper Generic Secret -> TP : 971, FP : 2, TN : 216, FN : 84, FPR : 0.0091743119, FNR : 0.0796208531, ACC : 0.9324430479, PRC : 0.9979445015, RCL : 0.9203791469, F1 : 0.9575936884
credsweeper Seed, Salt, Nonce -> TP : 35, FP : 2, TN : 6, FN : 4, FPR : 0.2500000000, FNR : 0.1025641026, ACC : 0.8723404255, PRC : 0.9459459459, RCL : 0.8974358974, F1 : 0.9210526316
credsweeper Authentication Key & Token -> TP : 51, FP : 4, TN : 28, FN : 16, FPR : 0.1250000000, FNR : 0.2388059701, ACC : 0.7979797980, PRC : 0.9272727273, RCL : 0.7611940299, F1 : 0.8360655738
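
The rates in the rows above follow from the four confusion-matrix counts. A minimal Python sketch, assuming the standard definitions (the helper names `ratio` and `metrics` are hypothetical, and this is not the benchmark's own code), reproduces the new overall `credsweeper` row:

```python
from typing import Optional


def ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification rates from confusion-matrix counts."""
    return {
        "FPR": ratio(fp, fp + tn),
        "FNR": ratio(fn, fn + tp),
        "ACC": ratio(tp + tn, tp + fp + tn + fn),
        "PRC": ratio(tp, tp + fp),
        "RCL": ratio(tp, tp + fn),
        "F1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Counts taken from the new overall "credsweeper" row of cicd/benchmark.txt
overall = metrics(tp=3718, fp=368, tn=19429480, fn=892)
for name, value in overall.items():
    print(f"{name} : {value:.10f}")
```

The benchmark prints `FPR : None` for the Private Key row, where FP is 0. This sketch would give 0.0 there, so the benchmark evidently uses a different convention for that case.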
6 changes: 5 additions & 1 deletion credsweeper/common/constants.py
@@ -121,7 +121,7 @@ class DiffRowType(Enum):
MIN_VARIABLE_LENGTH = 1
MIN_SEPARATOR_LENGTH = 1
MIN_VALUE_LENGTH = 4
MAX_LINE_LENGTH = 1500
MAX_LINE_LENGTH = 2000
""" values according https://docs.python.org/3/library/codecs.html """
UTF_8 = "utf_8"
UTF_16 = "utf_16"
@@ -140,3 +140,7 @@ class DiffRowType(Enum):

# default value for config and ValuePemPatternCheck
DEFAULT_PEM_PATTERN_LEN = 5

# PEM x509 patterns
PEM_BEGIN_PATTERN = "-----BEGIN"
PEM_END_PATTERN = "-----END"
9 changes: 6 additions & 3 deletions credsweeper/common/morpheme_checklist.txt
@@ -444,8 +444,7 @@ diod
dir_
direct
disab
discipl
discon
disc
disk
dismi
dispos
@@ -952,6 +951,7 @@ obj
oblique
occur
ocean
ocess
oder
off
often
@@ -1052,7 +1052,7 @@ priv
pro_
probe
problem
process
proc
prod
prof
prog
@@ -1204,9 +1204,12 @@ scali
scen
sched
schem
scipl
scont
scope
scram
screen
scret
scri
scro
seal
6 changes: 3 additions & 3 deletions credsweeper/rules/config.yaml
@@ -270,14 +270,14 @@
- src
- doc

- name: PEM Certificate
- name: PEM Private Key
severity: high
type: pem_key
values:
- (?P<value>-----BEGIN\s(?!ENCRYPTED|EC).*PRIVATE)
- (?P<value>-----BEGIN\s(?!ENCRYPTED|EC)[^-]*PRIVATE[^-]*KEY[^-]*-----)
filter_type:
- LineSpecificKeyCheck
min_line_len: 20
min_line_len: 27
usage_list:
- src
- doc
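
The effect of the tightened rule value can be checked with a small Python sketch. Both patterns are copied from the hunk above; applying them with `re.search` and the sample lines are assumptions made for illustration, not CredSweeper's own matching code:

```python
import re

# Old and new rule values from credsweeper/rules/config.yaml in this PR
OLD_PATTERN = re.compile(r"(?P<value>-----BEGIN\s(?!ENCRYPTED|EC).*PRIVATE)")
NEW_PATTERN = re.compile(r"(?P<value>-----BEGIN\s(?!ENCRYPTED|EC)[^-]*PRIVATE[^-]*KEY[^-]*-----)")

# Hypothetical sample lines
SAMPLES = [
    "-----BEGIN RSA PRIVATE KEY-----",
    "-----BEGIN OPENSSH PRIVATE KEY-----",
    "-----BEGIN ENCRYPTED PRIVATE KEY-----",
    "-----BEGIN EC PRIVATE KEY-----",
    "-----BEGIN CERTIFICATE----- # not a PRIVATE key",
]


def matched(pattern: re.Pattern, line: str) -> bool:
    """True if the pattern is found anywhere in the line."""
    return pattern.search(line) is not None


# Map each sample to (matched by old pattern, matched by new pattern)
results = {line: (matched(OLD_PATTERN, line), matched(NEW_PATTERN, line)) for line in SAMPLES}
for line, (old, new) in results.items():
    print(f"old={old!s:5} new={new!s:5} {line}")

# Length of the shortest header the new pattern can match
shortest_header_len = len("-----BEGIN PRIVATE KEY-----")
```

The last sample shows the difference: the old `.*PRIVATE` can run past the closing dashes of an unrelated header, while `[^-]*` keeps the match inside one header. The shortest header accepted by the new pattern is 27 characters long, which may be the reason for the new `min_line_len`.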
145 changes: 91 additions & 54 deletions credsweeper/scanner/scan_type/pem_key_pattern.py
@@ -1,9 +1,11 @@
from typing import List, Optional
import string
from typing import Optional

from credsweeper.common.constants import Chars, PEM_BEGIN_PATTERN, PEM_END_PATTERN
from credsweeper.config import Config
from credsweeper.credentials import Candidate
from credsweeper.file_handler.analysis_target import AnalysisTarget
from credsweeper.filters import ValuePatternCheck
from credsweeper.filters import ValuePatternCheck, ValuePemPatternCheck
from credsweeper.rules import Rule
from credsweeper.scanner.scan_type import ScanType
from credsweeper.utils import Util
@@ -19,8 +21,11 @@ class PemKeyPattern(ScanType):

"""

ignore_starts = ["Proc-Type", "Version", "DEK-Info"]
remove_characters = " '\";,[]\n\r\t\\+#*"
ignore_starts = [PEM_BEGIN_PATTERN, "Proc-Type", "Version", "DEK-Info"]
wrap_characters = "\\'\";,[]#*"
remove_characters = string.whitespace + wrap_characters
remove_characters_plus = remove_characters + '+'
pem_pattern_check: Optional[ValuePatternCheck] = None

@classmethod
def run(cls, config: Config, rule: Rule, target: AnalysisTarget) -> Optional[Candidate]:
@@ -38,47 +43,62 @@ def run(cls, config: Config, rule: Rule, target: AnalysisTarget) -> Optional[Can
"""
assert rule.pattern_type == rule.PEM_KEY_PATTERN, \
"Rules provided to PemKeyPattern.run should have pattern_type equal to PEM_KEY_PATTERN"

if cls.is_pem_key(target.lines[target.line_num:], config):
return cls._get_candidate(config, rule, target)
if not cls.pem_pattern_check:
cls.pem_pattern_check = ValuePemPatternCheck(config)
if finish_line := cls.detect_pem_key(target):
if candidate := cls._get_candidate(config, rule, target):
candidate.line_data_list[0].info += f"[{target.line_num}:{finish_line}]"
return candidate

return None

@classmethod
def is_pem_key(cls, lines: List[str], config: Config) -> bool:
def detect_pem_key(cls, target: AnalysisTarget) -> int:
"""Check if provided lines is a PEM key.

Args:
lines: Lines to be checked
target: Analysis target

Return:
Boolean. True if PEM key, False otherwise
integer. last line number of the detected PEM key

"""
lines = cls.strip_lines(lines)
lines = cls.remove_leading_config_lines(lines)
key_data = ""
for line_num, line in enumerate(lines):
# get line with -----BEGIN which may contain full key
start_line = target.line_num - 1 if 0 < target.line_num else 0
for line_num, line in enumerate(target.lines[start_line:]):
if line_num >= 190:
return False
if "-----END" in line:
# Check if entropy is high enough
removed_by_entropy = not Util.is_entropy_validate(key_data)
# Check if have no substring with 5 same consecutive characters (like 'AAAAA')
pattern_check = ValuePatternCheck(config)
removed_by_filter = pattern_check.equal_pattern_check(key_data)
not_removed = not (removed_by_entropy or removed_by_filter)
return not_removed
# PEM key line should not contain spaces or . (and especially not ...)
elif " " in line or "..." in line:
return False
else:
key_data += line

return False # Return false if no `-END` section in lines
return 0
sublines = line.replace("\\r", '\n').replace("\\n", '\n').splitlines()
for subline in sublines:
if cls.is_leading_config_line(subline):
continue
elif PEM_END_PATTERN in subline:
# PEM key line should not contain spaces or . (and especially not ...)
if "..." in key_data:
return 0
# Check if entropy is high enough for base64 set with padding sign
removed_by_entropy = Util.get_shannon_entropy(key_data, Chars.BASE64_CHARS.value) < 4.5
if "OPENSSH" in target.line:
# the format has multiple AAAAA pattern
removed_by_filter = False
else:
# Check whether data have no substring with 5 same consecutive characters (like 'AAAAA')
removed_by_filter = cls.pem_pattern_check.equal_pattern_check(key_data)
if removed_by_entropy or removed_by_filter:
return 0
return target.line_num + line_num
else:
sanitized_line = cls.sanitize_line(subline)
if ' ' in sanitized_line:
# early return if one space appears in the data
return 0
key_data += sanitized_line

return 0

@classmethod
def strip_lines(cls, lines: List[str]) -> List[str]:
def sanitize_line(cls, line: str, recursy_level: int = 5) -> str:
"""Remove common symbols that can surround PEM keys inside code.

Examples::
@@ -88,22 +108,47 @@ def strip_lines(cls, lines: List[str]) -> List[str]:
` "ZZAWarrA1\\n" + `

Args:
lines: Lines to be striped
line: Line to be cleaned

Return:
lines with special characters removed from both ends
line with special characters removed from both ends

"""
recursy_level -= 1

if 0 > recursy_level:
return line

# Note that this strip would remove `\n` but not `\\n`
stripped_lines = [line.strip(cls.remove_characters) for line in lines]
line = line.strip(string.whitespace)
# If line still ends with "\n" - remove last 2 characters and strip again (case of `\\n` in the line)
stripped_lines = [
line[:-2].strip(cls.remove_characters) if line.endswith("\\n") else line for line in stripped_lines
]
return stripped_lines
if line.endswith("\\n"):
line = line[:-2]
if line.startswith("// "):
# assume, the commented line has to be separated from base64 code. Otherwise, it may be a part of PEM.
line = line[3:]
if line.startswith("/*"):
line = line[2:]
if line.endswith("*/"):
line = line[:-2]
if '"' in line or "'" in line:
# remove concatenation only when quotes present
line = line.strip(cls.remove_characters_plus)
else:
line = line.strip(cls.remove_characters)
# check whether a new iteration is required
for x in string.whitespace:
if line.startswith(x) or line.endswith(x):
return cls.sanitize_line(line, recursy_level)

for x in cls.wrap_characters:
if x in line:
return cls.sanitize_line(line, recursy_level)

return line

@classmethod
def remove_leading_config_lines(cls, lines: List[str]) -> List[str]:
def is_leading_config_line(cls, line: str) -> bool:
"""Remove non-key lines from the beginning of a list.

Example lines with non-key leading lines:
@@ -116,23 +161,15 @@ def remove_leading_config_lines(cls, lines: List[str]) -> List[str]:
ZZAWarrA1...

Args:
lines: Lines to be checked
line: Line to be checked

Return:
List of strings without leading non-key lines
True if the line is not a part of encoded data but leading config

"""
leading_lines = 0

for line in lines:
if len(line) == 0:
leading_lines += 1
else:
for ignore_string in cls.ignore_starts:
if line.startswith(ignore_string):
leading_lines += 1
break
if not leading_lines:
break

return lines[leading_lines:]
if 0 == len(line):
return True
for ignore_string in cls.ignore_starts:
if ignore_string in line:
return True
return False
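
The three checks that the new `detect_pem_key` applies to the collected key data can be illustrated in isolation. The sketch below is a simplified stand-in, not the PR's implementation: the function names are hypothetical, the alphabet is an assumption about what `Chars.BASE64_CHARS` contains (standard base64 plus the padding sign), and only the 4.5 threshold and the run length of 5 come from the diff.

```python
import math
import string
from typing import List

# Assumed alphabet: standard base64 symbols plus the padding sign
BASE64_CHARS = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/="
ENTROPY_THRESHOLD = 4.5  # threshold used in the diff above
PEM_PATTERN_LEN = 5  # DEFAULT_PEM_PATTERN_LEN in constants.py


def split_escaped_newlines(line: str) -> List[str]:
    """Split a source line on literal backslash-r and backslash-n sequences."""
    return line.replace("\\r", "\n").replace("\\n", "\n").splitlines()


def shannon_entropy(data: str, alphabet: str) -> float:
    """Shannon entropy in bits per character, counting only symbols of alphabet."""
    if not data:
        return 0.0
    entropy = 0.0
    for symbol in alphabet:
        p = data.count(symbol) / len(data)
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


def has_repeated_run(data: str, run_len: int = PEM_PATTERN_LEN) -> bool:
    """True if any character occurs run_len times in a row (like 'AAAAA')."""
    run = 0
    prev = None
    for ch in data:
        run = run + 1 if ch == prev else 1
        prev = ch
        if run >= run_len:
            return True
    return False


# A key embedded in one source line with escaped newlines (hypothetical sample)
one_liner = '"-----BEGIN RSA PRIVATE KEY-----\\nMIIB\\n-----END RSA PRIVATE KEY-----"'
sublines = split_escaped_newlines(one_liner)
print(sublines)
print(shannon_entropy("AAAAAAAA", BASE64_CHARS) < ENTROPY_THRESHOLD)
print(has_repeated_run("abAAAAAcd"))
```

Low entropy or a run of identical characters marks the data as a likely placeholder rather than a real key. The diff skips the run check for OPENSSH keys because that format legitimately contains `AAAAA` sequences.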
4 changes: 2 additions & 2 deletions credsweeper/scanner/scanner.py
@@ -4,7 +4,7 @@

from credsweeper.app import APP_PATH
from credsweeper.common.constants import RuleType, MIN_VARIABLE_LENGTH, MIN_SEPARATOR_LENGTH, MIN_VALUE_LENGTH, \
MAX_LINE_LENGTH, Separator
MAX_LINE_LENGTH, Separator, PEM_BEGIN_PATTERN
from credsweeper.config import Config
from credsweeper.credentials import Candidate
from credsweeper.file_handler.analysis_target import AnalysisTarget
@@ -102,7 +102,7 @@ def _select_and_group_targets(self, targets: List[AnalysisTarget]) -> Tuple[Targ
if target_line_trimmed_len >= self.min_pattern_len:
pattern_targets.append((target, target_line_trimmed_lower, target_line_trimmed_len))
# Check if the line has a "BEGIN" substring. It cannot otherwise be matched as a PEM key
if target_line_trimmed_len >= self.min_pem_key_len and "BEGIN" in target_line_trimmed:
if target_line_trimmed_len >= self.min_pem_key_len and PEM_BEGIN_PATTERN in target_line_trimmed:
pem_targets.append((target, target_line_trimmed_lower, target_line_trimmed_len))

return keyword_targets, pattern_targets, pem_targets
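
The stricter pre-filter in this hunk can be sketched on its own. The function `is_pem_target` is hypothetical, and the default of 27 for `min_pem_key_len` is an assumption taken from the rule's new `min_line_len`; only the constant value comes from the PR.

```python
PEM_BEGIN_PATTERN = "-----BEGIN"  # value from constants.py in this PR


def is_pem_target(line: str, min_pem_key_len: int = 27) -> bool:
    """Cheap substring pre-filter applied before the PEM rule is run."""
    trimmed = line.strip()
    return len(trimmed) >= min_pem_key_len and PEM_BEGIN_PATTERN in trimmed


# A long line with a bare BEGIN is no longer selected as a PEM target
print(is_pem_target("BEGIN TRANSACTION; -- start of the batch"))
print(is_pem_target("-----BEGIN RSA PRIVATE KEY-----"))
```

With the old `"BEGIN" in ...` check the first line would have been passed on to the PEM scanner; requiring the dashes filters it out earlier.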
8 changes: 4 additions & 4 deletions tests/__init__.py
@@ -4,14 +4,14 @@
SAMPLES_FILES_COUNT: int = 106

# credentials count after scan
SAMPLES_CRED_COUNT: int = 101
SAMPLES_CRED_LINE_COUNT: int = 105
SAMPLES_CRED_COUNT: int = 103
SAMPLES_CRED_LINE_COUNT: int = 107

# credentials count after post-processing
SAMPLES_POST_CRED_COUNT: int = 95
SAMPLES_POST_CRED_COUNT: int = 97

# with option --doc
SAMPLES_IN_DOC = 72
SAMPLES_IN_DOC = 73

# archived credentials that are not found without --depth
SAMPLES_IN_DEEP_1 = SAMPLES_POST_CRED_COUNT + 17