credsweeper.utils package
Submodules
credsweeper.utils.entropy_validator module
- class credsweeper.utils.entropy_validator.EntropyValidator(data, iterator=None)
Bases: object
Verifies data entropy against the base64, base36 and base16 (hex) alphabets.
- CHARS_LIMIT_MAP = {<Chars.BASE64_CHARS: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/='>: 4.5, <Chars.BASE36_CHARS: 'abcdefghijklmnopqrstuvwxyz1234567890'>: 3, <Chars.HEX_CHARS: '0123456789ABCDEFabcdef'>: 3}
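To show how a limit table like CHARS_LIMIT_MAP could be applied, here is a minimal sketch. It assumes that the validator computes the Shannon entropy of the data over each alphabet and treats the data as passing for that alphabet once the entropy reaches the mapped limit. The alphabets and limits are copied from the attribute above; the names CHARS_LIMITS, entropy_over and validate_entropy, and the short keys, are hypothetical.

```python
import math
from typing import Dict, Tuple

# Alphabets and limits copied from CHARS_LIMIT_MAP; the short names are hypothetical.
CHARS_LIMITS: Dict[str, Tuple[str, float]] = {
    "base64": ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=", 4.5),
    "base36": ("abcdefghijklmnopqrstuvwxyz1234567890", 3),
    "hex": ("0123456789ABCDEFabcdef", 3),
}


def entropy_over(data: str, alphabet: str) -> float:
    """Shannon entropy in bits per symbol, counting only characters from alphabet."""
    if not data:
        return 0.0
    entropy = 0.0
    for symbol in alphabet:
        p = data.count(symbol) / len(data)
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


def validate_entropy(data: str) -> Dict[str, bool]:
    """Hypothetical: for each alphabet, report whether data reaches the mapped limit."""
    return {name: entropy_over(data, alphabet) >= limit
            for name, (alphabet, limit) in CHARS_LIMITS.items()}
```

For "0123456789abcdef" each of the 16 symbols has probability 1/16, which gives 4 bits per symbol: enough for the base36 and hex limits of 3, but below the base64 limit of 4.5.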
credsweeper.utils.pem_key_detector module
- class credsweeper.utils.pem_key_detector.PemKeyDetector
Bases: object
Class to detect PEM PRIVATE keys only
- base64set = {'+', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '=', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'}
- classmethod detect_pem_key(config, target)
Detects a PEM key either within a single line or by iterating over the following lines, according to RFC 7468 (https://www.rfc-editor.org/rfc/rfc7468).
- Parameters:
config (Config) – Config
target (AnalysisTarget) – Analysis target
- Return type:
- Returns:
List of LineData with found PEM
- ignore_starts = ['-----BEGIN', 'Proc-Type', 'Version', 'DEK-Info']
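The multi-line mode of detect_pem_key can be pictured roughly as follows. This is a simplified sketch, not the library's implementation: collect_pem_bodies, PEM_BEGIN and PEM_END are hypothetical names, the BEGIN pattern is the first half of re_pem_begin, the END pattern is an assumption adapted from it, and lines starting with any ignore_starts prefix are simply skipped.

```python
import re
from typing import List

# Hypothetical markers: BEGIN mirrors re_pem_begin, END is adapted from its optional tail.
PEM_BEGIN = re.compile(r"-----BEGIN\s(?!ENCRYPTED)[^-]*PRIVATE[^-]*KEY[^-]*-----")
PEM_END = re.compile(r"-----END[^-]+KEY[^-]*-----")
IGNORE_STARTS = ["-----BEGIN", "Proc-Type", "Version", "DEK-Info"]  # copied from ignore_starts


def collect_pem_bodies(lines: List[str]) -> List[str]:
    """Hypothetical: join the base64 body lines found between BEGIN and END markers."""
    bodies: List[str] = []
    body: List[str] = []
    inside = False
    for raw in lines:
        line = raw.strip()
        if not inside:
            if PEM_BEGIN.search(line):
                inside, body = True, []  # start a new key body
            continue
        if PEM_END.search(line):
            bodies.append("".join(body))
            inside = False
        elif line and not any(line.startswith(prefix) for prefix in IGNORE_STARTS):
            body.append(line)  # keep only lines that look like key material
    return bodies
```

Because of the (?!ENCRYPTED) lookahead, a "-----BEGIN ENCRYPTED PRIVATE KEY-----" block is never opened, so its body is not collected.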
- classmethod is_leading_config_line(line)
Remove non-key lines from the beginning of a list.
Example lines with non-key leading lines:
Proc-Type: 4,ENCRYPTED
DEK-Info: AES-256-CBC,2AA219GG746F88F6DDA0D852A0FD3211
ZZAWarrA1...
- re_pem_begin = re.compile('(?P<value>-----BEGIN\\s(?!ENCRYPTED)[^-]*PRIVATE[^-]*KEY[^-]*-----(.+-----END[^-]+KEY[^-]*-----)?)')
- re_value_pem = re.compile('(?P<value>([^-]*-----END[^-]+-----)|(([a-zA-Z0-9/+=]{64}.*)?[a-zA-Z0-9/+=]{4})+)')
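A quick check of which header lines re_pem_begin accepts. The pattern is copied from the attribute above, written as a raw string instead of the escaped repr; the sample strings are made up for illustration.

```python
import re

# Pattern copied from the documented re_pem_begin attribute.
re_pem_begin = re.compile(
    r"(?P<value>-----BEGIN\s(?!ENCRYPTED)[^-]*PRIVATE[^-]*KEY[^-]*-----(.+-----END[^-]+KEY[^-]*-----)?)"
)

samples = {
    "rsa_header": "-----BEGIN RSA PRIVATE KEY-----",  # header only: multi-line key follows
    "one_line": "-----BEGIN EC PRIVATE KEY-----MHcCAQEE-----END EC PRIVATE KEY-----",
    "encrypted": "-----BEGIN ENCRYPTED PRIVATE KEY-----",  # rejected by (?!ENCRYPTED)
    "public": "-----BEGIN PUBLIC KEY-----",  # rejected: no PRIVATE
}

for name, text in samples.items():
    match = re_pem_begin.search(text)
    print(name, match.group("value") if match else None)
```

The optional second group lets the same pattern capture a whole key written on one line, END marker included.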
- remove_characters = ' \t\n\r\x0b\x0c\\\'";,[]#*!'
- classmethod sanitize_line(line, recurse_level=5)
Remove common symbols that can surround PEM keys inside code.
Examples:
`# ZZAWarrA1`
`* ZZAWarrA1`
` "ZZAWarrA1\n" + `
- wrap_characters = '\\\'";,[]#*!'
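One way to peel such wrappers off is sketched below, under stated assumptions: the character set is copied from remove_characters, but the repeat-until-stable loop bounded by recurse_level, and the handling of a trailing literal "\n" and a " +" concatenation operator, are guesses at the behaviour, not the library's code. The name sanitize_line_sketch is hypothetical.

```python
# Copied from the documented remove_characters attribute.
REMOVE_CHARACTERS = " \t\n\r\x0b\x0c\\'\";,[]#*!"


def sanitize_line_sketch(line: str, recurse_level: int = 5) -> str:
    """Hypothetical: strip code-level wrappers around a PEM body line."""
    for _ in range(recurse_level):
        before = line
        line = line.strip(REMOVE_CHARACTERS)  # quotes, comment markers, brackets, whitespace
        if line.endswith(" +"):
            line = line[:-2]  # assumption: drop a string-concatenation operator
        if line.endswith("\\n"):
            line = line[:-2]  # assumption: drop an escaped newline inside a string literal
        if line == before:
            break  # nothing left to remove
    return line
```

The loop is needed because removing one wrapper (for example " +") can expose another (the closing quote).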
credsweeper.utils.util module
- class credsweeper.utils.util.DiffDict
Bases: TypedDict
- class credsweeper.utils.util.DiffRowData(line_type, line_numb, line)
Bases: object
Class for keeping the data of a diff row.
- line_type: DiffRowType
- class credsweeper.utils.util.Util
Bases: object
Class that contains various utility methods.
- MIN_DATA_ENTROPY: Dict[int, float] = {16: 1.66973671780348, 20: 2.07723544540831, 32: 3.25392803184602, 40: 3.64853567064867, 64: 4.57756933688035}
- static decode_base64(text, padding_safe=False, urlsafe_detect=False)
Decodes base64 text to bytes, optionally handling missing padding (padding_safe) and detecting URL-safe symbols (urlsafe_detect).
- Return type:
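A minimal sketch of what the two flags could mean, using the standard base64 module. Assumptions: padding_safe restores missing "=" padding, and urlsafe_detect switches to the URL-safe alphabet ("-" and "_" instead of "+" and "/", RFC 4648) when those symbols appear. The name decode_base64_sketch is hypothetical.

```python
import base64


def decode_base64_sketch(text: str, padding_safe: bool = False, urlsafe_detect: bool = False) -> bytes:
    """Hypothetical reading of the flags; raises binascii.Error (a ValueError) on bad input."""
    if padding_safe:
        text += "=" * (-len(text) % 4)  # pad to a multiple of 4 characters
    if urlsafe_detect and ("-" in text or "_" in text):
        # map the URL-safe alphabet back to the standard one before strict decoding
        text = text.replace("-", "+").replace("_", "/")
    return base64.b64decode(text, validate=True)
```

With validate=True, any character outside the standard alphabet raises instead of being silently discarded, so URL-safe input fails unless urlsafe_detect is set.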
- static decode_bytes(content, encodings=None)
Decode content using different encodings.
Tries the encodings from the list encodings one by one and uses the first one with which decoding raises no exception. UTF-16 requires a BOM.
- Parameters:
- Return type:
- Returns:
List of file rows decoded with a suitable encoding from encodings. If none of the encodings match, an empty list is returned. An empty list is also returned when, after the last encoding has been tried, a 0 (NUL) symbol is present in the lines anywhere other than at the end.
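The try-each-encoding loop can be sketched as follows. This is a simplified illustration, not CredSweeper's code: the default encoding list and the name decode_bytes_sketch are hypothetical, and the NUL check is applied after every encoding rather than only after the last one.

```python
from typing import List, Optional

# Assumption: an illustrative default order, not the library's actual default list.
DEFAULT_ENCODINGS = ["utf_8", "utf_16", "latin_1"]
UTF16_BOMS = (b"\xff\xfe", b"\xfe\xff")


def decode_bytes_sketch(content: bytes, encodings: Optional[List[str]] = None) -> List[str]:
    """Hypothetical: return the lines of the first encoding that decodes cleanly, else []."""
    for encoding in encodings or DEFAULT_ENCODINGS:
        if encoding == "utf_16" and content[:2] not in UTF16_BOMS:
            continue  # UTF-16 is only attempted when a BOM is present
        try:
            text = content.decode(encoding)
        except UnicodeDecodeError:
            continue  # this encoding does not fit, try the next one
        if "\0" in text.rstrip("\0"):
            continue  # a NUL before the end suggests binary data
        return text.splitlines()
    return []
```

latin_1 decodes any byte sequence, so placing it last acts as a catch-all; the NUL check is what still rejects binary content.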
- static get_extension(file_path, lower=True)
Returns the extension of the file, in lower case by default, e.g. '.txt' (or '.JPG' with lower=False).
- Return type:
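A minimal sketch of the described behaviour using os.path.splitext; the name get_extension_sketch is hypothetical, and the real method may treat edge cases (such as dotfiles) differently.

```python
import os


def get_extension_sketch(file_path: str, lower: bool = True) -> str:
    """Hypothetical: last extension including the leading dot, e.g. '.txt'."""
    _, extension = os.path.splitext(file_path)
    return extension.lower() if lower else extension
```

Only the last suffix is returned, so "archive.tar.gz" yields ".gz".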
- static get_min_data_entropy(x)
Returns the minimal entropy expected for random data of size x. Precalculated data is used for speedup.
- Return type:
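The following sketch shows how a precalculated table such as MIN_DATA_ENTROPY might be used for sizes it does not list. Linear interpolation between neighbouring sizes, and clamping outside the table range, are swapped-in techniques for illustration only; the source does not say how other sizes are handled. The name min_data_entropy_sketch is hypothetical.

```python
import bisect
from typing import Dict

# Values copied from the documented MIN_DATA_ENTROPY attribute.
MIN_DATA_ENTROPY: Dict[int, float] = {16: 1.66973671780348, 20: 2.07723544540831,
                                      32: 3.25392803184602, 40: 3.64853567064867,
                                      64: 4.57756933688035}


def min_data_entropy_sketch(x: int) -> float:
    """Hypothetical: exact table hit, else linear interpolation clamped to the table range."""
    if x in MIN_DATA_ENTROPY:
        return MIN_DATA_ENTROPY[x]
    sizes = sorted(MIN_DATA_ENTROPY)
    if x < sizes[0]:
        return MIN_DATA_ENTROPY[sizes[0]]  # clamp below the smallest size
    if x > sizes[-1]:
        return MIN_DATA_ENTROPY[sizes[-1]]  # clamp above the largest size
    i = bisect.bisect_left(sizes, x)
    lo, hi = sizes[i - 1], sizes[i]
    t = (x - lo) / (hi - lo)
    return MIN_DATA_ENTROPY[lo] + t * (MIN_DATA_ENTROPY[hi] - MIN_DATA_ENTROPY[lo])
```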
- static get_shannon_entropy(data, iterator)
Calculates the Shannon entropy of data over the symbols provided by iterator. Borrowed from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html.
- Return type:
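The referenced approach sums -p(x)·log2 p(x) over the symbols supplied by iterator, where p(x) is the share of data made up by symbol x. A minimal sketch follows (shannon_entropy_sketch is a hypothetical name). It assumes that iterator yields each symbol once; duplicates would be counted twice.

```python
import math
from typing import Iterable


def shannon_entropy_sketch(data: str, iterator: Iterable[str]) -> float:
    """Entropy in bits per symbol, counting only symbols that iterator yields."""
    if not data:
        return 0.0
    entropy = 0.0
    for symbol in iterator:
        p_x = data.count(symbol) / len(data)  # probability of this symbol in data
        if p_x > 0:
            entropy -= p_x * math.log2(p_x)
    return entropy
```

Symbols missing from iterator are ignored entirely: "aabb" over "ab" gives 1.0 bit, but over "a" only 0.5.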
- static get_xml_from_lines(xml_lines)
Parses XML data from a list of strings and returns a list of str.
- static is_ascii_entropy_validate(data)
Tests a small data sequence (length < 256) for randomness by checking whether it is ASCII and by computing its Shannon entropy. Returns True when the data consists of ASCII symbols or has low entropy.
- Return type:
- static is_binary(data)
Returns True if any recognized binary format is found, or if a sequence of two zero bytes is found, which never occurs in the supported text formats (UTF-8, UTF-16). UTF-32 is not supported.
- Return type:
- static is_bzip2(data)
According to https://en.wikipedia.org/wiki/Bzip2
- Return type:
- static is_elf(data)
According to https://en.wikipedia.org/wiki/Executable_and_Linkable_Format; only the first 5 bytes are used.
- Return type:
- static is_eml(data)
According to https://datatracker.ietf.org/doc/html/rfc822, looks up the header fields Date, From, To or Subject.
- Return type:
- static is_gzip(data)
According to https://www.rfc-editor.org/rfc/rfc1952
- Return type:
- static is_jks(data)
According to https://en.wikipedia.org/wiki/List_of_file_signatures (jks)
- Return type:
- static is_pdf(data)
According to https://en.wikipedia.org/wiki/List_of_file_signatures (pdf)
- Return type:
- static is_tar(data)
According to https://en.wikipedia.org/wiki/List_of_file_signatures
- Return type:
- static is_zip(data)
According to https://en.wikipedia.org/wiki/List_of_file_signatures
- Return type:
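The signature checks above all come down to comparing a few leading bytes with known magic numbers. Below is a combined sketch with hypothetical names (detect_format, is_binary_sketch). The magic values are the well-known ones from the cited sources: gzip 0x1f 0x8b, bzip2 "BZh", ELF 0x7f "ELF", JKS 0xFEEDFEED, "%PDF-", the ZIP "PK" records, and "ustar" at offset 257 for tar. Treating the 5th ELF byte as the class byte (1 or 2) is an assumption about the "5 bytes" mentioned for is_elf, and is_eml is left out.

```python
from typing import Optional


def detect_format(data: bytes) -> Optional[str]:
    """Hypothetical: name of the first recognized binary signature, else None."""
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    if data[:3] == b"BZh":
        return "bzip2"
    if data[:4] == b"\x7fELF" and len(data) > 4 and data[4] in (1, 2):
        return "elf"  # assumption: 5th byte is EI_CLASS (32- or 64-bit)
    if data[:4] == b"\xfe\xed\xfe\xed":
        return "jks"
    if data[:5] == b"%PDF-":
        return "pdf"
    if data[:4] in (b"PK\x03\x04", b"PK\x05\x06", b"PK\x07\x08"):
        return "zip"
    if data[257:262] == b"ustar":
        return "tar"
    return None


def is_binary_sketch(data: bytes) -> bool:
    """Hypothetical: any known signature, or two consecutive zero bytes."""
    return detect_format(data) is not None or b"\x00\x00" in data
```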
- static json_dump(obj, file_path, encoding='utf_8', indent=4)
Writes a dictionary to a JSON file.
- Return type:
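A minimal sketch of the documented signature with the standard json module; json_dump_sketch is a hypothetical name, and the real method may differ in details such as error handling.

```python
import json
import os
import tempfile
from typing import Any


def json_dump_sketch(obj: Any, file_path: str, encoding: str = "utf_8", indent: int = 4) -> None:
    """Hypothetical: serialize obj as indented JSON into file_path."""
    with open(file_path, "w", encoding=encoding) as f:
        json.dump(obj, f, indent=indent)


# Usage: write to a temporary directory and read the file back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.json")
    json_dump_sketch({"a": 1, "b": [2, 3]}, path)
    with open(path, encoding="utf_8") as f:
        loaded = json.load(f)
```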
- static patch2files_diff(raw_patch, change_type)
Generates per-file changes from a patch, selecting either the added or the deleted lines.
- Parameters:
change_type (DiffRowType) – change type to select, DiffRowType.ADDED or DiffRowType.DELETED
- Return type:
- Returns:
A dict of the form {file path: list of file row changes}, where each file row change is represented as:
{ "old": line number before diff, "new": line number after diff, "line": line text, "hunk": diff hunk number }
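A unified diff can be turned into a {file path: [row changes]} structure roughly as follows. This is a minimal sketch under stated assumptions, not CredSweeper's implementation: added_rows_sketch is a hypothetical name, it only collects added lines (the DiffRowType.ADDED case), it stores None for "old" on added lines, and it numbers hunks per file.

```python
import re
from typing import Dict, List, Optional

# Hunk header such as "@@ -12,7 +12,9 @@"; captures the start line in the new file.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_rows_sketch(raw_patch: List[str]) -> Dict[str, List[dict]]:
    """Hypothetical: map each changed file to its added lines in a unified diff."""
    result: Dict[str, List[dict]] = {}
    path: Optional[str] = None
    new = hunk = 0
    for line in raw_patch:
        if line.startswith("+++ "):
            target = line[4:].strip()  # "+++ b/<path>" names the file after the change
            path = target[2:] if target.startswith("b/") else target
            result.setdefault(path, [])
            hunk = 0  # assumption: hunks are numbered per file
            continue
        if line.startswith("--- "):
            continue  # pre-change file name, not needed for added lines
        header = HUNK_RE.match(line)
        if header:
            new = int(header.group(1))
            hunk += 1
            continue
        if path is None:
            continue  # preamble before the first file header
        if line.startswith("+"):
            result[path].append({"old": None, "new": new, "line": line[1:], "hunk": hunk})
            new += 1
        elif line.startswith(" "):
            new += 1  # context lines advance the new-file counter; "-" lines do not
    return result
```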
- static preprocess_diff_rows(added_line_number, deleted_line_number, line)
Auxiliary function to extend diff changes.
- static preprocess_file_diff(changes)
Generates changed file rows from diff data with changed lines (e.g. lines marked + or - in the diff).
- Parameters:
- Return type:
- Returns:
Diff row data as a list of (row change type, line number, row content).
- static read_file(path, encodings=None)
Read the file content using different encodings.
Tries to read the contents of the file with each encoding from the list encodings in turn; as soon as reading succeeds without an exception, the data decoded with that encoding is returned.
- MIN_DATA_ENTROPY: