credsweeper.utils package
Submodules
credsweeper.utils.entropy_validator module
credsweeper.utils.pem_key_detector module
- class credsweeper.utils.pem_key_detector.PemKeyDetector[source]
Bases:
object
Class to detect PEM PRIVATE keys only
- base64set = {'+', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '=', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'}
- classmethod detect_pem_key(config: Config, target: AnalysisTarget) List[LineData][source]
Detects a PEM key on a single line, then iteratively over the following lines, according to https://www.rfc-editor.org/rfc/rfc7468
- Parameters:
config – Config
target – Analysis target
- Returns:
List of LineData with found PEM
- ignore_starts = ['-----BEGIN', 'Proc-Type', 'Version', 'DEK-Info']
- classmethod is_leading_config_line(line: str) bool[source]
Check whether a line is a non-key leading line, so that such lines can be removed from the beginning of a list.
Example lines with non-key leading lines:
Proc-Type: 4,ENCRYPTED
DEK-Info: DEK-Info: AES-256-CBC,2AA219GG746F88F6DDA0D852A0FD3211
ZZAWarrA1...
- Parameters:
line – Line to be checked
- Returns:
True if the line is not a part of encoded data but leading config
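The check described above can be approximated with a prefix test against the `ignore_starts` list. The following is a minimal sketch, not the library's implementation: the function name `is_leading_config_line_sketch` is hypothetical, and treating blank lines as leading lines is an assumption.

```python
# Prefixes copied from the documented ignore_starts attribute.
IGNORE_STARTS = ["-----BEGIN", "Proc-Type", "Version", "DEK-Info"]


def is_leading_config_line_sketch(line: str) -> bool:
    """Return True for lines that precede the encoded key data."""
    stripped = line.strip()
    if not stripped:
        # Assumption: an empty line between the headers and the data is not key data.
        return True
    return any(stripped.startswith(prefix) for prefix in IGNORE_STARTS)
```

With the example lines above, `Proc-Type: 4,ENCRYPTED` is reported as a leading line while `ZZAWarrA1...` is not.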
- re_pem_begin = re.compile('(?P<value>-----BEGIN\\s(?!ENCRYPTED)[^-]*PRIVATE[^-]*KEY[^-]*-----(.+-----END[^-]+KEY[^-]*-----)?)')
- re_value_pem = re.compile('(?P<value>([^-]*-----END[^-]+-----)|(([a-zA-Z0-9/+=]{64}.*)?[a-zA-Z0-9/+=]{4})+)')
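To illustrate what `re_pem_begin` accepts, the pattern can be tried with the standard `re` module. This sketch copies the documented pattern verbatim; the helper name `find_pem_begin` is hypothetical and only demonstrates the regular expression, not the full detection logic.

```python
import re
from typing import Optional

# Pattern copied from the documented re_pem_begin attribute.
RE_PEM_BEGIN = re.compile(
    r"(?P<value>-----BEGIN\s(?!ENCRYPTED)[^-]*PRIVATE[^-]*KEY[^-]*-----"
    r"(.+-----END[^-]+KEY[^-]*-----)?)"
)


def find_pem_begin(line: str) -> Optional[str]:
    """Return the matched PEM header (and the whole key if it fits on one line)."""
    match = RE_PEM_BEGIN.search(line)
    return match.group("value") if match else None
```

The negative lookahead `(?!ENCRYPTED)` rejects a `BEGIN ENCRYPTED PRIVATE KEY` header, and a header without the word `PRIVATE` (for example a public key) does not match at all.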
- remove_characters = ' \t\n\r\x0b\x0c\\\'";,[]#*!'
- classmethod sanitize_line(line: str, recurse_level: int = 5) str[source]
Remove common symbols that can surround PEM keys inside code.
Examples:
`# ZZAWarrA1`
`* ZZAWarrA1`
` "ZZAWarrA1\n" + `
- Parameters:
line – Line to be cleaned
recurse_level – recursion limit that prevents an infinite loop when a removed symbol occurs inside the base64-encoded data
- Returns:
line with special characters removed from both ends
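The core idea, removing wrapping symbols from both ends, can be sketched with `str.strip` and the documented `remove_characters` set. This is a simplified illustration under stated assumptions: the name `sanitize_line_sketch` is hypothetical, and the sketch omits the recursion and any special handling of escaped newlines or string concatenation that the real method may perform.

```python
# Characters copied from the documented remove_characters attribute.
REMOVE_CHARACTERS = " \t\n\r\x0b\x0c\\'\";,[]#*!"


def sanitize_line_sketch(line: str) -> str:
    """Strip comment markers, quotes and whitespace from both ends of a line."""
    return line.strip(REMOVE_CHARACTERS)
```

None of the base64 symbols appear in the removed set, so stripping the ends this way does not eat into the encoded data itself.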
- wrap_characters = '\\\'";,[]#*!'
credsweeper.utils.util module
- class credsweeper.utils.util.DiffDict
Bases:
TypedDict
- class credsweeper.utils.util.DiffRowData(line_type: DiffRowType, line_numb: int, line: str)[source]
Bases:
object
Class for keeping data of diff row.
- line_type: DiffRowType
- class credsweeper.utils.util.Util[source]
Bases:
object
Class that contains different useful methods.
- MIN_DATA_ENTROPY: Dict[int, float] = {16: 1.66973671780348, 20: 2.07723544540831, 32: 3.25392803184602, 40: 3.64853567064867, 64: 4.57756933688035, 384: 7.39, 512: 7.55}
- static ast_to_dict(node: Any) List[Any][source]
Recursively parse the AST of Python source into a list of strings
- static decode_base64(text: str, padding_safe: bool = False, urlsafe_detect=False) bytes[source]
Decode text to bytes, with or without padding detection and URL-safe symbol detection
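A decoder with these two options can be sketched with the standard `base64` module. The function name `decode_base64_sketch` is hypothetical, and the exact padding and alphabet rules of the real method are assumptions here.

```python
import base64


def decode_base64_sketch(text: str, padding_safe: bool = False, urlsafe_detect: bool = False) -> bytes:
    """Decode base64 text, optionally repairing padding and detecting the URL-safe alphabet."""
    if padding_safe:
        # Drop any existing padding, then restore what a multiple-of-4 length requires.
        text = text.rstrip("=")
        text += "=" * (-len(text) % 4)
    if urlsafe_detect and ("-" in text or "_" in text):
        # '-' and '_' only occur in the URL-safe alphabet.
        return base64.urlsafe_b64decode(text)
    return base64.b64decode(text, validate=True)
```

For example, `decode_base64_sketch("aGVsbG8", padding_safe=True)` decodes the unpadded text to `b"hello"`, whereas the same call without `padding_safe` raises `binascii.Error`.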
- static decode_bytes(content: bytes, encodings: List[str] | None = None) List[str][source]
Decode content using different encodings.
Try each encoding from the list “encodings” in order until decoding succeeds without any exceptions. UTF-16 requires a BOM.
- Parameters:
content – raw data that might be text
encodings – supported encodings
- Returns:
list of file rows in a suitable encoding from “encodings”. If none of the encodings match, an empty list is returned. An empty list is also returned if, after the last encoding, a 0 symbol is present in the lines anywhere other than at the end.
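The try-each-encoding loop can be sketched as follows. The name `decode_bytes_sketch` and the default encoding list are assumptions, and the sketch leaves out the UTF-16 BOM requirement and the zero-symbol check described above.

```python
from typing import List, Optional


def decode_bytes_sketch(content: bytes, encodings: Optional[List[str]] = None) -> List[str]:
    """Return the lines of content decoded with the first encoding that works."""
    if encodings is None:
        encodings = ["utf_8", "latin_1"]  # assumed defaults, for illustration only
    for encoding in encodings:
        try:
            text = content.decode(encoding)
        except UnicodeError:
            continue  # this encoding does not fit, try the next one
        # Normalise line endings before splitting into rows.
        return text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    return []  # none of the encodings matched
```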
- static get_chunks(line_len: int) List[Tuple[int, int]][source]
Returns chunk positions for the given line length
- static get_extension(file_path: str, lower=True) str[source]
Return the extension of the file, in lower case by default, e.g.: ‘.txt’, ‘.JPG’
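A comparable helper can be written with `os.path.splitext`. This is only a sketch of the documented behaviour; the name `get_extension_sketch` is hypothetical and the real method may treat edge cases differently.

```python
import os


def get_extension_sketch(file_path: str, lower: bool = True) -> str:
    """Return the file extension including the leading dot, lower-cased by default."""
    _, extension = os.path.splitext(file_path)
    return extension.lower() if lower else extension
```

Passing `lower=False` keeps the original case, which is how a value such as ‘.JPG’ can be returned.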
- static get_min_data_entropy(x: int) float[source]
Returns minimal entropy for size of random data. Precalculated data is applied for speedup
- static get_shannon_entropy(data: str, iterator: str) float[source]
Borrowed from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html.
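For reference, Shannon entropy over a given alphabet can be computed as in the sketch below. It assumes that `iterator` is the set of symbols to count and that the result is measured in bits; the name `shannon_entropy_sketch` is hypothetical.

```python
import math


def shannon_entropy_sketch(data: str, iterator: str) -> float:
    """Sum -p * log2(p) over every symbol of the alphabet that occurs in data."""
    if not data:
        return 0.0
    entropy = 0.0
    for symbol in iterator:
        probability = data.count(symbol) / len(data)
        if probability > 0:
            entropy -= probability * math.log2(probability)
    return entropy
```

A string made of a single repeated symbol has entropy 0, while two equally frequent symbols give exactly 1 bit.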
- static get_xml_from_lines(xml_lines: List[str]) Tuple[List[str] | None, List[int] | None][source]
Parse XML data from a list of strings and return a list of str.
- Parameters:
xml_lines – list of lines of xml data
- Returns:
List of formatted strings (f"{root.tag} : {root.text}")
- Raises:
xml exception –
- static is_ascii_entropy_validate(data: bytes) bool[source]
Tests a small data sequence (<256) for randomness by checking for ASCII content and Shannon entropy. Returns True when the data consists of ASCII symbols or has low entropy.
- static is_binary(data: bytes) bool[source]
Returns True if any recognized binary format is found, or if a sequence of two zeroes is found, which never occurs in text formats (UTF-8, UTF-16). UTF-32 is not supported.
- static is_bzip2(data: bytes) bool[source]
According to https://en.wikipedia.org/wiki/Bzip2
- static is_elf(data: bytes | bytearray) bool[source]
According to https://en.wikipedia.org/wiki/Executable_and_Linkable_Format; only 5 bytes are used
- static is_eml(data: bytes | bytearray) bool[source]
According to https://datatracker.ietf.org/doc/html/rfc822; looks up the fields Date, From, To or Subject
- static is_gzip(data: bytes) bool[source]
According to https://www.rfc-editor.org/rfc/rfc1952
- static is_jks(data: bytes) bool[source]
According to https://en.wikipedia.org/wiki/List_of_file_signatures - jks
- static is_pdf(data: bytes) bool[source]
According to https://en.wikipedia.org/wiki/List_of_file_signatures - pdf
- static is_tar(data: bytes) bool[source]
According to https://en.wikipedia.org/wiki/List_of_file_signatures
- static is_zip(data: bytes) bool[source]
According to https://en.wikipedia.org/wiki/List_of_file_signatures
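Format checks of this kind usually compare the first bytes of the data with a known signature. The sketch below shows the idea for a few of the formats listed above; the table and the name `detect_format_sketch` are hypothetical, and the real methods may inspect more bytes or additional offsets.

```python
from typing import Optional

# Well-known leading signatures ("magic numbers") of several formats.
SIGNATURES = {
    "gzip": b"\x1f\x8b\x08",  # ID1, ID2 and the deflate compression method (RFC 1952)
    "bzip2": b"BZh",
    "zip": b"PK\x03\x04",     # local file header of a non-empty archive
    "pdf": b"%PDF-",
    "elf": b"\x7fELF",
}


def detect_format_sketch(data: bytes) -> Optional[str]:
    """Return the name of the first format whose signature prefixes data, else None."""
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None
```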
- static json_dump(obj: Any, file_path: str | Path, encoding='utf_8', indent=4) None[source]
Write dictionary to json file
- static json_load(file_path: str | Path, encoding='utf_8') Any[source]
Load dictionary from json file
- static parse_python(source: str) List[Any][source]
Parse python source to list of strings and assignments
- static patch2files_diff(raw_patch: List[str], change_type: DiffRowType) Dict[str, List[DiffDict]][source]
Generate files changes from patch for added or deleted filepaths.
- Parameters:
raw_patch – git patch file content
change_type – change type to select, DiffRowType.ADDED or DiffRowType.DELETED
- Returns:
dict of {file path: list of file row changes}, where each element of the list of file row changes is represented as:
{ "old": line number before diff, "new": line number after diff, "line": line text, "hunk": diff hunk number }
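To make the returned structure concrete, here is a minimal sketch that collects only the added lines of a git patch. It is not the library's parser: the name `added_rows_sketch` is hypothetical, and setting "old" to None for added lines and numbering hunks per file starting from 1 are assumptions.

```python
import re
from typing import Any, Dict, List

HUNK_RE = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_rows_sketch(raw_patch: List[str]) -> Dict[str, List[Dict[str, Any]]]:
    """Map each file path to its added rows as old/new/line/hunk dictionaries."""
    result: Dict[str, List[Dict[str, Any]]] = {}
    path = None
    new_no = hunk = 0
    for row in raw_patch:
        if row.startswith("diff --git "):
            path = None  # a new file section begins
            continue
        if path is None:
            if row.startswith("+++ "):
                path = row[4:].strip()
                if path.startswith("b/"):
                    path = path[2:]  # drop git's "b/" prefix
                hunk = 0
            continue
        header = HUNK_RE.match(row)
        if header:
            new_no = int(header.group(2))
            hunk += 1
        elif hunk and row.startswith("+"):
            result.setdefault(path, []).append(
                {"old": None, "new": new_no, "line": row[1:], "hunk": hunk})
            new_no += 1
        elif hunk and row.startswith(" "):
            new_no += 1  # context rows advance the new-file counter
    return result
```

Deleted rows start with `-` and are skipped here, since they do not exist in the new file and therefore do not advance its line counter.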
- static preprocess_diff_rows(added_line_number: int | None, deleted_line_number: int | None, line: str) List[DiffRowData][source]
Auxiliary function to extend diff changes.
- Parameters:
added_line_number – number of added line or None
deleted_line_number – number of deleted line or None
line – the text line
- Returns:
diff row data as a list of entries with row change type, line number and row content
- static preprocess_file_diff(changes: List[DiffDict]) List[DiffRowData][source]
Generate changed file rows from diff data with changed lines (e.g. marked + or - in diff).
- Parameters:
changes – git diff by file rows data
- Returns:
diff row data as a list of entries with row change type, line number and row content
- static read_data(path: str | Path) bytes | None[source]
Read the file bytes as is.
Try to read the data of the file.
- Parameters:
path – path to file
- Returns:
the raw bytes of the file, or None if the data could not be read
- static read_file(path: str | Path, encodings: List[str] | None = None) List[str][source]
Read the file content using different encodings.
Try to read the contents of the file with each encoding from the list “encodings”. As soon as reading succeeds without any exceptions, the data is returned in that encoding.
- Parameters:
path – path to file
encodings – supported encodings
- Returns:
list of file rows in a suitable encoding from “encodings”. If none of the encodings match, an empty list is returned.
- static subtext(text: str, pos: int, hunk_size: int) str[source]
Cut text symmetrically around the given position, or use the remaining quota so that the result fits in 2x hunk_size
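One way to read this description is as a window of at most 2 x hunk_size characters that is centred on `pos` and shifted when it would run past either end of the text. The sketch below implements that reading; the name `subtext_sketch` is hypothetical and the real method may trim or align the result differently.

```python
def subtext_sketch(text: str, pos: int, hunk_size: int) -> str:
    """Return up to 2 * hunk_size characters of text around pos."""
    window = 2 * hunk_size
    if len(text) <= window:
        return text  # the whole text already fits
    start = max(0, pos - hunk_size)
    end = start + window
    if end > len(text):
        # Near the right edge: spend the unused quota on the left side.
        end = len(text)
        start = end - window
    return text[start:end]
```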