credsweeper.utils package

Submodules

credsweeper.utils.entropy_validator module

class credsweeper.utils.entropy_validator.EntropyValidator(data: str, iterator: Chars | None = None)[source]

Bases: object

Verifies data entropy against the base64, base36 and base16 (hex) alphabets

CHARS_LIMIT_MAP = {Chars.BASE36_CHARS: 3, Chars.BASE64_CHARS: 4.5, Chars.HEX_CHARS: 3}

property entropy: float | None: Entropy value of the successful check, or the maximal value otherwise

property iterator: str | None: Which iterator was used for the entropy

to_dict() → dict[source]: Representation to dictionary

property valid: bool | None: Shows whether validation was successful
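The idea behind the validator can be sketched as follows. This is a minimal illustration, not the library's implementation: the thresholds are copied from CHARS_LIMIT_MAP above, while the alphabet strings, the `>` comparison, the helper names `shannon_entropy` and `validate_entropy`, and the returned tuple are all assumptions made for the example.

```python
import math
import string
from typing import Optional, Tuple

# Limits copied from CHARS_LIMIT_MAP; the alphabet strings are assumptions.
ALPHABETS = {
    "BASE64_CHARS": (string.ascii_letters + string.digits + "+/=", 4.5),
    "BASE36_CHARS": (string.ascii_lowercase + string.digits, 3.0),
    "HEX_CHARS": (string.hexdigits, 3.0),
}


def shannon_entropy(data: str, alphabet: str) -> float:
    """Shannon entropy in bits per symbol, counting only symbols of `alphabet`."""
    if not data:
        return 0.0
    entropy = 0.0
    for symbol in alphabet:
        p = data.count(symbol) / len(data)
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


def validate_entropy(data: str) -> Tuple[Optional[str], float, bool]:
    """Return (alphabet name, entropy, valid) for the first limit that is exceeded.

    When no limit is exceeded, the name is None and the maximal entropy is returned.
    """
    best = 0.0
    for name, (alphabet, limit) in ALPHABETS.items():
        entropy = shannon_entropy(data, alphabet)
        if entropy > limit:  # assumption: strict comparison
            return name, entropy, True
        best = max(best, entropy)
    return None, best, False
```

A repeated character such as "aaaaaaaa" has zero entropy and fails every limit, while a string of 16 distinct lowercase hex digits reaches 4 bits per symbol and passes the base36 limit in this sketch.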

credsweeper.utils.pem_key_detector module

class credsweeper.utils.pem_key_detector.PemKeyDetector[source]

Bases: object

Class to detect PEM PRIVATE keys only

base64set = {'+', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '=', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'}

classmethod detect_pem_key(config: Config, target: AnalysisTarget) → List[LineData][source]

Detects a PEM key within a single line, then iterates over the following lines, according to https://www.rfc-editor.org/rfc/rfc7468

Parameters:

config – Config
target – Analysis target

Returns:

List of LineData with found PEM

ignore_starts = ['-----BEGIN', 'Proc-Type', 'Version', 'DEK-Info']

classmethod is_leading_config_line(line: str) → bool[source]

Checks whether a line is a non-key line, so that such lines can be removed from the beginning of a list.

Example lines with non-key leading lines:

Proc-Type: 4,ENCRYPTED
DEK-Info: AES-256-CBC,2AA219GG746F88F6DDA0D852A0FD3211

ZZAWarrA1...

Parameters:

line – Line to be checked

Returns:

True if the line is not part of the encoded data but a leading config line
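A check of this kind can be sketched with the prefixes listed in ignore_starts. The function name `is_leading_config_line_sketch`, the prefix test, and the treatment of blank lines as leading lines are assumptions for illustration and may differ from the actual implementation.

```python
# Prefixes copied from ignore_starts above.
IGNORE_STARTS = ["-----BEGIN", "Proc-Type", "Version", "DEK-Info"]


def is_leading_config_line_sketch(line: str) -> bool:
    """Return True for lines that precede the encoded key data."""
    stripped = line.strip()
    if not stripped:
        # Assumption: the blank separator between headers and data is skipped too.
        return True
    return any(stripped.startswith(prefix) for prefix in IGNORE_STARTS)
```

With this sketch, the header lines of the example above are classified as leading config lines, and the first base64 line is not.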

re_pem_begin = re.compile('(?P<value>-----BEGIN\\s(?!ENCRYPTED)[^-]*PRIVATE[^-]*KEY[^-]*-----(.+-----END[^-]+KEY[^-]*-----)?)')

re_value_pem = re.compile('(?P<value>([^-]*-----END[^-]+-----)|(([a-zA-Z0-9/+=]{64}.*)?[a-zA-Z0-9/+=]{4})+)')

remove_characters = ' \t\n\r\x0b\x0c\\\'";,[]#*!'

classmethod sanitize_line(line: str, recurse_level: int = 5) → str[source]

Remove common symbols that can surround PEM keys inside code.

Examples:

`# ZZAWarrA1`
`* ZZAWarrA1`
`  "ZZAWarrA1\n" + `

Parameters:

line – Line to be cleaned
recurse_level – recursion limit that prevents an infinite loop when a removable symbol occurs inside base64-encoded data

Returns:

line with special characters removed from both ends
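The cleaning step can be sketched as a bounded loop over the characters in remove_characters. This is a simplified illustration under stated assumptions, not the library's code: the name `sanitize_line_sketch`, the `max_passes` parameter (standing in for recurse_level), and the handling of a trailing literal `\n` and of a trailing ` +` concatenation are choices made for the example.

```python
# Characters copied from remove_characters above.
REMOVE_CHARACTERS = ' \t\n\r\x0b\x0c\\\'";,[]#*!'


def sanitize_line_sketch(line: str, max_passes: int = 5) -> str:
    """Strip wrapping symbols from both ends until nothing changes."""
    for _ in range(max_passes):
        cleaned = line.strip(REMOVE_CHARACTERS)
        if cleaned.endswith("\\n"):
            # Assumption: drop a literal backslash-n left over from source code.
            cleaned = cleaned[:-2]
        elif cleaned.endswith(" +"):
            # Assumption: drop a string concatenation sign that follows a space.
            cleaned = cleaned[:-1]
        if cleaned == line:
            break
        line = cleaned
    return line
```

The pass limit keeps the loop finite even if an input keeps changing between passes.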

wrap_characters = '\\\'";,[]#*!'

credsweeper.utils.util module

class credsweeper.utils.util.DiffDict

Bases: TypedDict

hunk: Any

line: str | bytes

new: int | None

old: int | None

class credsweeper.utils.util.DiffRowData(line_type: DiffRowType, line_numb: int, line: str)[source]

Bases: object

Class for keeping data of diff row.

line: str

line_numb: int

line_type: DiffRowType

class credsweeper.utils.util.Util[source]

Bases: object

Class that contains different useful methods.

MIN_DATA_ENTROPY: Dict[int, float] = {16: 1.66973671780348, 20: 2.07723544540831, 32: 3.25392803184602, 40: 3.64853567064867, 64: 4.57756933688035, 384: 7.39, 512: 7.55}

static ast_to_dict(node: Any) → List[Any][source]: Recursively parses the AST of Python source into a list of strings

static decode_base64(text: str, padding_safe: bool = False, urlsafe_detect=False) → bytes[source]: Decodes text to bytes, optionally with padding detection and URL-safe symbol detection
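One way such decoding can work with the standard library is sketched below. The function name `decode_base64_sketch` and the exact padding and alphabet rules are assumptions for illustration; only the parameter names come from the signature above.

```python
import base64


def decode_base64_sketch(text: str, padding_safe: bool = False, urlsafe_detect: bool = False) -> bytes:
    """Decode base64 text, optionally restoring padding and detecting the URL-safe alphabet."""
    if padding_safe:
        remainder = len(text) % 4
        if remainder:
            # Restore the '=' padding that was stripped from the text.
            text += "=" * (4 - remainder)
    if urlsafe_detect and ("-" in text or "_" in text):
        # '-' and '_' replace '+' and '/' in the URL-safe alphabet.
        return base64.b64decode(text, altchars=b"-_", validate=True)
    return base64.b64decode(text, validate=True)
```

With validate=True, base64.b64decode raises binascii.Error for characters outside the alphabet instead of silently discarding them.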

static decode_bytes(content: bytes, encodings: List[str] | None = None) → List[str][source]

Decode content using different encodings.

Tries to decode the bytes with each encoding from the list “encodings” in turn, until decoding succeeds without any exception. UTF-16 requires a BOM.

Parameters:

content – raw data that might be text
encodings – supported encodings

Returns:

list of file rows in a suitable encoding from “encodings”. If none of the encodings match, an empty list is returned. An empty list is also returned when, after the last encoding, a 0 symbol is present in the lines other than at the end.
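The fallback over encodings can be sketched as below. This is an illustration only: the name `decode_bytes_sketch`, the default encoding list, and the line splitting are assumptions, and the BOM requirement for UTF-16 and the zero-symbol rule described above are not modeled.

```python
from typing import List, Optional

# Assumption: a hypothetical default list; the library's defaults may differ.
DEFAULT_ENCODINGS = ["utf_8", "latin_1"]


def decode_bytes_sketch(content: bytes, encodings: Optional[List[str]] = None) -> List[str]:
    """Return the lines of `content` in the first encoding that decodes cleanly."""
    for encoding in encodings or DEFAULT_ENCODINGS:
        try:
            text = content.decode(encoding)
        except UnicodeError:
            # This encoding does not fit, so try the next one.
            continue
        return text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    return []
```

The order of the list matters: permissive encodings such as latin_1 accept any byte sequence, so they belong at the end.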

static get_chunks(line_len: int) → List[Tuple[int, int]][source]: Returns chunks positions for given line length

static get_extension(file_path: str, lower=True) → str[source]: Returns the file extension, in lower case by default, e.g. ‘.txt’; with lower=False the original case is kept, e.g. ‘.JPG’
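A minimal sketch of this behavior, assuming the extension is taken with os.path.splitext (the name `get_extension_sketch` is hypothetical):

```python
import os


def get_extension_sketch(file_path: str, lower: bool = True) -> str:
    """Return the extension including the leading dot, or '' when there is none."""
    _, extension = os.path.splitext(file_path)
    return extension.lower() if lower else extension
```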

static get_min_data_entropy(x: int) → float[source]: Returns minimal entropy for size of random data. Precalculated data is applied for speedup

static get_regex_combine_or(re_strs: List[str]) → str[source]: Combines the given regex strings into a single ‘or’ alternation
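Such a combination is commonly built as a non-capturing group. The sketch below assumes that form; the name `regex_combine_or_sketch` and the exact output format are not confirmed by the documentation above.

```python
import re
from typing import List


def regex_combine_or_sketch(re_strs: List[str]) -> str:
    """Join patterns with '|' inside a non-capturing group."""
    return "(?:" + "|".join(re_strs) + ")"


# Usage: the combined pattern matches any of the alternatives.
pattern = regex_combine_or_sketch(["foo", "ba[rz]"])
matched = re.fullmatch(pattern, "baz") is not None
```

The non-capturing group keeps the alternation from leaking into neighboring parts when the result is embedded in a larger pattern.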

static get_shannon_entropy(data: str, iterator: str) → float[source]: Borrowed from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html.

static get_xml_from_lines(xml_lines: List[str]) → Tuple[List[str] | None, List[int] | None][source]

Parse XML data from a list of strings and return a list of str.

Parameters:

xml_lines – list of lines of xml data

Returns:

List of formatted strings (f"{root.tag} : {root.text}")

Raises:

xml exception

static is_ascii_entropy_validate(data: bytes) → bool[source]: Tests a small data sequence (<256) for randomness by checking for ASCII and Shannon entropy. Returns True when the data consists of ASCII symbols or has small entropy

static is_asn1(data: bytes) → bool[source]: Only sequence type 0x30 and size correctness is checked

static is_binary(data: bytes) → bool[source]: Returns True if any recognized binary format is found, or if a sequence of two zero bytes is found, which never occurs in text formats (UTF-8, UTF-16). UTF-32 is not supported

static is_bzip2(data: bytes) → bool[source]: According to https://en.wikipedia.org/wiki/Bzip2

static is_elf(data: bytes | bytearray) → bool[source]: According to https://en.wikipedia.org/wiki/Executable_and_Linkable_Format use only 5 bytes

static is_eml(data: bytes | bytearray) → bool[source]: According to https://datatracker.ietf.org/doc/html/rfc822 lookup the fields: Date, From, To or Subject

static is_gzip(data: bytes) → bool[source]: According to https://www.rfc-editor.org/rfc/rfc1952

static is_html(data: bytes | bytearray) → bool[source]: Used to detect html format of eml

static is_jks(data: bytes) → bool[source]: According to https://en.wikipedia.org/wiki/List_of_file_signatures - jks

static is_pdf(data: bytes) → bool[source]: According to https://en.wikipedia.org/wiki/List_of_file_signatures - pdf

static is_tar(data: bytes) → bool[source]: According to https://en.wikipedia.org/wiki/List_of_file_signatures

static is_zip(data: bytes) → bool[source]: According to https://en.wikipedia.org/wiki/List_of_file_signatures
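The format checks above rely on file signatures (magic numbers). A generic sketch of that technique follows; the table and the name `detect_signature` are illustrative assumptions, and the library's own checks may inspect more bytes or more signature variants than shown here.

```python
import bz2
import gzip
from typing import Optional

# (offset, magic bytes) per format, taken from the public signature lists.
SIGNATURES = {
    "gzip": (0, b"\x1f\x8b"),
    "bzip2": (0, b"BZh"),
    "pdf": (0, b"%PDF-"),
    "zip": (0, b"PK\x03\x04"),
    "jks": (0, b"\xfe\xed\xfe\xed"),
    "tar": (257, b"ustar"),
}


def detect_signature(data: bytes) -> Optional[str]:
    """Return the name of the first format whose signature is found, else None."""
    for name, (offset, magic) in SIGNATURES.items():
        if data[offset:offset + len(magic)] == magic:
            return name
    return None


# Usage: data compressed with the standard library carries the expected signatures.
gzip_name = detect_signature(gzip.compress(b"payload"))
bzip2_name = detect_signature(bz2.compress(b"payload"))
```

A signature match only indicates the likely container format; it does not prove that the rest of the data is well-formed.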

static json_dump(obj: Any, file_path: str | Path, encoding='utf_8', indent=4) → None[source]: Write dictionary to json file

static json_load(file_path: str | Path, encoding='utf_8') → Any[source]: Load dictionary from json file

static parse_python(source: str) → List[Any][source]: Parse python source to list of strings and assignments

static patch2files_diff(raw_patch: List[str], change_type: DiffRowType) → Dict[str, List[DiffDict]][source]

Generates file changes from a patch for added or deleted file paths.

Parameters:

raw_patch – git patch file content
change_type – change type to select, DiffRowType.ADDED or DiffRowType.DELETED

Returns:

dict with {file path: list of file row changes}, where each element of the list of file row changes is represented as:

{
    "old": line number before diff,
    "new": line number after diff,
    "line": line text,
    "hunk": diff hunk number
}
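To make the row structure concrete, the sketch below parses a unified diff with the standard library only and produces rows of the shape shown above. It is an independent illustration, not the library's parser: the names `DiffDictSketch` and `parse_unified_diff`, the inclusion of context lines, and the per-file hunk counter are assumptions.

```python
import re
from typing import Dict, List, Optional, TypedDict


class DiffDictSketch(TypedDict):
    old: Optional[int]
    new: Optional[int]
    line: str
    hunk: int


HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_unified_diff(patch_lines: List[str]) -> Dict[str, List[DiffDictSketch]]:
    """Map each new file path to its row changes with old/new line numbers."""
    files: Dict[str, List[DiffDictSketch]] = {}
    rows: Optional[List[DiffDictSketch]] = None
    old_no = new_no = old_left = new_left = hunk_no = 0
    for raw in patch_lines:
        if rows is not None and (old_left > 0 or new_left > 0):
            # Inside a hunk body: the first character tells the row type.
            tag, text = raw[:1], raw[1:]
            if tag == "+":
                rows.append({"old": None, "new": new_no, "line": text, "hunk": hunk_no})
                new_no, new_left = new_no + 1, new_left - 1
            elif tag == "-":
                rows.append({"old": old_no, "new": None, "line": text, "hunk": hunk_no})
                old_no, old_left = old_no + 1, old_left - 1
            elif tag != "\\":  # skip "\ No newline at end of file"
                rows.append({"old": old_no, "new": new_no, "line": text, "hunk": hunk_no})
                old_no, old_left = old_no + 1, old_left - 1
                new_no, new_left = new_no + 1, new_left - 1
            continue
        if raw.startswith("+++ "):
            path = raw[4:].strip()
            path = path[2:] if path.startswith("b/") else path
            rows = files.setdefault(path, [])
            hunk_no = 0
            continue
        match = HUNK_HEADER.match(raw)
        if match and rows is not None:
            # A missing length in the header means a length of 1.
            old_no, old_left = int(match.group(1)), int(match.group(2) or 1)
            new_no, new_left = int(match.group(3)), int(match.group(4) or 1)
            hunk_no += 1
    return files
```

Tracking the remaining line counts from the hunk header avoids mistaking a removed line that starts with `--` for a file header.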

static preprocess_diff_rows(added_line_number: int | None, deleted_line_number: int | None, line: str) → List[DiffRowData][source]

Auxiliary function to extend diff changes.

Parameters:

added_line_number – number of added line or None
deleted_line_number – number of deleted line or None
line – the text line

Returns:

diff row data as a list of row change type, line number, row content

static preprocess_file_diff(changes: List[DiffDict]) → List[DiffRowData][source]

Generate changed file rows from diff data with changed lines (e.g. marked + or - in diff).

Parameters:

changes – git diff by file rows data

Returns:

diff row data as a list of row change type, line number, row content

static read_data(path: str | Path) → bytes | None[source]

Read the file bytes as is.

Try to read the data of the file.

Parameters:

path – path to file

Returns:

the file content as bytes, or None if the data could not be read

static read_file(path: str | Path, encodings: List[str] | None = None) → List[str][source]

Read the file content using different encodings.

Tries to read the contents of the file with each encoding from the list “encodings”. As soon as reading succeeds without any exception, the data is returned in the current encoding.

Parameters:

path – path to file
encodings – supported encodings

Returns:

list of file rows in a suitable encoding from “encodings”. If none of the encodings match, an empty list is returned.

static subtext(text: str, pos: int, hunk_size: int) → str[source]: Cuts the text symmetrically around the given position, or uses the remaining quota on the other side so that the result fits in 2x hunk_size
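The windowing idea can be sketched as below, assuming 0 <= pos <= len(text). The name `subtext_sketch` is hypothetical, and details such as whitespace trimming of the result are not modeled.

```python
def subtext_sketch(text: str, pos: int, hunk_size: int) -> str:
    """Return at most 2 * hunk_size characters of `text` around `pos`."""
    window = 2 * hunk_size
    if len(text) <= window:
        return text
    start = max(0, pos - hunk_size)
    end = start + window
    if end > len(text):
        # Near the right edge: give the unused quota to the left side.
        end = len(text)
        start = end - window
    return text[start:end]
```

Near either edge of the text the window shifts instead of shrinking, so the result keeps the full 2x hunk_size length whenever the text is long enough.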

static wrong_change(change: DiffDict) → bool[source]: Returns True if the change is wrong

static yaml_dump(obj: Any, file_path: str | Path, encoding='utf_8') → None[source]: Write dictionary to yaml file

static yaml_load(file_path: str | Path, encoding='utf_8') → Any[source]: Load dictionary from yaml file
