credsweeper.utils package

Submodules

credsweeper.utils.entropy_validator module

class credsweeper.utils.entropy_validator.EntropyValidator(data, iterator=None)[source]

Bases: object

Verifies data entropy against the base64, base36 and base16 (hex) character sets

CHARS_LIMIT_MAP = {<Chars.BASE64_CHARS: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/='>: 4.5, <Chars.BASE36_CHARS: 'abcdefghijklmnopqrstuvwxyz1234567890'>: 3, <Chars.HEX_CHARS: '0123456789ABCDEFabcdef'>: 3}
property entropy: float | None

Entropy value of a successful check, or the maximal computed value

property iterator: str | None

Name of the iterator (character set) used for the entropy check

to_dict()[source]

Dictionary representation of the validation result

Return type:

dict

property valid: bool | None

Shows whether validation was successful
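
The validation idea can be sketched outside the library as follows. This is a minimal illustration, not the actual EntropyValidator code; the threshold constants are copied from CHARS_LIMIT_MAP above, and the function name is illustrative.

```python
import math

# Thresholds copied from EntropyValidator.CHARS_LIMIT_MAP above
BASE64_LIMIT = 4.5
BASE36_LIMIT = 3.0
HEX_LIMIT = 3.0

def shannon_entropy(data: str, charset: str) -> float:
    """Shannon entropy of `data`, counting only symbols from `charset`."""
    entropy = 0.0
    for char in set(charset):
        probability = data.count(char) / len(data)
        if probability > 0:
            entropy -= probability * math.log2(probability)
    return entropy

# A 16-symbol hex string with all distinct digits reaches log2(16) = 4.0,
# which exceeds HEX_LIMIT, so such data would be treated as random-looking.
token_entropy = shannon_entropy("0123456789abcdef", "0123456789abcdef")
```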

credsweeper.utils.pem_key_detector module

class credsweeper.utils.pem_key_detector.PemKeyDetector[source]

Bases: object

Class to detect PEM PRIVATE keys only

base64set = {'+', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '=', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'}
classmethod detect_pem_key(config, target)[source]

Detects a PEM key in a single line, then iterates over the following lines according to https://www.rfc-editor.org/rfc/rfc7468

Parameters:
Return type:

List[LineData]

Returns:

List of LineData with found PEM

ignore_starts = ['-----BEGIN', 'Proc-Type', 'Version', 'DEK-Info']
classmethod is_leading_config_line(line)[source]

Check whether a line is a non-key (leading config) line at the beginning of a PEM block.

Example lines with non-key leading lines:

Proc-Type: 4,ENCRYPTED
DEK-Info: AES-256-CBC,2AA219GG746F88F6DDA0D852A0FD3211

ZZAWarrA1...
Parameters:

line (str) – Line to be checked

Return type:

bool

Returns:

True if the line is a leading configuration line rather than part of the encoded data

re_pem_begin = re.compile('(?P<value>-----BEGIN\\s(?!ENCRYPTED)[^-]*PRIVATE[^-]*KEY[^-]*-----(.+-----END[^-]+KEY[^-]*-----)?)')
re_value_pem = re.compile('(?P<value>([^-]*-----END[^-]+-----)|(([a-zA-Z0-9/+=]{64}.*)?[a-zA-Z0-9/+=]{4})+)')
remove_characters = ' \t\n\r\x0b\x0c\\\'";,[]#*!'
classmethod sanitize_line(line, recurse_level=5)[source]

Remove common symbols that can surround PEM keys inside code.

Examples:

`# ZZAWarrA1`
`* ZZAWarrA1`
`  "ZZAWarrA1\n" + `
Parameters:
  • line (str) – Line to be cleaned

  • recurse_level (int) – recursion depth limit to avoid an infinite loop when a removed symbol appears inside the base64-encoded data

Return type:

str

Returns:

line with special characters removed from both ends

wrap_characters = '\\\'";,[]#*!'
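
The sanitizing behaviour can be sketched with standard string stripping. The character sets are copied from the class attributes above; the function name and the exact recursion strategy are assumptions for illustration, not the library code.

```python
# Character sets copied from remove_characters / wrap_characters above
WRAP_CHARACTERS = '\\\'";,[]#*!'
REMOVE_CHARACTERS = ' \t\n\r\x0b\x0c' + WRAP_CHARACTERS

def sanitize_line_sketch(line: str, recurse_level: int = 5) -> str:
    """Strip wrapping symbols from both ends of a line (illustrative sketch)."""
    if recurse_level <= 0:
        return line
    stripped = line.strip(REMOVE_CHARACTERS)
    if stripped != line:
        # a removed wrapper may expose another one, so retry a bounded number of times
        return sanitize_line_sketch(stripped, recurse_level - 1)
    return stripped
```

For example, a commented line like `# ZZAWarrA1` reduces to the bare payload `ZZAWarrA1`.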

credsweeper.utils.util module

class credsweeper.utils.util.DiffDict

Bases: TypedDict

hunk: Any
line: Union[str, bytes]
new: Optional[int]
old: Optional[int]
class credsweeper.utils.util.DiffRowData(line_type, line_numb, line)[source]

Bases: object

Class for keeping the data of a diff row.

line: str
line_numb: int
line_type: DiffRowType
class credsweeper.utils.util.Util[source]

Bases: object

Class that contains different useful methods.

MIN_DATA_ENTROPY: Dict[int, float] = {16: 1.66973671780348, 20: 2.07723544540831, 32: 3.25392803184602, 40: 3.64853567064867, 64: 4.57756933688035}
static ast_to_dict(node)[source]

Recursively parse the AST of Python source into a list of strings

Return type:

List[Any]

static decode_base64(text, padding_safe=False, urlsafe_detect=False)[source]

Decode text to bytes, optionally detecting missing padding and urlsafe symbols

Return type:

bytes
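
The padding and urlsafe handling can be sketched with the standard base64 module. This is an assumption about the behaviour described above, not the actual Util code, and the function name is illustrative.

```python
import base64

def decode_base64_sketch(text: str, padding_safe: bool = False,
                         urlsafe_detect: bool = False) -> bytes:
    """Sketch: repair stripped padding and switch to the urlsafe alphabet."""
    if padding_safe and len(text) % 4:
        text += "=" * (4 - len(text) % 4)  # restore missing '=' padding
    if urlsafe_detect and ("-" in text or "_" in text):
        return base64.urlsafe_b64decode(text)  # '-' and '_' mark urlsafe data
    return base64.b64decode(text)
```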

static decode_bytes(content, encodings=None)[source]

Decode content using different encodings.

Try to decode the bytes with each encoding from “encodings” until one succeeds without exceptions. UTF-16 requires a BOM.

Parameters:
  • content (bytes) – raw data that might be text

  • encodings (Optional[List[str]]) – supported encodings

Return type:

List[str]

Returns:

list of file rows in the first suitable encoding from “encodings”; if none of the encodings match, an empty list is returned. An empty list is also returned when a NUL (0) symbol is present inside a line (not at the end).
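
The try-each-encoding loop can be sketched as follows. The default encoding list here is an assumption for illustration; only the BOM requirement for UTF-16 and the NUL check are taken from the docstring above.

```python
def decode_bytes_sketch(content: bytes, encodings=None) -> list:
    """Sketch of the documented behaviour; the default encoding list is assumed."""
    encodings = encodings or ["utf_8", "utf_16", "latin_1"]
    for encoding in encodings:
        # UTF-16 requires a BOM, as the docstring above notes
        if encoding.startswith("utf_16") and not content.startswith((b"\xff\xfe", b"\xfe\xff")):
            continue
        try:
            text = content.decode(encoding)
        except UnicodeError:
            continue
        if "\x00" in text.rstrip("\x00"):
            return []  # NUL inside the text (not at the end) signals binary data
        return text.splitlines()
    return []
```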

static get_extension(file_path, lower=True)[source]

Return the extension of a file, lower-cased by default, e.g. ‘.txt’; with lower=False the original case is kept, e.g. ‘.JPG’

Return type:

str

static get_min_data_entropy(x)[source]

Return the minimal entropy for random data of the given size. Precalculated values are used for speedup.

Return type:

float
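
The lookup can be sketched as below. The table values are copied from Util.MIN_DATA_ENTROPY above; the interpolation fallback for sizes not in the table is a hypothetical illustration, not the library's actual strategy.

```python
# Values copied from Util.MIN_DATA_ENTROPY above
MIN_DATA_ENTROPY = {16: 1.66973671780348, 20: 2.07723544540831,
                    32: 3.25392803184602, 40: 3.64853567064867,
                    64: 4.57756933688035}

def get_min_data_entropy_sketch(x: int) -> float:
    if x in MIN_DATA_ENTROPY:
        return MIN_DATA_ENTROPY[x]  # precalculated for common token sizes
    sizes = sorted(MIN_DATA_ENTROPY)
    if x < sizes[0] or x > sizes[-1]:
        return 0.0  # hypothetical out-of-range behaviour
    # hypothetical fallback: interpolate between the nearest known sizes
    lo = max(s for s in sizes if s < x)
    hi = min(s for s in sizes if s > x)
    fraction = (x - lo) / (hi - lo)
    return MIN_DATA_ENTROPY[lo] + fraction * (MIN_DATA_ENTROPY[hi] - MIN_DATA_ENTROPY[lo])
```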

static get_regex_combine_or(re_strs)[source]

Combine regex strings with the regex ‘or’ operator

Return type:

str
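
A minimal sketch of such a combination, assuming a plain '|' join (the actual grouping used by Util is not shown in this listing):

```python
import re

def get_regex_combine_or_sketch(re_strs):
    """Sketch: join alternatives with the regex 'or' operator '|'."""
    return "|".join(re_strs)

# the combined pattern matches any of the alternatives
combined = re.compile(get_regex_combine_or_sketch([r"BEGIN", r"END"]))
```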

static get_shannon_entropy(data, iterator)[source]

Borrowed from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html.

Return type:

float

static get_xml_from_lines(xml_lines)[source]

Parse xml data from list of string and return List of str.

Parameters:

xml_lines (List[str]) – list of lines of xml data

Returns:

list of formatted strings (f”{root.tag} {root.text}”)

Return type:

List[str]

Raises:

xml exception

static is_ascii_entropy_validate(data)[source]

Tests a small data sequence (<256) for randomness via ASCII checks and Shannon entropy. Returns True when the data consists of ASCII symbols or has small entropy.

Return type:

bool

static is_asn1(data)[source]

Only the sequence type 0x30 and size correctness are checked

Return type:

bool

static is_binary(data)[source]

Returns True if any recognized binary format is found, or if a two-zero-byte sequence is found, which never occurs in text formats (UTF-8, UTF-16). UTF-32 is not supported.

Return type:

bool
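
The zero-pair heuristic alone can be sketched in one line; the recognized-format checks mentioned above are omitted here, and the function name is illustrative.

```python
def is_binary_sketch(data: bytes) -> bool:
    """Only the zero-pair heuristic: b"\\x00\\x00" never occurs in UTF-8/UTF-16 text."""
    return b"\x00\x00" in data
```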

static is_bzip2(data)[source]

According to https://en.wikipedia.org/wiki/Bzip2

Return type:

bool

static is_elf(data)[source]

According to https://en.wikipedia.org/wiki/Executable_and_Linkable_Format; only 5 bytes are used

Return type:

bool

static is_eml(data)[source]

According to https://datatracker.ietf.org/doc/html/rfc822, looks up the fields Date, From, To or Subject

Return type:

bool

static is_gzip(data)[source]

According to https://www.rfc-editor.org/rfc/rfc1952

Return type:

bool

static is_html(data)[source]

Used to detect the HTML format of EML content

Return type:

bool

static is_jks(data)[source]

According to https://en.wikipedia.org/wiki/List_of_file_signatures - jks

Return type:

bool

static is_pdf(data)[source]

According to https://en.wikipedia.org/wiki/List_of_file_signatures - pdf

Return type:

bool

static is_tar(data)[source]

According to https://en.wikipedia.org/wiki/List_of_file_signatures

Return type:

bool

static is_zip(data)[source]

According to https://en.wikipedia.org/wiki/List_of_file_signatures

Return type:

bool
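
The magic-byte checks behind the is_* helpers above can be sketched in one table. The signatures and offsets come from the public file-signature references the docstrings cite; the combined dispatcher function is an illustration, not a Util method.

```python
# (offset, magic bytes) per format, per the cited file-signature references
SIGNATURES = {
    "gzip": (0, b"\x1f\x8b"),
    "bzip2": (0, b"BZh"),
    "pdf": (0, b"%PDF"),
    "zip": (0, b"PK"),
    "elf": (0, b"\x7fELF"),
    "jks": (0, b"\xfe\xed\xfe\xed"),
    "tar": (257, b"ustar"),
}

def detect_signature(data: bytes):
    """Return the name of the first matching format, or None."""
    for name, (offset, magic) in SIGNATURES.items():
        if data[offset:offset + len(magic)] == magic:
            return name
    return None
```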

static json_dump(obj, file_path, encoding='utf_8', indent=4)[source]

Write a dictionary to a JSON file

Return type:

None

static json_load(file_path, encoding='utf_8')[source]

Load a dictionary from a JSON file

Return type:

Any

static parse_python(source)[source]

Parse Python source into a list of strings and assignments

Return type:

List[Any]

static patch2files_diff(raw_patch, change_type)[source]

Generate files changes from patch for added or deleted filepaths.

Parameters:
  • raw_patch (List[str]) – git patch file content

  • change_type (DiffRowType) – change type to select, DiffRowType.ADDED or DiffRowType.DELETED

Return type:

Dict[str, List[DiffDict]]

Returns:

return dict with {file paths: list of file row changes}, where elements of list of file row changes represented as:

{
    "old": line number before diff,
    "new": line number after diff,
    "line": line text,
    "hunk": diff hunk number
}
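
Producing such rows from a unified diff can be sketched as follows. This is a minimal illustration of the row format documented above, not the actual Util parser; real git patches have further cases (renames, binary files) that are ignored here.

```python
import re

# hunk header of a unified diff, e.g. "@@ -1,2 +1,2 @@"
HUNK_RE = re.compile(r"@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")

def patch_to_rows(raw_patch):
    """Turn patch lines into dicts shaped like the DiffDict rows above."""
    rows = []
    old = new = hunk = 0
    for line in raw_patch:
        match = HUNK_RE.match(line)
        if match:
            old, new = int(match.group(1)), int(match.group(2))
            hunk += 1
            continue
        if not hunk:
            continue  # skip the file header before the first hunk
        if line.startswith("+"):
            rows.append({"old": None, "new": new, "line": line[1:], "hunk": hunk})
            new += 1
        elif line.startswith("-"):
            rows.append({"old": old, "new": None, "line": line[1:], "hunk": hunk})
            old += 1
        else:  # context line present on both sides
            rows.append({"old": old, "new": new, "line": line[1:], "hunk": hunk})
            old += 1
            new += 1
    return rows

rows = patch_to_rows(["--- a/f", "+++ b/f", "@@ -1,2 +1,2 @@",
                      " ctx", "-old_secret", "+new_secret"])
```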

static preprocess_diff_rows(added_line_number, deleted_line_number, line)[source]

Auxiliary function to extend diff changes.

Parameters:
  • added_line_number (Optional[int]) – number of added line or None

  • deleted_line_number (Optional[int]) – number of deleted line or None

  • line (str) – the text line

Return type:

List[DiffRowData]

Returns:

diff rows data as a list of (row change type, line number, row content)

static preprocess_file_diff(changes)[source]

Generate changed file rows from diff data with changed lines (e.g. marked + or - in diff).

Parameters:

changes (List[DiffDict]) – git diff by file rows data

Return type:

List[DiffRowData]

Returns:

diff rows data as a list of (row change type, line number, row content)

static read_data(path)[source]

Read the file bytes as is.

Try to read the data of the file.

Parameters:

path (Union[str, Path]) – path to file

Return type:

Optional[bytes]

Returns:

file content as bytes, or None if the file could not be read

static read_file(path, encodings=None)[source]

Read the file content using different encodings.

Try to read the contents of the file with each encoding from “encodings”; as soon as reading succeeds without exceptions, the data is returned in that encoding.

Parameters:
Return type:

List[str]

Returns:

list of file rows in a suitable encoding from “encodings”, if none of the encodings match, an empty list will be returned

static wrong_change(change)[source]

Returns True if the change is wrong

Return type:

bool

static yaml_dump(obj, file_path, encoding='utf_8')[source]

Write a dictionary to a YAML file

Return type:

None

static yaml_load(file_path, encoding='utf_8')[source]

Load a dictionary from a YAML file

Return type:

Any

Module contents