credsweeper.utils package

Submodules

credsweeper.utils.hop_stat module

class credsweeper.utils.hop_stat.HopStat[source]

Bases: object

Statistical check of the distances between consecutive symbols of a value on a keyboard layout

KEYBOARD = ('`1234567890-=', '\x00qwertyuiop[]\\', "\x00\x00asdfghjkl;'", '\x00\x00zxcvbnm,./')

TRANSLATION = {33: '1', 34: "'", 35: '3', 36: '4', 37: '5', 38: '7', 40: '9', 41: '0', 42: '8', 43: '=', 58: ';', 60: ',', 62: '.', 63: '/', 64: '2', 65: 'a', 66: 'b', 67: 'c', 68: 'd', 69: 'e', 70: 'f', 71: 'g', 72: 'h', 73: 'i', 74: 'j', 75: 'k', 76: 'l', 77: 'm', 78: 'n', 79: 'o', 80: 'p', 81: 'q', 82: 'r', 83: 's', 84: 't', 85: 'u', 86: 'v', 87: 'w', 88: 'x', 89: 'y', 90: 'z', 94: '6', 95: '-', 123: '[', 124: '\\', 125: ']', 126: '`'}

stat(value: str) → Tuple[float, float][source]

Calculates statistical distances between given symbols

Parameters: value – string based on the initial alphabet
Returns: average distance and deviation; an exception is raised if the value contains symbols outside the initial alphabet
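
The idea behind stat() can be illustrated with a minimal sketch. It assumes Euclidean distance between key positions in the KEYBOARD rows and a population standard deviation; the metric, the deviation formula and the exception type actually used by HopStat are not stated here, so hop_stat_sketch and key_positions are hypothetical names.

```python
import math
from typing import Dict, Tuple

# Rows copied from HopStat.KEYBOARD; '\x00' pads the rows to model key offsets.
KEYBOARD = ('`1234567890-=', '\x00qwertyuiop[]\\', "\x00\x00asdfghjkl;'", '\x00\x00zxcvbnm,./')


def key_positions(rows: Tuple[str, ...]) -> Dict[str, Tuple[int, int]]:
    """Map every real key to its (row, column) position, skipping padding."""
    return {ch: (r, c) for r, row in enumerate(rows) for c, ch in enumerate(row) if ch != '\x00'}


def hop_stat_sketch(value: str) -> Tuple[float, float]:
    """Return (mean, deviation) of distances between consecutive symbols.

    Assumption: Euclidean distance and population deviation.
    """
    positions = key_positions(KEYBOARD)
    hops = []
    for a, b in zip(value, value[1:]):
        if a not in positions or b not in positions:
            # Assumption: the real class may raise a different exception type.
            raise ValueError(f"symbol outside the keyboard alphabet in {a!r}{b!r}")
        (r1, c1), (r2, c2) = positions[a], positions[b]
        hops.append(math.hypot(r1 - r2, c1 - c2))
    if not hops:
        raise ValueError("at least two symbols are required")
    mean = sum(hops) / len(hops)
    deviation = math.sqrt(sum((h - mean) ** 2 for h in hops) / len(hops))
    return mean, deviation
```

A keyboard walk such as "qwerty" yields a small mean with zero deviation in this sketch, which is the kind of signal the statistic is meant to expose.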

credsweeper.utils.pem_key_detector module

class credsweeper.utils.pem_key_detector.PemKeyDetector(config: Config)[source]

Bases: object

Class to detect PEM PRIVATE keys only

BASE64_CHARS_SET = {'+', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '=', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'}

ENTROPY_LIMIT_BASE64 = 4.5

IGNORE_STARTS = ['-----BEGIN', 'Proc-Type', 'Version', 'DEK-Info']

MAX_PEM_LENGTH = 32000

REMOVE_CHARACTERS = ' \t\n\r\x0b\x0c\\\'"`;,[]#*!'

RE_BASE64_CHARS = re.compile('[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789\\+/=]+')

RE_PEM_BEGIN = re.compile('(?P<value>-----BEGIN(?![^-]{1,80}ENCRYPTED)[^-]{0,80}PRIVATE[^-]{1,80}KEY[^-]{0,80}-----(.{1,8000}-----END[^-]{1,80}KEY[^-]{0,80}-----)?)')

RE_PEM_VALUE = re.compile('(?P<value>.{0,32000})')

WRAP_CHARACTERS = '\\\'"`;,[]#*!'
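
As an illustration of what RE_PEM_BEGIN accepts, the sketch below compiles the same pattern string with the standard re module. Compiling without flags is an assumption, and has_plain_private_key_header is a hypothetical helper, not part of PemKeyDetector.

```python
import re

# Pattern string copied from RE_PEM_BEGIN; compiled without flags (assumption).
RE_PEM_BEGIN = re.compile(
    "(?P<value>-----BEGIN(?![^-]{1,80}ENCRYPTED)[^-]{0,80}PRIVATE[^-]{1,80}KEY[^-]{0,80}-----"
    "(.{1,8000}-----END[^-]{1,80}KEY[^-]{0,80}-----)?)")


def has_plain_private_key_header(line: str) -> bool:
    """True when the line holds a PRIVATE KEY header that is not marked ENCRYPTED."""
    return RE_PEM_BEGIN.search(line) is not None


# The negative lookahead rejects headers that mention ENCRYPTED before the dashes.
print(has_plain_private_key_header("-----BEGIN RSA PRIVATE KEY-----"))
print(has_plain_private_key_header("-----BEGIN ENCRYPTED PRIVATE KEY-----"))
```

The optional trailing group lets the same pattern capture a whole key that was written on a single line, up to the closing -----END ... KEY----- marker.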

cut_barrier(line: str) → str[source]: Cut off barrier if detected

detect_pem_key(first_line: LineData, target: AnalysisTarget) → List[LineData][source]

Detects a PEM key in a single line and, iteratively, in the following lines, according to https://www.rfc-editor.org/rfc/rfc7468

Parameters:

first_line – line with -----BEGIN detected by the rule pattern
target – Analysis target

Returns:

List of LineData with found PEM

static finalize(line_data_list: List[LineData], key_data_list: List[str], last_line: str) → List[LineData][source]: Checks the collected key_data according to the key type

static is_leading_config_line(line: str) → bool[source]

Remove non-key lines from the beginning of a list.

Example lines with non-key leading lines:

Proc-Type: 4,ENCRYPTED
DEK-Info: DEK-Info: AES-256-CBC,2AA219GG746F88F6DDA0D852A0FD3211

ZZAWarrA1...

Parameters: line – Line to be checked
Returns: True if the line is not a part of encoded data but leading config

static sanitize_line(line: str, recurse_level: int = 5) → str[source]

Remove common symbols that can surround PEM keys inside code.

Examples:

`# ZZAWarrA1`
`* ZZAWarrA1`
`  "ZZAWarrA1\n" + `

Parameters:

line – Line to be cleaned
recurse_level – recursion limit that prevents an infinite loop when a removed symbol is inside base64-encoded data

Returns:

line with special characters removed from both ends
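
A minimal sketch of this kind of cleanup, under stated assumptions: it reuses the WRAP_CHARACTERS value listed above, removes escaped whitespace sequences with the same pattern string as Util.PEM_CLEANING_PATTERN, and strips a concatenation '+' only when quotes are present. The loop with max_passes stands in for recurse_level; the exact steps of PemKeyDetector.sanitize_line may differ, and sanitize_line_sketch is a hypothetical name.

```python
import re
import string

WRAP_CHARACTERS = '\\\'"`;,[]#*!'        # value listed for PemKeyDetector
ESCAPED_WHITESPACE = re.compile('\\\\[tnrvf]')  # same pattern string as Util.PEM_CLEANING_PATTERN


def sanitize_line_sketch(line: str, max_passes: int = 5) -> str:
    """Strip code decoration from both ends of a line that may hold key data."""
    # Drop literal escape sequences such as a backslash followed by 'n'.
    line = ESCAPED_WHITESPACE.sub('', line)
    for _ in range(max_passes):
        before = line
        line = line.strip(string.whitespace)
        if '"' in line or "'" in line:
            # Assumption: '+' is string concatenation only when quotes are present,
            # because '+' is also a valid base64 symbol.
            line = line.strip('+')
        line = line.strip(WRAP_CHARACTERS)
        if line == before:
            break  # nothing left to remove
    return line


for sample in ('# ZZAWarrA1', '* ZZAWarrA1', '  "ZZAWarrA1\\n" + '):
    print(repr(sanitize_line_sketch(sample)))
```

Bounding the number of passes mirrors the purpose of recurse_level: the cleanup stops even if stripping keeps exposing new characters to remove.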

static sanitize_line_data_list(line_data_list: List[LineData], key_data_list: List[str], last_line: str)[source]: Sanitize line_data_list to keep only valuable values

set_barrier(line: str, start=0, end=8000)[source]: Detects barrier with offset of RE_PEM_BEGIN

credsweeper.utils.util module

class credsweeper.utils.util.Util[source]

Bases: object

Class that contains different useful methods.

MIN_DATA_ENTROPY: Dict[int, float] = {16: 1.66973671780348, 20: 2.07723544540831, 32: 3.25392803184602, 40: 3.64853567064867, 64: 4.57756933688035, 384: 7.39, 512: 7.55}

NOT_LATIN1_PRINTABLE_SET = {0, 1, 2, 3, 4, 5, 6, 7, 8, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 29, 30, 31, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159}

PEM_CLEANING_PATTERN = re.compile('\\\\[tnrvf]')

RANDOM_DATA = b'\xb9j\x12\xdfo*rW\x8e;\xbc\x8d\x829\x96\x0f\xa5c\x05\xa6'

WHITESPACE_TRANS_TABLE = {9: None, 10: None, 11: None, 12: None, 13: None, 32: None}

static check_pk(pkey: DHPrivateKey | Ed25519PrivateKey | Ed448PrivateKey | MLDSA44PrivateKey | MLDSA65PrivateKey | MLDSA87PrivateKey | MLKEM768PrivateKey | MLKEM1024PrivateKey | RSAPrivateKey | DSAPrivateKey | EllipticCurvePrivateKey | X25519PrivateKey | X448PrivateKey) → bool[source]: Check private key with encrypt-decrypt random data

static decode_base64(text: str, padding_safe: bool = False, urlsafe_detect=False) → bytes[source]: Decode text to bytes, optionally with padding detection and detection of URL-safe symbols
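
One way such a decoder could behave is sketched below with the standard base64 module. How Util.decode_base64 actually treats padding and URL-safe symbols is not described here, so the logic and the name decode_base64_sketch are assumptions.

```python
import base64


def decode_base64_sketch(text: str, padding_safe: bool = False, urlsafe_detect: bool = False) -> bytes:
    """Decode base64 text, optionally repairing padding and accepting '-' and '_'."""
    if padding_safe:
        # Normalize padding so the length becomes a multiple of four.
        text = text.rstrip('=')
        text += '=' * (-len(text) % 4)
    if urlsafe_detect and ('-' in text or '_' in text):
        # URL-safe alphabet: '-' replaces '+', '_' replaces '/'.
        return base64.b64decode(text, altchars=b'-_', validate=True)
    return base64.b64decode(text, validate=True)


print(decode_base64_sketch("aGVsbG8", padding_safe=True))
```

With validate=True, symbols outside the selected alphabet raise binascii.Error instead of being silently skipped.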

static decode_bytes(content: bytes | None, encodings: List[str] | None = None) → List[str][source]

Decode content using different encodings.

Try to decode the bytes with each encoding from the list “encodings” until decoding succeeds without any exceptions. UTF-16 requires a BOM.

Parameters:

content – raw data that might be text
encodings – supported encodings

Returns:

list of file rows in a suitable encoding from “encodings”. If none of the encodings match, an empty list is returned. An empty list is also returned when, after the last encoding, a 0 symbol is present in the lines anywhere but at the end.
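
The fallback over encodings can be sketched as follows. The default encoding list is an assumption, the zero-symbol check from the return description is left out for brevity, and decode_lines_sketch is a hypothetical name rather than the library routine.

```python
import codecs
from typing import List, Optional


def decode_lines_sketch(content: Optional[bytes], encodings: Optional[List[str]] = None) -> List[str]:
    """Return the lines of content for the first encoding that decodes cleanly."""
    if content is None:
        return []
    for encoding in encodings or ["utf_8", "utf_16", "latin_1"]:  # assumed defaults
        if codecs.lookup(encoding).name == "utf-16" \
                and not content.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
            # Python decodes BOM-less UTF-16 without complaint, so require the BOM explicitly.
            continue
        try:
            return content.decode(encoding).splitlines()
        except UnicodeError:
            continue  # try the next encoding
    return []


print(decode_lines_sketch(b"key = value\nother = 1"))
```

The order of the list matters: a permissive encoding such as latin_1 accepts any byte sequence, so it only makes sense as the last fallback.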

static decode_text(content: bytes | None, encodings: List[str] | None = None) → str | None[source]

Decode content using different encodings.

Try to decode the bytes with each encoding from the list “encodings” until decoding succeeds without any exceptions. UTF-16 requires a BOM.

Parameters:

content – raw data that might be text
encodings – supported encodings

Returns:

Decoded text as str for a suitable encoding, or None when binary data is detected

static extract_element_data(element: Any, attr: str) → str[source]

Extract xml element data to string.

Try to extract the xml data and strip() the string.

Parameters:

element – xml element
attr – attribute name

Returns:

String xml data with strip()

static get_asn1_size(data: bytes | bytearray) → int[source]: Only the sequence type 0x30 and the size correctness are checked. Returns the size of ASN1 data over 128 bytes, or 0 if there is no data of interest
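
A rough sketch of such a check for DER-encoded data follows. It assumes that the returned size is the total size including the header, that only the long length form (content of 128 bytes or more) counts as interesting, and that the buffer must hold the whole structure; none of these details are confirmed by the description, and asn1_sequence_size_sketch is a hypothetical name.

```python
def asn1_sequence_size_sketch(data: bytes) -> int:
    """Return the total size of a leading DER SEQUENCE, or 0 when it is not of interest."""
    # Tag byte, length byte and at least one length octet are required.
    if len(data) < 3 or data[0] != 0x30:
        return 0
    first = data[1]
    if not first & 0x80:
        return 0  # short form: content is under 128 bytes
    octets = first & 0x7F  # number of bytes that encode the length
    if octets == 0 or octets > 4 or len(data) < 2 + octets:
        return 0
    length = int.from_bytes(data[2:2 + octets], "big")
    total = 2 + octets + length
    if length < 128 or total > len(data):
        return 0  # inconsistent or truncated structure
    return total
```

For example, a buffer that starts with 0x30 0x81 0x80 declares 128 content bytes, so a complete structure occupies 131 bytes in this sketch.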

static get_chunks(line_len: int) → List[Tuple[int, int]][source]: Returns chunks positions for given line length

static get_excel_column_name(column_index: int) → str[source]: Converts index based column position into Excel style column name
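
The conversion is a bijective base-26 encoding. The sketch below assumes a zero-based index (0 maps to A), which the description does not state; excel_column_name_sketch is a hypothetical name.

```python
def excel_column_name_sketch(column_index: int) -> str:
    """Convert a zero-based column index (assumption) into an Excel-style name."""
    if column_index < 0:
        raise ValueError("column index must not be negative")
    name = ""
    n = column_index + 1  # switch to one-based numbering for bijective base 26
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        name = chr(ord("A") + remainder) + name
    return name


print([excel_column_name_sketch(i) for i in (0, 25, 26, 701, 702)])
```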

static get_extension(file_path: str, lower=True) → str[source]: Return the extension of the file, in lower case by default, e.g.: ‘.txt’, ‘.JPG’
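
A comparable helper can be sketched with os.path.splitext. This is an assumption about the behavior, not the library code, and get_extension_sketch is a hypothetical name.

```python
import os


def get_extension_sketch(file_path: str, lower: bool = True) -> str:
    """Return the last extension of a path, including the leading dot."""
    _, extension = os.path.splitext(file_path)
    return extension.lower() if lower else extension


print(get_extension_sketch("photos/Holiday.JPG"))
print(get_extension_sketch("photos/Holiday.JPG", lower=False))
```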

static get_min_data_entropy(x: int) → float[source]: Returns minimal entropy for size of random data. Precalculated data is applied for speedup

static get_regex_combine_or(re_strs: List[str]) → str[source]: Routine combination for regex ‘or’
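
Combining alternatives usually means joining the patterns with '|' inside groups, so that each alternative keeps its own precedence. The exact string produced by Util.get_regex_combine_or is not shown here; the sketch below is one plausible form, and regex_combine_or_sketch is a hypothetical name.

```python
import re
from typing import List


def regex_combine_or_sketch(re_strs: List[str]) -> str:
    """Join patterns into one alternation, wrapping each in a non-capturing group."""
    return "(?:" + "|".join(f"(?:{pattern})" for pattern in re_strs) + ")"


combined = regex_combine_or_sketch(["ab+", "cd"])
print(combined)
print(bool(re.fullmatch(combined, "abbb")))
```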

static get_shannon_entropy(data: str | bytes) → float[source]: Borrowed from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html.
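
For reference, Shannon entropy over the symbols observed in the data can be computed as in the sketch below. It measures bits per symbol from observed frequencies; the library version may iterate over a fixed alphabet instead, so shannon_entropy_sketch is only an illustration under that assumption.

```python
import math
from collections import Counter
from typing import Union


def shannon_entropy_sketch(data: Union[str, bytes]) -> float:
    """Return the Shannon entropy of data in bits per symbol."""
    if not data:
        return 0.0
    total = len(data)
    # Sum p * log2(1 / p) over the frequency p of every distinct symbol.
    return sum((count / total) * math.log2(total / count) for count in Counter(data).values())


print(shannon_entropy_sketch("aabb"))
```

Random-looking secrets tend to score higher than natural-language text of the same length, which is why entropy thresholds appear among the constants in this package.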

static get_type(file_path: str, lower=True) → str[source]: Return all extensions of the file, in lower case by default, e.g.: ‘.txt’, ‘.JPG’

static get_xml_from_lines(xml_lines: List[str]) → Tuple[List[str] | None, List[int] | None][source]

Parse xml data from list of string and return List of str.

Parameters: xml_lines – list of lines of xml data
Returns: List of formatted strings (f"{root.tag} : {root.text}")
Raises: xml exception

static is_ascii_entropy_validate(data: bytes) → bool[source]: Tests a small data sequence (<256) for randomness by checking for ASCII symbols and Shannon entropy. Returns True when the data consists of ASCII symbols or has small entropy

static is_binary(data: bytes | bytearray) → bool[source]: Returns True when a sequence of two zero bytes is found at the beginning of the data. The sequence never exists in text format (UTF-8, UTF-16). UTF-32 is not supported.
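
The heuristic can be sketched in one line. The size of the inspected prefix is not given in the description, so head_size is a hypothetical parameter and is_binary_sketch a hypothetical name.

```python
def is_binary_sketch(data: bytes, head_size: int = 1024) -> bool:
    """Report binary data when two consecutive zero bytes occur near the start."""
    return b"\x00\x00" in data[:head_size]


print(is_binary_sketch("hello".encode("utf_16")))  # UTF-16 text has no double zero here
print(is_binary_sketch("hello".encode("utf_32")))  # UTF-32 text is misreported as binary
```

The second call shows why UTF-32 is listed as unsupported: its encoding of ASCII text contains runs of zero bytes.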

static is_latin1(data: bytes | bytearray) → bool[source]: Returns True when data looks like LATIN-1 for first MAX_LINE_LENGTH bytes.

static json_dump(obj: Any, file_path: str | Path, encoding='utf_8', indent=4) → None[source]: Write dictionary to JSON file

static json_load(file_path: str | Path, encoding='utf_8') → Any[source]: Load dictionary from JSON file

static load_pk(data: bytes, password: bytes | None = None) → DHPrivateKey | Ed25519PrivateKey | Ed448PrivateKey | MLDSA44PrivateKey | MLDSA65PrivateKey | MLDSA87PrivateKey | MLKEM768PrivateKey | MLKEM1024PrivateKey | RSAPrivateKey | DSAPrivateKey | EllipticCurvePrivateKey | X25519PrivateKey | X448PrivateKey | None[source]: Try to load private key from PKCS1, PKCS8 and PKCS12 formats

static parse_python(source: str) → List[Any][source]: Parse Python source and convert it back to remove string merges and line wraps
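
The effect of such a round trip can be shown with the standard ast module (ast.unparse needs Python 3.9 or later). Whether Util.parse_python works this way, and what exactly its returned list holds, is an assumption; normalize_python_sketch is a hypothetical name.

```python
import ast
from typing import List


def normalize_python_sketch(source: str) -> List[str]:
    """Parse source and unparse it, so merged strings and wrapped lines are flattened."""
    tree = ast.parse(source)
    return ast.unparse(tree).splitlines()


wrapped = 'password = ("abc"\n            "def")\n'
print(normalize_python_sketch(wrapped))
```

Implicitly concatenated string literals become a single constant during parsing, so a value split across several source lines ends up on one line after the round trip.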

static read_data(path: str | Path) → bytes | None[source]

Read the file bytes as is.

Try to read the data of the file.

Parameters: path – path to file
Returns: the file content as bytes, or None if the data could not be read

static read_file(path: str | Path, encodings: List[str] | None = None) → List[str][source]

Read the file content using different encodings.

Try to read the contents of the file with each encoding from the list “encodings”. As soon as reading succeeds without any exceptions, the data is returned in the current encoding.

Parameters:

path – path to file
encodings – supported encodings

Returns:

list of file rows in a suitable encoding from “encodings”, if none of the encodings match, an empty list will be returned

static split_text(text: str) → List[str][source]: Splits a text into lines, handling all common line endings (e.g., LF, CRLF, CR).
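
A minimal sketch of splitting on the three common line endings is shown below. It uses a regular expression instead of str.splitlines(), which would also split on less common separators such as form feed; the library may take a different approach, and split_text_sketch is a hypothetical name.

```python
import re
from typing import List

# CRLF must come first so that it is consumed as a single line ending.
LINE_ENDINGS = re.compile(r"\r\n|\n|\r")


def split_text_sketch(text: str) -> List[str]:
    """Split text on CRLF, LF or CR."""
    return LINE_ENDINGS.split(text)


print(split_text_sketch("a\r\nb\nc\rd"))
```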

static subtext(text: str, pos: int, hunk_size: int) → str[source]: Cut the text symmetrically around the given position, or use the remaining quota so that the result fits in 2x hunk_size
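
The windowing described here can be sketched as follows: take hunk_size symbols on each side of pos and, when the window runs past one edge of the text, shift it so the unused quota is spent on the other side. Any extra trimming done by the library is not modeled, and subtext_sketch is a hypothetical name.

```python
def subtext_sketch(text: str, pos: int, hunk_size: int) -> str:
    """Return at most 2 * hunk_size symbols of text around pos."""
    window = 2 * hunk_size
    if len(text) <= window:
        return text
    start = max(0, pos - hunk_size)
    end = start + window
    if end > len(text):
        # Not enough text on the right: move the window to the left.
        end = len(text)
        start = end - window
    return text[start:end]


print(subtext_sketch("0123456789", 5, 2))
print(subtext_sketch("0123456789", 0, 2))
```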

static yaml_dump(obj: Any, file_path: str | Path, encoding='utf_8') → None[source]: Write dictionary to YAML file

static yaml_load(file_path: str | Path, encoding='utf_8') → Any[source]: Load dictionary from YAML file
