credsweeper.utils package

Submodules

credsweeper.utils.entropy_validator module

class credsweeper.utils.entropy_validator.EntropyValidator(data, iterator=None)[source]

Bases: object

Verifies data entropy against the base64, base36 and base16 (hex) character sets

CHARS_LIMIT_MAP = {<Chars.BASE64_CHARS: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/='>: 4.5, <Chars.BASE36_CHARS: 'abcdefghijklmnopqrstuvwxyz1234567890'>: 3, <Chars.HEX_CHARS: '0123456789ABCDEFabcdef'>: 3}

property entropy: float | None

Entropy value of the successful check, or the maximal value found if no check succeeded

property iterator: str | None

The iterator that was used to compute the entropy

to_dict()[source]

Dictionary representation of the object

Return type:

dict

property valid: bool | None

Shows whether validation was successful

credsweeper.utils.util module

class credsweeper.utils.util.DiffDict

Bases: TypedDict

hunk: Any
line: Union[str, bytes]
new: Optional[int]
old: Optional[int]

class credsweeper.utils.util.DiffRowData(line_type, line_numb, line)[source]

Bases: object

Class for keeping data of diff row.

line: str
line_numb: int
line_type: DiffRowType

class credsweeper.utils.util.Util[source]

Bases: object

Class that contains different useful methods.

MIN_DATA_ENTROPY: Dict[int, float] = {16: 1.66973671780348, 20: 2.07723544540831, 32: 3.25392803184602, 40: 3.64853567064867, 64: 4.57756933688035}

static ast_to_dict(node)[source]

Recursively parses the AST of Python source into a list of strings

Return type:

List[Any]

static decode_bytes(content, encodings=None)[source]

Decode content using different encodings.

Tries to decode the bytes with each encoding from the list “encodings” in turn and returns the result of the first one that succeeds without any exceptions. UTF-16 requires a BOM.

Parameters:
  • content (bytes) – raw data that might be text

  • encodings (Optional[List[str]]) – supported encodings

Return type:

List[str]

Returns:

list of file rows in a suitable encoding from “encodings”. If none of the encodings match, an empty list is returned. An empty list is also returned if, after the last encoding has been tried, a zero symbol is present in the lines anywhere other than at the end.
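The fallback behaviour described above can be sketched with the standard library alone. This is a hedged illustration, not the library's implementation: `decode_lines` and `DEFAULT_ENCODINGS` are hypothetical names, the default list is invented for the example, and the zero-symbol check is a simplified reading of the rule stated above. The BOM requirement for UTF-16 is not modelled.

```python
from typing import List, Optional

# Hypothetical defaults; the library's real default list is not shown on this page.
DEFAULT_ENCODINGS = ["utf_8", "latin_1"]


def decode_lines(content: bytes, encodings: Optional[List[str]] = None) -> List[str]:
    """Decode bytes with the first encoding that works and split the text into rows."""
    for encoding in encodings or DEFAULT_ENCODINGS:
        try:
            text = content.decode(encoding, errors="strict")
        except UnicodeError:
            continue  # this encoding does not fit, try the next one
        if "\0" in text.rstrip("\0"):
            continue  # a zero symbol before the end suggests binary data
        # Normalise line endings before splitting into rows.
        return text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    return []  # no encoding matched
```

Note that a single-byte encoding such as latin_1 accepts any byte sequence, so in this sketch it only makes sense as the last candidate.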

static get_extension(file_path, lower=True)[source]

Returns the extension of the file, in lower case by default, e.g. ‘.txt’; with lower=False the original case is kept, e.g. ‘.JPG’

Return type:

str
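A minimal sketch of the same idea using `os.path.splitext`; the helper name `file_extension` is hypothetical and the library may compute the extension differently.

```python
import os


def file_extension(file_path: str, lower: bool = True) -> str:
    """Return the last extension of a path, including the leading dot."""
    _, extension = os.path.splitext(str(file_path))
    return extension.lower() if lower else extension
```

In this sketch a path without a dot yields an empty string, and only the last suffix is returned for names such as `archive.tar.gz`.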

static get_keyword_pattern(keyword, separator='=|:=|:')[source]

Returns a compiled regex pattern for the given keyword and separator

Return type:

Pattern

static get_min_data_entropy(x)[source]

Returns the minimal entropy for random data of the given size. Precalculated values are used for speed.

Return type:

float

static get_regex_combine_or(re_strs)[source]

Combines regex strings into a single regex ‘or’ expression

Return type:

str
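One common way to build such an alternation is shown below. It is a sketch under assumptions: `combine_or` is a hypothetical name, and the use of a non-capturing group and the skipping of empty fragments are choices made for the example, not confirmed details of the library.

```python
import re
from typing import List


def combine_or(re_strs: List[str]) -> str:
    """Join non-empty regex fragments into one non-capturing alternation."""
    return "(?:" + "|".join(fragment for fragment in re_strs if fragment) + ")"


pattern = re.compile(combine_or(["foo", "ba[rz]"]))
```

Wrapping the result in `(?:...)` keeps the `|` from spilling into whatever pattern the string is later embedded in.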

static get_shannon_entropy(data, iterator)[source]

Borrowed from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html.

Return type:

float
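For reference, Shannon entropy over a set of symbols is H = -sum(p_x * log2(p_x)), where p_x is the share of symbol x in the data. The sketch below assumes that `iterator` is the set of symbols to sum over, which matches the character sets in CHARS_LIMIT_MAP; the helper name `shannon_entropy` is hypothetical.

```python
import math


def shannon_entropy(data: str, alphabet: str) -> float:
    """Entropy in bits per character; symbols outside `alphabet` add nothing to the sum."""
    if not data:
        return 0.0
    entropy = 0.0
    for char in alphabet:
        p = data.count(char) / len(data)  # share of this symbol in the data
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy
```

The choice of alphabet matters: for the data "abab" the entropy is 1.0 over the alphabet "ab" but only 0.5 over the alphabet "a", because the second symbol is not counted.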

static get_xml_from_lines(xml_lines)[source]

Parse xml data from a list of strings and return a list of str.

Parameters:

xml_lines (List[str]) – list of lines of xml data

Returns:

list of formatted strings (f"{root.tag} : {root.text}")

Return type:

List[str]

Raises:

xml exception
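A rough sketch of this transformation with the standard library parser. The helper name `xml_lines_to_pairs` is hypothetical, the "tag : text" output format is an assumption about the return value described above, and skipping elements without text is a choice made for the example. The library may use a different XML parser.

```python
import xml.etree.ElementTree as ET
from typing import List


def xml_lines_to_pairs(xml_lines: List[str]) -> List[str]:
    """Parse the joined lines and return 'tag : text' for every element with text."""
    root = ET.fromstring("".join(xml_lines))  # raises ET.ParseError on malformed XML
    pairs = []
    for element in root.iter():  # the root itself and all descendants, in document order
        text = (element.text or "").strip()
        if text:
            pairs.append(f"{element.tag} : {text}")
    return pairs
```

The standard library parser is not hardened against maliciously constructed XML, so a sketch like this is only suitable for trusted input.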

static is_ascii_entropy_validate(data)[source]

Tests a small data sequence (<256) for randomness by checking for ASCII and Shannon entropy. Returns True when the data consists of ASCII symbols or has small entropy.

Return type:

bool

static is_binary(data)[source]

Returns True if any recognized binary format is found, or if a sequence of two zero bytes is found, which is not expected in text formats (UTF-8, UTF-16). UTF-32 is not supported.

Return type:

bool
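The zero-byte part of this heuristic can be illustrated in a few lines. `has_zero_pair` is a hypothetical helper, not the library function, and it covers only the "two zeroes" rule, not the format signatures.

```python
def has_zero_pair(data: bytes) -> bool:
    """True if the data contains two consecutive zero bytes."""
    return b"\x00\x00" in data


utf8_sample = "hello".encode("utf_8")
utf16_sample = "hello".encode("utf_16")  # BOM plus two bytes per ASCII character
utf32_sample = "hello".encode("utf_32")  # four bytes per character, mostly zeros
```

ASCII text encoded as UTF-8 or UTF-16 passes this check, while the same text in UTF-32 contains runs of zero bytes and would be classified as binary, which is consistent with the note that UTF-32 is not supported.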

static is_bzip2(data)[source]

According to https://en.wikipedia.org/wiki/Bzip2

Return type:

bool

static is_elf(data)[source]

According to https://en.wikipedia.org/wiki/Executable_and_Linkable_Format; only 5 bytes are used

Return type:

bool

static is_gzip(data)[source]

According to https://www.rfc-editor.org/rfc/rfc1952

Return type:

bool

static is_pdf(data)[source]

According to https://en.wikipedia.org/wiki/List_of_file_signatures - pdf

Return type:

bool

static is_tar(data)[source]

According to https://en.wikipedia.org/wiki/List_of_file_signatures

Return type:

bool

static is_zip(data)[source]

According to https://en.wikipedia.org/wiki/List_of_file_signatures

Return type:

bool
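The format checks above are signature ("magic number") tests. The sketch below shows the general technique with well-known signatures taken from the references cited above. It is not the library's code: `SIGNATURES` and `detect_format` are hypothetical names, and the exact bytes each `is_*` method compares may differ (for example, the ELF check above uses 5 bytes while this sketch uses the 4-byte magic).

```python
from typing import Optional

# (offset, magic bytes) per format.
SIGNATURES = {
    "bzip2": (0, b"BZh"),
    "gzip": (0, b"\x1f\x8b\x08"),  # ID1, ID2 and the deflate method byte (RFC 1952)
    "pdf": (0, b"%PDF-"),
    "zip": (0, b"PK\x03\x04"),  # local file header; empty archives start differently
    "elf": (0, b"\x7fELF"),
    "tar": (257, b"ustar"),  # the tar magic sits inside the first 512-byte header
}


def detect_format(data: bytes) -> Optional[str]:
    """Return the name of the first matching signature, or None."""
    for name, (offset, magic) in SIGNATURES.items():
        if data[offset:offset + len(magic)] == magic:
            return name
    return None
```

Slicing past the end of short data yields a shorter byte string, so the comparison simply fails instead of raising.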

static json_dump(obj, file_path, encoding='utf_8', indent=4)[source]

Write dictionary to json file

Return type:

None

static json_load(file_path, encoding='utf_8')[source]

Load dictionary from json file

Return type:

Any
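A minimal round-trip sketch with the standard `json` module, mirroring the parameters of `json_dump` and `json_load` above. The names `dump_json`, `load_json` and `round_trip` are hypothetical, and the library's error handling (if any) is not modelled here.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Union


def dump_json(obj: Any, file_path: Union[str, Path], encoding: str = "utf_8", indent: int = 4) -> None:
    """Write an object to a JSON file."""
    with open(file_path, "w", encoding=encoding) as handle:
        json.dump(obj, handle, indent=indent)


def load_json(file_path: Union[str, Path], encoding: str = "utf_8") -> Any:
    """Read an object from a JSON file."""
    with open(file_path, "r", encoding=encoding) as handle:
        return json.load(handle)


def round_trip(obj: Any) -> Any:
    """Write the object to a temporary file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "report.json"
        dump_json(obj, path)
        return load_json(path)
```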

static parse_python(source)[source]

Parses Python source into a list of strings and assignments

Return type:

List[Any]
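The general idea of extracting strings and assignments from Python source can be sketched with the standard `ast` module. This is an illustration only: `StringCollector` and `collect_strings` are hypothetical names, and the output format ("name = value" for simple assignments, bare text for other literals) is an assumption, not the structure the library returns.

```python
import ast
from typing import List


class StringCollector(ast.NodeVisitor):
    """Collect string literals and simple `name = "value"` assignments."""

    def __init__(self) -> None:
        self.found: List[str] = []

    def visit_Assign(self, node: ast.Assign) -> None:
        names = [target.id for target in node.targets if isinstance(target, ast.Name)]
        value = node.value
        if names and isinstance(value, ast.Constant) and isinstance(value.value, str):
            # Simple assignment of a string literal to one or more names.
            self.found.extend(f"{name} = {value.value}" for name in names)
        else:
            self.generic_visit(node)  # look for literals nested in the assignment

    def visit_Constant(self, node: ast.Constant) -> None:
        if isinstance(node.value, str):
            self.found.append(node.value)


def collect_strings(source: str) -> List[str]:
    """Parse the source and return the collected items in source order."""
    collector = StringCollector()
    collector.visit(ast.parse(source))
    return collector.found
```

Keeping the variable name next to the value is useful for credential scanning, because a name such as `password` is itself a strong hint.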

static patch2files_diff(raw_patch, change_type)[source]

Generates file changes from a patch for added or deleted file paths.

Parameters:
  • raw_patch (List[str]) – git patch file content

  • change_type (DiffRowType) – change type to select, DiffRowType.ADDED or DiffRowType.DELETED

Return type:

Dict[str, List[DiffDict]]

Returns:

dict of {file path: list of file row changes}, where each element of the list of file row changes is represented as:

{
    "old": line number before diff,
    "new": line number after diff,
    "line": line text,
    "hunk": diff hunk number
}
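To make the returned structure concrete, here is a small hand-written parser for unified diffs that collects added lines only. It is a sketch under assumptions and not how the library parses patches: `Row`, `HUNK_RE` and `added_rows` are hypothetical names, "old" is assumed to be None for added lines, and hunks are assumed to be numbered from 1 within each file.

```python
import re
from typing import Dict, List, Optional, TypedDict


class Row(TypedDict):
    old: Optional[int]
    new: Optional[int]
    line: str
    hunk: int


HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def added_rows(patch_lines: List[str]) -> Dict[str, List[Row]]:
    """Map each file path to the rows added by the patch."""
    files: Dict[str, List[Row]] = {}
    path: Optional[str] = None
    hunk = 0
    old_no = new_no = 0
    old_left = new_left = 0  # rows still expected in the current hunk
    for line in patch_lines:
        if old_left > 0 or new_left > 0:  # inside a hunk body
            if line.startswith("+"):
                row: Row = {"old": None, "new": new_no, "line": line[1:], "hunk": hunk}
                files.setdefault(path or "", []).append(row)
                new_no += 1
                new_left -= 1
            elif line.startswith("-"):
                old_no += 1
                old_left -= 1
            elif not line.startswith("\\"):  # context row, not "\ No newline ..."
                old_no += 1
                new_no += 1
                old_left -= 1
                new_left -= 1
            continue
        if line.startswith("+++ "):  # header naming the new version of the file
            path = line[4:].strip()
            if path.startswith("b/"):
                path = path[2:]
            hunk = 0
            continue
        match = HUNK_RE.match(line)
        if match and path is not None:
            hunk += 1
            old_no, old_left = int(match[1]), int(match[2] or 1)
            new_no, new_left = int(match[3]), int(match[4] or 1)
    return files
```

Tracking the row counts from the `@@` header is what lets the parser tell an added line that happens to start with `++` from a real `+++` file header.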

static preprocess_diff_rows(added_line_number, deleted_line_number, line)[source]

Auxiliary function to extend diff changes.

Parameters:
  • added_line_number (Optional[int]) – number of added line or None

  • deleted_line_number (Optional[int]) – number of deleted line or None

  • line (str) – the text line

Return type:

List[DiffRowData]

Returns:

diff row data as a list of row change type, line number and row content

static preprocess_file_diff(changes)[source]

Generate changed file rows from diff data with changed lines (e.g. marked + or - in diff).

Parameters:

changes (List[DiffDict]) – git diff by file rows data

Return type:

List[DiffRowData]

Returns:

diff row data as a list of row change type, line number and row content

static read_data(path)[source]

Read the file bytes as is.

Try to read the data of the file.

Parameters:

path (Union[str, Path]) – path to file

Return type:

Optional[bytes]

Returns:

the file content as bytes, or None if the data could not be read

static read_file(path, encodings=None)[source]

Read the file content using different encodings.

Tries to read the contents of the file with each encoding from the list “encodings” in turn. As soon as reading succeeds without any exceptions, the data is returned in that encoding.

Parameters:
  • path (Union[str, Path]) – path to file

  • encodings (Optional[List[str]]) – supported encodings

Return type:

List[str]

Returns:

list of file rows in a suitable encoding from “encodings”, if none of the encodings match, an empty list will be returned

static wrong_change(change)[source]

Returns True if the change is wrong

Return type:

bool

static yaml_dump(obj, file_path, encoding='utf_8')[source]

Write dictionary to yaml file

Return type:

None

static yaml_load(file_path, encoding='utf_8')[source]

Load dictionary from yaml file

Return type:

Any

Module contents