utils package

Submodules

utils.util module

class credsweeper.utils.util.DiffDict(_typename, _fields=None, /, **kwargs)

Bases: dict

hunk: str
line: str
new: int
old: int
class credsweeper.utils.util.DiffRowData(line_type, line_numb, line)[source]

Bases: object

Class for keeping data of diff row.

line: str
line_numb: int
line_type: str
class credsweeper.utils.util.Util[source]

Bases: object

Class that contains different useful methods.

static decode_bytes(content, encodings=('utf8', 'utf16', 'latin_1'))[source]

Decode content using different encodings.

Try each encoding from “encodings” in order and return the result of the first one that decodes without any exceptions. UTF-16 requires a BOM.

Parameters:
  • content (bytes) – raw data that might be text

  • encodings (Tuple[str, ...]) – supported encodings

Return type:

List[str]

Returns:

list of rows decoded with the first suitable encoding from “encodings”. If none of the encodings match, an empty list is returned.
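The fallback order described above can be sketched with the standard library alone. This is a minimal illustration, not the credsweeper implementation: the name decode_bytes_sketch, the use of splitlines() and the way the BOM requirement is enforced are all assumptions.

```python
import codecs
from typing import List, Tuple

# Byte order marks that make UTF-16 data recognisable.
UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def decode_bytes_sketch(
    content: bytes,
    encodings: Tuple[str, ...] = ("utf8", "utf16", "latin_1"),
) -> List[str]:
    """Hypothetical helper: return rows from the first encoding that decodes cleanly."""
    for encoding in encodings:
        normalized = encoding.replace("-", "").replace("_", "").lower()
        # Assumption: UTF-16 is only attempted when the data starts with a BOM.
        if normalized == "utf16" and not content.startswith(UTF16_BOMS):
            continue
        try:
            text = content.decode(encoding)
        except UnicodeError:
            # This encoding does not fit, try the next one.
            continue
        return text.splitlines()
    # None of the encodings matched.
    return []
```

Because latin_1 maps every byte to a character, keeping it last in the tuple makes it a catch-all fallback in this sketch.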

default_encodings: Tuple[str, ...] = ('utf8', 'utf16', 'latin_1')
static get_extension(file_path, lower=True)[source]

Return the extension of the file, in lower case by default, e.g.: ‘.txt’ (or ‘.JPG’ when lower is False)

Return type:

str
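A rough equivalent of this behaviour can be written with os.path.splitext. The function name get_extension_sketch is hypothetical, and the real method may treat edge cases (dotfiles, multiple dots) differently.

```python
import os


def get_extension_sketch(file_path: str, lower: bool = True) -> str:
    """Hypothetical helper: return the extension including the leading dot."""
    _, extension = os.path.splitext(file_path)
    # Lower-casing is optional so that callers can keep the original spelling.
    return extension.lower() if lower else extension
```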

static get_keyword_pattern(keyword, separator='=|:=|:')[source]

Returns compiled regex pattern

Return type:

Pattern

static get_regex_combine_or(regex_strs)[source]

Combine several regex strings with ‘or’ (alternation)

Return type:

str
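One way to combine patterns with alternation is shown below. It is only a sketch: the name combine_or_sketch and the non-capturing groups around each alternative are assumptions, and the string produced by the real method may look different.

```python
import re
from typing import List


def combine_or_sketch(regex_strs: List[str]) -> str:
    """Hypothetical helper: join patterns so that any one of them may match."""
    # Each alternative is wrapped in a non-capturing group so that an inner
    # '|' cannot leak into the neighbouring patterns.
    alternatives = "|".join(f"(?:{regex}))" [:-1] for regex in regex_strs)
    return f"(?:{alternatives})"


# The combined string can be compiled like any other pattern.
COMBINED = re.compile(combine_or_sketch(["foo", "ba[rz]"]))
```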

static get_shannon_entropy(data, iterator)[source]

Borrowed from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html.

Return type:

float
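For orientation, Shannon entropy over a fixed alphabet can be computed as below. This sketch assumes that the “iterator” argument is the set of characters to count (for example a base64 or hex alphabet); the name shannon_entropy_sketch is hypothetical.

```python
import math


def shannon_entropy_sketch(data: str, alphabet: str) -> float:
    """Hypothetical helper: entropy in bits per character over the given alphabet."""
    if not data:
        return 0.0
    entropy = 0.0
    for char in alphabet:
        # Share of the data made up of this character.
        probability = data.count(char) / len(data)
        if probability > 0:
            entropy -= probability * math.log2(probability)
    return entropy
```

A string that uses a single character has zero entropy, while one that spreads evenly over the alphabet approaches log2 of the alphabet size.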

static get_xml_data(file_path)[source]

Read XML data and return a list of strings.

Try to read the XML data and return formatted strings.

Parameters:

file_path (str) – path of xml file

Returns:

List of formatted strings (f"{root.tag} : {root.text}")
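The tag and text formatting can be illustrated with xml.etree.ElementTree. This is a sketch under assumptions: it parses a string instead of a file path, walks every element in document order, and the name xml_lines_sketch is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def xml_lines_sketch(xml_text: str) -> List[str]:
    """Hypothetical helper: one 'tag : text' string per XML element."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        # Elements without text content yield an empty value.
        text = (element.text or "").strip()
        lines.append(f"{element.tag} : {text}")
    return lines
```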

static is_entropy_validate(data)[source]

Verifies data entropy with base64, base36 and base16(hex)

Return type:

bool

static is_gzip(data)[source]

According to https://www.rfc-editor.org/rfc/rfc1952

Return type:

bool
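RFC 1952 defines the first two bytes of a gzip member as 0x1f 0x8b, followed by the compression method (8 for deflate). A check along those lines might look like this; whether the real method also inspects the compression method byte is an assumption, and is_gzip_sketch is a hypothetical name.

```python
import gzip


def is_gzip_sketch(data: bytes) -> bool:
    """Hypothetical helper: test for the gzip magic bytes and deflate method."""
    return (
        len(data) >= 3
        and data[0] == 0x1F  # ID1
        and data[1] == 0x8B  # ID2
        and data[2] == 0x08  # CM, deflate
    )


# Sample data produced by the standard library for a quick check.
SAMPLE_GZIP = gzip.compress(b"hello")
```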

static is_zip(data)[source]

According to https://en.wikipedia.org/wiki/List_of_file_signatures

Return type:

bool
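ZIP archives are commonly recognised by a “PK” signature at the start of the data. The sketch below checks the three signatures usually listed for ZIP (regular, empty and spanned archives); which of them the real method accepts is an assumption, and the names used here are hypothetical.

```python
import io
import zipfile

# Local file header, end of central directory (empty archive), spanned archive.
ZIP_SIGNATURES = (b"PK\x03\x04", b"PK\x05\x06", b"PK\x07\x08")


def is_zip_sketch(data: bytes) -> bool:
    """Hypothetical helper: test whether the data starts with a ZIP signature."""
    return data.startswith(ZIP_SIGNATURES)


def make_sample_zip() -> bytes:
    """Build a small in-memory archive to try the check on."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("a.txt", "hi")
    return buffer.getvalue()
```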

static json_dump(obj, file_path, encoding='utf8', indent=4)[source]

Write a dictionary to a JSON file

Return type:

None

static json_load(file_path, encoding='utf8')[source]

Load a dictionary from a JSON file

Return type:

Any
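The two JSON helpers form a round trip. The sketch below mirrors the documented parameters with the standard json module; the function names are hypothetical, and the real methods may differ, for example in error handling.

```python
import json
import os
import tempfile
from typing import Any


def json_dump_sketch(obj: Any, file_path: str, encoding: str = "utf8", indent: int = 4) -> None:
    """Hypothetical helper: write obj to file_path as indented JSON."""
    with open(file_path, "w", encoding=encoding) as handle:
        json.dump(obj, handle, indent=indent)


def json_load_sketch(file_path: str, encoding: str = "utf8") -> Any:
    """Hypothetical helper: read JSON data back from file_path."""
    with open(file_path, "r", encoding=encoding) as handle:
        return json.load(handle)


def json_round_trip(obj: Any) -> Any:
    """Dump obj to a temporary file and load it again."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.json")
        json_dump_sketch(obj, path)
        return json_load_sketch(path)
```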

static patch2files_diff(raw_patch, change_type)[source]

Generate file changes from a patch for added or deleted file paths.

Parameters:
  • raw_patch (List[str]) – git patch file content

  • change_type (str) – change type to select, “added” or “deleted”

Return type:

Dict[str, List[DiffDict]]

Returns:

dict of {file path: list of file row changes}, where each element of the list of file row changes is represented as:

{
    "old": line number before diff,
    "new": line number after diff,
    "line": line text,
    "hunk": diff hunk number
}
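To make the structure above concrete, here is a simplified parser that collects only added lines from a git patch. It is a sketch, not the credsweeper algorithm: the name added_lines_sketch is hypothetical, “hunk” is stored as a running counter per file, and “old” holds the current position in the old file. All three are assumptions.

```python
import re
from typing import Dict, List

# Matches hunk headers such as "@@ -1,2 +1,3 @@".
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_lines_sketch(raw_patch: List[str]) -> Dict[str, List[dict]]:
    """Hypothetical helper: map file path -> rows added by the patch."""
    result: Dict[str, List[dict]] = {}
    path = None
    old = new = hunk = 0
    in_hunk = False
    for line in raw_patch:
        if line.startswith("diff --git "):
            # A new file section starts, so hunk numbering restarts.
            in_hunk, hunk = False, 0
            continue
        if not in_hunk and line.startswith("+++ "):
            target = line[4:].strip()
            path = target[2:] if target.startswith("b/") else target
            continue
        match = HUNK_RE.match(line)
        if match:
            old, new = int(match.group(1)), int(match.group(2))
            hunk += 1
            in_hunk = True
            continue
        if not in_hunk or path is None:
            continue
        if line.startswith("+"):
            row = {"old": old, "new": new, "line": line[1:], "hunk": hunk}
            result.setdefault(path, []).append(row)
            new += 1
        elif line.startswith("-"):
            old += 1
        elif line.startswith(" "):
            # Context lines advance both sides of the diff.
            old += 1
            new += 1
    return result


SAMPLE_PATCH = [
    "diff --git a/app.py b/app.py",
    "--- a/app.py",
    "+++ b/app.py",
    "@@ -1,2 +1,3 @@",
    " import os",
    "+password = 'hunter2'",
    " print(os.name)",
]
```

Tracking the old and new line counters separately is what lets each reported row carry its position after the diff.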

static preprocess_file_diff(changes)[source]

Generate changed file rows from diff data with changed lines (e.g. marked + or - in diff).

Parameters:

changes (List[DiffDict]) – git diff by file rows data

Return type:

List[DiffRowData]

Returns:

diff row data as a list of entries with row change type, line number and row content

static read_data(path)[source]

Read the file bytes as is.

Try to read the data of the file.

Parameters:

path (str) – path to file

Return type:

Optional[bytes]

Returns:

the file content as bytes, or None if the file could not be read
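The Optional[bytes] return type suggests a read that reports failure with None instead of raising. A minimal sketch of that pattern follows; the name read_data_sketch and the choice to catch OSError are assumptions.

```python
import os
import tempfile
from typing import Optional, Tuple


def read_data_sketch(path: str) -> Optional[bytes]:
    """Hypothetical helper: return the raw bytes of a file, or None on failure."""
    try:
        with open(path, "rb") as handle:
            return handle.read()
    except OSError:
        # Missing file, permission problem and similar errors.
        return None


def read_data_demo() -> Tuple[Optional[bytes], Optional[bytes]]:
    """Read one existing and one missing file from a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        existing = os.path.join(tmp, "blob.bin")
        with open(existing, "wb") as handle:
            handle.write(b"\x00\x01token")
        missing = os.path.join(tmp, "missing.bin")
        return read_data_sketch(existing), read_data_sketch(missing)
```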

static read_file(path, encodings=('utf8', 'utf16', 'latin_1'))[source]

Read the file content using different encodings.

Try to read the contents of the file with each encoding from “encodings” in turn. As soon as reading succeeds without any exceptions, the data is returned in that encoding.

Parameters:
  • path (str) – path to file

  • encodings (Tuple[str, ...]) – supported encodings

Return type:

List[str]

Returns:

list of file rows decoded with the first suitable encoding from “encodings”. If none of the encodings match, an empty list is returned.

static yaml_dump(obj, file_path, encoding='utf8')[source]

Write a dictionary to a YAML file

Return type:

None

static yaml_load(file_path, encoding='utf8')[source]

Load a dictionary from a YAML file

Return type:

Any

Module contents