credsweeper.utils package¶
Submodules¶
credsweeper.utils.entropy_validator module¶
- class credsweeper.utils.entropy_validator.EntropyValidator(data, iterator=None)[source]¶
Bases:
object
Verifies data entropy with base64, base36 and base16 (hex)
- CHARS_LIMIT_MAP = {<Chars.BASE64_CHARS: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/='>: 4.5, <Chars.BASE36_CHARS: 'abcdefghijklmnopqrstuvwxyz1234567890'>: 3, <Chars.HEX_CHARS: '0123456789ABCDEFabcdef'>: 3}¶
credsweeper.utils.util module¶
- class credsweeper.utils.util.DiffDict¶
Bases:
TypedDict
- class credsweeper.utils.util.DiffRowData(line_type, line_numb, line)[source]¶
Bases:
object
Class for keeping the data of a diff row.
-
line_type:
DiffRowType¶
- class credsweeper.utils.util.Util[source]¶
Bases:
object
Class that contains different useful methods.
-
MIN_DATA_ENTROPY:
Dict[int,float] = {16: 1.66973671780348, 20: 2.07723544540831, 32: 3.25392803184602, 40: 3.64853567064867, 64: 4.57756933688035}¶
- static decode_bytes(content, encodings=None)[source]¶
Decode content using different encodings.
Try to decode the bytes with each encoding from the list “encodings” in turn; as soon as decoding occurs without any exceptions, that encoding is used. UTF-16 requires a BOM.
- Parameters:
- Return type:
- Returns:
list of file rows in a suitable encoding from “encodings”. If none of the encodings match, an empty list is returned. An empty list is also returned if, after the last encoding, a 0 symbol is present in the lines anywhere other than at the end.
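The fallback over encodings described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not CredSweeper's implementation: the name `decode_bytes_sketch` and the default list `ASSUMED_ENCODINGS` are hypothetical, and neither the BOM requirement for UTF-16 nor the zero-symbol check is modeled.

```python
from typing import List, Optional

# Hypothetical default list; the encodings CredSweeper actually tries are not stated here.
ASSUMED_ENCODINGS = ["utf_8", "latin_1"]


def decode_bytes_sketch(content: bytes, encodings: Optional[List[str]] = None) -> List[str]:
    """Return the lines of `content` decoded with the first encoding that raises no error."""
    for encoding in encodings or ASSUMED_ENCODINGS:
        try:
            text = content.decode(encoding)
        except UnicodeError:
            # This encoding does not fit the data; try the next one.
            continue
        return text.splitlines()
    # None of the encodings matched.
    return []
```

For example, `decode_bytes_sketch(b"\xff", ["utf_8"])` yields an empty list because the byte is not valid UTF-8, while adding `"latin_1"` to the list lets the decoding succeed.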
- static get_extension(file_path, lower=True)[source]¶
Return the extension of the file, in lower case by default, e.g.: ‘.txt’, ‘.JPG’
- Return type:
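A comparable result can be obtained with the standard library, as in the sketch below. It assumes that "extension" means the final suffix including the leading dot; the name `get_extension_sketch` is hypothetical, and the real method may treat edge cases (such as paths without a suffix) differently.

```python
import os


def get_extension_sketch(file_path: str, lower: bool = True) -> str:
    """Return the final extension of `file_path`, including the leading dot."""
    _, extension = os.path.splitext(file_path)
    # Lower-casing is applied by default, mirroring the `lower=True` parameter.
    return extension.lower() if lower else extension
```

With this sketch, `"photo.JPG"` gives `".jpg"` by default and `".JPG"` when `lower=False`.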
- static get_keyword_pattern(keyword, separator='=|:=|:')[source]¶
Returns compiled regex pattern
- Return type:
- static get_min_data_entropy(x)[source]¶
Returns the minimal entropy for a given size of random data. Precalculated data is used for speedup.
- Return type:
- static get_shannon_entropy(data, iterator)[source]¶
Borrowed from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html.
- Return type:
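For orientation, Shannon entropy over a fixed alphabet can be computed as in the sketch below. It assumes that `data` is a string and that `iterator` is the set of characters to count (for instance one of the character sets listed in CHARS_LIMIT_MAP); the name `shannon_entropy_sketch` is hypothetical and the code is not taken from CredSweeper.

```python
import math


def shannon_entropy_sketch(data: str, iterator: str) -> float:
    """Shannon entropy in bits per symbol, counting only characters found in `iterator`."""
    if not data:
        return 0.0
    entropy = 0.0
    # dict.fromkeys drops repeated characters so that no symbol is counted twice.
    for symbol in dict.fromkeys(iterator):
        p_x = data.count(symbol) / len(data)
        if p_x > 0:
            entropy -= p_x * math.log2(p_x)
    return entropy
```

A string of one repeated character has entropy 0, while four distinct characters used equally often give 2 bits per symbol.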
- static get_xml_from_lines(xml_lines)[source]¶
Parse XML data from a list of strings and return a list of str.
- static is_ascii_entropy_validate(data)[source]¶
Tests a small data sequence (<256) for randomness by checking for ASCII and Shannon entropy. Returns True when the data consists of ASCII symbols or has small entropy.
- Return type:
- static is_binary(data)[source]¶
Returns True if any recognized binary format is found, or if a sequence of two zeroes is found, which never exists in text format (UTF-8, UTF-16). UTF-32 is not supported.
- Return type:
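The two-zeroes part of this check can be illustrated with a short sketch. The name `has_double_zero_sketch` is hypothetical, and the sketch covers only the zero-sequence test, not the recognition of binary formats.

```python
def has_double_zero_sketch(data: bytes) -> bool:
    """Return True when two consecutive zero bytes occur anywhere in `data`."""
    return b"\x00\x00" in data


# ASCII text encoded as UTF-8 or UTF-16 contains no pair of zero bytes,
# whereas the same text in UTF-32 does, so it would be flagged as binary.
utf8_flag = has_double_zero_sketch("hello".encode("utf_8"))
utf16_flag = has_double_zero_sketch("hello".encode("utf_16"))
utf32_flag = has_double_zero_sketch("hello".encode("utf_32"))
```

This is consistent with the note that UTF-32 is not supported: UTF-32 text would be reported as binary by such a test.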
- static is_bzip2(data)[source]¶
According to https://en.wikipedia.org/wiki/Bzip2
- Return type:
- static is_elf(data)[source]¶
According to https://en.wikipedia.org/wiki/Executable_and_Linkable_Format; only 5 bytes are used
- Return type:
- static is_gzip(data)[source]¶
According to https://www.rfc-editor.org/rfc/rfc1952
- Return type:
- static is_pdf(data)[source]¶
According to https://en.wikipedia.org/wiki/List_of_file_signatures - pdf
- Return type:
- static is_tar(data)[source]¶
According to https://en.wikipedia.org/wiki/List_of_file_signatures
- Return type:
- static is_zip(data)[source]¶
According to https://en.wikipedia.org/wiki/List_of_file_signatures
- Return type:
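The format checks above (bzip2, ELF, gzip, PDF, tar, zip) rely on well-known file signatures. The table-driven sketch below shows the general idea; the names `SIGNATURES` and `detect_format_sketch` are hypothetical, only the most common signature of each format is listed, and the exact bytes that CredSweeper compares (for example the fifth ELF byte) may differ.

```python
from typing import Optional

# (format name, offset of the signature, signature bytes)
SIGNATURES = [
    ("gzip", 0, b"\x1f\x8b"),     # ID1 and ID2 from RFC 1952
    ("bzip2", 0, b"BZh"),
    ("pdf", 0, b"%PDF-"),
    ("zip", 0, b"PK\x03\x04"),    # local file header; empty or spanned archives differ
    ("elf", 0, b"\x7fELF"),       # 4-byte magic only
    ("tar", 257, b"ustar"),       # the tar magic sits at offset 257, not at the start
]


def detect_format_sketch(data: bytes) -> Optional[str]:
    """Return the name of the first signature found in `data`, or None."""
    for name, offset, magic in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return name
    return None
```

Keeping the signatures in one table makes it easy to see which offsets and byte strings are assumed, at the cost of hiding format-specific details that separate functions can express.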
- static json_dump(obj, file_path, encoding='utf_8', indent=4)[source]¶
Write a dictionary to a JSON file.
- Return type:
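A helper with the same signature can be sketched with the standard `json` module. The name `json_dump_sketch` and the sample dictionary are hypothetical; error handling, which the real method may include, is omitted.

```python
import json
import os
import tempfile
from typing import Any


def json_dump_sketch(obj: Any, file_path: str, encoding: str = "utf_8", indent: int = 4) -> None:
    """Serialize `obj` as indented JSON into `file_path`."""
    with open(file_path, "w", encoding=encoding) as file:
        json.dump(obj, file, indent=indent)


# Usage: write a sample dictionary to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    report_path = os.path.join(tmp_dir, "report.json")
    json_dump_sketch({"rule": "Token", "line_num": 3}, report_path)
    with open(report_path, encoding="utf_8") as file:
        loaded = json.load(file)
```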
- static patch2files_diff(raw_patch, change_type)[source]¶
Generate files changes from patch for added or deleted filepaths.
- Parameters:
change_type (DiffRowType) – change type to select, DiffRowType.ADDED or DiffRowType.DELETED
- Return type:
- Returns:
return dict with {file paths: list of file row changes}, where each element of the list of file row changes is represented as:
{ "old": line number before diff, "new": line number after diff, "line": line text, "hunk": diff hunk number }
- static preprocess_diff_rows(added_line_number, deleted_line_number, line)[source]¶
Auxiliary function to extend diff changes.
- static preprocess_file_diff(changes)[source]¶
Generate changed file rows from diff data with changed lines (e.g. marked + or - in diff).
- Parameters:
- Return type:
- Returns:
diff rows data as a list of row change type, line number, row content
- static read_file(path, encodings=None)[source]¶
Read the file content using different encodings.
Try to read the contents of the file with each encoding from the list “encodings” in turn; as soon as reading occurs without any exceptions, the data is returned in the current encoding.