credsweeper.file_handler package

Submodules

credsweeper.file_handler.abstract_provider module

class credsweeper.file_handler.abstract_provider.AbstractProvider(paths)[source]

Bases: ABC

Base class for all file provider objects.

abstract get_scannable_files(config)[source]

Get the list of file objects for analysis, based on the “paths” attribute.

Parameters:

config (Config) – dict of credsweeper configuration

Return type:

Sequence[Union[DiffContentProvider, TextContentProvider]]

Returns:

file objects to analyse

property paths: Sequence[str | Path | BytesIO | Tuple[str | Path, BytesIO]]

paths getter
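
The provider contract above can be sketched without credsweeper installed. This is a minimal stand-in, not the library's implementation: SketchProvider and ListProvider are hypothetical names, and the sketch returns plain strings where the real get_scannable_files returns content provider objects.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Sequence


class SketchProvider(ABC):
    """Hypothetical stand-in that mirrors the AbstractProvider shape."""

    def __init__(self, paths: Sequence[str]) -> None:
        self._paths = paths

    @property
    def paths(self) -> Sequence[str]:
        """paths getter"""
        return self._paths

    @abstractmethod
    def get_scannable_files(self, config: Any) -> List[str]:
        """Return the objects to analyse, derived from the paths attribute."""


class ListProvider(SketchProvider):
    """Hypothetical subclass: returns the given paths in sorted order."""

    def get_scannable_files(self, config: Any) -> List[str]:
        return sorted(self.paths)


provider = ListProvider(["b.py", "a.py"])
scannable = provider.get_scannable_files(config=None)
```

Because get_scannable_files is abstract, instantiating SketchProvider directly raises TypeError; only subclasses that implement it can be created.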

credsweeper.file_handler.analysis_target module

class credsweeper.file_handler.analysis_target.AnalysisTarget(line_pos, lines, line_nums, descriptor)[source]

Bases: object

property descriptor: Descriptor

cached value

property file_path: str | None

cached value

property file_type: str | None

cached value

property info: str | None

cached value

property line: str

cached value

property line_len: int

cached value

property line_num: int

cached value

property line_nums: List[int]

cached value

property line_pos: int

cached value

property line_strip: str

cached value

property line_strip_len: int

cached value

property line_strip_lower: str

cached value

property lines: List[str]

cached value

property lines_len: int

cached value
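
Every property above is described as a cached value. The sketch below shows one way such caching can look, assuming that line is the entry of lines at index line_pos and that line_num is the matching entry of line_nums. Both are inferences from the names rather than facts stated on this page, and SketchTarget is a hypothetical class, not the credsweeper one.

```python
from functools import cached_property
from typing import List


class SketchTarget:
    """Hypothetical illustration of derived values computed once per target."""

    def __init__(self, line_pos: int, lines: List[str], line_nums: List[int]) -> None:
        self.line_pos = line_pos
        self.lines = lines
        self.line_nums = line_nums

    @cached_property
    def line(self) -> str:
        # Assumption: the current line is selected by position
        return self.lines[self.line_pos]

    @cached_property
    def line_len(self) -> int:
        return len(self.line)

    @cached_property
    def line_strip(self) -> str:
        return self.line.strip()

    @cached_property
    def line_strip_lower(self) -> str:
        return self.line_strip.lower()

    @cached_property
    def line_num(self) -> int:
        # Assumption: line_nums runs parallel to lines
        return self.line_nums[self.line_pos]


target = SketchTarget(1, ["first", "  API_KEY = 'x'  "], [10, 11])
lowered = target.line_strip_lower
```

After the first access, each value is stored on the instance, so repeated reads during scanning do not recompute strip or lower.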

credsweeper.file_handler.byte_content_provider module

class credsweeper.file_handler.byte_content_provider.ByteContentProvider(content, file_path=None, file_type=None, info=None)[source]

Bases: ContentProvider

Allows scanning a byte sequence directly, without an extra file read.

property data: bytes | None

data getter for ByteContentProvider

property lines: List[str]

lines getter for ByteContentProvider

yield_analysis_target(min_len)[source]

Return lines to scan.

Parameters:

min_len (int) – minimal line length to scan

Return type:

Generator[AnalysisTarget, None, None]

Returns:

analysis targets based on every row of the content
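
The idea of turning a byte sequence into per-line targets filtered by min_len can be sketched as follows. This is an illustration under stated assumptions, not the library's code: sketch_yield_targets is a hypothetical function, UTF-8 decoding with replacement is assumed, and the length check is assumed to apply to the stripped line.

```python
from typing import Generator, Tuple


def sketch_yield_targets(content: bytes, min_len: int) -> Generator[Tuple[int, str], None, None]:
    """Yield (line number, line) pairs for lines whose stripped length is at least min_len."""
    # Assumption: undecodable bytes are replaced rather than raising an error
    text = content.decode("utf-8", errors="replace")
    for line_num, line in enumerate(text.splitlines(), start=1):
        if len(line.strip()) >= min_len:
            yield line_num, line


targets = list(sketch_yield_targets(b"a=1\npassword = 'hunter2'\n\n", min_len=8))
```

Like the documented method, the sketch is a generator, so callers can stop early without materialising every line.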

credsweeper.file_handler.content_provider module

class credsweeper.file_handler.content_provider.ContentProvider(file_path=None, file_type=None, info=None)[source]

Bases: ABC

Base class to provide access to analysis targets for scanned object.

abstract property data: bytes | None

abstract data getter

property descriptor: Descriptor

descriptor getter

property file_path: str

file_path getter

property file_type: str

file_type getter

property info: str

info getter

lines_to_targets(min_len, lines, line_nums=None)[source]

Creates list of targets with multiline concatenation

Return type:

Generator[AnalysisTarget, None, None]

abstract yield_analysis_target(min_len)[source]

Load and preprocess file diff data to scan.

Parameters:

min_len (int) – minimal line length to scan

Return type:

Generator[AnalysisTarget, None, None]

Returns:

row objects to analyse

credsweeper.file_handler.data_content_provider module

class credsweeper.file_handler.data_content_provider.DataContentProvider(data, file_path=None, file_type=None, info=None)[source]

Bases: ContentProvider

Dummy raw provider to keep bytes

property data: bytes | None

data getter for DataContentProvider

represent_as_encoded()[source]

Decodes data from base64. Stores the result in decoded.

Return type:

bool

Returns:

True if the data was correctly parsed and verified
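
A strict base64 check of this kind can be sketched with the standard library. The function name sketch_decode_base64 is hypothetical, and the verification rules credsweeper actually applies are not described on this page; the sketch only rejects input that contains characters outside the base64 alphabet or has bad padding.

```python
import base64
import binascii
from typing import Optional


def sketch_decode_base64(data: bytes) -> Optional[bytes]:
    """Return the decoded bytes, or None when data is not valid base64."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        return base64.b64decode(data, validate=True)
    except (binascii.Error, ValueError):
        return None


decoded = sketch_decode_base64(b"c2VjcmV0")
rejected = sketch_decode_base64(b"not base64!")
```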

represent_as_html(depth, recursive_limit_size, keywords_required_substrings_check)[source]

Tries to read the data as HTML

Return type:

bool

Returns:

True if reading was successful

represent_as_structure()[source]

Tries to convert the data with several parsers and stores the result in an internal structure. Returns True if some structure is found.

Return type:

bool
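
The "try several parsers in turn" approach can be sketched as below. Which parsers credsweeper uses is not listed on this page; JSON and Python literals are stand-ins chosen for illustration, and sketch_parse_structure is a hypothetical name.

```python
import ast
import json
from typing import Any, Optional


def sketch_parse_structure(text: str) -> Optional[Any]:
    """Return the first container produced by a parser, or None if no parser yields one."""
    for parser in (json.loads, ast.literal_eval):
        try:
            result = parser(text)
        except (ValueError, SyntaxError, TypeError):
            # This parser does not understand the text; try the next one
            continue
        if isinstance(result, (dict, list, tuple)):
            return result
    return None


from_json = sketch_parse_structure('{"token": "abc"}')
from_literal = sketch_parse_structure("{'token': 'abc'}")
no_structure = sketch_parse_structure("plain text")
```

Scalars such as a bare number parse successfully but are not treated as a structure here, which keeps the return value meaningful for the caller.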

represent_as_xml()[source]

Tries to read the data as XML

Return type:

bool

Returns:

True if reading was successful

yield_analysis_target(min_len)[source]

Return nothing. The class provides only data storage.

Parameters:

min_len (int) – minimal line length to scan

Raises:

NotImplementedError

Return type:

Generator[AnalysisTarget, None, None]

credsweeper.file_handler.descriptor module

class credsweeper.file_handler.descriptor.Descriptor(path, extension, info)[source]

Bases: object

Descriptor for a file; optimizes memory consumption

extension: str
info: str
path: str

credsweeper.file_handler.diff_content_provider module

class credsweeper.file_handler.diff_content_provider.DiffContentProvider(file_path, change_type, diff)[source]

Bases: ContentProvider

Provide data from a single .patch file.

Parameters:
  • file_path (str) – path to file

  • change_type (DiffRowType) – whether to scan added or deleted file data

  • diff (List[DiffDict]) –

    list of file row changes, with base elements represented as:

    {
        "old": line number before diff,
        "new": line number after diff,
        "line": line text,
        "hunk": diff hunk number
    }
    

property data: bytes

data getter for DiffContentProvider

parse_lines_data(lines_data)[source]

Parse diff lines data.

Returns the list of line numbers with change type “self.change_type” and the list of all lines in the file in original order (lines not mentioned in the diff file are replaced with blank lines).

Parameters:

lines_data (List[DiffRowData]) – data of all rows mentioned in diff file

Return type:

Tuple[List[int], List[str]]

Returns:

tuple of the line numbers with change type “self.change_type” and all file lines in original order (lines not mentioned in the diff file are replaced with blank lines)
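
The blank-line padding described above can be sketched as follows. The fields of DiffRowData are not shown on this page, so the row shape used here (change type, line number, line text) is hypothetical, as is the function name. As a simplification, rows of any other change type are left blank as well.

```python
from typing import List, Tuple

# Hypothetical row shape: (change type, 1-based line number, line text)
Row = Tuple[str, int, str]


def sketch_parse_lines(rows: List[Row], change_type: str) -> Tuple[List[int], List[str]]:
    """Return the matching line numbers and all lines, blank where nothing matched."""
    if not rows:
        return [], []
    max_line = max(num for _, num, _ in rows)
    all_lines = [""] * max_line  # index i holds line number i + 1
    change_numbers: List[int] = []
    for row_type, num, text in rows:
        if row_type != change_type:
            continue
        all_lines[num - 1] = text
        change_numbers.append(num)
    return change_numbers, all_lines


rows = [("added", 2, "token = 'abc'"), ("deleted", 3, "old"), ("added", 5, "x = 1")]
numbers, lines = sketch_parse_lines(rows, "added")
```

Padding with blank lines keeps every surviving line at its original index, so reported line numbers match the file rather than the position inside the diff.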

yield_analysis_target(min_len)[source]

Preprocess file diff data to scan.

Parameters:

min_len (int) – minimal line length to scan

Return type:

Generator[AnalysisTarget, None, None]

Returns:

analysis targets for every row of the file diff that corresponds to change type “self.change_type”

credsweeper.file_handler.file_path_extractor module

class credsweeper.file_handler.file_path_extractor.FilePathExtractor[source]

Bases: object

Util class to browse files in directories

static apply_gitignore(detected_files)[source]

Apply gitignore rules for each file.

Parameters:

detected_files (List[str]) – list of files to be checked

Return type:

List[str]

Returns:

List of files with all files ignored by git removed

static check_exclude_file(config, path)[source]

Checks whether file should be excluded

Parameters:
  • config (Config) – credsweeper configuration

  • path (str) – path to the file; full path preferred

Return type:

bool

Returns:

True when the file's full path should be excluded according to the config

static check_file_size(config, reference)[source]

Checks whether the file is over the size limit from configuration

Parameters:
  • config (Config) – credsweeper configuration

  • reference – reference to the file to check

Return type:

bool

Returns:

True when the file is oversize
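
A size check of this kind can be sketched for two kinds of reference, a path and an in-memory buffer. The types accepted by the real method and the name of the configuration field holding the limit are not given on this page, so sketch_is_oversize and its size_limit parameter are hypothetical.

```python
import io
import os
from typing import Optional, Union


def sketch_is_oversize(reference: Union[str, io.BytesIO], size_limit: Optional[int]) -> bool:
    """Return True when the referenced data is larger than size_limit bytes."""
    if size_limit is None:
        # Assumption: no configured limit means nothing is oversize
        return False
    if isinstance(reference, io.BytesIO):
        size = reference.getbuffer().nbytes
    else:
        size = os.path.getsize(reference)
    return size > size_limit


over_limit = sketch_is_oversize(io.BytesIO(b"12345"), size_limit=4)
at_limit = sketch_is_oversize(io.BytesIO(b"12345"), size_limit=5)
unlimited = sketch_is_oversize(io.BytesIO(b"12345"), size_limit=None)
```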

static get_file_paths(config, path)[source]

Get all files in the directory. Automatically excludes non-code or data files (such as .jpg).

Parameters:
  • config (Config) – credsweeper configuration

  • path (Union[str, Path]) – path to the file or directory to be scanned

Return type:

List[str]

Returns:

List of all non-excluded files in the directory
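
Directory traversal with extension-based exclusion can be sketched with os.walk. The real method takes its exclusion rules from the credsweeper Config; the excluded_extensions parameter and the function name below are hypothetical simplifications.

```python
import os
import tempfile
from typing import List, Set


def sketch_get_file_paths(root: str, excluded_extensions: Set[str]) -> List[str]:
    """Walk root and return the sorted paths whose extension is not excluded."""
    found: List[str] = []
    for dir_path, _dir_names, file_names in os.walk(root):
        for name in file_names:
            extension = os.path.splitext(name)[1].lower()
            if extension not in excluded_extensions:
                found.append(os.path.join(dir_path, name))
    return sorted(found)


# Usage: build a throwaway directory with one code file and one image
with tempfile.TemporaryDirectory() as tmp:
    for name in ("app.py", "logo.jpg"):
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write("x")
    kept = [os.path.basename(p) for p in sketch_get_file_paths(tmp, {".jpg"})]
```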

static is_find_by_ext_file(config, extension)[source]

Checks whether the file has a suspicious extension

Parameters:
  • config (Config) – credsweeper configuration

  • extension (str) – may be only a file name with an extension

Return type:

bool

Returns:

True when the feature is configured and the file extension matches

classmethod is_valid_path(path)[source]

Locate nearest .git directory to the path and check if path is ignored.

Parameters:

path (str) – path to the file or directory to check

Return type:

bool

Returns:

False if file is ignored by git. True otherwise

located_repos: Dict[Path, Repo] = {}

credsweeper.file_handler.files_provider module

class credsweeper.file_handler.files_provider.FilesProvider(paths, skip_ignored=None)[source]

Bases: AbstractProvider

Provider of plain os files to be analysed.

get_scannable_files(config)[source]

Get the list of full-text file objects for analysis of files with parent paths from “paths”.

Parameters:

config (Config) – dict of credsweeper configuration

Return type:

Sequence[Union[DiffContentProvider, TextContentProvider]]

Returns:

preprocessed file objects for analysis

credsweeper.file_handler.patches_provider module

class credsweeper.file_handler.patches_provider.PatchesProvider(paths, change_type)[source]

Bases: AbstractProvider

Provide data from a list of .patch files.

get_files_sequence(raw_patches)[source]

Returns sequence of files

Return type:

Sequence[Union[DiffContentProvider, TextContentProvider]]

get_scannable_files(config)[source]

Get files to scan. Output based on the paths field.

Parameters:

config (Config) – dict of credsweeper configuration

Return type:

Sequence[Union[DiffContentProvider, TextContentProvider]]

Returns:

file objects to analyse

load_patch_data(config)[source]

Loads data from patch

Return type:

List[List[str]]

credsweeper.file_handler.string_content_provider module

class credsweeper.file_handler.string_content_provider.StringContentProvider(lines, line_numbers=None, file_path=None, file_type=None, info=None)[source]

Bases: ContentProvider

Provider that scans simple text lines

property data: bytes

data getter for StringContentProvider

yield_analysis_target(min_len)[source]

Return lines to scan.

Parameters:

min_len (int) – minimal line length to scan

Return type:

Generator[AnalysisTarget, None, None]

Returns:

list of analysis targets based on every row in file

credsweeper.file_handler.struct_content_provider module

class credsweeper.file_handler.struct_content_provider.StructContentProvider(struct, file_path=None, file_type=None, info=None)[source]

Bases: ContentProvider

Content provider to keep structured data

property data: bytes

data getter for StructContentProvider

property struct: Any

obj getter

yield_analysis_target(min_len)[source]

Return nothing. The class provides only data storage.

Parameters:

min_len (int) – minimal line length to scan

Raises:

NotImplementedError

Return type:

Generator[AnalysisTarget, None, None]

credsweeper.file_handler.text_content_provider module

class credsweeper.file_handler.text_content_provider.TextContentProvider(file_path, file_type=None, info=None)[source]

Bases: ContentProvider

Provide access to analysis targets for full-text file scanning.

Parameters:

file_path (Union[str, Path, Tuple[Union[str, Path], BytesIO]]) – string, path to file

property data: bytes | None

data getter for TextContentProvider

property lines: List[str] | None

lines getter for TextContentProvider

yield_analysis_target(min_len)[source]

Load and preprocess file content to scan.

Parameters:

min_len (int) – minimal line length to scan

Return type:

Generator[AnalysisTarget, None, None]

Returns:

list of analysis targets based on every row in file

Module contents

class credsweeper.file_handler.ByteContentProvider(content, file_path=None, file_type=None, info=None)[source]

Bases: ContentProvider

Allows scanning a byte sequence directly, without an extra file read.

property data: bytes | None

data getter for ByteContentProvider

property lines: List[str]

lines getter for ByteContentProvider

yield_analysis_target(min_len)[source]

Return lines to scan.

Parameters:

min_len (int) – minimal line length to scan

Return type:

Generator[AnalysisTarget, None, None]

Returns:

analysis targets based on every row of the content

class credsweeper.file_handler.ContentProvider(file_path=None, file_type=None, info=None)[source]

Bases: ABC

Base class to provide access to analysis targets for scanned object.

abstract property data: bytes | None

abstract data getter

property descriptor: Descriptor

descriptor getter

property file_path: str

file_path getter

property file_type: str

file_type getter

property info: str

info getter

lines_to_targets(min_len, lines, line_nums=None)[source]

Creates list of targets with multiline concatenation

Return type:

Generator[AnalysisTarget, None, None]

abstract yield_analysis_target(min_len)[source]

Load and preprocess file diff data to scan.

Parameters:

min_len (int) – minimal line length to scan

Return type:

Generator[AnalysisTarget, None, None]

Returns:

row objects to analyse

class credsweeper.file_handler.DataContentProvider(data, file_path=None, file_type=None, info=None)[source]

Bases: ContentProvider

Dummy raw provider to keep bytes

property data: bytes | None

data getter for DataContentProvider

represent_as_encoded()[source]

Decodes data from base64. Stores the result in decoded.

Return type:

bool

Returns:

True if the data was correctly parsed and verified

represent_as_html(depth, recursive_limit_size, keywords_required_substrings_check)[source]

Tries to read the data as HTML

Return type:

bool

Returns:

True if reading was successful

represent_as_structure()[source]

Tries to convert the data with several parsers and stores the result in an internal structure. Returns True if some structure is found.

Return type:

bool

represent_as_xml()[source]

Tries to read the data as XML

Return type:

bool

Returns:

True if reading was successful

yield_analysis_target(min_len)[source]

Return nothing. The class provides only data storage.

Parameters:

min_len (int) – minimal line length to scan

Raises:

NotImplementedError

Return type:

Generator[AnalysisTarget, None, None]

class credsweeper.file_handler.DiffContentProvider(file_path, change_type, diff)[source]

Bases: ContentProvider

Provide data from a single .patch file.

Parameters:
  • file_path (str) – path to file

  • change_type (DiffRowType) – whether to scan added or deleted file data

  • diff (List[DiffDict]) –

    list of file row changes, with base elements represented as:

    {
        "old": line number before diff,
        "new": line number after diff,
        "line": line text,
        "hunk": diff hunk number
    }
    

property data: bytes

data getter for DiffContentProvider

parse_lines_data(lines_data)[source]

Parse diff lines data.

Returns the list of line numbers with change type “self.change_type” and the list of all lines in the file in original order (lines not mentioned in the diff file are replaced with blank lines).

Parameters:

lines_data (List[DiffRowData]) – data of all rows mentioned in diff file

Return type:

Tuple[List[int], List[str]]

Returns:

tuple of the line numbers with change type “self.change_type” and all file lines in original order (lines not mentioned in the diff file are replaced with blank lines)

yield_analysis_target(min_len)[source]

Preprocess file diff data to scan.

Parameters:

min_len (int) – minimal line length to scan

Return type:

Generator[AnalysisTarget, None, None]

Returns:

analysis targets for every row of the file diff that corresponds to change type “self.change_type”

class credsweeper.file_handler.StringContentProvider(lines, line_numbers=None, file_path=None, file_type=None, info=None)[source]

Bases: ContentProvider

Provider that scans simple text lines

property data: bytes

data getter for StringContentProvider

yield_analysis_target(min_len)[source]

Return lines to scan.

Parameters:

min_len (int) – minimal line length to scan

Return type:

Generator[AnalysisTarget, None, None]

Returns:

list of analysis targets based on every row in file

class credsweeper.file_handler.TextContentProvider(file_path, file_type=None, info=None)[source]

Bases: ContentProvider

Provide access to analysis targets for full-text file scanning.

Parameters:

file_path (Union[str, Path, Tuple[Union[str, Path], BytesIO]]) – string, path to file

property data: bytes | None

data getter for TextContentProvider

property lines: List[str] | None

lines getter for TextContentProvider

yield_analysis_target(min_len)[source]

Load and preprocess file content to scan.

Parameters:

min_len (int) – minimal line length to scan

Return type:

Generator[AnalysisTarget, None, None]

Returns:

list of analysis targets based on every row in file