file_handler package

Submodules

file_handler.analysis_target module

class credsweeper.file_handler.analysis_target.AnalysisTarget(line, line_num, lines, file_path)

Bases: object

file_path: str
line: str
line_num: int
lines: List[str]

file_handler.content_provider module

class credsweeper.file_handler.content_provider.ContentProvider(_file_path)

Bases: ABC

Base class to provide access to analysis targets for a scanned object.

property file_path: str

file_path getter

Return type:

str

abstract get_analysis_target()

Load and preprocess the data to scan.

Return type:

List[AnalysisTarget]

Returns:

row objects to analyse

lines_to_targets(lines, line_nums=None)

Create a list of analysis targets, applying multiline concatenation.

Return type:

List[AnalysisTarget]
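A minimal sketch of the kind of list lines_to_targets returns, assuming that missing line numbers default to 1-based numbering. Target and lines_to_targets_sketch are hypothetical stand-ins that only mirror the AnalysisTarget fields; they are not credsweeper code, and the multiline concatenation step is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Target:
    """Hypothetical stand-in mirroring the AnalysisTarget fields."""
    line: str
    line_num: int
    lines: List[str]
    file_path: str


def lines_to_targets_sketch(lines: List[str],
                            line_nums: Optional[List[int]] = None,
                            file_path: str = "") -> List[Target]:
    # Assumption: without explicit line numbers, lines are numbered from 1.
    if line_nums is None:
        line_nums = list(range(1, len(lines) + 1))
    # One target per line; each target keeps a reference to the full line list
    # so later stages can inspect surrounding context.
    # Multiline concatenation is intentionally omitted from this sketch.
    return [Target(line, num, lines, file_path) for line, num in zip(lines, line_nums)]


targets = lines_to_targets_sketch(["user = 'admin'", "password = 'secret'"], file_path="config.py")
for t in targets:
    print(t.line_num, t.line)
# prints:
# 1 user = 'admin'
# 2 password = 'secret'
```

Sharing one lines list between all targets keeps memory flat while still giving every target access to its neighbours.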

file_handler.diff_content_provider module

class credsweeper.file_handler.diff_content_provider.DiffContentProvider(file_path, change_type, diff)

Bases: ContentProvider

Provide data from a single .patch file.

Parameters:
  • file_path (str) – path to file

  • change_type (str) – selects whether added or deleted file data is scanned

  • diff (List[DiffDict]) –

    list of file row changes, with base elements represented as:

    {
        "old": line number before diff,
        "new": line number after diff,
        "line": line text,
        "hunk": diff hunk number
    }
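The DiffDict rows above follow unified diff numbering. The sketch below shows one way such rows could be derived from a single-file patch: removed lines get only an "old" number, added lines only a "new" number, and context lines get both. diff_rows is a hypothetical helper, not credsweeper's own parser, and counting hunks from 0 is an assumption.

```python
import re
from typing import Any, Dict, List

# Matches hunk headers such as "@@ -12,5 +12,6 @@" and captures both start lines.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def diff_rows(patch_lines: List[str]) -> List[Dict[str, Any]]:
    """Convert a single-file unified diff into DiffDict-like rows (hypothetical helper)."""
    rows: List[Dict[str, Any]] = []
    hunk = -1
    old = new = 0
    for raw in patch_lines:
        match = HUNK_HEADER.match(raw)
        if match:
            hunk += 1  # assumption: hunks are counted from 0
            old, new = int(match.group(1)), int(match.group(2))
            continue
        if hunk < 0:
            continue  # "---" / "+++" file headers before the first hunk
        tag, text = raw[:1], raw[1:]
        if tag == "+":  # added line: exists only after the diff
            rows.append({"old": None, "new": new, "line": text, "hunk": hunk})
            new += 1
        elif tag == "-":  # deleted line: exists only before the diff
            rows.append({"old": old, "new": None, "line": text, "hunk": hunk})
            old += 1
        elif tag == " ":  # context line: present on both sides
            rows.append({"old": old, "new": new, "line": text, "hunk": hunk})
            old += 1
            new += 1
        # anything else (e.g. "\ No newline at end of file") is skipped
    return rows


patch = [
    "--- a/settings.py",
    "+++ b/settings.py",
    "@@ -1,2 +1,2 @@",
    " DEBUG = False",
    "-TOKEN = 'old'",
    "+TOKEN = 'new'",
]
for row in diff_rows(patch):
    print(row)
```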
    

get_analysis_target()

Preprocess file diff data to scan.

Return type:

List[AnalysisTarget]

Returns:

list of analysis targets, one for every row of the file diff that matches change type “self.change_type”

parse_lines_data(lines_data)

Parse diff lines data.

Return the list of line numbers with change type “self.change_type” and the list of all lines in the file in their original order (every line not mentioned in the diff is replaced with a blank line).

Parameters:

lines_data (List[DiffRowData]) – data of all rows mentioned in diff file

Return type:

Tuple[List[int], List[str]]

Returns:

tuple of the line numbers with change type “self.change_type” and all file lines in their original order (every line not mentioned in the diff is replaced with a blank line)
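The behaviour described for parse_lines_data can be sketched as follows, assuming simplified row dicts shaped like the DiffDict description and treating a row as changed when it has a line number on only one side of the diff. rebuild_lines is a hypothetical helper, not the method's actual implementation.

```python
from typing import Any, Dict, List, Tuple


def rebuild_lines(rows: List[Dict[str, Any]], change_type: str) -> Tuple[List[int], List[str]]:
    """Return changed line numbers and the full line list with gaps blanked (hypothetical)."""
    # "added" rows are numbered in the new file, "deleted" rows in the old one
    side, other = ("new", "old") if change_type == "added" else ("old", "new")
    present = [row for row in rows if row[side] is not None]
    if not present:
        return [], []
    all_lines = [""] * max(row[side] for row in present)  # unmentioned lines stay blank
    changed: List[int] = []
    for row in present:
        all_lines[row[side] - 1] = row["line"]
        if row[other] is None:  # only on this side of the diff, so it was added/deleted
            changed.append(row[side])
    return changed, all_lines


rows = [
    {"old": 3, "new": 3, "line": "DEBUG = False", "hunk": 0},
    {"old": 4, "new": None, "line": "TOKEN = 'old'", "hunk": 0},
    {"old": None, "new": 4, "line": "TOKEN = 'new'", "hunk": 0},
]
print(rebuild_lines(rows, "added"))
# ([4], ['', '', 'DEBUG = False', "TOKEN = 'new'"])
```

Keeping blank placeholders preserves the original line numbering, so index i of the list always corresponds to line i + 1 of the file.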

file_handler.file_path_extractor module

class credsweeper.file_handler.file_path_extractor.FilePathExtractor

Bases: object

Utility class to browse files in directories.

classmethod apply_gitignore(detected_files)

Apply gitignore rules for each file.

Parameters:

detected_files (List[str]) – list of files to be checked

Return type:

List[str]

Returns:

List of files with all git-ignored files removed

classmethod check_exclude_file(config, path)

Check whether the file should be excluded.

Parameters:
  • config (Config) – Config

  • path (str) – path to the file; a full path is preferred

Return type:

bool

Returns:

True when the file's full path should be excluded according to config

classmethod check_file_size(config, path)

Check whether the file exceeds the size limit.

Parameters:
  • config (Config) – Config

  • path (str) – path to the file to check

Return type:

bool

Returns:

True when the file exceeds the size limit
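A minimal sketch of the size check, assuming the limit is a byte count and that an unset limit means no file is oversize. is_oversize and size_limit are hypothetical names, not the Config API.

```python
import os
import tempfile
from typing import Optional


def is_oversize(path: str, size_limit: Optional[int]) -> bool:
    """True when the file at path is larger than size_limit bytes (hypothetical helper)."""
    if size_limit is None:  # assumption: no configured limit disables the check
        return False
    return os.path.getsize(path) > size_limit


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")  # a 10-byte file
print(is_oversize(tmp.name, 5), is_oversize(tmp.name, 10), is_oversize(tmp.name, None))
# True False False
os.remove(tmp.name)
```

Using os.path.getsize avoids reading the file, so oversize files can be rejected cheaply before any scanning.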

classmethod get_file_paths(config, path)

Get all files in the directory. Non-code and data files (such as .jpg) are excluded automatically.

Parameters:
  • config (Config) – credsweeper configuration

  • path (str) – path to the file or directory to be scanned

Return type:

List[str]

Returns:

List of all non-excluded files in the directory
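The directory walk can be sketched with os.walk, assuming exclusion is decided by file extension alone. collect_files and excluded_extensions are hypothetical; the real method takes its exclusion rules from Config.

```python
import os
import tempfile
from typing import Iterable, List


def collect_files(path: str, excluded_extensions: Iterable[str]) -> List[str]:
    """List files under path, skipping excluded extensions (hypothetical helper)."""
    excluded = {ext.lower() for ext in excluded_extensions}
    if os.path.isfile(path):
        candidates = [path]  # a single file is checked on its own
    else:
        candidates = [os.path.join(dirpath, name)
                      for dirpath, _dirnames, filenames in os.walk(path)
                      for name in filenames]
    return sorted(p for p in candidates if os.path.splitext(p)[1].lower() not in excluded)


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for name in ("app.py", "logo.jpg", os.path.join("sub", "notes.txt")):
        with open(os.path.join(root, name), "w", encoding="utf-8") as handle:
            handle.write("x")
    found = collect_files(root, [".jpg"])
    print([os.path.relpath(p, root) for p in found])
```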

static is_find_by_ext_file(config, path)

Check whether the file has a suspicious extension.

Parameters:
  • config (Config) – Config

  • path (str) – path to the file; may be just the file name with its extension

Return type:

bool

Returns:

True when the feature is configured and the file extension matches
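A sketch of the extension check, assuming the feature is a boolean switch plus a set of lower-case extensions. has_suspicious_extension, find_by_ext and suspicious are hypothetical names.

```python
from pathlib import PurePath
from typing import Collection


def has_suspicious_extension(path: str, find_by_ext: bool, suspicious: Collection[str]) -> bool:
    """True when the check is enabled and the file extension is listed (hypothetical helper)."""
    # PurePath only parses the string, so a bare file name works as well as a full path
    return find_by_ext and PurePath(path).suffix.lower() in suspicious


extensions = {".pem", ".pfx", ".jks"}
print(has_suspicious_extension("keys/server.PEM", True, extensions))   # True
print(has_suspicious_extension("server.pem", False, extensions))       # False
print(has_suspicious_extension("README.md", True, extensions))         # False
```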

classmethod is_valid_path(path)

Locate the .git directory nearest to the path and check whether the path is ignored.

Parameters:

path (str) – path to the file or directory to check

Return type:

bool

Returns:

False if the file is ignored by git, True otherwise
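Locating the nearest .git directory amounts to walking up the parents of the path. The sketch below covers only that lookup and leaves out the gitignore check itself; find_repo_root is a hypothetical helper, and it assumes .git is a directory (worktrees and submodules, where .git can be a file, are not handled).

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def find_repo_root(path: str) -> Optional[Path]:
    """Return the closest directory at or above path that contains .git (hypothetical)."""
    start = Path(path).resolve()
    for candidate in [start, *start.parents]:
        if (candidate / ".git").is_dir():
            return candidate
    return None  # path is not inside a git working tree


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, ".git"))
    os.makedirs(os.path.join(tmp, "src", "pkg"))
    target = os.path.join(tmp, "src", "pkg", "app.py")
    root = find_repo_root(target)
    print(root == Path(tmp).resolve())  # True
```

Caching the result per repository root, as the located_repos class attribute suggests, avoids repeating this walk for every file in the same tree.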

located_repos: Dict[Path, Repo] = {}

file_handler.files_provider module

class credsweeper.file_handler.files_provider.FilesProvider(paths, change_type=None, skip_ignored=None)

Bases: ABC

Base class for all files provider objects.

Parameters:
  • paths (List[str]) – list of paths to scan

  • change_type (Optional[str]) – type of patch changes to analyse (added or deleted)

  • skip_ignored (Optional[bool]) – whether to check directories against the ignored directories listed in the gitignore file

abstract get_scannable_files(config)

Get the list of file objects for analysis based on the “paths” attribute.

Parameters:

config (Config) – credsweeper configuration

Return type:

Union[List[DiffContentProvider], List[TextContentProvider]]

Returns:

file objects to analyse
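The FilesProvider contract follows the standard abc pattern: subclasses must implement get_scannable_files, and the base class itself cannot be instantiated. FilesProviderSketch and ListedFilesProvider are hypothetical, and a plain dict stands in for Config.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class FilesProviderSketch(ABC):
    """Hypothetical mirror of the FilesProvider constructor and abstract method."""

    def __init__(self, paths: List[str], change_type: Optional[str] = None,
                 skip_ignored: Optional[bool] = None) -> None:
        self.paths = paths
        self.change_type = change_type
        self.skip_ignored = skip_ignored

    @abstractmethod
    def get_scannable_files(self, config: dict) -> List[str]:
        """Return the objects to analyse, derived from self.paths."""


class ListedFilesProvider(FilesProviderSketch):
    def get_scannable_files(self, config: dict) -> List[str]:
        # Hypothetical rule: drop paths ending with an extension from config["exclude"].
        excluded = set(config.get("exclude", ()))
        return [p for p in self.paths if not any(p.endswith(ext) for ext in excluded)]


provider = ListedFilesProvider(["app.py", "logo.jpg"])
print(provider.get_scannable_files({"exclude": [".jpg"]}))  # ['app.py']
```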

file_handler.patch_provider module

class credsweeper.file_handler.patch_provider.PatchProvider(paths, change_type=None, skip_ignored=None)

Bases: FilesProvider

Provide data from a list of .patch files.

Allows scanning only the data that changed between git commits, rather than the entire project.

Parameters:
  • paths (List[str]) – list of file paths to scan; all files should be in .patch format

  • change_type (Optional[str]) – type of patch changes to analyse (added or deleted)

  • skip_ignored (Optional[bool]) – whether to check directories against the ignored directories listed in the gitignore file

get_files_sequence(raw_patches)

Return the sequence of files from raw_patches as DiffContentProvider objects.

Return type:

List[DiffContentProvider]

get_scannable_files(config)

Get files to scan, based on the “paths” attribute.

Parameters:

config (Config) – credsweeper configuration

Return type:

Union[List[DiffContentProvider], List[TextContentProvider]]

Returns:

file objects to analyse

load_patch_data()

Load the contents of the patch files, one list of lines per patch.

Return type:

List[List[str]]
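A sketch of producing the List[List[str]] shape, assuming UTF-8 patch files and one entry per path. load_patches is a hypothetical helper, not the method's actual implementation.

```python
import os
import tempfile
from typing import List


def load_patches(paths: List[str]) -> List[List[str]]:
    """Read every patch file into its list of lines (hypothetical helper)."""
    patches: List[List[str]] = []
    for path in paths:
        # assumption: UTF-8 text; undecodable bytes are replaced instead of raising
        with open(path, encoding="utf-8", errors="replace") as handle:
            patches.append(handle.read().splitlines())
    return patches


with tempfile.TemporaryDirectory() as tmp:
    patch_path = os.path.join(tmp, "change.patch")
    with open(patch_path, "w", encoding="utf-8") as handle:
        handle.write("--- a/x.py\n+++ b/x.py\n@@ -1 +1 @@\n-a = 1\n+a = 2\n")
    loaded = load_patches([patch_path])
    print(loaded[0][2])  # @@ -1 +1 @@
```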

file_handler.text_content_provider module

class credsweeper.file_handler.text_content_provider.TextContentProvider(file_path, change_type=None, diff=None)

Bases: ContentProvider

Provide access to analysis targets for full-text file scanning.

Parameters:

file_path (str) – path to file

get_analysis_target()

Load and preprocess file content to scan.

Return type:

List[AnalysisTarget]

Returns:

list of analysis targets based on every row in the file

file_handler.text_provider module

class credsweeper.file_handler.text_provider.TextProvider(paths, change_type=None, skip_ignored=None)

Bases: FilesProvider

Provider for full-text file analysis.

Parameters:
  • paths (List[str]) – list of parent paths of the files to scan

  • change_type (Optional[str]) – type of patch changes to analyse (added or deleted)

  • skip_ignored (Optional[bool]) – whether to check directories against the ignored directories listed in the gitignore file

get_files_sequence(file_paths)

Take a list of paths and return a list of TextContentProvider objects.

Parameters:

file_paths (List[str]) – list of paths

Return type:

List[TextContentProvider]

Returns:

list of file providers

get_scannable_files(config)

Get the list of full-text file objects for analysis, covering files whose parent paths are listed in “paths”.

Parameters:

config (Config) – credsweeper configuration

Return type:

Union[List[DiffContentProvider], List[TextContentProvider]]

Returns:

preprocessed file objects for analysis

Module contents

class credsweeper.file_handler.ByteContentProvider(content, file_path=None)

Bases: ContentProvider

Allows scanning a byte sequence.

Parameters:
  • content (bytes) – byte sequence to be scanned. It is automatically split into a list of lines if a newline character is present

  • file_path (Optional[str]) – optional string; may be specified if the true name of the file the lines were taken from is known

get_analysis_target()

Return lines to scan.

Return type:

List[AnalysisTarget]

Returns:

list of analysis targets based on every row in the content
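Splitting a byte sequence into lines, as ByteContentProvider does when a newline is present, can be sketched as below. The UTF-8 decoding and the use of str.splitlines are assumptions of the sketch; splitlines also treats \r\n and a few other Unicode boundaries as line breaks. bytes_to_lines is a hypothetical helper.

```python
from typing import List


def bytes_to_lines(content: bytes, encoding: str = "utf-8") -> List[str]:
    """Decode content and split it into lines (hypothetical helper)."""
    text = content.decode(encoding, errors="replace")  # never raise on bad bytes
    return text.splitlines()  # handles \n, \r\n and other line boundaries


print(bytes_to_lines(b"user=admin\r\npassword=secret\n"))  # ['user=admin', 'password=secret']
print(bytes_to_lines(b"token=abc"))  # ['token=abc']
```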

class credsweeper.file_handler.ContentProvider(_file_path)

Same class as credsweeper.file_handler.content_provider.ContentProvider, exposed at package level; documented under the file_handler.content_provider module.

class credsweeper.file_handler.DataContentProvider(data, file_path=None)

Bases: ContentProvider

Dummy raw provider that stores bytes.

Parameters:
  • data (bytes) – byte sequence to be stored.

  • file_path (Optional[str]) – optional string; may be specified if the true name of the file the data was taken from is known.

property data: bytes

data getter

Return type:

bytes

get_analysis_target()

Return nothing. The class provides only data storage.

Raises:

NotImplementedError

Return type:

List[AnalysisTarget]

class credsweeper.file_handler.DiffContentProvider(file_path, change_type, diff)

Same class as credsweeper.file_handler.diff_content_provider.DiffContentProvider, exposed at package level; documented under the file_handler.diff_content_provider module.

class credsweeper.file_handler.StringContentProvider(lines, file_path=None)

Bases: ContentProvider

Allows scanning a list of lines.

Parameters:
  • lines (List[str]) – lines to be processed

  • file_path (Optional[str]) – optional string; may be specified if the true name of the file the lines were taken from is known

get_analysis_target()

Return lines to scan.

Return type:

List[AnalysisTarget]

Returns:

list of analysis targets based on every line

class credsweeper.file_handler.TextContentProvider(file_path, change_type=None, diff=None)

Same class as credsweeper.file_handler.text_content_provider.TextContentProvider, exposed at package level; documented under the file_handler.text_content_provider module.