Credsweeper package

CredSweeper

class credsweeper.app.CredSweeper(rule_path=None, config_path=None, api_validation=False, json_filename=None, xlsx_filename=None, use_filters=True, pool_count=1, ml_batch_size=16, ml_threshold=ThresholdPreset.medium, find_by_ext=False, depth=0, size_limit=None, exclude_lines=None, exclude_values=None)

Bases: object

Advanced credential analyzer base class.

Parameters:

credential_manager – CredSweeper credential manager object
scanner – CredSweeper scanner object
pool_count (int) – number of worker processes in the pool used for multiprocessing scanning
config – dictionary variable, stores analyzer features
json_filename (Optional[str]) – name of the JSON file to which credential candidates are exported

class MlValidator(threshold)

Bases: object

ML validation class

encode(line, char_to_index)

Encodes line to array

Return type:: ndarray
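The description above does not say how a line becomes an array. Character-level models commonly use one-hot encoding, so the sketch below assumes that scheme, a lower-cased input, and a fixed maximum length; `one_hot_encode` and `max_len` are hypothetical names, and this is not necessarily what `MlValidator.encode` does internally.

```python
from typing import Dict

import numpy as np


def one_hot_encode(line: str, char_to_index: Dict[str, int], max_len: int = 8) -> np.ndarray:
    """Hypothetical one-hot encoder: one row per character position, one column per known character."""
    result = np.zeros((max_len, len(char_to_index)), dtype=np.float32)
    # Keep at most max_len characters; characters missing from the vocabulary leave an all-zero row
    for position, char in enumerate(line.lower()[:max_len]):
        index = char_to_index.get(char)
        if index is not None:
            result[position, index] = 1.0
    return result


char_to_index = {c: i for i, c in enumerate("abc=")}
encoded = one_hot_encode("a=B", char_to_index)
print(encoded.shape)  # (8, 4)
```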

extract_common_features(candidates)

Extract features that are guaranteed to be the same for all candidates on the same line with the same value.

Return type:: ndarray

extract_unique_features(candidates)

Extract features that can be different between candidates, and join them with the OR operator.

Return type:: ndarray
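The split between the two methods above can be illustrated as follows: features shared by the whole group only need to be computed once (here from the first candidate), while per-candidate boolean flags are merged with a logical OR, so a flag is set if any candidate in the group has it. The candidate layout and both features are hypothetical placeholders, not CredSweeper's actual feature set.

```python
import numpy as np

# Hypothetical group: two candidates found on the same line with the same value,
# each carrying its own per-rule boolean flags
candidates = [
    {"line": "password = 'abc123'", "flags": np.array([True, False, False])},
    {"line": "password = 'abc123'", "flags": np.array([False, False, True])},
]


def extract_common(group):
    """Features identical across the group: compute them from a single member."""
    line = group[0]["line"]
    return np.array([len(line), line.count("=")], dtype=np.float32)


def extract_unique(group):
    """Features that may differ between candidates: combine them with logical OR."""
    return np.logical_or.reduce([candidate["flags"] for candidate in group])


print(extract_unique(candidates).tolist())  # [True, False, True]
```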

get_group_features(value, candidates)

Uses np.newaxis to add a new leading dimension, so that the input is treated as a batch.

Return type:: Tuple[ndarray, ndarray]
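Indexing with np.newaxis in front prepends an axis of length 1, turning a single sample into a batch of one. The array shapes below are hypothetical and only show the effect on dimensions.

```python
import numpy as np

line_features = np.zeros((8, 4), dtype=np.float32)  # one encoded line (hypothetical shape)
value_features = np.zeros(5, dtype=np.float32)      # one feature vector (hypothetical size)

# Prepend a batch axis of length 1 so a model expecting batches sees a single-sample batch
line_batch = line_features[np.newaxis, :]
value_batch = value_features[np.newaxis]

print(line_batch.shape, value_batch.shape)  # (1, 8, 4) (1, 5)
```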

validate(candidate)

Validate single credential candidate.

Return type:: Tuple[bool, float]

validate_groups(group_list, batch_size)

Apply the ML model to a list of candidate groups.

Parameters:

group_list (List[Tuple[str, List[Candidate]]]) – List of tuples (value, group)
batch_size (int) – batch size used for ML model inference

Return type:

Tuple[ndarray, ndarray]

Returns:

A boolean NumPy array of decisions based on the threshold, and a NumPy array of probabilities predicted by the model
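A minimal sketch of batched inference followed by thresholding, matching the two return values described above. It assumes a model exposed as a plain `predict` callable, a numeric threshold (the mapping from `ThresholdPreset.medium` to a number is not documented here, so 0.5 is a placeholder), and a strict `>` comparison; `validate_in_batches` and `mean_predict` are hypothetical.

```python
from typing import Callable, Tuple

import numpy as np


def validate_in_batches(
    features: np.ndarray,
    predict: Callable[[np.ndarray], np.ndarray],
    batch_size: int,
    threshold: float,
) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical sketch: run `predict` on fixed-size slices, then apply the threshold."""
    chunks = [predict(features[start:start + batch_size]) for start in range(0, len(features), batch_size)]
    probability = np.concatenate(chunks) if chunks else np.empty(0)
    # Boolean decisions and raw probabilities, in the same order as the input rows
    return probability > threshold, probability


def mean_predict(batch: np.ndarray) -> np.ndarray:
    """Stand-in model: the "probability" is the mean of each feature row."""
    return batch.mean(axis=1)


features = np.array([[0.9, 0.7], [0.1, 0.2], [0.6, 0.8]])
is_cred, prob = validate_in_batches(features, mean_predict, batch_size=2, threshold=0.5)
print(is_cred.tolist())  # [True, False, True]
```

Batching bounds the memory used per model call while still letting the model process several groups at once.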

property config: Config

config getter

Return type:: Config

data_scan(data_provider, depth, recursive_limit_size)

Recursively scan files that may be containers, such as ZIP archives.

Parameters:

data_provider (DataContentProvider) – DataContentProvider object, which may be a container
depth (int) – maximum recursion depth
recursive_limit_size (int) – maximum number of bytes of opened files, used to prevent recursive zip-bomb attacks

Return type:

List[Candidate]
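The recursion with a depth limit and a byte budget can be sketched with the standard `zipfile` module. This is a hypothetical illustration, not CredSweeper's implementation: `scan_container` returns `(path, content)` leaves instead of `Candidate` objects, handles only ZIP containers, and trusts the uncompressed size declared in the archive, which a crafted archive could misreport.

```python
import io
import zipfile
from typing import List, Tuple


def scan_container(name: str, data: bytes, depth: int, recursive_limit_size: int) -> List[Tuple[str, bytes]]:
    """Hypothetical sketch: collect (path, content) leaves, descending into ZIP archives."""
    results: List[Tuple[str, bytes]] = []
    if depth > 0 and zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue
                # Zip-bomb guard: skip members whose declared size exceeds the remaining budget
                if info.file_size > recursive_limit_size:
                    continue
                recursive_limit_size -= info.file_size
                inner = archive.read(info)
                results.extend(scan_container(f"{name}/{info.filename}", inner, depth - 1, recursive_limit_size))
    else:
        # Not a container, or the depth limit is reached: treat as a plain file to scan
        results.append((name, data))
    return results


# Build outer.zip containing inner.zip (with secret.txt) and readme.txt, all in memory
inner_buf = io.BytesIO()
with zipfile.ZipFile(inner_buf, "w") as z:
    z.writestr("secret.txt", "password = 'abc123'")
outer_buf = io.BytesIO()
with zipfile.ZipFile(outer_buf, "w") as z:
    z.writestr("inner.zip", inner_buf.getvalue())
    z.writestr("readme.txt", "hello")

for path, _ in scan_container("outer.zip", outer_buf.getvalue(), depth=2, recursive_limit_size=1 << 20):
    print(path)
```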

export_results()

Save credential candidates to a JSON file or print them to the console.

Return type:: None
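The JSON-or-console behaviour described above could look like the following sketch. The candidate dictionaries and the `export_candidates` function are hypothetical; the real method works on `Candidate` objects and its JSON layout is not specified here.

```python
import json
from typing import Any, Dict, List, Optional


def export_candidates(candidates: List[Dict[str, Any]], json_filename: Optional[str] = None) -> None:
    """Hypothetical sketch: write candidates to a JSON file if a name is given, otherwise print them."""
    if json_filename:
        with open(json_filename, "w", encoding="utf-8") as f:
            json.dump(candidates, f, indent=4)
    else:
        for candidate in candidates:
            print(candidate)


export_candidates([{"rule": "Password", "path": "config.py", "line": "password = 'abc123'"}])
```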

file_scan(content_provider)

Scan the file provided by ‘content_provider’.

Parameters:: content_provider (ContentProvider) – content provider object to scan
Return type:: List[Candidate]
Returns:: list of credential candidates from scanned file

property ml_validator: MlValidator

ml_validator getter

Return type:: MlValidator

classmethod pool_initializer()

Ignore SIGINT in child processes.

Return type:: None
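Ignoring SIGINT in pool workers is a common pattern: on Ctrl+C only the parent process receives `KeyboardInterrupt` and can shut the pool down, instead of every worker printing its own traceback. The sketch below assumes this is the intent; `ignore_sigint` and `run_in_pool` are hypothetical stand-ins for `pool_initializer` and the scanning loop.

```python
import multiprocessing
import signal
from typing import Callable, List


def ignore_sigint() -> None:
    """Pool initializer: worker processes ignore Ctrl+C so only the parent handles it."""
    signal.signal(signal.SIGINT, signal.SIG_IGN)


def run_in_pool(task: Callable[[str], int], items: List[str], pool_count: int) -> List[int]:
    """Hypothetical sketch: run `task` sequentially or in a pool of `pool_count` processes."""
    if pool_count <= 1:
        return [task(item) for item in items]
    # `task` must be picklable (e.g. a module-level function) to be sent to the workers
    with multiprocessing.Pool(pool_count, initializer=ignore_sigint) as pool:
        return pool.map(task, items)
```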

post_processing()

Run machine learning validation on the received credential candidates.

Return type:: None

run(content_provider)

Run an analysis of the ‘content_provider’ object.

Parameters:: content_provider (FilesProvider) – path objects to scan
Return type:: int

scan(content_providers)

Scan the files given in the “content_providers” argument.

Parameters:: content_providers (Union[List[DiffContentProvider], List[TextContentProvider]]) – file objects to scan
Return type:: None