Credsweeper package

CredSweeper

class credsweeper.app.CredSweeper(rule_path=None, config_path=None, api_validation=False, json_filename=None, xlsx_filename=None, sort_output=False, use_filters=True, pool_count=1, ml_batch_size=16, ml_threshold=ThresholdPreset.medium, azure=False, cuda=False, find_by_ext=False, depth=0, doc=False, severity=Severity.INFO, size_limit=None, exclude_lines=None, exclude_values=None, log_level=None)[source]

Bases: object

Advanced credential analyzer base class.

Parameters:
  • credential_manager – CredSweeper credential manager object

  • scanner – CredSweeper scanner object

  • pool_count (int) – number of worker processes used for multiprocessing scanning

  • config – stores the analyzer configuration

  • json_filename (Union[None, str, Path]) – path of the JSON file to which credential candidates are exported

class MlValidator(threshold, azure=False, cuda=False)

Bases: object

ML validation class

encode(line, char_to_index)

Encode a line into an array using the char_to_index mapping.

Return type:

ndarray
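As an illustration only, the sketch below shows one common way to encode a line with a character-to-index mapping: fixed-length one-hot encoding. The MAX_LEN value and the convention that the last index of char_to_index collects unknown characters are assumptions, not documented CredSweeper behavior.

```python
import numpy as np

# Hypothetical fixed input length; the real model's input length is not documented here.
MAX_LEN = 8


def encode(line: str, char_to_index: dict) -> np.ndarray:
    """One-hot encode `line` into a (MAX_LEN, num_classes) float array.

    Assumption: the last index of `char_to_index` is reserved for
    characters that are missing from the mapping.
    """
    num_classes = len(char_to_index)
    result = np.zeros((MAX_LEN, num_classes), dtype=np.float32)
    # Long lines are truncated; short lines leave trailing all-zero rows (padding).
    for i, char in enumerate(line[:MAX_LEN]):
        index = char_to_index.get(char, num_classes - 1)
        result[i, index] = 1.0
    return result
```

For example, with the mapping {"a": 0, "b": 1, "?": 2}, the line "ab!" sets one cell in each of the first three rows, and "!" falls into the catch-all last column.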

extract_common_features(candidates)

Extract features that are guaranteed to be the same for all candidates on the same line with the same value.

Return type:

ndarray

extract_unique_features(candidates)

Extract features that can differ between candidates, and combine them with a logical OR.

Return type:

ndarray
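A minimal sketch of joining per-candidate features with an OR, assuming each candidate contributes a binary feature vector of the same length. The function name combine_unique_features is hypothetical.

```python
import numpy as np


def combine_unique_features(per_candidate: list) -> np.ndarray:
    """OR-combine binary feature vectors, one per candidate.

    A feature is set in the result if it is set for at least one
    candidate in the group.
    """
    stacked = np.asarray(per_candidate, dtype=bool)  # shape (n_candidates, n_features)
    return np.logical_or.reduce(stacked, axis=0)
```

An OR keeps the result independent of how many candidates share the value: a feature present in any candidate is reported once.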

get_group_features(value, candidates)

np.newaxis is used to add a new dimension in front, so the input is treated as a batch.

Return type:

Tuple[ndarray, ndarray]
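The effect of np.newaxis described above can be seen on a plain NumPy array. The feature values here are made up for the example.

```python
import numpy as np

# A single sample with 4 features, e.g. the output of a feature extractor.
features = np.array([0.1, 0.0, 1.0, 0.5], dtype=np.float32)

# np.newaxis inserts a new leading axis: shape (4,) becomes (1, 4),
# i.e. a batch that contains exactly one sample.
batch = features[np.newaxis]

print(features.shape, batch.shape)  # (4,) (1, 4)
```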

validate(candidate)

Validate single credential candidate.

Return type:

Tuple[bool, float]

validate_groups(group_list, batch_size)

Use ml model on list of candidate groups.

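Parameters:

  • group_list – list of candidate groups to validate

  • batch_size (int) – batch size used for ML model inference

Return type: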

Tuple[ndarray, ndarray]

Returns:

A boolean NumPy array with the decisions based on the threshold, and a NumPy array with the probabilities predicted by the model
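A hypothetical sketch of batched validation under stated assumptions: each group has already been turned into a 1-D feature vector, predict stands in for the ML model and returns one probability per row, and a group is accepted when its probability is strictly greater than the threshold. None of these names come from CredSweeper.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np


def validate_in_batches(
    features: Sequence[np.ndarray],
    predict: Callable[[np.ndarray], np.ndarray],
    threshold: float,
    batch_size: int,
) -> Tuple[np.ndarray, np.ndarray]:
    """Run `predict` over `features` in chunks of `batch_size`.

    Returns a boolean decision array and the raw probability array,
    with one entry per input group.
    """
    probabilities: List[np.ndarray] = []
    for start in range(0, len(features), batch_size):
        # Stack the 1-D feature vectors into a (batch, n_features) matrix.
        batch = np.stack(features[start:start + batch_size])
        probabilities.append(np.asarray(predict(batch)).reshape(-1))
    probability = np.concatenate(probabilities) if probabilities else np.zeros(0)
    # Assumption: strict comparison against the threshold.
    return probability > threshold, probability
```

With a stand-in model that averages the features, three groups and batch_size=2 result in two model calls, of sizes 2 and 1.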

property config: Config

config getter

export_results()[source]

Save credential candidates to a JSON file or print them to the console.

Return type:

None

file_scan(content_provider)[source]

Run scanning of a file from ‘content_provider’.

Parameters:

content_provider (Union[DiffContentProvider, TextContentProvider]) – content provider object to scan

Return type:

List[Candidate]

Returns:

list of credential candidates from scanned file

property ml_validator: MlValidator

ml_validator getter

static pool_initializer(log_kwargs)[source]

Ignore SIGINT in child processes.

Return type:

None
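A minimal sketch of such an initializer with the standard multiprocessing and signal modules. Passing log_kwargs to logging.basicConfig is an assumption made for this example, and make_pool is a hypothetical helper.

```python
import logging
import multiprocessing
import signal


def pool_initializer(log_kwargs: dict) -> None:
    """Runs once in every worker process when the pool starts."""
    # Assumption: log_kwargs are keyword arguments for logging.basicConfig,
    # so that workers log the same way as the parent process.
    logging.basicConfig(**log_kwargs)
    # Ignore Ctrl+C in workers; the parent process receives SIGINT and
    # decides how to shut the pool down (e.g. via pool.terminate()).
    signal.signal(signal.SIGINT, signal.SIG_IGN)


def make_pool(processes: int, log_level: int = logging.WARNING):
    """Create a worker pool whose processes ignore SIGINT."""
    return multiprocessing.Pool(
        processes=processes,
        initializer=pool_initializer,
        initargs=({"level": log_level},),
    )
```

Ignoring SIGINT in the workers leaves Ctrl+C handling to the parent process, which can then stop the pool in one place instead of every worker raising KeyboardInterrupt.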

post_processing()[source]

Run machine learning validation on the received credential candidates.

Return type:

None

run(content_provider)[source]

Run an analysis of the ‘content_provider’ object.

Parameters:

content_provider (AbstractProvider) – provider of the path objects to scan

Return type:

int

scan(content_providers)[source]

Run scanning of the files from the “content_providers” argument.

Parameters:

content_providers (Sequence[Union[DiffContentProvider, TextContentProvider]]) – file objects to scan

Return type:

None
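The following is a hypothetical sketch, not CredSweeper's implementation, of how a list of content providers could be scanned either sequentially or with a pool of pool_count worker processes, with the per-file candidate lists flattened into one list. The names scan_all and scan_one are made up; scan_one stands in for a per-file scan function.

```python
import itertools
from multiprocessing import Pool
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")  # a content provider
C = TypeVar("C")  # a credential candidate


def scan_all(providers: Sequence[P], scan_one: Callable[[P], List[C]], pool_count: int = 1) -> List[C]:
    """Scan every provider and flatten the per-file candidate lists."""
    if pool_count > 1 and len(providers) > 1:
        # scan_one must be a picklable top-level function for Pool.map.
        with Pool(processes=pool_count) as pool:
            per_file = pool.map(scan_one, providers)
    else:
        # Sequential fallback: no process start-up cost for small inputs.
        per_file = [scan_one(provider) for provider in providers]
    return list(itertools.chain.from_iterable(per_file))
```

Falling back to a plain loop when pool_count is 1 avoids starting processes at all, which keeps single-file scans and debugging simple.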