Credsweeper package

CredSweeper

class credsweeper.app.CredSweeper(rule_path: None | str | Path = None, config_path: str | None = None, api_validation: bool = False, json_filename: None | str | Path = None, xlsx_filename: None | str | Path = None, hashed: bool = False, subtext: bool = False, sort_output: bool = False, use_filters: bool = True, pool_count: int = 1, ml_batch_size: int | None = None, ml_threshold: float | ThresholdPreset = ThresholdPreset.medium, ml_config: None | str | Path = None, ml_model: None | str | Path = None, ml_providers: str | None = None, find_by_ext: bool = False, depth: int = 0, doc: bool = False, severity: Severity | str = Severity.INFO, size_limit: str | None = None, exclude_lines: List[str] | None = None, exclude_values: List[str] | None = None, log_level: str | None = None)

Bases: object

Advanced credential analyzer base class.

Parameters:
  • credential_manager – CredSweeper credential manager object

  • scanner – CredSweeper scanner object

  • pool_count – number of worker processes used for multiprocessing scanning

  • config – dictionary variable, stores analyzer features

  • json_filename – string variable, credential candidates export filename

class MlValidator(threshold: float | ThresholdPreset, ml_config: None | str | Path = None, ml_model: None | str | Path = None, ml_providers: str | None = None)

Bases: object

ML validation class

CHAR_INDEX = {'\x00': 0, '\t': 96, '\n': 97, '\x0b': 99, '\x0c': 100, '\r': 98, ' ': 95, '!': 63, '"': 64, '#': 65, '$': 66, '%': 67, '&': 68, "'": 69, '(': 70, ')': 71, '*': 72, '+': 73, ',': 74, '-': 75, '.': 76, '/': 77, '0': 1, '1': 2, '2': 3, '3': 4, '4': 5, '5': 6, '6': 7, '7': 8, '8': 9, '9': 10, ':': 78, ';': 79, '<': 80, '=': 81, '>': 82, '?': 83, '@': 84, 'A': 37, 'B': 38, 'C': 39, 'D': 40, 'E': 41, 'F': 42, 'G': 43, 'H': 44, 'I': 45, 'J': 46, 'K': 47, 'L': 48, 'M': 49, 'N': 50, 'O': 51, 'P': 52, 'Q': 53, 'R': 54, 'S': 55, 'T': 56, 'U': 57, 'V': 58, 'W': 59, 'X': 60, 'Y': 61, 'Z': 62, '[': 85, '\\': 86, ']': 87, '^': 88, '_': 89, '`': 90, 'a': 11, 'b': 12, 'c': 13, 'd': 14, 'e': 15, 'f': 16, 'g': 17, 'h': 18, 'i': 19, 'j': 20, 'k': 21, 'l': 22, 'm': 23, 'n': 24, 'o': 25, 'p': 26, 'q': 27, 'r': 28, 's': 29, 't': 30, 'u': 31, 'v': 32, 'w': 33, 'x': 34, 'y': 35, 'z': 36, '{': 91, '|': 92, '}': 93, '~': 94, 'ÿ': 101}
MAX_LEN = 160
NON_ASCII = 'ÿ'
NUM_CLASSES = 102
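The CHAR_INDEX table above appears to follow a simple ordering: '\x00' is 0, followed by digits, lowercase letters, uppercase letters, string.punctuation, string.whitespace and finally NON_ASCII. The sketch below rebuilds the table from Python's string module to make that layout visible and to check it against NUM_CLASSES. It is a reconstruction inferred from the listed values, not necessarily how the source builds the table.

```python
import string

NON_ASCII = "ÿ"  # U+00FF, the fallback class listed above
NUM_CLASSES = 102

# Inferred ordering: padding symbol at 0, then the character classes in the
# order their indices appear in the table, then the non-ASCII fallback.
ALPHABET = (string.digits + string.ascii_lowercase + string.ascii_uppercase
            + string.punctuation + string.whitespace + NON_ASCII)
CHAR_INDEX = {"\x00": 0, **{ch: i for i, ch in enumerate(ALPHABET, start=1)}}

# One class per entry, matching NUM_CLASSES.
assert len(CHAR_INDEX) == NUM_CLASSES
print(CHAR_INDEX["0"], CHAR_INDEX["a"], CHAR_INDEX["A"], CHAR_INDEX["!"], CHAR_INDEX[" "])
# 1 11 37 63 95
```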
static encode(text: str, limit: int) → ndarray

Encodes prepared text into an array.
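A minimal sketch of what encoding text "to an array" with CHAR_INDEX and NUM_CLASSES could look like, assuming a one-hot layout: one row per character position, truncated or padded to limit, with padding mapped to '\x00' (index 0) and unknown characters mapped to NON_ASCII. These rules are assumptions, and plain lists stand in for the ndarray the real method returns.

```python
import string
from typing import List

NON_ASCII = "ÿ"
NUM_CLASSES = 102
# Same table as CHAR_INDEX above, rebuilt compactly.
CHAR_INDEX = {"\x00": 0, **{ch: i for i, ch in enumerate(
    string.digits + string.ascii_lowercase + string.ascii_uppercase
    + string.punctuation + string.whitespace + NON_ASCII, start=1)}}


def encode(text: str, limit: int) -> List[List[int]]:
    """Hypothetical one-hot encoding of `text` into exactly `limit` rows."""
    rows = []
    for pos in range(limit):
        # Assumption: positions past the end of the text are padded with '\x00'.
        ch = text[pos] if pos < len(text) else "\x00"
        # Assumption: characters missing from the table use the NON_ASCII class.
        index = CHAR_INDEX.get(ch, CHAR_INDEX[NON_ASCII])
        row = [0] * NUM_CLASSES
        row[index] = 1
        rows.append(row)
    return rows


print([row.index(1) for row in encode("pw=1", 6)])  # [26, 33, 81, 2, 0, 0]
```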

static encode_line(text: str, position: int)

Encodes line with balancing for position
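One plausible reading of "balancing for position" is choosing a window of at most MAX_LEN characters that keeps the candidate's position near the middle of the encoded line. The sketch below illustrates that idea; balanced_window is a hypothetical helper and the centering rule is an assumption, not the documented behaviour of encode_line.

```python
MAX_LEN = 160


def balanced_window(text: str, position: int, max_len: int = MAX_LEN) -> str:
    """Hypothetical: cut `text` to at most `max_len` chars around `position`."""
    if len(text) <= max_len:
        return text
    # Start half a window before the position, then clamp so the window
    # never runs past either end of the line.
    start = max(0, position - max_len // 2)
    start = min(start, len(text) - max_len)
    return text[start:start + max_len]


line = "a" * 250 + "X" + "a" * 249
window = balanced_window(line, 250)
print(len(window), window.index("X"))  # 160 80
```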

static encode_value(text: str) → ndarray

Encodes a candidate value into an array.

extract_common_features(candidates: List[Candidate]) → ndarray

Extract features that are guaranteed to be the same for all candidates on the same line with the same value.

extract_features(candidates: List[Candidate]) → ndarray

Extracts common and unique features from a list of candidates.

extract_unique_features(candidates: List[Candidate]) → ndarray

Extract features that can differ between candidates and join them with the OR operator.
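To illustrate how common and unique features could combine, here is a sketch assuming each candidate contributes a binary vector (for example, which rules fired) and that the group's unique features are the element-wise OR of those vectors, appended to the shared features. join_unique_features and the concatenation order are hypothetical; only the OR join comes from the description above.

```python
from typing import List


def join_unique_features(per_candidate: List[List[int]]) -> List[int]:
    """Element-wise OR of binary feature vectors, one vector per candidate."""
    joined = [0] * len(per_candidate[0])
    for vector in per_candidate:
        joined = [a | b for a, b in zip(joined, vector)]
    return joined


def extract_features(common: List[int], per_candidate: List[List[int]]) -> List[int]:
    """Hypothetical: shared features followed by the OR-joined unique ones."""
    return common + join_unique_features(per_candidate)


# Two candidates on the same line with the same value, flagged by different rules.
rule_hits = [[1, 0, 0], [0, 0, 1]]
print(extract_features([7], rule_hits))  # [7, 1, 0, 1]
```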

get_group_features(candidates: List[Candidate]) → Tuple[ndarray, ndarray, ndarray, ndarray]

np.newaxis is used to add a new dimension in front, so the input is treated as a batch.
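The np.newaxis trick mentioned above can be shown in isolation. This assumes NumPy is installed (the documented return types are NumPy arrays); the feature values are made up for illustration.

```python
import numpy as np

# Features for a single candidate group: a 1-D vector of length 4.
features = np.array([0.1, 0.0, 1.0, 0.5])

# Indexing with np.newaxis prepends an axis, giving shape (1, 4): a batch
# holding exactly one sample, the layout a model's predict call expects.
batch = features[np.newaxis, :]
print(features.shape, batch.shape)  # (4,) (1, 4)

# np.expand_dims(features, axis=0) produces the same shape.
assert np.expand_dims(features, axis=0).shape == batch.shape
```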

validate_groups(group_list: List[Tuple[CandidateKey, List[Candidate]]], batch_size: int) → Tuple[ndarray, ndarray]

Applies the ML model to a list of candidate groups.

Parameters:
  • group_list – List of tuples (value, group)

  • batch_size – batch size used for ML model inference

Returns:

Boolean numpy array with decision based on the threshold, and numpy array with probability predicted by the model
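The return value pairs threshold-based decisions with raw probabilities. Below is a sketch of that shape under stated assumptions: samples are fed to the model in slices of batch_size, and a probability strictly above the threshold counts as a credential. validate_in_batches and fake_predict are hypothetical, and lists stand in for NumPy arrays.

```python
from typing import Callable, List, Sequence, Tuple


def validate_in_batches(
    samples: Sequence[str],
    predict: Callable[[Sequence[str]], List[float]],
    batch_size: int,
    threshold: float,
) -> Tuple[List[bool], List[float]]:
    """Hypothetical batched inference returning (decisions, probabilities)."""
    probabilities: List[float] = []
    for start in range(0, len(samples), batch_size):
        # The model never sees more than `batch_size` samples per call.
        probabilities.extend(predict(samples[start:start + batch_size]))
    # Assumption: strictly above the threshold means "credential".
    decisions = [p > threshold for p in probabilities]
    return decisions, probabilities


# Usage with a stand-in model that returns fixed scores.
SCORES = {"pw=x": 0.2, "password=S3cr3t!": 0.9, "token=abc123": 0.7}
batch_sizes: List[int] = []


def fake_predict(batch: Sequence[str]) -> List[float]:
    batch_sizes.append(len(batch))
    return [SCORES[s] for s in batch]


decisions, probs = validate_in_batches(list(SCORES), fake_predict, batch_size=2, threshold=0.5)
print(decisions, batch_sizes)  # [False, True, True] [2, 1]
```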

property config: Config

config getter

export_results() → None

Save credential candidates to a JSON file or print them to the console.
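The file-or-console choice described above can be sketched as a simple branch on whether an output filename was configured. The candidate dictionaries and this free-standing export_results function are hypothetical; the real method works on the instance's own results and also supports other options (such as xlsx_filename) not modelled here.

```python
import json
import os
import tempfile
from typing import List, Optional


def export_results(candidates: List[dict], json_filename: Optional[str] = None) -> None:
    """Hypothetical: write candidates to JSON if a filename is set, else print."""
    if json_filename:
        with open(json_filename, "w", encoding="utf-8") as f:
            json.dump(candidates, f, indent=4)
    else:
        for candidate in candidates:
            print(candidate)


# Usage: export to a temporary JSON file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.json")
    export_results([{"rule": "Password", "value": "***"}], path)
    with open(path, encoding="utf-8") as f:
        print(json.load(f))  # [{'rule': 'Password', 'value': '***'}]
```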

file_scan(content_provider: DiffContentProvider | TextContentProvider) → List[Candidate]

Run scanning of a file from ‘content_provider’.

Parameters:

content_provider – content provider object to scan

Returns:

list of credential candidates from scanned file

files_scan(content_providers: Sequence[DiffContentProvider | TextContentProvider]) → List[Candidate]

Auxiliary method to scan one sequence of content providers.

property ml_validator: MlValidator

ml_validator getter

static pool_initializer(log_kwargs) → None

Ignore SIGINT in child processes.
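Ignoring SIGINT in workers is a common multiprocessing pattern: Ctrl+C reaches the whole foreground process group, so without it every child would raise KeyboardInterrupt on its own. The sketch wires such an initializer into multiprocessing.Pool via initializer/initargs. Treating log_kwargs as logging.basicConfig() arguments is an assumption, and square and parallel_squares are hypothetical helpers.

```python
import logging
import multiprocessing
import signal
from typing import List


def pool_initializer(log_kwargs: dict) -> None:
    """Runs once in every worker process before it receives tasks."""
    # Workers ignore Ctrl+C, so only the parent sees KeyboardInterrupt
    # and can shut the pool down cleanly.
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    # Assumption: log_kwargs are keyword arguments for logging.basicConfig().
    logging.basicConfig(**log_kwargs)


def square(x: int) -> int:
    return x * x


def parallel_squares(values: List[int], pool_count: int) -> List[int]:
    """Hypothetical fan-out; call it under `if __name__ == "__main__":`."""
    with multiprocessing.Pool(processes=pool_count,
                              initializer=pool_initializer,
                              initargs=({"level": logging.INFO},)) as pool:
        return pool.map(square, values)
```

Called from a script's main guard, parallel_squares([1, 2, 3], 2) would return [1, 4, 9], with both workers already ignoring SIGINT.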

post_processing() → None

Machine learning validation for received credential candidates.
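validate_groups takes a list of (CandidateKey, List[Candidate]) pairs, and extract_common_features relies on candidates sharing a line and a value. The sketch shows one way such groups could be formed before ML validation. It assumes a key of (file path, line number, value); that key layout, the Candidate stand-in and group_candidates are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical key layout: (file path, line number, value).
CandidateKey = Tuple[str, int, str]


@dataclass
class Candidate:
    """Hypothetical stand-in holding only the fields used for grouping."""
    path: str
    line_num: int
    value: str
    rule_name: str


def group_candidates(candidates: List[Candidate]) -> List[Tuple[CandidateKey, List[Candidate]]]:
    """Collect candidates that share a line and a value into one group."""
    groups: Dict[CandidateKey, List[Candidate]] = defaultdict(list)
    for candidate in candidates:
        groups[(candidate.path, candidate.line_num, candidate.value)].append(candidate)
    # dict preserves insertion order, so groups come out in scan order.
    return list(groups.items())


found = [
    Candidate("app.cfg", 3, "S3cr3t!", "Password"),
    Candidate("app.cfg", 3, "S3cr3t!", "Secret"),
    Candidate("app.cfg", 9, "abc123", "Token"),
]
for key, group in group_candidates(found):
    print(key, [c.rule_name for c in group])
# ('app.cfg', 3, 'S3cr3t!') ['Password', 'Secret']
# ('app.cfg', 9, 'abc123') ['Token']
```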

run(content_provider: AbstractProvider) → int

Run an analysis of ‘content_provider’ object.

Parameters:

content_provider – path objects to scan

scan(content_providers: Sequence[DiffContentProvider | TextContentProvider]) → None

Run scanning of files from an argument “content_providers”.

Parameters:

content_providers – file objects to scan