credsweeper.ml_model package

Subpackages

Submodules

credsweeper.ml_model.ml_validator module

class credsweeper.ml_model.ml_validator.MlValidator(threshold: float | ThresholdPreset, ml_config: None | str | Path = None, ml_model: None | str | Path = None, ml_providers: str | None = None)

Bases: object

Validates credential candidates with a machine learning (ML) model.

FAKE_CHAR = '\x01'
MAX_LEN = 128
ZERO_CHAR = '\x00'
encode(text: str, limit: int) → ndarray

Encodes prepared text into an array.
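The encoding scheme itself is not documented here. As a rough illustration only, the sketch below one-hot encodes up to limit characters and pads the remainder with ZERO_CHAR. The alphabet, the use of FAKE_CHAR for unknown characters, and the name encode_sketch are assumptions, not CredSweeper's actual implementation.

```python
import numpy as np

ZERO_CHAR = "\x00"  # padding symbol (value documented above)
FAKE_CHAR = "\x01"  # assumed role: stand-in for characters outside the alphabet
# Hypothetical alphabet; the real character set used by MlValidator is not documented here.
ALPHABET = ZERO_CHAR + FAKE_CHAR + "abcdefghijklmnopqrstuvwxyz0123456789_=:\"' "
CHAR_TO_INDEX = {c: i for i, c in enumerate(ALPHABET)}


def encode_sketch(text: str, limit: int) -> np.ndarray:
    """One-hot encode up to `limit` characters; pad the rest with ZERO_CHAR."""
    result = np.zeros((limit, len(ALPHABET)), dtype=np.float32)
    # Truncate long text, pad short text up to exactly `limit` characters.
    padded = text[:limit].ljust(limit, ZERO_CHAR)
    for i, char in enumerate(padded):
        # Unknown characters (e.g. uppercase here) fall back to the FAKE_CHAR column.
        index = CHAR_TO_INDEX.get(char, CHAR_TO_INDEX[FAKE_CHAR])
        result[i, index] = 1.0
    return result
```

Each row holds exactly one 1.0, so the output shape is always (limit, alphabet size) regardless of the input length.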

encode_line(text: str, position: int)

Encodes a line, balancing the encoded window around the given position.
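A minimal sketch of what "balancing for position" could mean, assuming the goal is to keep the character at position inside a window of at most MAX_LEN characters that starts a quarter of the window before it. The function name balance_window and the quarter offset are hypothetical.

```python
MAX_LEN = 128  # documented value of MlValidator.MAX_LEN


def balance_window(text: str, position: int, max_len: int = MAX_LEN) -> str:
    """Return at most max_len characters of text, keeping `position` inside the window.

    Assumption: the window starts a quarter of max_len before `position`,
    clamped so that it never runs past either end of the string.
    """
    if len(text) <= max_len:
        return text
    start = max(0, position - max_len // 4)    # leave some left context
    start = min(start, len(text) - max_len)    # do not run past the end
    return text[start:start + max_len]
```

Short lines are returned unchanged; long lines are cut to max_len while the interesting position stays near the left quarter of the window.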

encode_value(text: str) → ndarray

Encodes a candidate value into an array.

extract_common_features(candidates: List[Candidate]) → ndarray

Extracts features that are guaranteed to be the same for all candidates on the same line with the same value.

extract_features(candidates: List[Candidate]) → ndarray

Extracts both common and unique features from a list of candidates.

extract_unique_features(candidates: List[Candidate]) → ndarray

Extracts features that may differ between candidates and joins them with a logical OR.
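Joining per-candidate features with OR can be illustrated with NumPy's element-wise logical OR. The boolean vectors below (for example, one flag per rule that reported the candidate) and the helper name join_unique_features are hypothetical; the real feature layout is not documented here.

```python
import numpy as np


def join_unique_features(per_candidate: list) -> np.ndarray:
    """Combine per-candidate boolean feature vectors with element-wise OR."""
    stacked = np.asarray(per_candidate, dtype=bool)  # shape: (n_candidates, n_features)
    return np.logical_or.reduce(stacked, axis=0)     # shape: (n_features,)


# Two candidates on the same line, each flagging a different (hypothetical) feature.
print(join_unique_features([[True, False, False], [False, False, True]]).tolist())
# [True, False, True]
```

A feature is set for the group if any candidate in it has that feature set, so the group collapses to a single vector of fixed length.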

get_group_features(candidates: List[Candidate]) → Tuple[ndarray, ndarray, ndarray, ndarray]

np.newaxis is used to add a new leading dimension, so the input is treated as a batch.
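Indexing with np.newaxis prepends an axis of length 1, turning a single sample into a batch of one. The helper as_batch_of_one and the array shapes below are illustrative assumptions; the four arrays actually returned by get_group_features are not described here.

```python
from typing import Tuple

import numpy as np


def as_batch_of_one(*arrays: np.ndarray) -> Tuple[np.ndarray, ...]:
    """Prepend a batch axis of size 1 to every array via np.newaxis."""
    return tuple(array[np.newaxis] for array in arrays)


# Hypothetical per-group inputs: an encoded line and a feature vector.
encoded_line = np.zeros((128, 44), dtype=np.float32)
features = np.zeros(10, dtype=np.float32)
batched_line, batched_features = as_batch_of_one(encoded_line, features)
print(batched_line.shape, batched_features.shape)  # (1, 128, 44) (1, 10)
```

Batched per-group arrays can later be concatenated along axis 0 to form a larger model batch.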

property session: InferenceSession

Getter for the inference session, used to prevent pickling errors.
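A common way to keep an object picklable while it holds an unpicklable handle is to create the handle lazily in a property and drop it in __getstate__. The sketch assumes MlValidator follows a pattern of this kind; LazyValidator and _ExpensiveSession are hypothetical, the latter standing in for onnxruntime's InferenceSession so the example runs without a model file.

```python
import pickle
from typing import Optional


class _ExpensiveSession:
    """Hypothetical stand-in for an inference session that cannot be pickled."""

    def __init__(self, model_path: str):
        self.model_path = model_path

    def __reduce_ex__(self, protocol):
        raise TypeError("session objects cannot be pickled")


class LazyValidator:
    def __init__(self, model_path: str):
        self.model_path = model_path
        self._session: Optional[_ExpensiveSession] = None

    @property
    def session(self) -> _ExpensiveSession:
        # Create the session on first access instead of in __init__.
        if self._session is None:
            self._session = _ExpensiveSession(self.model_path)
        return self._session

    def __getstate__(self):
        # Drop the unpicklable session; it is recreated lazily after unpickling.
        state = self.__dict__.copy()
        state["_session"] = None
        return state
```

Usage: after `restored = pickle.loads(pickle.dumps(validator))`, the first access to `restored.session` rebuilds the session, which matters when validator objects are sent to worker processes.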

validate_groups(group_list: List[Tuple[CandidateKey, List[Candidate]]], batch_size: int) → Tuple[ndarray, ndarray]

Applies the ML model to a list of candidate groups.

Parameters:
  • group_list – List of tuples (value, group)

  • batch_size – batch size used for ML model inference

Returns:

A boolean NumPy array with the decisions based on the threshold, and a NumPy array with the probabilities predicted by the model.
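The two return values can be sketched as follows, assuming the model is run over the groups in chunks of batch_size and a group is accepted when its probability exceeds the threshold (the strict > comparison is an assumption). validate_in_batches is hypothetical, and predict stands in for the model call.

```python
from typing import Callable, List, Tuple

import numpy as np


def validate_in_batches(
    samples: List[np.ndarray],
    predict: Callable[[np.ndarray], np.ndarray],
    threshold: float,
    batch_size: int,
) -> Tuple[np.ndarray, np.ndarray]:
    """Run `predict` over samples in chunks; return (decisions, probabilities)."""
    probabilities = np.empty(len(samples), dtype=np.float32)
    for start in range(0, len(samples), batch_size):
        batch = np.stack(samples[start:start + batch_size])
        # One probability per sample in the chunk.
        probabilities[start:start + batch_size] = predict(batch).reshape(-1)
    decisions = probabilities > threshold  # assumed strict comparison
    return decisions, probabilities


# Hypothetical model: the probability is the mean of each sample's features.
samples = [np.array([0.1, 0.3]), np.array([0.9, 0.7]), np.array([0.5, 0.6])]
decisions, probs = validate_in_batches(samples, lambda b: b.mean(axis=1), threshold=0.5, batch_size=2)
print(decisions.tolist())  # [False, True, True]
```

Batching bounds memory use per model call, while the decisions are derived from the full probability array in one vectorized comparison.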

Module contents