ml_model package

Submodules

ml_model.features module

Most rules are described in ‘Secrets in Source Code: Reducing False Positives Using Machine Learning’.

class credsweeper.ml_model.features.Feature

Bases: ABC

Base class for features.

abstract extract(candidate)

Return type: Any
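As a rough illustration of the subclassing pattern, the sketch below defines a stand-in Feature base class and one concrete feature. ValueLength and the candidate.value attribute are hypothetical names chosen for the example; they are not part of the documented API.

```python
from abc import ABC, abstractmethod
from types import SimpleNamespace
from typing import Any


class Feature(ABC):
    """Stand-in for the documented base class (assumed shape, not the real one)."""

    @abstractmethod
    def extract(self, candidate: Any) -> Any:
        raise NotImplementedError


class ValueLength(Feature):
    """Hypothetical feature: length of the candidate value."""

    def extract(self, candidate: Any) -> int:
        # Assumption: the candidate exposes its value as `candidate.value`.
        return len(candidate.value)


# A minimal fake candidate, used only for this illustration.
demo_candidate = SimpleNamespace(value="abc123")
demo_length = ValueLength().extract(demo_candidate)
```

Because extract is abstract, the base class cannot be instantiated directly; only subclasses that implement extract can.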

class credsweeper.ml_model.features.FileExtension(extensions)

Bases: Feature

Categorical feature of file type.

Parameters: extensions (List[str]) – extension labels

extract(candidate)

Return type: Any
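One common way to turn a categorical label into model input is one-hot encoding against the list of known labels. The sketch below assumes that approach; the helper name one_hot_extension and the all-zeros result for unknown extensions are assumptions for illustration, not the documented behavior.

```python
from typing import List


def one_hot_extension(file_type: str, extensions: List[str]) -> List[int]:
    """Return a one-hot vector marking the position of `file_type` in `extensions`.

    Assumption: an extension missing from the label list yields all zeros.
    """
    return [1 if file_type == ext else 0 for ext in extensions]


labels = [".py", ".json", ".yaml"]
known = one_hot_extension(".json", labels)
unknown = one_hot_extension(".exe", labels)
```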

class credsweeper.ml_model.features.HartleyEntropy(base, norm=False)

Bases: RenyiEntropy

Hartley entropy feature.

class credsweeper.ml_model.features.HasHtmlTag

Bases: Feature

Feature is true if the line has HTML tags (HTML file).

extract(candidate)

Return type: bool

class credsweeper.ml_model.features.IsSecretNumeric

Bases: Feature

Feature is true if the candidate value is numeric.

extract(candidate)

Return type: bool

class credsweeper.ml_model.features.PossibleComment

Bases: Feature

Feature is true if the candidate line starts with #, * or /*? (possible comment).

extract(candidate)

Return type: bool

class credsweeper.ml_model.features.RenyiEntropy(base, alpha, norm=False)

Bases: Feature

Renyi entropy.

See the following link for details: https://digitalassets.lib.berkeley.edu/math/ucb/text/math_s4_v1_article-27.pdf

Parameters:

alpha (float) – entropy parameter
norm – set to True to normalize output probabilities

CHARS: Dict[Base, Chars] = {
    <Base.base36: 'base36'>: <Chars.BASE36_CHARS: 'abcdefghijklmnopqrstuvwxyz1234567890'>,
    <Base.base64: 'base64'>: <Chars.BASE64_CHARS: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/='>,
    <Base.hex: 'hex'>: <Chars.HEX_CHARS: '1234567890abcdefABCDEF'>,
}

estimate_entropy(p_x)

Calculate the Renyi entropy of the ‘p_x’ sequence.

The function is based on the definition of Renyi entropy for an arbitrary probability distribution. See the following link for details: https://digitalassets.lib.berkeley.edu/math/ucb/text/math_s4_v1_article-27.pdf

Return type: float

extract(candidate)

Return type: ndarray

get_probabilities(data)

Get list of alphabet’s characters present in the input string.

Return type: ndarray
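For orientation, the textbook Renyi entropy of order alpha for a distribution p is log(sum(p_i ** alpha)) / (1 - alpha); alpha = 0 gives Hartley entropy and the limit alpha -> 1 gives Shannon entropy. The sketch below implements that definition with the standard library only. It is not the library's code: the base-2 logarithm, the handling of characters outside the alphabet, and the plain-list return types (instead of ndarray) are assumptions, and the norm parameter is not modeled.

```python
import math
from collections import Counter
from typing import List

HEX_CHARS = "1234567890abcdefABCDEF"  # copied from the CHARS mapping in this page


def get_probabilities(data: str, alphabet: str) -> List[float]:
    """Relative frequency of each alphabet character that occurs in `data`.

    Assumption: characters outside the alphabet are ignored.
    """
    counts = Counter(ch for ch in data if ch in alphabet)
    total = sum(counts.values())
    if total == 0:
        return []
    return [counts[ch] / total for ch in alphabet if counts[ch] > 0]


def renyi_entropy(p_x: List[float], alpha: float) -> float:
    """Renyi entropy of order `alpha` in bits (base-2 logarithm is an assumption)."""
    p = [v for v in p_x if v > 0]  # drop zeros so that alpha = 0 counts the support
    if not p:
        return 0.0
    if math.isclose(alpha, 1.0):
        # Shannon entropy is the limit of Renyi entropy as alpha -> 1.
        return -sum(v * math.log2(v) for v in p)
    return math.log2(sum(v ** alpha for v in p)) / (1.0 - alpha)


probs = get_probabilities("aaab", HEX_CHARS)
hartley = renyi_entropy(probs, 0.0)   # log2 of the number of distinct characters
shannon = renyi_entropy(probs, 1.0)
```

For a uniform distribution all orders agree; for a skewed one such as "aaab", Shannon entropy is strictly below Hartley entropy.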

class credsweeper.ml_model.features.RuleName(rule_names)

Bases: Feature

Categorical feature that corresponds to the rule name.

Parameters: rule_names (List[str]) – rule name labels

extract(candidate)

Return type: Any

class credsweeper.ml_model.features.ShannonEntropy(base, norm=False)

Bases: RenyiEntropy

Shannon entropy feature.

base: Base

class credsweeper.ml_model.features.WordInLine(words)

Bases: Feature

Feature is true if the line contains at least one word from a predefined list.

extract(candidate)

Return type: bool
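The check described here can be sketched as a simple substring test. Case-insensitive matching and the helper name word_in_line are assumptions made for the example; the library may match differently.

```python
from typing import List


def word_in_line(line: str, words: List[str]) -> bool:
    """True if any listed word occurs in the line.

    Assumption: matching is by substring and ignores case.
    """
    lowered = line.lower()
    return any(word.lower() in lowered for word in words)


hit = word_in_line('API_KEY = "0123abcd"', ["key", "token"])
miss = word_in_line("x = compute(y)", ["key", "token"])
```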

class credsweeper.ml_model.features.WordInPath(words)

Bases: Feature

Feature is true if the candidate path contains at least one word from a predefined list.

extract(candidate)

Return type: bool

class credsweeper.ml_model.features.WordInSecret(words)

Bases: Feature

Feature is true if the candidate value contains at least one word from a predefined list.

extract(candidate)

Return type: bool

ml_model.ml_validator module

class credsweeper.ml_model.ml_validator.MlValidator(threshold)

Bases: object

ML validation class.

encode(line, char_to_index)

Encode a line to an array.

Return type: ndarray
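The documentation does not say how the array is laid out. One plausible scheme, sketched below purely as an illustration, is a fixed-length one-hot encoding per character. The max_len parameter, the reserved index 0 for unknown characters, and the nested-list return value (instead of ndarray) are all assumptions of this sketch.

```python
from typing import Dict, List


def encode(line: str, char_to_index: Dict[str, int], max_len: int = 8) -> List[List[int]]:
    """One-hot encode up to `max_len` characters of `line`.

    Assumptions: indices in `char_to_index` start at 1, index 0 is reserved
    for characters missing from the mapping, and short lines are zero-padded.
    """
    num_classes = len(char_to_index) + 1
    result = [[0] * num_classes for _ in range(max_len)]
    for position, char in enumerate(line[:max_len]):
        result[position][char_to_index.get(char, 0)] = 1
    return result


mapping = {"a": 1, "b": 2}
encoded = encode("ab?", mapping, max_len=4)
```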

extract_common_features(candidates)

Extract features that are guaranteed to be the same for all candidates on the same line with the same value.

Return type: ndarray

extract_unique_features(candidates)

Extract features that can differ between candidates. Join them with the OR operator.

Return type: ndarray
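Joining per-candidate boolean features with OR means a feature is set for the group if it is set for any candidate in it. A minimal sketch, using plain lists rather than ndarray and a hypothetical helper name join_with_or:

```python
from typing import List


def join_with_or(feature_rows: List[List[bool]]) -> List[bool]:
    """Element-wise OR across rows; each row holds one candidate's features."""
    return [any(column) for column in zip(*feature_rows)]


rows = [
    [True, False, False],   # features of the first candidate
    [False, False, True],   # features of the second candidate
]
joined = join_with_or(rows)
```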

get_group_features(value, candidates)

np.newaxis is used to add a new dimension in front, so the input is treated as a batch.

Return type: Tuple[ndarray, ndarray]

validate(candidate)

Validate a single credential candidate.

Return type: Tuple[bool, float]

validate_groups(group_list, batch_size)

Use the ML model on a list of candidate groups.

Parameters:

group_list (List[Tuple[str, List[Candidate]]]) – List of tuples (value, group)
batch_size (int) – ML model batch size

Return type:

Tuple[ndarray, ndarray]

Returns:

Boolean numpy array with the decision based on the threshold, and numpy array with the probability predicted by the model
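The batching and thresholding described above can be sketched roughly as follows. The predict callable is a hypothetical stand-in for the model, and whether the real comparison against the threshold is inclusive is not stated in the documentation; the sketch assumes >= and returns plain lists instead of numpy arrays.

```python
from typing import Callable, List, Sequence, Tuple


def validate_in_batches(
    items: Sequence[str],
    predict: Callable[[Sequence[str]], List[float]],
    threshold: float,
    batch_size: int,
) -> Tuple[List[bool], List[float]]:
    """Run `predict` over `items` in chunks, then apply the threshold."""
    probabilities: List[float] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        probabilities.extend(predict(batch))
    # Assumption: a probability equal to the threshold counts as a positive.
    decisions = [p >= threshold for p in probabilities]
    return decisions, probabilities


def fake_predict(batch: Sequence[str]) -> List[float]:
    """Toy model for the example only: longer strings score higher."""
    return [len(value) / 10 for value in batch]


decisions, probabilities = validate_in_batches(
    ["abc", "abcdefgh", "a"], fake_predict, threshold=0.5, batch_size=2
)
```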

Module contents