ml_model package

Submodules

ml_model.features module

Most rules are described in ‘Secrets in Source Code: Reducing False Positives Using Machine Learning’.

class credsweeper.ml_model.features.Feature[source]

Bases: ABC

Base class for features.

abstract extract(candidate)[source]
Return type:

Any
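
To illustrate how a concrete feature might plug into this interface, the sketch below defines a stand-alone abstract base and one subclass. It does not import CredSweeper: `Candidate`, its `value` attribute, and `ValueLength` are hypothetical names introduced only for this example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class Candidate:
    """Hypothetical stand-in for a credential candidate."""
    value: str


class Feature(ABC):
    """Stand-alone mirror of the documented interface (not the real class)."""

    @abstractmethod
    def extract(self, candidate: Candidate) -> Any:
        raise NotImplementedError


class ValueLength(Feature):
    """Hypothetical feature: length of the candidate value."""

    def extract(self, candidate: Candidate) -> int:
        return len(candidate.value)
```

Because `extract` is abstract, instantiating `Feature` directly raises `TypeError`; only subclasses that implement `extract` can be created.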

class credsweeper.ml_model.features.FileExtension(extensions)[source]

Bases: Feature

Categorical feature of file type.

Parameters:

extensions (List[str]) – extension labels

extract(candidate)[source]
Return type:

Any
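
One common way to turn a categorical value into model input is a one-hot vector over the known labels. The sketch below shows that idea for file extensions. Whether `FileExtension` encodes its output exactly this way is an assumption, and `one_hot_extension` is a hypothetical helper, not part of the package.

```python
import os
from typing import List


def one_hot_extension(path: str, extensions: List[str]) -> List[int]:
    """Return 1 at the index of the path's extension and 0 elsewhere.

    An extension that is not in the label list yields an all-zero vector.
    """
    _, ext = os.path.splitext(path)
    return [1 if ext == label else 0 for label in extensions]
```

For example, `one_hot_extension("config/settings.yaml", [".py", ".yaml", ".json"])` gives `[0, 1, 0]`.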

class credsweeper.ml_model.features.HartleyEntropy(base, norm=False)[source]

Bases: RenyiEntropy

Hartley entropy feature.

class credsweeper.ml_model.features.HasHtmlTag[source]

Bases: Feature

Feature is true if line has HTML tags (HTML file).

extract(candidate)[source]
Return type:

bool

class credsweeper.ml_model.features.IsSecretNumeric[source]

Bases: Feature

Feature is true if candidate value is a numerical value.

extract(candidate)[source]
Return type:

bool
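
A numeric check of this kind can be sketched by attempting a conversion to `float`. This is only one plausible reading of "numerical value"; the helper name `is_numeric` and the choice of `float` parsing are assumptions, not the documented implementation.

```python
def is_numeric(value: str) -> bool:
    """Return True if the string parses as a Python float (assumed definition)."""
    try:
        float(value)
        return True
    except ValueError:
        return False
```

Under this reading, strings such as `"3.14"` or `"1e5"` count as numeric, while `"abc123"` does not.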

class credsweeper.ml_model.features.PossibleComment[source]

Bases: Feature

Feature is true if candidate line starts with #,*,/*? (Possible comment).

extract(candidate)[source]
Return type:

bool

class credsweeper.ml_model.features.RenyiEntropy(base, alpha, norm=False)[source]

Bases: Feature

Renyi entropy.

See the following link for details: https://digitalassets.lib.berkeley.edu/math/ucb/text/math_s4_v1_article-27.pdf

Parameters:
  • alpha (float) – entropy parameter

  • norm – set True to normalize output probabilities

CHARS: Dict[Base, Chars] = {<Base.base36: 'base36'>: <Chars.BASE36_CHARS: 'abcdefghijklmnopqrstuvwxyz1234567890'>, <Base.base64: 'base64'>: <Chars.BASE64_CHARS: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/='>, <Base.hex: 'hex'>: <Chars.HEX_CHARS: '1234567890abcdefABCDEF'>}

estimate_entropy(p_x)[source]

Calculate the Renyi entropy of the ‘p_x’ sequence.

The function is based on the definition of Renyi entropy for an arbitrary probability distribution. See the following link for details: https://digitalassets.lib.berkeley.edu/math/ucb/text/math_s4_v1_article-27.pdf

Return type:

float
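
For a discrete distribution p, the Renyi entropy of order alpha is log(sum(p_i ** alpha)) / (1 - alpha). It reduces to the Hartley entropy (log of the number of outcomes) at alpha = 0 and to the Shannon entropy in the limit alpha -> 1. The sketch below follows that definition. The function name `renyi_entropy`, the use of base-2 logarithms, and the requirement that all probabilities are non-zero are assumptions of this example, and the `norm` option is not modelled.

```python
import math
from typing import Sequence


def renyi_entropy(p_x: Sequence[float], alpha: float) -> float:
    """Renyi entropy in bits of a distribution with non-zero probabilities."""
    if len(p_x) == 0:
        return 0.0
    if math.isclose(alpha, 0.0, abs_tol=1e-9):
        # Hartley entropy: logarithm of the number of outcomes
        return math.log2(len(p_x))
    if math.isclose(alpha, 1.0, abs_tol=1e-9):
        # Shannon entropy: limit of the general formula as alpha -> 1
        return -sum(p * math.log2(p) for p in p_x)
    # General case
    return math.log2(sum(p ** alpha for p in p_x)) / (1.0 - alpha)
```

For a uniform distribution every order gives the same value, so four equally likely outcomes yield 2 bits whether alpha is 0, 1, or 2.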

extract(candidate)[source]
Return type:

ndarray

get_probabilities(data)[source]

Get the probabilities of the alphabet’s characters that are present in the input string.

Return type:

ndarray
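
A minimal sketch of such an estimate is the relative frequency of each alphabet character that occurs in the string. The helper name `char_probabilities`, the division by the full string length, and the plain-list return type are assumptions made for illustration; the hex alphabet is copied from the `CHARS` attribute above.

```python
from typing import List

HEX_CHARS = "1234567890abcdefABCDEF"  # hex alphabet as listed in CHARS


def char_probabilities(data: str, alphabet: str = HEX_CHARS) -> List[float]:
    """Relative frequency of each alphabet character that occurs in data."""
    if not data:
        return []
    present = [ch for ch in alphabet if ch in data]
    return [data.count(ch) / len(data) for ch in present]
```

Characters outside the alphabet still count toward the string length in this sketch, so the returned values need not sum to 1.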

class credsweeper.ml_model.features.RuleName(rule_names)[source]

Bases: Feature

Categorical feature that corresponds to rule name.

Parameters:

rule_names (List[str]) – rule name labels

extract(candidate)[source]
Return type:

Any

class credsweeper.ml_model.features.ShannonEntropy(base, norm=False)[source]

Bases: RenyiEntropy

Shannon entropy feature.

base: Base

class credsweeper.ml_model.features.WordInLine(words)[source]

Bases: Feature

Feature is true if line contains at least one word from predefined list.

extract(candidate)[source]
Return type:

bool

class credsweeper.ml_model.features.WordInPath(words)[source]

Bases: Feature

Feature is true if candidate path contains at least one word from predefined list.

extract(candidate)[source]
Return type:

bool

class credsweeper.ml_model.features.WordInSecret(words)[source]

Bases: Feature

Feature returns true if candidate value contains at least one word from predefined list.

extract(candidate)[source]
Return type:

bool
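
The three word-based features above describe the same kind of check applied to different text: the line, the path, or the candidate value. A minimal sketch of that check follows. The helper name `word_in_text` and the case-insensitive substring matching are assumptions, not the documented behaviour.

```python
from typing import Iterable


def word_in_text(text: str, words: Iterable[str]) -> bool:
    """True if any word occurs as a substring of text (case-insensitive)."""
    lowered = text.lower()
    return any(word.lower() in lowered for word in words)
```

With this helper, `word_in_text('PASSWORD = "hunter2"', ["password", "token"])` is true, while `word_in_text("x = 1", ["password"])` is false.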

ml_model.ml_validator module

class credsweeper.ml_model.ml_validator.MlValidator(threshold)[source]

Bases: object

ML validation class

encode(line, char_to_index)[source]

Encode a line to an array.

Return type:

ndarray
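
The simplest form of such an encoding maps each character to its integer index. The sketch below shows only that step. The helper name `encode_line`, the fallback index for unknown characters, and the plain-list return type are assumptions; the real method may also pad, truncate, or one-hot encode the result.

```python
from typing import Dict, List


def encode_line(line: str, char_to_index: Dict[str, int], unknown: int = 0) -> List[int]:
    """Map each character to its index; unknown characters map to `unknown`."""
    return [char_to_index.get(ch, unknown) for ch in line]
```

For example, `encode_line("ab?", {"a": 1, "b": 2})` gives `[1, 2, 0]`.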

extract_common_features(candidates)[source]

Extract features that are guaranteed to be the same for all candidates on the same line with the same value.

Return type:

ndarray

extract_unique_features(candidates)[source]

Extract features that can be different between candidates. Join them with the OR operator.

Return type:

ndarray

get_group_features(value, candidates)[source]

np.newaxis is used to add a new dimension in front, so the input will be treated as a batch.

Return type:

Tuple[ndarray, ndarray]
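
The effect of `np.newaxis` can be seen on a small array. The feature values here are made up for illustration and are not taken from the model.

```python
import numpy as np

# A single feature vector with three entries, shape (3,)
features = np.array([0.5, 1.0, 0.0])

# Indexing with np.newaxis prepends a dimension: shape (1, 3),
# i.e. a batch that contains exactly one sample
batch = features[np.newaxis]
```

The data is unchanged; only the shape differs, which is what a model expecting batched input needs.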

validate(candidate)[source]

Validate single credential candidate.

Return type:

Tuple[bool, float]

validate_groups(group_list, batch_size)[source]

Apply the ML model to a list of candidate groups.

Parameters:
Return type:

Tuple[ndarray, ndarray]

Returns:

Boolean numpy array with decision based on the threshold, and numpy array with probability predicted by the model
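
The thresholding step that produces the Boolean array can be sketched as follows. The helper name `decide` and the use of `>=` rather than `>` for the comparison are assumptions; the probabilities would come from the model rather than being passed in by hand.

```python
from typing import Sequence, Tuple

import numpy as np


def decide(probabilities: Sequence[float], threshold: float) -> Tuple[np.ndarray, np.ndarray]:
    """Return (decision, probability) arrays for a batch of predictions."""
    probs = np.asarray(probabilities, dtype=float)
    # Element-wise comparison yields a Boolean array of the same shape
    is_credential = probs >= threshold
    return is_credential, probs
```

For example, `decide([0.2, 0.9, 0.5], threshold=0.5)` marks the second and third entries as credentials under the `>=` assumption.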

Module contents