credsweeper.ml_model package
Submodules
credsweeper.ml_model.features module
Most rules are described in ‘Secrets in Source Code: Reducing False Positives Using Machine Learning’.
- class credsweeper.ml_model.features.FileExtension(extensions)
Bases: Feature
Categorical feature of file type.
- class credsweeper.ml_model.features.HartleyEntropy(base, norm=False)
Bases: RenyiEntropy
Hartley entropy feature.
- class credsweeper.ml_model.features.HasHtmlTag
Bases: Feature
Feature is true if line has HTML tags (HTML file).
- class credsweeper.ml_model.features.IsSecretNumeric
Bases: Feature
Feature is true if candidate value is a numerical value.
- class credsweeper.ml_model.features.PossibleComment
Bases: Feature
Feature is true if candidate line starts with #,*,/*? (Possible comment).
- class credsweeper.ml_model.features.RenyiEntropy(base, alpha, norm=False)
Bases: Feature
Renyi entropy.
See the following link for details: https://digitalassets.lib.berkeley.edu/math/ucb/text/math_s4_v1_article-27.pdf
- Parameters:
alpha (float) – entropy parameter
norm – set True to normalize output probabilities
- CHARS: Dict[Base, Chars] = {<Base.base36: 'base36'>: <Chars.BASE36_CHARS: 'abcdefghijklmnopqrstuvwxyz1234567890'>, <Base.base64: 'base64'>: <Chars.BASE64_CHARS: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/='>, <Base.hex: 'hex'>: <Chars.HEX_CHARS: '0123456789ABCDEFabcdef'>}
- estimate_entropy(p_x)
Calculate the Renyi entropy of the ‘p_x’ sequence.
The function is based on the definition of Renyi entropy for an arbitrary probability distribution. See the following link for details: https://digitalassets.lib.berkeley.edu/math/ucb/text/math_s4_v1_article-27.pdf
- Return type:
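The definition referenced above can be illustrated with a minimal standalone sketch. It is not the library's implementation: the helper names char_probabilities and renyi_entropy are hypothetical, and the handling of alpha = 0 (Hartley entropy) and alpha = 1 (Shannon entropy) as special cases is an assumption based on the standard limits of the Renyi formula H_alpha(p) = log2(sum(p_i ** alpha)) / (1 - alpha).

```python
import math
from collections import Counter
from typing import List, Sequence


def char_probabilities(value: str, alphabet: str) -> List[float]:
    """Relative frequency of each alphabet character that occurs in value.

    Hypothetical helper: characters outside the alphabet are ignored.
    """
    counts = Counter(ch for ch in value if ch in alphabet)
    total = sum(counts.values())
    if total == 0:
        return []
    return [n / total for n in counts.values()]


def renyi_entropy(p_x: Sequence[float], alpha: float) -> float:
    """Renyi entropy (in bits) of a discrete probability distribution.

    Assumption: alpha == 0 is treated as Hartley entropy and alpha == 1
    as Shannon entropy, the usual limits of the general formula.
    """
    probs = [p for p in p_x if p > 0]
    if not probs:
        return 0.0
    if math.isclose(alpha, 0.0, abs_tol=1e-9):
        # Hartley entropy: log of the number of symbols with non-zero probability
        return math.log2(len(probs))
    if math.isclose(alpha, 1.0, abs_tol=1e-9):
        # Shannon entropy: limit of the Renyi formula as alpha approaches 1
        return -sum(p * math.log2(p) for p in probs)
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)


if __name__ == "__main__":
    base36 = "abcdefghijklmnopqrstuvwxyz1234567890"
    p = char_probabilities("aabc", base36)
    for a in (0.0, 1.0, 2.0):
        print(a, renyi_entropy(p, a))
```

For a fixed distribution the value does not increase as alpha grows, which is why the Hartley variant gives the largest estimate and higher orders weight frequent characters more heavily.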
- class credsweeper.ml_model.features.RuleName(rule_names)
Bases: Feature
Categorical feature that corresponds to rule name.
- class credsweeper.ml_model.features.ShannonEntropy(base, norm=False)
Bases: RenyiEntropy
Shannon entropy feature.
- class credsweeper.ml_model.features.WordInLine(words)
Bases: Feature
Feature is true if line contains at least one word from predefined list.
- class credsweeper.ml_model.features.WordInPath(words)
Bases: Feature
Feature is true if candidate path contains at least one word from predefined list.
credsweeper.ml_model.ml_validator module
- class credsweeper.ml_model.ml_validator.MlValidator(threshold, azure=False, cuda=False)
Bases: object
ML validation class
- extract_common_features(candidates)
Extract features that are guaranteed to be the same for all candidates on the same line with the same value.
- Return type:
- extract_unique_features(candidates)
Extract features that can differ between candidates and join them with the OR operator.
- Return type:
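As a rough illustration of joining per-candidate features with OR, the sketch below combines boolean feature rows element-wise. The function name or_join and the list-of-lists representation are assumptions made for this example; the actual method may operate on arrays and a different feature layout.

```python
from typing import List


def or_join(feature_rows: List[List[bool]]) -> List[bool]:
    """Element-wise OR across feature rows, one row per candidate.

    Hypothetical helper: position i of the result is True if feature i
    is True for at least one candidate.
    """
    if not feature_rows:
        return []
    return [any(column) for column in zip(*feature_rows)]


if __name__ == "__main__":
    # Two candidates on the same line, each matched by a different rule
    rows = [
        [True, False, False],
        [False, False, True],
    ]
    print(or_join(rows))
```

Joining with OR keeps a single feature vector per line and value while still recording every rule that matched any of the grouped candidates.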
- get_group_features(value, candidates)
np.newaxis is used to add a new dimension in front, so the input is treated as a batch.
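The effect of np.newaxis can be seen in a small sketch, assuming NumPy is installed. The function name to_batch and the sample values are hypothetical and only show how a one-dimensional feature vector becomes a batch containing one item.

```python
import numpy as np


def to_batch(features: np.ndarray) -> np.ndarray:
    """Add a leading axis so an array of shape (n,) becomes (1, n)."""
    return features[np.newaxis]


if __name__ == "__main__":
    vector = np.array([0.1, 0.7, 0.2])
    batch = to_batch(vector)
    print(vector.shape, batch.shape)
```

Models that expect batched input usually treat the first axis as the batch index, so a single sample needs this extra leading dimension before inference.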