How To Use

Run

Get the full list of arguments:

python -m credsweeper --help
usage: python -m credsweeper [-h] (--path PATH [PATH ...] | --diff_path PATH [PATH ...] | --export_config [PATH] | --export_log_config [PATH]) [--rules [PATH]] [--severity SEVERITY] [--config [PATH]]
                         [--log_config [PATH]] [--denylist PATH] [--find-by-ext] [--depth POSITIVE_INT] [--no-filters] [--doc] [--ml_threshold FLOAT_OR_STR] [--ml_batch_size POSITIVE_INT]
                         [--azure | --cuda] [--api_validation] [--jobs POSITIVE_INT] [--skip_ignored] [--save-json [PATH]] [--save-xlsx [PATH]] [--sort] [--log LOG_LEVEL] [--size_limit SIZE_LIMIT]
                         [--banner] [--version]
options:
  -h, --help            show this help message and exit
  --path PATH [PATH ...]
                        file or directory to scan
  --diff_path PATH [PATH ...]
                        git diff file to scan
  --export_config [PATH]
                        exporting default config to file (default: config.json)
  --export_log_config [PATH]
                        exporting default logger config to file (default: log.yaml)
  --rules [PATH]        path of rule config file (default: credsweeper/rules/config.yaml). severity:['critical', 'high', 'medium', 'low', 'info'] type:['keyword', 'pattern', 'pem_key', 'multi']
  --severity SEVERITY   set minimum level for rules to apply ['critical', 'high', 'medium', 'low', 'info'](default: 'Severity.INFO', case insensitive)
  --config [PATH]       use custom config (default: built-in)
  --log_config [PATH]   use custom log config (default: built-in)
  --denylist PATH       path to a plain text file with lines or secrets to ignore
  --find-by-ext         find files by predefined extension
  --depth POSITIVE_INT  additional recursive search in data (experimental)
  --no-filters          disable filters
  --doc                 document-specific scanning
  --ml_threshold FLOAT_OR_STR
                        setup threshold for the ml model. The lower the threshold - the more credentials will be reported. Allowed values: float between 0 and 1, or any of ['lowest', 'low', 'medium', 'high',
                        'highest'] (default: medium)
  --ml_batch_size POSITIVE_INT, -b POSITIVE_INT
                        batch size for model inference (default: 16)
  --azure               enable AzureExecutionProvider for onnx
  --cuda                enable CUDAExecutionProvider for onnx
  --api_validation      add credential api validation option to credsweeper pipeline. External API is used to reduce FP for some rule types.
  --jobs POSITIVE_INT, -j POSITIVE_INT
                        number of parallel processes to use (default: 1)
  --skip_ignored        parse .gitignore files and skip credentials from ignored objects
  --save-json [PATH]    save result to json file (default: output.json)
  --save-xlsx [PATH]    save result to xlsx file (default: output.xlsx)
  --sort                enable output sorting
  --log LOG_LEVEL, -l LOG_LEVEL
                        provide logging level of ['DEBUG', 'INFO', 'WARN', 'WARNING', 'ERROR', 'FATAL', 'CRITICAL', 'SILENCE'](default: 'warning', case insensitive)
  --size_limit SIZE_LIMIT
                        set size limit of files that for scanning (eg. 1GB / 10MiB / 1000)
  --banner              show version and crc32 sum of CredSweeper files at start
  --version, -V         show program's version number and exit

Note

Validation by the ML model classifier substantially reduces false positives, but it might increase false negatives and execution time. You can change the sensitivity with the --ml_threshold argument: increasing the threshold decreases the number of alerts. Setting --ml_threshold 0 turns ML off and maximizes the number of alerts.

Typical false positive: password = "template_password"
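
The effect of the threshold can be illustrated with a small sketch. The scores below are invented for illustration, and the `report` helper and its `>=` comparison are assumptions, not CredSweeper's actual implementation.

```python
# Hypothetical ML scores for three candidate lines; the numbers are made up.
candidates = [
    ('password = "template_password"', 0.12),
    ('secret = "fgELsRdFA"', 0.91),
    ('token = "changeme"', 0.48),
]


def report(scored_candidates, threshold):
    """Keep the candidates whose score reaches the threshold (assumed rule)."""
    return [line for line, score in scored_candidates if score >= threshold]


# A higher threshold yields fewer alerts; a threshold of 0 keeps everything.
for threshold in (0.0, 0.3, 0.5):
    print(threshold, len(report(candidates, threshold)))
```

In this sketch the template password is dropped as soon as the threshold rises above its low score, which mirrors the typical false positive mentioned above.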

Note

You may also use --api_validation to reduce false positives, but only for some rules: GitHub, Google API, Mailchimp, Slack, Square, Stripe. --api_validation uses external APIs to check whether it can authenticate with a detected credential. For example, it tries to authenticate on Google Cloud if a Google API Key is detected.

However, --api_validation is not recommended at the moment, because its influence on false positive and false negative alerts has not been validated yet. Moreover, it might result in a DDoS-related ban from the corresponding APIs if the number of requests is too high.

Note

CredSweeper has an experimental --depth option that scans files using knowledge of their data formats:
  • unpack supported containers (tar, zip, gzip, bzip2)
  • decode base64 encoded data
  • represent text (xml, json, yaml etc.) as a structure and combine keys with values before analysis
  • parse python source files with the builtin ast engine

Note: the reported line number of a found credential may not match the line in the original data, but the "info" field may help to understand how the credential was found.
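
To make "combine keys with values" more concrete, here is a minimal sketch that flattens a parsed JSON document into key/value lines that a line-based scanner could inspect. The `flatten` helper and its output format are hypothetical and only illustrate the idea. They are not how CredSweeper implements --depth.

```python
import json


def flatten(node, prefix=""):
    """Yield 'key = value' lines for every scalar leaf of a parsed document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, f"{prefix}[{index}]")
    else:
        yield f"{prefix} = {node!r}"


document = json.loads(
    '{"db": {"user": "admin", "password": "cackle!"}, "ports": [80, 443]}'
)
lines = list(flatten(document))
for line in lines:
    print(line)
```

Because the flattened lines no longer correspond to lines of the original file, a scanner working this way cannot report the original line number, which is the caveat described above.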

Get output as JSON file:

python -m credsweeper --path tests/samples/password --save-json output.json

To inspect the JSON file, run:

cat output.json
[
    {
        "api_validation": "NOT_AVAILABLE",
        "ml_validation": "VALIDATED_KEY",
        "ml_probability": 0.99755,
        "rule": "Password",
        "severity": "medium",
        "line_data_list": [
            {
                "line": "password = \"cackle!\"",
                "line_num": 1,
                "path": "tests/samples/password.gradle",
                "info": "",
                "value": "cackle!",
                "value_start": 12,
                "value_end": 19,
                "variable": "password",
                "entropy_validation":
                {
                    "iterator": "BASE64_CHARS",
                    "entropy": 2.120589933192232,
                    "valid": false
                }
            }
        ]
    }
]
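
A report in this shape can also be post-processed with a few lines of Python. The sketch below uses only the field names visible in the sample above. The `summarize` helper is hypothetical, and the embedded sample is a trimmed copy of that report.

```python
import json

# Trimmed copy of the sample report shown above (raw string keeps the \" escapes).
sample = r'''
[
    {
        "api_validation": "NOT_AVAILABLE",
        "ml_validation": "VALIDATED_KEY",
        "ml_probability": 0.99755,
        "rule": "Password",
        "severity": "medium",
        "line_data_list": [
            {
                "line": "password = \"cackle!\"",
                "line_num": 1,
                "path": "tests/samples/password.gradle",
                "value": "cackle!"
            }
        ]
    }
]
'''


def summarize(report):
    """Return one (rule, severity, path, line_num, ml_probability) row per line."""
    rows = []
    for finding in report:
        for line_data in finding["line_data_list"]:
            rows.append((
                finding["rule"],
                finding["severity"],
                line_data["path"],
                line_data["line_num"],
                finding.get("ml_probability"),
            ))
    return rows


for row in summarize(json.loads(sample)):
    print(row)
```

For a real report, replace `json.loads(sample)` with `json.load(...)` on the file written by --save-json.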

Get CLI output only:

python -m credsweeper --path tests/samples/password
rule: Password / severity: medium / line_data_list: [line : 'password = "cackle!"' / line_num : 1 / path : tests/samples/password / entropy_validation: False] / api_validation: NOT_AVAILABLE / ml_validation: VALIDATED_KEY
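
When the scan is started from a script rather than a shell, the same command line can be assembled and run with the standard library. This is a sketch: `build_command` and `run_scan` are hypothetical helper names, and only the --path and --save-json flags from the help text above are used.

```python
import subprocess
import sys


def build_command(paths, json_report=None):
    """Assemble the argument list for a CredSweeper scan."""
    command = [sys.executable, "-m", "credsweeper", "--path", *paths]
    if json_report is not None:
        command += ["--save-json", json_report]
    return command


def run_scan(paths, json_report=None):
    """Run the scan and return the text printed to stdout."""
    completed = subprocess.run(
        build_command(paths, json_report),
        capture_output=True,
        text=True,
        check=False,  # exit-code semantics are not described here, so none are assumed
    )
    return completed.stdout


# Example (requires CredSweeper to be installed):
# print(run_scan(["tests/samples/password"]))
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.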

Exclude outputs using CLI:

If you want to remove some values from the report (e.g. known public secrets), create a text file with the lines or values you want to remove and pass it with the --denylist argument. Leading and trailing whitespace is ignored.

$ python -m credsweeper --path tests/samples/password --denylist list.txt
Detected Credentials: 0
Time Elapsed: 0.07523202896118164s
$ cat list.txt
cackle!
  password = "cackle!"
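
A denylist file like the one above can be generated from a list of known values. The following is a minimal sketch. The `write_denylist` helper is hypothetical, and it strips whitespace only because the tool ignores it anyway.

```python
from pathlib import Path


def write_denylist(path, entries):
    """Write one entry per line, skipping blank and duplicate entries."""
    kept = []
    for entry in entries:
        cleaned = entry.strip()
        if cleaned and cleaned not in kept:
            kept.append(cleaned)
    Path(path).write_text("\n".join(kept) + "\n", encoding="utf-8")
    return kept


# Example: produces the same two entries as list.txt above.
# write_denylist("list.txt", ["cackle!", '  password = "cackle!"'])
```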

Exclude outputs using config:

Edit the exclude section of the config file. The default config can be generated with python -m credsweeper --export_config place_to_save.json or found in credsweeper/secret/config.json. Leading and trailing whitespace is ignored.

"exclude": {
    "lines": ["   password = \"cackle!\" "],
    "values": ["cackle!"]
}

Then specify your config in CLI:

$ python -m credsweeper --path tests/samples/password --config my_cfg.json
Detected Credentials: 0
Time Elapsed: 0.07152628898620605s
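
The config edit can also be scripted. The sketch below assumes only what is shown above: a JSON config with an "exclude" object that holds "lines" and "values" lists. The helper names and file paths are hypothetical.

```python
import json


def add_exclusions(config, lines=(), values=()):
    """Append new entries to exclude.lines and exclude.values without duplicates."""
    exclude = config.setdefault("exclude", {})
    for key, new_items in (("lines", lines), ("values", values)):
        existing = exclude.setdefault(key, [])
        for item in new_items:
            if item not in existing:
                existing.append(item)
    return config


def write_custom_config(source_path, target_path, lines=(), values=()):
    """Load an exported config, add exclusions, and save it under a new name."""
    with open(source_path, encoding="utf-8") as src:
        config = json.load(src)
    add_exclusions(config, lines, values)
    with open(target_path, "w", encoding="utf-8") as dst:
        json.dump(config, dst, indent=4)


# Example, after exporting the default config to place_to_save.json:
# write_custom_config("place_to_save.json", "my_cfg.json",
#                     lines=['password = "cackle!"'], values=["cackle!"])
```

Starting from the exported default keeps all other settings unchanged, so the custom file differs from the default only in its exclusions.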

Use as a Python library

Minimal example for scanning a list of lines:

from credsweeper import CredSweeper, StringContentProvider


to_scan = ["line one", "password='in_line_2'"]
cred_sweeper = CredSweeper()
provider = StringContentProvider(to_scan)
results = cred_sweeper.file_scan(provider)
for r in results:
    print(r)
rule: Password / severity: medium / line_data_list: [line: 'password='in_line_2'' / line_num: 2 / path:  / value: 'in_line_2' / entropy_validation: False] / api_validation: NOT_AVAILABLE / ml_validation: NOT_AVAILABLE

Minimal example for scanning bytes:

from credsweeper import CredSweeper, ByteContentProvider


to_scan = b"line one\npassword='in_line_2'"
cred_sweeper = CredSweeper()
provider = ByteContentProvider(to_scan)
results = cred_sweeper.file_scan(provider)
for r in results:
    print(r)
rule: Password / severity: medium / line_data_list: [line: 'password='in_line_2'' / line_num: 2 / path:  / value: 'in_line_2' / entropy_validation: False] / api_validation: NOT_AVAILABLE / ml_validation: NOT_AVAILABLE

Minimal example for ML validation:

from credsweeper import CredSweeper, StringContentProvider, MlValidator, ThresholdPreset


to_scan = ["line one", "secret='fgELsRdFA'", "secret='template'"]
cred_sweeper = CredSweeper()
provider = StringContentProvider(to_scan)

# Select a lower or higher threshold to get more or fewer reports, respectively
threshold = ThresholdPreset.medium
validator = MlValidator(threshold=threshold)

results = cred_sweeper.file_scan(provider)
for candidate in results:
    # Each result detected by CredSweeper can be validated with MlValidator
    is_credential, with_probability = validator.validate(candidate)
    if is_credential:
        print(candidate)

Note that "secret='template'" is not reported because it fails the MlValidator check.

rule: Secret / severity: medium / line_data_list: [line: 'secret='fgELsRdFA'' / line_num: 2 / path:  / value: 'fgELsRdFA' / entropy_validation: False] / api_validation: NOT_AVAILABLE / ml_validation: NOT_AVAILABLE

Configurations