How To Use

Run

List all available arguments:

python -m credsweeper --help
usage: python -m credsweeper [-h]
                             (--path PATH [PATH ...] | --diff_path PATH [PATH ...] | --export_config [PATH] | --export_log_config [PATH] | --git PATH)
                             [--ref REF] [--rules PATH] [--severity SEVERITY]
                             [--config PATH] [--log_config PATH]
                             [--denylist PATH] [--find-by-ext]
                             [--pedantic | --no-pedantic]
                             [--depth POSITIVE_INT] [--no-filters] [--doc]
                             [--ml_threshold THRESHOLD_OR_FLOAT_OR_ZERO]
                             [--ml_batch_size POSITIVE_INT] [--ml_config PATH]
                             [--ml_model PATH] [--ml_providers STR]
                             [--jobs POSITIVE_INT] [--thrifty | --no-thrifty]
                             [--skip_ignored] [--error | --no-error]
                             [--save-json [PATH]] [--save-xlsx [PATH]]
                             [--stdout | --no-stdout] [--color | --no-color]
                             [--hashed | --no-hashed]
                             [--subtext | --no-subtext] [--sort | --no-sort]
                             [--log LOG_LEVEL] [--size_limit SIZE_LIMIT]
                             [--banner] [--version]

options:
  -h, --help            show this help message and exit
  --path PATH [PATH ...]
                        file or directory to scan
  --diff_path PATH [PATH ...]
                        git diff file to scan
  --export_config [PATH]
                        exporting default config to file (default:
                        config.json)
  --export_log_config [PATH]
                        exporting default logger config to file (default:
                        log.yaml)
  --git PATH            git repo to scan
  --ref REF             scan git repo from the ref, otherwise - all branches
                        were scanned (slow)
  --rules PATH          path of rule config file (default:
                        credsweeper/rules/config.yaml). severity:['critical',
                        'high', 'medium', 'low', 'info'] type:['keyword',
                        'pattern', 'pem_key', 'multi']
  --severity SEVERITY   set minimum level for rules to apply ['critical',
                        'high', 'medium', 'low', 'info'](default:
                        'Severity.INFO', case insensitive)
  --config PATH         use custom config (default: built-in)
  --log_config PATH     use custom log config (default: built-in)
  --denylist PATH       path to a plain text file with lines or secrets to
                        ignore
  --find-by-ext         find files by predefined extension
  --pedantic, --no-pedantic
                        process files without extension (default: False)
  --depth POSITIVE_INT  additional recursive search in data (experimental)
  --no-filters          disable filters
  --doc                 document-specific scanning
  --ml_threshold THRESHOLD_OR_FLOAT_OR_ZERO
                        setup threshold for the ml model. The lower the
                        threshold - the more credentials will be reported.
                        Allowed values: float between 0 and 1, or any of
                        ['lowest', 'low', 'medium', 'high', 'highest']
                        (default: medium)
  --ml_batch_size POSITIVE_INT, -b POSITIVE_INT
                        batch size for model inference (default: 16)
  --ml_config PATH      use external config for ml model
  --ml_model PATH       use external ml model
  --ml_providers STR    comma separated list of providers for onnx
                        (CPUExecutionProvider is used by default)
  --jobs POSITIVE_INT, -j POSITIVE_INT
                        number of parallel processes to use (default: 1)
  --thrifty, --no-thrifty
                        clear objects after scan to reduce memory consumption
                        (default: True)
  --skip_ignored        parse .gitignore files and skip credentials from
                        ignored objects
  --error, --no-error   produce error code if credentials are found (default:
                        False)
  --save-json [PATH]    save result to json file (default: output.json)
  --save-xlsx [PATH]    save result to xlsx file (default: output.xlsx)
  --stdout, --no-stdout
                        print results to stdout (default: True)
  --color, --no-color   print results with colorization (default: False)
  --hashed, --no-hashed
                        line, variable, value will be hashed in output
                        (default: False)
  --subtext, --no-subtext
                        line text will be stripped in 128 symbols but value
                        and variable are kept (default: False)
  --sort, --no-sort     enable output sorting (default: False)
  --log LOG_LEVEL, -l LOG_LEVEL
                        provide logging level of ['DEBUG', 'INFO', 'WARN',
                        'WARNING', 'ERROR', 'FATAL', 'CRITICAL', 'SILENCE']
                        (default: 'warning', case insensitive)
  --size_limit SIZE_LIMIT
                        set size limit of files that for scanning (eg. 1GB /
                        10MiB / 1000)
  --banner              show version and crc32 sum of CredSweeper files at
                        start
  --version, -V         show program's version number and exit

Note

Validation by the ML classifier greatly reduces false positives, but it may increase false negatives and execution time. You can change the sensitivity with the --ml_threshold argument: increasing the threshold decreases the number of alerts. Setting --ml_threshold 0 turns ML off and maximizes the number of alerts.

Typical false positive: password = "template_password"
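The effect of the threshold can be pictured with a minimal sketch. It does not call CredSweeper: the candidate list, the probabilities, and the filter_by_threshold helper are hypothetical and only illustrate that a higher threshold reports fewer candidates while a threshold of 0 keeps all of them (the real tool skips the model entirely in that case).

```python
# Hypothetical candidates with made-up ML probabilities (not real CredSweeper output).
candidates = [
    ("password = 'cackle!'", 0.91),
    ("password = 'template_password'", 0.12),
    ("token = 'changeme'", 0.55),
]


def filter_by_threshold(items, threshold):
    """Keep only the candidates whose probability reaches the threshold."""
    return [line for line, probability in items if probability >= threshold]


# A higher threshold reports fewer candidates; 0 keeps all of them.
for threshold in (0.0, 0.5, 0.9):
    kept = filter_by_threshold(candidates, threshold)
    print(f"threshold={threshold}: {len(kept)} reported")
```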

Note

CredSweeper includes an experimental --depth option that enables scanning with awareness of specific data formats, such as:

  • Compressed files (zip, gzip, bzip2, lzma)

  • Data containers (deb, tar, Docker images, pkcs12, jks)

  • Document rendering (pdf, xls, ods, xlsx, docx, pptx, tm7, mxfile)

  • Base64-encoded content

  • Structured text formats (HTML, XML, JSON, NDJSON, YAML, etc.) - keys and values are combined before analysis

  • Python sources - source code is reformatted to a plain code style to avoid cases that may hide values from patterns ("AKIA" "EXAMPLE…" -> "AKIAEXAMPLE…")

Remark: With this option, the line number reported for a found credential may not correspond to the line in the original file. The info field provides context to help you understand how the credential was detected.
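Two of the cases listed above can be reproduced with the Python standard library alone. The sketch below is an illustration of why such preprocessing helps, not a description of CredSweeper's internals; it assumes Python 3.9 or later for ast.unparse, and the sample strings are made up.

```python
import ast
import base64

# Adjacent string literals are joined by the Python parser, so a value split
# across two literals becomes one constant after parsing and unparsing.
source = 'key = "AKIA" "EXAMPLE"'
normalized = ast.unparse(ast.parse(source))
print(normalized)

# A value hidden behind Base64 only becomes matchable after decoding.
encoded = base64.b64encode(b"password = 'cackle!'")
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)
```

After normalization the joined value appears as one token, which is what a pattern-based rule needs in order to match it.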

Get output as a JSON file with a deep scan of a Docker image:

Prepare a Dockerfile:

FROM scratch
ADD tests/samples /

Build, save, and scan:

docker build . --tag test_samples
docker save test_samples --output test_samples.docker
python -m credsweeper --path test_samples.docker --save-json output.json --depth 3

Review the report file (output.json):

[
...
    {
        "rule": "Password",
        "severity": "medium",
        "confidence": "moderate",
        "ml_probability": 0.7925280332565308,
        "line_data_list": [
            {
                "line": "password = 'cackle!'",
                "line_num": 1,
                "path": "test_samples.docker",
                "info": "FILE:test_samples.docker|TAR:blobs/sha256/82a4962c3cfebb62a42c2fd5c120ea0706a9ae66f52f71f957c052c873c60775|TAR:password.gradle|STRUCT|STRING:0|RAW",
                "variable": "password",
                "variable_start": 0,
                "variable_end": 8,
                "value": "cackle!",
                "value_start": 12,
                "value_end": 19,
                "entropy": 2.52164
            }
        ]
    },
...
]
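A report like the one above can be post-processed with a few lines of Python. The sketch below embeds a trimmed copy of the entry so that it runs standalone; in practice you would read output.json with json.load instead. The summarize helper is hypothetical, and splitting the info field on "|" is an assumption based on the sample shown here.

```python
import json

# Trimmed copy of the report entry shown above, embedded so the sketch runs standalone.
report_text = """
[
    {
        "rule": "Password",
        "severity": "medium",
        "ml_probability": 0.7925280332565308,
        "line_data_list": [
            {
                "line": "password = 'cackle!'",
                "line_num": 1,
                "path": "test_samples.docker",
                "info": "FILE:test_samples.docker|TAR:blobs/sha256/82a4962c3cfebb62a42c2fd5c120ea0706a9ae66f52f71f957c052c873c60775|TAR:password.gradle|STRUCT|STRING:0|RAW",
                "value": "cackle!"
            }
        ]
    }
]
"""


def summarize(report):
    """Yield (rule, path, extraction steps) for every reported line."""
    for entry in report:
        for line_data in entry["line_data_list"]:
            # Assumption: "|" separates the steps that led to the finding.
            steps = line_data["info"].split("|")
            yield entry["rule"], line_data["path"], steps


rows = list(summarize(json.loads(report_text)))
for rule, path, steps in rows:
    print(rule, path, " -> ".join(steps))
```

Reading the steps from left to right shows how the scanner descended from the saved image into the nested archive and finally into password.gradle.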

Get CLI output only:

python -m credsweeper --path tests/samples/password.gradle
rule: Password | severity: medium | confidence: moderate | ml_probability: 0.9149653911590576 | line_data_list: [path: tests/samples/password.gradle | line_num: 1 | value: 'cackle!' | line: 'password = "cackle!"']

Exclude outputs using CLI:

If you want to remove some values from the report (e.g. known public secrets), create a text file with the lines or values you want to remove and pass it using the --denylist argument. Leading and trailing whitespace is ignored.

$ python -m credsweeper --path tests/samples/password.gradle --denylist list.txt
Detected Credentials: 0
Time Elapsed: 0.07523202896118164s
$ cat list.txt
cackle!
  password = "cackle!"
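The matching rule described above (whole lines or values, with surrounding whitespace ignored) can be sketched as follows. This is an illustrative reimplementation under that stated rule, not CredSweeper's own code; load_denylist and is_denied are hypothetical helpers.

```python
def load_denylist(text):
    """Return the non-empty denylist entries with surrounding whitespace removed."""
    return {entry.strip() for entry in text.splitlines() if entry.strip()}


def is_denied(line, value, denylist):
    """A finding is dropped when its stripped line or its value is listed."""
    return line.strip() in denylist or value.strip() in denylist


# Same content as list.txt above, including the leading spaces on the second entry.
denylist = load_denylist("cackle!\n  password = \"cackle!\"\n")

print(is_denied('password = "cackle!"', "cackle!", denylist))
print(is_denied('password = "other"', "other", denylist))
```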

Exclude outputs using config:

Edit the exclude section of the config file. The default config can be generated with python -m credsweeper --export_config place_to_save.json or found in credsweeper/secret/config.json. Leading and trailing whitespace is ignored.

"exclude": {
    "lines": ["   password = \"cackle!\" "],
    "values": ["cackle!"]
}

Then specify your config in CLI:

$ python -m credsweeper --path tests/samples/password.gradle --config my_cfg.json
Detected Credentials: 0
Time Elapsed: 0.07152628898620605s
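If the exclusions are maintained by a script, the exclude section can be updated programmatically. The sketch below is one possible approach using only the standard library; add_exclusions is a hypothetical helper, and the temporary file stands in for a config produced by --export_config (a real config contains more keys than shown here).

```python
import json
import tempfile
from pathlib import Path


def add_exclusions(config_path, lines=(), values=()):
    """Append entries to the "exclude" section of a JSON config file."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    exclude = config.setdefault("exclude", {})
    exclude.setdefault("lines", []).extend(lines)
    exclude.setdefault("values", []).extend(values)
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config


# Stand-in for an exported config file; only the "exclude" section is modeled.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "my_cfg.json"
    cfg.write_text(json.dumps({"exclude": {"lines": [], "values": []}}), encoding="utf-8")
    updated = add_exclusions(cfg, lines=['password = "cackle!"'], values=["cackle!"])
    print(updated["exclude"])
```

The resulting file can then be passed to the scanner with the --config argument as shown above.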

Use as a Python library

Minimal example for scanning a list of lines:

from credsweeper import CredSweeper, StringContentProvider


to_scan = ["line one", "password='in_line_2'"]
cred_sweeper = CredSweeper()
provider = StringContentProvider(to_scan)
results = cred_sweeper.file_scan(provider)
for r in results:
    print(r)
rule: Password | severity: medium | confidence: moderate | ml_probability: 0.9857242107391357 | line_data_list: [line: 'password = "cackle!"' | line_num: 1 | path:  | value: 'cackle!' | entropy_validation: BASE64STDPAD_CHARS 2.120590 False]

Minimal example for scanning bytes:

from credsweeper import CredSweeper, ByteContentProvider


to_scan = b"line one\npassword='cackle!'"
cred_sweeper = CredSweeper()
provider = ByteContentProvider(to_scan)
results = cred_sweeper.file_scan(provider)
for r in results:
    print(r)
rule: Password | severity: medium | confidence: moderate | ml_probability: 0.9857242107391357 | line_data_list: [line: 'password = "cackle!"' | line_num: 2 | path:  | value: 'cackle!' | entropy_validation: BASE64STDPAD_CHARS 2.120590 False]

Minimal example for the ML validation:

from credsweeper import CredSweeper, StringContentProvider, MlValidator, ThresholdPreset


to_scan = ["line one", "password='cackle!'", "secret='template'"]
cred_sweeper = CredSweeper()
provider = StringContentProvider(to_scan)

# Select a lower or higher threshold to get more or fewer reports, respectively
threshold = ThresholdPreset.medium
validator = MlValidator(threshold=threshold)

results = cred_sweeper.file_scan(provider)
for candidate in results:
    # Each candidate detected by CredSweeper can be validated with MlValidator
    is_credential, with_probability = validator.validate(candidate)
    if is_credential:
        print(candidate)

Note that "secret='template'" is not reported because it fails the MlValidator check.

rule: Password | severity: medium | confidence: moderate | ml_probability: 0.9857242107391357 | line_data_list: [line: 'password = "cackle!"' | line_num: 2 | path:  | value: 'cackle!' | entropy_validation: BASE64STDPAD_CHARS 2.120590 False]

Configurations