How To Use
Run
Get the full list of arguments:
python -m credsweeper --help
usage: python -m credsweeper [-h]
(--path PATH [PATH ...] | --diff_path PATH [PATH ...] | --export_config [PATH] | --export_log_config [PATH] | --git PATH)
[--ref REF] [--rules PATH] [--severity SEVERITY]
[--config PATH] [--log_config PATH]
[--denylist PATH] [--find-by-ext]
[--pedantic | --no-pedantic]
[--depth POSITIVE_INT] [--no-filters] [--doc]
[--ml_threshold THRESHOLD_OR_FLOAT_OR_ZERO]
[--ml_batch_size POSITIVE_INT] [--ml_config PATH]
[--ml_model PATH] [--ml_providers STR]
[--jobs POSITIVE_INT] [--thrifty | --no-thrifty]
[--skip_ignored] [--error | --no-error]
[--save-json [PATH]] [--save-xlsx [PATH]]
[--stdout | --no-stdout] [--color | --no-color]
[--hashed | --no-hashed]
[--subtext | --no-subtext] [--sort | --no-sort]
[--log LOG_LEVEL] [--size_limit SIZE_LIMIT]
[--banner] [--version]
options:
-h, --help show this help message and exit
--path PATH [PATH ...]
file or directory to scan
--diff_path PATH [PATH ...]
git diff file to scan
--export_config [PATH]
exporting default config to file (default:
config.json)
--export_log_config [PATH]
exporting default logger config to file (default:
log.yaml)
--git PATH git repo to scan
--ref REF scan git repo from the ref, otherwise - all branches
were scanned (slow)
--rules PATH path of rule config file (default:
credsweeper/rules/config.yaml). severity:['critical',
'high', 'medium', 'low', 'info'] type:['keyword',
'pattern', 'pem_key', 'multi']
--severity SEVERITY set minimum level for rules to apply ['critical',
'high', 'medium', 'low', 'info'](default:
'Severity.INFO', case insensitive)
--config PATH use custom config (default: built-in)
--log_config PATH use custom log config (default: built-in)
--denylist PATH path to a plain text file with lines or secrets to
ignore
--find-by-ext find files by predefined extension
--pedantic, --no-pedantic
process files without extension (default: False)
--depth POSITIVE_INT additional recursive search in data (experimental)
--no-filters disable filters
--doc document-specific scanning
--ml_threshold THRESHOLD_OR_FLOAT_OR_ZERO
setup threshold for the ml model. The lower the
threshold - the more credentials will be reported.
Allowed values: float between 0 and 1, or any of
['lowest', 'low', 'medium', 'high', 'highest']
(default: medium)
--ml_batch_size POSITIVE_INT, -b POSITIVE_INT
batch size for model inference (default: 16)
--ml_config PATH use external config for ml model
--ml_model PATH use external ml model
--ml_providers STR comma separated list of providers for onnx
(CPUExecutionProvider is used by default)
--jobs POSITIVE_INT, -j POSITIVE_INT
number of parallel processes to use (default: 1)
--thrifty, --no-thrifty
clear objects after scan to reduce memory consumption
(default: True)
--skip_ignored parse .gitignore files and skip credentials from
ignored objects
--error, --no-error produce error code if credentials are found (default:
False)
--save-json [PATH] save result to json file (default: output.json)
--save-xlsx [PATH] save result to xlsx file (default: output.xlsx)
--stdout, --no-stdout
print results to stdout (default: True)
--color, --no-color print results with colorization (default: False)
--hashed, --no-hashed
line, variable, value will be hashed in output
(default: False)
--subtext, --no-subtext
line text will be stripped in 128 symbols but value
and variable are kept (default: False)
--sort, --no-sort enable output sorting (default: False)
--log LOG_LEVEL, -l LOG_LEVEL
provide logging level of ['DEBUG', 'INFO', 'WARN',
'WARNING', 'ERROR', 'FATAL', 'CRITICAL', 'SILENCE']
(default: 'warning', case insensitive)
--size_limit SIZE_LIMIT
set size limit of files that for scanning (eg. 1GB /
10MiB / 1000)
--banner show version and crc32 sum of CredSweeper files at
start
--version, -V show program's version number and exit
Note
Validation by the ML model classifier considerably reduces false positives, but it may increase false negatives and execution time. You can change the sensitivity with the --ml_threshold argument. Increasing the threshold decreases the number of alerts. Setting --ml_threshold 0 turns ML off and maximizes the number of alerts.
Typical false positive: password = "template_password"
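To make the effect of the threshold concrete, here is a minimal sketch. It is not CredSweeper's implementation: it assumes that each candidate carries an ml_probability and that a candidate is reported when its probability reaches the threshold. The helper name filter_by_threshold and the sample probabilities are hypothetical.

```python
from typing import Dict, List


def filter_by_threshold(candidates: List[Dict], threshold: float) -> List[Dict]:
    """Keep candidates whose ml_probability reaches the threshold.

    Assumption for illustration: a threshold of 0 means "ML off",
    so every candidate is kept regardless of its probability.
    """
    if threshold == 0:
        return list(candidates)
    return [c for c in candidates if c["ml_probability"] >= threshold]


# Hypothetical candidates; the probabilities are made up for this example.
candidates = [
    {"line": "password = 'cackle!'", "ml_probability": 0.91},
    {"line": "password = 'template_password'", "ml_probability": 0.12},
]

for threshold in (0, 0.5, 0.95):
    kept = filter_by_threshold(candidates, threshold)
    print(f"threshold={threshold}: {len(kept)} alert(s)")
```

Raising the threshold shrinks the list of alerts, and a threshold of 0 keeps everything, which mirrors the behavior described in the note.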
Note
CredSweeper includes an experimental --depth option that enables scanning with awareness of specific data formats, such as:
Compressed files (zip, gzip, bzip2, lzma)
Data containers (deb, tar, Docker images, pkcs12, jks)
Document rendering (pdf, xls, ods, xlsx, docx, pptx, tm7, mxfile)
Base64-encoded content
Structured text formats (HTML, XML, JSON, NDJSON, YAML, etc.) - keys and values are combined before analysis
Python sources - source code is reformatted to a plain code style so that split literals cannot hide values from patterns ("AKIA" "EXAMPLE…" -> "AKIAEXAMPLE…")
Remark: With this option, the reported line number of a found credential may not correspond to the original file. The info field provides context to help you understand how the credential was detected.
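The idea of depth-limited unpacking can be sketched with the standard library alone. This is an illustration under stated assumptions, not CredSweeper's code: the function walk, the three formats it handles (gzip, zip, Base64), and the pipe-separated info string that imitates the info field are all hypothetical simplifications.

```python
import base64
import binascii
import gzip
import io
import zipfile
from typing import Iterator, Tuple


def walk(data: bytes, info: str, depth: int) -> Iterator[Tuple[str, bytes]]:
    """Yield (info, payload) pairs, unpacking nested containers up to `depth` levels."""
    if depth > 0:
        # GZIP stream: identified by its two magic bytes.
        if data[:2] == b"\x1f\x8b":
            yield from walk(gzip.decompress(data), f"{info}|GZIP", depth - 1)
            return
        # ZIP archive: recurse into every member.
        if zipfile.is_zipfile(io.BytesIO(data)):
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                for name in archive.namelist():
                    yield from walk(archive.read(name), f"{info}|ZIP:{name}", depth - 1)
            return
        # Base64 text: strict decoding rejects bytes outside the alphabet.
        # A real scanner needs stronger heuristics, since short plain words
        # can also be valid Base64.
        try:
            decoded = base64.b64decode(data.strip(), validate=True)
        except (binascii.Error, ValueError):
            decoded = b""
        if decoded:
            yield from walk(decoded, f"{info}|BASE64", depth - 1)
            return
    # Depth exhausted or nothing to unpack: hand the payload over as-is.
    yield info, data


# Build a sample: Base64 text inside a zip archive inside a gzip stream.
secret = b"password = 'cackle!'"
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("creds.txt", base64.b64encode(secret))
packed = gzip.compress(buffer.getvalue())

for depth in (0, 3):
    for info, payload in walk(packed, "FILE:sample.gz", depth):
        print(depth, info, payload[:24])
```

With a depth of 0 only the outer gzip bytes are seen. With a depth of 3 the plain-text line is reached, and the info string records the path taken, which is also why the reported line number refers to the unpacked content rather than the original file.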
Get the output as a JSON file with a deep scan of a Docker image:
Prepare a Dockerfile:
FROM scratch
ADD tests/samples /
Build, save, and scan:
docker build . --tag test_samples
docker save test_samples --output test_samples.docker
python -m credsweeper --path test_samples.docker --save-json output.json --depth 3
Review the report file (output.json):
[
    ...
    {
        "rule": "Password",
        "severity": "medium",
        "confidence": "moderate",
        "ml_probability": 0.7925280332565308,
        "line_data_list": [
            {
                "line": "password = 'cackle!'",
                "line_num": 1,
                "path": "test_samples.docker",
                "info": "FILE:test_samples.docker|TAR:blobs/sha256/82a4962c3cfebb62a42c2fd5c120ea0706a9ae66f52f71f957c052c873c60775|TAR:password.gradle|STRUCT|STRING:0|RAW",
                "variable": "password",
                "variable_start": 0,
                "variable_end": 8,
                "value": "cackle!",
                "value_start": 12,
                "value_end": 19,
                "entropy": 2.52164
            }
        ]
    },
    ...
]
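A report in this shape is easy to post-process. The sketch below uses only the fields visible in the excerpt (rule, severity, line_data_list, path, line_num). The helper names summarize and locations are hypothetical, and the embedded report is a trimmed stand-in so that the example runs without a real output.json.

```python
import json
from collections import Counter
from typing import Dict, Iterator, List, Tuple

# A trimmed-down report with the same field names as the excerpt.
report_text = """
[
  {
    "rule": "Password",
    "severity": "medium",
    "confidence": "moderate",
    "ml_probability": 0.7925280332565308,
    "line_data_list": [
      {"line": "password = 'cackle!'", "line_num": 1,
       "path": "test_samples.docker", "value": "cackle!"}
    ]
  }
]
"""


def summarize(report: List[Dict]) -> Counter:
    """Count findings per (rule, severity) pair."""
    return Counter((item["rule"], item["severity"]) for item in report)


def locations(report: List[Dict]) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_num, rule) for every reported line."""
    for item in report:
        for line_data in item["line_data_list"]:
            yield line_data["path"], line_data["line_num"], item["rule"]


# For a real run, load the file instead:
#   with open("output.json") as f: report = json.load(f)
report = json.loads(report_text)

for (rule, severity), count in summarize(report).items():
    print(f"{rule} ({severity}): {count}")
for path, line_num, rule in locations(report):
    print(f"{path}:{line_num} {rule}")
```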
Get CLI output only:
python -m credsweeper --path tests/samples/password.gradle
rule: Password | severity: medium | confidence: moderate | ml_probability: 0.9149653911590576 | line_data_list: [path: tests/samples/password.gradle | line_num: 1 | value: 'cackle!' | line: 'password = "cackle!"']
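When the scan is started from a script rather than a shell, the command line can be assembled as a list. The sketch below only builds and prints the command, so it runs without CredSweeper installed. The helper build_command is hypothetical, and the flags it emits (--path, --save-json, --depth, --denylist) are taken from the help text.

```python
import shlex
import sys
from typing import List, Optional, Sequence


def build_command(paths: Sequence[str],
                  save_json: Optional[str] = None,
                  depth: Optional[int] = None,
                  denylist: Optional[str] = None) -> List[str]:
    """Assemble an argv list for `python -m credsweeper` (hypothetical helper)."""
    command = [sys.executable, "-m", "credsweeper", "--path", *paths]
    if save_json is not None:
        command += ["--save-json", save_json]
    if depth is not None:
        command += ["--depth", str(depth)]
    if denylist is not None:
        command += ["--denylist", denylist]
    return command


command = build_command(["tests/samples/password.gradle"], save_json="output.json")
print(shlex.join(command))
# To execute it: subprocess.run(command, capture_output=True, text=True)
```

Passing a list to subprocess.run avoids shell quoting problems with paths that contain spaces.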
Exclude outputs using CLI:
If you want to remove some values from the report (e.g. known public secrets), create a text file with the lines or values you want to remove and pass it with the --denylist argument. Leading and trailing whitespace is ignored.
$ python -m credsweeper --path tests/samples/password.gradle --denylist list.txt
Detected Credentials: 0
Time Elapsed: 0.07523202896118164s
$ cat list.txt
cackle!
password = "cackle!"
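The matching rule described here can be sketched as follows. This is an assumption-based illustration, not CredSweeper's code: the helpers load_denylist and is_denied are hypothetical, entries are compared after stripping surrounding whitespace, and empty lines are assumed to be skipped.

```python
from typing import Iterable, Set


def load_denylist(lines: Iterable[str]) -> Set[str]:
    """Strip surrounding whitespace and drop empty entries."""
    return {line.strip() for line in lines if line.strip()}


def is_denied(line: str, value: str, denylist: Set[str]) -> bool:
    """A finding is suppressed when its whole line or its value is listed."""
    return line.strip() in denylist or value.strip() in denylist


# Same entries as list.txt, with extra whitespace to show the stripping.
denylist = load_denylist(["  cackle!  \n", 'password = "cackle!"\n', "\n"])

print(is_denied('password = "cackle!"', "cackle!", denylist))
print(is_denied("token = 'abc123'", "abc123", denylist))
```

Listing either the value or the full line is enough to suppress a finding in this sketch; list.txt contains both only to demonstrate the two forms.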
Exclude outputs using config:
Edit the exclude section of the config file.
The default config can be generated with python -m credsweeper --export_config place_to_save.json
or found in credsweeper/secret/config.json.
Leading and trailing whitespace is ignored.
"exclude": {
"lines": [" password = \"cackle!\" "],
"values": ["cackle!"]
}
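The exclude section can also be edited programmatically. The sketch below assumes the config is plain JSON with an "exclude" object holding "lines" and "values" lists, as in the fragment. The helper add_excludes and the minimal stand-in config are hypothetical; a real exported config contains many more keys, which this code leaves untouched.

```python
import json
from typing import Dict, List


def add_excludes(config: Dict, lines: List[str], values: List[str]) -> Dict:
    """Return a copy of config with extra entries in exclude.lines and exclude.values."""
    updated = json.loads(json.dumps(config))  # cheap deep copy for JSON data
    exclude = updated.setdefault("exclude", {})
    for key, extra in (("lines", lines), ("values", values)):
        current = exclude.setdefault(key, [])
        for item in extra:
            if item not in current:  # avoid duplicate entries
                current.append(item)
    return updated


# Stand-in for a config exported with --export_config.
config = {"exclude": {"lines": [], "values": []}}
updated = add_excludes(config, [' password = "cackle!" '], ["cackle!"])
print(json.dumps(updated["exclude"], indent=4))

# To persist it for use with --config:
#   with open("my_cfg.json", "w") as f: json.dump(updated, f, indent=4)
```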
Then specify your config in CLI:
$ python -m credsweeper --path tests/samples/password.gradle --config my_cfg.json
Detected Credentials: 0
Time Elapsed: 0.07152628898620605s
Use as a Python library
Minimal example for scanning line list:
from credsweeper import CredSweeper, StringContentProvider
to_scan = ["line one", "password='in_line_2'"]
cred_sweeper = CredSweeper()
provider = StringContentProvider(to_scan)
results = cred_sweeper.file_scan(provider)
for r in results:
    print(r)
rule: Password | severity: medium | confidence: moderate | ml_probability: 0.9857242107391357 | line_data_list: [line: 'password = "cackle!"' | line_num: 1 | path: | value: 'cackle!' | entropy_validation: BASE64STDPAD_CHARS 2.120590 False]
Minimal example for scanning bytes:
from credsweeper import CredSweeper, ByteContentProvider
to_scan = b"line one\npassword='cackle!'"
cred_sweeper = CredSweeper()
provider = ByteContentProvider(to_scan)
results = cred_sweeper.file_scan(provider)
for r in results:
    print(r)
rule: Password | severity: medium | confidence: moderate | ml_probability: 0.9857242107391357 | line_data_list: [line: 'password = "cackle!"' | line_num: 2 | path: | value: 'cackle!' | entropy_validation: BASE64STDPAD_CHARS 2.120590 False]
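The line_num: 2 in this output follows from how the byte string breaks into lines. The snippet below shows that mapping with plain Python. It assumes UTF-8 decoding and newline splitting, which may differ in detail from what ByteContentProvider does internally.

```python
to_scan = b"line one\npassword='cackle!'"

# Decode the bytes and number the resulting lines starting from 1.
numbered = list(enumerate(to_scan.decode("utf-8").splitlines(), start=1))

for line_num, line in numbered:
    print(line_num, line)
```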
Minimal example for the ML validation:
from credsweeper import CredSweeper, StringContentProvider, MlValidator, ThresholdPreset
to_scan = ["line one", "password='cackle!'", "secret='template'"]
cred_sweeper = CredSweeper()
provider = StringContentProvider(to_scan)
# Select a lower or higher threshold to get more or fewer reports, respectively
threshold = ThresholdPreset.medium
validator = MlValidator(threshold=threshold)
results = cred_sweeper.file_scan(provider)
for candidate in results:
    # Each candidate detected by CredSweeper can be validated with MlValidator
    is_credential, with_probability = validator.validate(candidate)
    if is_credential:
        print(candidate)
Note that "secret='template'" is not reported because it fails the MlValidator check.
rule: Password | severity: medium | confidence: moderate | ml_probability: 0.9857242107391357 | line_data_list: [line: 'password = "cackle!"' | line_num: 2 | path: | value: 'cackle!' | entropy_validation: BASE64STDPAD_CHARS 2.120590 False]