How To Use
==========

Run
---

List all arguments:

.. code-block:: bash

    python -m credsweeper --help

.. code-block:: text

    usage: python -m credsweeper [-h]
                                 (--path PATH [PATH ...] | --diff_path PATH [PATH ...] | --export_config [PATH] | --export_log_config [PATH] | --git PATH)
                                 [--ref REF] [--rules PATH] [--severity SEVERITY] [--config PATH] [--log_config PATH] [--denylist PATH]
                                 [--find-by-ext] [--pedantic | --no-pedantic] [--depth POSITIVE_INT] [--no-filters] [--doc]
                                 [--ml_threshold THRESHOLD_OR_FLOAT_OR_ZERO] [--ml_batch_size POSITIVE_INT] [--ml_config PATH] [--ml_model PATH] [--ml_providers STR]
                                 [--jobs POSITIVE_INT] [--thrifty | --no-thrifty] [--skip_ignored] [--error | --no-error]
                                 [--save-json [PATH]] [--save-xlsx [PATH]] [--stdout | --no-stdout] [--color | --no-color] [--hashed | --no-hashed]
                                 [--subtext | --no-subtext] [--sort | --no-sort] [--log LOG_LEVEL] [--size_limit SIZE_LIMIT] [--banner] [--version]

    options:
      -h, --help            show this help message and exit
      --path PATH [PATH ...]
                            file or directory to scan
      --diff_path PATH [PATH ...]
                            git diff file to scan
      --export_config [PATH]
                            exporting default config to file (default: config.json)
      --export_log_config [PATH]
                            exporting default logger config to file (default: log.yaml)
      --git PATH            git repo to scan
      --ref REF             scan git repo from the ref, otherwise - all branches were scanned (slow)
      --rules PATH          path of rule config file (default: credsweeper/rules/config.yaml). severity:['critical', 'high', 'medium', 'low', 'info'] type:['keyword', 'pattern', 'pem_key', 'multi']
      --severity SEVERITY   set minimum level for rules to apply ['critical', 'high', 'medium', 'low', 'info'](default: 'Severity.INFO', case insensitive)
      --config PATH         use custom config (default: built-in)
      --log_config PATH     use custom log config (default: built-in)
      --denylist PATH       path to a plain text file with lines or secrets to ignore
      --find-by-ext         find files by predefined extension
      --pedantic, --no-pedantic
                            process files without extension (default: False)
      --depth POSITIVE_INT  additional recursive search in data (experimental)
      --no-filters          disable filters
      --doc                 document-specific scanning
      --ml_threshold THRESHOLD_OR_FLOAT_OR_ZERO
                            setup threshold for the ml model. The lower the threshold - the more credentials will be reported. Allowed values: float between 0 and 1, or any of ['lowest', 'low', 'medium', 'high', 'highest'] (default: medium)
      --ml_batch_size POSITIVE_INT, -b POSITIVE_INT
                            batch size for model inference (default: 16)
      --ml_config PATH      use external config for ml model
      --ml_model PATH       use external ml model
      --ml_providers STR    comma separated list of providers for onnx (CPUExecutionProvider is used by default)
      --jobs POSITIVE_INT, -j POSITIVE_INT
                            number of parallel processes to use (default: 1)
      --thrifty, --no-thrifty
                            clear objects after scan to reduce memory consumption (default: True)
      --skip_ignored        parse .gitignore files and skip credentials from ignored objects
      --error, --no-error   produce error code if credentials are found (default: False)
      --save-json [PATH]    save result to json file (default: output.json)
      --save-xlsx [PATH]    save result to xlsx file (default: output.xlsx)
      --stdout, --no-stdout
                            print results to stdout (default: True)
      --color, --no-color   print results with colorization (default: False)
      --hashed, --no-hashed
                            line, variable, value will be hashed in output (default: False)
      --subtext, --no-subtext
                            line text will be stripped in 128 symbols but value and variable are kept (default: False)
      --sort, --no-sort     enable output sorting (default: False)
      --log LOG_LEVEL, -l LOG_LEVEL
                            provide logging level of ['DEBUG', 'INFO', 'WARN', 'WARNING', 'ERROR', 'FATAL', 'CRITICAL', 'SILENCE'] (default: 'warning', case insensitive)
      --size_limit SIZE_LIMIT
                            set size limit of files that for scanning (eg. 1GB / 10MiB / 1000)
      --banner              show version and crc32 sum of CredSweeper files at start
      --version, -V         show program's version number and exit

.. note::
    Validation by the ML model classifier significantly reduces false positives, but it may increase false negatives and execution time.
    You can change the sensitivity with the ``--ml_threshold`` argument: increasing the threshold decreases the number of alerts.
    Setting ``--ml_threshold 0`` turns ML off and maximizes the number of alerts.

    Typical false positive: ``password = "template_password"``

.. note::
    CredSweeper includes an experimental ``--depth`` option that enables scanning with awareness of specific data formats, such as:

    - Compressed files (zip, gzip, bzip2, lzma)
    - Data containers (deb, tar, Docker images, pkcs12, jks)
    - Document rendering (pdf, xls, ods, xlsx, docx, pptx, tm7, mxfile)
    - Base64-encoded content
    - Structured text formats (HTML, XML, JSON, NDJSON, YAML, etc.): keys and values are combined before analysis
    - Python sources: the code is reformatted to a plain style so that values are not hidden from patterns (``"AKIA" "EXAMPLE..."`` -> ``"AKIAEXAMPLE..."``)

    **Remark:** With this option, the reported line number of a found credential may not correspond to the original file.
    The ``info`` field provides context to help you understand how the credential was detected.

Get output as a JSON file with a deep scan of a Docker image:

Prepare a Dockerfile:

.. code-block:: docker

    FROM scratch
    ADD tests/samples /

Build, save and scan:

.. code-block:: bash

    docker build . --tag test_samples
    docker save test_samples --output test_samples.docker
    python -m credsweeper --path test_samples.docker --save-json output.json --depth 3

Review the report file (output.json):

.. code-block:: json

    [
        ...
        {
            "rule": "Password",
            "severity": "medium",
            "confidence": "moderate",
            "ml_probability": 0.7925280332565308,
            "line_data_list": [
                {
                    "line": "password = 'cackle!'",
                    "line_num": 1,
                    "path": "test_samples.docker",
                    "info": "FILE:test_samples.docker|TAR:blobs/sha256/82a4962c3cfebb62a42c2fd5c120ea0706a9ae66f52f71f957c052c873c60775|TAR:password.gradle|STRUCT|STRING:0|RAW",
                    "variable": "password",
                    "variable_start": 0,
                    "variable_end": 8,
                    "value": "cackle!",
                    "value_start": 12,
                    "value_end": 19,
                    "entropy": 2.52164
                }
            ]
        },
        ...
    ]

Get CLI output only:

.. code-block:: bash

    python -m credsweeper --path tests/samples/password.gradle

.. code-block:: text

    rule: Password | severity: medium | confidence: moderate | ml_probability: 0.9149653911590576 | line_data_list: [path: tests/samples/password.gradle | line_num: 1 | value: 'cackle!' | line: 'password = "cackle!"']

Exclude outputs using CLI:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To remove some values from the report (e.g. known public secrets), create a text file with the lines or values you want to exclude and pass it with the ``--denylist`` argument.
Leading and trailing whitespace is ignored.

.. code-block:: bash

    $ python -m credsweeper --path tests/samples/password.gradle --denylist list.txt
    Detected Credentials: 0
    Time Elapsed: 0.07523202896118164s
    $ cat list.txt
    cackle!
    password = "cackle!"

Exclude outputs using config:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Edit the ``exclude`` part of the config file. The default config can be generated with ``python -m credsweeper --export_config place_to_save.json`` or found in ``credsweeper/secret/config.json``.
Leading and trailing whitespace is ignored.

.. code-block:: json

    "exclude": {
        "lines": [" password = \"cackle!\" "],
        "values": ["cackle!"]
    }

Then specify your config in the CLI:

.. code-block:: bash

    $ python -m credsweeper --path tests/samples/password.gradle --config my_cfg.json
    Detected Credentials: 0
    Time Elapsed: 0.07152628898620605s

Use as a Python library
-----------------------

Minimal example for scanning a list of lines:

.. code-block:: python

    from credsweeper import CredSweeper, StringContentProvider

    to_scan = ["line one", "password='in_line_2'"]
    cred_sweeper = CredSweeper()
    provider = StringContentProvider(to_scan)
    results = cred_sweeper.file_scan(provider)
    for r in results:
        print(r)

.. code-block:: text

    rule: Password | severity: medium | confidence: moderate | ml_probability: 0.9857242107391357 | line_data_list: [line: 'password = "cackle!"' | line_num: 1 | path: | value: 'cackle!' | entropy_validation: BASE64STDPAD_CHARS 2.120590 False]

Minimal example for scanning bytes:

.. code-block:: python

    from credsweeper import CredSweeper, ByteContentProvider

    to_scan = b"line one\npassword='cackle!'"
    cred_sweeper = CredSweeper()
    provider = ByteContentProvider(to_scan)
    results = cred_sweeper.file_scan(provider)
    for r in results:
        print(r)

.. code-block:: text

    rule: Password | severity: medium | confidence: moderate | ml_probability: 0.9857242107391357 | line_data_list: [line: 'password = "cackle!"' | line_num: 2 | path: | value: 'cackle!' | entropy_validation: BASE64STDPAD_CHARS 2.120590 False]

Minimal example for the ML validation:

.. code-block:: python

    from credsweeper import CredSweeper, StringContentProvider, MlValidator, ThresholdPreset

    to_scan = ["line one", "password='cackle!'", "secret='template'"]
    cred_sweeper = CredSweeper()
    provider = StringContentProvider(to_scan)

    # Select a lower or higher threshold to get more or fewer reports, respectively
    threshold = ThresholdPreset.medium
    validator = MlValidator(threshold=threshold)

    results = cred_sweeper.file_scan(provider)
    for candidate in results:
        # Each candidate detected by CredSweeper can be validated with MlValidator
        is_credential, with_probability = validator.validate(candidate)
        if is_credential:
            print(candidate)

Note that ``secret='template'`` is not reported because it fails the ``MlValidator`` check.

.. code-block:: text

    rule: Password | severity: medium | confidence: moderate | ml_probability: 0.9857242107391357 | line_data_list: [line: 'password = "cackle!"' | line_num: 2 | path: | value: 'cackle!' | entropy_validation: BASE64STDPAD_CHARS 2.120590 False]

Configurations
--------------

.. toctree::
   :maxdepth: 1

   apps_config

.. toctree::
   :maxdepth: 1

   rules_config
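
The ``exclude`` part of the config file, described under "Exclude outputs using config", can also be updated from a script. The following is a minimal sketch and not part of CredSweeper: the helper names ``add_exclusions`` and ``update_config_file`` and all file paths are hypothetical, and the only assumption about the exported config is that it is a JSON object whose ``exclude`` section holds ``lines`` and ``values`` lists, as in the fragment shown in that section.

```python
import json
from pathlib import Path


def add_exclusions(config: dict, lines=(), values=()) -> dict:
    """Return a copy of config with extra entries in its "exclude" section."""
    updated = dict(config)  # shallow copy; the "exclude" section is rebuilt below
    exclude = dict(updated.get("exclude", {}))  # other "exclude" keys are kept as-is
    for key, extra in (("lines", lines), ("values", values)):
        merged = list(exclude.get(key, []))
        for item in extra:
            if item not in merged:  # keep existing order, skip duplicates
                merged.append(item)
        exclude[key] = merged
    updated["exclude"] = exclude
    return updated


def update_config_file(src: str, dst: str, lines=(), values=()) -> None:
    """Read the JSON config at src (hypothetical path) and write the updated copy to dst."""
    config = json.loads(Path(src).read_text(encoding="utf-8"))
    updated = add_exclusions(config, lines=lines, values=values)
    Path(dst).write_text(json.dumps(updated, indent=4), encoding="utf-8")


if __name__ == "__main__":
    # Stand-in for a config exported with --export_config; only "exclude" matters here.
    sample = {"exclude": {"lines": [], "values": ["cackle!"]}}
    result = add_exclusions(sample, lines=['password = "cackle!"'], values=["cackle!"])
    print(json.dumps(result["exclude"]))
```

A file written this way could then be passed with ``--config``, in the same way as ``my_cfg.json`` in the CLI example.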