credsweeper.credentials package

Submodules

credsweeper.credentials.augment_candidates module

credsweeper.credentials.augment_candidates.augment_candidates(candidates: List[Candidate], new_candidates: List[Candidate])

Augments candidates with items from new_candidates whose line data value is not already present in candidates.

Parameters:
  • candidates – [IN/OUT] list of candidates to be augmented

  • new_candidates – [IN] list with new candidates
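The merge rule described above can be illustrated with a minimal sketch. FakeLineData, FakeCandidate, and augment_sketch are hypothetical stand-ins invented for this example; they are not part of credsweeper, and the real function may compare values differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeLineData:
    """Hypothetical stand-in for LineData; only the detected value matters here."""
    value: str


@dataclass
class FakeCandidate:
    """Hypothetical stand-in for Candidate."""
    rule_name: str
    line_data_list: List[FakeLineData] = field(default_factory=list)


def augment_sketch(candidates: List[FakeCandidate], new_candidates: List[FakeCandidate]) -> None:
    """Append each new candidate none of whose values is already known (mutates candidates)."""
    known = {ld.value for c in candidates for ld in c.line_data_list}
    for new in new_candidates:
        values = {ld.value for ld in new.line_data_list}
        if known.isdisjoint(values):
            candidates.append(new)
            known |= values


found = [FakeCandidate("Token", [FakeLineData("abc123")])]
extra = [
    FakeCandidate("Password", [FakeLineData("abc123")]),  # same value: skipped
    FakeCandidate("API key", [FakeLineData("xyz789")]),   # new value: appended
]
augment_sketch(found, extra)
print([c.rule_name for c in found])
```

As the [IN/OUT] marker suggests, the first list is modified in place and nothing is returned.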

credsweeper.credentials.candidate module

class credsweeper.credentials.candidate.Candidate(line_data_list: List[LineData], patterns: List[Pattern], rule_name: str, severity: Severity, config: Config | None = None, use_ml: bool = False, confidence: Confidence = Confidence.MODERATE)

Bases: object

Candidates that can be credentials.

The class contains a list of LineData objects, some attributes from the Rule object, and the config.

Parameters:
  • line_data_list – List of LineData

  • patterns – Regular expressions that can be used for detection

  • rule_name – Name of Rule

  • severity – critical/high/medium/low

  • confidence – strong/moderate/weak

  • config – user configs

  • use_ml – Whether the candidate should be validated with ML. If not, ml_probability is set to None

DUMMY_PATTERN = re.compile('^')

compare(other: Candidate) → bool

Comparison method: checks only the result of the final credential.

classmethod get_dummy_candidate(config: Config, file_path: str, file_type: str, info: str, rule_name: str)

Create a dummy instance to use when searching for a file by extension.

to_dict_list(hashed: bool, subtext: bool) → List[dict]

Convert credential candidate object to List[dict].

Returns:

List[dict] object generated from current credential candidate

to_json(hashed: bool, subtext: bool) → Dict

Convert credential candidate object to dictionary.

Returns:

Dictionary object generated from current credential candidate

to_str(subtext: bool = False, hashed: bool = False) → str

Represent the candidate with subtext and/or hashed values.

credsweeper.credentials.candidate_group_generator module

class credsweeper.credentials.candidate_group_generator.CandidateGroupGenerator

Bases: object

property grouped_candidates: Dict[CandidateKey, List[Candidate]]

property getter

items() → List[Tuple[CandidateKey, List[Candidate]]]

getter

credsweeper.credentials.candidate_key module

class credsweeper.credentials.candidate_key.CandidateKey(line_data: LineData)

Bases: object

Class used to identify credential candidates.

Candidates that detected the same value on the same line of the same file have an identical CandidateKey.
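A minimal sketch of such an identity key, assuming it is built from the path, line number, and value mentioned in the group_credentials description. KeySketch is a hypothetical name; the real CandidateKey is constructed from a LineData object and may hold other fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeySketch:
    """Hypothetical key: equal (and equally hashed) when path, line number, and value match."""
    path: str
    line_num: int
    value: str


a = KeySketch("config.py", 12, "s3cr3t")
b = KeySketch("config.py", 12, "s3cr3t")  # same detection reported by another rule
c = KeySketch("config.py", 13, "s3cr3t")  # same value on a different line

print(a == b, hash(a) == hash(b))
print(a == c)
```

Because equal keys hash equally, such a key can be used directly as a dictionary key for grouping.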

credsweeper.credentials.credential_manager module

class credsweeper.credentials.credential_manager.CredentialManager

Bases: object

The manager allows you to store, add, and delete individual credential candidates.

add_credential(candidate: Candidate) → None

Add credential candidate to the manager.

Parameters:

candidate – credential candidate to be added

clear_credentials() → None

Clear credential candidates stored in the manager.

get_credentials() → List[Candidate]

Get all credential candidates stored in the manager.

Returns:

List with all Candidate objects stored in manager

group_credentials() → CandidateGroupGenerator

Join candidates that reference the same secret value in the same line.

A candidate can belong to two groups at the same time if it contains more than one LineData object.

Returns:

A CandidateGroupGenerator containing a dictionary of [path, line_num, value] -> list of credential candidates
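The grouping described above can be sketched with plain tuples. The candidate data below is invented for illustration, and the tuple key only approximates CandidateKey; the point is that a candidate with several line data entries lands in several groups.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, int, str]  # (path, line_num, value), as in the description above

# Hypothetical candidates: a rule name plus one (path, line_num, value) entry per line data.
candidates = [
    ("Password", [("app.py", 3, "hunter2")]),
    ("Token", [("app.py", 3, "hunter2")]),
    ("PEM key", [("key.pem", 1, "MIIB"), ("key.pem", 2, "AAAA")]),
]

groups: Dict[Key, List[str]] = defaultdict(list)
for rule_name, line_entries in candidates:
    for entry in line_entries:
        # A candidate is added once per line data entry it carries.
        groups[entry].append(rule_name)

for key, names in groups.items():
    print(key, names)
```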

len_credentials() → int

Get number of credential candidates stored in the manager.

Returns:

Non-negative integer

purge_duplicates() → int

Purge duplicate candidates, which may appear in overlapping regions when a long line is scanned.

Returns:

Number of removed duplicates
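One way to picture this operation is an order-preserving, in-place deduplication that reports how many items were dropped. The Detection tuple and purge_duplicates_sketch are hypothetical; the criteria the real method uses to decide that two candidates are duplicates are not stated here.

```python
from typing import List, Tuple

Detection = Tuple[str, int, str, str]  # hypothetical: (path, line_num, rule_name, value)


def purge_duplicates_sketch(detections: List[Detection]) -> int:
    """Drop repeated detections in place, keeping first occurrences; return the number removed."""
    seen = set()
    unique = []
    for item in detections:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    removed = len(detections) - len(unique)
    detections[:] = unique  # mutate the caller's list
    return removed


found = [
    ("big.min.js", 1, "Token", "abc123"),
    ("big.min.js", 1, "Token", "abc123"),  # same hit seen again in an overlapping chunk
    ("big.min.js", 1, "Password", "hunter2"),
]
removed_count = purge_duplicates_sketch(found)
print(removed_count, len(found))
```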

remove_credential(candidate: Candidate) → None

Remove credential candidate from the manager.

Parameters:

candidate – credential candidate to be removed

set_credentials(candidates: List[Candidate]) → None

Remove all current credential candidates from the manager and add new ones.

Parameters:

candidates – List with candidates to replace current candidates in the manager

credsweeper.credentials.line_data module

class credsweeper.credentials.line_data.LineData(config: Config, line: str, line_pos: int, line_num: int, path: str, file_type: str, info: str, pattern: Pattern, match_obj: Match | None = None)

Bases: object

Object to treat and store scanned line related data.

Parameters:
  • key – Optional[str] = None

  • line – string variable, line

  • line_num – int variable, number of line in file

  • path – string variable, path to file

  • file_type – string variable, extension of file, e.g. ‘.txt’

  • info – additional info about how the data was detected

  • pattern – regex pattern, detected pattern in line

  • separator – optional string variable, separators between variable and value

  • separator_start – optional variable, separator position start

  • value – optional string variable, detected value in line

  • variable – optional string variable, detected variable in line

EXCEPTION_POSITION = -2
INITIAL_WRONG_POSITION = -3
bash_param_split = re.compile('\\s+(\\-|\\||\\>|\\w+?\\>|\\&)')

check_url_part() → bool

Determine whether the value is part of a URL-like line.

clean_bash_parameters() → None

Split variable and value by bash special characters if the line is assumed to be a CLI command.
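To show what the documented bash_param_split pattern matches, here is a minimal sketch that cuts a value at the first whitespace followed by a bash special token. cut_at_bash_token is a hypothetical helper; how the real method decides that a line is a CLI command, and what it does with the remaining pieces, is not shown here.

```python
import re

# Pattern copied from the bash_param_split attribute documented for LineData.
bash_param_split = re.compile('\\s+(\\-|\\||\\>|\\w+?\\>|\\&)')


def cut_at_bash_token(value: str) -> str:
    """Keep only the text before the first whitespace + special-token sequence."""
    # The pattern has a capturing group, so split() also returns the matched token;
    # element 0 is always the text that precedes the first match.
    return bash_param_split.split(value, maxsplit=1)[0]


print(cut_at_bash_token("s3cr3t --verbose"))
print(cut_at_bash_token("s3cr3t | grep ok"))
print(cut_at_bash_token("s3cr3t"))
```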

clean_tag_parameters() → None

Remove the closing tag from the value if the opening tag appears earlier in the line.

clean_toml_parameters() → None

Parentheses, curly braces, and square brackets may be captured in TOML format and bash. Performs a simple cleanup.

clean_url_parameters() → None

Clean the URL of ‘query parameters’.

If the line seems to be a URL, split it by the & character. The variable should be the right-most piece after & or ? (index [-1]), and the value should be the left-most piece before & (index [0]).
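The [-1] and [0] indexing rule can be sketched as follows. split_url_parameters is a hypothetical helper, and the example assumes an earlier match captured too much text on both sides of the = sign; the real method also checks whether the line looks like a URL, which is omitted here.

```python
import re
from typing import Tuple


def split_url_parameters(variable: str, value: str) -> Tuple[str, str]:
    """Hypothetical helper mirroring the rule described above."""
    variable = re.split(r"[&?]", variable)[-1]  # right-most piece after & or ?
    value = value.split("&")[0]                 # left-most piece before &
    return variable, value


# Assume a scan of "https://example.com/login?user=bob&token=abc123&lang=en"
# captured "https://example.com/login?user=bob&token" as the variable
# and "abc123&lang=en" as the value.
result = split_url_parameters("https://example.com/login?user=bob&token", "abc123&lang=en")
print(result)
```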

comment_starts = ('//', '* ', '# ', '/*', '<!––', '%{', '%', '...', '(*', '--', '--[[', '#=')

compare(other: LineData) → bool

Comparison method: skips the whole line and checks only whether the variable and value are the same.

get_colored_line(hashed: bool, subtext: bool = False) → str

Represents the LineData with a value, separator, and variable color formatting

static get_hash_or_subtext(text: str | None, hashed: bool, cut_pos: StartEnd | None = None) → str | None

Represent non-empty text as a hash or a “beauty” subtext if required

Parameters:
  • text – str - input string

  • hashed – bool - whether the text will be hashed and returned

  • cut_pos – Optional[StartEnd] - start and end positions of the text that must be kept in the output

Returns:

SHA-256 hash (hex representation) of the UTF-8 encoded input text, or the subtext from start to end, or the original text as is
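The three possible results can be sketched with the standard hashlib module. hash_or_slice is a hypothetical, simplified stand-in: it assumes StartEnd behaves like a (start, end) pair and returns a plain slice, whereas the real “beauty” subtext may be formatted differently.

```python
import hashlib
from typing import Optional, Tuple


def hash_or_slice(text: Optional[str], hashed: bool,
                  cut_pos: Optional[Tuple[int, int]] = None) -> Optional[str]:
    """Simplified sketch: hash, slice, or return the text unchanged."""
    if not text:
        return text  # None or empty text is passed through untouched
    if hashed:
        # SHA-256 over the UTF-8 bytes, rendered as 64 hex characters
        return hashlib.sha256(text.encode("utf-8")).hexdigest()
    if cut_pos is not None:
        start, end = cut_pos
        return text[start:end]
    return text


print(hash_or_slice("password=abc", hashed=True))
print(hash_or_slice("password=abc", hashed=False, cut_pos=(9, 12)))
print(hash_or_slice("password=abc", hashed=False))
```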

initialize(match_obj: Match | None = None) → None

Apply regex to the candidate line and set internal fields based on match.

is_comment() → bool

Check if line with credential is a comment.

Returns:

True if line is a comment, False otherwise
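A plausible reading of this check, given the comment_starts attribute documented for this class, is a prefix test on the stripped line. looks_like_comment is a hypothetical sketch and an assumption about the approach; the real method may apply additional conditions.

```python
# Tuple copied verbatim from the comment_starts attribute documented for LineData.
comment_starts = ('//', '* ', '# ', '/*', '<!––', '%{', '%', '...', '(*', '--', '--[[', '#=')


def looks_like_comment(line: str) -> bool:
    """Return True if the line, ignoring leading whitespace, begins with a comment marker."""
    # str.startswith accepts a tuple and succeeds if any prefix matches.
    return line.lstrip().startswith(comment_starts)


print(looks_like_comment("    // token = 'abc123'"))
print(looks_like_comment("# password: hunter2"))
print(looks_like_comment("password = 'hunter2'"))
```

Note that '# ' includes a trailing space, so under this sketch a line such as "#define KEY 1" does not count as a comment.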

property is_quoted: bool

Check if the variable and value are in a quoted string.

Returns:

True if the candidate is in a quoted string, False otherwise

is_source_file() → bool

Check if file with credential is a source code file or not (data, log, plain text).

Returns:

True if file is source file, False otherwise

is_source_file_with_quotes() → bool

Check if the file with the credential requires quotation for string literals.

Returns:

True if the file requires quotation, False otherwise

property is_well_quoted_value: bool

A well-quoted value is one that has been quoted or has a line wrap.

line_endings = re.compile('\\\\{1,8}[nr]')
quotation_marks = ('"', "'", '`')

sanitize_value()

Clean found value from extra artifacts. Correct positions if changed.

sanitize_variable() → None

Remove trailing spaces, dashes and quotations around the variable. Correct position.

to_json(hashed: bool, subtext: bool) → Dict

Convert line data object to dictionary.

Returns:

Dictionary object generated from current line data

to_str(subtext: bool = False, hashed: bool = False) → str

Represent line_data with subtext and/or hashed values.

url_chars_not_allowed_pattern = re.compile('[\\s"<>\\[\\]^~`{|}]')
url_percent_split = re.compile('%(21|23|24|26|27|28|29|2a|2b|2c|2f|3a|3b|3d|3f|40|5b|5d)', re.IGNORECASE)
url_scheme_part_regex = re.compile('[0-9A-Za-z.-]{3}')
url_unicode_split = re.compile('\\\\u00(0000)?(21|23|24|26|27|28|29|2a|2b|2c|2f|3a|3b|3d|3f|40|5b|5d)', re.IGNORECASE)
url_value_pattern = re.compile('[^\\s&;"<>\\[\\]^~`{|}]+[&;][^\\s=;"<>\\[\\]^~`{|}]{3,80}=[^\\s;&="<>\\[\\]^~`{|}]{1,80}')
variable_strip_pattern = ' \t\n\r\x0b\x0c,\'"-;'
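The compiled patterns listed above can be explored directly with the standard re module. The sketch below only shows what two of them match on invented sample strings; it makes no claim about where LineData applies them.

```python
import re

# Patterns copied from the class attributes documented above.
url_percent_split = re.compile('%(21|23|24|26|27|28|29|2a|2b|2c|2f|3a|3b|3d|3f|40|5b|5d)', re.IGNORECASE)
line_endings = re.compile('\\\\{1,8}[nr]')

# Percent-encoded reserved characters: %3A is ":" and %40 is "@".
hex_codes = url_percent_split.findall("user%3Asecret%40host")
decoded = [chr(int(code, 16)) for code in hex_codes]
print(hex_codes, decoded)

# Escaped line endings: one to eight backslashes followed by "n" or "r".
sample = r"first\nsecond\\rthird"
parts = line_endings.split(sample)
print(parts)
```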

Module contents