utils/scorer.py
Severity scoring for credential hits. No Telegram deps. Pure logic.
Public API
```python
from utils.scorer import score_hit, score_hits, summarize, ScoredHit
from utils.scorer import CRITICAL, HIGH, MEDIUM, LOW, SEVERITY_EMOJI, SEVERITY_SCORES
```
score_hit(line: str) -> ScoredHit
Score a single raw credential line. Parses ULP format (url:user:pass), runs all checks, returns a ScoredHit.
score_hits(lines: list[str]) -> list[ScoredHit]
Score a list of lines. Returns sorted descending by score.
summarize(scored: list[ScoredHit]) -> dict
Returns {CRITICAL: n, HIGH: n, MEDIUM: n, LOW: n}.
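A minimal sketch of the sorting and counting described above, not the module's actual code. `Hit`, `sort_hits_sketch` and `summarize_sketch` are hypothetical stand-ins, and it assumes the severity constants are plain strings and that levels with no hits are reported as 0.

```python
from collections import Counter
from typing import NamedTuple

SEVERITIES = ("CRITICAL", "HIGH", "MEDIUM", "LOW")  # assumed string values


class Hit(NamedTuple):
    """Hypothetical stand-in carrying only the fields used here."""
    severity: str
    score: int


def sort_hits_sketch(scored):
    # Highest score first, as score_hits is described to return.
    return sorted(scored, key=lambda h: h.score, reverse=True)


def summarize_sketch(scored):
    # Count hits per severity; every level is present, even with zero hits.
    counts = Counter(h.severity for h in scored)
    return {level: counts.get(level, 0) for level in SEVERITIES}


hits = [Hit("LOW", 10), Hit("CRITICAL", 40), Hit("LOW", 10)]
print(summarize_sketch(sort_hits_sketch(hits)))
```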
ScoredHit dataclass
| Field | Type | Description |
|---|---|---|
| `raw` | `str` | Original credential line |
| `severity` | `str` | CRITICAL / HIGH / MEDIUM / LOW |
| `score` | `int` | 40 / 30 / 20 / 10 |
| `reasons` | `list[str]` | Human-readable match reasons |
| `url` | `str \| None` | Parsed URL field |
| `username` | `str \| None` | Parsed username/email field |
| `password` | `str \| None` | Parsed password field |
| `.emoji` | property | 🔴🟠🟡🟢 |
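The field table can be read as roughly the following dataclass. This is a sketch under assumptions: `ScoredHitSketch` and `SEVERITY_EMOJI_SKETCH` are hypothetical names, the defaults are guesses, and the emoji are assumed to map to CRITICAL through LOW in the order listed.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed mapping: emoji taken in the order the table lists them.
SEVERITY_EMOJI_SKETCH = {
    "CRITICAL": "🔴",
    "HIGH": "🟠",
    "MEDIUM": "🟡",
    "LOW": "🟢",
}


@dataclass
class ScoredHitSketch:
    raw: str                                   # original credential line
    severity: str                              # CRITICAL / HIGH / MEDIUM / LOW
    score: int                                 # 40 / 30 / 20 / 10
    reasons: List[str] = field(default_factory=list)
    url: Optional[str] = None                  # None when the line did not parse
    username: Optional[str] = None
    password: Optional[str] = None

    @property
    def emoji(self) -> str:
        # Derived from severity rather than stored on the instance.
        return SEVERITY_EMOJI_SKETCH[self.severity]
```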
Scoring rules (highest match wins)
| Severity | Triggers |
|---|---|
| CRITICAL | Employee email domain after @ in username/line · Privileged service URL (admin, vpn, ssh, rdp, gitlab, jira…) |
| HIGH | Internal service URL (intranet, erp, crm, sso, owa, sharepoint…) |
| MEDIUM | Client-facing URL (app, patient, booking, helpdesk…) |
| LOW | Org domain appears anywhere in line (baseline) |
Check 6 (no severity change): flags weak passwords ≤6 chars or common strings.
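The "highest match wins" rule and the weak-password flag could be sketched as below. `classify` is a hypothetical helper, not the module's function. It assumes plain substring matching on the lowercased URL, uses only the keywords named in the table (the real lists are longer), and `COMMON_PASSWORDS` is a placeholder set.

```python
SCORES = {"CRITICAL": 40, "HIGH": 30, "MEDIUM": 20, "LOW": 10}

# Only the keywords named in the table above; the real lists are longer.
PRIVILEGED = ("admin", "vpn", "ssh", "rdp", "gitlab", "jira")
INTERNAL = ("intranet", "erp", "crm", "sso", "owa", "sharepoint")
CLIENT_FACING = ("app", "patient", "booking", "helpdesk")
COMMON_PASSWORDS = {"password", "qwerty123"}  # hypothetical placeholder set


def classify(url, password, employee_match=False):
    """Return (severity, score, reasons); the highest matching severity wins."""
    url_l = url.lower()
    matches = [("LOW", "org domain in line")]  # baseline for every hit
    if employee_match:
        matches.append(("CRITICAL", "employee email domain"))
    for severity, keywords in (
        ("CRITICAL", PRIVILEGED),
        ("HIGH", INTERNAL),
        ("MEDIUM", CLIENT_FACING),
    ):
        found = next((k for k in keywords if k in url_l), None)
        if found:
            matches.append((severity, f"{severity.lower()} URL keyword: {found}"))
    severity = max((s for s, _ in matches), key=SCORES.__getitem__)
    reasons = [reason for _, reason in matches]
    # Weak-password check adds a reason but never changes the severity.
    if len(password) <= 6 or password.lower() in COMMON_PASSWORDS:
        reasons.append("weak password")
    return severity, SCORES[severity], reasons
```

All matching reasons are kept even though only the top severity is reported, which is one plausible reading of the `reasons` field.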
Employee domain matching
Keywords in config.TARGET_KEYWORDS containing @ become employee patterns.
Pattern: @<domain>(?:[^a-zA-Z0-9.\-]|$) - requires literal @ before the domain.
A line with username `user@gmail.com` and a URL containing `myorg.cl` does NOT trigger CRITICAL: the org domain appears in the line, but not directly after an `@`.
Keywords without @ go only to ORG_DOMAINS (LOW baseline).
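The anchored pattern can be tried in isolation. In this sketch `employee_pattern` is a hypothetical helper, `@myorg.cl` is the example keyword from above, and case-insensitive matching is an assumption.

```python
import re


def employee_pattern(keyword):
    """Build the anchored pattern for a keyword such as '@myorg.cl'."""
    domain = keyword.rsplit("@", 1)[-1]
    # Literal '@', the escaped domain, then either a character that cannot
    # continue a hostname or end of line.
    return re.compile(
        "@" + re.escape(domain) + r"(?:[^a-zA-Z0-9.\-]|$)",
        re.IGNORECASE,  # assumption: matching ignores case
    )


pat = employee_pattern("@myorg.cl")

# Employee address: '@myorg.cl' followed by a separator.
print(bool(pat.search("https://portal.myorg.cl/login:jdoe@myorg.cl:secret")))
# Personal address on an org URL: no '@' directly before the domain.
print(bool(pat.search("https://portal.myorg.cl/login:user@gmail.com:secret")))
# Look-alike domain: the trailing '.' is rejected by the character class.
print(bool(pat.search("https://x.example/:jdoe@myorg.cl.example.net:secret")))
```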
ULP line parser (ULP_PATTERN)
Separators: : ; , | \t (any of these between the three fields).
The URL field handles two common stealer-log complications:
- `://` is not treated as a separator - the optional scheme prefix `(?:https?|ftp)://` is consumed before the character-class match, so `https://` never gets split at the colon.
- Port + path are consumed into the URL - the optional group `(?::\d+/[^\s:;,|\t]*)` absorbs `:port/path` when the port is pure digits immediately followed by `/`. This correctly handles `http://host:8085/path/:user:pass` but intentionally skips patterns like `:24145487-8` (RUT number - hyphen after digits, no `/`).
Known limitation: A bare port with no path (e.g. https://host:8080:user:pass) will mis-parse 8080 as the username. This is not observed in practice - stealer logs always include at least a trailing /.
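The pieces described in this section can be assembled into a pattern like the one below. This is a reconstruction from the description, not the real `ULP_PATTERN`: `ULP_SKETCH` and `parse_ulp` are hypothetical names, and the username and password sub-patterns are assumptions.

```python
import re

ULP_SKETCH = re.compile(
    r"""^
    (?P<url>
        (?:(?:https?|ftp)://)?      # optional scheme, consumed whole
        [^\s:;,|\t]+                # host and path up to the first separator
        (?::\d+/[^\s:;,|\t]*)?      # optional :port/path (digits, then '/')
    )
    [:;,|\t]                        # separator
    (?P<username>[^\s:;,|\t]+)      # assumed: no separators in the username
    [:;,|\t]                        # separator
    (?P<password>.+)                # assumed: rest of the line
    $""",
    re.VERBOSE,
)


def parse_ulp(line):
    """Return (url, username, password), or None if the line does not parse."""
    m = ULP_SKETCH.match(line.strip())
    return (m["url"], m["username"], m["password"]) if m else None


print(parse_ulp("http://host:8085/path/:user:pass"))    # port + path kept in URL
print(parse_ulp("https://host/login:24145487-8:pass"))  # RUT stays the username
print(parse_ulp("https://host:8080:user:pass"))         # known limitation
```

The last call reproduces the limitation noted above: with no `/` after the port, `8080` lands in the username field.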
Module-level globals (rebuilt on import + via reload_from_config)
| Name | Type | Description |
|---|---|---|
| `EMPLOYEE_DOMAINS` | `list[tuple[str, Pattern]]` | `(domain_str, anchored_pattern)` for @-keywords |
| `ORG_DOMAINS` | `list[Pattern]` | Plain domain patterns for all keywords |
`scorer` uses `import config as _config` (not `from config import TARGET_KEYWORDS`), so patching `config.TARGET_KEYWORDS` at runtime needs no re-import - the `_build_*` helpers read the live module attribute whenever they run.
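The difference between the two import styles shows up when the attribute is rebound. The sketch below uses a throwaway `types.ModuleType` object as a stand-in for the real `config` module; `frozen_keywords` and `live_keywords` are hypothetical names.

```python
import types

# Stand-in for the real config module (assumption for illustration).
config = types.ModuleType("config")
config.TARGET_KEYWORDS = ["myorg.cl"]

# Equivalent of `from config import TARGET_KEYWORDS`:
# the name is bound once, to whatever list existed at import time.
frozen_keywords = config.TARGET_KEYWORDS


def live_keywords():
    # Equivalent of `import config as _config` plus attribute access:
    # the lookup happens on every call.
    return config.TARGET_KEYWORDS


# Runtime patch that rebinds the attribute to a new list.
config.TARGET_KEYWORDS = ["myorg.cl", "@myorg.cl"]

print(frozen_keywords)   # still the old list
print(live_keywords())   # sees the patched list
```

The stale copy only appears when the attribute is rebound; mutating the same list in place would be visible through both names.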
To rebuild after editing config.TARGET_KEYWORDS at runtime:
```python
import utils.scorer as scorer
scorer.reload_from_config()
```
reload_from_config() -> None
Rebuilds EMPLOYEE_DOMAINS and ORG_DOMAINS from the current config.TARGET_KEYWORDS. Called by web config routes after config.save_runtime_config() writes new keyword groups.
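A rebuild along these lines might look like the sketch below. `build_domains` and `reload_sketch` are hypothetical names, `config` is a stand-in module object, and stripping the `@` before building the plain org pattern is an assumption.

```python
import re
import types

# Stand-in for the real config module (assumption for illustration).
config = types.ModuleType("config")
config.TARGET_KEYWORDS = ["@myorg.cl", "partner.example"]

EMPLOYEE_DOMAINS = []
ORG_DOMAINS = []


def build_domains(keywords):
    """Split keywords into (employee, org) pattern lists."""
    employee, org = [], []
    for kw in keywords:
        domain = kw.rsplit("@", 1)[-1].lower()
        # Every keyword contributes a plain org-domain pattern (LOW baseline).
        org.append(re.compile(re.escape(domain), re.IGNORECASE))
        if "@" in kw:
            # @-keywords also get the anchored employee pattern.
            anchored = re.compile(
                "@" + re.escape(domain) + r"(?:[^a-zA-Z0-9.\-]|$)",
                re.IGNORECASE,
            )
            employee.append((domain, anchored))
    return employee, org


def reload_sketch():
    # Read the live attribute and swap both globals in one assignment.
    global EMPLOYEE_DOMAINS, ORG_DOMAINS
    EMPLOYEE_DOMAINS, ORG_DOMAINS = build_domains(config.TARGET_KEYWORDS)


reload_sketch()
```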