stealergram/utils/scorer.md
anti 741e6bb0d3 Rename to stealergram, add pyproject.toml, purge em-dashes
- Rename project to stealergram throughout
- Add pyproject.toml (replaces requirements.txt split, folds pytest.ini)
- Replace all em-dashes with hyphens across all source files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 10:06:30 -04:00


utils/scorer.py

Severity scoring for credential hits. No Telegram deps. Pure logic.

Public API

from utils.scorer import score_hit, score_hits, summarize, ScoredHit
from utils.scorer import CRITICAL, HIGH, MEDIUM, LOW, SEVERITY_EMOJI, SEVERITY_SCORES

score_hit(line: str) -> ScoredHit

Score a single raw credential line. Parses ULP format (url:user:pass), runs all checks, returns a ScoredHit.

score_hits(lines: list[str]) -> list[ScoredHit]

Score a list of lines. Returns the results sorted by score, descending.

summarize(scored: list[ScoredHit]) -> dict

Returns {CRITICAL: n, HIGH: n, MEDIUM: n, LOW: n}.


ScoredHit dataclass

| Field | Type | Description |
|---|---|---|
| raw | str | Original credential line |
| severity | str | CRITICAL / HIGH / MEDIUM / LOW |
| score | int | 40 / 30 / 20 / 10 |
| reasons | list[str] | Human-readable match reasons |
| url | str \| None | Parsed URL field |
| username | str \| None | Parsed username/email field |
| password | str \| None | Parsed password field |
| .emoji | property | 🔴🟠🟡🟢 |
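
The shape above can be sketched roughly as follows. This is a stand-in for illustration, not the module's actual source: the string values of the severity constants, the emoji-to-severity order, and the helper names sort_hits_sketch and summarize_sketch are all assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

# Assumed values: the doc names these constants but does not show them.
CRITICAL, HIGH, MEDIUM, LOW = "CRITICAL", "HIGH", "MEDIUM", "LOW"
SEVERITY_SCORES = {CRITICAL: 40, HIGH: 30, MEDIUM: 20, LOW: 10}
# Assumed mapping: emoji listed in the same order as the severities.
SEVERITY_EMOJI = {CRITICAL: "🔴", HIGH: "🟠", MEDIUM: "🟡", LOW: "🟢"}


@dataclass
class ScoredHit:
    raw: str
    severity: str
    score: int
    reasons: list = field(default_factory=list)
    url: Optional[str] = None
    username: Optional[str] = None
    password: Optional[str] = None

    @property
    def emoji(self) -> str:
        return SEVERITY_EMOJI[self.severity]


def sort_hits_sketch(hits):
    """Order hits by score, highest first (as score_hits is described)."""
    return sorted(hits, key=lambda h: h.score, reverse=True)


def summarize_sketch(scored):
    """Count hits per severity; zero counts are kept (an assumption)."""
    counts = Counter(h.severity for h in scored)
    return {sev: counts.get(sev, 0) for sev in (CRITICAL, HIGH, MEDIUM, LOW)}
```

Keeping every severity key in the summary, even at zero, means callers can index the dict without checking for missing keys.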

Scoring rules (highest match wins)

| Severity | Triggers |
|---|---|
| CRITICAL | Employee email domain after @ in username/line · Privileged service URL (admin, vpn, ssh, rdp, gitlab, jira…) |
| HIGH | Internal service URL (intranet, erp, crm, sso, owa, sharepoint…) |
| MEDIUM | Client-facing URL (app, patient, booking, helpdesk…) |
| LOW | Org domain appears anywhere in line (baseline) |

Check 6 flags weak passwords (6 characters or fewer, or a common string). It does not change the severity.
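
A minimal sketch of "highest match wins", under several assumptions: keywords are matched as plain lowercase substrings of the URL (the real matching may be stricter), the line is already known to contain an org domain, and classify_sketch is a hypothetical name. The keyword tuples hold only the entries the rules name; the real lists are longer.

```python
CRITICAL, HIGH, MEDIUM, LOW = "CRITICAL", "HIGH", "MEDIUM", "LOW"
SCORES = {CRITICAL: 40, HIGH: 30, MEDIUM: 20, LOW: 10}

# Only the keywords listed in the rules; the source elides the rest.
TIERS = [
    (CRITICAL, "privileged service", ("admin", "vpn", "ssh", "rdp", "gitlab", "jira")),
    (HIGH, "internal service", ("intranet", "erp", "crm", "sso", "owa", "sharepoint")),
    (MEDIUM, "client-facing", ("app", "patient", "booking", "helpdesk")),
]


def classify_sketch(url: str, has_employee_email: bool = False):
    """Run every check, keep all reasons, report the highest severity."""
    url_l = url.lower()
    matches = []  # (score, severity, reason)
    if has_employee_email:
        matches.append((SCORES[CRITICAL], CRITICAL, "employee email domain"))
    for severity, label, keywords in TIERS:
        hit = next((k for k in keywords if k in url_l), None)
        if hit is not None:
            matches.append((SCORES[severity], severity, f"{label} URL: {hit}"))
    # Baseline: assumes the caller already matched an org domain in the line.
    matches.append((SCORES[LOW], LOW, "org domain in line"))
    score, severity, _ = max(matches, key=lambda m: m[0])
    return severity, score, [m[2] for m in matches]
```

Collecting every match before picking the maximum keeps the lower-tier reasons available for display even when a higher tier decides the severity.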


Employee domain matching

Keywords in config.TARGET_KEYWORDS that contain @ become employee patterns.
Pattern: @<domain>(?:[^a-zA-Z0-9.\-]|$) - requires a literal @ before the domain and a non-domain character (or end of line) after it.
So user@gmail.com on a line whose URL contains myorg.cl does NOT trigger CRITICAL.

Keywords without @ go only to ORG_DOMAINS (LOW baseline).
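
The anchored pattern can be tried in isolation. In this sketch, build_employee_pattern is a hypothetical helper, and the keyword form "@myorg.cl" is an assumption about how @-keywords are written in the config.

```python
import re


def build_employee_pattern(keyword: str) -> re.Pattern:
    """Turn an @-keyword into the anchored pattern described above."""
    domain = keyword.split("@", 1)[1]  # text after the first @
    # Literal @, the escaped domain, then a character that cannot continue
    # a hostname (or end of string), so myorg.cl.evil.com does not match.
    return re.compile("@" + re.escape(domain) + r"(?:[^a-zA-Z0-9.\-]|$)")


pattern = build_employee_pattern("@myorg.cl")

# Employee address in the username field: matches.
assert pattern.search("https://portal.myorg.cl/:juan@myorg.cl:hunter2")

# Org domain only in the URL, personal address as username: no match.
assert not pattern.search("https://myorg.cl/login:user@gmail.com:pass")

# Lookalike domain that merely starts with the org domain: no match.
assert not pattern.search("https://x.example/:juan@myorg.cl.evil.com:pass")
```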


ULP line parser (ULP_PATTERN)

Separators: : ; , | \t (any of these between the three fields).

The URL field handles two common stealer-log complications:

  1. :// not treated as separator - the optional scheme prefix (?:https?|ftp):// is consumed before the character-class match, so https:// never gets split at the colon.

  2. Port + path consumed into the URL - the optional group (?::\d+/[^\s:;,|\t]*) absorbs :port/path when the port is pure digits immediately followed by /. This correctly handles http://host:8085/path/:user:pass but intentionally skips patterns like :24145487-8 (RUT number - hyphen after digits, no /).

Known limitation: A bare port with no path (e.g. https://host:8080:user:pass) will mis-parse 8080 as the username. This is not observed in practice - stealer logs always include at least a trailing /.
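
The parsing behaviour described above, including the known limitation, can be reproduced with a regex along these lines. ULP_SKETCH is an approximation built from the fragments quoted in this section, not the module's ULP_PATTERN; the group names, and letting the password run to the end of the line, are assumptions.

```python
import re

ULP_SKETCH = re.compile(
    r"^(?P<url>"
    r"(?:(?:https?|ftp)://)?"        # optional scheme, consumed whole
    r"[^\s:;,|\t]+"                  # host and path up to a separator
    r"(?::\d+/[^\s:;,|\t]*)?"        # optional :port/path (digits then /)
    r")"
    r"[:;,|\t](?P<username>[^\s:;,|\t]+)"
    r"[:;,|\t](?P<password>.*)$"     # assumption: rest of line is the password
)


def parse_sketch(line: str):
    """Return (url, username, password), or None if the line does not parse."""
    m = ULP_SKETCH.match(line.strip())
    return m.group("url", "username", "password") if m else None


# Port followed by a path is absorbed into the URL.
assert parse_sketch("http://host:8085/path/:user:pass") == (
    "http://host:8085/path/", "user", "pass")

# Digits followed by a hyphen (RUT-style username) are not taken as a port.
assert parse_sketch("https://site.cl/login:24145487-8:pass") == (
    "https://site.cl/login", "24145487-8", "pass")

# Known limitation: a bare port with no path becomes the username.
assert parse_sketch("https://host:8080:user:pass") == (
    "https://host", "8080", "user:pass")
```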


Module-level globals (rebuilt on import + via reload_from_config)

| Name | Type | Description |
|---|---|---|
| EMPLOYEE_DOMAINS | list[tuple[str, Pattern]] | (domain_str, anchored_pattern) for @-keywords |
| ORG_DOMAINS | list[Pattern] | Plain domain patterns for all keywords |

scorer uses import config as _config (not from config import TARGET_KEYWORDS), so patching config.TARGET_KEYWORDS at runtime is sufficient - _build_* reads the live module attribute.

To rebuild after editing config.TARGET_KEYWORDS at runtime:

import utils.scorer as scorer
scorer.reload_from_config()

reload_from_config() -> None

Rebuilds EMPLOYEE_DOMAINS and ORG_DOMAINS from the current config.TARGET_KEYWORDS. Called by web config routes after config.save_runtime_config() writes new keyword groups.
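
The sketch below shows why reading config.TARGET_KEYWORDS through the module object matters: the rebuild sees whatever the attribute holds at call time. Everything here is a stand-in under stated assumptions: config is a throwaway module object created for the demo, the example keywords are invented, and taking the text after the last @ as the domain is a guess at how the real _build_* helpers treat @-keywords.

```python
import re
import types

# Hypothetical stand-in for the real config module.
config = types.ModuleType("config")
config.TARGET_KEYWORDS = ["@myorg.cl", "partner.example"]

EMPLOYEE_DOMAINS = []  # list of (domain_str, anchored_pattern)
ORG_DOMAINS = []       # list of plain domain patterns


def reload_from_config() -> None:
    """Rebuild both lists from the current value of config.TARGET_KEYWORDS."""
    employee, org = [], []
    for keyword in config.TARGET_KEYWORDS:  # live attribute lookup
        domain = keyword.rsplit("@", 1)[-1]
        if "@" in keyword:
            anchored = re.compile(
                "@" + re.escape(domain) + r"(?:[^a-zA-Z0-9.\-]|$)")
            employee.append((domain, anchored))
        org.append(re.compile(re.escape(domain)))
    # Replace contents in place so existing references see the new lists.
    EMPLOYEE_DOMAINS[:] = employee
    ORG_DOMAINS[:] = org


reload_from_config()  # mirrors the build that happens on import

config.TARGET_KEYWORDS = ["@newco.example"]
# Patterns stay stale until the rebuild is requested explicitly.
assert [d for d, _ in EMPLOYEE_DOMAINS] == ["myorg.cl"]
reload_from_config()
assert [d for d, _ in EMPLOYEE_DOMAINS] == ["newco.example"]
```

Had the sketch used from config import TARGET_KEYWORDS instead, the name would stay bound to the original list, and reassigning config.TARGET_KEYWORDS would have no effect on the rebuild.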