stealergram/utils/scorer.md
anti 4c104cddd2 Add web frontend with JWT auth, RBAC, SSE dashboard, and config editor
- FastAPI + htmx + Jinja2 web frontend, started with --web flag
- JWT HS256 auth (WEB_SECRET_KEY) with httpOnly cookies; access (15 min) +
  refresh (7 day) tokens; refresh rotation + JTI revocation in data/web.db
- RBAC: superadmin > admin > reader enforced per route
- Live SSE dashboard fed by tui/events broadcast queue
- Config editor: keyword groups and channel list saved to data/runtime_config.json
  and hot-reloaded in-process (scorer.reload_from_config, signal_channel_changed)
- config.py migrated to load groups/channels from runtime_config.json;
  falls back to hardcoded defaults when file absent
- tui/events.py: subscribe/unsubscribe broadcast, set_bot_context/signal_channel_changed
- utils/scorer.py: import config as _config (fixes local binding); reload_from_config()
- utils/database.py: count_by_severity, recent_for_domains, count_by_severity_for_domains
- 53 new tests (events bus, JWT lifecycle, web DB CRUD, RBAC enforcement,
  config round-trip); total 141 passing
2026-04-02 11:41:46 -03:00


# utils/scorer.py
Severity scoring for credential hits. Pure logic with no Telegram dependencies.
## Public API
```python
from utils.scorer import score_hit, score_hits, summarize, ScoredHit
from utils.scorer import CRITICAL, HIGH, MEDIUM, LOW, SEVERITY_EMOJI, SEVERITY_SCORES
```
### `score_hit(line: str) -> ScoredHit`
Score a single raw credential line. Parses ULP format (`url:user:pass`), runs all checks, returns a `ScoredHit`.
### `score_hits(lines: list[str]) -> list[ScoredHit]`
Score each line with `score_hit` and return the results sorted by `score`, highest first.
### `summarize(scored: list[ScoredHit]) -> dict`
Returns the number of hits at each severity level: `{CRITICAL: n, HIGH: n, MEDIUM: n, LOW: n}`.
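A minimal sketch of how `score_hits` and `summarize` fit together. It is not the module's code: `ScoredHit` is trimmed to three fields, the severity constants are assumed to be plain strings, and `score_hit` is a hypothetical stub that only looks for `vpn`. It shows the descending sort and the per-severity count dict, with zero counts for empty levels (assumed from the documented shape).

```python
from collections import Counter
from dataclasses import dataclass

# Assumed string values; the real constants may differ.
CRITICAL, HIGH, MEDIUM, LOW = "CRITICAL", "HIGH", "MEDIUM", "LOW"
SEVERITY_SCORES = {CRITICAL: 40, HIGH: 30, MEDIUM: 20, LOW: 10}


@dataclass
class ScoredHit:
    # Trimmed stand-in: the real dataclass has more fields.
    raw: str
    severity: str
    score: int


def score_hit(line: str) -> ScoredHit:
    # Hypothetical stub: the real score_hit parses the ULP fields and runs every check.
    severity = CRITICAL if "vpn" in line else LOW
    return ScoredHit(raw=line, severity=severity, score=SEVERITY_SCORES[severity])


def score_hits(lines: list[str]) -> list[ScoredHit]:
    # Score every line, then sort highest score first.
    return sorted((score_hit(line) for line in lines), key=lambda h: h.score, reverse=True)


def summarize(scored: list[ScoredHit]) -> dict:
    # One entry per severity level; levels with no hits report 0.
    counts = Counter(hit.severity for hit in scored)
    return {sev: counts[sev] for sev in (CRITICAL, HIGH, MEDIUM, LOW)}


hits = score_hits(["https://shop.example.cl/:ana:pw1", "https://vpn.example.cl/:bob:pw2"])
print([h.severity for h in hits])  # ['CRITICAL', 'LOW']
print(summarize(hits))  # {'CRITICAL': 1, 'HIGH': 0, 'MEDIUM': 0, 'LOW': 1}
```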
---
## ScoredHit dataclass
| Field | Type | Description |
|-------|------|-------------|
| `raw` | str | Original credential line |
| `severity` | str | CRITICAL / HIGH / MEDIUM / LOW |
| `score` | int | 40 / 30 / 20 / 10 for CRITICAL / HIGH / MEDIUM / LOW |
| `reasons` | list[str] | Human-readable match reasons |
| `url` | str\|None | Parsed URL field |
| `username` | str\|None | Parsed username/email field |
| `password` | str\|None | Parsed password field |
| `.emoji` | property | Severity emoji: 🔴 CRITICAL · 🟠 HIGH · 🟡 MEDIUM · 🟢 LOW |
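A sketch of the dataclass as the table describes it. It assumes `SEVERITY_EMOJI` maps each severity to its emoji in the order listed, and that the optional fields default to `None` or an empty list; those defaults are assumptions, not confirmed by the source. `emoji` is a read-only property derived from `severity`.

```python
from dataclasses import dataclass, field
from typing import Optional

CRITICAL, HIGH, MEDIUM, LOW = "CRITICAL", "HIGH", "MEDIUM", "LOW"
# Assumed mapping, following the emoji order listed in the table.
SEVERITY_EMOJI = {CRITICAL: "🔴", HIGH: "🟠", MEDIUM: "🟡", LOW: "🟢"}


@dataclass
class ScoredHit:
    raw: str
    severity: str
    score: int
    # Defaults are assumptions; they let callers omit fields the parser could not fill.
    reasons: list[str] = field(default_factory=list)
    url: Optional[str] = None
    username: Optional[str] = None
    password: Optional[str] = None

    @property
    def emoji(self) -> str:
        # Computed from severity, so it cannot drift out of sync with it.
        return SEVERITY_EMOJI[self.severity]


hit = ScoredHit(
    raw="https://intranet.example.cl/:ana:pw",
    severity=HIGH,
    score=30,
    reasons=["internal service URL"],
    url="https://intranet.example.cl/",
    username="ana",
    password="pw",
)
print(hit.emoji, hit.severity, hit.score)  # 🟠 HIGH 30
```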
---
## Scoring rules (highest match wins)
| Severity | Triggers |
|----------|----------|
| CRITICAL | Employee email domain after `@` in username/line · Privileged service URL (admin, vpn, ssh, rdp, gitlab, jira…) |
| HIGH | Internal service URL (intranet, erp, crm, sso, owa, sharepoint…) |
| MEDIUM | Client-facing URL (app, patient, booking, helpdesk…) |
| LOW | Org domain appears anywhere in line (baseline) |
Check 6 flags weak passwords (6 characters or fewer, or a common string) without changing the severity.
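The precedence rule can be sketched as: run every check, keep a reason for each one that fires, and report the highest-ranked severity. This is an illustration, not the scorer's implementation. The keyword tuples hold only the examples named in the table, `COMMON_PASSWORDS` is a hypothetical sample, `classify` and `RANK` are hypothetical names, and plain substring matching is assumed (it overmatches, e.g. `app` inside `whatsapp`; the real checks may use word boundaries).

```python
import re
from typing import Optional

CRITICAL, HIGH, MEDIUM, LOW = "CRITICAL", "HIGH", "MEDIUM", "LOW"
RANK = {CRITICAL: 4, HIGH: 3, MEDIUM: 2, LOW: 1}

# Only the keywords named in the table; the real lists are longer.
PRIVILEGED = ("admin", "vpn", "ssh", "rdp", "gitlab", "jira")
INTERNAL = ("intranet", "erp", "crm", "sso", "owa", "sharepoint")
CLIENT_FACING = ("app", "patient", "booking", "helpdesk")
COMMON_PASSWORDS = {"123456", "password", "qwerty"}  # hypothetical sample


def classify(line: str, url: str, password: Optional[str],
             employee_re: re.Pattern, org_re: re.Pattern) -> tuple[Optional[str], list[str]]:
    url_l = url.lower()
    matches: list[tuple[str, str]] = []  # (severity, reason) for every check that fires
    if employee_re.search(line):
        matches.append((CRITICAL, "employee email domain"))
    if any(k in url_l for k in PRIVILEGED):
        matches.append((CRITICAL, "privileged service URL"))
    if any(k in url_l for k in INTERNAL):
        matches.append((HIGH, "internal service URL"))
    if any(k in url_l for k in CLIENT_FACING):
        matches.append((MEDIUM, "client-facing URL"))
    if org_re.search(line):
        matches.append((LOW, "org domain in line"))
    reasons = [reason for _, reason in matches]
    # Check 6: a weak password adds a reason but is kept out of `matches`.
    if password is not None and (len(password) <= 6 or password.lower() in COMMON_PASSWORDS):
        reasons.append("weak password")
    # Highest match wins; None means no rule fired (an assumption for this sketch).
    severity = max((sev for sev, _ in matches), key=RANK.__getitem__, default=None)
    return severity, reasons


employee_re = re.compile(r"@myorg\.cl(?:[^a-zA-Z0-9.\-]|$)", re.IGNORECASE)
org_re = re.compile(r"myorg\.cl", re.IGNORECASE)
print(classify("https://intranet.myorg.cl/:ana:123", "https://intranet.myorg.cl/",
               "123", employee_re, org_re))
# ('HIGH', ['internal service URL', 'org domain in line', 'weak password'])
```

Because the weak-password reason never enters `matches`, it cannot raise or lower the severity, which mirrors the "no severity change" behavior described for check 6.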
---
## Employee domain matching
Keywords in `config.TARGET_KEYWORDS` containing `@` become employee patterns.
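Pattern: `@<domain>(?:[^a-zA-Z0-9.\-]|$)`. The domain must be preceded by a literal `@` and followed either by a character that cannot continue a hostname (anything other than a letter, digit, `.` or `-`) or by the end of the line.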
**`user@gmail.com` on a URL containing `myorg.cl` does NOT trigger CRITICAL.**
Keywords without `@` go only to `ORG_DOMAINS` (LOW baseline).
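A sketch of the split, assuming keywords look like `@myorg.cl` (employee) or `myorg-partner.com` (org only) and that matching is case-insensitive; both are assumptions. `build_domain_patterns` and `is_employee` are hypothetical helpers, and the returned lists only mirror the shapes documented for `EMPLOYEE_DOMAINS` and `ORG_DOMAINS`.

```python
import re

# Hypothetical keyword list; only "@"-keywords become employee patterns.
TARGET_KEYWORDS = ["@myorg.cl", "myorg-partner.com"]


def build_domain_patterns(keywords: list[str]):
    employee_domains: list[tuple[str, re.Pattern]] = []
    org_domains: list[re.Pattern] = []
    for kw in keywords:
        domain = kw.rsplit("@", 1)[-1].lower()
        # Every keyword yields a plain domain pattern (the LOW baseline).
        org_domains.append(re.compile(re.escape(domain), re.IGNORECASE))
        if "@" in kw:
            # Literal "@" before the domain; after it, a non-hostname character or end of line.
            pattern = re.compile("@" + re.escape(domain) + r"(?:[^a-zA-Z0-9.\-]|$)", re.IGNORECASE)
            employee_domains.append((domain, pattern))
    return employee_domains, org_domains


EMPLOYEE_DOMAINS, ORG_DOMAINS = build_domain_patterns(TARGET_KEYWORDS)


def is_employee(line: str) -> bool:
    return any(pattern.search(line) for _, pattern in EMPLOYEE_DOMAINS)


print(is_employee("https://mail.example.com/:juan@myorg.cl:pw"))   # True
print(is_employee("https://portal.myorg.cl/:user@gmail.com:pw"))   # False
print(is_employee("https://x.example/:juan@myorg.cl.evil.com:pw"))  # False
```

The trailing group is what rejects look-alike domains: in `juan@myorg.cl.evil.com` the character after `myorg.cl` is `.`, which the class excludes, so only the org-domain (LOW) patterns can still match such lines.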
---
## ULP line parser (`ULP_PATTERN`)
Separators: `:` `;` `,` `|` `\t` (any of these between the three fields).
The URL field handles two common stealer-log complications:
1. **`://` not treated as separator**: the optional scheme prefix `(?:https?|ftp)://` is consumed before the character-class match, so `https://` is never split at its colon.
2. **Port + path consumed into the URL**: the optional group `(?::\d+/[^\s:;,|\t]*)` absorbs `:port/path` when the port is pure digits immediately followed by `/`. This correctly handles `http://host:8085/path/:user:pass` but intentionally skips patterns like `:24145487-8` (a Chilean RUT number: a hyphen after the digits, no `/`).
**Known limitation:** A bare port with no path (e.g. `https://host:8080:user:pass`) will mis-parse `8080` as the username. This has not been observed in practice; in the stealer logs seen so far, a port is always followed by at least a trailing `/`.
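A regex reconstructed from the two rules above, for illustration only; the module's actual `ULP_PATTERN` may differ in details such as how the username and password fields are bounded. `parse_ulp` is a hypothetical wrapper, and the password is assumed to run to the end of the line. The last call reproduces the known limitation.

```python
import re
from typing import Optional

# Illustrative reconstruction, not the module's actual ULP_PATTERN.
ULP_PATTERN = re.compile(
    r"^(?P<url>"
    r"(?:(?:https?|ftp)://)?"        # scheme consumed first, so "://" is never a separator
    r"[^\s:;,|\t]+"                  # host and path up to the first separator
    r"(?::\d+/[^\s:;,|\t]*)?"        # ":port/path", only when the digits are followed by "/"
    r")"
    r"[:;,|\t](?P<user>[^:;,|\t]+)"  # separator, then username
    r"[:;,|\t](?P<password>.+)$"     # separator, then password (rest of the line)
)


def parse_ulp(line: str) -> Optional[tuple[str, str, str]]:
    m = ULP_PATTERN.match(line.strip())
    if m is None:
        return None
    return m.group("url"), m.group("user"), m.group("password")


print(parse_ulp("http://host:8085/path/:user:pass"))         # ('http://host:8085/path/', 'user', 'pass')
print(parse_ulp("https://site.cl/login:24145487-8:secret"))  # ('https://site.cl/login', '24145487-8', 'secret')
print(parse_ulp("https://host:8080:user:pass"))              # ('https://host', '8080', 'user:pass')
```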
---
## Module-level globals (built on import, rebuilt by `reload_from_config()`)
| Name | Type | Description |
|------|------|-------------|
| `EMPLOYEE_DOMAINS` | `list[tuple[str, Pattern]]` | `(domain_str, anchored_pattern)` for `@`-keywords |
| `ORG_DOMAINS` | `list[Pattern]` | Plain domain patterns for all keywords |
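scorer uses `import config as _config` (not `from config import TARGET_KEYWORDS`), so the `_build_*` helpers always read the live `config.TARGET_KEYWORDS` attribute; a `from` import would keep pointing at the original list after `config.TARGET_KEYWORDS` is reassigned. Reassigning `config.TARGET_KEYWORDS` does not by itself update `EMPLOYEE_DOMAINS` or `ORG_DOMAINS`, though.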
To rebuild after editing `config.TARGET_KEYWORDS` at runtime:
```python
import utils.scorer as scorer
scorer.reload_from_config()
```
### `reload_from_config() -> None`
Rebuilds `EMPLOYEE_DOMAINS` and `ORG_DOMAINS` from the current `config.TARGET_KEYWORDS`. Called by web config routes after `config.save_runtime_config()` writes new keyword groups.