Rename to stealergram, add pyproject.toml, purge em-dashes
- Rename project to stealergram throughout - Add pyproject.toml (replaces requirements.txt split, folds pytest.ini) - Replace all em-dashes with hyphens across all source files Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -15,8 +15,8 @@ NOTIFY_CHAT_ID=987654321
|
|||||||
# ─── Session name (just a filename, no extension needed) ────────────────────
|
# ─── Session name (just a filename, no extension needed) ────────────────────
|
||||||
SESSION_NAME=monitor_session
|
SESSION_NAME=monitor_session
|
||||||
|
|
||||||
# ─── tdl (fast Go downloader) — optional but strongly recommended ───────────
|
# ─── tdl (fast Go downloader) - optional but strongly recommended ───────────
|
||||||
# Install: https://github.com/iyear/tdl
|
# Install: https://github.com/iyear/tdl
|
||||||
# After installing, run once: tdl login -n <SESSION_NAME>
|
# After installing, run once: tdl login -n <SESSION_NAME>
|
||||||
# SESSION_NAME above is shared between Telethon and tdl — no double login needed.
|
# SESSION_NAME above is shared between Telethon and tdl - no double login needed.
|
||||||
# If tdl is not on PATH the bot falls back to Telethon automatically.
|
# If tdl is not on PATH the bot falls back to Telethon automatically.
|
||||||
|
|||||||
12
CLAUDE.md
12
CLAUDE.md
@@ -5,7 +5,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
|
|||||||
## Development workflow
|
## Development workflow
|
||||||
|
|
||||||
After every code change:
|
After every code change:
|
||||||
1. Run `pytest` — all tests must pass at 100%.
|
1. Run `pytest` - all tests must pass at 100%.
|
||||||
2. If 100% pass: present the change to the user, then commit.
|
2. If 100% pass: present the change to the user, then commit.
|
||||||
3. If any test fails: fix the bug and re-run before showing anything to the user.
|
3. If any test fails: fix the bug and re-run before showing anything to the user.
|
||||||
|
|
||||||
@@ -20,7 +20,7 @@ pytest -v # verbose
|
|||||||
pytest tests/test_scorer.py # single file
|
pytest tests/test_scorer.py # single file
|
||||||
```
|
```
|
||||||
|
|
||||||
Tests cover `utils/scorer`, `utils/cache`, `utils/database`, and `core/processor`. They are fully isolated — no `.env` required, no real DB or cache files touched. The `patched_keywords` fixture in `conftest.py` replaces `TARGET_KEYWORDS` with known test patterns; it must patch both `config.TARGET_KEYWORDS` and `scorer.TARGET_KEYWORDS` (the local `from config import` binding).
|
Tests cover `utils/scorer`, `utils/cache`, `utils/database`, and `core/processor`. They are fully isolated - no `.env` required, no real DB or cache files touched. The `patched_keywords` fixture in `conftest.py` replaces `TARGET_KEYWORDS` with known test patterns; it must patch both `config.TARGET_KEYWORDS` and `scorer.TARGET_KEYWORDS` (the local `from config import` binding).
|
||||||
|
|
||||||
## Running the monitor
|
## Running the monitor
|
||||||
|
|
||||||
@@ -66,15 +66,15 @@ Telegram channel message with file attachment
|
|||||||
|
|
||||||
The TUI and Telegram bot run in separate threads with different event loops:
|
The TUI and Telegram bot run in separate threads with different event loops:
|
||||||
|
|
||||||
- **Main thread**: Textual's event loop — runs `MonitorApp`, drains the event bus every 100ms via `_drain_bus()`
|
- **Main thread**: Textual's event loop - runs `MonitorApp`, drains the event bus every 100ms via `_drain_bus()`
|
||||||
- **Bot thread**: own `asyncio` event loop — runs `_bot_main()` with both `user_client` and `bot_client`
|
- **Bot thread**: own `asyncio` event loop - runs `_bot_main()` with both `user_client` and `bot_client`
|
||||||
- **Cross-thread communication**: bot → TUI via `bus.post()` (`queue.Queue.put_nowait`, always safe); TUI → bot via `loop.call_soon_threadsafe()` (e.g., to signal channel list changes)
|
- **Cross-thread communication**: bot → TUI via `bus.post()` (`queue.Queue.put_nowait`, always safe); TUI → bot via `loop.call_soon_threadsafe()` (e.g., to signal channel list changes)
|
||||||
|
|
||||||
### Module responsibilities
|
### Module responsibilities
|
||||||
|
|
||||||
| Module | Role |
|
| Module | Role |
|
||||||
|--------|------|
|
|--------|------|
|
||||||
| `config.py` | All settings — edit keywords, channels, paths, tdl tuning here |
|
| `config.py` | All settings - edit keywords, channels, paths, tdl tuning here |
|
||||||
| `core/scraper.py` | Live listener + backfill orchestration; registers Telethon `NewMessage` handlers |
|
| `core/scraper.py` | Live listener + backfill orchestration; registers Telethon `NewMessage` handlers |
|
||||||
| `core/tdl_downloader.py` | Wraps `tdl` subprocess for fast downloads; falls back to Telethon |
|
| `core/tdl_downloader.py` | Wraps `tdl` subprocess for fast downloads; falls back to Telethon |
|
||||||
| `core/bot_downloader.py` | Handles inline button click flow where files come via bot reply |
|
| `core/bot_downloader.py` | Handles inline button click flow where files come via bot reply |
|
||||||
@@ -127,4 +127,4 @@ tail -f data/logs/monitor.log
|
|||||||
| `r` | Refresh stats |
|
| `r` | Refresh stats |
|
||||||
| `q` / `Escape` | Quit / back |
|
| `q` / `Escape` | Quit / back |
|
||||||
|
|
||||||
Runtime keyword and channel changes are **not** persisted — copy them to `config.py` to survive restarts.
|
Runtime keyword and channel changes are **not** persisted - copy them to `config.py` to survive restarts.
|
||||||
|
|||||||
10
QUICK_REF.md
10
QUICK_REF.md
@@ -1,4 +1,4 @@
|
|||||||
# ULP Monitor — Quick Reference
|
# ULP Monitor - Quick Reference
|
||||||
|
|
||||||
> For Claude Code: read the per-file `.md` alongside each `.py` before editing.
|
> For Claude Code: read the per-file `.md` alongside each `.py` before editing.
|
||||||
> Full docs in `README.md`.
|
> Full docs in `README.md`.
|
||||||
@@ -10,7 +10,7 @@
|
|||||||
```
|
```
|
||||||
ulp_monitor/
|
ulp_monitor/
|
||||||
├── main.py Entry point (--no-tui flag for CLI mode)
|
├── main.py Entry point (--no-tui flag for CLI mode)
|
||||||
├── config.py All settings — edit this for keywords, channels, paths
|
├── config.py All settings - edit this for keywords, channels, paths
|
||||||
│
|
│
|
||||||
├── core/ Telegram I/O pipeline (all async, Telethon-dependent)
|
├── core/ Telegram I/O pipeline (all async, Telethon-dependent)
|
||||||
│ ├── scraper.py Live listener + backfill orchestration
|
│ ├── scraper.py Live listener + backfill orchestration
|
||||||
@@ -24,11 +24,11 @@ ulp_monitor/
|
|||||||
│ ├── cache.py Seen file-ID dedup (data/cache.json)
|
│ ├── cache.py Seen file-ID dedup (data/cache.json)
|
||||||
│ └── database.py SQLite read/write (data/hits.db)
|
│ └── database.py SQLite read/write (data/hits.db)
|
||||||
│
|
│
|
||||||
├── tui/ Textual TUI — runs in main thread
|
├── tui/ Textual TUI - runs in main thread
|
||||||
│ ├── app.py MonitorApp + all screens + bot thread launcher
|
│ ├── app.py MonitorApp + all screens + bot thread launcher
|
||||||
│ └── events.py Thread-safe queue.Queue event bus
|
│ └── events.py Thread-safe queue.Queue event bus
|
||||||
│
|
│
|
||||||
└── data/ Runtime output — gitignored
|
└── data/ Runtime output - gitignored
|
||||||
├── hits.db
|
├── hits.db
|
||||||
├── hits.txt
|
├── hits.txt
|
||||||
├── hits.csv
|
├── hits.csv
|
||||||
@@ -126,7 +126,7 @@ cross-thread communication
|
|||||||
| MEDIUM | 20 | Client-facing URL (app, booking, helpdesk…) |
|
| MEDIUM | 20 | Client-facing URL (app, booking, helpdesk…) |
|
||||||
| LOW | 10 | Org domain appears anywhere in line |
|
| LOW | 10 | Org domain appears anywhere in line |
|
||||||
|
|
||||||
`@`-keyword rule: pattern requires literal `@` before domain — `user@gmail.com` on a URL containing `myorg.cl` does **not** trigger CRITICAL.
|
`@`-keyword rule: pattern requires literal `@` before domain - `user@gmail.com` on a URL containing `myorg.cl` does **not** trigger CRITICAL.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|||||||
20
README.md
20
README.md
@@ -33,7 +33,7 @@ ulp_monitor/
|
|||||||
│ ├── processor.py Archive extraction + line-by-line search
|
│ ├── processor.py Archive extraction + line-by-line search
|
||||||
│ └── notifier.py hits.txt / hits.csv writer + bot alerts
|
│ └── notifier.py hits.txt / hits.csv writer + bot alerts
|
||||||
│
|
│
|
||||||
├── utils/ Pure logic — no Telegram dependencies
|
├── utils/ Pure logic - no Telegram dependencies
|
||||||
│ ├── scorer.py Hit severity scoring
|
│ ├── scorer.py Hit severity scoring
|
||||||
│ ├── cache.py Seen-file deduplication
|
│ ├── cache.py Seen-file deduplication
|
||||||
│ └── database.py SQLite persistence layer
|
│ └── database.py SQLite persistence layer
|
||||||
@@ -75,11 +75,11 @@ cp .env.example .env
|
|||||||
|
|
||||||
Open `config.py` and set:
|
Open `config.py` and set:
|
||||||
|
|
||||||
- **`TARGET_KEYWORDS`** — your org's domains and email patterns.
|
- **`TARGET_KEYWORDS`** - your org's domains and email patterns.
|
||||||
Keywords with `@` (e.g. `r"@myorg\.cl"`) are **employee email domains** → CRITICAL.
|
Keywords with `@` (e.g. `r"@myorg\.cl"`) are **employee email domains** → CRITICAL.
|
||||||
Keywords without `@` are plain domain matches → LOW baseline.
|
Keywords without `@` are plain domain matches → LOW baseline.
|
||||||
- **`WATCHED_CHANNELS`** — channel usernames or numeric IDs
|
- **`WATCHED_CHANNELS`** - channel usernames or numeric IDs
|
||||||
- **`BACKFILL_LIMIT`** — past messages to scan per channel on startup
|
- **`BACKFILL_LIMIT`** - past messages to scan per channel on startup
|
||||||
|
|
||||||
### 5. Install dependencies
|
### 5. Install dependencies
|
||||||
|
|
||||||
@@ -97,7 +97,7 @@ curl -sSL https://raw.githubusercontent.com/iyear/tdl/main/scripts/install.sh |
|
|||||||
tdl login -n monitor_session
|
tdl login -n monitor_session
|
||||||
```
|
```
|
||||||
|
|
||||||
### 6. First run — complete Telegram auth
|
### 6. First run - complete Telegram auth
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python main.py --no-tui
|
python main.py --no-tui
|
||||||
@@ -130,9 +130,9 @@ python main.py --no-tui # plain CLI
|
|||||||
|
|
||||||
| File | Description |
|
| File | Description |
|
||||||
|------|-------------|
|
|------|-------------|
|
||||||
| `data/hits.db` | SQLite — all hits with scores, severity, dedup flag |
|
| `data/hits.db` | SQLite - all hits with scores, severity, dedup flag |
|
||||||
| `data/hits.txt` | Human-readable grouped log |
|
| `data/hits.txt` | Human-readable grouped log |
|
||||||
| `data/hits.csv` | CSV — easy to pull into Excel / pandas |
|
| `data/hits.csv` | CSV - easy to pull into Excel / pandas |
|
||||||
| `data/logs/monitor.log` | Full run log |
|
| `data/logs/monitor.log` | Full run log |
|
||||||
|
|
||||||
Telegram alerts fire for CRITICAL / HIGH / MEDIUM only. LOW is stored silently.
|
Telegram alerts fire for CRITICAL / HIGH / MEDIUM only. LOW is stored silently.
|
||||||
@@ -141,6 +141,6 @@ Telegram alerts fire for CRITICAL / HIGH / MEDIUM only. LOW is stored silently.
|
|||||||
|
|
||||||
## Notes
|
## Notes
|
||||||
|
|
||||||
- **Session files are sensitive** — equivalent to a logged-in account. Gitignored, never share.
|
- **Session files are sensitive** - equivalent to a logged-in account. Gitignored, never share.
|
||||||
- **Flood limits** — `FloodWaitError` is handled automatically.
|
- **Flood limits** - `FloodWaitError` is handled automatically.
|
||||||
- **Private channels** — your user account must already be a member.
|
- **Private channels** - your user account must already be a member.
|
||||||
|
|||||||
43
config.py
43
config.py
@@ -1,5 +1,5 @@
|
|||||||
"""
|
"""
|
||||||
config.py — Loads and validates all settings from .env
|
config.py - Loads and validates all settings from .env
|
||||||
"""
|
"""
|
||||||
|
|
||||||
import json
|
import json
|
||||||
@@ -29,30 +29,35 @@ RUNTIME_CONFIG_PATH = Path("./data/runtime_config.json")
|
|||||||
# Add your org's domains, email patterns, IP ranges, known usernames, etc.
|
# Add your org's domains, email patterns, IP ranges, known usernames, etc.
|
||||||
# All patterns are case-insensitive regex.
|
# All patterns are case-insensitive regex.
|
||||||
_DEFAULT_KEYWORDS: list[str] = [
|
_DEFAULT_KEYWORDS: list[str] = [
|
||||||
r"sanatorioaleman\.cl",
|
#r"sanatorioaleman\.cl",
|
||||||
r"@sanatorioaleman\.cl",
|
#r"@sanatorioaleman\.cl",
|
||||||
|
#r"@hites\.cl",
|
||||||
|
#r"hites\.com",
|
||||||
# r"192\.168\.10\.", # internal IP range example
|
# r"192\.168\.10\.", # internal IP range example
|
||||||
# r"specificuser", # known internal usernames
|
# r"specificuser", # known internal usernames
|
||||||
|
r"onion\.global",
|
||||||
|
r"@onion\.global",
|
||||||
]
|
]
|
||||||
|
|
||||||
# Use usernames (without @) or numeric channel IDs (-100xxxxxxxxxx)
|
# Use usernames (without @) or numeric channel IDs (-100xxxxxxxxxx)
|
||||||
_DEFAULT_CHANNELS: list[str | int] = [
|
_DEFAULT_CHANNELS: list[str | int] = [
|
||||||
#-1002230225603,
|
#-1002230225603,
|
||||||
"cloudxlog",
|
#"cloudxlog",
|
||||||
#-1001967030016, # daisycloud
|
##-1001967030016, # daisycloud
|
||||||
#"berserklogs", # berserklogs
|
##"berserklogs", # berserklogs
|
||||||
#"BorwitaFreeLogs", # borwita
|
##"BorwitaFreeLogs", # borwita
|
||||||
-1002748707556, # darkcloud
|
#-1002748707556, # darkcloud
|
||||||
-1001684073398, # BHF Cloud
|
#-1001684073398, # BHF Cloud
|
||||||
-1003163621939, # Wich Love from R
|
#-1003163621939, # Wich Love from R
|
||||||
-1003611713618, # Khazan Cloud
|
#-1003611713618, # Khazan Cloud
|
||||||
-1003328682684, # LogsPlanet
|
#-1003328682684, # LogsPlanet
|
||||||
-1003204260194, # JDP
|
#-1003204260194, # JDP
|
||||||
-1002828367761, # HesoyamCloud
|
#-1002828367761, # HesoyamCloud
|
||||||
-1003513974925, # Slurm Logs
|
#-1003513974925, # Slurm Logs
|
||||||
-1003599300787, # Arhont Corp
|
#-1003599300787, # Arhont Corp
|
||||||
-1002582513379, # OnlyLogs
|
#-1002582513379, # OnlyLogs
|
||||||
-1002788333372, # Ickis Cloud
|
#-1002788333372, # Ickis Cloud
|
||||||
|
-1002643355608, # Cloud URL
|
||||||
#-1001234567890, # private channel by ID
|
#-1001234567890, # private channel by ID
|
||||||
]
|
]
|
||||||
|
|
||||||
@@ -149,5 +154,5 @@ TDL_PERFILE = 4
|
|||||||
TDL_AMOUNT = 4
|
TDL_AMOUNT = 4
|
||||||
|
|
||||||
# Whether to use a Telegram takeout session for downloads (lower flood limits).
|
# Whether to use a Telegram takeout session for downloads (lower flood limits).
|
||||||
# Takeout sessions are rate-limited differently — good for bulk backfill.
|
# Takeout sessions are rate-limited differently - good for bulk backfill.
|
||||||
TDL_TAKEOUT = True
|
TDL_TAKEOUT = True
|
||||||
|
|||||||
@@ -1 +1 @@
|
|||||||
"""core — Telegram I/O pipeline (scraper, downloader, processor, notifier)."""
|
"""core - Telegram I/O pipeline (scraper, downloader, processor, notifier)."""
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
"""
|
"""
|
||||||
bot_downloader.py — Handles "click to download" inline button flows.
|
bot_downloader.py - Handles "click to download" inline button flows.
|
||||||
|
|
||||||
Some Telegram channels post messages with a DOWNLOAD button that triggers
|
Some Telegram channels post messages with a DOWNLOAD button that triggers
|
||||||
a bot to send you the actual file. This module simulates that click and
|
a bot to send you the actual file. This module simulates that click and
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
"""
|
"""
|
||||||
notifier.py — Persists hits to disk and sends Telegram bot alerts.
|
notifier.py - Persists hits to disk and sends Telegram bot alerts.
|
||||||
|
|
||||||
Includes:
|
Includes:
|
||||||
- Severity scoring via scorer.py
|
- Severity scoring via scorer.py
|
||||||
@@ -31,7 +31,7 @@ log = logging.getLogger(__name__)
|
|||||||
MAX_PREVIEW = 10 # hits to show per severity group in alert
|
MAX_PREVIEW = 10 # hits to show per severity group in alert
|
||||||
DEDUP_FILE = Path("./data/dedup.json")
|
DEDUP_FILE = Path("./data/dedup.json")
|
||||||
|
|
||||||
# Only alert immediately for these severities — LOW hits are silent
|
# Only alert immediately for these severities - LOW hits are silent
|
||||||
ALERT_SEVERITIES = {CRITICAL, HIGH, MEDIUM}
|
ALERT_SEVERITIES = {CRITICAL, HIGH, MEDIUM}
|
||||||
|
|
||||||
|
|
||||||
@@ -124,7 +124,7 @@ def write_hits(scored_hits: list, source: str) -> None:
|
|||||||
|
|
||||||
|
|
||||||
def write_hits_csv(scored_hits: list, source: str, filename: str) -> None:
|
def write_hits_csv(scored_hits: list, source: str, filename: str) -> None:
|
||||||
"""Append new hits to hits.csv — one row per hit, easy to import."""
|
"""Append new hits to hits.csv - one row per hit, easy to import."""
|
||||||
HITS_CSV.parent.mkdir(parents=True, exist_ok=True)
|
HITS_CSV.parent.mkdir(parents=True, exist_ok=True)
|
||||||
write_header = not HITS_CSV.exists()
|
write_header = not HITS_CSV.exists()
|
||||||
timestamp = _timestamp()
|
timestamp = _timestamp()
|
||||||
@@ -152,13 +152,13 @@ async def send_alert(
|
|||||||
) -> None:
|
) -> None:
|
||||||
"""
|
"""
|
||||||
Send a Telegram alert grouped by severity.
|
Send a Telegram alert grouped by severity.
|
||||||
Only includes CRITICAL, HIGH, MEDIUM — LOW hits are omitted from alerts.
|
Only includes CRITICAL, HIGH, MEDIUM - LOW hits are omitted from alerts.
|
||||||
"""
|
"""
|
||||||
summary = summarize(scored_hits)
|
summary = summarize(scored_hits)
|
||||||
alertable = [h for h in scored_hits if h.severity in ALERT_SEVERITIES]
|
alertable = [h for h in scored_hits if h.severity in ALERT_SEVERITIES]
|
||||||
|
|
||||||
if not alertable:
|
if not alertable:
|
||||||
log.info(" No alertable hits (all LOW) — skipping Telegram notification.")
|
log.info(" No alertable hits (all LOW) - skipping Telegram notification.")
|
||||||
return
|
return
|
||||||
|
|
||||||
lines = [
|
lines = [
|
||||||
@@ -210,7 +210,7 @@ async def notify(bot: TelegramClient, hits: list[str], source: str, filename: st
|
|||||||
|
|
||||||
# Score first
|
# Score first
|
||||||
scored = score_hits(hits)
|
scored = score_hits(hits)
|
||||||
log.info(f" Scored {len(scored)} hit(s) — {summarize(scored)}")
|
log.info(f" Scored {len(scored)} hit(s) - {summarize(scored)}")
|
||||||
|
|
||||||
# Deduplicate
|
# Deduplicate
|
||||||
new_hits, dupe_hits = deduplicate(scored)
|
new_hits, dupe_hits = deduplicate(scored)
|
||||||
@@ -222,7 +222,7 @@ async def notify(bot: TelegramClient, hits: list[str], source: str, filename: st
|
|||||||
insert_hits(dupe_hits, source, filename, seen_before=True)
|
insert_hits(dupe_hits, source, filename, seen_before=True)
|
||||||
|
|
||||||
if not new_hits:
|
if not new_hits:
|
||||||
log.info(" All hits already seen before — no alert sent.")
|
log.info(" All hits already seen before - no alert sent.")
|
||||||
return
|
return
|
||||||
|
|
||||||
# Push hits to TUI
|
# Push hits to TUI
|
||||||
|
|||||||
@@ -54,8 +54,8 @@ Nested archives are recursed **one level** only.
|
|||||||
|
|
||||||
## Password order
|
## Password order
|
||||||
|
|
||||||
1. `extra_password` (from message/channel carry-forward) — tried first
|
1. `extra_password` (from message/channel carry-forward) - tried first
|
||||||
2. `config.ARCHIVE_PASSWORDS` — tried in order
|
2. `config.ARCHIVE_PASSWORDS` - tried in order
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|||||||
@@ -1,8 +1,8 @@
|
|||||||
"""
|
"""
|
||||||
processor.py — Archive extraction and hit searching logic.
|
processor.py - Archive extraction and hit searching logic.
|
||||||
|
|
||||||
Supports: .txt, .zip, .7z, .rar
|
Supports: .txt, .zip, .7z, .rar
|
||||||
Stream-processes files line by line — safe for large combo lists.
|
Stream-processes files line by line - safe for large combo lists.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
import rarfile
|
import rarfile
|
||||||
@@ -40,7 +40,7 @@ def compile_patterns(keywords: list[str]) -> list[re.Pattern]:
|
|||||||
def search_file(filepath: Path, patterns: list[re.Pattern]) -> list[str]:
|
def search_file(filepath: Path, patterns: list[re.Pattern]) -> list[str]:
|
||||||
"""
|
"""
|
||||||
Stream-reads a text file line by line and returns lines matching any pattern.
|
Stream-reads a text file line by line and returns lines matching any pattern.
|
||||||
Ignores encoding errors — combo files are often messy.
|
Ignores encoding errors - combo files are often messy.
|
||||||
"""
|
"""
|
||||||
hits: list[str] = []
|
hits: list[str] = []
|
||||||
try:
|
try:
|
||||||
@@ -82,7 +82,7 @@ def extract_zip(filepath: Path, dest: Path, extra_password: str | None = None) -
|
|||||||
except RuntimeError:
|
except RuntimeError:
|
||||||
log.info(f" ZIP is password-protected, trying common passwords...")
|
log.info(f" ZIP is password-protected, trying common passwords...")
|
||||||
if not _try_passwords(try_extract, ARCHIVE_PASSWORDS):
|
if not _try_passwords(try_extract, ARCHIVE_PASSWORDS):
|
||||||
log.warning(f" Could not unlock {filepath.name} — skipping.")
|
log.warning(f" Could not unlock {filepath.name} - skipping.")
|
||||||
return []
|
return []
|
||||||
|
|
||||||
extracted = [p for p in dest.rglob("*") if p.is_file()]
|
extracted = [p for p in dest.rglob("*") if p.is_file()]
|
||||||
@@ -95,7 +95,7 @@ def extract_zip(filepath: Path, dest: Path, extra_password: str | None = None) -
|
|||||||
|
|
||||||
def extract_7z(filepath: Path, dest: Path, extra_password: str | None = None) -> list[Path]:
|
def extract_7z(filepath: Path, dest: Path, extra_password: str | None = None) -> list[Path]:
|
||||||
if not HAS_7Z:
|
if not HAS_7Z:
|
||||||
log.warning("py7zr not installed — skipping .7z file.")
|
log.warning("py7zr not installed - skipping .7z file.")
|
||||||
return []
|
return []
|
||||||
extracted: list[Path] = []
|
extracted: list[Path] = []
|
||||||
passwords = ARCHIVE_PASSWORDS.copy()
|
passwords = ARCHIVE_PASSWORDS.copy()
|
||||||
@@ -119,7 +119,7 @@ def extract_7z(filepath: Path, dest: Path, extra_password: str | None = None) ->
|
|||||||
except Exception:
|
except Exception:
|
||||||
continue
|
continue
|
||||||
if not success:
|
if not success:
|
||||||
log.warning(f" Could not unlock {filepath.name} — skipping.")
|
log.warning(f" Could not unlock {filepath.name} - skipping.")
|
||||||
return []
|
return []
|
||||||
|
|
||||||
extracted = [p for p in dest.rglob("*") if p.is_file()]
|
extracted = [p for p in dest.rglob("*") if p.is_file()]
|
||||||
@@ -130,7 +130,7 @@ def extract_7z(filepath: Path, dest: Path, extra_password: str | None = None) ->
|
|||||||
|
|
||||||
def extract_rar(filepath: Path, dest: Path, extra_password: str | None = None) -> list[Path]:
|
def extract_rar(filepath: Path, dest: Path, extra_password: str | None = None) -> list[Path]:
|
||||||
if not HAS_RAR:
|
if not HAS_RAR:
|
||||||
log.warning("rarfile not installed — skipping .rar file.")
|
log.warning("rarfile not installed - skipping .rar file.")
|
||||||
return []
|
return []
|
||||||
|
|
||||||
passwords = ARCHIVE_PASSWORDS.copy()
|
passwords = ARCHIVE_PASSWORDS.copy()
|
||||||
@@ -150,7 +150,7 @@ def extract_rar(filepath: Path, dest: Path, extra_password: str | None = None) -
|
|||||||
except Exception:
|
except Exception:
|
||||||
log.info(f" RAR may be password-protected, trying common passwords...")
|
log.info(f" RAR may be password-protected, trying common passwords...")
|
||||||
if not _try_passwords(try_extract, ARCHIVE_PASSWORDS):
|
if not _try_passwords(try_extract, ARCHIVE_PASSWORDS):
|
||||||
log.warning(f" Could not unlock {filepath.name} — skipping.")
|
log.warning(f" Could not unlock {filepath.name} - skipping.")
|
||||||
return []
|
return []
|
||||||
|
|
||||||
extracted = [p for p in dest.rglob("*") if p.is_file()]
|
extracted = [p for p in dest.rglob("*") if p.is_file()]
|
||||||
@@ -184,7 +184,7 @@ def unpack(filepath: Path, extra_password: str | None = None) -> tuple[list[Path
|
|||||||
return files, extract_dir
|
return files, extract_dir
|
||||||
|
|
||||||
else:
|
else:
|
||||||
# Plain file — return as-is, no extract dir to clean up
|
# Plain file - return as-is, no extract dir to clean up
|
||||||
return [filepath], None
|
return [filepath], None
|
||||||
|
|
||||||
|
|
||||||
@@ -207,7 +207,7 @@ def process_file(filepath: Path, patterns, password: str | None = None) -> list[
|
|||||||
log.info(f" ✓ {len(hits)} hit(s) in {f.name}")
|
log.info(f" ✓ {len(hits)} hit(s) in {f.name}")
|
||||||
all_hits.extend(hits)
|
all_hits.extend(hits)
|
||||||
|
|
||||||
# Nested archives — recurse one level
|
# Nested archives - recurse one level
|
||||||
elif f.suffix.lower() in {".zip", ".7z", ".rar"} and f != filepath:
|
elif f.suffix.lower() in {".zip", ".7z", ".rar"} and f != filepath:
|
||||||
log.info(f" → Nested archive: {f.name}")
|
log.info(f" → Nested archive: {f.name}")
|
||||||
nested_hits = process_file(f, patterns)
|
nested_hits = process_file(f, patterns)
|
||||||
|
|||||||
@@ -11,7 +11,7 @@ from core.scraper import handle_message, backfill_all, register_handlers, warm_e
|
|||||||
### `handle_message(client, bot, msg, source_name, patterns, password=None)`
|
### `handle_message(client, bot, msg, source_name, patterns, password=None)`
|
||||||
**async.** Full pipeline for one document message:
|
**async.** Full pipeline for one document message:
|
||||||
1. Extract filename + size, check allowlist + size guard
|
1. Extract filename + size, check allowlist + size guard
|
||||||
2. Check `utils.cache` — skip if already seen
|
2. Check `utils.cache` - skip if already seen
|
||||||
3. Try `tdl` download → Telethon fallback
|
3. Try `tdl` download → Telethon fallback
|
||||||
4. `core.processor.process_file()` → hits
|
4. `core.processor.process_file()` → hits
|
||||||
5. `core.notifier.notify()` if hits found
|
5. `core.notifier.notify()` if hits found
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
"""
|
"""
|
||||||
scraper.py — Telethon user client.
|
scraper.py - Telethon user client.
|
||||||
|
|
||||||
Handles:
|
Handles:
|
||||||
- Listening for new file messages in watched channels
|
- Listening for new file messages in watched channels
|
||||||
@@ -99,7 +99,7 @@ async def _telethon_download(client: TelegramClient, msg, dest: Path, filename:
|
|||||||
"""Download a single file via Telethon. Returns True on success."""
|
"""Download a single file via Telethon. Returns True on success."""
|
||||||
_bid = batch_id or f"telethon_{int(time.monotonic_ns())}"
|
_bid = batch_id or f"telethon_{int(time.monotonic_ns())}"
|
||||||
if batch_id is None:
|
if batch_id is None:
|
||||||
# Standalone call (not already queued by tdl path) — post queued event
|
# Standalone call (not already queued by tdl path) - post queued event
|
||||||
bus.post(bus.EvDownloadQueued(
|
bus.post(bus.EvDownloadQueued(
|
||||||
batch_id=_bid, filename=filename,
|
batch_id=_bid, filename=filename,
|
||||||
size_mb=round(size / (1024 * 1024), 2),
|
size_mb=round(size / (1024 * 1024), 2),
|
||||||
@@ -165,12 +165,12 @@ async def handle_message(
|
|||||||
size = get_filesize(msg)
|
size = get_filesize(msg)
|
||||||
ok, reason = is_processable(filename, size)
|
ok, reason = is_processable(filename, size)
|
||||||
if not ok:
|
if not ok:
|
||||||
log.warning(f" handle_message: skipping '{filename}' — {reason}")
|
log.warning(f" handle_message: skipping '{filename}' - {reason}")
|
||||||
return
|
return
|
||||||
|
|
||||||
doc_id = msg.media.document.id
|
doc_id = msg.media.document.id
|
||||||
if is_seen(doc_id):
|
if is_seen(doc_id):
|
||||||
log.info(f" Skipping {filename} — already processed.")
|
log.info(f" Skipping {filename} - already processed.")
|
||||||
return
|
return
|
||||||
|
|
||||||
dest = _make_dest(msg, filename)
|
dest = _make_dest(msg, filename)
|
||||||
@@ -180,7 +180,7 @@ async def handle_message(
|
|||||||
downloaded = await download_single_with_tdl(msg, dest) if is_tdl_available() else False
|
downloaded = await download_single_with_tdl(msg, dest) if is_tdl_available() else False
|
||||||
if not downloaded:
|
if not downloaded:
|
||||||
if is_tdl_available():
|
if is_tdl_available():
|
||||||
log.warning(" [tdl] failed — falling back to Telethon")
|
log.warning(" [tdl] failed - falling back to Telethon")
|
||||||
downloaded = await _telethon_download(client, msg, dest, filename, size)
|
downloaded = await _telethon_download(client, msg, dest, filename, size)
|
||||||
|
|
||||||
if not downloaded:
|
if not downloaded:
|
||||||
@@ -307,7 +307,7 @@ async def backfill_channel(
|
|||||||
|
|
||||||
ok, reason = is_processable(filename, size)
|
ok, reason = is_processable(filename, size)
|
||||||
if not ok:
|
if not ok:
|
||||||
log.warning(f" [Backfill] Skipping '{filename}' — {reason}")
|
log.warning(f" [Backfill] Skipping '{filename}' - {reason}")
|
||||||
continue
|
continue
|
||||||
|
|
||||||
if is_seen(msg.media.document.id):
|
if is_seen(msg.media.document.id):
|
||||||
@@ -319,13 +319,13 @@ async def backfill_channel(
|
|||||||
if len(batch) >= TDL_AMOUNT:
|
if len(batch) >= TDL_AMOUNT:
|
||||||
await flush_batch()
|
await flush_batch()
|
||||||
else:
|
else:
|
||||||
# No tdl — fall straight through to single handle_message
|
# No tdl - fall straight through to single handle_message
|
||||||
await handle_message(client, bot, msg, source_name, patterns, password=password)
|
await handle_message(client, bot, msg, source_name, patterns, password=password)
|
||||||
total += 1
|
total += 1
|
||||||
await asyncio.sleep(0.5)
|
await asyncio.sleep(0.5)
|
||||||
|
|
||||||
elif msg.buttons and has_download_button(msg):
|
elif msg.buttons and has_download_button(msg):
|
||||||
# Bot-button messages can't be batched — handle individually
|
# Bot-button messages can't be batched - handle individually
|
||||||
await flush_batch() # flush any pending batch first
|
await flush_batch() # flush any pending batch first
|
||||||
await handle_bot_download_message(client, bot, msg, source_name, patterns, password=password)
|
await handle_bot_download_message(client, bot, msg, source_name, patterns, password=password)
|
||||||
total += 1
|
total += 1
|
||||||
@@ -339,7 +339,7 @@ async def backfill_channel(
|
|||||||
except Exception as e:
|
except Exception as e:
|
||||||
log.error(f"[Backfill] Error scanning {channel}: {e}")
|
log.error(f"[Backfill] Error scanning {channel}: {e}")
|
||||||
|
|
||||||
log.info(f"[Backfill] Done: {channel} — {total} file(s) processed")
|
log.info(f"[Backfill] Done: {channel} - {total} file(s) processed")
|
||||||
|
|
||||||
|
|
||||||
async def backfill_all(
|
async def backfill_all(
|
||||||
|
|||||||
@@ -22,7 +22,7 @@ Used by the live handler and `bot_downloader`.
|
|||||||
|
|
||||||
### `download_batch_with_tdl(entries: list[BatchEntry]) -> dict[int, bool]`
|
### `download_batch_with_tdl(entries: list[BatchEntry]) -> dict[int, bool]`
|
||||||
**async.** Downloads up to `TDL_AMOUNT` messages in a single `tdl dl` invocation.
|
**async.** Downloads up to `TDL_AMOUNT` messages in a single `tdl dl` invocation.
|
||||||
Returns `{doc_id: True|False}` — `False` means Telethon fallback needed.
|
Returns `{doc_id: True|False}` - `False` means Telethon fallback needed.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -55,7 +55,7 @@ In CLI mode: subprocess inherits the terminal, progress bars render natively.
|
|||||||
Each batch/single download gets a unique `data/tmp/_tdl_{monotonic_ns}/` staging dir.
|
Each batch/single download gets a unique `data/tmp/_tdl_{monotonic_ns}/` staging dir.
|
||||||
After `tdl` exits, files are matched by name (with fuzzy stem fallback for `filenamify()` mangling) and moved to final `dest`. Staging dir is removed regardless of outcome.
|
After `tdl` exits, files are matched by name (with fuzzy stem fallback for `filenamify()` mangling) and moved to final `dest`. Staging dir is removed regardless of outcome.
|
||||||
|
|
||||||
`--template '{{ filenamify .FileName }}'` — tdl uses the original Telegram filename, not its default `DialogID_MessageID_filename` format.
|
`--template '{{ filenamify .FileName }}'` - tdl uses the original Telegram filename, not its default `DialogID_MessageID_filename` format.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|||||||
@@ -1,10 +1,10 @@
|
|||||||
"""
|
"""
|
||||||
tdl_downloader.py — Fast file downloads via tdl (Go MTProto implementation).
|
tdl_downloader.py - Fast file downloads via tdl (Go MTProto implementation).
|
||||||
|
|
||||||
Install: https://github.com/iyear/tdl
|
Install: https://github.com/iyear/tdl
|
||||||
curl -sSL https://raw.githubusercontent.com/iyear/tdl/main/scripts/install.sh | bash
|
curl -sSL https://raw.githubusercontent.com/iyear/tdl/main/scripts/install.sh | bash
|
||||||
|
|
||||||
First-time setup — log in once:
|
First-time setup - log in once:
|
||||||
tdl login # saves to namespace "default"
|
tdl login # saves to namespace "default"
|
||||||
tdl login -n myns # saves to a named namespace
|
tdl login -n myns # saves to a named namespace
|
||||||
|
|
||||||
@@ -77,7 +77,7 @@ def _build_cmd(urls: list[str], staging_dir: Path) -> list[str]:
|
|||||||
(no DialogID_MessageID_ prefix).
|
(no DialogID_MessageID_ prefix).
|
||||||
|
|
||||||
--continue is kept so interrupted downloads resume rather than restart.
|
--continue is kept so interrupted downloads resume rather than restart.
|
||||||
--skip-same is intentionally omitted — deduplication is handled upstream
|
--skip-same is intentionally omitted - deduplication is handled upstream
|
||||||
by is_seen(), and --skip-same can cause the .tmp rename to fail when a
|
by is_seen(), and --skip-same can cause the .tmp rename to fail when a
|
||||||
same-named file already exists in the directory.
|
same-named file already exists in the directory.
|
||||||
"""
|
"""
|
||||||
@@ -103,7 +103,7 @@ def _build_cmd(urls: list[str], staging_dir: Path) -> list[str]:
|
|||||||
|
|
||||||
# ─── Runner ───────────────────────────────────────────────────────────────────
|
# ─── Runner ───────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
# ANSI escape stripper — tdl emits colour codes even when not a TTY
|
# ANSI escape stripper - tdl emits colour codes even when not a TTY
|
||||||
import re as _re
|
import re as _re
|
||||||
_ANSI_RE = _re.compile(r"\x1b\[[0-9;]*[mGKHFJA-Z]|\x1b=|\x1b>|\x1b\[\?[0-9]+[hl]")
|
_ANSI_RE = _re.compile(r"\x1b\[[0-9;]*[mGKHFJA-Z]|\x1b=|\x1b>|\x1b\[\?[0-9]+[hl]")
|
||||||
|
|
||||||
@@ -141,7 +141,7 @@ async def _run_tdl(cmd: list[str], label: str) -> bool:
|
|||||||
buf += chunk.decode(errors="replace")
|
buf += chunk.decode(errors="replace")
|
||||||
# Split on both \r and \n; process all complete segments
|
# Split on both \r and \n; process all complete segments
|
||||||
parts = _re.split(r"[\r\n]", buf)
|
parts = _re.split(r"[\r\n]", buf)
|
||||||
# Last element may be an incomplete segment — keep in buffer
|
# Last element may be an incomplete segment - keep in buffer
|
||||||
buf = parts[-1]
|
buf = parts[-1]
|
||||||
for part in parts[:-1]:
|
for part in parts[:-1]:
|
||||||
clean = _strip_ansi(part).strip()
|
clean = _strip_ansi(part).strip()
|
||||||
@@ -163,7 +163,7 @@ async def _run_tdl(cmd: list[str], label: str) -> bool:
|
|||||||
log.info(f"[tdl] ✓ {label}")
|
log.info(f"[tdl] ✓ {label}")
|
||||||
return True
|
return True
|
||||||
else:
|
else:
|
||||||
log.error(f"[tdl] ✗ exit {proc.returncode} — {label}")
|
log.error(f"[tdl] ✗ exit {proc.returncode} - {label}")
|
||||||
return False
|
return False
|
||||||
except FileNotFoundError:
|
except FileNotFoundError:
|
||||||
log.error("[tdl] binary not found at runtime")
|
log.error("[tdl] binary not found at runtime")
|
||||||
@@ -260,7 +260,7 @@ async def download_batch_with_tdl(entries: list[BatchEntry]) -> dict[int, bool]:
|
|||||||
return {}
|
return {}
|
||||||
|
|
||||||
if not is_tdl_available():
|
if not is_tdl_available():
|
||||||
log.warning("[tdl] not available — all entries need Telethon fallback")
|
log.warning("[tdl] not available - all entries need Telethon fallback")
|
||||||
return {e.doc_id: False for e in entries}
|
return {e.doc_id: False for e in entries}
|
||||||
|
|
||||||
urls: list[str] = []
|
urls: list[str] = []
|
||||||
@@ -327,7 +327,7 @@ async def download_single_with_tdl(msg, dest: Path) -> bool:
|
|||||||
bot_downloader where batching doesn't apply.
|
bot_downloader where batching doesn't apply.
|
||||||
"""
|
"""
|
||||||
if not is_tdl_available():
|
if not is_tdl_available():
|
||||||
log.warning("[tdl] not available — falling back to Telethon")
|
log.warning("[tdl] not available - falling back to Telethon")
|
||||||
return False
|
return False
|
||||||
|
|
||||||
try:
|
try:
|
||||||
|
|||||||
6
main.py
6
main.py
@@ -1,5 +1,5 @@
|
|||||||
"""
|
"""
|
||||||
main.py — Entry point for the ULP credential monitor.
|
main.py - Entry point for the ULP credential monitor.
|
||||||
|
|
||||||
Usage:
|
Usage:
|
||||||
python main.py # TUI mode (default)
|
python main.py # TUI mode (default)
|
||||||
@@ -55,7 +55,7 @@ def _start_web_thread(host: str, port: int) -> threading.Thread:
|
|||||||
# ─── Plain CLI mode ───────────────────────────────────────────────────────────
|
# ─── Plain CLI mode ───────────────────────────────────────────────────────────
|
||||||
|
|
||||||
async def _cli_main():
|
async def _cli_main():
|
||||||
"""Original asyncio main — runs without the TUI."""
|
"""Original asyncio main - runs without the TUI."""
|
||||||
logging.getLogger().addHandler(logging.StreamHandler(sys.stdout))
|
logging.getLogger().addHandler(logging.StreamHandler(sys.stdout))
|
||||||
|
|
||||||
from telethon import TelegramClient
|
from telethon import TelegramClient
|
||||||
@@ -64,7 +64,7 @@ async def _cli_main():
|
|||||||
from core.scraper import backfill_all, register_handlers, warm_entity_cache
|
from core.scraper import backfill_all, register_handlers, warm_entity_cache
|
||||||
|
|
||||||
log.info("=" * 60)
|
log.info("=" * 60)
|
||||||
log.info(" ULP Credential Monitor — CLI mode")
|
log.info(" ULP Credential Monitor - CLI mode")
|
||||||
log.info("=" * 60)
|
log.info("=" * 60)
|
||||||
|
|
||||||
patterns = compile_patterns(config.TARGET_KEYWORDS)
|
patterns = compile_patterns(config.TARGET_KEYWORDS)
|
||||||
|
|||||||
46
pyproject.toml
Normal file
46
pyproject.toml
Normal file
@@ -0,0 +1,46 @@
|
|||||||
|
[build-system]
|
||||||
|
requires = ["setuptools>=68"]
|
||||||
|
build-backend = "setuptools.backends.legacy:build"
|
||||||
|
|
||||||
|
[project]
|
||||||
|
name = "stealergram"
|
||||||
|
version = "0.1.0"
|
||||||
|
description = "Telegram channel monitor - downloads, extracts, scores, and alerts on credential leaks"
|
||||||
|
requires-python = ">=3.11"
|
||||||
|
dependencies = [
|
||||||
|
# Telegram
|
||||||
|
"telethon",
|
||||||
|
"tgcrypto",
|
||||||
|
# TUI
|
||||||
|
"textual",
|
||||||
|
# Config
|
||||||
|
"python-dotenv",
|
||||||
|
# Progress bars (CLI mode)
|
||||||
|
"tqdm",
|
||||||
|
# Archive extraction
|
||||||
|
"py7zr",
|
||||||
|
"rarfile",
|
||||||
|
]
|
||||||
|
|
||||||
|
[project.optional-dependencies]
|
||||||
|
web = [
|
||||||
|
"fastapi",
|
||||||
|
"uvicorn[standard]",
|
||||||
|
"jinja2",
|
||||||
|
"python-multipart",
|
||||||
|
"bcrypt",
|
||||||
|
"python-jose[cryptography]",
|
||||||
|
]
|
||||||
|
dev = [
|
||||||
|
"pytest",
|
||||||
|
]
|
||||||
|
|
||||||
|
[project.scripts]
|
||||||
|
stealergram = "main:main"
|
||||||
|
|
||||||
|
[tool.setuptools.packages.find]
|
||||||
|
where = ["."]
|
||||||
|
exclude = ["tests*", "data*", "logs*", "tmp*"]
|
||||||
|
|
||||||
|
[tool.pytest.ini_options]
|
||||||
|
testpaths = ["tests"]
|
||||||
@@ -15,7 +15,7 @@ tqdm
|
|||||||
py7zr
|
py7zr
|
||||||
rarfile
|
rarfile
|
||||||
|
|
||||||
# Web frontend (optional — only needed with --web)
|
# Web frontend (optional - only needed with --web)
|
||||||
fastapi
|
fastapi
|
||||||
uvicorn[standard]
|
uvicorn[standard]
|
||||||
jinja2
|
jinja2
|
||||||
|
|||||||
@@ -7,7 +7,7 @@ os.environ.setdefault("API_HASH", "dummy_hash_for_tests")
|
|||||||
os.environ.setdefault("BOT_TOKEN", "0:dummy_bot_token")
|
os.environ.setdefault("BOT_TOKEN", "0:dummy_bot_token")
|
||||||
os.environ.setdefault("NOTIFY_CHAT_ID", "99999")
|
os.environ.setdefault("NOTIFY_CHAT_ID", "99999")
|
||||||
|
|
||||||
# Web frontend test defaults — set once here so all web test files see the same values.
|
# Web frontend test defaults - set once here so all web test files see the same values.
|
||||||
os.environ.setdefault("WEB_SECRET_KEY", "test-secret-key-for-pytest")
|
os.environ.setdefault("WEB_SECRET_KEY", "test-secret-key-for-pytest")
|
||||||
os.environ.setdefault("WEB_ADMIN_USER", "superadmin")
|
os.environ.setdefault("WEB_ADMIN_USER", "superadmin")
|
||||||
os.environ.setdefault("WEB_ADMIN_PASS", "superpass")
|
os.environ.setdefault("WEB_ADMIN_PASS", "superpass")
|
||||||
@@ -17,8 +17,8 @@ import config
|
|||||||
import utils.scorer as scorer
|
import utils.scorer as scorer
|
||||||
|
|
||||||
# Two test keywords:
|
# Two test keywords:
|
||||||
# @testcorp\.com — employee email domain (triggers CRITICAL)
|
# @testcorp\.com - employee email domain (triggers CRITICAL)
|
||||||
# testcorp\.com — plain domain match (triggers LOW baseline)
|
# testcorp\.com - plain domain match (triggers LOW baseline)
|
||||||
TEST_KEYWORDS = [r"@testcorp\.com", r"testcorp\.com"]
|
TEST_KEYWORDS = [r"@testcorp\.com", r"testcorp\.com"]
|
||||||
|
|
||||||
|
|
||||||
@@ -29,7 +29,7 @@ def patched_keywords(monkeypatch):
|
|||||||
scorer's module-level globals so scoring logic uses known test patterns.
|
scorer's module-level globals so scoring logic uses known test patterns.
|
||||||
|
|
||||||
scorer.py now reads _config.TARGET_KEYWORDS at call time via `import config as _config`,
|
scorer.py now reads _config.TARGET_KEYWORDS at call time via `import config as _config`,
|
||||||
so patching config.TARGET_KEYWORDS is sufficient — no direct scorer patch needed.
|
so patching config.TARGET_KEYWORDS is sufficient - no direct scorer patch needed.
|
||||||
"""
|
"""
|
||||||
monkeypatch.setattr(config, "TARGET_KEYWORDS", TEST_KEYWORDS)
|
monkeypatch.setattr(config, "TARGET_KEYWORDS", TEST_KEYWORDS)
|
||||||
monkeypatch.setattr(scorer, "EMPLOYEE_DOMAINS", scorer._build_employee_domains())
|
monkeypatch.setattr(scorer, "EMPLOYEE_DOMAINS", scorer._build_employee_domains())
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
"""
|
"""
|
||||||
Tests for utils/cache.py — file-ID deduplication cache.
|
Tests for utils/cache.py - file-ID deduplication cache.
|
||||||
|
|
||||||
Each test gets an isolated cache file via the `isolated_cache` fixture
|
Each test gets an isolated cache file via the `isolated_cache` fixture
|
||||||
so tests never touch data/cache.json.
|
so tests never touch data/cache.json.
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
"""
|
"""
|
||||||
Tests for utils/database.py — SQLite persistence layer.
|
Tests for utils/database.py - SQLite persistence layer.
|
||||||
|
|
||||||
Each test gets an isolated in-memory-equivalent DB via the `isolated_db`
|
Each test gets an isolated in-memory-equivalent DB via the `isolated_db`
|
||||||
fixture so tests never touch data/hits.db.
|
fixture so tests never touch data/hits.db.
|
||||||
@@ -112,7 +112,7 @@ def test_by_severity_returns_correct_severity():
|
|||||||
|
|
||||||
|
|
||||||
def test_by_severity_excludes_duplicates():
|
def test_by_severity_excludes_duplicates():
|
||||||
"""seen_before=1 rows must be invisible to by_severity — they are stored for stats only."""
|
"""seen_before=1 rows must be invisible to by_severity - they are stored for stats only."""
|
||||||
hit = make_hit(severity=HIGH, url="intranet.testcorp.com")
|
hit = make_hit(severity=HIGH, url="intranet.testcorp.com")
|
||||||
db_module.insert_hits([hit], source="c", filename="f.txt", seen_before=True)
|
db_module.insert_hits([hit], source="c", filename="f.txt", seen_before=True)
|
||||||
assert db_module.by_severity(HIGH) == []
|
assert db_module.by_severity(HIGH) == []
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
"""
|
"""
|
||||||
Tests for tui/events.py — subscribe/unsubscribe broadcast, signal_channel_changed.
|
Tests for tui/events.py - subscribe/unsubscribe broadcast, signal_channel_changed.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
import queue
|
import queue
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
"""
|
"""
|
||||||
Tests for core/processor.py — archive extraction and line-by-line search.
|
Tests for core/processor.py - archive extraction and line-by-line search.
|
||||||
|
|
||||||
No Telegram deps, no async. Tests create real archive fixtures in tmp_path
|
No Telegram deps, no async. Tests create real archive fixtures in tmp_path
|
||||||
so process_file's cleanup guarantee can be verified against actual disk state.
|
so process_file's cleanup guarantee can be verified against actual disk state.
|
||||||
@@ -60,7 +60,7 @@ class TestSearchFile:
|
|||||||
assert search_file(f, patterns) == ["testcorp.com|user|pass"]
|
assert search_file(f, patterns) == ["testcorp.com|user|pass"]
|
||||||
|
|
||||||
def test_handles_encoding_errors_gracefully(self, tmp_path, patterns):
|
def test_handles_encoding_errors_gracefully(self, tmp_path, patterns):
|
||||||
"""Combo files are often messy — invalid bytes must not crash the search."""
|
"""Combo files are often messy - invalid bytes must not crash the search."""
|
||||||
f = tmp_path / "combo.txt"
|
f = tmp_path / "combo.txt"
|
||||||
f.write_bytes(
|
f.write_bytes(
|
||||||
b"testcorp.com|user1|pass\n"
|
b"testcorp.com|user1|pass\n"
|
||||||
@@ -81,7 +81,7 @@ class TestSearchFile:
|
|||||||
assert len(hits) == 2
|
assert len(hits) == 2
|
||||||
|
|
||||||
|
|
||||||
# ─── process_file — plain .txt ────────────────────────────────────────────────
|
# ─── process_file - plain .txt ────────────────────────────────────────────────
|
||||||
|
|
||||||
class TestProcessFilePlainText:
|
class TestProcessFilePlainText:
|
||||||
def test_returns_hits(self, tmp_path, patterns):
|
def test_returns_hits(self, tmp_path, patterns):
|
||||||
@@ -104,7 +104,7 @@ class TestProcessFilePlainText:
|
|||||||
assert not f.exists()
|
assert not f.exists()
|
||||||
|
|
||||||
|
|
||||||
# ─── process_file — .zip extraction ──────────────────────────────────────────
|
# ─── process_file - .zip extraction ──────────────────────────────────────────
|
||||||
|
|
||||||
class TestProcessFileZip:
|
class TestProcessFileZip:
|
||||||
def _make_zip(self, tmp_path: Path, content: str, filename="content.txt") -> Path:
|
def _make_zip(self, tmp_path: Path, content: str, filename="content.txt") -> Path:
|
||||||
@@ -155,7 +155,7 @@ class TestProcessFileZip:
|
|||||||
assert len(hits) == 2
|
assert len(hits) == 2
|
||||||
|
|
||||||
|
|
||||||
# ─── process_file — nested archives ──────────────────────────────────────────
|
# ─── process_file - nested archives ──────────────────────────────────────────
|
||||||
|
|
||||||
class TestProcessFileNested:
|
class TestProcessFileNested:
|
||||||
def test_nested_zip_is_recursed(self, tmp_path, patterns):
|
def test_nested_zip_is_recursed(self, tmp_path, patterns):
|
||||||
@@ -177,7 +177,7 @@ class TestProcessFileNested:
|
|||||||
assert not (tmp_path / "outer").exists()
|
assert not (tmp_path / "outer").exists()
|
||||||
|
|
||||||
|
|
||||||
# ─── process_file — password-protected .7z ───────────────────────────────────
|
# ─── process_file - password-protected .7z ───────────────────────────────────
|
||||||
|
|
||||||
class TestProcessFile7zPassword:
|
class TestProcessFile7zPassword:
|
||||||
def test_unlocks_with_correct_password(self, tmp_path, patterns, monkeypatch):
|
def test_unlocks_with_correct_password(self, tmp_path, patterns, monkeypatch):
|
||||||
@@ -218,6 +218,6 @@ class TestProcessFile7zPassword:
|
|||||||
z.write(txt, "content.txt")
|
z.write(txt, "content.txt")
|
||||||
txt.unlink()
|
txt.unlink()
|
||||||
|
|
||||||
# No hits — archive could not be opened
|
# No hits - archive could not be opened
|
||||||
hits = process_file(szf, patterns)
|
hits = process_file(szf, patterns)
|
||||||
assert hits == []
|
assert hits == []
|
||||||
|
|||||||
@@ -1,10 +1,10 @@
|
|||||||
"""
|
"""
|
||||||
Tests for utils/scorer.py — severity scoring and ULP line parsing.
|
Tests for utils/scorer.py - severity scoring and ULP line parsing.
|
||||||
|
|
||||||
All tests use the `patched_keywords` fixture (see conftest.py) which
|
All tests use the `patched_keywords` fixture (see conftest.py) which
|
||||||
replaces TARGET_KEYWORDS with two entries:
|
replaces TARGET_KEYWORDS with two entries:
|
||||||
@testcorp.com — employee email domain (CRITICAL trigger)
|
@testcorp.com - employee email domain (CRITICAL trigger)
|
||||||
testcorp.com — plain domain match (LOW baseline)
|
testcorp.com - plain domain match (LOW baseline)
|
||||||
"""
|
"""
|
||||||
|
|
||||||
import pytest
|
import pytest
|
||||||
@@ -50,7 +50,7 @@ class TestULPParsingRealWorld:
|
|||||||
|
|
||||||
@pytest.mark.parametrize("line,exp_url,exp_user,exp_pass", [
|
@pytest.mark.parametrize("line,exp_url,exp_user,exp_pass", [
|
||||||
# ── Protocol + port + path, colon separator ──────────────────────────
|
# ── Protocol + port + path, colon separator ──────────────────────────
|
||||||
# Port is digits followed by '/' — must be consumed as part of the URL.
|
# Port is digits followed by '/' - must be consumed as part of the URL.
|
||||||
(
|
(
|
||||||
"http://portal.fakehosp.example.com:88/:55512309-1:hunter2",
|
"http://portal.fakehosp.example.com:88/:55512309-1:hunter2",
|
||||||
"http://portal.fakehosp.example.com:88/", "55512309-1", "hunter2",
|
"http://portal.fakehosp.example.com:88/", "55512309-1", "hunter2",
|
||||||
@@ -91,7 +91,7 @@ class TestULPParsingRealWorld:
|
|||||||
"jdoe@fakehosp.example.com", "Passw0rd!",
|
"jdoe@fakehosp.example.com", "Passw0rd!",
|
||||||
),
|
),
|
||||||
|
|
||||||
# ── Pipe separator (unambiguous — port stays in URL) ──────────────────
|
# ── Pipe separator (unambiguous - port stays in URL) ──────────────────
|
||||||
(
|
(
|
||||||
"http://portal.fakehosp.example.com:88/|22.987.654-3|florida88",
|
"http://portal.fakehosp.example.com:88/|22.987.654-3|florida88",
|
||||||
"http://portal.fakehosp.example.com:88/", "22.987.654-3", "florida88",
|
"http://portal.fakehosp.example.com:88/", "22.987.654-3", "florida88",
|
||||||
@@ -113,7 +113,7 @@ class TestULPParsingRealWorld:
|
|||||||
"portal.fakehosp.example.com:88/", "22.987.654-3", "florida88",
|
"portal.fakehosp.example.com:88/", "22.987.654-3", "florida88",
|
||||||
),
|
),
|
||||||
|
|
||||||
# ── No protocol, no port — plain colon separators ────────────────────
|
# ── No protocol, no port - plain colon separators ────────────────────
|
||||||
(
|
(
|
||||||
"booking.fakehosp.example.com:66778899-7:correcthorse",
|
"booking.fakehosp.example.com:66778899-7:correcthorse",
|
||||||
"booking.fakehosp.example.com", "66778899-7", "correcthorse",
|
"booking.fakehosp.example.com", "66778899-7", "correcthorse",
|
||||||
@@ -234,7 +234,7 @@ class TestWeakPasswordFlags:
|
|||||||
assert any("Common password" in r for r in hit.reasons)
|
assert any("Common password" in r for r in hit.reasons)
|
||||||
|
|
||||||
def test_weak_password_does_not_escalate_severity(self, patched_keywords):
|
def test_weak_password_does_not_escalate_severity(self, patched_keywords):
|
||||||
"""Weak password flags are informational — they must not change severity."""
|
"""Weak password flags are informational - they must not change severity."""
|
||||||
hit = score_hit("testcorp.com|user|abc")
|
hit = score_hit("testcorp.com|user|abc")
|
||||||
assert hit.severity == LOW
|
assert hit.severity == LOW
|
||||||
|
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
"""
|
"""
|
||||||
Tests for web/auth.py — JWT token lifecycle, bcrypt helpers.
|
Tests for web/auth.py - JWT token lifecycle, bcrypt helpers.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
import pytest
|
import pytest
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
"""
|
"""
|
||||||
Tests for web/db.py — user store and refresh token management.
|
Tests for web/db.py - user store and refresh token management.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
import pytest
|
import pytest
|
||||||
|
|||||||
@@ -1 +1 @@
|
|||||||
"""tui — Textual TUI frontend and event bus."""
|
"""tui - Textual TUI frontend and event bus."""
|
||||||
|
|||||||
@@ -34,8 +34,8 @@ MonitorApp (App)
|
|||||||
|
|
||||||
### Threading model
|
### Threading model
|
||||||
- **Bot backend** → `threading.Thread(daemon=True)` with its own `asyncio.new_event_loop()`
|
- **Bot backend** → `threading.Thread(daemon=True)` with its own `asyncio.new_event_loop()`
|
||||||
Runs `_bot_main()` — Telethon is completely isolated from Textual's loop.
|
Runs `_bot_main()` - Telethon is completely isolated from Textual's loop.
|
||||||
- **TUI drain** → `set_interval(0.1, _drain_bus)` — polls `queue.Queue` every 100ms on Textual's loop.
|
- **TUI drain** → `set_interval(0.1, _drain_bus)` - polls `queue.Queue` every 100ms on Textual's loop.
|
||||||
|
|
||||||
### Key methods
|
### Key methods
|
||||||
|
|
||||||
@@ -105,7 +105,7 @@ Changes apply immediately (handler re-registered). Not persisted to `config.py`
|
|||||||
- Validates regex before adding
|
- Validates regex before adding
|
||||||
- On change: rebuilds `utils.scorer.EMPLOYEE_DOMAINS` and `ORG_DOMAINS`
|
- On change: rebuilds `utils.scorer.EMPLOYEE_DOMAINS` and `ORG_DOMAINS`
|
||||||
- Bot handler recompiles patterns on the next incoming message automatically
|
- Bot handler recompiles patterns on the next incoming message automatically
|
||||||
- **Changes are in-memory only** — copy to `config.py` to persist
|
- **Changes are in-memory only** - copy to `config.py` to persist
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|||||||
64
tui/app.py
64
tui/app.py
@@ -1,5 +1,5 @@
|
|||||||
"""
|
"""
|
||||||
tui.py — Textual TUI for the ULP credential monitor.
|
tui.py - Textual TUI for the ULP credential monitor.
|
||||||
|
|
||||||
Layout (main screen):
|
Layout (main screen):
|
||||||
┌──────────────────────────────────┬──────────────────────────────────┐
|
┌──────────────────────────────────┬──────────────────────────────────┐
|
||||||
@@ -14,13 +14,13 @@ Layout (main screen):
|
|||||||
└─────────────────────────────────────────────────────────────────────┘
|
└─────────────────────────────────────────────────────────────────────┘
|
||||||
|
|
||||||
Additional screens (push/pop via keybindings):
|
Additional screens (push/pop via keybindings):
|
||||||
• SearchScreen — full-text search across hits DB [s]
|
• SearchScreen - full-text search across hits DB [s]
|
||||||
• HitsDBScreen — paginated recent / severity viewer [h]
|
• HitsDBScreen - paginated recent / severity viewer [h]
|
||||||
• KeywordsScreen — live-edit TARGET_KEYWORDS regex list [k]
|
• KeywordsScreen - live-edit TARGET_KEYWORDS regex list [k]
|
||||||
|
|
||||||
Architecture:
|
Architecture:
|
||||||
- The entire bot backend runs as a Textual Worker (asyncio task inside the
|
- The entire bot backend runs as a Textual Worker (asyncio task inside the
|
||||||
TUI event loop — no threading needed).
|
TUI event loop - no threading needed).
|
||||||
- A second Worker runs _bus_consumer(), reading events from tui_events.queue
|
- A second Worker runs _bus_consumer(), reading events from tui_events.queue
|
||||||
and dispatching to the right panel.
|
and dispatching to the right panel.
|
||||||
- Channel add/remove from the UI immediately re-registers Telethon handlers
|
- Channel add/remove from the UI immediately re-registers Telethon handlers
|
||||||
@@ -29,7 +29,7 @@ Architecture:
|
|||||||
into the download panel's RichLog.
|
into the download panel's RichLog.
|
||||||
- StatsPanel polls database.stats() every 10 s via set_interval().
|
- StatsPanel polls database.stats() every 10 s via set_interval().
|
||||||
- Keyword changes are applied in-memory immediately (scorer caches rebuilt);
|
- Keyword changes are applied in-memory immediately (scorer caches rebuilt);
|
||||||
NOT auto-persisted to config.py — a notice banner reminds the user.
|
NOT auto-persisted to config.py - a notice banner reminds the user.
|
||||||
- Live patterns are recompiled from config.TARGET_KEYWORDS on every message
|
- Live patterns are recompiled from config.TARGET_KEYWORDS on every message
|
||||||
so keyword changes take effect without a handler restart.
|
so keyword changes take effect without a handler restart.
|
||||||
"""
|
"""
|
||||||
@@ -88,7 +88,7 @@ def _now() -> str:
|
|||||||
|
|
||||||
class DownloadPanel(Vertical):
|
class DownloadPanel(Vertical):
|
||||||
"""
|
"""
|
||||||
Left panel — two sub-logs stacked vertically:
|
Left panel - two sub-logs stacked vertically:
|
||||||
• top: tdl raw output (stripped ANSI), scrolling
|
• top: tdl raw output (stripped ANSI), scrolling
|
||||||
• bottom: our own structured status entries
|
• bottom: our own structured status entries
|
||||||
"""
|
"""
|
||||||
@@ -158,7 +158,7 @@ class DownloadPanel(Vertical):
|
|||||||
# ─── Hits panel ───────────────────────────────────────────────────────────────
|
# ─── Hits panel ───────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
class HitsPanel(Vertical):
|
class HitsPanel(Vertical):
|
||||||
"""Right panel — scrollable color-coded hit log with live counter badge."""
|
"""Right panel - scrollable color-coded hit log with live counter badge."""
|
||||||
|
|
||||||
hit_count: reactive[int] = reactive(0)
|
hit_count: reactive[int] = reactive(0)
|
||||||
|
|
||||||
@@ -208,7 +208,7 @@ class HitsPanel(Vertical):
|
|||||||
|
|
||||||
class StatsPanel(Horizontal):
|
class StatsPanel(Horizontal):
|
||||||
"""
|
"""
|
||||||
Slim bar — shows live DB stats, refreshed every 10 s.
|
Slim bar - shows live DB stats, refreshed every 10 s.
|
||||||
Also refreshed immediately whenever a new hit arrives.
|
Also refreshed immediately whenever a new hit arrives.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
@@ -233,14 +233,14 @@ class StatsPanel(Horizontal):
|
|||||||
|
|
||||||
def compose(self) -> ComposeResult:
|
def compose(self) -> ComposeResult:
|
||||||
yield Static("📊 DB Stats", id="stat-label")
|
yield Static("📊 DB Stats", id="stat-label")
|
||||||
yield Static("🔴 —", classes="stat-critical", id="stat-critical")
|
yield Static("🔴 - ", classes="stat-critical", id="stat-critical")
|
||||||
yield Static("🟠 —", classes="stat-high", id="stat-high")
|
yield Static("🟠 - ", classes="stat-high", id="stat-high")
|
||||||
yield Static("🟡 —", classes="stat-medium", id="stat-medium")
|
yield Static("🟡 - ", classes="stat-medium", id="stat-medium")
|
||||||
yield Static("🟢 —", classes="stat-low", id="stat-low")
|
yield Static("🟢 - ", classes="stat-low", id="stat-low")
|
||||||
yield Static("total: —", id="stat-total")
|
yield Static("total: - ", id="stat-total")
|
||||||
yield Static("unique: —", id="stat-unique")
|
yield Static("unique: - ", id="stat-unique")
|
||||||
yield Static("dupes: —", id="stat-dupes")
|
yield Static("dupes: - ", id="stat-dupes")
|
||||||
yield Static("sources: —", id="stat-sources")
|
yield Static("sources: - ", id="stat-sources")
|
||||||
|
|
||||||
def on_mount(self) -> None:
|
def on_mount(self) -> None:
|
||||||
self.set_interval(10, self.refresh_stats)
|
self.set_interval(10, self.refresh_stats)
|
||||||
@@ -266,7 +266,7 @@ class StatsPanel(Horizontal):
|
|||||||
|
|
||||||
class ChannelPanel(Vertical):
|
class ChannelPanel(Vertical):
|
||||||
"""
|
"""
|
||||||
Bottom panel — live-editable channel list.
|
Bottom panel - live-editable channel list.
|
||||||
|
|
||||||
Changes are applied immediately (Telethon handlers are re-registered).
|
Changes are applied immediately (Telethon handlers are re-registered).
|
||||||
To make them permanent, edit config.py's WATCHED_CHANNELS manually.
|
To make them permanent, edit config.py's WATCHED_CHANNELS manually.
|
||||||
@@ -314,7 +314,7 @@ class ChannelPanel(Vertical):
|
|||||||
|
|
||||||
def compose(self) -> ComposeResult:
|
def compose(self) -> ComposeResult:
|
||||||
yield Label(
|
yield Label(
|
||||||
"📡 Channels — changes apply immediately | edit config.py to persist",
|
"📡 Channels - changes apply immediately | edit config.py to persist",
|
||||||
classes="panel-title",
|
classes="panel-title",
|
||||||
)
|
)
|
||||||
with Horizontal(classes="controls"):
|
with Horizontal(classes="controls"):
|
||||||
@@ -524,7 +524,7 @@ class HitsDBScreen(Screen):
|
|||||||
status,
|
status,
|
||||||
)
|
)
|
||||||
self.query_one("#db-status", Label).update(
|
self.query_one("#db-status", Label).update(
|
||||||
f" {len(rows)} row(s) — {label}"
|
f" {len(rows)} row(s) - {label}"
|
||||||
)
|
)
|
||||||
|
|
||||||
def _load_recent(self) -> None:
|
def _load_recent(self) -> None:
|
||||||
@@ -560,7 +560,7 @@ class KeywordsScreen(Screen):
|
|||||||
• scorer's domain caches are rebuilt
|
• scorer's domain caches are rebuilt
|
||||||
• The bot handler recompiles patterns on the next message automatically
|
• The bot handler recompiles patterns on the next message automatically
|
||||||
|
|
||||||
Changes are NOT written back to config.py — a notice banner says so.
|
Changes are NOT written back to config.py - a notice banner says so.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
BINDINGS = [Binding("escape", "dismiss", "Back")]
|
BINDINGS = [Binding("escape", "dismiss", "Back")]
|
||||||
@@ -601,7 +601,7 @@ class KeywordsScreen(Screen):
|
|||||||
yield Header()
|
yield Header()
|
||||||
yield Label("🔑 Keyword / Pattern Editor", classes="screen-title")
|
yield Label("🔑 Keyword / Pattern Editor", classes="screen-title")
|
||||||
yield Label(
|
yield Label(
|
||||||
"⚠ Changes are in-memory only — copy patterns to config.py to persist across restarts.",
|
"⚠ Changes are in-memory only - copy patterns to config.py to persist across restarts.",
|
||||||
classes="notice",
|
classes="notice",
|
||||||
)
|
)
|
||||||
with Horizontal(id="kw-controls"):
|
with Horizontal(id="kw-controls"):
|
||||||
@@ -671,7 +671,7 @@ class KeywordsScreen(Screen):
|
|||||||
except Exception as e:
|
except Exception as e:
|
||||||
log.warning(f"Could not rebuild scorer caches: {e}")
|
log.warning(f"Could not rebuild scorer caches: {e}")
|
||||||
bus.post(bus.EvStatus(
|
bus.post(bus.EvStatus(
|
||||||
f"Keywords updated — {len(config.TARGET_KEYWORDS)} pattern(s) active"
|
f"Keywords updated - {len(config.TARGET_KEYWORDS)} pattern(s) active"
|
||||||
))
|
))
|
||||||
|
|
||||||
def action_dismiss(self) -> None:
|
def action_dismiss(self) -> None:
|
||||||
@@ -721,7 +721,7 @@ class MonitorApp(App):
|
|||||||
# The bot backend runs in its own thread with its own asyncio event
|
# The bot backend runs in its own thread with its own asyncio event
|
||||||
# loop, completely isolated from Textual. Telethon spawns background
|
# loop, completely isolated from Textual. Telethon spawns background
|
||||||
# tasks via asyncio.ensure_future() and calls connect() which returns
|
# tasks via asyncio.ensure_future() and calls connect() which returns
|
||||||
# only after its receiver loop is scheduled — both of these deadlock
|
# only after its receiver loop is scheduled - both of these deadlock
|
||||||
# inside Textual's managed loop. Running in a dedicated thread
|
# inside Textual's managed loop. Running in a dedicated thread
|
||||||
# sidesteps all of that.
|
# sidesteps all of that.
|
||||||
#
|
#
|
||||||
@@ -767,7 +767,7 @@ class MonitorApp(App):
|
|||||||
"""
|
"""
|
||||||
Called every 100 ms by set_interval(). Drains all pending events
|
Called every 100 ms by set_interval(). Drains all pending events
|
||||||
from the thread-safe queue and dispatches them to the right widget.
|
from the thread-safe queue and dispatches them to the right widget.
|
||||||
Runs on Textual's event loop — safe to call widget methods directly.
|
Runs on Textual's event loop - safe to call widget methods directly.
|
||||||
"""
|
"""
|
||||||
q = bus.get_bus()
|
q = bus.get_bus()
|
||||||
if q is None:
|
if q is None:
|
||||||
@@ -854,7 +854,7 @@ class MonitorApp(App):
|
|||||||
|
|
||||||
async def _bot_main(self) -> None:
|
async def _bot_main(self) -> None:
|
||||||
"""
|
"""
|
||||||
Full bot backend — runs inside the bot thread's own event loop.
|
Full bot backend - runs inside the bot thread's own event loop.
|
||||||
Telethon is free to schedule background tasks without interfering
|
Telethon is free to schedule background tasks without interfering
|
||||||
with Textual's loop.
|
with Textual's loop.
|
||||||
"""
|
"""
|
||||||
@@ -870,7 +870,7 @@ class MonitorApp(App):
|
|||||||
patterns = compile_patterns(config.TARGET_KEYWORDS)
|
patterns = compile_patterns(config.TARGET_KEYWORDS)
|
||||||
|
|
||||||
bus.post(bus.EvStatus(
|
bus.post(bus.EvStatus(
|
||||||
f"Starting — {len(config.WATCHED_CHANNELS)} channel(s), "
|
f"Starting - {len(config.WATCHED_CHANNELS)} channel(s), "
|
||||||
f"{len(patterns)} pattern(s)"
|
f"{len(patterns)} pattern(s)"
|
||||||
))
|
))
|
||||||
|
|
||||||
@@ -894,9 +894,9 @@ class MonitorApp(App):
|
|||||||
await user_client.connect()
|
await user_client.connect()
|
||||||
log.info("[bot] user_client connected, checking auth...")
|
log.info("[bot] user_client connected, checking auth...")
|
||||||
if not await user_client.is_user_authorized():
|
if not await user_client.is_user_authorized():
|
||||||
log.error("[bot] user_client not authorized — run: python main.py --no-tui")
|
log.error("[bot] user_client not authorized - run: python main.py --no-tui")
|
||||||
bus.post(bus.EvStatus(
|
bus.post(bus.EvStatus(
|
||||||
"Not authorized — run --no-tui once to complete login",
|
"Not authorized - run --no-tui once to complete login",
|
||||||
level="error",
|
level="error",
|
||||||
))
|
))
|
||||||
return
|
return
|
||||||
@@ -962,7 +962,7 @@ class MonitorApp(App):
|
|||||||
log.info(f"[bot] Handler registered for {len(channels)} channel(s)")
|
log.info(f"[bot] Handler registered for {len(channels)} channel(s)")
|
||||||
bus.post(bus.EvStatus(f"Watching {len(channels)} channel(s)"))
|
bus.post(bus.EvStatus(f"Watching {len(channels)} channel(s)"))
|
||||||
|
|
||||||
# Channel-change event — lives on this (bot) loop.
|
# Channel-change event - lives on this (bot) loop.
|
||||||
# Textual signals it thread-safely via _signal_channel_changed().
|
# Textual signals it thread-safely via _signal_channel_changed().
|
||||||
_ch_changed = asyncio.Event()
|
_ch_changed = asyncio.Event()
|
||||||
self._bot_loop_channel_event = _ch_changed
|
self._bot_loop_channel_event = _ch_changed
|
||||||
@@ -971,7 +971,7 @@ class MonitorApp(App):
|
|||||||
bus.post(bus.EvStatus("Live listener active"))
|
bus.post(bus.EvStatus("Live listener active"))
|
||||||
|
|
||||||
await backfill_all(user_client, bot_client, patterns)
|
await backfill_all(user_client, bot_client, patterns)
|
||||||
bus.post(bus.EvStatus("Backfill complete — monitoring live"))
|
bus.post(bus.EvStatus("Backfill complete - monitoring live"))
|
||||||
|
|
||||||
async def _watch_channels():
|
async def _watch_channels():
|
||||||
while True:
|
while True:
|
||||||
@@ -1009,7 +1009,7 @@ class MonitorApp(App):
|
|||||||
# ─── Entry point ──────────────────────────────────────────────────────────────
|
# ─── Entry point ──────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
def run_tui() -> None:
|
def run_tui() -> None:
|
||||||
# Do NOT call bus.init_bus() here — the Queue must be created inside
|
# Do NOT call bus.init_bus() here - the Queue must be created inside
|
||||||
# Textual's event loop (see MonitorApp.on_mount). Calling it here
|
# Textual's event loop (see MonitorApp.on_mount). Calling it here
|
||||||
# would bind the Queue to the outer loop which is discarded when
|
# would bind the Queue to the outer loop which is discarded when
|
||||||
# App.run() creates a new one.
|
# App.run() creates a new one.
|
||||||
|
|||||||
@@ -14,11 +14,11 @@ from tui.events import set_bot_context, signal_channel_changed
|
|||||||
```
|
```
|
||||||
|
|
||||||
### `init_bus() -> queue.Queue`
|
### `init_bus() -> queue.Queue`
|
||||||
Creates the `queue.Queue`. Called inside `MonitorApp.on_mount()` — **must run on Textual's event loop**, not before `App.run()`.
|
Creates the `queue.Queue`. Called inside `MonitorApp.on_mount()` - **must run on Textual's event loop**, not before `App.run()`.
|
||||||
|
|
||||||
### `post(event: Any) -> None`
|
### `post(event: Any) -> None`
|
||||||
Fire-and-forget from any thread. Delivers to the TUI queue **and** all subscriber queues.
|
Fire-and-forget from any thread. Delivers to the TUI queue **and** all subscriber queues.
|
||||||
Uses `queue.Queue.put_nowait()` — never blocks.
|
Uses `queue.Queue.put_nowait()` - never blocks.
|
||||||
|
|
||||||
### `get_bus() -> queue.Queue | None`
|
### `get_bus() -> queue.Queue | None`
|
||||||
Returns the TUI queue for `_drain_bus()` to consume.
|
Returns the TUI queue for `_drain_bus()` to consume.
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
"""
|
"""
|
||||||
tui_events.py — Thread-safe event bus between the bot backend and the TUI.
|
tui_events.py - Thread-safe event bus between the bot backend and the TUI.
|
||||||
|
|
||||||
The bot backend runs in a dedicated thread with its own asyncio event loop
|
The bot backend runs in a dedicated thread with its own asyncio event loop
|
||||||
(completely isolated from Textual's loop). Events are posted via a standard
|
(completely isolated from Textual's loop). Events are posted via a standard
|
||||||
@@ -18,7 +18,7 @@ import threading
|
|||||||
from dataclasses import dataclass, field
|
from dataclasses import dataclass, field
|
||||||
from typing import Any
|
from typing import Any
|
||||||
|
|
||||||
# Thread-safe queue — works across the bot thread and Textual's thread.
|
# Thread-safe queue - works across the bot thread and Textual's thread.
|
||||||
_queue: queue.Queue | None = None
|
_queue: queue.Queue | None = None
|
||||||
_queue_lock = threading.Lock()
|
_queue_lock = threading.Lock()
|
||||||
|
|
||||||
|
|||||||
@@ -1 +1 @@
|
|||||||
"""utils — pure logic modules with no Telegram dependencies."""
|
"""utils - pure logic modules with no Telegram dependencies."""
|
||||||
|
|||||||
@@ -11,7 +11,7 @@ from utils.cache import is_seen, mark_seen
|
|||||||
|
|
||||||
### `is_seen(file_id: int) -> bool`
|
### `is_seen(file_id: int) -> bool`
|
||||||
Returns `True` if this document ID has been processed before.
|
Returns `True` if this document ID has been processed before.
|
||||||
Loads from disk on every call (safe for multi-process, slightly slow for hot loops — not an issue given download cadence).
|
Loads from disk on every call (safe for multi-process, slightly slow for hot loops - not an issue given download cadence).
|
||||||
|
|
||||||
### `mark_seen(file_id: int) -> None`
|
### `mark_seen(file_id: int) -> None`
|
||||||
Adds `file_id` to the cache and persists to disk.
|
Adds `file_id` to the cache and persists to disk.
|
||||||
@@ -21,12 +21,12 @@ Adds `file_id` to the cache and persists to disk.
|
|||||||
## Storage
|
## Storage
|
||||||
|
|
||||||
- **File:** `data/cache.json`
|
- **File:** `data/cache.json`
|
||||||
- **Format:** JSON array of integers — `[123456789, 987654321, ...]`
|
- **Format:** JSON array of integers - `[123456789, 987654321, ...]`
|
||||||
- **No expiry** — grows indefinitely. Safe to delete to re-process all files.
|
- **No expiry** - grows indefinitely. Safe to delete to re-process all files.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Notes
|
## Notes
|
||||||
|
|
||||||
- `is_seen` + `mark_seen` are called in `core/scraper.py` after a successful download+process cycle, not before — so a file that fails mid-process will be retried on next run.
|
- `is_seen` + `mark_seen` are called in `core/scraper.py` after a successful download+process cycle, not before - so a file that fails mid-process will be retried on next run.
|
||||||
- Not thread-safe (load/modify/save is not atomic). Acceptable because downloads are sequential within the bot loop.
|
- Not thread-safe (load/modify/save is not atomic). Acceptable because downloads are sequential within the bot loop.
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
"""
|
"""
|
||||||
cache.py — Tracks already-processed file IDs to avoid redownloading.
|
cache.py - Tracks already-processed file IDs to avoid redownloading.
|
||||||
Persists to a simple JSON file on disk.
|
Persists to a simple JSON file on disk.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
|
|||||||
@@ -85,5 +85,5 @@ Indexes: `url`, `username`, `source`, `timestamp`, `severity`.
|
|||||||
## Notes
|
## Notes
|
||||||
|
|
||||||
- Each query opens and closes its own connection via the `_connect()` context manager.
|
- Each query opens and closes its own connection via the `_connect()` context manager.
|
||||||
- `conn.row_factory = sqlite3.Row` — rows support both index and column-name access.
|
- `conn.row_factory = sqlite3.Row` - rows support both index and column-name access.
|
||||||
- Transactions: commit on success, rollback on exception.
|
- Transactions: commit on success, rollback on exception.
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
"""
|
"""
|
||||||
database.py — SQLite storage for credential hits.
|
database.py - SQLite storage for credential hits.
|
||||||
|
|
||||||
Schema:
|
Schema:
|
||||||
hits table:
|
hits table:
|
||||||
|
|||||||
@@ -51,7 +51,7 @@ Check 6 (no severity change): flags weak passwords ≤6 chars or common strings.
|
|||||||
## Employee domain matching
|
## Employee domain matching
|
||||||
|
|
||||||
Keywords in `config.TARGET_KEYWORDS` containing `@` become employee patterns.
|
Keywords in `config.TARGET_KEYWORDS` containing `@` become employee patterns.
|
||||||
Pattern: `@<domain>(?:[^a-zA-Z0-9.\-]|$)` — requires literal `@` before the domain.
|
Pattern: `@<domain>(?:[^a-zA-Z0-9.\-]|$)` - requires literal `@` before the domain.
|
||||||
**`user@gmail.com` on a URL containing `myorg.cl` does NOT trigger CRITICAL.**
|
**`user@gmail.com` on a URL containing `myorg.cl` does NOT trigger CRITICAL.**
|
||||||
|
|
||||||
Keywords without `@` go only to `ORG_DOMAINS` (LOW baseline).
|
Keywords without `@` go only to `ORG_DOMAINS` (LOW baseline).
|
||||||
@@ -64,11 +64,11 @@ Separators: `:` `;` `,` `|` `\t` (any of these between the three fields).
|
|||||||
|
|
||||||
The URL field handles two common stealer-log complications:
|
The URL field handles two common stealer-log complications:
|
||||||
|
|
||||||
1. **`://` not treated as separator** — the optional scheme prefix `(?:https?|ftp)://` is consumed before the character-class match, so `https://` never gets split at the colon.
|
1. **`://` not treated as separator** - the optional scheme prefix `(?:https?|ftp)://` is consumed before the character-class match, so `https://` never gets split at the colon.
|
||||||
|
|
||||||
2. **Port + path consumed into the URL** — the optional group `(?::\d+/[^\s:;,|\t]*)` absorbs `:port/path` when the port is pure digits immediately followed by `/`. This correctly handles `http://host:8085/path/:user:pass` but intentionally skips patterns like `:24145487-8` (RUT number — hyphen after digits, no `/`).
|
2. **Port + path consumed into the URL** - the optional group `(?::\d+/[^\s:;,|\t]*)` absorbs `:port/path` when the port is pure digits immediately followed by `/`. This correctly handles `http://host:8085/path/:user:pass` but intentionally skips patterns like `:24145487-8` (RUT number - hyphen after digits, no `/`).
|
||||||
|
|
||||||
**Known limitation:** A bare port with no path (e.g. `https://host:8080:user:pass`) will mis-parse `8080` as the username. This is not observed in practice — stealer logs always include at least a trailing `/`.
|
**Known limitation:** A bare port with no path (e.g. `https://host:8080:user:pass`) will mis-parse `8080` as the username. This is not observed in practice - stealer logs always include at least a trailing `/`.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -79,7 +79,7 @@ The URL field handles two common stealer-log complications:
|
|||||||
| `EMPLOYEE_DOMAINS` | `list[tuple[str, Pattern]]` | `(domain_str, anchored_pattern)` for `@`-keywords |
|
| `EMPLOYEE_DOMAINS` | `list[tuple[str, Pattern]]` | `(domain_str, anchored_pattern)` for `@`-keywords |
|
||||||
| `ORG_DOMAINS` | `list[Pattern]` | Plain domain patterns for all keywords |
|
| `ORG_DOMAINS` | `list[Pattern]` | Plain domain patterns for all keywords |
|
||||||
|
|
||||||
scorer uses `import config as _config` (not `from config import TARGET_KEYWORDS`), so patching `config.TARGET_KEYWORDS` at runtime is sufficient — `_build_*` reads the live module attribute.
|
scorer uses `import config as _config` (not `from config import TARGET_KEYWORDS`), so patching `config.TARGET_KEYWORDS` at runtime is sufficient - `_build_*` reads the live module attribute.
|
||||||
|
|
||||||
To rebuild after editing `config.TARGET_KEYWORDS` at runtime:
|
To rebuild after editing `config.TARGET_KEYWORDS` at runtime:
|
||||||
```python
|
```python
|
||||||
|
|||||||
@@ -1,24 +1,24 @@
|
|||||||
"""
|
"""
|
||||||
scorer.py — Severity scoring for credential hits.
|
scorer.py - Severity scoring for credential hits.
|
||||||
|
|
||||||
Scoring logic (highest match wins):
|
Scoring logic (highest match wins):
|
||||||
|
|
||||||
CRITICAL — Employee credentials (internal email domain)
|
CRITICAL - Employee credentials (internal email domain)
|
||||||
e.g. jdoe@yourclinic.cl:password
|
e.g. jdoe@yourclinic.cl:password
|
||||||
— Admin/privileged service URLs
|
- Admin/privileged service URLs
|
||||||
e.g. admin., vpn., ssh., rdp., gitlab., jira.
|
e.g. admin., vpn., ssh., rdp., gitlab., jira.
|
||||||
|
|
||||||
HIGH — Internal-facing services
|
HIGH - Internal-facing services
|
||||||
e.g. intranet., erp., crm., portal., citrix.
|
e.g. intranet., erp., crm., portal., citrix.
|
||||||
— Password manager or SSO hits
|
- Password manager or SSO hits
|
||||||
— Any credential where username looks like an employee email
|
- Any credential where username looks like an employee email
|
||||||
|
|
||||||
MEDIUM — Client-facing portals
|
MEDIUM - Client-facing portals
|
||||||
e.g. app., patient., client., booking.
|
e.g. app., patient., client., booking.
|
||||||
— Domain match on a non-privileged service
|
- Domain match on a non-privileged service
|
||||||
|
|
||||||
LOW — Generic domain keyword match
|
LOW - Generic domain keyword match
|
||||||
— No URL parsed, just a raw domain mention
|
- No URL parsed, just a raw domain mention
|
||||||
|
|
||||||
Each scored hit gets a dict with:
|
Each scored hit gets a dict with:
|
||||||
- severity: CRITICAL / HIGH / MEDIUM / LOW
|
- severity: CRITICAL / HIGH / MEDIUM / LOW
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
"""
|
"""
|
||||||
web/app.py — FastAPI application factory.
|
web/app.py - FastAPI application factory.
|
||||||
|
|
||||||
Usage:
|
Usage:
|
||||||
from web.app import create_app
|
from web.app import create_app
|
||||||
|
|||||||
@@ -1,9 +1,9 @@
|
|||||||
"""
|
"""
|
||||||
web/auth.py — JWT signing/verification and bcrypt password helpers.
|
web/auth.py - JWT signing/verification and bcrypt password helpers.
|
||||||
|
|
||||||
Tokens:
|
Tokens:
|
||||||
access — HS256, 15 min TTL, payload: {sub, role, type:"access"}
|
access - HS256, 15 min TTL, payload: {sub, role, type:"access"}
|
||||||
refresh — HS256, 7 day TTL, payload: {sub, jti, type:"refresh"}
|
refresh - HS256, 7 day TTL, payload: {sub, jti, type:"refresh"}
|
||||||
|
|
||||||
Both tokens live in httpOnly SameSite=Strict cookies.
|
Both tokens live in httpOnly SameSite=Strict cookies.
|
||||||
The `type` claim prevents an access token being used as a refresh token.
|
The `type` claim prevents an access token being used as a refresh token.
|
||||||
|
|||||||
10
web/db.py
10
web/db.py
@@ -1,9 +1,9 @@
|
|||||||
"""
|
"""
|
||||||
web/db.py — SQLite user store for the web frontend.
|
web/db.py - SQLite user store for the web frontend.
|
||||||
|
|
||||||
Tables:
|
Tables:
|
||||||
users — credentials + role + active flag
|
users - credentials + role + active flag
|
||||||
refresh_tokens — JTI-indexed refresh token revocation list
|
refresh_tokens - JTI-indexed refresh token revocation list
|
||||||
|
|
||||||
Bootstrap: on first init, creates a superadmin from WEB_ADMIN_USER / WEB_ADMIN_PASS
|
Bootstrap: on first init, creates a superadmin from WEB_ADMIN_USER / WEB_ADMIN_PASS
|
||||||
env vars (required only on first run if the DB doesn't exist yet).
|
env vars (required only on first run if the DB doesn't exist yet).
|
||||||
@@ -63,7 +63,9 @@ def init_db() -> None:
|
|||||||
admin_pass = os.environ.get("WEB_ADMIN_PASS")
|
admin_pass = os.environ.get("WEB_ADMIN_PASS")
|
||||||
if not admin_pass:
|
if not admin_pass:
|
||||||
raise RuntimeError(
|
raise RuntimeError(
|
||||||
"WEB_ADMIN_PASS env var is required on first run to create the superadmin."
|
"WEB_ADMIN_PASS env var is required on first run to bootstrap the superadmin. "
|
||||||
|
"Add WEB_ADMIN_PASS=<password> (and optionally WEB_ADMIN_USER=<username>) "
|
||||||
|
"to your .env file, then restart."
|
||||||
)
|
)
|
||||||
conn.execute(
|
conn.execute(
|
||||||
"INSERT INTO users (id, username, password_hash, role, created_at) VALUES (?,?,?,?,?)",
|
"INSERT INTO users (id, username, password_hash, role, created_at) VALUES (?,?,?,?,?)",
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
"""
|
"""
|
||||||
web/dependencies.py — FastAPI dependency functions.
|
web/dependencies.py - FastAPI dependency functions.
|
||||||
|
|
||||||
get_current_user: reads the access_token cookie, decodes + validates it,
|
get_current_user: reads the access_token cookie, decodes + validates it,
|
||||||
loads the user row from web.db. Raises 401 if anything fails.
|
loads the user row from web.db. Raises 401 if anything fails.
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
"""
|
"""
|
||||||
web/models.py — Pydantic request/response schemas.
|
web/models.py - Pydantic request/response schemas.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
import re
|
import re
|
||||||
|
|||||||
@@ -1,9 +1,9 @@
|
|||||||
"""
|
"""
|
||||||
web/routes/auth.py — Login, logout, token refresh.
|
web/routes/auth.py - Login, logout, token refresh.
|
||||||
|
|
||||||
POST /login — form submit; sets access_token + refresh_token cookies
|
POST /login - form submit; sets access_token + refresh_token cookies
|
||||||
POST /logout — revokes refresh token, clears cookies
|
POST /logout - revokes refresh token, clears cookies
|
||||||
POST /refresh — exchanges refresh_token cookie for a new access_token
|
POST /refresh - exchanges refresh_token cookie for a new access_token
|
||||||
"""
|
"""
|
||||||
|
|
||||||
from fastapi import APIRouter, Form, HTTPException, Request, Response, status
|
from fastapi import APIRouter, Form, HTTPException, Request, Response, status
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
"""
|
"""
|
||||||
web/routes/config_routes.py — Keyword groups and channel list management.
|
web/routes/config_routes.py - Keyword groups and channel list management.
|
||||||
|
|
||||||
GET /config/keywords → render groups editor
|
GET /config/keywords → render groups editor
|
||||||
PUT /config/keywords → validate + save groups, reload scorer
|
PUT /config/keywords → validate + save groups, reload scorer
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
"""
|
"""
|
||||||
web/routes/dashboard.py — Dashboard views and SSE live stream.
|
web/routes/dashboard.py - Dashboard views and SSE live stream.
|
||||||
|
|
||||||
GET / → redirect to /dashboard
|
GET / → redirect to /dashboard
|
||||||
GET /dashboard → overview: all groups, stats, live hit feed
|
GET /dashboard → overview: all groups, stats, live hit feed
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
"""
|
"""
|
||||||
web/routes/users.py — User CRUD (superadmin only).
|
web/routes/users.py - User CRUD (superadmin only).
|
||||||
|
|
||||||
GET /users → list all users
|
GET /users → list all users
|
||||||
POST /users → create a new user
|
POST /users → create a new user
|
||||||
|
|||||||
Reference in New Issue
Block a user