# ULP Monitor - Quick Reference

> For Claude Code: read the per-file `.md` alongside each `.py` before editing.
> Full docs in `README.md`.

---

## Project layout

```
ulp_monitor/
├── main.py                 Entry point (--no-tui flag for CLI mode)
├── config.py               All settings - edit this for keywords, channels, paths
│
├── core/                   Telegram I/O pipeline (all async, Telethon-dependent)
│   ├── scraper.py          Live listener + backfill orchestration
│   ├── tdl_downloader.py   tdl subprocess wrapper + Telethon fallback
│   ├── bot_downloader.py   Inline "DOWNLOAD" button click flow
│   ├── processor.py        Archive extraction (.zip/.7z/.rar) + line search
│   └── notifier.py         Scoring → dedup → DB → hits.txt/csv → Telegram alert
│
├── utils/                  Pure logic, no Telegram deps, no async
│   ├── scorer.py           Severity scoring (CRITICAL/HIGH/MEDIUM/LOW)
│   ├── cache.py            Seen file-ID dedup (data/cache.json)
│   └── database.py         SQLite read/write (data/hits.db)
│
├── tui/                    Textual TUI - runs in main thread
│   ├── app.py              MonitorApp + all screens + bot thread launcher
│   └── events.py           Thread-safe queue.Queue event bus
│
└── data/                   Runtime output - gitignored
    ├── hits.db
    ├── hits.txt
    ├── hits.csv
    ├── cache.json
    ├── dedup.json
    └── logs/monitor.log
```

---

## Data flow

```
Telegram channel
  └─ new message with file / download button
       │
       ├─ core/scraper.py              detects + guards (size, extension, dedup)
       │    ├─ core/tdl_downloader.py  downloads via tdl (batched)
       │    └─ core/scraper.py         Telethon fallback if tdl fails
       │
       ├─ core/bot_downloader.py       handles inline button → bot reply flow
       │
       ├─ core/processor.py            extracts archive → searches .txt line by line
       │
       └─ core/notifier.py             scores → deduplicates → persists → alerts
            ├─ utils/scorer.py
            ├─ utils/database.py
            └─ tui/events.py           posts EvHit to TUI
```

---

## Threading architecture

```
main thread (Textual's event loop)
  ├─ MonitorApp.on_mount()
  │    ├─ bus.init_bus()                         creates queue.Queue on THIS loop
  │    ├─ threading.Thread → _run_bot_thread()
  │    └─ set_interval(0.1, _drain_bus)
  │
  ├─ _drain_bus()  [every 100ms]
  │    └─ queue.Queue.get_nowait() → dispatch to widgets
  │
  └─ Textual widgets, screens, keybindings

bot thread (own asyncio event loop)
  └─ _bot_main()
       ├─ bot_client.connect() + sign_in()
       ├─ user_client.connect() + is_user_authorized()
       ├─ warm_entity_cache()
       ├─ _make_handler() → NewMessage handler registered
       ├─ backfill_all()
       └─ run_until_disconnected() + _watch_channels()  [gathered]

cross-thread communication
  bot → TUI:  bus.post(event)               [queue.Queue.put_nowait, always safe]
  TUI → bot:  loop.call_soon_threadsafe()   [asyncio.Event.set for channel changes]
```

---

## Config quick reference (`config.py`)

| Setting | Type | Description |
|---------|------|-------------|
| `API_ID` | int | From my.telegram.org |
| `API_HASH` | str | From my.telegram.org |
| `BOT_TOKEN` | str | From @BotFather |
| `NOTIFY_CHAT_ID` | int | Your Telegram user/group ID |
| `SESSION_NAME` | str | Session file name (default: `monitor_session`) |
| `TARGET_KEYWORDS` | list[str] | Regex patterns. `@`-prefixed → employee email (CRITICAL). Plain → domain match (LOW) |
| `WATCHED_CHANNELS` | list[str\|int] | Usernames or `-100xxxxxxxxxx` IDs |
| `BACKFILL_LIMIT` | int | Messages to scan per channel on startup (0 = off) |
| `ALLOWED_EXTENSIONS` | set | `.txt .zip .7z .rar` |
| `MAX_FILE_SIZE` | int | Bytes (default 4 GB) |
| `ARCHIVE_PASSWORDS` | list[bytes] | Tried in order on locked archives |
| `TDL_NAMESPACE` | str\|None | Namespace given to `tdl login -n` |
| `TDL_THREADS` | int | Chunk workers per file (`-t`) |
| `TDL_PERFILE` | int | Concurrent files per tdl call (`-l`) |
| `TDL_AMOUNT` | int | Messages per batch |
| `TEMP_DIR` | Path | `data/tmp` |
| `HITS_FILE` | Path | `data/hits.txt` |
| `LOG_FILE` | Path | `data/logs/monitor.log` |

---

## Severity scoring summary

| Severity | Score | Triggers |
|----------|-------|----------|
| CRITICAL | 40 | Employee email (`@myorg.cl` in username) · Privileged service URL (admin, vpn, rdp, gitlab…) |
| HIGH | 30 | Internal service URL (intranet, erp, sso, owa…) |
| MEDIUM | 20 | Client-facing URL (app, booking, helpdesk…) |
| LOW | 10 | Org domain appears anywhere in line |

`@`-keyword rule: the pattern requires a literal `@` immediately before the domain, so `user@gmail.com` on a URL containing `myorg.cl` does **not** trigger CRITICAL.
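The severity table can be read as an ordered rule list: check CRITICAL first and fall through to LOW. The sketch below illustrates that reading under stated assumptions. `score_hit`, `url_tokens` and the hint sets are hypothetical names, the hint words are just the examples from the table, the URL and username are assumed to be already parsed out of the line, and service hints are assumed to count only when the URL itself is on the org domain. The real logic is in `utils/scorer.py` (see `utils/scorer.md`) and may differ.

```python
from __future__ import annotations

import re

# Hypothetical hint sets built from the examples in the table above;
# the real lists live in utils/scorer.py and are likely longer.
PRIVILEGED_HINTS = {"admin", "vpn", "rdp", "gitlab"}
INTERNAL_HINTS = {"intranet", "erp", "sso", "owa"}
CLIENT_HINTS = {"app", "booking", "helpdesk"}

SCORES = {"CRITICAL": 40, "HIGH": 30, "MEDIUM": 20, "LOW": 10}


def url_tokens(url: str) -> set[str]:
    """Split a URL into lowercase alphanumeric tokens (scheme, host labels, path parts)."""
    return {t for t in re.split(r"[^a-z0-9]+", url.lower()) if t}


def score_hit(line: str, url: str, username: str, domain: str = "myorg.cl") -> tuple[str, int] | None:
    """Return (severity, score) for one matched line, or None if the org domain is absent.

    Rules are checked from most to least severe, so the highest match wins.
    """
    if domain.lower() not in line.lower():
        return None
    # @-keyword rule: a literal "@" must sit directly before the org domain.
    employee_email = re.search(rf"@{re.escape(domain)}\b", username, re.IGNORECASE)
    # Assumption: service hints only count when the URL itself is on the org domain.
    tokens = url_tokens(url) if domain.lower() in url.lower() else set()
    if employee_email or tokens & PRIVILEGED_HINTS:
        severity = "CRITICAL"
    elif tokens & INTERNAL_HINTS:
        severity = "HIGH"
    elif tokens & CLIENT_HINTS:
        severity = "MEDIUM"
    else:
        severity = "LOW"
    return severity, SCORES[severity]
```

Matching whole tokens rather than substrings keeps a hint like `app` from firing on hosts such as `whatsapp`. Under this sketch, `user@gmail.com` on `https://portal.myorg.cl` scores LOW, in line with the `@`-keyword rule above.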
---

## TUI keybindings

| Key | Action | Screen |
|-----|--------|--------|
| `s` | Search hits DB | → SearchScreen |
| `h` | Browse hits by severity | → HitsDBScreen |
| `k` | Edit keyword patterns live | → KeywordsScreen |
| `c` | Clear download + hits logs | main |
| `r` | Force-refresh stats bar | main |
| `q` / `ctrl+c` | Quit | any |
| `Escape` | Back to main screen | sub-screens |
| `1`/`2`/`3`/`4` | Filter CRITICAL/HIGH/MEDIUM/LOW | HitsDBScreen |
| `r` | Load the 50 most recent hits | HitsDBScreen |

---

## Per-file reference docs

| File | Reference |
|------|-----------|
| `utils/scorer.py` | `utils/scorer.md` |
| `utils/cache.py` | `utils/cache.md` |
| `utils/database.py` | `utils/database.md` |
| `core/scraper.py` | `core/scraper.md` |
| `core/processor.py` | `core/processor.md` |
| `core/notifier.py` | `core/notifier.md` |
| `core/tdl_downloader.py` | `core/tdl_downloader.md` |
| `core/bot_downloader.py` | `core/bot_downloader.md` |
| `tui/app.py` | `tui/app.md` |
| `tui/events.py` | `tui/events.md` |

---

## Common tasks

**Add a new keyword at runtime:** open the TUI → press `k` → add the pattern; it is active immediately. Copy it to `config.TARGET_KEYWORDS` to persist it.

**Add a channel at runtime:** type a username or numeric ID in the Channels panel → ➕ Add. The handler re-registers immediately. Edit `config.WATCHED_CHANNELS` to persist it.

**Query hits from CLI:**

```bash
sqlite3 data/hits.db "SELECT severity, username, url FROM hits WHERE seen_before=0 ORDER BY score DESC LIMIT 20"
```

**Re-process all files** (wipe cache):

```bash
rm data/cache.json data/dedup.json
```

**Check what's happening:** `tail -f data/logs/monitor.log`
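Wiping `data/cache.json` forces re-processing because a file whose ID is already recorded there is skipped. The sketch below is a minimal, hypothetical version of such a JSON-backed seen-ID cache. The `SeenCache` class, its method names and the on-disk format (a flat JSON list of ID strings) are assumptions, not the API of `utils/cache.py` (see `utils/cache.md`).

```python
from __future__ import annotations

import json
from pathlib import Path


class SeenCache:
    """Hypothetical JSON-backed set of already-processed Telegram file IDs."""

    def __init__(self, path: Path):
        self.path = path
        self._seen: set[str] = set()
        if path.exists():
            # Assumed on-disk format: a JSON list of file-ID strings.
            self._seen = set(json.loads(path.read_text(encoding="utf-8")))

    def seen(self, file_id: int | str) -> bool:
        """True if this file ID was processed before (IDs are compared as strings)."""
        return str(file_id) in self._seen

    def add(self, file_id: int | str) -> None:
        """Record a processed file ID and persist the whole set immediately."""
        self._seen.add(str(file_id))
        self._save()

    def _save(self) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        tmp = self.path.with_suffix(".tmp")
        # Write a temp file, then swap it in so a crash never leaves half-written JSON.
        tmp.write_text(json.dumps(sorted(self._seen)), encoding="utf-8")
        tmp.replace(self.path)
```

Because the set is reloaded from disk at startup, deleting the file is the only reset needed. This sketch does not model `data/dedup.json`, which the tip also removes.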
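The two cross-thread rules under Threading architecture can be exercised without Textual or Telethon. The sketch below is a hypothetical stand-in: `post`, `drain_bus`, `bot_main` and `run_demo` are illustrative names, and `EvHit`'s fields are assumed, so none of this is the actual `tui/events.py` API. It shows why each direction uses a different primitive. `queue.Queue` is thread-safe, so the bot thread can call `put_nowait` directly. An `asyncio.Event` is not thread-safe, so the TUI side hands `.set()` to the bot's loop via `loop.call_soon_threadsafe()`.

```python
from __future__ import annotations

import asyncio
import queue
import threading
from dataclasses import dataclass


@dataclass
class EvHit:
    """Hypothetical stand-in for the real EvHit event; fields are assumed."""
    severity: str
    url: str


bus: queue.Queue = queue.Queue()  # thread-safe, not tied to any event loop


def post(event: EvHit) -> None:
    """Bot → TUI: safe to call from any thread."""
    bus.put_nowait(event)


def drain_bus() -> list[EvHit]:
    """Stand-in for MonitorApp._drain_bus(): take everything queued, never block."""
    events = []
    while True:
        try:
            events.append(bus.get_nowait())
        except queue.Empty:
            return events


async def bot_main(ready: threading.Event, shared: dict) -> None:
    """Stand-in for _bot_main(): runs on the bot thread's own event loop."""
    shared["loop"] = asyncio.get_running_loop()
    shared["channels_changed"] = asyncio.Event()
    ready.set()
    post(EvHit("LOW", "https://portal.myorg.cl"))
    # Like _watch_channels(): sleep until the TUI reports a channel change.
    await shared["channels_changed"].wait()
    post(EvHit("HIGH", "https://sso.myorg.cl"))


def run_demo() -> list[EvHit]:
    ready = threading.Event()
    shared: dict = {}
    worker = threading.Thread(target=lambda: asyncio.run(bot_main(ready, shared)), daemon=True)
    worker.start()
    ready.wait()
    # TUI → bot: asyncio.Event is not thread-safe, so hand .set() to the bot's loop.
    shared["loop"].call_soon_threadsafe(shared["channels_changed"].set)
    worker.join(timeout=5)
    return drain_bus()
```

Polling with `get_nowait()` means the UI loop never blocks on the queue. The 100 ms interval only bounds how late an event can appear on screen.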