# ULP Monitor - Quick Reference
> For Claude Code: read the per-file `.md` alongside each `.py` before editing.
> Full docs in `README.md`.
---
## Project layout
```
ulp_monitor/
├── main.py                  Entry point (--no-tui flag for CLI mode)
├── config.py                All settings - edit this for keywords, channels, paths
├── core/                    Telegram I/O pipeline (all async, Telethon-dependent)
│   ├── scraper.py           Live listener + backfill orchestration
│   ├── tdl_downloader.py    tdl subprocess wrapper + Telethon fallback
│   ├── bot_downloader.py    Inline "DOWNLOAD" button click flow
│   ├── processor.py         Archive extraction (.zip/.7z/.rar) + line search
│   └── notifier.py          Scoring → dedup → DB → hits.txt/csv → Telegram alert
├── utils/                   Pure logic, no Telegram deps, no async
│   ├── scorer.py            Severity scoring (CRITICAL/HIGH/MEDIUM/LOW)
│   ├── cache.py             Seen file-ID dedup (data/cache.json)
│   └── database.py          SQLite read/write (data/hits.db)
├── tui/                     Textual TUI - runs in main thread
│   ├── app.py               MonitorApp + all screens + bot thread launcher
│   └── events.py            Thread-safe queue.Queue event bus
└── data/                    Runtime output - gitignored
    ├── hits.db
    ├── hits.txt
    ├── hits.csv
    ├── cache.json
    ├── dedup.json
    └── logs/monitor.log
```
---
## Data flow
```
Telegram channel
└─ new message with file / download button
    ├─ core/scraper.py          detects + guards (size, extension, dedup)
    ├─ core/tdl_downloader.py   downloads via tdl (batched)
    │   └─ core/scraper.py      Telethon fallback if tdl fails
    ├─ core/bot_downloader.py   handles inline button → bot reply flow
    ├─ core/processor.py        extracts archive → searches .txt line by line
    └─ core/notifier.py         scores → deduplicates → persists → alerts
        ├─ utils/scorer.py
        ├─ utils/database.py
        └─ tui/events.py        posts EvHit to TUI
```
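
The guard step in `core/scraper.py` (size, extension, dedup) could look roughly like the sketch below. This is an illustration only: `should_download` and its parameters are hypothetical names, not the project's actual API, and the limits simply mirror the documented `ALLOWED_EXTENSIONS` and `MAX_FILE_SIZE` settings (assuming 4 GB means 4 × 1024³ bytes).

```python
from pathlib import PurePosixPath

# Assumed values, mirroring the documented config settings.
ALLOWED_EXTENSIONS = {".txt", ".zip", ".7z", ".rar"}
MAX_FILE_SIZE = 4 * 1024**3  # assumption: "4 GB" read as binary gigabytes


def should_download(file_name, size, file_id, seen_ids):
    """Return True only if the file passes extension, size, and dedup checks.

    Hypothetical helper: the real guard lives in core/scraper.py and may differ.
    """
    if PurePosixPath(file_name).suffix.lower() not in ALLOWED_EXTENSIONS:
        return False
    if size > MAX_FILE_SIZE:
        return False
    if file_id in seen_ids:
        return False
    seen_ids.add(file_id)  # remember the file so a repeat is skipped
    return True


seen = set()
first = should_download("dump.ZIP", 1024, "file-1", seen)
repeat = should_download("dump.ZIP", 1024, "file-1", seen)
wrong_type = should_download("photo.jpg", 1024, "file-2", seen)
too_big = should_download("huge.7z", MAX_FILE_SIZE + 1, "file-3", seen)
```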
---
## Threading architecture
```
main thread (Textual's event loop)
├─ MonitorApp.on_mount()
│   ├─ bus.init_bus()                       creates queue.Queue in THIS thread (not bound to any event loop)
│   ├─ threading.Thread → _run_bot_thread()
│   └─ set_interval(0.1, _drain_bus)
├─ _drain_bus() [every 100ms]
│   └─ queue.Queue.get_nowait() → dispatch to widgets
└─ Textual widgets, screens, keybindings

bot thread (own asyncio event loop)
└─ _bot_main()
    ├─ bot_client.connect() + sign_in()
    ├─ user_client.connect() + is_user_authorized()
    ├─ warm_entity_cache()
    ├─ _make_handler() → NewMessage handler registered
    ├─ backfill_all()
    └─ run_until_disconnected() + _watch_channels() [gathered]

cross-thread communication
    bot → TUI: bus.post(event)                 [queue.Queue.put_nowait, always safe]
    TUI → bot: loop.call_soon_threadsafe()     [asyncio.Event.set for channel changes]
```
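
The two cross-thread directions above can be sketched with the standard library alone. This is a minimal, self-contained illustration, not the code in `tui/app.py` or `tui/events.py`: `bus`, `shared`, `bot_main`, and `drain_bus` are hypothetical names, and the Telethon clients and Textual timer are replaced by plain calls.

```python
import asyncio
import queue
import threading

bus = queue.Queue()              # bot -> TUI events; safe to use from any thread
loop_ready = threading.Event()   # set once the bot's event loop is running
shared = {}                      # holds the bot loop and its asyncio.Event


async def bot_main():
    # Runs inside the bot thread's own asyncio loop.
    shared["loop"] = asyncio.get_running_loop()
    shared["channels_changed"] = asyncio.Event()
    loop_ready.set()
    bus.put_nowait("bot started")            # bot -> TUI
    await shared["channels_changed"].wait()  # woken by the TUI thread
    bus.put_nowait("channels reloaded")


def run_bot_thread():
    asyncio.run(bot_main())


def drain_bus():
    # What a periodic TUI callback would do: take everything queued right now.
    events = []
    while True:
        try:
            events.append(bus.get_nowait())
        except queue.Empty:
            return events


thread = threading.Thread(target=run_bot_thread, daemon=True)
thread.start()
loop_ready.wait(timeout=5)
# TUI -> bot: asyncio.Event is not thread-safe, so schedule set() on the bot loop.
shared["loop"].call_soon_threadsafe(shared["channels_changed"].set)
thread.join(timeout=5)
events = drain_bus()
```

Calling `shared["channels_changed"].set()` directly from the main thread would bypass the loop's wakeup mechanism, which is why the TUI → bot direction goes through `call_soon_threadsafe`.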
---
## Config quick reference (`config.py`)
| Setting | Type | Description |
|---------|------|-------------|
| `API_ID` | int | From my.telegram.org |
| `API_HASH` | str | From my.telegram.org |
| `BOT_TOKEN` | str | From @BotFather |
| `NOTIFY_CHAT_ID` | int | Your Telegram user/group ID |
| `SESSION_NAME` | str | Session file name (default: `monitor_session`) |
| `TARGET_KEYWORDS` | list[str] | Regex patterns. `@`-prefixed → employee email (CRITICAL). Plain → domain match (LOW) |
| `WATCHED_CHANNELS` | list[str\|int] | Usernames or `-100xxxxxxxxxx` IDs |
| `BACKFILL_LIMIT` | int | Messages to scan per channel on startup (0 = off) |
| `ALLOWED_EXTENSIONS` | set | `.txt .zip .7z .rar` |
| `MAX_FILE_SIZE` | int | Bytes (default 4 GB) |
| `ARCHIVE_PASSWORDS` | list[bytes] | Tried in order on locked archives |
| `TDL_NAMESPACE` | str\|None | `tdl login -n <name>` namespace |
| `TDL_THREADS` | int | Chunk workers per file (`-t`) |
| `TDL_PERFILE` | int | Concurrent files per tdl call (`-l`) |
| `TDL_AMOUNT` | int | Messages per batch |
| `TEMP_DIR` | Path | `data/tmp` |
| `HITS_FILE` | Path | `data/hits.txt` |
| `LOG_FILE` | Path | `data/logs/monitor.log` |
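
As a rough picture of how these settings fit together, a `config.py` might be shaped like the sketch below. Every value is a placeholder or an assumption: only the `SESSION_NAME` default, the extension set, the 4 GB limit, and the three paths come from the table above, and the `TDL_*` numbers, keyword patterns, and channel names are invented for illustration.

```python
from pathlib import Path

# Credentials: placeholders only, fill in your own values.
API_ID = 0                  # from my.telegram.org
API_HASH = ""               # from my.telegram.org
BOT_TOKEN = ""              # from @BotFather
NOTIFY_CHAT_ID = 0          # your Telegram user/group ID
SESSION_NAME = "monitor_session"

# Matching and sources (example patterns and channels are hypothetical).
TARGET_KEYWORDS = [r"@myorg\.cl", r"myorg\.cl"]
WATCHED_CHANNELS = ["example_channel", -1001234567890]
BACKFILL_LIMIT = 0          # 0 = no backfill on startup

# File handling.
ALLOWED_EXTENSIONS = {".txt", ".zip", ".7z", ".rar"}
MAX_FILE_SIZE = 4 * 1024**3             # assumption: 4 GB as binary gigabytes
ARCHIVE_PASSWORDS = [b"example-password"]

# tdl tuning (numbers are illustrative, not the project's defaults).
TDL_NAMESPACE = None
TDL_THREADS = 4
TDL_PERFILE = 2
TDL_AMOUNT = 10

# Paths.
TEMP_DIR = Path("data/tmp")
HITS_FILE = Path("data/hits.txt")
LOG_FILE = Path("data/logs/monitor.log")
```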
---
## Severity scoring summary
| Severity | Score | Triggers |
|----------|-------|----------|
| CRITICAL | 40 | Employee email (`@myorg.cl` in username) · Privileged service URL (admin, vpn, rdp, gitlab…) |
| HIGH | 30 | Internal service URL (intranet, erp, sso, owa…) |
| MEDIUM | 20 | Client-facing URL (app, booking, helpdesk…) |
| LOW | 10 | Org domain appears anywhere in line |

`@`-keyword rule: the pattern requires a literal `@` immediately before the domain, so `user@gmail.com` on a line whose URL contains `myorg.cl` does **not** trigger CRITICAL.

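
The `@`-keyword rule can be illustrated with two regular expressions. This is a simplified sketch, not the logic in `utils/scorer.py`: it covers only the employee-email (CRITICAL) and plain-domain (LOW) cases, skips the URL-based tiers entirely, and `classify`, `EMPLOYEE_EMAIL`, and `ORG_DOMAIN` are hypothetical names.

```python
import re

# Example patterns in the style of TARGET_KEYWORDS (myorg.cl is the doc's example domain).
EMPLOYEE_EMAIL = re.compile(r"@myorg\.cl", re.IGNORECASE)  # "@"-prefixed keyword
ORG_DOMAIN = re.compile(r"myorg\.cl", re.IGNORECASE)       # plain keyword


def classify(line):
    """Return (severity, score) for a line, or None if the org does not appear."""
    if EMPLOYEE_EMAIL.search(line):
        return ("CRITICAL", 40)  # literal "@" directly before the domain
    if ORG_DOMAIN.search(line):
        return ("LOW", 10)       # domain appears somewhere else in the line
    return None


employee = classify("https://example.com/login jane@myorg.cl")
outsider = classify("https://portal.myorg.cl/login user@gmail.com")
unrelated = classify("https://example.com/login user@gmail.com")
```
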
---
## TUI keybindings
| Key | Action | Screen |
|-----|--------|--------|
| `s` | Search hits DB | → SearchScreen |
| `h` | Browse hits by severity | → HitsDBScreen |
| `k` | Edit keyword patterns live | → KeywordsScreen |
| `c` | Clear download + hits logs | main |
| `r` | Force-refresh stats bar | main |
| `q` / `ctrl+c` | Quit | any |
| `Escape` | Back to main | sub-screens |
| `1`/`2`/`3`/`4` | Filter CRITICAL/HIGH/MEDIUM/LOW | HitsDBScreen |
| `r` | Load recent 50 | HitsDBScreen |
---
## Per-file reference docs
| File | Reference |
|------|-----------|
| `utils/scorer.py` | `utils/scorer.md` |
| `utils/cache.py` | `utils/cache.md` |
| `utils/database.py` | `utils/database.md` |
| `core/scraper.py` | `core/scraper.md` |
| `core/processor.py` | `core/processor.md` |
| `core/notifier.py` | `core/notifier.md` |
| `core/tdl_downloader.py` | `core/tdl_downloader.md` |
| `core/bot_downloader.py` | `core/bot_downloader.md` |
| `tui/app.py` | `tui/app.md` |
| `tui/events.py` | `tui/events.md` |
---
## Common tasks
**Add a new keyword at runtime:** open the TUI → press `k` → add the pattern. It is active immediately. Copy it to `config.TARGET_KEYWORDS` to persist.

**Add a channel at runtime:** type a username or numeric ID in the Channels panel → Add. The handler re-registers immediately. Edit `config.WATCHED_CHANNELS` to persist.

**Query hits from CLI:**
```bash
sqlite3 data/hits.db "SELECT severity, username, url FROM hits WHERE seen_before=0 ORDER BY score DESC LIMIT 20"
```
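
The same query can be run from Python with the standard `sqlite3` module. The sketch below assumes only the columns named in the command above (`severity`, `username`, `url`, `score`, `seen_before`); the demo table is a hypothetical stand-in for the real schema in `utils/database.py`, and `top_new_hits` is an illustrative name.

```python
import sqlite3

QUERY = """
SELECT severity, username, url
FROM hits
WHERE seen_before = 0
ORDER BY score DESC
LIMIT ?
"""


def top_new_hits(conn, limit=20):
    """Return unseen hits, highest score first."""
    return conn.execute(QUERY, (limit,)).fetchall()


# Demo against an in-memory database; for real data use sqlite3.connect("data/hits.db").
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits (severity TEXT, username TEXT, url TEXT, "
    "score INTEGER, seen_before INTEGER)"
)
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?, ?, ?)",
    [
        ("LOW", "a@example.com", "https://myorg.cl", 10, 0),
        ("CRITICAL", "b@myorg.cl", "https://example.com", 40, 0),
        ("HIGH", "c@example.com", "https://intranet.myorg.cl", 30, 1),
    ],
)
rows = top_new_hits(conn)
```
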
**Re-process all files** (wipe cache):
```bash
rm data/cache.json data/dedup.json
```
**Check what's happening:** `tail -f data/logs/monitor.log`