Initial commit: ULPgrammer

- Core Telegram monitoring pipeline (scraper, processor, notifier, downloaders)
- Textual TUI frontend with thread-safe event bus
- SQLite persistence, severity scoring, dedup cache
- Fixed ULP parser: handles https:// truncation, port+path URLs, semicolon separator
- Test suite: 88 tests across scorer, cache, database, processor
QUICK_REF.md
@@ -0,0 +1,182 @@
# ULP Monitor — Quick Reference

> For Claude Code: read the per-file `.md` alongside each `.py` before editing.
> Full docs in `README.md`.

---

## Project layout

```
ulp_monitor/
├── main.py                 Entry point (--no-tui flag for CLI mode)
├── config.py               All settings — edit this for keywords, channels, paths
│
├── core/                   Telegram I/O pipeline (all async, Telethon-dependent)
│   ├── scraper.py          Live listener + backfill orchestration
│   ├── tdl_downloader.py   tdl subprocess wrapper + Telethon fallback
│   ├── bot_downloader.py   Inline "DOWNLOAD" button click flow
│   ├── processor.py        Archive extraction (.zip/.7z/.rar) + line search
│   └── notifier.py         Scoring → dedup → DB → hits.txt/csv → Telegram alert
│
├── utils/                  Pure logic, no Telegram deps, no async
│   ├── scorer.py           Severity scoring (CRITICAL/HIGH/MEDIUM/LOW)
│   ├── cache.py            Seen file-ID dedup (data/cache.json)
│   └── database.py         SQLite read/write (data/hits.db)
│
├── tui/                    Textual TUI — runs in main thread
│   ├── app.py              MonitorApp + all screens + bot thread launcher
│   └── events.py           Thread-safe queue.Queue event bus
│
└── data/                   Runtime output — gitignored
    ├── hits.db
    ├── hits.txt
    ├── hits.csv
    ├── cache.json
    ├── dedup.json
    └── logs/monitor.log
```

---

## Data flow

```
Telegram channel
  └─ new message with file / download button
       │
       ├─ core/scraper.py          detects + guards (size, extension, dedup)
       │
       ├─ core/tdl_downloader.py   downloads via tdl (batched)
       │    └─ core/scraper.py     Telethon fallback if tdl fails
       │
       ├─ core/bot_downloader.py   handles inline button → bot reply flow
       │
       ├─ core/processor.py        extracts archive → searches .txt line by line
       │
       └─ core/notifier.py         scores → deduplicates → persists → alerts
            ├─ utils/scorer.py
            ├─ utils/database.py
            └─ tui/events.py       posts EvHit to TUI
```
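
The guard step at the top of the flow (size, extension, dedup) can be pictured roughly as follows. This is a minimal sketch, not the code in `core/scraper.py`: `should_download` and its parameters are hypothetical names, and the two constants only mirror the `ALLOWED_EXTENSIONS` and `MAX_FILE_SIZE` settings.

```python
from pathlib import Path

# Assumed values mirroring config.py; the real module may differ.
ALLOWED_EXTENSIONS = {".txt", ".zip", ".7z", ".rar"}
MAX_FILE_SIZE = 4 * 1024**3  # "4 GB", taken here as 4 * 1024**3 bytes (assumption)


def should_download(file_name: str, size: int, file_id: str, seen: set[str]) -> bool:
    """Hypothetical guard: True only if extension, size, and dedup checks all pass."""
    if Path(file_name).suffix.lower() not in ALLOWED_EXTENSIONS:
        return False  # unsupported file type
    if size > MAX_FILE_SIZE:
        return False  # too large to fetch
    if file_id in seen:
        return False  # already handled once
    seen.add(file_id)  # remember it so a repost is skipped
    return True
```

Running the cheap checks (extension, size) before touching the dedup set means rejected files never pollute the seen-ID cache.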

---

## Threading architecture

```
main thread  (Textual's event loop)
  ├─ MonitorApp.on_mount()
  │    ├─ bus.init_bus()           creates queue.Queue on THIS loop
  │    ├─ threading.Thread → _run_bot_thread()
  │    └─ set_interval(0.1, _drain_bus)
  │
  ├─ _drain_bus()  [every 100ms]
  │    └─ queue.Queue.get_nowait() → dispatch to widgets
  │
  └─ Textual widgets, screens, keybindings

bot thread  (own asyncio event loop)
  └─ _bot_main()
       ├─ bot_client.connect() + sign_in()
       ├─ user_client.connect() + is_user_authorized()
       ├─ warm_entity_cache()
       ├─ _make_handler() → NewMessage handler registered
       ├─ backfill_all()
       └─ run_until_disconnected() + _watch_channels()  [gathered]

cross-thread communication
  bot → TUI:  bus.post(event)               [queue.Queue.put_nowait, always safe]
  TUI → bot:  loop.call_soon_threadsafe()   [asyncio.Event.set for channel changes]
```
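
The two cross-thread directions can be demonstrated with the standard library alone. The sketch below is illustrative only: `post`, `drain`, `run_bot`, and the `state` dict are hypothetical stand-ins for `bus.post`, `_drain_bus`, and the bot thread, and plain strings stand in for event objects such as `EvHit`.

```python
import asyncio
import queue
import threading

bus: "queue.Queue[str]" = queue.Queue()  # shared, thread-safe event bus


def post(event: str) -> None:
    """Bot thread -> TUI: enqueue without blocking."""
    bus.put_nowait(event)


def drain() -> list[str]:
    """TUI side: collect everything queued since the last tick."""
    events = []
    while True:
        try:
            events.append(bus.get_nowait())
        except queue.Empty:
            return events


def run_bot(ready: threading.Event, state: dict) -> None:
    """Bot thread: runs its own asyncio loop and waits for a TUI signal."""
    async def main() -> None:
        state["loop"] = asyncio.get_running_loop()
        state["changed"] = asyncio.Event()
        ready.set()                    # loop and event now exist
        await state["changed"].wait()  # sleep until the TUI side signals
        post("channels reloaded")
    asyncio.run(main())


ready = threading.Event()
state: dict = {}
worker = threading.Thread(target=run_bot, args=(ready, state))
worker.start()
ready.wait()
# TUI -> bot: asyncio.Event is not thread-safe, so schedule set() on the bot's loop.
state["loop"].call_soon_threadsafe(state["changed"].set)
worker.join(timeout=5)
received = drain()
```

`queue.Queue` handles its own locking, which is why the bot side can call `put_nowait` directly, while anything that touches asyncio objects owned by the bot loop has to go through `call_soon_threadsafe`.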

---

## Config quick reference (`config.py`)

| Setting | Type | Description |
|---------|------|-------------|
| `API_ID` | int | From my.telegram.org |
| `API_HASH` | str | From my.telegram.org |
| `BOT_TOKEN` | str | From @BotFather |
| `NOTIFY_CHAT_ID` | int | Your Telegram user/group ID |
| `SESSION_NAME` | str | Session file name (default: `monitor_session`) |
| `TARGET_KEYWORDS` | list[str] | Regex patterns. `@`-prefixed → employee email (CRITICAL). Plain → domain match (LOW) |
| `WATCHED_CHANNELS` | list[str\|int] | Usernames or `-100xxxxxxxxxx` IDs |
| `BACKFILL_LIMIT` | int | Messages to scan per channel on startup (0 = off) |
| `ALLOWED_EXTENSIONS` | set | `.txt .zip .7z .rar` |
| `MAX_FILE_SIZE` | int | Bytes (default 4 GB) |
| `ARCHIVE_PASSWORDS` | list[bytes] | Tried in order on locked archives |
| `TDL_NAMESPACE` | str\|None | `tdl login -n <name>` namespace |
| `TDL_THREADS` | int | Chunk workers per file (`-t`) |
| `TDL_PERFILE` | int | Concurrent files per tdl call (`-l`) |
| `TDL_AMOUNT` | int | Messages per batch |
| `TEMP_DIR` | Path | `data/tmp` |
| `HITS_FILE` | Path | `data/hits.txt` |
| `LOG_FILE` | Path | `data/logs/monitor.log` |
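
For orientation, a `config.py` matching the types in the table might look like the fragment below. Every value is a placeholder chosen for illustration (credentials, channel names, passwords, and the three `TDL_*` numbers in particular are invented); only the setting names, types, and the documented defaults come from the table.

```python
from pathlib import Path

# Telegram credentials (placeholders, not real values)
API_ID = 123456
API_HASH = "0123456789abcdef0123456789abcdef"
BOT_TOKEN = "123456:REPLACE_ME"
NOTIFY_CHAT_ID = 111111111
SESSION_NAME = "monitor_session"

# What to look for and where (example patterns and channels)
TARGET_KEYWORDS = [r"@myorg\.cl", r"myorg\.cl"]  # "@"-prefixed first, then plain domain
WATCHED_CHANNELS = ["example_channel", -1001234567890]
BACKFILL_LIMIT = 0  # 0 = off

# File handling
ALLOWED_EXTENSIONS = {".txt", ".zip", ".7z", ".rar"}
MAX_FILE_SIZE = 4 * 1024**3  # "4 GB", taken here as 4 * 1024**3 bytes (assumption)
ARCHIVE_PASSWORDS = [b"example-password"]  # tried in order

# tdl tuning (illustrative numbers only)
TDL_NAMESPACE = None
TDL_THREADS = 4
TDL_PERFILE = 2
TDL_AMOUNT = 20

# Paths
TEMP_DIR = Path("data/tmp")
HITS_FILE = Path("data/hits.txt")
LOG_FILE = Path("data/logs/monitor.log")
```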

---

## Severity scoring summary

| Severity | Score | Triggers |
|----------|-------|----------|
| CRITICAL | 40 | Employee email (`@myorg.cl` in username) · Privileged service URL (admin, vpn, rdp, gitlab…) |
| HIGH | 30 | Internal service URL (intranet, erp, sso, owa…) |
| MEDIUM | 20 | Client-facing URL (app, booking, helpdesk…) |
| LOW | 10 | Org domain appears anywhere in line |

`@`-keyword rule: the pattern requires a literal `@` before the domain, so a `user@gmail.com` login on a URL containing `myorg.cl` does **not** trigger CRITICAL.
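
The tiers and the `@`-keyword rule can be approximated with a short rule table checked from most to least severe. This is a sketch under stated assumptions, not the logic in `utils/scorer.py`: `RULES` and `score_line` are hypothetical names, the service lists contain only the examples named in the table, a "service URL" is assumed to mean a matching subdomain, and the whole line is searched rather than a parsed username field.

```python
import re

ORG = r"myorg\.cl"  # example domain from the table; substitute your own

# (severity, score, pattern), ordered from most to least severe.
RULES = [
    ("CRITICAL", 40, re.compile(rf"@{ORG}\b|\b(?:admin|vpn|rdp|gitlab)\.{ORG}\b", re.I)),
    ("HIGH", 30, re.compile(rf"\b(?:intranet|erp|sso|owa)\.{ORG}\b", re.I)),
    ("MEDIUM", 20, re.compile(rf"\b(?:app|booking|helpdesk)\.{ORG}\b", re.I)),
    ("LOW", 10, re.compile(ORG, re.I)),
]


def score_line(line: str):
    """Return (severity, score) for the first matching rule, or None if no rule matches."""
    for severity, score, pattern in RULES:
        if pattern.search(line):
            return severity, score
    return None
```

Because the CRITICAL email pattern starts with a literal `@`, a line such as `https://www.myorg.cl/:user@gmail.com:pw` falls through to LOW instead of being promoted.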

---

## TUI keybindings

| Key | Action | Screen |
|-----|--------|--------|
| `s` | Search hits DB | → SearchScreen |
| `h` | Browse hits by severity | → HitsDBScreen |
| `k` | Edit keyword patterns live | → KeywordsScreen |
| `c` | Clear download + hits logs | main |
| `r` | Force-refresh stats bar | main |
| `q` / `ctrl+c` | Quit | any |
| `Escape` | Back to main | sub-screens |
| `1`/`2`/`3`/`4` | Filter CRITICAL/HIGH/MEDIUM/LOW | HitsDBScreen |
| `r` | Load recent 50 | HitsDBScreen |

---

## Per-file reference docs

| File | Reference |
|------|-----------|
| `utils/scorer.py` | `utils/scorer.md` |
| `utils/cache.py` | `utils/cache.md` |
| `utils/database.py` | `utils/database.md` |
| `core/scraper.py` | `core/scraper.md` |
| `core/processor.py` | `core/processor.md` |
| `core/notifier.py` | `core/notifier.md` |
| `core/tdl_downloader.py` | `core/tdl_downloader.md` |
| `core/bot_downloader.py` | `core/bot_downloader.md` |
| `tui/app.py` | `tui/app.md` |
| `tui/events.py` | `tui/events.md` |

---

## Common tasks

**Add a new keyword at runtime:** open the TUI → press `k` → add pattern → active immediately. Copy to `config.TARGET_KEYWORDS` to persist.

**Add a channel at runtime:** type username or numeric ID in the Channels panel → ➕ Add. Handler re-registers immediately. Edit `config.WATCHED_CHANNELS` to persist.

**Query hits from CLI:**
```bash
sqlite3 data/hits.db "SELECT severity, username, url FROM hits WHERE seen_before=0 ORDER BY score DESC LIMIT 20"
```
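
The same query can be run from Python with the standard `sqlite3` module. In this sketch `top_new_hits` is a hypothetical helper, and the in-memory table is an assumed minimal schema containing only the columns the query above references; the real `hits` table in `data/hits.db` may have more.

```python
import sqlite3

QUERY = """
    SELECT severity, username, url
    FROM hits
    WHERE seen_before = 0
    ORDER BY score DESC
    LIMIT ?
"""


def top_new_hits(conn: sqlite3.Connection, limit: int = 20) -> list[tuple]:
    """Return unseen hits, highest score first."""
    return conn.execute(QUERY, (limit,)).fetchall()


# Demo against an in-memory database (assumed schema, invented rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits (severity TEXT, score INTEGER, username TEXT, url TEXT, seen_before INTEGER)"
)
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?, ?, ?)",
    [
        ("LOW", 10, "a@gmail.com", "https://www.myorg.cl", 0),
        ("CRITICAL", 40, "b@myorg.cl", "https://vpn.myorg.cl", 0),
        ("HIGH", 30, "c", "https://sso.myorg.cl", 1),  # seen before, filtered out
    ],
)
rows = top_new_hits(conn)
```

Against the real database, replace the in-memory connection with `sqlite3.connect("data/hits.db")`.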

**Re-process all files** (wipe cache):
```bash
rm data/cache.json data/dedup.json
```
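
Why deleting the cache forces a re-run can be seen with a toy seen-ID cache. This is only a sketch of the idea: `SeenCache` is a hypothetical class, the JSON-list file format is an assumption rather than the actual layout of `data/cache.json`, and `data/dedup.json` is not modelled here.

```python
import json
import tempfile
from pathlib import Path


class SeenCache:
    """Hypothetical seen-ID cache persisted as a JSON list."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # A missing file simply means nothing has been seen yet.
        self.ids = set(json.loads(path.read_text())) if path.exists() else set()

    def mark(self, file_id: str) -> bool:
        """Record file_id; return True if it was new."""
        if file_id in self.ids:
            return False
        self.ids.add(file_id)
        self.path.write_text(json.dumps(sorted(self.ids)))
        return True


with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / "cache.json"
    first = SeenCache(cache_path).mark("file-1")   # new file
    second = SeenCache(cache_path).mark("file-1")  # reloaded from disk, already seen
    cache_path.unlink()                            # same effect as rm data/cache.json
    third = SeenCache(cache_path).mark("file-1")   # treated as new again
```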

**Check what's happening:** `tail -f data/logs/monitor.log`