stealergram/QUICK_REF.md
anti 741e6bb0d3 Rename to stealergram, add pyproject.toml, purge em-dashes
- Rename project to stealergram throughout
- Add pyproject.toml (replaces requirements.txt split, folds pytest.ini)
- Replace all em-dashes with hyphens across all source files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 10:06:30 -04:00

ULP Monitor - Quick Reference

For Claude Code: read the per-file .md alongside each .py before editing.
Full docs in README.md.


Project layout

ulp_monitor/
├── main.py           Entry point (--no-tui flag for CLI mode)
├── config.py         All settings - edit this for keywords, channels, paths
│
├── core/             Telegram I/O pipeline (all async, Telethon-dependent)
│   ├── scraper.py        Live listener + backfill orchestration
│   ├── tdl_downloader.py tdl subprocess wrapper + Telethon fallback
│   ├── bot_downloader.py Inline "DOWNLOAD" button click flow
│   ├── processor.py      Archive extraction (.zip/.7z/.rar) + line search
│   └── notifier.py       Scoring → dedup → DB → hits.txt/csv → Telegram alert
│
├── utils/            Pure logic, no Telegram deps, no async
│   ├── scorer.py         Severity scoring (CRITICAL/HIGH/MEDIUM/LOW)
│   ├── cache.py          Seen file-ID dedup (data/cache.json)
│   └── database.py       SQLite read/write (data/hits.db)
│
├── tui/              Textual TUI - runs in main thread
│   ├── app.py            MonitorApp + all screens + bot thread launcher
│   └── events.py         Thread-safe queue.Queue event bus
│
└── data/             Runtime output - gitignored
    ├── hits.db
    ├── hits.txt
    ├── hits.csv
    ├── cache.json
    ├── dedup.json
    └── logs/monitor.log

Data flow

Telegram channel
  └─ new message with file / download button
       │
       ├─ core/scraper.py          detects + guards (size, extension, dedup)
       │
       ├─ core/tdl_downloader.py   downloads via tdl (batched)
       │   └─ core/scraper.py      Telethon fallback if tdl fails
       │
       ├─ core/bot_downloader.py   handles inline button → bot reply flow
       │
       ├─ core/processor.py        extracts archive → searches .txt line by line
       │
       └─ core/notifier.py         scores → deduplicates → persists → alerts
            ├─ utils/scorer.py
            ├─ utils/database.py
            └─ tui/events.py       posts EvHit to TUI
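
The guard step at the top of this flow (size, extension, dedup) could be sketched as below. This is a minimal illustration, not the code in core/scraper.py: `should_download` and `seen_ids` are hypothetical names, and the two constants mirror the values documented for config.py (with "4 GB" assumed to mean 4 * 1024**3 bytes).

```python
from pathlib import PurePosixPath

# Assumed values, mirroring the defaults documented for config.py.
ALLOWED_EXTENSIONS = {".txt", ".zip", ".7z", ".rar"}
MAX_FILE_SIZE = 4 * 1024**3  # assumption: "4 GB" read as 4 GiB, in bytes


def should_download(file_name: str, size: int, file_id: str, seen_ids: set) -> bool:
    """Return True only if the file passes the extension, size, and dedup guards."""
    ext = PurePosixPath(file_name).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False  # unsupported file type
    if size > MAX_FILE_SIZE:
        return False  # too large to fetch
    if file_id in seen_ids:
        return False  # already handled once
    seen_ids.add(file_id)  # remember it so a repost is skipped
    return True
```

Running the cheap checks (extension, size) before the dedup lookup means rejected files never enter the seen set, so they can be reconsidered if the limits change later.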

Threading architecture

main thread (Textual's event loop)
  ├─ MonitorApp.on_mount()
  │   ├─ bus.init_bus()            creates the shared queue.Queue in THIS thread
  │   ├─ threading.Thread → _run_bot_thread()
  │   └─ set_interval(0.1, _drain_bus)
  │
  ├─ _drain_bus() [every 100ms]
  │   └─ queue.Queue.get_nowait() → dispatch to widgets
  │
  └─ Textual widgets, screens, keybindings

bot thread (own asyncio event loop)
  └─ _bot_main()
      ├─ bot_client.connect() + sign_in()
      ├─ user_client.connect() + is_user_authorized()
      ├─ warm_entity_cache()
      ├─ _make_handler() → NewMessage handler registered
      ├─ backfill_all()
      └─ run_until_disconnected() + _watch_channels() [gathered]

cross-thread communication
  bot → TUI:  bus.post(event)              [queue.Queue.put_nowait, always safe]
  TUI → bot:  loop.call_soon_threadsafe()  [asyncio.Event.set for channel changes]
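
The two cross-thread paths above can be reproduced with the standard library alone. The sketch below is an assumption-laden stand-in, not the project's tui/events.py or tui/app.py: `run_bot_thread`, `drain_bus`, `demo`, and the `state` dict are hypothetical names, and a plain function call takes the place of Textual's timer.

```python
import asyncio
import queue
import threading

bus = queue.Queue()  # bot -> TUI; queue.Queue is safe to use from any thread


def run_bot_thread(ready: threading.Event, state: dict) -> None:
    """Run a private asyncio loop in this thread (hypothetical stand-in for the bot thread)."""
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)

    async def bot_main() -> None:
        state["channels_changed"] = asyncio.Event()
        state["loop"] = loop
        ready.set()                             # tell the main thread the loop is up
        await state["channels_changed"].wait()  # woken from the TUI side
        bus.put_nowait("channels reloaded")     # bot -> TUI

    loop.run_until_complete(bot_main())
    loop.close()


def drain_bus() -> list:
    """Non-blocking drain, as a periodic TUI timer callback would do."""
    events = []
    while True:
        try:
            events.append(bus.get_nowait())
        except queue.Empty:
            return events


def demo() -> list:
    ready, state = threading.Event(), {}
    worker = threading.Thread(target=run_bot_thread, args=(ready, state), daemon=True)
    worker.start()
    ready.wait(timeout=5)
    # TUI -> bot: asyncio.Event is not thread-safe, so schedule set() on the bot loop.
    state["loop"].call_soon_threadsafe(state["channels_changed"].set)
    worker.join(timeout=5)
    return drain_bus()
```

The asymmetry is deliberate: `queue.Queue` needs no loop and can be written from the bot thread directly, while anything that touches asyncio objects owned by the bot loop has to go through `call_soon_threadsafe`.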

Config quick reference (config.py)

| Setting | Type | Description |
|---|---|---|
| API_ID | int | From my.telegram.org |
| API_HASH | str | From my.telegram.org |
| BOT_TOKEN | str | From @BotFather |
| NOTIFY_CHAT_ID | int | Your Telegram user/group ID |
| SESSION_NAME | str | Session file name (default: monitor_session) |
| TARGET_KEYWORDS | list[str] | Regex patterns. @-prefixed → employee email (CRITICAL). Plain → domain match (LOW) |
| WATCHED_CHANNELS | list[str\|int] | Usernames or -100xxxxxxxxxx IDs |
| BACKFILL_LIMIT | int | Messages to scan per channel on startup (0 = off) |
| ALLOWED_EXTENSIONS | set | .txt .zip .7z .rar |
| MAX_FILE_SIZE | int | Bytes (default 4 GB) |
| ARCHIVE_PASSWORDS | list[bytes] | Tried in order on locked archives |
| TDL_NAMESPACE | str\|None | `tdl login -n <name>` namespace |
| TDL_THREADS | int | Chunk workers per file (-t) |
| TDL_PERFILE | int | Concurrent files per tdl call (-l) |
| TDL_AMOUNT | int | Messages per batch |
| TEMP_DIR | Path | data/tmp |
| HITS_FILE | Path | data/hits.txt |
| LOG_FILE | Path | data/logs/monitor.log |
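
For orientation, a config.py with these settings might be shaped like the fragment below. Only the setting names, types, and the documented defaults (session name, extensions, paths) come from this reference. Every credential, channel, password, and TDL_* number is a placeholder chosen for illustration, and "4 GB" is assumed to mean 4 * 1024**3 bytes.

```python
from pathlib import Path

# Credentials: placeholders only, fill in your own values.
API_ID = 0
API_HASH = ""
BOT_TOKEN = ""
NOTIFY_CHAT_ID = 0
SESSION_NAME = "monitor_session"

# "@"-prefixed pattern -> employee email; plain pattern -> domain match.
TARGET_KEYWORDS = [r"@myorg\.cl", r"myorg\.cl"]
WATCHED_CHANNELS = ["example_channel", -1001234567890]  # hypothetical entries
BACKFILL_LIMIT = 0  # 0 = backfill off

ALLOWED_EXTENSIONS = {".txt", ".zip", ".7z", ".rar"}
MAX_FILE_SIZE = 4 * 1024**3              # bytes
ARCHIVE_PASSWORDS = [b"example-password"]  # hypothetical, tried in order

# tdl tuning: the numbers below are illustrative, not the project's defaults.
TDL_NAMESPACE = None
TDL_THREADS = 4
TDL_PERFILE = 2
TDL_AMOUNT = 20

TEMP_DIR = Path("data/tmp")
HITS_FILE = Path("data/hits.txt")
LOG_FILE = Path("data/logs/monitor.log")
```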

Severity scoring summary

| Severity | Score | Triggers |
|---|---|---|
| CRITICAL | 40 | Employee email (@myorg.cl in username) · Privileged service URL (admin, vpn, rdp, gitlab…) |
| HIGH | 30 | Internal service URL (intranet, erp, sso, owa…) |
| MEDIUM | 20 | Client-facing URL (app, booking, helpdesk…) |
| LOW | 10 | Org domain appears anywhere in line |

@-keyword rule: the pattern requires a literal @ immediately before the domain, so a line with username user@gmail.com and a URL containing myorg.cl does not trigger CRITICAL.
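
The tiers and the @-keyword rule can be sketched as a small pure function. This is an illustration built only from the examples in the summary above, not the logic in utils/scorer.py: `score_hit`, the regex names, and the word lists (which stop where the summary's "…" stops) are assumptions.

```python
import re

SEVERITY_SCORES = {"CRITICAL": 40, "HIGH": 30, "MEDIUM": 20, "LOW": 10}

# Illustrative patterns only, taken from the examples in the summary.
ORG_DOMAIN = r"myorg\.cl"
DOMAIN = re.compile(ORG_DOMAIN, re.IGNORECASE)
EMPLOYEE_EMAIL = re.compile(r"@" + ORG_DOMAIN, re.IGNORECASE)  # literal @ required
PRIVILEGED = re.compile(r"\b(admin|vpn|rdp|gitlab)\b", re.IGNORECASE)
INTERNAL = re.compile(r"\b(intranet|erp|sso|owa)\b", re.IGNORECASE)
CLIENT = re.compile(r"\b(app|booking|helpdesk)\b", re.IGNORECASE)


def score_hit(url: str, username: str):
    """Return (severity, score), or None when the org domain is absent."""
    if not (DOMAIN.search(url) or DOMAIN.search(username)):
        return None
    if EMPLOYEE_EMAIL.search(username) or PRIVILEGED.search(url):
        severity = "CRITICAL"
    elif INTERNAL.search(url):
        severity = "HIGH"
    elif CLIENT.search(url):
        severity = "MEDIUM"
    else:
        severity = "LOW"  # domain present, nothing more specific matched
    return severity, SEVERITY_SCORES[severity]
```

With these patterns, `score_hit("https://www.myorg.cl/", "user@gmail.com")` falls through to LOW: the username has an @ but not directly before the org domain, which is the case the @-keyword rule describes.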


TUI keybindings

| Key | Action | Screen |
|---|---|---|
| s | Search hits DB | → SearchScreen |
| h | Browse hits by severity | → HitsDBScreen |
| k | Edit keyword patterns live | → KeywordsScreen |
| c | Clear download + hits logs | main |
| r | Force-refresh stats bar | main |
| q / ctrl+c | Quit | any |
| Escape | Back to main | sub-screens |
| 1/2/3/4 | Filter CRITICAL/HIGH/MEDIUM/LOW | HitsDBScreen |
| r | Load recent 50 | HitsDBScreen |

Per-file reference docs

| File | Reference |
|---|---|
| utils/scorer.py | utils/scorer.md |
| utils/cache.py | utils/cache.md |
| utils/database.py | utils/database.md |
| core/scraper.py | core/scraper.md |
| core/processor.py | core/processor.md |
| core/notifier.py | core/notifier.md |
| core/tdl_downloader.py | core/tdl_downloader.md |
| core/bot_downloader.py | core/bot_downloader.md |
| tui/app.py | tui/app.md |
| tui/events.py | tui/events.md |

Common tasks

Add a new keyword at runtime: open the TUI → press k → add pattern → active immediately. Copy to config.TARGET_KEYWORDS to persist.

Add a channel at runtime: type username or numeric ID in the Channels panel → Add. Handler re-registers immediately. Edit config.WATCHED_CHANNELS to persist.

Query hits from CLI:

sqlite3 data/hits.db "SELECT severity, username, url FROM hits WHERE seen_before=0 ORDER BY score DESC LIMIT 20"
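
The same query can be run from Python with the standard-library sqlite3 module. A minimal sketch: `top_new_hits` is a hypothetical helper, and the only schema assumed is the table and the five columns that appear in the command above (hits, severity, username, url, score, seen_before).

```python
import sqlite3


def top_new_hits(conn: sqlite3.Connection, limit: int = 20) -> list:
    """Return unseen hits, highest score first (same query as the CLI one-liner)."""
    sql = (
        "SELECT severity, username, url FROM hits "
        "WHERE seen_before = 0 ORDER BY score DESC LIMIT ?"
    )
    # The limit is bound as a parameter rather than formatted into the SQL.
    return conn.execute(sql, (limit,)).fetchall()
```

Against the real database this would be called as `top_new_hits(sqlite3.connect("data/hits.db"))`, which returns a list of `(severity, username, url)` tuples.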

Re-process all files (wipe cache):

rm data/cache.json data/dedup.json

Check what's happening: tail -f data/logs/monitor.log