# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Development workflow

After every code change:

1. Run `pytest` - all tests must pass at 100%.
2. If 100% pass: present the change to the user, then commit.
3. If any test fails: fix the bug and re-run before showing anything to the user.

Never present code or commit while tests are failing.

## Running tests

```bash
pip install -r requirements-dev.txt
pytest                        # all tests
pytest -v                     # verbose
pytest tests/test_scorer.py   # single file
```

Tests cover `utils/scorer`, `utils/cache`, `utils/database`, and `core/processor`. They are fully isolated - no `.env` required, no real DB or cache files touched. The `patched_keywords` fixture in `conftest.py` replaces `TARGET_KEYWORDS` with known test patterns; it must patch both `config.TARGET_KEYWORDS` and `scorer.TARGET_KEYWORDS` (the local `from config import` binding).

## Running the monitor

```bash
source .venv/bin/activate   # activate the Python environment, if .venv exists
python main.py              # TUI mode (default)
python main.py --no-tui     # Plain CLI, logs to stdout + data/logs/monitor.log
```

The first run prompts interactively for the Telegram phone number + 2FA to create a session file.

## Setup prerequisites

```bash
pip install -r requirements.txt
# rarfile requires the unrar binary: sudo apt install unrar (Linux) or brew install rar (macOS)
# tdl (strongly recommended for fast downloads):
curl -sSL https://raw.githubusercontent.com/iyear/tdl/main/scripts/install.sh | bash
tdl login -n monitor_session
```

If no `.env` file exists, ask the user to create it manually. We cannot create it, because it contains personal information.
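
Returning to the `patched_keywords` note under "Running tests": the sketch below illustrates why both names have to be patched. It is not the repository's fixture. The modules `demo_config` and `demo_scorer` are hypothetical in-memory stand-ins for `config.py` and `utils/scorer.py`, and the keyword patterns are made up for the demonstration.

```python
import sys
import types
from contextlib import contextmanager
from unittest import mock

# Hypothetical stand-ins for config.py and utils/scorer.py, built in memory
# so the sketch runs without the repository.
config = types.ModuleType("demo_config")
config.TARGET_KEYWORDS = [r"@realorg\.cl"]
sys.modules["demo_config"] = config

scorer = types.ModuleType("demo_scorer")
# Equivalent of `from config import TARGET_KEYWORDS` at the top of scorer.py:
# the name is bound inside scorer's own namespace at import time.
exec("from demo_config import TARGET_KEYWORDS", scorer.__dict__)


@contextmanager
def patched_keywords(patterns):
    """Swap in test patterns on both modules and restore them on exit."""
    with mock.patch.object(config, "TARGET_KEYWORDS", patterns), \
         mock.patch.object(scorer, "TARGET_KEYWORDS", patterns):
        yield patterns


TEST_PATTERNS = [r"@testorg\.example"]

# Patching config alone does not reach scorer's local binding.
with mock.patch.object(config, "TARGET_KEYWORDS", TEST_PATTERNS):
    scorer_saw_when_only_config_patched = scorer.TARGET_KEYWORDS

# Patching both names does.
with patched_keywords(TEST_PATTERNS):
    scorer_saw_when_both_patched = scorer.TARGET_KEYWORDS
```

In a real `conftest.py` the same two patches would more likely be applied with pytest's `monkeypatch.setattr`, which also restores the originals after each test.
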
## Architecture

### Data flow

```
Telegram channel message with file attachment
  └─ core/scraper.py          detects attachment, guards (size/extension/dedup)
  └─ core/tdl_downloader.py   downloads via tdl subprocess (batched)
  └─ core/scraper.py          Telethon fallback if tdl fails
  └─ core/bot_downloader.py   handles inline "DOWNLOAD" button → bot reply flow
  └─ core/processor.py        extracts .zip/.7z/.rar, searches .txt line by line
  └─ core/notifier.py         scores → deduplicates → writes DB/txt/csv → Telegram alert
       ├─ utils/scorer.py
       ├─ utils/database.py
       └─ tui/events.py       posts EvHit to TUI event bus
```

### Threading model

The TUI and Telegram bot run in separate threads with different event loops:

- **Main thread**: Textual's event loop - runs `MonitorApp`, drains the event bus every 100ms via `_drain_bus()`
- **Bot thread**: own `asyncio` event loop - runs `_bot_main()` with both `user_client` and `bot_client`
- **Cross-thread communication**: bot → TUI via `bus.post()` (`queue.Queue.put_nowait`, always safe); TUI → bot via `loop.call_soon_threadsafe()` (e.g., to signal channel list changes)

### Module responsibilities

| Module | Role |
|--------|------|
| `config.py` | All settings - edit keywords, channels, paths, tdl tuning here |
| `core/scraper.py` | Live listener + backfill orchestration; registers Telethon `NewMessage` handlers |
| `core/tdl_downloader.py` | Wraps `tdl` subprocess for fast downloads; falls back to Telethon |
| `core/bot_downloader.py` | Handles inline button click flow where files come via bot reply |
| `core/processor.py` | Archive extraction (supports nested archives one level deep) + line-by-line search |
| `core/notifier.py` | Scoring → dedup → DB insert → hits.txt/csv write → Telegram bot alert |
| `utils/scorer.py` | Severity scoring; parses ULP lines (`url:user:pass`), classifies CRITICAL/HIGH/MEDIUM/LOW |
| `utils/cache.py` | Seen file-ID dedup stored in `data/cache.json` |
| `utils/database.py` | SQLite read/write for `data/hits.db` |
| `tui/app.py` | `MonitorApp` + all screens (Search, HitsDB, Keywords) |
| `tui/events.py` | Thread-safe `queue.Queue` event bus |

### Severity scoring

Keywords in `config.TARGET_KEYWORDS` with `@` (e.g. `r"@myorg\.cl"`) are **employee email domains** → CRITICAL on match. Keywords without `@` are plain domain matches → LOW baseline.

| Severity | Score | Triggers |
|----------|-------|----------|
| CRITICAL | 40 | Employee email in username · Privileged service URL (admin, vpn, rdp, gitlab…) |
| HIGH | 30 | Internal service URL (intranet, erp, sso, owa…) |
| MEDIUM | 20 | Client-facing URL (app, booking, helpdesk…) |
| LOW | 10 | Org domain appears anywhere in line |

Telegram alerts fire for CRITICAL/HIGH/MEDIUM only. LOW is stored silently.

## Per-file reference docs

Each `.py` has a companion `.md` with design notes. **Always read the `.md` first, then the `.py` only if needed.** After making code changes, update the companion `.md` to match.

## Useful CLI queries

```bash
# Query hits directly
sqlite3 data/hits.db "SELECT severity, username, url FROM hits WHERE seen_before=0 ORDER BY score DESC LIMIT 20"

# Wipe dedup cache to re-process files
rm data/cache.json data/dedup.json

# Follow live log
tail -f data/logs/monitor.log
```

## TUI keybindings

| Key | Action |
|-----|--------|
| `s` | Search hits DB |
| `h` | Browse hits by severity (filter with `1`/`2`/`3`/`4`, recent with `r`) |
| `k` | Edit keyword patterns live (changes take effect immediately) |
| `c` | Clear logs |
| `r` | Refresh stats |
| `q` / `Escape` | Quit / back |

Runtime keyword and channel changes are **not** persisted - copy them to `config.py` to survive restarts.
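
Because keyword edits change what gets scored, it can help to see the severity table from the Architecture section as code. The sketch below is an assumption about how such a classifier could look, not the logic in `utils/scorer.py`. The function names, the tokenizing of the URL, and the three word sets (limited to the examples the table lists before its "…") are all hypothetical.

```python
import re

# Scores and alerting rule taken from the severity table.
SCORES = {"CRITICAL": 40, "HIGH": 30, "MEDIUM": 20, "LOW": 10}
ALERTING = {"CRITICAL", "HIGH", "MEDIUM"}  # LOW is stored silently

# Only the example words named in the table; the real lists are longer.
PRIVILEGED = {"admin", "vpn", "rdp", "gitlab"}
INTERNAL = {"intranet", "erp", "sso", "owa"}
CLIENT_FACING = {"app", "booking", "helpdesk"}


def parse_ulp(line):
    """Split url:user:pass from the right so 'https://' and ports survive.

    Limitation of this sketch: a password containing ':' is mis-split.
    """
    parts = line.strip().rsplit(":", 2)
    return tuple(parts) if len(parts) == 3 else None


def classify(line, keywords):
    """Return a severity name, or None if no keyword matches the line."""
    if not any(re.search(k, line, re.IGNORECASE) for k in keywords):
        return None
    parsed = parse_ulp(line)
    url, user = (parsed[0], parsed[1]) if parsed else (line, "")
    # Compare whole URL tokens so 'app' does not match 'application'.
    tokens = set(re.split(r"[^a-z0-9]+", url.lower()))
    email_patterns = [k for k in keywords if "@" in k]
    if any(re.search(k, user, re.IGNORECASE) for k in email_patterns):
        return "CRITICAL"
    if tokens & PRIVILEGED:
        return "CRITICAL"
    if tokens & INTERNAL:
        return "HIGH"
    if tokens & CLIENT_FACING:
        return "MEDIUM"
    return "LOW"


KEYWORDS = [r"@myorg\.cl", r"myorg\.cl"]
severity = classify("https://sso.myorg.cl:bob:hunter2", KEYWORDS)
should_alert = severity in ALERTING
```

Checking the most severe rule first means a line that triggers several rows of the table is reported once, at its highest severity.
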
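
The cross-thread pattern from the Threading model section can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: it uses no Textual or Telethon, the event tuples and the `channels_changed` signal are hypothetical, and `drain_bus` only mimics what `_drain_bus()` is described as doing.

```python
import asyncio
import queue
import threading

bus = queue.Queue()             # bot -> TUI; unbounded, so put_nowait never blocks
loop_ready = threading.Event()  # lets the main thread wait for the bot loop
state = {}


async def bot_main():
    """Stand-in for the bot coroutine running on its own event loop."""
    state["loop"] = asyncio.get_running_loop()
    state["channels_changed"] = asyncio.Event()
    loop_ready.set()
    bus.put_nowait(("status", "bot started"))
    # Block until the TUI side signals a channel list change.
    await state["channels_changed"].wait()
    bus.put_nowait(("status", "channel list reloaded"))


def drain_bus():
    """Collect everything currently queued without blocking."""
    events = []
    while True:
        try:
            events.append(bus.get_nowait())
        except queue.Empty:
            return events


bot_thread = threading.Thread(target=lambda: asyncio.run(bot_main()), daemon=True)
bot_thread.start()
loop_ready.wait(timeout=5)

# TUI -> bot: asyncio objects are not thread-safe, so the call is
# scheduled onto the bot's loop instead of being made directly.
state["loop"].call_soon_threadsafe(state["channels_changed"].set)

bot_thread.join(timeout=5)
events = drain_bus()
```

In the application the drain would run on a timer in the main thread rather than once after the bot thread exits; joining first here just keeps the example deterministic.
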