CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Development workflow
After every code change:
- Run pytest. All tests must pass at 100%.
- If 100% pass: present the change to the user, then commit.
- If any test fails: fix the bug and re-run before showing anything to the user.
Never present code or commit while tests are failing.
Running tests
pip install -r requirements-dev.txt
pytest # all tests
pytest -v # verbose
pytest tests/test_scorer.py # single file
Tests cover utils/scorer, utils/cache, utils/database, and core/processor. They are fully isolated — no .env required, no real DB or cache files touched. The patched_keywords fixture in conftest.py replaces TARGET_KEYWORDS with known test patterns; it must patch both config.TARGET_KEYWORDS and scorer.TARGET_KEYWORDS (the local from config import binding).
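Why both bindings need patching can be seen in a minimal sketch. The two modules below are stand-ins built with types.ModuleType, and matches() is a hypothetical function, not the project's scorer API; only the import behaviour is the point.

```python
import re
import types
from unittest import mock

# Stand-in for config.py (hypothetical contents).
config = types.ModuleType("config")
config.TARGET_KEYWORDS = [r"@prod\.example"]

# Stand-in for utils/scorer.py after `from config import TARGET_KEYWORDS`:
# the import copies the reference into the scorer module's own namespace.
scorer = types.ModuleType("scorer")
scorer.TARGET_KEYWORDS = config.TARGET_KEYWORDS


def _matches(line):
    # Resolves the name on the scorer module, as module-level code would.
    return any(re.search(p, line) for p in scorer.TARGET_KEYWORDS)


scorer.matches = _matches

TEST_PATTERNS = [r"@test\.example"]

# Patching only config leaves scorer's binding pointing at the old list.
with mock.patch.object(config, "TARGET_KEYWORDS", TEST_PATTERNS):
    only_config = scorer.matches("alice@test.example")

# Patching both names makes the scorer see the test patterns.
with mock.patch.object(config, "TARGET_KEYWORDS", TEST_PATTERNS), \
        mock.patch.object(scorer, "TARGET_KEYWORDS", TEST_PATTERNS):
    both = scorer.matches("alice@test.example")

print(only_config, both)
```

With pytest, the same two patches would typically be applied through monkeypatch.setattr inside the fixture.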
Running the monitor
source .venv/bin/activate # activate the Python virtual environment, if .venv exists
python main.py # TUI mode (default)
python main.py --no-tui # Plain CLI, logs to stdout + data/logs/monitor.log
First run will interactively prompt for Telegram phone + 2FA to create a session file.
Setup prerequisites
pip install -r requirements.txt
# rarfile requires the unrar binary: sudo apt install unrar (Linux) or brew install rar (macOS)
# tdl (strongly recommended for fast downloads):
curl -sSL https://raw.githubusercontent.com/iyear/tdl/main/scripts/install.sh | bash
tdl login -n monitor_session
If no .env file exists, ask the user to create it manually. Do not create it yourself, because it contains personal information.
Architecture
Data flow
Telegram channel message with file attachment
└─ core/scraper.py detects attachment, guards (size/extension/dedup)
└─ core/tdl_downloader.py downloads via tdl subprocess (batched)
└─ core/scraper.py Telethon fallback if tdl fails
└─ core/bot_downloader.py handles inline "DOWNLOAD" button → bot reply flow
└─ core/processor.py extracts .zip/.7z/.rar, searches .txt line by line
└─ core/notifier.py scores → deduplicates → writes DB/txt/csv → Telegram alert
├─ utils/scorer.py
├─ utils/database.py
└─ tui/events.py posts EvHit to TUI event bus
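The extract-and-search step in the flow above can be sketched with the standard library. This is an illustration limited to .zip archives, not the code in core/processor.py: search_zip, the sample archive, and the myorg.example pattern are all assumptions.

```python
import io
import re
import zipfile


def search_zip(data, patterns):
    """Return (member name, line number, line) for every matching .txt line."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    hits = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".txt"):
                continue  # only .txt members are searched
            with archive.open(name) as member:
                # Stream the member line by line instead of loading it whole.
                text = io.TextIOWrapper(member, encoding="utf-8", errors="replace")
                for lineno, line in enumerate(text, start=1):
                    line = line.rstrip("\r\n")
                    if any(rx.search(line) for rx in compiled):
                        hits.append((name, lineno, line))
    return hits


# Build a small archive in memory for the demonstration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "dump/logins.txt",
        "https://a.example:bob:pw\n"
        "https://vpn.myorg.example:eve@myorg.example:pw\n",
    )
    zf.writestr("dump/readme.md", "myorg.example is ignored in non-.txt files\n")

hits = search_zip(buf.getvalue(), [r"myorg\.example"])
print(hits)
```

Reading through io.TextIOWrapper keeps memory use flat for large dumps, and errors="replace" avoids aborting on files with mixed encodings.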
Threading model
The TUI and Telegram bot run in separate threads with different event loops:
- Main thread: Textual's event loop. Runs MonitorApp and drains the event bus every 100 ms via _drain_bus().
- Bot thread: its own asyncio event loop. Runs _bot_main() with both user_client and bot_client.
- Cross-thread communication: bot → TUI via bus.post() (queue.Queue.put_nowait, always safe); TUI → bot via loop.call_soon_threadsafe() (e.g., to signal channel list changes).
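The two directions of cross-thread communication can be sketched without Textual or Telethon. Everything named here (bot_main, on_channels_changed, the state dict, the channel name) is hypothetical; only queue.Queue.put_nowait and loop.call_soon_threadsafe correspond to the mechanisms described above.

```python
import asyncio
import queue
import threading

bus = queue.Queue()              # bot -> UI events; Queue is thread-safe
loop_ready = threading.Event()
state = {}


async def bot_main():
    # Runs inside the bot thread's own event loop.
    state["loop"] = asyncio.get_running_loop()
    state["stop"] = asyncio.Event()
    state["channels"] = []
    loop_ready.set()
    bus.put_nowait("bot started")    # safe to call from any thread
    await state["stop"].wait()
    bus.put_nowait("bot stopped")


def on_channels_changed(channels):
    # Executed on the bot loop, although it was scheduled from the UI thread.
    state["channels"] = channels
    bus.put_nowait(f"channels={channels}")
    state["stop"].set()


bot_thread = threading.Thread(target=lambda: asyncio.run(bot_main()), daemon=True)
bot_thread.start()
loop_ready.wait(timeout=5)

# UI thread -> bot: hand the callback to the loop instead of calling it directly.
state["loop"].call_soon_threadsafe(on_channels_changed, ["@example_channel"])
bot_thread.join(timeout=5)

# UI side: drain whatever has been queued, without blocking.
events = []
while True:
    try:
        events.append(bus.get_nowait())
    except queue.Empty:
        break
print(events)
```

In the real application the drain loop would run on a timer rather than once; the sketch drains a single time after the bot thread has exited so the output is deterministic.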
Module responsibilities
| Module | Role |
|---|---|
| config.py | All settings: edit keywords, channels, paths, tdl tuning here |
| core/scraper.py | Live listener + backfill orchestration; registers Telethon NewMessage handlers |
| core/tdl_downloader.py | Wraps tdl subprocess for fast downloads; falls back to Telethon |
| core/bot_downloader.py | Handles inline button click flow where files come via bot reply |
| core/processor.py | Archive extraction (supports nested archives one level deep) + line-by-line search |
| core/notifier.py | Scoring → dedup → DB insert → hits.txt/csv write → Telegram bot alert |
| utils/scorer.py | Severity scoring; parses ULP lines (url:user:pass), classifies CRITICAL/HIGH/MEDIUM/LOW |
| utils/cache.py | Seen file-ID dedup stored in data/cache.json |
| utils/database.py | SQLite read/write for data/hits.db |
| tui/app.py | MonitorApp + all screens (Search, HitsDB, Keywords) |
| tui/events.py | Thread-safe queue.Queue event bus |
Severity scoring
Keywords in config.TARGET_KEYWORDS with @ (e.g. r"@myorg\.cl") are employee email domains → CRITICAL on match. Keywords without @ are plain domain matches → LOW baseline.
| Severity | Score | Triggers |
|---|---|---|
| CRITICAL | 40 | Employee email in username · Privileged service URL (admin, vpn, rdp, gitlab…) |
| HIGH | 30 | Internal service URL (intranet, erp, sso, owa…) |
| MEDIUM | 20 | Client-facing URL (app, booking, helpdesk…) |
| LOW | 10 | Org domain appears anywhere in line |
Telegram alerts fire for CRITICAL/HIGH/MEDIUM only. LOW is stored silently.
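The rules in the table can be sketched as a small classifier. This is an illustration under assumptions, not the logic in utils/scorer.py: the keyword list, the tier regexes (built only from the examples in the table), and the names parse_ulp, classify and should_alert are hypothetical.

```python
import re

# Hypothetical keyword list in the style of config.TARGET_KEYWORDS.
TARGET_KEYWORDS = [r"@myorg\.cl", r"myorg\.cl"]
EMPLOYEE_PATTERNS = [k for k in TARGET_KEYWORDS if "@" in k]
DOMAIN_PATTERNS = [k for k in TARGET_KEYWORDS if "@" not in k]

# URL tiers, highest first; the word lists come from the table's examples.
TIERS = [
    ("CRITICAL", 40, r"\b(admin|vpn|rdp|gitlab)\b"),
    ("HIGH", 30, r"\b(intranet|erp|sso|owa)\b"),
    ("MEDIUM", 20, r"\b(app|booking|helpdesk)\b"),
]
ALERT_LEVELS = {"CRITICAL", "HIGH", "MEDIUM"}


def parse_ulp(line):
    """Split url:user:pass from the right, since the URL itself contains ':'.

    Assumption: the password has no ':' in it; real data may need more care.
    """
    parts = line.strip().rsplit(":", 2)
    return tuple(parts) if len(parts) == 3 else None


def classify(line):
    """Return (severity, score), or None when no keyword matches."""
    parsed = parse_ulp(line)
    if parsed:
        url, user, _password = parsed
        if any(re.search(p, user, re.I) for p in EMPLOYEE_PATTERNS):
            return ("CRITICAL", 40)
        if any(re.search(p, url, re.I) for p in DOMAIN_PATTERNS):
            for name, score, pattern in TIERS:
                if re.search(pattern, url, re.I):
                    return (name, score)
    if any(re.search(p, line, re.I) for p in DOMAIN_PATTERNS):
        return ("LOW", 10)
    return None


def should_alert(severity):
    # LOW hits are stored but do not trigger a Telegram alert.
    return severity in ALERT_LEVELS
```

For example, classify("https://intranet.myorg.cl:joe:pw") yields ("HIGH", 30) under these assumed patterns, while a line that only mentions the domain falls through to LOW and should_alert("LOW") is False.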
Per-file reference docs
Each .py has a companion .md with design notes. Always read the .md first, then the .py only if needed. After making code changes, update the companion .md to match.
Useful CLI queries
# Query hits directly
sqlite3 data/hits.db "SELECT severity, username, url FROM hits WHERE seen_before=0 ORDER BY score DESC LIMIT 20"
# Wipe dedup cache to re-process files
rm data/cache.json data/dedup.json
# Follow live log
tail -f data/logs/monitor.log
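The hits query above can also be run from Python with the standard sqlite3 module. The sketch below uses an in-memory database with a guessed schema containing only the columns the query references; the real data/hits.db may have more columns or different types, and top_new_hits is a hypothetical helper.

```python
import sqlite3

QUERY = """
    SELECT severity, username, url
    FROM hits
    WHERE seen_before = 0
    ORDER BY score DESC
    LIMIT ?
"""


def top_new_hits(conn, limit=20):
    """Return unseen hits, highest score first."""
    return conn.execute(QUERY, (limit,)).fetchall()


# Demo data in memory; for the real file use sqlite3.connect("data/hits.db").
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits (severity TEXT, score INTEGER, username TEXT, "
    "url TEXT, seen_before INTEGER)"
)
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?, ?, ?)",
    [
        ("LOW", 10, "joe", "https://www.myorg.cl", 0),
        ("CRITICAL", 40, "ana@myorg.cl", "https://vpn.myorg.cl", 0),
        ("HIGH", 30, "bob", "https://sso.myorg.cl", 1),
    ],
)
rows = top_new_hits(conn)
print(rows)
```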
TUI keybindings
| Key | Action |
|---|---|
| s | Search hits DB |
| h | Browse hits by severity (filter with 1/2/3/4, recent with r) |
| k | Edit keyword patterns live (changes take effect immediately) |
| c | Clear logs |
| r | Refresh stats |
| q / Escape | Quit / back |
Runtime keyword and channel changes are not persisted — copy them to config.py to survive restarts.