# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Development workflow
After every code change:
1. Run `pytest`. Every test must pass.
2. If all tests pass: present the change to the user, then commit.
3. If any test fails: fix the bug and re-run before showing anything to the user.
Never present code or commit while tests are failing.
## Running tests
```bash
pip install -r requirements-dev.txt
pytest # all tests
pytest -v # verbose
pytest tests/test_scorer.py # single file
```
Tests cover `utils/scorer`, `utils/cache`, `utils/database`, and `core/processor`. They are fully isolated — no `.env` required, no real DB or cache files touched. The `patched_keywords` fixture in `conftest.py` replaces `TARGET_KEYWORDS` with known test patterns; it must patch both `config.TARGET_KEYWORDS` and `scorer.TARGET_KEYWORDS` (the local `from config import` binding).
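
The need to patch both names can be illustrated with a minimal sketch. The two module objects below are stand-ins built with `types.ModuleType`, not the real `config` and `scorer`, and the test pattern is hypothetical. The sketch uses `unittest.mock.patch.object` so that it runs without pytest; a fixture in `conftest.py` would typically do the same with `monkeypatch.setattr`.

```python
import types
from contextlib import contextmanager
from unittest import mock

# Stand-in modules (assumption: the real ones hold more than this).
config = types.ModuleType("config")
config.TARGET_KEYWORDS = [r"@myorg\.cl"]

scorer = types.ModuleType("scorer")
# Equivalent of `from config import TARGET_KEYWORDS` inside scorer.py:
# the name is bound in scorer's own namespace at import time.
scorer.TARGET_KEYWORDS = config.TARGET_KEYWORDS

TEST_PATTERNS = [r"@example\.test"]  # hypothetical test pattern


@contextmanager
def patched_keywords():
    """Rebind the keyword list in both namespaces, restoring it on exit."""
    with mock.patch.object(config, "TARGET_KEYWORDS", TEST_PATTERNS), \
         mock.patch.object(scorer, "TARGET_KEYWORDS", TEST_PATTERNS):
        yield TEST_PATTERNS


# Patching only config leaves scorer's binding pointing at the old list.
with mock.patch.object(config, "TARGET_KEYWORDS", TEST_PATTERNS):
    only_config_patched = scorer.TARGET_KEYWORDS

# Patching both makes the code under test see the test patterns.
with patched_keywords():
    both_patched = scorer.TARGET_KEYWORDS

restored = scorer.TARGET_KEYWORDS
```

After the first `with` block, `only_config_patched` is still the original list, which is why a fixture that patches `config` alone would silently test against production keywords.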
## Running the monitor
```bash
source .venv/bin/activate # activate the Python environment, if .venv exists
python main.py # TUI mode (default)
python main.py --no-tui # Plain CLI, logs to stdout + data/logs/monitor.log
```
The first run prompts interactively for the Telegram phone number and 2FA, then creates a session file.
## Setup prerequisites
```bash
pip install -r requirements.txt
# rarfile requires the unrar binary: sudo apt install unrar (Linux) or brew install rar (macOS)
# tdl (strongly recommended for fast downloads):
curl -sSL https://raw.githubusercontent.com/iyear/tdl/main/scripts/install.sh | bash
tdl login -n monitor_session
```
If no `.env` file exists, ask the user to create it manually. Do not create it yourself, because it contains personal information.
## Architecture
### Data flow
```
Telegram channel message with file attachment
└─ core/scraper.py detects attachment, guards (size/extension/dedup)
└─ core/tdl_downloader.py downloads via tdl subprocess (batched)
└─ core/scraper.py Telethon fallback if tdl fails
└─ core/bot_downloader.py handles inline "DOWNLOAD" button → bot reply flow
└─ core/processor.py extracts .zip/.7z/.rar, searches .txt line by line
└─ core/notifier.py scores → deduplicates → writes DB/txt/csv → Telegram alert
   ├─ utils/scorer.py
   ├─ utils/database.py
   └─ tui/events.py posts EvHit to TUI event bus
```
### Threading model
The TUI and Telegram bot run in separate threads with different event loops:
- **Main thread**: Textual's event loop — runs `MonitorApp`, drains the event bus every 100ms via `_drain_bus()`
- **Bot thread**: own `asyncio` event loop — runs `_bot_main()` with both `user_client` and `bot_client`
- **Cross-thread communication**: bot → TUI via `bus.post()` (`queue.Queue.put_nowait`, always safe); TUI → bot via `loop.call_soon_threadsafe()` (e.g., to signal channel list changes)
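
The two communication directions above can be sketched with the standard library alone. This is a simplified illustration, not the project's code: `bot_main`, `drain_bus`, `on_channels_changed`, and the string events are hypothetical stand-ins for `_bot_main()`, `_drain_bus()`, and the real event types, and the main thread here plays the role of the Textual loop.

```python
import asyncio
import queue
import threading

bus = queue.Queue()             # bot -> UI: thread-safe by design
loop_ready = threading.Event()  # signals that the bot loop exists
shared = {}                     # handles the UI side needs to reach the bot


async def bot_main():
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()

    def on_channels_changed():
        # Runs inside the bot loop, although it was scheduled by the UI thread.
        bus.put_nowait("channels reloaded")
        stop.set()  # the sketch ends here; a real bot would keep running

    shared["loop"] = loop
    shared["on_channels_changed"] = on_channels_changed
    bus.put_nowait("bot started")
    loop_ready.set()
    await stop.wait()


bot_thread = threading.Thread(target=lambda: asyncio.run(bot_main()), daemon=True)
bot_thread.start()
loop_ready.wait(timeout=5)

# UI -> bot: never touch the loop directly from another thread.
shared["loop"].call_soon_threadsafe(shared["on_channels_changed"])
bot_thread.join(timeout=5)


def drain_bus():
    """Collect every pending event without blocking (what a UI timer would do)."""
    events = []
    while True:
        try:
            events.append(bus.get_nowait())
        except queue.Empty:
            return events


received = drain_bus()
```

The asymmetry is deliberate: `queue.Queue` is safe to use from any thread, whereas asyncio objects are not, so anything aimed at the bot loop goes through `call_soon_threadsafe`.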
### Module responsibilities
| Module | Role |
|--------|------|
| `config.py` | All settings — edit keywords, channels, paths, tdl tuning here |
| `core/scraper.py` | Live listener + backfill orchestration; registers Telethon `NewMessage` handlers |
| `core/tdl_downloader.py` | Wraps `tdl` subprocess for fast downloads; falls back to Telethon |
| `core/bot_downloader.py` | Handles inline button click flow where files come via bot reply |
| `core/processor.py` | Archive extraction (supports nested archives one level deep) + line-by-line search |
| `core/notifier.py` | Scoring → dedup → DB insert → hits.txt/csv write → Telegram bot alert |
| `utils/scorer.py` | Severity scoring; parses ULP lines (`url:user:pass`), classifies CRITICAL/HIGH/MEDIUM/LOW |
| `utils/cache.py` | Seen file-ID dedup stored in `data/cache.json` |
| `utils/database.py` | SQLite read/write for `data/hits.db` |
| `tui/app.py` | `MonitorApp` + all screens (Search, HitsDB, Keywords) |
| `tui/events.py` | Thread-safe `queue.Queue` event bus |
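
As an illustration of the file-ID dedup role listed for `utils/cache.py`, here is a minimal sketch of a JSON-backed seen-set. The class name, the method name, and the on-disk layout (a flat JSON list of IDs) are assumptions; the real module may differ. A temporary directory is used so the sketch never touches `data/cache.json`.

```python
import json
import tempfile
from pathlib import Path


class SeenCache:
    """Remembers processed file IDs across runs in a JSON file (hypothetical)."""

    def __init__(self, path):
        self.path = Path(path)
        # Assumed layout: a flat JSON list of file IDs.
        if self.path.exists():
            self.seen = set(json.loads(self.path.read_text()))
        else:
            self.seen = set()

    def check_and_add(self, file_id):
        """Return True if file_id is new (and record it), False if already seen."""
        if file_id in self.seen:
            return False
        self.seen.add(file_id)
        self.path.write_text(json.dumps(sorted(self.seen)))
        return True


cache_path = Path(tempfile.mkdtemp()) / "cache.json"
cache = SeenCache(cache_path)

first = cache.check_and_add(1001)    # new ID: process the file
second = cache.check_and_add(1001)   # same run: skip
reloaded = SeenCache(cache_path).check_and_add(1001)  # after restart: still skipped
```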
### Severity scoring
Keywords in `config.TARGET_KEYWORDS` with `@` (e.g. `r"@myorg\.cl"`) are **employee email domains** → CRITICAL on match. Keywords without `@` are plain domain matches → LOW baseline.
| Severity | Score | Triggers |
|----------|-------|----------|
| CRITICAL | 40 | Employee email in username · Privileged service URL (admin, vpn, rdp, gitlab…) |
| HIGH | 30 | Internal service URL (intranet, erp, sso, owa…) |
| MEDIUM | 20 | Client-facing URL (app, booking, helpdesk…) |
| LOW | 10 | Org domain appears anywhere in line |
Telegram alerts fire for CRITICAL/HIGH/MEDIUM only. LOW is stored silently.
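
The table can be read as a first-match-wins cascade. The sketch below is a simplified illustration and not the logic in `utils/scorer.py`: the function names are hypothetical, the hint sets contain only the examples named in the table, and matching hints against URL tokens is an assumption. It takes an already-parsed URL and username and assumes the line has already matched an org keyword.

```python
import re

SCORES = {"CRITICAL": 40, "HIGH": 30, "MEDIUM": 20, "LOW": 10}
ALERT_LEVELS = {"CRITICAL", "HIGH", "MEDIUM"}  # LOW is stored silently

# Only the examples from the table; the real lists are longer.
PRIVILEGED = {"admin", "vpn", "rdp", "gitlab"}
INTERNAL = {"intranet", "erp", "sso", "owa"}
CLIENT_FACING = {"app", "booking", "helpdesk"}

KEYWORDS = [r"@myorg\.cl", r"myorg\.cl"]  # sample patterns in the config style


def classify(url, username, keywords):
    """Return the severity label for a line that already matched a keyword."""
    # Keywords containing "@" are employee email domains.
    email_patterns = [k for k in keywords if "@" in k]
    if any(re.search(p, username, re.IGNORECASE) for p in email_patterns):
        return "CRITICAL"
    # Assumption: compare hints against whole URL tokens, not substrings.
    tokens = set(re.split(r"[^a-z0-9]+", url.lower()))
    if tokens & PRIVILEGED:
        return "CRITICAL"
    if tokens & INTERNAL:
        return "HIGH"
    if tokens & CLIENT_FACING:
        return "MEDIUM"
    return "LOW"


def should_alert(severity):
    """Alerts fire for everything except LOW."""
    return severity in ALERT_LEVELS


severity = classify("https://intranet.myorg.cl", "bob", KEYWORDS)
score = SCORES[severity]
```

Checking the email patterns first means an employee address is CRITICAL regardless of which service the URL points at.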
## Per-file reference docs
Each `.py` has a companion `.md` with design notes. **Always read the `.md` first, then the `.py` only if needed.** After making code changes, update the companion `.md` to match.
## Useful CLI queries
```bash
# Query hits directly
sqlite3 data/hits.db "SELECT severity, username, url FROM hits WHERE seen_before=0 ORDER BY score DESC LIMIT 20"
# Wipe dedup cache to re-process files
rm data/cache.json data/dedup.json
# Follow live log
tail -f data/logs/monitor.log
```
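
The same hits query can be run from Python with the standard `sqlite3` module. The sketch below uses an in-memory database so it is safe to run anywhere; the table definition is an assumption that includes only the columns the query above mentions, and `unseen_hits` is a hypothetical helper name.

```python
import sqlite3

# In-memory stand-in; the schema is assumed from the columns used in the query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits ("
    "severity TEXT, score INTEGER, username TEXT, url TEXT, seen_before INTEGER)"
)
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?, ?, ?)",
    [
        ("LOW", 10, "guest", "https://www.example.test", 0),
        ("CRITICAL", 40, "ana@example.test", "https://vpn.example.test", 0),
        ("HIGH", 30, "bob", "https://sso.example.test", 1),
    ],
)


def unseen_hits(conn, limit=20):
    """Return unseen hits, highest score first, as (severity, username, url)."""
    return conn.execute(
        "SELECT severity, username, url FROM hits "
        "WHERE seen_before = 0 ORDER BY score DESC LIMIT ?",
        (limit,),
    ).fetchall()


rows = unseen_hits(conn)
```

To query real data, replace `":memory:"` with `"data/hits.db"` and drop the `CREATE TABLE` and `INSERT` statements.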
## TUI keybindings
| Key | Action |
|-----|--------|
| `s` | Search hits DB |
| `h` | Browse hits by severity (filter with `1`/`2`/`3`/`4`, recent with `r`) |
| `k` | Edit keyword patterns live (changes take effect immediately) |
| `c` | Clear logs |
| `r` | Refresh stats |
| `q` / `Escape` | Quit / back |
Runtime keyword and channel changes are **not** persisted — copy them to `config.py` to survive restarts.