stealergram/CLAUDE.md
anti 741e6bb0d3 Rename to stealergram, add pyproject.toml, purge em-dashes
- Rename project to stealergram throughout
- Add pyproject.toml (replaces requirements.txt split, folds pytest.ini)
- Replace all em-dashes with hyphens across all source files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 10:06:30 -04:00


CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Development workflow

After every code change:

  1. Run pytest - every test must pass.
  2. If all tests pass: present the change to the user, then commit.
  3. If any test fails: fix the bug and re-run before showing anything to the user.

Never present code or commit while tests are failing.

Running tests

pip install -r requirements-dev.txt
pytest           # all tests
pytest -v        # verbose
pytest tests/test_scorer.py  # single file

Tests cover utils/scorer, utils/cache, utils/database, and core/processor. They are fully isolated: no .env is required and no real DB or cache files are touched. The patched_keywords fixture in conftest.py replaces TARGET_KEYWORDS with known test patterns. It must patch both config.TARGET_KEYWORDS and scorer.TARGET_KEYWORDS, because scorer's from config import statement creates a separate local binding that patching config alone does not change.
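
The double-patch requirement follows from how a from-import copies a reference into the importing module's namespace. A minimal sketch of the effect, using throwaway stand-in modules (fake_config and fake_scorer are illustrative names, not the project's modules):

```python
import types

# Stand-ins for config.py and utils/scorer.py (hypothetical, for illustration only).
fake_config = types.ModuleType("fake_config")
fake_config.TARGET_KEYWORDS = [r"@myorg\.cl"]

fake_scorer = types.ModuleType("fake_scorer")
# Equivalent to `from fake_config import TARGET_KEYWORDS` inside the scorer module:
fake_scorer.TARGET_KEYWORDS = fake_config.TARGET_KEYWORDS

test_patterns = [r"@test\.example"]

# Patching only the config attribute rebinds one name; the scorer keeps the old list.
fake_config.TARGET_KEYWORDS = test_patterns
scorer_sees_patch_after_config_only = fake_scorer.TARGET_KEYWORDS is test_patterns

# Patching the scorer's own binding as well makes both modules agree.
fake_scorer.TARGET_KEYWORDS = test_patterns
scorer_sees_patch_after_both = fake_scorer.TARGET_KEYWORDS is test_patterns
```

In a real fixture the same two rebinds would typically be done with pytest's monkeypatch.setattr so they are undone after each test.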

Running the monitor

source .venv/bin/activate  # activate the Python environment, if .venv exists
python main.py             # TUI mode (default)
python main.py --no-tui    # Plain CLI, logs to stdout + data/logs/monitor.log

The first run prompts interactively for the Telegram phone number and 2FA to create a session file.

Setup prerequisites

pip install -r requirements.txt
# rarfile requires the unrar binary: sudo apt install unrar (Linux) or brew install rar (macOS)

# tdl (strongly recommended for fast downloads):
curl -sSL https://raw.githubusercontent.com/iyear/tdl/main/scripts/install.sh | bash
tdl login -n monitor_session

If no .env file exists, ask the user to create it manually. Do not create it yourself: it contains personal information.

Architecture

Data flow

Telegram channel message with file attachment
  └─ core/scraper.py          detects attachment, guards (size/extension/dedup)
       └─ core/tdl_downloader.py  downloads via tdl subprocess (batched)
           └─ core/scraper.py     Telethon fallback if tdl fails
       └─ core/bot_downloader.py  handles inline "DOWNLOAD" button → bot reply flow
       └─ core/processor.py       extracts .zip/.7z/.rar, searches .txt line by line
       └─ core/notifier.py        scores → deduplicates → writes DB/txt/csv → Telegram alert
            ├─ utils/scorer.py
            ├─ utils/database.py
            └─ tui/events.py      posts EvHit to TUI event bus
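
The extract-and-search step in the flow above can be sketched roughly as follows. This is a simplified illustration limited to .zip files and the standard library; search_zip is a hypothetical helper, not the actual core/processor.py interface, and it does not handle .7z, .rar, or nested archives.

```python
import io
import re
import zipfile


def search_zip(data: bytes, patterns: list[str]) -> list[tuple[str, str]]:
    """Return (member name, matching line) pairs for every .txt member."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    hits = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".txt"):
                continue  # only text files are searched
            with archive.open(name) as member:
                # Decode and iterate line by line instead of reading the whole file.
                text = io.TextIOWrapper(member, encoding="utf-8", errors="replace")
                for raw in text:
                    line = raw.rstrip("\r\n")
                    if any(rx.search(line) for rx in compiled):
                        hits.append((name, line))
    return hits


# Build a small in-memory archive to exercise the helper.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "logs/passwords.txt",
        "https://a.example:bob:pw\nhttps://vpn.myorg.cl:ana@myorg.cl:pw\n",
    )
    archive.writestr("image.bin", "ana@myorg.cl")

results = search_zip(buffer.getvalue(), [r"@myorg\.cl"])
```

Streaming each member keeps memory use flat even when a dump contains very large text files.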

Threading model

The TUI and Telegram bot run in separate threads with different event loops:

  • Main thread: Textual's event loop - runs MonitorApp, drains the event bus every 100ms via _drain_bus()
  • Bot thread: own asyncio event loop - runs _bot_main() with both user_client and bot_client
  • Cross-thread communication: bot → TUI via bus.post() (queue.Queue.put_nowait, always safe); TUI → bot via loop.call_soon_threadsafe() (e.g., to signal channel list changes)
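
The two directions of cross-thread communication can be illustrated with the standard library alone. The sketch below is an assumption-laden miniature, not the project's code: bot_thread, on_channels_changed, and the string events are hypothetical, and the TUI side is reduced to a single blocking read instead of a 100ms timer.

```python
import asyncio
import queue
import threading

bus = queue.Queue()          # bot -> TUI direction
received_by_bot = []         # filled only from inside the bot loop
loop_ready = threading.Event()
bot_loop = None


def bot_thread() -> None:
    """Run a private asyncio loop, as the bot thread does."""
    global bot_loop
    bot_loop = asyncio.new_event_loop()
    asyncio.set_event_loop(bot_loop)
    loop_ready.set()
    bot_loop.run_forever()
    bot_loop.close()


def on_channels_changed(name: str) -> None:
    """Executes inside the bot loop, then reports back over the bus."""
    received_by_bot.append(name)
    bus.put_nowait(f"bot saw {name}")


worker = threading.Thread(target=bot_thread, daemon=True)
worker.start()
loop_ready.wait()

# TUI -> bot: schedule a callback on the other thread's loop.
bot_loop.call_soon_threadsafe(on_channels_changed, "channel_a")

# TUI side: the real app polls on a timer; here we wait for one event.
event = bus.get(timeout=5)

# Shut the bot loop down from the outside, again via the thread-safe entry point.
bot_loop.call_soon_threadsafe(bot_loop.stop)
worker.join(timeout=5)
```

Calling loop methods other than call_soon_threadsafe from a foreign thread is not safe, which is why both the signal and the shutdown go through it.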

Module responsibilities

| Module | Role |
|---|---|
| config.py | All settings - edit keywords, channels, paths, tdl tuning here |
| core/scraper.py | Live listener + backfill orchestration; registers Telethon NewMessage handlers |
| core/tdl_downloader.py | Wraps tdl subprocess for fast downloads; falls back to Telethon |
| core/bot_downloader.py | Handles inline button click flow where files come via bot reply |
| core/processor.py | Archive extraction (supports nested archives one level deep) + line-by-line search |
| core/notifier.py | Scoring → dedup → DB insert → hits.txt/csv write → Telegram bot alert |
| utils/scorer.py | Severity scoring; parses ULP lines (url:user:pass), classifies CRITICAL/HIGH/MEDIUM/LOW |
| utils/cache.py | Seen file-ID dedup stored in data/cache.json |
| utils/database.py | SQLite read/write for data/hits.db |
| tui/app.py | MonitorApp + all screens (Search, HitsDB, Keywords) |
| tui/events.py | Thread-safe queue.Queue event bus |
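
As an illustration of the seen file-ID dedup idea, here is a minimal sketch of a JSON-backed cache. The SeenCache class, its method name, and the on-disk layout (a sorted JSON list) are assumptions made for the example; the real utils/cache.py may differ.

```python
import json
import tempfile
from pathlib import Path


class SeenCache:
    """Hypothetical seen-ID cache persisted as a JSON list."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.seen = set()
        if path.exists():
            self.seen = set(json.loads(path.read_text(encoding="utf-8")))

    def check_and_add(self, file_id: str) -> bool:
        """Return True if file_id is new, and record it on disk."""
        if file_id in self.seen:
            return False
        self.seen.add(file_id)
        self.path.write_text(json.dumps(sorted(self.seen)), encoding="utf-8")
        return True


with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / "cache.json"
    first = SeenCache(cache_path).check_and_add("file-123")
    # A fresh instance reloads the file, so the ID is still known after a restart.
    second = SeenCache(cache_path).check_and_add("file-123")
    third = SeenCache(cache_path).check_and_add("file-456")
```

Deleting the cache file resets the set, which is why wiping it causes files to be processed again.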

Severity scoring

Keywords in config.TARGET_KEYWORDS that contain @ (e.g. r"@myorg\.cl") are treated as employee email domains and score CRITICAL on match. Keywords without @ are plain domain matches and start at the LOW baseline.

| Severity | Score | Triggers |
|---|---|---|
| CRITICAL | 40 | Employee email in username · Privileged service URL (admin, vpn, rdp, gitlab…) |
| HIGH | 30 | Internal service URL (intranet, erp, sso, owa…) |
| MEDIUM | 20 | Client-facing URL (app, booking, helpdesk…) |
| LOW | 10 | Org domain appears anywhere in line |

Telegram alerts fire for CRITICAL/HIGH/MEDIUM only. LOW is stored silently.
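
The tiers above can be sketched as a small classifier. Everything here is illustrative: the pattern lists contain only the examples named in the table, substring matching on the URL is an assumed simplification, and parse_ulp, classify, and should_alert are hypothetical names rather than the utils/scorer.py API.

```python
import re

# Illustrative values only; the real patterns live in the project's own modules.
ORG_EMAIL = re.compile(r"@myorg\.cl", re.IGNORECASE)
ORG_DOMAIN = re.compile(r"myorg\.cl", re.IGNORECASE)
PRIVILEGED = ("admin", "vpn", "rdp", "gitlab")
INTERNAL = ("intranet", "erp", "sso", "owa")
CLIENT = ("app", "booking", "helpdesk")
SCORES = {"CRITICAL": 40, "HIGH": 30, "MEDIUM": 20, "LOW": 10}


def parse_ulp(line: str):
    """Split url:user:pass from the right, since the URL itself contains ':'."""
    parts = line.strip().rsplit(":", 2)
    return tuple(parts) if len(parts) == 3 else None


def classify(line: str):
    """Return (severity, score), or None if the line does not mention the org."""
    if not ORG_DOMAIN.search(line):
        return None
    parsed = parse_ulp(line)
    url, user = (parsed[0].lower(), parsed[1]) if parsed else (line.lower(), "")
    if ORG_EMAIL.search(user) or any(k in url for k in PRIVILEGED):
        level = "CRITICAL"
    elif any(k in url for k in INTERNAL):
        level = "HIGH"
    elif any(k in url for k in CLIENT):
        level = "MEDIUM"
    else:
        level = "LOW"
    return level, SCORES[level]


def should_alert(level: str) -> bool:
    """LOW hits are stored but do not trigger a Telegram alert."""
    return level in {"CRITICAL", "HIGH", "MEDIUM"}
```

Splitting from the right keeps the scheme and port inside the URL field, at the cost of mis-parsing passwords that themselves contain a colon.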

Per-file reference docs

Each .py has a companion .md with design notes. Always read the .md first, then the .py only if needed. After making code changes, update the companion .md to match.

Useful CLI queries

# Query hits directly
sqlite3 data/hits.db "SELECT severity, username, url FROM hits WHERE seen_before=0 ORDER BY score DESC LIMIT 20"

# Wipe dedup cache to re-process files
rm data/cache.json data/dedup.json

# Follow live log
tail -f data/logs/monitor.log
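
The hits query above can also be run from Python with the standard sqlite3 module. The sketch below uses an in-memory database with a guessed minimal schema (only the columns the query names); the real data/hits.db table likely has more columns, and top_new_hits is a hypothetical helper.

```python
import sqlite3

# Hypothetical minimal schema: only the columns named in the CLI query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits ("
    "severity TEXT, score INTEGER, username TEXT, url TEXT, seen_before INTEGER)"
)
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?, ?, ?)",
    [
        ("LOW", 10, "dave", "https://www.myorg.cl", 0),
        ("CRITICAL", 40, "ana@myorg.cl", "https://vpn.myorg.cl", 0),
        ("HIGH", 30, "bob", "https://intranet.myorg.cl", 1),
    ],
)


def top_new_hits(connection: sqlite3.Connection, limit: int = 20):
    """Unseen hits, highest score first; the limit is bound as a parameter."""
    return connection.execute(
        "SELECT severity, username, url FROM hits "
        "WHERE seen_before = 0 ORDER BY score DESC LIMIT ?",
        (limit,),
    ).fetchall()


rows = top_new_hits(conn)
conn.close()
```

To point this at the real database, replace ":memory:" with the path to data/hits.db and drop the CREATE and INSERT statements.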

TUI keybindings

| Key | Action |
|---|---|
| s | Search hits DB |
| h | Browse hits by severity (filter with 1/2/3/4, recent with r) |
| k | Edit keyword patterns live (changes take effect immediately) |
| c | Clear logs |
| r | Refresh stats |
| q / Escape | Quit / back |

Runtime keyword and channel changes are not persisted - copy them to config.py to survive restarts.