core/processor.py

Archive extraction and keyword-hit searching. No Telegram dependencies and no async code.

Public API

from core.processor import compile_patterns, process_file

compile_patterns(keywords: list[str]) -> list[re.Pattern]

Compiles a list of keyword strings into case-insensitive regex patterns.
Call once at startup; pass the result everywhere patterns are needed.

patterns = compile_patterns(config.TARGET_KEYWORDS)
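As a rough illustration of the compile-once idea, the sketch below builds case-insensitive patterns from keyword strings. The name `compile_patterns_sketch` is hypothetical, and treating keywords as literal text (via `re.escape`) is an assumption; the real function may accept raw regex syntax instead.

```python
import re


def compile_patterns_sketch(keywords: list[str]) -> list[re.Pattern]:
    """Hypothetical stand-in for compile_patterns; not the project's code."""
    patterns = []
    for keyword in keywords:
        keyword = keyword.strip()
        if not keyword:
            continue  # skip blank entries
        # Assumption: keywords are literal text, so regex metacharacters are escaped.
        patterns.append(re.compile(re.escape(keyword), re.IGNORECASE))
    return patterns
```

Compiling up front means each searched line pays only for matching, not for re-parsing the keyword list.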

process_file(filepath: Path, patterns, password=None) -> list[str]

Runs the full pipeline: unpack → search each .txt → recurse into nested archives (one level deep) → clean up everything.
Returns a list of the raw matching lines (hits). Deletes the original file and all extracted contents on completion.

hits = process_file(Path("data/tmp/combo.zip"), patterns, password="infected")

Internal functions

| Function | Signature | Description |
| --- | --- | --- |
| search_file | (filepath, patterns) -> list[str] | Stream-reads .txt line by line; ignores encoding errors |
| unpack | (filepath, extra_password) -> (files, extract_dir \| None) | Dispatches to correct extractor; plain .txt returned as-is |
| extract_zip | (filepath, dest, extra_password) | Tries no password first, then ARCHIVE_PASSWORDS list |
| extract_7z | (filepath, dest, extra_password) | Requires py7zr; skips if not installed |
| extract_rar | (filepath, dest, extra_password) | Requires rarfile + unrar binary |
| _try_passwords | (extract_fn, passwords) | Iterates password list, stops on first success |
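The streaming search described for search_file could look roughly like the sketch below. The name `search_file_sketch` is hypothetical; UTF-8 decoding and stripping the trailing newline from each hit are assumptions, since the source only says that lines are stream-read and encoding errors are ignored.

```python
import re
from pathlib import Path


def search_file_sketch(filepath: Path, patterns: list[re.Pattern]) -> list[str]:
    """Hypothetical stand-in for search_file; not the project's code."""
    hits = []
    # errors="ignore" drops undecodable bytes instead of raising UnicodeDecodeError.
    # Assumption: UTF-8 is the expected encoding.
    with open(filepath, "r", encoding="utf-8", errors="ignore") as handle:
        for line in handle:  # the file object yields one line at a time
            if any(p.search(line) for p in patterns):
                hits.append(line.rstrip("\r\n"))
    return hits
```

Iterating over the file object keeps memory use proportional to the longest line rather than to the file size.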

Supported formats

| Extension | Library | Notes |
| --- | --- | --- |
| .txt | built-in | Stream-read, no load into memory |
| .zip | zipfile | stdlib |
| .7z | py7zr | optional; skipped if not installed |
| .rar | rarfile | optional; requires unrar system binary |

Nested archives are recursed one level only.
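One way to express the "optional; skipped if not installed" behaviour is to check for the library before dispatching. The sketch below is only an illustration: `REQUIRED_MODULE` and `can_handle` are hypothetical names, and it does not check for the unrar system binary that .rar extraction also needs.

```python
import importlib.util
from pathlib import Path

# Extension -> optional third-party module it needs (None = standard library only).
REQUIRED_MODULE = {".txt": None, ".zip": None, ".7z": "py7zr", ".rar": "rarfile"}


def can_handle(filepath: Path) -> bool:
    """Hypothetical helper: True if the format is supported and its library is importable."""
    suffix = filepath.suffix.lower()
    if suffix not in REQUIRED_MODULE:
        return False
    module = REQUIRED_MODULE[suffix]
    # find_spec returns None when a top-level module is not installed.
    return module is None or importlib.util.find_spec(module) is not None
```

A caller could skip any file for which `can_handle` returns False instead of failing the whole run.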


Password order

  1. extra_password (from message/channel carry-forward) - tried first
  2. config.ARCHIVE_PASSWORDS - tried in order
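The ordering above can be sketched as follows. Both function names are hypothetical, and treating any exception from `extract_fn` as "wrong password" is an assumption made to keep the example short; real code would likely catch the specific errors raised by each archive library.

```python
from typing import Callable, Iterable, Optional


def password_candidates(extra_password: Optional[str], archive_passwords: list[str]) -> list[str]:
    """Hypothetical helper: extra_password first, then the configured list, without duplicates."""
    ordered = []
    if extra_password:
        ordered.append(extra_password)
    for password in archive_passwords:
        if password not in ordered:
            ordered.append(password)
    return ordered


def try_passwords_sketch(extract_fn: Callable[[str], None], passwords: Iterable[str]) -> bool:
    """Hypothetical stand-in for _try_passwords: stop on the first password that works."""
    for password in passwords:
        try:
            extract_fn(password)
        except Exception:  # assumption: any failure means this password was wrong
            continue
        return True
    return False
```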

Cleanup guarantee

process_file always deletes:

  • Extracted individual files
  • Extract subdirectory
  • Original downloaded file

Cleanup happens even if no hits are found.
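A guarantee like this is commonly implemented with `try`/`finally`. The sketch below shows that pattern under stated assumptions: `process_with_cleanup` and its `work` callback are hypothetical, and whether the real process_file also cleans up when an exception is raised is not stated in this document.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def process_with_cleanup(
    original: Path,
    extract_dir: Optional[Path],
    work: Callable[[], list[str]],
) -> list[str]:
    """Hypothetical wrapper: run the search, then remove all files regardless of outcome."""
    try:
        return work()
    finally:
        if extract_dir is not None:
            # Removes the extract subdirectory and every extracted file inside it.
            shutil.rmtree(extract_dir, ignore_errors=True)
        # Removes the original downloaded file; no error if it is already gone.
        original.unlink(missing_ok=True)
```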