stealergram/utils/cache.md
anti 48f486ac97 Initial commit: ULPgrammer
- Core Telegram monitoring pipeline (scraper, processor, notifier, downloaders)
- Textual TUI frontend with thread-safe event bus
- SQLite persistence, severity scoring, dedup cache
- Fixed ULP parser: handles https:// truncation, port+path URLs, semicolon separator
- Test suite: 88 tests across scorer, cache, database, processor
2026-04-02 01:58:49 -03:00


utils/cache.py

Tracks already-processed Telegram document IDs to avoid redownloading.
Persists to data/cache.json as a JSON array of integers.

Public API

from utils.cache import is_seen, mark_seen

is_seen(file_id: int) -> bool

Returns True if this document ID has been processed before.
Loads from disk on every call, so entries written by other processes are picked up on the next check. This is slightly slow for hot loops, but not an issue given download cadence.

mark_seen(file_id: int) -> None

Adds file_id to the cache and persists to disk.
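
The behaviour described for these two functions can be sketched roughly as follows. This is a minimal illustration, not the actual contents of utils/cache.py: the helpers `_load` and `_save` and the extra `path` parameter are hypothetical, added only so the sketch can run against a temporary file.

```python
import json
from pathlib import Path

# Assumption: the path is resolved relative to the working directory.
CACHE_PATH = Path("data/cache.json")


def _load(path: Path = CACHE_PATH) -> set[int]:
    """Read the JSON array from disk; a missing file means an empty cache."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def _save(ids: set[int], path: Path = CACHE_PATH) -> None:
    """Write the IDs back as a sorted JSON array of integers."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(sorted(ids)), encoding="utf-8")


def is_seen(file_id: int, path: Path = CACHE_PATH) -> bool:
    # Reloads from disk on every call, as the description above states.
    return file_id in _load(path)


def mark_seen(file_id: int, path: Path = CACHE_PATH) -> None:
    # Load, modify, save: three separate steps, hence not atomic.
    ids = _load(path)
    ids.add(file_id)
    _save(ids, path)
```

Using a set while in memory means marking the same ID twice leaves a single entry in the file.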


Storage

  • File: data/cache.json
  • Format: JSON array of integers — [123456789, 987654321, ...]
  • No expiry — grows indefinitely. Safe to delete to re-process all files.
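
Because the file is a plain JSON array, it can be inspected or reset with a few lines of standard-library code. The helper names `cache_size` and `reset_cache` below are hypothetical and are not part of utils/cache.py; this is a sketch that assumes the file format listed above.

```python
import json
from pathlib import Path

DEFAULT_PATH = Path("data/cache.json")


def cache_size(path: Path = DEFAULT_PATH) -> int:
    """Return how many document IDs are stored; 0 if the file does not exist."""
    if not path.exists():
        return 0
    return len(json.loads(path.read_text(encoding="utf-8")))


def reset_cache(path: Path = DEFAULT_PATH) -> None:
    """Delete the cache file so every document is treated as unseen again."""
    path.unlink(missing_ok=True)  # missing_ok requires Python 3.8+
```

Checking `cache_size()` from time to time is one way to keep an eye on growth, since nothing ever expires.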

Notes

  • is_seen and mark_seen are both called in core/scraper.py. mark_seen runs only after a successful download+process cycle, not before, so a file that fails mid-process stays unmarked and will be retried on the next run.
  • Not thread-safe (load/modify/save is not atomic). Acceptable because downloads are sequential within the bot loop.
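
The retry behaviour in the notes above can be illustrated with a small sequential loop. This is a sketch, not the code in core/scraper.py: the function name `process_new_documents` and the `handle` callback (standing in for the download+process step) are hypothetical, and it assumes the is_seen check happens before the download.

```python
from typing import Callable, Iterable


def process_new_documents(
    doc_ids: Iterable[int],
    handle: Callable[[int], None],  # hypothetical download+process step
    is_seen: Callable[[int], bool],
    mark_seen: Callable[[int], None],
) -> list[int]:
    """Process unseen documents one at a time; return the IDs that failed."""
    failed: list[int] = []
    for doc_id in doc_ids:
        if is_seen(doc_id):
            continue  # already processed in an earlier run
        try:
            handle(doc_id)
        except Exception:
            failed.append(doc_id)  # left unmarked, so it is retried next run
            continue
        mark_seen(doc_id)  # only reached after a successful cycle
    return failed
```

Passing the cache functions in as arguments keeps the sketch self-contained; in the real module they would presumably be imported from utils.cache.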