# utils/cache.py

Tracks already-processed Telegram document IDs to avoid redownloading. Persists to `data/cache.json` as a JSON array of integers.

## Public API

```python
from utils.cache import is_seen, mark_seen
```

### `is_seen(file_id: int) -> bool`

Returns `True` if this document ID has been processed before. Loads from disk on every call (safe for multi-process, slightly slow for hot loops; not an issue given download cadence).

### `mark_seen(file_id: int) -> None`

Adds `file_id` to the cache and persists to disk.

---

## Storage

- **File:** `data/cache.json`
- **Format:** JSON array of integers, e.g. `[123456789, 987654321, ...]`
- **No expiry:** grows indefinitely. Safe to delete to re-process all files.

---

## Notes

- `is_seen` + `mark_seen` are called in `core/scraper.py` after a successful download+process cycle, not before, so a file that fails mid-process will be retried on next run.
- Not thread-safe (load/modify/save is not atomic). Acceptable because downloads are sequential within the bot loop.
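
## Example sketch

The behavior described above can be illustrated with a minimal sketch. It is not the actual contents of `utils/cache.py`. The helper names `_load` and `_save`, the `CACHE_PATH` constant, the relative path resolution, and the choice to write the IDs sorted are all assumptions made for illustration.

```python
from __future__ import annotations

import json
from pathlib import Path

# Assumption: the path is resolved relative to the working directory.
CACHE_PATH = Path("data/cache.json")


def _load() -> set[int]:
    """Read the JSON array from disk; a missing file means an empty cache."""
    if not CACHE_PATH.exists():
        return set()
    with CACHE_PATH.open("r", encoding="utf-8") as fh:
        return set(json.load(fh))


def _save(ids: set[int]) -> None:
    """Write the IDs back as a JSON array of integers (sorted here for stable output)."""
    CACHE_PATH.parent.mkdir(parents=True, exist_ok=True)
    with CACHE_PATH.open("w", encoding="utf-8") as fh:
        json.dump(sorted(ids), fh)


def is_seen(file_id: int) -> bool:
    """Reload the cache on every call and test membership."""
    return file_id in _load()


def mark_seen(file_id: int) -> None:
    """Load, add, save. The three steps are not atomic, matching the note above."""
    ids = _load()
    ids.add(file_id)
    _save(ids)
```

With this shape, calling `mark_seen(123456789)` twice leaves a single entry in the file, and deleting `data/cache.json` makes every subsequent `is_seen` call return `False` until IDs are marked again.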