stealergram/utils/cache.md
anti 741e6bb0d3 Rename to stealergram, add pyproject.toml, purge em-dashes
- Rename project to stealergram throughout
- Add pyproject.toml (replaces requirements.txt split, folds pytest.ini)
- Replace all em-dashes with hyphens across all source files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 10:06:30 -04:00


utils/cache.py

Tracks already-processed Telegram document IDs to avoid redownloading.
Persists to data/cache.json as a JSON array of integers.

Public API

from utils.cache import is_seen, mark_seen

is_seen(file_id: int) -> bool

Returns True if this document ID has been processed before.
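Loads the cache from disk on every call, so a check always reflects the current contents of the file, including entries written by another process. This is slightly slow for hot loops, which is not an issue given the download cadence. Writes are still a non-atomic load/modify/save.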

mark_seen(file_id: int) -> None

Adds file_id to the cache and persists to disk.
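
The behaviour described for the two functions can be sketched as follows. This is a minimal illustration, not the module's actual source: the helpers _load and _save, the CACHE_PATH name, and the duplicate check in mark_seen are assumptions, and a temporary directory stands in for data/cache.json so the sketch has no side effects.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path

# Assumption: the real module points at data/cache.json; a temp dir is used here.
CACHE_PATH = Path(tempfile.mkdtemp()) / "cache.json"


def _load() -> list[int]:
    """Read the JSON array from disk; a missing file is treated as an empty cache."""
    if not CACHE_PATH.exists():
        return []
    return json.loads(CACHE_PATH.read_text(encoding="utf-8"))


def _save(ids: list[int]) -> None:
    """Write the whole list back as a JSON array of integers."""
    CACHE_PATH.parent.mkdir(parents=True, exist_ok=True)
    CACHE_PATH.write_text(json.dumps(ids), encoding="utf-8")


def is_seen(file_id: int) -> bool:
    # Reload on every call so the answer reflects the file's current contents.
    return file_id in _load()


def mark_seen(file_id: int) -> None:
    ids = _load()
    if file_id not in ids:  # hypothetical guard against duplicate entries
        ids.append(file_id)
        _save(ids)
```

A membership test on a list is linear in the number of stored IDs; if the cache ever grew large enough for that to matter, converting the loaded list to a set before checking would be one option.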


Storage

  • File: data/cache.json
  • Format: JSON array of integers - [123456789, 987654321, ...]
  • No expiry - entries are never removed, so the file grows indefinitely. Deleting the file is safe and causes all files to be re-processed.
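
To make the storage format concrete, here is a small sketch that writes a two-element cache, reads it back, and then deletes the file. The path is a temporary stand-in for data/cache.json, and treating a missing file as an empty list is an assumption about how the loader behaves, inferred from the note that the file is safe to delete.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for data/cache.json so the example does not touch real data.
cache_file = Path(tempfile.mkdtemp()) / "cache.json"

# The on-disk format is a plain JSON array of integers.
cache_file.write_text(json.dumps([123456789, 987654321]), encoding="utf-8")
raw = cache_file.read_text(encoding="utf-8")
ids = json.loads(raw)

# Deleting the file resets the cache; a loader that treats a missing
# file as empty (assumed here) will then report every ID as unseen.
cache_file.unlink()
if cache_file.exists():
    ids_after_delete = json.loads(cache_file.read_text(encoding="utf-8"))
else:
    ids_after_delete = []
```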

Notes

  • is_seen and mark_seen are called from core/scraper.py. A file is marked as seen only after a successful download+process cycle, not before - so a file that fails mid-process will be retried on the next run.
  • Not thread-safe (load/modify/save is not atomic). Acceptable because downloads are sequential within the bot loop.
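
The retry behaviour in the first note follows from the ordering of the calls. The sketch below illustrates that ordering under stated assumptions: handle_document and the flaky callback are hypothetical names, the check is assumed to happen before the download, and an in-memory set stands in for is_seen/mark_seen so the example is self-contained. It is not taken from core/scraper.py.

```python
from typing import Callable, List

seen = set()  # in-memory stand-in for the on-disk cache


def handle_document(file_id: int, download_and_process: Callable[[int], None]) -> str:
    """Process one document, marking it seen only after success."""
    if file_id in seen:  # plays the role of is_seen(file_id)
        return "skipped"
    try:
        download_and_process(file_id)
    except Exception:
        # Not marked as seen, so the next run will try this file again.
        return "failed"
    seen.add(file_id)  # plays the role of mark_seen(file_id)
    return "processed"


attempts: List[int] = []


def flaky(file_id: int) -> None:
    """Hypothetical worker that fails on its first call only."""
    attempts.append(file_id)
    if len(attempts) == 1:
        raise RuntimeError("simulated mid-process failure")


# Three consecutive runs over the same document ID.
results = [handle_document(42, flaky) for _ in range(3)]
```

The first run fails and leaves the cache untouched, the second succeeds and marks the ID, and the third is skipped. Marking before processing would instead turn the first failure into a permanent skip.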