Rename to stealergram, add pyproject.toml, purge em-dashes

- Rename project to stealergram throughout
- Add pyproject.toml (replaces requirements.txt split, folds pytest.ini)
- Replace all em-dashes with hyphens across all source files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -11,7 +11,7 @@ from utils.cache import is_seen, mark_seen
### `is_seen(file_id: int) -> bool`
Returns `True` if this document ID has been processed before.
Loads from disk on every call (safe for multi-process, slightly slow for hot loops - not an issue given download cadence).
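Below is a minimal sketch of how `is_seen` could behave as described. It assumes the cache lives at `data/cache.json` (per the Storage section) and that a missing file simply means nothing has been processed yet. `CACHE_PATH` and `_load` are hypothetical names, not taken from `utils/cache.py`.

```python
import json
from pathlib import Path

# Hypothetical constant; the path comes from the Storage section of this page.
CACHE_PATH = Path("data/cache.json")


def _load() -> set[int]:
    """Hypothetical helper: read the whole cache from disk on every call."""
    if not CACHE_PATH.exists():
        # Assumption: no file yet means no IDs have been processed.
        return set()
    with CACHE_PATH.open("r", encoding="utf-8") as fh:
        return set(json.load(fh))


def is_seen(file_id: int) -> bool:
    # No in-memory copy is kept, so each call reflects whatever was last
    # written to disk, including writes made by another process.
    return file_id in _load()
```

Re-reading the file on every call trades a little speed for freshness. That cost only shows up in tight loops, which matches the note above.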
### `mark_seen(file_id: int) -> None`
Adds `file_id` to the cache and persists to disk.
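Here is a matching sketch of `mark_seen` as a load/modify/save cycle. `CACHE_PATH`, `_load`, and `_save` are hypothetical names. Writing the IDs in sorted order is an illustrative choice the page does not specify. Because the IDs are held in a set, marking the same ID twice still leaves a single entry in the file.

```python
import json
from pathlib import Path

# Hypothetical constant; the path comes from the Storage section of this page.
CACHE_PATH = Path("data/cache.json")


def _load() -> set[int]:
    """Hypothetical helper: current cache contents, or empty if the file is missing."""
    if not CACHE_PATH.exists():
        return set()
    with CACHE_PATH.open("r", encoding="utf-8") as fh:
        return set(json.load(fh))


def _save(ids: set[int]) -> None:
    """Hypothetical helper: rewrite the whole file as a JSON array of integers."""
    CACHE_PATH.parent.mkdir(parents=True, exist_ok=True)
    with CACHE_PATH.open("w", encoding="utf-8") as fh:
        json.dump(sorted(ids), fh)  # sorted only to keep the file stable between runs


def mark_seen(file_id: int) -> None:
    # Load, add, save. These three steps are not atomic, which is why the
    # Notes section describes the cache as not thread-safe.
    ids = _load()
    ids.add(file_id)
    _save(ids)
```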
@@ -21,12 +21,12 @@ Adds `file_id` to the cache and persists to disk.
## Storage
- **File:** `data/cache.json`
- **Format:** JSON array of integers - `[123456789, 987654321, ...]`
- **No expiry** - grows indefinitely. Safe to delete to re-process all files.
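The sketch below makes the on-disk shape concrete. It serializes and parses the documented format, and it implements the "safe to delete" reset by removing the file. `dump_ids`, `parse_ids`, `reset_cache`, and `CACHE_PATH` are hypothetical. The type check in `parse_ids` is an added safeguard the page does not describe.

```python
import json
from pathlib import Path

# Hypothetical constant; the path comes from the Storage section of this page.
CACHE_PATH = Path("data/cache.json")


def dump_ids(ids: set[int]) -> str:
    """Serialize to a flat JSON array of integers, e.g. [123456789, 987654321]."""
    return json.dumps(sorted(ids))


def parse_ids(text: str) -> set[int]:
    """Parse file contents, rejecting anything that is not an array of integers."""
    data = json.loads(text)
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if not isinstance(data, list) or not all(
        isinstance(x, int) and not isinstance(x, bool) for x in data
    ):
        raise ValueError("cache file must be a JSON array of integers")
    return set(data)


def reset_cache() -> None:
    """There is no expiry, so deleting the file makes every file eligible again."""
    CACHE_PATH.unlink(missing_ok=True)
```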
---
## Notes
- `is_seen` + `mark_seen` are called in `core/scraper.py` after a successful download+process cycle, not before - so a file that fails mid-process will be retried on next run.
- Not thread-safe (load/modify/save is not atomic). Acceptable because downloads are sequential within the bot loop.
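The ordering in the first note can be sketched as a sequential loop. This is not the code in `core/scraper.py`. `process_new_files` and `download_and_process` are hypothetical, and the cache functions are passed in so the loop can run against an in-memory set. The sketch also assumes two things: the skip check happens before downloading, and a failed file is skipped rather than aborting the run. The part taken from the note is that `mark_seen` runs only after the whole cycle succeeds.

```python
from typing import Callable, Iterable


def process_new_files(
    file_ids: Iterable[int],
    download_and_process: Callable[[int], None],
    is_seen: Callable[[int], bool],
    mark_seen: Callable[[int], None],
) -> list[int]:
    """Hypothetical loop: returns the IDs that completed in this run."""
    completed: list[int] = []
    for file_id in file_ids:  # one file at a time, so cache writes never overlap
        if is_seen(file_id):
            continue
        try:
            download_and_process(file_id)
        except Exception:
            # Assumption: a failure skips this file. It is never marked,
            # so the next run will try it again.
            continue
        mark_seen(file_id)  # only reached after download+process succeeded
        completed.append(file_id)
    return completed
```

On a second run, the failed ID is still absent from the cache, so it is attempted again. That is the retry behaviour the note describes.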