Initial commit: ULPgrammer

- Core Telegram monitoring pipeline (scraper, processor, notifier, downloaders)
- Textual TUI frontend with thread-safe event bus
- SQLite persistence, severity scoring, dedup cache
- Fixed ULP parser: handles https:// truncation, port+path URLs, semicolon separator
- Test suite: 88 tests across scorer, cache, database, processor
This commit is contained in:
2026-04-02 01:58:49 -03:00
commit 48f486ac97
41 changed files with 5270 additions and 0 deletions

32
utils/cache.md Normal file
View File

@@ -0,0 +1,32 @@
# utils/cache.py
Tracks already-processed Telegram document IDs to avoid redownloading.
Persists to `data/cache.json` as a JSON array of integers.
## Public API
```python
from utils.cache import is_seen, mark_seen
```
### `is_seen(file_id: int) -> bool`
Returns `True` if this document ID has been processed before.
Loads from disk on every call (safe for multi-process, slightly slow for hot loops — not an issue given download cadence).
### `mark_seen(file_id: int) -> None`
Adds `file_id` to the cache and persists to disk.
---
## Storage
- **File:** `data/cache.json`
- **Format:** JSON array of integers — `[123456789, 987654321, ...]`
- **No expiry** — grows indefinitely. Safe to delete to re-process all files.
---
## Notes
- `is_seen` + `mark_seen` are called in `core/scraper.py` after a successful download+process cycle, not before — so a file that fails mid-process will be retried on next run.
- Not thread-safe (load/modify/save is not atomic). Acceptable because downloads are sequential within the bot loop.