Initial commit: ULPgrammer

- Core Telegram monitoring pipeline (scraper, processor, notifier, downloaders)
- Textual TUI frontend with thread-safe event bus
- SQLite persistence, severity scoring, dedup cache
- Fixed ULP parser: handles https:// truncation, port+path URLs, semicolon separator
- Test suite: 88 tests across scorer, cache, database, processor
2026-04-02 01:58:49 -03:00
commit 48f486ac97
41 changed files with 5270 additions and 0 deletions

core/tdl_downloader.md Normal file

@@ -0,0 +1,70 @@
# core/tdl_downloader.py
Fast file downloads via `tdl` (Go MTProto). Falls back gracefully if tdl is not installed.
## Public API
```python
from core.tdl_downloader import (
    is_tdl_available,
    download_single_with_tdl,
    download_batch_with_tdl,
    BatchEntry,
)
```
### `is_tdl_available() -> bool`
Returns `True` if the `tdl` binary is on `PATH`.
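
A PATH check of this kind is commonly written with `shutil.which`. The following is a minimal sketch under that assumption, not necessarily how the module implements it; the `binary` parameter is hypothetical and exists only to make the function easy to test.

```python
import shutil


def is_tdl_available(binary: str = "tdl") -> bool:
    """Return True if `binary` resolves to an executable on PATH.

    Assumption: the real function takes no arguments; `binary` is a
    hypothetical parameter added for illustration.
    """
    # shutil.which returns the executable's full path, or None if not found.
    return shutil.which(binary) is not None
```
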
### `download_single_with_tdl(msg, dest: Path) -> bool`
**async.** Downloads one message's document. Returns `True` on success.
Used by the live handler and `bot_downloader`.
### `download_batch_with_tdl(entries: list[BatchEntry]) -> dict[int, bool]`
**async.** Downloads up to `TDL_AMOUNT` messages in a single `tdl dl` invocation.
Returns `{doc_id: True|False}`; `False` means a Telethon fallback is needed for that document.
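
Because a batch is capped at `TDL_AMOUNT` messages and the result maps each `doc_id` to a success flag, a caller would plausibly split its work into batches and then collect the failed ids for the Telethon fallback. The sketch below illustrates only that bookkeeping; `chunked` and `needs_fallback` are hypothetical helper names, not part of the module's public API.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

TDL_AMOUNT = 4  # default taken from the config table; the real value lives in config.py


def chunked(entries: Sequence[T], size: int = TDL_AMOUNT) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` entries (hypothetical helper)."""
    for start in range(0, len(entries), size):
        yield entries[start:start + size]


def needs_fallback(results: dict[int, bool]) -> list[int]:
    """Return the doc_ids whose tdl download failed (hypothetical helper)."""
    return [doc_id for doc_id, ok in results.items() if not ok]
```

Each slice from `chunked` would be passed to `download_batch_with_tdl`, and the ids returned by `needs_fallback` would be retried through Telethon.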
---
## BatchEntry dataclass
```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class BatchEntry:
    msg: object           # Telethon Message
    filename: str
    dest: Path            # final destination path in TEMP_DIR
    doc_id: int           # msg.media.document.id
    source_name: str
    password: str | None
```
---
## TUI output pipeline
In TUI mode (`bus.tui_active == True`), `_run_tdl` pipes stdout+stderr and relays lines as `EvTdlOutput` events in real time.
It **reads raw 256-byte chunks** (not line-by-line) and splits on `\r` and `\n`, because tdl uses `\r` to overwrite its progress bar in place.
In CLI mode, the subprocess inherits the terminal, so progress bars render natively.
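
The chunked read described above can be sketched as follows. This is an illustration under stated assumptions, not the module's `_run_tdl`: `split_progress` and `relay_lines` are hypothetical names, and the caller is assumed to wrap each yielded line in an `EvTdlOutput` event.

```python
import asyncio
import re
from typing import AsyncIterator

# One or more carriage returns or newlines end a "line" of tdl output.
_LINE_BREAK = re.compile(rb"[\r\n]+")


def split_progress(buffer: bytes, chunk: bytes) -> tuple[list[str], bytes]:
    """Append `chunk` to `buffer`; return complete lines and the unterminated tail."""
    parts = _LINE_BREAK.split(buffer + chunk)
    remainder = parts.pop()  # bytes after the last \r or \n (may be empty)
    lines = [p.decode("utf-8", errors="replace") for p in parts if p]
    return lines, remainder


async def relay_lines(
    stream: asyncio.StreamReader, chunk_size: int = 256
) -> AsyncIterator[str]:
    """Yield lines from `stream` as soon as a \\r or \\n terminates them."""
    buffer = b""
    while True:
        chunk = await stream.read(chunk_size)
        if not chunk:  # EOF
            break
        lines, buffer = split_progress(buffer, chunk)
        for line in lines:
            yield line
    if buffer:
        # Flush whatever was left without a trailing separator.
        yield buffer.decode("utf-8", errors="replace")
```

Reading fixed-size chunks rather than calling `readline()` matters here: `readline()` waits for `\n`, so a progress bar that is redrawn only with `\r` would not reach the TUI until the download finished.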
---
## Staging directory isolation
Each batch/single download gets a unique `data/tmp/_tdl_{monotonic_ns}/` staging dir.
After `tdl` exits, files are matched by name (with fuzzy stem fallback for `filenamify()` mangling) and moved to final `dest`. Staging dir is removed regardless of outcome.
The `--template '{{ filenamify .FileName }}'` option makes tdl use the original Telegram filename instead of its default `DialogID_MessageID_filename` format.
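
A minimal sketch of the staging flow for a single file, assuming a simple normalization for the fuzzy stem match. The helper names (`make_staging_dir`, `find_staged_file`, `collect`) and the normalization rule are hypothetical; the module's actual matching logic may differ.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def make_staging_dir(tmp_root: Path) -> Path:
    """Create a unique staging directory named _tdl_{monotonic_ns} under tmp_root."""
    staging = tmp_root / f"_tdl_{time.monotonic_ns()}"
    staging.mkdir(parents=True, exist_ok=False)
    return staging


def _normalize(stem: str) -> str:
    # Assumed fuzzy rule: compare lowercased alphanumerics only, so characters
    # that filenamify() replaced or dropped do not prevent a match.
    return "".join(ch for ch in stem.lower() if ch.isalnum())


def find_staged_file(staging: Path, filename: str) -> Optional[Path]:
    """Return the staged file for `filename`: exact name first, then fuzzy stem."""
    exact = staging / filename
    if exact.is_file():
        return exact
    wanted = _normalize(Path(filename).stem)
    for candidate in staging.iterdir():
        if candidate.is_file() and _normalize(candidate.stem) == wanted:
            return candidate
    return None


def collect(staging: Path, filename: str, dest: Path) -> bool:
    """Move the staged file to `dest`; always remove the staging directory."""
    try:
        found = find_staged_file(staging, filename)
        if found is None:
            return False
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(found), str(dest))
        return True
    finally:
        # Cleanup runs whether or not a file was found or the move succeeded.
        shutil.rmtree(staging, ignore_errors=True)
```

For a batch, the same lookup would run once per entry before the single cleanup step.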
---
## Config knobs (`config.py`)
| Setting | Default | Description |
|---------|---------|-------------|
| `TDL_NAMESPACE` | `"default"` | `-n` flag; `None` omits it |
| `TDL_THREADS` | `8` | `-t` chunk workers per file |
| `TDL_PERFILE` | `4` | `-l` concurrent files per invocation |
| `TDL_AMOUNT` | `4` | Max messages per batch |
| `TDL_TAKEOUT` | `False` | `--takeout` session flag |
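
To show how these settings could map onto a command line, here is a hedged sketch that assembles only the flags named in this document. `build_tdl_options` is a hypothetical helper, the ordering of the flags is an assumption, and the message URLs and output directory arguments are deliberately left out.

```python
from typing import Optional


def build_tdl_options(
    namespace: Optional[str] = "default",  # TDL_NAMESPACE
    threads: int = 8,                      # TDL_THREADS
    perfile: int = 4,                      # TDL_PERFILE
    takeout: bool = False,                 # TDL_TAKEOUT
) -> list[str]:
    """Build the option part of a `tdl dl` argv (hypothetical helper)."""
    args = ["tdl"]
    if namespace is not None:
        # A namespace of None omits the -n flag entirely.
        args += ["-n", namespace]
    args += ["dl", "-t", str(threads), "-l", str(perfile)]
    if takeout:
        args.append("--takeout")
    # Keep original Telegram filenames, as described in the staging section.
    args += ["--template", "{{ filenamify .FileName }}"]
    return args
```

Passing the list directly to a subprocess call (rather than a shell string) avoids having to quote the template's braces and spaces.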