# core/scraper.py

Telethon user-client layer. Handles live listening, backfill, and the single-message download pipeline.

## Public API

```python
from core.scraper import handle_message, backfill_all, register_handlers, warm_entity_cache
```

### `handle_message(client, bot, msg, source_name, patterns, password=None)`

**async.** Full pipeline for one document message:

1. Extract filename + size, check allowlist + size guard
2. Check `utils.cache` - skip if already seen
3. Try `tdl` download → Telethon fallback
4. `core.processor.process_file()` → hits
5. `core.notifier.notify()` if hits found
6. `utils.cache.mark_seen()`

Called by: live handler, `bot_downloader`, backfill fallback path.

### `backfill_all(client, bot, patterns)`

**async.** Iterates `config.WATCHED_CHANNELS`, calls `backfill_channel()` for each. No-op if `config.BACKFILL_LIMIT == 0`.

### `register_handlers(client, bot, patterns)`

Registers a `NewMessage` Telethon event handler on `config.WATCHED_CHANNELS`. Used in **CLI mode only** (`--no-tui`). The TUI manages its own handler via `_make_handler()` in `tui/app.py`.

### `warm_entity_cache(client)`

**async.** Iterates `client.iter_dialogs()` so Telethon caches entity mappings. Must be called before using raw numeric channel IDs.

---

## Internal functions

| Function | Description |
|----------|-------------|
| `get_filename(msg)` | Extracts filename from `MessageMediaDocument`; falls back to `{msg_id}{ext}` from MIME |
| `get_filesize(msg)` | Returns document size in bytes |
| `is_processable(filename, size)` | Checks extension allowlist + size limit; returns `(bool, reason)` |
| `_make_dest(msg, filename)` | Resolves temp path, handles collision with `{msg_id}_{filename}` |
| `_telethon_download(client, msg, dest, ...)` | Telethon fallback with tqdm progress + flood-wait handling. Posts `EvDownload*` bus events |
| `backfill_channel(client, bot, channel, patterns, limit)` | Scans history with password carry-forward; batches via tdl |
| `_process_batch(client, bot, batch, patterns)` | One tdl invocation for up to `TDL_AMOUNT` messages; per-file Telethon fallback |

---

## Password carry-forward (backfill)

Channels often post the archive password as a separate text message. `backfill_channel` iterates newest→oldest, carrying `last_password` so both older and newer file messages in the same scan pick it up.

---

## Download strategy

```
is_tdl_available()?
  yes → download_single_with_tdl() / download_batch_with_tdl()
          ↓ failed?
        _telethon_download()
  no  → _telethon_download() directly
```
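The decision tree above can be sketched as a small async helper. This is a minimal illustration, not the module's actual code: `download_with_fallback`, the `Downloader` type, and the `fake_*` functions are hypothetical names, and the real signatures of `download_single_with_tdl()` and `_telethon_download()` are not shown in this document. The sketch also assumes that any exception raised by the tdl path counts as "failed".

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable

# Hypothetical shape: a downloader takes a destination path
# and returns True on success.
Downloader = Callable[[Path], Awaitable[bool]]


async def download_with_fallback(
    dest: Path,
    tdl_available: bool,
    tdl_download: Downloader,
    telethon_download: Downloader,
) -> str:
    """Return which backend produced the file: "tdl" or "telethon".

    Raises RuntimeError if every path fails.
    """
    if tdl_available:
        try:
            if await tdl_download(dest):
                return "tdl"
        except Exception:
            # Assumption: any tdl error means "failed", so fall through.
            pass
    # Reached when tdl is unavailable or its attempt failed.
    if await telethon_download(dest):
        return "telethon"
    raise RuntimeError(f"all download paths failed for {dest}")


# Stand-ins so the sketch runs without Telegram access or a tdl binary.
async def fake_tdl_ok(dest: Path) -> bool:
    return True


async def fake_tdl_broken(dest: Path) -> bool:
    raise OSError("tdl exited non-zero")


async def fake_telethon(dest: Path) -> bool:
    return True


async def demo() -> list:
    dest = Path("example.zip")  # never written; the stand-ins do no I/O
    return [
        # tdl available and working
        await download_with_fallback(dest, True, fake_tdl_ok, fake_telethon),
        # tdl available but failing, so Telethon takes over
        await download_with_fallback(dest, True, fake_tdl_broken, fake_telethon),
        # tdl not installed, so Telethon is used directly
        await download_with_fallback(dest, False, fake_tdl_ok, fake_telethon),
    ]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Passing the two downloaders in as callables keeps the branching logic testable without a network connection; in the real module the equivalent check would presumably call `is_tdl_available()` rather than take a boolean.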