utils/database.py
SQLite persistence layer for credential hits.
DB file: data/hits.db
Public API
from utils.database import init_db, insert_hits, search, recent, by_severity, stats
Setup
init_db() -> None
Creates hits table and indexes if they don't exist. Call once on startup.
Safe to call multiple times (idempotent).
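The idempotency described above is typically achieved with IF NOT EXISTS clauses. The following is a minimal sketch, not the module's actual code: the DDL is reconstructed from the Schema section, and the function name init_db_sketch and the index names (idx_hits_*) are assumptions made for illustration.

```python
import sqlite3

# Hypothetical DDL reconstructed from the Schema section of this document.
# Index names are invented for the example; the real module may use others.
SCHEMA = """
CREATE TABLE IF NOT EXISTS hits (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    url         TEXT,
    username    TEXT,
    password    TEXT,
    raw         TEXT NOT NULL,
    source      TEXT,
    filename    TEXT,
    timestamp   TEXT NOT NULL,
    severity    TEXT NOT NULL,
    score       INTEGER NOT NULL,
    reasons     TEXT,
    seen_before INTEGER NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_hits_url       ON hits(url);
CREATE INDEX IF NOT EXISTS idx_hits_username  ON hits(username);
CREATE INDEX IF NOT EXISTS idx_hits_source    ON hits(source);
CREATE INDEX IF NOT EXISTS idx_hits_timestamp ON hits(timestamp);
CREATE INDEX IF NOT EXISTS idx_hits_severity  ON hits(severity);
"""


def init_db_sketch(conn: sqlite3.Connection) -> None:
    # IF NOT EXISTS makes every statement a no-op when the object is present.
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
init_db_sketch(conn)
init_db_sketch(conn)  # second call succeeds and changes nothing

index_names = {
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND name LIKE 'idx_hits_%'"
    )
}
```

Calling the function twice without an error is the property that makes it safe to run on every startup.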
Writing
insert_hits(scored_hits, source, filename, seen_before=False) -> int
Inserts a list of ScoredHit objects and returns the number of rows inserted. Pass seen_before=True to flag the whole batch as duplicates.
insert_hits(new_hits, source="channelname", filename="combo.zip")
insert_hits(dupe_hits, source="channelname", filename="combo.zip", seen_before=True)
Querying
search(keyword: str) -> list[sqlite3.Row]
Full-text search across url, username, raw. Returns rows sorted by score DESC, timestamp DESC.
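To illustrate the matching and ordering described above, here is a rough sketch that uses a plain LIKE substring match. That is an assumption: the document does not say whether the module uses LIKE or an FTS index. The name search_sketch and the sample rows are hypothetical.

```python
import sqlite3


def search_sketch(conn: sqlite3.Connection, keyword: str) -> list:
    # Assumption: substring match via LIKE; the real search() may differ.
    pattern = f"%{keyword}%"
    return conn.execute(
        "SELECT * FROM hits "
        "WHERE url LIKE ? OR username LIKE ? OR raw LIKE ? "
        "ORDER BY score DESC, timestamp DESC",
        (pattern, pattern, pattern),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
# Reduced table with only the columns this example needs.
conn.execute(
    "CREATE TABLE hits (url TEXT, username TEXT, raw TEXT, score INTEGER, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?, ?, ?)",
    [
        ("https://vpn.example.com", "alice", "https://vpn.example.com:alice:pw",
         40, "2024-01-01 00:00:00 UTC"),
        ("https://shop.example.org", "bob", "https://shop.example.org:bob:pw",
         10, "2024-01-02 00:00:00 UTC"),
        ("https://mail.example.com", "carol", "https://mail.example.com:carol:pw",
         40, "2024-01-03 00:00:00 UTC"),
    ],
)

rows = search_sketch(conn, "example.com")
usernames = [row["username"] for row in rows]
```

Both matching rows have the same score, so the newer timestamp decides the order. The keyword is passed as a bound parameter, never interpolated into the SQL string.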
recent(limit: int = 50) -> list[sqlite3.Row]
Most recent hits, newest first.
by_severity(severity: str) -> list[sqlite3.Row]
All unique (non-duplicate) hits at a given severity, newest first.
severity must be one of: "CRITICAL", "HIGH", "MEDIUM", "LOW"
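A possible shape for this query is sketched below. Raising ValueError on an unknown label is an assumption, since the document only lists the allowed values; by_severity_sketch and the sample data are hypothetical.

```python
import sqlite3

VALID_SEVERITIES = ("CRITICAL", "HIGH", "MEDIUM", "LOW")


def by_severity_sketch(conn: sqlite3.Connection, severity: str) -> list:
    # Assumption: reject unknown labels early rather than return an empty list.
    if severity not in VALID_SEVERITIES:
        raise ValueError(f"severity must be one of {VALID_SEVERITIES}, got {severity!r}")
    return conn.execute(
        "SELECT * FROM hits "
        "WHERE severity = ? AND seen_before = 0 "
        "ORDER BY timestamp DESC",
        (severity,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
# Reduced table with only the columns this example needs.
conn.execute(
    "CREATE TABLE hits (raw TEXT, timestamp TEXT, severity TEXT, seen_before INTEGER)"
)
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?, ?)",
    [
        ("a", "2024-01-01 10:00:00 UTC", "HIGH", 0),
        ("b", "2024-01-02 10:00:00 UTC", "HIGH", 1),  # duplicate, filtered out
        ("c", "2024-01-03 10:00:00 UTC", "HIGH", 0),
        ("d", "2024-01-04 10:00:00 UTC", "LOW", 0),
    ],
)

high_rows = by_severity_sketch(conn, "HIGH")
high_raw = [row["raw"] for row in high_rows]
```

Sorting the timestamp column as text works here because the "YYYY-MM-DD HH:MM:SS UTC" format orders lexicographically in the same way as chronologically.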
stats() -> dict
Returns summary counters:
{
"total": int, # all rows
"unique": int, # seen_before=0
"duplicates": int, # seen_before=1
"critical": int, # unique CRITICAL
"high": int,
"medium": int,
"low": int,
"sources": int, # distinct source channels
"top_source": {"source": str, "cnt": int} | None,
}
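The counters above can be derived with a few aggregate queries. This sketch assumes that "high", "medium" and "low" count unique rows only, as "critical" does, and that top_source is the channel with the most rows overall. Neither detail is confirmed by the source, and stats_sketch is a hypothetical name.

```python
import sqlite3


def stats_sketch(conn: sqlite3.Connection) -> dict:
    # Expects conn.row_factory = sqlite3.Row so columns can be read by name.
    totals = conn.execute(
        """
        SELECT COUNT(*)                          AS total,
               COALESCE(SUM(seen_before = 0), 0) AS uniq,
               COALESCE(SUM(seen_before = 1), 0) AS duplicates,
               COUNT(DISTINCT source)            AS sources
        FROM hits
        """
    ).fetchone()
    result = {
        "total": totals["total"],
        "unique": totals["uniq"],
        "duplicates": totals["duplicates"],
    }
    # Assumption: every severity counter ignores duplicates.
    for level in ("CRITICAL", "HIGH", "MEDIUM", "LOW"):
        result[level.lower()] = conn.execute(
            "SELECT COUNT(*) FROM hits WHERE severity = ? AND seen_before = 0",
            (level,),
        ).fetchone()[0]
    result["sources"] = totals["sources"]
    top = conn.execute(
        "SELECT source, COUNT(*) AS cnt FROM hits "
        "GROUP BY source ORDER BY cnt DESC LIMIT 1"
    ).fetchone()
    result["top_source"] = dict(top) if top else None  # None on an empty table
    return result


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
# Reduced table with only the columns this example needs.
conn.execute("CREATE TABLE hits (source TEXT, severity TEXT, seen_before INTEGER)")
empty_summary = stats_sketch(conn)
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?)",
    [("chanA", "CRITICAL", 0), ("chanA", "HIGH", 1), ("chanB", "LOW", 0)],
)
summary = stats_sketch(conn)
```

The COALESCE calls keep the counters at 0 rather than NULL when the table is empty.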
Schema
hits (
id INTEGER PRIMARY KEY AUTOINCREMENT,
url TEXT,
username TEXT,
password TEXT,
raw TEXT NOT NULL, -- full original credential line
source TEXT, -- channel username or ID
filename TEXT, -- downloaded file name
timestamp TEXT NOT NULL, -- "YYYY-MM-DD HH:MM:SS UTC"
severity TEXT NOT NULL, -- CRITICAL/HIGH/MEDIUM/LOW
score INTEGER NOT NULL, -- 40/30/20/10
reasons TEXT, -- pipe-separated reason strings
seen_before INTEGER NOT NULL -- 0=new, 1=duplicate
)
Indexes: url, username, source, timestamp, severity.
Notes
- Each query opens and closes its own connection via the _connect() context manager.
- conn.row_factory = sqlite3.Row: rows support both index and column-name access.
- Transactions: commit on success, rollback on exception.
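The three notes above describe a common pattern, sketched here. The real _connect() is not shown in this document, so connect_sketch, its path parameter and the demo table t are assumptions made for illustration.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager


@contextmanager
def connect_sketch(path: str):
    # One short-lived connection per call, closed in every case.
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # rows readable by index or column name
    try:
        yield conn
        conn.commit()    # reached only if the with-body did not raise
    except Exception:
        conn.rollback()  # discard partial writes, then re-raise
        raise
    finally:
        conn.close()


# A temporary file is used so that data persists across separate connections.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

with connect_sketch(db_path) as conn:
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")

try:
    with connect_sketch(db_path) as conn:
        conn.execute("INSERT INTO t VALUES (2)")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass  # the second insert was rolled back

with connect_sketch(db_path) as conn:
    row = conn.execute("SELECT COUNT(*) AS n FROM t").fetchone()
    count_by_name, count_by_index = row["n"], row[0]
```

Opening a connection per query costs a little per call, but no connection object is shared between threads, which fits a TUI frontend and a background pipeline using the same database.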