Initial commit: ULPgrammer
- Core Telegram monitoring pipeline (scraper, processor, notifier, downloaders)
- Textual TUI frontend with thread-safe event bus
- SQLite persistence, severity scoring, dedup cache
- Fixed ULP parser: handles https:// truncation, port+path URLs, semicolon separator
- Test suite: 88 tests across scorer, cache, database, processor
89
utils/database.md
Normal file
@@ -0,0 +1,89 @@
# utils/database.py

SQLite persistence layer for credential hits.
DB file: `data/hits.db`

## Public API

```python
from utils.database import init_db, insert_hits, search, recent, by_severity, stats
```

### Setup

#### `init_db() -> None`
Creates the `hits` table and its indexes if they don't exist. Call once on startup.
Safe to call multiple times (idempotent).
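The idempotent behaviour described above can be sketched with `IF NOT EXISTS` clauses. This is a minimal illustration, not the module's actual code: `init_db_sketch`, the trimmed-down column list, and the index names are assumptions, and an in-memory database stands in for `data/hits.db`.

```python
import sqlite3

# Hypothetical, trimmed-down schema for illustration; the real table has more columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS hits (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT,
    raw TEXT NOT NULL,
    severity TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_hits_url ON hits(url);
CREATE INDEX IF NOT EXISTS idx_hits_severity ON hits(severity);
"""


def init_db_sketch(conn: sqlite3.Connection) -> None:
    """Create the table and indexes only if they are missing."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")  # in-memory stand-in for data/hits.db
init_db_sketch(conn)
init_db_sketch(conn)  # second call changes nothing and raises no error

# List user-created tables and indexes (internal sqlite_* objects are skipped).
object_names = sorted(
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type IN ('table', 'index') AND name NOT LIKE 'sqlite_%'"
    )
)
print(object_names)
```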

---

### Writing

#### `insert_hits(scored_hits, source, filename, seen_before=False) -> int`
Inserts a list of `ScoredHit` objects. Returns the number of rows inserted.

```python
# Store newly seen hits (seen_before defaults to False)
insert_hits(new_hits, source="channelname", filename="combo.zip")
# Store hits that were already seen, flagged as duplicates
insert_hits(dupe_hits, source="channelname", filename="combo.zip", seen_before=True)
```
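A batch insert along these lines could look roughly like the sketch below. The `ScoredHit` dataclass shown here is a hypothetical stand-in (the real class and its field names are not documented in this file), `insert_hits_sketch` takes an explicit connection for the sake of a self-contained example, and the column layout follows the Schema section.

```python
import sqlite3
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ScoredHit:  # hypothetical stand-in; the real class may differ
    url: str
    username: str
    password: str
    raw: str
    severity: str
    score: int
    reasons: list[str] = field(default_factory=list)


def insert_hits_sketch(conn, scored_hits, source, filename, seen_before=False) -> int:
    """Insert all hits in one transaction and return how many rows were written."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    rows = [
        (h.url, h.username, h.password, h.raw, source, filename, ts,
         h.severity, h.score, "|".join(h.reasons), int(seen_before))
        for h in scored_hits
    ]
    with conn:  # commits on success, rolls back on exception
        conn.executemany(
            "INSERT INTO hits (url, username, password, raw, source, filename,"
            " timestamp, severity, score, reasons, seen_before)"
            " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE hits (
    id INTEGER PRIMARY KEY AUTOINCREMENT, url TEXT, username TEXT, password TEXT,
    raw TEXT NOT NULL, source TEXT, filename TEXT, timestamp TEXT NOT NULL,
    severity TEXT NOT NULL, score INTEGER NOT NULL, reasons TEXT,
    seen_before INTEGER NOT NULL)""")

hit = ScoredHit(
    "https://example.com/login", "alice", "hunter2",
    "https://example.com/login:alice:hunter2", "HIGH", 30,
    ["admin path", "corp domain"],  # made-up reason strings
)
inserted = insert_hits_sketch(conn, [hit], source="channelname", filename="combo.zip")
stored = conn.execute("SELECT reasons, seen_before FROM hits").fetchone()
```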

---

### Querying

#### `search(keyword: str) -> list[sqlite3.Row]`
Full-text search across the `url`, `username`, and `raw` columns. Returns rows sorted by `score` DESC, then `timestamp` DESC.
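One way such a search might be written is with `LIKE` substring matching, shown in the sketch below. Whether the module uses `LIKE` or an SQLite FTS table is not stated here, so the matching strategy is an assumption, as are the name `search_sketch` and the reduced table.

```python
import sqlite3


def search_sketch(conn: sqlite3.Connection, keyword: str) -> list[sqlite3.Row]:
    """Substring match on url, username and raw, best score first."""
    pattern = f"%{keyword}%"
    return conn.execute(
        "SELECT * FROM hits"
        " WHERE url LIKE ? OR username LIKE ? OR raw LIKE ?"
        " ORDER BY score DESC, timestamp DESC",
        (pattern, pattern, pattern),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE hits (url TEXT, username TEXT, raw TEXT, score INTEGER, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?, ?, ?)",
    [
        ("https://example.com", "alice", "https://example.com:alice:pw",
         10, "2024-01-02 00:00:00 UTC"),
        ("https://vpn.example.com", "bob", "https://vpn.example.com:bob:pw",
         40, "2024-01-01 00:00:00 UTC"),
        ("https://other.test", "carol", "https://other.test:carol:pw",
         30, "2024-01-03 00:00:00 UTC"),
    ],
)

matches = search_sketch(conn, "example.com")
usernames = [row["username"] for row in matches]
print(usernames)
```

The keyword is passed as a bound parameter rather than interpolated into the SQL string, which keeps user-supplied search terms from altering the query.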

#### `recent(limit: int = 50) -> list[sqlite3.Row]`
Returns the `limit` most recent hits, newest first.

#### `by_severity(severity: str) -> list[sqlite3.Row]`
Returns all unique (non-duplicate) hits at a given severity, newest first.
`severity` must be one of: `"CRITICAL"`, `"HIGH"`, `"MEDIUM"`, `"LOW"`.
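The filter described above can be sketched as follows. How the real function reacts to an invalid `severity` is not documented; raising `ValueError` is an assumption, and `by_severity_sketch` and the reduced table are illustrative only.

```python
import sqlite3

SEVERITIES = ("CRITICAL", "HIGH", "MEDIUM", "LOW")


def by_severity_sketch(conn: sqlite3.Connection, severity: str) -> list[sqlite3.Row]:
    """Return non-duplicate hits at one severity level, newest first."""
    if severity not in SEVERITIES:
        # Assumption: unknown values are rejected; the real function may differ.
        raise ValueError(f"severity must be one of {SEVERITIES}, got {severity!r}")
    return conn.execute(
        "SELECT * FROM hits WHERE severity = ? AND seen_before = 0"
        " ORDER BY timestamp DESC",
        (severity,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE hits (raw TEXT, severity TEXT, timestamp TEXT, seen_before INTEGER)"
)
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?, ?)",
    [
        ("a", "HIGH", "2024-01-01 10:00:00 UTC", 0),
        ("b", "HIGH", "2024-01-02 10:00:00 UTC", 0),
        ("c", "HIGH", "2024-01-03 10:00:00 UTC", 1),  # duplicate, excluded
        ("d", "LOW", "2024-01-04 10:00:00 UTC", 0),
    ],
)

high_raw = [row["raw"] for row in by_severity_sketch(conn, "HIGH")]
print(high_raw)
```

Because the timestamp format puts the most significant fields first, plain text ordering on `timestamp` matches chronological ordering.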

#### `stats() -> dict`
Returns summary counters:
```python
{
    "total": int,        # all rows
    "unique": int,       # seen_before=0
    "duplicates": int,   # seen_before=1
    "critical": int,     # unique CRITICAL
    "high": int,
    "medium": int,
    "low": int,
    "sources": int,      # distinct source channels
    "top_source": {"source": str, "cnt": int} | None,
}
```
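A dictionary of this shape could be assembled from a handful of `COUNT` queries, as in the sketch below. Several details are assumptions: that the `high`, `medium` and `low` counters also exclude duplicates, that `top_source` is ranked over all rows, and the name `stats_sketch` itself.

```python
import sqlite3


def stats_sketch(conn: sqlite3.Connection) -> dict:
    """Build the summary counters from simple aggregate queries."""

    def count(where: str = "1=1") -> int:
        # `where` only ever receives the constant fragments below, never user input.
        return conn.execute(f"SELECT COUNT(*) FROM hits WHERE {where}").fetchone()[0]

    top = conn.execute(
        "SELECT source, COUNT(*) AS cnt FROM hits"
        " GROUP BY source ORDER BY cnt DESC LIMIT 1"
    ).fetchone()  # None when the table is empty

    result = {
        "total": count(),
        "unique": count("seen_before = 0"),
        "duplicates": count("seen_before = 1"),
        "sources": conn.execute("SELECT COUNT(DISTINCT source) FROM hits").fetchone()[0],
        "top_source": dict(top) if top else None,
    }
    for sev in ("CRITICAL", "HIGH", "MEDIUM", "LOW"):
        # Assumption: every per-severity counter ignores duplicates.
        result[sev.lower()] = count(f"severity = '{sev}' AND seen_before = 0")
    return result


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets dict(top) use column names as keys
conn.execute("CREATE TABLE hits (source TEXT, severity TEXT, seen_before INTEGER)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?)",
    [
        ("chan_a", "CRITICAL", 0),
        ("chan_a", "HIGH", 0),
        ("chan_a", "HIGH", 1),
        ("chan_b", "LOW", 0),
    ],
)
summary = stats_sketch(conn)
print(summary)
```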

---

## Schema

```sql
CREATE TABLE IF NOT EXISTS hits (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    url         TEXT,
    username    TEXT,
    password    TEXT,
    raw         TEXT NOT NULL,     -- full original credential line
    source      TEXT,              -- channel username or ID
    filename    TEXT,              -- downloaded file name
    timestamp   TEXT NOT NULL,     -- "YYYY-MM-DD HH:MM:SS UTC"
    severity    TEXT NOT NULL,     -- CRITICAL/HIGH/MEDIUM/LOW
    score       INTEGER NOT NULL,  -- 40/30/20/10
    reasons     TEXT,              -- pipe-separated reason strings
    seen_before INTEGER NOT NULL   -- 0=new, 1=duplicate
);
```

Indexes: `url`, `username`, `source`, `timestamp`, `severity`.

---

## Notes

- Each query opens and closes its own connection via the `_connect()` context manager.
- `conn.row_factory = sqlite3.Row`, so rows support both index and column-name access.
- Transactions: commit on success, rollback on exception.
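The three notes above fit a common pattern that can be sketched with `contextlib.contextmanager`. This is an illustration rather than the module's actual `_connect()`: the name `connect_sketch`, the explicit `db_path` parameter, and the temporary database file are assumptions.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager


@contextmanager
def connect_sketch(db_path: str):
    """Open a connection, commit on success, roll back on error, always close."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # rows allow row[0] and row["column"]
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


# Temporary file as a stand-in for data/hits.db.
db_path = os.path.join(tempfile.mkdtemp(), "hits.db")

with connect_sketch(db_path) as conn:
    conn.execute("CREATE TABLE hits (raw TEXT NOT NULL, score INTEGER NOT NULL)")
    conn.execute("INSERT INTO hits VALUES (?, ?)", ("user:pass", 10))

try:
    with connect_sketch(db_path) as conn:
        conn.execute("INSERT INTO hits VALUES (?, ?)", ("other:pass", 20))
        raise RuntimeError("simulated failure")  # triggers the rollback path
except RuntimeError:
    pass

with connect_sketch(db_path) as conn:
    row = conn.execute("SELECT raw, score FROM hits").fetchone()
    total = conn.execute("SELECT COUNT(*) FROM hits").fetchone()[0]

print(total, row["raw"], row[1])
```

Only the first insert survives: the second one is rolled back when the exception propagates out of the `with` block, and each block works on its own short-lived connection.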