Some checks failed
CI / Lint (ruff) (push) Successful in 18s
CI / SAST (bandit) (push) Successful in 19s
CI / Dependency audit (pip-audit) (push) Successful in 40s
CI / Test (Standard) (3.11) (push) Successful in 2m38s
CI / Test (Standard) (3.12) (push) Successful in 2m56s
CI / Test (Live) (3.11) (push) Failing after 1m3s
CI / Test (Fuzz) (3.11) (push) Has been skipped
CI / Merge dev → testing (push) Has been skipped
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
135 lines
10 KiB
Markdown
135 lines
10 KiB
Markdown
# DECNET Web & Database Models: Architectural Deep Dive
|
|
|
|
> [!IMPORTANT]
|
|
> **DEVELOPMENT DISCLAIMER**: DECNET is currently in active development. The storage schemas and API signatures defined in `decnet/web/db/models.py` are subject to radical change as the framework's analytical capabilities and distributed features expand.
|
|
|
|
## 1. Introduction & Philosophy
|
|
|
|
The `decnet/web/db/models.py` file represents the structural backbone of the DECNET web interface and its underlying analytical engine. It serves a dual purpose that is central to the project's architecture:
|
|
|
|
1. **Unified Source of Truth**: By utilizing **SQLModel**, DECNET collapses the traditional barrier between Pydantic data validation and SQLAlchemy ORM mapping. This allows a single class definition to act as both a database table and an API data object, drastically reducing the "boilerplate" associated with traditional web-database pipelines.
|
|
2. **Analytical Scalability**: The models are designed to scale from small-scale local deployments using **SQLite** to large-scale, enterprise-ready environments backed by **MySQL**. This is achieved through clever usage of SQLAlchemy "Variants" and abstraction layers for large text blobs.
|
|
|
|
---
|
|
|
|
## 2. The Database Layer (SQLModel Entities)
|
|
|
|
These models define the physical tables within the DECNET infrastructure. Every class marked with `table=True` is interpreted by the repository layer to generate the corresponding DDL (Data Definition Language) for the target database.
|
|
|
|
### 2.1 Identity & Security: The `User` Entity
|
|
|
|
The `User` model handles dashboard access control and basic identity management.
|
|
|
|
* `uuid`: A unique string identifier. While integers are often used for IDs, DECNET uses strings to support potential future transitions to UUIDs without schema breakage.
|
|
* `username`: The primary login handle. It is both `unique` and `indexed` for rapid authentication lookups.
|
|
* `password_hash`: Stores the Argon2 or bcrypt hash. Length constraints in various routers ensure that raw passwords never exceed 72 characters, preventing "Long Password Denial of Service" attacks on various hashing algorithms.
|
|
* `role`: A simple string-based permission field (e.g., `admin`, `viewer`).
|
|
* `must_change_password`: A boolean flag used for fresh deployments or manual administrative resets, forcing the user to rotate their credentials upon their first authenticated session.
|
|
|
|
### 2.2 Intelligence & Attribution: `Attacker` and `AttackerBehavior`
|
|
|
|
These two tables form the core of DECNET's "Attacker Profiling" system. They are split into two tables to maintain "Narrow vs. Wide" performance characteristics.
|
|
|
|
#### The `Attacker` Entity (Broad Analytics)
|
|
The `Attacker` table stores the "primary" record for every unique IP discovered by the honeypot fleet.
|
|
|
|
* `ip`: The source IP address. This is the primary key and is heavily indexed.
|
|
* `first_seen` / `last_seen`: Tracking the lifecycle of an attacker's engagement with the network.
|
|
* `event_count` / `service_count` / `decky_count`: Aggregated counters used by the stats dashboard to visualize the magnitude of an engagement.
|
|
* `services` / `deckies`: JSON-serialized lists of every service and machine reached by the attacker. Using `_BIG_TEXT` here allows these lists to grow significantly during long-term campaigns.
|
|
* `traversal_path`: A string representation (e.g., `omega → epsilon → zulu`) that helps analysts visualize lateral movement attempts recorded by the correlation engine.
|
|
|
|
#### The `AttackerBehavior` Entity (Granular Analytics)
|
|
This "Wide" table stores behavioral signatures. It is separated from the main `Attacker` record so that high-frequency updates to timing stats or sniffer-derived packet signatures don't lock the primary attribution rows.
|
|
|
|
* `os_guess`: Derived from the `os_fingerprint` and `sniffer` engines, providing an estimate of the attacker's operating system based on TCP/IP stack nuances.
|
|
* `tcp_fingerprint`: A JSON blob storing the raw TCP signature (Window size, MSS, Option sequence).
|
|
* `behavior_class`: A classification (e.g., `beaconing`, `interactive`, `brute_force`) derived from log inter-arrival timing (IAT).
|
|
* `timing_stats`: Stores a JSON dictionary of mean/median/stdev for event timing, used to detect automated tooling.
|
|
|
|
### 2.3 Telemetry: `Log` and `Bounty`
|
|
|
|
These tables store the "raw" data generated by the honeypots.
|
|
|
|
* **`Log` Table**: The primary event sink. Every line from the collector ends up here.
|
|
* `event_type`: The MSGID from the RFC 5424 header (e.g., `connect`, `exploit`).
|
|
* `raw_line`: The full, un-parsed syslog string for forensic verification.
|
|
* `fields`: A JSON blob containing the structured data (SD-ELEMENTS) extracted during normalization.
|
|
* **`Bounty` Table**: Specifically for high-value events. When a service detects "Gold" (like a plain-text password or a known PoC payload), it is mirrored here for rapid analyst review.
|
|
|
|
### 2.4 System State: The `State` Entity
|
|
|
|
The `State` table acts as the orchestrator's brain. It stores the `decnet-state.json` content within the database when the system is integrated with the web layer.
|
|
|
|
* `key`: The configuration key (e.g., `global_config`, `active_deployment`).
|
|
* `value`: A `MEDIUMTEXT` JSON blob. This is potentially its largest field, storing the entire resolved configuration of every running Decky.
|
|
|
|
---
|
|
|
|
## 3. The API Layer (Pydantic DTOs)
|
|
|
|
These models define how data moves across the wire between the FastAPI backend and the frontend.
|
|
|
|
### 3.1 Authentication Pipeline
|
|
* `LoginRequest`: Validates incoming credentials before passing them to the security middleware.
|
|
* `Token`: The standard OAuth2 bearer token response, enriched with the `must_change_password` hint.
|
|
* `ChangePasswordRequest`: Ensures the old password is provided and the new one meets the project's security constraints.
|
|
|
|
### 3.2 Reporting & Pagination
|
|
DECNET uses a standardized "Envelope" pattern for broad analytical responses (`LogsResponse`, `AttackersResponse`, `BountyResponse`).
|
|
|
|
* `total`: The total count of matching records in the database (ignoring filters).
|
|
* `limit` / `offset`: The specific slice of data returned, supporting "Infinite Scroll" or traditional pagination in the UI.
|
|
* `data`: A list of dictionaries. By using `dict[str, Any]` here, the API remains flexible with SQLModel's dynamic attribute loading.
|
|
|
|
### 3.3 System Administration
|
|
* **`DeployIniRequest`**: The most critical input model. It takes `ini_content` as a validated string. By using the `IniContent` annotated type, the API rejects malformed deployments before they ever touch the fleet builder.
|
|
* **`MutateIntervalRequest`**: Uses a strict REGEX pattern (`^[1-9]\d*[mdMyY]$`) to ensure intervals like `30m` (30 minutes) or `2d` (2 days) are valid before being applied to the orchestrator.
|
|
|
|
---
|
|
|
|
## 4. Technical Foundations
|
|
|
|
### 4.1 Cross-DB Compatibility Logic
|
|
The project uses a custom variant system to handle the discrepancies between SQLite (which has simplified typing) and MySQL (which has strict size constraints).
|
|
|
|
```python
|
|
_BIG_TEXT = Text().with_variant(MEDIUMTEXT(), "mysql")
|
|
```
|
|
|
|
This abstraction ensures that fields like `Attacker.services` (which can grow to thousands of items) are stored as `MEDIUMTEXT` (16 MiB) on MySQL, whereas standard SQLAlchemy `Text` (often 64 KiB on MySQL) would silently truncate the data, leading to analytical loss.
|
|
|
|
### 4.2 High-Fidelity Normalization
|
|
Data arriving from distributed honeypots is often "dirty." The models include custom pre-validators like `_normalize_null`.
|
|
|
|
* **Null Coalescing**: Services often emit logging values as `"null"` or `"undefined"` strings. The `NullableString` type automatically converts these "noise" strings into actual Python `None` types during ingestion.
|
|
* **Timestamp Integrity**: `NullableDatetime` ensures that various ISO formats or epoch timestamps provided by different service containers are normalized into standard UTC datetime objects.
|
|
|
|
---
|
|
|
|
## 5. Integration Case Studies (Deep Analysis)
|
|
|
|
To understand how these models function, we must examine their lifecycle across the web stack.
|
|
|
|
### 5.1 The Repository Layer (`decnet/web/db/sqlmodel_repo.py`)
|
|
The repository is the primary consumer of the "Entities." It utilizes the metadata generated by SQLModel to:
|
|
1. **Generate DDL**: On startup, the repository calls `SQLModel.metadata.create_all()`. This takes every `table=True` class and translates it into `CREATE TABLE` statements tailored to the active engine (SQLite or MySQL).
|
|
2. **Translate DTOs**: When the repository fetches an `Attacker` from the DB, SQLModel automatically populates the Pydantic-style attributes, allowing the repository to return objects that are immediately serializeable by the routers.
|
|
|
|
### 5.2 The Dashboard Routers
|
|
Specific endpoints rely on these models for boundary safety:
|
|
|
|
* **`api_deploy_deckies.py`**: Uses `DeployIniRequest`. This ensures that even if a user tries to POST a massive binary file instead of an INI, the Pydantic layer (powered by `decnet.models.validate_ini_string`) will intercept and reject the request with a `422 Unprocessable Entity` error before it reaches the orchestrator.
|
|
* **`api_get_stats.py`**: Uses `StatsResponse`. This model serves as a "rollup" that aggregates data from the `Log`, `Attacker`, and `State` tables into a single unified JSON object for the dashboard's "At a Glance" view.
|
|
* **`api_get_health.py`**: Uses `HealthResponse`. This model provides a nested view of the system, where each sub-component (Engine, Collector, DB) is represented as a `ComponentHealth` object, allowing the UI to show granular "Success" or "Failure" states.
|
|
|
|
---
|
|
|
|
## 6. Futureproofing & Guidelines
|
|
|
|
As the project grows, the following habits must be maintained:
|
|
|
|
1. **Keep the Row Narrow**: Always separate behavioral data that updates frequently into auxiliary tables like `AttackerBehavior`.
|
|
2. **Use Variants**: Never use standard `String` or `Text` for JSON blobs; always use `_BIG_TEXT` to respect MySQL's storage limitations.
|
|
3. **Validate at the Boundary**: Ensure every new API request model uses Pydantic's strict typing to prevent malicious payloads from reaching the database layer.
|