added: docs; modified: .gitignore
Some checks failed
CI / Lint (ruff) (push) Successful in 18s
CI / SAST (bandit) (push) Successful in 19s
CI / Dependency audit (pip-audit) (push) Successful in 40s
CI / Test (Standard) (3.11) (push) Successful in 2m38s
CI / Test (Standard) (3.12) (push) Successful in 2m56s
CI / Test (Live) (3.11) (push) Failing after 1m3s
CI / Test (Fuzz) (3.11) (push) Has been skipped
CI / Merge dev → testing (push) Has been skipped
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped

2026-04-16 02:10:38 -04:00
parent 9b59f8672e
commit 49f3002c94
3 changed files with 288 additions and 2 deletions

.gitignore (vendored), 3 changed lines

@@ -10,7 +10,6 @@ build/
 decnet-compose.yml
 decnet-state.json
 *.ini
-.env
 decnet.log*
 *.loggy
 *.nmap
@@ -19,7 +18,7 @@ webmail
 windows1
 *.db
 decnet.json
-.env
+.env*
 .env.local
 .coverage
 .hypothesis/


@@ -0,0 +1,153 @@
# DECNET Technical Architecture: Deep Dive
This document provides a low-level technical decomposition of the DECNET (Deception Network) framework. It covers the internal orchestration logic, networking internals, reactive data pipelines, and the persistent intelligence schema.
---
## 1. System Topology & Micro-Services
DECNET is architected as a set of decoupled "engines" that interact via a persistent shared repository (SQLite/MySQL) and the Docker socket.
### Component Connectivity Graph
```mermaid
graph TD
subgraph "Infrastructure Layer"
DK[Docker Engine]
MV[MACVLAN / IPvlan Driver]
end
subgraph "Identity Layer (Deckies)"
B1[Base Container 01]
S1a[Service: SSH]
S1b[Service: HTTP]
B1 --- S1a
B1 --- S1b
end
subgraph "Telemetry Layer"
SNF[Sniffer Worker]
COL[Log Collector]
end
subgraph "Processing Layer"
ING[Log Ingester]
PROF[Attacker Profiler]
end
subgraph "Persistence Layer"
DB[(SQLModel Repository)]
ST[decnet-state.json]
end
DK --- MV
MV --- B1
S1a -- "stdout/stderr" --> COL
S1b -- "stdout/stderr" --> COL
SNF -- "PCAP Analysis" --> COL
COL -- "JSON Tail" --> ING
ING -- "Bounty Extraction" --> DB
ING -- "Log Commit" --> DB
DB -- "Log Cursor" --> PROF
PROF -- "Correlation Engine" --> DB
PROF -- "Behavior Rollup" --> DB
ING -- "Events" --> WS[Web Dashboard / SSE]
```
---
## 2. Core Orchestration: The "Decky" Lifecycle
A **Decky** is a logical entity: one base container plus the service containers that join its network namespace, so that every service appears to run on a single host.
### The Deployment Flow (`decnet deploy`)
1. **Configuration Parsing**: `DecnetConfig` (via `ini_loader.py`) validates the archetypes and service counts.
2. **IP Allocation**: `ips_to_range()` calculates the minimal CIDR covering all requested IPs to prevent exhaustion of the host's subnet.
3. **Network Setup**:
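- Calls `docker network create -d macvlan -o parent=eth0`.
- Creates a host-side macvlan interface (`decnet_macvlan0`) to work around macvlan host isolation (the hairpin fix): by default the host cannot reach its own macvlan children through the parent interface.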
4. **Logging Injection**: Every service container has `decnet_logging.py` injected into its build context to ensure uniform RFC 5424 syslog output.
5. **Compose Generation**: `write_compose()` creates a dynamic `docker-compose.yml` where:
- Service containers use `network_mode: "service:<base_container_name>"`.
- Base containers use `sysctls` derived from `os_fingerprint.py`.
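
The IP-allocation step can be illustrated with the standard `ipaddress` module. This is a minimal sketch of one way to compute the smallest single CIDR block covering a set of addresses; `minimal_cidr` is a hypothetical name, and the real `ips_to_range()` may differ in signature and behavior.

```python
import ipaddress


def minimal_cidr(ips: list[str]) -> str:
    """Return the smallest single IPv4 CIDR block that contains every address."""
    addrs = [int(ipaddress.IPv4Address(ip)) for ip in ips]
    lowest, highest = min(addrs), max(addrs)
    # Every bit that differs between the lowest and highest address is a host bit.
    host_bits = (lowest ^ highest).bit_length()
    network_int = (lowest >> host_bits) << host_bits
    return f"{ipaddress.IPv4Address(network_int)}/{32 - host_bits}"


print(minimal_cidr(["192.168.1.10", "192.168.1.20"]))  # 192.168.1.0/27
```

A single covering block can be wider than strictly necessary when the requested addresses straddle a power-of-two boundary, which is the trade-off of describing the range with one prefix.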
### Teardown & State
Runtime state is persisted in `decnet-state.json`. Upon `teardown`, DECNET:
1. Runs `docker compose down`.
2. Deletes the host-side macvlan interface and routes.
3. Removes the Docker network.
4. Clears the CLI state.
---
## 3. Networking Internals: Passive & Active Fidelity
### OS Fingerprinting (TCP/IP Spoofing)
DECNET tunes the networking behavior of each Decky within its own namespace. This is handled by the `os_fingerprint.py` module, which sets specific `sysctls` in the base container:
- `net.ipv4.tcp_window_scaling`: Enables/disables based on OS profile.
- `net.ipv4.tcp_timestamps`: Mimics specific OS tendencies (e.g., Windows vs. Linux).
- `net.ipv4.tcp_syncookies`: Prevents OS detection via SYN-flood response patterns.
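
As a rough illustration of how an OS profile might be turned into the `sysctls` mapping of a base container, consider the sketch below. The profile table, its values, and the `compose_sysctls` helper are assumptions made for this example; they are not taken from `os_fingerprint.py`.

```python
# Hypothetical profile table: the values are illustrative, not DECNET's real profiles.
OS_PROFILES: dict[str, dict[str, int]] = {
    "linux": {
        "net.ipv4.tcp_window_scaling": 1,
        "net.ipv4.tcp_timestamps": 1,
        "net.ipv4.tcp_syncookies": 1,
    },
    "windows": {
        "net.ipv4.tcp_window_scaling": 1,
        "net.ipv4.tcp_timestamps": 0,
        "net.ipv4.tcp_syncookies": 1,
    },
}


def compose_sysctls(profile: str) -> dict[str, str]:
    """Render a profile as a string-valued `sysctls` mapping for a Compose service."""
    try:
        values = OS_PROFILES[profile]
    except KeyError:
        raise ValueError(f"unknown OS profile: {profile!r}") from None
    return {key: str(value) for key, value in values.items()}


print(compose_sysctls("windows"))
```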
### The Packet Flow
1. **Ingress**: Packet hits physical NIC -> MACVLAN Bridge -> Target Decky Namespace.
2. **Telemetry**: The `Sniffer` container attaches to the same MACVLAN bridge in promiscuous mode. It uses scapy-like logic (via `decnet.sniffer`) to extract:
- **JA3/JA4**: TLS ClientHello fingerprints.
- **HASSH**: SSH Key Exchange fingerprints.
- **JARM**: (Triggered actively) TLS server fingerprints.
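
To make the JA3 entry concrete: a JA3 fingerprint is the MD5 digest of five comma-separated ClientHello fields, each a dash-joined list of decimal values, with GREASE values (RFC 8701) removed. The sketch below assumes the ClientHello has already been parsed into integer lists; it does not reproduce the parsing done in `decnet.sniffer`, and the function names are hypothetical.

```python
import hashlib


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) have the form 0x0A0A, 0x1A1A, ..., 0xFAFA."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3(version, ciphers, extensions, curves, point_formats) -> tuple[str, str]:
    """Return the JA3 string and its MD5 hex digest."""

    def join(values):
        return "-".join(str(v) for v in values if not is_grease(v))

    ja3_string = ",".join(
        [str(version), join(ciphers), join(extensions), join(curves), join(point_formats)]
    )
    # MD5 is used here as a fingerprint, not for security.
    digest = hashlib.md5(ja3_string.encode("ascii"), usedforsecurity=False).hexdigest()
    return ja3_string, digest


ja3_string, ja3_hash = ja3(771, [0x0A0A, 4865, 4866], [0x1A1A, 0, 23], [29, 23], [0])
print(ja3_string)  # 771,4865-4866,0-23,29-23,0
```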
---
## 4. Persistent Intelligence: Database Schema
DECNET uses an asynchronous SQLModel-based repository. The schema is optimized for both high-speed ingestion and complex behavioral correlation.
### Entity Relationship Model
| Table | Purpose | Key Fields |
| :--- | :--- | :--- |
| **logs** | Raw event stream | `id`, `timestamp`, `decky`, `service`, `event_type`, `attacker_ip`, `fields` |
| **bounty** | Harvested artifacts | `id`, `bounty_type`, `payload` (JSON), `attacker_ip` |
| **attackers** | Aggregated profiles | `uuid`, `ip`, `is_traversal`, `traversal_path`, `fingerprints` (JSON), `commands` (JSON) |
| **attacker_behavior** | Behavioral profile | `attacker_uuid`, `os_guess`, `behavior_class`, `tool_guesses` (JSON), `timing_stats` (JSON) |
### JSON Logic
To maintain portability across SQLite/MySQL, DECNET uses the `JSON_EXTRACT` function for filtering logs by internal fields (e.g., searching for a specific HTTP User-Agent inside the `fields` column).
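
The following sketch shows the kind of filter described above, run against an in-memory SQLite database. The table layout is trimmed to three columns and the `user_agent` key is an assumed example; only the use of `JSON_EXTRACT` on the `fields` column comes from the text, and it requires an SQLite build with the JSON functions enabled.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, attacker_ip TEXT, fields TEXT)")
conn.executemany(
    "INSERT INTO logs (attacker_ip, fields) VALUES (?, ?)",
    [
        ("203.0.113.7", json.dumps({"user_agent": "curl/8.5.0", "path": "/admin"})),
        ("198.51.100.2", json.dumps({"user_agent": "Mozilla/5.0", "path": "/"})),
    ],
)

# Filter on a key stored inside the JSON `fields` column.
rows = conn.execute(
    "SELECT attacker_ip FROM logs WHERE JSON_EXTRACT(fields, '$.user_agent') = ?",
    ("curl/8.5.0",),
).fetchall()
print(rows)  # [('203.0.113.7',)]
```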
---
## 5. Reactive Processing: The Internal Pipeline
### Log Ingestion & Bounty Extraction
1. **Tailer**: `log_ingestion_worker` tails the JSON log stream.
2. **JSON Parsing**: Every line is validated against the RFC 5424 mapping.
3. **Extraction Logic**:
- If `event_type == "credential"`, a row is added to the `bounty` table.
- If `ja3` field exists, a `fingerprint` bounty is created.
4. **Notification**: Logs are dispatched to active WebSocket/SSE clients for real-time visualization.
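
The extraction rules in step 3 can be sketched as a pure function over one JSON log line. The event shape, the `credential` bounty type name, and the `extract_bounties` helper are assumptions for illustration; the actual `log_ingestion_worker` also persists rows and notifies clients, which is omitted here.

```python
import json


def extract_bounties(line: str) -> list[dict]:
    """Return the bounty rows implied by a single JSON log line."""
    event = json.loads(line)
    fields = event.get("fields") or {}
    base = {"attacker_ip": event.get("attacker_ip")}
    bounties = []
    if event.get("event_type") == "credential":
        # Credential events are mirrored into the bounty table.
        bounties.append({**base, "bounty_type": "credential", "payload": fields})
    if "ja3" in fields:
        # Any event carrying a JA3 hash also yields a fingerprint bounty.
        bounties.append({**base, "bounty_type": "fingerprint", "payload": {"ja3": fields["ja3"]}})
    return bounties


sample = json.dumps({
    "event_type": "credential",
    "attacker_ip": "203.0.113.7",
    "fields": {"username": "root", "password": "toor"},
})
print(extract_bounties(sample))
```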
### Correlation & Traversal Logic
The `CorrelationEngine` processes logs in batches:
- **IP Grouping**: Logs are indexed by `attacker_ip`.
- **Hop Extraction**: The engine identifies distinct `deckies` touched by the same IP.
- **Path Calculation**: A chronological string (`decky-A -> decky-B`) is built to visualize the attack progression.
- **Attacker Profile Upsert**: The `Attacker` table is updated with the new counts, path, and consolidated bounty history.
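
The grouping, hop extraction, and path steps above can be sketched in a few lines. This is an illustrative reduction, not the `CorrelationEngine` itself: the log dictionaries, the numeric timestamps, and the `build_traversals` name are assumptions, and the database upsert is left out.

```python
from collections import defaultdict


def build_traversals(logs: list[dict]) -> dict[str, dict]:
    """Group logs by attacker IP and derive the ordered list of deckies touched."""
    by_ip: dict[str, list[dict]] = defaultdict(list)
    for log in logs:
        by_ip[log["attacker_ip"]].append(log)

    profiles = {}
    for ip, events in by_ip.items():
        hops: list[str] = []
        for event in sorted(events, key=lambda e: e["timestamp"]):
            if event["decky"] not in hops:  # keep first-contact order, drop repeats
                hops.append(event["decky"])
        profiles[ip] = {
            "event_count": len(events),
            "decky_count": len(hops),
            "is_traversal": len(hops) > 1,
            "traversal_path": " -> ".join(hops),
        }
    return profiles


logs = [
    {"attacker_ip": "203.0.113.7", "decky": "decky-B", "timestamp": 20},
    {"attacker_ip": "203.0.113.7", "decky": "decky-A", "timestamp": 10},
    {"attacker_ip": "203.0.113.7", "decky": "decky-A", "timestamp": 30},
    {"attacker_ip": "198.51.100.2", "decky": "decky-A", "timestamp": 15},
]
print(build_traversals(logs)["203.0.113.7"]["traversal_path"])  # decky-A -> decky-B
```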
---
## 6. Service Plugin Architecture
Adding a new honeypot service is zero-configuration. The `decnet/services/registry.py` module uses `pkgutil.iter_modules` to auto-discover every module in the `services/` package.
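
A minimal sketch of this style of auto-discovery follows. It builds a throwaway package in a temporary directory so that it can run on its own; `discover_services`, the `demo_services` package, and the rule of indexing classes by a `name` attribute are assumptions for the example rather than the contents of `registry.py`.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def discover_services(package_name: str) -> dict[str, type]:
    """Import every module in a package and index its classes by their `name`."""
    package = importlib.import_module(package_name)
    registry: dict[str, type] = {}
    for module_info in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f"{package_name}.{module_info.name}")
        for obj in vars(module).values():
            # Keep only classes defined in this module that declare a service name.
            if (isinstance(obj, type) and obj.__module__ == module.__name__
                    and isinstance(getattr(obj, "name", None), str)):
                registry[obj.name] = obj
    return registry


# Demo: create a package with two service modules, then discover them.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_services"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "ssh.py").write_text('class SSHService:\n    name = "ssh"\n')
(pkg / "web.py").write_text('class HTTPService:\n    name = "http"\n')
sys.path.insert(0, str(root))
importlib.invalidate_caches()

registry = discover_services("demo_services")
print(sorted(registry))  # ['http', 'ssh']
```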
### `BaseService` Interface
Every service must implement:
- `name`: Unique identifier (e.g., "ssh").
- `ports`: Targeted ports (e.g., `22/tcp`).
- `dockerfile_context()`: Path to the template directory.
- `compose_service(name, base_name)`: Returns the Docker Compose fragment.
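
The interface above might look roughly like the following abstract base class. The four members come from the list; everything else (the `templates/<name>` layout, the keys of the returned fragment other than `network_mode`, and the `SSHService` example) is assumed for illustration.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class BaseService(ABC):
    name: str          # unique identifier, e.g. "ssh"
    ports: list[str]   # targeted ports, e.g. ["22/tcp"]

    @abstractmethod
    def dockerfile_context(self) -> Path:
        """Path to the template directory used as the build context."""

    @abstractmethod
    def compose_service(self, name: str, base_name: str) -> dict:
        """Return the Docker Compose fragment for this service."""


class SSHService(BaseService):
    name = "ssh"
    ports = ["22/tcp"]

    def dockerfile_context(self) -> Path:
        return Path("templates") / self.name  # hypothetical layout

    def compose_service(self, name: str, base_name: str) -> dict:
        return {
            "container_name": name,
            "build": {"context": self.dockerfile_context().as_posix()},
            # Join the base container's network namespace.
            "network_mode": f"service:{base_name}",
        }


fragment = SSHService().compose_service("decky-01-ssh", "decky-01")
print(fragment["network_mode"])  # service:decky-01
```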
### Templates
Templates (found in `/templates/`) contain the Dockerfile and entrypoint. The `deployer` automatically syncs `decnet_logging.py` into these contexts during build time to ensure logs are streamed correctly to the host.


@@ -0,0 +1,134 @@
# DECNET Web & Database Models: Architectural Deep Dive
> [!IMPORTANT]
> **DEVELOPMENT DISCLAIMER**: DECNET is currently in active development. The storage schemas and API signatures defined in `decnet/web/db/models.py` are subject to radical change as the framework's analytical capabilities and distributed features expand.
## 1. Introduction & Philosophy
The `decnet/web/db/models.py` file represents the structural backbone of the DECNET web interface and its underlying analytical engine. It serves a dual purpose that is central to the project's architecture:
1. **Unified Source of Truth**: By utilizing **SQLModel**, DECNET collapses the traditional barrier between Pydantic data validation and SQLAlchemy ORM mapping. This allows a single class definition to act as both a database table and an API data object, drastically reducing the "boilerplate" associated with traditional web-database pipelines.
2. **Analytical Scalability**: The models are designed to scale from small local deployments using **SQLite** to larger environments backed by **MySQL**. This is achieved through SQLAlchemy type variants (`with_variant`) and an abstraction layer for large text blobs.
---
## 2. The Database Layer (SQLModel Entities)
These models define the physical tables within the DECNET infrastructure. Every class marked with `table=True` is interpreted by the repository layer to generate the corresponding DDL (Data Definition Language) for the target database.
### 2.1 Identity & Security: The `User` Entity
The `User` model handles dashboard access control and basic identity management.
* `uuid`: A unique string identifier. While integers are often used for IDs, DECNET uses strings to support potential future transitions to UUIDs without schema breakage.
* `username`: The primary login handle. It is both `unique` and `indexed` for rapid authentication lookups.
* `password_hash`: Stores the Argon2 or bcrypt hash. Length constraints in the routers cap raw passwords at 72 characters, which matches bcrypt's 72-byte input limit and prevents "Long Password Denial of Service" attacks against the hashing step.
* `role`: A simple string-based permission field (e.g., `admin`, `viewer`).
* `must_change_password`: A boolean flag used for fresh deployments or manual administrative resets, forcing the user to rotate their credentials upon their first authenticated session.
### 2.2 Intelligence & Attribution: `Attacker` and `AttackerBehavior`
These two tables form the core of DECNET's "Attacker Profiling" system. They are split into two tables to maintain "Narrow vs. Wide" performance characteristics.
#### The `Attacker` Entity (Broad Analytics)
The `Attacker` table stores the "primary" record for every unique IP discovered by the honeypot fleet.
* `ip`: The source IP address. This is the primary key and is heavily indexed.
* `first_seen` / `last_seen`: Tracking the lifecycle of an attacker's engagement with the network.
* `event_count` / `service_count` / `decky_count`: Aggregated counters used by the stats dashboard to visualize the magnitude of an engagement.
* `services` / `deckies`: JSON-serialized lists of every service and machine reached by the attacker. Using `_BIG_TEXT` here allows these lists to grow significantly during long-term campaigns.
* `traversal_path`: A string representation (e.g., `omega → epsilon → zulu`) that helps analysts visualize lateral movement attempts recorded by the correlation engine.
#### The `AttackerBehavior` Entity (Granular Analytics)
This "Wide" table stores behavioral signatures. It is separated from the main `Attacker` record so that high-frequency updates to timing stats or sniffer-derived packet signatures don't lock the primary attribution rows.
* `os_guess`: Derived from the `os_fingerprint` and `sniffer` engines, providing an estimate of the attacker's operating system based on TCP/IP stack nuances.
* `tcp_fingerprint`: A JSON blob storing the raw TCP signature (Window size, MSS, Option sequence).
* `behavior_class`: A classification (e.g., `beaconing`, `interactive`, `brute_force`) derived from log inter-arrival timing (IAT).
* `timing_stats`: Stores a JSON dictionary of mean/median/stdev for event timing, used to detect automated tooling.
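
As an illustration of how `timing_stats` and `behavior_class` could be derived from event timestamps, the sketch below computes inter-arrival times and applies a simple regularity test. The coefficient-of-variation threshold and the two-class split are assumptions chosen for the example; the profiler's real heuristics are not described here.

```python
from statistics import mean, median, stdev


def timing_stats(timestamps: list[float]) -> dict[str, float]:
    """Summarize inter-arrival times (IAT) between consecutive events, in seconds."""
    ordered = sorted(timestamps)
    iats = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not iats:
        return {"mean": 0.0, "median": 0.0, "stdev": 0.0}
    return {
        "mean": mean(iats),
        "median": median(iats),
        "stdev": stdev(iats) if len(iats) > 1 else 0.0,
    }


def classify(stats: dict[str, float], cv_threshold: float = 0.1) -> str:
    """Hypothetical rule: very regular timing suggests automated beaconing."""
    if stats["mean"] == 0:
        return "unknown"
    coefficient_of_variation = stats["stdev"] / stats["mean"]
    return "beaconing" if coefficient_of_variation < cv_threshold else "interactive"


regular = timing_stats([0.0, 10.0, 20.0, 30.0])
print(regular, classify(regular))
```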
### 2.3 Telemetry: `Log` and `Bounty`
These tables store the "raw" data generated by the honeypots.
* **`Log` Table**: The primary event sink. Every line from the collector ends up here.
* `event_type`: The MSGID from the RFC 5424 header (e.g., `connect`, `exploit`).
* `raw_line`: The full, un-parsed syslog string for forensic verification.
* `fields`: A JSON blob containing the structured data (SD-ELEMENTS) extracted during normalization.
* **`Bounty` Table**: Specifically for high-value events. When a service detects "Gold" (like a plain-text password or a known PoC payload), it is mirrored here for rapid analyst review.
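
To show how the `Log` columns relate to an RFC 5424 line, here is a simplified parser sketch. It assumes structured-data values contain no escaped quotes or brackets, maps HOSTNAME to `decky` and APP-NAME to `service` (an assumption), and uses a made-up SD-ID `decnet@55555`; it is not the project's normalizer.

```python
import re

# <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA [MSG]
HEADER = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>\d{1,2}) (?P<timestamp>\S+) (?P<hostname>\S+) "
    r"(?P<app>\S+) (?P<procid>\S+) (?P<msgid>\S+) (?P<rest>.*)$"
)
SD_ELEMENT = re.compile(r'\[([^ =\]"]+)((?: [^ =\]"]+="[^"]*")*)\]')
SD_PARAM = re.compile(r' ([^ =\]"]+)="([^"]*)"')


def parse_rfc5424(line: str) -> dict:
    match = HEADER.match(line)
    if match is None:
        raise ValueError("not an RFC 5424 line")
    rest, pos, fields = match["rest"], 0, {}
    if rest.startswith("-"):
        pos = 1  # NILVALUE: no structured data
    else:
        # Consume consecutive SD-ELEMENTs from the start of the remainder.
        while (element := SD_ELEMENT.match(rest, pos)) is not None:
            fields.update(SD_PARAM.findall(element.group(2)))
            pos = element.end()
    return {
        "timestamp": match["timestamp"],
        "decky": match["hostname"],
        "service": match["app"],
        "event_type": match["msgid"],
        "fields": fields,
        "msg": rest[pos:].strip(),
        "raw_line": line,
    }


line = ('<134>1 2026-04-16T06:10:38Z decky-01 ssh - credential '
        '[decnet@55555 src_ip="203.0.113.7" username="root"] login attempt')
print(parse_rfc5424(line)["event_type"])  # credential
```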
### 2.4 System State: The `State` Entity
The `State` table acts as the orchestrator's brain. It stores the `decnet-state.json` content within the database when the system is integrated with the web layer.
* `key`: The configuration key (e.g., `global_config`, `active_deployment`).
* `value`: A `MEDIUMTEXT` JSON blob. This is potentially the largest field in the schema, since it stores the entire resolved configuration of every running Decky.
---
## 3. The API Layer (Pydantic DTOs)
These models define how data moves across the wire between the FastAPI backend and the frontend.
### 3.1 Authentication Pipeline
* `LoginRequest`: Validates incoming credentials before passing them to the security middleware.
* `Token`: The standard OAuth2 bearer token response, enriched with the `must_change_password` hint.
* `ChangePasswordRequest`: Ensures the old password is provided and the new one meets the project's security constraints.
### 3.2 Reporting & Pagination
DECNET uses a standardized "Envelope" pattern for broad analytical responses (`LogsResponse`, `AttackersResponse`, `BountyResponse`).
* `total`: The total count of matching records in the database, independent of the `limit`/`offset` slice being returned.
* `limit` / `offset`: The specific slice of data returned, supporting "Infinite Scroll" or traditional pagination in the UI.
* `data`: A list of dictionaries. By using `dict[str, Any]` here, the API remains flexible with SQLModel's dynamic attribute loading.
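
The envelope shape can be sketched with plain dictionaries. The real response classes are Pydantic models; `build_envelope` is a hypothetical helper, and `total` here is simply the size of the already-selected result set before slicing.

```python
from typing import Any


def build_envelope(rows: list[dict[str, Any]], limit: int, offset: int) -> dict[str, Any]:
    """Wrap one page of rows in a total/limit/offset/data envelope."""
    return {
        "total": len(rows),
        "limit": limit,
        "offset": offset,
        "data": rows[offset:offset + limit],
    }


rows = [{"id": i} for i in range(5)]
print(build_envelope(rows, limit=2, offset=2))
```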
### 3.3 System Administration
* **`DeployIniRequest`**: The most critical input model. It takes `ini_content` as a validated string. By using the `IniContent` annotated type, the API rejects malformed deployments before they ever touch the fleet builder.
* **`MutateIntervalRequest`**: Uses a strict REGEX pattern (`^[1-9]\d*[mdMyY]$`) to ensure intervals like `30m` (30 minutes) or `2d` (2 days) are valid before being applied to the orchestrator.
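
The interval pattern can be exercised directly with the standard `re` module. The pattern is the one quoted above; the `is_valid_interval` helper is a hypothetical wrapper, and `fullmatch` is used so that a trailing newline is not accepted.

```python
import re

INTERVAL_PATTERN = re.compile(r"^[1-9]\d*[mdMyY]$")


def is_valid_interval(value: str) -> bool:
    """True when the value is a positive integer followed by one unit letter."""
    return INTERVAL_PATTERN.fullmatch(value) is not None


for candidate in ("30m", "2d", "0m", "10h", "m"):
    print(candidate, is_valid_interval(candidate))
```

The leading `[1-9]` rejects zero and zero-padded values such as `0m` or `05d`, and any unit outside `m`, `d`, `M`, `y`, `Y` fails the match.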
---
## 4. Technical Foundations
### 4.1 Cross-DB Compatibility Logic
The project uses a custom variant system to handle the discrepancies between SQLite (which has simplified typing) and MySQL (which has strict size constraints).
```python
_BIG_TEXT = Text().with_variant(MEDIUMTEXT(), "mysql")
```
This abstraction ensures that fields like `Attacker.services` (which can grow to thousands of items) are stored as `MEDIUMTEXT` (up to 16 MiB) on MySQL. A standard SQLAlchemy `Text` maps to MySQL `TEXT`, which is capped at 64 KiB; oversized values are rejected in strict SQL mode and silently truncated otherwise, leading to analytical loss.
### 4.2 High-Fidelity Normalization
Data arriving from distributed honeypots is often "dirty." The models include custom pre-validators like `_normalize_null`.
* **Null Coalescing**: Services often emit logging values as `"null"` or `"undefined"` strings. The `NullableString` type automatically converts these "noise" strings into actual Python `None` types during ingestion.
* **Timestamp Integrity**: `NullableDatetime` ensures that various ISO formats or epoch timestamps provided by different service containers are normalized into standard UTC datetime objects.
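
The two normalizers can be approximated with plain functions. In the models they are attached as Pydantic pre-validators; the sketch below only shows the conversion logic, and both the case-insensitive matching and the choice to treat naive datetimes as UTC are assumptions.

```python
from datetime import datetime, timezone

_NULL_STRINGS = {"null", "undefined"}


def normalize_null(value):
    """Map noise strings such as "null" or "undefined" to None."""
    if isinstance(value, str) and value.strip().lower() in _NULL_STRINGS:
        return None
    return value


def normalize_datetime(value):
    """Convert ISO 8601 strings or epoch seconds to timezone-aware UTC datetimes."""
    value = normalize_null(value)
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    if isinstance(value, datetime):
        parsed = value
    else:
        # Accept a trailing "Z" on Python versions whose fromisoformat rejects it.
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed.astimezone(timezone.utc)


print(normalize_datetime("2026-04-16T02:10:38-04:00"))  # 2026-04-16 06:10:38+00:00
```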
---
## 5. Integration Case Studies (Deep Analysis)
To understand how these models function, we must examine their lifecycle across the web stack.
### 5.1 The Repository Layer (`decnet/web/db/sqlmodel_repo.py`)
The repository is the primary consumer of the "Entities." It utilizes the metadata generated by SQLModel to:
1. **Generate DDL**: On startup, the repository calls `SQLModel.metadata.create_all()`. This takes every `table=True` class and translates it into `CREATE TABLE` statements tailored to the active engine (SQLite or MySQL).
2. **Translate DTOs**: When the repository fetches an `Attacker` from the DB, SQLModel automatically populates the Pydantic-style attributes, allowing the repository to return objects that are immediately serializable by the routers.
### 5.2 The Dashboard Routers
Specific endpoints rely on these models for boundary safety:
* **`api_deploy_deckies.py`**: Uses `DeployIniRequest`. This ensures that even if a user tries to POST a massive binary file instead of an INI, the Pydantic layer (powered by `decnet.models.validate_ini_string`) will intercept and reject the request with a `422 Unprocessable Entity` error before it reaches the orchestrator.
* **`api_get_stats.py`**: Uses `StatsResponse`. This model serves as a "rollup" that aggregates data from the `Log`, `Attacker`, and `State` tables into a single unified JSON object for the dashboard's "At a Glance" view.
* **`api_get_health.py`**: Uses `HealthResponse`. This model provides a nested view of the system, where each sub-component (Engine, Collector, DB) is represented as a `ComponentHealth` object, allowing the UI to show granular "Success" or "Failure" states.
---
## 6. Futureproofing & Guidelines
As the project grows, the following habits must be maintained:
1. **Keep the Row Narrow**: Always separate behavioral data that updates frequently into auxiliary tables like `AttackerBehavior`.
2. **Use Variants**: Never use standard `String` or `Text` for JSON blobs; always use `_BIG_TEXT` to respect MySQL's storage limitations.
3. **Validate at the Boundary**: Ensure every new API request model uses Pydantic's strict typing to prevent malicious payloads from reaching the database layer.