From 49f3002c94635cb1758055d65996f1a3e9be6a64 Mon Sep 17 00:00:00 2001
From: anti
Date: Thu, 16 Apr 2026 02:10:38 -0400
Subject: [PATCH] added: docs; modified: .gitignore

---
 .gitignore                              |   3 +-
 development/docs/ARCHITECTURE.md        | 153 ++++++++++++++++++++++++
 development/docs/services/WEB_MODELS.md | 134 +++++++++++++++++++++
 3 files changed, 288 insertions(+), 2 deletions(-)
 create mode 100644 development/docs/ARCHITECTURE.md
 create mode 100644 development/docs/services/WEB_MODELS.md

diff --git a/.gitignore b/.gitignore
index c65f265..b775e9c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -10,7 +10,6 @@ build/
 decnet-compose.yml
 decnet-state.json
 *.ini
-.env
 decnet.log*
 *.loggy
 *.nmap
@@ -19,7 +18,7 @@ webmail
 windows1
 *.db
 decnet.json
-.env
+.env*
 .env.local
 .coverage
 .hypothesis/
diff --git a/development/docs/ARCHITECTURE.md b/development/docs/ARCHITECTURE.md
new file mode 100644
index 0000000..5f0c294
--- /dev/null
+++ b/development/docs/ARCHITECTURE.md
@@ -0,0 +1,153 @@
+# DECNET Technical Architecture: Deep Dive
+
+This document provides a low-level technical decomposition of the DECNET (Deception Network) framework. It covers the internal orchestration logic, networking internals, reactive data pipelines, and the persistent intelligence schema.
+
+---
+
+## 1. System Topology & Micro-Services
+
+DECNET is architected as a set of decoupled "engines" that interact via a persistent shared repository (SQLite/MySQL) and the Docker socket.
+
+### Component Connectivity Graph
+
+```mermaid
+graph TD
+    subgraph "Infrastructure Layer"
+        DK[Docker Engine]
+        MV[MACVLAN / IPvlan Driver]
+    end
+
+    subgraph "Identity Layer (Deckies)"
+        B1[Base Container 01]
+        S1a[Service: SSH]
+        S1b[Service: HTTP]
+        B1 --- S1a
+        B1 --- S1b
+    end
+
+    subgraph "Telemetry Layer"
+        SNF[Sniffer Worker]
+        COL[Log Collector]
+    end
+
+    subgraph "Processing Layer"
+        ING[Log Ingester]
+        PROF[Attacker Profiler]
+    end
+
+    subgraph "Persistence Layer"
+        DB[(SQLModel Repository)]
+        ST[decnet-state.json]
+    end
+
+    DK --- MV
+    MV --- B1
+
+    S1a -- "stdout/stderr" --> COL
+    S1b -- "stdout/stderr" --> COL
+    SNF -- "PCAP Analysis" --> COL
+
+    COL -- "JSON Tail" --> ING
+    ING -- "Bounty Extraction" --> DB
+    ING -- "Log Commit" --> DB
+
+    DB -- "Log Cursor" --> PROF
+    PROF -- "Correlation Engine" --> DB
+    PROF -- "Behavior Rollup" --> DB
+
+    ING -- "Events" --> WS[Web Dashboard / SSE]
+```
+
+---
+
+## 2. Core Orchestration: The "Decky" Lifecycle
+
+A **Decky** is a logical host: one base container plus the service containers that share its network namespace.
+
+### The Deployment Flow (`decnet deploy`)
+1. **Configuration Parsing**: `DecnetConfig` (via `ini_loader.py`) validates the archetypes and service counts.
+2. **IP Allocation**: `ips_to_range()` calculates the minimal CIDR covering all requested IPs to prevent exhaustion of the host's subnet.
+3. **Network Setup**:
+   - Calls `docker network create -d macvlan -o parent=eth0`.
+   - Creates a host-side macvlan interface (`decnet_macvlan0`) so the host can reach the Deckies, working around macvlan's host-to-container isolation (the "hairpin" fix).
+4. **Logging Injection**: Every service container has `decnet_logging.py` injected into its build context to ensure uniform RFC 5424 syslog output.
+5. **Compose Generation**: `write_compose()` creates a dynamic `docker-compose.yml` where:
+   - Service containers use `network_mode: "service:"`.
+   - Base containers use `sysctls` derived from `os_fingerprint.py`.
+
+### Teardown & State
+Runtime state is persisted in `decnet-state.json`. Upon `teardown`, DECNET:
+1. Runs `docker compose down`.
+2. Deletes the host-side macvlan interface and routes.
+3. Removes the Docker network.
+4. Clears the CLI state.
+
+---
+
+## 3. Networking Internals: Passive & Active Fidelity
+
+### OS Fingerprinting (TCP/IP Spoofing)
+DECNET tunes the networking behavior of each Decky within its own namespace. This is handled by the `os_fingerprint.py` module, which sets specific `sysctls` in the base container:
+- `net.ipv4.tcp_window_scaling`: Enables/disables based on OS profile.
+- `net.ipv4.tcp_timestamps`: Mimics specific OS tendencies (e.g., Windows vs. Linux).
+- `net.ipv4.tcp_syncookies`: Prevents OS detection via SYN-flood response patterns.
+
+### The Packet Flow
+1. **Ingress**: Packet hits physical NIC -> MACVLAN Bridge -> Target Decky Namespace.
+2. **Telemetry**: The `Sniffer` container attaches to the same MACVLAN bridge in promiscuous mode. It uses scapy-like logic (via `decnet.sniffer`) to extract:
+   - **JA3/JA4**: TLS ClientHello fingerprints.
+   - **HASSH**: SSH Key Exchange fingerprints.
+   - **JARM**: (Triggered actively) TLS server fingerprints.
+
+---
+
+## 4. Persistent Intelligence: Database Schema
+
+DECNET uses an asynchronous SQLModel-based repository. The schema is optimized for both high-speed ingestion and complex behavioral correlation.
+
+### Entity Relationship Model
+
+| Table | Purpose | Key Fields |
+| :--- | :--- | :--- |
+| **logs** | Raw event stream | `id`, `timestamp`, `decky`, `service`, `event_type`, `attacker_ip`, `fields` |
+| **bounty** | Harvested artifacts | `id`, `bounty_type`, `payload` (JSON), `attacker_ip` |
+| **attackers** | Aggregated profiles | `uuid`, `ip`, `is_traversal`, `traversal_path`, `fingerprints` (JSON), `commands` (JSON) |
+| **attacker_behavior** | Behavioral profile | `attacker_uuid`, `os_guess`, `behavior_class`, `tool_guesses` (JSON), `timing_stats` (JSON) |
+
+### JSON Logic
+To maintain portability across SQLite/MySQL, DECNET uses the `JSON_EXTRACT` function for filtering logs by internal fields (e.g., searching for a specific HTTP User-Agent inside the `fields` column).
+
+---
+
+## 5. Reactive Processing: The Internal Pipeline
+
+### Log Ingestion & Bounty Extraction
+1. **Tailer**: `log_ingestion_worker` tails the JSON log stream.
+2. **JSON Parsing**: Every line is validated against the RFC 5424 mapping.
+3. **Extraction Logic**:
+   - If `event_type == "credential"`, a row is added to the `bounty` table.
+   - If `ja3` field exists, a `fingerprint` bounty is created.
+4. **Notification**: Logs are dispatched to active WebSocket/SSE clients for real-time visualization.
+
+### Correlation & Traversal Logic
+The `CorrelationEngine` processes logs in batches:
+- **IP Grouping**: Logs are indexed by `attacker_ip`.
+- **Hop Extraction**: The engine identifies distinct `deckies` touched by the same IP.
+- **Path Calculation**: A chronological string (`decky-A -> decky-B`) is built to visualize the attack progression.
+- **Attacker Profile Upsert**: The `Attacker` table is updated with the new counts, path, and consolidated bounty history.
+
+---
+
+## 6. Service Plugin Architecture
+
+Adding a new honeypot service is zero-configuration. The `decnet/services/registry.py` uses `pkgutil.iter_modules` to auto-discover any file in the `services/` directory.
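The auto-discovery described above can be sketched with the standard library alone. This is a minimal illustration, not DECNET's actual `registry.py`: the `discover_services` helper, the duck-typed check on a string `name` attribute, and the throwaway `demo_services` package are all assumptions made for the example.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def discover_services(package_name: str) -> dict[str, type]:
    """Import every module in a package and index its service classes by name.

    Hypothetical helper: a class counts as a service if it is defined in the
    scanned module and exposes a string `name` attribute.
    """
    package = importlib.import_module(package_name)
    registry: dict[str, type] = {}
    for module_info in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f"{package_name}.{module_info.name}")
        for obj in vars(module).values():
            defined_here = isinstance(obj, type) and obj.__module__ == module.__name__
            if defined_here and isinstance(getattr(obj, "name", None), str):
                registry[obj.name] = obj
    return registry


def build_demo_package(root: Path, package_name: str) -> None:
    """Write a throwaway package with two service modules (illustration only)."""
    package_dir = root / package_name
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")
    (package_dir / "ssh.py").write_text(
        'class SSHService:\n    name = "ssh"\n    ports = ["22/tcp"]\n'
    )
    (package_dir / "http.py").write_text(
        'class HTTPService:\n    name = "http"\n    ports = ["80/tcp"]\n'
    )


# Build the demo package in a temporary directory, make it importable, scan it.
with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(Path(tmp), "demo_services")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        registry = discover_services("demo_services")
    finally:
        sys.path.remove(tmp)

print(sorted(registry))
```

Dropping a third module into the package directory would make it appear in `registry` on the next scan without touching the discovery code, which is the "zero-configuration" property the text refers to.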
+
+### `BaseService` Interface
+Every service must implement:
+- `name`: Unique identifier (e.g., "ssh").
+- `ports`: Targeted ports (e.g., `22/tcp`).
+- `dockerfile_context()`: Path to the template directory.
+- `compose_service(name, base_name)`: Returns the Docker Compose fragment.
+
+### Templates
+Templates (found in `/templates/`) contain the Dockerfile and entrypoint. The `deployer` automatically syncs `decnet_logging.py` into these contexts during build time to ensure logs are streamed correctly to the host.
diff --git a/development/docs/services/WEB_MODELS.md b/development/docs/services/WEB_MODELS.md
new file mode 100644
index 0000000..4b04530
--- /dev/null
+++ b/development/docs/services/WEB_MODELS.md
@@ -0,0 +1,134 @@
+# DECNET Web & Database Models: Architectural Deep Dive
+
+> [!IMPORTANT]
+> **DEVELOPMENT DISCLAIMER**: DECNET is currently in active development. The storage schemas and API signatures defined in `decnet/web/db/models.py` are subject to radical change as the framework's analytical capabilities and distributed features expand.
+
+## 1. Introduction & Philosophy
+
+The `decnet/web/db/models.py` file represents the structural backbone of the DECNET web interface and its underlying analytical engine. It serves a dual purpose that is central to the project's architecture:
+
+1. **Unified Source of Truth**: By utilizing **SQLModel**, DECNET collapses the traditional barrier between Pydantic data validation and SQLAlchemy ORM mapping. This allows a single class definition to act as both a database table and an API data object, drastically reducing the "boilerplate" associated with traditional web-database pipelines.
+2. **Analytical Scalability**: The models are designed to scale from small-scale local deployments using **SQLite** to large-scale, enterprise-ready environments backed by **MySQL**. This is achieved through clever usage of SQLAlchemy "Variants" and abstraction layers for large text blobs.
+
+---
+
+## 2. The Database Layer (SQLModel Entities)
+
+These models define the physical tables within the DECNET infrastructure. Every class marked with `table=True` is interpreted by the repository layer to generate the corresponding DDL (Data Definition Language) for the target database.
+
+### 2.1 Identity & Security: The `User` Entity
+
+The `User` model handles dashboard access control and basic identity management.
+
+* `uuid`: A unique string identifier. While integers are often used for IDs, DECNET uses strings to support potential future transitions to UUIDs without schema breakage.
+* `username`: The primary login handle. It is both `unique` and `indexed` for rapid authentication lookups.
+* `password_hash`: Stores the Argon2 or bcrypt hash. Length constraints in various routers ensure that raw passwords never exceed 72 characters, preventing "Long Password Denial of Service" attacks on various hashing algorithms.
+* `role`: A simple string-based permission field (e.g., `admin`, `viewer`).
+* `must_change_password`: A boolean flag used for fresh deployments or manual administrative resets, forcing the user to rotate their credentials upon their first authenticated session.
+
+### 2.2 Intelligence & Attribution: `Attacker` and `AttackerBehavior`
+
+These two tables form the core of DECNET's "Attacker Profiling" system. They are split into two tables to maintain "Narrow vs. Wide" performance characteristics.
+
+#### The `Attacker` Entity (Broad Analytics)
+The `Attacker` table stores the "primary" record for every unique IP discovered by the honeypot fleet.
+
+* `ip`: The source IP address. This is the primary key and is heavily indexed.
+* `first_seen` / `last_seen`: Tracking the lifecycle of an attacker's engagement with the network.
+* `event_count` / `service_count` / `decky_count`: Aggregated counters used by the stats dashboard to visualize the magnitude of an engagement.
+* `services` / `deckies`: JSON-serialized lists of every service and machine reached by the attacker. Using `_BIG_TEXT` here allows these lists to grow significantly during long-term campaigns.
+* `traversal_path`: A string representation (e.g., `omega → epsilon → zulu`) that helps analysts visualize lateral movement attempts recorded by the correlation engine.
+
+#### The `AttackerBehavior` Entity (Granular Analytics)
+This "Wide" table stores behavioral signatures. It is separated from the main `Attacker` record so that high-frequency updates to timing stats or sniffer-derived packet signatures don't lock the primary attribution rows.
+
+* `os_guess`: Derived from the `os_fingerprint` and `sniffer` engines, providing an estimate of the attacker's operating system based on TCP/IP stack nuances.
+* `tcp_fingerprint`: A JSON blob storing the raw TCP signature (Window size, MSS, Option sequence).
+* `behavior_class`: A classification (e.g., `beaconing`, `interactive`, `brute_force`) derived from log inter-arrival timing (IAT).
+* `timing_stats`: Stores a JSON dictionary of mean/median/stdev for event timing, used to detect automated tooling.
+
+### 2.3 Telemetry: `Log` and `Bounty`
+
+These tables store the "raw" data generated by the honeypots.
+
+* **`Log` Table**: The primary event sink. Every line from the collector ends up here.
+    * `event_type`: The MSGID from the RFC 5424 header (e.g., `connect`, `exploit`).
+    * `raw_line`: The full, un-parsed syslog string for forensic verification.
+    * `fields`: A JSON blob containing the structured data (SD-ELEMENTS) extracted during normalization.
+* **`Bounty` Table**: Specifically for high-value events. When a service detects "Gold" (like a plain-text password or a known PoC payload), it is mirrored here for rapid analyst review.
+
+### 2.4 System State: The `State` Entity
+
+The `State` table acts as the orchestrator's brain. It stores the `decnet-state.json` content within the database when the system is integrated with the web layer.
+
+* `key`: The configuration key (e.g., `global_config`, `active_deployment`).
+* `value`: A `MEDIUMTEXT` JSON blob. This is potentially the table's largest field, storing the entire resolved configuration of every running Decky.
+
+---
+
+## 3. The API Layer (Pydantic DTOs)
+
+These models define how data moves across the wire between the FastAPI backend and the frontend.
+
+### 3.1 Authentication Pipeline
+* `LoginRequest`: Validates incoming credentials before passing them to the security middleware.
+* `Token`: The standard OAuth2 bearer token response, enriched with the `must_change_password` hint.
+* `ChangePasswordRequest`: Ensures the old password is provided and the new one meets the project's security constraints.
+
+### 3.2 Reporting & Pagination
+DECNET uses a standardized "Envelope" pattern for broad analytical responses (`LogsResponse`, `AttackersResponse`, `BountyResponse`).
+
+* `total`: The total count of matching records in the database (ignoring filters).
+* `limit` / `offset`: The specific slice of data returned, supporting "Infinite Scroll" or traditional pagination in the UI.
+* `data`: A list of dictionaries. By using `dict[str, Any]` here, the API remains flexible with SQLModel's dynamic attribute loading.
+
+### 3.3 System Administration
+* **`DeployIniRequest`**: The most critical input model. It takes `ini_content` as a validated string. By using the `IniContent` annotated type, the API rejects malformed deployments before they ever touch the fleet builder.
+* **`MutateIntervalRequest`**: Uses a strict REGEX pattern (`^[1-9]\d*[mdMyY]$`) to ensure intervals like `30m` (30 minutes) or `2d` (2 days) are valid before being applied to the orchestrator.
+
+---
+
+## 4. Technical Foundations
+
+### 4.1 Cross-DB Compatibility Logic
+The project uses a custom variant system to handle the discrepancies between SQLite (which has simplified typing) and MySQL (which has strict size constraints).
+
+```python
+_BIG_TEXT = Text().with_variant(MEDIUMTEXT(), "mysql")
+```
+
+This abstraction ensures that fields like `Attacker.services` (which can grow to thousands of items) are stored as `MEDIUMTEXT` (16 MiB) on MySQL. A plain SQLAlchemy `Text` column maps to MySQL's `TEXT` type (64 KiB), which would reject oversized values or, with strict SQL mode disabled, silently truncate them, leading to analytical loss.
+
+### 4.2 High-Fidelity Normalization
+Data arriving from distributed honeypots is often "dirty." The models include custom pre-validators like `_normalize_null`.
+
+* **Null Coalescing**: Services often emit logging values as `"null"` or `"undefined"` strings. The `NullableString` type automatically converts these "noise" strings into actual Python `None` types during ingestion.
+* **Timestamp Integrity**: `NullableDatetime` ensures that various ISO formats or epoch timestamps provided by different service containers are normalized into standard UTC datetime objects.
+
+---
+
+## 5. Integration Case Studies (Deep Analysis)
+
+To understand how these models function, we must examine their lifecycle across the web stack.
+
+### 5.1 The Repository Layer (`decnet/web/db/sqlmodel_repo.py`)
+The repository is the primary consumer of the "Entities." It utilizes the metadata generated by SQLModel to:
+1. **Generate DDL**: On startup, the repository calls `SQLModel.metadata.create_all()`. This takes every `table=True` class and translates it into `CREATE TABLE` statements tailored to the active engine (SQLite or MySQL).
+2. **Translate DTOs**: When the repository fetches an `Attacker` from the DB, SQLModel automatically populates the Pydantic-style attributes, allowing the repository to return objects that are immediately serializable by the routers.
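The null and timestamp normalization described in section 4.2 can be sketched with the standard library. This is an illustration only: `normalize_null` and `normalize_datetime` are hypothetical stand-ins for the project's `_normalize_null` pre-validator and `NullableDatetime` type. The case-insensitive matching and the treatment of naive timestamps as UTC are assumptions, not documented behavior.

```python
from datetime import datetime, timezone
from typing import Any, Optional

# Noise strings named in the text; matching them case-insensitively is an assumption.
NULL_STRINGS = {"null", "undefined"}


def normalize_null(value: Any) -> Any:
    """Turn placeholder strings such as "null" into a real None."""
    if isinstance(value, str) and value.strip().lower() in NULL_STRINGS:
        return None
    return value


def normalize_datetime(value: Any) -> Optional[datetime]:
    """Coerce an epoch number or ISO 8601 string into an aware UTC datetime."""
    value = normalize_null(value)
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    if isinstance(value, str):
        text = value.strip()
        try:
            # Epoch seconds that arrived as a string, e.g. "1700000000".
            return datetime.fromtimestamp(float(text), tz=timezone.utc)
        except ValueError:
            pass
        if text.endswith("Z"):
            # Older Python versions do not accept the "Z" suffix in fromisoformat().
            text = text[:-1] + "+00:00"
        value = datetime.fromisoformat(text)
    if not isinstance(value, datetime):
        raise TypeError(f"unsupported timestamp type: {type(value).__name__}")
    if value.tzinfo is None:
        # Assumption: naive timestamps are already expressed in UTC.
        value = value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


print(normalize_null("undefined"))
print(normalize_datetime("2026-04-16T02:10:38-04:00").isoformat())
```

In a Pydantic-based model, functions like these would typically be attached as "before" validators so that every service's output passes through the same cleanup before it is stored.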
+
+### 5.2 The Dashboard Routers
+Specific endpoints rely on these models for boundary safety:
+
+* **`api_deploy_deckies.py`**: Uses `DeployIniRequest`. This ensures that even if a user tries to POST a massive binary file instead of an INI, the Pydantic layer (powered by `decnet.models.validate_ini_string`) will intercept and reject the request with a `422 Unprocessable Entity` error before it reaches the orchestrator.
+* **`api_get_stats.py`**: Uses `StatsResponse`. This model serves as a "rollup" that aggregates data from the `Log`, `Attacker`, and `State` tables into a single unified JSON object for the dashboard's "At a Glance" view.
+* **`api_get_health.py`**: Uses `HealthResponse`. This model provides a nested view of the system, where each sub-component (Engine, Collector, DB) is represented as a `ComponentHealth` object, allowing the UI to show granular "Success" or "Failure" states.
+
+---
+
+## 6. Futureproofing & Guidelines
+
+As the project grows, the following habits must be maintained:
+
+1. **Keep the Row Narrow**: Always separate behavioral data that updates frequently into auxiliary tables like `AttackerBehavior`.
+2. **Use Variants**: Never use standard `String` or `Text` for JSON blobs; always use `_BIG_TEXT` to respect MySQL's storage limitations.
+3. **Validate at the Boundary**: Ensure every new API request model uses Pydantic's strict typing to prevent malicious payloads from reaching the database layer.
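As a concrete instance of validating at the boundary, the interval pattern quoted for `MutateIntervalRequest` in section 3.3 can be exercised with the standard `re` module. This is a sketch rather than the project's Pydantic model: `parse_interval` is a hypothetical helper, and the capture groups are added for the example. Only `m` and `d` are explained in the text, so the unit letter is returned as-is without interpretation.

```python
import re

# The pattern quoted for MutateIntervalRequest, with capture groups added
# so the count and the unit letter can be read back out.
INTERVAL_PATTERN = re.compile(r"^([1-9]\d*)([mdMyY])$")


def parse_interval(value: str) -> tuple[int, str]:
    """Return (count, unit) for a valid interval string, else raise ValueError."""
    match = INTERVAL_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid interval: {value!r}")
    return int(match.group(1)), match.group(2)


# Two valid inputs followed by three that the pattern rejects.
for candidate in ("30m", "2d", "0m", "15h", "m30"):
    try:
        print(candidate, "->", parse_interval(candidate))
    except ValueError as exc:
        print(candidate, "->", exc)
```

Using `fullmatch` also rejects a trailing newline such as `"30m\n"`, which `$` on its own would let through with `re.match`.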