Files
DECNET/development/docs/services/WEB_MODELS.md
anti 49f3002c94
Some checks failed
CI / Lint (ruff) (push) Successful in 18s
CI / SAST (bandit) (push) Successful in 19s
CI / Dependency audit (pip-audit) (push) Successful in 40s
CI / Test (Standard) (3.11) (push) Successful in 2m38s
CI / Test (Standard) (3.12) (push) Successful in 2m56s
CI / Test (Live) (3.11) (push) Failing after 1m3s
CI / Test (Fuzz) (3.11) (push) Has been skipped
CI / Merge dev → testing (push) Has been skipped
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
added: docs; modified: .gitignore
2026-04-16 02:10:38 -04:00

10 KiB

DECNET Web & Database Models: Architectural Deep Dive

Important

DEVELOPMENT DISCLAIMER: DECNET is currently in active development. The storage schemas and API signatures defined in decnet/web/db/models.py are subject to radical change as the framework's analytical capabilities and distributed features expand.

1. Introduction & Philosophy

The decnet/web/db/models.py file represents the structural backbone of the DECNET web interface and its underlying analytical engine. It serves a dual purpose that is central to the project's architecture:

  1. Unified Source of Truth: By utilizing SQLModel, DECNET collapses the traditional barrier between Pydantic data validation and SQLAlchemy ORM mapping. This allows a single class definition to act as both a database table and an API data object, drastically reducing the "boilerplate" associated with traditional web-database pipelines.
  2. Analytical Scalability: The models are designed to scale from small-scale local deployments using SQLite to large-scale, enterprise-ready environments backed by MySQL. This is achieved through clever usage of SQLAlchemy "Variants" and abstraction layers for large text blobs.

2. The Database Layer (SQLModel Entities)

These models define the physical tables within the DECNET infrastructure. Every class marked with table=True is interpreted by the repository layer to generate the corresponding DDL (Data Definition Language) for the target database.

2.1 Identity & Security: The User Entity

The User model handles dashboard access control and basic identity management.

  • uuid: A unique string identifier. While integers are often used for IDs, DECNET uses strings to support potential future transitions to UUIDs without schema breakage.
  • username: The primary login handle. It is both unique and indexed for rapid authentication lookups.
  • password_hash: Stores the Argon2 or bcrypt hash. Length constraints in various routers ensure that raw passwords never exceed 72 characters, preventing "Long Password Denial of Service" attacks on various hashing algorithms.
  • role: A simple string-based permission field (e.g., admin, viewer).
  • must_change_password: A boolean flag used for fresh deployments or manual administrative resets, forcing the user to rotate their credentials upon their first authenticated session.

2.2 Intelligence & Attribution: Attacker and AttackerBehavior

These two tables form the core of DECNET's "Attacker Profiling" system. They are split into two tables to maintain "Narrow vs. Wide" performance characteristics.

The Attacker Entity (Broad Analytics)

The Attacker table stores the "primary" record for every unique IP discovered by the honeypot fleet.

  • ip: The source IP address. This is the primary key and is heavily indexed.
  • first_seen / last_seen: Tracking the lifecycle of an attacker's engagement with the network.
  • event_count / service_count / decky_count: Aggregated counters used by the stats dashboard to visualize the magnitude of an engagement.
  • services / deckies: JSON-serialized lists of every service and machine reached by the attacker. Using _BIG_TEXT here allows these lists to grow significantly during long-term campaigns.
  • traversal_path: A string representation (e.g., omega → epsilon → zulu) that helps analysts visualize lateral movement attempts recorded by the correlation engine.

The AttackerBehavior Entity (Granular Analytics)

This "Wide" table stores behavioral signatures. It is separated from the main Attacker record so that high-frequency updates to timing stats or sniffer-derived packet signatures don't lock the primary attribution rows.

  • os_guess: Derived from the os_fingerprint and sniffer engines, providing an estimate of the attacker's operating system based on TCP/IP stack nuances.
  • tcp_fingerprint: A JSON blob storing the raw TCP signature (Window size, MSS, Option sequence).
  • behavior_class: A classification (e.g., beaconing, interactive, brute_force) derived from log inter-arrival timing (IAT).
  • timing_stats: Stores a JSON dictionary of mean/median/stdev for event timing, used to detect automated tooling.

2.3 Telemetry: Log and Bounty

These tables store the "raw" data generated by the honeypots.

  • Log Table: The primary event sink. Every line from the collector ends up here.
    • event_type: The MSGID from the RFC 5424 header (e.g., connect, exploit).
    • raw_line: The full, un-parsed syslog string for forensic verification.
    • fields: A JSON blob containing the structured data (SD-ELEMENTS) extracted during normalization.
  • Bounty Table: Specifically for high-value events. When a service detects "Gold" (like a plain-text password or a known PoC payload), it is mirrored here for rapid analyst review.

2.4 System State: The State Entity

The State table acts as the orchestrator's brain. It stores the decnet-state.json content within the database when the system is integrated with the web layer.

  • key: The configuration key (e.g., global_config, active_deployment).
  • value: A MEDIUMTEXT JSON blob. This is potentially its largest field, storing the entire resolved configuration of every running Decky.

3. The API Layer (Pydantic DTOs)

These models define how data moves across the wire between the FastAPI backend and the frontend.

3.1 Authentication Pipeline

  • LoginRequest: Validates incoming credentials before passing them to the security middleware.
  • Token: The standard OAuth2 bearer token response, enriched with the must_change_password hint.
  • ChangePasswordRequest: Ensures the old password is provided and the new one meets the project's security constraints.

3.2 Reporting & Pagination

DECNET uses a standardized "Envelope" pattern for broad analytical responses (LogsResponse, AttackersResponse, BountyResponse).

  • total: The total count of matching records in the database (ignoring filters).
  • limit / offset: The specific slice of data returned, supporting "Infinite Scroll" or traditional pagination in the UI.
  • data: A list of dictionaries. By using dict[str, Any] here, the API remains flexible with SQLModel's dynamic attribute loading.

3.3 System Administration

  • DeployIniRequest: The most critical input model. It takes ini_content as a validated string. By using the IniContent annotated type, the API rejects malformed deployments before they ever touch the fleet builder.
  • MutateIntervalRequest: Uses a strict REGEX pattern (^[1-9]\d*[mdMyY]$) to ensure intervals like 30m (30 minutes) or 2d (2 days) are valid before being applied to the orchestrator.

4. Technical Foundations

4.1 Cross-DB Compatibility Logic

The project uses a custom variant system to handle the discrepancies between SQLite (which has simplified typing) and MySQL (which has strict size constraints).

_BIG_TEXT = Text().with_variant(MEDIUMTEXT(), "mysql")

This abstraction ensures that fields like Attacker.services (which can grow to thousands of items) are stored as MEDIUMTEXT (16 MiB) on MySQL, whereas standard SQLAlchemy Text (often 64 KiB on MySQL) would silently truncate the data, leading to analytical loss.

4.2 High-Fidelity Normalization

Data arriving from distributed honeypots is often "dirty." The models include custom pre-validators like _normalize_null.

  • Null Coalescing: Services often emit logging values as "null" or "undefined" strings. The NullableString type automatically converts these "noise" strings into actual Python None types during ingestion.
  • Timestamp Integrity: NullableDatetime ensures that various ISO formats or epoch timestamps provided by different service containers are normalized into standard UTC datetime objects.

5. Integration Case Studies (Deep Analysis)

To understand how these models function, we must examine their lifecycle across the web stack.

5.1 The Repository Layer (decnet/web/db/sqlmodel_repo.py)

The repository is the primary consumer of the "Entities." It utilizes the metadata generated by SQLModel to:

  1. Generate DDL: On startup, the repository calls SQLModel.metadata.create_all(). This takes every table=True class and translates it into CREATE TABLE statements tailored to the active engine (SQLite or MySQL).
  2. Translate DTOs: When the repository fetches an Attacker from the DB, SQLModel automatically populates the Pydantic-style attributes, allowing the repository to return objects that are immediately serializeable by the routers.

5.2 The Dashboard Routers

Specific endpoints rely on these models for boundary safety:

  • api_deploy_deckies.py: Uses DeployIniRequest. This ensures that even if a user tries to POST a massive binary file instead of an INI, the Pydantic layer (powered by decnet.models.validate_ini_string) will intercept and reject the request with a 422 Unprocessable Entity error before it reaches the orchestrator.
  • api_get_stats.py: Uses StatsResponse. This model serves as a "rollup" that aggregates data from the Log, Attacker, and State tables into a single unified JSON object for the dashboard's "At a Glance" view.
  • api_get_health.py: Uses HealthResponse. This model provides a nested view of the system, where each sub-component (Engine, Collector, DB) is represented as a ComponentHealth object, allowing the UI to show granular "Success" or "Failure" states.

6. Futureproofing & Guidelines

As the project grows, the following habits must be maintained:

  1. Keep the Row Narrow: Always separate behavioral data that updates frequently into auxiliary tables like AttackerBehavior.
  2. Use Variants: Never use standard String or Text for JSON blobs; always use _BIG_TEXT to respect MySQL's storage limitations.
  3. Validate at the Boundary: Ensure every new API request model uses Pydantic's strict typing to prevent malicious payloads from reaching the database layer.