docs: document decnet collector worker
63
development/docs/services/COLLECTOR.md
Normal file
@@ -0,0 +1,63 @@
# DECNET Collector

The `decnet/collector` module is responsible for the background acquisition, normalization, and filtering of logs generated by the honeypot fleet. It acts as the bridge between the transient Docker container logs and the persistent analytical database.

## Architecture

The Collector runs as a host-side worker (typically managed by the CLI or a daemon). It employs a hybrid asynchronous and multi-threaded model to handle log streaming from a dynamic number of containers without blocking the main event loop.

### Log Pipeline Flow

1. **Discovery**: Scans `decnet-state.json` to identify active Decky service containers.
2. **Streaming**: Spawns a dedicated thread for every active container to tail its `stdout` via the Docker SDK.
3. **Normalization**: Parses the raw RFC 5424 Syslog lines into structured JSON.
4. **Filtering**: Applies a rate-limiter to deduplicate high-frequency connection events.
5. **Storage**: Appends raw lines to `.log` and filtered JSON to `.json` for database ingestion.

---

## Core Components

### `worker.py`

#### `log_collector_worker(log_file: str)`

The main asynchronous entry point.

- **Initial Scan**: Identifies all running containers that match the DECNET service naming convention.
- **Event Loop**: Uses the Docker `events` API to listen for `container:start` events, allowing it to automatically pick up new Deckies that are deployed after the collector has started.
- **Task Management**: Manages a dictionary of active streaming tasks, ensuring no container is streamed more than once and cleaning up completed tasks.

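The task-management behaviour described above can be sketched without Docker. This is a minimal illustration, not the worker's actual code: `active`, `stream_container`, and `ensure_streaming` are hypothetical names, and the streamer is a stand-in that finishes immediately.

```python
import asyncio

# Hypothetical registry: container name -> streaming task.
active: dict[str, asyncio.Task] = {}

async def stream_container(container_id: str) -> None:
    """Stand-in for the real log streamer; it only yields to the loop once."""
    await asyncio.sleep(0)

def ensure_streaming(container_id: str) -> bool:
    """Start a streaming task unless one is already running for this container."""
    # Drop finished tasks so a restarted container can be picked up again.
    for cid in [c for c, t in active.items() if t.done()]:
        del active[cid]
    if container_id in active:
        return False
    active[container_id] = asyncio.create_task(stream_container(container_id))
    return True

async def demo() -> list:
    first = ensure_streaming("decky1-ssh")
    duplicate = ensure_streaming("decky1-ssh")  # ignored: already streaming
    await asyncio.gather(*active.values())      # let the stand-in finish
    restarted = ensure_streaming("decky1-ssh")  # accepted: old task is done
    await asyncio.gather(*active.values())
    return [first, duplicate, restarted]

results = asyncio.run(demo())
```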
---

## Log Normalization (RFC 5424)

DECNET services emit logs using a standardized RFC 5424 format with structured data. The `parse_rfc5424` function is the primary tool for extracting this information.

- **Structured Data**: Extracts parameters from the `decnet@55555` SD-ELEMENT.
- **Field Mapping**: Identifies the `attacker_ip` by scanning common source IP fields (`src_ip`, `client_ip`, etc.).
- **Consistency**: Formats timestamps into a human-readable `%Y-%m-%d %H:%M:%S` format for the analytical stream.

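The three steps above might look roughly like the following. This is a sketch under stated assumptions, not the real `parse_rfc5424`: the function name `parse_line_sketch`, the output keys, and the mapping of HOSTNAME, APP-NAME, and MSGID to `decky`, `service`, and `event_type` are all illustrative guesses, and the sample line is invented. It also ignores escaped `]` characters inside parameter values.

```python
import re
from datetime import datetime
from typing import Optional

# RFC 5424 header fields, then the decnet SD-ELEMENT (or "-"), then optional MSG.
LINE_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>1 (?P<ts>\S+) (?P<host>\S+) (?P<app>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) "
    r"(?:-|\[decnet@55555(?P<sd>(?: [^\]]*)?)\])(?: (?P<msg>.*))?$"
)
PARAM_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

# Assumed subset: the source names these two and ends the list with "etc."
IP_FIELDS = ("src_ip", "client_ip")

def parse_line_sketch(line: str) -> Optional[dict]:
    """Return a structured record, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    fields = dict(PARAM_RE.findall(m.group("sd") or ""))
    # fromisoformat() on older Pythons rejects a trailing "Z", so normalize it.
    ts = datetime.fromisoformat(m.group("ts").replace("Z", "+00:00"))
    return {
        "timestamp": ts.strftime("%Y-%m-%d %H:%M:%S"),
        "decky": m.group("host"),        # assumption: HOSTNAME carries the decky
        "service": m.group("app"),       # assumption: APP-NAME carries the service
        "event_type": m.group("msgid"),  # assumption: MSGID carries the event type
        "attacker_ip": next((fields[k] for k in IP_FIELDS if k in fields), None),
        "fields": fields,
        "msg": m.group("msg") or "",
    }

sample = (
    '<134>1 2024-05-01T12:34:56Z decky1 ssh - connect '
    '[decnet@55555 src_ip="203.0.113.7" src_port="51234"] new connection'
)
record = parse_line_sketch(sample)
```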
---

## Ingestion Rate Limiter

To prevent the local SQLite database from being overwhelmed during credential-stuffing attacks or heavy port scanning, the Collector implements a window-based rate limiter for "lifecycle" events.

- **Scope**: By default, it limits: `connect`, `disconnect`, `connection`, `accept`, and `close`.
- **Logic**: It groups events by `(attacker_ip, decky, service, event_type)`. If an event with the same key recurs within the window, it is still written to the raw `.log` file (for forensics) but **discarded** from the `.json` stream (ingestion).
- **Configuration**:
  - `DECNET_COLLECTOR_RL_WINDOW_SEC`: The deduplication window size (default: 1.0s).
  - `DECNET_COLLECTOR_RL_EVENT_TYPES`: Comma-separated list of event types to limit.

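A window-based limiter with this key and these two environment variables could be sketched as follows. The class and method names are hypothetical. The sketch also assumes that a suppressed event does not extend the window, which the text above does not specify, and it omits eviction of stale keys.

```python
import os
import time
from typing import Optional

DEFAULT_TYPES = "connect,disconnect,connection,accept,close"
WINDOW_SEC = float(os.environ.get("DECNET_COLLECTOR_RL_WINDOW_SEC", "1.0"))
LIMITED_TYPES = frozenset(
    t.strip()
    for t in os.environ.get("DECNET_COLLECTOR_RL_EVENT_TYPES", DEFAULT_TYPES).split(",")
    if t.strip()
)

class WindowRateLimiter:
    """Decides whether an event should reach the .json (ingestion) stream."""

    def __init__(self, window_sec: float = WINDOW_SEC, limited_types=LIMITED_TYPES):
        self.window_sec = window_sec
        self.limited_types = limited_types
        self._last_accepted = {}  # key -> time of the last accepted event

    def should_ingest(self, event: dict, now: Optional[float] = None) -> bool:
        if event.get("event_type") not in self.limited_types:
            return True  # only lifecycle events are rate limited
        key = (event.get("attacker_ip"), event.get("decky"),
               event.get("service"), event.get("event_type"))
        now = time.monotonic() if now is None else now
        last = self._last_accepted.get(key)
        if last is not None and now - last < self.window_sec:
            return False  # duplicate inside the window: keep out of .json
        self._last_accepted[key] = now
        return True

# Explicit settings and timestamps keep the demo deterministic.
limiter = WindowRateLimiter(window_sec=1.0, limited_types=frozenset({"connect"}))
event = {"attacker_ip": "203.0.113.7", "decky": "decky1",
         "service": "ssh", "event_type": "connect"}
decisions = [limiter.should_ingest(event, now=t) for t in (0.0, 0.4, 1.2)]
login = dict(event, event_type="login_attempt")
unlimited = [limiter.should_ingest(login, now=t) for t in (0.0, 0.1)]
```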
---

## Resilience & Operational Stability

### Inode Tracking (`_reopen_if_needed`)

Log files can be rotated by `logrotate` or manually deleted. The Collector tracks the **inode** of each open log handle. If the inode of the path on disk no longer matches the handle (indicating rotation or deletion), the collector transparently closes and reopens the handle, ensuring no logs are lost and preventing "stale handle" errors.

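The inode comparison can be sketched as below. Only the method name `_reopen_if_needed` comes from this document; the wrapper class `ReopeningAppender` and the method body are assumptions about how such a check is commonly written on POSIX systems.

```python
import os
import tempfile

class ReopeningAppender:
    """Append-only writer that reopens its file after rotation or deletion."""

    def __init__(self, path: str):
        self.path = path
        self._fh = open(path, "a", encoding="utf-8")

    def _reopen_if_needed(self) -> None:
        try:
            on_disk = os.stat(self.path)
            handle = os.fstat(self._fh.fileno())
            same = (on_disk.st_ino, on_disk.st_dev) == (handle.st_ino, handle.st_dev)
        except FileNotFoundError:
            same = False  # the path was deleted or moved away
        if not same:
            self._fh.close()
            self._fh = open(self.path, "a", encoding="utf-8")

    def write_line(self, line: str) -> None:
        self._reopen_if_needed()
        self._fh.write(line + "\n")
        self._fh.flush()

    def close(self) -> None:
        self._fh.close()

# Simulate a logrotate-style rename between two writes.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "collector.log")
    writer = ReopeningAppender(path)
    writer.write_line("before rotation")
    os.rename(path, path + ".1")
    writer.write_line("after rotation")
    writer.close()
    with open(path, encoding="utf-8") as f:
        fresh = f.read()
    with open(path + ".1", encoding="utf-8") as f:
        rotated = f.read()
```

In this run the rotated file keeps the first line and the recreated file receives the second, so nothing is written to the stale handle.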
### Docker SDK Integration

The Collector uses `asyncio.to_thread` to run the blocking Docker SDK `logs(stream=True)` calls. This ensures that the high-latency network calls to the Docker daemon do not starve the asynchronous event loop responsible for monitoring container starts.

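To illustrate the pattern without a Docker daemon, the sketch below replaces the SDK call with a hypothetical blocking generator, `blocking_log_stream`, and shows that a coroutine on the event loop keeps running while the stream is drained in a worker thread. All function names here are illustrative.

```python
import asyncio
import time

def blocking_log_stream():
    """Stand-in for the blocking iterator returned by `logs(stream=True)`."""
    for i in range(3):
        time.sleep(0.05)  # simulates waiting on the daemon
        yield f"line {i}".encode()

def consume(stream, sink: list) -> None:
    """Runs in a worker thread and drains the blocking iterator."""
    for chunk in stream:
        sink.append(chunk.decode())

async def heartbeat(ticks: list) -> None:
    """Loop-side work that must stay responsive while logs stream."""
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)

async def main():
    lines, ticks = [], []
    await asyncio.gather(
        asyncio.to_thread(consume, blocking_log_stream(), lines),
        heartbeat(ticks),
    )
    return lines, ticks

lines, ticks = asyncio.run(main())
```

`asyncio.to_thread` requires Python 3.9 or newer; on older interpreters `loop.run_in_executor` serves the same purpose.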
### Container Identification

The Collector uses two layers of verification to ensure it only collects logs from DECNET honeypots:

1. **Name Matching**: Checks if the container name matches the `{decky}-{service}` pattern.
2. **State Verification**: Cross-references container names with the current `decnet-state.json`.
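
The two checks can be combined in one predicate. The sketch below is illustrative only: the layout of `decnet-state.json` is not described here, so the `deckies`/`name`/`services` structure is a hypothetical schema, and `expected_names` and `is_decnet_container` are made-up helper names.

```python
def expected_names(state: dict) -> set:
    """Build every `{decky}-{service}` name implied by the (assumed) state layout."""
    return {
        f"{decky['name']}-{service}"
        for decky in state.get("deckies", [])
        for service in decky.get("services", [])
    }

def is_decnet_container(name: str, state: dict) -> bool:
    # The Docker Engine API reports container names with a leading slash.
    name = name.lstrip("/")
    # Layer 1: the name must split into a non-empty decky and service part.
    decky, sep, service = name.rpartition("-")
    if not (decky and sep and service):
        return False
    # Layer 2: the name must also be present in the current state.
    return name in expected_names(state)

# Hypothetical state contents, shaped only for this example.
state = {"deckies": [{"name": "decky1", "services": ["ssh", "http"]}]}
checks = [
    is_decnet_container(n, state)
    for n in ("decky1-ssh", "/decky1-http", "decky1-ftp", "postgres")
]
```

Here `decky1-ftp` passes the name check but fails state verification, while `postgres` is rejected by the name check alone.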