feat(canary): operator-upload instrumenters + tests

Seven instrumenters that mutate operator-supplied artifacts to
embed the callback URL:

- passthrough — bytes unchanged; only DNS-callback tokens trip
  detection, with the slug embedded in the placement path
- plain      — substitutes {{CANARY_URL}}/{{CANARY_HOST}} placeholders;
  falls back to appending a comment line whose prefix adapts to the
  apparent file syntax (#, //, ;)
- html       — injects a 1x1 tracking pixel before </body>, appends
  if the close tag is missing
- docx       — direct zipfile manipulation (no python-docx dep):
  inserts an external-image Relationship into word/_rels/document.xml.rels
  and a matching <w:drawing> element before </w:body>
- xlsx       — sibling of docx; injects an external-image relationship
  into xl/_rels/workbook.xml.rels (orphan rels are still fetched on
  open by most viewers)
- pdf        — uses pikepdf to install /OpenAction /URI on the catalog;
  rejects with a clear message when pikepdf isn't installed
- image      — uses Pillow to embed slug + URL in PNG tEXt / JPEG
  comment; rejects with a clear message when Pillow isn't installed

DOCX and XLSX share the rId allocator + relationship injector via
the docx module; both work on stdlib zipfile only.

Tests synthesise minimal real DOCX/XLSX fixtures inline, round-trip
each instrumenter, and assert the callback URL ends up in the
mutated bytes while the file still parses.
This commit is contained in:
2026-04-27 13:03:42 -04:00
parent c7658ea65e
commit 19ceff4417
10 changed files with 819 additions and 0 deletions

View File

@@ -0,0 +1,45 @@
"""HTML instrumenter — append a 1×1 tracking pixel.
Stdlib-only. We don't parse the HTML; we just inject the ``<img>``
tag immediately before the closing ``</body>`` (or, failing that, at
the end of the document). Most renderers that support remote images
(email previewers, IDE doc previews, browsers) will fetch it as
soon as the document is opened.
"""
from __future__ import annotations
import re
from decnet.canary.base import CanaryArtifact, CanaryContext, CanaryInstrumenter
_BODY_CLOSE = re.compile(rb"</body\s*>", re.IGNORECASE)
class HtmlInstrumenter(CanaryInstrumenter):
name = "html"
mime_prefixes = ("text/html", "application/xhtml+xml")
def instrument(
self, blob: bytes, ctx: CanaryContext, *, target_path: str,
) -> CanaryArtifact:
url = f"{ctx.http_base.rstrip('/')}/c/{ctx.callback_token}".encode()
pixel = (
b"<img src=\"" + url + b"\" width=\"1\" height=\"1\" "
b"alt=\"\" style=\"display:none\">\n"
)
match = _BODY_CLOSE.search(blob)
if match:
out = blob[:match.start()] + pixel + blob[match.start():]
note = "injected 1x1 pixel before </body>"
else:
out = (blob if blob.endswith(b"\n") else blob + b"\n") + pixel
note = "appended 1x1 pixel (no </body> found)"
return CanaryArtifact(
path=target_path,
content=out,
mode=0o644,
mtime_offset=-86400 * 7,
instrumenter=self.name,
notes=[note, f"pixel src={url.decode()}"],
)