Seven instrumenters that mutate operator-supplied artifacts to
embed the callback URL:
- passthrough — bytes unchanged; only DNS-callback tokens trip
detection, with the slug embedded in the placement path
- plain — substitutes {{CANARY_URL}}/{{CANARY_HOST}} placeholders;
falls back to appending a comment line whose prefix adapts to the
apparent file syntax (#, //, ;)
- html — injects a 1x1 tracking pixel before </body>; appends it to
the end of the document if the closing tag is missing
- docx — direct zipfile manipulation (no python-docx dep):
inserts an external-image Relationship into word/_rels/document.xml.rels
and a matching <w:drawing> element before </w:body>
- xlsx — sibling of docx; injects an external-image relationship
into xl/_rels/workbook.xml.rels (orphan rels are still fetched on
open by most viewers)
- pdf — uses pikepdf to install /OpenAction /URI on the catalog;
rejects with a clear message when pikepdf isn't installed
- image — uses Pillow to embed slug + URL in PNG tEXt / JPEG
comment; rejects with a clear message when Pillow isn't installed
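The plain and html behaviours described above can be sketched roughly as follows. This is an illustration only, not the project's code: the function names, the extension-to-prefix table, the `#` default, and the exact pixel markup are all assumptions made for the example.

```python
import re
from html import escape
from pathlib import PurePosixPath

# Hypothetical extension-to-prefix table; the real syntax detection is
# not shown in the commit message.
COMMENT_PREFIXES = {
    ".py": "#", ".sh": "#", ".yml": "#", ".yaml": "#",
    ".js": "//", ".ts": "//", ".c": "//", ".go": "//",
    ".ini": ";", ".asm": ";",
}


def instrument_plain(text: str, url: str, host: str, target_path: str) -> str:
    """Fill placeholders if present, otherwise append a comment line."""
    if "{{CANARY_URL}}" in text or "{{CANARY_HOST}}" in text:
        return text.replace("{{CANARY_URL}}", url).replace("{{CANARY_HOST}}", host)
    suffix = PurePosixPath(target_path).suffix.lower()
    prefix = COMMENT_PREFIXES.get(suffix, "#")  # assumed default prefix
    newline = "" if (not text or text.endswith("\n")) else "\n"
    return f"{text}{newline}{prefix} {url}\n"


def instrument_html(markup: str, url: str) -> str:
    """Insert a 1x1 image before the last </body>, or append it."""
    pixel = f'<img src="{escape(url, quote=True)}" width="1" height="1" alt="">'
    closers = list(re.finditer(r"</body\s*>", markup, flags=re.IGNORECASE))
    if not closers:
        return markup + pixel
    cut = closers[-1].start()
    return markup[:cut] + pixel + markup[cut:]
```

Matching the closing tag case-insensitively and taking the last occurrence keeps the pixel inside the body even when the markup uses `</BODY>` or mentions the tag earlier in a script or comment.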
DOCX and XLSX share the rId allocator + relationship injector via
the docx module; both work on stdlib zipfile only.
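A shared rId allocator and relationship injector built on stdlib zipfile might look something like the sketch below. The helper names (`next_rid`, `inject_external_image`, `patch_zip_member`) are hypothetical and the string-based patching is an assumption; the actual docx module may parse the XML differently.

```python
from __future__ import annotations

import io
import re
import zipfile
from typing import Callable
from xml.sax.saxutils import quoteattr

IMAGE_REL_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
)


def next_rid(rels_xml: str) -> str:
    """Return an rId one higher than any rId<N> already in the part."""
    used = [int(n) for n in re.findall(r'\bId="rId(\d+)"', rels_xml)]
    return f"rId{max(used, default=0) + 1}"


def inject_external_image(rels_xml: str, url: str) -> tuple[str, str]:
    """Append an external-image Relationship; return (patched_xml, rid)."""
    closer = "</Relationships>"
    if closer not in rels_xml:
        raise ValueError("rels part has no </Relationships> closing tag")
    rid = next_rid(rels_xml)
    rel = (
        f'<Relationship Id="{rid}" Type="{IMAGE_REL_TYPE}" '
        f'Target={quoteattr(url)} TargetMode="External"/>'
    )
    head, _, tail = rels_xml.rpartition(closer)
    return head + rel + closer + tail, rid


def patch_zip_member(blob: bytes, member: str, patch: Callable[[str], str]) -> bytes:
    """Copy a ZIP archive, rewriting one UTF-8 member through `patch`."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(blob)) as src, zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == member:
                data = patch(data.decode("utf-8")).encode("utf-8")
            # Passing the ZipInfo keeps each member's name, timestamp and
            # compression type as recorded in the source archive.
            dst.writestr(info, data)
    return out.getvalue()
```

With helpers of this shape, the DOCX and XLSX cases would differ only in the member path passed to `patch_zip_member` (word/_rels/document.xml.rels versus xl/_rels/workbook.xml.rels) and in whether a matching drawing element is also written.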
Tests synthesise minimal real DOCX/XLSX fixtures inline, round-trip
each instrumenter, and assert the callback URL ends up in the
mutated bytes while the file still parses.
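A round-trip test of this kind could be structured along the following lines. The fixture contents, helper names, and the `fake_instrument` stand-in are assumptions for illustration; they are not the project's test code. The URL is looked up in the decompressed parts because a raw byte search may miss it once a member is DEFLATE-compressed.

```python
from __future__ import annotations

import io
import zipfile
import xml.etree.ElementTree as ET

# Illustrative minimal parts; not copied from the project's fixtures.
DOCX_PARTS = {
    "[Content_Types].xml": (
        '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
        '<Default Extension="rels" ContentType='
        '"application/vnd.openxmlformats-package.relationships+xml"/>'
        '<Default Extension="xml" ContentType="application/xml"/>'
        "</Types>"
    ),
    "word/document.xml": (
        '<w:document xmlns:w='
        '"http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        "<w:body><w:p><w:r><w:t>Q3 payroll</w:t></w:r></w:p></w:body>"
        "</w:document>"
    ),
    "word/_rels/document.xml.rels": (
        '<Relationships xmlns='
        '"http://schemas.openxmlformats.org/package/2006/relationships">'
        "</Relationships>"
    ),
}


def make_docx(parts: dict[str, str]) -> bytes:
    """Zip the given parts into an in-memory DOCX-style archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, body in parts.items():
            zf.writestr(name, body)
    return buf.getvalue()


def read_parts(blob: bytes) -> dict[str, str]:
    """Open the archive, verify CRCs, and parse every XML part."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        assert zf.testzip() is None
        parts = {n: zf.read(n).decode("utf-8") for n in zf.namelist()}
    for body in parts.values():
        ET.fromstring(body)  # raises ParseError on malformed XML
    return parts


def fake_instrument(blob: bytes, url: str) -> bytes:
    """Stand-in mutation: add one external relationship to the rels part."""
    parts = read_parts(blob)
    rels = "word/_rels/document.xml.rels"
    rel = f'<Relationship Id="rId1" Type="image" Target="{url}" TargetMode="External"/>'
    parts[rels] = parts[rels].replace("</Relationships>", rel + "</Relationships>")
    return make_docx(parts)
```

A test would then build the fixture with `make_docx(DOCX_PARTS)`, run the instrumenter under test in place of `fake_instrument`, and call `read_parts` on the result to confirm both that the archive still parses and that the URL is present.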
77 lines
2.8 KiB
Python
"""PDF instrumenter — requires :mod:`pikepdf` (optional dependency).

PDF embedding is non-trivial: the cleanest place to put a callback
is an ``/AA`` (additional actions) ``/O`` (open) entry on the
catalog or a ``/URI`` action on a link annotation. Either path
needs proper xref-table updates — pikepdf handles that for us.

If pikepdf isn't available in the environment the instrumenter
raises :class:`InstrumenterRejectedError` so the API can return a
clear 400 directing the operator to either install pikepdf or
re-upload as ``passthrough``.

We don't ship a stdlib fallback because every "naive" PDF mutation
I'm aware of (appending raw bytes, splicing into the trailer, etc.)
breaks the document's xref table and trips a "file is corrupt"
warning in modern viewers — which the attacker will absolutely
notice.
"""
from __future__ import annotations

from decnet.canary.base import (
    CanaryArtifact,
    CanaryContext,
    CanaryInstrumenter,
    InstrumenterRejectedError,
)


class PdfInstrumenter(CanaryInstrumenter):
    name = "pdf"
    mime_prefixes = ("application/pdf",)

    def instrument(
        self, blob: bytes, ctx: CanaryContext, *, target_path: str,
    ) -> CanaryArtifact:
        try:
            import pikepdf  # type: ignore[import-not-found]
        except ImportError as e:
            raise InstrumenterRejectedError(
                "PDF instrumenter requires pikepdf; install it (`pip "
                "install pikepdf`) or re-upload the artifact with "
                "kind=passthrough so it ships unmodified."
            ) from e

        url = f"{ctx.http_base.rstrip('/')}/c/{ctx.callback_token}"
        try:
            import io
            buf = io.BytesIO(blob)
            with pikepdf.open(buf) as pdf:
                # Add an OpenAction that fires a URI action on document
                # open. Most viewers prompt before fetching; that's
                # fine — even the prompt itself can trip a "user
                # interacted with the document" tell, and an
                # auto-allow viewer fetches the URL silently.
                action = pikepdf.Dictionary(
                    Type=pikepdf.Name("/Action"),
                    S=pikepdf.Name("/URI"),
                    URI=pikepdf.String(url),
                )
                pdf.Root[pikepdf.Name("/OpenAction")] = action
                out = io.BytesIO()
                pdf.save(out)
                mutated = out.getvalue()
        except Exception as e:
            raise InstrumenterRejectedError(
                f"failed to instrument PDF: {e!s}"
            ) from e

        return CanaryArtifact(
            path=target_path,
            content=mutated,
            mode=0o644,
            mtime_offset=-86400 * 14,
            instrumenter=self.name,
            notes=[f"installed /OpenAction /URI -> {url}"],
        )
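The optional-dependency handling in `instrument` can be viewed in isolation as a small guard that turns a missing import into a rejection the API layer can report. The sketch below uses a stand-in exception class and a hypothetical `require_optional` helper; neither is claimed to exist in decnet.

```python
import importlib
from types import ModuleType


class InstrumenterRejectedError(Exception):
    """Stand-in for the class the module imports from decnet.canary.base."""


def require_optional(module_name: str, hint: str) -> ModuleType:
    """Import an optional dependency or raise a rejection with a fix hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Chain the original error so logs still show why the import failed.
        raise InstrumenterRejectedError(
            f"{module_name} is required for this instrumenter; {hint}"
        ) from exc
```

Deferring the import to call time, as the instrumenter does, means the package still loads without pikepdf, and only a PDF upload surfaces the rejection.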