DECNET/decnet/realism/llm/impl/ollama.py
anti 0b9873982d refactor(realism): move emailgen LLM/personas/prompt into shared library
Lift the format-agnostic pieces from decnet/orchestrator/emailgen/
into the new decnet/realism/ library so file-class content generation
(stage 3 of the realism migration) can reuse them. Email-specific
delivery (RFC 2822 EML, IMAP/POP3 spool, thread chains) stays in
orchestrator/.

Renames (history-preserving git mv):
  emailgen/personas.py     -> realism/personas.py
  emailgen/prompt.py       -> realism/prompts/email.py
  emailgen/global_pool.py  -> realism/personas_pool.py
  emailgen/llm/            -> realism/llm/

Env-var clean break (pre-v1, no aliases):
  DECNET_EMAILGEN_LLM      -> DECNET_REALISM_LLM
  DECNET_EMAILGEN_MODEL    -> DECNET_REALISM_MODEL
  DECNET_EMAILGEN_TIMEOUT  -> DECNET_REALISM_TIMEOUT
  DECNET_EMAILGEN_PERSONAS -> DECNET_REALISM_PERSONAS
  DECNET_EMAILGEN_FAKE_OUTPUT -> DECNET_REALISM_FAKE_OUTPUT
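
Because the rename ships without aliases, a deployment that still exports the old names would fall back to the defaults without any warning. A minimal sketch of a startup check that reports such leftovers, assuming nothing about the actual codebase: `RENAMED_ENV_VARS` and `find_stale_env_vars` are hypothetical names, and only the variable names themselves come from the list above.

```python
import os
from typing import List, Mapping, Optional, Tuple

# Old -> new names, copied from the rename list in this commit message.
RENAMED_ENV_VARS = {
    "DECNET_EMAILGEN_LLM": "DECNET_REALISM_LLM",
    "DECNET_EMAILGEN_MODEL": "DECNET_REALISM_MODEL",
    "DECNET_EMAILGEN_TIMEOUT": "DECNET_REALISM_TIMEOUT",
    "DECNET_EMAILGEN_PERSONAS": "DECNET_REALISM_PERSONAS",
    "DECNET_EMAILGEN_FAKE_OUTPUT": "DECNET_REALISM_FAKE_OUTPUT",
}


def find_stale_env_vars(
    environ: Optional[Mapping[str, str]] = None,
) -> List[Tuple[str, str]]:
    """Return (old, new) pairs where the old name is set and the new one is not."""
    env = os.environ if environ is None else environ
    return [
        (old, new)
        for old, new in RENAMED_ENV_VARS.items()
        if old in env and new not in env
    ]


if __name__ == "__main__":
    for old, new in find_stale_env_vars():
        print(f"{old} is set but no longer read; use {new} instead")
```

Passing a plain dict instead of relying on `os.environ` keeps the check easy to exercise in tests.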

Importers rewritten in: orchestrator/emailgen/scheduler.py,
orchestrator/drivers/email.py, web/router/{emailgen,topology}/
api_personas.py, cli/emailgen.py. Tests for moved modules relocated
to tests/realism/; tests for stay-put modules updated in place.

API URL `/api/v1/emailgen/personas` and CLI `decnet emailgen
import-personas` keep their public names until the service-collapse
commit (stage 5).
2026-04-27 16:05:43 -04:00

"""Ollama subprocess backend.

Shells out to ``ollama run <model>`` with the prompt fed via stdin.

Why subprocess and not the Ollama HTTP API:

* No new dependency (``ollama`` Python lib is optional).
* Works on hosts where Ollama is bound to a unix socket, an unusual TCP
  port, or behind a remote-mount layer; ``ollama run`` resolves all that.
* Same path the operator uses by hand (``ollama run llama3.1``); easier
  to debug discrepancies between worker output and a console session.

Cost: per-call process spawn (~50ms on a warm box). Acceptable for
realism tick rates (one body per ~5 minutes per persona by default).
When that cost matters, swap to an HTTP-API backend; the seam is in
:mod:`decnet.realism.llm.factory`.
"""
from __future__ import annotations

import asyncio
import os
import time
from typing import Optional

from decnet.logging import get_logger
from decnet.realism.llm.base import LLMBackend, LLMResult, LLMTimeout

log = get_logger("realism.llm")

_OLLAMA = "ollama"
_DEFAULT_MODEL = os.environ.get("DECNET_REALISM_MODEL", "llama3.1")
_DEFAULT_TIMEOUT = float(os.environ.get("DECNET_REALISM_TIMEOUT", "60"))


class OllamaBackend(LLMBackend):
    """Concrete :class:`LLMBackend` that shells out to ``ollama run``."""

    def __init__(
        self,
        *,
        model: Optional[str] = None,
        timeout: Optional[float] = None,
    ) -> None:
        self.model = model or _DEFAULT_MODEL
        self.timeout = timeout if timeout is not None else _DEFAULT_TIMEOUT

    async def generate(self, prompt: str) -> LLMResult:
        t0 = time.monotonic()
        try:
            proc = await asyncio.create_subprocess_exec(
                _OLLAMA, "run", self.model,
                stdin=asyncio.subprocess.PIPE,
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE,
            )
        except FileNotFoundError as exc:
            latency_ms = int((time.monotonic() - t0) * 1000)
            return LLMResult(
                success=False,
                text="",
                model=self.model,
                latency_ms=latency_ms,
                extra={"rc": 127, "stderr": f"argv[0] not found: {exc}"},
            )
        try:
            stdout, stderr = await asyncio.wait_for(
                proc.communicate(prompt.encode("utf-8")),
                timeout=self.timeout,
            )
        except asyncio.TimeoutError as exc:
            try:
                proc.kill()
            except ProcessLookupError:
                pass
            raise LLMTimeout(
                f"ollama run {self.model} exceeded {self.timeout}s"
            ) from exc
        latency_ms = int((time.monotonic() - t0) * 1000)
        rc = proc.returncode if proc.returncode is not None else -1
        text = stdout.decode("utf-8", "replace")
        stderr_s = stderr.decode("utf-8", "replace")
        if rc != 0 or not text.strip():
            log.warning(
                "ollama backend non-zero / empty rc=%d model=%s stderr=%r",
                rc, self.model, stderr_s[:200],
            )
            return LLMResult(
                success=False,
                text=text,
                model=self.model,
                latency_ms=latency_ms,
                extra={"rc": rc, "stderr": stderr_s.strip()[:256]},
            )
        return LLMResult(
            success=True,
            text=text,
            model=self.model,
            latency_ms=latency_ms,
            extra={"rc": rc},
        )
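
The spawn, feed-stdin, and kill-on-timeout pattern used by `generate()` can be tried without Ollama or the `decnet` package. The following is a minimal standalone sketch, not the project's code. `run_with_timeout`, `ECHO_ARGV`, and `SLEEP_ARGV` are hypothetical names, and a Python child process stands in for `ollama run <model>`. Unlike the method above, the sketch also awaits `proc.wait()` after the kill so that the child has fully exited before the timeout propagates; that is one possible design choice, not something the original file does.

```python
import asyncio
import sys
from typing import List, Tuple


async def run_with_timeout(
    argv: List[str], stdin_text: str, timeout: float
) -> Tuple[int, str, str]:
    """Run argv, feed stdin_text on stdin, and return (rc, stdout, stderr).

    Re-raises asyncio.TimeoutError after killing the child if it runs too long.
    """
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        out, err = await asyncio.wait_for(
            proc.communicate(stdin_text.encode("utf-8")),
            timeout=timeout,
        )
    except asyncio.TimeoutError:
        try:
            proc.kill()
        except ProcessLookupError:
            pass  # the child exited between the timeout and the kill
        await proc.wait()  # wait until the child has actually exited
        raise
    rc = proc.returncode if proc.returncode is not None else -1
    return rc, out.decode("utf-8", "replace"), err.decode("utf-8", "replace")


# Stand-ins for `ollama run <model>`: one child upper-cases stdin,
# the other sleeps long enough to trip the timeout.
ECHO_ARGV = [
    sys.executable, "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]
SLEEP_ARGV = [sys.executable, "-c", "import time; time.sleep(30)"]


if __name__ == "__main__":
    print(asyncio.run(run_with_timeout(ECHO_ARGV, "hello", 10.0)))
```

Running the same helper with `SLEEP_ARGV` and a short timeout exercises the kill path, which is the case `generate()` maps to `LLMTimeout`.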