feat(ua): classify User-Agent into scanner/cli/library/bot/nonstandard

Every http_useragent bounty now carries a `category` label plus an
optional tool name and a signals list. The main analytic win is the
`nonstandard` bucket — UAs like "FUCKYOU/1.0" or custom one-off
scanner labels that don't match any known pattern, which today
silently blend into the generic fingerprint list.

Buckets (priority order):

- scanner: nmap, nuclei, sqlmap, gobuster, nikto, masscan, zgrab,
  ffuf, wpscan, katana, burp, acunetix, nessus, openvas, arachni,
  whatweb, wappalyzer, etc.
- cli: curl, wget, httpie, xh, fetch.
- library: python-requests, aiohttp, httpx, urllib, Go stdlib, Java,
  okhttp, Apache HttpClient, axios, node-fetch, got, undici, PHP,
  Guzzle, Ruby stdlib, Faraday, .NET, PostmanRuntime, Insomnia, etc.
- bot: anything containing bot / crawler / spider / slurp / monitor
  (catches Googlebot, bingbot, Baiduspider — many of which ship a
  Mozilla/5.0 prefix, so the bot check runs BEFORE the browser
  regex).
- browser: Mozilla/5.0-prefixed UAs that aren't bots.
- nonstandard: any non-empty UA that matches none of the above. The
  interesting bucket.
- empty: literal empty User-Agent header.
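
A minimal sketch of the priority-ordered bucket check described above. The
function name, the regex tables, and the placement of the empty check first
are assumptions for illustration, not the code in this commit; the real
pattern lists are longer.

```python
import re

# Hypothetical, abbreviated pattern tables (assumed, not the commit's own).
SCANNER_RE = re.compile(
    r"nmap|nuclei|sqlmap|gobuster|nikto|masscan|zgrab|ffuf|wpscan", re.I
)
CLI_RE = re.compile(r"^(curl|wget|httpie|xh)/", re.I)
LIBRARY_RE = re.compile(
    r"python-requests|aiohttp|okhttp|go-http-client|axios|node-fetch|guzzle",
    re.I,
)
BOT_RE = re.compile(r"bot|crawler|spider|slurp|monitor", re.I)
BROWSER_RE = re.compile(r"^Mozilla/5\.0\b")


def classify_category(ua: str) -> str:
    """Return the first bucket whose pattern matches, in priority order."""
    if ua == "":
        # Assumption: the empty check runs first so "" never reaches
        # the catch-all nonstandard bucket.
        return "empty"
    if SCANNER_RE.search(ua):
        return "scanner"
    if CLI_RE.search(ua):
        return "cli"
    if LIBRARY_RE.search(ua):
        return "library"
    if BOT_RE.search(ua):
        # Must run before the browser check: many crawlers send a
        # Mozilla/5.0 prefix.
        return "bot"
    if BROWSER_RE.search(ua):
        return "browser"
    return "nonstandard"
```

With these tables, a Googlebot UA that begins with `Mozilla/5.0` lands in
`bot` rather than `browser`, and a string such as "FUCKYOU/1.0" falls through
to `nonstandard`.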

Side signals computed regardless of category: suspicious_short (<8
chars), suspicious_long (>512 chars), nonprintable (control chars),
injection_like (SQLi / XSS / path-traversal / Log4Shell markers).
A sqlmap UA with a literal SQL-injection payload embedded fires
category=scanner + injection_like; the combination suggests the
tool is being driven by hand rather than run on its default config.
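
A rough sketch of how the side signals could be computed independently of
the category. The function name and the injection regex are assumptions made
for illustration; the marker list in the commit is likely broader.

```python
import re

# Hypothetical marker set: a basic SQLi tautology, UNION SELECT, an XSS
# script tag, path traversal, and the Log4Shell JNDI lookup prefix.
INJECTION_RE = re.compile(
    r"'\s*or\s+'?1'?\s*=\s*'?1|union\s+select|<script|\.\./|\$\{jndi:",
    re.I,
)


def side_signals(ua: str) -> list:
    """Return signal names in a fixed order so the output is deterministic."""
    signals = []
    if len(ua) < 8:
        signals.append("suspicious_short")
    if len(ua) > 512:
        signals.append("suspicious_long")
    # ASCII control characters (0x00-0x1F) and DEL (0x7F).
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in ua):
        signals.append("nonprintable")
    if INJECTION_RE.search(ua):
        signals.append("injection_like")
    return signals
```

Because the signals do not depend on the bucket, a UA such as
`sqlmap/1.7 ' OR '1'='1` would be labelled `scanner` by the category check
and still pick up `injection_like` here.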

Classification is deterministic (same UA string → same tuple) so
add_bounty's payload-hash dedup continues to collapse repeat rows.
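
To illustrate why determinism matters for dedup, here is a sketch of a
payload hash. How add_bounty actually hashes payloads is not shown in this
commit; `payload_hash` and the canonical-JSON approach are assumptions.

```python
import hashlib
import json


def payload_hash(payload: dict) -> str:
    """Hash a canonical JSON encoding so key order does not affect the result."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two sightings of the same UA produce the same tuple, hence the same hash.
first = {"value": "sqlmap/1.7", "category": "scanner", "tool": "sqlmap", "signals": []}
second = {"signals": [], "tool": "sqlmap", "category": "scanner", "value": "sqlmap/1.7"}
```

Under this scheme list order still matters, so the signals list would need
to be emitted in a fixed order for repeat rows to collapse.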

UI renderer upgraded from FpGeneric to a dedicated FpUserAgent that
colours the category tag by risk (scanner=alert-red,
nonstandard=warn-yellow, browser=accent-green, etc.) and renders
each signal as its own chip. Makes the interesting rows pop in the
fingerprints panel.

Also fixed: the ingester was using `_headers.get("User-Agent") or
_headers.get("user-agent")`. An empty string is falsy, so the `or`
discarded an explicitly empty UA and fell through to the second
lookup. An explicit empty UA is itself a signal (real clients almost
always send something) and is now captured.
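
The short-circuit can be reproduced in a few lines. The function names are
hypothetical, and the second function is one possible fix, not necessarily
the one applied in this commit.

```python
def get_user_agent_buggy(headers: dict):
    # "" is falsy, so `or` discards it and tries the second key,
    # which is usually absent and yields None.
    return headers.get("User-Agent") or headers.get("user-agent")


def get_user_agent(headers: dict):
    # Test for key presence instead of truthiness so that an explicit
    # empty header value survives.
    for name in ("User-Agent", "user-agent"):
        if name in headers:
            return headers[name]
    return None
```

With `{"User-Agent": ""}` the first version returns `None`, which is
indistinguishable from a missing header, while the second returns `""`.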
2026-04-24 18:17:18 -04:00
parent 6d1d69443a
commit ca39552692
3 changed files with 452 additions and 7 deletions

@@ -342,6 +342,58 @@ const FpGeneric: React.FC<{ p: any }> = ({ p }) => (
  </div>
);

const UA_CATEGORY_COLOR: Record<string, string> = {
  scanner: 'var(--alert, #ff4d4d)',
  nonstandard: 'var(--warn, #e0a040)',
  empty: 'var(--warn, #e0a040)',
  bot: 'var(--violet)',
  cli: 'var(--matrix)',
  library: 'var(--matrix)',
  browser: 'var(--accent-color)',
};

const UA_SIGNAL_COLOR: Record<string, string> = {
  injection_like: 'var(--alert, #ff4d4d)',
  nonprintable: 'var(--alert, #ff4d4d)',
  suspicious_long: 'var(--warn, #e0a040)',
  suspicious_short: 'var(--warn, #e0a040)',
};

const FpUserAgent: React.FC<{ p: any }> = ({ p }) => {
  const category = typeof p.category === 'string' ? p.category : 'unknown';
  const color = UA_CATEGORY_COLOR[category] || 'var(--text-color)';
  const signals: string[] = Array.isArray(p.signals) ? p.signals : [];
  return (
    <div style={{ display: 'flex', flexDirection: 'column', gap: '6px' }}>
      {p.value !== undefined && p.value !== '' ? (
        <span
          className="matrix-text"
          style={{
            fontFamily: 'monospace',
            fontSize: '0.85rem',
            wordBreak: 'break-all',
          }}
        >
          {p.value}
        </span>
      ) : (
        <span className="dim" style={{ fontStyle: 'italic' }}>
          (empty User-Agent)
        </span>
      )}
      <div style={{ display: 'flex', gap: '6px', flexWrap: 'wrap' }}>
        <Tag color={color}>{category.toUpperCase()}</Tag>
        {p.tool && <Tag>{String(p.tool).toUpperCase()}</Tag>}
        {signals.map((s) => (
          <Tag key={s} color={UA_SIGNAL_COLOR[s] || 'var(--warn, #e0a040)'}>
            {s.toUpperCase().replace(/_/g, ' ')}
          </Tag>
        ))}
      </div>
    </div>
  );
};

const FpSpoofedSource: React.FC<{ p: any }> = ({ p }) => (
  <div style={{ display: 'flex', flexDirection: 'column', gap: '6px' }}>
    <div>
@@ -434,6 +486,7 @@ const FingerprintGroup: React.FC<{ fpType: string; items: any[] }> = ({ fpType,
case 'hassh_server': return <FpHassh key={i} p={p} />;
case 'tcpfp': return <FpTcpStack key={i} p={p} />;
case 'http_quirks': return <FpHttpQuirks key={i} p={p} />;
case 'http_useragent': return <FpUserAgent key={i} p={p} />;
case 'spoofed_source': return <FpSpoofedSource key={i} p={p} />;
default: return <FpGeneric key={i} p={p} />;
}