Implement ARD-compliant discovery service and event feed#46

Draft
kperry-godaddy wants to merge 10 commits into main from feat/ans-finder

@kperry-godaddy

Contributor

This pull request introduces the new ans-finder service, which implements the Agentic Resource Discovery (ARD) service over the ANS reference implementation. It also makes several improvements to the RA service, including the addition of a public agent-events feed and enhanced handler logging. The Makefile is updated to support building and documenting the new finder service. Minor cleanups and dependency updates are also included.

New ANS Finder Service:

  • Added the new ans-finder binary (cmd/ans-finder/main.go), which:
    • Polls the RA's agent-events feed, indexes events for full-text search, and serves public discovery endpoints per the ARD contract.
    • Includes liveness/readiness endpoints and operational runbook documentation.
    • Features a rate-limited, anonymous HTTP API and a Swagger UI docs endpoint.

Makefile and Build System Updates:

  • Updated the Makefile to support building the new ans-finder binary and to sync its OpenAPI spec into the docs UI.

RA Service Improvements:

  • Added a public /v1/agents/events feed to the RA, with exact-path anonymous access and handler wiring for event streaming.
  • Enhanced security by adding HTTP middleware that sets X-Content-Type-Options: nosniff on all responses, including the public events feed.
  • Updated all handler constructors to accept a logger, improving logging consistency and observability.
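A minimal sketch of the nosniff middleware described in the list above, using only net/http. The names noSniff, stubFeed, and probe are hypothetical and not taken from the PR; the real wiring through chi may differ.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// noSniff wraps next and sets X-Content-Type-Options: nosniff on every
// response before the wrapped handler writes anything.
// The name is hypothetical; the PR does not show the middleware's identifier.
func noSniff(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("X-Content-Type-Options", "nosniff")
		next.ServeHTTP(w, r)
	})
}

// stubFeed stands in for the public events handler (illustrative only).
func stubFeed() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_, _ = w.Write([]byte(`{"items":[]}`))
	})
}

// probe issues one in-memory GET against h and returns the header value.
func probe(h http.Handler, path string) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Header().Get("X-Content-Type-Options")
}

func main() {
	fmt.Println(probe(noSniff(stubFeed()), "/v1/agents/events"))
}
```

Setting the header before calling the wrapped handler means it is present even when the handler writes its status line immediately.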

Code Cleanup and Dependency Updates:

  • Cleaned up unused imports and removed the unused VerifiedCheckpoint struct in cmd/ans-verify/walk.go.

Add spec/api-spec-finder-v1.yaml — the ANS Finder discovery surface,
implementing the Agentic Resource Discovery Specification v0.5 Registry
REST interface (POST /v1/search, POST /v1/explore).

- House style of spec/api-spec-v2.yaml; RFC 7807 Problem errors shaped
  per spec/api-spec-tl-v2.yaml, carrying the five ARD standard error
  codes (ARDS Appendix B); 400 INVALID_ARGUMENT for bad arguments.
- Each operation documents why POST (structured query body) and that
  responses are not GET-cacheable.
- Query model per ARDS §7.1: text required for search / optional for
  explore, filter as dot-path keys to arrays (OR within, AND across).
- CatalogEntry url-XOR-data via oneOf, with prose scoping the invariant
  to Active entries (tombstones carry neither and never reach the wire).
- TrustManifest/Attestation per ARDS §5.1/§5.2; free-text fields marked
  untrusted publisher content; §-citations throughout.
- federation auto|referrals|none (default auto); pageSize default 10,
  max 100. Discovery routes are anonymous (security: []), rate-limited.
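The filter semantics above (OR within one key's array, AND across keys) can be sketched as follows. This assumes entries have already been flattened to dot-path keys; the function name matches and the sample values "a2a", "mcp", and "weather" are placeholders, not values from the spec.

```go
package main

import "fmt"

// matches reports whether an entry satisfies a filter: values listed under
// one key are ORed, and distinct keys are ANDed. Both maps use flattened
// dot-path keys. Treating an empty value array as "matches nothing" is an
// assumption of this sketch.
func matches(entry, filter map[string][]string) bool {
	for key, wanted := range filter {
		have := make(map[string]struct{}, len(entry[key]))
		for _, v := range entry[key] {
			have[v] = struct{}{}
		}
		hit := false
		for _, w := range wanted {
			if _, ok := have[w]; ok {
				hit = true
				break
			}
		}
		if !hit {
			return false // AND across keys: one failing key rejects the entry
		}
	}
	return true
}

func main() {
	entry := map[string][]string{"type": {"a2a"}, "tags": {"weather", "forecast"}}
	filter := map[string][]string{"type": {"a2a", "mcp"}, "tags": {"weather"}}
	fmt.Println(matches(entry, filter))
}
```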

No Makefile/docs-sync wiring — that lands with the server and its
conformance test in a later PR.

Signed-off-by: kperry <kperry@godaddy.com>

Add internal/finder/feed and internal/finder/project — the consumer-side
ingestion contract and the pure event-to-catalog-entry projection.

internal/finder/feed:
- EventPageResponse/EventItem/AgentEndpoint/AgentFunction mirror the
  production swagger (swagger_ans.json) field-for-field: JSON tags and
  required/optional (omitempty) match, Items marshals as [] never null,
  tokens are the production hyphenated forms (HTTP-API, STREAMABLE-HTTP,
  JSON-RPC). EventItem.Validate enforces required fields, agentId UUID,
  createdAt RFC 3339, and binds agentHost to the ansName FQDN via
  domain.ParseAnsName in one step.

internal/finder/project:
- FromEvent is the single entry point. The lifecycle split is the safety
  rule: REVOKED/DEPRECATED mint identity-only tombstones from required
  fields, never touching label minting or URL policy, so a malformed
  display field can never block a revocation. feed.Validate failure is a
  hard error; an unknown eventType is an alertable Skip, not an error, so
  a growing producer enum cannot wedge ingestion at the cursor.
- Two security chokepoints: one text sanitizer strips Cc controls plus
  bidi/zero-width Cf from every emitted string, and validateEmittedURL
  (https-or-AllowHTTP, no userinfo/query/fragment, host-bound) is the
  only way any URL — including the constructed well-known fallback —
  enters an entry. A present-but-invalid metaDataUrl fails closed (no
  fallback rescue); an invalid agentUrl is omitted from metadata, not a
  Skip. URN is a lineage handle (urn:ai:host:agents:label); empty label
  Skips the Active path but never a tombstone.
- Active mapping per ARDS §4.2: A2A/MCP fan out one entry each, HTTP-API
  is excluded, capabilities/tags are sanitized, deduped, sorted and
  capped, and entries sort by (identifier, type, url) so duplicate
  protocols stay deterministic. Standard encoding/json marshaling.
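The URL policy listed above (https-or-AllowHTTP, no userinfo/query/fragment, host-bound) might look roughly like the sketch below. The signature and error texts are assumptions, and "host-bound" is interpreted here as a case-insensitive match between the URL hostname and agentHost; the package's actual validateEmittedURL may differ.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// validateEmittedURL is a hedged sketch of the chokepoint described above.
// It fails closed: any rule violation returns an error and no URL.
func validateEmittedURL(raw, agentHost string, allowHTTP bool) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("parse: %w", err)
	}
	switch {
	case u.Scheme == "https":
	case u.Scheme == "http" && allowHTTP:
	default:
		return "", errors.New("scheme not allowed")
	}
	if u.User != nil {
		return "", errors.New("userinfo not allowed")
	}
	// ForceQuery catches a bare trailing "?" with an empty query string.
	if u.RawQuery != "" || u.ForceQuery {
		return "", errors.New("query not allowed")
	}
	if u.Fragment != "" {
		return "", errors.New("fragment not allowed")
	}
	// Assumption: host-bound means the hostname equals agentHost.
	if !strings.EqualFold(u.Hostname(), agentHost) {
		return "", errors.New("host not bound to agentHost")
	}
	return u.String(), nil
}

func main() {
	for _, raw := range []string{
		"https://agent.example.com/card.json",
		"https://user@agent.example.com/card.json",
		"https://other.example.net/card.json",
	} {
		_, err := validateEmittedURL(raw, "agent.example.com", false)
		fmt.Println(raw, "->", err)
	}
}
```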

Both packages at 100% statement coverage; golden-vector harness with
UPDATE_GOLDEN matches internal/tl/event. Conformance against the OSS RA
feed route is closed by PR 2's byte-equality and enum-value tests.

Signed-off-by: kperry <kperry@godaddy.com>

…alidation

Review-pass fixes for three blockers plus folded-in polish.

B1 — tombstone dropped createdAt. ProjectedEntry now carries CreatedAt
(json:"-"), populated verbatim in both the tombstone and active paths;
the test-local goldenView struct gained the field, goldens regenerated.
The index orders suppression by this timestamp, so a tombstone that
lost it could be applied out of order. The tombstone table test now
asserts createdAt directly (the prior comment falsely claimed a golden
covered it).

B2 — emitted URLs bypassed the text chokepoint. Two parts:
(a) sanitizeText now strips ALL of unicode.Cf (a superset of the prior
    enumerated bidi/zero-width list — also covers U+061C, U+2060, and
    the U+E0000-E007F TAG block) alongside unicode.Cc;
(b) validateEmittedURL now REJECTS fail-closed (does not strip — a URL
    is structural) any raw URL containing a Cc/Cf rune, before
    returning it. A bidi-bearing metaDataUrl in event_adversarial_text
    now proves URL coverage via the golden (SkipInvalidURL).
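The two halves of B2 can be sketched with the standard unicode tables: display text is stripped of Cc/Cf runes, while a URL containing any such rune is rejected rather than repaired. The helper name hasControlOrFormat is hypothetical; sanitizeText is named in the commit, but its body here is a guess at the approach.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isCcOrCf reports whether r is a control (Cc) or format (Cf) rune.
func isCcOrCf(r rune) bool {
	return unicode.Is(unicode.Cc, r) || unicode.Is(unicode.Cf, r)
}

// sanitizeText drops every Cc and Cf rune. Note that this also removes
// tabs and newlines, since they are Cc.
func sanitizeText(s string) string {
	return strings.Map(func(r rune) rune {
		if isCcOrCf(r) {
			return -1 // a negative return drops the rune
		}
		return r
	}, s)
}

// hasControlOrFormat is the reject-side check a URL validator could call
// before accepting a raw URL (hypothetical helper name).
func hasControlOrFormat(s string) bool {
	return strings.IndexFunc(s, isCcOrCf) >= 0
}

func main() {
	fmt.Printf("%q\n", sanitizeText("pay\u202Eload\u2060"))
	fmt.Println(hasControlOrFormat("https://agent.example.com/\u202Ecard"))
}
```

Stripping is acceptable for free text because the result is still display-only; a URL is structural, so silently altering it could change where it points.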

B3 — full validation ran before the lifecycle switch, so a
REVOKED/DEPRECATED event missing an Active-only field (e.g. version)
errored and never tombstoned — the exact fail-open the lifecycle split
prevents. Validation is now split: feed.ValidateIdentityKeys (logId,
agentId UUID, ansName parse + FQDN==lower(agentHost), createdAt RFC3339)
runs before the switch; the full feed.Validate (eventType, agentHost,
version presence) runs only on the Active path. A table test proves a
version-less REVOKED/DEPRECATED still tombstones, and an inverse test
keeps the Active path erroring.
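A simplified sketch of the ordering fix in B3: identity keys are checked before the lifecycle switch, and Active-only fields are checked only on the Active path. The struct, function names, return values, and the "REGISTERED" token are placeholders; UUID and ansName checks are omitted for brevity.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// event carries only the fields this sketch needs.
type event struct {
	EventType, LogID, AgentID, Version, CreatedAt string
}

// validateIdentityKeys checks what a tombstone needs and nothing more.
func validateIdentityKeys(e event) error {
	if e.LogID == "" || e.AgentID == "" {
		return errors.New("missing identity key")
	}
	if _, err := time.Parse(time.RFC3339, e.CreatedAt); err != nil {
		return fmt.Errorf("createdAt: %w", err)
	}
	return nil
}

// project returns "tombstone", "active", or "skip" for illustration.
func project(e event) (string, error) {
	if err := validateIdentityKeys(e); err != nil {
		return "", err
	}
	switch e.EventType {
	case "REVOKED", "DEPRECATED":
		// Never touches Active-only fields, so a missing version
		// cannot block a revocation.
		return "tombstone", nil
	case "REGISTERED": // placeholder for the Active event types
		if e.Version == "" {
			return "", errors.New("missing Active-only field: version")
		}
		return "active", nil
	default:
		return "skip", nil // unknown eventType is a Skip, not an error
	}
}

func main() {
	e := event{EventType: "REVOKED", LogID: "log-1", AgentID: "agent-1", CreatedAt: "2025-01-02T03:04:05Z"}
	fmt.Println(project(e))
}
```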

Folded-in polish:
- Skip.Detail uses strconv.Quote (raw eventType/protocol with control
  chars reach operator logs).
- agentHost lowercased once at the top of projectActive and tombstone
  so URN, trustManifest.identity, and the well-known fallback agree;
  case-variant events no longer mint byte-different identities.
- validateEmittedURL also rejects u.ForceQuery (bare trailing "?").
- spec: SearchRequest.query uses an allOf overlay adding required:[text]
  so a schema validator rejects a text-less search (prose-only before).
- doc comments drop the local checkout path and PR-N references for the
  public repo.

Both packages remain at 100% statement coverage; make check green;
goldens regenerated.

Signed-off-by: kperry <kperry@godaddy.com>

Add GET /v1/agents/events on the RA: a public, unauthenticated feed of
agent lifecycle events the ANS Finder ingests. The response is
byte-compatible with the production getAgentEvents contract (consumer
mirror: internal/finder/feed).

Pipeline:
- Migration 006 adds outbox_events.log_id, an index on log_id (cursor
  resolution), and a partial (created_at_ms, id) feed index (retention-
  seekable reads). The outbox worker persists the TL-assigned logId
  atomically with sent_at_ms via MarkSent(ctx, id, logID); the feed
  gates on both being non-NULL so an item in the feed is provably
  sealed and its receipt is resolvable from logId. An empty logId from
  a non-compliant TL is treated as a delivery anomaly (row kept pending),
  never written. Open() runs ANALYZE so the planner uses the feed
  indexes (SEARCH, not a primary-key SCAN over aged-out rows).
- port.FeedReader/FeedRow/FeedQuery is the read port; the SQLite
  FeedStore implements it (JOINs registrations + endpoints, retention
  window, outbox-id-ASC ordering, lastLogId cursor resolved to its
  lowest matching outbox id, providerId -> empty page).
- service.EventsService projects each row into the wire EventItem and
  owns the domain->wire token map (driven by domain.AllProtocols/
  AllTransports). providerId is never emitted.
- V1EventsHandler parses limit (1-200, 422 on out-of-range), lastLogId,
  providerId.
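The limit rule in the last bullet (1-200, 422 when out of range) could be parsed along these lines. The default of 50 for an absent parameter and the choice to treat non-numeric input as 422 are assumptions of this sketch, as is the function name.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
)

const (
	minLimit     = 1
	maxLimit     = 200
	defaultLimit = 50 // assumption: the PR text does not state the default
)

// parseLimit returns the effective limit and the HTTP status to use.
// Out-of-range values are rejected, not clamped.
func parseLimit(raw string) (int, int) {
	if raw == "" {
		return defaultLimit, http.StatusOK
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n < minLimit || n > maxLimit {
		return 0, http.StatusUnprocessableEntity
	}
	return n, http.StatusOK
}

func main() {
	for _, raw := range []string{"", "1", "200", "201", "abc"} {
		n, status := parseLimit(raw)
		fmt.Printf("%q -> limit=%d status=%d\n", raw, n, status)
	}
}
```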

Security/correctness hardening:
- Auth exemption for the feed is EXACT-match (WithAnonymousExactPath),
  not prefix: a subtree exemption let chi backtrack /v1/agents/events/*
  onto the authenticated /v1/agents/{agentId}/* routes with auth
  skipped. Both static and OIDC providers now match exact paths exactly
  and subtree paths on a / boundary (so /docsfoo is not under /docs).
- 500 responses for unexpected (non-domain) errors return a fixed
  generic detail; raw fault text no longer leaks to clients. To avoid
  swallowing the cause, error responses route through a shared embedded
  responder seam (handlers embed it; one injected zerolog.Logger, no
  globals) whose writeError logs the real cause of any 5xx before
  sanitizing — enforced by construction across all RA handlers, not the
  events route alone. The package WriteError is retained as the
  domain-error-only entry point for the ownership middleware.
- X-Content-Type-Options: nosniff on all responses.
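The exact-versus-subtree distinction in the first bullet can be illustrated with a small matcher. The function names and the shape of the configuration (two string slices) are hypothetical; only the matching rule is taken from the text above.

```go
package main

import (
	"fmt"
	"strings"
)

// underSubtree reports whether path is root itself or lies below it on a
// "/" boundary, so "/docsfoo" is not under "/docs".
func underSubtree(path, root string) bool {
	root = strings.TrimSuffix(root, "/")
	return path == root || strings.HasPrefix(path, root+"/")
}

// isAnonymous reports whether path is exempt from authentication.
// Exact paths match only themselves; subtree paths match descendants.
func isAnonymous(path string, exact, subtrees []string) bool {
	for _, p := range exact {
		if path == p {
			return true
		}
	}
	for _, root := range subtrees {
		if underSubtree(path, root) {
			return true
		}
	}
	return false
}

func main() {
	exact := []string{"/v1/agents/events"}
	subtrees := []string{"/docs"}
	for _, p := range []string{
		"/v1/agents/events",
		"/v1/agents/events/revoke",
		"/docs/index.html",
		"/docsfoo",
	} {
		fmt.Println(p, isAnonymous(p, exact, subtrees))
	}
}
```

With the feed registered as an exact path, /v1/agents/events/revoke falls through to the authenticated routes, which is the behavior the regression tests assert.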

Conformance tests pin the contract: byte-equality through the consumer
mirror (full AND minimal item), enum-value membership against the
swagger sets via domain.AllProtocols/AllTransports plus the enqueued
eventType tags, feed.EventItem.Validate() over every emitted item, and
an EXPLAIN QUERY PLAN assertion that both feed queries use their
indexes. Auth regression tests assert /v1/agents/events/revoke without
credentials is 401, not a silent bypass; a responder test asserts a
non-domain 500 returns a generic body AND logs the real cause.

events-feed retention added to RA config (default 720h/30d).

Signed-off-by: kperry <kperry@godaddy.com>

…lore

Add the runnable ans-finder binary serving the ARD discovery surface over
the ANS reference implementation.

Pipeline:
- internal/finder/index defines the Catalog port and query vocabulary;
  internal/adapter/store/sqlitefinder is its SQLite FTS5 implementation.
  Applying an Active event REPLACES the complete row set for its ansName
  (grouped by ansName+logId), so an endpoint whose (type,url) changes or
  is dropped between versions never lingers ACTIVE. bm25-ranked search
  normalized 0-100; type/tags/capabilities/publisher/attestation-type
  filters; GROUP BY facets with limit/minCount/otherCount; ansName-keyed
  tombstone suppression gated on created_at; replay-safe (a newer-or-equal
  tombstone is never overridden by replaying an older Active event); a
  revoke that suppresses nothing while the agent is still active is
  reported for a WARN. User search text is quoted into FTS5 string
  literals so operators can never be injected.
- internal/finder/poller drains the RA events feed from the cursor,
  projects each item via project.FromEvent, and applies pages atomically.
  A structural feed error aborts the round without advancing the cursor;
  an unknown eventType is a logged Skip. A non-2xx (incl. 429) is a
  transient retry, never a cursor reset. A no-progress page (same cursor,
  more claimed) breaks the drain loop; repeated failure at one cursor
  escalates a wedge line; idle rounds log DEBUG, ingesting rounds INFO.
  The HTTPS feed client enforces the transport policy (https unless
  AllowHTTP, TLS never skipped), refuses redirects, and caps the body.
- internal/finder/handler serves POST /v1/search and /v1/explore per the
  frozen spec, RFC 7807 Problem errors, a query-bound opaque pageToken, a
  global token-bucket rate limiter (Retry-After + nosniff on responses),
  the additive staleSince signal, per-request cost caps (text size/tokens,
  filter-value count, facet count + dedup), and control-character
  rejection on query text. Filter values accept the spec's bare-scalar or
  array form. Readiness (/v1/admin/ready) is gated on the first completed
  poll; health is liveness-only.
- cmd/ans-finder wires config.LoadFinder, the index, the poller goroutine,
  and the HTTP server (chi, hardened timeouts, graceful shutdown that
  drains the poller before closing the store on either exit path); docs at
  /docs; the package comment documents health/ready semantics and the
  ingestion-wedge recovery runbook.
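One way to realize the FTS5 quoting mentioned in the index bullet is to wrap each whitespace-separated term in double quotes and double any embedded quote, which is how FTS5 escapes a string. Splitting on whitespace and the name ftsQuote are assumptions; the adapter may tokenize differently.

```go
package main

import (
	"fmt"
	"strings"
)

// ftsQuote turns free text into a sequence of FTS5 string literals.
// Inside a quoted string, words such as OR, AND, NOT, and NEAR are plain
// terms, so user input cannot introduce query operators.
func ftsQuote(text string) string {
	fields := strings.Fields(text)
	quoted := make([]string, 0, len(fields))
	for _, f := range fields {
		// FTS5 escapes a double quote inside a string by doubling it.
		quoted = append(quoted, `"`+strings.ReplaceAll(f, `"`, `""`)+`"`)
	}
	return strings.Join(quoted, " ")
}

func main() {
	fmt.Println(ftsQuote(`weather OR "drop`))
}
```

An empty or whitespace-only input yields an empty string, which is not a valid MATCH expression, so a caller would need to handle the text-less case (for example, explore without text) separately.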

Wiring: Makefile build-finder + docs-sync; docsui.SpecFinder embed with a
byte-equality drift guard; demo start.sh/stop.sh/run-lifecycle.sh gain an
ans-finder stage that discovers the demo agent (publisher-filtered) end to
end. The frozen finder spec is amended with the additive staleSince
response field and a note that an over-max pageSize is clamped.

internal/finder/{index,poller,handler} and the sqlitefinder adapter are
table-tested against in-memory SQLite and httptest feed servers; a
conformance test validates response field names AND spec-required keys
against the embedded spec. cmd/ans-finder is excluded from the coverage
denominator per repo policy; overall coverage stays above the 90% gate.

Signed-off-by: kperry <kperry@godaddy.com>

…gnote

The C2SP signed-note checkpoint parser was duplicated ~85% between the
offline verifier (cmd/ans-verify) and the TL checkpoint-read path
(internal/tl/service). Consolidate it into a new internal/lognote
package that depends only on internal/crypto and its leaf deps — no
internal/tl/logstore, no storage adapters, no Tessera. This lets
cmd/ans-verify link the verification path without pulling the
log-writer dependency tree.

lognote exposes:
  - Signature{Name, Raw, Blob} with KeyHash/KeyHashHex/Body/Classify
  - SplitNote(raw) (body, sigs, found) — lenient tokenization; its doc
    comment carries the safety invariant that splitting proves nothing
    and only VerifyCheckpointNote's unconditional gate (known keyhash
    AND valid ECDSA sig over the fixed body) establishes trust
  - Checkpoint{Origin, Size, RootHash}
  - VerifyCheckpointNote(raw, keysByHash) (*Checkpoint, error)
  - VerifyC2SPECDSA (moved verbatim from logstore, DER + legacy P1363)

Migration:
  - logstore.VerifyC2SPECDSA deleted outright (no alias); the signer
    stays. Its round-trip test now verifies through lognote.
  - internal/tl/service deletes splitNoteBody/keyhashFromSumdbSig/
    classifySumdbSig and the sigType consts; viewFromRecord maps
    lognote.Signature (origin fallback, "0x"+KeyHashHex(), algES256);
    the enrich switch gains a default arm; enrichC2SPSignature calls
    lognote.VerifyC2SPECDSA.
  - cmd/ans-verify deletes verifyCheckpointNote + keyHashHex and the
    VerifiedCheckpoint type; verifiedCheckpoint returns
    *lognote.Checkpoint and delegates to lognote.VerifyCheckpointNote.
    go list -deps ./cmd/ans-verify no longer references tl/logstore or
    Tessera client/storage.

internal/lognote ships with table-driven tests at 100% of statements,
including a golden note signed over real bytes with a real ECDSA key
and the adversarial case (known keyhash + garbage sig rejected, loop
continues to a later valid line). Exhaustive cases live in lognote;
thin smoke tests remain at the migrated sites.

Wire-delta disclosures (behavior changes from unifying the two
parsers onto internal/lognote, recorded for reviewers):
  - CheckpointView.Signatures: an invalid-base64 signature line is now
    omitted from the rendered signatures instead of surfaced with
    Valid=false. This state is reachable only via corrupted checkpoint
    storage (the TL never writes a malformed line), and dropping it is
    fail-closed — a line we cannot decode carries no trustworthy
    signer/keyhash to display.
  - cmd/ans-verify checkpoint tokenization is unified onto the service
    parser's semantics. Consequence: tab-separated signature lines no
    longer parse (no writer emits tabs — sumdb-note uses single
    spaces), and CRLF / leading-whitespace lines now become signature
    candidates (still subject to the same keyhash+signature gate, so
    no verification weakening).

Signed-off-by: kperry <kperry@godaddy.com>

Add docs/pr-specs/FINDER-ard-discovery-service.md, the design of record
for ans-finder: an ARD-conformant (Agentic Resource Discovery v0.5)
discovery service over ANS-registered agents.

Covers the feed-only ingestion decision and its accepted trades, the
pure EventItem-to-CatalogEntry projection with the tombstone safety
rule, the /v1/search and /v1/explore API surface, the trust model and
receipt semantics, the text-hygiene and URL-policy security contracts
that must precede any wire freeze, the deviations from the ARD and ANS
RA contracts, and verbatim ARDS v0.5 and production-swagger field
tables in the appendix.

Docs-only; new docs/pr-specs/ directory.

Signed-off-by: kperry <kperry@godaddy.com>

Adds docs/architecture/ans-finder.md — the as-built companion to the
FINDER design spec: system topology, register-to-discover-to-prove
sequence, poller round semantics, index entry lifecycle, and the
asserted-vs-provable trust split, each as a Mermaid diagram with the
operational details (feed gating, projection chokepoints, request-cost
caps, readiness semantics, runbook pointers) as built and verified.

Trues up the design spec's deviations table with the two rows the
implementation added: nextPageToken response naming vs ARD §7.2's
example, and the additive optional staleSince freshness field.

Signed-off-by: kperry <kperry@godaddy.com>

…ver-card+json and adjust pagination token references