
Add metrics for config-load failures and startup latency (#303)#379

Draft
leynos wants to merge 1 commit into main from issue-303-config-load-metrics

Conversation


@leynos leynos commented Jun 13, 2026

Owner

Summary

Closes #303

Adds the metrics instrumentation requested as a follow-up to PR #297: a config-load failure counter and a startup-latency histogram, plus developer documentation.

Changes

  • Cargo.toml: add the metrics façade (runtime) and metrics-util (dev, debugging feature).
  • src/main.rs: introduce resolve_configuration (spans cli::resolve_merged_diag_json through cli::merge_with_config) and record_config_load_metrics, emitting:
    • netsuke_config_load_total: counter labelled outcome (success/failure);
    • netsuke_config_load_duration_seconds: duration histogram.
    The merge error path is extracted into handle_config_load_error. Because metrics is a façade, the instruments are no-ops until an operator installs a recorder; Netsuke bundles none.
  • docs/developers-guide.md: new Configuration-load observability subsection documenting counter names, label conventions, and suggested histogram buckets.
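
The timing-and-recording shape described in the list above can be pictured with a std-only sketch. It deliberately does not use the real metrics crate: the Recorder trait, NoopRecorder, outcome_label, and resolve_configuration_sketch are hypothetical stand-ins, and the load closure is a placeholder for the actual diagnostic-mode resolution and layer merge. Only the two metric names and the outcome label values are taken from this PR.

```rust
use std::time::{Duration, Instant};

// Metric names as given in the PR description.
const CONFIG_LOAD_TOTAL: &str = "netsuke_config_load_total";
const CONFIG_LOAD_DURATION_SECONDS: &str = "netsuke_config_load_duration_seconds";

/// Hypothetical stand-in for a metrics recorder (not the `metrics` crate API).
trait Recorder {
    fn increment_counter(&mut self, name: &'static str, outcome: &'static str);
    fn record_histogram(&mut self, name: &'static str, seconds: f64);
}

/// Models a facade with no recorder installed: every call is a no-op.
struct NoopRecorder;

impl Recorder for NoopRecorder {
    fn increment_counter(&mut self, _name: &'static str, _outcome: &'static str) {}
    fn record_histogram(&mut self, _name: &'static str, _seconds: f64) {}
}

/// Maps the merge result onto the two label values named in the PR.
fn outcome_label(succeeded: bool) -> &'static str {
    if succeeded {
        "success"
    } else {
        "failure"
    }
}

/// Emits one histogram sample and one counter increment per config load.
fn record_config_load_metrics(recorder: &mut dyn Recorder, elapsed: Duration, succeeded: bool) {
    recorder.record_histogram(CONFIG_LOAD_DURATION_SECONDS, elapsed.as_secs_f64());
    recorder.increment_counter(CONFIG_LOAD_TOTAL, outcome_label(succeeded));
}

/// Times a fallible load step and records its outcome before returning it.
/// `load` is a placeholder for the real resolution and merge calls.
fn resolve_configuration_sketch<T, E>(
    recorder: &mut dyn Recorder,
    load: impl FnOnce() -> Result<T, E>,
) -> Result<T, E> {
    let started = Instant::now();
    let merged = load();
    record_config_load_metrics(recorder, started.elapsed(), merged.is_ok());
    merged
}
```

Recording happens before the result is returned, so both the success and failure paths contribute exactly one counter increment and one duration sample. With NoopRecorder the call has no observable effect, which is the behaviour the façade note describes when no recorder is installed.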

Testing

  • Unit tests use metrics_util::debugging::DebuggingRecorder + metrics::with_local_recorder to assert the counter carries outcome=failure/outcome=success and that the histogram records exactly one sample.
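
The assertions described above can be illustrated without the real crates. In this sketch, CapturingRecorder is a hypothetical in-memory stand-in for the debugging recorder, and record_config_load_metrics is a simplified version with an assumed signature; the PR's actual tests go through metrics_util and metrics::with_local_recorder instead.

```rust
use std::time::Duration;

/// Hypothetical in-memory recorder, standing in for a debugging recorder.
#[derive(Default)]
struct CapturingRecorder {
    /// (metric name, `outcome` label value) for each counter increment.
    counters: Vec<(&'static str, &'static str)>,
    /// (metric name, seconds) for each histogram sample.
    samples: Vec<(&'static str, f64)>,
}

/// Simplified stand-in for the PR's `record_config_load_metrics`
/// (the signature here is an assumption made for illustration).
fn record_config_load_metrics(rec: &mut CapturingRecorder, elapsed: Duration, succeeded: bool) {
    let outcome = if succeeded { "success" } else { "failure" };
    rec.samples
        .push(("netsuke_config_load_duration_seconds", elapsed.as_secs_f64()));
    rec.counters.push(("netsuke_config_load_total", outcome));
}

/// Performs one recording against a fresh recorder and returns what was
/// captured, so a test can inspect the label and the sample count.
fn capture(succeeded: bool) -> CapturingRecorder {
    let mut rec = CapturingRecorder::default();
    record_config_load_metrics(&mut rec, Duration::from_millis(5), succeeded);
    rec
}
```

A test built on this shape would check that capture(false).counters holds a single netsuke_config_load_total entry labelled failure, that capture(true) holds one labelled success, and that samples has length 1 in both cases.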

Structured log fields (operation, error_category) and per-phase counter labels are out of scope here and deferred to the follow-up issue, #304.

Validation

  • make check-fmt / make markdownlint / make lint / make test — pass (37 suites)

🤖 Generated with Claude Code

Summary by Sourcery

Instrument startup configuration loading with metrics and document their usage.

New Features:

  • Add metrics-based instrumentation for configuration-load outcomes and durations during startup.

Enhancements:

  • Refactor configuration resolution into a dedicated function and centralised error handler to support metrics collection.

Build:

  • Add metrics and metrics-util crates to support runtime instrumentation and test-time inspection of metrics.

Documentation:

  • Document configuration-load observability, including metric names, semantics, and naming conventions in the developer guide.

Tests:

  • Add tests using a debugging metrics recorder to verify emitted configuration-load counters, labels, and histograms.

Configuration-load errors surface explicitly via `Result` boundaries
and structured logging, but no metrics existed to detect production
trends (failure rates, startup latency).

Introduce a config-load observability boundary in `src/main.rs`:
`resolve_configuration` spans diagnostic-mode resolution through the
layer merge and calls `record_config_load_metrics`, which emits a
`netsuke_config_load_total` counter labelled by `outcome`
(`success`/`failure`) and a `netsuke_config_load_duration_seconds`
histogram. The merge error path is extracted into
`handle_config_load_error`.

Recording goes through the `metrics` façade, so the instruments are
no-ops unless an operator installs a recorder; Netsuke emits the
measurements without bundling an exporter.

Document the counter names, label conventions, and histogram buckets
in `docs/developers-guide.md`. Add unit tests (using `metrics-util`'s
debugging recorder) asserting the counter's outcome label and the
single duration sample for both success and failure.

coderabbitai Bot commented Jun 13, 2026

Contributor

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 92285ef7-0e10-4d17-b6b8-c7ea84b967c2



sourcery-ai Bot commented Jun 13, 2026

Contributor

Reviewer's Guide

Adds config-load observability by instrumenting startup configuration resolution with metrics, refactoring error handling, and documenting the new metrics, along with tests using a debugging recorder.

Sequence diagram for configuration-load metrics and error handling

sequenceDiagram
    participant main
    participant resolve_configuration
    participant cli as cli_merge
    participant metrics_facade
    participant handle_config_load_error

    main->>resolve_configuration: resolve_configuration(parsed_cli, matches)
    resolve_configuration->>cli: cli::resolve_merged_diag_json(parsed_cli, matches)
    resolve_configuration-->>resolve_configuration: DiagMode::from_json_enabled(...)
    resolve_configuration->>cli: cli::merge_with_config(parsed_cli, matches)
    resolve_configuration-->>metrics_facade: record_config_load_metrics(elapsed, merged.is_ok())
    metrics_facade-->>metrics_facade: metrics::histogram!(CONFIG_LOAD_DURATION_SECONDS)
    metrics_facade-->>metrics_facade: metrics::counter!(CONFIG_LOAD_TOTAL)
    resolve_configuration-->>main: (mode, merged)

    alt [merge succeeded]
        main-->>main: merged.with_default_command()
        main-->>main: configure_runtime(...)
    else [merge failed]
        main->>handle_config_load_error: handle_config_load_error(err, mode)
        handle_config_load_error-->>main: ExitCode::FAILURE
    end
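
The alt branch at the end of the diagram can be sketched as a plain match on the merge result. Everything here is illustrative: the two DiagMode variants, the string-based error, the rendered messages, and the numeric exit statuses are assumptions, not the PR's actual types or output.

```rust
/// Hypothetical diagnostic mode; the PR's `DiagMode` may have a different shape.
#[derive(Clone, Copy, Debug, PartialEq)]
enum DiagMode {
    Human,
    Json,
}

/// Hypothetical error handler: renders the failure according to the
/// diagnostic mode and returns it with a non-zero exit status.
/// The JSON branch does no escaping; it is for illustration only.
fn handle_config_load_error(err: &str, mode: DiagMode) -> (String, u8) {
    let rendered = match mode {
        DiagMode::Json => format!("{{\"error\":\"{}\"}}", err),
        DiagMode::Human => format!("error: {}", err),
    };
    (rendered, 1)
}

/// Mirrors the diagram's branch: continue on a successful merge,
/// otherwise delegate to the centralised error handler.
fn run(mode: DiagMode, merged: Result<String, String>) -> (String, u8) {
    match merged {
        // Success path: the real code applies the default command and
        // configures the runtime here.
        Ok(cli) => (format!("running {}", cli), 0),
        // Failure path: rendering and exit-code mapping live in one place.
        Err(err) => handle_config_load_error(&err, mode),
    }
}
```

Keeping the failure rendering in a single function means the success-only steps never run on a failed merge, which matches the ordering shown in the diagram.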

File-Level Changes

Change: Instrument configuration-load phase with metrics and refactor startup configuration resolution and error handling.
Details:
  • Introduce resolve_configuration to compute diagnostic mode, perform config merge, and time the combined config-load phase.
  • Add CONFIG_LOAD_TOTAL counter and CONFIG_LOAD_DURATION_SECONDS histogram, and implement record_config_load_metrics to emit them via the metrics facade.
  • Extract handle_config_load_error to centralize config-load failure rendering and exit-code mapping, reusing prior JSON vs human-path behavior.
  • Update run_with_args to use resolve_configuration and handle_config_load_error, calling with_default_command only on successful merges.
  • Add unit tests validating counter labeling for success/failure and that exactly one histogram sample is recorded per invocation.
Files: src/main.rs

Change: Document configuration-load observability and metric conventions for operators and developers.
Details:
  • Add a Configuration-load observability subsection describing where instrumentation lives and how it behaves with the metrics facade.
  • Document the two emitted instruments, their semantics, and suggested histogram bucket boundaries.
  • Clarify metric naming and label cardinality conventions to guide future metrics additions.
Files: docs/developers-guide.md

Change: Wire in metrics dependencies for runtime use and test-time debugging.
Details:
  • Add the metrics crate as a runtime dependency for metrics facade macros.
  • Add metrics-util with debugging feature as a dev-dependency to support DebuggingRecorder-based tests.
  • Update Cargo.lock to capture the new dependency graph.
Files: Cargo.toml, Cargo.lock

Assessment against linked issues

Issue Objective Addressed Explanation
#303 Instrument the config-load error handling paths (including handle_config_load_error / resolve_diag_mode_or_exit / merge_cli_or_exit equivalents) with a counter labelled by outcome (success/failure) to track configuration-load failure rates.
#303 Wrap the startup configuration-resolution phase, from cli::resolve_merged_diag_json through cli::merge_with_config, in a duration histogram to record startup latency.
#303 Add developer documentation to docs/developers-guide.md describing the configuration observability instrumentation, including counter names, label conventions, and histogram buckets.



Development

Successfully merging this pull request may close these issues.

Add metrics instrumentation for config-load failure rates and startup latency
