Add metrics for config-load failures and startup latency (#303) #379
leynos wants to merge 1 commit into
Configuration-load errors surface explicitly via `Result` boundaries and structured logging, but no metrics existed to detect production trends (failure rates, startup latency). Introduce a config-load observability boundary in `src/main.rs`: `resolve_configuration` spans diagnostic-mode resolution through the layer merge and calls `record_config_load_metrics`, which emits a `netsuke_config_load_total` counter labelled by `outcome` (`success`/`failure`) and a `netsuke_config_load_duration_seconds` histogram. The merge error path is extracted into `handle_config_load_error`. Recording goes through the `metrics` façade, so the instruments are no-ops unless an operator installs a recorder; Netsuke emits the measurements without bundling an exporter. Document the counter names, label conventions, and histogram buckets in `docs/developers-guide.md`. Add unit tests (using `metrics-util`'s debugging recorder) asserting the counter's outcome label and the single duration sample for both success and failure.
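The observability boundary described above can be sketched without any external crates. This is a minimal illustration, not the code in `src/main.rs`: the `Recorder` trait, `NoopRecorder`, and `DebugRecorder` are hypothetical stand-ins for the `metrics` façade and `metrics-util`'s debugging recorder, and the signatures of `resolve_configuration` and `record_config_load_metrics` are assumed. Only the metric names and the `outcome` label values come from the description.

```rust
use std::time::{Duration, Instant};

// Metric names as given in the PR description.
const CONFIG_LOAD_TOTAL: &str = "netsuke_config_load_total";
const CONFIG_LOAD_DURATION_SECONDS: &str = "netsuke_config_load_duration_seconds";

/// Hypothetical stand-in for the `metrics` façade: every method defaults to
/// a no-op, so nothing is recorded unless a real recorder is supplied.
trait Recorder {
    fn increment_counter(&mut self, _name: &'static str, _outcome: &'static str) {}
    fn record_histogram(&mut self, _name: &'static str, _seconds: f64) {}
}

/// What the application sees when no recorder is installed.
struct NoopRecorder;
impl Recorder for NoopRecorder {}

/// Hypothetical in-memory recorder, playing the role of a debugging recorder.
#[derive(Default)]
struct DebugRecorder {
    counters: Vec<(&'static str, &'static str)>,
    samples: Vec<(&'static str, f64)>,
}

impl Recorder for DebugRecorder {
    fn increment_counter(&mut self, name: &'static str, outcome: &'static str) {
        self.counters.push((name, outcome));
    }
    fn record_histogram(&mut self, name: &'static str, seconds: f64) {
        self.samples.push((name, seconds));
    }
}

/// Maps the merge result onto the `outcome` label values.
fn outcome_label(succeeded: bool) -> &'static str {
    if succeeded { "success" } else { "failure" }
}

/// Emits one duration sample and one labelled counter increment per load.
fn record_config_load_metrics(recorder: &mut dyn Recorder, elapsed: Duration, succeeded: bool) {
    recorder.record_histogram(CONFIG_LOAD_DURATION_SECONDS, elapsed.as_secs_f64());
    recorder.increment_counter(CONFIG_LOAD_TOTAL, outcome_label(succeeded));
}

/// Times an arbitrary load step and records its metrics before returning
/// the untouched result to the caller (assumed shape, not the real signature).
fn resolve_configuration<T, E>(
    recorder: &mut dyn Recorder,
    load: impl FnOnce() -> Result<T, E>,
) -> Result<T, E> {
    let started = Instant::now();
    let merged = load();
    record_config_load_metrics(recorder, started.elapsed(), merged.is_ok());
    merged
}

fn main() {
    // With the no-op recorder the measurements are silently discarded.
    let _ = resolve_configuration(&mut NoopRecorder, || Ok::<u32, String>(1));

    // With a recording implementation each load yields one counter entry and one sample.
    let mut recorder = DebugRecorder::default();
    let _ok: Result<u32, String> = resolve_configuration(&mut recorder, || Ok(1));
    let _err: Result<u32, String> =
        resolve_configuration(&mut recorder, || Err(String::from("bad config")));
    println!("counters: {:?}", recorder.counters);
    println!("samples recorded: {}", recorder.samples.len());
}
```

Recording after the load returns, and before the result is inspected, means the success and failure paths share one timing boundary, so neither can skip the measurement.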
Reviewer's Guide
Adds config-load observability by instrumenting startup configuration resolution with metrics, refactoring error handling, and documenting the new metrics, along with tests using a debugging recorder.
Sequence diagram for configuration-load metrics and error handling
sequenceDiagram
participant main
participant resolve_configuration
participant cli as cli_merge
participant metrics_facade
participant handle_config_load_error
main->>resolve_configuration: resolve_configuration(parsed_cli, matches)
resolve_configuration->>cli: cli::resolve_merged_diag_json(parsed_cli, matches)
resolve_configuration-->>resolve_configuration: DiagMode::from_json_enabled(...)
resolve_configuration->>cli: cli::merge_with_config(parsed_cli, matches)
resolve_configuration-->>metrics_facade: record_config_load_metrics(elapsed, merged.is_ok())
metrics_facade-->>metrics_facade: metrics::histogram!(CONFIG_LOAD_DURATION_SECONDS)
metrics_facade-->>metrics_facade: metrics::counter!(CONFIG_LOAD_TOTAL)
resolve_configuration-->>main: (mode, merged)
alt [merge succeeded]
main-->>main: merged.with_default_command()
main-->>main: configure_runtime(...)
else [merge failed]
main->>handle_config_load_error: handle_config_load_error(err, mode)
handle_config_load_error-->>main: ExitCode::FAILURE
end
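The `alt` branch at the end of the diagram can be sketched as a plain `match` on the merge result. This is an assumption-laden illustration: the two-variant `DiagMode`, the `render_config_load_error` helper, the `run` function, and the message formats are all hypothetical; only the name `handle_config_load_error` and the `ExitCode::FAILURE` return come from the diagram.

```rust
use std::process::ExitCode;

/// Hypothetical two-state diagnostic mode; the real `DiagMode` may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DiagMode {
    Human,
    Json,
}

/// Formats a load error for the chosen mode (illustrative formats only).
/// Debug-quoting the message only approximates JSON string escaping.
fn render_config_load_error(err: &str, mode: DiagMode) -> String {
    match mode {
        DiagMode::Human => format!("error: failed to load configuration: {err}"),
        DiagMode::Json => format!("{{\"error\":{err:?}}}"),
    }
}

/// Reports the error on stderr and maps it to a failing exit code.
fn handle_config_load_error(err: &str, mode: DiagMode) -> ExitCode {
    eprintln!("{}", render_config_load_error(err, mode));
    ExitCode::FAILURE
}

/// Mirrors the `alt` block: continue on success, delegate on failure.
fn run(mode: DiagMode, merged: Result<String, String>) -> ExitCode {
    match merged {
        // The real code would apply defaults and configure the runtime here.
        Ok(_config) => ExitCode::SUCCESS,
        Err(err) => handle_config_load_error(&err, mode),
    }
}

fn main() -> ExitCode {
    // Exercise the failure path once; its exit code is discarded here.
    let _ = run(DiagMode::Json, Err(String::from("missing file")));
    // Return the success path so the sketch itself exits cleanly.
    run(DiagMode::Human, Ok(String::from("config")))
}
```

Keeping the error path in its own function leaves the caller with a single expression per branch, which is the shape the diagram shows for `main`.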
Summary
Closes #303
Adds the metrics instrumentation requested as a follow-up to PR #297: a config-load failure counter and a startup-latency histogram, plus developer documentation.
Changes
- `Cargo.toml`: add the `metrics` façade (runtime) and `metrics-util` (dev, debugging feature).
- `src/main.rs`: introduce `resolve_configuration` (spans `cli::resolve_merged_diag_json` through `cli::merge_with_config`) and `record_config_load_metrics`, emitting:
  - `netsuke_config_load_total`: counter labelled `outcome` (`success`/`failure`);
  - `netsuke_config_load_duration_seconds`: duration histogram.

  The merge error path is extracted into `handle_config_load_error`. Because `metrics` is a façade, the instruments are no-ops until an operator installs a recorder; Netsuke bundles none.
- `docs/developers-guide.md`: new Configuration-load observability subsection documenting counter names, label conventions, and suggested histogram buckets.

Testing
- Unit tests use `metrics_util::debugging::DebuggingRecorder` with `metrics::with_local_recorder` to assert that the counter carries `outcome=failure` or `outcome=success` as appropriate, and that the histogram records exactly one sample.
- Structured log fields (`operation`, `error_category`) and per-phase counter labels are the scope of the follow-up #304.

Validation
- `make check-fmt` / `make markdownlint` / `make lint` / `make test`: pass (37 suites)

🤖 Generated with Claude Code
Summary by Sourcery
Instrument startup configuration loading with metrics and document their usage.