Skip to content

Fix upload/download stats to use wall-clock duration#2388

Open
amit-writes-code wants to merge 1 commit into
temporalio:mainfrom
amit-writes-code:fix/2378-wall-clock-storage-duration
Open

Fix upload/download stats to use wall-clock duration#2388
amit-writes-code wants to merge 1 commit into
temporalio:mainfrom
amit-writes-code:fix/2378-wall-clock-storage-duration

Conversation

@amit-writes-code

Copy link
Copy Markdown

Use edge detection (Start/Stop) to bracket visitProtoPayloads rather than
summing individual PayloadBatchCompleted durations

Fixes #2378

What was changed

internal/internal_task_pollers.go

  • Replaced totalDuration time.Duration field on workflowTaskStorageMetrics
    with startTime and endTime time.Time fields.
  • Added Start() — records wall-clock start immediately before
    visitProtoPayloads is called (rising edge).
  • Added Stop() — records wall-clock end immediately after
    visitProtoPayloads returns (falling edge), before any subsequent
    workflow execution or network calls.
  • Added WallClockDuration() — returns endTime.Sub(startTime), a fixed
    interval between two captured timestamps. Replaces totalDuration at
    both log sites (tagPayloadDownloadDuration, tagPayloadUploadDuration).
  • PayloadBatchCompleted no longer accumulates duration — it only tracks
    payloadCount, totalSize, and driverNames.

internal/internal_task_pollers_test.go

  • TestWorkflowTaskStorageMetrics_WallClockDuration — verifies the
    Start/Stop/WallClockDuration lifecycle: returns 0 before Start(),
    returns 0 before Stop(), returns a positive duration after Stop(),
    does not shrink on a second Stop() call.
  • TestWorkflowTaskStorageMetrics_PayloadBatchCompleted — simulates two
    concurrent callbacks and asserts that payloadCount and totalSize
    accumulate correctly, driver names are collected from both callbacks,
    and wall-clock duration is less than the sum of individual durations
    (proving edge detection rather than summation).

Why?

PayloadBatchCompleted was called once per Visit invocation with a
duration argument representing the wall-clock time for that Visit.
When payloadVisitorConcurrency > 1, multiple Visit calls run
concurrently. Summing their individual durations overcounts proportionally
to the number of concurrent visits:

Visit 1: t=0ms  → t=100ms   duration=100ms
Visit 2: t=10ms → t=120ms   duration=110ms   (runs concurrently with Visit 1)

Old: totalDuration = 100 + 110 = 210ms  ← wrong, they overlapped
New: Stop() − Start() = 120ms − 0ms = 120ms  ← correct wall-clock

Checklist

  1. Closes [Bug] External storage transfer stats reports summation of durations instead of wall-clock duration #2378
  2. How was this tested:
    • TestWorkflowTaskStorageMetrics_WallClockDuration — unit test for
      the Start/Stop/WallClockDuration lifecycle.
    • TestWorkflowTaskStorageMetrics_PayloadBatchCompleted — concurrent
      callbacks assert wall-clock < sum of individual durations.
    • Run: go test ./internal/ -run "TestWorkflowTaskStorageMetrics" -v
  3. Any docs updates needed? No — internal observability change only, no
    public API or user-visible behavior change.

Use edge detection (Start/Stop) to bracket visitProtoPayloads rather than
summing individual PayloadBatchCompleted durations Fixes temporalio#2378
@amit-writes-code amit-writes-code requested a review from a team as a code owner June 4, 2026 21:28
@amit-writes-code

Copy link
Copy Markdown
Author

@jmaeagle99 would love to know if there is any scope of optimzations here, in terms of accuracy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] External storage transfer stats reports summation of durations instead of wall-clock duration

1 participant