[nexus] Don't re-fetch a disk after committing attach/detach (fixes #10695)#10706
Open
MavenRain wants to merge 1 commit into
Open
[nexus] Don't re-fetch a disk after committing attach/detach (fixes #10695)#10706MavenRain wants to merge 1 commit into
MavenRain wants to merge 1 commit into
Conversation
…xidecomputer#10695) `DataStore::instance_attach_disk` and `DataStore::instance_detach_disk` each ran an atomic CTE that commits the disk's state transition, and then performed a second, non-transactional `disk_get` re-fetch, acquiring an additional pool connection while still holding the first, purely to return the richer `datastore::Disk` type. Under connection-pool exhaustion that post-commit re-fetch fails with "Backends exist, and appear online, but all claims are used" *after* the attach/detach has already committed. In a saga this is fatal: steno gives a forward action a single attempt and never runs the undo of a *failed* node, so a disk left `Attached` by an attach whose re-fetch failed has no detach scheduled. During unwind its create-record undo then fails permanently ("disk cannot be deleted in state \"attached\""), which wedges the instance-create unwind and leaves the instance stuck in `starting` (oxidecomputer#10695). `instance_detach_disk` has the mirror problem on the saga *undo* path: a re-fetch failure after the detach commits turns a successful detach into a permanent undo failure. Make the committing CTE the last fallible operation in both methods by returning the `model::Disk` the CTE already produces, and move the `datastore::Disk` upgrade to the two API callers in `app/instance.rs`, where a post-commit failure is recoverable (the client simply retries; attach and detach are idempotent). The instance-create saga discards the returned disk entirely, so the re-fetch was pure waste on that path. Signed-off-by: Onyeka Obi <softwareengineerasaservant@isurvivable.cv>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #10695.
Problem
Running many concurrent instance-creates (30 instances × 11 disks each on
london) exhausts the qorb DB connection pool. Some instance-create sagas then get permanently stuck mid-unwind, leaving instances wedged instarting. The saga log shows a forward failure atattach_disk_output("all claims are used") and a permanent undo failure at alocal_storage_diskcreate-record undo ("disk cannot be deleted in state "attached"").Root cause
DataStore::instance_attach_diskcommits the disk→Attachedtransition via an atomic CTE and then does a second, non-transactionaldisk_getre-fetch (a fresh pool claim, acquired while still holding the CTE's connection) only to build the richerdatastore::Diskreturn value. Under pool exhaustion that re-fetch fails after the attach has committed.steno runs a forward action once and never runs the undo of a failed node, so a disk left
Attachedby an attach whose re-fetch failed has no detach scheduled. During unwind its create-record undo then hitsproject_delete_disk_no_auth(which only permitsDetached/Faulted/Creating) and fails permanently . . . steno stops the unwind at the first undo failure (saga_stuck), wedging the instance.instance_detach_diskhas the mirror problem and is used as a saga undo action, where a post-commit re-fetch failure turns a successful detach into a permanent undo failure.Change
Make the committing CTE the last fallible operation in both methods:
instance_attach_disk/instance_detach_disknow return themodel::Diskthe CTE already produces (no post-commit re-fetch).Nexus::instance_attach_disk/instance_detach_disk) do thedatastore::Diskupgrade viadisk_get, where a post-commit failure is recoverable (client retries; both ops are idempotent).After this, the disk-attach/detach saga actions can only fail before they commit, so an overloaded instance-create fails cleanly and fully unwinds instead of leaking an
Attacheddisk and wedging.Scope / non-goals
This removes the specific latent atomicity defect. It does not make instance-create robust to sustained pool exhaustion in general (a detach undo can still fail to acquire a connection and stop an unwind) . . . that is an honest DB-unavailable failure better addressed at the pool/admission layer, and is out of scope here.
Testing
Existing datastore coverage in
sled.rsexercises attach (success + attach-blocked-while-starting) and detach; those callers discard the return value and are unaffected by the type change. I was not able to build the full workspace in my environment; CI should validate compilation. A saga fault-injection test that fails the attach action after its DB commit and asserts a clean, leak-free unwind would be a good follow-up.Related
An automated audit found the same commit-then-fallible shape in several other sagas (see the linked issue comment) . . . notably
instance_create::sic_create_network_interface(orphaned NIC),snapshot_create::ssc_attach_disk_to_pantry, andvpc_create::svc_create_gateway. Those are left for follow-ups; the underlying invariant is: a saga forward action must not perform a fallible operation after committing state.