machine/rp2: fix USB buffer status race that can stall CDC TX pump #5472
Open
rdon-key wants to merge 1 commit into
This fixes an RP2 USB CDC TX pump stall observed with `-scheduler=cores` under XIP pressure.

The problem was reproduced on RP2040 as an intermittent USB CDC monitor hang: the program kept running, but CDC output stopped before reaching `test finished`. The underlying issue appears to be a race in the RP2040/RP2350 USB IRQ handlers around `BUFF_STATUS`.

Previously, the IRQ handler read a snapshot of `BUFF_STATUS`, invoked the endpoint handlers for the bits set in it, and only then cleared that snapshot.
However, the USB CDC TX handler can immediately arm the next IN transfer. If that newly armed IN transfer completes while the endpoint handler is still running, its completion sets the same endpoint bit again, so the later clear of the old `BUFF_STATUS` snapshot can also clear the newly completed transfer's bit.

As a result, the CDC TX pump can miss a TX completion event and stay stuck waiting for a completion that has already been cleared.
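The lost completion can be shown with a small host-side model. This is a hypothetical sketch, not TinyGo's actual IRQ code: `usbModel`, `cdcInBit`, `handleEndpoints`, and `irqOld` are invented names, the register is modeled as a plain bitmask (the real `BUFF_STATUS` is write-1-to-clear), and the handler assumes the host completes the re-armed IN transfer before the handler returns.

```go
package main

import "fmt"

// cdcInBit is a hypothetical BUFF_STATUS bit for the CDC IN endpoint.
// The model uses "&^=" on a plain bitmask to stand in for the hardware's
// write-1-to-clear acknowledge.
const cdcInBit uint32 = 1 << 2

// usbModel is a hypothetical, simplified view of the USB IRQ state.
type usbModel struct {
	buffStatus  uint32 // pending buffer-completion bits
	completions int    // TX completions the CDC pump has observed
}

// handleEndpoints stands in for the endpoint handlers. When the CDC IN bit is
// set it records a completion and arms the next IN transfer, which this model
// assumes completes before the handler returns (setting the bit again).
func (u *usbModel) handleEndpoints(status uint32) {
	if status&cdcInBit != 0 {
		u.completions++
		u.buffStatus |= cdcInBit // newly armed transfer completes immediately
	}
}

// irqOld mirrors the previous ordering: snapshot, handle, then clear the snapshot.
func (u *usbModel) irqOld() {
	status := u.buffStatus
	if status == 0 {
		return // nothing pending, so no IRQ would fire
	}
	u.handleEndpoints(status)
	u.buffStatus &^= status // also wipes the completion raised during handling
}

func main() {
	u := &usbModel{buffStatus: cdcInBit}
	u.irqOld()
	u.irqOld() // a "later IRQ" finds nothing pending
	fmt.Printf("completions=%d pending=%d\n", u.completions, u.buffStatus)
	// prints: completions=1 pending=0
}
```

The second completion was raised but never observed, which is the state the TX pump gets stuck in.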
This change makes the IRQ handler acknowledge the observed `BUFF_STATUS` bits before invoking endpoint handlers. With this ordering, any new completion that happens while endpoint handlers are running remains pending and can be handled by a later IRQ.
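The new ordering can be sketched with the same kind of hypothetical host-side model (again, `usbModel`, `cdcInBit`, `handleEndpoints`, and `irqNew` are invented names, and a plain bitmask stands in for the write-1-to-clear register). The only change is that the observed bits are acknowledged before the handlers run.

```go
package main

import "fmt"

// cdcInBit is a hypothetical BUFF_STATUS bit for the CDC IN endpoint.
const cdcInBit uint32 = 1 << 2

// usbModel is a hypothetical, simplified view of the USB IRQ state.
type usbModel struct {
	buffStatus  uint32 // pending buffer-completion bits
	completions int    // TX completions the CDC pump has observed
}

// handleEndpoints records a completion and re-arms the IN transfer, which
// this model assumes completes before the handler returns.
func (u *usbModel) handleEndpoints(status uint32) {
	if status&cdcInBit != 0 {
		u.completions++
		u.buffStatus |= cdcInBit
	}
}

// irqNew mirrors the fixed ordering: snapshot, acknowledge, then handle.
func (u *usbModel) irqNew() {
	status := u.buffStatus
	if status == 0 {
		return // nothing pending, so no IRQ would fire
	}
	u.buffStatus &^= status   // acknowledge only the bits that were observed
	u.handleEndpoints(status) // completions raised here stay pending
}

func main() {
	u := &usbModel{buffStatus: cdcInBit}
	u.irqNew()
	u.irqNew() // the still-pending bit is picked up by a later IRQ
	fmt.Printf("completions=%d pending=%d\n", u.completions, u.buffStatus)
	// prints: completions=2 pending=4
}
```

Because the acknowledge only covers bits that were already seen, a completion raised during handling survives and keeps the pump moving.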
Background
This was observed after the RP2 bidirectional endpoint changes made the issue reproducible with USB CDC output under XIP pressure.
The bidirectional endpoint changes appear to have made an existing IRQ ordering race visible, rather than being the direct source of the `BUFF_STATUS` clear ordering itself.

Reproducer
The following program repeatedly writes USB CDC output while another goroutine continuously reads from flash to create XIP/cache pressure.
Example commands:
Testing
Using the reproducer above:
Before this change, the RP2040 test could intermittently hang before printing `test finished`.

This does not claim to fix every possible USB CDC stall, but it fixes the observed RP2 USB CDC TX pump stall under `-scheduler=cores` with XIP pressure and matches the suspected lost-completion race.