Fix fragment cleanup failures on NFS-backed workspaces #139
Open
ErenYurekAgena wants to merge 1 commit into
Summary
This pull request fixes fragment cleanup failures that can occur when a Modelio workspace is stored on an NFS-backed filesystem.
The issue can happen during module or RAMC update/removal. Modelio closes and deletes the old fragment directories before reinstalling or updating them. On local filesystems this normally works as expected. On NFS-like filesystems, however, recently closed files may remain temporarily busy and can appear as .nfs* files. While that temporary state exists, recursive directory deletion may fail with errors such as DirectoryNotEmptyException or Device or resource busy.
When this happens, the module/RAMC update transaction can roll back even though the failure is only caused by a transient NFS cleanup state.
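The failure signatures described above could be recognized roughly as follows. This is a minimal sketch, not the code from this pull request: the class TransientNfsFailure and its method names are hypothetical, and for brevity it inspects only the top level of the directory for .nfs* files.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration; names are not taken from the pull request. */
final class TransientNfsFailure {
    private TransientNfsFailure() {}

    /** Returns true if the delete failure matches one of the transient signatures. */
    static boolean looksTransient(IOException failure, Path dir) {
        if (failure instanceof DirectoryNotEmptyException) {
            return true;
        }
        String message = failure.getMessage();
        if (message != null && message.contains("Device or resource busy")) {
            return true;
        }
        return containsNfsPlaceholder(dir);
    }

    /** Checks the top level of dir for ".nfs*" placeholder files (assumption: no recursion). */
    static boolean containsNfsPlaceholder(Path dir) {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, ".nfs*")) {
            return entries.iterator().hasNext();
        } catch (IOException e) {
            // If the directory cannot be listed, do not claim a transient NFS state.
            return false;
        }
    }
}
```

Any other IOException, such as a permission error, is left unclassified so that it would still fail the operation as before.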
Problem this prevents
This change prevents module/RAMC update or removal from failing just because the previous fragment directory cannot be removed immediately on an NFS-backed workspace.
Without this fix, the canonical fragment directory may remain in place after a transient NFS delete failure. That prevents the new fragment version from being installed cleanly and can cause the update transaction to roll back.
With this fix, when the failure is identified as a network-filesystem-style transient delete failure, the old fragment directory is moved away from its canonical location into a delete-pending sibling directory. This immediately frees the original fragment path, allowing the update or reinstall operation to continue.
The moved directory is then cleaned up with bounded retry. If NFS still reports busy files, cleanup is retried asynchronously and also retried on later fragment mount.
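The move-aside step and the bounded retry might look like the sketch below. It is an illustration under assumptions rather than the code in this pull request: the class DeletePendingFallback, the sibling naming scheme with a random suffix, and the attempt count and delay parameters are all hypothetical. The asynchronous retry and the retry on fragment mount are left out.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.UUID;

/** Hypothetical sketch of the fallback; names and naming scheme are assumptions. */
final class DeletePendingFallback {
    private DeletePendingFallback() {}

    /** Renames dir to a uniquely named sibling so the canonical path becomes free. */
    static Path moveAside(Path dir) throws IOException {
        String name = dir.getFileName() + ".delete-pending-" + UUID.randomUUID();
        return Files.move(dir, dir.resolveSibling(name), StandardCopyOption.ATOMIC_MOVE);
    }

    /** Tries to delete dir up to maxAttempts times; returns true once it is gone. */
    static boolean deleteWithRetry(Path dir, int maxAttempts, long delayMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                deleteRecursively(dir);
                return true;
            } catch (IOException e) {
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // give the NFS client time to release files
                }
            }
        }
        return false; // caller may schedule a later cleanup
    }

    /** Deletes files first, then each directory after its contents. */
    static void deleteRecursively(Path root) throws IOException {
        if (Files.notExists(root)) {
            return;
        }
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

Because the rename stays inside the same parent directory, it does not cross filesystems, which is why a sibling location is a natural target for the move.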
What changed
The standard deletion path is still used first.
If FileUtils.delete(path) succeeds, the method returns immediately and no fallback logic is executed.
Only if deletion fails, and only if the failure matches a network-filesystem-style transient cleanup problem, the code applies the fallback behavior:
DirectoryNotEmptyException, Device or resource busy, or .nfs* files
delete-pending sibling path
Why this should not affect normal local usage
The existing behavior is preserved for normal successful local filesystem deletion.
On a local filesystem, when FileUtils.delete(path) succeeds, the new code simply returns immediately. The delete-pending fallback is not entered.
Delete-pending directory scans are also guarded by a filesystem check and are only run for network-backed fragment parent directories. This means normal local filesystems do not automatically scan and clean delete-pending directories during mount.
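One way such a filesystem guard could be written is shown below. This is a sketch under assumptions, not the check used in this pull request: the class NetworkFileSystemCheck is hypothetical, and the list of type names is an assumed sample, since the strings returned by FileStore.type() are platform-specific.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Set;

/** Hypothetical guard for illustration; the type list is an assumption. */
final class NetworkFileSystemCheck {
    private NetworkFileSystemCheck() {}

    // Assumed sample of network filesystem type names; not exhaustive.
    private static final Set<String> NETWORK_TYPES =
            Set.of("nfs", "nfs3", "nfs4", "cifs", "smb", "smbfs", "smb3");

    /** Returns true if the given FileStore type name is in the assumed network list. */
    static boolean isNetworkType(String fileStoreType) {
        return fileStoreType != null
                && NETWORK_TYPES.contains(fileStoreType.toLowerCase(Locale.ROOT));
    }

    /** Returns true if dir lives on a file store whose type looks network-backed. */
    static boolean isNetworkBacked(Path dir) {
        try {
            return isNetworkType(Files.getFileStore(dir).type());
        } catch (IOException e) {
            // Unknown store: treat as local, so the delete-pending scan is skipped.
            return false;
        }
    }
}
```

Defaulting to false when the file store cannot be determined keeps the behavior conservative: an unrecognized filesystem is handled like a local one.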
The fallback is intentionally conservative:
Safety of the marker file
The delete-pending marker is stored as a sibling file instead of being placed inside the temporary directory.
This avoids a case where a recursive cleanup could delete an internal marker before the parent directory itself is successfully removed. By keeping the marker next to the temporary directory, later cleanup attempts can still safely identify whether the directory was created by this fallback.
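A sibling marker of this kind could be handled as in the sketch below. The .modelio-delete-pending name comes from this description; everything else is an assumption for illustration, including the class DeletePendingMarker and the scheme of appending the marker name to the directory name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch; the marker naming scheme is an assumption. */
final class DeletePendingMarker {
    private DeletePendingMarker() {}

    static final String SUFFIX = ".modelio-delete-pending";

    /** Marker path next to pendingDir, never inside it. */
    static Path markerFor(Path pendingDir) {
        return pendingDir.resolveSibling(pendingDir.getFileName() + SUFFIX);
    }

    /** Creates (or truncates) the empty marker file for pendingDir. */
    static Path writeMarker(Path pendingDir) throws IOException {
        return Files.write(markerFor(pendingDir), new byte[0]);
    }

    /** Only a directory with a matching sibling marker qualifies for cleanup. */
    static boolean isCleanupCandidate(Path dir) {
        return Files.isDirectory(dir) && Files.isRegularFile(markerFor(dir));
    }
}
```

Since the marker sits outside the directory, a recursive delete that fails partway cannot remove it, and a later cleanup pass can still recognize the leftover directory.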
Only directories with the matching .modelio-delete-pending marker are considered delete-pending cleanup candidates.
Tested
Tested with an NFS/EFS-backed workspace:
Tested with the same Ubuntu 22.04 container runtime using a non-EFS local filesystem workspace:
FileUtils.delete(path) path