Skip to content

WIP: ci: run some sharness tests under asan#7656

Open
chu11 wants to merge 11 commits into
flux-framework:masterfrom
chu11:issue7537_asan_ci_sharness
Open

WIP: ci: run some sharness tests under asan#7656
chu11 wants to merge 11 commits into
flux-framework:masterfrom
chu11:issue7537_asan_ci_sharness

Conversation

@chu11

@chu11 chu11 commented Jun 4, 2026

Copy link
Copy Markdown
Member

Next iteration of work, built on top of #7538.

Per suggestion, run a subset of sharness tests using a "ci=asan" tag.

WIP, I'm sure there will be iteration as we figure out how long this takes, which tests we want to keep or not keep. For the time being, removed valgrind (obviously), system tests, python/lua tests, and regression tests. But kept everything else for now. I imagine I'll remove some as I iterate. (example: do i really need to run "fake resources" tests under asan? probably not)

chu11 and others added 8 commits May 28, 2026 16:38
Problem: A test in t2714-python-cli-batch.t assumes a prior test
has been executed.  But that prior test will not run if ASAN is
configured.

Set NO_ASAN on the test to ensure it isn't executed like prior
dependent ones.
Problem: Tests in t0006-module-exec.t and t2614-job-shell-doom.t
test how segfaults are reported.  Under ASAN, segfault signals may be
handled / reported differently.

Skip tests that expect a specific message when a segfault occurs.  Under
ASAN, that specific message cannot be expected.
Problem: A test in t0006-module-exec.t and t3306-system-routercrash.t
simulate a segfault by sending a SIGSEGV signal to crash a module / broker.
With ASAN, the signal could be captured, the expected result may not happen,
and an asan log will be generated indicating a segfault happened.  Follow up
tests depend on the SIGSEGV crashing a module / broker, so we can't just skip
the test.  These tests are specifically covering SIGSEGV, so we don't want to
just change the signal.

Under ASAN, instead send a SIGKILL to crash the broker / module.  This will
ensure follow on tests continue to work as expected under ASAN.  We will still
get ample coverage of the SIGSEGV case under non-ASAN workflows.
Problem: Several tests are skipped when ASAN is enabled, but they
have no comments / explanation why.

Add comments explaining that the tests in t0005-exec.t are skipped
because ASAN causes a segfault to be reported differently than we
normally would expect.  Tests in t0016-cron-faketime.t and
t3001-mpi-personalities.t are skipped because they change LD_PRELOAD.
Tests in t2714-python-cli-batch.t are skipped because of slowness.
Problem: Some tests file under ASAN due to problems with underlying
commands.  This can be worked around if ASAN was not set on those
specific commands.

Add a no-asan-wrapper.sh script.  The simple script will unset
LD_PRELOAD and ASAN_OPTIONS before running a command.
Problem: The 'ps' and 'pkill' command hangs when ASAN is enabled.

Run 'ps' and 'pkill' under the no-asan script.
Problem: The 'dd' command has shown to have problems when run under
asan.

Wrap all calls to 'dd' under the no-asan-wrapper.sh script.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Problem: A few tests fail or hang under ASAN for test specific reasons.

Set NO_ASAN on those tests.
@chu11 chu11 force-pushed the issue7537_asan_ci_sharness branch from 872d409 to 424c559 Compare June 4, 2026 18:12
chu11 added 2 commits June 4, 2026 13:32
Problem: The asan CI tests only run against unit tests in the src/
directory.  This is because tests did not pass or some tests hung.
Many of those issues have been fixed, although some issues still linger.

Update CI and scripts to run asan over unit tests and a subset of tests in
sharness.  Tests with "ci=asan" tag will be run under asan.  The tags will
be set in a following commit.
Add the '# ci=asan' comment to many test files in t/ directory.
This enables these tests to run in the ASAN CI workflow.  Do not
add to system related tests, regression tests, and valgrind tests.

Assisted-By: Claude Opus 4.6 <noreply@anthropic.com>
@chu11 chu11 force-pushed the issue7537_asan_ci_sharness branch from 424c559 to 7f6ec81 Compare June 4, 2026 20:32
@grondo

grondo commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Just some quick feedback -- I haven't looked at the whole PR yet. Needing to add $NOASAN foo to a bunch of commands in the testsuite is probably a non-starter. This requirement will probably be prone to error and aggravation. This is why I'd suggested a few "safe" tests get ci=asan markers or something like ci=system.

If we do need to go with this approach, maybe it would work to do something like:

  1. In the early setup in the asan ci test, create a bunch of wrapper scripts in $SHARNESS_BUILD_DIRECTORY/asan-scripts, e.g. ps dd, etc.
  2. Prepend $SHARNESS_BUILD_DIRECTORY/asan-scripts to the front of PATH for tests to asan enabled tests use the wrappers automatically.

This drops the requirement to touch every existing test for the wrapped commands, test authors don't have to worry about which commands to wrap - they just write tests, and it allows us easily to add new wrapped commands in the future.

@chu11

chu11 commented Jun 5, 2026

Copy link
Copy Markdown
Member Author

This is why I'd suggested a few "safe" tests get ci=asan markers or something like ci=system.

We can certainly go with this approach with a few limited number of tests. I think the issue was that with python presumably having false positives (#7624) along with a number of these issues (ps, dd, etc.), the number of "safe" ones shrank by a pretty decent amount. The number of false positives recently shrank, due to unknown reasons, but I think it should be assumed they could come back.

If we do need to go with this approach, maybe it would work to do something like:

Your idea is a good one. I'll change #7538 accordingly.

@codecov

codecov Bot commented Jun 21, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.01%. Comparing base (f85f01f) to head (80253de).
⚠️ Report is 100 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7656      +/-   ##
==========================================
+ Coverage   83.99%   84.01%   +0.02%     
==========================================
  Files         576      576              
  Lines       97215    97215              
==========================================
+ Hits        81651    81673      +22     
+ Misses      15564    15542      -22     

see 18 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants