Reduce memory allocation pressure in HttpCompare, wayback, and paramminer #3213
Open
liquidsec wants to merge 1 commit into
…iner
- HttpCompare: replace retained blasthttp Response with lightweight snapshot that spills body to BodySpillStore
- wayback: batch YARA junk-URL matching into single blob match with offset mapping instead of per-URL calls
- paramminer: scope already_checked per-URL so word sets are freed after finish() processes each endpoint
Codecov Report
❌ Patch coverage is

Coverage Diff

| | dev | #3213 | +/- |
|---|---|---|---|
| Coverage | 90% | 90% | +1% |
| Files | 453 | 453 | |
| Lines | 46101 | 46160 | +59 |
| Hits | 41211 | 41264 | +53 |
| Misses | 4890 | 4896 | +6 |
Summary
Addresses three of the measured memory hotspots from the memray profiling report (excluding baddns, which is being handled separately):
- **HttpCompare baseline spill:** After establishing the baseline, the heavy blasthttp `Response` is replaced with a lightweight `_BaselineSnapshot` that spills body bytes to the scan's `BodySpillStore` (disk-backed LRU cache). External code (`lightfuzz`, `webbrute`, `sqli`) sees the same `.text`, `.content`, `.status_code`, `.headers` API. Frees the Rust-side body allocation for the lifetime of each `HttpCompare` instance.
- **Batch wayback YARA matching:** Instead of calling `rules.match(data=url)` per-URL (100K+ CFFI boundary crossings on large CDX responses), all candidate URLs are concatenated into a newline-delimited blob and matched once. String instance offsets are mapped back to individual URLs via `bisect` and counted per-URL against the threshold. Measured: ~22 GB cumulative allocation pressure reduced to a single match invocation.
- **Paramminer per-URL scoping:** `already_checked` changes from a flat `set(hash(word+url))` to `dict[url, set(hash(word))]`. After `finish()` processes each URL, its word set is evicted. Frees ~30-100 MB on heavy paramminer scans instead of retaining all word hashes until scan end.

Also audited `_module_consumers` accounting: all increment/decrement paths are correctly paired, no actionable leak found.
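For reviewers who want a feel for the offset mapping in the wayback change, here is a minimal sketch. It is not the code in this PR. To stay runnable without YARA installed it uses `re.finditer` as a stand-in for the single blob match, and `batch_count_hits` is a hypothetical name. It also assumes URLs contain no newlines and that hit offsets are in the same units as the blob (a real YARA scan reports byte offsets).

```python
import re
from bisect import bisect_right
from collections import Counter


def batch_count_hits(urls, pattern, threshold):
    """Return the URLs whose hit count in one blob scan reaches `threshold`.

    Hypothetical helper: `re` stands in for the single YARA match call.
    """
    # Record the start offset of every URL inside the newline-delimited blob.
    starts = []
    pos = 0
    for url in urls:
        starts.append(pos)
        pos += len(url) + 1  # +1 accounts for the newline separator
    blob = "\n".join(urls)

    hits = Counter()
    # One pass over the whole blob instead of one call per URL.
    for m in re.finditer(pattern, blob):
        # The owning URL is the last one whose start offset is <= the hit offset.
        idx = bisect_right(starts, m.start()) - 1
        hits[idx] += 1

    return [urls[i] for i, count in hits.items() if count >= threshold]
```

Because `starts` is sorted by construction, each lookup is a binary search, so the mapping step costs O(log n) per hit rather than a linear scan over the URL list.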
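The paramminer scoping change can be illustrated with a small sketch of the per-URL layout. `CheckedWords` and its method names are hypothetical and only mirror the `dict[url, set(hash(word))]` shape described above; the actual module code may differ.

```python
from collections import defaultdict


class CheckedWords:
    """Hypothetical per-URL tracker: one set of word hashes per URL."""

    def __init__(self):
        self._by_url = defaultdict(set)

    def add(self, url, word):
        # Store only the hash of the word, keyed under its URL.
        self._by_url[url].add(hash(word))

    def seen(self, url, word):
        # .get() avoids creating an empty set for URLs never added.
        words = self._by_url.get(url)
        return words is not None and hash(word) in words

    def evict(self, url):
        # Drop the whole word set once the URL has been fully processed.
        self._by_url.pop(url, None)

    def __len__(self):
        return len(self._by_url)
```

With a flat set keyed on `hash(word+url)` there is no way to release the entries belonging to one URL. Keying by URL first makes eviction a single `pop`.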
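The HttpCompare snapshot idea can also be sketched in a few lines. This is an illustration under stated assumptions, not the PR's `_BaselineSnapshot`. `DiskSpill` is a hypothetical stand-in for `BodySpillStore` (whose real API is not shown here), it has no LRU cache and never cleans up its files, and UTF-8 decoding of `.text` is an assumption.

```python
import os
import tempfile


class DiskSpill:
    """Hypothetical spill store: writes each body to a file in a temp dir."""

    def __init__(self):
        self._dir = tempfile.mkdtemp(prefix="spill-")
        self._count = 0

    def put(self, data: bytes) -> str:
        self._count += 1
        path = os.path.join(self._dir, f"{self._count}.bin")
        with open(path, "wb") as f:
            f.write(data)
        return path  # the file path doubles as the lookup key

    def get(self, key: str) -> bytes:
        with open(key, "rb") as f:
            return f.read()


class SnapshotSketch:
    """Keeps small fields in memory and reads the body back on demand."""

    def __init__(self, response, store):
        self.status_code = response.status_code
        self.headers = dict(response.headers)
        self._store = store
        # After this call the original response object can be released.
        self._key = store.put(response.content)

    @property
    def content(self) -> bytes:
        return self._store.get(self._key)

    @property
    def text(self) -> str:
        # Assumption: bodies are decoded as UTF-8 with replacement.
        return self.content.decode("utf-8", errors="replace")
```

Exposing `.content` and `.text` as properties is what lets callers keep using the same attribute access while the bytes live on disk between reads.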