Reduce memory allocation pressure in HttpCompare, wayback, and paramminer #3213

Open
liquidsec wants to merge 1 commit into dev from more-memory-improvements

Conversation

@liquidsec

Collaborator

Summary

Addresses three of the measured memory hotspots from the memray profiling report (excluding baddns, which is being handled separately):

  • HttpCompare baseline spill: After establishing the baseline, the heavy blasthttp Response is replaced with a lightweight _BaselineSnapshot that spills body bytes to the scan's BodySpillStore (disk-backed LRU cache). External code (lightfuzz, webbrute, sqli) sees the same .text, .content, .status_code, .headers API. Frees the Rust-side body allocation for the lifetime of each HttpCompare instance.

  • Batch wayback YARA matching: Instead of calling rules.match(data=url) once per URL (100K+ CFFI boundary crossings on large CDX responses), all candidate URLs are concatenated into a newline-delimited blob and matched once. String instance offsets are mapped back to individual URLs via bisect, and the hits are counted per URL against the threshold. Measured: the per-URL calls accounted for ~22 GB of cumulative allocation pressure, which is now replaced by a single match invocation.

  • Paramminer per-URL scoping: already_checked changes from a flat set(hash(word+url)) to dict[url, set(hash(word))]. After finish() processes each URL, its word set is evicted. Frees ~30-100 MB on heavy paramminer scans instead of retaining all word hashes until scan end.
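The offset mapping in the wayback item can be illustrated with a minimal sketch. This is not the code from the PR: `batch_match_counts` and `junk_urls` are hypothetical names, and `re.finditer` stands in for the string instance offsets that a YARA match would report. It also works on `str` offsets rather than bytes, and assumes the pattern never matches across a newline.

```python
import bisect
import re
from collections import Counter


def batch_match_counts(urls, pattern):
    """Count pattern hits per URL using one scan over a newline-joined blob."""
    blob = "\n".join(urls)

    # starts[i] is the offset in blob at which urls[i] begins.
    starts = []
    pos = 0
    for url in urls:
        starts.append(pos)
        pos += len(url) + 1  # +1 for the newline separator

    counts = Counter()
    # Stand-in for YARA string instances: each match yields an offset into blob.
    for m in re.finditer(pattern, blob):
        # The owning URL is the last one whose start offset is <= the match offset.
        idx = bisect.bisect_right(starts, m.start()) - 1
        counts[idx] += 1
    return counts


def junk_urls(urls, pattern, threshold):
    """Return the URLs whose hit count reaches the threshold."""
    counts = batch_match_counts(urls, pattern)
    return [urls[i] for i, n in counts.items() if n >= threshold]
```

Because `starts` is sorted by construction, each lookup is a binary search, so mapping all hits back to URLs costs O(hits × log(number of URLs)) after the single scan.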

Also audited _module_consumers accounting -- all increment/decrement paths are correctly paired, no actionable leak found.
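For the paramminer item, the change in data structure might look roughly like the sketch below. `CheckedWords`, `seen`, and `evict` are hypothetical names chosen for illustration; the PR only states that already_checked becomes dict[url, set(hash(word))] and that a URL's word set is evicted after finish() processes it.

```python
from collections import defaultdict


class CheckedWords:
    """Per-URL record of already-checked words (hypothetical helper)."""

    def __init__(self):
        # url -> set of word hashes, instead of one flat set of hash(word + url)
        self._by_url = defaultdict(set)

    def seen(self, url, word):
        """Return True if (url, word) was already checked; otherwise record it."""
        words = self._by_url[url]
        h = hash(word)
        if h in words:
            return True
        words.add(h)
        return False

    def evict(self, url):
        """Drop the whole word set for a URL once it has been fully processed."""
        self._by_url.pop(url, None)
```

With a flat set there is no cheap way to release only the entries that belong to one URL; keying by URL makes eviction a single `pop`.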

…iner

- HttpCompare: replace retained blasthttp Response with lightweight snapshot that spills body to BodySpillStore
- wayback: batch YARA junk-URL matching into single blob match with offset mapping instead of per-URL calls
- paramminer: scope already_checked per-URL so word sets are freed after finish() processes each endpoint
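
The HttpCompare snapshot idea can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `TinySpillStore` is a hypothetical temp-file stand-in (bbot's BodySpillStore API is not shown in this PR description), and `BaselineSnapshot` only mirrors the four attributes the summary says external code relies on.

```python
import os
import tempfile


class TinySpillStore:
    """Hypothetical disk-backed store; NOT bbot's BodySpillStore."""

    def __init__(self):
        self._dir = tempfile.mkdtemp(prefix="spill-")
        self._n = 0

    def put(self, data: bytes) -> str:
        """Write the bytes to disk and return a key for later retrieval."""
        self._n += 1
        path = os.path.join(self._dir, f"{self._n}.bin")
        with open(path, "wb") as f:
            f.write(data)
        return path

    def get(self, key: str) -> bytes:
        with open(key, "rb") as f:
            return f.read()


class BaselineSnapshot:
    """Keeps small fields in memory and reads the body back from the store on demand."""

    def __init__(self, response, store, encoding="utf-8"):
        self.status_code = response.status_code
        self.headers = dict(response.headers)
        self._encoding = encoding
        self._store = store
        # Only the key is retained; the original response can be released.
        self._key = store.put(response.content)

    @property
    def content(self) -> bytes:
        return self._store.get(self._key)

    @property
    def text(self) -> str:
        return self.content.decode(self._encoding, errors="replace")
```

Callers that only read .status_code or .headers never touch the disk; the body is loaded only when .content or .text is accessed.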
@github-actions

Contributor

🚀 Performance Benchmark Report

⚠️ No current benchmark data available

This might be because:

  • Benchmarks failed to run
  • No benchmark tests found
  • Dependencies missing

@codecov

codecov Bot commented Jun 18, 2026


Codecov Report

❌ Patch coverage is 84.28571% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 90%. Comparing base (ef65b09) to head (6de8c81).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| bbot/core/helpers/diff.py | 71% | 11 Missing ⚠️ |

Additional details and impacted files
@@          Coverage Diff          @@
##             dev   #3213   +/-   ##
=====================================
+ Coverage     90%     90%   +1%     
=====================================
  Files        453     453           
  Lines      46101   46160   +59     
=====================================
+ Hits       41211   41264   +53     
- Misses      4890    4896    +6     

@liquidsec liquidsec mentioned this pull request Jun 18, 2026
28 tasks
@liquidsec liquidsec added this to the BBOT 3.0 - blazed_elijah milestone Jun 19, 2026