Streaming Semantic Proxy: AI-powered text rewriting through an ICAP-compatible Squid proxy. Every webpage your browser loads is rewritten in real time by an LLM, with RAG-augmented context and semantic caching.
┌─────────┐     ┌──────────────┐     ┌──────────────┐     ┌─────────────┐
│ Browser │────▶│ Squid Proxy  │────▶│ ICAP Server  │────▶│ HTML Parser │
│  + SSE  │◀────│  (SSL bump)  │     │ (port 1344)  │     │ + Injector  │
└────┬────┘     └──────────────┘     └──────┬───────┘     └──────┬──────┘
     │                                      │                    │
     │  SSE: rewritten text                 │                    ▼
     │  (data-ai-id match)                  │             ┌──────────────┐
     │                                      │             │ Rewrite Pipe │
     │                                      │             │ RAG → Cache  │
     │                                      │             └──────┬───────┘
     │                                      │                    │
     │                                      │                    ▼
     │                                      │             ┌──────────────┐
     └──────────────────────────────────────┘             │  LLM Client  │
                                                          │ (OpenAI API) │
                                                          └──────────────┘
- The browser sends an HTTP/HTTPS request through the Squid proxy.
- Squid intercepts the response and forwards it to the ICAP server via RESPMOD.
- The ICAP handler parses the HTML, tags each `<p>` with a `data-ai-id` attribute, injects an SSE client runtime, and returns the modified page.
- The rewrite pipeline runs asynchronously: each text node is checked against the semantic cache, enriched with RAG context, sent to the LLM, validated, and pushed to the SSE queue.
- The browser runtime receives SSE events and swaps DOM text in real time.
- Real-time rewriting: text updates live in the browser via Server-Sent Events
- ICAP protocol: standards-compliant; works with any ICAP-capable proxy
- SSL bump: HTTPS interception via Squid with auto-generated certificates
- RAG-augmented: style guides and glossaries improve rewrite quality
- Semantic caching: ChromaDB + `all-MiniLM-L6-v2` embeddings; a 0.85 similarity threshold avoids redundant LLM calls
- Quality validation: rejects rewrites that drift more than 30% in length or contain artifacts
- Batch processing: nodes are batched (default 5) for throughput
- Hot-swap: rewritten text appears without a page reload
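
The semantic-caching feature comes down to a nearest-neighbour lookup with a cutoff. The sketch below illustrates that decision with plain Python lists standing in for embeddings. The project itself uses ChromaDB and `all-MiniLM-L6-v2`, so `cosine_similarity` and `lookup` are hypothetical names, not the project's API.

```python
import math

SIMILARITY_THRESHOLD = 0.85  # mirrors the documented default


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def lookup(query_vec, entries, threshold=SIMILARITY_THRESHOLD):
    """Return the cached rewrite of the closest entry, or None on a miss.

    `entries` is a list of (embedding, rewritten_text) pairs (hypothetical layout).
    """
    best_text, best_score = None, -1.0
    for vec, text in entries:
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_text, best_score = text, score
    return best_text if best_score >= threshold else None
```

A query that scores below the threshold against every entry is treated as a miss and would go on to the LLM.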
# Install
git clone https://github.com/yourname/squidcode.git
cd squidcode
pip install -e ".[dev]"
# Configure
cp .env.example .env
# Edit .env: set LLM_API_KEY, LLM_BASE_URL, and LLM_MODEL
# Start the servers
python -m squidcode

This launches both the ICAP server (port 1344) and the SSE endpoint (port 8080). Point Squid at the ICAP server, configure your browser to use the proxy, and browse as normal.
All settings are read from environment variables (or a `.env` file):
| Variable | Default | Description |
|---|---|---|
| `ICAP_HOST` | `0.0.0.0` | ICAP server bind address |
| `ICAP_PORT` | `1344` | ICAP server port |
| `SSE_HOST` | `0.0.0.0` | SSE server bind address |
| `SSE_PORT` | `8080` | SSE server port |
| `LLM_API_KEY` | (none) | OpenAI-compatible API key |
| `LLM_BASE_URL` | `https://api.openai.com/v1` | LLM API base URL |
| `LLM_MODEL` | `gpt-4o-mini` | Model identifier |
| `LLM_TEMPERATURE` | `0.3` | Sampling temperature |
| `REWRITE_STYLE` | `clarity` | Default rewrite style |
| `BATCH_SIZE` | `5` | Nodes per rewrite batch |
| `CACHE_SIMILARITY_THRESHOLD` | `0.85` | Minimum cosine similarity for a cache hit |
| `CACHE_TTL_HOURS` | `24` | Cache entry lifetime (hours) |
| `CACHE_PERSIST_DIR` | `./cache_db` | ChromaDB storage path |
| `RAG_DATA_DIR` | `./data` | RAG source data directory |
| `EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | Sentence-transformers model |
| `LOG_LEVEL` | `INFO` | Logging verbosity |
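
To make the override behaviour concrete, here is a minimal sketch of reading a subset of these variables with their documented defaults. The project layout lists Pydantic settings in `config.py`; this stand-in uses only the standard library, and `Settings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Settings:
    # Defaults copied from the table above (subset only).
    icap_host: str = "0.0.0.0"
    icap_port: int = 1344
    sse_host: str = "0.0.0.0"
    sse_port: int = 8080
    llm_base_url: str = "https://api.openai.com/v1"
    llm_model: str = "gpt-4o-mini"
    llm_temperature: float = 0.3
    batch_size: int = 5
    cache_similarity_threshold: float = 0.85


def load_settings(env=None):
    """Build Settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    overrides = {}
    for f in fields(Settings):
        raw = env.get(f.name.upper())
        if raw is not None:
            # Coerce the string to the type of the field's default value.
            overrides[f.name] = type(f.default)(raw)
    return Settings(**overrides)
```

For instance, `load_settings({"ICAP_PORT": "9000"})` overrides only the ICAP port and leaves every other field at its default.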
| Style | Effect |
|---|---|
| `clarity` | Simple, direct language. Active voice. Jargon replaced with plain language. |
| `simplify` | Shorter words and sentences for a general audience. |
| `formal` | Professional tone, precise vocabulary, no contractions. |
| `eli5` | Explain like I'm five: simple words, analogies, brief and fun. |

Set the default via `REWRITE_STYLE` in `.env`.
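
One way a style name could map to a system prompt is sketched below. The prompt wording is illustrative and derived from the table; the real prompts in `llm/prompts.py` may read differently. Falling back to `clarity` for unknown values is also an assumption.

```python
# Hypothetical prompt table; wording paraphrases the "Effect" column above.
STYLE_PROMPTS = {
    "clarity": "Rewrite in simple, direct language. Use active voice and "
               "replace jargon with plain language.",
    "simplify": "Rewrite with shorter words and sentences for a general audience.",
    "formal": "Rewrite in a professional tone with precise vocabulary and "
              "no contractions.",
    "eli5": "Explain like I'm five: use simple words and analogies, and keep "
            "it brief and fun.",
}
DEFAULT_STYLE = "clarity"


def system_prompt(style=None):
    """Return the prompt for `style`, falling back to the default style."""
    key = (style or DEFAULT_STYLE).strip().lower()
    return STYLE_PROMPTS.get(key, STYLE_PROMPTS[DEFAULT_STYLE])
```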
Index style guides and glossaries before starting the server:
# Index a directory of style guide markdown/text files
python -m squidcode.rag.indexer --type style_guide --dir ./data/style_guides
# Index a JSON glossary file
python -m squidcode.rag.indexer --type glossary --file ./data/glossaries/general.json

Glossary JSON format:
[
{"term": "API", "definition": "Application Programming Interface"},
{"term": "ICAP", "definition": "Internet Content Adaptation Protocol"}
]

Generate a CA certificate and import it into your browser:
./squid/generate_cert.sh
# Import squid/ssl_cert/ca.crt into your browser's trusted root certificates

Key directives in `squid/squid.conf`:
# Enable SSL bump
http_port 3128 ssl-bump generate-host-certificates=on
sslcrtd_program /usr/lib/squid/security_file_certgen -s /var/lib/ssl_db -M 4MB
ssl_bump peek step1
ssl_bump stare step2
ssl_bump bump step3
# Enable ICAP
icap_enable on
icap_service squidcode respmod_precache icap://127.0.0.1:1344/squidcode
adaptation_access squidcode allow all
Then start Squid:
squid -z # initialize cache directories
squid -f squid/squid.conf

squidcode/
├── __init__.py              # Package root (v0.1.0)
├── __main__.py              # Entry point (python -m squidcode)
├── main.py                  # Starts ICAP + SSE servers
├── config.py                # Pydantic settings from env vars
├── icap/
│   ├── server.py            # Async TCP ICAP server
│   ├── handler.py           # RESPMOD / OPTIONS handling
│   └── protocol.py          # ICAP frame parser
├── llm/
│   ├── client.py            # OpenAI-compatible streaming client
│   └── prompts.py           # Style-specific system prompts
├── rag/
│   ├── indexer.py           # CLI: index style guides & glossaries
│   ├── retriever.py         # ChromaDB similarity search
│   └── store.py             # Collection management
├── rewriter/
│   ├── html_parser.py       # BeautifulSoup text node extraction
│   ├── pipeline.py          # Async rewrite orchestration
│   ├── script_injector.py   # Inject SSE runtime into HTML
│   └── text_batcher.py      # Batch nodes for LLM calls
├── cache/
│   ├── embedding.py         # Sentence-transformers singleton
│   └── semantic_cache.py    # ChromaDB-backed cache (TTL + similarity)
├── sse/
│   ├── endpoint.py          # FastAPI SSE route
│   └── manager.py           # Per-session event queues
├── runtime/
│   └── client.js            # Browser JS: SSE consumer + DOM swapper
└── utils/
    ├── quality.py           # Length delta + artifact checks
    └── session.py           # UUID session ID generation
squid/
├── squid.conf               # Squid proxy configuration
└── generate_cert.sh         # Self-signed CA for SSL bump
data/
├── style_guides/default.md  # Default writing style guide
└── glossaries/general.json  # Technical term definitions
tests/
├── test_html_parser.py      # HTML parsing and tagging
├── test_icap_protocol.py    # ICAP frame parsing
├── test_llm_client.py       # Prompt construction
├── test_pipeline.py         # Full pipeline with mocks
├── test_rag_retriever.py    # RAG retrieval
├── test_script_injector.py  # Runtime injection
├── test_semantic_cache.py   # Cache hit/miss/expiry
└── test_hotswap.py          # Integration: live servers + SSE
The ICAP server is an async TCP server listening on port 1344. It handles OPTIONS (capability discovery) and RESPMOD (response modification). Non-HTML responses pass through with a 204 No Modification response. For HTML responses, the server parses the page, tags eligible `<p>` nodes (at least 10 characters, not inside `<script>`, `<style>`, `<code>`, or `<pre>`) with `data-ai-id` attributes, injects the SSE runtime, and returns the result to Squid.
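
The eligibility rule for `<p>` nodes can be illustrated with the standard library alone. This is a sketch, not the project's parser (which uses BeautifulSoup according to the layout above). `EligibleParagraphs` is a hypothetical name, and numbering ids sequentially as `sq-N` is an assumption based on the SSE event format.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "code", "pre"}
MIN_CHARS = 10


class EligibleParagraphs(HTMLParser):
    """Collect the text of <p> elements that qualify for rewriting."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.in_p = False
        self.buffer = []
        self.nodes = {}       # data-ai-id -> paragraph text

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "p" and self.skip_depth == 0:
            self.in_p = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "p" and self.in_p:
            text = "".join(self.buffer).strip()
            if len(text) >= MIN_CHARS:
                self.nodes[f"sq-{len(self.nodes) + 1}"] = text
            self.in_p = False

    def handle_data(self, data):
        # Text inside a skipped element nested in the paragraph is ignored.
        if self.in_p and self.skip_depth == 0:
            self.buffer.append(data)
```

Feeding a page through `feed()` and `close()` leaves `nodes` holding only paragraphs that are long enough and outside the skipped elements.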
The rewrite pipeline runs as a background `asyncio.Task` per request. Nodes are batched (headings first, then paragraphs). For each node, the pipeline:
- Checks the semantic cache (cosine similarity >= threshold)
- Retrieves RAG context (style guides + glossaries)
- Calls the LLM with a style-specific system prompt and context
- Validates output quality (length within 30%, no suspicious tokens)
- Stores in cache and pushes to the SSE queue
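
The steps above can be sketched as a single coroutine. All names here (`passes_quality`, `batched`, `rewrite_node`) are hypothetical, a plain dict stands in for the semantic cache, and the artifact check is omitted. Skipping the LLM call on a cache hit is an assumption about how the steps combine.

```python
import asyncio


def passes_quality(original, rewritten, max_drift=0.30):
    """Accept a rewrite only if its length stays within max_drift of the original."""
    if not rewritten.strip():
        return False
    drift = abs(len(rewritten) - len(original)) / max(len(original), 1)
    return drift <= max_drift


def batched(nodes, size=5):
    """Order headings before paragraphs, then split into batches of `size`."""
    # sorted() is stable, so document order is preserved within each kind.
    ordered = sorted(nodes, key=lambda n: 0 if n["kind"] == "heading" else 1)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


async def rewrite_node(node, cache, retrieve, call_llm, queue):
    """Run one node through cache -> RAG -> LLM -> validation -> SSE queue."""
    text = node["text"]
    rewritten = cache.get(text)          # exact-match stand-in for similarity lookup
    if rewritten is None:
        context = await retrieve(text)
        rewritten = await call_llm(text, context)
        if not passes_quality(text, rewritten):
            return False                 # rejected: nothing is cached or pushed
        cache[text] = rewritten
    await queue.put({"id": node["id"], "text": rewritten})
    return True
```

Here `retrieve` and `call_llm` are injected coroutines, which keeps the sketch testable with mocks; `queue` is an `asyncio.Queue` that the SSE side would drain.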
The SSE endpoint is a FastAPI server on port 8080. Each browsing session gets a UUID. The browser runtime connects to `GET /squidcode/sse/{session_id}` and receives JSON events of the form `{"id": "sq-N", "text": "..."}`. CORS is enabled for cross-origin access, and keepalive comments are sent every 30 seconds.
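
For reference, the wire format of those events is small enough to sketch directly. `format_event` and `KEEPALIVE` are hypothetical names, and the exact keepalive comment text the server sends is an assumption.

```python
import json

# An SSE comment line starts with ":"; clients ignore it.
KEEPALIVE = ": keepalive\n\n"


def format_event(node_id, text):
    """Serialise one rewrite as a Server-Sent Events `data:` frame."""
    # json.dumps escapes newlines, so the payload always fits on one data line.
    payload = json.dumps({"id": node_id, "text": text})
    return f"data: {payload}\n\n"
```

The blank line after each frame is what tells the browser's `EventSource` that the event is complete.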
The browser runtime is zero-dependency JavaScript injected into every page. It reads the session ID from a `<meta>` tag, opens an SSE connection, and swaps DOM text by matching `data-ai-id` selectors. A brief highlight animation gives visual feedback.
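
The actual runtime is JavaScript (`runtime/client.js`), but its matching logic can be mirrored in Python for illustration. In this sketch a dict keyed by `data-ai-id` stands in for the DOM, and `parse_events` and `apply_events` are hypothetical names.

```python
import json


def parse_events(lines):
    """Yield decoded payloads from an iterable of SSE lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(":"):
            continue  # blank separators and keepalive comments carry no data
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())


def apply_events(dom, events):
    """Swap text for every node whose data-ai-id matches an event id."""
    for event in events:
        if event["id"] in dom:
            dom[event["id"]] = event["text"]
    return dom
```

Events whose id has no matching node are ignored, which is the safe behaviour if the page changed after tagging.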
# Run all 49 tests (41 unit + 8 hotswap integration)
pytest tests/ -v
# Unit tests only (fast, no network)
pytest tests/ -v --ignore=tests/test_hotswap.py
# Run with a real LLM (requires .env with API key)
python test_llm_direct.py

GET /squidcode/sse/{session_id}
Returns text/event-stream. Events:
data: {"id": "sq-1", "text": "Rewritten paragraph content"}
GET /squidcode/health
Returns {"status": "ok"}.
OPTIONS icap://host:1344/squidcode
RESPMOD icap://host:1344/squidcode
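
To probe the ICAP endpoint by hand, a minimal OPTIONS request can be assembled as below. This is a sketch that assumes the server accepts a request carrying only a `Host` header; `build_options_request` is a hypothetical helper, not part of the package.

```python
def build_options_request(host, port=1344, service="squidcode"):
    """Return the bytes of a minimal ICAP OPTIONS request."""
    lines = [
        f"OPTIONS icap://{host}:{port}/{service} ICAP/1.0",
        f"Host: {host}",
        "",   # blank line terminates the ICAP headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```

Writing these bytes to a TCP connection on port 1344 should produce an ICAP response that lists the methods the service supports.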