This repository was archived by the owner on Jun 16, 2026. It is now read-only.
6 changes: 5 additions & 1 deletion .jules/bolt.md
@@ -13,7 +13,7 @@ SPDX-License-Identifier: MIT OR Apache-2.0

# 2026-03-29 - Consider Readability and Possible Environment Limitations
**Learning** While some patterns are theoretically faster, they may not improve performance in I/O-bound contexts. Examples include embedding/reranking requests and database operations, where I/O is the dominant limiting factor.
**Action** Don't recommend changes that reduce readability or diverge from Python idioms for no or marginal gains in performance.
**Action** Don't recommend changes that reduce readability or diverge from Python idioms for no or marginal gains in performance.

## 2026-04-01 - Fast generation of line pos lengths in Chunker with itertools
**Learning:** itertools.accumulate(map(len, lines)) is significantly faster (~2-3x) than using a generator expression like (line_offsets[-1] + len(line) for line in lines) because it pushes the entire loop down to C level instead of creating generator overhead for each element.
@@ -25,3 +25,7 @@ SPDX-License-Identifier: MIT OR Apache-2.0
## 2025-04-12 - Walrus Operator Optimization
**Learning:** Using the walrus operator inside a list comprehension to avoid redundant execution of string methods (like `.strip()`) is an effective and safe micro-optimization. The assignment target is bound in the scope that contains the comprehension (the enclosing function), not in the comprehension's own scope. This is standard Python behavior (PEP 572) and is harmless as long as the name does not shadow another variable in that function and the comprehension is not at module (global) scope.
**Action:** Always favor using the walrus operator `:=` in list comprehensions or conditionals when identical string manipulations (e.g., `.strip()`) or expensive evaluation calls appear repeatedly within the identical expression branch.

## 2025-05-15 - Avoiding Generator Comprehensions for Dictionary Value Lookups
**Learning:** Using `next((v for content in dict.values() for k, v in content.items() if k == target), default)` for dictionary lookups introduces severe performance regressions in hot paths. This pattern replaces an average $O(1)$ hash lookup per inner dictionary with a linear scan over every item, so the cost grows with the total number of entries across all inner dictionaries instead of with the number of inner dictionaries. It also pays generator frame overhead and bypasses the hash map advantages.
**Action:** Replace dictionary generator comprehensions with simple `for` loops that use an early return/yield and a direct `in` check (`if target in content: return content[target]`), which is drastically faster and avoids generator overhead.
Comment on lines +29 to +31

Contributor

suggestion (typo): Consider using the term "generator expressions" instead of "generator comprehensions" for Python accuracy.

To match standard Python terminology and avoid confusion, please rename the section and Action line to use “generator expressions” instead of “generator comprehensions.”

Suggested implementation:

## 2025-05-15 - Avoiding Generator Expressions for Dictionary Value Lookups

**Action:** Replace dictionary generator expressions with simple `for` loops that use an early return/yield and a direct `in` check (`if target in content: return content[target]`), which is drastically faster and avoids generator overhead.

Comment on lines +30 to +31
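
To make the journal entry above concrete, here is a minimal sketch of the two lookup styles it contrasts. The `NESTED` mapping and both function names are hypothetical and exist only for illustration; they are not part of the repository.

```python
# Hypothetical nested mapping: outer key -> {name -> value}.
NESTED = {
    "python": {"call": "py_call", "attr": "py_attr"},
    "rust": {"macro": "rs_macro", "call": "rs_call"},
}


def lookup_with_generator(nested, target, default=None):
    """Scan (key, value) pairs of every inner dict until a key equals target."""
    return next(
        (v for content in nested.values() for k, v in content.items() if k == target),
        default,
    )


def lookup_direct(nested, target, default=None):
    """Do one hash lookup per inner dict and stop at the first hit."""
    for content in nested.values():
        if target in content:
            return content[target]
    return default


# Both styles return the same value; only the amount of work differs.
for name in ("call", "macro", "missing"):
    assert lookup_with_generator(NESTED, name) == lookup_direct(NESTED, name)
```

Both functions return the first match in insertion order of the outer dictionary, so swapping one for the other should not change results for a plain `dict` of `dict`s. The direct version touches each inner dictionary once instead of walking its items.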
33 changes: 13 additions & 20 deletions src/codeweaver/semantic/registry.py
@@ -344,33 +344,26 @@ def _get_direct_connections_by_source(
         """Get DirectConnections by their source Thing name across all languages."""
         if language:
             yield from self.direct_connections[language].get(source, [])
-        yield from (
-            next(
-                (
-                    conns
-                    for content in self._direct_connections.values()
-                    for con_name, conns in content.items()
-                    if con_name == source
-                ),
-                [],
-            )
-        )
+            return
Comment on lines 345 to +347

Contributor

question (bug_risk): Early return when language is provided changes behavior compared to the previous implementation.

Previously, when language was provided, the function yielded both self.direct_connections[language][source] and the first matching source from self._direct_connections across all languages. With the early return, the cross-language lookup is skipped whenever a language is passed, so callers no longer get that additional set of connections. If callers depend on the combined behavior, this is a breaking change; if not, consider whether you still want a cross-language fallback and avoid returning early.
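
The behavior change described in this comment can be reproduced with a small sketch. It assumes, hypothetically, that the `direct_connections` property and the `_direct_connections` attribute expose the same language-keyed mapping. The `CONNECTIONS` data and the `old_lookup` / `new_lookup` names are illustrative stand-ins, not the registry's real API.

```python
from collections.abc import Iterator
from typing import Optional

# Hypothetical stand-in for the registry's per-language mapping.
CONNECTIONS: dict[str, dict[str, list[str]]] = {
    "python": {"call": ["py_conn"]},
    "rust": {"call": ["rs_conn"], "macro": ["rs_macro"]},
}


def old_lookup(source: str, language: Optional[str] = None) -> Iterator[str]:
    """Pre-change shape: language results, then the first cross-language match."""
    if language:
        yield from CONNECTIONS[language].get(source, [])
    yield from next(
        (
            conns
            for content in CONNECTIONS.values()
            for name, conns in content.items()
            if name == source
        ),
        [],
    )


def new_lookup(source: str, language: Optional[str] = None) -> Iterator[str]:
    """Post-change shape: return early when a language is given."""
    if language:
        yield from CONNECTIONS[language].get(source, [])
        return
    for content in CONNECTIONS.values():
        if source in content:
            yield from content[source]
            break


# Without a language, both versions agree.
assert list(old_lookup("call")) == list(new_lookup("call"))
# With a language, only the old version appends the cross-language match.
assert list(old_lookup("call", "rust")) != list(new_lookup("call", "rust"))
```

Under these assumptions the old version yields an extra cross-language result whenever a language is passed, and the new version does not. Whether callers rely on that extra result is the open question the reviewer raises.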


+        # Optimization: Early return via direct lookup avoids O(N^2) generator overhead
+        for content in self._direct_connections.values():
+            if source in content:
+                yield from content[source]
+                break
Comment on lines +349 to +353

     def _get_positional_connections_by_source(
         self, source: ThingNameT, *, language: SemanticSearchLanguage | None = None
     ) -> PositionalConnections | None:
         """Get PositionalConnectionss by their source Thing name across all languages."""
         if language:
             return self.positional_connections[language].get(source)
-        return next(
-            (
-                conn
-                for content in self._positional_connections.values()
-                for con_name, conn in content.items()
-                if con_name == source
-            ),
-            None,
-        )
+
+        # Optimization: Early return via direct lookup avoids O(N^2) generator overhead
+        for content in self._positional_connections.values():
+            if source in content:
+                return content[source]
+        return None
Comment on lines +362 to +366

     def get_positional_connections_by_source(
         self, source: ThingNameT, *, language: SemanticSearchLanguage | None = None