fix: calcite optimization adds LITERAL_AGG(true) #963

Open
mbwhite wants to merge 2 commits into substrait-io:main from mbwhite:isthmus-literal-agg

@mbwhite mbwhite commented Jun 26, 2026

Contributor

fix: calcite optimization adds literal_agg

I was testing various Calcite optimisations. TPC-H query 16 in particular was optimised into a plan that the Isthmus SubstraitRelVisitor could not handle.

The issue was that Calcite had added a LITERAL_AGG(true) aggregate call to the plan.


In Apache Calcite, LITERAL_AGG(true) is an internal aggregate function injected during the decorrelation process (specifically when handling ANY, SOME, or IN subqueries).

Here is why it is there and what it does:

1. The Problem: Handling Empty Subqueries

In your "before" plan, there is a <> SOME(...) correlated subquery. Under the SQL standard, if the subquery inside a quantified comparison (SOME/ANY) returns zero rows, the entire condition evaluates to FALSE, not UNKNOWN (NULL), regardless of the left-hand operand.
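To make the empty-subquery case concrete, here is a minimal sketch of three-valued `x <> SOME (subquery)` evaluation in plain Java. The class and method names are hypothetical and are not part of Calcite or Isthmus; a Java `null` stands in for SQL UNKNOWN/NULL.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of SQL `x <> SOME (subquery)`; null stands for UNKNOWN/NULL. */
final class QuantifiedComparisonSketch {

    /** SQL `a <> b`: UNKNOWN (null) when either operand is NULL. */
    static Boolean notEquals(Integer a, Integer b) {
        if (a == null || b == null) {
            return null;
        }
        return !a.equals(b);
    }

    /**
     * SQL `x <> SOME (rows)`: TRUE if any comparison is TRUE,
     * otherwise UNKNOWN if any comparison is UNKNOWN, otherwise FALSE.
     * An empty subquery never enters the loop, so the result is FALSE.
     */
    static Boolean notEqualsSome(Integer x, List<Integer> rows) {
        boolean sawUnknown = false;
        for (Integer row : rows) {
            Boolean cmp = notEquals(x, row);
            if (cmp == null) {
                sawUnknown = true;
            } else if (cmp) {
                return Boolean.TRUE;
            }
        }
        return sawUnknown ? null : Boolean.FALSE;
    }

    public static void main(String[] args) {
        // Empty subquery, non-empty subquery with a match, and a NULL row.
        System.out.println(notEqualsSome(1, List.of()));
        System.out.println(notEqualsSome(1, List.of(1, 2)));
        System.out.println(notEqualsSome(1, Arrays.asList(1, null)));
    }
}
```

The point of the sketch is only that the empty case needs its own answer (FALSE), which is why the rewritten plan has to track whether any subquery row existed.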

2. The Solution: Flagging Matches

When Calcite transforms the correlated subquery into a join (LogicalCorrelate / LogicalAggregate), it needs a foolproof way to know if the subquery actually produced any rows for a given correlated key ($cor0.L_PARTKEY).

  • LITERAL_AGG(true) evaluates the literal value true for every row entering the aggregate.
  • Because it is an aggregate function, if the subquery returns zero rows, LITERAL_AGG(true) will return NULL.
  • If the subquery returns one or more rows, it returns true.

3. How Calcite Uses It

Look closely at the large LogicalFilter(condition=[OR(...)]) directly above the correlate in your "after" plan.

Calcite uses the output of LITERAL_AGG(true) (which maps to column $19 or $20 in that flattened row) to evaluate the exact short-circuiting logic of the SOME operator:

  • If it's NULL: the subquery was empty → handle as false/null.
  • If it's TRUE: the subquery returned data → proceed to check the actual value comparisons (like COUNT and MAX).
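The guard described above can be sketched in plain Java as follows. This is a simplified, two-valued illustration under stated assumptions: the flag is modelled as a per-key lookup that yields `null` when no subquery row exists for the key, and the value comparison is passed in as a supplier so it is only evaluated when the flag is set. All names are hypothetical; none of this is Calcite's or Isthmus's actual code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Hypothetical model of a row-existence flag guarding a SOME comparison. */
final class ExistenceFlagSketch {

    /**
     * Models the aggregate side: one TRUE entry per correlation key
     * that has at least one subquery row. Keys with no rows have no entry.
     */
    static Map<Integer, Boolean> flagsByKey(List<Integer> subqueryKeys) {
        Map<Integer, Boolean> flags = new HashMap<>();
        for (Integer key : subqueryKeys) {
            flags.put(key, Boolean.TRUE);
        }
        return flags;
    }

    /**
     * Models the filter: a null flag means the subquery was empty for this
     * key, so the predicate is false and the comparison is never evaluated.
     */
    static boolean somePredicate(Boolean flag, BooleanSupplier valueComparison) {
        if (flag == null) {
            return false;
        }
        return valueComparison.getAsBoolean();
    }

    public static void main(String[] args) {
        Map<Integer, Boolean> flags = flagsByKey(List.of(7, 7, 9));
        // Key 8 has no subquery rows, so its flag lookup is null.
        System.out.println(somePredicate(flags.get(8), () -> true));
        // Key 7 has rows, so the value comparison decides the outcome.
        System.out.println(somePredicate(flags.get(7), () -> true));
    }
}
```

In the real plan the comparison branch involves the COUNT and MAX columns mentioned above; the supplier here is only a placeholder for that logic.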

It essentially acts as a highly optimized, null-aware boolean flag for row existence during complex subquery unnesting.


🤖 built with assistance from AI

Signed-off-by: matthew brian white <whitemat@uk.ibm.com>
@github-actions


ACTION NEEDED

Substrait follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

Signed-off-by: matthew brian white <whitemat@uk.ibm.com>