Skip to content

Update floatreftarget.lua#14620

Open
cparks-ucla wants to merge 2 commits into
quarto-dev:mainfrom
cparks-ucla:fix-longtable-stackoverflow
Open

Update floatreftarget.lua#14620
cparks-ucla wants to merge 2 commits into
quarto-dev:mainfrom
cparks-ucla:fix-longtable-stackoverflow

Conversation

@cparks-ucla

Copy link
Copy Markdown

Description

Working around using ut8.codepoint in split_longtable_start since that was causing a stack overflow with rather large tables.

Checklist

I have (if applicable):

  • referenced the GitHub issue this PR closes
  • updated the appropriate changelog in the PR
  • ensured the present test suite passes
  • added new tests
  • created a separate documentation PR in Quarto's website repo and linked it to this PR

Working around using ut8.codepoint since that was causing a stack overflow with rather large tables
@posit-snyk-bot

posit-snyk-bot commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

Snyk checks have passed. No issues have been found so far.

Status Scan Engine Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues
Licenses 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@cscheid

cscheid commented Jun 24, 2026

Copy link
Copy Markdown
Member

Thanks for the PR, but we're going to need more information than that to be able to accept it, especially if it's in a performance-sensitive setting.

@cparks-ucla

Copy link
Copy Markdown
Author

When I was rendering a document with some large tables I ran into this error:

Error running filter [sic]/Positron/resources/app/quarto/share/filters/main.lua:
...ams/Positron/resources/app/quarto/share/filters/main.lua:23309: stack overflow (string slice too long)

Enumerating over the codepoints with utf8.codes instead of passing the whole big string to ut8.codepoint seemed like a reasonable workaround. But, I couldn't say if that is worse for performance.

@cscheid

cscheid commented Jun 24, 2026

Copy link
Copy Markdown
Member

How big is that table?

@cparks-ucla

Copy link
Copy Markdown
Author

177 x 32. It isn't really readable for other reasons, just didn't expect it to crash the renderer.

@cderv

cderv commented Jun 25, 2026

Copy link
Copy Markdown
Member

I had a look with claude to this problem, as I am no Lua performance expert. Here is the findings.

The direction seems right: utf8.codepoint(s, 1, #s) returns one value per codepoint as varargs, and Lua 5.4's utf8_codepoint calls luaL_checkstack sized to the whole byte range of the slice before pushing — so it blows the C stack once the string nears ~1 MB. utf8.codes() iterates instead and pushes nothing, so it sidesteps the limit entirely.

I can reproduce it. The trigger is the byte length of the slice, not the codepoint count — it overflows around #content_str ≈ 1,000,000 bytes (so a longtable with a lot
of rows):

Reproduction (run with pandoc lua — same Lua 5.4 + utf8 lib Quarto's filters use)
> pandoc lua
Lua 5.4.8  Copyright (C) 1994-2025 Lua.org, PUC-Rio
Embedded in pandoc 3.10  Copyright (C) 2006-2024 John MacFarlane
>   function old(s) return table.pack(utf8.codepoint(s, 1, #s)) end
>   function new(s) local cp={}; for _,c in utf8.codes(s) do cp[#cp+1]=c end; return cp end
>   do for _,n in ipairs({500000,1000000}) do local s=string.rep("a",n); print(string.format("n=%-8d old ok=%s",n,tost> ring(pcall(old,s)))) end end
n=500000   old ok=true
n=1000000  old ok=false
>   print("new at 2,000,000:", pcall(new, string.rep("a", 2000000)))
new at 2,000,000:       true    table: 000002af8b052ce0
>

Binary search puts the old threshold at ~999,000–1,000,000 ASCII codepoints. A multibyte string confirms the bound is on bytes, not codepoints: 994,023 two-byte chars
(1.99 MB) already overflows even though that's fewer codepoints than the ASCII limit. That matches luaL_checkstack(L, (pose - posi) + 1, ...) reserving for the byte
range.

One suggestion for the loop:

since this path now deliberately handles very large strings, a counter with direct index assignment is meaningfully faster than table.inserttable.insert recomputes the append position on every call. Measured over 20 passes of a 1M-codepoint string:

$ pandoc lua
Lua 5.4.8  Copyright (C) 1994-2025 Lua.org, PUC-Rio
Embedded in pandoc 3.10  Copyright (C) 2006-2024 John MacFarlane
>   function with_insert(s) local cp={}; for _,c in utf8.codes(s) do table.insert(cp,c) end; return cp end
>   function with_index(s) local cp={}; local n=0; for _,c in utf8.codes(s) do n=n+1; cp[n]=c end; return cp end
>   function bench(fn,s,iters) local t0=os.clock(); for _=1,iters do fn(s) end; return os.clock()-t0 end
>   do
>>     local big=string.rep("a",1000000)
>>     with_insert(big); with_index(big)            -- warmup
>>     local ti=bench(with_insert,big,20)
>>     local td=bench(with_index,big,20)
>>     print(string.format("table.insert : %.3fs", ti))
>>     print(string.format("direct index : %.3fs", td))
>>     print(string.format("index %.1f%% faster", (ti-td)/ti*100))
>>   end
table.insert : 1.075s
direct index : 0.587s
index 45.4% faster

We could try add a regression test on our side — the function is currently local to the filter, so it needs a bit of plumbing to test in isolation. Using modules ?

However, having an example with a table that shows the problem could be interesting. If you have more to share, that would be awesome !

@cderv

cderv commented Jun 25, 2026

Copy link
Copy Markdown
Member

Actually I managed to build a self-contained reproduction. A 177×32 table (matching your dimensions) with dense cells generates a ~1 MB longtable and crashes on main:

---
format: pdf
keep-tex: true
---

```{r}
#| label: tbl-big
#| echo: false
#| tbl-cap: "A large table"
pad <- strrep("lorem ipsum dolor sit amet consectetur ", 10)
df <- as.data.frame(matrix(
substr(paste0("cell-", pad), 1, 180), 177, 32
))
knitr::kable(df, format = "latex", longtable = TRUE)
```

On main this fails with the same error you saw:

...filters/customnodes/floatreftarget.lua:18: stack overflow (string slice too long)
        in upvalue 'split_longtable_start'

With this PR it renders cleanly (the generated .tex is ~1 MB). For what it's worth, the crash only hits tables emitted as raw LaTeX longtable since plain markdown tables go through a different path. That also gives us a basis for the regression test.

@cparks-ucla

Copy link
Copy Markdown
Author

I can update the PR to use index assignment to speed things up.

(I'm on Pacific time, sorry about the wait)

Speed things up using index assignment instead of table.insert
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants