Self Checks
RAGFlow workspace code commit ID
none
RAGFlow image version
v0.25.6
Other environment information
Actual behavior
2026-06-17 09:49:42,720 INFO 33 [HISTORY][
{
"role": "system",
"content": "You are a robust Table-of-Contents (TOC) extractor.\n\nGOAL\nGiven a dictionary of chunks
{"<chunk_ID>": chunk_text}, extract TOC-like headings and return a strict JSON array of objects:\n[\n {"title":
"", "chunk_id": ""},\n ...\n]\n\nFIELDS\n- "title": the heading text (clean, no page numbers or leader dots).\n
- If any part of a chunk has no valid heading, output that part as {"title":"-1", ...}.\n- "chunk_id": the chunk
ID (string).\n - One chunk can yield multiple JSON objects in order (unmatched text + one or more
headings).\n\nRULES\n1) Preserve input chunk order strictly.\n2) If a chunk contains multiple headings, expand them in
order:\n - Pre-heading narrative → {"title":"-1","chunk_id":"<chunk_ID>"}\n - Then each heading →
{"title":"...","chunk_id":"<chunk_ID>"}\n3) Do not merge outputs across chunks; each object refers to exactly one
chunk ID.\n4) "title" must be non-empty (or exactly "-1"). "chunk_id" must be a string (chunk ID).\n5) When
ambiguous, prefer "-1" unless the text strongly looks like a heading.\n\nHEADING DETECTION (cues, not hard rules)\n-
Appears near line start, short isolated phrase, often followed by content.\n- May contain separators: — —— - : : · •\n-
Numbering styles:\n • 第[一二三四五六七八九十百]+(篇|章|节|条)\n • [((]?[一二三四五六七八九十]+[))]?\n •
[((]?[①②③④⑤⑥⑦⑧⑨⑩][))]?\n • ^\d+(\.\d+)[)..]?\s\n • ^[IVXLCDM]+[).]\n • ^[A-Z][).]\n- Canonical section cues
(general only):\n Common heading indicators include words such as:\n "Overview", "Introduction", "Background",
"Purpose", "Scope", "Definition",\n "Method", "Procedure", "Result", "Discussion", "Summary",
"Conclusion",\n "Appendix", "Reference", "Annex", "Acknowledgment", "Disclaimer".\n These are soft cues,
not strict requirements.\n- Length restriction:\n • Chinese heading: ≤25 characters\n • English heading: ≤80
characters\n- Exclude long narrative sentences, continuous prose, or bullet-style lists → output as "-1".\n\nOUTPUT
FORMAT\n- Return ONLY a valid JSON array of {"title","content"} objects.\n- No reasoning or
commentary.\n\nEXAMPLES\n\nExample 1 — No heading\nInput:\n[{"0": "Copyright page · Publication info (ISBN 123-456).
All rights reserved."}, ...]\nOutput:\n[\n {"title":"-1","chunk_id":"0"},\n ...\n]\n\nExample 2 — One
heading\nInput:\n[{"1": "Chapter 1: General Provisions This chapter defines the overall rules…"}, ...]\nOutput:\n[\n
{"title":"Chapter 1: General Provisions","chunk_id":"1"},\n ...\n]\n\nExample 3 — Narrative +
heading\nInput:\n[{"2": "This paragraph introduces the background and goals. Section 2: Definitions Key terms are
explained…"}, ...]\nOutput:\n[\n {"title":"Section 2: Definitions","chunk_id":"2"},\n ...\n]\n\nExample 4 —
Multiple headings in one chunk\nInput:\n[{"3": "Declarations and Commitments (I) Party B commits… (II) Party C
commits… Appendix A Data Specification"}, ...]\nOutput:\n[\n {"title":"Declarations and
Commitments","chunk_id":"3"},\n {"title":"(I) Party B commits","chunk_id":"3"},\n {"title":"(II) Party
C commits","chunk_id":"3"},\n {"title":"Appendix A Data Specification","chunk_id":"3"},\n
...\n]\n\nExample 5 — Numbering styles\nInput:\n[{"4": "1. Scope: Defines boundaries. 2) Definitions: Terms used. III)
Methods Overview."}, ...]\nOutput:\n[\n {"title":"1. Scope","chunk_id":"4"},\n {"title":"2)
Definitions","chunk_id":"4"},\n {"title":"III) Methods Overview","chunk_id":"4"},\n ...\n]\n\nExample 6 —
Long list (NOT headings)\nInput:\n{"5": "Item list: apples, bananas, strawberries, blueberries, mangos, peaches"},
...]\nOutput:\n[\n {"title":"-1","chunk_id":"5"},\n ...\n]\n\nExample 7 — Mixed
Chinese/English\nInput:\n{"6": "(出版信息略)This standard follows industry practices. Chapter 1: Overview 摘要…
第2节:术语与缩略语"}, ...]\nOutput:\n[\n {"title":"Chapter 1: Overview","chunk_id":"6"},\n
{"title":"第2节:术语与缩略语","chunk_id":"6"},\n ...\n]"
},
{
"role": "user",
"content": "OUTPUT FORMAT\n- Return ONLY the JSON array.\n- Use double quotes.\n- No extra commentary.\n- Keep
language of "title" the same as the input.\n\nINPUT\n{"1": "-11收入12650.9013.001435.97销售部178720.85\nCW-20260422
026-02-12支出4190.809.00341.17采购部174529.05\nCW-20260432026-02-13收入9580.706.00522.56市场部184109.75\nCW-20260442026
-02-14支出2670.606.00146.08行政部181439.15\n收入15120.8013.001700.56销售部196559.95\nCW- 20260452026-02- 15\nFAILED TO
PARSE TABLE\nFAILED TO PARSE TABLE\nFAILED TO PARSE TABLE\n补充说明:1.表格共45⾏数据,刚好填满两⻚A4 PDF;2.所有数值
均为财务常⽤合理数值,间隔清晰,可直接编辑修改;3.适配所有主流PDF编辑软件,下载后即可编辑。\n(注:⽂档部分内容可能由 AI
⽣成)"}"
}
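The chunk sent to the TOC extractor above contains three literal `FAILED TO PARSE TABLE` lines where table content would be expected. A minimal sketch for flagging affected chunks, assuming you have the chunk texts available as a `{chunk_id: text}` dictionary (the same shape the prompt uses). The helper name `count_failed_tables` and the sample data are hypothetical; the sample texts are abbreviated excerpts from the log.

```python
# Marker string copied verbatim from the chunk text in the log above.
FAILED_MARKER = "FAILED TO PARSE TABLE"


def count_failed_tables(chunks: dict[str, str]) -> dict[str, int]:
    """Return {chunk_id: marker_count} for every chunk that contains the marker."""
    return {
        chunk_id: text.count(FAILED_MARKER)
        for chunk_id, text in chunks.items()
        if FAILED_MARKER in text
    }


# Abbreviated excerpts of the two chunks seen in the log (hypothetical sample data).
chunks = {
    "0": "CW-20260442026-02-14支出2670.606.00146.08行政部181439.15",
    "1": (
        "CW- 20260452026-02- 15\n"
        "FAILED TO PARSE TABLE\n"
        "FAILED TO PARSE TABLE\n"
        "FAILED TO PARSE TABLE\n"
        "补充说明"
    ),
}

print(count_failed_tables(chunks))  # {'1': 3}
```

This only locates the placeholder in the chunk text. It does not tell you which component wrote it.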
Expected behavior
No response
Steps to reproduce
1. Configure the dataset as shown in Picture 1.
2. Parse the file. Parsing reports success with the following log:
09:49:28 Task has been received.
09:49:28 Page(1~5): Start to parse.
09:49:28 Page(1~5): [MinerU] Received binary PDF -> /tmp/mineru_bin_pdf_bq2rmc1a/测试的可编辑表格6.pdf
09:49:29 Page(1~5): [MinerU] Output directory: /tmp/111/out
09:49:30 Page(1~5): [MinerU] invoke api: http://101.36.73.92:8091/file_parse
09:49:37 Page(1~5): [MinerU] zip file returned, saving to /tmp/111/out/测试的可编辑表格6_auto_10nvgyv7.zip...
09:49:37 Page(1~5): [MinerU] Unzip to /tmp/111/out/测试的可编辑表格6_auto_10nvgyv7...
09:49:37 Page(1~5): [MinerU] Parsed 8 blocks from PDF.
09:49:37 Page(1~5): Finish parsing.
09:49:39 Page(1~5): Start to generate keywords for every chunk ...
09:49:40 Page(1~5): Keywords generation 2 chunks completed in 1.46s
09:49:40 Page(1~5): Generate 2 chunks
09:49:41 Page(1~5): Embedding chunks (0.89s)
09:49:42 Page(1~5): Start to generate table of content ...
09:49:43 Page(1~5): Indexing done (0.91s).
09:49:47 Page(1~5): Task done (19.90s)
3. Parsing succeeded, but no table was generated, as shown in Picture 2.
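To narrow down whether the table content is already missing in the MinerU output or is lost later, it may help to inspect the `*_content_list.json` file that the log reports searching for under the output directory. The following is a rough sketch, not RAGFlow's or MinerU's own code. It assumes the file is a JSON array of block objects, that each block has a `type` field, and that table blocks carry their HTML in a `table_body` field. Those field names are assumptions and should be checked against the actual file. The helper names and sample blocks are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_blocks(blocks):
    """Count blocks per type and list indexes of table blocks with no body.

    Assumes each block is a dict with a "type" key and that tables store
    their HTML under "table_body" (assumed field names, verify in your file).
    """
    type_counts = Counter(block.get("type", "unknown") for block in blocks)
    empty_tables = [
        index
        for index, block in enumerate(blocks)
        if block.get("type") == "table"
        and not (block.get("table_body") or "").strip()
    ]
    return dict(type_counts), empty_tables


def summarize_file(path):
    """Load a content list JSON file from disk and summarize it."""
    blocks = json.loads(Path(path).read_text(encoding="utf-8"))
    return summarize_blocks(blocks)


# Hypothetical sample standing in for a real content list.
sample = [
    {"type": "text", "text": "title"},
    {"type": "table", "table_body": "<table><tr><td>1</td></tr></table>"},
    {"type": "table", "table_body": ""},
    {"type": "table"},
]

counts, empty = summarize_blocks(sample)
print(counts)  # {'text': 1, 'table': 3}
print(empty)   # [2, 3]
```

To run it against a real parse, pass `summarize_file` whichever of the candidate paths listed in the overall log actually exists on disk. If table blocks there have a non-empty body, the loss more likely happens after MinerU returns.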
Additional information
Picture 1

Picture 2
Overall log
2026-06-17 09:49:28,700 INFO 33 Parsed MinerU config (sensitive fields redacted): {'llm_name': 'MinerU', 'mineru_apiserver': 'http://101.36.73.92:8091', 'mineru_output_dir': '/tmp/111/out', 'mineru_backend': 'pipeline', 'mineru_delete_output': '0'}
2026-06-17 09:49:28,738 INFO 33 [MinerU] API openapi.json reachable=True url=http://101.36.73.92:8091/openapi.json
2026-06-17 09:49:28,740 INFO 33 [MinerU] Received binary PDF -> /tmp/mineru_bin_pdf_bq2rmc1a/测试的可编辑表格6.pdf
2026-06-17 09:49:29,666 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: 0.15, progress_msg: 09:49:28 Page(1~5): [MinerU] Received binary PDF -> /tmp/mineru_bin_pdf_bq2rmc1a/测试的可编辑表格6.pdf
2026-06-17 09:49:29,666 INFO 33 [MinerU] Output directory: /tmp/111/out backend=pipeline api=http://101.36.73.92:8091 server_url=
2026-06-17 09:49:30,015 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: 0.15, progress_msg: 09:49:29 Page(1~5): [MinerU] Output directory: /tmp/111/out
2026-06-17 09:49:30,120 INFO 33 [MinerU] request data={'output_dir': './output', 'lang_list': <MinerULanguage.CH: 'ch'>, 'backend': <MinerUBackend.PIPELINE: 'pipeline'>, 'parse_method': <MinerUParseMethod.AUTO: 'auto'>, 'formula_enable': False, 'table_enable': True, 'server_url': None, 'return_md': True, 'return_middle_json': True, 'return_model_output': True, 'return_content_list': True, 'return_images': True, 'response_format_zip': True, 'start_page_id': 0, 'end_page_id': 99999}
2026-06-17 09:49:30,120 INFO 33 [MinerU] request options=MinerUParseOptions(backend=<MinerUBackend.PIPELINE: 'pipeline'>, lang=<MinerULanguage.CH: 'ch'>, method=<MinerUParseMethod.AUTO: 'auto'>, server_url='', delete_output=False, parse_method='raw', formula_enable=False, table_enable=True)
2026-06-17 09:49:30,120 INFO 33 [MinerU] invoke api: http://101.36.73.92:8091/file_parse backend=pipeline server_url=None
2026-06-17 09:49:30,299 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: 0.2, progress_msg: 09:49:30 Page(1~5): [MinerU] invoke api: http://101.36.73.92:8091/file_parse
2026-06-17 09:49:31,245 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.002s]
2026-06-17 09:49:31,252 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.004s]
[2026-06-17 09:49:31 +0800] [28] [INFO] 127.0.0.1:40506 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 26772 22363
2026-06-17 09:49:31,269 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.001s]
2026-06-17 09:49:31,275 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.003s]
[2026-06-17 09:49:31 +0800] [28] [INFO] 127.0.0.1:40516 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 22553 18364
[2026-06-17 09:49:31 +0800] [28] [INFO] 127.0.0.1:40522 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6 1.1 200 2360 10523
2026-06-17 09:49:36,280 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.002s]
2026-06-17 09:49:36,286 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.004s]
[2026-06-17 09:49:36 +0800] [28] [INFO] 127.0.0.1:40536 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 27047 22392
2026-06-17 09:49:36,304 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.002s]
2026-06-17 09:49:36,309 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.003s]
[2026-06-17 09:49:36 +0800] [28] [INFO] 127.0.0.1:40542 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 22553 18041
[2026-06-17 09:49:36 +0800] [28] [INFO] 127.0.0.1:40550 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6 1.1 200 2360 9901
2026-06-17 09:49:37,144 INFO 33 [MinerU] zip file returned, saving to /tmp/111/out/测试的可编辑表格6_auto_10nvgyv7.zip...
2026-06-17 09:49:37,432 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: 0.3, progress_msg: 09:49:37 Page(1~5): [MinerU] zip file returned, saving to /tmp/111/out/测试的可编辑表格6_auto_10nvgyv7.zip...
2026-06-17 09:49:37,534 INFO 33 [MinerU] Unzip to /tmp/111/out/测试的可编辑表格6_auto_10nvgyv7...
2026-06-17 09:49:37,534 INFO 33 [MinerU] Extract zip: zip_path=/tmp/111/out/测试的可编辑表格6_auto_10nvgyv7.zip, extract_to=/tmp/111/out/测试的可编辑表格6_auto_10nvgyv7, root_hint=测试的可编辑表格6/
2026-06-17 09:49:37,721 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: 0.4, progress_msg: 09:49:37 Page(1~5): [MinerU] Unzip to /tmp/111/out/测试的可编辑表格6_auto_10nvgyv7...
2026-06-17 09:49:37,721 INFO 33 [MinerU] Api completed successfully.
2026-06-17 09:49:37,722 INFO 33 [MinerU] Expected output files: 测试的可编辑表格6_content_list.json
2026-06-17 09:49:37,722 INFO 33 [MinerU] Searching output in: /tmp/111/out/测试的可编辑表格6_auto_10nvgyv7
2026-06-17 09:49:37,722 INFO 33 [MinerU] Trying original path: /tmp/111/out/测试的可编辑表格6_auto_10nvgyv7/测试的可编辑表格6_content_list.json
2026-06-17 09:49:37,722 INFO 33 [MinerU] Trying sanitized filename: /tmp/111/out/测试的可编辑表格6_auto_10nvgyv7/测试的可编辑表格6_content_list.json
2026-06-17 09:49:37,722 INFO 33 [MinerU] Trying sanitized nested path: /tmp/111/out/测试的可编辑表格6_auto_10nvgyv7/测试的可编辑表格6/测试的可编辑表格6_content_list.json
2026-06-17 09:49:37,722 INFO 33 [MinerU] Trying vlm subdirectory: /tmp/111/out/测试的可编辑表格6_auto_10nvgyv7/vlm/测试的可编辑表格6_content_list.json
2026-06-17 09:49:37,722 INFO 33 [MinerU] Trying vlm subdirectory with sanitized name: /tmp/111/out/测试的可编辑表格6_auto_10nvgyv7/vlm/测试的可编辑表格6_content_list.json
2026-06-17 09:49:37,722 INFO 33 [MinerU] Trying parse-method path: /tmp/111/out/测试的可编辑表格6_auto_10nvgyv7/auto/测试的可编辑表格6_content_list.json
2026-06-17 09:49:37,723 INFO 33 [MinerU] Parsed 8 blocks from PDF.
2026-06-17 09:49:37,927 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: 0.75, progress_msg: 09:49:37 Page(1~5): [MinerU] Parsed 8 blocks from PDF.
2026-06-17 09:49:39,110 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: 0.8, progress_msg: 09:49:37 Page(1~5): Finish parsing.
2026-06-17 09:49:39,143 INFO 33 naive_merge(测试的可编辑表格6.pdf): 0.032985937999910675
2026-06-17 09:49:39,143 INFO 33 Chunking(10.850661832002515) 测试的可编辑表格6.pdf/测试的可编辑表格6.pdf done
2026-06-17 09:49:39,329 INFO 33 MINIO PUT(测试的可编辑表格6.pdf) cost 0.186 s
2026-06-17 09:49:39,499 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: None, progress_msg: 09:49:39 Page(1~5): Start to generate keywords for every chunk ...
2026-06-17 09:49:39,509 INFO 33 [HISTORY][
{
"role": "system",
"content": "## Role\nYou are a text analyzer.\n\n## Task\nExtract the most important keywords/phrases of a given piece of text content.\n\n## Requirements\n- Summarize the text content, and give the top 3 important keywords/phrases.\n- The keywords MUST be in the same language as the given piece of text content.\n- The keywords are delimited by ENGLISH COMMA.\n- Output keywords ONLY.\n\n---\n\n## Text Content\n\n财务收⽀明细总表 (可编辑)\n说明:本表格为可编辑PDF格式,可直接在WPS、Adobe Acrobat等软件中修改单元格内容,数据涵盖两⻚A4篇幅,均为真实财务类数值,字段清晰、数据间隔合理。\n凭证编号记账日期收支类型业务金额(元)增值税率(%)税额 (元)归属部门累计余额(元)\nCw-20260012026-01-01收入12860.5013.001452.32销售部12860.50\ncw-20260022026-01-02支出3250.759.00267.37行政部9609.75\ncW-20260032026-01-03收入8950.0013.001053.98销售部18559.75\nCW-20260042026-01-04支出1890.206.00105.83财务部16669.55\nCW-20260052026-01-05收入15680.3013.001767.47销售部32349.85\nCW-20260062026-01-06支出4520.809.00371.51采购部27829.05\nCW-20260072026-01-07收入7890.606.00442.50市场部35719.65\ncW-20260082026-01-08支出2150.306.00117.28行政部33569.35\ncW-20260092026-01-09收入11250.8013.001285.42销售部44820.15\ncW-20260102026-01-10支出3860.509.00313.88采购部40959.65\nCW-20260112026-01-11收入9630.206.00536.46市场部50589.85\nCW-20260122026-01-12支出1580.706.0086.48财务部49009.15\n\nCW-20260132026-01-13收入14520.7013.001653.76销售部63529.85\nCW-20260142026-01-14支出5280.309.00431.72采购部58249.55\nCW-20260152026-01-15收入8250.506.00451.69市场部66500.05\nCW-20260162026-01-16支出2760.806.00151.38行政部63739.25\nCW-20260172026-01-17收入13680.9013.001572.76销售部77420.15\nCW-20260182026-01-18支出4120.609.00337.59采购部73299.55\nCW-20260192026-01-19收入10560.406.00581.31市场部83859.95\nCW-20260202026-01-20支出1950.406.00106.22财务部81909.55\nCW-20260212026-01-21收入12350.7013.001406.83销售部94260.25\nCW-20260222026-01-22支出3580.909.00289.88行政部90679.35\nCW-20260232026-01-23收入9870.306.00548.19市场部100549.65\nCW-20260242026-01-24支出2360.506.00128.37财务部98189.15\nCW-20260252026-01-25收入15280.6013.001727.54销售部113469.75\nCW-20260262026-01-26支出4890.709.00397.20采购部108579.05\nCW-20260272026-01-27收入8650.806.00470.15市场部117229.85\nCW-2026
0282026-01-28支出2580.306.00140.54行政部114649.55\n收入11980.5013.001367.70销售部126630.05\nCW-20260292026-01-29\nCW-20260302026-01-30支出3960.609.00321.35采购部122669.45\nCW-20260312026-02-01收入10250.706.00564.26市场部132920.15\nCW-20260322026-02-02支出2120.406.00115.47财务部130799.75\nCW-20260332026-02-03收入14860.9013.001683.16销售部145660.65\nCW-20260342026-02-04支出4350.809.00352.87采购部141309.85\nCW-20260352026-02-05收入9320.506.00508.35市场部150630.35\nCW-20260362026-02-06支出2890.606.00157.89行政部147739.75\nCW-20260372026-02-07收入13590.8013.001546.82销售部161330.55\nCW-20260382026-02-08支出3680.709.00297.96行政部157649.85\nCW-20260392026-02-09收入10870.606.00597.40市场部168520.45\nCW-20260402026-02-10支出2450.506.00132.33财务部166069.95\nCW-20260412026-02-11收入12650.9013.001435.97销售部178720.85\nCW-20260422026-02-12支出4190.809.00341.17采购部174529.05\nCW-20260432026-02-13收入9580.706.00522.56市场部184109.75\nCW-20260442026-02-14支出2670.606.00146.08行政部181439.15\n收入15120.8013.001700.56销售部196559.95\nCW- 20260452026-02- 15"
},
{
"role": "user",
"content": "Output: "
}
]
09:49:39 - LiteLLM:INFO: utils.py:3995 -
LiteLLM completion() model= qwen3-235b-a22b; provider = dashscope
2026-06-17 09:49:39,533 INFO 33
LiteLLM completion() model= qwen3-235b-a22b; provider = dashscope
2026-06-17 09:49:39,534 INFO 33 [HISTORY][
{
"role": "system",
"content": "## Role\nYou are a text analyzer.\n\n## Task\nExtract the most important keywords/phrases of a given piece of text content.\n\n## Requirements\n- Summarize the text content, and give the top 3 important keywords/phrases.\n- The keywords MUST be in the same language as the given piece of text content.\n- The keywords are delimited by ENGLISH COMMA.\n- Output keywords ONLY.\n\n---\n\n## Text Content\n-11收入12650.9013.001435.97销售部178720.85\nCW-20260422026-02-12支出4190.809.00341.17采购部174529.05\nCW-20260432026-02-13收入9580.706.00522.56市场部184109.75\nCW-20260442026-02-14支出2670.606.00146.08行政部181439.15\n收入15120.8013.001700.56销售部196559.95\nCW- 20260452026-02- 15\nFAILED TO PARSE TABLE\nFAILED TO PARSE TABLE\nFAILED TO PARSE TABLE\n补充说明:1.表格共45⾏数据,刚好填满两⻚A4 PDF;2.所有数值均为财务常⽤合理数值,间隔清晰,可直接编辑修改;3.适配所有主流PDF编辑软件,下载后即可编辑。\n(注:⽂档部分内容可能由 AI ⽣成)"
},
{
"role": "user",
"content": "Output: "
}
]
09:49:39 - LiteLLM:INFO: utils.py:3995 -
LiteLLM completion() model= qwen3-235b-a22b; provider = dashscope
2026-06-17 09:49:39,535 INFO 33
LiteLLM completion() model= qwen3-235b-a22b; provider = dashscope
2026-06-17 09:49:40,929 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: None, progress_msg: 09:49:40 Page(1~5): Keywords generation 2 chunks completed in 1.46s
2026-06-17 09:49:40,929 INFO 33 Build document 测试的可编辑表格6.pdf: 12.64s
2026-06-17 09:49:41,099 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: None, progress_msg: 09:49:40 Page(1~5): Generate 2 chunks
2026-06-17 09:49:41,316 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.002s]
2026-06-17 09:49:41,322 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.003s]
[2026-06-17 09:49:41 +0800] [28] [INFO] 127.0.0.1:47866 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 27539 21572
2026-06-17 09:49:41,341 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.001s]
2026-06-17 09:49:41,346 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.003s]
[2026-06-17 09:49:41 +0800] [28] [INFO] 127.0.0.1:47878 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 22553 18827
[2026-06-17 09:49:41 +0800] [28] [INFO] 127.0.0.1:47880 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6 1.1 200 2360 11349
2026-06-17 09:49:41,985 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: 0.7999999999999999, progress_msg:
2026-06-17 09:49:41,985 INFO 33 Embedding chunks (0.89s)
2026-06-17 09:49:42,106 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: None, progress_msg: 09:49:41 Page(1~5): Embedding chunks (0.89s)
2026-06-17 09:49:42,261 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: None, progress_msg: 09:49:42 Page(1~5): Start to generate table of content ...
2026-06-17 09:49:42,631 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: None, progress_msg:
2026-06-17 09:49:42,636 INFO 33 [HISTORY][
{
"role": "system",
"content": "You are a robust Table-of-Contents (TOC) extractor.\n\nGOAL\nGiven a dictionary of chunks {"<chunk_ID>": chunk_text}, extract TOC-like headings and return a strict JSON array of objects:\n[\n {"title": "", "chunk_id": ""},\n ...\n]\n\nFIELDS\n- "title": the heading text (clean, no page numbers or leader dots).\n - If any part of a chunk has no valid heading, output that part as {"title":"-1", ...}.\n- "chunk_id": the chunk ID (string).\n - One chunk can yield multiple JSON objects in order (unmatched text + one or more headings).\n\nRULES\n1) Preserve input chunk order strictly.\n2) If a chunk contains multiple headings, expand them in order:\n - Pre-heading narrative → {"title":"-1","chunk_id":"<chunk_ID>"}\n - Then each heading → {"title":"...","chunk_id":"<chunk_ID>"}\n3) Do not merge outputs across chunks; each object refers to exactly one chunk ID.\n4) "title" must be non-empty (or exactly "-1"). "chunk_id" must be a string (chunk ID).\n5) When ambiguous, prefer "-1" unless the text strongly looks like a heading.\n\nHEADING DETECTION (cues, not hard rules)\n- Appears near line start, short isolated phrase, often followed by content.\n- May contain separators: — —— - : : · •\n- Numbering styles:\n • 第[一二三四五六七八九十百]+(篇|章|节|条)\n • [((]?[一二三四五六七八九十]+[))]?\n • [((]?[①②③④⑤⑥⑦⑧⑨⑩][))]?\n • ^\d+(\.\d+)[)..]?\s\n • ^[IVXLCDM]+[).]\n • ^[A-Z][).]\n- Canonical section cues (general only):\n Common heading indicators include words such as:\n "Overview", "Introduction", "Background", "Purpose", "Scope", "Definition",\n "Method", "Procedure", "Result", "Discussion", "Summary", "Conclusion",\n "Appendix", "Reference", "Annex", "Acknowledgment", "Disclaimer".\n These are soft cues, not strict requirements.\n- Length restriction:\n • Chinese heading: ≤25 characters\n • English heading: ≤80 characters\n- Exclude long narrative sentences, continuous prose, or bullet-style lists → output as "-1".\n\nOUTPUT FORMAT\n- Return ONLY a valid JSON array of {"title","content"} 
objects.\n- No reasoning or commentary.\n\nEXAMPLES\n\nExample 1 — No heading\nInput:\n[{"0": "Copyright page · Publication info (ISBN 123-456). All rights reserved."}, ...]\nOutput:\n[\n {"title":"-1","chunk_id":"0"},\n ...\n]\n\nExample 2 — One heading\nInput:\n[{"1": "Chapter 1: General Provisions This chapter defines the overall rules…"}, ...]\nOutput:\n[\n {"title":"Chapter 1: General Provisions","chunk_id":"1"},\n ...\n]\n\nExample 3 — Narrative + heading\nInput:\n[{"2": "This paragraph introduces the background and goals. Section 2: Definitions Key terms are explained…"}, ...]\nOutput:\n[\n {"title":"Section 2: Definitions","chunk_id":"2"},\n ...\n]\n\nExample 4 — Multiple headings in one chunk\nInput:\n[{"3": "Declarations and Commitments (I) Party B commits… (II) Party C commits… Appendix A Data Specification"}, ...]\nOutput:\n[\n {"title":"Declarations and Commitments","chunk_id":"3"},\n {"title":"(I) Party B commits","chunk_id":"3"},\n {"title":"(II) Party C commits","chunk_id":"3"},\n {"title":"Appendix A Data Specification","chunk_id":"3"},\n ...\n]\n\nExample 5 — Numbering styles\nInput:\n[{"4": "1. Scope: Defines boundaries. 2) Definitions: Terms used. III) Methods Overview."}, ...]\nOutput:\n[\n {"title":"1. Scope","chunk_id":"4"},\n {"title":"2) Definitions","chunk_id":"4"},\n {"title":"III) Methods Overview","chunk_id":"4"},\n ...\n]\n\nExample 6 — Long list (NOT headings)\nInput:\n{"5": "Item list: apples, bananas, strawberries, blueberries, mangos, peaches"}, ...]\nOutput:\n[\n {"title":"-1","chunk_id":"5"},\n ...\n]\n\nExample 7 — Mixed Chinese/English\nInput:\n{"6": "(出版信息略)This standard follows industry practices. Chapter 1: Overview 摘要… 第2节:术语与缩略语"}, ...]\nOutput:\n[\n {"title":"Chapter 1: Overview","chunk_id":"6"},\n {"title":"第2节:术语与缩略语","chunk_id":"6"},\n ...\n]"
},
{
"role": "user",
"content": "OUTPUT FORMAT\n- Return ONLY the JSON array.\n- Use double quotes.\n- No extra commentary.\n- Keep language of "title" the same as the input.\n\nINPUT\n{"0": "\n财务收⽀明细总表 (可编辑)\n说明:本表格为可编辑PDF格式,可直接在WPS、Adobe Acrobat等软件中修改单元格内容,数据涵盖两⻚A4篇幅,均为真实财务类数值,字段清晰、数据间隔合理。\n凭证编号记账日期收支类型业务金额(元)增值税率(%)税额 (元)归属部门累计余额(元)\nCw-20260012026-01-01收入12860.5013.001452.32销售部12860.50\ncw-20260022026-01-02支出3250.759.00267.37行政部9609.75\ncW-20260032026-01-03收入8950.0013.001053.98销售部18559.75\nCW-20260042026-01-04支出1890.206.00105.83财务部16669.55\nCW-20260052026-01-05收入15680.3013.001767.47销售部32349.85\nCW-20260062026-01-06支出4520.809.00371.51采购部27829.05\nCW-20260072026-01-07收入7890.606.00442.50市场部35719.65\ncW-20260082026-01-08支出2150.306.00117.28行政部33569.35\ncW-20260092026-01-09收入11250.8013.001285.42销售部44820.15\ncW-20260102026-01-10支出3860.509.00313.88采购部40959.65\nCW-20260112026-01-11收入9630.206.00536.46市场部50589.85\nCW-20260122026-01-12支出1580.706.0086.48财务部49009.15\n\nCW-20260132026-01-13收入14520.7013.001653.76销售部63529.85\nCW-20260142026-01-14支出5280.309.00431.72采购部58249.55\nCW-20260152026-01-15收入8250.506.00451.69市场部66500.05\nCW-20260162026-01-16支出2760.806.00151.38行政部63739.25\nCW-20260172026-01-17收入13680.9013.001572.76销售部77420.15\nCW-20260182026-01-18支出4120.609.00337.59采购部73299.55\nCW-20260192026-01-19收入10560.406.00581.31市场部83859.95\nCW-20260202026-01-20支出1950.406.00106.22财务部81909.55\nCW-20260212026-01-21收入12350.7013.001406.83销售部94260.25\nCW-20260222026-01-22支出3580.909.00289.88行政部90679.35\nCW-20260232026-01-23收入9870.306.00548.19市场部100549.65\nCW-20260242026-01-24支出2360.506.00128.37财务部98189.15\nCW-20260252026-01-25收入15280.6013.001727.54销售部113469.75\nCW-20260262026-01-26支出4890.709.00397.20采购部108579.05\nCW-20260272026-01-27收入8650.806.00470.15市场部117229.85\nCW-20260282026-01-28支出2580.306.00140.54行政部114649.55\n收入11980.5013.001367.70销售部126630.05\nCW-20260292026-01-29\nCW-20260302026-01-30支出3960.609.00321.35采购部122669.45\nCW-20260312026-02-01收入10250.706.00564.26市场部132920.15\nCW-20260322026-02-02支出2120.406.0
0115.47财务部130799.75\nCW-20260332026-02-03收入14860.9013.001683.16销售部145660.65\nCW-20260342026-02-04支出4350.809.00352.87采购部141309.85\nCW-20260352026-02-05收入9320.506.00508.35市场部150630.35\nCW-20260362026-02-06支出2890.606.00157.89行政部147739.75\nCW-20260372026-02-07收入13590.8013.001546.82销售部161330.55\nCW-20260382026-02-08支出3680.709.00297.96行政部157649.85\nCW-20260392026-02-09收入10870.606.00597.40市场部168520.45\nCW-20260402026-02-10支出2450.506.00132.33财务部166069.95\nCW-20260412026-02-11收入12650.9013.001435.97销售部178720.85\nCW-20260422026-02-12支出4190.809.00341.17采购部174529.05\nCW-20260432026-02-13收入9580.706.00522.56市场部184109.75\nCW-20260442026-02-14支出2670.606.00146.08行政部181439.15\n收入15120.8013.001700.56销售部196559.95\nCW- 20260452026-02- 15"}"
}
]
09:49:42 - LiteLLM:INFO: utils.py:3995 -
LiteLLM completion() model= qwen3-235b-a22b; provider = dashscope
2026-06-17 09:49:42,637 INFO 33
LiteLLM completion() model= qwen3-235b-a22b; provider = dashscope
2026-06-17 09:49:42,651 INFO 33 PUT http://192.168.80.139:1200/ragflow_105f6de65ce711f182c1edd34d87cce0/_bulk?refresh=wait_for&timeout=60s [status:200 duration:0.542s]
2026-06-17 09:49:42,716 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: None, progress_msg:
2026-06-17 09:49:42,720 INFO 33 [HISTORY][
{
"role": "system",
"content": "You are a robust Table-of-Contents (TOC) extractor.\n\nGOAL\nGiven a dictionary of chunks {"<chunk_ID>": chunk_text}, extract TOC-like headings and return a strict JSON array of objects:\n[\n {"title": "", "chunk_id": ""},\n ...\n]\n\nFIELDS\n- "title": the heading text (clean, no page numbers or leader dots).\n - If any part of a chunk has no valid heading, output that part as {"title":"-1", ...}.\n- "chunk_id": the chunk ID (string).\n - One chunk can yield multiple JSON objects in order (unmatched text + one or more headings).\n\nRULES\n1) Preserve input chunk order strictly.\n2) If a chunk contains multiple headings, expand them in order:\n - Pre-heading narrative → {"title":"-1","chunk_id":"<chunk_ID>"}\n - Then each heading → {"title":"...","chunk_id":"<chunk_ID>"}\n3) Do not merge outputs across chunks; each object refers to exactly one chunk ID.\n4) "title" must be non-empty (or exactly "-1"). "chunk_id" must be a string (chunk ID).\n5) When ambiguous, prefer "-1" unless the text strongly looks like a heading.\n\nHEADING DETECTION (cues, not hard rules)\n- Appears near line start, short isolated phrase, often followed by content.\n- May contain separators: — —— - : : · •\n- Numbering styles:\n • 第[一二三四五六七八九十百]+(篇|章|节|条)\n • [((]?[一二三四五六七八九十]+[))]?\n • [((]?[①②③④⑤⑥⑦⑧⑨⑩][))]?\n • ^\d+(\.\d+)[)..]?\s\n • ^[IVXLCDM]+[).]\n • ^[A-Z][).]\n- Canonical section cues (general only):\n Common heading indicators include words such as:\n "Overview", "Introduction", "Background", "Purpose", "Scope", "Definition",\n "Method", "Procedure", "Result", "Discussion", "Summary", "Conclusion",\n "Appendix", "Reference", "Annex", "Acknowledgment", "Disclaimer".\n These are soft cues, not strict requirements.\n- Length restriction:\n • Chinese heading: ≤25 characters\n • English heading: ≤80 characters\n- Exclude long narrative sentences, continuous prose, or bullet-style lists → output as "-1".\n\nOUTPUT FORMAT\n- Return ONLY a valid JSON array of {"title","content"} 
objects.\n- No reasoning or commentary.\n\nEXAMPLES\n\nExample 1 — No heading\nInput:\n[{"0": "Copyright page · Publication info (ISBN 123-456). All rights reserved."}, ...]\nOutput:\n[\n {"title":"-1","chunk_id":"0"},\n ...\n]\n\nExample 2 — One heading\nInput:\n[{"1": "Chapter 1: General Provisions This chapter defines the overall rules…"}, ...]\nOutput:\n[\n {"title":"Chapter 1: General Provisions","chunk_id":"1"},\n ...\n]\n\nExample 3 — Narrative + heading\nInput:\n[{"2": "This paragraph introduces the background and goals. Section 2: Definitions Key terms are explained…"}, ...]\nOutput:\n[\n {"title":"Section 2: Definitions","chunk_id":"2"},\n ...\n]\n\nExample 4 — Multiple headings in one chunk\nInput:\n[{"3": "Declarations and Commitments (I) Party B commits… (II) Party C commits… Appendix A Data Specification"}, ...]\nOutput:\n[\n {"title":"Declarations and Commitments","chunk_id":"3"},\n {"title":"(I) Party B commits","chunk_id":"3"},\n {"title":"(II) Party C commits","chunk_id":"3"},\n {"title":"Appendix A Data Specification","chunk_id":"3"},\n ...\n]\n\nExample 5 — Numbering styles\nInput:\n[{"4": "1. Scope: Defines boundaries. 2) Definitions: Terms used. III) Methods Overview."}, ...]\nOutput:\n[\n {"title":"1. Scope","chunk_id":"4"},\n {"title":"2) Definitions","chunk_id":"4"},\n {"title":"III) Methods Overview","chunk_id":"4"},\n ...\n]\n\nExample 6 — Long list (NOT headings)\nInput:\n{"5": "Item list: apples, bananas, strawberries, blueberries, mangos, peaches"}, ...]\nOutput:\n[\n {"title":"-1","chunk_id":"5"},\n ...\n]\n\nExample 7 — Mixed Chinese/English\nInput:\n{"6": "(出版信息略)This standard follows industry practices. Chapter 1: Overview 摘要… 第2节:术语与缩略语"}, ...]\nOutput:\n[\n {"title":"Chapter 1: Overview","chunk_id":"6"},\n {"title":"第2节:术语与缩略语","chunk_id":"6"},\n ...\n]"
},
{
"role": "user",
"content": "OUTPUT FORMAT\n- Return ONLY the JSON array.\n- Use double quotes.\n- No extra commentary.\n- Keep language of "title" the same as the input.\n\nINPUT\n{"1": "-11收入12650.9013.001435.97销售部178720.85\nCW-20260422026-02-12支出4190.809.00341.17采购部174529.05\nCW-20260432026-02-13收入9580.706.00522.56市场部184109.75\nCW-20260442026-02-14支出2670.606.00146.08行政部181439.15\n收入15120.8013.001700.56销售部196559.95\nCW- 20260452026-02- 15\nFAILED TO PARSE TABLE\nFAILED TO PARSE TABLE\nFAILED TO PARSE TABLE\n补充说明:1.表格共45⾏数据,刚好填满两⻚A4 PDF;2.所有数值均为财务常⽤合理数值,间隔清晰,可直接编辑修改;3.适配所有主流PDF编辑软件,下载后即可编辑。\n(注:⽂档部分内容可能由 AI ⽣成)"}"
}
]
09:49:42 - LiteLLM:INFO: utils.py:3995 -
LiteLLM completion() model= qwen3-235b-a22b; provider = dashscope
2026-06-17 09:49:42,722 INFO 33
LiteLLM completion() model= qwen3-235b-a22b; provider = dashscope
2026-06-17 09:49:42,898 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: 0.8500000000000001, progress_msg:
2026-06-17 09:49:42,960 INFO 33 Indexing doc(测试的可编辑表格6.pdf), page(0-4), chunks(2), elapsed: 0.85
2026-06-17 09:49:45,078 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: None, progress_msg: 09:49:43 Page(1~5): Indexing done (0.91s).
2026-06-17 09:49:45,843 INFO 33
Filtered TOC sections:
[{'title': '财务收⽀明细总表 (可编辑)', 'chunk_id': '0'}]
2026-06-17 09:49:45,845 INFO 33 [HISTORY][
{
"role": "system",
"content": "You are given a JSON array of TOC(table of contents) items. Each item has at least {"title": string} and may include an existing title hierarchical level.\n\nTask\n- For each item, assign a depth label using Arabic numerals only: top-level = 1, second-level = 2, third-level = 3, etc.\n- Multiple items may share the same depth (e.g., many 1s, many 2s).\n- Do not use dotted numbering (no 1.1/1.2). Use a single digit string per item indicating its depth only.\n- Preserve the original item order exactly. Do not insert, delete, or reorder.\n- Decide levels yourself to keep a coherent hierarchy. Keep peers at the same depth.\n\nOutput\n- Return a valid JSON array only (no extra text).\n- Each element must be {"level": "1|2|3", "title": }.\n- title must be the original title string.\n\nExamples\n\nExample A (chapters with sections)\nInput:\n["Chapter 1 Methods", "Section 1 Definition", "Section 2 Process", "Chapter 2 Experiment"]\n\nOutput:\n[\n {"level":"1","title":"Chapter 1 Methods"},\n {"level":"2","title":"Section 1 Definition"},\n {"level":"2","title":"Section 2 Process"},\n {"level":"1","title":"Chapter 2 Experiment"}\n]\n\nExample B (parts with chapters)\nInput:\n["Part I Theory", "Chapter 1 Basics", "Chapter 2 Methods", "Part II Applications", "Chapter 3 Case Studies"]\n\nOutput:\n[\n {"level":"1","title":"Part I Theory"},\n {"level":"2","title":"Chapter 1 Basics"},\n {"level":"2","title":"Chapter 2 Methods"},\n {"level":"1","title":"Part II Applications"},\n {"level":"2","title":"Chapter 3 Case Studies"}\n]\n\nExample C (plain headings)\nInput:\n["Introduction", "Background and Motivation", "Related Work", "Methodology", "Evaluation"]\n\nOutput:\n[\n {"level":"1","title":"Introduction"},\n {"level":"2","title":"Background and Motivation"},\n {"level":"2","title":"Related Work"},\n {"level":"1","title":"Methodology"},\n {"level":"1","title":"Evaluation"}\n]"
},
{
"role": "user",
"content": "['财务收⽀明细总表 (可编辑)']"
}
]
09:49:45 - LiteLLM:INFO: utils.py:3995 -
LiteLLM completion() model= qwen3-235b-a22b; provider = dashscope
2026-06-17 09:49:45,847 INFO 33
LiteLLM completion() model= qwen3-235b-a22b; provider = dashscope
2026-06-17 09:49:46,354 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.002s]
2026-06-17 09:49:46,360 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.003s]
[2026-06-17 09:49:46 +0800] [28] [INFO] 127.0.0.1:47896 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 27542 21360
2026-06-17 09:49:46,378 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.001s]
2026-06-17 09:49:46,383 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.003s]
[2026-06-17 09:49:46 +0800] [28] [INFO] 127.0.0.1:47912 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 22553 18257
[2026-06-17 09:49:46 +0800] [28] [INFO] 127.0.0.1:47916 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6 1.1 200 2360 11233
2026-06-17 09:49:46,654 INFO 33 ------------ T O C -------------
[
{
"level": "1",
"title": "财务收⽀明细总表 (可编辑)",
"chunk_id": "0"
}
]
2026-06-17 09:49:47,661 INFO 33 PUT http://192.168.80.139:1200/ragflow_105f6de65ce711f182c1edd34d87cce0/_bulk?refresh=wait_for&timeout=60s [status:200 duration:1.005s]
2026-06-17 09:49:47,804 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: 0.9, progress_msg:
2026-06-17 09:49:48,204 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: 1.0, progress_msg: 09:49:47 Page(1~5): Task done (19.90s)
2026-06-17 09:49:48,204 INFO 33 Chunk doc(测试的可编辑表格6.pdf), page(0-4), chunks(2), token(2654), elapsed:19.90
2026-06-17 09:49:48,206 INFO 33 handle_task done for task {"id": "c53b3c9a69ee11f1b8b85533fa24cb09", "doc_id": "c1941d8c69ee11f1b8b85533fa24cb09", "from_page": 0, "to_page": 4, "retry_count": 0, "kb_id": "7987ef80660d11f1a066912af183a8a6", "parser_id": "naive", "parser_config": {"table_context_size": 53, "image_context_size": 53, "layout_recognize": "MinerU@MinerU", "chunk_token_num": 512, "delimiter": "\n", "auto_keywords": 3, "auto_questions": 0, "html4excel": false, "topn_tags": 3, "raptor": {"use_raptor": true, "prompt": "Please summarize the following paragraphs. Be careful with the numbers, do not make things up. Paragraphs as following:\n {cluster_content}\nThe above is the content you need to summarize.", "max_token": 256, "threshold": 0.1, "max_cluster": 64, "random_seed": 0, "scope": "file", "ext": {"clustering_method": "gmm", "tree_builder": "raptor"}}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "general", "batch_chunk_token_size": 4096, "retry_attempts": 2, "retry_backoff_seconds": 2.0, "retry_backoff_max_seconds": 60.0, "build_subgraph_timeout_per_chunk_seconds": 300, "build_subgraph_min_timeout_seconds": 600, "merge_timeout_seconds": 180, "resolution_timeout_seconds": 1800, "community_timeout_seconds": 1800, "lock_acquire_timeout_seconds": 600, "resolution": false}, "parent_child": {"use_parent_child": false, "children_delimiter": "\n"}, "children_delimiter": "", "llm_id": "qwen3-235b-a22b@Tongyi-Qianwen", "enable_children": false, "toc_extraction": true, "image_table_context_window": 53, "overlapped_percent": 0.1, "mineru_parse_method": "auto", "mineru_formula_enable": false, "mineru_table_enable": true, "mineru_lang": "Chinese", "metadata": {"type": "object", "properties": {}, "additionalProperties": false}, "built_in_metadata": [], "enable_metadata": false, "tenant_llm_id": 149}, "name": "\u6d4b\u8bd5\u7684\u53ef\u7f16\u8f91\u8868\u683c6.pdf", "type": "pdf", "location": 
"\u6d4b\u8bd5\u7684\u53ef\u7f16\u8f91\u8868\u683c6.pdf", "size": 202145, "tenant_id": "105f6de65ce711f182c1edd34d87cce0", "language": "Chinese", "embd_id": "text-embedding-v4@Tongyi-Qianwen", "pagerank": 0, "kb_parser_config": {"table_context_size": 53, "image_context_size": 53, "layout_recognize": "MinerU@MinerU", "chunk_token_num": 512, "delimiter": "\n", "auto_keywords": 3, "auto_questions": 0, "html4excel": false, "topn_tags": 3, "raptor": {"use_raptor": true, "prompt": "Please summarize the following paragraphs. Be careful with the numbers, do not make things up. Paragraphs as following:\n {cluster_content}\nThe above is the content you need to summarize.", "max_token": 256, "threshold": 0.1, "max_cluster": 64, "random_seed": 0, "scope": "file", "ext": {"clustering_method": "gmm", "tree_builder": "raptor"}}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "general", "batch_chunk_token_size": 4096, "retry_attempts": 2, "retry_backoff_seconds": 2.0, "retry_backoff_max_seconds": 60.0, "build_subgraph_timeout_per_chunk_seconds": 300, "build_subgraph_min_timeout_seconds": 600, "merge_timeout_seconds": 180, "resolution_timeout_seconds": 1800, "community_timeout_seconds": 1800, "lock_acquire_timeout_seconds": 600, "resolution": false}, "parent_child": {"use_parent_child": false, "children_delimiter": "\n"}, "children_delimiter": "", "llm_id": "qwen3-235b-a22b@Tongyi-Qianwen", "enable_children": false, "toc_extraction": true, "image_table_context_window": 53, "overlapped_percent": 0.1, "mineru_parse_method": "auto", "mineru_formula_enable": false, "mineru_table_enable": true, "mineru_lang": "Chinese", "metadata": {"type": "object", "properties": {}, "additionalProperties": false}, "built_in_metadata": [], "enable_metadata": false, "tenant_llm_id": 149}, "img2txt_id": "qwen3.5-plus@Tongyi-Qianwen", "asr_id": "qwen3-asr-flash@Tongyi-Qianwen", "llm_id": "qwen3-235b-a22b@Tongyi-Qianwen", "update_time": 
1781660965892, "task_type": ""}
2026-06-17 09:49:51,397 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.003s]
2026-06-17 09:49:51,405 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.004s]
[2026-06-17 09:49:51 +0800] [28] [INFO] 127.0.0.1:51920 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 27768 24894
2026-06-17 09:49:51,424 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.001s]
2026-06-17 09:49:51,429 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.003s]
[2026-06-17 09:49:51 +0800] [28] [INFO] 127.0.0.1:51932 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 22547 18080
[2026-06-17 09:49:51 +0800] [28] [INFO] 127.0.0.1:51946 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6 1.1 200 2360 11365
2026-06-17 09:49:53,136 INFO 28 HEAD http://192.168.80.139:1200/ragflow_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.002s]
2026-06-17 09:49:53,144 INFO 28 POST http://192.168.80.139:1200/ragflow_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.005s]
[2026-06-17 09:49:53 +0800] [28] [INFO] 127.0.0.1:51960 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents/c1941d8c69ee11f1b8b85533fa24cb09/chunks 1.1 200 10161 21493
[2026-06-17 09:49:53 +0800] [28] [INFO] 127.0.0.1:51966 GET /api/v1/documents/c1941d8c69ee11f1b8b85533fa24cb09/preview 1.1 200 202145 11730
[2026-06-17 09:49:53 +0800] [28] [INFO] 127.0.0.1:51982 GET /api/v1/documents/images/7987ef80660d11f1a066912af183a8a6-089da1545a7357fa 1.1 200 67116 3458
[2026-06-17 09:49:53 +0800] [28] [INFO] 127.0.0.1:51984 GET /api/v1/documents/images/7987ef80660d11f1a066912af183a8a6-7e7918716d33bf39 1.1 200 16313 5142
[2026-06-17 09:49:53 +0800] [28] [INFO] 127.0.0.1:51994 GET /api/v1/documents/c1941d8c69ee11f1b8b85533fa24cb09/preview 1.1 200 202145 11940
2026-06-17 09:49:56,545 INFO 33 task_executor_28e888509d40_0 reported heartbeat: {"ip_address": "172.18.0.2", "pid": 33, "name": "task_executor_28e888509d40_0", "now": "2026-06-17T09:49:56.544+08:00", "boot_at": "2026-06-17T09:41:56.010+08:00", "pending": 0, "lag": 0, "done": 1, "failed": 0, "current": {}}
2026-06-17 09:50:26,578 INFO 33 task_executor_28e888509d40_0 reported heartbeat: {"ip_address": "172.18.0.2", "pid": 33, "name": "task_executor_28e888509d40_0", "now": "2026-06-17T09:50:26.576+08:00", "boot_at": "2026-06-17T09:41:56.010+08:00", "pending": 0, "lag": 0, "done": 1, "failed": 0, "current": {}}
2026-06-17 09:50:56,586 INFO 33 task_executor_28e888509d40_0 reported heartbeat: {"ip_address": "172.18.0.2", "pid": 33, "name": "task_executor_28e888509d40_0", "now": "2026-06-17T09:50:56.584+08:00", "boot_at": "2026-06-17T09:41:56.010+08:00", "pending": 0, "lag": 0, "done": 1, "failed": 0, "current": {}}
2026-06-17 09:51:26,593 INFO 33 task_executor_28e888509d40_0 reported heartbeat: {"ip_address": "172.18.0.2", "pid": 33, "name": "task_executor_28e888509d40_0", "now": "2026-06-17T09:51:26.591+08:00", "boot_at": "2026-06-17T09:41:56.010+08:00", "pending": 0, "lag": 0, "done": 1, "failed": 0, "current": {}}
2026-06-17 09:51:56,598 INFO 33 task_executor_28e888509d40_0 reported heartbeat: {"ip_address": "172.18.0.2", "pid": 33, "name": "task_executor_28e888509d40_0", "now": "2026-06-17T09:51:56.596+08:00", "boot_at": "2026-06-17T09:41:56.010+08:00", "pending": 0, "lag": 0, "done": 1, "failed": 0, "current": {}}
2026-06-17 09:52:26,606 INFO 33 task_executor_28e888509d40_0 reported heartbeat: {"ip_address": "172.18.0.2", "pid": 33, "name": "task_executor_28e888509d40_0", "now": "2026-06-17T09:52:26.605+08:00", "boot_at": "2026-06-17T09:41:56.010+08:00", "pending": 0, "lag": 0, "done": 1, "failed": 0, "current": {}}
2026-06-17 09:52:56,612 INFO 33 task_executor_28e888509d40_0 reported heartbeat: {"ip_address": "172.18.0.2", "pid": 33, "name": "task_executor_28e888509d40_0", "now": "2026-06-17T09:52:56.611+08:00", "boot_at": "2026-06-17T09:41:56.010+08:00", "pending": 0, "lag": 0, "done": 1, "failed": 0, "current": {}}
2026-06-17 09:53:26,619 INFO 33 task_executor_28e888509d40_0 reported heartbeat: {"ip_address": "172.18.0.2", "pid": 33, "name": "task_executor_28e888509d40_0", "now": "2026-06-17T09:53:26.617+08:00", "boot_at": "2026-06-17T09:41:56.010+08:00", "pending": 0, "lag": 0, "done": 1, "failed": 0, "current": {}}
2026-06-17 09:53:46,579 INFO 28 HEAD http://192.168.80.139:1200/ragflow_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.002s]
2026-06-17 09:53:46,589 INFO 28 POST http://192.168.80.139:1200/ragflow_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.007s]
2026-06-17 09:53:46,602 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.001s]
2026-06-17 09:53:46,610 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.006s]
2026-06-17 09:53:46,624 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.001s]
2026-06-17 09:53:46,631 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.005s]
[2026-06-17 09:53:46 +0800] [28] [INFO] 127.0.0.1:37486 GET /api/v1/users/me 1.1 200 521 89727
[2026-06-17 09:53:46 +0800] [28] [INFO] 127.0.0.1:37498 GET /api/v1/users/me/models 1.1 200 592 90156
[2026-06-17 09:53:46 +0800] [28] [INFO] 127.0.0.1:37516 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/graph 1.1 200 45 90480
[2026-06-17 09:53:46 +0800] [28] [INFO] 127.0.0.1:37506 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 27768 90802
[2026-06-17 09:53:46 +0800] [28] [INFO] 127.0.0.1:37520 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 22547 91128
[2026-06-17 09:53:46 +0800] [28] [INFO] 127.0.0.1:37530 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6 1.1 200 2360 91651
2026-06-17 09:53:46,657 WARNING 28 Database connection issue (attempt 1/5): (0, '')
2026-06-17 09:53:46,673 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.002s]
2026-06-17 09:53:46,678 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.003s]
[2026-06-17 09:53:46 +0800] [28] [INFO] 127.0.0.1:37546 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 22547 22205
[2026-06-17 09:53:47 +0800] [28] [INFO] 127.0.0.1:37544 GET /api/v1/tenants 1.1 200 241 1010452
[2026-06-17 09:53:51 +0800] [28] [INFO] 127.0.0.1:48994 GET /api/v1/datasets 1.1 200 26874 26067
[2026-06-17 09:53:51 +0800] [28] [INFO] 127.0.0.1:48998 GET /api/v1/datasets 1.1 200 26874 12789
[2026-06-17 09:53:51 +0800] [28] [INFO] 127.0.0.1:49000 GET /api/v1/users/me 1.1 200 521 5507
[2026-06-17 09:53:52 +0800] [28] [INFO] 127.0.0.1:49012 GET /api/v1/users/me 1.1 200 521 77409
[2026-06-17 09:53:52 +0800] [28] [INFO] 127.0.0.1:49020 GET /api/v1/users/me/models 1.1 200 592 77784
[2026-06-17 09:53:52 +0800] [28] [INFO] 127.0.0.1:49034 GET /api/v1/datasets 1.1 200 26874 78301
[2026-06-17 09:53:52 +0800] [28] [INFO] 127.0.0.1:49042 GET /v1/llm/list 1.1 200 19373 78466
[2026-06-17 09:53:52 +0800] [28] [INFO] 127.0.0.1:49006 GET /api/v1/chats 1.1 200 13787 92350
[2026-06-17 09:53:52 +0800] [28] [INFO] 127.0.0.1:49022 GET /api/v1/tenants 1.1 200 242 79870
[2026-06-17 09:53:52 +0800] [28] [INFO] 127.0.0.1:49050 GET /api/v1/users/me 1.1 200 521 5444
[2026-06-17 09:53:53 +0800] [28] [INFO] 127.0.0.1:49052 GET /api/v1/users/me 1.1 200 521 13080
[2026-06-17 09:53:53 +0800] [28] [INFO] 127.0.0.1:49066 GET /api/v1/users/me/models 1.1 200 592 15937
[2026-06-17 09:53:53 +0800] [28] [INFO] 127.0.0.1:49076 GET /api/v1/tenants 1.1 200 242 8379
[2026-06-17 09:53:53 +0800] [28] [INFO] 127.0.0.1:49090 GET /api/v1/system/version 1.1 200 48 1286
[2026-06-17 09:53:53 +0800] [28] [INFO] 127.0.0.1:49092 GET /api/v1/connectors 1.1 200 41 7631
2026-06-17 09:53:56,625 INFO 33 task_executor_28e888509d40_0 reported heartbeat: {"ip_address": "172.18.0.2", "pid": 33, "name": "task_executor_28e888509d40_0", "now": "2026-06-17T09:53:56.623+08:00", "boot_at": "2026-06-17T09:41:56.010+08:00", "pending": 0, "lag": 0, "done": 1, "failed": 0, "current": {}}
2026-06-17 09:54:26,744 INFO 33 task_executor_28e888509d40_0 reported heartbeat: {"ip_address": "172.18.0.2", "pid": 33, "name": "task_executor_28e888509d40_0", "now": "2026-06-17T09:54:26.742+08:00", "boot_at": "2026-06-17T09:41:56.010+08:00", "pending": 0, "lag": 0, "done": 1, "failed": 0, "current": {}}
2026-06-17 09:54:56,751 INFO 33 task_executor_28e888509d40_0 reported heartbeat: {"ip_address": "172.18.0.2", "pid": 33, "name": "task_executor_28e888509d40_0", "now": "2026-06-17T09:54:56.749+08:00", "boot_at": "2026-06-17T09:41:56.010+08:00", "pending": 0, "lag": 0, "done": 1, "failed": 0, "current": {}}
[2026-06-17 09:55:01 +0800] [28] [INFO] 127.0.0.1:45090 GET /api/v1/datasets 1.1 200 26874 25549
[2026-06-17 09:55:01 +0800] [28] [INFO] 127.0.0.1:45096 GET /api/v1/datasets 1.1 200 26874 25947
[2026-06-17 09:55:02 +0800] [28] [INFO] 127.0.0.1:45102 GET /api/v1/users/me 1.1 200 521 5576
2026-06-17 09:55:03,674 INFO 28 HEAD http://192.168.80.139:1200/ragflow_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.002s]
2026-06-17 09:55:03,679 INFO 28 POST http://192.168.80.139:1200/ragflow_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.002s]
2026-06-17 09:55:03,893 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.001s]
2026-06-17 09:55:03,898 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.002s]
[2026-06-17 09:55:03 +0800] [28] [INFO] 127.0.0.1:45106 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/graph 1.1 200 45 235751
[2026-06-17 09:55:03 +0800] [28] [INFO] 127.0.0.1:45120 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 27768 236237
2026-06-17 09:55:03,921 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.001s]
2026-06-17 09:55:03,926 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.002s]
[2026-06-17 09:55:03 +0800] [28] [INFO] 127.0.0.1:45130 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6 1.1 200 2360 26975
[2026-06-17 09:55:03 +0800] [28] [INFO] 127.0.0.1:45146 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 22547 27291
2026-06-17 09:55:03,941 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.001s]
2026-06-17 09:55:03,945 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.002s]
[2026-06-17 09:55:03 +0800] [28] [INFO] 127.0.0.1:45154 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 22547 16495
Self Checks
RAGFlow workspace code commit ID
none
RAGFlow image version
v0.25.6
Other environment information
Actual behavior
ID (string).\n - One chunk can yield multiple JSON objects in order (unmatched text + one or more
headings).\n\nRULES\n1) Preserve input chunk order strictly.\n2) If a chunk contains multiple headings, expand them in
order:\n - Pre-heading narrative → {"title":"-1","chunk_id":"<chunk_ID>"}\n - Then each heading →
{"title":"...","chunk_id":"<chunk_ID>"}\n3) Do not merge outputs across chunks; each object refers to exactly one
chunk ID.\n4) "title" must be non-empty (or exactly "-1"). "chunk_id" must be a string (chunk ID).\n5) When
ambiguous, prefer "-1" unless the text strongly looks like a heading.\n\nHEADING DETECTION (cues, not hard rules)\n-
Appears near line start, short isolated phrase, often followed by content.\n- May contain separators: — —— - : : · •\n-
Numbering styles:\n • 第[一二三四五六七八九十百]+(篇|章|节|条)\n • [((]?[一二三四五六七八九十]+[))]?\n •
[((]?[①②③④⑤⑥⑦⑧⑨⑩][))]?\n • ^\d+(\.\d+)[)..]?\s\n • ^[IVXLCDM]+[).]\n • ^[A-Z][).]\n- Canonical section cues
(general only):\n Common heading indicators include words such as:\n "Overview", "Introduction", "Background",
"Purpose", "Scope", "Definition",\n "Method", "Procedure", "Result", "Discussion", "Summary",
"Conclusion",\n "Appendix", "Reference", "Annex", "Acknowledgment", "Disclaimer".\n These are soft cues,
not strict requirements.\n- Length restriction:\n • Chinese heading: ≤25 characters\n • English heading: ≤80
characters\n- Exclude long narrative sentences, continuous prose, or bullet-style lists → output as "-1".\n\nOUTPUT
FORMAT\n- Return ONLY a valid JSON array of {"title","content"} objects.\n- No reasoning or
commentary.\n\nEXAMPLES\n\nExample 1 — No heading\nInput:\n[{"0": "Copyright page · Publication info (ISBN 123-456).
All rights reserved."}, ...]\nOutput:\n[\n {"title":"-1","chunk_id":"0"},\n ...\n]\n\nExample 2 — One
heading\nInput:\n[{"1": "Chapter 1: General Provisions This chapter defines the overall rules…"}, ...]\nOutput:\n[\n
{"title":"Chapter 1: General Provisions","chunk_id":"1"},\n ...\n]\n\nExample 3 — Narrative +
heading\nInput:\n[{"2": "This paragraph introduces the background and goals. Section 2: Definitions Key terms are
explained…"}, ...]\nOutput:\n[\n {"title":"Section 2: Definitions","chunk_id":"2"},\n ...\n]\n\nExample 4 —
Multiple headings in one chunk\nInput:\n[{"3": "Declarations and Commitments (I) Party B commits… (II) Party C
commits… Appendix A Data Specification"}, ...]\nOutput:\n[\n {"title":"Declarations and
Commitments","chunk_id":"3"},\n {"title":"(I) Party B commits","chunk_id":"3"},\n {"title":"(II) Party
C commits","chunk_id":"3"},\n {"title":"Appendix A Data Specification","chunk_id":"3"},\n
...\n]\n\nExample 5 — Numbering styles\nInput:\n[{"4": "1. Scope: Defines boundaries. 2) Definitions: Terms used. III)
Methods Overview."}, ...]\nOutput:\n[\n {"title":"1. Scope","chunk_id":"4"},\n {"title":"2)
Definitions","chunk_id":"4"},\n {"title":"III) Methods Overview","chunk_id":"4"},\n ...\n]\n\nExample 6 —
Long list (NOT headings)\nInput:\n{"5": "Item list: apples, bananas, strawberries, blueberries, mangos, peaches"},
...]\nOutput:\n[\n {"title":"-1","chunk_id":"5"},\n ...\n]\n\nExample 7 — Mixed
Chinese/English\nInput:\n{"6": "(出版信息略)This standard follows industry practices. Chapter 1: Overview 摘要…
第2节:术语与缩略语"}, ...]\nOutput:\n[\n {"title":"Chapter 1: Overview","chunk_id":"6"},\n
{"title":"第2节:术语与缩略语","chunk_id":"6"},\n ...\n]"
},
{
"role": "user",
"content": "OUTPUT FORMAT\n- Return ONLY the JSON array.\n- Use double quotes.\n- No extra commentary.\n- Keep
language of "title" the same as the input.\n\nINPUT\n{"1": "-11收入12650.9013.001435.97销售部178720.85\nCW-20260422
026-02-12支出4190.809.00341.17采购部174529.05\nCW-20260432026-02-13收入9580.706.00522.56市场部184109.75\nCW-20260442026
-02-14支出2670.606.00146.08行政部181439.15\n收入15120.8013.001700.56销售部196559.95\nCW- 20260452026-02- 15\nFAILED TO
PARSE TABLE\nFAILED TO PARSE TABLE\nFAILED TO PARSE TABLE\n补充说明:1.表格共45⾏数据,刚好填满两⻚A4 PDF;2.所有数值
均为财务常⽤合理数值,间隔清晰,可直接编辑修改;3.适配所有主流PDF编辑软件,下载后即可编辑。\n(注:⽂档部分内容可能由 AI
⽣成)"}"
}
Expected behavior
No response
Steps to reproduce
Additional information
Picture 1

Picture 2
Overall log
2026-06-17 09:49:28,700 INFO 33 Parsed MinerU config (sensitive fields redacted): {'llm_name': 'MinerU', 'mineru_apiserver': 'http://101.36.73.92:8091', 'mineru_output_dir': '/tmp/111/out', 'mineru_backend': 'pipeline', 'mineru_delete_output': '0'}
2026-06-17 09:49:28,738 INFO 33 [MinerU] API openapi.json reachable=True url=http://101.36.73.92:8091/openapi.json
2026-06-17 09:49:28,740 INFO 33 [MinerU] Received binary PDF -> /tmp/mineru_bin_pdf_bq2rmc1a/测试的可编辑表格6.pdf
2026-06-17 09:49:29,666 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: 0.15, progress_msg: 09:49:28 Page(1~5): [MinerU] Received binary PDF -> /tmp/mineru_bin_pdf_bq2rmc1a/测试的可编辑表格6.pdf
2026-06-17 09:49:29,666 INFO 33 [MinerU] Output directory: /tmp/111/out backend=pipeline api=http://101.36.73.92:8091 server_url=
2026-06-17 09:49:30,015 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: 0.15, progress_msg: 09:49:29 Page(1~5): [MinerU] Output directory: /tmp/111/out
2026-06-17 09:49:30,120 INFO 33 [MinerU] request data={'output_dir': './output', 'lang_list': <MinerULanguage.CH: 'ch'>, 'backend': <MinerUBackend.PIPELINE: 'pipeline'>, 'parse_method': <MinerUParseMethod.AUTO: 'auto'>, 'formula_enable': False, 'table_enable': True, 'server_url': None, 'return_md': True, 'return_middle_json': True, 'return_model_output': True, 'return_content_list': True, 'return_images': True, 'response_format_zip': True, 'start_page_id': 0, 'end_page_id': 99999}
2026-06-17 09:49:30,120 INFO 33 [MinerU] request options=MinerUParseOptions(backend=<MinerUBackend.PIPELINE: 'pipeline'>, lang=<MinerULanguage.CH: 'ch'>, method=<MinerUParseMethod.AUTO: 'auto'>, server_url='', delete_output=False, parse_method='raw', formula_enable=False, table_enable=True)
2026-06-17 09:49:30,120 INFO 33 [MinerU] invoke api: http://101.36.73.92:8091/file_parse backend=pipeline server_url=None
2026-06-17 09:49:30,299 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: 0.2, progress_msg: 09:49:30 Page(1~5): [MinerU] invoke api: http://101.36.73.92:8091/file_parse
2026-06-17 09:49:31,245 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.002s]
2026-06-17 09:49:31,252 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.004s]
[2026-06-17 09:49:31 +0800] [28] [INFO] 127.0.0.1:40506 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 26772 22363
2026-06-17 09:49:31,269 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.001s]
2026-06-17 09:49:31,275 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.003s]
[2026-06-17 09:49:31 +0800] [28] [INFO] 127.0.0.1:40516 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 22553 18364
[2026-06-17 09:49:31 +0800] [28] [INFO] 127.0.0.1:40522 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6 1.1 200 2360 10523
2026-06-17 09:49:36,280 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.002s]
2026-06-17 09:49:36,286 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.004s]
[2026-06-17 09:49:36 +0800] [28] [INFO] 127.0.0.1:40536 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 27047 22392
2026-06-17 09:49:36,304 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.002s]
2026-06-17 09:49:36,309 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.003s]
[2026-06-17 09:49:36 +0800] [28] [INFO] 127.0.0.1:40542 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 22553 18041
[2026-06-17 09:49:36 +0800] [28] [INFO] 127.0.0.1:40550 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6 1.1 200 2360 9901
2026-06-17 09:49:37,144 INFO 33 [MinerU] zip file returned, saving to /tmp/111/out/测试的可编辑表格6_auto_10nvgyv7.zip...
2026-06-17 09:49:37,432 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: 0.3, progress_msg: 09:49:37 Page(1~5): [MinerU] zip file returned, saving to /tmp/111/out/测试的可编辑表格6_auto_10nvgyv7.zip...
2026-06-17 09:49:37,534 INFO 33 [MinerU] Unzip to /tmp/111/out/测试的可编辑表格6_auto_10nvgyv7...
2026-06-17 09:49:37,534 INFO 33 [MinerU] Extract zip: zip_path=/tmp/111/out/测试的可编辑表格6_auto_10nvgyv7.zip, extract_to=/tmp/111/out/测试的可编辑表格6_auto_10nvgyv7, root_hint=测试的可编辑表格6/
2026-06-17 09:49:37,721 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: 0.4, progress_msg: 09:49:37 Page(1~5): [MinerU] Unzip to /tmp/111/out/测试的可编辑表格6_auto_10nvgyv7...
2026-06-17 09:49:37,721 INFO 33 [MinerU] Api completed successfully.
2026-06-17 09:49:37,722 INFO 33 [MinerU] Expected output files: 测试的可编辑表格6_content_list.json
2026-06-17 09:49:37,722 INFO 33 [MinerU] Searching output in: /tmp/111/out/测试的可编辑表格6_auto_10nvgyv7
2026-06-17 09:49:37,722 INFO 33 [MinerU] Trying original path: /tmp/111/out/测试的可编辑表格6_auto_10nvgyv7/测试的可编辑表格6_content_list.json
2026-06-17 09:49:37,722 INFO 33 [MinerU] Trying sanitized filename: /tmp/111/out/测试的可编辑表格6_auto_10nvgyv7/测试的可编辑表格6_content_list.json
2026-06-17 09:49:37,722 INFO 33 [MinerU] Trying sanitized nested path: /tmp/111/out/测试的可编辑表格6_auto_10nvgyv7/测试的可编辑表格6/测试的可编辑表格6_content_list.json
2026-06-17 09:49:37,722 INFO 33 [MinerU] Trying vlm subdirectory: /tmp/111/out/测试的可编辑表格6_auto_10nvgyv7/vlm/测试的可编辑表格6_content_list.json
2026-06-17 09:49:37,722 INFO 33 [MinerU] Trying vlm subdirectory with sanitized name: /tmp/111/out/测试的可编辑表格6_auto_10nvgyv7/vlm/测试的可编辑表格6_content_list.json
2026-06-17 09:49:37,722 INFO 33 [MinerU] Trying parse-method path: /tmp/111/out/测试的可编辑表格6_auto_10nvgyv7/auto/测试的可编辑表格6_content_list.json
2026-06-17 09:49:37,723 INFO 33 [MinerU] Parsed 8 blocks from PDF.
2026-06-17 09:49:37,927 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: 0.75, progress_msg: 09:49:37 Page(1~5): [MinerU] Parsed 8 blocks from PDF.
2026-06-17 09:49:39,110 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: 0.8, progress_msg: 09:49:37 Page(1~5): Finish parsing.
2026-06-17 09:49:39,143 INFO 33 naive_merge(测试的可编辑表格6.pdf): 0.032985937999910675
2026-06-17 09:49:39,143 INFO 33 Chunking(10.850661832002515) 测试的可编辑表格6.pdf/测试的可编辑表格6.pdf done
2026-06-17 09:49:39,329 INFO 33 MINIO PUT(测试的可编辑表格6.pdf) cost 0.186 s
2026-06-17 09:49:39,499 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: None, progress_msg: 09:49:39 Page(1~5): Start to generate keywords for every chunk ...
2026-06-17 09:49:39,509 INFO 33 [HISTORY][
{
"role": "system",
"content": "## Role\nYou are a text analyzer.\n\n## Task\nExtract the most important keywords/phrases of a given piece of text content.\n\n## Requirements\n- Summarize the text content, and give the top 3 important keywords/phrases.\n- The keywords MUST be in the same language as the given piece of text content.\n- The keywords are delimited by ENGLISH COMMA.\n- Output keywords ONLY.\n\n---\n\n## Text Content\n\n财务收⽀明细总表 (可编辑)\n说明:本表格为可编辑PDF格式,可直接在WPS、Adobe Acrobat等软件中修改单元格内容,数据涵盖两⻚A4篇幅,均为真实财务类数值,字段清晰、数据间隔合理。\n凭证编号记账日期收支类型业务金额(元)增值税率(%)税额 (元)归属部门累计余额(元)\nCw-20260012026-01-01收入12860.5013.001452.32销售部12860.50\ncw-20260022026-01-02支出3250.759.00267.37行政部9609.75\ncW-20260032026-01-03收入8950.0013.001053.98销售部18559.75\nCW-20260042026-01-04支出1890.206.00105.83财务部16669.55\nCW-20260052026-01-05收入15680.3013.001767.47销售部32349.85\nCW-20260062026-01-06支出4520.809.00371.51采购部27829.05\nCW-20260072026-01-07收入7890.606.00442.50市场部35719.65\ncW-20260082026-01-08支出2150.306.00117.28行政部33569.35\ncW-20260092026-01-09收入11250.8013.001285.42销售部44820.15\ncW-20260102026-01-10支出3860.509.00313.88采购部40959.65\nCW-20260112026-01-11收入9630.206.00536.46市场部50589.85\nCW-20260122026-01-12支出1580.706.0086.48财务部49009.15\n\nCW-20260132026-01-13收入14520.7013.001653.76销售部63529.85\nCW-20260142026-01-14支出5280.309.00431.72采购部58249.55\nCW-20260152026-01-15收入8250.506.00451.69市场部66500.05\nCW-20260162026-01-16支出2760.806.00151.38行政部63739.25\nCW-20260172026-01-17收入13680.9013.001572.76销售部77420.15\nCW-20260182026-01-18支出4120.609.00337.59采购部73299.55\nCW-20260192026-01-19收入10560.406.00581.31市场部83859.95\nCW-20260202026-01-20支出1950.406.00106.22财务部81909.55\nCW-20260212026-01-21收入12350.7013.001406.83销售部94260.25\nCW-20260222026-01-22支出3580.909.00289.88行政部90679.35\nCW-20260232026-01-23收入9870.306.00548.19市场部100549.65\nCW-20260242026-01-24支出2360.506.00128.37财务部98189.15\nCW-20260252026-01-25收入15280.6013.001727.54销售部113469.75\nCW-20260262026-01-26支出4890.709.00397.20采购部108579.05\nCW-20260272026-01-27收入8650.806.00470.15市场部117229.85\nCW-2026
0282026-01-28支出2580.306.00140.54行政部114649.55\n收入11980.5013.001367.70销售部126630.05\nCW-20260292026-01-29\nCW-20260302026-01-30支出3960.609.00321.35采购部122669.45\nCW-20260312026-02-01收入10250.706.00564.26市场部132920.15\nCW-20260322026-02-02支出2120.406.00115.47财务部130799.75\nCW-20260332026-02-03收入14860.9013.001683.16销售部145660.65\nCW-20260342026-02-04支出4350.809.00352.87采购部141309.85\nCW-20260352026-02-05收入9320.506.00508.35市场部150630.35\nCW-20260362026-02-06支出2890.606.00157.89行政部147739.75\nCW-20260372026-02-07收入13590.8013.001546.82销售部161330.55\nCW-20260382026-02-08支出3680.709.00297.96行政部157649.85\nCW-20260392026-02-09收入10870.606.00597.40市场部168520.45\nCW-20260402026-02-10支出2450.506.00132.33财务部166069.95\nCW-20260412026-02-11收入12650.9013.001435.97销售部178720.85\nCW-20260422026-02-12支出4190.809.00341.17采购部174529.05\nCW-20260432026-02-13收入9580.706.00522.56市场部184109.75\nCW-20260442026-02-14支出2670.606.00146.08行政部181439.15\n收入15120.8013.001700.56销售部196559.95\nCW- 20260452026-02- 15"
},
{
"role": "user",
"content": "Output: "
}
]
09:49:39 - LiteLLM:INFO: utils.py:3995 -
LiteLLM completion() model= qwen3-235b-a22b; provider = dashscope
2026-06-17 09:49:39,533 INFO 33
LiteLLM completion() model= qwen3-235b-a22b; provider = dashscope
2026-06-17 09:49:39,534 INFO 33 [HISTORY][
{
"role": "system",
"content": "## Role\nYou are a text analyzer.\n\n## Task\nExtract the most important keywords/phrases of a given piece of text content.\n\n## Requirements\n- Summarize the text content, and give the top 3 important keywords/phrases.\n- The keywords MUST be in the same language as the given piece of text content.\n- The keywords are delimited by ENGLISH COMMA.\n- Output keywords ONLY.\n\n---\n\n## Text Content\n-11收入12650.9013.001435.97销售部178720.85\nCW-20260422026-02-12支出4190.809.00341.17采购部174529.05\nCW-20260432026-02-13收入9580.706.00522.56市场部184109.75\nCW-20260442026-02-14支出2670.606.00146.08行政部181439.15\n收入15120.8013.001700.56销售部196559.95\nCW- 20260452026-02- 15\nFAILED TO PARSE TABLE\nFAILED TO PARSE TABLE\nFAILED TO PARSE TABLE\n补充说明:1.表格共45⾏数据,刚好填满两⻚A4 PDF;2.所有数值均为财务常⽤合理数值,间隔清晰,可直接编辑修改;3.适配所有主流PDF编辑软件,下载后即可编辑。\n(注:⽂档部分内容可能由 AI ⽣成)"
},
{
"role": "user",
"content": "Output: "
}
]
09:49:39 - LiteLLM:INFO: utils.py:3995 -
LiteLLM completion() model= qwen3-235b-a22b; provider = dashscope
2026-06-17 09:49:39,535 INFO 33
LiteLLM completion() model= qwen3-235b-a22b; provider = dashscope
2026-06-17 09:49:40,929 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: None, progress_msg: 09:49:40 Page(1~5): Keywords generation 2 chunks completed in 1.46s
2026-06-17 09:49:40,929 INFO 33 Build document 测试的可编辑表格6.pdf: 12.64s
2026-06-17 09:49:41,099 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: None, progress_msg: 09:49:40 Page(1~5): Generate 2 chunks
2026-06-17 09:49:41,316 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.002s]
2026-06-17 09:49:41,322 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.003s]
[2026-06-17 09:49:41 +0800] [28] [INFO] 127.0.0.1:47866 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 27539 21572
2026-06-17 09:49:41,341 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.001s]
2026-06-17 09:49:41,346 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.003s]
[2026-06-17 09:49:41 +0800] [28] [INFO] 127.0.0.1:47878 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 22553 18827
[2026-06-17 09:49:41 +0800] [28] [INFO] 127.0.0.1:47880 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6 1.1 200 2360 11349
2026-06-17 09:49:41,985 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: 0.7999999999999999, progress_msg:
2026-06-17 09:49:41,985 INFO 33 Embedding chunks (0.89s)
2026-06-17 09:49:42,106 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: None, progress_msg: 09:49:41 Page(1~5): Embedding chunks (0.89s)
2026-06-17 09:49:42,261 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: None, progress_msg: 09:49:42 Page(1~5): Start to generate table of content ...
2026-06-17 09:49:42,631 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: None, progress_msg:
2026-06-17 09:49:42,636 INFO 33 [HISTORY][
{
"role": "system",
"content": "You are a robust Table-of-Contents (TOC) extractor.\n\nGOAL\nGiven a dictionary of chunks {"<chunk_ID>": chunk_text}, extract TOC-like headings and return a strict JSON array of objects:\n[\n {"title": "", "chunk_id": ""},\n ...\n]\n\nFIELDS\n- "title": the heading text (clean, no page numbers or leader dots).\n - If any part of a chunk has no valid heading, output that part as {"title":"-1", ...}.\n- "chunk_id": the chunk ID (string).\n - One chunk can yield multiple JSON objects in order (unmatched text + one or more headings).\n\nRULES\n1) Preserve input chunk order strictly.\n2) If a chunk contains multiple headings, expand them in order:\n - Pre-heading narrative → {"title":"-1","chunk_id":"<chunk_ID>"}\n - Then each heading → {"title":"...","chunk_id":"<chunk_ID>"}\n3) Do not merge outputs across chunks; each object refers to exactly one chunk ID.\n4) "title" must be non-empty (or exactly "-1"). "chunk_id" must be a string (chunk ID).\n5) When ambiguous, prefer "-1" unless the text strongly looks like a heading.\n\nHEADING DETECTION (cues, not hard rules)\n- Appears near line start, short isolated phrase, often followed by content.\n- May contain separators: — —— - : : · •\n- Numbering styles:\n • 第[一二三四五六七八九十百]+(篇|章|节|条)\n • [((]?[一二三四五六七八九十]+[))]?\n • [((]?[①②③④⑤⑥⑦⑧⑨⑩][))]?\n • ^\d+(\.\d+)[)..]?\s\n • ^[IVXLCDM]+[).]\n • ^[A-Z][).]\n- Canonical section cues (general only):\n Common heading indicators include words such as:\n "Overview", "Introduction", "Background", "Purpose", "Scope", "Definition",\n "Method", "Procedure", "Result", "Discussion", "Summary", "Conclusion",\n "Appendix", "Reference", "Annex", "Acknowledgment", "Disclaimer".\n These are soft cues, not strict requirements.\n- Length restriction:\n • Chinese heading: ≤25 characters\n • English heading: ≤80 characters\n- Exclude long narrative sentences, continuous prose, or bullet-style lists → output as "-1".\n\nOUTPUT FORMAT\n- Return ONLY a valid JSON array of {"title","content"} 
objects.\n- No reasoning or commentary.\n\nEXAMPLES\n\nExample 1 — No heading\nInput:\n[{"0": "Copyright page · Publication info (ISBN 123-456). All rights reserved."}, ...]\nOutput:\n[\n {"title":"-1","chunk_id":"0"},\n ...\n]\n\nExample 2 — One heading\nInput:\n[{"1": "Chapter 1: General Provisions This chapter defines the overall rules…"}, ...]\nOutput:\n[\n {"title":"Chapter 1: General Provisions","chunk_id":"1"},\n ...\n]\n\nExample 3 — Narrative + heading\nInput:\n[{"2": "This paragraph introduces the background and goals. Section 2: Definitions Key terms are explained…"}, ...]\nOutput:\n[\n {"title":"Section 2: Definitions","chunk_id":"2"},\n ...\n]\n\nExample 4 — Multiple headings in one chunk\nInput:\n[{"3": "Declarations and Commitments (I) Party B commits… (II) Party C commits… Appendix A Data Specification"}, ...]\nOutput:\n[\n {"title":"Declarations and Commitments","chunk_id":"3"},\n {"title":"(I) Party B commits","chunk_id":"3"},\n {"title":"(II) Party C commits","chunk_id":"3"},\n {"title":"Appendix A Data Specification","chunk_id":"3"},\n ...\n]\n\nExample 5 — Numbering styles\nInput:\n[{"4": "1. Scope: Defines boundaries. 2) Definitions: Terms used. III) Methods Overview."}, ...]\nOutput:\n[\n {"title":"1. Scope","chunk_id":"4"},\n {"title":"2) Definitions","chunk_id":"4"},\n {"title":"III) Methods Overview","chunk_id":"4"},\n ...\n]\n\nExample 6 — Long list (NOT headings)\nInput:\n{"5": "Item list: apples, bananas, strawberries, blueberries, mangos, peaches"}, ...]\nOutput:\n[\n {"title":"-1","chunk_id":"5"},\n ...\n]\n\nExample 7 — Mixed Chinese/English\nInput:\n{"6": "(出版信息略)This standard follows industry practices. Chapter 1: Overview 摘要… 第2节:术语与缩略语"}, ...]\nOutput:\n[\n {"title":"Chapter 1: Overview","chunk_id":"6"},\n {"title":"第2节:术语与缩略语","chunk_id":"6"},\n ...\n]"
},
{
"role": "user",
"content": "OUTPUT FORMAT\n- Return ONLY the JSON array.\n- Use double quotes.\n- No extra commentary.\n- Keep language of "title" the same as the input.\n\nINPUT\n{"0": "\n财务收⽀明细总表 (可编辑)\n说明:本表格为可编辑PDF格式,可直接在WPS、Adobe Acrobat等软件中修改单元格内容,数据涵盖两⻚A4篇幅,均为真实财务类数值,字段清晰、数据间隔合理。\n凭证编号记账日期收支类型业务金额(元)增值税率(%)税额 (元)归属部门累计余额(元)\nCw-20260012026-01-01收入12860.5013.001452.32销售部12860.50\ncw-20260022026-01-02支出3250.759.00267.37行政部9609.75\ncW-20260032026-01-03收入8950.0013.001053.98销售部18559.75\nCW-20260042026-01-04支出1890.206.00105.83财务部16669.55\nCW-20260052026-01-05收入15680.3013.001767.47销售部32349.85\nCW-20260062026-01-06支出4520.809.00371.51采购部27829.05\nCW-20260072026-01-07收入7890.606.00442.50市场部35719.65\ncW-20260082026-01-08支出2150.306.00117.28行政部33569.35\ncW-20260092026-01-09收入11250.8013.001285.42销售部44820.15\ncW-20260102026-01-10支出3860.509.00313.88采购部40959.65\nCW-20260112026-01-11收入9630.206.00536.46市场部50589.85\nCW-20260122026-01-12支出1580.706.0086.48财务部49009.15\n\nCW-20260132026-01-13收入14520.7013.001653.76销售部63529.85\nCW-20260142026-01-14支出5280.309.00431.72采购部58249.55\nCW-20260152026-01-15收入8250.506.00451.69市场部66500.05\nCW-20260162026-01-16支出2760.806.00151.38行政部63739.25\nCW-20260172026-01-17收入13680.9013.001572.76销售部77420.15\nCW-20260182026-01-18支出4120.609.00337.59采购部73299.55\nCW-20260192026-01-19收入10560.406.00581.31市场部83859.95\nCW-20260202026-01-20支出1950.406.00106.22财务部81909.55\nCW-20260212026-01-21收入12350.7013.001406.83销售部94260.25\nCW-20260222026-01-22支出3580.909.00289.88行政部90679.35\nCW-20260232026-01-23收入9870.306.00548.19市场部100549.65\nCW-20260242026-01-24支出2360.506.00128.37财务部98189.15\nCW-20260252026-01-25收入15280.6013.001727.54销售部113469.75\nCW-20260262026-01-26支出4890.709.00397.20采购部108579.05\nCW-20260272026-01-27收入8650.806.00470.15市场部117229.85\nCW-20260282026-01-28支出2580.306.00140.54行政部114649.55\n收入11980.5013.001367.70销售部126630.05\nCW-20260292026-01-29\nCW-20260302026-01-30支出3960.609.00321.35采购部122669.45\nCW-20260312026-02-01收入10250.706.00564.26市场部132920.15\nCW-20260322026-02-02支出2120.406.0
0115.47财务部130799.75\nCW-20260332026-02-03收入14860.9013.001683.16销售部145660.65\nCW-20260342026-02-04支出4350.809.00352.87采购部141309.85\nCW-20260352026-02-05收入9320.506.00508.35市场部150630.35\nCW-20260362026-02-06支出2890.606.00157.89行政部147739.75\nCW-20260372026-02-07收入13590.8013.001546.82销售部161330.55\nCW-20260382026-02-08支出3680.709.00297.96行政部157649.85\nCW-20260392026-02-09收入10870.606.00597.40市场部168520.45\nCW-20260402026-02-10支出2450.506.00132.33财务部166069.95\nCW-20260412026-02-11收入12650.9013.001435.97销售部178720.85\nCW-20260422026-02-12支出4190.809.00341.17采购部174529.05\nCW-20260432026-02-13收入9580.706.00522.56市场部184109.75\nCW-20260442026-02-14支出2670.606.00146.08行政部181439.15\n收入15120.8013.001700.56销售部196559.95\nCW- 20260452026-02- 15"}"
}
]
09:49:42 - LiteLLM:INFO: utils.py:3995 -
LiteLLM completion() model= qwen3-235b-a22b; provider = dashscope
2026-06-17 09:49:42,637 INFO 33
LiteLLM completion() model= qwen3-235b-a22b; provider = dashscope
2026-06-17 09:49:42,651 INFO 33 PUT http://192.168.80.139:1200/ragflow_105f6de65ce711f182c1edd34d87cce0/_bulk?refresh=wait_for&timeout=60s [status:200 duration:0.542s]
2026-06-17 09:49:42,716 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: None, progress_msg:
2026-06-17 09:49:42,720 INFO 33 [HISTORY][
{
"role": "system",
"content": "You are a robust Table-of-Contents (TOC) extractor.\n\nGOAL\nGiven a dictionary of chunks {"<chunk_ID>": chunk_text}, extract TOC-like headings and return a strict JSON array of objects:\n[\n {"title": "", "chunk_id": ""},\n ...\n]\n\nFIELDS\n- "title": the heading text (clean, no page numbers or leader dots).\n - If any part of a chunk has no valid heading, output that part as {"title":"-1", ...}.\n- "chunk_id": the chunk ID (string).\n - One chunk can yield multiple JSON objects in order (unmatched text + one or more headings).\n\nRULES\n1) Preserve input chunk order strictly.\n2) If a chunk contains multiple headings, expand them in order:\n - Pre-heading narrative → {"title":"-1","chunk_id":"<chunk_ID>"}\n - Then each heading → {"title":"...","chunk_id":"<chunk_ID>"}\n3) Do not merge outputs across chunks; each object refers to exactly one chunk ID.\n4) "title" must be non-empty (or exactly "-1"). "chunk_id" must be a string (chunk ID).\n5) When ambiguous, prefer "-1" unless the text strongly looks like a heading.\n\nHEADING DETECTION (cues, not hard rules)\n- Appears near line start, short isolated phrase, often followed by content.\n- May contain separators: — —— - : : · •\n- Numbering styles:\n • 第[一二三四五六七八九十百]+(篇|章|节|条)\n • [((]?[一二三四五六七八九十]+[))]?\n • [((]?[①②③④⑤⑥⑦⑧⑨⑩][))]?\n • ^\d+(\.\d+)[)..]?\s\n • ^[IVXLCDM]+[).]\n • ^[A-Z][).]\n- Canonical section cues (general only):\n Common heading indicators include words such as:\n "Overview", "Introduction", "Background", "Purpose", "Scope", "Definition",\n "Method", "Procedure", "Result", "Discussion", "Summary", "Conclusion",\n "Appendix", "Reference", "Annex", "Acknowledgment", "Disclaimer".\n These are soft cues, not strict requirements.\n- Length restriction:\n • Chinese heading: ≤25 characters\n • English heading: ≤80 characters\n- Exclude long narrative sentences, continuous prose, or bullet-style lists → output as "-1".\n\nOUTPUT FORMAT\n- Return ONLY a valid JSON array of {"title","content"} 
objects.\n- No reasoning or commentary.\n\nEXAMPLES\n\nExample 1 — No heading\nInput:\n[{"0": "Copyright page · Publication info (ISBN 123-456). All rights reserved."}, ...]\nOutput:\n[\n {"title":"-1","chunk_id":"0"},\n ...\n]\n\nExample 2 — One heading\nInput:\n[{"1": "Chapter 1: General Provisions This chapter defines the overall rules…"}, ...]\nOutput:\n[\n {"title":"Chapter 1: General Provisions","chunk_id":"1"},\n ...\n]\n\nExample 3 — Narrative + heading\nInput:\n[{"2": "This paragraph introduces the background and goals. Section 2: Definitions Key terms are explained…"}, ...]\nOutput:\n[\n {"title":"Section 2: Definitions","chunk_id":"2"},\n ...\n]\n\nExample 4 — Multiple headings in one chunk\nInput:\n[{"3": "Declarations and Commitments (I) Party B commits… (II) Party C commits… Appendix A Data Specification"}, ...]\nOutput:\n[\n {"title":"Declarations and Commitments","chunk_id":"3"},\n {"title":"(I) Party B commits","chunk_id":"3"},\n {"title":"(II) Party C commits","chunk_id":"3"},\n {"title":"Appendix A Data Specification","chunk_id":"3"},\n ...\n]\n\nExample 5 — Numbering styles\nInput:\n[{"4": "1. Scope: Defines boundaries. 2) Definitions: Terms used. III) Methods Overview."}, ...]\nOutput:\n[\n {"title":"1. Scope","chunk_id":"4"},\n {"title":"2) Definitions","chunk_id":"4"},\n {"title":"III) Methods Overview","chunk_id":"4"},\n ...\n]\n\nExample 6 — Long list (NOT headings)\nInput:\n{"5": "Item list: apples, bananas, strawberries, blueberries, mangos, peaches"}, ...]\nOutput:\n[\n {"title":"-1","chunk_id":"5"},\n ...\n]\n\nExample 7 — Mixed Chinese/English\nInput:\n{"6": "(出版信息略)This standard follows industry practices. Chapter 1: Overview 摘要… 第2节:术语与缩略语"}, ...]\nOutput:\n[\n {"title":"Chapter 1: Overview","chunk_id":"6"},\n {"title":"第2节:术语与缩略语","chunk_id":"6"},\n ...\n]"
},
{
"role": "user",
"content": "OUTPUT FORMAT\n- Return ONLY the JSON array.\n- Use double quotes.\n- No extra commentary.\n- Keep language of "title" the same as the input.\n\nINPUT\n{"1": "-11收入12650.9013.001435.97销售部178720.85\nCW-20260422026-02-12支出4190.809.00341.17采购部174529.05\nCW-20260432026-02-13收入9580.706.00522.56市场部184109.75\nCW-20260442026-02-14支出2670.606.00146.08行政部181439.15\n收入15120.8013.001700.56销售部196559.95\nCW- 20260452026-02- 15\nFAILED TO PARSE TABLE\nFAILED TO PARSE TABLE\nFAILED TO PARSE TABLE\n补充说明:1.表格共45⾏数据,刚好填满两⻚A4 PDF;2.所有数值均为财务常⽤合理数值,间隔清晰,可直接编辑修改;3.适配所有主流PDF编辑软件,下载后即可编辑。\n(注:⽂档部分内容可能由 AI ⽣成)"}"
}
]
09:49:42 - LiteLLM:INFO: utils.py:3995 -
LiteLLM completion() model= qwen3-235b-a22b; provider = dashscope
2026-06-17 09:49:42,722 INFO 33
LiteLLM completion() model= qwen3-235b-a22b; provider = dashscope
2026-06-17 09:49:42,898 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: 0.8500000000000001, progress_msg:
2026-06-17 09:49:42,960 INFO 33 Indexing doc(测试的可编辑表格6.pdf), page(0-4), chunks(2), elapsed: 0.85
2026-06-17 09:49:45,078 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: None, progress_msg: 09:49:43 Page(1~5): Indexing done (0.91s).
2026-06-17 09:49:45,843 INFO 33
Filtered TOC sections:
[{'title': '财务收⽀明细总表 (可编辑)', 'chunk_id': '0'}]
2026-06-17 09:49:45,845 INFO 33 [HISTORY][
{
"role": "system",
"content": "You are given a JSON array of TOC(table of contents) items. Each item has at least {"title": string} and may include an existing title hierarchical level.\n\nTask\n- For each item, assign a depth label using Arabic numerals only: top-level = 1, second-level = 2, third-level = 3, etc.\n- Multiple items may share the same depth (e.g., many 1s, many 2s).\n- Do not use dotted numbering (no 1.1/1.2). Use a single digit string per item indicating its depth only.\n- Preserve the original item order exactly. Do not insert, delete, or reorder.\n- Decide levels yourself to keep a coherent hierarchy. Keep peers at the same depth.\n\nOutput\n- Return a valid JSON array only (no extra text).\n- Each element must be {"level": "1|2|3", "title": }.\n- title must be the original title string.\n\nExamples\n\nExample A (chapters with sections)\nInput:\n["Chapter 1 Methods", "Section 1 Definition", "Section 2 Process", "Chapter 2 Experiment"]\n\nOutput:\n[\n {"level":"1","title":"Chapter 1 Methods"},\n {"level":"2","title":"Section 1 Definition"},\n {"level":"2","title":"Section 2 Process"},\n {"level":"1","title":"Chapter 2 Experiment"}\n]\n\nExample B (parts with chapters)\nInput:\n["Part I Theory", "Chapter 1 Basics", "Chapter 2 Methods", "Part II Applications", "Chapter 3 Case Studies"]\n\nOutput:\n[\n {"level":"1","title":"Part I Theory"},\n {"level":"2","title":"Chapter 1 Basics"},\n {"level":"2","title":"Chapter 2 Methods"},\n {"level":"1","title":"Part II Applications"},\n {"level":"2","title":"Chapter 3 Case Studies"}\n]\n\nExample C (plain headings)\nInput:\n["Introduction", "Background and Motivation", "Related Work", "Methodology", "Evaluation"]\n\nOutput:\n[\n {"level":"1","title":"Introduction"},\n {"level":"2","title":"Background and Motivation"},\n {"level":"2","title":"Related Work"},\n {"level":"1","title":"Methodology"},\n {"level":"1","title":"Evaluation"}\n]"
},
{
"role": "user",
"content": "['财务收⽀明细总表 (可编辑)']"
}
]
09:49:45 - LiteLLM:INFO: utils.py:3995 -
LiteLLM completion() model= qwen3-235b-a22b; provider = dashscope
2026-06-17 09:49:45,847 INFO 33
LiteLLM completion() model= qwen3-235b-a22b; provider = dashscope
2026-06-17 09:49:46,354 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.002s]
2026-06-17 09:49:46,360 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.003s]
[2026-06-17 09:49:46 +0800] [28] [INFO] 127.0.0.1:47896 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 27542 21360
2026-06-17 09:49:46,378 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.001s]
2026-06-17 09:49:46,383 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.003s]
[2026-06-17 09:49:46 +0800] [28] [INFO] 127.0.0.1:47912 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 22553 18257
[2026-06-17 09:49:46 +0800] [28] [INFO] 127.0.0.1:47916 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6 1.1 200 2360 11233
2026-06-17 09:49:46,654 INFO 33 ------------ T O C -------------
[
{
"level": "1",
"title": "财务收⽀明细总表 (可编辑)",
"chunk_id": "0"
}
]
2026-06-17 09:49:47,661 INFO 33 PUT http://192.168.80.139:1200/ragflow_105f6de65ce711f182c1edd34d87cce0/_bulk?refresh=wait_for&timeout=60s [status:200 duration:1.005s]
2026-06-17 09:49:47,804 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: 0.9, progress_msg:
2026-06-17 09:49:48,204 INFO 33 set_progress(c53b3c9a69ee11f1b8b85533fa24cb09), progress: 1.0, progress_msg: 09:49:47 Page(1~5): Task done (19.90s)
2026-06-17 09:49:48,204 INFO 33 Chunk doc(测试的可编辑表格6.pdf), page(0-4), chunks(2), token(2654), elapsed:19.90
2026-06-17 09:49:48,206 INFO 33 handle_task done for task {"id": "c53b3c9a69ee11f1b8b85533fa24cb09", "doc_id": "c1941d8c69ee11f1b8b85533fa24cb09", "from_page": 0, "to_page": 4, "retry_count": 0, "kb_id": "7987ef80660d11f1a066912af183a8a6", "parser_id": "naive", "parser_config": {"table_context_size": 53, "image_context_size": 53, "layout_recognize": "MinerU@MinerU", "chunk_token_num": 512, "delimiter": "\n", "auto_keywords": 3, "auto_questions": 0, "html4excel": false, "topn_tags": 3, "raptor": {"use_raptor": true, "prompt": "Please summarize the following paragraphs. Be careful with the numbers, do not make things up. Paragraphs as following:\n {cluster_content}\nThe above is the content you need to summarize.", "max_token": 256, "threshold": 0.1, "max_cluster": 64, "random_seed": 0, "scope": "file", "ext": {"clustering_method": "gmm", "tree_builder": "raptor"}}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "general", "batch_chunk_token_size": 4096, "retry_attempts": 2, "retry_backoff_seconds": 2.0, "retry_backoff_max_seconds": 60.0, "build_subgraph_timeout_per_chunk_seconds": 300, "build_subgraph_min_timeout_seconds": 600, "merge_timeout_seconds": 180, "resolution_timeout_seconds": 1800, "community_timeout_seconds": 1800, "lock_acquire_timeout_seconds": 600, "resolution": false}, "parent_child": {"use_parent_child": false, "children_delimiter": "\n"}, "children_delimiter": "", "llm_id": "qwen3-235b-a22b@Tongyi-Qianwen", "enable_children": false, "toc_extraction": true, "image_table_context_window": 53, "overlapped_percent": 0.1, "mineru_parse_method": "auto", "mineru_formula_enable": false, "mineru_table_enable": true, "mineru_lang": "Chinese", "metadata": {"type": "object", "properties": {}, "additionalProperties": false}, "built_in_metadata": [], "enable_metadata": false, "tenant_llm_id": 149}, "name": "\u6d4b\u8bd5\u7684\u53ef\u7f16\u8f91\u8868\u683c6.pdf", "type": "pdf", "location": 
"\u6d4b\u8bd5\u7684\u53ef\u7f16\u8f91\u8868\u683c6.pdf", "size": 202145, "tenant_id": "105f6de65ce711f182c1edd34d87cce0", "language": "Chinese", "embd_id": "text-embedding-v4@Tongyi-Qianwen", "pagerank": 0, "kb_parser_config": {"table_context_size": 53, "image_context_size": 53, "layout_recognize": "MinerU@MinerU", "chunk_token_num": 512, "delimiter": "\n", "auto_keywords": 3, "auto_questions": 0, "html4excel": false, "topn_tags": 3, "raptor": {"use_raptor": true, "prompt": "Please summarize the following paragraphs. Be careful with the numbers, do not make things up. Paragraphs as following:\n {cluster_content}\nThe above is the content you need to summarize.", "max_token": 256, "threshold": 0.1, "max_cluster": 64, "random_seed": 0, "scope": "file", "ext": {"clustering_method": "gmm", "tree_builder": "raptor"}}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "general", "batch_chunk_token_size": 4096, "retry_attempts": 2, "retry_backoff_seconds": 2.0, "retry_backoff_max_seconds": 60.0, "build_subgraph_timeout_per_chunk_seconds": 300, "build_subgraph_min_timeout_seconds": 600, "merge_timeout_seconds": 180, "resolution_timeout_seconds": 1800, "community_timeout_seconds": 1800, "lock_acquire_timeout_seconds": 600, "resolution": false}, "parent_child": {"use_parent_child": false, "children_delimiter": "\n"}, "children_delimiter": "", "llm_id": "qwen3-235b-a22b@Tongyi-Qianwen", "enable_children": false, "toc_extraction": true, "image_table_context_window": 53, "overlapped_percent": 0.1, "mineru_parse_method": "auto", "mineru_formula_enable": false, "mineru_table_enable": true, "mineru_lang": "Chinese", "metadata": {"type": "object", "properties": {}, "additionalProperties": false}, "built_in_metadata": [], "enable_metadata": false, "tenant_llm_id": 149}, "img2txt_id": "qwen3.5-plus@Tongyi-Qianwen", "asr_id": "qwen3-asr-flash@Tongyi-Qianwen", "llm_id": "qwen3-235b-a22b@Tongyi-Qianwen", "update_time": 
1781660965892, "task_type": ""}
2026-06-17 09:49:51,397 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.003s]
2026-06-17 09:49:51,405 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.004s]
[2026-06-17 09:49:51 +0800] [28] [INFO] 127.0.0.1:51920 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 27768 24894
2026-06-17 09:49:51,424 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.001s]
2026-06-17 09:49:51,429 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.003s]
[2026-06-17 09:49:51 +0800] [28] [INFO] 127.0.0.1:51932 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 22547 18080
[2026-06-17 09:49:51 +0800] [28] [INFO] 127.0.0.1:51946 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6 1.1 200 2360 11365
2026-06-17 09:49:53,136 INFO 28 HEAD http://192.168.80.139:1200/ragflow_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.002s]
2026-06-17 09:49:53,144 INFO 28 POST http://192.168.80.139:1200/ragflow_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.005s]
[2026-06-17 09:49:53 +0800] [28] [INFO] 127.0.0.1:51960 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents/c1941d8c69ee11f1b8b85533fa24cb09/chunks 1.1 200 10161 21493
[2026-06-17 09:49:53 +0800] [28] [INFO] 127.0.0.1:51966 GET /api/v1/documents/c1941d8c69ee11f1b8b85533fa24cb09/preview 1.1 200 202145 11730
[2026-06-17 09:49:53 +0800] [28] [INFO] 127.0.0.1:51982 GET /api/v1/documents/images/7987ef80660d11f1a066912af183a8a6-089da1545a7357fa 1.1 200 67116 3458
[2026-06-17 09:49:53 +0800] [28] [INFO] 127.0.0.1:51984 GET /api/v1/documents/images/7987ef80660d11f1a066912af183a8a6-7e7918716d33bf39 1.1 200 16313 5142
[2026-06-17 09:49:53 +0800] [28] [INFO] 127.0.0.1:51994 GET /api/v1/documents/c1941d8c69ee11f1b8b85533fa24cb09/preview 1.1 200 202145 11940
2026-06-17 09:49:56,545 INFO 33 task_executor_28e888509d40_0 reported heartbeat: {"ip_address": "172.18.0.2", "pid": 33, "name": "task_executor_28e888509d40_0", "now": "2026-06-17T09:49:56.544+08:00", "boot_at": "2026-06-17T09:41:56.010+08:00", "pending": 0, "lag": 0, "done": 1, "failed": 0, "current": {}}
2026-06-17 09:50:26,578 INFO 33 task_executor_28e888509d40_0 reported heartbeat: {"ip_address": "172.18.0.2", "pid": 33, "name": "task_executor_28e888509d40_0", "now": "2026-06-17T09:50:26.576+08:00", "boot_at": "2026-06-17T09:41:56.010+08:00", "pending": 0, "lag": 0, "done": 1, "failed": 0, "current": {}}
2026-06-17 09:50:56,586 INFO 33 task_executor_28e888509d40_0 reported heartbeat: {"ip_address": "172.18.0.2", "pid": 33, "name": "task_executor_28e888509d40_0", "now": "2026-06-17T09:50:56.584+08:00", "boot_at": "2026-06-17T09:41:56.010+08:00", "pending": 0, "lag": 0, "done": 1, "failed": 0, "current": {}}
2026-06-17 09:51:26,593 INFO 33 task_executor_28e888509d40_0 reported heartbeat: {"ip_address": "172.18.0.2", "pid": 33, "name": "task_executor_28e888509d40_0", "now": "2026-06-17T09:51:26.591+08:00", "boot_at": "2026-06-17T09:41:56.010+08:00", "pending": 0, "lag": 0, "done": 1, "failed": 0, "current": {}}
2026-06-17 09:51:56,598 INFO 33 task_executor_28e888509d40_0 reported heartbeat: {"ip_address": "172.18.0.2", "pid": 33, "name": "task_executor_28e888509d40_0", "now": "2026-06-17T09:51:56.596+08:00", "boot_at": "2026-06-17T09:41:56.010+08:00", "pending": 0, "lag": 0, "done": 1, "failed": 0, "current": {}}
2026-06-17 09:52:26,606 INFO 33 task_executor_28e888509d40_0 reported heartbeat: {"ip_address": "172.18.0.2", "pid": 33, "name": "task_executor_28e888509d40_0", "now": "2026-06-17T09:52:26.605+08:00", "boot_at": "2026-06-17T09:41:56.010+08:00", "pending": 0, "lag": 0, "done": 1, "failed": 0, "current": {}}
2026-06-17 09:52:56,612 INFO 33 task_executor_28e888509d40_0 reported heartbeat: {"ip_address": "172.18.0.2", "pid": 33, "name": "task_executor_28e888509d40_0", "now": "2026-06-17T09:52:56.611+08:00", "boot_at": "2026-06-17T09:41:56.010+08:00", "pending": 0, "lag": 0, "done": 1, "failed": 0, "current": {}}
2026-06-17 09:53:26,619 INFO 33 task_executor_28e888509d40_0 reported heartbeat: {"ip_address": "172.18.0.2", "pid": 33, "name": "task_executor_28e888509d40_0", "now": "2026-06-17T09:53:26.617+08:00", "boot_at": "2026-06-17T09:41:56.010+08:00", "pending": 0, "lag": 0, "done": 1, "failed": 0, "current": {}}
2026-06-17 09:53:46,579 INFO 28 HEAD http://192.168.80.139:1200/ragflow_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.002s]
2026-06-17 09:53:46,589 INFO 28 POST http://192.168.80.139:1200/ragflow_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.007s]
2026-06-17 09:53:46,602 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.001s]
2026-06-17 09:53:46,610 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.006s]
2026-06-17 09:53:46,624 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.001s]
2026-06-17 09:53:46,631 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.005s]
[2026-06-17 09:53:46 +0800] [28] [INFO] 127.0.0.1:37486 GET /api/v1/users/me 1.1 200 521 89727
[2026-06-17 09:53:46 +0800] [28] [INFO] 127.0.0.1:37498 GET /api/v1/users/me/models 1.1 200 592 90156
[2026-06-17 09:53:46 +0800] [28] [INFO] 127.0.0.1:37516 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/graph 1.1 200 45 90480
[2026-06-17 09:53:46 +0800] [28] [INFO] 127.0.0.1:37506 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 27768 90802
[2026-06-17 09:53:46 +0800] [28] [INFO] 127.0.0.1:37520 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 22547 91128
[2026-06-17 09:53:46 +0800] [28] [INFO] 127.0.0.1:37530 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6 1.1 200 2360 91651
2026-06-17 09:53:46,657 WARNING 28 Database connection issue (attempt 1/5): (0, '')
2026-06-17 09:53:46,673 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.002s]
2026-06-17 09:53:46,678 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.003s]
[2026-06-17 09:53:46 +0800] [28] [INFO] 127.0.0.1:37546 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 22547 22205
[2026-06-17 09:53:47 +0800] [28] [INFO] 127.0.0.1:37544 GET /api/v1/tenants 1.1 200 241 1010452
[2026-06-17 09:53:51 +0800] [28] [INFO] 127.0.0.1:48994 GET /api/v1/datasets 1.1 200 26874 26067
[2026-06-17 09:53:51 +0800] [28] [INFO] 127.0.0.1:48998 GET /api/v1/datasets 1.1 200 26874 12789
[2026-06-17 09:53:51 +0800] [28] [INFO] 127.0.0.1:49000 GET /api/v1/users/me 1.1 200 521 5507
[2026-06-17 09:53:52 +0800] [28] [INFO] 127.0.0.1:49012 GET /api/v1/users/me 1.1 200 521 77409
[2026-06-17 09:53:52 +0800] [28] [INFO] 127.0.0.1:49020 GET /api/v1/users/me/models 1.1 200 592 77784
[2026-06-17 09:53:52 +0800] [28] [INFO] 127.0.0.1:49034 GET /api/v1/datasets 1.1 200 26874 78301
[2026-06-17 09:53:52 +0800] [28] [INFO] 127.0.0.1:49042 GET /v1/llm/list 1.1 200 19373 78466
[2026-06-17 09:53:52 +0800] [28] [INFO] 127.0.0.1:49006 GET /api/v1/chats 1.1 200 13787 92350
[2026-06-17 09:53:52 +0800] [28] [INFO] 127.0.0.1:49022 GET /api/v1/tenants 1.1 200 242 79870
[2026-06-17 09:53:52 +0800] [28] [INFO] 127.0.0.1:49050 GET /api/v1/users/me 1.1 200 521 5444
[2026-06-17 09:53:53 +0800] [28] [INFO] 127.0.0.1:49052 GET /api/v1/users/me 1.1 200 521 13080
[2026-06-17 09:53:53 +0800] [28] [INFO] 127.0.0.1:49066 GET /api/v1/users/me/models 1.1 200 592 15937
[2026-06-17 09:53:53 +0800] [28] [INFO] 127.0.0.1:49076 GET /api/v1/tenants 1.1 200 242 8379
[2026-06-17 09:53:53 +0800] [28] [INFO] 127.0.0.1:49090 GET /api/v1/system/version 1.1 200 48 1286
[2026-06-17 09:53:53 +0800] [28] [INFO] 127.0.0.1:49092 GET /api/v1/connectors 1.1 200 41 7631
2026-06-17 09:53:56,625 INFO 33 task_executor_28e888509d40_0 reported heartbeat: {"ip_address": "172.18.0.2", "pid": 33, "name": "task_executor_28e888509d40_0", "now": "2026-06-17T09:53:56.623+08:00", "boot_at": "2026-06-17T09:41:56.010+08:00", "pending": 0, "lag": 0, "done": 1, "failed": 0, "current": {}}
2026-06-17 09:54:26,744 INFO 33 task_executor_28e888509d40_0 reported heartbeat: {"ip_address": "172.18.0.2", "pid": 33, "name": "task_executor_28e888509d40_0", "now": "2026-06-17T09:54:26.742+08:00", "boot_at": "2026-06-17T09:41:56.010+08:00", "pending": 0, "lag": 0, "done": 1, "failed": 0, "current": {}}
2026-06-17 09:54:56,751 INFO 33 task_executor_28e888509d40_0 reported heartbeat: {"ip_address": "172.18.0.2", "pid": 33, "name": "task_executor_28e888509d40_0", "now": "2026-06-17T09:54:56.749+08:00", "boot_at": "2026-06-17T09:41:56.010+08:00", "pending": 0, "lag": 0, "done": 1, "failed": 0, "current": {}}
[2026-06-17 09:55:01 +0800] [28] [INFO] 127.0.0.1:45090 GET /api/v1/datasets 1.1 200 26874 25549
[2026-06-17 09:55:01 +0800] [28] [INFO] 127.0.0.1:45096 GET /api/v1/datasets 1.1 200 26874 25947
[2026-06-17 09:55:02 +0800] [28] [INFO] 127.0.0.1:45102 GET /api/v1/users/me 1.1 200 521 5576
2026-06-17 09:55:03,674 INFO 28 HEAD http://192.168.80.139:1200/ragflow_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.002s]
2026-06-17 09:55:03,679 INFO 28 POST http://192.168.80.139:1200/ragflow_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.002s]
2026-06-17 09:55:03,893 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.001s]
2026-06-17 09:55:03,898 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.002s]
[2026-06-17 09:55:03 +0800] [28] [INFO] 127.0.0.1:45106 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/graph 1.1 200 45 235751
[2026-06-17 09:55:03 +0800] [28] [INFO] 127.0.0.1:45120 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 27768 236237
2026-06-17 09:55:03,921 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.001s]
2026-06-17 09:55:03,926 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.002s]
[2026-06-17 09:55:03 +0800] [28] [INFO] 127.0.0.1:45130 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6 1.1 200 2360 26975
[2026-06-17 09:55:03 +0800] [28] [INFO] 127.0.0.1:45146 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 22547 27291
2026-06-17 09:55:03,941 INFO 28 HEAD http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0 [status:200 duration:0.001s]
2026-06-17 09:55:03,945 INFO 28 POST http://192.168.80.139:1200/ragflow_doc_meta_105f6de65ce711f182c1edd34d87cce0/_search [status:200 duration:0.002s]
[2026-06-17 09:55:03 +0800] [28] [INFO] 127.0.0.1:45154 GET /api/v1/datasets/7987ef80660d11f1a066912af183a8a6/documents 1.1 200 22547 16495