fix(devkit): sort inputs by line_number in load_inputs_and_outputs #4171
devteamaegis wants to merge 1 commit into
get_details() merges inputs and outputs positionally: it calls
.to_dict("list") on both DataFrames and then constructs a combined
DataFrame from the parallel lists.
_outputs_padding() already sorts outputs by line_number ascending, but
the inputs DataFrame was left in whatever order the JSON file was written
— which depends on async executor completion order. If inputs arrived as
[line 2, line 0, line 1] and outputs were [line 0, line 1, line 2], the
merged DataFrame incorrectly paired:
- "third query" → "ans0" (wrong: ans0 belongs to line 0 = "first query")
The fix sorts inputs by line_number inside load_inputs_and_outputs()
before returning, immediately after outputs are sorted, so both
DataFrames are in the same ascending order when get_details() merges them.
Fixes microsoft#2646
@devteamaegis please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.
Contributor License Agreement
Contribution License Agreement
This Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
854f028 to d498d10
Problem
PFClient.runs.get_details() returned dataframe rows in inconsistent order, causing input rows to be paired with wrong output rows.
Root cause in LocalStorageOperations.load_inputs_and_outputs(): _outputs_padding() already sorts the outputs DataFrame by line_number ascending (line 504), but the inputs DataFrame was left in whatever order the SDK inputs JSON file was written, which depends on async executor completion order. get_details() then merges inputs and outputs positionally via .to_dict("list") + DataFrame(data), so mismatched orderings cause silently wrong row pairings.
Fixes #2646.
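To make the failure mode concrete, here is a minimal pandas sketch of a positional merge. It is not the SDK's code: merge_positionally, the query and answer column names, and the inputs./outputs. prefixes are assumptions made up for illustration. Only line_number, .to_dict("list"), and the example ordering come from the description above.

```python
import pandas as pd

# Hypothetical stand-ins for the two frames; "query" and "answer" are
# illustrative column names, not taken from the SDK.
inputs = pd.DataFrame({
    "line_number": [2, 0, 1],  # order in which the async executor finished
    "query": ["third query", "first query", "second query"],
})
outputs = pd.DataFrame({
    "line_number": [0, 1, 2],  # already sorted ascending
    "answer": ["ans0", "ans1", "ans2"],
})


def merge_positionally(inputs: pd.DataFrame, outputs: pd.DataFrame) -> pd.DataFrame:
    """Combine two frames column by column, relying only on row position."""
    data = {}
    for name, values in inputs.to_dict("list").items():
        data[f"inputs.{name}"] = values
    for name, values in outputs.to_dict("list").items():
        data[f"outputs.{name}"] = values
    return pd.DataFrame(data)


# Without sorting, row 0 pairs "third query" with "ans0".
broken = merge_positionally(inputs, outputs)

# Sorting inputs by line_number first puts both frames in the same order.
sorted_inputs = inputs.sort_values("line_number").reset_index(drop=True)
fixed = merge_positionally(sorted_inputs, outputs)

print(broken)
print(fixed)
```

Nothing in a positional merge checks that the line_number columns agree, which is why the mismatch goes unnoticed unless both frames share an ordering beforehand.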
Fix
Sort inputs by line_number ascending inside load_inputs_and_outputs(), immediately after outputs are sorted and indexed, so both DataFrames are in the same order when get_details() merges them.
Tests
Added test_load_inputs_and_outputs_sorts_inputs_by_line_number to tests/sdk_cli_test/unittests/test_local_storage_operations.py:
- Inputs: [line 2, line 0, line 1]
- Outputs: [line 0, line 1, line 2]
- load_inputs_and_outputs() returns inputs sorted as ["first", "second", "third"]
The test fails on main and passes with this fix.
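For readers without the repository checked out, a regression test of this shape might look like the sketch below. It does not call LocalStorageOperations: sort_by_line_number is a hypothetical stand-in for the sorting step, and the query column name is an assumption.

```python
import pandas as pd

LINE_NUMBER = "line_number"  # column name as used in the PR text


def sort_by_line_number(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df ordered by ascending line_number."""
    # mergesort is stable, so rows with equal keys keep their relative order
    return df.sort_values(by=LINE_NUMBER, kind="mergesort").reset_index(drop=True)


def test_inputs_sorted_by_line_number():
    # Inputs arrive in completion order: line 2, line 0, line 1
    inputs = pd.DataFrame({
        LINE_NUMBER: [2, 0, 1],
        "query": ["third", "first", "second"],
    })
    # Outputs are already in ascending order
    outputs = pd.DataFrame({LINE_NUMBER: [0, 1, 2]})

    sorted_inputs = sort_by_line_number(inputs)

    assert sorted_inputs["query"].tolist() == ["first", "second", "third"]
    assert sorted_inputs[LINE_NUMBER].tolist() == outputs[LINE_NUMBER].tolist()
```

Running it with pytest, or calling test_inputs_sorted_by_line_number() directly, exercises only the ordering invariant and not the file loading done by the real method.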