Handle nulls in type coercion of higher-order UDFs, map_extract, spark repeat#23071
Handle nulls in type coercion of higher-order UDFs, map_extract, spark repeat#23071gstvg wants to merge 3 commits into
Conversation
…tract type coercion
| fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> { | ||
| let [map_type, _] = take_function_args(self.name(), arg_types)?; | ||
|
|
||
| if map_type.is_null() { |
There was a problem hiding this comment.
Null is accepted in coercion, but return_type still calls get_map_entry_field on the same Null type. So
map_extract(NULL, 'a') still fails with an internal error.
We should also add a test for this.
There was a problem hiding this comment.
You are right, thank you. The only tested function array_filter return_type returned the first arg type without any checks, so it obviously passed 63ab393 Note that I also implemented null handling during execution
kumarUjjawal
left a comment
There was a problem hiding this comment.
Overall good just need few tidy ups in test.
There was a problem hiding this comment.
The code change teaches the Spark array-repeat function to accept a null count. But the test calls the two-argument repeat form, and in this test context that name resolves to the core string-repeat function, which already returned null for a null count long before this PR.
Add the null-count case to the file that actually tests the array-repeat function.
| Ok(()) | ||
| } | ||
|
|
||
| #[tokio::test] |
There was a problem hiding this comment.
The test builds the column expression and asserts that construction doesn't error. That does cover the issue, which is good. But the issue's real goal is a correct end-to-end result, and the test never resolves the lambda variables, never builds a physical plan, and never executes, so it can't catch a regression where planning succeeds but the resolved type or the executed output is wrong.
Which issue does this PR close?
Rationale for this change
Unresolved lambda variables and unspecified placeholders report a
DataType::Nulldata type, which causes an error when used as argument of a function that doesn't handle nulls during type coercionWhat changes are included in this PR?
Handle null in
coerce_single_list_argfor higher-order functions,map_extractand in sparkrepeat. All other functions (scalar, window, agg and higher-order) that uses user defined type coercion have been checked to currently handle null args.Are these changes tested?
Test passing
array_filterwith an unresolved lambda variable toDataframe::with_column(which indirectly callsvalue_fields_with_higher_order_udf_and_lambdascoercion viaProjection::try_new,projection_schema,exprlist_to_fields,Expr::to_field)Are there any user-facing changes?
No