fix: More exhaustive athena types#6544
Conversation
6e8d53c to
c14ecff
Compare
|
|
||
| type_map = { | ||
| "null": ValueType.UNKNOWN, | ||
| "null": ValueType.NULL, |
There was a problem hiding this comment.
I wonder though if we should keep this for backwards compat?
Signed-off-by: Marcus Rosti <marcus.rosti@baton.io> fix: Double quotes and type Signed-off-by: Marcus Rosti <marcus.rosti@baton.io>
c14ecff to
ead51c0
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6c4dc1d85e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return list_mapping.get(inner_feast_type, ValueType.VALUE_LIST) | ||
| return ValueType.VALUE_LIST | ||
|
|
||
| base_type = re.split(r"[(<\[]", athena_type)[0].strip() |
There was a problem hiding this comment.
Preserve non-string Athena map keys
When Athena returns a map type with a non-string key, e.g. MAP<INT, STRING> or MAP(INTEGER, VARCHAR), this split drops the key type and the existing map entry below infers ValueType.MAP. That path serializes through Feast's string-keyed Map proto, so materializing or inferring features with integer/boolean/etc. map keys will fail during conversion or be represented with the wrong schema instead of being rejected or handled with the non-string-key map path.
Useful? React with 👍 / 👎.
| ValueType.UUID: ValueType.UUID_LIST, | ||
| ValueType.DECIMAL: ValueType.DECIMAL_LIST, | ||
| } | ||
| return list_mapping.get(inner_feast_type, ValueType.VALUE_LIST) |
There was a problem hiding this comment.
Keep nested array element types
For Athena columns like ARRAY<ARRAY<INT>>, the recursive call returns ValueType.INT32_LIST, which is not in list_mapping, so this fallback classifies the column as generic VALUE_LIST. The generic path is later converted to placeholder string nested lists (Array(Array(String)) in from_value_type, and the Athena type string cannot be parsed by the PyArrow fallback), so feature inference/materialization will use the wrong schema for nested numeric, boolean, or timestamp arrays.
Useful? React with 👍 / 👎.
What this PR does / why we need it:
There's a lot of uncovered athena types -- so I tried to include them all
Which issue(s) this PR fixes:
Checks
git commit -s)Testing Strategy
Misc