[iceberg] Support ARRAY, MAP, ROW type conversions in IcebergConversions #8245
weibangpeng wants to merge 2 commits into
        (Timestamp) value, ((LocalZonedTimestampType) type).getPrecision());
    case TIME_WITHOUT_TIME_ZONE:
        return timeToByteBuffer((Integer) value, ((TimeType) type).getPrecision());
    case ARRAY:
This writes a Paimon-private encoding into Iceberg lower/upper bounds for nested types. Iceberg's binary single-value serialization only has implementations for primitive values (Iceberg Conversions throws for list/map/struct), so an Iceberg reader that sees these bounds will try to decode them as Iceberg metadata and fail or misinterpret them. If we cannot encode these values using Iceberg's standard format, we should omit lower/upper bounds for ARRAY/MAP/ROW instead of serializing a custom round-trip format here.
… encoding
Iceberg binary single-value serialization only defines formats for primitive types. Writing Paimon-private encoding for ARRAY/MAP/ROW would break standard Iceberg readers. Omit bounds for complex types in IcebergDataFileMeta.create() instead.
weibangpeng left a comment
Thanks for the review @JingsongLi! You're right that writing Paimon-private encoding into Iceberg bounds would break standard Iceberg readers.
I've revised the approach: instead of adding custom complex type conversions to IcebergConversions, I now skip lower/upper bounds for ARRAY, MAP, ROW, and MULTISET types in IcebergDataFileMeta.create(). Fields of these types simply won't have bounds in the manifest, and Iceberg readers will treat them as missing stats, which the Iceberg spec already allows.
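For anyone following along, here is a rough, self-contained sketch of the skip-bounds idea described above. It is not the code in this PR: the class BoundsSketch, the TypeRoot enum, and the methods supportsBounds, toByteBuffer, and lowerBounds are all hypothetical stand-ins, and only INTEGER and BIGINT are encoded (as little-endian bytes, which is how Iceberg's single-value serialization stores int and long).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names come from Paimon or Iceberg.
final class BoundsSketch {

    // Stand-in for the type roots mentioned in this thread.
    public enum TypeRoot { INTEGER, BIGINT, ARRAY, MAP, ROW, MULTISET }

    // Types with no Iceberg single-value encoding, so no bounds are written.
    private static final Set<TypeRoot> COMPLEX =
            EnumSet.of(TypeRoot.ARRAY, TypeRoot.MAP, TypeRoot.ROW, TypeRoot.MULTISET);

    public static boolean supportsBounds(TypeRoot root) {
        return !COMPLEX.contains(root);
    }

    // Encodes a primitive value; int is 4 bytes and long is 8 bytes, little-endian.
    public static ByteBuffer toByteBuffer(TypeRoot root, Object value) {
        switch (root) {
            case INTEGER:
                return ByteBuffer.allocate(4)
                        .order(ByteOrder.LITTLE_ENDIAN)
                        .putInt(0, (Integer) value);
            case BIGINT:
                return ByteBuffer.allocate(8)
                        .order(ByteOrder.LITTLE_ENDIAN)
                        .putLong(0, (Long) value);
            default:
                throw new UnsupportedOperationException("No bound encoding for " + root);
        }
    }

    // Builds the lower-bounds map keyed by field id, omitting complex-typed fields
    // and fields without a min value instead of inventing an encoding for them.
    public static Map<Integer, ByteBuffer> lowerBounds(
            Map<Integer, TypeRoot> fieldTypes, Map<Integer, Object> minValues) {
        Map<Integer, ByteBuffer> bounds = new LinkedHashMap<>();
        for (Map.Entry<Integer, TypeRoot> field : fieldTypes.entrySet()) {
            Object min = minValues.get(field.getKey());
            if (min == null || !supportsBounds(field.getValue())) {
                continue; // reader sees this field as having no stats
            }
            bounds.put(field.getKey(), toByteBuffer(field.getValue(), min));
        }
        return bounds;
    }

    public static void main(String[] args) {
        Map<Integer, TypeRoot> types = new LinkedHashMap<>();
        types.put(1, TypeRoot.INTEGER);
        types.put(2, TypeRoot.ARRAY);
        Map<Integer, Object> mins = new LinkedHashMap<>();
        mins.put(1, 7);
        mins.put(2, new int[] {1, 2});
        // Only field 1 ends up with a bound; field 2 is skipped.
        System.out.println(lowerBounds(types, mins).keySet());
    }
}
```

The filter sits in front of the encoder on purpose: the encoder still throws for anything it cannot represent in the standard format, so a newly added type fails loudly rather than silently writing a private encoding into the manifest.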
Summary
IcebergConversions
Test plan
IcebergConversionsComplexTypeTest with 17 tests covering:
Closes #3788