[iceberg] Support ARRAY, MAP, ROW type conversions in IcebergConversions by weibangpeng · Pull Request #8245 · apache/paimon

weibangpeng · 2026-06-16T02:49:53Z

Summary

- Added ARRAY, MAP, and ROW type serialization and deserialization support to IcebergConversions.
- Each element or field within a complex type is prefixed with a 4-byte little-endian length, so variable-length children can be decoded correctly.
- Null elements are represented with a length of -1.
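For readers following the discussion, here is a minimal sketch of the length-prefix idea described above, restricted to a list of strings. It is not the code from this PR: the class and method names are hypothetical, and the PR text does not say whether an element count or any other header precedes the elements, so none is written here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of IcebergConversions.
final class LengthPrefixedSketch {

    // Writes each element as [4-byte LE length][UTF-8 payload]; null is written as length -1.
    static byte[] encode(List<String> elements) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String element : elements) {
            byte[] payload = element == null ? null : element.getBytes(StandardCharsets.UTF_8);
            ByteBuffer len = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
            len.putInt(payload == null ? -1 : payload.length);
            out.write(len.array(), 0, 4);
            if (payload != null) {
                out.write(payload, 0, payload.length);
            }
        }
        return out.toByteArray();
    }

    // Reads elements back until the buffer is exhausted; a negative length decodes to null.
    static List<String> decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        List<String> result = new ArrayList<>();
        while (buf.hasRemaining()) {
            int len = buf.getInt();
            if (len < 0) {
                result.add(null);
                continue;
            }
            byte[] payload = new byte[len];
            buf.get(payload);
            result.add(new String(payload, StandardCharsets.UTF_8));
        }
        return result;
    }
}
```

The prefix is what lets a decoder step over variable-length children without knowing their type-specific width, and -1 can never collide with a real length, which is why it can mark null.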

Test plan

Added IcebergConversionsComplexTypeTest with 17 tests covering:
- ARRAY: primitive types, strings, empty arrays, null elements, nested arrays, decimals, timestamps
- MAP: int→string, empty maps, null values, array values
- ROW: mixed field types, null fields, nested rows, fields with array and map
- Mixed nesting: array of rows, map with array values, row with array and map fields
All 67 existing IcebergConversions tests continue to pass (no regressions)
IcebergCompatibilityTest (19 tests) continues to pass

JingsongLi · 2026-06-16T09:00:36Z

                        (Timestamp) value, ((LocalZonedTimestampType) type).getPrecision());
            case TIME_WITHOUT_TIME_ZONE:
                return timeToByteBuffer((Integer) value, ((TimeType) type).getPrecision());
+            case ARRAY:


This writes a Paimon-private encoding into Iceberg lower/upper bounds for nested types. Iceberg's binary single-value serialization only has implementations for primitive values (Iceberg Conversions throws for list/map/struct), so an Iceberg reader that sees these bounds will try to decode them as Iceberg metadata and fail or misinterpret them. If we cannot encode these values using Iceberg's standard format, we should omit lower/upper bounds for ARRAY/MAP/ROW instead of serializing a custom round-trip format here.
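To make the concern concrete, the following is a rough sketch of what single-value serialization looks like for a few primitives, as I understand the Iceberg spec: an int is 4 bytes little-endian, a long is 8 bytes little-endian, and a string is its UTF-8 bytes with no length prefix. The `Kind` enum and `toByteBuffer` method are hypothetical stand-ins written for this comment, not Iceberg's or Paimon's actual API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; mirrors the shape of the behavior, not a real library class.
final class SingleValueSketch {

    enum Kind { INT, LONG, STRING, LIST, MAP, STRUCT }

    static ByteBuffer toByteBuffer(Kind kind, Object value) {
        switch (kind) {
            case INT:
                // 4 bytes, little-endian
                return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(0, (Integer) value);
            case LONG:
                // 8 bytes, little-endian
                return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(0, (Long) value);
            case STRING:
                // UTF-8 bytes, no length prefix
                return ByteBuffer.wrap(((String) value).getBytes(StandardCharsets.UTF_8));
            default:
                // Nested types have no single-value format in this sketch, so refuse rather than invent one.
                throw new UnsupportedOperationException("No single-value format for " + kind);
        }
    }
}
```

A reader that only knows these primitive layouts has no way to recognize a private nested layout, which is the interoperability risk raised above.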

… encoding Iceberg binary single-value serialization only defines formats for primitive types. Writing Paimon-private encoding for ARRAY/MAP/ROW would break standard Iceberg readers. Omit bounds for complex types in IcebergDataFileMeta.create() instead.

weibangpeng

Thanks for the review @JingsongLi! You're right that writing Paimon-private encoding into Iceberg bounds would break standard Iceberg readers.

I've revised the approach: instead of adding custom complex type conversions to IcebergConversions, I now skip lower/upper bounds for ARRAY, MAP, ROW, and MULTISET types in IcebergDataFileMeta.create(). This way, complex type fields simply won't have bounds in the manifest, and Iceberg readers will treat them as missing stats (which is already a valid case for the Iceberg spec).
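As a rough sketch of the revised approach (not the actual change to IcebergDataFileMeta.create()), the idea is to filter out complex-typed fields before bounds are written. The `TypeRoot` enum, the field-id maps, and both method names below are hypothetical and only stand in for Paimon's real types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; names do not correspond to Paimon classes.
final class BoundsFilterSketch {

    enum TypeRoot { INTEGER, VARCHAR, ARRAY, MAP, ROW, MULTISET }

    // Complex types get no lower/upper bounds; everything else keeps them.
    static boolean supportsBounds(TypeRoot root) {
        switch (root) {
            case ARRAY:
            case MAP:
            case ROW:
            case MULTISET:
                return false;
            default:
                return true;
        }
    }

    // Returns only the bounds whose field type is known and supports bounds.
    static Map<Integer, byte[]> keepSupportedBounds(
            Map<Integer, TypeRoot> fieldTypes, Map<Integer, byte[]> bounds) {
        Map<Integer, byte[]> kept = new LinkedHashMap<>();
        for (Map.Entry<Integer, byte[]> entry : bounds.entrySet()) {
            TypeRoot root = fieldTypes.get(entry.getKey());
            if (root != null && supportsBounds(root)) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }
}
```

With this shape, a field of a complex type is simply absent from the bounds map, which a reader sees as missing statistics rather than as bytes it cannot decode.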

Commit 6ff503a: [iceberg] Support ARRAY, MAP, ROW type conversions in IcebergConversions (apache#3788)

weibangpeng mentioned this pull request Jun 16, 2026

[Feature][core] Implement more type conversions for IcebergConversions #3788

Open


JingsongLi reviewed Jun 16, 2026

weibangpeng commented Jun 17, 2026

weibangpeng wants to merge 2 commits into apache:master from weibangpeng:iceberg-complex-type-conversions
