
[python] Support reading bitmap global index #8289

Merged: JingsongLi merged 1 commit into apache:master from JingsongLi:codex/bitmap-python-read on Jun 19, 2026


@JingsongLi

Contributor

Summary

Add Python-side support for reading Java-written bitmap global index files and cover the Java/Python interoperability path.

Changes

  • Add a Python bitmap global index reader that parses bitmap index footers, dictionary blocks, block trailers, and roaring bitmaps.
  • Wire bitmap index dispatch into GlobalIndexScanner.
  • Add mixed Java/Python tests for Java bitmap index writes and Python index scans.
  • Include the bitmap index flow in run_mixed_tests.sh.
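As background for the reader being added, a bitmap index conceptually maps each distinct column value to the set of row ids holding that value, so equality and IN predicates become lookups and unions. The sketch below is not the code from this PR: it uses plain Python sets as a stand-in for roaring bitmaps, ignores the on-disk footer/dictionary/trailer layout entirely, and all function names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set


def build_bitmap_index(values: Iterable[Hashable]) -> Dict[Hashable, Set[int]]:
    """Map each distinct value to the set of row ids where it occurs.

    Plain sets stand in for roaring bitmaps (an assumption for illustration).
    """
    index: Dict[Hashable, Set[int]] = {}
    for row_id, value in enumerate(values):
        index.setdefault(value, set()).add(row_id)
    return index


def rows_equal(index: Dict[Hashable, Set[int]], value: Hashable) -> Set[int]:
    """Row ids matching `column = value`; empty if the value is absent."""
    return set(index.get(value, ()))


def rows_in(index: Dict[Hashable, Set[int]], candidates: Iterable[Hashable]) -> Set[int]:
    """Row ids matching `column IN (candidates)`, as a union of per-value sets."""
    result: Set[int] = set()
    for candidate in candidates:
        result |= index.get(candidate, set())
    return result
```

A real reader would load only the dictionary entries and bitmaps a predicate needs rather than materialising the whole mapping.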

Testing

  • python -m py_compile pypaimon/globalindex/bitmap/bitmap_index_reader.py pypaimon/globalindex/bitmap/lazy_filtered_bitmap_reader.py pypaimon/globalindex/global_index_scanner.py pypaimon/tests/e2e/java_py_read_write_test.py
  • mvn -pl paimon-core -Pfast-build -Dtest=org.apache.paimon.JavaPyE2ETest#testBitmapIndexWrite -Drun.e2e.tests=true test
  • python -m pytest java_py_read_write_test.py::JavaPyReadWriteTest::test_read_bitmap_index_table -v

@JingsongLi JingsongLi marked this pull request as draft June 19, 2026 07:37
@JingsongLi JingsongLi force-pushed the codex/bitmap-python-read branch 2 times, most recently from 3bc23c7 to e6174aa Compare June 19, 2026 10:16
@JingsongLi JingsongLi marked this pull request as ready for review June 19, 2026 10:42

@leaves12138 leaves12138 left a comment

Contributor


Thanks for adding Python-side bitmap global index reading. I found one storage-format compatibility gap that should be fixed before merge.

`BitmapIndexReader._read_compressible_block` currently handles only NONE and ZSTD block trailers. However, the Java bitmap writer uses the shared `BlockCompressionFactory`: `bitmap-index.compression` can produce valid bitmap index files with none, zstd, lz4, or lzo dictionary/index blocks, and those are persisted with compression ids 0/1/2/3 in the trailer. A Java-written bitmap index with `bitmap-index.compression=lz4` or `lzo` is therefore a valid file, but the Python reader will fail with "Compression type 2/3 not supported" while loading the dictionary block index or a dictionary block.

Please either add LZ4/LZO decoding for the same compressible-block format (and cover non-default compression in the mixed Java/Python tests), or otherwise make the Java/Python-compatible bitmap format/config explicitly reject unsupported compression before such files can be produced/used silently.
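The two options above can be illustrated with a small dispatch table keyed by the trailer's compression id. This is a minimal sketch, not the PR's implementation: the id-to-codec mapping (0=NONE, 1=ZSTD, 2=LZ4, 3=LZO) is inferred from the ids and error message quoted in this comment, the names `BlockCompression`, `DECODERS`, and `decode_block` are hypothetical, and only the NONE decoder is wired up so the example stays standard-library only.

```python
from enum import IntEnum
from typing import Callable, Dict


class BlockCompression(IntEnum):
    # Assumed mapping, inferred from the review comment above.
    NONE = 0
    ZSTD = 1
    LZ4 = 2
    LZO = 3


Decoder = Callable[[bytes], bytes]

# Only NONE is registered here; real ZSTD/LZ4/LZO decoders would be added
# to this table by whichever modules provide them.
DECODERS: Dict[BlockCompression, Decoder] = {
    BlockCompression.NONE: lambda payload: payload,
}


def decode_block(compression_id: int, payload: bytes) -> bytes:
    """Decode a block payload according to the compression id from its trailer."""
    try:
        kind = BlockCompression(compression_id)
    except ValueError:
        # The id is not part of the known format at all.
        raise ValueError(f"Unknown compression id {compression_id}") from None
    decoder = DECODERS.get(kind)
    if decoder is None:
        # Valid id, but this reader cannot decode it: fail loudly and by name.
        raise NotImplementedError(
            f"Compression {kind.name} (id {kind.value}) has no decoder registered"
        )
    return decoder(payload)
```

Separating "unknown id" from "known but unsupported codec" lets a reader report a valid Java-written file differently from a corrupt one.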

@JingsongLi JingsongLi force-pushed the codex/bitmap-python-read branch from e6174aa to 5f7e5bd Compare June 19, 2026 11:33
@JingsongLi JingsongLi force-pushed the codex/bitmap-python-read branch from 5f7e5bd to e33908f Compare June 19, 2026 11:59

@leaves12138 leaves12138 left a comment

Contributor


Thanks for the update. The previous storage-format compatibility concern is addressed: the Python reader now uses the shared Java block-trailer format and supports LZ4/LZO in addition to NONE/ZSTD, and both BTree and bitmap readers go through the shared decompression path.

I re-checked the sorted-index metadata pruning, fallback-scan budget wiring, LIKE optimization, bitmap dictionary/bitmap block layout, and the mixed Java/Python coverage for compressed BTree and bitmap global indexes. The implementation now looks consistent with the Java reader/writer format to me.

@JingsongLi JingsongLi merged commit 76e32fc into apache:master Jun 19, 2026
7 checks passed

2 participants