[python] Support reading bitmap global index #8289
leaves12138 left a comment
Thanks for adding Python-side bitmap global index reading. I found one storage-format compatibility gap that should be fixed before merge.
BitmapIndexReader._read_compressible_block currently handles only NONE and ZSTD block trailers. However, the Java bitmap writer uses the shared BlockCompressionFactory: bitmap-index.compression can produce valid bitmap index files with none, zstd, lz4, or lzo dictionary/index blocks, and those are persisted with compression ids 0/1/2/3 in the trailer. A Java-written bitmap index with bitmap-index.compression=lz4 or lzo will therefore be a valid file, but the Python reader will fail with Compression type 2/3 not supported while loading the dictionary block index or a dictionary block.
Please either add LZ4/LZO decoding for the same compressible-block format (and cover non-default compression in the mixed Java/Python tests), or make the Java/Python-compatible bitmap format/config explicitly reject unsupported compression, so that such files cannot be produced or used silently.
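To illustrate the two options, here is a minimal sketch of a decompression dispatch keyed by the trailer's compression id. It is not the actual pypaimon code: `BlockDecompressor`, `UnsupportedCompressionError`, and `register` are hypothetical names, and how the id is extracted from the block trailer is deliberately left out because the trailer layout is not shown in this thread. The id-to-codec mapping follows the 0/1/2/3 order described above.

```python
from typing import Callable, Dict

# Compression ids as described in this review: 0/1/2/3 for none/zstd/lz4/lzo.
COMPRESSION_NAMES: Dict[int, str] = {0: "none", 1: "zstd", 2: "lz4", 3: "lzo"}

Decoder = Callable[[bytes], bytes]


class UnsupportedCompressionError(ValueError):
    """Raised when a block uses a compression id with no registered decoder."""


class BlockDecompressor:
    """Hypothetical shared decompression path (illustration only)."""

    def __init__(self) -> None:
        # Uncompressed blocks need no codec: the payload is returned unchanged.
        self._decoders: Dict[int, Decoder] = {0: lambda payload: payload}

    def register(self, compression_id: int, decoder: Decoder) -> None:
        """Attach a decoder for one of the known compression ids."""
        if compression_id not in COMPRESSION_NAMES:
            raise UnsupportedCompressionError(
                f"Unknown compression id {compression_id}"
            )
        self._decoders[compression_id] = decoder

    def check_supported(self, compression_id: int) -> None:
        """Fail early and explicitly instead of deep inside a block read."""
        if compression_id not in self._decoders:
            name = COMPRESSION_NAMES.get(compression_id, "unknown")
            raise UnsupportedCompressionError(
                f"Compression type {compression_id} ({name}) not supported"
            )

    def decompress(self, compression_id: int, payload: bytes) -> bytes:
        self.check_supported(compression_id)
        return self._decoders[compression_id](payload)
```

With this shape, adding LZ4/LZO support means registering two more decoders backed by whatever bindings the project chooses, and both the BTree and bitmap readers could share the same instance. The alternative option corresponds to calling `check_supported` once when the index file is opened.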
leaves12138 left a comment
Thanks for the update. The previous storage-format compatibility concern is addressed: the Python reader now uses the shared Java block-trailer format and supports LZ4/LZO in addition to NONE/ZSTD, and both BTree and bitmap readers go through the shared decompression path.
I re-checked the sorted-index metadata pruning, fallback-scan budget wiring, LIKE optimization, bitmap dictionary/bitmap block layout, and the mixed Java/Python coverage for compressed BTree and bitmap global indexes. The implementation now looks consistent with the Java reader/writer format to me.
Summary
Add Python-side support for reading Java-written bitmap global index files and cover the Java/Python interoperability path.
Changes
bitmap index dispatch into GlobalIndexScanner.
run_mixed_tests.sh.
Testing
python -m py_compile pypaimon/globalindex/bitmap/bitmap_index_reader.py pypaimon/globalindex/bitmap/lazy_filtered_bitmap_reader.py pypaimon/globalindex/global_index_scanner.py pypaimon/tests/e2e/java_py_read_write_test.py
mvn -pl paimon-core -Pfast-build -Dtest=org.apache.paimon.JavaPyE2ETest#testBitmapIndexWrite -Drun.e2e.tests=true test
python -m pytest java_py_read_write_test.py::JavaPyReadWriteTest::test_read_bitmap_index_table -v