[spark] Record the write operation type in snapshot properties #8236
Zouxxyy wants to merge 4 commits into
…ELETE Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
I think adding operation as a dedicated nullable field in Snapshot is a better direction than storing it in properties. Compatibility should also be fine:
I would suggest modeling it as a first-class nullable enum or string field, for example `Snapshot.Operation`, rather than putting it into `properties`. `commitKind` describes the physical snapshot change, while `operation` describes the logical user operation, so both feel like core snapshot metadata. This would also avoid introducing a generic `withCommitProperties` API just for one standard field, and avoids potential conflicts around the `"operation"` property key.
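A minimal sketch of what a first-class nullable field could look like, for illustration only. `SnapshotSketch` and its `Operation` enum are hypothetical stand-ins, not Paimon's real `Snapshot` class, and the enum constants are assumed from the operation names listed in the PR description. The point is that a `null` value covers snapshots written before the field existed, so readers need a null check but no format migration.

```java
// Hypothetical stand-in for a snapshot with a dedicated operation field.
// This is NOT Paimon's Snapshot class; names and shape are assumptions.
final class SnapshotSketch {

    // Assumed set of logical operations, taken from the names in the PR description.
    enum Operation { WRITE, OVERWRITE, DELETE, UPDATE, MERGE }

    final String commitKind;   // physical change, e.g. "APPEND"
    final Operation operation; // logical operation; null on snapshots that predate the field

    SnapshotSketch(String commitKind, Operation operation) {
        this.commitKind = commitKind;
        this.operation = operation;
    }

    // Renders both pieces of metadata, falling back to commitKind alone
    // when no logical operation was recorded.
    String describe() {
        return operation == null ? commitKind : commitKind + " (" + operation + ")";
    }

    public static void main(String[] args) {
        System.out.println(new SnapshotSketch("APPEND", Operation.MERGE).describe());
        System.out.println(new SnapshotSketch("APPEND", null).describe());
    }
}
```

With a dedicated field there is no string key to collide with user-supplied properties, which is the conflict the comment above is concerned about.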
Purpose
A Paimon snapshot only records the physical `CommitKind` (`APPEND` / `COMPACT` / `OVERWRITE` / ...), not the logical operation that produced it, so an `APPEND` from `INSERT INTO` cannot be told apart from one produced by `MERGE INTO`.

This PR records the logical operation type in the snapshot `properties` map under the key `operation`. No format change: `Snapshot` already has a `properties: Map<String, String>` field.

Core: add `InnerTableCommit#withCommitProperties(...)`, applied in `TableCommitImpl` so the properties land on every snapshot the commit generates (both the append and overwrite paths, since `FileStoreCommitImpl` sources snapshot properties from `committable.properties()`).

Spark (both v1 and v2 write paths):

- `WRITE`
- `OVERWRITE`
- `DELETE`
- `UPDATE`
- `MERGE`
- `CREATE TABLE AS SELECT`
- `REPLACE TABLE AS SELECT` / `CREATE OR REPLACE TABLE AS SELECT`

Tests
Added `SnapshotOperationTest` (paimon-spark-ut) asserting the recorded `operation` for INSERT/OVERWRITE/UPDATE/DELETE/MERGE under both `spark.paimon.write.use-v2-write=true` and `false`, plus CTAS/RTAS.
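To illustrate what the recorded property enables, here is a rough, self-contained sketch of a reader that looks up `operation` in a snapshot's properties. `SnapshotView` is a hypothetical stand-in rather than Paimon's `Snapshot` class. The assumptions that `INSERT INTO` is recorded as `WRITE`, and that older snapshots may carry no properties map at all, are made for illustration and are not confirmed by the PR text.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a snapshot; NOT Paimon's real Snapshot class.
final class SnapshotView {

    // Property key named in the PR description.
    static final String OPERATION_KEY = "operation";

    final long id;
    final String commitKind;              // physical change, e.g. "APPEND"
    final Map<String, String> properties; // assumed to be possibly null on older snapshots

    SnapshotView(long id, String commitKind, Map<String, String> properties) {
        this.id = id;
        this.commitKind = commitKind;
        this.properties = properties;
    }

    // Returns the logical operation if one was recorded, or empty otherwise.
    Optional<String> operation() {
        if (properties == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(properties.get(OPERATION_KEY));
    }

    public static void main(String[] args) {
        // Two snapshots with the same CommitKind but different logical operations.
        // "WRITE" for INSERT INTO is an assumption based on the list in the description.
        SnapshotView fromInsert = new SnapshotView(1, "APPEND", Map.of(OPERATION_KEY, "WRITE"));
        SnapshotView fromMerge = new SnapshotView(2, "APPEND", Map.of(OPERATION_KEY, "MERGE"));
        SnapshotView legacy = new SnapshotView(3, "APPEND", null);

        for (SnapshotView s : new SnapshotView[] {fromInsert, fromMerge, legacy}) {
            System.out.println(s.id + " " + s.commitKind + " " + s.operation().orElse("<not recorded>"));
        }
    }
}
```

Returning an `Optional` keeps callers honest about snapshots written before this change, which carry no `operation` entry.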