You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This PR adds a SlateDB-backed UniversalDB driver with snapshot reads, serializable conflict tracking, commit serialization via a shared mutex, a CAS-based object-store leader lease, and follower forwarding over a request/reply transport layer. The overall architecture is solid. A few correctness and operational issues are worth addressing.
1. Write-write conflicts are silently undetected — transaction_conflict_tracker.rs:56
cr1_type != cr2_type fires only on Read-Write overlaps. Two concurrent transactions that both write the same key with no reads (each contributes only a Write conflict range) pass through without conflict — the second commit silently wins. Under serializable isolation a Write-Write overlap on the same snapshot version must also be a conflict. The condition should be cr1_type == Write || cr2_type == Write (i.e., any write overlapping any range from a concurrent peer is a conflict).
2. Forwarded transaction sessions are never evicted — forwarding.rs:769–860
LocalForwardingHandler::sessions has no TTL, no size cap, and no background eviction. A session is removed only on Commit (line 851) or Cancel (line 856). Two paths leave orphaned sessions:
Client crash / timeout: the client issues a Get (creating a session), then dies without sending Commit or Cancel. The Transaction object (including its DbSnapshot and write buffer) lives in the map indefinitely.
Retry in run(): each retry creates a fresh transaction via create_txn() (new txn_id), so the previous forwarded session is dropped locally without ever sending Cancel to the server. With the 100-retry default, a single failing run() call can leave up to 100 orphaned server-side sessions.
A per-session idle TTL (e.g., 2× REQUEST_TIMEOUT) with a background eviction pass would bound growth. Alternatively, run() could call cancel() on the retiring transaction before each retry.
3. committed flag set before the commit-mutex is held — transaction.rs:282–296
ifself.committed.load(Ordering::SeqCst){returnOk(());}if !self.is_active(){returnErr(DatabaseError::NotCommitted.into());}self.committed.store(true,Ordering::SeqCst);// ← set here// ...let _guard = self.commit_mutex.lock().await;if !self.is_active(){returnErr(DatabaseError::NotCommitted.into());}// ← fails here
If is_active() returns false after committed is already true (leader demotion between lines 285 and 293), the method returns NotCommitted — but future commit_ref() calls will short-circuit at line 282 and silently return Ok(()), misrepresenting the outcome to callers that retry. The committed flag should only be set to true after the mutex is held and both is_active() checks pass.
Related: the load + store sequence is not a compare-exchange, so two concurrent commit_ref() calls can both pass the load(false) check, both call consume(), and both enter the commit path. The second caller gets empty ops (consume drains atomically) so no data is double-written, but it does consume a phantom commit version in the conflict tracker. Use compare_exchange(false, true, SeqCst, SeqCst) to make this atomic.
4. cancel() sends fire-and-forget with no delivery guarantee — forwarding.rs:679–688
The detached spawn has no way to ensure the Cancel message reaches the server before the client process exits or before the transport tears down. If the spawn is silent-dropped (runtime shutting down) or the request times out, the server session is never removed — worsening the session leak in finding 2.
5. Inline test module in lease.rs — lease.rs:146–262
CLAUDE.md states: "Rust tests live under tests/, not inline #[cfg(test)] mod tests in src/." The full test module at the bottom of lease.rs should be moved to engine/packages/universaldb/tests/ (a thin #[cfg(test)] #[path = "..."] mod tests; shim in lease.rs preserves access to private items if needed).
Identical private fn now_ms() -> u64 implementations appear in both files. Extract it to super::mod.rs or a shared utility so a future change (monotonic correction, test injection) only needs to be made once.
Verified using the codebase at slatedb branch head.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Validation
cargo test -p universaldb slatedb -- --nocapturecargo check -p universaldb -p rivet-config -p rivet-test-deps-docker -p rivet-pools -p rivet-engineExisting generated/protocol/depot warnings appear during the broad check and are unrelated to this change.