Skip to content

feat(compiler): FFI and C# bindings for RVM, incl. host-await built-ins#672

Open
kusha wants to merge 10 commits into
microsoft:mainfrom
kusha:markbirger/host-await-ffi-csharp-bindings
Open

feat(compiler): FFI and C# bindings for RVM, incl. host-await built-ins#672
kusha wants to merge 10 commits into
microsoft:mainfrom
kusha:markbirger/host-await-ffi-csharp-bindings

Conversation

@kusha

@kusha kusha commented Apr 9, 2026

Copy link
Copy Markdown
Contributor

Summary

Exposes the registered host-await builtins, Program compilation, and RVM runtime accessors through the FFI layer and C# bindings. This is the companion to #667 (registered host-await builtins in the compiler/VM), making the feature usable from C# consumers.

Rebased onto the updated #667. The multi-identifier SetHostAwaitResponses API (below) supersedes the earlier single-identifier form and addresses the review note about regorus_rvm_set_host_await_responses clearing other identifiers' queues.

Motivation

The previous PR added registered host-await builtins to the Rust compiler and VM. However, the FFI boundary and C# bindings only exposed the raw __builtin_host_await path. This PR bridges the gap so C# consumers can:

  1. Register host-await builtins at compile time - pass builtin names to Program.CompileFromModules, which emits HostAwait instructions directly.
  2. Pre-load responses for run-to-completion mode - call Rvm.SetHostAwaitResponses to queue responses (for any number of identifiers) before execution.
  3. Inspect suspension state - call Rvm.GetHostAwaitIdentifier() and Rvm.GetHostAwaitArgument() to determine which builtin suspended and with what argument.

Changes

Rust VM (src/rvm/vm/machine.rs)

  • get_host_await_argument() and get_host_await_identifier() - const fn accessors that return the argument/identifier when the VM is in a HostAwait-suspended state.

FFI (bindings/ffi/src/rvm.rs)

  • RegorusHostAwaitBuiltin - #[repr(C)] struct carrying a builtin name. The argument count is fixed to 1 by the compiler, so it is not part of the FFI surface.
  • regorus_program_compile_from_modules_with_host_await - compile from modules with registered host-await builtins; the non-host-await path is treated as the empty-builtins case.
  • regorus_rvm_set_host_await_responses - pre-load responses for run-to-completion mode. Accepts an array of RegorusHostAwaitResponseSet (one per identifier), so multiple identifiers are configured in a single call.
  • regorus_rvm_get_host_await_argument / regorus_rvm_get_host_await_identifier - JSON accessors for suspension state.

C# Bindings (bindings/csharp/Regorus/)

  • HostAwaitBuiltin - readonly struct wrapping a builtin name (single-argument constructor; the arg count is fixed to 1 at the compiler level).
  • ExecutionMode - enum (RunToCompletion, Suspendable).
  • Program.CompileFromModules - overload accepting IReadOnlyList<HostAwaitBuiltin>.
  • Rvm.SetHostAwaitResponses(IReadOnlyDictionary<string, IReadOnlyList<string>>) - queue JSON responses for any number of identifiers in one call. Replaces the entire response configuration.
  • Rvm.GetHostAwaitArgument() / GetHostAwaitIdentifier() - read suspension state.
  • ModuleMarshalling - PinnedUtf8Strings, PinnedHostAwaitBuiltins, and PinnedHostAwaitResponseSets helpers for safe FFI marshalling.

Documentation

  • bindings/csharp/README.md - suspendable and run-to-completion examples with registered builtins.
  • bindings/csharp/API.md - Program, Rvm, HostAwaitBuiltin, ExecutionMode API reference.
  • docs/rvm/vm-runtime.md - the new VM accessors.

Tests (bindings/csharp/Regorus.Tests/RvmProgramTests.cs)

  • RegisteredHostAwait_Suspendable_SuspendAndResume - registers get_account, suspends, verifies identifier and argument, resumes with a response.
  • RegisteredHostAwait_RunToCompletion_WithPreloadedResponses - registers translate, pre-loads a response, verifies end-to-end execution.
  • RegisteredHostAwait_RunToCompletion_MultipleIdentifiersInSingleCall - preloads responses for two identifiers in one SetHostAwaitResponses call.
  • RegisteredHostAwait_RunToCompletion_FailsWhenResponseQueueExhausted - missing response surfaces as an error.
  • RegisteredHostAwait_GetAccessorsReturnNullWhenVmIsNotSuspended - accessors return null outside a suspension.
  • RegisteredHostAwait_CompileRejectsEmptyOrWhitespaceName / CompileRejectsDuplicateRegistration / CompileRejectsReservedName - registration validation errors surface through the bindings.

Notes

  • SetHostAwaitResponses replaces the full response set: it configures responses for all supplied identifiers in a single call and replaces any previously configured responses. This mirrors the underlying RegoVM::set_host_await_responses replace-all semantics (pre-existing upstream behavior).

  • CompileFromEngine does not yet support host-await builtins: Only CompileFromModules accepts HostAwaitBuiltin[]. This is a scoping choice to keep the PR smaller - there is no technical constraint preventing it. The engine-based path can be extended in a follow-up if needed.

@kusha kusha force-pushed the markbirger/host-await-ffi-csharp-bindings branch from fa62435 to 54fbb82 Compare April 9, 2026 20:31
@anakrish anakrish requested a review from Copilot April 9, 2026 21:41

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the RVM host-await feature end-to-end by wiring “registered host-await builtins” through the Rust compiler, VM accessors, the FFI layer, and the C# bindings, with accompanying docs and tests.

Changes:

  • Add compiler support to register host-awaitable builtin names at compile time and emit HostAwait for natural function-call syntax.
  • Expose host-await suspension inspection (identifier/argument) and run-to-completion response preloading via FFI + C# APIs.
  • Add Rust YAML regression cases plus new C# binding tests and documentation updates.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/rvm/rego/mod.rs Extends YAML harness to pass registered host-await builtins into compilation and (optionally) assert host-await arguments in suspendable mode.
tests/rvm/rego/cases/registered_host_await.yaml Adds regression coverage for registered builtins (shadowing, queueing, arg-count validation, overrides).
src/rvm/vm/machine.rs Adds VM accessors for host-await identifier/argument when suspended.
src/languages/rego/compiler/rules.rs Adds compile_from_policy_with_host_await and wires registrations into compilation.
src/languages/rego/compiler/mod.rs Adds storage + registration API for host-awaitable builtins (with validation).
src/languages/rego/compiler/function_calls.rs Updates call resolution/emission to support registered host-await builtins and emit identifier literals.
docs/rvm/vm-runtime.md Documents new VM host-await accessors.
docs/rvm/instruction-set.md Documents registered host-await builtin behavior and resolution order.
docs/rvm/architecture.md Describes explicit vs registered HostAwait emission paths.
bindings/ffi/src/rvm.rs Adds FFI structs/APIs for compile-with-builtins, response preloading, and suspension JSON accessors.
bindings/csharp/Regorus/Rvm.cs Adds C# wrappers for host-await inspection + response preloading.
bindings/csharp/Regorus/Program.cs Adds CompileFromModules overload supporting host-await builtin registrations; refactors compile paths.
bindings/csharp/Regorus/NativeMethods.cs Adds P/Invoke declarations and FFI struct for host-await builtins + new VM functions.
bindings/csharp/Regorus/ModuleMarshalling.cs Generalizes UTF-8 string pinning helper and adds host-await builtin marshalling.
bindings/csharp/Regorus/Compiler.cs Introduces public HostAwaitBuiltin struct for registration.
bindings/csharp/Regorus.Tests/RvmProgramTests.cs Adds suspendable + run-to-completion tests for registered host-await builtins.
bindings/csharp/README.md Adds usage examples for registered host-await in both execution modes.
bindings/csharp/API.md Documents new/expanded Program/Rvm/HostAwaitBuiltin APIs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread bindings/ffi/src/rvm.rs
Comment thread bindings/csharp/Regorus/Rvm.cs Outdated
Comment thread bindings/csharp/Regorus/ModuleMarshalling.cs Outdated
Comment thread tests/rvm/rego/cases/registered_host_await.yaml Outdated
Mark Birger and others added 10 commits June 28, 2026 23:51
Allow hosts to register function names at compile time so that calls to
those names emit HostAwait instructions directly, enabling natural syntax
like fetch(x) instead of __builtin_host_await(x, "fetch").

- Add host_await_builtins map and register_host_await_builtin() to Compiler
- Validate arg_count == 1 and reject reserved __builtin_host_await name
- Extend determine_call_target() resolution: explicit > registered > user > builtin
- Both explicit and registered paths emit identical HostAwait bytecode
- Add compile_from_policy_with_host_await() entry point in rules.rs
- Extended test harness with HostAwaitBuiltinSpec and args assertion
- 9 YAML test cases: suspend/resume, run-to-completion, multiple names,
  queue, shadowing, object packing, arg_count rejection, reserved name
  rejection, standard builtin override
- Documentation: instruction-set.md, architecture.md
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Mark Birger <birgerm@yandex.ru>
… registration

- Compiler::register_host_await_builtin now rejects duplicate, empty,
  and whitespace-only names. Previously a duplicate registration would
  silently overwrite the existing entry, which could mask the host's
  own registration mistakes.
- YAML test cases added: empty registration list as no-op, duplicate
  name rejection, empty/whitespace name rejection, out-param (a, out)
  calling syntax with a single-arg registered builtin, and mixed
  __builtin_host_await + registered builtins in the same policy
  consuming from their respective identifier queues.
- Test harness: replace assert_eq! on HostAwait argument mismatch with
  anyhow::Error so mismatches propagate through the case reporter
  instead of panicking and skipping the harness's normal error path.
- YAML comment fix: "Registration panics" -> "Registration fails with
  an error" (registration returns Err, never panics).

Addresses anakrish + Copilot inline review comments on PR microsoft#667.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…riants

Addresses PR microsoft#667 review item microsoft#8: at the emit site in
`compile_function_call`, the discrimination between explicit
`__builtin_host_await(arg, id)` and a registered host-awaitable
builtin was being recovered by string-comparing `original_fcn_path`
against `"__builtin_host_await"`. The information was already known
in `determine_call_target` and was being thrown away.

Replace the single `CallTarget::HostAwait` variant with two:

* `ExplicitHostAwait` (unit) — the two-argument call form. The
  identifier register comes from the user's second argument.
* `RegisteredHostAwait { identifier: String }` — the one-argument
  call form for registered builtins. The identifier is the registered
  name and is captured in the variant at recognition time, so the
  emit site never re-derives it from the function path.

This removes the magic-string comparison at the emit site (the source
of truth is now `determine_call_target`) and makes both match sites
in `compile_function_call` exhaustive over the two forms — adding a
third host-await form in the future would force a compile error at
every match site instead of silently falling through.

Arities are now hardcoded in the `expected_args` extraction
(`Some(2)` for explicit, `Some(1)` for registered) rather than
carried in the variant; registered builtins are constrained to
`arg_count == 1` at registration time, so there is no per-call
variability to carry.

Bytecode output is unchanged; the full RVM test suite (97 cases) and
the registered_host_await suite (15 cases) pass without modification.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…calls only

PR microsoft#667 review (Medium): the docs implied registered host-await names
shadow user functions and builtins unconditionally, but
determine_call_target matches only the bare original_fcn_path. A
package-qualified call such as data.demo.resolve(x) is therefore not
intercepted -- it resolves through the normal path like any other call.

Rather than expand registration to qualified paths (which would let a
registered name leak into every package exposing a same-named rule),
document the unqualified-only behavior and pin it with tests.

- register_host_await_builtin: doc now states only the unqualified call
  form is intercepted; qualified calls resolve normally.
- determine_call_target: inline comment explaining the deliberate
  original_fcn_path-only match.
- docs/rvm/instruction-set.md: describe qualified-call resolution,
  including that builtins have no qualified form.
- tests: cross-package and same-package qualified calls resolve to the
  rule; bare-name shadowing of a standard builtin; Unknown-function
  outcome when no rule exists at the qualified path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR microsoft#667 review (Low): the suspendable test harness compared the
host-await argument via process_value(argument), but argument is already
a runtime Value. process_value is a YAML-fixture decoder -- it rewrites
"#undefined" to Undefined, {set!: [...]} to a set, and errors on a
runtime Value::Set. Re-running it on the runtime argument could coerce a
legitimate payload into a fixture sentinel (passing for the wrong
reason) or error outright on sets.

Compare the runtime argument directly against the expected value, which
is already decoded once at YAML load time.

Add a regression case (registered_builtin_suspendable_set_argument) that
passes a set payload: it fails under the old double-processing
("unexpected set in value read from json/yaml") and passes with the fix.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…mode

PR microsoft#667 review (Low): a run-to-completion host-await response could carry
an `args:` payload expectation, but RTC execution pre-loads responses and
never surfaces the call argument to the harness, so the expectation was
parsed and silently dropped. A case with `args: "WRONG"` passed as long as
the result matched -- asserting a payload that was never checked.

Reject `args:` for run-to-completion fixtures at load time, directing the
author to suspendable mode where arguments are validated. Also only build
the run-to-completion response vector when the case actually runs in RTC
mode, so a suspendable case using the shared host_await_responses field
with `args:` is not wrongly rejected.

Route the fixture-load error through the same want_error handling used for
compilation errors, and add registered_builtin_run_to_completion_rejects_args
which now fails loudly instead of passing silently.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…space

PR microsoft#667 review (Low): register_host_await_builtin rejected all-whitespace
names via name.trim().is_empty(), but accepted padded names like " lookup"
or "lookup ". Those were inserted into host_await_builtins, but Rego
function-call paths produce the trimmed identifier, so a padded
registration could never match -- a silent dead registration.

Reject any name that is not already trimmed (name != name.trim()) in
addition to empty names, and update the error message accordingly. Add
test cases for leading and trailing whitespace.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Expose registered host-await builtins, Program compilation, and RVM
accessors through the FFI layer and C# bindings.

FFI (bindings/ffi/src/rvm.rs):
- compile_from_modules with host-await builtin registration
- set/get host-await responses, argument, identifier
- RegorusHostAwaitBuiltin C struct

C# (bindings/csharp/Regorus/):
- Program.CompileFromModules overloads with HostAwaitBuiltin[]
- Rvm.SetHostAwaitResponses, GetHostAwaitArgument, GetHostAwaitIdentifier
- HostAwaitBuiltin readonly struct, ExecutionMode enum
- ModuleMarshalling: PinnedUtf8Strings, PinnedHostAwaitBuiltins

Rust (src/rvm/vm/machine.rs):
- get_host_await_argument() and get_host_await_identifier() accessors

Docs: README examples, API.md reference, vm-runtime.md accessors
Tests: suspendable + run-to-completion C# scenarios
- HostAwaitBuiltin: single-arg (name) ctor; arg_count fixed to 1
- SetHostAwaitResponses: multi-identifier dictionary API
- always route CompileFromModules through host-await FFI
- dead-check/IIFE cleanup, pub(crate), expanded tests + docs
@kusha kusha force-pushed the markbirger/host-await-ffi-csharp-bindings branch from 54fbb82 to 145ee89 Compare June 29, 2026 06:55
@kusha kusha marked this pull request as ready for review June 29, 2026 16:49
@anakrish

anakrish commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator

@ review this PR using all the review skills in the repo. Use a separate agent for each skill.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants