[db-executable-env] Phase 2: DatabaseExecutableEnvironment (prototype) by aniruddh-alt · Pull Request #2520 · oumi-ai/oumi

aniruddh-alt · 2026-06-17T15:48:25Z

Summary

Prototype of an executable database environment for RL/eval over a SQLite database. Each rollout gets an isolated session whose writes are visible within an episode but never persist or leak across rollouts.

DatabaseExecutableEnvironment (registered as "database"): opens a per-rollout RollbackSession, returns True from requires_isolation(), and dispatches executors with a live DB connection.
db_isolation: RollbackSession (opens with isolation_level=None plus an explicit BEGIN, so both DML and a leading DDL statement roll back; executors must not commit) and materialize_sqlite_snapshot.
Fleshed out the ExecutableEnvironment base executor dispatch (_step_one) that the skeleton left abstract.
EHR example tools/executors (list/lookup/update patient) plus a YAML config: the "bring your DB" entry point through build_environment.
sql_execution_match reward that grades candidate vs gold SQL, each on a fresh isolated session (reusing the env's isolation on the grading side).
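
The rollback behavior described above can be sketched with the standard-library sqlite3 module. This is a minimal illustration, not the PR's implementation: `RollbackSessionSketch`, `make_snapshot`, and the `patients`/`notes` tables are hypothetical names chosen for the example. It assumes only what the summary states: the connection is opened with isolation_level=None, an explicit BEGIN is issued, and nothing is ever committed.

```python
import os
import sqlite3
import tempfile


class RollbackSessionSketch:
    """Hypothetical stand-in for RollbackSession: never commits, rolls back on close."""

    def __init__(self, db_path: str) -> None:
        # isolation_level=None turns off the sqlite3 module's implicit
        # transaction handling, so the explicit BEGIN below is the only
        # transaction and it also covers a leading CREATE/DROP.
        self.conn = sqlite3.connect(db_path, isolation_level=None)
        self.conn.execute("BEGIN")

    def close(self) -> None:
        # Discard everything the episode wrote, then release the connection.
        if self.conn.in_transaction:
            self.conn.execute("ROLLBACK")
        self.conn.close()


def make_snapshot() -> str:
    """Create a throwaway snapshot file with one committed row (illustrative data)."""
    fd, path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO patients VALUES (1, 'Ada')")
    conn.commit()
    conn.close()
    return path


snapshot = make_snapshot()

# Rollout A: a leading DDL statement followed by a DML write.
a = RollbackSessionSketch(snapshot)
a.conn.execute("CREATE TABLE notes (body TEXT)")
a.conn.execute("UPDATE patients SET name = 'Grace' WHERE id = 1")
seen_in_episode = a.conn.execute("SELECT name FROM patients").fetchall()
a.close()

# Rollout B: opens the same snapshot after A has closed.
b = RollbackSessionSketch(snapshot)
seen_after = b.conn.execute("SELECT name FROM patients").fetchall()
tables_after = b.conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
b.close()
os.remove(snapshot)
```

In this sketch rollout A sees its own update, while rollout B sees the original row and no `notes` table, which mirrors the "visible within an episode, never persisted" contract. The rollouts run one after another here; concurrent writers are outside what the sketch covers.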

85 unit tests cover the isolation contract: uncommitted write visible within an episode, rolled back on close, no cross-rollout leak, shared snapshot never mutated, leading-DDL rollback.

Status: prototype

Opened to run experiments. Deliberately deferred: wiring into verl GRPO rollouts (single-turn NL2SQL first), copy/copy-on-write isolation for concurrent committed writes, and the schema/table/row database-mutation stack for init diversity.

Series context (db-executable-env chain)

Phase 1: ExecutableEnvironment + ExecutableTool skeleton (#2467, open)
Phase 2: DatabaseExecutableEnvironment prototype (this PR, stacked on Phase 1)

Test plan

pytest tests/unit/environments tests/unit/datasets/grpo/rewards/test_sql_execution_match.py (85 pass)
Build the example env from configs/examples/database_env/ehr_database_env.yaml via build_environment

…nused import

Add Google-style docstring to RollbackSession.__init__ to satisfy ruff D107, and remove unused `from pathlib import Path` in the test file (ruff F401). Both violations would hard-fail the pre-commit ruff hook.

…solation

Per-rollout RollbackSession (never commits, rolls back on close) so writes are visible within an episode and never persist or leak across rollouts. Includes the isolation proof tests (write-then-read within an episode, rollback on close, no cross-rollout leak, shared snapshot never mutated).

Execution-match reward grades candidate vs gold SQL on a fresh rollback session (reusing the env's isolation on the grading side). EHR YAML config + builder test exercise the 'bring your DB' entry point through build_environment.
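
A rough sketch of execution-match grading on separate rollback sessions follows. The function names and the scoring rules are assumptions made for illustration, not taken from the PR: rows are compared as an order-insensitive multiset, and a candidate that fails to execute scores 0.0.

```python
import sqlite3
from collections import Counter


def _run_isolated(db_path: str, sql: str) -> list:
    """Run one statement inside its own transaction and always roll it back."""
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("BEGIN")
        return conn.execute(sql).fetchall()
    finally:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        conn.close()


def execution_match_sketch(db_path: str, candidate_sql: str, gold_sql: str) -> float:
    """Hypothetical reward: 1.0 if both queries return the same rows, else 0.0."""
    # Gold runs first on its own session; its writes are rolled back before
    # the candidate session opens, so it cannot affect the candidate's view.
    gold_rows = _run_isolated(db_path, gold_sql)
    try:
        candidate_rows = _run_isolated(db_path, candidate_sql)
    except sqlite3.Error:
        # Assumption: a candidate that does not execute earns no reward.
        return 0.0
    # Assumption: row order is ignored, duplicate rows are counted.
    return 1.0 if Counter(gold_rows) == Counter(candidate_rows) else 0.0
```

Calling `execution_match_sketch(path, "SELECT x FROM t ORDER BY x DESC", "SELECT x FROM t")` on any database containing a table `t` with column `x` would score 1.0 under these assumptions, since only the set of returned rows is compared.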

@register

- RollbackSession opens isolation_level=None + explicit BEGIN so leading DDL (CREATE/DROP as the first statement) is also rolled back, not just DML.
- sql_execution_match grades gold and candidate on separate sessions so a mutating gold query can't contaminate the candidate.
- export sql_execution_match from rewards/__init__ so @register fires on package import (otherwise it's missing from the registry).
- config test resolves its path relative to __file__, not CWD.
- document that db_path isolation is read-concurrent only (writers contend).
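
The registry point can be illustrated with a generic decorator registry. This is a sketch only: `REWARD_REGISTRY`, this `register` signature, and `import_reward_module` are hypothetical, and the function body stands in for what would normally be a module that the package's `__init__` imports.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical registry; the project's real registry is not shown in this PR text.
REWARD_REGISTRY: dict[str, Callable[..., float]] = {}


def register(name: str):
    """Return a decorator that records the function under `name` when it is defined."""

    def decorator(fn):
        REWARD_REGISTRY[name] = fn
        return fn

    return decorator


def import_reward_module() -> None:
    """Stand-in for importing the module that defines the reward function."""

    @register("sql_execution_match")
    def sql_execution_match(candidate_rows: list, gold_rows: list) -> float:
        return 1.0 if candidate_rows == gold_rows else 0.0


# The decorator only runs when the defining code runs, so nothing is
# registered until the "module" has been imported.
registered_before = "sql_execution_match" in REWARD_REGISTRY
import_reward_module()
registered_after = "sql_execution_match" in REWARD_REGISTRY
```

Here `registered_before` is False and `registered_after` is True, which is the reason a decorated function has to be imported somewhere on the package import path before lookups by name can find it.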

aniruddh-alt added 12 commits June 16, 2026 11:40

feat(environments): implement ExecutableEnvironment executor dispatch

0355ea7

fix(environments): correct isort import order in ExecutableEnvironment

05eebae

Move `import jsonschema` from between __future__ and stdlib imports to its correct position (stdlib → third-party → first-party), fixing the ruff I001 lint failure that was blocking CI.

feat(environments): add rollback-based SQLite isolation primitive

131d63e

feat(environments): add EHR example tools/executors

3cc60a6

fix(environments): assert full dict equality in DB env tests

fb75184

ToolResult.output is str | dict; subscripting it tripped pyright's pre-push check. Compare the whole output dict instead.
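
A small sketch of the typing issue, using a hypothetical `ToolResultSketch` dataclass in place of the real ToolResult (only the `output: str | dict` field is taken from the commit message; the patient data is illustrative):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToolResultSketch:
    """Hypothetical stand-in with the same union-typed output field."""

    output: str | dict


result = ToolResultSketch(output={"patient_id": 1, "name": "Ada"})

# A static checker can reject `result.output["name"]` here, because the
# str member of the union does not accept a string index.
# Comparing the whole value needs no narrowing:
whole_dict_matches = result.output == {"patient_id": 1, "name": "Ada"}

# Alternative: narrow the union first, then subscript.
name = result.output["name"] if isinstance(result.output, dict) else None
```

Comparing the full dict also makes the test stricter, since an unexpected extra key would now fail the assertion.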

fix(environments): cast OmegaConf container for EnvironmentParams kwargs

d1c47f7

style(environments): ruff format ehr example schema string

430bec8

docs(environments): add docstrings to EHR example executors

d470ad8

aniruddh-alt wants to merge 12 commits into aniruddh-alt/db-executable-env-01-executable-skeleton from aniruddh-alt/db-executable-env-02-database-env
