Make Zobrist hashing Apple Metal (jax-metal) compatible#1317
Open
gweber wants to merge 3 commits into
lax.reduce with bitwise_xor fails to legalize on the Metal XLA backend (UNIMPLEMENTED: mhlo.reduce). Replace the XOR-reduction in chess and go Zobrist hashing with an equivalent bit-parity computation built from sum and shifts (utils.xor_reduce), numerically identical on every backend. This unblocks chess and go self-play on Apple Silicon GPUs.
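The bit-parity idea in this commit can be sketched as follows. This is a minimal illustration, not the code in utils.xor_reduce: the function name `xor_reduce_parity`, its `axis` parameter, and the (64, 2) key shape are assumptions made for the example. It relies on the fact that the XOR of a set of bits equals the number of set bits modulo 2.

```python
import numpy as np
import jax.numpy as jnp


def xor_reduce_parity(x, axis=0):
    """XOR-reduce uint32 values along `axis` using only shifts, masks and sums.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    x = jnp.asarray(x, dtype=jnp.uint32)
    axis = axis % x.ndim  # normalize before a trailing bit axis is added
    shifts = jnp.arange(32, dtype=jnp.uint32)
    # bits[..., k] holds bit k (0 or 1) of every element.
    bits = (x[..., None] >> shifts) & jnp.uint32(1)
    # XOR of a column of bits equals (number of set bits) mod 2.
    parity = jnp.sum(bits, axis=axis, dtype=jnp.uint32) & jnp.uint32(1)
    # Reassemble the 32 parity bits into one uint32 per output element.
    return jnp.sum(parity << shifts, axis=-1, dtype=jnp.uint32)


# Compare against NumPy's XOR reduction on random keys (shape is an assumption).
keys = np.random.default_rng(0).integers(0, 2**32, size=(64, 2), dtype=np.uint32)
expected = np.bitwise_xor.reduce(keys, axis=0)
result = np.asarray(xor_reduce_parity(keys, axis=0))
```

The 32x expansion of every value into its bits is what makes this path more expensive than a native XOR reduction.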
Keep the per-bit-parity path (which makes the reduction legalize on jax-metal) only on the Metal backend, and use the native bitwise_xor reduction on CPU/CUDA/TPU. The bit-parity fallback expands every value into its bits, which is ~15x slower on CPU and ~100x on CUDA; since this runs once per env step for Zobrist hashing it measurably slowed chess/go on the non-Metal backends. Backend is resolved at trace time, so jitted code pays no runtime cost. Hashes are bit-identical on all backends.
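Resolving the backend at trace time can be illustrated with a plain Python branch inside a jitted function. This is a hedged sketch: `zobrist_xor`, the static `use_native` flag, and `_parity_xor` are hypothetical names, and the real code decides the flag from the backend rather than taking it as an argument. Because the flag is static, only the selected branch is traced and compiled.

```python
from functools import partial

import jax
import jax.numpy as jnp
from jax import lax


def _parity_xor(keys):
    # Fallback: per-bit parity along axis 0, built from shifts and sums.
    shifts = jnp.arange(32, dtype=jnp.uint32)
    bits = (keys[..., None] >> shifts) & jnp.uint32(1)
    parity = jnp.sum(bits, axis=0, dtype=jnp.uint32) & jnp.uint32(1)
    return jnp.sum(parity << shifts, axis=-1, dtype=jnp.uint32)


@partial(jax.jit, static_argnames="use_native")
def zobrist_xor(keys, use_native=True):
    # `use_native` is a Python bool, so this branch is taken while tracing;
    # the compiled program contains only the chosen path.
    if use_native:
        return lax.reduce(keys, jnp.uint32(0), lax.bitwise_xor, (0,))
    return _parity_xor(keys)


keys = jnp.array([[1, 2], [3, 4], [5, 6]], dtype=jnp.uint32)
native = zobrist_xor(keys, use_native=True)
parity = zobrist_xor(keys, use_native=False)
```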
Author
Updated so the per-bit-parity path is used only on Metal, with the native bitwise_xor reduction on CPU/CUDA/TPU. The bit-parity fallback expands every value into its bits, which measurably slows Zobrist hashing on the non-Metal backends since it runs once per env step.
Hashes are bit-identical on all backends and the go/chess counting + scoring tests still pass. This keeps Metal compatibility without regressing the backends most people run on.
Harden the backend check from a default_backend() string match to a device SIGNAL (platform + device_kind contains 'metal'/'apple'), so a CUDA device reporting platform 'gpu' is never misread as Metal, and a Metal device reporting platform 'gpu' is still caught by its Apple device_kind. Add a PGX_XOR_REDUCE=native|parity escape hatch; fall back to the parity path if devices can't be introspected. No behavior change on CPU/CUDA/Metal (verified native==parity==ground-truth).
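The detection rule described in this commit might look roughly like the sketch below. The function name `should_use_native_xor`, its signature, and the stand-in device objects are assumptions for illustration; only the PGX_XOR_REDUCE values, the 'metal'/'apple' substrings, and the parity fallback come from the commit message. In real use the devices would come from `jax.devices()`, whose entries expose `platform` and `device_kind`.

```python
import os
from types import SimpleNamespace


def should_use_native_xor(devices, env=None):
    """Return True for the native XOR reduction, False for the parity path.

    Hypothetical helper; not the PR's _use_native_xor.
    """
    env = os.environ if env is None else env
    override = env.get("PGX_XOR_REDUCE", "").strip().lower()
    if override == "native":
        return True
    if override == "parity":
        return False
    try:
        # Combine platform and device_kind into one lowercase signal per device.
        signals = [f"{d.platform} {d.device_kind}".lower() for d in devices]
    except Exception:
        return False  # devices could not be introspected: use the safe path
    if not signals:
        return False
    return not any("metal" in s or "apple" in s for s in signals)


# Stand-in devices with made-up attribute values, for demonstration only.
cuda_like = SimpleNamespace(platform="gpu", device_kind="NVIDIA A100")
apple_like = SimpleNamespace(platform="gpu", device_kind="Apple M2")
```

Defaulting to the parity path when introspection fails trades speed for a reduction that works on every backend.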
gweber added a commit to gweber/pgx that referenced this pull request, Jun 12, 2026
…ty split) Adopt the same implementation as PR sotetsuk#1317 (metal-compat b4d97c6) so mushin and the upstream branch share ONE xor_reduce and won't conflict when metal-compat lands. Functionally identical to the prior mushin version (native on CUDA/CPU, parity only on Metal, signal-based detection + PGX_XOR_REDUCE override); only factors the Metal fallback into a helper.
gweber added a commit to gweber/pgx that referenced this pull request, Jun 12, 2026
…es + docs Brings the Mac lineage (CPU-perf pass +55-70% Go, tiered Bloom-PSK [Bloom pre-filter + exact recent window], narrower int8/int16 dtypes, segment_sum chain accumulation, chain-stats cache, chess king-danger prune / occupancy-bitboard / float16 obs, backgammon legal-mask opts, fork documentation, benchmark harness) together with this branch's Layer-Go (board-wide legal mask, hash-based superko) + backend-aware xor_reduce. Only conflict was utils.py: kept the canonical native-dispatch xor_reduce (PR sotetsuk#1317: _xor_reduce_bitparity + _use_native_xor + PGX_XOR_REDUCE override) AND added the Mac's bloom_insert/bloom_query helpers (used by the tiered Go PSK). Dropped the Mac's superseded bit-parity-only xor_reduce. Verified: go_19x19/go_9x9/chess vmapped steps run.
Problem
chess and go fail to run on the Apple Metal backend (jax-metal). The Metal XLA backend cannot legalize a lax.reduce whose reducer is bitwise_xor (reduce-sum/max are fine; the custom bitwise-xor reduction is not). This affects the Zobrist hashing in both chess (2 sites) and go (1 site), the only lax.reduce bitwise-xor uses in the codebase.
Fix
Add pgx._src.utils.xor_reduce, which computes the identical XOR-reduction via per-bit parity built from sum + shifts (all Metal-supported), and use it in chess/go Zobrist hashing. Numerically identical on every backend (verified bit-for-bit vs the original lax.reduce on random uint32 arrays); no behaviour change on CPU/GPU/TPU.
Result
chess and go init/step and full (mctx Gumbel) self-play now run on Apple Silicon GPUs.
Existing chess legality was cross-checked against python-chess across 200+ positions
(castling/en-passant/promotions) with the change applied.