PSDP — Within-language Parallelism
Phase-Synchronous Deterministic Parallelism. Parallelize a sequential program in the same language, without changing a single bit of its output. A commutator-norm phase-sync mechanism gives an algebraic guarantee of determinism under parallel execution.
Headline numbers
- 3.38×: NENC kernel sweep (Java 8), near-linear scaling with core count
- 2.17×: GRA sweep (Java 8), bandwidth-bound, plateaus accordingly
- 1.02×: TPC sweep (Java 8), I/O bound, but bit-exactness is preserved
- 3072 / 3072 blocks: 32-lane SIMD, bit-exact PASS, ±5% vs AVX2 (memory-bound)
- 11 safety mechanisms: claims 29–32, exposed as 5 profiles in SlimeSyCUDA (GAME_MIN to SAFETY_MAX)
- Paired variants: each Subset A target language ships with a paired PSDP variant
Mechanism — commutator-norm phase synchronization
A sufficient condition for parallel execution to produce the same result as sequential execution is derived from the commutator norm ‖[A, B]‖ of the relation operators involved. At runtime, we enforce phase synchronization to keep the commutator below a threshold, and the Böttcher–Wenzel inequality then yields a closed-form upper bound on numerical drift. This is what eliminates the familiar phenomenon of "numbers change when the code goes parallel."
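For reference, the Böttcher–Wenzel inequality states that for square matrices A and B, the Frobenius norm of the commutator satisfies ‖AB − BA‖_F ≤ √2 · ‖A‖_F · ‖B‖_F. The following is a minimal sketch of the kind of check the paragraph above describes, assuming the operators are available as dense square matrices. How PSDP actually derives A and B from a program is not described here, and every class and method name below (`CommutatorCheck`, `mayReorder`, and so on) is hypothetical rather than part of the PSDP API.

```java
// Illustrative sketch only: class and method names are hypothetical, not the PSDP API.
final class CommutatorCheck {

    /** Plain dense matrix product C = X * Y for square matrices of equal size. */
    static double[][] multiply(double[][] x, double[][] y) {
        int n = x.length;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)
                for (int j = 0; j < n; j++)
                    c[i][j] += x[i][k] * y[k][j];
        return c;
    }

    /** Frobenius norm: square root of the sum of squared entries. */
    static double frobenius(double[][] m) {
        double sum = 0.0;
        for (double[] row : m)
            for (double v : row)
                sum += v * v;
        return Math.sqrt(sum);
    }

    /** Frobenius norm of the commutator [A, B] = AB - BA. */
    static double commutatorNorm(double[][] a, double[][] b) {
        double[][] ab = multiply(a, b);
        double[][] ba = multiply(b, a);
        int n = a.length;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                c[i][j] = ab[i][j] - ba[i][j];
        return frobenius(c);
    }

    /** Böttcher–Wenzel upper bound: sqrt(2) * ||A||_F * ||B||_F. */
    static double bottcherWenzelBound(double[][] a, double[][] b) {
        return Math.sqrt(2.0) * frobenius(a) * frobenius(b);
    }

    /** Hypothetical gate: treat two operators as reorderable only if ||[A, B]||_F <= epsilon. */
    static boolean mayReorder(double[][] a, double[][] b, double epsilon) {
        return commutatorNorm(a, b) <= epsilon;
    }
}
```

Two diagonal matrices commute, so `commutatorNorm` returns zero and `mayReorder` accepts them for any non-negative epsilon. A pair such as [[0,1],[0,0]] and [[0,0],[1,0]] does not commute and is rejected for small epsilon.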
    // PSDP example (excerpt from OrderBatchProcessor_PSDP.java).
    // The bench verifies that the SHA-256 of the input/output exactly
    // matches the sequential original (OrderBatchProcessor_ORIGINAL.java).
    public class OrderBatchProcessor {
        public Result process(List<Order> orders) {
            return orders.parallelStream()               // only this is parallel
                    .map(this::settle)
                    .collect(PSDP.phaseSyncReduce(...)); // phase-sync reduce
        }
    }
The parallelization touches only a handful of API calls. Logic is not rewritten. A regression bench checks that sequential and parallel forms produce identical SHA-256 outputs.
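A regression check of this shape can be sketched with the standard library alone. The example below is an assumption-laden illustration, not the actual bench: `settle` is a hypothetical stand-in computation, the class name is invented, and bit-exactness here comes from ordinary integer arithmetic plus an order-preserving `toArray()`, not from the phase-sync reduce, whose arguments the excerpt elides.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.stream.LongStream;

// Illustrative sketch only: settle() and the class name are hypothetical stand-ins,
// not the contents of OrderBatchProcessor_*.java.
final class BitExactRegressionCheck {

    /** Hypothetical per-order computation in integer minor units (no floating point). */
    static long settle(long amount) {
        return amount * 3 + 7;
    }

    /** Sequential reference form. */
    static long[] runSequential(long[] orders) {
        return LongStream.of(orders).map(BitExactRegressionCheck::settle).toArray();
    }

    /** Parallel form: toArray() on an ordered stream preserves encounter order. */
    static long[] runParallel(long[] orders) {
        return LongStream.of(orders).parallel().map(BitExactRegressionCheck::settle).toArray();
    }

    /** SHA-256 over the big-endian byte encoding of the values. */
    static byte[] sha256(long[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) buf.putLong(v);
        try {
            return MessageDigest.getInstance("SHA-256").digest(buf.array());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    /** True when both forms produce byte-identical output digests. */
    static boolean outputsMatch(long[] orders) {
        return Arrays.equals(sha256(runSequential(orders)), sha256(runParallel(orders)));
    }
}
```

Comparing digests rather than printed values means any single-bit difference between the two forms fails the check.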
Three layers
PSDP applies the same principle at three different layers. At every layer, result identity is established by phase synchronization combined with a commutator-norm threshold.
Eleven safety mechanisms (claims 29–32)
We classify the situations where parallel execution might leave deterministic territory into 11 categories, and provide a safety mechanism for each. SlimeSyCUDA (the GPU extension) exposes these as five staged profiles:
| Profile | Description |
|---|---|
| GAME_MIN | Lowest overhead, intended for games. Safety mechanisms minimized; frame time is the priority. |
| BALANCED | Default profile. General-purpose parallelization. |
| STRICT | Strict profile for finance / scientific workloads. Bit-exact verification on every loop. |
| AUDIT | Full audit logging. Connects to the Subset A audit chain for round-trip proof. |
| SAFETY_MAX | All 11 mechanisms enabled. Intended for mission-critical applications such as aviation and medical systems. |
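
As a rough illustration of how an application might pick one of these profiles, the sketch below maps a workload category to the profile the table above suggests for it. The `Workload` enum, the `SafetyProfile` enum, and `forWorkload` are all hypothetical names invented for this example; they are not the SlimeSyCUDA API, and the real configuration surface may look quite different.

```java
// Illustrative sketch only: these types are hypothetical and are not the SlimeSyCUDA API.
enum Workload { GAME, GENERAL, FINANCE, SCIENTIFIC, AUDITED, MISSION_CRITICAL }

enum SafetyProfile {
    GAME_MIN, BALANCED, STRICT, AUDIT, SAFETY_MAX;

    /** Maps a workload category to the profile the table above describes for it. */
    static SafetyProfile forWorkload(Workload w) {
        switch (w) {
            case GAME:
                return GAME_MIN;      // frame time is the priority
            case FINANCE:
            case SCIENTIFIC:
                return STRICT;        // bit-exact verification on every loop
            case AUDITED:
                return AUDIT;         // full audit logging
            case MISSION_CRITICAL:
                return SAFETY_MAX;    // all 11 mechanisms enabled
            default:
                return BALANCED;      // default, general-purpose profile
        }
    }
}
```

For example, `SafetyProfile.forWorkload(Workload.FINANCE)` yields `STRICT`, and any category without a dedicated row falls back to `BALANCED`.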
Languages with PSDP support
In addition to the five Subset A targets, the research implementation extends to a wider set of languages (23 converters under track_c/converter_*). Because both "sequential" and "PSDP-parallel" forms are emitted from the same Slot IR, migration (Subset A) and parallelization (Subset B) can be combined in a single tooling pipeline.
Benchmarks (Java 8, core sweep)
| Category | Kernel | Result | Notes |
|---|---|---|---|
| Compute | NENC (numerical equivalence) | 3.38× speed-up | nearly linear in core count, CPU-bound |
| Graph | GRA (graph kernels) | 2.17× speed-up | memory-bandwidth bound, bit-exactness preserved |
| Database | TPC (transactions) | 1.02× speed-up | I/O bound, but result invariance is guaranteed |
| SIMD | svt_av1_quantize_fp (AVX-512) | 3072 / 3072 blocks bit-exact | PASS, within ±5% of AVX2 (memory-bound) |
Note: we do not accept the customary trade-off of "result drifts in exchange for speed-up." Bit-exactness is an absolute constraint. The 1.02× ceiling on the database benchmark is an I/O-bound physical limit, not a cost imposed by the safety mechanisms.
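The drift that this note rules out usually comes from floating-point addition not being associative: a parallel reduction regroups the operands, and the regrouped sum can differ in its low-order bits. The following self-contained sketch shows the effect with three doubles. It illustrates the general problem only and says nothing about how PSDP prevents it; the class name is hypothetical.

```java
// Illustrative sketch only: demonstrates the general reordering problem, not PSDP itself.
final class ReorderingDrift {

    /** Grouping a sequential left-to-right loop would use. */
    static double leftToRight(double a, double b, double c) {
        return (a + b) + c;
    }

    /** Grouping a parallel reduction might use after splitting the input. */
    static double regrouped(double a, double b, double c) {
        return a + (b + c);
    }

    /** Bitwise comparison of two doubles via their IEEE 754 encodings. */
    static boolean bitIdentical(double x, double y) {
        return Double.doubleToLongBits(x) == Double.doubleToLongBits(y);
    }

    public static void main(String[] args) {
        double x = leftToRight(0.1, 0.2, 0.3);
        double y = regrouped(0.1, 0.2, 0.3);
        System.out.println(x + " vs " + y);
        System.out.println("bit-identical: " + bitIdentical(x, y)); // prints "bit-identical: false"
    }
}
```

A SHA-256 comparison of outputs would flag exactly this kind of last-bit difference, which is why a bit-exact constraint is stricter than a tolerance-based comparison.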
Audit suitability
- Bit-exact: Sequential and parallel outputs match by SHA-256. The standard failure mode of "the numbers shift slightly when we go parallel" is eliminated.
- Phase-sync guarantee: ‖[A, B]‖ ≤ ε is enforced at runtime. The Böttcher–Wenzel inequality then yields an algebraic upper bound on numerical drift.
- 11 safety mechanisms: Covered by claims 29–32, exposed as 5 staged profiles (GAME_MIN through SAFETY_MAX) for application-specific tuning.
- Audit-chain coupling: Connects to the Subset A audit chain (claim 9), giving a single pipeline that produces a bidirectional proof for both transformation and parallelization.
- Regression resilience: Same input + same version → bit-identical SHA-256. Output does not drift across parallel or GPU execution.
Typical use cases
| Use case | Description |
|---|---|
| Financial batches | Parallelize nightly batch jobs to shorten the window, while guaranteeing "not a single yen drift." The SAFETY_MAX profile is intended for audit settings. |
| Scientific computing | FFT, Conv2D, LU decomposition and similar numerical kernels are parallelized bit-exactly. The familiar problem "we can't compare the results because parallel changed them" simply does not arise. |
| Video coding | Apply bit-exact parallelization to SlimeCodec QP control and the SVT-AV1 AVX-512 quantizer. Encoder parallel performance and reproducibility coexist. |
| Games / real-time | SlimeSyCUDA (GPU variant) exposes the GAME_MIN profile to minimize safety overhead and prioritize frame time. |
Technical specifications
| Item | Details |
|---|---|
| Patent | JP Patent App. 2026-046625 (Subset B = PSDP) / JP Patent App. 2026-046620 (Subset A coupling) |
| Claims | Phase synchronization (Layers 1–3) / 11 safety mechanisms (claims 29–32) / Audit chain coupling (claim 9) |
| Paper | PSDP Paper JP v5d (910 KB PDF, 2026-03-04) |
| Implementations | Java 8 PoC / AVX-512 MVP-A (svt_av1_quantize_fp 32-lane) / Rust + WASM demo (index_psdp.html) / 12+ language converters bundled |
| Standard tests | NENC / GRA / TPC kernel sweeps (Java 8) / 3072 AVX-512 blocks bit-exact |
| License model | Combined licensing with Subset A. Converter is licensed; converted output is unlicensed. Ed25519 3-hop activation. |
Related documentation
- PSDP paper: PSDP Paper JP v5d (910 KB PDF, primary technical paper)
- Specification: 指示書_PSDP_v2並列開発.md (v2 parallel development specification)
- Reference samples: OrderBatchProcessor_ORIGINAL.java vs OrderBatchProcessor_PSDP.java (bit-exact contrast of sequential / parallel forms)
- Implementation log: PSDP_IMPLEMENTATION_LOG.md (Phase A/B Rust + WASM implementation record)
- Patent specifications: JP Patent App. 2026-046625 (Subset B) / 2026-046620 (Subset A coupling)
