SlimeASM-rev — Linux ELF x86_64 binary → ASM + C reverse transpiler
Recover NASM and C sources from a tag-less native ELF, bit-exact.
Convert source-lost / vendor-lost Linux x86_64 native binaries from banking,
defense and embedded systems back into both NASM (Intel syntax) source and C source,
bit-exact. The emitted NASM is re-assembled with nasm + ld; the emitted C is
compiled with gcc -O0 -nostdlib -static. Both produce real native
ELFs that, when run, reproduce the original binary's stdout byte-for-byte, which is the
strictest round-trip axis we know how to write.
- Phase A: ELF64 minimal parser + x86_64 instruction decoder (integer hot-loop subset, ~14 patterns) + NASM intel emitter + straight-line C emitter
- Phase B entry: CFG recovery (Cooper-Harvey-Kennedy iterative dominator + Aho/Sethi natural loop body) + structured C (do { ... } while (R[1] != 0); + if/else diamond)
- Phase B (b): function-boundary recovery (prologue/epilogue + call/ret + self-recursion). Each function is emitted as its own C function; call → C call, ret → return;
- Phase B (d): inter-procedural Slot IR — function = Slot node, call graph lifted as first-class IR edges, deterministic JSON round-trip; self-recursion expressed naturally
- S9 bench: all 8 axes 64/64 PASS (2026-05-11), 8 samples including recursive fact(4) = 24, both ASM and C round-trips green
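The stdout round-trip described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the S9 harness itself: the function names and file paths are hypothetical, and the only external facts relied on are the standard `nasm -f elf64` and `ld -o` invocations.

```python
import subprocess

def run_stdout(cmd):
    """Run a command and return its raw stdout bytes (exit status is ignored)."""
    return subprocess.run(cmd, stdout=subprocess.PIPE, check=False).stdout

def round_trip_matches(original_cmd, rebuilt_cmd):
    """True when both commands emit byte-identical stdout."""
    return run_stdout(original_cmd) == run_stdout(rebuilt_cmd)

def rebuild_commands(asm_path, obj_path, exe_path):
    """Commands that would re-assemble emitted NASM; all paths are hypothetical."""
    return [
        ["nasm", "-f", "elf64", "-o", obj_path, asm_path],
        ["ld", "-o", exe_path, obj_path],
    ]
```

In use, one would run the commands from `rebuild_commands(...)` and then call `round_trip_matches(["./original"], ["./rebuilt"])`; comparing raw bytes rather than decoded text keeps the check byte-for-byte.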
SlimeASM-rev is a reverse transpiler for "source-lost native binaries" in
Linux x86_64 banking and defense systems, built on
deterministic translation, 8-axis round-trip auto-regression and an audit chain.
It is the SlimeNENC family's first reverse-direction product, paired with the
existing forward family (COBOL / HLASM / MASM / MUMPS / PL/I / RPG / FORTRAN / Natural).
Key measurements (2026-05-11)
- ALL AXES PASS (Phase A + B entry + (b) + (d))
- NASM emit → nasm + ld → run → original stdout match
- C emit → gcc -nostdlib -static → run → original stdout match
- CFG + do-while/if-else + per-function → gcc → run → match
- ASM (via NASM) + C (via GCC), both verified on real ELFs
- Call graph as first-class edge, self-recursion native
Market context — where source-lost native binaries live
| Segment | Situation |
|---|---|
| Banks (Linux x86_64) | During modernization projects, native ELF / .so libraries with no source and no surviving maintenance vendor. Heirloom / Astadia focus on mainframe HLASM and do not target Linux native binaries. |
| Defense / aerospace | Closed binaries (embedded Linux ELF / instrumentation daemons) frozen for 10-30 years. The originating vendor cannot supply source, but a C-source recovery with audit chain is required. |
| Embedded / medical devices | FDA / PMDA / IEC 62304 obligate "complete software description". Binary-only components must be lifted to C as auditor-reproducible documentation. |
| Legacy documentation | "Working but untouchable" daemons must be lifted to C so static analysis, SBOM and CVE auditing can apply. |
| Competitive landscape | Ghidra (NSA OSS) / IDA Pro / Hex-Rays / RetDec already exist. SlimeASM-rev's differentiator is determinism + 8-axis round-trip auto-regression — every conversion is provably "lossless" via a bench harness. |
S9 bench — all 8 axes (8 samples PASS)
| Axis | Check and result |
|---|---|
| Axis 1a dialect-detect | Tokenizer recognises ELF magic + ELFCLASS64 + EM_X86_64 (e_machine = 0x3E). 8/8 PASS. |
| Axis 1b opcode-recover | All 177 .text instructions across 8 samples decoded — db 0xNN fallback count = 0. 8/8 PASS. |
| Axis 2 mutation-detect | 1-bit flip in .text, 5 trials × 8 samples = 40 trials, 40/40 detected. Disasm output must differ — invariant. |
| Axis 3 determinism | NASM emit twice, byte-equal across all 8 samples. 8/8 PASS. |
| Axis 4 ASM round-trip | emit NASM → nasm + ld → run → original stdout match. The strictest axis: two real native binaries (original + ours) executed and compared. 8/8 PASS, including recursive fact(4) = 24. |
| Axis 5 C round-trip | emit C → gcc -O0 -nostdlib -static → run → original stdout match. Straight-line PC dispatch + STK[]-modelled call/ret/push/pop is bit-faithful. 8/8 PASS. |
| Axis 6 structured-C round-trip | CFG-recovered structured C (do-while + if/else + per-function + call → C call + ret → return;) → gcc → run → match. 8/8 PASS. |
| Axis 7 Slot IR round-trip | SlotImage → JSON → SlotImage → structural equality + JSON byte-equal double check. Function = Slot node and the call graph are preserved completely. 8/8 PASS. |
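The Axis 1a check (ELF magic + ELFCLASS64 + EM_X86_64) can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the product's tokenizer: the function name is hypothetical, and the little-endian (ELFDATA2LSB) check is an extra assumption that the table does not list.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELFCLASS64 = 2      # e_ident[EI_CLASS] value for 64-bit objects
ELFDATA2LSB = 1     # e_ident[EI_DATA] value for little-endian (assumed here)
EM_X86_64 = 0x3E    # e_machine value for x86-64

def is_elf64_x86_64(header: bytes) -> bool:
    """Check the first 20 bytes of a file against the Axis 1a criteria."""
    if len(header) < 20:
        return False
    if header[:4] != ELF_MAGIC:
        return False
    if header[4] != ELFCLASS64 or header[5] != ELFDATA2LSB:
        return False
    # e_ident occupies 16 bytes; e_type and e_machine follow as two uint16.
    _e_type, e_machine = struct.unpack_from("<HH", header, 16)
    return e_machine == EM_X86_64
```

Reading only the fixed-size header keeps the check independent of section or symbol tables, which a stripped binary may not have.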
Sample inventory (8 binaries, NASM-source → ELF)
| Sample | Description |
|---|---|
| 01 hello | Syscall write of "Hello, ELF!\n". Smallest ELF: 1 BB / 8 instructions. |
| 02 arith | 17 + 25 = 42 printed as 2 ASCII digits. Exercises idiv, add al, mov [rip+disp], etc. |
| 03 loop | loop sum_loop for 1+2+3+4+5 = 15. CFG has a back edge; structured C recovers as do { ... } while (R[1] != 0);. |
| 04 branch | cmp + jge diamond. Structured C recovers as if (cond) { ... } else { ... } meeting at a common join BB. |
| 05 compute | imul rax, rbx for 6 × 7 = 42. |
| 06 call_simple | _start → do_print. Prologue (push rbp; mov rbp, rsp) + epilogue (pop rbp; ret) recognised as function boundary and split into independent C functions. |
| 07 two_funcs | 3 functions (_start → add_two + print_dec). Two inter-procedural call graph edges. |
| 08 recursion | factorial(4) = 24 via self-recursion. Call graph carries a self-loop edge (fact → fact); push rbx / pop rbx caller-saved spill is preserved bit-faithfully via STK[]. |
Translation example 1 — loop recovered as structured do-while (sample 03)
The original NASM uses loop sum_loop to compute 1+2+3+4+5. The CFG carries
a back edge; SlimeASM-rev recognises a natural loop body of one BB whose terminator
is a `loop` instruction, and recovers structured C as
do { ... } while (R[1] != 0); — no goto:

    ; Original NASM (sample 03, loop region)
        xor rax, rax
        mov rcx, 5
    sum_loop:
        add rax, rcx
        loop sum_loop          ; dec rcx; jnz sum_loop

    // SlimeASM-rev recovered C (structured)
    R[0] = R[0] ^ R[0];        // xor rax, rax
    R[1] = 0x5;                // mov rcx, 5
    do {
        R[0] = R[0] + R[1];    // add rax, rcx
        /* e2 fb loop sum_loop (terminator absorbed) */
        R[1] = R[1] - 1;
    } while (R[1] != 0);

Cooper-Harvey-Kennedy iterative dominator detects the back edge; Aho/Sethi natural-loop
body analysis collapses the self-loop to a single BB; the structured emitter then renders
the whole loop as do-while.
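As a rough illustration of the dominator step, the sketch below implements the Cooper-Harvey-Kennedy iterative scheme on a successor map and reports edges whose target dominates their source. It is written for this page and is not SlimeASM-rev's code; the node names (`entry`, `sum_loop`, `exit`) and the dictionary representation of the CFG are assumptions.

```python
def reverse_postorder(succs, entry):
    """Depth-first walk from entry; returns reachable nodes in reverse postorder."""
    seen, order = set(), []
    def visit(node):
        seen.add(node)
        for nxt in succs.get(node, []):
            if nxt not in seen:
                visit(nxt)
        order.append(node)
    visit(entry)
    return order[::-1]

def immediate_dominators(succs, entry):
    """Iterate idom[] to a fixed point (Cooper-Harvey-Kennedy style)."""
    rpo = reverse_postorder(succs, entry)
    index = {n: i for i, n in enumerate(rpo)}   # smaller index = closer to entry
    preds = {n: [] for n in rpo}
    for n in rpo:
        for s in succs.get(n, []):
            preds[s].append(n)
    idom = {entry: entry}

    def intersect(a, b):
        # Walk both nodes up the current dominator tree until they meet.
        while a != b:
            while index[a] > index[b]:
                a = idom[a]
            while index[b] > index[a]:
                b = idom[b]
        return a

    changed = True
    while changed:
        changed = False
        for n in rpo[1:]:
            processed = [p for p in preds[n] if p in idom]
            new_idom = processed[0]
            for p in processed[1:]:
                new_idom = intersect(new_idom, p)
            if idom.get(n) != new_idom:
                idom[n] = new_idom
                changed = True
    return idom

def dominates(idom, a, b):
    """True if a dominates b; every node dominates itself."""
    while a != b:
        if idom[b] == b:      # reached the entry without meeting a
            return False
        b = idom[b]
    return True

def back_edges(succs, entry):
    """Edges (tail, head) where head dominates tail."""
    idom = immediate_dominators(succs, entry)
    return [(t, h) for t in idom for h in succs.get(t, []) if dominates(idom, h, t)]
```

For a CFG shaped like sample 03, `back_edges({"entry": ["sum_loop"], "sum_loop": ["sum_loop", "exit"], "exit": []}, "entry")` reports the single self-loop edge on `sum_loop`, which is the case where the natural loop body is one basic block.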
Translation example 2 — branch recovered as structured if/else (sample 04)
The original NASM uses cmp + jge; both arms reconverge at a common join
BB. SlimeASM-rev recognises cond BB + 2 succs reconverging at a single join
and recovers structured C as if (cond) { ... } else { ... }:

    ; Original NASM (sample 04, branch region)
        mov rax, 7
        cmp rax, 5
        jge is_big
        ; small branch
        ... write "small\n" ...
        jmp done
    is_big:
        ... write "big\n" ...
    done:
        ... exit ...

    // SlimeASM-rev recovered C (structured, both arms join)
    R[0] = 0x7;                       // mov rax, 7
    { int64_t _diff = (int64_t)R[0] - (int64_t)5;
      ZF = (_diff == 0); SF = (_diff < 0); }
    OF = 0;
    /* 7d 1a jge is_big (terminator absorbed) */
    if (SF == OF) {                   // jge condition
        /* big arm */
        ... write "big\n" ...
    } else {
        /* small arm */
        ... write "small\n" ...
    }
    /* done: common join BB → exit */

The diamond is detected via intra-function dominator + post-dominator agreement; the
unconditional jmp at the end of the fall-through (small) arm is absorbed by the structured
form, leaving plain C if/else with no goto.
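The recovered C above sets OF = 0, which is consistent with this sample because 7 - 5 does not overflow. For readers who want the general rule behind the `SF == OF` test, here is a small Python sketch of 64-bit `cmp` flag semantics. It is an independent illustration, not the emitter's code, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1
SIGN_BIT = 1 << 63

def cmp_flags(a, b):
    """Return (ZF, SF, OF) for `cmp a, b` on 64-bit operands."""
    a &= MASK64
    b &= MASK64
    diff = (a - b) & MASK64
    zf = diff == 0
    sf = bool(diff & SIGN_BIT)
    # Signed overflow on subtraction: the operands differ in sign and the
    # result's sign differs from the first operand's sign.
    of = bool((a ^ b) & (a ^ diff) & SIGN_BIT)
    return zf, sf, of

def jge_taken(a, b):
    """jge branches when SF == OF (signed a >= b)."""
    _zf, sf, of = cmp_flags(a, b)
    return sf == of

def jg_taken(a, b):
    """jg branches when ZF == 0 and SF == OF (signed a > b)."""
    zf, sf, of = cmp_flags(a, b)
    return (not zf) and sf == of
```

With the sample's operands, `cmp_flags(7, 5)` gives all three flags clear, so `jge_taken(7, 5)` is true and the "big" arm runs.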
Translation example 3 — per-function + self-recursion showcase (sample 08)
factorial(4) = 24 via self-recursion, 6 BBs in one function. The CFG is
neither a clean loop nor a join-converging diamond (each branch arm terminates with its
own ret), so the structured C still carries goto BB_xxxx; labels.
We include it as a showcase of function-boundary recovery + call → C call +
ret → return; + caller-saved spill (push rbx / pop rbx) preserved bit-faithfully via
STK[] + self-recursion:

    ; Original NASM (sample 08, fact function)
    fact:
        push rbp
        mov rbp, rsp
        cmp rbx, 1
        jg recurse
        mov rax, 1             ; base: 1! = 1
        pop rbp
        ret
    recurse:
        push rbx               ; spill rbx across recursive call
        sub rbx, 1
        call fact              ; rax := (rbx-1)!
        pop rbx                ; restore rbx
        imul rax, rbx          ; rax := rbx * (rbx-1)!
        pop rbp
        ret

    // SlimeASM-rev recovered C (per-function, self-recursion, caller-saved spill)
    static void func_401060(void) {                  // fact()
    BB_401060:;
        STK[--RSP_IDX] = R[5];                       // push rbp
        R[5] = R[4];                                 // mov rbp, rsp
        /* cmp rbx, 1 */
        if (ZF == 0 && SF == OF) goto BB_401071;     // jg recurse
    BB_40106a:;
        R[0] = 1;                                    // mov rax, 1
        R[5] = STK[RSP_IDX++];                       // pop rbp
        return;                                      // ret → C return
    BB_401071:;
        STK[--RSP_IDX] = R[3];                       // push rbx
        R[3] -= 1;                                   // sub rbx, 1
        func_401060();                               // call fact → C function call (self-recursion)
        R[3] = STK[RSP_IDX++];                       // pop rbx
        R[0] = R[0] * R[3];                          // imul rax, rbx
        R[5] = STK[RSP_IDX++];                       // pop rbp
        return;
    }

Compiling this with gcc -O0 -nostdlib -static and running yields the
same stdout (FACT=24) as the original ELF — Axis 6 structured-C round-trip PASS.
The remaining goto labels (cond BB + each arm an independent ret)
will be eliminated by the multi-BB loop / tail-duplication extensions in later Phase B work.
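To make the R[] / STK[] model easier to follow, the recovered function can be transliterated into Python. This is a reading aid under assumptions, not tool output: a Python list stands in for STK[] and RSP_IDX, the argument is assumed to arrive in rbx as the `cmp rbx, 1` suggests, and the comparison is modelled directly instead of through flags.

```python
MASK64 = (1 << 64) - 1
RAX, RBX, RSP, RBP = 0, 3, 4, 5   # register indices as used by R[] in the listing

def fact(R, STK):
    """Mirror of func_401060: argument in R[RBX], result in R[RAX]."""
    STK.append(R[RBP])                         # push rbp
    R[RBP] = R[RSP]                            # mov rbp, rsp
    if R[RBX] > 1:                             # cmp rbx, 1 ; jg recurse (small positive values)
        STK.append(R[RBX])                     # push rbx: spill across the call
        R[RBX] -= 1                            # sub rbx, 1
        fact(R, STK)                           # call fact (self-recursion)
        R[RBX] = STK.pop()                     # pop rbx: restore
        R[RAX] = (R[RAX] * R[RBX]) & MASK64    # imul rax, rbx
        R[RBP] = STK.pop()                     # pop rbp
        return
    R[RAX] = 1                                 # base case: mov rax, 1
    R[RBP] = STK.pop()                         # pop rbp

def run_fact(n):
    """Set up an empty register file and stack, call fact, return rax."""
    R = [0] * 16
    STK = []
    R[RBX] = n
    fact(R, STK)
    assert not STK, "every push must be matched by a pop"
    return R[RAX]
```

Calling `run_fact(4)` returns 24, matching the FACT=24 output quoted above; the spill of rbx is what lets each frame multiply by its own argument after the recursive call returns.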
Function = Slot node, call graph as first-class IR (Phase B (d))
The same SlimeNENC-family Slot IR (Core64 + Ext32 fixed-bit, claim 11) applied in reverse: each function becomes a SlotFunction node, and call edges are first-class IR (a list of callee names per function). Self-recursion is naturally a self-loop edge:
| sample | functions | call edges |
|---|---|---|
| 01-05 (linear) | 1 | 0 |
| 06 call_simple | 2 | 1 (_start → func_40100f) |
| 07 two_funcs | 3 | 2 (_start → func_401014, _start → func_401027) |
| 08 recursion | 2 | 2 (_start → func_401060, func_401060 → func_401060) |
The full SlotImage encodes/decodes via deterministic JSON (Axis 7 round-trip), so call graphs and function structure can flow into external toolchains (audit DBs, SBOM, static analysis) without information loss.
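A deterministic JSON round-trip of this kind can be sketched as follows. The field names (`functions`, `name`, `calls`) and the helper names are hypothetical stand-ins chosen for illustration; the real SlotImage schema is not shown on this page.

```python
import json

def encode(image):
    """Serialize with sorted keys and fixed separators so output is byte-stable."""
    return json.dumps(image, sort_keys=True, separators=(",", ":"), ensure_ascii=True)

def decode(text):
    return json.loads(text)

def round_trip_ok(image):
    """Structural equality plus byte-equal re-encoding, as in the Axis 7 description."""
    first = encode(image)
    restored = decode(first)
    return restored == image and encode(restored) == first

def call_edges(image):
    """Flatten the per-function callee lists into (caller, callee) pairs."""
    return [(f["name"], callee) for f in image["functions"] for callee in f["calls"]]

# Hypothetical image mirroring the sample 08 row of the table above.
SAMPLE_08 = {
    "functions": [
        {"name": "_start", "calls": ["func_401060"]},
        {"name": "func_401060", "calls": ["func_401060"]},
    ]
}
```

Here `call_edges(SAMPLE_08)` yields two edges, one of which is the self-loop `func_401060 → func_401060`, and `round_trip_ok(SAMPLE_08)` holds because key order and separators are pinned.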
Audit fitness (finance / defense / medical-device)
- Bit-exact: Same ELF input → same sha256 NASM/C output. CFG / function boundaries / instruction stream all fully deterministic.
- Native ELF round-trip: Emitted NASM re-assembled via nasm + ld; emitted C compiled via gcc -nostdlib; two real native ELFs executed and stdout compared with the original. Not simulation, but real-machine verification.
- Mutation detection: 1-bit flip in .text always changes disasm. 8 samples × 5 trials = 40/40 detected, so tampering is immediately visible.
- Determinism: Same ELF disassembled + emitted twice → byte-equal per sample. Stable across parallel and GPU execution.
- Slot IR audit: Function = Slot node + call graph persisted as deterministic JSON. Joins SBOM / audit DB pipelines as a structured artifact.
- Build-time LLM: LLM only at decoder-rule construction time. Runtime is deterministic rule-based, aligned with bank / defense audit requirements.
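The determinism and mutation-detection properties listed above lend themselves to very small checks. The sketch below shows one possible shape for them; it is not the S9 harness, and `toy_emit` is a deliberately trivial placeholder emitter (one `db` line per byte), not the real decoder.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def is_deterministic(emit, code: bytes, runs: int = 2) -> bool:
    """Emit several times and require a single distinct digest."""
    digests = {sha256_hex(emit(code)) for _ in range(runs)}
    return len(digests) == 1

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)

def mutation_detected(emit, code: bytes, bit_index: int) -> bool:
    """True when the emitted text changes after a 1-bit flip."""
    return emit(flip_bit(code, bit_index)) != emit(code)

def toy_emit(code: bytes) -> bytes:
    """Placeholder emitter used only to exercise the checks above."""
    return "\n".join(f"db 0x{b:02x}" for b in code).encode("ascii")
```

Any emitter with the signature `bytes -> bytes` can be plugged in place of `toy_emit`; comparing digests rather than full outputs keeps the audit record small.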
Supported instructions (Phase A, integer hot-loop subset, ~14 patterns)
| Category | Instructions (opcode encoding) |
|---|---|
| Data movement | mov reg, imm/reg (B8+r / 89 /r / C7 /0 / 88 /r) / lea reg, [rip+disp32] (8D /r mod=00 r/m=101) |
| Arithmetic | add r/m64, r64 (01 /r) / add al, imm8 (04) / add r/m8, imm8 (80 /0) / sub r/m64, imm8 (83 /5) / imul r64, r/m64 (REX.W 0F AF) / idiv r/m64 (F7 /7) |
| Logic | xor r/m64, r64 (31 /r; xor reg, reg idiom recognised as zero-init) |
| Compare | cmp r/m64, r64 (39 /r) / cmp r/m64, imm8 (83 /7) |
| Branch | Jcc rel8 (70-7F: je/jne/jge/jg/jl/jle/...) / Jcc rel32 (0F 80-8F) / jmp rel8/32 (EB/E9) / loop rel8 (E2) |
| Call/stack | call rel32 (E8) / ret (C3) / push/pop r64 (50-5F) / push imm (6A/68) |
| System | syscall (0F 05) — sys_write (rax=1) / sys_exit (rax=60) recognised by heuristic |
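To show how the rel8 branch forms in the table resolve to targets, here is a minimal Python sketch that decodes `loop`, `jmp` and a few `Jcc` short forms. It is an illustration only, covering a handful of opcodes; the function names and the load address used in the usage note are assumptions.

```python
# Subset of short conditional jumps from the 70-7F range.
JCC_NAMES = {0x74: "je", 0x75: "jne", 0x7C: "jl", 0x7D: "jge", 0x7E: "jle", 0x7F: "jg"}

def rel8(byte: int) -> int:
    """Sign-extend an 8-bit displacement."""
    return byte - 0x100 if byte & 0x80 else byte

def decode_short_branch(code: bytes, offset: int, address: int):
    """Decode a 2-byte rel8 branch at code[offset].

    Returns (mnemonic, target_address), or None if the opcode is not one of
    the short branch forms handled here.
    """
    opcode = code[offset]
    if opcode == 0xE2:
        name = "loop"
    elif opcode == 0xEB:
        name = "jmp"
    elif opcode in JCC_NAMES:
        name = JCC_NAMES[opcode]
    else:
        return None
    next_ip = address + 2               # displacement is relative to the next instruction
    return name, next_ip + rel8(code[offset + 1])
```

For the bytes `48 01 c8 e2 fb` (add rax, rcx followed by `e2 fb`) placed at a hypothetical address 0x401000, `decode_short_branch(code, 3, 0x401003)` resolves the `loop` target back to 0x401000, the start of the `add`, because the displacement 0xfb sign-extends to -5.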
Phase B onward will extend coverage to multi-BB loops / loop nest trees, libc-linked binaries (printf / malloc, PLT/GOT dynamic linking) and SSE2 / SSE4 (XMM registers). A 30+ sample bench from `gcc -O0` C builds (rather than hand-written NASM) will become the production regression target.
License model
| Item | Details |
|---|---|
| Charged | WASM/WASI converter tool (developer side) |
| Not charged | The produced NASM / C sources (customer asset, perpetual deployment) |
| Method | Ed25519 144B signed license + 3-hop air-gap activation (finance / defense audit ready) |
| Parallelization (PSDP) | Not included. See the independent PSDP SKU under SlimeNENC. |
Related materials
- Technical overview: SlimeNENC Technical Overview (reverse-direction ASM/C chapter being prepared)
- Patent application: JP application 2026-046620 v15b, claims 11 / 14d, target legacy-language dialect handling for COBOL / MUMPS / PL/I / RPG / assembler / native binary reverse direction.
- Sister product (forward pair): SlimeASM (HLASM + Win x64 MASM forward). Together they cover both directions of the native-code surface.
- Sister products (Slot IR shared): SlimeCOBOL / SlimePL/I / SlimeRPG / SlimeMUMPS share the Slot IR (Core64 + Ext32 fixed-bit).
- Benchmarks: S9 bench harness (8 axes of correctness); s9_bench/bench.py auto-regresses 8 samples × 8 axes = 64/64.
