Over the past two months, we have documented sixteen vulnerabilities across six Cryspen projects—bugs in unverified wrappers, wrong formal specifications, false proofs, and unsound axioms. Those findings concerned the product. We now turn to the toolchain.

Today, we’re publishing our latest paper, Verification Facade: Masquerading Insecure Cryptographic Implementations as Verified Code, which examines Hax, the verification pipeline that underlies Cryspen’s “formally verified” claims.

Hax translates a subset of Rust into F*, enabling machine-checked proofs of panic freedom and functional correctness. ML-KEM and ML-DSA implementations verified via hax are being developed in partnership with Google and tested in Signal’s PQXDH protocol. If hax’s translation is faithful, the F* proofs transfer to the Rust code. If it is not, the proofs verify a model that diverges from what actually executes.

We find that it is not. We identify three classes of semantic gap in hax’s pipeline and demonstrate each through proof-of-concept exploits against ML-DSA, ML-KEM, Ed25519, and ChaCha20. Each exploit passes all four of our inclusion gates:

  1. Correct Rust: the code compiles as valid Rust.
  2. Successful extraction: hax extracts it to F* without errors or warnings.
  3. Security-relevant model divergence: the verified F* model diverges from the deployed Rust code in a way that compromises a cryptographic security property.
  4. Invisible to testing: the divergence escapes detection by functional tests (sign-then-verify, encrypt-then-decrypt, test vectors all pass).

All source code and pre-extracted F* output are available on GitHub.

We call the resulting phenomenon a verification facade: verification that is performed but covers less than it appears to cover. Unlike verification theatre—where verification is claimed but not performed—a facade involves verification that is performed but whose model either diverges from the deployed code, depends on unenforceable assumptions, or cannot express a security-critical property.

How Hax Works

Hax translates Rust to F* through three stages. A frontend hooks into rustc and extracts the Typed High-Level Intermediate Representation (THIR)—a representation where references are still present, borrow-checker constraints are implicit, and Drop glue has not been elaborated. A transformation engine then applies 35 sequential phases, each eliminating a language feature: by the time the AST reaches the backend, mutable variables, references, raw pointers, lifetimes, all loop forms, and the ? operator have been eliminated, leaving a purely functional representation. Finally, the F* backend translates this to F* surface syntax. Integer types become refinement-typed wrappers (e.g., u8 becomes int_t U8 with range $[0,255]$). Rust’s + becomes +! (strict, with overflow proof obligation); wrapping_add becomes +. (modular).
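As a minimal sketch (function names are ours, not hax’s test suite), the two Rust additions below are exactly the pair the backend distinguishes:

```rust
// Sketch: the two additions the F* backend treats differently.
fn strict_sum(a: u8, b: u8) -> u8 {
    a + b // extracts to `+!`: F* demands a proof that a + b fits in a u8
}

fn modular_sum(a: u8, b: u8) -> u8 {
    a.wrapping_add(b) // extracts to `+.`: modular arithmetic, no obligation
}
```

The distinction matters later: `+!` matches what `+` does in debug builds, not what it does in release builds.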

The pipeline’s feature-witness system ensures structural correctness (the output has the right shape) but not semantic preservation (the output has the same meaning). For hax’s verification claims to hold, three unverified components must be correct: the 35 OCaml transformation phases, the F* proof libraries (which axiomatize 113 operations via assume val—roughly 19% of the integer model), and user annotations (#[opaque], assume!, fstar::replace, verification_status(lax)).

For code that fits its extraction model—deterministic for loops over bounded ranges, pure arithmetic with explicit bounds, array transformations with statically known indices—hax provides genuine verification. Every +! and *! generates a real overflow proof obligation. Bounded for loops extract to a defined (not axiomatized) fold_range combinator with real loop invariants. And ensures clauses are proof obligations, not axioms: if the function body violates the postcondition, F* rejects the code. We confirmed this experimentally.
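A hedged sketch (ours) of code squarely inside this extraction model:

```rust
// A deterministic `for` loop over a bounded range with pure, bounded
// arithmetic and statically known indices. The loop extracts to the
// defined `fold_range` combinator, and each `+` and `*` carries an
// overflow obligation F* can discharge (8 * 255^2 fits easily in a u32).
fn sum_of_squares(xs: &[u8; 8]) -> u32 {
    let mut acc: u32 = 0;
    for i in 0..8 {
        acc += u32::from(xs[i]) * u32::from(xs[i]);
    }
    acc
}
```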

The gaps arise elsewhere.

The Gaps and Their Exploits

Our proof-of-concept exploits fall into three gradations: facade gaps (E1, E3, E4), where the F* model actively diverges from Rust; a conditional gap (E2), where the divergence depends on the compilation mode; and a scope gap (E5), where the model is faithful but cannot cover a critical property.

E1: ML-DSA Rejection Sampling (Facade Gap)

The gap: while-loop ghosting. When a while loop lacks a loop_decreases! annotation, hax generates a fuel function returning constant 0. The F* while_loop combinator requires fuel to strictly decrease on each iteration (fuel(next) < fuel(current))—with constant fuel 0, this obligation becomes $0 < 0$, which is $\mathsf{False}$. The loop body is proof-inert: it exists syntactically but cannot support any verification claim.

The exploit. ML-DSA signing (FIPS 204) uses rejection sampling: the signer computes $\mathbf{z} = \mathbf{y} + c\mathbf{s}_1$ and retries until $\|\mathbf{z}\|_\infty < \gamma_1 - \beta$. This loop is what makes ML-DSA signatures zero-knowledge. Our implementation is straightforward:

pub fn sign_message(s1: &SecretS1, seed: u64) -> SignatureZ {
    let mut nonce: u32 = 0;
    let mut z = generate_mask(seed, nonce);
    z = compute_z(&z, s1);
    while !check_norm_bound(&z) { // rejection sampling
        nonce = nonce.wrapping_add(1);
        z = generate_mask(seed, nonce);
        z = compute_z(&z, s1);
    }
    z
}

The code compiles, the rejection loop runs correctly, and sign-then-verify tests pass. But the extracted F* tells a different story:

let sign_message (s1: t_SecretS1) (seed: u64) =
 let z = compute_z (generate_mask seed 0) s1 in
 let (nonce, z) =
  Rust_primitives.Hax.while_loop
   (fun _ -> true)                        (* invariant *)
   (fun (_, z) -> ~.(check_norm_bound z)) (* condition *)
   (fun _ -> Int.from_machine (mk_u32 0)) (* fuel = 0 *)
   (0, z)                                 (* accumulator *)
   (fun (nonce, z) ->                     (* body *)
    let nonce = wrapping_add nonce 1 in
    let z = compute_z (generate_mask seed nonce) s1
    in (nonce, z))
 in z

The fuel argument is the problem: the fuel function returns constant 0, so the body’s refinement type requires $0 < 0$—impossible. Z3 cannot use the body to prove anything about how the accumulator changes. The Rust code correctly rejects candidates, but F* has no way to verify this.

ML-DSA’s security relies on the Fiat-Shamir with Aborts paradigm: rejection sampling ensures that $\mathbf{z}$’s distribution is statistically independent of $\mathbf{s}_1$. Without this property, lattice techniques can recover the secret key from approximately 1000 signatures. The verification is supposed to cover exactly this—and it cannot.
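The rejected property itself is easy to state. A hypothetical sketch of the bound check (constant values are ours, taken from the ML-DSA-44 parameter set in FIPS 204):

```rust
// Accept z only if every coefficient's magnitude stays below gamma1 - beta.
const GAMMA1: i32 = 1 << 17; // gamma1 for ML-DSA-44
const BETA: i32 = 78;        // tau * eta = 39 * 2 for ML-DSA-44

fn check_norm_bound(z: &[i32]) -> bool {
    z.iter().all(|&c| c.abs() < GAMMA1 - BETA)
}
```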

A stronger variant. Because the while loop is proof-inert, F* cannot distinguish a correct rejection loop from a broken one. An adversarial developer could invert the loop condition—while check_norm_bound(&z) instead of while !check_norm_bound(&z)—causing the function to return the first candidate that fails the norm bound, directly leaking $\mathbf{s}_1$. The extracted F* would be structurally identical: the same while_loop call with the same fuel=0, and the same inability to prove anything about the output. A conforming verifier would eventually reject some of the resulting signatures (since FIPS 204 Algorithm 8 checks the same bound), so functional testing would catch it—but F* provides no earlier warning. The verification facade is the same whether the underlying code is correct or broken.

E2: ML-KEM Barrett Overflow (Conditional Gap)

The gap: deployment model divergence. Hax translates Rust’s + to F*’s +! (strict, panics on overflow), matching debug-mode semantics. In release mode, overflow wraps silently instead.

The exploit. Our proof-of-concept implements ML-KEM Barrett reduction with correct preconditions. The code compiles, passes debug-mode tests, and extracts to F* where arithmetic becomes strict. The F* proof establishes: “if the precondition holds, the result is correct; if not, the program panics.” But release mode introduces a third possibility—precondition violated, no panic, silently wrong result—that the F* model does not represent. Incorrect Barrett reductions corrupt NTT polynomial arithmetic, breaking ML-KEM’s IND-CCA security.

This is a conditional gap: a correct implementation that always reduces after each multiplication stays within bounds. The divergence requires a caller that skips intermediate reduction—but such callers exist in the wild.
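For reference, a sketch of the reduction in question (constants follow FIPS 203’s $q = 3329$; the code is ours, not the verified implementation):

```rust
const FIELD_MODULUS: i32 = 3329;       // q in FIPS 203
const BARRETT_MULTIPLIER: i64 = 20159; // round(2^26 / q)
const BARRETT_SHIFT: u32 = 26;

// Barrett reduction with the precondition |value| < 2^26. The i64 widening
// keeps the main product exact; the precondition is what bounds the result.
// The F* model treats the arithmetic as strict, so a precondition violation
// means "panic" in the model—but release builds can wrap silently instead.
fn barrett_reduce(value: i32) -> i32 {
    let t = (i64::from(value) * BARRETT_MULTIPLIER + (1 << (BARRETT_SHIFT - 1)))
        >> BARRETT_SHIFT;
    value - (t as i32) * FIELD_MODULUS
}
```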

E3: Ed25519 Clamping (Facade Gap)

The gap: assumption laundering. assume!(P) compiles to () in Rust but introduces $P$ as an axiom in F*. A false assumption introduces $\mathsf{False}$ into the proof context, making every subsequent proof trivially satisfiable via ex falso quodlibet.

The exploit. Ed25519 clamps the secret scalar: clear bits 0–2, clear bit 255, set bit 254. Our proof-of-concept introduces a subtle bug: | (OR) instead of & (AND), setting bit 255 instead of clearing it. An assume! call immediately after asserts the bit is clear:

pub fn clamp_scalar(secret: &Scalar) -> Scalar {
    let mut s = *secret;
    s[0] = s[0] & 0xF0;
    hax_lib::assume!((s[0] & 0x07) == 0);
    // BUG: | 0x80 SETS bit 255 (should be & 0x7F)
    s[31] = s[31] | 0x80;
    // assume! LIES: claims bit 255 is clear
    hax_lib::assume!((s[31] & 0x80) == 0);
    s[31] = s[31] | 0x40;
    hax_lib::assume!((s[31] & 0x40) != 0);
    s
}

In the extracted F*, the update of s[31] computes s[31] | 0x80 (setting bit 255), while the axiom extracted from the assume! that follows states (s[31] & 0x80) = 0 (bit 255 is clear). The contradiction introduces $\mathsf{False}$, and from that point forward every postcondition in the function “verifies” vacuously. The developer sees green checks; the verification is meaningless.

The Rust code still produces valid Ed25519 signatures—the curve math works for any scalar—so sign-then-verify tests pass. Only tests that check the clamped scalar directly would catch this. With bit 255 set, the scalar may exceed the group order $L$, and the effective signing key becomes $s \bmod L$—a different key than intended, enabling key-substitution attacks.
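Such a direct check is cheap. For contrast with the buggy version above, a hedged sketch of correct RFC 8032 clamping (ours, not taken from any particular library):

```rust
// Correct Ed25519 scalar clamping per RFC 8032.
fn clamp_scalar_correct(mut s: [u8; 32]) -> [u8; 32] {
    s[0] &= 0xF8;   // clear bits 0-2
    s[31] &= 0x7F;  // clear bit 255
    s[31] |= 0x40;  // set bit 254
    s
}
```

A unit test asserting (s[31] & 0x80) == 0 on the clamp’s output, rather than round-tripping sign-then-verify, catches the inverted mask immediately.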

E4: ML-KEM SampleNTT (Facade Gap)

The gap: while-loop ghosting, compounded. This exploit combines the fuel=0 bug from E1 with a second problem: early return inside the loop body forces hax to use while_loop_return, an assume val combinator with no implementation at all:

assume val while_loop_return #acc_t #ret_t
  (inv: acc_t -> Type0)
  (condition: (c:acc_t {inv c}) -> bool)
  (fuel: (a:acc_t -> nat))
  (init: acc_t)
  (f: (acc_t -> ControlFlow ... acc_t))
  : ControlFlow ret_t acc_t

The exploit. ML-KEM’s SampleNTT is a rejection sampling loop that accepts coefficients only if they are less than $q = 3329$. Our implementation includes an early return None when randomness is exhausted. The extracted F* is doubly proof-inert: the combinator is an axiom Z3 cannot reason about, and the fuel is constant 0.

The property “all sampled coefficients are less than $q$” is essential—coefficients $\geq q$ break the ring structure on which ML-KEM’s security rests. F* cannot verify this property. Even a correct loop_decreases! annotation would not help, because while_loop_return has no body for Z3 to unfold.
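For reference, a sketch of the sampling shape in question (the code is ours; the 12-bit parsing follows FIPS 203 Algorithm 7):

```rust
const Q: u16 = 3329; // ML-KEM modulus, FIPS 203

// Parse consecutive 12-bit candidates from the byte stream and keep only
// those below Q. An early `return` inside the loop, like the one below, is
// the pattern that pushes hax onto the axiomatized ControlFlow combinators.
fn sample_ntt(bytes: &[u8]) -> Option<[u16; 256]> {
    let mut out = [0u16; 256];
    let mut filled = 0;
    for chunk in bytes.chunks_exact(3) {
        let d1 = u16::from(chunk[0]) | (u16::from(chunk[1] & 0x0F) << 8);
        let d2 = (u16::from(chunk[1]) >> 4) | (u16::from(chunk[2]) << 4);
        for d in [d1, d2] {
            if d < Q && filled < 256 {
                out[filled] = d;
                filled += 1;
            }
        }
        if filled == 256 {
            return Some(out); // early return from inside the loop
        }
    }
    None // randomness exhausted before 256 coefficients were accepted
}
```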

E5: ChaCha20 Rotations (Scope Gap)

The gap: proof library axioms. The proof library axiomatizes operations that could be defined—including bit rotations—via assume val with no postconditions. F* knows only the return type, not what the function computes.

The exploit. ChaCha20 quarter rounds use left rotation by 16, 12, 8, and 7 bits. Our implementation is correct—it passes standard test vectors. The extracted F* shows the rotation amounts explicitly:

impl_u32__rotate_left (state.[ d ]) (mk_u32 16)
impl_u32__rotate_left (state.[ b ]) (mk_u32 12)
impl_u32__rotate_left (state.[ d ]) (mk_u32 8)
impl_u32__rotate_left (state.[ b ]) (mk_u32 7)

But rotate_left is declared as assume val impl_u32__rotate_left': x: u32 -> n: u32 -> u32—no postcondition. Z3 cannot distinguish rotate_left x 16 from rotate_left x 15. An implementation with amounts 15, 11, 7, 6—each off by one, producing a weak cipher—would “verify” identically.

This is a scope gap, not a facade gap: the model is faithful, but the verification cannot cover the core cryptographic property. E1 and E4 show models that are wrong; E5 shows a model that is right but incomplete. The axiomatization affects every primitive that uses rotations: SHA-256, ChaCha20, AES. HACL*’s Lib.IntTypes demonstrates that defining rotations via shift-and-or is tractable—the current library chose axiomatization for simplicity.
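Defining the rotation instead of axiomatizing it takes a few lines. A sketch of the shift-and-or construction (in Rust for illustration; the F* version would carry the same definition in place of the assume val):

```rust
// Definable (non-axiomatic) left rotation for 0 < n < 32.
fn rotate_left_u32(x: u32, n: u32) -> u32 {
    debug_assert!(n > 0 && n < 32); // avoid the undefined shift by 32
    (x << n) | (x >> (32 - n))
}
```

With a definition like this, Z3 can distinguish a rotate by 16 from a rotate by 15, closing the scope gap for ChaCha20’s quarter round.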

Classification and Verification Results

Gap                        Class   Type         Exploits
While-loop fuel=0          I-A     Bug          E1, E4
Reference erasure          I-B     Fundamental  (none)
Debug/release divergence   I-C     Engineering  E2
Opaque function trust      II-A    Feature      (none)
Proof library axioms       II-B    Engineering  E5
Unmodeled Drop/RAII        II-C    Fundamental  (none)
assume!                    III-A   Feature      E3

We ran the F* typechecker (v2025.03.25) on all five extracted files. In full verification mode, three pass completely: E2, E3, and E5 all report “All verification conditions discharged successfully.” These are the strongest findings: F* fully verifies the code, yet the security gaps are real and undetected. E1 and E4 fail full verification due to an F* solver limitation with inner let rec definitions—a failure unrelated to our exploits. In practice, projects using hax handle while-loop modules via ADMIT_MODULES, accepting them without full verification.

The while-loop fuel=0 default is a one-line fix; we notified the hax maintainers prior to publication. The proof library axioms could be replaced with definitions. Reference erasure and unmodeled Drop are fundamental—they would require extraction at the MIR level, as Aeneas does. Beyond these, the backend should offer a release-mode arithmetic option (+. instead of +!), and assume! should check consistency against the proof context via a Z3 query before injecting axioms.

The Full Picture

Our previous work documented code outside the verified perimeter: wrong specifications, false proofs, admitted modules. The verification facade concerns the perimeter itself. Even for code that is extracted and whose proofs are dispatched to Z3, the verified model can diverge from the deployed code—because while loops are silently proof-inert, because release-mode arithmetic differs from the model, because assume! can poison the proof context, and because axiomatized operations have no semantic content for Z3 to reason about.

These gaps have direct FIPS implications. ML-KEM and ML-DSA implementations verified via hax are deployed where FIPS compliance is relevant. Rejection sampling in both algorithms uses while loops. Production modules compile in release mode. The verification provides less assurance than it appears to.

Together with our earlier findings, the full picture is now this. The specifications are wrong. The proofs are false or never checked. The build system defaults to admitting everything. The unverified wrappers contain implementation defects. The TLS implementations reject valid signatures and accept certificates from anyone. And now: even when verification is performed, the model that F* verifies can diverge from the program that Rust executes.

Hax is an active research project that makes genuine contributions to high-assurance cryptography. For pure, bounded-loop functions with precise specifications, it provides real verification. Our critique targets structural limitations of the extraction approach, not the project’s intent. But the translation pipeline is part of the trusted computing base, and verification claims for extracted code require the same qualification and defense-in-depth practices that the formal methods community has long advocated—and that Cryspen has consistently failed to provide.
