Today we’re releasing hpke-ng, a clean-slate Rust implementation of HPKE (RFC 9180). It is published under Apache-2.0 OR MIT, and you can install it now with cargo add hpke-ng.
Across 44 head-to-head benchmarks against hpke-rs — currently the most widely deployed Rust HPKE library — hpke-ng wins 16, ties 25, and loses 3, with the wins concentrated in the KEM encap/decap path, the single-shot open path, and the ChaCha20 setup paths.
Both libraries call the same RustCrypto primitive crates underneath, so the cryptography itself is identical. Both pass the full RFC 9180 known-answer-test set. Both produce byte-identical wire output to each other on every ciphersuite we differentially tested. The wins are gains in framing, monomorphization, allocation behaviour, and dispatch — not in the underlying crypto math. That is the point: the math is a solved problem, and the surrounding library is where the engineering still has slack. The 25 ties tell you something too — for the AEAD-bound rows where both libraries converge to the speed of the underlying primitive, hpke-ng matches that ceiling without overhead.
Why we built another HPKE library
Earlier this year we found and reported two security bugs in hpke-rs. The first was a missing RFC 9180 §7.1.4 zero-shared-secret check: a low-order or identity public key forces the X25519 shared secret to all zeros, after which the rest of the key schedule becomes deterministic and predictable to anyone who knows the static recipient key. The fix is a single comparison against zero, which RFC 9180 explicitly requires; it was missing. The second was a u32 sequence-counter that silently wrapped in release builds, reusing nonces after 2³² messages. Nonce reuse in AEAD is catastrophic — for AES-GCM it leaks the authentication key; for ChaCha20-Poly1305 it leaks plaintext via XOR — and the wrap was happening below the type system, in a release build, with no diagnostic. Both are now fixed. We documented the broader pattern of bugs we kept finding in libraries marketed under “high assurance” branding in February.
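To make the second bug concrete, here is a minimal sketch of the hazard (our own illustrative code, not hpke-rs's source): in release builds Rust disables integer overflow checks by default, so an unchecked `seq += 1` on a `u32` compiles to a silent wrap, and whatever nonce is derived from `seq` starts repeating from zero.

```rust
// Sketch of the hazard: in release builds, overflow checks are off,
// so an unchecked `seq += 1` on a u32 wraps silently after 2^32
// messages, and the nonce derived from `seq` repeats from the start.
fn bump(seq: &mut u32) -> u32 {
    let current = *seq;
    *seq = seq.wrapping_add(1); // what unchecked `+= 1` does in release mode
    current
}

fn main() {
    let mut seq = u32::MAX;
    assert_eq!(bump(&mut seq), u32::MAX); // message 2^32 - 1: the last fresh value
    assert_eq!(bump(&mut seq), 0);        // message 2^32: value 0 reused, silently
}
```

Nothing in the type system flags the wrap; the only defenses are a wider counter, an explicit check, or both.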
That experience — together with the day-to-day frictions of integrating HPKE through a library that wasn’t built to make those bug classes structurally impossible — is what got us thinking about a rewrite. Three frictions in particular kept showing up.
The provider abstraction. hpke-rs is structured as a generic library over an HpkeCrypto trait, with two backend implementations — RustCrypto and libcrux — shipped as separate crates. The abstraction is real engineering and serves a real purpose: it lets a deployment swap the underlying crypto stack without touching call sites. The cost is that every primitive call goes through a trait dispatch, every Hpke value carries a 320-byte instance struct (most of it a 256-byte ChaCha20 PRNG state), and the workspace is four crates instead of one.
The struct-owned PRNG. Hpke<Crypto>::new constructs and stores a PRNG. That PRNG is reused across operations, which is fine functionally but creates a subtle aliasing hazard: cloning an Hpke does not clone the PRNG state — per the rustdoc — so a careless clone can reset randomness in ways that aren’t visible at the call site. This is the kind of footgun whose damage is invisible until the day it isn’t.
Option<&[u8]> for required-by-mode arguments. hpke.seal(&pk, info, aad, pt, None, None, None) is the canonical Base-mode call. The three Nones are the PSK, PSK ID, and sender static key — all required in the Auth or AuthPsk modes. The single seal method accepts every mode by making mode-specific arguments optional, which means the type system can’t tell you that you’ve built a Base-mode call with a PSK supplied.
These are not catastrophic problems. They’re the kind of small persistent costs you stop noticing until you build the alternative.
The shape change: enum dispatch becomes type-state
The single biggest design difference between hpke-ng and hpke-rs is what carries the ciphersuite. In hpke-rs, the ciphersuite is four runtime enums (Mode, KemAlgorithm, KdfAlgorithm, AeadAlgorithm) constructed at Hpke::new time. In hpke-ng, the ciphersuite is the type itself — Hpke<DhKemX25519HkdfSha256, HkdfSha256, ChaCha20Poly1305> — and the struct body is PhantomData<(K, F, A)>. Zero bytes at runtime; everything resolved at the call site by the compiler.
```rust
// hpke-rs
let mut hpke = Hpke::<HpkeRustCrypto>::new(
    Mode::Base,
    KemAlgorithm::DhKem25519,
    KdfAlgorithm::HkdfSha256,
    AeadAlgorithm::ChaCha20Poly1305,
);
let kp = hpke.generate_key_pair()?;
let (sk, pk) = kp.into_keys();
let (enc, ct) = hpke.seal(
    &pk, info, aad, pt,
    None, None, None, // psk, psk_id, sk_s
)?;
```
```rust
// hpke-ng
type Suite = Hpke<
    DhKemX25519HkdfSha256,
    HkdfSha256,
    ChaCha20Poly1305,
>;
let mut os = OsRng;
let mut rng = os.unwrap_mut();
let (sk, pk) = DhKemX25519HkdfSha256::generate(&mut rng)?;
let (enc, ct) = Suite::seal_base(
    &mut rng, &pk, info, aad, pt,
)?;
```
The visible consequence is the call site. The invisible consequence is what the compiler can rule out:
| You write | hpke-rs (runtime) | hpke-ng (compile time) |
| --- | --- | --- |
| `Hpke::<XWingDraft06, _, _>::seal_auth(...)` | `Error::UnsupportedKemOperation` | `K: AuthKem` not satisfied |
| `Hpke::<_, _, ExportOnly>::seal_base(...)` | `HpkeError::InvalidConfig` | no `seal_base` on `ExportOnly` |
| wrong `KemAlgorithm` for a private key | runtime error | the key type carries its KEM |
| `setup_*` with a `Some(psk)` argument | `HpkeError::UnnecessaryPsk` | `seal_base` has no PSK parameter |

Each row is a runtime error in hpke-rs and a compiler diagnostic in hpke-ng. The X-Wing/`seal_auth` row is the cleanest example: X-Wing is a KEM, but it isn't a Diffie-Hellman KEM, so it has no notion of authenticated encapsulation. In hpke-rs, calling `seal_auth` on an X-Wing-configured `Hpke` returns `Error::UnsupportedKemOperation` at runtime. In hpke-ng, the trait bound on `seal_auth` requires `K: AuthKem`, and `XWingDraft06` does not implement `AuthKem` — so the call doesn't compile. The same shape applies to the other rows.
This isn’t theoretical. We’ve seen each of those four shapes in production code reviews — typically as a stale match arm catching the runtime error and turning it into a generic 500. Surfacing them as compile errors deletes a class of code path entirely.
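The mechanism is ordinary Rust: put mode-specific methods in `impl` blocks with stricter trait bounds. A minimal, self-contained sketch of the pattern (the trait and type names mirror the post, but the bodies are stand-ins, not hpke-ng's real implementation):

```rust
use std::marker::PhantomData;

// Stand-in KEM traits: every KEM can encapsulate, but only DH-based
// KEMs implement AuthKem (authenticated encapsulation).
trait Kem {
    fn encap() -> Vec<u8>;
}
trait AuthKem: Kem {}

struct DhKemX25519HkdfSha256;
impl Kem for DhKemX25519HkdfSha256 {
    fn encap() -> Vec<u8> { vec![0; 32] } // X25519 encap output is 32 bytes
}
impl AuthKem for DhKemX25519HkdfSha256 {}

struct XWingDraft06;
impl Kem for XWingDraft06 {
    fn encap() -> Vec<u8> { vec![0; 1120] } // X-Wing encap output is 1120 bytes
}
// Note: no `impl AuthKem for XWingDraft06` — X-Wing has no auth mode.

// Zero-sized carrier: the ciphersuite lives entirely in the type.
struct Hpke<K>(PhantomData<K>);

impl<K: Kem> Hpke<K> {
    fn seal_base() -> Vec<u8> { K::encap() } // available for every KEM
}

impl<K: AuthKem> Hpke<K> {
    fn seal_auth() -> Vec<u8> { K::encap() } // only for AuthKem types
}

fn main() {
    let enc = Hpke::<XWingDraft06>::seal_base(); // compiles: any KEM
    assert_eq!(enc.len(), 1120);
    let enc = Hpke::<DhKemX25519HkdfSha256>::seal_auth(); // compiles: AuthKem
    assert_eq!(enc.len(), 32);
    // Hpke::<XWingDraft06>::seal_auth();
    // ^ rejected at compile time: `XWingDraft06: AuthKem` is not satisfied.
}
```

The commented-out line is the point: the invalid call is unrepresentable, so there is no runtime error arm to write, test, or mishandle.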
Feature parity
Before going further: the obvious skeptical question is what got cut? The answer is one thing, deliberately, and it’s the thing that buys the wins everywhere else.
hpke-ng supports every ciphersuite hpke-rs's RustCrypto provider supports — six DH-based KEMs, three post-quantum KEMs, three KDFs, four AEADs (including `ExportOnly`), all four HPKE modes — and passes the full RFC 9180 KAT vector set, the same set run against both libraries. There are 14 typed `HpkeError` variants instead of hpke-rs's 11, and four classes of operation that are compile errors instead of runtime errors. On every dimension that affects what your application can express, hpke-ng is a superset.
The one thing hpke-rs has that hpke-ng deliberately doesn’t: a pluggable crypto provider. hpke-rs ships an HpkeCrypto trait with two backend implementations, RustCrypto and libcrux; a deployment can swap one for the other at compile time. hpke-ng ships one provider, RustCrypto, and removes the abstraction. That’s the load-bearing tradeoff. It’s why Hpke<...> is zero-sized, why Context::seal is monomorphized rather than dispatched, why the workspace is one crate, and why the canonical call site is five arguments instead of seven.
The libcrux half of that tradeoff is, in our view, not a feature worth chasing. Our February audit of Cryspen’s “formally verified” libraries found undisclosed silent cryptographic failures in libcrux-ml-dsa (platform-dependent SHA-3 corruption that was patched without a public advisory), entropy-reducing pre-hash clamping in Ed25519, and a denial-of-service panic in libcrux-psq’s AES-GCM decryption path. Cryspen’s public response acknowledged the findings without retracting the “highest level of assurance” marketing language attached to the library; our follow-up analysis and an independent eprint paper have since surfaced further specification deviations in libcrux-ml-dsa. The “formally verified” framing that draws users to libcrux does not, on the evidence we have assembled, describe the engineering you are getting. If you’re using RustCrypto anyway — which is what hpke-ng does by default and what most hpke-rs deployments do in practice — the provider abstraction is paying rent for a backend you would be wise to avoid.
Speed
The full benchmark protocol lives in the hpke-ng repository and is reproducible with `cargo bench --features comparative`: criterion harness, sample size 40–60, 2–3 second measurement window, `RUSTFLAGS="-C target-cpu=native"`, `lto = "thin"`, `codegen-units = 1`. Apple Silicon M-series, macOS. The numbers below are the median across two independent bench runs; we'll tell you up front where the wins are and where the libraries are tied.
The wins concentrate in three places: the KEM encap/decap path, the single-shot open path, and the ChaCha20 setup paths. These are the rows where hpke-rs’s per-call overhead — trait dispatch, allocator pressure, an enum match per primitive — adds measurable framing cost on top of the underlying crypto. Start with the KEM operations on X25519:
KEM operations · X25519
The biggest deltas in the entire dataset live in this chart. `encap` and `decap` are 21–22% faster on hpke-ng — and these aren't framing wins; they're a direct consequence of structure: hpke-ng's labeled-KDF path is monomorphized over the concrete KDF type, so the chain of `LabeledExtract` + `LabeledExpand` calls that wraps the raw Diffie-Hellman compiles to direct calls, while hpke-rs routes each one through its enum-dispatched provider trait. The `generate` row goes the other way (+7%, single-call plumbing around the same `x25519-dalek::StaticSecret` constructor); we've left it on the chart because hiding it would be dishonest, and it's a one-call-per-keypair operation that doesn't appear on any hot path.
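For readers unfamiliar with the labeled wrappers: RFC 9180 §4 defines `LabeledExtract` and `LabeledExpand` as plain HKDF calls over concatenated byte strings, so once the KDF type is known at compile time the whole chain is straight-line code. A sketch of the input layout (helper names are ours; the HKDF call itself is elided):

```rust
// RFC 9180 §4: LabeledExtract ikm = "HPKE-v1" || suite_id || label || ikm
fn labeled_ikm(suite_id: &[u8], label: &[u8], ikm: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(7 + suite_id.len() + label.len() + ikm.len());
    out.extend_from_slice(b"HPKE-v1");
    out.extend_from_slice(suite_id);
    out.extend_from_slice(label);
    out.extend_from_slice(ikm);
    out
}

// RFC 9180 §4: LabeledExpand info = I2OSP(L, 2) || "HPKE-v1" || suite_id || label || info
fn labeled_info(len: u16, suite_id: &[u8], label: &[u8], info: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&len.to_be_bytes()); // output length, big-endian, 2 bytes
    out.extend_from_slice(b"HPKE-v1");
    out.extend_from_slice(suite_id);
    out.extend_from_slice(label);
    out.extend_from_slice(info);
    out
}

fn main() {
    // suite_id for DHKEM(X25519, HKDF-SHA-256) is "KEM" || I2OSP(0x0020, 2).
    let suite_id = [b'K', b'E', b'M', 0x00, 0x20];
    let ikm = labeled_ikm(&suite_id, b"eae_prk", &[0u8; 32]);
    assert!(ikm.starts_with(b"HPKE-v1KEM"));
    let info = labeled_info(32, &suite_id, b"shared_secret", b"");
    assert_eq!(&info[..2], &[0, 32]);
}
```

When the concrete KDF is a type parameter, the compiler can inline these concatenations and the two HKDF calls into the encap path directly; with an enum-dispatched provider, each call crosses a dispatch boundary instead.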
Setup is the combined fixed cost paid every time a sender or receiver context is constructed: KEM op + key schedule + Context allocation. The wins here are consistent over ChaCha20:
Setup paths · sender / receiver / PSK
- sender (Base): −12%
- receiver (Base): −8%
- sender (PSK): −11%
- sender (Base): −8%
The single-shot open path — `setup_receiver` + `Context::open` for one message — is where hpke-ng wins most consistently across payload size. Six rows spanning four orders of magnitude, and every row goes to hpke-ng:
Single-shot open · X25519 + HKDF-SHA-256 + ChaCha20-Poly1305
The AES-128-GCM seal sweep shows a smaller, more uniform pattern: hpke-ng is consistently 4–6% ahead at small and mid payloads, then converges to tied as the AEAD primitive starts to dominate the per-call cost:
Single-shot seal · X25519 + HKDF-SHA-256 + AES-128-GCM
The X25519+ChaCha20 seal sweep — the same shape but with a different AEAD — comes out tied across every payload size. That's not an hpke-ng loss; it's the chart of a primitive both libraries call identically, where the framing overhead gets amortized away once the message is more than a few hundred bytes. The most diagnostic single benchmark in the suite is post-setup `Context::seal` at the two ends of that spectrum:
This is the chart we kept coming back to during development. At 64 bytes, where framing dominates, hpke-ng is 11% faster: 225 ns versus 253 ns. The framing path inside hpke-ng’s Context::seal is a fixed-size 12-byte stack array for the nonce, an XOR loop, and a direct AEAD call — no allocations. hpke-rs allocates a fresh Vec<u8> per nonce computation.
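The allocation-free nonce path is small enough to sketch in full. RFC 9180 §5.2 defines the per-message nonce as `base_nonce XOR I2OSP(seq, Nn)`; with a fixed-size array there is nothing to allocate (our own illustrative code, matching the shape described above rather than hpke-ng's literal source):

```rust
// RFC 9180 §5.2: nonce = base_nonce XOR I2OSP(seq, Nn), here with Nn = 12.
// A fixed-size array on the stack: no Vec, no heap traffic per message.
fn compute_nonce(base_nonce: &[u8; 12], seq: u64) -> [u8; 12] {
    let mut nonce = *base_nonce;
    // XOR the big-endian 8-byte counter into the last 8 of the 12 bytes.
    for (n, s) in nonce[4..].iter_mut().zip(seq.to_be_bytes()) {
        *n ^= s;
    }
    nonce
}

fn main() {
    let base = [0xAAu8; 12];
    assert_eq!(compute_nonce(&base, 0), base); // seq 0: nonce == base_nonce
    let n1 = compute_nonce(&base, 1);
    assert_eq!(n1[11], 0xAA ^ 0x01);           // only the low byte flips
    assert_ne!(compute_nonce(&base, 1), compute_nonce(&base, 2));
}
```

At 64-byte payloads this is the difference the benchmark measures: the XOR loop and a direct AEAD call versus the same work plus a heap allocation per message.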
At 16 KiB, where the AEAD primitive dominates, both libraries converge to identical wall time. They share the same primitive crate, so this is the ceiling: the wall-clock cost of ChaCha20Poly1305::encrypt_in_place_detached on this hardware, which neither library can dip below because both are calling the same code. hpke-ng matches it exactly — there’s no overhead left to remove, and the framing is as thin as the standard allows.
Memory
The size-of comparison is the cleanest single visualization in the dataset:
- `sizeof(Hpke<K, F, A>)`: −320 B (zero-sized)
- `sizeof(Context<K, F, A>)`: −320 B (−80%)

`hpke-ng::Hpke<K, F, A>` is `PhantomData<(K, F, A)>`. There is no runtime presence; it costs zero bytes, and `cargo expand` confirms the compiler optimizes it out completely. `hpke-rs::Hpke<Crypto>` carries a 256-byte ChaCha20 PRNG plus four enum discriminants and padding.
Context is the more interesting comparison because both libraries carry actual state — the AEAD key, base nonce, exporter secret, and sequence number. hpke-ng is 80 bytes against hpke-rs’s 400. The 320-byte gap is mostly the per-Context PRNG that hpke-rs’s Context keeps for any future encap operations, plus the trait-object overhead for the provider.
Practical impact: an application that holds a thousand long-lived HPKE contexts — a server with persistent client sessions, a relay, MLS group state — saves 320 bytes of resident memory just from this one type. Not enormous in absolute terms, but every byte of permanent state is a byte of working set, and working set determines cache behaviour.
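The zero-sized claim is checkable in a few lines. A sketch with stand-in types (field sizes chosen for illustration, not hpke-ng's exact layout):

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// Stand-in ciphersuite marker types.
struct DhKemX25519HkdfSha256;
struct HkdfSha256;
struct ChaCha20Poly1305;

// PhantomData-only carrier: the ciphersuite exists purely at the type level.
struct Hpke<K, F, A>(PhantomData<(K, F, A)>);

// A Context-like struct carrying only the state HPKE actually needs:
// AEAD key, base nonce, exporter secret, sequence counter. No PRNG,
// no trait object.
struct Context {
    key: [u8; 32],
    base_nonce: [u8; 12],
    exporter_secret: [u8; 32],
    seq: u64,
}

fn main() {
    // PhantomData is zero-sized, so the whole struct is too.
    assert_eq!(
        size_of::<Hpke<DhKemX25519HkdfSha256, HkdfSha256, ChaCha20Poly1305>>(),
        0
    );
    // 32 + 12 + 32 + 8 = 84 bytes of payload, padded to u64 alignment.
    assert!(size_of::<Context>() <= 88);
}
```

The exact `Context` size depends on the suite's key and secret lengths; the structural point is that nothing beyond that state is stored per context.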
Smaller
Project surface area
Expanding on the above:
End-user binary, −30%. A minimal application — generate a key, seal a message, open it back, ten lines of Rust — compiled with RUSTFLAGS="-C target-cpu=native", lto="thin", codegen-units=1, strip="symbols". hpke-ng comes in at 392 KB; hpke-rs at 561 KB. 168 KB is not nothing on embedded targets, in WASM bundles, or in CDN-served binaries.
Library code, essentially flat. hpke-ng is 32 lines smaller than hpke-rs at the library level (2,591 vs 2,623), while implementing strictly more — the full HPKE surface plus the optional post-quantum suite (X-Wing draft-06, ML-KEM-768, ML-KEM-1024) that hpke-rs reaches only via experimental feature flags. The type-state design earns its keep here: ciphersuite selection lives entirely in the type system, so there is no provider trait, no per-primitive enum dispatch, and no glue between the two.
Test code, −50%. hpke-rs’s test suite is 167 tests in 2,230 lines; hpke-ng’s is 148 tests in 1,124 lines, with deeper coverage on roundtrips (59 macro-generated (mode, KEM, KDF, AEAD) combinations vs hpke-rs’s 17 hand-written cases). The reduction is structural — the type system carries information that would otherwise be repeated test setup, and the roundtrip! macro generates one test per supported configuration from a single declaration.
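The macro-generated-roundtrip idea is simple enough to sketch. In the real suite each declaration would expand to a `#[test]` function per (mode, KEM, KDF, AEAD) combination; here the "seal" and "open" are byte-reversal stand-ins so the sketch stays self-contained:

```rust
// Stand-in seal/open pair: reversal is its own inverse, so
// open(seal(pt)) == pt, mimicking an HPKE roundtrip's shape.
fn seal(pt: &[u8]) -> Vec<u8> { pt.iter().rev().copied().collect() }
fn open(ct: &[u8]) -> Vec<u8> { ct.iter().rev().copied().collect() }

// One declaration per configuration, one generated check per declaration.
// (The real macro would take mode/KEM/KDF/AEAD types and emit #[test] fns.)
macro_rules! roundtrip {
    ($name:ident, $pt:expr) => {
        fn $name() {
            let pt: &[u8] = $pt;
            assert_eq!(open(&seal(pt)), pt);
        }
    };
}

roundtrip!(base_x25519_sha256_chacha, b"hello");
roundtrip!(psk_p256_sha256_aes128, b"world");

fn main() {
    base_x25519_sha256_chacha();
    psk_p256_sha256_aes128();
}
```

The payoff is that adding a ciphersuite to the test matrix is one declaration, not a copied-and-edited test body, which is how 59 combinations fit in fewer lines than 17 hand-written cases.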
Bench code, essentially flat. Both libraries spend a few hundred lines on benchmarks (777 vs 759). hpke-rs ships per-provider per-primitive bench files (twelve total); hpke-ng ships one comparative bench against hpke-rs and one internal bench. The line count is comparable; the structure is what matters — running cargo bench --features comparative produces every head-to-head number in this post directly, against a real hpke-rs install pulled in as a dev-dependency.
The one place this chart doesn’t reach is the fuzz harnesses. hpke-ng spends substantially more code on cargo-fuzz targets than hpke-rs does — deliberately so. That’s the next section.
Harder
The headline number here is 1.9 seconds. The full hpke-ng test matrix — 148 tests across library unit tests, the RFC 9180 KAT set, generative roundtrips across every ciphersuite × mode combination, and byte-by-byte differential tests against hpke-rs — runs in about 1.9 seconds. The roundtrip layer alone is 59 macro-generated tests covering every supported (mode, KEM, KDF, AEAD) combination, including the post-quantum X-Wing and ML-KEM rows. hpke-rs's KAT runner is structured as a single test that iterates 144 vectors sequentially and takes about 70 seconds. Same vectors, same coverage — hpke-ng just structures them to take advantage of `cargo test`'s thread pool.
For day-to-day development the headline number understates the impact. A 70-second feedback loop is one you avoid running until you’re “done”; a 1.9-second feedback loop is one you run after every save.
The fuzz layer is where hpke-ng makes its biggest investment in lines of code. hpke-rs ships one cargo-fuzz target — a seal/open harness for one ciphersuite. hpke-ng ships four:
- `pk_from_bytes` — fuzzes public-key parsing for all 9 KEMs (X25519, X448, P-256, P-384, P-521, secp256k1, X-Wing, ML-KEM-768, ML-KEM-1024).
- `enc_from_bytes` — fuzzes encapsulated-key parsing for all 9 KEMs.
- `key_schedule` — fuzzes the internal key schedule with arbitrary mode bytes (including invalid `0x04..=0xFF` mode values), arbitrary PSK / PSK-ID combinations, and arbitrary shared secrets.
- `open` — fuzzes `Hpke::open_base` with arbitrary `[encap || ciphertext]` byte splits against a fixed receiver keypair.
The shared invariant across all four is that panics are bugs. Authentication failures, decode errors, length mismatches — all expected outcomes that the harness considers a successful run. A panic, a misaligned-pointer fault, or a debug-assertion failure under cargo-fuzz’s instrumentation is a finding. As of release, all four targets run clean.
There are also several structural footgun-prevention details that don’t show up in the fuzz output but are worth listing.
Context is not Clone. Cloning an HPKE context lets two callers reuse the same (key, base_nonce, seq) triple and produce a nonce-reuse bug — the kind of bug that’s invisible until it isn’t and unrecoverable when it surfaces. hpke-ng’s Context deliberately doesn’t implement Clone; cloning is a compile error.
Context::seal refuses to encrypt at seq == u64::MAX. A pre-check, before nonce computation. This makes nonce-reuse via counter wraparound structurally impossible regardless of how the caller handles a MessageLimitReached error.
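A sketch of the pre-check's shape (our own illustrative code, not hpke-ng's literal source). The key property is that the check happens before any nonce is derived, so the counter can never be observed in a post-wrap state:

```rust
#[derive(Debug, PartialEq)]
enum HpkeError {
    MessageLimitReached,
}

// Pre-check at the top of seal: refuse at u64::MAX before touching
// the nonce, so no code path can ever see a wrapped counter.
fn seal_one(seq: &mut u64, _pt: &[u8]) -> Result<u64, HpkeError> {
    if *seq == u64::MAX {
        return Err(HpkeError::MessageLimitReached);
    }
    let nonce_seq = *seq; // the nonce would be derived from this value
    *seq += 1;            // cannot overflow: seq < u64::MAX here
    Ok(nonce_seq)
}

fn main() {
    let mut seq = u64::MAX - 1;
    assert_eq!(seal_one(&mut seq, b"last message"), Ok(u64::MAX - 1));
    assert_eq!(seal_one(&mut seq, b"one too many"), Err(HpkeError::MessageLimitReached));
    // Repeated calls keep failing; seq never wraps back to 0.
    assert_eq!(seal_one(&mut seq, b"still refused"), Err(HpkeError::MessageLimitReached));
}
```

However the caller handles `MessageLimitReached` — retry loop, new context, crash — the counter stays pinned at the limit and no nonce repeats.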
All-zero shared-secret rejection (RFC 9180 §7.1.4) uses subtle::ConstantTimeEq for X25519 and X448. An attacker who supplies a small-order point and watches for timing variance is one of the more subtle attacks on DHKEM; the constant-time comparison closes it.
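The shape of a constant-time all-zero check, hand-rolled here for illustration (hpke-ng uses the `subtle` crate rather than this code): OR every byte into an accumulator so the loop does identical work regardless of where a nonzero byte appears.

```rust
// Constant-time all-zero check: no early exit, so the loop's timing
// does not depend on the position of the first nonzero byte.
fn is_all_zero_ct(bytes: &[u8]) -> bool {
    let mut acc = 0u8;
    for &b in bytes {
        acc |= b;
    }
    acc == 0
}

fn main() {
    // A low-order/identity peer point forces an all-zero X25519 output;
    // RFC 9180 §7.1.4 requires rejecting it.
    assert!(is_all_zero_ct(&[0u8; 32]));
    let mut shared_secret = [0u8; 32];
    shared_secret[31] = 1;
    assert!(!is_all_zero_ct(&shared_secret));
}
```

A naive `bytes.iter().all(|&b| b == 0)` short-circuits at the first nonzero byte, which is exactly the data-dependent timing the constant-time version avoids.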
AEAD nonce length is enforced at compile time. Context::compute_nonce uses a const assertion that the AEAD’s NONCE_LEN is between 8 and 12 bytes. Any AEAD that violates this is a compile error at the call site that uses it.
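The compile-time enforcement trick can be sketched with an associated-const assertion (the trait and check names here are ours; hpke-ng's internals may differ). The `const` is only evaluated when the generic function is instantiated, so an out-of-range `NONCE_LEN` fails the build at the offending call site:

```rust
use std::marker::PhantomData;

trait Aead {
    const NONCE_LEN: usize;
}

struct ChaCha20Poly1305;
impl Aead for ChaCha20Poly1305 {
    const NONCE_LEN: usize = 12;
}

// Associated-const assertion: evaluated at monomorphization time, so a
// NONCE_LEN outside 8..=12 becomes a compile error, not a runtime panic.
struct NonceLenCheck<A>(PhantomData<A>);
impl<A: Aead> NonceLenCheck<A> {
    const OK: () = assert!(A::NONCE_LEN >= 8 && A::NONCE_LEN <= 12);
}

fn nonce_len<A: Aead>() -> usize {
    let () = NonceLenCheck::<A>::OK; // forces the const to be evaluated
    A::NONCE_LEN
}

fn main() {
    assert_eq!(nonce_len::<ChaCha20Poly1305>(), 12);
    // An AEAD declaring NONCE_LEN = 4 would make nonce_len::<ThatAead>()
    // fail to compile here.
}
```

The runtime cost is zero: after the const evaluates, `nonce_len` is just the constant.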
Interop
We tested interop two ways. The first is RFC 9180 known-answer tests: both libraries are run against the same vendored test vector JSON (8 MB, derived from RFC 9180’s own test vector tooling) and required to produce byte-equal key, base_nonce, exporter_secret, decrypted ciphertexts, and exported values for every vector. The second is byte-by-byte differential testing: a deterministic ChaCha20Rng feeds identical inputs to both libraries; hpke-ng plays sender, hpke-rs plays receiver, and every byte that crosses the wire is asserted equal. Roughly 600 byte-equality assertions per CI run, all passing.
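The differential-testing shape is worth sketching, because the whole technique is just "same seed in, equal bytes out". This toy version uses an xorshift PRNG and two stand-in implementations of the same trivial spec (the real harness uses `ChaCha20Rng` and the two HPKE libraries); everything here is illustrative:

```rust
// Toy deterministic PRNG: same seed, same byte sequence, forever.
struct Xorshift(u64);
impl Xorshift {
    fn next_byte(&mut self) -> u8 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0 as u8
    }
}

// Two independent implementations of the same (trivial) "seal" spec.
// In the real harness these are hpke-ng and hpke-rs.
fn seal_impl_a(key: u8, pt: &[u8]) -> Vec<u8> {
    pt.iter().map(|b| b ^ key).collect()
}
fn seal_impl_b(key: u8, pt: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(pt.len());
    for &b in pt {
        out.push(b ^ key);
    }
    out
}

fn main() {
    // Same seed drives both sides, so both see identical inputs;
    // every output byte is asserted equal.
    let (mut rng_a, mut rng_b) = (Xorshift(42), Xorshift(42));
    for _ in 0..100 {
        let key_a = rng_a.next_byte();
        let key_b = rng_b.next_byte();
        let pt_a: Vec<u8> = (0..16).map(|_| rng_a.next_byte()).collect();
        let pt_b: Vec<u8> = (0..16).map(|_| rng_b.next_byte()).collect();
        assert_eq!(seal_impl_a(key_a, &pt_a), seal_impl_b(key_b, &pt_b));
    }
}
```

Any divergence in how the two sides consume randomness or frame bytes shows up as a failed equality assertion with the exact offset, which is what makes this technique so effective at catching framing bugs that KATs miss.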
Some gaps worth covering, for full disclosure:
Auth and AuthPsk differential. hpke-rs’s seed() injects raw bytes for the base ephemeral; for Auth modes there is also a sender static keypair derived earlier, before any seed-injection happens. Aligning the two libraries’ state for byte-by-byte Auth-mode differential testing would require a deeper hpke-rs API hook than hpke-test-prng exposes. Auth/AuthPsk-mode interop is verified at the KAT layer instead — both libraries pass the X25519+ChaCha20 Auth/AuthPsk vectors, the P-256+AES-128 vectors, and the P-521+AES-256 vectors.
Post-quantum differential. hpke-rs’s X-Wing and ML-KEM implementations use different SHAKE-256 seeding from hpke-ng’s RFC 9180 §7.1.3-compliant derive_key_pair construction, so the libraries produce different ephemeral keys from the same IKM bytes. The encap wire format itself is determined by the underlying KEM crate (which both use), so they should agree at the wire level — but we haven’t tested it in this repository, and we’d rather call that out than pretend otherwise.
Migrate today
Both libraries pass the same KATs against the same primitive crates. If your code uses HPKE through a small wrapper — which most production HPKE code does — switching is mechanical:
```rust
// hpke-rs
let mut hpke = Hpke::<HpkeRustCrypto>::new(
    Mode::Base,
    KemAlgorithm::DhKem25519,
    KdfAlgorithm::HkdfSha256,
    AeadAlgorithm::ChaCha20Poly1305,
);
let kp = hpke.generate_key_pair()?;
let (sk, pk) = kp.into_keys();
let (enc, ct) = hpke.seal(&pk, info, aad, pt, None, None, None)?;
```
```rust
// hpke-ng
type Suite = Hpke<DhKemX25519HkdfSha256, HkdfSha256, ChaCha20Poly1305>;
let mut os = OsRng;
let mut rng = os.unwrap_mut();
let (sk, pk) = DhKemX25519HkdfSha256::generate(&mut rng)?;
let (enc, ct) = Suite::seal_base(&mut rng, &pk, info, aad, pt)?;
```
The most common migration shape: define a type Suite = Hpke<…, …, …> alias once, change hpke.seal calls to Suite::seal_base (or seal_psk / seal_auth / seal_auth_psk per mode), thread an &mut rng through the call sites that need encap entropy, drop the Option placeholders. We’ve done this migration twice on internal codebases; both took under an hour and both produced fewer lines of code on the consumer side.
Get it
```toml
[dependencies]
hpke-ng = "0.1.0-rc.1"
```
- GitHub: github.com/symbolicsoft/hpke-ng
- docs.rs: docs.rs/hpke-ng
- License: Apache-2.0 OR MIT
- MSRV: Rust 1.95 (edition 2024)
If you find a row in our benchmark suite that’s wrong, an interop gap we haven’t documented, or a footgun we missed, file an issue. We’d rather know.