UltrafastSecp256k1 3.50.0
Ultra high-performance secp256k1 elliptic curve cryptography library
Audit-first, high-performance secp256k1 engine for C++ and GPU-scale batch workloads — built independently from scratch for Bitcoin, Ethereum, Silent Payments, threshold signatures (FROST, MuSig2), embedded systems, and reproducible benchmarking. UltrafastSecp256k1 combines optimized CPU arithmetic, a stable multi-backend GPU C ABI, world-first open-source GPU FROST partial verification, constant-time CPU signing paths, HD key derivation (BIP-32/44), Taproot (BIP-340/341), ZK range proofs, and 12+ platform targets including CUDA, OpenCL, Metal, WebAssembly, RISC-V, ESP32, and STM32.
This project is two things at once:

1. A high-performance secp256k1 engine: GPU-accelerated, multi-platform, production-hardened.
2. A continuous, self-evolving audit system: every exploit attempt becomes a permanent regression test.

Security is treated as an ongoing process, not a static document. → How the audit system works
Keywords: secp256k1 GPU · ECDSA batch verify · Schnorr BIP-340 · FROST threshold signatures · MuSig2 · Bitcoin cryptography · CUDA secp256k1 · OpenCL ECC · BIP-352 Silent Payments · constant-time cryptography · embedded ECC · WebAssembly crypto
11.00 M BIP352 scans/s · 4.88 M ECDSA signs/s · 4.05 M ECDSA verifies/s · 3.66 M Schnorr signs/s · 5.38 M Schnorr verifies/s · 1.34 M FROST partial verifies/s · 97.2 M point compressions/s — single GPU (RTX 5060 Ti SM 12.0)
All measurements: RTX 5060 Ti (SM 12.0, CUDA 12), batch=16 384, kernel-only throughput.
| Operation | Previous | Now | Δ |
|---|---|---|---|
| ECDSA Verify (GPU) | 410.1 ns / 2.44 M/s | 246.7 ns / 4.05 M/s | +66 % throughput |
| Schnorr Verify (GPU) | 354.6 ns / 2.82 M/s | 185.9 ns / 5.38 M/s | +91 % throughput |
| FROST Partial Verify (GPU) | — | 748.9 ns / 1.34 M/s | ⭐ New — first open-source GPU FROST |
| Batch Jacobian → Compressed | — | 10.3 ns / 97.2 M/s | ⭐ New kernel |
| BIP-352 Silent Payments (GPU LUT) | 179.2 ns / 5.58 M/s | 91.0 ns / 11.00 M/s | +97 % throughput |
The ECDSA and Schnorr verify speedups come from the Shamir+GLV double-scalar multiplication, INT32 field arithmetic, and warp-level reduction pipeline. FROST partial verify is now callable via the stable C ABI as `ufsecp_gpu_frost_verify_partial_batch()`.
Benchmark reproducibility: All numbers come from pinned compiler/driver/toolkit versions with exact commands and raw logs. See `docs/BENCHMARKS.md` (methodology) and the live dashboard.
Why this library, in depth? See WHY_ULTRAFASTSECP256K1.md for a full breakdown of the audit culture, 24-workflow CI/CD pipeline, graph-assisted review model, formal verification layers, and supply-chain hardening that back these claims.
External auditor prep: run `bash scripts/external_audit_prep.sh` to produce a reproducible auditor-facing bundle with preflight outputs, assurance export, traceability artifacts, and an optional full audit package.
Claim map: Top-level trust claims are keyed in docs/ASSURANCE_LEDGER.md: CPU CT routing (A-001), stable GPU ABI (A-002), cross-backend GPU parity (A-003), benchmark reproducibility (A-004), exploit-audit surface (A-005), graph-assisted review (A-006), open self-audit transparency (A-007), and ROCm/HIP status discipline (A-008).
Quick links: Discord · Benchmarks · Community Benchmarks · Adopters · Build Guide · API Reference · Binding Usage Standard · Security Policy · Threat Model · Assurance Ledger · AI Audit Protocol · **Why This Library?** · Porting Guide · Sponsor
UltrafastSecp256k1 is used by Sparrow Wallet's Frigate.
Frigate 1.4.0 switched its DuckDB extension to ufsecp.duckdb_extension using UltrafastSecp256k1, and its README documents a custom DuckDB extension wrapping UltrafastSecp256k1 for ufsecp_scan(...)-based Silent Payments scanning with CUDA, OpenCL and Metal backend support.
See: Frigate 1.4.0 release · Frigate README · Details →
Package traction: ufsecp 1,192 npm downloads/30d · react-native-ufsecp 1,295/30d · Ufsecp 1,491 NuGet total (as of 2026-03-29).
Full adopter list: ADOPTERS.md
Supported Blockchains (secp256k1-based):

GPU & Platform Support:

**GPU C ABI (`ufsecp_gpu`)** – stable 13-op FFI for GPU batch ops across CUDA, OpenCL, and Metal, with full backend parity on the public surface
Most high-performance cryptographic libraries ship fast code and trust that it is correct. UltrafastSecp256k1 ships fast code and then systematically tries to break it. The internal self-audit system was designed in parallel with the cryptographic implementation as a first-class engineering artifact — not bolted on afterwards.
The governing idea is Bitcoin-style: **don't trust, verify.** The project does not treat assurance as a PDF milestone to wait for before the next improvement. Instead, it treats auditability as an always-on property of the repository: reproducible builds, rerunnable tests, structured artifacts, graph-backed code navigation, and continuous adversarial review that anyone can repeat.
This top-level narrative maps directly to the assurance ledger: CT secret-key routing (A-001), exploit-style audit coverage (A-005), graph-assisted review (A-006), and self-audit transparency (A-007).
| Metric | Value |
|---|---|
| Internal audit assertions per build | **~1,000,000+** |
| Audit modules (`unified_audit_runner`) | 55 modules, 8 sections, 0 failures |
| Exploit PoC test files | 86 tests, 14 coverage areas, 0 failures |
| CI/CD workflows | 25 GitHub Actions workflows |
| Build matrix (arch × config × OS) | 7 × 17 × 5 = 595 combinations |
| Nightly differential tests | **~1,300,000+ random checks / night** |
| Constant-time verification pipelines | 3 independent (ct-arm64, ct-verif, Valgrind CT) |
| Fuzzing adversarial corpus | 530,000+ cases (libFuzzer + ClusterFuzz-Lite) |
| Static analysis tools | 4 (CodeQL, Clang-Tidy, CPPCheck, SonarCloud) |
| Self-audit documents in repo | 13 dedicated audit/quality documents |
| Self-tests passing (all backends) | 76/76 |
| Workflow | Purpose | Trigger |
|---|---|---|
| `security-audit.yml` | Runs the full `unified_audit_runner` (55 modules, ~1M+ assertions) | Every push |
| `ct-arm64.yml` | Constant-time verification on native ARM64 hardware | Every push |
| `ct-verif.yml` | Formal constant-time verification pass | Every push |
| `valgrind-ct.yml` | Valgrind memcheck + CT analysis | Every push |
| `bench-regression.yml` | Performance regression gate: CI fails if throughput drops | Every push |
| `nightly.yml` | 1.3M+ differential checks + extended fuzz + full sanitizer run | Nightly |
| `cflite.yml` | ClusterFuzz-Lite continuous fuzzing integration | Every push |
| `mutation.yml` | Mutation testing: verifies the test suite kills every injected fault | Scheduled |
| `codeql.yml` | GitHub CodeQL static analysis (C++) | Every push |
| `sonarcloud.yml` | SonarCloud code quality and security rating | Every push |
| `scorecard.yml` | OpenSSF Scorecard + Best Practices supply-chain scan | Weekly |
| `ci.yml` | Core build + test across 17 configs × 7 architectures × 5 OSes | Every push / PR |
Status: AUDIT-READY, with zero failures across all tested platforms.

In addition to the 55-module `unified_audit_runner`, UltrafastSecp256k1 ships 86 dedicated exploit-style PoC tests that actively try to break the library across its highest-risk surfaces. Each `audit/test_exploit_*.cpp` target builds and runs standalone, so failures stay easy to attribute and reproduce.
| Coverage Area | Representative attack focus |
|---|---|
| ECDSA / Signature | malleability, RFC 6979 KATs, recovery edge cases |
| Schnorr / BIP-340 / Batch | batch soundness, forged signatures, invalid identification paths |
| GLV / ECC Math | endomorphism invariants, multiscalar correctness, Pippenger behavior |
| BIP-32 / BIP-39 / HD Keys | path overflow, hardened isolation, mnemonic and derivation edge cases |
| MuSig2 / FROST | nonce reuse, transcript fork equivocation, stale commitment replay, rogue-key aggregation, Byzantine participants, DKG and Lagrange edge cases |
| Adaptor Signatures / ZK | adaptor parity attacks, Pedersen invariants, malformed ZK proofs |
| Crypto Primitives / AEAD | ChaCha20-Poly1305 integrity, HKDF, SHA/Keccak/RIPEMD KATs |
| ECIES | authentication forgery, encryption correctness, roundtrip safety |
| Bitcoin / Protocol BIPs | BIP-143, BIP-144, BIP-324, SegWit, Taproot protocol edge cases |
| Address / Wallet / Signing | address encoding, wallet API misuse, Ethereum and Bitcoin signing flows |
| Constant-Time / Security | CT divergence, key-recovery style probes, backend divergence detection |
| ElligatorSwift | encoding correctness and ECDH roundtrips |
| Self-Test / Recovery | self-test API behavior and recovery boundary cases |
| Batch Verify | aggregate verification math correctness |
All 86 exploit tests live in `audit/test_exploit_*.cpp`. Configure with `cmake -S . -B build-audit -G Ninja -DCMAKE_BUILD_TYPE=Release`, build with `cmake --build build-audit`, then run them standalone or via `ctest`.
| Document | Contents |
|---|---|
| WHY_ULTRAFASTSECP256K1.md | Full audit infrastructure, CI pipeline index, formal verification evidence |
| AUDIT_REPORT.md | Historical formal audit report: 641,194 checks, 0 failures |
| AUDIT_COVERAGE.md | Per-module coverage matrix |
| THREAT_MODEL.md | Layer-by-layer risk analysis |
| SECURITY.md | Vulnerability disclosure policy |
| docs/AUDIT_GUIDE.md | Navigation guide for external auditors |
| docs/CI_ENFORCEMENT.md | Full CI enforcement policy |
| docs/BACKEND_ASSURANCE_MATRIX.md | Per-backend assurance matrix |
| docs/AUDIT_TRACEABILITY.md | Requirement-to-test traceability map |
Note: UltrafastSecp256k1 has not yet undergone a paid third-party cryptographic audit. The primary assurance model here is open self-audit: reproducible tests, traceability, CI enforcement, and public review artifacts that anyone can rerun. We are open to external audit and actively preparing the codebase and evidence for outside review, but we do not wait for a formal engagement before strengthening the library ourselves. Our philosophy is to keep hardening the system continuously through internal audit on every build and every commit.
RTX 5060 Ti (CUDA 12, kernel throughput)
| Metric | Value | Notes |
|---|---|---|
| ECC operations (field/point) | ~2.3 B ops/sec | kernel-only |
| ECDSA sign | 4.88 M sigs/sec | RFC 6979, low-S |
| ECDSA verify | 4.05 M verifies/sec | Shamir+GLV (+66% vs prev) |
| Schnorr sign (BIP-340) | 3.66 M sigs/sec | BIP-340 tagged hash |
| Schnorr verify (BIP-340) | 5.38 M verifies/sec | BIP-340+GLV (+91% vs prev) |
| FROST partial verify | 1.34 M verifies/sec | ⭐ New — first open-source GPU FROST |
| Batch point compress (J→SEC1) | 97.2 M pts/sec | New kernel |
| Category | Description | Link |
|---|---|---|
| CPU | Core ECC, ECDSA, Schnorr, BIP-32, Taproot, Pedersen | examples/ |
| CUDA | GPU signatures, batch operations, device management | examples/ |
| OpenCL | Cross-vendor GPU compute | examples/ |
| Metal | Apple Silicon GPU acceleration | examples/ |
| Multi-language | C, Python, Rust, Node.js, Go, Java binding examples | examples/README.md |
| Embedded | ESP32-S3, STM32 platform ports | examples/esp32_test/ |
Star the repository if you find it useful!
Report vulnerabilities via GitHub Security Advisories or email payysoon@gmail.com. For production cryptographic systems, perform your own risk review, review the current guarantees in SUPPORTED_GUARANTEES.md, and apply the assurance level appropriate to your deployment.
For the full audit infrastructure breakdown (1M+ assertions, the CI/CD workflow index, formal CT verification pipelines, self-audit document index), see the Engineering Quality & Self-Audit Culture section above and WHY_ULTRAFASTSECP256K1.md.
We are actively seeking sponsors and funding partners to expand continuous verification, bug bounty coverage, and long-term maintenance.
UltrafastSecp256k1 is a high-performance, zero-dependency secp256k1 library with GPU acceleration, constant-time side-channel protection, and 12+ platform targets. The funding priorities are:
We want to establish a funded bug bounty program to incentivize security researchers:
We want to make outside review easier without turning assurance into a bureaucratic checkbox:
Currently we accept vulnerability reports via GitHub Security Advisories but cannot offer financial rewards without sponsor funding.
Sponsorship helps sustain development of:
| Method | Link |
|---|---|
| GitHub Sponsors (preferred) | github.com/sponsors/shrec |
| Bitcoin Lightning | shrec@stacker.news (any Lightning wallet) |
| PayPal | paypal.me/IChkheidze |
| Corporate / Foundation | payysoon@gmail.com |
| Discord | Join our server |
All sponsors will be acknowledged in the README, release notes, and project documentation. For corporate partnerships, audit co-funding, or grant applications – please reach out via email.
Features are organized into maturity tiers (see SUPPORTED_GUARANTEES.md for detailed guarantees):
| Tier | Category | Component | Status |
|---|---|---|---|
| 1 – Core | Field / Scalar / Point | GLV, Precompute, Batch Inverse | [OK] |
| 1 – Core | Assembly | x64 MASM/GAS, BMI2/ADX, ARM64, RISC-V RV64GC | [OK] |
| 1 – Core | SIMD | AVX2/AVX-512 batch ops, Montgomery batch inverse | [OK] |
| 1 – Core | Constant-Time | CT field/scalar/point – no secret-dependent branches | [OK] |
| 1 – Core | ECDSA | Sign/Verify, RFC 6979, DER/Compact, low-S, Recovery | [OK] |
| 1 – Core | Schnorr | BIP-340 sign/verify, tagged hashing, x-only pubkeys | [OK] |
| 1 – Core | ECDH | Key exchange (raw, xonly, SHA-256) | [OK] |
| 1 – Core | Multi-scalar | Strauss/Shamir dual-scalar multiplication | [OK] |
| 1 – Core | Batch verify | ECDSA + Schnorr batch verification | [OK] |
| 1 – Core | Hashing | SHA-256 (SHA-NI), SHA-512, HMAC, Keccak-256 | [OK] |
| 1 – Core | C ABI | ufsecp stable FFI (45 exports) | [OK] |
| 2 – Protocol | BIP-32/44 | HD derivation, path parsing, xprv/xpub, coin-type | [OK] |
| 2 – Protocol | Taproot | BIP-341/342, tweak, Merkle tree | [OK] |
| 2 – Protocol | MuSig2 | BIP-327, key aggregation, 2-round signing | [OK] |
| 2 – Protocol | FROST | Threshold signatures, t-of-n | [OK] |
| 2 – Protocol | Adaptor | Schnorr + ECDSA adaptor signatures | [OK] |
| 2 – Protocol | Pedersen | Commitments, homomorphic, switch commitments | [OK] |
| 2 – Protocol | ZK Proofs | Schnorr sigma, DLEQ, Bulletproof range proofs (64-bit) | [OK] |
| 2 – Protocol | BIP-352 | Silent Payments scanning pipeline (CPU + GPU) | [OK] |
| 2 – Protocol | ECIES | Elliptic curve integrated encryption | [OK] |
| 3 – Convenience | Address | P2PKH, P2WPKH, P2TR, Base58, Bech32/m, EIP-55 | [OK] |
| 3 – Convenience | Coins | 27 blockchains, auto-dispatch | [OK] |
| – | GPU | CUDA, Metal, OpenCL, ROCm kernels | [OK] |
| – | GPU C ABI | ufsecp_gpu – 7 batch ops across 3 backends (17 FFI functions, incl. FROST) | [OK] |
| – | Platforms | x64, ARM64, RISC-V, ESP32, STM32, WASM, iOS, Android | [OK] |
Tier 1 = battle-tested core crypto with stable API. Tier 2 = protocol-level features, API may evolve. Tier 3 = convenience utilities.
All public API functions enforce canonical input encoding as required by BIP-340 and Bitcoin consensus:

- Signatures with `r >= p` or `s >= n` are rejected, not reduced
- Coordinates with `x >= p` are rejected, not reduced
- Secret keys must satisfy `1 <= sk < n`

The C ABI (`ufsecp_*`) returns distinct error codes: `UFSECP_ERR_BAD_SIG` (non-canonical signature) vs `UFSECP_ERR_VERIFY_FAIL` (valid encoding, bad math). See docs/COMPATIBILITY.md for details.
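To make the reject-not-reduce rule concrete, here is a minimal C++ sketch that checks 32-byte big-endian encodings against p and n. The helper names (`ct_less_than`, `sig_is_canonical`, `xonly_is_canonical`, `seckey_is_valid`) are hypothetical and are not part of the `ufsecp_*` API. The comparison is written branch-free (final borrow of a subtraction) so the same helper could be applied to secret keys.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// A 256-bit integer as 32 big-endian bytes (the wire encoding of r, s, x, sk).
using U256BE = std::array<std::uint8_t, 32>;

// secp256k1 field prime p = 2^256 - 2^32 - 977.
const U256BE kP = {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
                   0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
                   0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
                   0xFF, 0xFF, 0xFF, 0xFE, 0xFF, 0xFF, 0xFC, 0x2F};
// secp256k1 group order n.
const U256BE kN = {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
                   0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFE,
                   0xBA, 0xAE, 0xDC, 0xE6, 0xAF, 0x48, 0xA0, 0x3B,
                   0xBF, 0xD2, 0x5E, 0x8C, 0xD0, 0x36, 0x41, 0x41};

// Branch-free a < b: compute the final borrow of a - b, least significant byte first.
bool ct_less_than(const U256BE& a, const U256BE& b) {
    unsigned borrow = 0;
    for (int i = 31; i >= 0; --i) {
        const unsigned diff = unsigned{a[i]} - unsigned{b[i]} - borrow;
        borrow = (diff >> 8) & 1u;  // 1 iff this byte subtraction went negative
    }
    return borrow == 1;
}

// Zero test without an early exit.
bool is_zero(const U256BE& v) {
    unsigned acc = 0;
    for (std::uint8_t byte : v) acc |= byte;
    return acc == 0;
}

// Out-of-range encodings are rejected, never reduced mod p or n.
bool sig_is_canonical(const U256BE& r, const U256BE& s) {
    return ct_less_than(r, kP) && ct_less_than(s, kN);
}
bool xonly_is_canonical(const U256BE& x) { return ct_less_than(x, kP); }
bool seckey_is_valid(const U256BE& sk) { return !is_zero(sk) && ct_less_than(sk, kN); }

// Usage check: n - 1 is the largest valid secret key; n and 0 are rejected.
void demo_canonical() {
    U256BE n_minus_1 = kN;
    n_minus_1[31] = 0x40;  // n ends in ...41, so n - 1 ends in ...40
    assert(seckey_is_valid(n_minus_1));
    assert(!seckey_is_valid(kN));
    assert(!seckey_is_valid(U256BE{}));
    assert(!xonly_is_canonical(kP));
}
```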
The full 7-stage BIP-352 scanning pipeline runs entirely on-GPU with zero CPU round-trips. Its stages include the `BIP0352/SharedSecret` tagged hash and the `spend_pubkey + output_point` addition.

| Mode | ns/op | Throughput | Notes |
|---|---|---|---|
| GPU pipeline (GLV, w=4) | 179.2 ns | 5.58 M/s | GLV wNAF decomposition |
| GPU pipeline (LUT) | 91.0 ns | 11.00 M/s | 64 MB precomputed 16×64K generator table |
| GPU pipeline (LUT + pretbl) | 102.1 ns | ~9.79 M/s | Precomputed per-tweak tables |
500K tweak points per batch, 11 passes, median. Near-optimal occupancy for RTX 5060 Ti (SM 12.0, 36 SMs). At the LUT rate (11.00 M/s) this is ~950 billion candidates/day.
| Platform | Full Pipeline | vs GPU (LUT) |
|---|---|---|
| CUDA GPU (RTX 5060 Ti) | 91.0 ns/op | baseline |
| x86-64 CPU (i5-14400F, GCC 14) | 24,285 ns/op | 267× slower |
| ARM64 CPU (Cortex-A55, Clang 18) | 153,385 ns/op | 1,686× slower |
| RISC-V 64 (SiFive U74, GCC 13) | 257,996 ns/op | 2,835× slower |
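The throughput and ratio figures in these tables follow from simple arithmetic on ns/op. The sketch below (hypothetical helper names) shows the conversions: throughput is 10^9 / ns_per_op, a day has 86,400 s, and each slowdown is a platform's ns/op divided by the 91.0 ns GPU (LUT) baseline.

```cpp
#include <cassert>
#include <cmath>

// Latency (ns per operation) -> throughput (operations per second).
double ops_per_second(double ns_per_op) { return 1e9 / ns_per_op; }

// Throughput (operations per second) -> operations per day (86,400 s).
double ops_per_day(double ops_per_sec) { return ops_per_sec * 86400.0; }

// Slowdown factor of a platform relative to a baseline latency.
double slowdown(double ns_per_op, double baseline_ns_per_op) {
    assert(baseline_ns_per_op > 0.0);
    return ns_per_op / baseline_ns_per_op;
}
```

For example, `ops_per_day(11.00e6)` gives 9.504e11 (the ~950 billion candidates/day quoted above), and `slowdown(24285.0, 91.0)` gives about 266.9, i.e. 267×.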
See docs/COMMUNITY_BENCHMARKS.md for all hardware results submitted by community members — including RTX 5070 Ti (Blackwell) and a standalone BIP-352 CPU comparison vs libsecp256k1. Want to add yours? Instructions are in that file.
Independent benchmarks from Sparrow Wallet's Frigate — a DuckDB-based Silent Payments scanning pipeline using UltrafastSecp256k1 via ufsecp_scan(...). Results produced by Frigate's benchmark.py scanning mainnet to block 914,000.
GPU scanning (full BIP-352 pipeline, 2-year scan, 133M tweaks):
| Hardware | Backend | Time | Throughput |
|---|---|---|---|
| 2× NVIDIA RTX 5090 | CUDA | 3.2 s | ~41.5 M/s |
| NVIDIA RTX 5080 | CUDA | 7.7 s | ~17.3 M/s |
| Apple M1 Pro | Metal | 3m 47s | ~584 K/s |
CPU scanning (full BIP-352 pipeline, 2-year scan, 133M tweaks):
| Hardware | CPUs | Time | Throughput |
|---|---|---|---|
| Intel Core Ultra 9 285K | 24 | 3m 50s | ~577 K/s |
| Apple M1 Pro | 10 | 7m 47s | ~284 K/s |
Source: Frigate README — Performance
Standalone single-threaded benchmark by @craigraw (bench_bip352) — full results in docs/COMMUNITY_BENCHMARKS.md. Thank you for the contribution!
Full pipeline (10K points, 11 passes, median, GCC 12.4, -O3 -march=native, USE_ASM_X86_64=1):
| Backend | Median | ns/op | Ratio |
|---|---|---|---|
| libsecp256k1 | 545.2 ms | 54,519 ns | 1.00x |
| UltrafastSecp256k1 | 456.1 ms | 45,615 ns | 1.20x faster |
Per-operation breakdown (1K points, 11 passes, median):
| Operation | libsecp256k1 | UltrafastSecp256k1 | Ratio |
|---|---|---|---|
| k*P (scalar mul) | 37,975 ns | 26,460 ns | 1.44x faster |
| Serialize compressed (1st) | 36 ns | 15 ns | 2.4x faster |
| Tagged SHA-256 | 744 ns | 65 ns | 11.4x faster |
| k*G (generator mul) | 17,460 ns | 8,559 ns | 2.04x faster |
| Point addition | 2,250 ns | 2,457 ns | 0.92x |
| Serialize compressed (2nd) | 23 ns | 21 ns | 1.1x faster |
Note: Point addition is slightly slower because both inputs have Z=1 (affine), so UltrafastSecp256k1 performs a direct affine addition that includes a field inversion and returns an affine result. The inversion cost moves into the addition, but the separate inversion otherwise needed at serialization disappears.
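A toy illustration of that affine addition: with both inputs affine, the chord slope λ = (y2 - y1) / (x2 - x1) needs one field inversion, and the result comes out affine (Z = 1), so serialization needs no further inversion. This sketch uses the secp256k1 curve shape y² = x³ + 7 but over a small prime (10007) chosen so products fit in 64 bits; it is not the library's code and handles only points with distinct x-coordinates (no doubling, no point at infinity).

```cpp
#include <cassert>
#include <cstdint>

// Toy prime field (NOT secp256k1's p) so every product fits in 64 bits.
constexpr std::int64_t P = 10007;

std::int64_t mod(std::int64_t a) {
    a %= P;
    return a < 0 ? a + P : a;
}

std::int64_t pow_mod(std::int64_t b, std::int64_t e) {
    std::int64_t r = 1;
    b = mod(b);
    while (e > 0) {
        if (e & 1) r = r * b % P;
        b = b * b % P;
        e >>= 1;
    }
    return r;
}

// The one field inversion: a^(P-2) mod P (Fermat), valid for a != 0.
std::int64_t inv(std::int64_t a) {
    assert(mod(a) != 0);
    return pow_mod(a, P - 2);
}

struct Affine { std::int64_t x, y; };

// Same curve shape as secp256k1: y^2 = x^3 + 7.
bool on_curve(const Affine& q) {
    return mod(q.y * q.y) == mod(q.x * q.x % P * q.x + 7);
}

// Affine chord addition for points with distinct x. Output is affine (Z = 1).
Affine add_affine(const Affine& a, const Affine& b) {
    assert(a.x != b.x);
    const std::int64_t lambda = mod(b.y - a.y) * inv(b.x - a.x) % P;
    const std::int64_t x3 = mod(lambda * lambda - a.x - b.x);
    const std::int64_t y3 = mod(lambda * mod(a.x - x3) - a.y);
    return {x3, y3};
}

// Brute-force search for a curve point with x >= x_start (fine for a toy field).
Affine find_point(std::int64_t x_start) {
    for (std::int64_t x = x_start; x < P; ++x) {
        const std::int64_t rhs = mod(x * x % P * x + 7);
        for (std::int64_t y = 0; y < P; ++y)
            if (y * y % P == rhs) return {x, y};
    }
    assert(false && "no point found");
    return {0, 0};
}
```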
Get a working selftest in under a minute:
Option A – Linux (apt)
Option B – npm (any OS)
Option C – Python (any OS)
Option D – Build from source
| Target | Backend | Install / Entry Point | Status |
|---|---|---|---|
| Linux x64 | CPU | apt install libufsecp3 | [OK] Stable |
| Windows x64 | CPU | NuGet UltrafastSecp256k1 / Release .zip | [OK] Stable |
| macOS (x64/ARM64) | CPU + Metal | brew install ufsecp / build from source | [OK] Stable |
| Android ARM64 | CPU | `implementation 'io.github.shrec:ufsecp'` (Maven) | [OK] Stable |
| iOS ARM64 | CPU | Swift Package / CocoaPods / XCFramework | [OK] Stable |
| Browser / Node.js | WASM | `npm i ufsecp` | [OK] Stable |
| ESP32-S3 / ESP32 | CPU | PlatformIO / IDF component | [OK] Tested |
| STM32 (Cortex-M) | CPU | CMake cross-compile | [OK] Tested |
| NVIDIA GPU | CUDA 12+ | Build with `-DSECP256K1_BUILD_CUDA=ON` | [OK] Stable |
| AMD GPU | ROCm/HIP | Build with `-DSECP256K1_BUILD_ROCM=ON` | [!] Beta |
| Apple GPU | Metal | Build with Metal backend | [..] Experimental (discovery only) |
| Any GPU | OpenCL | Build with `-DSECP256K1_BUILD_OPENCL=ON` | [OK] Full (6/6 ops) |
| RISC-V (RV64GC) | CPU | Cross-compile | [OK] Tested |
UltrafastSecp256k1 is the only open-source library that provides full secp256k1 ECDSA + Schnorr sign/verify on GPU across four backends (as of February 2026; if you know of another, please let us know):
| Backend | Hardware | kG/s | ECDSA Sign | ECDSA Verify | Schnorr Sign | Schnorr Verify | FROST Verify |
|---|---|---|---|---|---|---|---|
| CUDA | RTX 5060 Ti | 4.59 M/s | 4.88 M/s | 4.05 M/s | 3.66 M/s | 5.38 M/s | 1.34 M/s |
| OpenCL | RTX 5060 Ti | 3.86 M/s | – | 2.44 M/s* | – | 2.82 M/s* | — |
| Metal | Apple M3 Pro | 0.33 M/s | – | – | – | – | – |
| ROCm (HIP) | AMD GPUs | Portable | – | – | – | – | – |
CUDA 12.0, sm_86;sm_89, batch=16K signatures, measured on RTX 5060 Ti. The CUDA path uses our own hybrid GPU execution model, which improved end-to-end throughput by more than 10% during optimization. Metal 2.4, 8x32-bit Comba limbs, 18 GPU cores. (*) OpenCL ECDSA/Schnorr verify uses an extended kernel with lazy-loaded runtime compilation.
| Operation | Time/Op | Throughput |
|---|---|---|
| Field Mul | 0.2 ns | 4,142 M/s |
| Field Add | 0.2 ns | 4,130 M/s |
| Field Inv | 10.2 ns | 98.35 M/s |
| Point Add | 1.6 ns | 619 M/s |
| Point Double | 0.8 ns | 1,282 M/s |
| Scalar Mul (Pxk) | 225.8 ns | 4.43 M/s |
| Generator Mul (Gxk) | 217.7 ns | 4.59 M/s |
| Batch Inv (Montgomery) | 2.9 ns | 340 M/s |
| Jac->Affine (per-pt) | 14.9 ns | 66.9 M/s |
| Operation | Time/Op | Throughput | Protocol | Δ vs prev |
|---|---|---|---|---|
| ECDSA Sign | 204.8 ns | 4.88 M/s | RFC 6979 + low-S | — |
| ECDSA Verify | 246.7 ns | 4.05 M/s | Shamir + GLV | +66% |
| ECDSA Sign+Recid | 311.5 ns | 3.21 M/s | Recoverable (EIP-155) | — |
| Schnorr Sign | 273.4 ns | 3.66 M/s | BIP-340 | — |
| Schnorr Verify | 185.9 ns | 5.38 M/s | BIP-340 + GLV | +91% |
| FROST Partial Verify | 748.9 ns | 1.34 M/s | t-of-n threshold | ⭐ New |
| Operation | CUDA | OpenCL | Winner |
|---|---|---|---|
| Field Mul | 0.2 ns | 0.2 ns | Tie |
| Field Inv | 10.2 ns | 14.3 ns | CUDA 1.40x |
| Point Double | 0.8 ns | 0.9 ns | CUDA 1.13x |
| Point Add | 1.6 ns | 1.6 ns | Tie |
| kG (Generator Mul) | 217.7 ns | 258.9 ns | CUDA 1.19x |
| BIP352 Pipeline | 91.0 ns | 93.6 ns | CUDA 1.03x |
Benchmarks: 2026-02-14, Linux x86_64, NVIDIA Driver 580.126.09. Both kernel-only (no buffer allocation/copy overhead).
| Operation | Time/Op | Throughput |
|---|---|---|
| Field Mul | 1.9 ns | 527 M/s |
| Field Inv | 106.4 ns | 9.40 M/s |
| Point Add | 10.1 ns | 98.6 M/s |
| Point Double | 5.1 ns | 196 M/s |
| Scalar Mul (Pxk) | 2.94 us | 0.34 M/s |
| Generator Mul (Gxk) | 3.00 us | 0.33 M/s |
Metal 2.4, 8x32-bit Comba limbs, Apple M3 Pro (18 GPU cores, Unified Memory 18 GB)
Full signature support across CPU and GPU:
| Operation | Time | Throughput |
|---|---|---|
| ECDSA Sign (RFC 6979) | 8.5 us | 118,000 op/s |
| ECDSA Verify | 23.6 us | 42,400 op/s |
| Schnorr Sign (BIP-340) | 6.8 us | 146,000 op/s |
| Schnorr Verify (BIP-340) | 24.0 us | 41,600 op/s |
| Key Generation (CT) | 9.5 us | 105,500 op/s |
| Key Generation (fast) | 5.5 us | 182,000 op/s |
| ECDH | 23.9 us | 41,800 op/s |
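Schnorr sign is ~25% faster than ECDSA sign (6.8 us vs 8.5 us), mainly because BIP-340 computes s = k + e·d with no modular inversion, whereas ECDSA needs the inverse of the nonce in s = k⁻¹(z + r·d). Measured single-core, pinned, 2026-02-21.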
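The source of that difference is visible in the two signing equations. In this toy sketch a small prime stands in for the group order n, and r, z, e, d, k are arbitrary numbers rather than values derived from curve points and hashes; it only shows that ECDSA needs a modular inverse of the nonce while BIP-340 multiplies and adds.

```cpp
#include <cassert>
#include <cstdint>

// Toy prime standing in for the group order n (NOT secp256k1's n).
constexpr std::uint64_t N = 1000003;

std::uint64_t mulmod(std::uint64_t a, std::uint64_t b) { return (a % N) * (b % N) % N; }
std::uint64_t addmod(std::uint64_t a, std::uint64_t b) { return (a % N + b % N) % N; }

std::uint64_t powmod(std::uint64_t b, std::uint64_t e) {
    std::uint64_t r = 1;
    b %= N;
    while (e > 0) {
        if (e & 1) r = r * b % N;
        b = b * b % N;
        e >>= 1;
    }
    return r;
}

// Modular inverse by Fermat's little theorem (N is prime).
std::uint64_t invmod(std::uint64_t a) {
    assert(a % N != 0);
    return powmod(a, N - 2);
}

// ECDSA: s = k^-1 * (z + r*d) mod n. Needs an inversion of the nonce k.
std::uint64_t ecdsa_s(std::uint64_t k, std::uint64_t z, std::uint64_t r, std::uint64_t d) {
    return mulmod(invmod(k), addmod(z, mulmod(r, d)));
}

// BIP-340 Schnorr: s = k + e*d mod n. One multiplication and one addition.
std::uint64_t schnorr_s(std::uint64_t k, std::uint64_t e, std::uint64_t d) {
    return addmod(k, mulmod(e, d));
}
```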
The ct:: namespace provides constant-time operations for secret-key material – no secret-dependent branches or memory access patterns:
| Operation | Fast | CT | Overhead |
|---|---|---|---|
| Field Mul | 17 ns | 23 ns | 1.08x |
| Field Inverse | 0.8 us | 1.7 us | 2.05x |
| Complete Addition | – | 276 ns | – |
| Scalar Mul (kxP) | 23.6 us | 26.6 us | 1.13x |
| Generator Mul (kxG) | 5.3 us | 9.9 us | 1.86x |
CT layer provides: ct::field_mul, ct::field_inv, ct::scalar_mul, ct::point_add_complete, ct::point_dbl
Use the CT layer for: private key operations, signing, nonce generation, ECDH. Use the FAST layer for: verification, public key derivation, batch processing, benchmarks.
See THREAT_MODEL.md for a full layer-by-layer risk assessment.
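The "no secret-dependent memory access" rule usually means reading every table entry and selecting the wanted one with masks instead of indexing by the secret. The sketch below shows that pattern in portable C++. `Entry`, `ct_eq_mask`, and `ct_lookup` are hypothetical names, not the `ct::` API, and, as this section's assumptions note, a compiler could still in principle reintroduce branches, which is why the tooling checks matter.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One table entry, e.g. a 256-bit field element as four 64-bit limbs.
struct Entry { std::array<std::uint64_t, 4> limbs; };

// All-ones if a == b, zero otherwise, computed without a data-dependent branch.
std::uint64_t ct_eq_mask(std::uint64_t a, std::uint64_t b) {
    const std::uint64_t x = a ^ b;                      // zero iff a == b
    const std::uint64_t nonzero = (x | (0 - x)) >> 63;  // 1 iff x != 0
    return nonzero - 1;                                 // 0 - 1 wraps to all ones
}

// Reads every entry and keeps only table[secret_idx] via masking, so the
// memory access pattern does not depend on secret_idx.
template <std::size_t K>
Entry ct_lookup(const std::array<Entry, K>& table, std::size_t secret_idx) {
    Entry out{};
    for (std::size_t i = 0; i < K; ++i) {
        const std::uint64_t m = ct_eq_mask(i, secret_idx);
        for (std::size_t j = 0; j < out.limbs.size(); ++j)
            out.limbs[j] |= table[i].limbs[j] & m;
    }
    return out;
}

// Usage check: selecting index 2 out of 4 entries.
void demo_ct_lookup() {
    std::array<Entry, 4> t{};
    for (std::uint64_t i = 0; i < 4; ++i) t[i].limbs = {i, i + 10, i + 20, i + 30};
    assert(ct_lookup(t, 2).limbs[1] == 12);
}
```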
| Evidence | Scope | Status |
|---|---|---|
| No secret-dependent branches | All ct:: functions | [OK] Enforced by design, verified via Clang-Tidy checks |
| No secret-dependent memory access | All ct:: table lookups use constant-index cmov | [OK] |
| ASan + UBSan CI | Every push – catches undefined behavior in CT paths | [OK] CI |
| Timing tests (dudect) | CPU field/scalar ops | [OK] Implemented in CI + nightly + native ARM64 |
| Deterministic CT verification | ct-verif LLVM + Valgrind CT | [OK] Implemented |
Assumptions: CT guarantees depend on the compiler not introducing secret-dependent branches during optimization. Builds use -O2 with Clang; MSVC may require additional flags. Micro-architectural side channels (e.g., Spectre) and physical side channels (e.g., power analysis) are outside current scope; see THREAT_MODEL.md.
UltrafastSecp256k1 provides ZK proof primitives over the secp256k1 curve:
| Proof Type | Prove | Verify | Proof Size | Use Cases |
|---|---|---|---|---|
| Knowledge Proof | 20.3 us | 21.8 us | 64 bytes | Prove knowledge of discrete log (x: P = x*G) |
| DLEQ Proof | 40.0 us | 56.4 us | 64 bytes | Prove log_G(P) == log_H(Q) – VRFs, adaptor sigs, atomic swaps |
| Bulletproof Range | 13,467 us | 2,634 us | ~620 bytes | Prove committed value in [0, 2^64) – Confidential Transactions |
Security model:
API: `#include <secp256k1/zk.hpp>` – namespace `secp256k1::zk`
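To show the shape of the knowledge proof in the table (prove x with P = x*G), here is a Schnorr sigma protocol made non-interactive with Fiat-Shamir. It is a deliberately tiny toy: it works in the 1019-element subgroup of squares mod 2039 (written multiplicatively) rather than on secp256k1, and it uses FNV-1a as a stand-in challenge hash, so it has no security at all. It does not reflect the `secp256k1::zk` API or its proof encoding; `prove`, `verify`, and `challenge` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Toy group: the order-Q subgroup of Z_P^*, with P = 2Q + 1 (both prime). NOT secp256k1.
constexpr std::uint64_t P = 2039;
constexpr std::uint64_t Q = 1019;
constexpr std::uint64_t G = 4;  // 2^2: a non-identity square, so it generates the order-Q subgroup

std::uint64_t powmod(std::uint64_t b, std::uint64_t e) {
    std::uint64_t r = 1;
    b %= P;
    while (e > 0) {
        if (e & 1) r = r * b % P;
        b = b * b % P;
        e >>= 1;
    }
    return r;
}

// Fiat-Shamir challenge e = H(G, pub, R) mod Q.
// FNV-1a is NOT a cryptographic hash; it only stands in for a tagged hash here.
std::uint64_t challenge(std::uint64_t pub, std::uint64_t R) {
    std::uint64_t h = 1469598103934665603ULL;  // FNV-1a offset basis
    for (std::uint64_t v : {G, pub, R})
        for (int i = 0; i < 8; ++i) {
            h ^= (v >> (8 * i)) & 0xFF;
            h *= 1099511628211ULL;  // FNV-1a prime
        }
    return h % Q;
}

struct Proof { std::uint64_t R, s; };

// Prove knowledge of x with pub = G^x. k is the nonce (must be fresh and secret in practice).
Proof prove(std::uint64_t x, std::uint64_t k) {
    assert(x % Q != 0);
    const std::uint64_t R = powmod(G, k % Q);
    const std::uint64_t e = challenge(powmod(G, x), R);
    return {R, (k % Q + e * (x % Q)) % Q};
}

// Accept iff G^s == R * pub^e (mod P).
bool verify(std::uint64_t pub, const Proof& pf) {
    const std::uint64_t e = challenge(pub, pf.R);
    return powmod(G, pf.s) == pf.R * powmod(pub, e) % P;
}
```

Verification works because G^s = G^(k + e·x) = R · (G^x)^e, and G has order Q, so exponents can be reduced mod Q.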
Benchmarks: i5-14400F, 11 passes, pinned core, median. See docs/BENCHMARKS.md.
| Operation | x86-64 (Clang 21, AVX2) | ARM64 (Cortex-A76) | RISC-V (Milk-V Mars) |
|---|---|---|---|
| Field Mul | 17 ns | 74 ns | 95 ns |
| Field Square | 14 ns | 50 ns | 70 ns |
| Field Add | 1 ns | 8 ns | 11 ns |
| Field Inverse | 1 us | 2 us | 4 us |
| Point Add | 159 ns | 992 ns | 1 us |
| Generator Mul (kxG) | 5 us | 14 us | 33 us |
| Scalar Mul (kxP) | 25 us | 131 us | 154 us |
| Operation | CUDA (RTX 5060 Ti) | OpenCL (RTX 5060 Ti) | Metal (M3 Pro) |
|---|---|---|---|
| Field Mul | 0.2 ns | 0.2 ns | 1.9 ns |
| Field Inv | 10.2 ns | 14.3 ns | 106.4 ns |
| Point Add | 1.6 ns | 1.6 ns | 10.1 ns |
| Generator Mul (Gxk) | 217.7 ns | 258.9 ns | 3.00 us |
| Operation | ESP32-S3 LX7 (240 MHz) | ESP32 LX6 (240 MHz) | STM32F103 (72 MHz) |
|---|---|---|---|
| Field Mul | 6,105 ns | 6,993 ns | 15,331 ns |
| Field Square | 5,020 ns | 6,247 ns | 12,083 ns |
| Field Add | 850 ns | 985 ns | 4,139 ns |
| Field Inv | 2,524 us | 609 us | 1,645 us |
| Fast Scalar x G | 5,226 us | 6,203 us | 37,982 us |
| CT Scalar x G | 15,527 us | – | – |
| CT Generator x k | 4,951 us | – | – |
| Operation | 4x64 | 5x52 | Speedup |
|---|---|---|---|
| Multiplication | 42 ns | 15 ns | 2.76x |
| Squaring | 31 ns | 13 ns | 2.44x |
| Addition | 4.3 ns | 1.6 ns | 2.69x |
| Add chain (32 ops) | 286 ns | 57 ns | 5.01x |
5x52 uses __int128 lazy reduction – ideal for 64-bit platforms.
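The add-chain row is where 5x52 gains most: each 52-bit limb in a 64-bit word leaves 12 bits of headroom, so additions can skip carry propagation and normalize once at the end. The sketch below illustrates only that deferred-carry idea over plain 256-bit integers (mod 2^256). It omits the mod-p reduction and the `__int128` multiplication path, and its types and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using U256 = std::array<std::uint64_t, 4>;  // 4x64, little-endian limbs
using Fe52 = std::array<std::uint64_t, 5>;  // 5x52, little-endian limbs (top limb 48 bits)

constexpr std::uint64_t M52 = (1ULL << 52) - 1;
constexpr std::uint64_t M48 = (1ULL << 48) - 1;

// Split 4x64 into 5x52 (bits 0-51, 52-103, 104-155, 156-207, 208-255).
Fe52 to_5x52(const U256& w) {
    return {w[0] & M52,
            (w[0] >> 52) | ((w[1] << 12) & M52),
            (w[1] >> 40) | ((w[2] << 24) & M52),
            (w[2] >> 28) | ((w[3] << 36) & M52),
            w[3] >> 16};
}

// Lazy addition: no carries. 12 bits of headroom per limb allow thousands of adds.
Fe52 add_lazy(const Fe52& a, const Fe52& b) {
    Fe52 r;
    for (int i = 0; i < 5; ++i) r[i] = a[i] + b[i];
    return r;
}

// Propagate the deferred carries once, then repack into 4x64 (result mod 2^256).
U256 normalize_to_4x64(Fe52 l) {
    for (int i = 0; i < 4; ++i) {
        l[i + 1] += l[i] >> 52;
        l[i] &= M52;
    }
    l[4] &= M48;  // drop bits >= 2^256
    return {l[0] | (l[1] << 52),
            (l[1] >> 12) | (l[2] << 40),
            (l[2] >> 24) | (l[3] << 28),
            (l[3] >> 36) | (l[4] << 16)};
}

// Reference: 4x64 addition that carries on every limb (mod 2^256).
U256 add_4x64(const U256& a, const U256& b) {
    U256 r;
    std::uint64_t carry = 0;
    for (int i = 0; i < 4; ++i) {
        std::uint64_t s = a[i] + carry;
        const std::uint64_t c1 = s < carry;
        s += b[i];
        const std::uint64_t c2 = s < b[i];
        r[i] = s;
        carry = c1 + c2;
    }
    return r;
}

// Usage check: 32 lazy additions plus one normalization match 32 carried additions.
void demo_add_chain() {
    const U256 a = {~0ULL, 0x123456789ABCDEF0ULL, ~0ULL, 0x0FFFFFFFFFFFFFFFULL};
    const U256 b = {~0ULL, 1, 2, 3};
    Fe52 acc = to_5x52(a);
    U256 ref = a;
    for (int i = 0; i < 32; ++i) {
        acc = add_lazy(acc, to_5x52(b));
        ref = add_4x64(ref, b);
    }
    assert(normalize_to_4x64(acc) == ref);
}
```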
For full benchmark results, see docs/BENCHMARKS.md.
UltrafastSecp256k1 runs on resource-constrained microcontrollers with portable C++ (no __int128, no assembly required):
All 37 library tests pass on every embedded target. See examples/esp32_test/ and examples/stm32_test/.
See PORTING.md for a step-by-step checklist to add new CPU architectures, embedded targets, or GPU backends.
WebAssembly build via Emscripten – runs secp256k1 in any modern browser or Node.js:
Output: secp256k1_wasm.wasm + secp256k1.mjs (ES6 module with TypeScript declarations). See wasm/README.md for JavaScript/TypeScript integration.
All backends include batch modular inversion – a critical building block for Jacobian->Affine conversion:
| Backend | Function | Notes |
|---|---|---|
| CPU | fe_batch_inverse(FieldElement*, size_t) | Montgomery trick with scratch buffer |
| CUDA | batch_inverse_montgomery / batch_inverse_kernel | GPU Montgomery trick kernel |
| Metal | batch_inverse | Chunked parallel threadgroups |
| OpenCL | Inline PTX inverse | Batch via host orchestration |
Algorithm: Montgomery batch inverse computes N field inversions using only 1 modular inversion + 3(N-1) multiplications, amortizing the expensive inversion across the entire batch.
For N=1024: ~500x cheaper than individual inversions. A single field inversion costs ~3.5 us (Fermat), while batch amortizes to ~7 ns per element.
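A minimal C++ sketch of Montgomery's trick, using a small prime field (10^9 + 7) instead of the secp256k1 field so products fit in 64 bits. It performs the N - 1 prefix multiplications, the single Fermat inversion, and the 2(N - 1) backward multiplications that add up to the 3(N-1) count above. The function names are illustrative and are not `fe_batch_inverse`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy prime field (NOT the secp256k1 field): elements < 2^30, so a*b fits in 64 bits.
constexpr std::uint64_t P = 1000000007ULL;

std::uint64_t mulmod(std::uint64_t a, std::uint64_t b) { return (a % P) * (b % P) % P; }

std::uint64_t powmod(std::uint64_t b, std::uint64_t e) {
    std::uint64_t r = 1;
    b %= P;
    while (e > 0) {
        if (e & 1) r = r * b % P;
        b = b * b % P;
        e >>= 1;
    }
    return r;
}

// Single field inversion via Fermat's little theorem: a^(P-2) = a^-1 (mod P).
std::uint64_t inv_fermat(std::uint64_t a) {
    assert(a % P != 0);
    return powmod(a, P - 2);
}

// Montgomery batch inverse: N inversions for 1 inversion + 3(N-1) multiplications.
// Every input must be nonzero mod P.
std::vector<std::uint64_t> batch_inverse(const std::vector<std::uint64_t>& a) {
    const std::size_t n = a.size();
    std::vector<std::uint64_t> out(n);
    if (n == 0) return out;

    // Pass 1 (N-1 mults): prefix[i] = a[0] * a[1] * ... * a[i].
    std::vector<std::uint64_t> prefix(n);
    prefix[0] = a[0] % P;
    for (std::size_t i = 1; i < n; ++i) prefix[i] = mulmod(prefix[i - 1], a[i]);

    // The only inversion: (a[0] * ... * a[N-1])^-1.
    std::uint64_t acc = inv_fermat(prefix[n - 1]);

    // Pass 2 (2 mults per element): peel one factor off the running inverse.
    for (std::size_t i = n - 1; i > 0; --i) {
        out[i] = mulmod(acc, prefix[i - 1]);  // (a0..ai)^-1 * (a0..a(i-1)) = ai^-1
        acc = mulmod(acc, a[i]);              // now (a0..a(i-1))^-1
    }
    out[0] = acc;
    return out;
}
```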
Branchless mixed addition (add_mixed_inplace) uses the madd-2007-bl formula: 7M + 4S (vs 11M + 5S for full Jacobian add).
Production GPU apps use a memory-efficient variant: instead of storing full Z coordinates, jacobian_add_mixed_h returns H = U2 - X1 separately. Since Z_k = Z_0 * H_0 * H_1 * … * H_{k-1}, the entire Z chain is invertible from H values + initial Z_0.
Cost: 1 Fermat inversion + 2N multiplications per thread (vs N Fermat inversions naively).
See `apps/secp256k1_search_gpu_only/gpu_only.cu` (step kernel) + `unified_split.cuh` (batch inversion kernel).
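The H-chain variant can be sketched the same way: given Z_0 and the H_k values, with Z_{k+1} = Z_k * H_k as stated above, one inversion of the final Z_N yields every Z_k^{-1} by walking backwards, for 1 inversion + 2N multiplications in total. The field is a small prime (10^9 + 7) standing in for the secp256k1 field, and `z_inverses_from_h` is a hypothetical name; no curve arithmetic is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy prime field (NOT the secp256k1 field).
constexpr std::uint64_t P = 1000000007ULL;

std::uint64_t mulmod(std::uint64_t a, std::uint64_t b) { return (a % P) * (b % P) % P; }

std::uint64_t powmod(std::uint64_t b, std::uint64_t e) {
    std::uint64_t r = 1;
    b %= P;
    while (e > 0) {
        if (e & 1) r = r * b % P;
        b = b * b % P;
        e >>= 1;
    }
    return r;
}

// Given Z_0 and H_0..H_{N-1} with Z_{k+1} = Z_k * H_k, return Z_k^-1 for k = 0..N
// using one Fermat inversion and 2N multiplications.
std::vector<std::uint64_t> z_inverses_from_h(std::uint64_t z0, const std::vector<std::uint64_t>& h) {
    const std::size_t n = h.size();
    std::uint64_t zn = z0 % P;
    for (std::uint64_t hk : h) zn = mulmod(zn, hk);  // N mults: Z_N
    assert(zn != 0);

    std::vector<std::uint64_t> zinv(n + 1);
    zinv[n] = powmod(zn, P - 2);  // the single inversion (Fermat)
    for (std::size_t k = n; k > 0; --k)
        zinv[k - 1] = mulmod(zinv[k], h[k - 1]);  // N mults: Z_{k-1}^-1 = Z_k^-1 * H_{k-1}
    return zinv;
}
```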
Starting with v3.4.0, UltrafastSecp256k1 ships a stable C ABI – ufsecp – designed for FFI bindings (C#, Python, Rust, Go, Java, Node.js, Dart, React Native, PHP, Ruby, etc.):
Default behavior:

- **C ABI (`ufsecp`)**: Defaults to safe behavior – all secret-key operations (sign, derive, ECDH) use CT internally. No configuration needed.
- **C++ API**: Exposes both `fast::` and `ct::` namespaces – the developer chooses explicitly per call site.

Starting with v3.3.0, the GPU layer is fully accessible from any FFI language via `ufsecp_gpu.h`:
| Category | Functions |
|---|---|
| Discovery | gpu_backend_count, gpu_backend_name, gpu_is_available, gpu_device_count, gpu_device_info |
| Lifecycle | gpu_ctx_create, gpu_ctx_destroy, gpu_last_error, gpu_last_error_msg, gpu_error_str |
| Batch Ops | gpu_generator_mul_batch, gpu_ecdsa_verify_batch, gpu_schnorr_verify_batch, gpu_ecdh_batch, gpu_hash160_pubkey_batch, gpu_msm, gpu_frost_verify_partial_batch, gpu_ecrecover_batch |
| Batch Operation | CUDA | OpenCL | Metal |
|---|---|---|---|
| `generator_mul_batch` | [OK] | [OK] | [OK] |
| `ecdsa_verify_batch` | [OK] | [OK] | [OK] |
| `schnorr_verify_batch` | [OK] | [OK] | [OK] |
| `ecdh_batch` | [OK] | [OK] | [OK] |
| `hash160_pubkey_batch` | [OK] | [OK] | [OK] |
| `msm` | [OK] | [OK] | [OK] |
| `frost_verify_partial_batch` | [OK] | [OK] | [OK] |
| `ecrecover_batch` | [OK] | [..] temporary stub | [..] temporary stub |
See ufsecp_gpu.h and GPU Validation Matrix for details.
| Category | Functions |
|---|---|
| Context | ctx_create, ctx_destroy, selftest, last_error |
| Keys | keygen, seckey_verify, pubkey_create, pubkey_parse, pubkey_serialize |
| ECDSA | ecdsa_sign, ecdsa_sign_batch, ecdsa_verify, ecdsa_sign_der, ecdsa_verify_der, ecdsa_recover |
| Schnorr | schnorr_sign, schnorr_sign_batch, schnorr_verify |
| SHA-256 | sha256 (SHA-NI accelerated) |
| ECDH | ecdh_compressed, ecdh_xonly, ecdh_raw |
| BIP-32 | bip32_from_seed, bip32_derive_child, bip32_serialize |
| Address | address_p2pkh, address_p2wpkh, address_p2tr |
| WIF | wif_encode, wif_decode |
| Tweak | pubkey_tweak_add, pubkey_tweak_mul |
| Version | version, abi_version, version_string |
See SUPPORTED_GUARANTEES.md for Tier 1/2/3 stability guarantees.
### Testers Wanted

We need community testers for platforms we cannot fully validate in CI:
- iOS – Build & run on real iPhone/iPad hardware with Xcode
- AMD GPU (ROCm/HIP) – Test on AMD Radeon RX / Instinct GPUs
Open an issue with your results!
Universal XCFramework (arm64 device + arm64 simulator). Also available via Swift Package Manager and CocoaPods.
This local helper runs the same cross-arch smoke surface now used in CI: run_selftest smoke, test_bip324_standalone, bench_kP, and bench_bip324. Install the corresponding cross toolchain, libc sysroot, qemu-user-static, and ninja-build first.
If you prefer the existing local CI entry point, the same coverage is also available as:
| Option | Default | Description |
|---|---|---|
| SECP256K1_USE_ASM | ON | Assembly optimizations (x64/ARM64/RISC-V) |
| SECP256K1_BUILD_CUDA | OFF | CUDA GPU support |
| SECP256K1_BUILD_OPENCL | OFF | OpenCL GPU support |
| SECP256K1_BUILD_ROCM | OFF | ROCm/HIP GPU support (AMD) |
| SECP256K1_BUILD_TESTS | ON | Test suite |
| SECP256K1_BUILD_BENCH | ON | Benchmarks |
| SECP256K1_GLV_WINDOW_WIDTH | platform | GLV window width (4-7); default 5 on x86/ARM/RISC-V, 4 on ESP32/WASM |
| SECP256K1_RISCV_USE_VECTOR | ON | RVV vector extension (RISC-V) |
For detailed build instructions, see docs/BUILDING.md.
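`SECP256K1_GLV_WINDOW_WIDTH` trades precomputed-table size against the number of point additions. Assuming it controls a wNAF window (the usual pairing with a GLV scalar split), the sketch below shows width-w NAF recoding of a 64-bit toy scalar and the resulting table of 2^(w-2) odd multiples. That growth is one plausible reason the default drops to 4 on memory-constrained ESP32/WASM targets. `wnaf` and `table_points` are hypothetical helpers, not library functions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Width-w NAF recoding of a non-negative toy scalar k (real scalars are
// 256-bit; a GLV split first turns one into two ~128-bit halves).
// Each digit is 0 or odd with |d| < 2^(w-1), and any w consecutive digits
// contain at most one nonzero digit. Digits are least significant first.
std::vector<int> wnaf(uint64_t k, int w) {
    assert(w >= 2 && w <= 8);
    assert(k < (uint64_t{1} << 63));                  // keeps k + |d| from overflowing
    const uint64_t mask = (uint64_t{1} << w) - 1;     // extracts k mod 2^w
    const int64_t window = int64_t{1} << w;           // 2^w
    const int64_t half = window >> 1;                 // 2^(w-1)
    std::vector<int> digits;
    while (k != 0) {
        int64_t d = 0;
        if (k & 1) {
            d = static_cast<int64_t>(k & mask);
            if (d >= half) d -= window;               // pick the signed residue
            k -= static_cast<uint64_t>(d);            // unsigned wrap handles d < 0
        }
        digits.push_back(static_cast<int>(d));
        k >>= 1;
    }
    return digits;
}

// Odd multiples P, 3P, ..., (2^(w-1) - 1)P must be precomputed: 2^(w-2) points.
constexpr int table_points(int w) { return 1 << (w - 2); }
```

With w = 5 the table holds 8 points; with w = 4 it holds 4, at the cost of slightly more nonzero digits (and therefore additions) per scalar.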
Two security profiles are always active – no flag-based selection:
Choose the appropriate profile for your use case. Using FAST with secret data is a security vulnerability. See THREAT_MODEL.md for full details.
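The CT profile matters because variable-time code leaks secrets through timing and memory-access patterns. A classic example is indexing a precomputed table with secret scalar bits. The sketch below shows the general countermeasure: touch every entry and select with a mask. `ct_eq_mask`, `ct_lookup`, and `fast_lookup` are hypothetical names, not the library's `ct::` implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// All ones if a == b, all zeros otherwise, with no data-dependent branch.
inline uint64_t ct_eq_mask(uint64_t a, uint64_t b) {
    const uint64_t x = a ^ b;                              // 0 iff equal
    const uint64_t nonzero = (x | (uint64_t{0} - x)) >> 63; // 1 iff x != 0
    return nonzero - 1;                                    // 0xFF..FF if equal, else 0
}

// Stand-in for a precomputed point: four 64-bit limbs.
using Entry = std::array<uint64_t, 4>;

// Constant-time lookup: reads every entry and masks in the selected one, so
// the memory access pattern does not depend on the secret index.
template <std::size_t N>
Entry ct_lookup(const std::array<Entry, N>& table, uint64_t secret_index) {
    Entry out{};
    for (std::size_t i = 0; i < N; ++i) {
        const uint64_t m = ct_eq_mask(static_cast<uint64_t>(i), secret_index);
        for (std::size_t j = 0; j < out.size(); ++j) out[j] |= table[i][j] & m;
    }
    return out;
}

// Variable-time ("fast") counterpart: the accessed address depends on
// `index`, which can leak through the cache. Acceptable for public data only.
template <std::size_t N>
Entry fast_lookup(const std::array<Entry, N>& table, uint64_t index) {
    return table[index];
}
```

Branch-free source is necessary but not sufficient: compilers can reintroduce branches, which is why CT claims need measured evidence such as dudect-style timing tests.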
| # | Coin | Ticker | Address Types | BIP-44 |
|---|---|---|---|---|
| 1 | Bitcoin | BTC | P2PKH, P2WPKH (Bech32), P2TR (Bech32m) | m/86'/0' |
| 2 | Ethereum | ETH | EIP-55 Checksum | m/44'/60' |
| 3 | Litecoin | LTC | P2PKH, P2WPKH | m/84'/2' |
| 4 | Dogecoin | DOGE | P2PKH | m/44'/3' |
| 5 | Bitcoin Cash | BCH | P2PKH | m/44'/145' |
| 6 | Bitcoin SV | BSV | P2PKH | m/44'/236' |
| 7 | Zcash | ZEC | P2PKH (transparent) | m/44'/133' |
| 8 | Dash | DASH | P2PKH | m/44'/5' |
| 9 | DigiByte | DGB | P2PKH, P2WPKH | m/44'/20' |
| 10 | Namecoin | NMC | P2PKH | m/44'/7' |
| 11 | Peercoin | PPC | P2PKH | m/44'/6' |
| 12 | Vertcoin | VTC | P2PKH, P2WPKH | m/44'/28' |
| 13 | Viacoin | VIA | P2PKH | m/44'/14' |
| 14 | Groestlcoin | GRS | P2PKH, P2WPKH | m/44'/17' |
| 15 | Syscoin | SYS | P2PKH | m/44'/57' |
| 16 | BNB Smart Chain | BNB | EIP-55 | m/44'/60' |
| 17 | Polygon | MATIC | EIP-55 | m/44'/60' |
| 18 | Avalanche | AVAX | EIP-55 (C-Chain) | m/44'/60' |
| 19 | Fantom | FTM | EIP-55 | m/44'/60' |
| 20 | Arbitrum | ARB | EIP-55 | m/44'/60' |
| 21 | Optimism | OP | EIP-55 | m/44'/60' |
| 22 | Ravencoin | RVN | P2PKH | m/44'/175' |
| 23 | Flux | FLUX | P2PKH | m/44'/19167' |
| 24 | Qtum | QTUM | P2PKH | m/44'/2301' |
| 25 | Horizen | ZEN | P2PKH | m/44'/121' |
| 26 | Bitcoin Gold | BTG | P2PKH | m/44'/156' |
| 27 | Komodo | KMD | P2PKH | m/44'/141' |
All EVM chains (ETH, BNB, MATIC, AVAX, FTM, ARB, OP) share the same address format (EIP-55 checksummed hex).
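The path column lists the first two levels, purpose' and coin_type' (BTC uses BIP-86 purpose 86 for Taproot, LTC uses BIP-84 purpose 84). The full BIP-44 layout is m / purpose' / coin_type' / account' / change / address_index, where a trailing ' marks a hardened index, i.e. i + 2^31. A sketch of building such a path; `hardened`, `derivation_path`, and `path_indices` are hypothetical helpers, not library functions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// BIP-32 hardened indices start at 2^31; "44'" means 44 + 2^31.
constexpr uint32_t kHardened = 0x80000000u;
constexpr uint32_t hardened(uint32_t i) { return i | kHardened; }

// Human-readable path: m/purpose'/coin_type'/account'/change/index.
std::string derivation_path(uint32_t purpose, uint32_t coin_type, uint32_t account,
                            uint32_t change, uint32_t index) {
    assert(purpose < kHardened && coin_type < kHardened && account < kHardened);
    assert(change <= 1);  // 0 = external (receive), 1 = internal (change)
    return "m/" + std::to_string(purpose) + "'/" + std::to_string(coin_type) + "'/" +
           std::to_string(account) + "'/" + std::to_string(change) + "/" +
           std::to_string(index);
}

// Raw 32-bit child indices, in the form a derive-child step consumes them.
std::array<uint32_t, 5> path_indices(uint32_t purpose, uint32_t coin_type, uint32_t account,
                                     uint32_t change, uint32_t index) {
    return {hardened(purpose), hardened(coin_type), hardened(account), change, index};
}
```

For example, `derivation_path(86, 0, 0, 0, 0)` yields the first Taproot receive address path for BTC, and every EVM chain in the table shares the `m/44'/60'` prefix.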
| Platform | Architecture | Backend | Status |
|---|---|---|---|
| Desktop CPU | x86_64 (Intel / AMD) | CPU | [OK] Stable |
| Desktop CPU | ARM64 (Apple Silicon, Ampere) | CPU | [OK] Stable |
| Desktop CPU | RISC-V RV64GC | CPU | [OK] Stable |
| Raspberry Pi | ARM64 (BCM2710, Zero 2 W) | CPU | [..] Testing |
| NVIDIA GPU | RTX / GTX / Tesla (sm_50+) | CUDA 12+ | [OK] Stable (8/8 GPU C ABI ops) |
| AMD GPU | RDNA / CDNA | OpenCL | [OK] Broad (7/8 GPU C ABI ops; ecrecover_batch pending) |
| AMD GPU | RDNA / CDNA | ROCm/HIP | [!] Beta |
| Apple GPU | Apple Silicon (M1/M2/M3/M4) | Metal | [..] Experimental (7/8 GPU C ABI ops; ecrecover_batch pending) |
| Any GPU | OpenCL 1.2+ compatible | OpenCL | [OK] Broad (7/8 GPU C ABI ops; ecrecover_batch pending) |
| ESP32-S3 | Xtensa LX7 @ 240 MHz | CPU | [OK] Tested |
| ESP32-P4 | RISC-V @ 400 MHz | CPU | [OK] Supported |
| ESP32-C6 | RISC-V (single-core) | CPU | [OK] Supported |
| STM32 | ARM Cortex-M3/M4 | CPU | [..] Experimental |
| WebAssembly | WASM (Emscripten) | CPU | [OK] Stable |
| Android | ARM64 (NDK r27c) | CPU | [OK] Stable |
| iOS | ARM64 (Xcode) | CPU | [OK] Stable |
GPU C ABI ops: generator_mul_batch, ecdsa_verify_batch, schnorr_verify_batch, ecdh_batch, hash160_pubkey_batch, msm, frost_verify_partial_batch, ecrecover_batch. See GPU Validation Matrix for per-backend details.
| Target | MCU | Clock | Scalar x G | Flash | RAM |
|---|---|---|---|---|---|
| ESP32-S3 | Xtensa LX7 (dual) | 240 MHz | 5.2 ms | ~120 KB | ~8 KB |
| ESP32-PICO-D4 | Xtensa LX6 (dual) | 240 MHz | 6.2 ms | ~120 KB | ~8 KB |
| ESP32-P4 | RISC-V | 400 MHz | ~3 ms | ~120 KB | ~8 KB |
| ESP32-C6 | RISC-V (single) | 160 MHz | ~12 ms | ~120 KB | ~8 KB |
| STM32F103 | Cortex-M3 | 72 MHz | 38 ms | ~100 KB | ~6 KB |
Every executable runs a deterministic Known Answer Test (KAT) on startup, covering all arithmetic operations:
| Mode | Time | When | What |
|---|---|---|---|
| smoke | ~1-2 s | App startup, embedded | Core KAT (10 scalar muls, field/scalar identities, boundary vectors) |
| ci | ~30-90 s | Every push (CI) | Smoke + cross-checks, bilinearity, NAF/wNAF, batch sweeps, algebraic stress |
| stress | ~10-60 min | Nightly / manual | CI + 1000 random scalar muls, 500 field triples, batch inverse up to 8192 |
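The smoke KAT pattern is a table of fixed inputs with known answers, including boundary values, checked before the program does real work. A minimal sketch assuming a toy field p = 2^31 - 1, whose expected values follow from 2^31 ≡ 1 and p - 1 ≡ -1 (mod p). The real KAT covers the secp256k1 field, scalars, and points; `run_smoke_kat` and the other names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <initializer_list>

// Toy field F_p with p = 2^31 - 1, standing in for the real 256-bit field.
constexpr uint64_t P = (1ULL << 31) - 1;
constexpr uint64_t fmul(uint64_t a, uint64_t b) { return (a % P) * (b % P) % P; }
constexpr uint64_t fadd(uint64_t a, uint64_t b) { return (a % P + b % P) % P; }

struct MulVector { uint64_t a, b, expected; const char* name; };

// Boundary vectors with answers derivable by hand.
constexpr MulVector kMulVectors[] = {
    {P - 1, P - 1, 1, "(-1)*(-1) = 1"},
    {1ULL << 30, 2, 1, "2^30 * 2 = 2^31 = 1"},
    {P - 1, 2, P - 2, "(-1)*2 = -2"},
    {0, P - 1, 0, "0 * x = 0"},
    {P, 5, 0, "p reduces to 0"},
};

// Returns the number of failures; a caller would refuse to start if nonzero.
int run_smoke_kat() {
    int failures = 0;
    for (const auto& v : kMulVectors) {
        if (fmul(v.a, v.b) != v.expected) {
            std::fprintf(stderr, "KAT failed: %s\n", v.name);
            ++failures;
        }
    }
    // Identity check at the boundaries: x + (p - x) = 0.
    for (uint64_t x : {uint64_t{0}, uint64_t{1}, P - 1}) {
        if (fadd(x, P - x) != 0) {
            std::fprintf(stderr, "KAT failed: additive inverse\n");
            ++failures;
        }
    }
    return failures;
}
```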
libFuzzer harnesses cover core arithmetic (cpu/fuzz/):
| Target | What it tests |
|---|---|
| fuzz_field | add/sub round-trip, mul identity, square, inverse |
| fuzz_scalar | add/sub, mul identity, distributive law |
| fuzz_point | on-curve check, negate, compress round-trip, dbl vs add |
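A harness in the spirit of fuzz_field's add/sub round-trip check, using libFuzzer's standard `LLVMFuzzerTestOneInput` entry point. It assumes a toy field p = 2^61 - 1 rather than the real secp256k1 field and is not the code in `cpu/fuzz/`; `fadd`, `fsub`, and `load_elem` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Toy field F_p with p = 2^61 - 1; sums of reduced values fit in 64 bits.
constexpr uint64_t P = (1ULL << 61) - 1;

uint64_t fadd(uint64_t a, uint64_t b) { const uint64_t s = a + b; return s >= P ? s - P : s; }
uint64_t fsub(uint64_t a, uint64_t b) { return a >= b ? a - b : a + P - b; }

// Reads 8 bytes and reduces into [0, p) (slightly biased; fine for fuzzing).
static uint64_t load_elem(const uint8_t* bytes) {
    uint64_t v;
    std::memcpy(&v, bytes, sizeof v);
    return v % P;
}

// libFuzzer calls this with mutated inputs; abort() reports a failure.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, std::size_t size) {
    if (size < 16) return 0;                     // need two field elements
    const uint64_t a = load_elem(data), b = load_elem(data + 8);
    if (fsub(fadd(a, b), b) != a) std::abort();  // (a + b) - b == a
    if (fadd(fsub(a, b), b) != a) std::abort();  // (a - b) + b == a
    if (fadd(a, b) != fadd(b, a)) std::abort();  // commutativity
    return 0;
}
```

Build with `clang++ -fsanitize=fuzzer,address harness.cpp` (file name illustrative) and run the resulting binary; libFuzzer supplies `main`.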
| Platform | Backend | Compiler | Status |
|---|---|---|---|
| Linux x64 | CPU | GCC 13 / Clang 17 | [OK] CI |
| Linux x64 | CPU | Clang 17 (ASan+UBSan) | [OK] CI |
| Linux x64 | CPU | Clang 17 (TSan) | [OK] CI |
| Windows x64 | CPU | MSVC 2022 | [OK] CI |
| macOS ARM64 | CPU + Metal | AppleClang | [OK] CI |
| iOS ARM64 | CPU | Xcode | [OK] CI |
| Android ARM64 | CPU | NDK r27c | [OK] CI |
| WebAssembly | CPU | Emscripten | [OK] CI |
| ROCm/HIP | CPU + GPU | ROCm 6.3 | [OK] CI |
The unified_audit_runner executes 55 audit modules across 8 sections (mathematical invariants, constant-time analysis, differential testing, standard vectors, fuzzing, protocol security, ABI safety, performance validation).
| Platform | OS | Compiler | Modules | Verdict | Time |
|---|---|---|---|---|---|
| Windows (local) | Windows x86-64 | Clang 21.1.0 | 54/55 | AUDIT-READY | 42 s |
| Linux Docker | Linux x86-64 | GCC 13.3.0 | 54/55 | AUDIT-READY | 51 s |
| Linux CI | Linux x86-64 | Clang 17.0.6 | 55/55 | AUDIT-READY | 48 s |
| Linux CI | Linux x86-64 | GCC 13.3.0 | 55/55 | AUDIT-READY | 52 s |
| Windows CI | Windows x86-64 | MSVC 1944 | 55/55 | AUDIT-READY | 143 s |
54/55 = 1 advisory warning (dudect timing smoke – probabilistic, flakes under hypervisor noise). Full reports: audit/platform-reports/
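dudect-style tests time the target on two input classes (typically fixed versus random inputs) and apply Welch's t-test to the measurements, which is why hypervisor noise can occasionally push the statistic over its threshold. The sketch below shows only the statistic, computed with Welford's running mean and variance; `Moments`, `push`, and `welch_t` are hypothetical names, and measurement, outlier cropping, and threshold choice are left out.

```cpp
#include <cassert>
#include <cmath>
#include <initializer_list>

// Running statistics for one input class (e.g. cycle counts).
struct Moments { double mean = 0, m2 = 0; long n = 0; };

// Welford's online update: numerically stable running mean and variance.
void push(Moments& s, double x) {
    ++s.n;
    const double delta = x - s.mean;
    s.mean += delta / s.n;
    s.m2 += delta * (x - s.mean);
}

// Welch's t statistic between two classes. A large |t| suggests the timing
// distribution depends on the input class, i.e. a possible leak.
double welch_t(const Moments& a, const Moments& b) {
    assert(a.n > 1 && b.n > 1);
    const double var_a = a.m2 / (a.n - 1), var_b = b.m2 / (b.n - 1);
    const double denom = std::sqrt(var_a / a.n + var_b / b.n);
    assert(denom > 0);
    return (a.mean - b.mean) / denom;
}
```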
| Target | Description |
|---|---|
| bench_unified | Reference benchmark: full apples-to-apples comparison vs libsecp256k1 and OpenSSL |
| bench_ct | Fast-vs-CT overhead comparison |
| bench_field_52 | 5x52 field arithmetic micro-benchmarks |
| bench_field_26 | 10x26 field arithmetic micro-benchmarks |
| bench_kP | Scalar multiplication (k*P) benchmarks |
This library explores the performance ceiling of secp256k1 across CPU architectures (x64, ARM64, RISC-V, Cortex-M, Xtensa) and GPUs (CUDA, OpenCL, Metal, ROCm). Zero external dependencies. Pure C++20.
C++ API: Not yet stable. Breaking changes may occur before v4.0. Core layers (field, scalar, point, ECDSA, Schnorr) are mature. Experimental layers (MuSig2, FROST, Adaptor, Pedersen, Taproot, HD, Coins) may change.
**C ABI (ufsecp)**: Stable from v3.4.0. ABI version tracked separately. See SUPPORTED_GUARANTEES.md.
All releases starting from v3.15.0 are cryptographically signed using Sigstore cosign (keyless, GitHub OIDC identity). Older historical releases remain unsigned but are preserved unchanged.
Every release includes:
| Artifact | Purpose |
|---|---|
| SHA256SUMS | Checksums for all release archives |
| SHA256SUMS.sig | Cosign signature of the manifest |
| SHA256SUMS.pem | Signing certificate (Sigstore OIDC) |
| sbom.cdx.json | CycloneDX Software Bill of Materials |
| Per-archive .sig + .pem | Individual artifact signatures |
Signature verification steps are provided for Linux, macOS, and Windows (PowerShell).
| Supply Chain | Status |
|---|---|
| SHA256SUMS for all artifacts | [OK] Every release |
| Cosign / Sigstore manifest signing | [OK] v3.15.0+ |
| Per-artifact Cosign signatures | [OK] v3.15.0+ |
| SLSA Build Provenance (GitHub Attestation) | [OK] Every release |
| CycloneDX SBOM | [OK] Every release |
| Reproducible builds documentation | [OK] Dockerfile.reproducible |
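SHA256SUMS uses the `sha256sum` text format. Once the manifest's own signature has been verified (for example with `cosign verify-blob` against SHA256SUMS.sig and SHA256SUMS.pem), each archive's digest can be checked against it. A minimal parser sketch; `parse_sha256sums` and `digest_matches` are hypothetical helpers, and computing a file's SHA-256 is left to an existing tool.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <optional>
#include <sstream>
#include <string>

// Parses lines of the form "<64 hex digits><space><space or '*'><filename>".
// Returns filename -> lowercase hex digest, or std::nullopt on a malformed line.
std::optional<std::map<std::string, std::string>> parse_sha256sums(const std::string& text) {
    std::map<std::string, std::string> out;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty()) continue;
        if (line.size() < 67 || line[64] != ' ' || (line[65] != ' ' && line[65] != '*'))
            return std::nullopt;
        std::string digest = line.substr(0, 64);
        for (char& c : digest) {
            if (!std::isxdigit(static_cast<unsigned char>(c))) return std::nullopt;
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
        out[line.substr(66)] = digest;
    }
    return out;
}

// Case-insensitive comparison of a locally computed digest with the manifest.
bool digest_matches(const std::map<std::string, std::string>& manifest,
                    const std::string& file, const std::string& computed_hex) {
    const auto it = manifest.find(file);
    if (it == manifest.end() || computed_hex.size() != 64) return false;
    for (std::size_t i = 0; i < 64; ++i)
        if (std::tolower(static_cast<unsigned char>(computed_hex[i])) != it->second[i])
            return false;
    return true;
}
```

Checksums alone only detect corruption; authenticity comes from the verified signature over the manifest.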
Is UltrafastSecp256k1 a drop-in replacement for libsecp256k1?
No. It is an independent implementation with a different API. The C ABI (`ufsecp`) provides a stable FFI surface, but function signatures differ from libsecp256k1. Migration requires code changes.
Is the API stable?
The C ABI (`ufsecp`) is stable from v3.4.0. The C++ API (namespaces `fast::`, `ct::`) is mature for Tier 1 features but may change before v4.0.
What is the constant-time scope?
All functions in the `ct::` namespace are constant-time: field arithmetic, scalar arithmetic, point multiplication, complete addition, signing, and ECDH. The C ABI uses CT internally for all secret-key operations. See CT Evidence above.
Which parts are production-safe today?
This library has not undergone a paid external audit. Tier 1 features (core ECC, ECDSA, Schnorr, ECDH, stable C ABI) are extensively tested, fuzzed, regression-gated, and run through sanitizer-backed CI. Teams can evaluate it today with a strong self-audit trail and reproducible audit evidence, then make their own deployment decision based on their risk model and review standards.
How do I reproduce the benchmarks?
See `docs/BENCHMARKS.md` for exact commands, pinned compiler/driver versions, and raw logs. The live dashboard tracks performance across commits.
| Document | Description |
|---|---|
| API Reference | Full C++ and C ABI reference |
| Build Guide | Detailed build instructions for all platforms |
| Benchmarks | Complete benchmark results and methodology |
| GPU API | GPU C ABI header (18 functions, 8 ops, 3 backends) |
| GPU Validation Matrix | Per-backend op coverage and validation status |
| Feature Maturity | Per-feature GPU/CT/fuzz/tier status table |
| Supported Guarantees | ABI stability tiers and commitment levels |
| Audit Coverage | Full audit report with 55 modules and platform verdicts |
| Audit Guide | How to run and interpret audit suite |
| Test Matrix | Comprehensive test coverage map for auditors |
| ARM64 Audit & Benchmark | ARM64 platform certification and performance analysis |
| Threat Model | Layer-by-layer security risk assessment |
| Security Policy | Vulnerability reporting and audit status |
| Porting Guide | Add new platforms, architectures, GPU backends |
| RISC-V Optimizations | RISC-V assembly details |
| ESP32 Setup | ESP32 embedded development guide |
| Examples | Multi-language binding examples (C, Python, Rust, Node.js, Go, Java) |
| Contributing | Development guidelines |
| Changelog | Version history |
Contributions are welcome! Please read CONTRIBUTING.md.
MIT License
This project is licensed under the MIT License. Previously released versions (up to v3.14.x) were under AGPL-3.0. As of v3.15.0 the license is MIT – to align with the broader Bitcoin ecosystem and remove adoption friction.
See [LICENSE](LICENSE) for full details.
| Channel | Link |
|---|---|
| Issues | GitHub Issues |
| Discussions | GitHub Discussions |
| Wiki | Documentation Wiki |
| Benchmarks | Live Dashboard |
| Security | Report Vulnerability |
| Commercial | payysoon@gmail.com |
UltrafastSecp256k1 is an independent implementation – written from scratch with our own architecture, hybrid GPU execution model, embedded ports, and optimization techniques. The library's core structure and most performance gains came from direct experimentation, profiling, and iteration. At the same time, no project exists in a vacuum. Studying public research and implementation notes from the wider cryptographic community later helped us validate decisions, avoid weaker paths, and uncover additional optimization opportunities.
We want to acknowledge the teams whose public work informed parts of our journey:
CMAKE_CUDA_SEPARABLE_COMPILATION flag required for Blackwell devices. Results in docs/COMMUNITY_BENCHMARKS.md.

We share our optimizations, GPU kernels, embedded ports, and cross-platform techniques freely, because open-source cryptography grows stronger when knowledge flows in every direction.
Special thanks to the Stacker News and Delving Bitcoin communities for their early support and technical feedback.
Extra gratitude to @0xbitcoiner for the initial outreach and for helping bridge the project with the wider Bitcoin developer ecosystem.
If you find UltrafastSecp256k1 useful, consider supporting its development!
We are actively seeking sponsors for a funded bug bounty program, stronger open audit infrastructure, and ongoing development. See the Seeking Sponsors section above for details.
| Method | Link |
|---|---|
| GitHub Sponsors (preferred) | github.com/sponsors/shrec |
| Bitcoin Lightning | shrec@stacker.news via any Lightning wallet |
| PayPal | paypal.me/IChkheidze |
| Corporate / Foundation grants | payysoon@gmail.com |
All sponsors are acknowledged in the README and release notes.
UltrafastSecp256k1 – High-performance secp256k1 cryptography for CPU, CUDA, OpenCL, mobile, embedded, and WebAssembly. GPU-accelerated ECDSA and Schnorr on CUDA, zero dependencies, constant-time secret-key paths, and broad multi-platform coverage.