Performance Benchmarks
======================

This page records a reproducible OpenQuantumSim performance baseline. The
numbers below are not a universal leaderboard; they document one hardware and
software configuration so that future benchmark results can be compared against
the same reference point.

.. image:: _static/benchmarks/readme_benchmark_summary.png
   :alt: OpenQuantumSim benchmark summary comparing deterministic QuTiP speedups and MCWF backend threading.

Benchmark Environment
---------------------

.. list-table::
   :header-rows: 1

   * - Item
     - Value
   * - Date
     - 2026-05-22
   * - OpenQuantumSim commit
     - startup optimization snapshot
   * - CPU
     - Apple M1
   * - Logical CPU count
     - 8
   * - Platform
     - macOS 26.4.1 arm64
   * - Python
     - 3.14.3
   * - Julia backend runtime
     - 1.11.9 through JuliaCall
   * - ``julia --version``
     - 1.12.5
   * - OpenQuantumSim
     - 0.1.0a2
   * - QuTiP
     - 5.2.3
   * - NumPy / SciPy / h5py
     - 2.4.4 / 1.17.1 / 3.16.0

Deterministic Solver: OpenQuantumSim vs QuTiP
---------------------------------------------

Command:

.. code-block:: bash

   MPLCONFIGDIR=/private/tmp/oqs-mpl \
   python benchmarks/bench_vs_qutip.py \
     --repeats 5 \
     --time-points 81 \
     --t-final 6.0 \
     --cases qubit jc5 jc10 \
     --oqs-methods auto krylov ode \
     --json runs/benchmarks/bench_vs_qutip_after_stats.json

Settings: ``rtol=1e-8``, ``atol=1e-10``. OpenQuantumSim used the default
single-threaded backend process for this deterministic benchmark.
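
The timing columns in the table that follows are medians over ``--repeats``
runs, and the ``vs QuTiP`` column divides the QuTiP median by the
OpenQuantumSim median, so values above 1 mean OpenQuantumSim is faster. The
sketch below shows that style of measurement with the standard library only.
The helper names ``median_runtime_ms`` and ``speedup`` and the single untimed
warmup call are illustrative assumptions, not the code in
``benchmarks/bench_vs_qutip.py``.

.. code-block:: python

   import statistics
   import time
   from typing import Callable


   def median_runtime_ms(fn: Callable[[], object], repeats: int = 5, warmup: int = 1) -> float:
       """Median wall-clock time of fn() in milliseconds (hypothetical helper)."""
       for _ in range(warmup):
           fn()  # untimed call so one-time setup cost is excluded (an assumption)
       samples = []
       for _ in range(repeats):
           start = time.perf_counter()
           fn()
           samples.append((time.perf_counter() - start) * 1e3)
       # The median is robust to an occasional slow repeat (GC pause, OS jitter).
       return statistics.median(samples)


   def speedup(baseline_ms: float, candidate_ms: float) -> float:
       """Baseline time divided by candidate time; > 1 means the candidate is faster."""
       return baseline_ms / candidate_ms


   if __name__ == "__main__":
       t = median_runtime_ms(lambda: sum(i * i for i in range(10_000)), repeats=5)
       print(f"median: {t:.3f} ms")
       print(f"speedup: {speedup(6.71, 2.60):.2f}x")

For example, ``speedup(6.71, 2.60)`` is about 2.58, matching the
Jaynes-Cummings 10 row. Ratios recomputed from the rounded medians can differ
from the reported speedups in the last digit (1.43 / 1.00 versus 1.42x for
qubit decay), presumably because the script divides unrounded timings.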

.. list-table::
   :header-rows: 1

   * - Case
     - Dimension
     - QuTiP median
     - OQS ``auto`` median
     - Best OQS median
     - OQS ``auto`` vs QuTiP
     - Max expectation delta
   * - Qubit decay
     - 2
     - 1.43 ms
     - 1.00 ms
     - 0.77 ms (``ode``)
     - 1.42x
     - 7.49e-09
   * - Jaynes-Cummings 5
     - 10
     - 2.18 ms
     - 1.14 ms
     - 1.14 ms (``auto``)
     - 1.90x
     - 1.21e-09
   * - Jaynes-Cummings 10
     - 20
     - 6.71 ms
     - 2.60 ms
     - 2.08 ms (``ode``)
     - 2.58x
     - 7.23e-09

Interpretation: after reducing solver-stat conversion overhead at the
Python-Julia boundary, the OpenQuantumSim ``auto`` method is 1.42x to 2.58x
faster than QuTiP for these small deterministic cases on this machine.
Expectation values agree with QuTiP to within about ``1e-9`` to ``1e-8``.

Python Wrapper Profile
----------------------

The main small-system bottleneck was not the Julia integrator but repeated
Python-side probing of optional fields in the Julia ``NamedTuple`` used for
``Result.solver_stats``. The conversion now reads only the fields reported by
``dir(...)`` and avoids exception-heavy lookups for fields that are not
present.

.. list-table::
   :header-rows: 1

   * - Profile
     - Workload
     - Python-visible cumulative time
     - Solver-stat conversion time
   * - Before
     - 100 warm qubit-decay ``mesolve`` calls
     - 0.671 s
     - 0.601 s
   * - After
     - 100 warm qubit-decay ``mesolve`` calls
     - 0.050 s
     - 0.007 s

Backend Startup Profile
-----------------------

Command:

.. code-block:: bash

   PYTHON_JULIACALL_HANDLE_SIGNALS=yes python - <<'PY'
   import time

   import numpy as np

   import openquantumsim as oqs
   from openquantumsim._julia_bridge import get_julia, load_backend

   # Time Julia runtime startup and backend loading separately.
   started = time.perf_counter()
   get_julia()
   print("get_julia", time.perf_counter() - started)

   started = time.perf_counter()
   load_backend()
   print("load_backend", time.perf_counter() - started)

   # Spontaneous-emission qubit for the first (cold) solver call.
   space = oqs.SpinSpace(0.5, label="atom")
   psi = oqs.basis(space, "up")
   H = 0.0 * oqs.sigmaz(space)
   c = np.sqrt(0.35) * oqs.sigmam(space)
   e = oqs.Operator(oqs.ket2dm(psi), space, "P_excited")
   t = np.linspace(0.0, 1.0, 11)

   started = time.perf_counter()
   oqs.mesolve(H, oqs.ket2dm(psi), t, c_ops=[c], e_ops=[e])
   print("first_mesolve", time.perf_counter() - started)
   PY

The profile below measures a fresh Python process after the Julia backend has
already been set up once. Normal runtime loads now skip ``Pkg.instantiate()``
unless loading the backend fails; ``setup_julia.py`` still forces
instantiation for installation validation.

.. list-table::
   :header-rows: 1

   * - Profile
     - ``import openquantumsim``
     - ``get_julia()``
     - ``load_backend()``
     - First ``mesolve``
     - Total
   * - Before
     - 0.786 s
     - 2.602 s
     - 11.337 s
     - 7.775 s
     - 22.500 s
   * - After
     - 0.334 s
     - 3.005 s
     - 6.139 s
     - 7.816 s
     - 17.294 s

Total startup time drops from 22.5 s to 17.3 s, almost entirely because
``load_backend()`` falls from 11.3 s to 6.1 s. The first ``mesolve`` call,
about 7.8 s in both profiles, is now the largest single startup cost. The same
change also suppresses routine Julia package-manager output during normal
solver calls.

Larger Deterministic Spot Checks
--------------------------------

Command:

.. code-block:: bash

   MPLBACKEND=Agg PYTHON_JULIACALL_HANDLE_SIGNALS=yes \
   python benchmarks/bench_vs_qutip.py \
     --repeats 3 \
     --time-points 81 \
     --t-final 6.0 \
     --cases jc20 jc40 \
     --oqs-methods auto ode krylov \
     --json runs/benchmarks/bench_vs_qutip_larger_startup_patch.json
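
In both deterministic tables, the ``Dimension`` column is twice the ``jcN``
size, which is consistent with a two-level atom coupled to a cavity mode
truncated at ``N`` Fock states. The NumPy sketch below builds such a
Jaynes-Cummings Hamiltonian in the rotating-wave approximation to show where
the dimension comes from. The frequencies, coupling strength, and basis
ordering are illustrative assumptions, not the parameters used by
``benchmarks/bench_vs_qutip.py``. A master-equation solver evolves a density
matrix with ``d**2`` entries, so going from ``jc20`` (d = 40) to ``jc40``
(d = 80) quadruples the state size.

.. code-block:: python

   import numpy as np


   def jaynes_cummings_hamiltonian(n_fock: int, omega_c: float = 1.0,
                                   omega_a: float = 1.0, g: float = 0.1) -> np.ndarray:
       """Dense RWA Jaynes-Cummings Hamiltonian on cavity (n_fock levels) x atom (2 levels).

       Parameter values and the atom basis ordering (index 0 = excited) are
       illustrative assumptions.
       """
       # Truncated cavity annihilation operator: a|n> = sqrt(n)|n-1>.
       a = np.diag(np.sqrt(np.arange(1, n_fock)), k=1)
       # Atomic lowering operator |g><e| in the basis (e, g).
       sm = np.array([[0.0, 0.0], [1.0, 0.0]])
       a_full = np.kron(a, np.eye(2))
       sm_full = np.kron(np.eye(n_fock), sm)
       # All operators are real here, so .T is the adjoint.
       return (
           omega_c * a_full.T @ a_full
           + omega_a * sm_full.T @ sm_full
           + g * (a_full.T @ sm_full + a_full @ sm_full.T)
       )


   for n in (5, 10, 20, 40):
       H = jaynes_cummings_hamiltonian(n)
       print(f"jc{n}: Hilbert dimension {H.shape[0]}, density-matrix entries {H.size}")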

.. list-table::
   :header-rows: 1

   * - Case
     - Dimension
     - QuTiP median
     - OQS ``auto`` median
     - Best OQS median
     - OQS ``auto`` vs QuTiP
     - Max expectation delta
   * - Jaynes-Cummings 20
     - 40
     - 3.91 ms
     - 2.95 ms
     - 2.95 ms (``auto``)
     - 1.32x
     - 1.96e-08
   * - Jaynes-Cummings 40
     - 80
     - 13.93 ms
     - 10.38 ms
     - 8.52 ms (``ode``)
     - 1.34x
     - 4.71e-08

At these larger dimensions the ``auto`` method remains about 1.3x faster than
QuTiP, and the largest expectation-value difference stays below ``5e-8``.

Monte Carlo Wave Functions: OpenQuantumSim vs QuTiP
---------------------------------------------------

Command:

.. code-block:: bash

   PYTHON_JULIACALL_HANDLE_SIGNALS=yes JULIA_NUM_THREADS=4 \
   python benchmarks/bench_mcsolve_vs_qutip.py \
     --n-traj 50 200 1000 \
     --time-points 31 \
     --t-final 2.0 \
     --max-step 0.02 \
     --repeats 3 \
     --json runs/benchmarks/bench_mcsolve_vs_qutip_m1_2026-05-22.json

Settings: spontaneous-emission qubit with ``gamma=0.35`` and one
excited-state projector as the observable; QuTiP ``mcsolve`` with progress
output disabled; OpenQuantumSim ``mcsolve`` with ``n_jobs=-1`` and four Julia
threads.

.. list-table::
   :header-rows: 1

   * - Trajectories
     - QuTiP median
     - OQS median
     - OQS backend wall time
     - Workers
     - OQS vs QuTiP
     - OQS backend vs QuTiP
   * - 50
     - 8.85 ms
     - 1.79 ms
     - 0.46 ms
     - 4
     - 4.96x
     - 19.06x
   * - 200
     - 33.58 ms
     - 3.22 ms
     - 1.71 ms
     - 4
     - 10.44x
     - 19.67x
   * - 1000
     - 168.01 ms
     - 18.60 ms
     - 17.25 ms
     - 4
     - 9.03x
     - 9.74x

Interpretation: for this MCWF smoke benchmark, threaded backend-side
aggregation makes OpenQuantumSim about 5x to 10x faster than QuTiP end to end
(about 10x to 20x when comparing backend wall time alone), measured after
backend warmup. The exact speedup is workload-specific and should be
re-measured for larger Hilbert spaces and more expensive observables.

Monte Carlo Wave Function Scaling
---------------------------------

Command:

.. code-block:: bash

   JULIA_NUM_THREADS=4 MPLCONFIGDIR=/private/tmp/oqs-mpl \
   python benchmarks/bench_mcsolve.py \
     --n-traj 200 \
     --time-points 31 \
     --t-final 2.0 \
     --max-step 0.02 \
     --repeats 3 \
     --warmup-trajectories 10 \
     --n-jobs 1 -1 \
     --json runs/benchmarks/bench_mcsolve_m1_2026-05-14.json
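
Each MCWF run averages independent stochastic trajectories, so its expectation
values carry sampling noise that shrinks roughly as ``1/sqrt(n_traj)``. The
sketch below is a first-order MCWF (quantum-jump) loop for the
spontaneous-emission qubit used in the MCWF comparison above, written in plain
NumPy rather than with OpenQuantumSim. It assumes ``H = 0``, an initial
excited state, and a fixed step equal to ``--max-step``; it illustrates the
method and is not the algorithm or integrator used by the OpenQuantumSim
backend. For this model the exact excited-state population is
``exp(-gamma * t)``.

.. code-block:: python

   import numpy as np


   def mcwf_excited_population(gamma, t_final, dt, n_traj, seed=0):
       """First-order MCWF estimate of the excited-state population of a decaying qubit.

       Assumptions for illustration: H = 0, initial state excited, fixed step dt,
       basis ordering (excited, ground).
       """
       rng = np.random.default_rng(seed)
       c_op = np.sqrt(gamma) * np.array([[0, 0], [1, 0]], dtype=complex)  # sqrt(gamma) |g><e|
       h = np.zeros((2, 2), dtype=complex)
       h_eff = h - 0.5j * (c_op.conj().T @ c_op)  # non-Hermitian effective Hamiltonian
       n_steps = int(round(t_final / dt))
       population = np.zeros(n_steps + 1)
       for _ in range(n_traj):
           psi = np.array([1, 0], dtype=complex)
           population[0] += abs(psi[0]) ** 2
           for k in range(1, n_steps + 1):
               # Jump probability for this step: dt * <psi|C^dagger C|psi>.
               jump_prob = dt * np.linalg.norm(c_op @ psi) ** 2
               if rng.random() < jump_prob:
                   psi = c_op @ psi  # quantum jump to the ground state
               else:
                   psi = psi - 1j * dt * (h_eff @ psi)  # Euler step of no-jump evolution
               psi = psi / np.linalg.norm(psi)
               population[k] += abs(psi[0]) ** 2
       times = np.linspace(0.0, n_steps * dt, n_steps + 1)
       return times, population / n_traj


   if __name__ == "__main__":
       gamma = 0.35
       times, pop = mcwf_excited_population(gamma, t_final=2.0, dt=0.02, n_traj=200)
       exact = np.exp(-gamma * times)
       print("max |MCWF - exact|:", np.max(np.abs(pop - exact)))

With 200 trajectories the standard error of the estimated population is at
most ``sqrt(0.25 / 200)``, about 0.035, so deviations of order ``1e-2`` from
the exact curve are expected from sampling alone.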

.. list-table::
   :header-rows: 1

   * - ``n_jobs``
     - Workers
     - Threaded
     - Median elapsed
     - Backend wall time
     - Speedup vs serial
     - Max expectation delta
   * - 1
     - 1
     - False
     - 6.595 ms
     - 3.868 ms
     - 1.00x
     - 1.00e-02
   * - -1
     - 4
     - True
     - 4.092 ms
     - 1.542 ms
     - 1.61x
     - 1.00e-02

Interpretation: backend-side trajectory aggregation and threading work as
intended. With four workers, backend wall time drops about 2.5x (3.868 ms to
1.542 ms), but median elapsed time drops only 1.61x because the time spent
outside the backend (about 2.5 to 2.7 ms, largely Python-call overhead at this
size) barely changes. Larger trajectory counts should give a clearer measure
of Julia-side scaling.

Dicke Mutual-Information Batch Runner
-------------------------------------

Command:

.. code-block:: bash

   JULIA_NUM_THREADS=1 MPLCONFIGDIR=/private/tmp/oqs-mpl \
   python examples/dicke/bench_mi.py \
     --N 4 \
     --kappa 0.1 \
     --n-traj 12 \
     --time-points 21 \
     --t-final 0.2 \
     --max-step 0.02 \
     --batch-size 2 \
     --repeats 2 \
     --warmup-trajectories 1 \
     --n-jobs 1 2 \
     --target-n-traj 1000 \
     --json runs/benchmarks/bench_dicke_mi_m1_2026-05-14.json

.. list-table::
   :header-rows: 1

   * - ``n_jobs``
     - Workers
     - Median elapsed
     - Trajectories / s
     - Seconds / trajectory
     - Speedup vs serial
   * - 1
     - 1
     - 0.0836 s
     - 143.46
     - 0.0070
     - 1.00x
   * - 2
     - 2
     - 19.7627 s
     - 0.607
     - 1.6469
     - 0.004x

Interpretation: this small Dicke MI benchmark is dominated by process-startup
overhead. Each short-lived worker process initializes its own Julia backend, so
the two-worker run is more than 200x slower than the serial run (19.76 s
versus 0.084 s). Larger batches amortize this startup cost better.

Reproducing Results
-------------------

The raw JSON outputs are written to ``runs/benchmarks/``, which is ignored by
Git. Re-run the commands above to regenerate the local benchmark artifacts.
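
When regenerated artifacts are compared against this baseline, the derived
columns of the Dicke table can be recomputed from the trajectory count and the
median elapsed time. The sketch below does this and adds a toy cost model (a
fixed startup cost paid by concurrently starting workers, plus trajectory work
split evenly among them) to show why larger batches amortize startup better.
The function names, the cost model, and its input costs are illustrative
assumptions and are not part of ``examples/dicke/bench_mi.py``.

.. code-block:: python

   def throughput_metrics(n_traj: int, elapsed_s: float, serial_elapsed_s: float) -> dict:
       """Derived columns of the Dicke MI table from raw timings (hypothetical helper)."""
       return {
           "traj_per_s": n_traj / elapsed_s,
           "s_per_traj": elapsed_s / n_traj,
           "speedup_vs_serial": serial_elapsed_s / elapsed_s,
       }


   def toy_parallel_elapsed(n_traj: int, workers: int, startup_s: float, per_traj_s: float) -> float:
       """Toy model (an assumption): workers start concurrently, each paying startup_s,
       then the trajectories are split evenly across them."""
       return startup_s + n_traj * per_traj_s / workers


   if __name__ == "__main__":
       # Two-worker row of the Dicke table: 12 trajectories, 19.7627 s vs 0.0836 s serial.
       row = throughput_metrics(12, 19.7627, 0.0836)
       print({name: round(value, 4) for name, value in row.items()})

       # Hypothetical costs: the startup share of elapsed time falls as n_traj grows.
       for n in (12, 1000, 100_000):
           total = toy_parallel_elapsed(n, workers=2, startup_s=1.0, per_traj_s=0.1)
           print(n, f"startup share {1.0 / total:.1%}")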