Performance Benchmarks

This page records a reproducible OpenQuantumSim performance baseline. The numbers below are not a universal leaderboard; they document one hardware and software configuration so future benchmark results can be compared against the same reference point.

[Figure: OpenQuantumSim benchmark summary comparing deterministic QuTiP speedups and MCWF backend threading.]

Benchmark Environment

| Item | Value |
| --- | --- |
| Date | 2026-05-22 |
| OpenQuantumSim commit | startup optimization snapshot |
| CPU | Apple M1 |
| Logical CPU count | 8 |
| Platform | macOS 26.4.1 arm64 |
| Python | 3.14.3 |
| Julia backend runtime | 1.11.9 through JuliaCall |
| julia --version | 1.12.5 |
| OpenQuantumSim | 0.1.0a2 |
| QuTiP | 5.2.3 |
| NumPy / SciPy / h5py | 2.4.4 / 1.17.1 / 3.16.0 |

Deterministic Solver: OpenQuantumSim vs QuTiP

Command:

MPLCONFIGDIR=/private/tmp/oqs-mpl \
python benchmarks/bench_vs_qutip.py \
    --repeats 5 \
    --time-points 81 \
    --t-final 6.0 \
    --cases qubit jc5 jc10 \
    --oqs-methods auto krylov ode \
    --json runs/benchmarks/bench_vs_qutip_after_stats.json

Settings: rtol=1e-8, atol=1e-10. OpenQuantumSim used the default single-threaded backend process for this deterministic benchmark.

| Case | Dimension | QuTiP median | OQS auto median | Best OQS median | OQS auto vs QuTiP | Max expectation delta |
| --- | --- | --- | --- | --- | --- | --- |
| Qubit decay | 2 | 1.43 ms | 1.00 ms | 0.77 ms (ode) | 1.42x | 7.49e-09 |
| Jaynes-Cummings 5 | 10 | 2.18 ms | 1.14 ms | 1.14 ms (auto) | 1.90x | 1.21e-09 |
| Jaynes-Cummings 10 | 20 | 6.71 ms | 2.60 ms | 2.08 ms (ode) | 2.58x | 7.23e-09 |

Interpretation: after reducing solver-stat conversion overhead at the Python-Julia boundary, OpenQuantumSim with the auto method is about 1.4x to 2.6x faster than QuTiP for these small deterministic benchmark cases on this machine. Expectation values agree with QuTiP to within about 1e-9 to 1e-8.
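As a minimal sketch of how the "OQS auto vs QuTiP" column can be read, the snippet below assumes the speedup is simply the ratio of the two median timings. The dictionary layout and the `speedup` helper are illustrative and are not taken from `bench_vs_qutip.py`. Because the inputs here are the rounded medians from the table, the last digit can differ slightly from the published ratios, which were presumably computed from unrounded timings.

```python
# Median timings in milliseconds, copied from the table above (rounded values).
medians_ms = {
    "Qubit decay": {"qutip": 1.43, "oqs_auto": 1.00},
    "Jaynes-Cummings 5": {"qutip": 2.18, "oqs_auto": 1.14},
    "Jaynes-Cummings 10": {"qutip": 6.71, "oqs_auto": 2.60},
}


def speedup(reference_ms: float, candidate_ms: float) -> float:
    """Return how many times faster the candidate is than the reference."""
    return reference_ms / candidate_ms


# Assumption: speedup = QuTiP median / OpenQuantumSim median.
speedups = {
    case: speedup(timing["qutip"], timing["oqs_auto"])
    for case, timing in medians_ms.items()
}

for case, value in speedups.items():
    print(f"{case}: {value:.2f}x")
```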

Python Wrapper Profile

The main small-system bottleneck was not the Julia integrator. It was repeated Python-side probing of optional fields in the Julia NamedTuple used for Result.solver_stats. The conversion now uses the fields reported by dir(...) and avoids exception-heavy lookups for fields that are not present.

| Profile | Workload | Python-visible cumulative time | Solver-stat conversion time |
| --- | --- | --- | --- |
| Before | 100 warm qubit-decay mesolve calls | 0.671 s | 0.601 s |
| After | 100 warm qubit-decay mesolve calls | 0.050 s | 0.007 s |
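The difference between the two conversion strategies described in this section can be sketched in plain Python. Everything here is hypothetical: `FakeStats` stands in for the Julia NamedTuple, and the field names in `OPTIONAL_FIELDS` are invented for illustration, not the real `Result.solver_stats` fields. The sketch only shows the pattern of listing the fields once with `dir(...)` instead of probing each optional field and catching the failure.

```python
class FakeStats:
    """Hypothetical stand-in for the Julia NamedTuple (not the real bridge object)."""

    def __init__(self):
        self.nsteps = 42
        self.naccept = 40


# Hypothetical optional field names; only two exist on FakeStats.
OPTIONAL_FIELDS = ("nsteps", "naccept", "nreject", "nf", "njacs")


def convert_by_probing(stats):
    """Old pattern: try every optional field and swallow the failures."""
    out = {}
    for name in OPTIONAL_FIELDS:
        try:
            out[name] = getattr(stats, name)
        except AttributeError:
            continue  # one raised exception per missing field
    return out


def convert_by_listing(stats):
    """New pattern: ask once which fields exist, then read only those."""
    present = set(dir(stats))
    return {name: getattr(stats, name) for name in OPTIONAL_FIELDS if name in present}


stats = FakeStats()
print(convert_by_probing(stats))
print(convert_by_listing(stats))
```

Both functions return the same dictionary. In pure Python the cost difference is small; the saving reported in the table comes from avoiding failed lookups that cross the Python-Julia boundary.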

Backend Startup Profile

Command:

PYTHON_JULIACALL_HANDLE_SIGNALS=yes python - <<'PY'
import time
import numpy as np
import openquantumsim as oqs
from openquantumsim._julia_bridge import get_julia, load_backend

started = time.perf_counter(); get_julia()
print("get_julia", time.perf_counter() - started)
started = time.perf_counter(); load_backend()
print("load_backend", time.perf_counter() - started)

space = oqs.SpinSpace(0.5, label="atom")
psi = oqs.basis(space, "up")
H = 0.0 * oqs.sigmaz(space)
c = np.sqrt(0.35) * oqs.sigmam(space)
e = oqs.Operator(oqs.ket2dm(psi), space, "P_excited")
t = np.linspace(0.0, 1.0, 11)

started = time.perf_counter()
oqs.mesolve(H, oqs.ket2dm(psi), t, c_ops=[c], e_ops=[e])
print("first_mesolve", time.perf_counter() - started)
PY

The profile below measures a fresh Python process after the Julia backend has already been set up once. Normal runtime loads now skip Pkg.instantiate() unless loading the backend fails; setup_julia.py still forces instantiation for installation validation.

| Profile | import openquantumsim | get_julia() | load_backend() | First mesolve | Total |
| --- | --- | --- | --- | --- | --- |
| Before | 0.786 s | 2.602 s | 11.337 s | 7.775 s | 22.500 s |
| After | 0.334 s | 3.005 s | 6.139 s | 7.816 s | 17.294 s |

The same change also suppresses routine Julia package-manager output during normal solver calls.
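The "load first, instantiate only on failure" behaviour described in this section can be sketched generically. This is not the code in `openquantumsim._julia_bridge`; `load_with_fallback`, `fake_load` and `fake_instantiate` are hypothetical names used only to show the control flow, with the slow setup step playing the role of `Pkg.instantiate()`.

```python
from typing import Callable


def load_with_fallback(load: Callable[[], object], instantiate: Callable[[], None]) -> object:
    """Try the fast path first; run the slow setup step only if loading fails."""
    try:
        return load()
    except Exception:
        instantiate()  # slow path, e.g. resolving and installing packages
        return load()  # retry once; a second failure propagates to the caller


# Hypothetical stand-ins that record the order of calls.
calls = []


def fake_load():
    calls.append("load")
    if "instantiate" not in calls:
        raise RuntimeError("backend not set up yet")
    return "backend"


def fake_instantiate():
    calls.append("instantiate")


result = load_with_fallback(fake_load, fake_instantiate)
print(result, calls)
```

On a machine where setup has already run, the first `load()` succeeds and the slow step is never reached, which is the case the "After" row measures.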

Larger Deterministic Spot Checks

Command:

MPLBACKEND=Agg PYTHON_JULIACALL_HANDLE_SIGNALS=yes \
python benchmarks/bench_vs_qutip.py \
    --repeats 3 \
    --time-points 81 \
    --t-final 6.0 \
    --cases jc20 jc40 \
    --oqs-methods auto ode krylov \
    --json runs/benchmarks/bench_vs_qutip_larger_startup_patch.json

| Case | Dimension | QuTiP median | OQS auto median | Best OQS median | OQS auto vs QuTiP | Max expectation delta |
| --- | --- | --- | --- | --- | --- | --- |
| Jaynes-Cummings 20 | 40 | 3.91 ms | 2.95 ms | 2.95 ms (auto) | 1.32x | 1.96e-08 |
| Jaynes-Cummings 40 | 80 | 13.93 ms | 10.38 ms | 8.52 ms (ode) | 1.34x | 4.71e-08 |

Monte Carlo Wave Functions: OpenQuantumSim vs QuTiP

Command:

PYTHON_JULIACALL_HANDLE_SIGNALS=yes JULIA_NUM_THREADS=4 \
python benchmarks/bench_mcsolve_vs_qutip.py \
    --n-traj 50 200 1000 \
    --time-points 31 \
    --t-final 2.0 \
    --max-step 0.02 \
    --repeats 3 \
    --json runs/benchmarks/bench_mcsolve_vs_qutip_m1_2026-05-22.json

Settings: spontaneous-emission qubit, gamma=0.35, one excited-state projector, QuTiP mcsolve with progress disabled, OpenQuantumSim mcsolve with n_jobs=-1 and four Julia threads.

| Trajectories | QuTiP median | OQS median | OQS backend wall time | Workers | OQS vs QuTiP | OQS backend vs QuTiP |
| --- | --- | --- | --- | --- | --- | --- |
| 50 | 8.85 ms | 1.79 ms | 0.46 ms | 4 | 4.96x | 19.06x |
| 200 | 33.58 ms | 3.22 ms | 1.71 ms | 4 | 10.44x | 19.67x |
| 1000 | 168.01 ms | 18.60 ms | 17.25 ms | 4 | 9.03x | 9.74x |

Interpretation: for this MCWF smoke benchmark, threaded backend-side aggregation gives OpenQuantumSim a clear trajectory-throughput advantage over QuTiP after backend warmup. The exact speedup is workload-specific and should be re-measured for larger Hilbert spaces and more expensive observables.
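For readers unfamiliar with the workload, the following is a minimal NumPy sketch of the Monte Carlo wave-function idea for this specific case (H = 0, one collapse operator sqrt(gamma) * sigma_minus, initial state excited). It is not the OpenQuantumSim or QuTiP implementation, and `mcwf_excited_population` is a hypothetical helper. It relies on the fact that, for this system, the squared norm of the no-jump state decays as exp(-gamma t), so each trajectory jumps at t = -ln(r) / gamma for a uniform random r, and the trajectory average approaches exp(-gamma t).

```python
import numpy as np


def mcwf_excited_population(gamma, times, n_traj, seed=0):
    """Trajectory-averaged excited-state population for pure spontaneous emission.

    Each trajectory stays excited until its single quantum jump, which occurs
    when the squared no-jump norm exp(-gamma * t) drops to a uniform random r.
    """
    rng = np.random.default_rng(seed)
    r = 1.0 - rng.random(n_traj)  # uniform in (0, 1], avoids log(0)
    jump_times = -np.log(r) / gamma
    # Population is 1 before the jump and 0 after it, per trajectory.
    still_excited = times[None, :] < jump_times[:, None]
    return still_excited.mean(axis=0)


gamma = 0.35
times = np.linspace(0.0, 2.0, 31)
exact = np.exp(-gamma * times)

for n_traj in (50, 200, 1000):
    estimate = mcwf_excited_population(gamma, times, n_traj)
    error = float(np.max(np.abs(estimate - exact)))
    print(f"n_traj={n_traj}: max deviation from exp(-gamma t) = {error:.3f}")
```

The sampling error of such an average shrinks roughly as one over the square root of the trajectory count, which is why trajectory throughput is the quantity worth benchmarking.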

Monte Carlo Wave Function Scaling

Command:

JULIA_NUM_THREADS=4 MPLCONFIGDIR=/private/tmp/oqs-mpl \
python benchmarks/bench_mcsolve.py \
    --n-traj 200 \
    --time-points 31 \
    --t-final 2.0 \
    --max-step 0.02 \
    --repeats 3 \
    --warmup-trajectories 10 \
    --n-jobs 1 -1 \
    --json runs/benchmarks/bench_mcsolve_m1_2026-05-14.json

| n_jobs | Workers | Threaded | Median elapsed | Backend wall time | Speedup vs serial | Max expectation delta |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 1 | False | 6.595 ms | 3.868 ms | 1.00x | 1.00e-02 |
| -1 | 4 | True | 4.092 ms | 1.542 ms | 1.61x | 1.00e-02 |

Interpretation: backend-side trajectory aggregation and threading work as intended: four threads give a 1.61x end-to-end speedup over the serial run. Wall time at this size is still heavily affected by Python-call overhead, so larger trajectory counts should give a clearer measure of Julia-side scaling.
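The share of time spent outside the backend can be estimated directly from the two timing columns. The sketch below assumes that "Median elapsed" minus "Backend wall time" is a fair proxy for Python-call overhead; the variable names are illustrative, and the inputs are the rounded values from the table.

```python
# Timings in milliseconds, copied from the scaling table above.
rows = {
    "serial (n_jobs=1)": {"elapsed_ms": 6.595, "backend_ms": 3.868},
    "threaded (n_jobs=-1)": {"elapsed_ms": 4.092, "backend_ms": 1.542},
}

# Assumption: time not accounted for by the backend is Python-side overhead.
overhead_ms = {
    name: row["elapsed_ms"] - row["backend_ms"] for name, row in rows.items()
}

serial = rows["serial (n_jobs=1)"]
threaded = rows["threaded (n_jobs=-1)"]
end_to_end_speedup = serial["elapsed_ms"] / threaded["elapsed_ms"]
backend_speedup = serial["backend_ms"] / threaded["backend_ms"]

for name, value in overhead_ms.items():
    print(f"{name}: about {value:.2f} ms outside the backend")
print(f"end-to-end speedup: {end_to_end_speedup:.2f}x")
print(f"backend-only speedup: {backend_speedup:.2f}x")
```

Under that assumption both runs carry roughly 2.5 to 2.7 ms of fixed cost per call, so the backend-only ratio is noticeably higher than the end-to-end one.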

Dicke Mutual-Information Batch Runner

Command:

JULIA_NUM_THREADS=1 MPLCONFIGDIR=/private/tmp/oqs-mpl \
python examples/dicke/bench_mi.py \
    --N 4 \
    --kappa 0.1 \
    --n-traj 12 \
    --time-points 21 \
    --t-final 0.2 \
    --max-step 0.02 \
    --batch-size 2 \
    --repeats 2 \
    --warmup-trajectories 1 \
    --n-jobs 1 2 \
    --target-n-traj 1000 \
    --json runs/benchmarks/bench_dicke_mi_m1_2026-05-14.json

| n_jobs | Workers | Median elapsed | Trajectories / s | Seconds / trajectory | Speedup vs serial |
| --- | --- | --- | --- | --- | --- |
| 1 | 1 | 0.0836 s | 143.46 | 0.0070 | 1.00x |
| 2 | 2 | 19.7627 s | 0.607 | 1.6469 | 0.004x |

Interpretation: this small Dicke MI benchmark exposes process-startup overhead. Each short-lived worker initializes its own Julia backend, so with only 12 trajectories the two-worker run takes 19.76 s against 0.084 s for the serial run. Larger batches amortize the startup cost better.
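A rough feel for the amortization argument can be had from a deliberately simple cost model. The sketch assumes that the parallel run pays one fixed startup cost and then splits the compute perfectly across workers; neither assumption is confirmed by the benchmark, and if startup is paid again for every batch the break-even point moves later or disappears. All names are illustrative.

```python
def throughput(n_traj: int, elapsed_s: float) -> float:
    """Trajectories completed per second."""
    return n_traj / elapsed_s


n_traj = 12
serial_s = 0.0836     # median elapsed, n_jobs=1
parallel_s = 19.7627  # median elapsed, n_jobs=2
workers = 2

serial_rate = throughput(n_traj, serial_s)
per_traj_s = serial_s / n_traj

# ASSUMPTION: parallel time = fixed startup + ideal compute / workers.
startup_s = parallel_s - n_traj * per_traj_s / workers

# Break-even n solves: n * per_traj = startup + n * per_traj / workers.
break_even_n = startup_s / (per_traj_s * (1.0 - 1.0 / workers))

print(f"serial throughput: {serial_rate:.1f} trajectories/s")
print(f"estimated fixed startup: {startup_s:.2f} s")
print(f"estimated break-even: about {break_even_n:.0f} trajectories")
```

Under this model the break-even lands at several thousand trajectories for this tiny system, which is consistent with the observation that process parallelism only pays off for much larger batches or more expensive trajectories.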

Reproducing Results

The raw JSON outputs are generated under runs/benchmarks/ and are ignored by Git. Re-run the commands above to regenerate the local benchmark artifacts.
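After re-running the commands, a quick way to see what was produced is to list the JSON files and their top-level keys. The sketch below makes no assumption about the JSON schema written by the benchmark scripts; `summarize_runs` is a hypothetical helper, and only the `runs/benchmarks/` location is taken from this page.

```python
import json
from pathlib import Path


def summarize_runs(directory="runs/benchmarks"):
    """Return {filename: sorted top-level keys} for each JSON file found."""
    summary = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open() as handle:
            data = json.load(handle)
        # Only dictionaries have keys; other top-level types yield an empty list.
        summary[path.name] = sorted(data) if isinstance(data, dict) else []
    return summary


for name, keys in summarize_runs().items():
    print(name, keys)
```

If the directory does not exist yet, the loop simply prints nothing.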