Benchmarking

This section summarizes the benchmark tooling described in benchmark/README.md and implemented by the scripts under benchmark/.

Setup

julia --project=benchmark -e 'using Pkg; Pkg.instantiate()'

Main Benchmark Script

Run from repository root:

julia --project=benchmark benchmark/run_benchmarks.jl --backend=cpu --samples=20 --output=benchmark/results/cpu.csv

Run CPU and CUDA together:

julia --project=benchmark -e 'using Pkg; Pkg.add("CUDA")'
julia --project=benchmark benchmark/run_benchmarks.jl --backend=both --samples=20 --output=benchmark/results/both.csv

If CUDA is missing or inactive, the CUDA cases are skipped automatically and the CPU cases still run.
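The skip behaviour can be pictured with a small sketch. It is not taken from run_benchmarks.jl: `select_backends` is a hypothetical helper, and the availability probe only checks whether a CUDA package is visible in the active project. A fuller check would load CUDA.jl and also require `CUDA.functional()` to return true.

```julia
# Hypothetical helper, not taken from run_benchmarks.jl: expand the requested
# backend and drop "cuda" when it is not usable.
function select_backends(requested::AbstractString, cuda_ok::Bool)
    wanted = requested == "both" ? ["cpu", "cuda"] : [String(requested)]
    return filter(b -> b != "cuda" || cuda_ok, wanted)
end

# Cheap probe: is a CUDA package visible in the active project at all?
# A fuller check would load CUDA.jl and also require CUDA.functional() to be true.
cuda_installed = Base.find_package("CUDA") !== nothing

println(select_backends("both", cuda_installed))
```

With this shape, `--backend=both` on a machine without CUDA reduces to the CPU cases only instead of failing.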

Useful Options

  • --backend=cpu|cuda|both
  • --samples=N
  • --evals=N
  • --seed=N
  • --quick
  • --output=PATH
  • --no-output
  • --no-gpu (alias for --backend=cpu)

Help:

julia --project=benchmark benchmark/run_benchmarks.jl --help

CUDA Batching vs Non-Batching

Dedicated comparison script:

julia --project=benchmark -e 'using Pkg; Pkg.add("CUDA")'
julia --project=benchmark benchmark/compare_batching_cuda.jl \
  --batch-sizes=1,4,8,16,32,64,128,256,512,1024 \
  --output=benchmark/results/cuda_batching_vs_no_batching.csv

This benchmark compares:

  • looping over many single-image solves (no_batch),
  • one batched solve (batched).
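A minimal sketch of the two strategies, assuming a toy stand-in for the solver and plain CPU arrays. `toy_solve`, `no_batch`, and `batched` are hypothetical names chosen to mirror the labels above; the real script runs the package's solver on CUDA arrays.

```julia
# Stand-in for a real solver (hypothetical): one smoothing step along dimension 1.
toy_solve(x::AbstractArray) = 0.5 .* x .+ 0.25 .* (circshift(x, 1) .+ circshift(x, -1))

batch = rand(64, 64, 32)   # 32 images of size 64x64, stacked along dimension 3

# no_batch: one call per image
no_batch(b) = [toy_solve(b[:, :, k]) for k in axes(b, 3)]
# batched: a single call over the whole stack
batched(b) = toy_solve(b)

no_batch(batch); batched(batch)   # warm up so compilation is not timed
t_loop  = @elapsed no_batch(batch)
t_batch = @elapsed batched(batch)
println("no_batch: ", t_loop, " s, batched: ", t_batch, " s")
```

Both paths compute the same result. On a GPU the batched path typically pays kernel-launch overhead once rather than once per image, which is the effect the comparison script is meant to measure across batch sizes.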

Benchmark Cases and Data

The main script benchmarks denoising on cases built from TestImages, using controlled synthetic noise settings and deterministic RNG seeds. The cases include:

  • cameraman
  • pirate
  • woman_blonde
  • mri-stack
  • resolution_test_1920
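Seeded noise generation can be sketched as follows. This is an illustration under assumptions, not the script's code: `add_noise`, the Gaussian noise model, and the default `sigma` are all hypothetical.

```julia
using Random

# Hypothetical helper: add Gaussian noise drawn from an explicitly seeded RNG,
# so the same seed always produces the same noisy input.
function add_noise(img::AbstractArray{<:Real}; sigma::Real=0.1, seed::Integer=1234)
    rng = MersenneTwister(seed)
    return img .+ sigma .* randn(rng, size(img)...)
end

img = fill(0.5, 8, 8)          # placeholder for a test image
a = add_noise(img; seed=42)
b = add_noise(img; seed=42)    # same seed: identical noise
c = add_noise(img; seed=43)    # different seed: different noise
println(a == b, " ", a == c)
```

Fixing the seed keeps the benchmark input identical between runs, so timing differences reflect the solver and the hardware rather than the data.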

Output

Results are written as CSV files in benchmark/results/, including timing statistics (median/mean/min), memory, allocations, and run metadata.
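The timing statistics can be reproduced from raw samples with the standard library. The sketch below is an assumption-laden illustration: `summarize_ms`, the toy workload, and the column order of the printed row are hypothetical and do not describe the actual CSV schema.

```julia
using Statistics

# Hypothetical helper: reduce per-sample timings (ms) to median, mean, and min.
summarize_ms(t::AbstractVector{<:Real}) = (median = median(t), mean = mean(t), min = minimum(t))

# Collect a few samples of a toy workload; real runs would time the solver instead.
workload() = sum(abs2, rand(256, 256))
workload()   # warm up so compilation is not timed
samples_ms = [1000 * @elapsed(workload()) for _ in 1:20]

stats = summarize_ms(samples_ms)
# One CSV row with a hypothetical column order: case,median_ms,mean_ms,min_ms
println(join(("toy_workload", stats.median, stats.mean, stats.min), ","))
```

The median is usually the most robust of the three, since occasional slow samples (for example, garbage collection pauses) pull the mean upward.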

Benchmark Results Snapshot

This is the benchmark result block previously shown in README.md. Times are in milliseconds and are hardware-dependent.

Command

julia --project=benchmark benchmark/run_benchmarks.jl --backend=both --samples=10 --output=benchmark/results/both.csv

Results

Backend Case               Image                 Dims          Median       Mean        Min     Memory  Allocs
cpu     solve_allocating   cameraman             512x512      692.518    700.534    688.055   12775280   45630
cpu     solve_state_reuse  cameraman             512x512      749.794    745.226    721.041    3337304   45603
cpu     solve_allocating   pirate                512x512      628.141    631.987    618.272   12775280   45630
cpu     solve_state_reuse  pirate                512x512      622.439    622.986    613.514    3337304   45603
cpu     solve_allocating   woman_blonde          512x512      745.798    749.035    740.033   12775280   45630
cpu     solve_state_reuse  woman_blonde          512x512      816.535    813.798    805.887    3337304   45603
cpu     solve_allocating   mri-stack             226x186x27  4702.706   4702.706   4677.952   58350624   74196
cpu     solve_state_reuse  mri-stack             226x186x27  4869.636   4869.636   4809.036    8410712   74163
cpu     solve_allocating   resolution_test_1920  1920x1920  11679.991  11679.991  11679.991  149745520   45630
cpu     solve_state_reuse  resolution_test_1920  1920x1920  11514.213  11514.213  11514.213   17034328   45603
cuda    solve_allocating   cameraman             512x512       27.272     27.476     26.933    1933344   54328
cuda    solve_state_reuse  cameraman             512x512       27.182     27.174     26.949    1992752   54492
cuda    solve_allocating   pirate                512x512       27.768     27.743     27.394    1934048   54342
cuda    solve_state_reuse  pirate                512x512       27.476     27.982     27.389    1931168   54243
cuda    solve_allocating   woman_blonde          512x512       27.156     27.373     26.851    1995312   54571
cuda    solve_state_reuse  woman_blonde          512x512       27.439     27.441     27.347    1992624   54484
cuda    solve_allocating   mri-stack             226x186x27   194.974    194.945    194.536    2760464   65412
cuda    solve_state_reuse  mri-stack             226x186x27   197.457    197.412    196.523    2756768   65291
cuda    solve_allocating   resolution_test_1920  1920x1920    384.540    384.467    383.904    1934560   54404
cuda    solve_state_reuse  resolution_test_1920  1920x1920    384.019    383.912    383.151    1931872   54317
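
As a worked reading of the table, the CPU-to-CUDA speedup can be computed from the median times. The sketch below copies the solve_allocating medians (in ms) from the results above; the variable names are illustrative only.

```julia
# Median times in ms for the solve_allocating case, copied from the results above.
images      = ["cameraman", "pirate", "woman_blonde", "mri-stack", "resolution_test_1920"]
cpu_median  = [692.518, 628.141, 745.798, 4702.706, 11679.991]
cuda_median = [27.272, 27.768, 27.156, 194.974, 384.540]

# Speedup = CPU median divided by CUDA median for the same image.
speedup = cpu_median ./ cuda_median
for (name, s) in zip(images, speedup)
    println(rpad(name, 22), round(s; digits=1), "x")
end
```

On this machine the CUDA medians come out between roughly 22 and 31 times lower than the CPU medians, with the largest gap on the 1920x1920 image. These ratios are specific to the hardware listed under Environment.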

Environment

CUDA toolchain:
- runtime 13.2, artifact installation
- driver 580.95.5 for 13.2
- compiler 13.2

CUDA libraries:
- CUBLAS: 13.1.0
- CURAND: 10.4.2
- CUFFT: 12.2.0
- CUSOLVER: 12.1.0
- CUSPARSE: 12.7.9
- CUPTI: 2026.1.0 (API 13.2.0)
- NVML: 13.0.0+580.95.5

Julia packages:
- CUDA: 5.11.0
- GPUArrays: 11.4.1
- GPUCompiler: 1.8.2
- KernelAbstractions: 0.9.40
- CUDA_Driver_jll: 13.2.0+0
- CUDA_Compiler_jll: 0.4.2+0
- CUDA_Runtime_jll: 0.21.0+0

Toolchain:
- Julia: 1.12.5
- LLVM: 18.1.7

1 device:
  0: NVIDIA GeForce RTX 3060 (sm_86, 11.626 GiB / 12.000 GiB available)