QOS: A Quantum Operating System

Emmanouil Giortamis (emmanouil.giortamis@tum.de), TU Munich, Germany
Francisco Romão (francisco.romao@tum.de), TU Munich, Germany
Nathaniel Tornow (nathaniel.tornow@tum.de), TU Munich and Leibniz Supercomputing Centre, Germany
Pramod Bhatotia (pramod.bhatotia@tum.de), TU Munich, Germany
arXiv:2406.19120v1 [quant-ph] 27 Jun 2024

Abstract

We introduce the Quantum Operating System (QOS), a unified system stack for managing quantum resources while mitigating their inherent limitations, namely their limited and noisy qubits, (temporal and spatial) heterogeneities, and load imbalance. QOS features the QOS compiler, a modular and composable compiler for analyzing and optimizing quantum applications to run on small and noisy quantum devices with high performance and configurable overheads. For scalable execution of the optimized applications, we propose the QOS runtime, an efficient quantum resource management system that multi-programs and schedules the applications across space and time while achieving high system utilization, low waiting times, and high-quality results.

We evaluate QOS on real quantum devices hosted by IBM, using 7000 real quantum runs of more than 70,000 benchmark instances. We show that the QOS compiler achieves 2.6–456.5× higher quality results, while the QOS runtime further improves the quality by 1.15–9.6× and reduces the waiting times by up to 5× while sacrificing only 1–3% of result quality (or fidelity).

1 Introduction

Quantum Cloud Computing. Quantum computing promises to solve problems that are computationally intractable for classical computers [2, 21]. Thanks to remarkable technological advances in materials science and engineering [31, 80], quantum hardware has become a reality in the form of quantum processing units (QPUs) that consist of quantum bits (qubits) [37]. Interestingly, QPUs are now readily available in a quantum-as-a-service fashion offered by all major cloud providers [3, 5, 28, 37].

While quantum hardware is now a reality, the associated quantum software systems are rudimentary. These QPUs face classical OS challenges that our systems community has tackled in the past, including scalability, performance, efficiency, faults (a.k.a. errors), scheduling, and utilization [10]. Unfortunately, no operating system exists to tackle these challenges holistically for modern quantum hardware.

Fundamental Challenges of QPUs. A natural tendency would be to treat these QPUs as yet another accelerator class (e.g., GPU, TPU, FPGA) and manage them as accelerators-as-a-service to offload compute-intensive tasks. We argue that this approach might be sub-optimal or even flawed!

The reason is that QPUs present fundamentally unique hardware-level challenges that the systems community has not considered and that cannot be directly mapped to classical accelerator-oriented computing (we empirically detail these hardware challenges in § 3). In particular, QPUs operate in the NISQ fashion (Noisy Intermediate-Scale Quantum [64]), leading to a non-deterministic computing platform, where even two QPUs with identical qubits exhibit completely different behaviors across space and time [57, 71].

More specifically, first, QPUs are inherently noisy and small in computational capacity [64], which limits the size of the problems they can solve. Second, the degree of noise differs across QPUs, even of identical architecture and model, making it difficult to decide which QPUs should execute a quantum program without compromising performance [72]. In addition, we cannot trivially multi-program multiple quantum programs on the same QPU to increase utilization since QPU qubits can interfere with each other in undesirable and unpredictable ways [52], severely degrading performance [47]. Finally, it is generally impossible to save or copy a quantum program during execution [56], which further limits scheduling opportunities for preemption or resource sharing in general.

State-of-the-Art of Quantum Software Systems. The current state of software can be roughly compared to IBM mainframe batch OSes from the 60s, where the QPUs are managed through rudimentary interfaces. Researchers have proposed specialized approaches to address some of the aforementioned OS and QPU challenges individually, for instance, performance [84], multi-programming [17], or scheduling [73]. Unfortunately, these proposed approaches are designed to solve an individual issue, which prevents them from being composed together or with other OS mechanisms to create a holistic software stack. To leverage quantum computing practically, we must address the key challenge of combining such mechanisms in a unified software stack for quantum computing.

Novelty. However, designing a unified system stack that supports general OS abstractions while addressing the QPU challenges is not trivial. The system should support cross-stack
Figure 1. Example of a typical quantum algorithm (§ 2.1). (a) Input graph for max-cut. (b) The quantum circuit encoding the formulation of max-cut for the graph. (c) The result of circuit execution is a probability distribution of bitstrings. (d) The result of (c) is interpreted as a max-cut between vertices {0,1,4} and {2,3,5}.

Figure 2. Technical Foundations (§ 2.2). (a) The quantum circuit of Figure 1. (b) The physical layout of an IBM Falcon QPU. (c) The transpiled circuit with the QPU’s noise sources.

software mechanisms, from the compiler level for quantum program optimization to the runtime level for QPU resource management. More specifically, we require a modular and extensible compiler infrastructure for increasing execution quality, multi-programming for increased utilization, and scheduling for load balancing, all in the presence of QPU noise and heterogeneities. This way, the system can achieve the cloud users’ goals, i.e., high-quality quantum computation and low waiting times, and the quantum cloud operator’s goals, i.e., QPU resource efficiency and scalability.

QOS: A Unified System Stack for Quantum Computing. We propose QOS, an end-to-end system for holistically tackling quantum computing challenges. QOS provides a unified architecture for supporting compiler and OS mechanisms with pluggable and configurable policies. In QOS, we implement such policies to achieve the aforementioned users’ and operator’s goals. To achieve this, QOS builds on a unified abstraction and comprises two main components:
• The Qernel Abstraction: We introduce the Qernel abstraction that acts as a common denominator for the QOS mechanisms to apply their policies (§ 5.1). A Qernel contains the Qernel intermediate representation (QIR) and static and dynamic properties, leveraged by the QOS components to apply their policies.
• QOS Compiler: We introduce the QOS compiler (§ 5, 6, 7), an extensible and modular compiler workflow that leverages the QIR and static properties to optimize quantum programs for increased execution quality.
• QOS Runtime: We present the QOS runtime (§ 8), a scalable system for QPU resource efficiency. The system offers automated QPU selection to abstract heterogeneity away, multi-programming to increase QPU utilization, and load-aware scheduling to achieve low waiting times while maintaining high execution quality.

We implement QOS in Python by building on the Qiskit framework [65]. We evaluate QOS on IBM’s 27-qubit QPUs [37], using a dataset of more than 7000 quantum runs and 70,000 state-of-the-art quantum benchmark instances used in popular quantum algorithms [42, 67, 89]. Our evaluation shows that the QOS compiler improves the quantum program properties by 51% on average, which leads to a 2.6–456.5× improvement in the quality of the results, depending on the problem size (§ 9.2). The QOS runtime increases the quality of the results by 1.15–9.6× for the same target utilization (§ 9.4) and reduces the waiting times by 5× while sacrificing at most 3% of the quality of the results (§ 9.5).

2 Background

2.1 Quantum Computing 101: An Example

Let us understand the basics of quantum computing using the classic max-cut problem. This simple combinatorial optimization problem is expressed in the quantum world as the Quantum Approximate Optimization Algorithm (QAOA) [21]. Figure 1 shows a high-level example of how QAOA solves a max-cut problem of the input graph of (a). To solve it, the problem must first be encoded as a quantum circuit (Figure 1 (b)), which consists of quantum bits (qubits) and quantum gates that exhibit quantum mechanical properties. Here, we use as many qubits as the number of nodes of the input graph, where each qubit $q_i$ corresponds to a graph vertex $i$. To change the state of the qubits, we apply quantum gates over time, from left to right. There are two types of gates: 1-qubit gates (e.g., NOT gate) and 2-qubit gates (e.g., XOR gate). Finally, at the end of the circuit, we measure each qubit to read its value (0 or 1), which gives bitstrings as output.

Unlike classical circuits, which operate deterministically, quantum circuits are inherently probabilistic. The reason is that qubits exhibit quantum mechanical properties, such as superposition. In the superposition state, the qubit is not 0 or 1, but it is both simultaneously (recall Schrödinger’s cat experiment [77]). Therefore, quantum gates also have probabilistic effects; we can’t know the result until the final measurements (i.e., open the box and check the cat’s state). To obtain a meaningful result, we execute the circuit in many trials (“shots”), with each trial providing a specific bitstring from the qubit measurement. The solution of the quantum calculation is, therefore, a probability distribution over all possible bitstrings of the measured qubits (Figure 1 (c)).

In our example, the result of the final execution of the quantum circuit gives a probability distribution that represents the solutions of the max-cut problem. A high probability maps to a solution, while a low probability (∼ 0) does not represent a solution. Figure 1 (d) shows a solution for our example. It corresponds
[Figure 3 plots omitted. (a) Fidelity (higher is better) versus number of qubits (4, 8, 12, 16, 20, 24) for the benchmarks QAOA-R3, GHZ, QAOA-P1, TL-1, VQE-1, W-STATE, BV, HS-1, and QSVM; the data labels read 0.96, 0.78, 0.41, 0.12, 0.04, 0.01. (b) Spatial Heterogeneity (higher is better): fidelity per IBM QPU; the data labels read 0.63, 0.70, 0.54, 0.54, 0.52, 0.72.]

Figure 3. (a) Challenge #1, Fidelity (§ 3.1). Impact of the number of qubits (circuit size) on fidelity. There is an average 98.9%
reduction in fidelity from 4 to 24 qubits. (b) Challenge #2, Spatial heterogeneity (§ 3.2) Fidelity of a 12-qubit GHZ circuit on
different IBM QPUs. There is a 38% fidelity difference from best to worst QPU.
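
To make the fidelity values reported in Figure 3 concrete, the following minimal sketch compares an ideal and a noisy measurement distribution. It assumes the Hellinger fidelity, a common choice for comparing bitstring distributions (and, to our understanding, the quantity Qiskit exposes as `hellinger_fidelity`); the paper only cites [22] for its metric, so the formula, the function names, and the shot counts below are illustrative assumptions, not the QOS implementation.

```python
import math


def normalize(counts):
    """Turn raw shot counts {bitstring: count} into probabilities."""
    total = sum(counts.values())
    return {bits: n / total for bits, n in counts.items()}


def hellinger_fidelity(ideal, noisy):
    """Hellinger fidelity of two distributions {bitstring: probability}.

    Assumed metric: the squared Bhattacharyya coefficient,
    (sum_x sqrt(p(x) * q(x)))^2, which lies in [0, 1].
    """
    outcomes = set(ideal) | set(noisy)
    bc = sum(math.sqrt(ideal.get(x, 0.0) * noisy.get(x, 0.0)) for x in outcomes)
    return bc ** 2


# Ideal 3-qubit GHZ distribution: only 000 and 111, each with probability 0.5.
ideal = {"000": 0.5, "111": 0.5}

# Hypothetical shot counts from a noisy run (1000 shots, made up for illustration).
noisy_counts = {"000": 430, "111": 410, "001": 60, "110": 100}

fidelity = hellinger_fidelity(ideal, normalize(noisy_counts))
print(f"fidelity = {fidelity:.2f}")
```

Probability mass that leaks to bitstrings outside the ideal support (here 001 and 110) lowers the score, which is the effect that grows with circuit size in Figure 3 (a).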

to the bitstring with the highest probability, 110010, which means that we have measured 1 for the qubits $q_0$, $q_1$, and $q_4$; therefore, a partition contains vertices {0,1,4}.

2.2 Technical Foundations

Execution Model. The technology and engineering required to build QPUs render them an expensive resource; thus, QPUs are mainly offered in the cloud in a quantum-as-a-service model [3, 28, 37]. To run quantum programs, users typically write circuit-level code (Figure 2 (a)), which they then transpile for the target QPU to make it executable, send to the cloud for execution, and finally get the results back. Specifically, the transpilation process performs three key steps: (1) converting the gates of the circuit to the native gate set of the QPU, (2) mapping the logical qubits of the circuit to the physical qubits of the QPU, and (3) routing the qubits to the physical qubits with restrictive connectivity by inserting SWAP gates. Figure 2 (b) shows the physical layout of an IBM Falcon QPU. Vertices are the physical qubits, and the edges capture their connectivity, i.e., between which qubits we can apply 2-qubit gates. Figure 2 (c) shows the physical circuit after transpilation with the QPU’s noise characteristics, which we detail next.

QPU Characteristics. Today’s QPUs are described as noisy intermediate-scale quantum (NISQ) devices [64] since they exhibit low qubit numbers (e.g., up to a few 100s [37]) and are susceptible to hardware and environmental noise. Specifically, when measuring a qubit, there is a chance to read the opposite value, and when applying gates, there is a chance the gate performs a wrong operation [27]. On top of that, when qubits are left idle (no gates applied) for more than a few hundred microseconds, the superposition decoheres to the |0⟩ state [39], similar to resetting a register to 0. Lastly, qubits destructively interfere with each other via crosstalk effects [12]. Figure 2 (c) shows qubits $Q_2$ and $Q_3$ that influence each other via crosstalk, noisy gates, qubit $Q_5$ that is left idle for long enough to decohere, and noisy measurements.

QPU Heterogeneity. Additionally, QPUs are vastly heterogeneous across space and time, unlike classical accelerators. Across space, QPUs vary in terms of technology, e.g., superconducting qubits [28, 37] or trapped ions [35], architectures of the same technology, e.g., Falcon or Osprey superconducting QPUs [37], and noise properties even for the same architecture [27], e.g., two identical QPUs exhibit different noise errors. Across time, the QPUs are calibrated regularly to maintain their performance [36, 90, 94], a process that generates calibration data. These data quantify the noise errors and change unpredictably after each calibration cycle.

Execution Quality. Lastly, to measure the quality of a circuit execution on NISQ QPUs, we use the fidelity metric [22], which measures the similarity between the noisy probability distribution and the ideal probability distribution that noiseless, ideal QPUs can obtain. Fidelity is a number in the [0,1] range, where a higher fidelity means a better quality result.

3 Motivation and Key Ideas

To motivate QOS, we present a set of unique challenges that distinguish QPUs from classical accelerators. We categorize our findings into four challenges that must be addressed to improve the practicality of quantum computing: fidelity, utilization, spatial and temporal heterogeneities, and load imbalance. The experimental methodology used is the same as for the final system evaluation and is explained in detail in § 9.1.

3.1 Fidelity

Executing quantum programs with high fidelity is challenging since QPUs are characterized by relatively small numbers of qubits and noise, which leads to computation errors (§ 2.2). As the number of qubits and gates in a quantum circuit increases, the noise errors accumulate and the overall fidelity decreases.

Results. Our results are highlighted in Figure 3 (a). The x axis depicts the circuit size as the number of qubits while the y axis shows the fidelity, where higher is better. The experiment is run on the IBM Kolkata 27-qubit QPU. For each increase in
[Figure 4 plots omitted. (a) Temporal Variance (higher is better): fidelity (0.0 to 1.0) versus calibration day (0 to 120). (b) Utilization (higher is better): utilization [%] per benchmark (QAOA-R3, BV, GHZ, HS-1, QAOA-P1, QSVM, TL-1, VQE-1, W-STATE). (c) QPU Load (equal is better): number of pending jobs (log scale, 10^0 to 10^4) per IBM QPU.]

Figure 4. (a) Challenge #2, Temporal variance (§ 3.2) Fidelity of a 6-qubit GHZ circuit on IBM Perth, across 120 calibration
days. There are 20 pairs of days with more than 5% difference in fidelity. (b) Challenge #3, Utilization (§ 3.3) Maximum utilization
achieved on a 27-qubit QPU for nine benchmarks while maintaining at least 0.75 fidelity. The average utilization is 26.3%, and
the max is 29.6%. (c) Challenge #4, QPU Load (§ 3.4) Number of pending jobs on different IBM QPUs. The groups separated
by vertical red lines indicate QPUs of the same size. There is up to 57× difference in number of jobs between QPUs of the same size.
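
The utilization bound of Figure 4 (b) follows from the fidelity constraint: only circuit sizes whose fidelity stays at or above the threshold are usable, and the largest such size determines how much of the QPU is occupied. The sketch below illustrates this calculation. It assumes that the data labels of Figure 3 (a) (0.96, 0.78, 0.41, 0.12, 0.04, 0.01) are average fidelities at 4 to 24 qubits; the helper name `max_utilization` is hypothetical, and the actual experiment reports per-benchmark values.

```python
QPU_QUBITS = 27            # size of the QPU used in the experiment
FIDELITY_THRESHOLD = 0.75  # minimum acceptable fidelity, as in Figure 4 (b)

# Circuit size (qubits) -> average fidelity.
# Assumption: these are the data labels of Figure 3 (a), read as averages.
avg_fidelity_by_size = {4: 0.96, 8: 0.78, 12: 0.41, 16: 0.12, 20: 0.04, 24: 0.01}


def max_utilization(fidelity_by_size, qpu_qubits, threshold):
    """Fraction of QPU qubits used by the largest circuit meeting the threshold."""
    eligible = [size for size, f in fidelity_by_size.items() if f >= threshold]
    if not eligible:
        return 0.0  # no circuit size is usable at this threshold
    return max(eligible) / qpu_qubits


utilization = max_utilization(avg_fidelity_by_size, QPU_QUBITS, FIDELITY_THRESHOLD)
print(f"max utilization at fidelity >= {FIDELITY_THRESHOLD}: {utilization:.1%}")
```

Under these assumed inputs, the largest usable circuit has 8 qubits, i.e., 8 of 27 qubits, which is in line with the reported maximum of 29.6%.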

qubits, the average fidelity decreases, by up to 98.9% from 4 to 24 qubits. Moreover, it is physically impossible to run circuits with a size larger than 27 qubits, since we cannot map them.

Implication. NISQ devices are limited due to size and noise and, therefore, cannot be practically used for large quantum circuits: either logically, because the circuit doesn’t fit in the device, or practically, because the execution results would be corrupted by noise errors, which translates to low fidelity.

Key Idea #1: Circuit Optimizations: To increase fidelity, we need a generic optimization infrastructure that transforms circuits into a physically and practically executable size.

3.2 Spatial and Temporal Heterogeneity

In the classical domain, two identical CPUs perform similarly for all applications and at each point in time. In contrast, QPUs exhibit differences in the layout and connectivity of qubits [30] and variations in noise errors even for QPUs of the same model, which leads to spatial performance variance. Moreover, QPUs are calibrated regularly (§ 2.2), and after each calibration, the noise properties change [94]. As a result, the execution fidelity can vary across different calibration cycles, leading to temporal performance variance.

Results. Figure 3 (b) shows a 12-qubit GHZ circuit’s fidelity on different IBM QPUs. Fidelity varies across the QPUs, with a maximum difference of 38% from best to worst. Note that all six QPUs are of the same model (Falcon r5.11).

Figure 4 (a) shows a 6-qubit GHZ circuit’s fidelity over 120 calibration days executed on the IBM Perth 7-qubit QPU, where each data point represents a single day’s fidelity. The largest single-day difference in fidelity is 96.5%, and there are 20 instances of a single-day fidelity drop of more than 5%. Note that there is no way of predicting a QPU’s future calibration data to expect such performance differences.

Implications. Due to structural differences across QPUs, quantum circuits perform differently across them. Additionally, there is a high degree of temporal performance variance across calibration cycles, as the fidelity might change significantly from day to day with no discernible pattern.

Key Idea #2: Performance Estimation: We estimate a circuit’s potential performance on the available QPUs to automatically select the best-performing candidate(s).

3.3 Utilization

The fidelity of circuits decreases as their size increases (§ 3.1), and as a result, it becomes more challenging to utilize a QPU effectively. In contrast to the classical domain, where a CPU can be fully utilized, to get high-fidelity results in the quantum domain, we necessarily under-utilize QPUs.

Results. Figure 4 (b) shows the maximum utilization of the IBM Kolkata 27-qubit QPU for nine benchmarks while maintaining at least 0.75 fidelity. No benchmark exceeds 30% utilization, while the average is 26.3%. Stricter fidelity thresholds would yield even lower utilization, and vice versa.

Implications. There is a tradeoff between QPU utilization and performance (fidelity). In general, the lower the utilization, the higher the fidelity, and vice versa. In contrast to the classical domain, the tension between these objectives is vastly larger.

Key Idea #3: Multi-programming: We spatially multiplex quantum circuits to increase system utilization (also known as multi-programming [17]), and when combined with circuit optimizations, it also increases fidelity.

3.4 QPU Load Imbalance

The quantum cloud faces QPU load imbalance. The root cause is spatiotemporal heterogeneity (§ 3.2), combined with the manual QPU selection offered by the current quantum cloud model [37]. This leads to users selecting the “best-performing” QPU based on empirical or arbitrary metrics [71].
Results. Figure 4 (c) shows the average number of pending jobs for different IBM QPUs across October 2023. The groups of QPUs (separated by the red dashed line) have a size of 7, 27, and 127 qubits, respectively. There is a 49×, 57×, and 5.7× maximum load difference across the groups, respectively.

Implications. Load imbalance leads to long waiting times for the users and thus, low quality of service. Additionally, there is no 1-1 mapping between the load and performance differences between QPUs. For instance, the 12-qubit GHZ circuit in Figure 3 (b) performs 1.1× better on IBM Hanoi than IBM Cairo, yet the former exhibits 57× higher load.

Key Idea #4: Load-aware Scheduling: We schedule (temporally multiplex) quantum circuits in a load-aware manner to balance the tradeoff between fidelity and waiting times.

4 Overview

Quantum computing is characterized by four main challenges that limit its practicality: (1) Execution fidelity is hindered by the small and noisy QPUs. (2) In contrast to classical accelerators, QPUs exhibit vast spatiotemporal heterogeneities, which renders their performance non-deterministic in both dimensions. (3) QPUs are heavily underutilized to give high-fidelity results. (4) QPUs face vast load imbalance, which leads to prolonged waiting times for the users.

Existing work is narrow-scoped and focuses on tackling one challenge at a time, but unfortunately, there are two main issues with this point-solution approach. Firstly, composing the individual mechanisms to address all challenges at once is impossible without a common and unified infrastructure. Secondly, without synergies between the individual mechanisms, it is hard to maximize the objectives of the users, i.e., high fidelity and low waiting times, and the objectives of the quantum cloud operator, i.e., resource efficiency.

To this end, we propose QOS, an end-to-end system that tackles the challenges of quantum computing holistically. QOS strives for three design goals: (1) A unified architecture that supports compiler and OS mechanisms with pluggable policies and tunable configuration for managing the tradeoffs of QC. (2) QOS should enable the execution of large quantum circuits with high fidelity and scale with increasing incoming workloads and additional QPUs. (3) QOS should be resource efficient by achieving high QPU utilization and balancing QPU load to minimize waiting times.

[Figure 5 diagram omitted.]
Figure 5. QOS overview (§ 4): QOS consists of two main components: the QOS compiler (top) and the QOS runtime (bottom). Below QOS lie the QPU devices and classical nodes.

4.1 The QOS Architecture

Figure 5 shows the overview of our system’s design. QOS comprises a layered architecture that consists of two main components: the QOS compiler (top) and the QOS runtime (bottom), which we detail next.

Qernel Abstraction. QOS implements a wide range of mechanisms with different abstraction requirements, from the compilation to the execution runtime level. To enable the composability of these mechanisms in a unified architecture, we propose the Qernel abstraction that acts as a common denominator for the QOS mechanisms to apply their policies.

QOS Compiler. We propose the QOS compiler (Figure 5, top), a modular, extensible, and composable compiler infrastructure. It comprises three stages: (1) The frontend of the compiler, the analyzer (§ 5.3), accepts quantum circuits and lifts them to the Qernel abstraction, generates the intermediate representation (IR), and performs IR analysis passes to generate the IR static properties required by the next stages. (2) The middle-end, the optimizer (§ 6), is an extensible and composable set of optimization passes that leverages the IR and static properties to improve the execution fidelity of the quantum circuits with manageable overheads. (3) The backend, the virtualizer (§ 7), compiles the optimized Qernels for the target QPUs, similar to classical target code generation.

QOS Runtime. We propose the QOS runtime (Figure 5, bottom), a system that abstracts away the underlying heterogeneity and balances the tradeoff between the conflicting objectives of the cloud operator (resource efficiency) and the users (high fidelity and low waiting times). The runtime comprises four components: (1) The estimator predicts the fidelity of executing the optimized Qernels to guide scheduling decisions. (2) The multi-programmer, given the estimations, bundles low-utilization Qernels to increase QPU utilization. (3) The scheduler multiplexes and runs the Qernels across space and time with the objective of maximizing fidelity and minimizing waiting times. Finally, (4) the knitter post-processes the Qernel execution results to return the final result to the user.
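
The tension between the estimator’s fidelity predictions and QPU load, which the scheduler has to balance, can be illustrated with a small selection routine. This is a minimal sketch under stated assumptions and not the QOS scheduling policy (which is the subject of § 8): the `QPUState` type, the linear scoring function, the `fidelity_weight` knob, and all numbers are hypothetical, loosely echoing the observation that a slightly better QPU can carry a far longer queue.

```python
from dataclasses import dataclass


@dataclass
class QPUState:
    name: str
    estimated_fidelity: float  # hypothetical estimator output, in [0, 1]
    pending_jobs: int          # queue length, used as a proxy for waiting time


def select_qpu(qpus, fidelity_weight):
    """Pick the QPU with the best weighted fidelity/load score.

    fidelity_weight = 1.0 ignores load; 0.0 ignores fidelity.
    The linear score is an illustrative assumption.
    """
    max_queue = max(q.pending_jobs for q in qpus) or 1  # avoid division by zero

    def score(q):
        normalized_load = q.pending_jobs / max_queue
        return (fidelity_weight * q.estimated_fidelity
                - (1.0 - fidelity_weight) * normalized_load)

    return max(qpus, key=score)


# Hypothetical snapshot: qpu_a is about 1.1x better but has a 57x longer queue.
qpus = [
    QPUState("qpu_a", estimated_fidelity=0.70, pending_jobs=570),
    QPUState("qpu_b", estimated_fidelity=0.63, pending_jobs=10),
]

fidelity_only = select_qpu(qpus, fidelity_weight=1.0)
balanced = select_qpu(qpus, fidelity_weight=0.5)
print(fidelity_only.name, balanced.name)
```

With the weight set to 1.0 the routine reproduces today’s manual “pick the best QPU” behavior, whereas a balanced weight sends the job to the lightly loaded device at a small fidelity cost.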
[Figure 6 diagram omitted. Panels: (a) Quantum circuit, (b) QIR Pass, (c) Refined QIR Pass, (d) Analysis Passes. Panel (d) lists the Qernel properties. Static: size: 4, depth: 4, gates: 5. Dynamic: status: void, results: void, estims: void.]

Figure 6. Compiler Frontend: Analyzer (§ 5). (a) An example quantum circuit of 4 qubits and 5 gates. (b) The Qernel intermediate
representation (QIR) (§ 5.1). (c) The refined QIR (§5.1). (d) The Qernel’s static and dynamic properties (§5.2). During compilation,
the dynamic properties are void, but they are initialized and used during the runtime.
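
The structures summarized in Figure 6 can be sketched in a few lines of plain Python. The gate list below is a hypothetical reconstruction chosen to be consistent with the description in § 5.1 (four qubits, five 2-qubit gates, layers (g0, g1), g2, (g3, g4), and g2 as the hotspot); the function names and the degree definition (incoming plus outgoing dependency edges, measurements excluded) are assumptions for illustration, not the QOS code.

```python
from collections import defaultdict

# Hypothetical circuit: (gate name, qubits it acts on), in program order.
GATES = [("g0", (0, 1)), ("g1", (2, 3)), ("g2", (1, 2)),
         ("g3", (0, 1)), ("g4", (2, 3))]


def build_qir(gates):
    """Directed edges (src, dst, qubit): dst is the next gate after src on qubit."""
    last_gate_on = {}
    edges = []
    for name, qubits in gates:
        for q in qubits:
            if q in last_gate_on:
                edges.append((last_gate_on[q], name, q))
            last_gate_on[q] = name
    return edges


def degrees(gates, edges):
    """Number of dependency edges touching each gate (measurements excluded)."""
    deg = {name: 0 for name, _ in gates}
    for src, dst, _ in edges:
        deg[src] += 1
        deg[dst] += 1
    return deg


def layers(gates, edges):
    """Earliest layer per gate: one more than its latest predecessor's layer."""
    preds = defaultdict(list)
    for src, dst, _ in edges:
        preds[dst].append(src)
    level = {}
    for name, _ in gates:  # program order, so predecessors are already placed
        level[name] = 1 + max((level[p] for p in preds[name]), default=0)
    return level


def refined_qir(gates):
    """Undirected qubit graph: edge weight = number of gates on that qubit pair."""
    weights = defaultdict(int)
    for _, qubits in gates:
        if len(qubits) == 2:
            weights[tuple(sorted(qubits))] += 1
    return dict(weights)


edges = build_qir(GATES)
deg = degrees(GATES, edges)
hotspot = max(deg, key=deg.get)
print("hotspot gate:", hotspot, "degree:", deg[hotspot])
print("layers:", layers(GATES, edges))
print("refined QIR weights:", refined_qir(GATES))
```

In the refined graph of this assumed circuit, the qubit pair (1, 2) has weight 1, so removing that single gate would split the circuit into two 2-qubit halves.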

4.2 Execution Workflow

First, users submit a circuit along with their optimization target and budget (step 1). The former represents the desired post-compilation circuit size, and the latter quantifies the additional overheads the user is willing to pay. The compiler’s frontend lifts the circuit to the Qernel abstraction and generates the IR and its static properties (step 2). The middle-end optimizes the Qernels through a modular set of passes (step 3), then the backend generates the target QPU-optimized Qernels and submits them to the QOS runtime (step 4). The estimator predicts the fidelity of running the Qernel(s) on the QPUs to guide scheduling (step 5). The multi-programmer bundles Qernels with low utilization and sends them to the scheduler (step 6). The scheduler assigns and runs the bundled Qernels, optimizing for maximal fidelity and minimal waiting times (step 7). After the execution, the bundled results are retrieved by the multi-programmer (step 8) to be unbundled into separate results and are sent to the knitter (step 9). Finally, the knitter post-processes the separated results and returns them to the user (step 10).

5 Compiler Frontend: Qernel & Analyzer

5.1 The Qernel Abstraction

The Qernel is the unified abstraction acting as a common denominator for the QOS components. Specifically, a Qernel contains (1) the graph-based IR used by the QOS compiler and (2) the Qernel properties, which comprise static IR properties and dynamic properties used by the QOS runtime.

Qernel Intermediate Representation (QIR). Existing optimization techniques operate at the gate level, analogous to the instruction level in the classical domain. Therefore, we propose a graph-based Qernel Intermediate Representation (QIR) that captures the control flow of a quantum program, similar to the control flow graph of classical programs. By traversing the QIR, the compiler can identify important optimization opportunities, such as pairs of gates that cancel each other (like dead code elimination), gate dependencies (useful for gate scheduling, similar to instruction scheduling), or opportunities to remove hotspot gates (gates that contribute to noise errors in the computation). An example QIR is shown in Figure 6 (b), where the quantum circuit consisting of four qubits and five gates is lifted to the QIR.

Formally, a QIR is a directed acyclic graph (DAG) $G = (V, E)$, where $V$ is the set of gates and every edge $e_i \in E$ is the qubit the gate acts on. The edges’ directions reflect dependencies $D = \{(V_i, V_j) \in V \times V\}$ between gates, i.e., $V_i$ must be scheduled before $V_j$. To identify hotspot nodes, we compute the degree of a node $\deg(V_i)$, which reflects the number of control flow paths the gate $V_i$ is part of. In the example of Figure 6, the QIR reveals four layers of gates: $l_1: (g_0, g_1)$, $l_2: g_2$, $l_3: (g_3, g_4)$, and $l_4: M$, which means that they have to be scheduled in this order, and the pairs in the same layers are susceptible to crosstalk noise (§ 2). Finally, the gate $g_2$ is a hotspot node since its degree is 4, the highest in $V$ (the measurements $M$ are terminal nodes and do not count).

Refined QIR. Various optimizations require a simplified representation that captures only the connectivity between qubits. For instance, the connectivity structure might reveal hotspot qubits [4] that can be removed for fidelity improvement or opportunities for circuit cutting (§ 6). Figure 6 (c) shows the refined QIR, where we can see that $q_1$ and $q_2$ are only connected by a single gate, and removing it would split the circuit into two smaller circuits. Formally, a refined QIR is an acyclic, undirected, and weighted graph $G = (V, E)$, where $V$ is the set of qubits and every edge $e_i \in E$ between two qubits $V_j, V_k$ has a weight $w_i \in \mathbb{N}$ that represents the number of gates that act on $V_j$ and $V_k$.

5.2 Qernel Properties

To apply its diverse set of mechanisms, QOS requires data structures that keep up-to-date information about the quantum programs. Such information includes (1) the static properties, which are useful for the compiler, and (2) dynamic properties, which are useful for the runtime.

Static Properties. Apart from the IR, optimization passes leverage circuit properties to be more efficient and effective. The properties include the circuit’s size (number of qubits), depth, non-local gates and their types, the number of measurements, and others (Figure 6 (d)). Additionally, we include the feature vectors defined in [89] since they are potentially useful for heuristic-based optimizations or regression-based prediction models [68].

Dynamic Properties. The Qernel also contains dynamic properties required by the QOS runtime. These include the
Qernel’s execution status (done, failed, running, scheduled), the estimator’s output, i.e., fidelity estimations (§ 8.1), and the final post-processed results (Figure 6 (d)).

5.3 Frontend: Analyzer
To discover and leverage optimization opportunities, we first need to perform circuit analysis, similar to classical program analysis. The analyzer transforms a quantum circuit into a Qernel and comprises an extensible set of passes that generate the QIR and static properties of the Qernel, which are then used by the optimizer and subsequently by the runtime.
QIR Transformation Pass. The first step in program analysis and optimization is to generate the QIR, implemented by the QIR transformation pass. Figure 6 (a)-(b) shows the generation of the QIR for an example quantum circuit. To generate the QIR, the pass iterates over each logical qubit of the circuit and each gate acting on that qubit. For each such gate, it creates a QIR vertex $g_i$ and sets the qubits $q_j, q_k$ the gate acts on as the edges $e_j, e_k$ of $g_i$. By convention, the direction of the edge follows the direction from the control to the target qubits. When reaching measurement operations, it simply adds the terminal nodes $M$. This process is repeated until all circuit qubits and gates are covered.
QIR Refinement Pass. To generate the refined QIR (Figure 6 (c)), we implement a transformation pass that traverses the QIR in a depth-first manner (breadth-first is equivalent). For each QIR node visited, i.e., a vertex $g_i$ with a pair of edges $q_j, q_k$, it checks the current refined QIR for existing nodes with the same name. If such nodes exist, it increments the weight of the edges between the nodes by one. Otherwise, it adds $q_j$ or $q_k$ or both as new nodes in the refined QIR and connects them with a weight of one.
Analysis Passes. We implement passes that analyze the QIR to identify optimization opportunities, such as gate dependencies (DependencyGraphPass), hotspot nodes (HotspotNodePass), and graph isomorphism (IsIsomorphicPass). We also implement properties passes that traverse the QIR to generate the Qernel static properties (§ 5.2) and comprise the BasicAnalysisPass and SupermarqFeaturesPass. Specifically, the former generates the key circuit properties while the latter computes the six feature vectors defined in [89], as explained in § 5.2. We show how the optimizer uses the information obtained from these passes in § 6.

6 Compiler Middle-end: Optimizer
QPUs comprise only up to a few hundred qubits, which sets the physical limit for circuit size, and are noisy, which sets a practical limit to high-fidelity execution (§ 3.1). To increase the scalability of circuits that run with high fidelity, we need a modular, extensible, and composable optimizer: modular to support adding/removing optimization passes or changing their relative order, extensible to add new passes, and composable to chain the optimization improvements of the individual passes. Such optimization passes are the circuit compaction techniques that, as the name suggests, reduce the circuit size, i.e., the number of qubits, rendering it executable on small QPUs, and at the same time simplify the circuit structure, i.e., remove noisy gates. Notably, we can compose more than one of these techniques to achieve even better results. The compiler’s middle-end, the optimizer, is a composable pipeline of transformation passes that compact the QIR to increase the scalability of quantum circuits running with high execution fidelity.
Challenges. However, it is not trivial to implement such a pipeline. Currently, there is a plethora of individual compaction techniques that require their own sub-systems to operate, with no common infrastructure to compose them. Additionally, as we will show later, some techniques spawn an exponential number of sub-circuits (Table 1) and, after execution, require post-processing using classical hardware. We pose two questions: (1) How are the (exponentially) spawned circuits from different techniques handled? (2) How can we manage the tradeoff between fidelity improvement and exponential overheads from different techniques?
Our Approach. To this end, we design our optimizer with the goals of providing (1) a unified infrastructure for pluggable compaction mechanisms and (2) tunable knobs for configuring the tradeoff between overheads and performance improvement. For (1), we build and compose vastly different compaction techniques on the Qernel abstraction, specifically on the QIR and its refined form. For (2), we provide users with two knobs: the optimization budget (equivalent to optimization level) $b \in \mathbb{N}$ and the size to reach $s \in \mathbb{N}$, which denotes the desired post-optimization QIR size (number of qubits). Since all overheads are exponential and the exponent’s base does not make a practical difference, the single budget knob, $b$, suffices.
QIR Compaction Techniques. There are two main QIR compaction techniques: circuit divide-and-conquer and qubit reuse. In the former category, the (large) QIR is cut into smaller fragments that are executed on small QPUs, and the execution results are merged back to a single value. Circuit cutting and knitting [49, 63] belongs to this category. In our optimizer, we implement a pass that automatically cuts the QIR based on qubits (WireCuttingPass) and a pass that cuts the QIR based on gates (GateCuttingPass). To restrict the exponential overheads that scale with the number of cuts, we use the budget $b$ to cut up to $b$ times. At each cut location, the pass places a virtual gate, which must be later replaced with other gates that simulate the effects of the original (pre-cut) gate (§ 7.1).
We implement another circuit divide-and-conquer technique, also with exponential overheads, namely the QubitFreezingPass, which is limited to QAOA applications only [4]. At a high level, it removes qubits with significantly more noisy gates than other qubits, i.e., hotspot qubits, along with the gates. This reduces the number of physical qubits required by an underlying (small) QPU and greatly reduces the number of noisy gates. We use the same budget $b = 3$ to remove the nodes with the highest degrees to restrict the exponential overheads.
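The refinement pass (§ 5.3) and the budget-limited removal of the highest-degree nodes can be illustrated with a minimal sketch. It is not the QOS implementation: it assumes a circuit given as a plain list of two-qubit gates, and `refine_qir`, `weighted_degree`, and `pick_hotspots` are hypothetical helper names. Each repeated qubit pair increments an edge weight, and up to $b$ qubits with the highest weighted degree are selected as freezing candidates.

```python
from collections import Counter

def refine_qir(two_qubit_gates):
    """Collapse gates into a weighted qubit graph: edge weight = gates on that pair."""
    weights = Counter()
    for q_j, q_k in two_qubit_gates:
        weights[frozenset((q_j, q_k))] += 1
    return weights

def weighted_degree(weights):
    """Number of two-qubit gates touching each qubit."""
    degree = Counter()
    for pair, weight in weights.items():
        for qubit in pair:
            degree[qubit] += weight
    return degree

def pick_hotspots(weights, budget):
    """Return up to `budget` qubits with the highest degree (ties broken by name)."""
    degree = weighted_degree(weights)
    ranked = sorted(degree, key=lambda q: (-degree[q], q))
    return ranked[:budget]

# Hypothetical 4-qubit circuit in which q3 participates in every gate.
gates = [("q0", "q3"), ("q1", "q3"), ("q2", "q3"), ("q0", "q3")]
refined = refine_qir(gates)
hotspots = pick_hotspots(refined, budget=1)
```

With a budget of one, only the single most connected qubit is returned; a larger budget returns further candidates in decreasing order of degree.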
[Figure 7 diagram. Panels: (a) HotSpotNodePass, (b) QubitFreezingPass, (c) GateCuttingPass, (d) QubitReusePass. Panel annotations: 7 qubits & 12 gates; deg(q3) = 6; 6 qubits & 6 gates; 3 qubits & 4 gates; 2 qubits & 4 gates. Transition labels: s = 2, b = 3; s = 2, b = 2; s = 2, b = 0.]

Figure 7. Optimization workflow (§ 6). The initial refined QIR has 7 qubits and 12 gates. (a) The HotSpotNodePass identifies $q_3$ as a hotspot node with a degree of 6. (b) We optimize the refined QIR with $s = 2$, $b = 3$ by applying the QubitFreezingPass, which reduces the qubit and gate counts to 6. (c) We apply the GateCuttingPass to further remove 2 gates. This gives two fragments of 3 qubits each. (d) We have depleted the budget, so we use the QubitReusePass to achieve the goal of $s = 2$.
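The budget accounting in Figure 7 can be replayed with a few lines of arithmetic. This is a simplified sketch under two assumptions: passes that incur overhead are applied only while enough budget remains, and qubit reuse always reaches the target size once the budget is depleted. The function `replay_workflow` and the step tuples are illustrative and not part of QOS.

```python
def replay_workflow(size, target_size, budget, steps):
    """Replay (pass_name, budget_spent, size_after) steps, then fall back to qubit reuse."""
    log = []
    for name, spent, size_after in steps:
        if size <= target_size or budget < spent:
            break  # goal reached, or the pass would exceed the remaining budget
        budget -= spent
        size = size_after
        log.append((name, size, budget))
    if size > target_size:
        # Assumption: qubit reuse (no budget cost) trades depth for size and hits the target.
        size = target_size
        log.append(("QubitReusePass", size, budget))
    return log

# Numbers taken from Figure 7: 7 qubits, s = 2, b = 3.
steps = [
    ("QubitFreezingPass", 1, 6),  # freeze one hotspot qubit -> 6 qubits
    ("GateCuttingPass", 2, 3),    # cut two gates -> fragments of 3 qubits
]
log = replay_workflow(size=7, target_size=2, budget=3, steps=steps)
for name, size, remaining in log:
    print(f"{name}: size={size}, remaining budget={remaining}")
```

Each log entry records the pass applied, the resulting size of the largest fragment, and the remaining budget, mirroring the transition labels in the figure.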
Lastly, in the qubit reuse category, we implement the QubitReusePass, which “compacts” multiple logical qubits into one to reduce the QIR’s size [19, 34, 75]. This process, however, increases the QIR’s depth; therefore, the tradeoff, in this case, is between QIR size and depth. To restrict the depth increase, we use qubit reuse as a last resort to achieve the user’s size requirement ($s$) or to render the Qernel executable by at least one QPU in the system.
Optimization Workflow. Figure 7 shows the default optimization workflow for the refined QIR of a QAOA circuit (§ 2.1) with 7 qubits and 12 gates. The optimizer aims to achieve a maximum QIR size $s = 2$ with an allowed budget $b = 3$. To achieve this, it takes the following steps:
Step 1: The optimizer calls the HotspotNodePass on the refined QIR to find a hotspot node. The pass identifies $q_3$ as a hotspot with a degree of 6 gates (Figure 7, (a)).
Step 2: The optimizer applies the QubitFreezingPass to remove $q_3$ and its gates. The new refined QIR size is 6 qubits with 6 gates. Then, it updates the budget to $b = b - m$, where $m = 1$ is the number of qubits frozen (Figure 7, (b)).
Step 3: The optimizer applies either gate or wire cutting. To do so, it first computes the expected costs $c_{gate}$ and $c_{wire}$, respectively, to achieve a circuit of size $s = 2$, and selects the one with the lowest cost $c_{min} = \min(c_{gate}, c_{wire})$. In this case, the cost for gate cutting is lower, so it applies the GateCuttingPass on 2 gates and updates the budget to $b = 0$. The new refined QIR size is two fragments of 3 qubits and 4 gates each (Figure 7, (c)).
Step 4: Since $b = 0$ but $s_{QIR} > s$, the optimizer applies the QubitReusePass to achieve $s = 2$. The pass identifies qubits $q_0, q_1$ as reusable and applies measurement and reset to them. The final refined QIR now has two fragments of $s = 2$ and 4 gates (Figure 7, (d)).
The optimizer’s final output is a Qernel with a 42.8% smaller size and 66% fewer noisy gates. Note that none of the above passes alone would achieve this result.

7 Compiler Backend: Virtualizer
The backend stage of the QOS compiler, the virtualizer, generates the final executable Qernels for the underlying runtime, similar to classical compilers that generate the target code. The virtualizer consists of two stages: (1) the instantiation, which replaces the virtual gates from the cutting optimization passes with the gates that simulate the original ones, and (2) target QPUs transpilation, which translates the high-level gates to the physical QPU gates and performs mapping and routing, as explained in § 2.2.

Optimization Pass    # ISQs Generated         Post-processing
Wire Cutting         $O(4^k)$ to $O(8^k)$     $O(4^k)$ to $O(8^k)$
Gate Cutting         $O(6^k)$                 $O(6^k)$
Qubit Freezing       $O(2^k)$                 $O(1)$

Table 1. QOS Virtualizer (§ 7.1). Number of instantiated sub-Qernels (ISQs) generated and post-processing complexity for each optimization pass, as a function of the number of cuts $k$.

7.1 Instantiation
The circuit cutting and knitting passes we describe in § 6 cut a large Qernel into sub-Qernels by analyzing the QIR, identifying optimal cut locations, and then placing virtual gates there. However, to run the sub-Qernels, we must replace the virtual gates with a combination of 1-qubit gates that achieves the same computation as the original Qernel. The mapping between virtual and 1-qubit gates depends on the chosen cutting strategy, i.e., gate or wire cutting (§ 6). We refer to this process as instantiation.
The instantiation stage takes as input the optimized sub-Qernels with virtual gates and outputs instantiated sub-Qernels (ISQs) with the 1-qubit gates required to execute them. Similar to the general cutting approach, we implement a generic instantiation mechanism for supporting pluggable mappings from virtual to 1-qubit gates. The mappings depend on the cutting technique (i.e., gate and wire cutting require different mappings) but can also differ for the same technique. For instance, virtual gates from gate cutting can be mapped to different sets of 1-qubit gates that might be better suited to specific QPU technologies.
By replacing a single virtual gate with multiple 1-qubit gates, the mapping function generates multiple ISQs that differ only by the 1-qubit gate. Then, replacing the next virtual gate in each ISQ generates even more copies, which leads to
[Figure 8 diagram. (a) QOS Compiler Backend: Virtualizer, with stages Compiler Middle-end, (1) Instantiation, and (2) Target QPUs Transpilation. Labels: original Qernel, optimized Qernel (2 gate cuts), instantiated sub-Qernels, transpile to target QPUs (QPU Architectures Arch0, ..., Archn; All QPUs). (b) QOS Runtime: Knitter, with stages Estimate, Bundle, Schedule, (1) Map Phase, and (2) Reduce Phase. Labels: schedule & run, results, assign to classical nodes, final result.]

Figure 8. (a) Compiler Backend: Virtualizer (§ 7) and (b) QOS Runtime: Knitter (§ 8.4). (a) The Virtualizer consists of two stages: (1) Instantiation, which transforms the optimized Qernel to instantiated sub-Qernels (ISQs). This process generates $O(6^k)$ ISQs for $k$ cuts (here $k = 2$). (2) Target QPUs transpilation, where the ISQs are transpiled to every QPU architecture or every QPU in general. (b) The knitter consists of two stages as well: (1) the map phase and (2) the reduce phase.

an exponential number of ISQs. Figure 8 (a) shows a Qernel optimized using two gate cuts. The red boxes are the two virtual gates that must be replaced with 1-qubit gates. In this example, the mapping function will replace the first virtual gate with six 1-qubit gates, creating six ISQs. For the next and final virtual gate, each of the six ISQs will produce six more ISQs, totaling 36 ISQs. Generally, in QOS, the exact overheads are $O(2^k)$ to $O(8^k)$ for $k$ cuts for our optimization passes (Table 1).

7.2 Target QPUs Transpilation
Following instantiation, the ISQs must be transpiled (§ 2.2) to the target QPUs to be sent to the runtime for scheduling and execution. Since the number of ISQs might be large, depending on the optimization budget $b$ used at the compiler middle-end (§ 6), we offer two transpilation modes that differ in granularity and overheads: (1) the coarse-grain per QPU architecture and (2) the fine-grain per QPU. We show evaluation results for their transpilation overheads in § 9.2.
Per QPU-Architecture. In the first mode, we transpile each ISQ for every type of QPU architecture available in the system. This coarse-grain approach bounds the transpilation overheads because typical quantum cloud providers have a limited number of architectures, e.g., up to five [37]. Since this mode does not scale with the number of QPUs, our experimentation shows that it is suitable for values of budget $b \geq 5$, which generate $10^4$ to $10^5$ ISQs.
Per-QPU. In the second mode, we transpile each ISQ to each available QPU in the system. This will enable the runtime components to make fine-grained decisions about the fidelity of running the ISQ on any QPU since they will have the exact noise information of this ISQ-QPU pair. The overheads are still bounded since QPUs are constant in quantity in commercial clouds, e.g., up to 30 [37], in contrast to classical clouds that scale to thousands of classical nodes. Our experimentation showed that this transpilation mode is viable for $b < 5$.

8 QOS Runtime
The QOS runtime (Figure 5, bottom) schedules and executes Qernels across space and time in a scalable manner to achieve the user’s goals, i.e., higher fidelity and lower waiting times, and the cloud operator’s goals, i.e., resource efficiency. It comprises four components, which we detail next. For simplicity, we use the general term Qernel throughout this section.

8.1 Estimator
The estimator is responsible for predicting the fidelity of a given Qernel on the underlying QPUs without executing the Qernel. This prediction will be the leading decision factor for the scheduler when assigning the Qernel to a QPU. To achieve this, it computes a score for each Qernel-QPU assignment that captures the potential fidelity of that assignment and then uses the scores to rank the assignments. The estimator supports configurable scoring policies that consider (1) the Qernels’ properties generated from the compiler and (2) the QPUs’ calibration data, which are available to quantum cloud providers since they perform the calibration cycles.
For (1), important properties include the number and types of gates, depth, and the number of measurements (§ 5.2). For (2), recall that QPUs are characterized by calibration data that describe the exact error rates of the QPU for that calibration cycle (§ 2.2), specifically, the individual qubit readout errors, the individual gate errors, and the $T2$ coherence times. In this work, we implement two scoring policies: a numerical approach for fine-grained control over the estimations and a regression model approach for abstracting away the complexity of estimation.
Numerical Cost Policy. This policy estimates execution fidelity by leveraging the target-QPU transpilation output of the compiler backend (§ 7). Target transpilation enables fine-grained fidelity estimation by producing the mapping between logical and physical qubits and the gate (instruction) schedule. The mapping captures the expected readout and gate errors, while the gate schedule captures the order and exact timing that the gates will be applied on the qubits, which reveals the hardware decoherence and crosstalk errors, as explained in § 2.2.
Formally, for each qubit $q_i$ the readout error is $e_r(i)$, for each gate $g_j$ the error is $e_g(j)$, and the decoherence error is $e_d(t) = 1 - e^{-t/T2_i}$, where $t$ is the idle time of the qubit $q_i$ (no
gates act on it [15]) and $T2_i$ is the decoherence time of $q_i$. The crosstalk error between gates $g_k$ and $g_l$ is $e_{ct}(k,l)$. Putting it all together, the final fidelity score is computed as follows:
$$fid = 1 - \prod_{i=0}^{N} e_r(i)\, e_d(i) \prod_{j=0}^{M} e_g(j) \prod_{j=0,k=0}^{M \times M} e_{ct}(j,k),$$
where $N$ is the circuit’s number of qubits and $M$ is the number of gates. Since all hardware error information is known a priori, and quantum errors accumulate multiplicatively, this policy produces high-accuracy estimations, as we show in § 9.3.
Regression Model Policy. As discussed in § 2.2, QPU noise errors are accurately measured at each calibration cycle, and their impact on fidelity during quantum computation can be described mathematically. Therefore, we can train a regression model to predict the fidelity of a transpiled Qernel on a possible QPU using the QPU’s calibration data and the Qernel’s static properties as features [68, 73]. Specifically, we use the aforementioned errors we defined in the numerical cost policy as QPU features and the static properties (§ 5.2) as Qernel features. Even simple regression models such as linear regression achieve high prediction accuracy, up to 99%. This policy is simple to use without detailed knowledge of the relationship between errors. However, in QOS, we use the numerical cost policy by default for estimation to have a clear understanding and full control of the process.

8.2 Multi-programmer
The size of quantum programs that run with high fidelity is small, leading to QPU underutilization (§ 3.3). To increase QPU utilization, QOS multi-programs two or more Qernels, potentially from different users, to run on the same QPU. We refer to this multi-programming as bundling the Qernels together. However, trivially bundling Qernels together will deteriorate fidelity because qubits interfere with each other via crosstalk errors (§ 2.2). On top of that, bundled Qernels that run for unequal durations do not necessarily increase utilization since QPU effective utilization is measured in space (number of QPU qubits allocated) and time (time qubits are performing actual computation).
For example, a 10-qubit Qernel $Q_0$ running on a 20-qubit QPU gives 50% spatial utilization. However, assume that $Q_0$ runs 3× longer than a 10-qubit Qernel $Q_1$. During $\frac{2}{3}$ of $Q_0$’s runtime, the qubits allocated to $Q_1$ will be idle, decreasing the effective utilization to only 66%. Recall that it is impossible to schedule more Qernels during $Q_0$’s runtime, unlike in a typical CPU (§ 2.2).
To minimize the fidelity impact and maximize the effective utilization of multi-programming, we utilize configurable Qernel compatibility functions that quantify how well-suited two Qernels are to run together.
Qernel Compatibility Functions. Compatibility functions measure the crosstalk errors and the effective utilization of a Qernel pair by considering the Qernels’ static properties (§ 5.2). To measure crosstalk effects, we identify pairs of 2-qubit gates that run in parallel during Qernel execution. To quantify this without running the bundled Qernels, we use the entanglement ratio and parallelism Qernel static properties, where higher values indicate a higher chance for crosstalk errors [89]. Intuitively, the entanglement ratio captures the proportion of 2-qubit gates over all gates, and parallelism captures how many gates run in parallel per time unit, on average.
To measure the spatial dimension of effective utilization, it suffices to compute the ratio of allocated QPU qubits over the number of QPU qubits. To measure the temporal dimension, we compare the relative duration between two Qernels. The depth static property reflects the longest chain of gates that will be executed; therefore, it measures the Qernel’s duration. More technically, we define effective utilization as
$$u_{eff} = \frac{N_{C_{max}}}{N_{QPU}} \cdot 100 + \sum_{n=1}^{k} \frac{D_n}{D_{max}} \cdot \frac{N_{C_n}}{N_{QPU}} \cdot 100,$$
where $N_{C_{max}}, N_{QPU}$ are the number of qubits of the longest Qernel and the QPU, respectively, $k$ is the number of bundled Qernels excluding the longest Qernel, and $D$ is the depth of the Qernel.
To put everything together formally, we score a possible Qernel pair as follows: $q_c = \alpha\, u_{eff} + \beta\, ER_b + \gamma\, PA_b \mapsto [0,1]$, where higher is better, $\alpha + \beta + \gamma = 1$, and the subscript $b$ denotes bundled, i.e., $ER_b$ is the entanglement ratio of the bundled Qernels. The four variables are tunable to give priorities on different objectives, e.g., prioritize effective utilization or minimize crosstalk. After experimenting and fine-tuning, we found that $\alpha = 0.25$, $\beta = 0.25$, $\gamma = 0.5$, and $q_c \geq 0.75$ give balanced results, as we show in § 9.4.
Figure 9 shows an example workflow. The multi-programmer receives three Qernels with three estimations each and identifies $Qernel_0$ and $Qernel_2$ as a possible pair since their best QPU is the same ($QPU_5$) (a). It computes their independent utilization, which is 31% and 37%, respectively, and the combined utilization is under 100% (b). It computes the compatibility score that surpasses the threshold (0.9 > 0.75) (c). Next, we detail our multi-programming policies.
Multi-programming Policies. QOS supports pluggable multi-programming policies for maximizing effective utilization or minimizing fidelity penalties. In this work, we implement two multi-programming policies; the first is the fast path multi-programming, where we can immediately bundle two Qernels if there is no conflict between them, while the second requires re-compilation and re-estimation.
Restrict Policy. The restrict policy uses the target QPU transpilation output to bundle Qernels if there is no overlap in their layouts. Practically, this means that for Qernels $Q_0$ and $Q_1$, their logical qubits are mapped to disjoint sets of physical qubits on the QPUs. In that case, the policy bundles the Qernels together, and fidelity loss is minimized through the aforementioned compatibility score.
Re-evaluation Policy. This policy is the fallback of the restrict policy. If the Qernel layouts overlap, the two Qernels are transpiled again for the target QPU, and their new fidelity is estimated. If the new fidelity is lower up to a fixed $\epsilon > 0$
[Figure 9 diagram. (a) Find Qernels with same best QPU, using the fidelity estimations: Qernel 0 (QPU5: 0.88, QPU3: 0.55, QPU1: 0.47), Qernel 1 (QPU4: 0.76, QPU2: 0.44, QPU0: 0.31), Qernel 2 (QPU5: 0.91, QPU4: 0.64, QPU0: 0.31). (b) Compute utilization: Qernel 0: 31%, Qernel 2: 37%, total < 100%. (c) Compute compatibility score: [Qernel 0, Qernel 1]: 0.9 > 0.75. (d) Check layout overlap on the IBM Falcon r4P topology: no overlap vs. overlap. (e) Select and apply policy: re-evaluation policy.]

Figure 9. QOS multi-programmer example workflow (§ 8.2). (a) Use the estimator’s output to find Qernels with the same best QPU, (b) compute their independent utilization, and (c) compute their compatibility score. If compatible, (d) check for layout overlap, and (e) apply the appropriate multi-programming policy.
value compared to the original fidelities, the bundling is maintained. Otherwise, the multi-programmer selects the next most compatible Qernel pair.
Figure 9 (d) shows the check for layout overlap. In this example, yellow qubits belong to $Qernel_0$ and green to $Qernel_1$. On the left, there is no overlap, while on the right, the red qubit is shared between the Qernels. (e) We apply the respective policy (in this example, re-evaluation).

8.3 Scheduler
Scheduling quantum programs involves fundamental tradeoffs between conflicting objectives; specifically, users want maximal fidelity and minimal waiting times. However, to maximize fidelity, most programs must run on the same subset of QPUs that perform best in a given calibration cycle (§ 3.2). This will lead to large and growing queues on these QPUs, hence long waiting times for the users.
Our scheduler assigns and runs Qernels across space (which QPUs) and time (when) and supports pluggable policies for managing the aforementioned tradeoffs, prioritizing maximal fidelity, minimal waiting times, or a balanced approach. The scheduler assigns Qernels to QPUs based on the fidelity estimations provided by the estimator and the execution time estimations, which we detail next.
Execution Time Estimation. To optimize for minimal waiting times, the scheduler must first estimate each Qernel’s execution time and then aggregate the execution time estimations in each QPU’s queue to compute the total waiting times. To estimate the execution time, we iterate the longest path of the QIR of a Qernel (§ 5.1) that corresponds to the longest-duration gate chain and thus defines the Qernel’s execution time. By summing the gate durations of each node in the longest path, we get the Qernel’s total execution time.
Formula-Based Policy. Optimizing for conflicting objectives involves comparing two possible solutions (e.g., maximal fidelity vs. minimal waiting times). In the formula-based policy, we use a simple scoring formula (Equation 1) to compare and select between two possible assignments. This formula factors fidelity, waiting time, and utilization to determine which assignment is better, given priorities. The parameters are as follows: $f_i$: fidelity of the estimation result $i$, $t_i$: waiting time for the QPU from estimation result $i$, $u_i$: utilization of the QPU for estimation result $i$, $c \in (0,1)$: a system-defined constant that weighs the fidelity difference between estimations, and finally, $\beta$: a system-defined constant acting as a weighting factor for utilization difference, balancing system throughput and fidelity. By selecting a higher $c$, the system prioritizes fidelity over waiting times, and vice versa, and by selecting a higher $\beta$ the system prioritizes utilization over fidelity, and vice versa. By default, $c = \beta = 0.5$, which aims for balanced fidelity, waiting times, and utilization.
$$Score = c\,\frac{f_2 - f_1}{f_1} - (1 - c)\,\frac{t_2 - t_1}{t_1} + \beta\,\frac{u_2 - u_1}{u_1} \qquad (1)$$
Genetic Algorithm Policy. Genetic algorithms excel at optimizing for conflicting objectives by efficiently searching over vast search spaces, and for that, they can be used in the context of QOS. We formulate a multi-objective optimization problem with the conflicting objectives of fidelity vs. waiting times and use the NSGA-II genetic algorithm [18] to solve it. The algorithm creates a Pareto front of possible solutions (schedules), each achieving a different combination of average fidelity and average waiting times. Then, to select one of those schedules, we use the formula described by Equation 1 to score each schedule and select the schedule with the highest score.

8.4 Knitter
Following scheduling and execution, the QOS runtime collects the results that are part of the initial circuit submitted by the user. Recall that the circuit is lifted to the QIR (§ 5.3), then optimized through divide-and-conquer techniques that place virtual gates inside the QIR (§ 6), and finally instantiated to replace the virtual gates with 1-qubit gates (§ 7.1). The instantiation process generates up to $O(8^k)$ instantiated sub-Qernels (ISQs) for a single initial optimized Qernel (Figure 8 (a), Table 1). Finally, the ISQs are bundled with other ISQs, possibly from other users, for increased utilization (§ 8.2).
Therefore, computing and returning the final result to the user is not trivial; we must first unbundle the results from multi-programming and then merge the results from ISQs to the original Qernel, a process called knitting. The structure of the ISQs and their respective results resembles a tree structure, where the leaf nodes are up to $O(8^k)$ results, and the root node is the final result. Therefore, we adopt the map-reduce pattern to perform knitting.
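The map-reduce shape of knitting can be sketched in a few lines. This is an illustrative sketch only, not the QOS knitter: it assumes each result is a term made of a coefficient and two fragment probability vectors, whereas the real coefficients and term structure depend on the cutting technique. The names `kron`, `knit_map`, `knit_reduce`, and `knit`, the round-robin split, and the toy data are all hypothetical. Each simulated worker combines the fragments of its terms with a tensor (Kronecker) product, and the reducer sums the partial results.

```python
def kron(p_a, p_b):
    """Tensor (Kronecker) product of two probability vectors."""
    return [a * b for a in p_a for b in p_b]

def knit_map(terms):
    """Map step for one worker: sum of coeff * (p_a tensor p_b) over its terms."""
    size = len(terms[0][1]) * len(terms[0][2])
    partial = [0.0] * size
    for coeff, p_a, p_b in terms:
        for idx, value in enumerate(kron(p_a, p_b)):
            partial[idx] += coeff * value
    return partial

def knit_reduce(partials):
    """Reduce step: element-wise sum of the intermediate results."""
    return [sum(values) for values in zip(*partials)]

def knit(terms, num_workers):
    # Round-robin split of the terms across (simulated) worker nodes.
    chunks = [terms[w::num_workers] for w in range(num_workers)]
    partials = [knit_map(chunk) for chunk in chunks if chunk]
    return knit_reduce(partials)

# Toy data: two terms over two 1-qubit fragments.
terms = [
    (0.5, [1.0, 0.0], [0.5, 0.5]),
    (0.5, [0.0, 1.0], [0.5, 0.5]),
]
result = knit(terms, num_workers=2)
```

Because the reduce step is a plain sum, the outcome does not depend on how the terms are distributed across workers, which is what makes the map step safe to parallelize.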
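For the formula-based scheduling policy of § 8.3, Equation 1 translates directly into a small comparison function. The sketch below rests on stated assumptions: it reads the utilization term as the relative change $(u_2 - u_1)/u_1$, by analogy with the fidelity and waiting-time terms, and it interprets a positive score as favoring the second assignment. The function name and the sample estimates are illustrative and not taken from QOS.

```python
def assignment_score(f1, t1, u1, f2, t2, u2, c=0.5, beta=0.5):
    """Score of assignment 2 relative to assignment 1 (assumed: positive favors 2)."""
    fidelity_gain = (f2 - f1) / f1       # relative fidelity improvement
    waiting_penalty = (t2 - t1) / t1     # relative increase in waiting time
    utilization_gain = (u2 - u1) / u1    # relative utilization improvement
    return c * fidelity_gain - (1 - c) * waiting_penalty + beta * utilization_gain

# Hypothetical estimates: assignment 2 has higher fidelity but a longer queue.
score = assignment_score(f1=0.80, t1=100.0, u1=0.50, f2=0.88, t2=150.0, u2=0.50)
prefer_second = score > 0
```

With the default weights, a 10% fidelity gain does not compensate for a 50% longer wait, so the first assignment is kept in this example; raising `c` toward 1 shifts the decision toward fidelity.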
[Figure 10 plots (lower is better). Panels: (a) Depth - 12 qubits, (b) Number of CNOT gates - 12 qubits, (c) Depth - 24 qubits, (d) Number of CNOT gates - 24 qubits. Y-axis: values relative to Qiskit. X-axis groups: FrozenQubits, CutQC, QOS. Legend: QAOA-R3, BV, GHZ, HS-1, QAOA-P1, QSVM, TL-1, VQE-1, W-STATE. Annotated values (FrozenQubits, CutQC, QOS): (a) 0.91, 0.83, 0.56; (b) 0.85, 0.71, 0.25; (c) 0.88, 0.70, 0.52; (d) 0.89, 0.65, 0.34.]

Figure 10. QOS Compiler (§ 9.2). Impact of the QOS compiler on the circuit depth and the number of CNOTs. The circuits are optimized using budget $b = 3$, and we compare against Qiskit (red horizontal line), FrozenQubits [4] and CutQC [85]. There is an average 46%, 38.6%, and 29.4% reduction in circuit depth, and 70.5%, 66% and 56.6% reduction in the number of CNOTs, respectively.
Unbundling for Multi-programming. The results first pass through the multi-programmer to be unbundled. To do this, the multi-programmer keeps a record that maps the initial (solo) Qernel IDs to the new, bundled Qernel ID, as well as the Qernels’ sizes. Therefore, when receiving a new result from a Qernel with an ID $i$, it scans the record to find an entry $i$, and if found, it splits the probability distribution bitstrings (§ 2) into two parts: the left-most and the right-most bits based on the Qernel sizes. Then, it forwards the unbundled results to the knitter for the map-reduce phases.
The Map Phase. To efficiently process a large number of results (up to $O(8^k)$), we follow a divide-and-conquer approach. Specifically, we split the results into $k$ equal-sized parts and distribute them to $k$ classical nodes to be processed in parallel (Figure 8 (b), step (1)). We parallelize across $k$ to increase data locality and reduce communication overheads since all results for each of the $k$ cuts will be in the memory of the same node. Locally, each node performs tensor product ($\otimes$) operations on the probability distributions, which are parallelizable across the node’s threads. If available in the node, QOS leverages GPUs or TPUs to accelerate the tensor products. Following this process, the $k$ nodes output $k$ intermediate results, ready to be reduced into a single result.
The Reduce Phase. QOS selects any of the $k$ nodes to perform the reduce step. The rest of the nodes send the intermediate results to this node, which performs a thread-parallel sum of $k$ results. As in the map phase, the parallel sum can also be executed on GPUs. This produces the final output to be returned to the user (Figure 8 (b), step (2)).

9 Evaluation
9.1 Experimental Methodology
Experimental Setup. We conduct two types of experiments: (1) classical tasks, such as circuit transpilation and trace-based simulations, and (2) quantum tasks, which run on real QPUs for measuring the circuits’ fidelities.
For (1), we use a server with a 64-core AMD EPYC 7713P processor and 512 GB ECC memory. For (2), we conduct our experiments on IBM Falcon r5.11 QPUs. Unless otherwise noted, we use the IBM Kolkata 27-qubit QPU.
Framework and Configuration. We use the Qiskit [65] Python SDK for compiling quantum circuits and running simulations. We compile quantum circuits with the highest optimization level (3) and run with 8192 shots. Each data point presented in the figures is the median of five runs.
Benchmarks. We study QOS on a set of circuits used in state-of-the-art NISQ algorithms, adopted from the 3 benchmark suites of Supermarq [89], MQT-Bench [67] and QASM-Bench [43]. The algorithms’ circuits can be scaled by the number of qubits and depth. Specifically, we study 9 benchmarks: GHZ, W-State, Bernstein Vazirani (BV), Hamiltonian Simulation (HS-$t$), Quantum-enhanced Support Vector Machine (QSVM), Two Local Ansatz (TL-$n$), Variational Quantum Eigensolver (VQE-$n$), and Quantum Approximate Optimization Algorithm (QAOA-R/P). These benchmarks cover a wide range of relevant criteria for evaluating QOS.
For the TL and VQE circuits, we use circular and linear entanglement, respectively. The HS, VQE, and TL benchmarks are scalable by their circuit depth with the number of time-steps $t$ and layers in the ansatz $n$. The QAOA-R/P circuits are initialized using regular/power-law graphs, respectively, with degree $d \in \{1,3\}$.
Metrics. We evaluate the following metrics:
• Fidelity: We use the Hellinger fidelity as a measure of how close a noisy result is to the desired ground truth of a quantum circuit [22, 33]. The Hellinger fidelity is calculated as $Fidelity(P_{ideal}, P_{noisy}) = \left(1 - H(P_{ideal}, P_{noisy})^2\right)^2 \mapsto [0, 1]$, where $H$ is the Hellinger distance between two probability distributions, and $P_{ideal}, P_{noisy}$ are the ideal and noisy probability distributions, respectively.
• Circuit Properties: Number of CNOT gates and depth. When a Qernel contains more than one sub-Qernel, we use the sub-Qernel with the maximum depth, amount of CNOTs, or an average of these two properties.
• Waiting Time: The time a circuit spends in a QPU’s queue, waiting for execution, in seconds.
• Classical Overhead: The optimization and post-processing overheads (§ 7) of the QOS compiler vs. Qiskit’s transpiler [66].
• Quantum Overhead: The number of additional quantum circuits we need to execute per original quantum circuit.
Baselines. We evaluate the QOS compiler against Qiskit v0.41, CutQC [85] and FrozenQubits [4]. QOS’s multi-programmer is evaluated against [17]. Regarding QOS scheduler, to the
[Figure 11: bar chart of circuit fidelity (higher is better) for the QAOA-R3, BV, GHZ, HS-1, QAOA-P1, QSVM, TL-1, VQE-1, and W-STATE benchmarks at 12 and 24 qubits, comparing Qiskit, CutQC, QOS, and FrozenQubits.]

Figure 11. QOS Compiler (§ 9.2). Impact of the QOS compiler on the circuit fidelity against Qiskit [66], CutQC [85], and
FrozenQubits [4]. The circuits are optimized using budget 𝑏 = 3. There is a mean 2.6×, 1.6×, and 1.11× improvement for 12-qubit
circuits, respectively. There is a 456.5×, 7.6×, and 1.67× improvement for circuits of 24 qubits, respectively.
best of our knowledge, [73] is the only peer-reviewed quantum scheduler, but it doesn’t provide source code or enough technical details to faithfully implement it.

9.2 QOS Compiler
RQ1: How well does the QOS compiler improve the fidelity of circuits that run on NISQ QPUs? We evaluate the performance of the QOS compiler w.r.t. the post-optimization properties and fidelity of the circuits while also analyzing the classical and quantum costs of our approach.

Effect on the Circuit Depth and Number of CNOTs. In Figure 10, we show the performance of the QOS compiler on the circuits’ depth and number of CNOTs, where we plot the relative difference in post-optimization circuit depth and the number of CNOTs between Qiskit (the red horizontal line) and FrozenQubits [4], CutQC [85], and the QOS compiler. Figures 10 (a) and (c) show that the circuit depth decreases by 46%, 38.6%, and 29.4%, respectively. Figures 10 (b) and (d) show that the number of CNOTs decreases by 70.5%, 66%, and 56.6%, respectively. The improvement in both metrics against the baselines is attributed to the composability of our compiler; the combined effect of circuit compactions (§ 6) achieves better results than standalone techniques.

Impact on Fidelity. Figure 11 shows the QOS-optimized circuits’ fidelity against Qiskit [66], CutQC [85], and FrozenQubits [4]. The results show a mean 2.6×, 1.6×, and 1.11× improvement for 12-qubit circuits, respectively, and a 456.5×, 7.6×, and 1.67× improvement for circuits of 24 qubits, respectively. The fidelity improvement is a consequence of lower circuit depths and fewer CNOTs, as shown in Figure 10.

Classical and Quantum Overheads. Figure 12 (a) shows the average classical and quantum overheads of the QOS compiler. The classical overhead is 16.6× and 2.5× for 12 and 24 qubits, respectively, and the quantum overhead is 31.3× and 12× for 12 and 24 qubits, respectively. However, fidelity improves by 2.6× and 456.5× for 12 and 24 qubits, respectively; therefore, for larger circuits, the fidelity improvement is worth the cost.

Scalability. To demonstrate that the QOS Compiler increases the scalability, we run the VQE-1 benchmark on a hypothetical 1000-qubit QPU with one-qubit gate errors of $10^{-4}$, two-qubit gate errors of $10^{-3}$, and measurement errors of $10^{-2}$. We optimize with budget 𝑏 ∈ {0,1,4,8} and report the estimated fidelity. Figure 12 (b) shows that all budget 𝑏 values improve the estimated fidelity, with a tradeoff of improvement vs. overheads.

RQ1 takeaway: The QOS compiler improves the properties of quantum circuits by 51% on average, leading to an improvement in fidelity of 2.6–456.5×, while incurring acceptable classical and quantum overheads.

9.3 Estimator
RQ2: How well does QOS’s estimator address spatial and temporal heterogeneities? We evaluate the estimator’s precision in selecting the top-performing QPU for each benchmark. We establish a baseline using the on-average best-performing machine every calibration day. On the day of the experiment, IBM Auckland was the best-performing machine (also with the highest number of pending jobs).

Estimator’s Accuracy. Figure 12 (c) shows the fidelity of the eight benchmarks when run on QPUs selected by the estimator versus when run on the IBM Auckland QPU. The QPU selected for the BV benchmark is Auckland; therefore, we omit this result. For the rest of the benchmarks, the IBM Sherbrooke and Brisbane QPUs were automatically selected. Interestingly, the fidelity is on par with or even higher than that of IBM Auckland, except for one benchmark, QAOA-P1.

RQ2 takeaway: QOS’s estimator automatically identifies QPUs with higher fidelity than the current standard practice.

9.4 Multi-programmer
RQ3: How well does QOS’s multi-programmer increase QPU utilization with minimum fidelity penalties? We evaluate the impact of the multi-programmer on the fidelity of co-scheduled circuits for certain utilization thresholds.

Utilization vs. Fidelity. Figure 13 (a) shows the average fidelity of nine benchmarks with utilization of 30%, 60%, and 88%. The three bars represent: no multi-programming (No
[Figure 12: three panels. (a) Overheads vs. Improvement: classical overhead, quantum overhead, and fidelity improvement as a relative factor to Qiskit (log scale; lower is better for overheads) for 12 and 24 qubits. (b) Scalability to a Large QPU: estimated fidelity (higher is better) vs. number of qubits (0 to 1000) for each budget 𝑏. (c) Estimator’s Performance: fidelity (higher is better) per benchmark for IBM Auckland vs. the QOS Estimator.]

Figure 12. QOS Compiler (§ 9.2) and QOS Estimator (§ 8.1). (a) Compiler: classical and quantum overheads and fidelity
improvement as a relative factor to Qiskit. For 24 qubits, the improvement outweighs the overheads. (b) Compiler: scalability to
a large, hypothetical 1000-qubit QPU. Any budget 𝑏 > 0 achieves higher quality results than using no optimizations. (c) Estimator’s
performance: fidelity of IBM Auckland vs. the QPU automatically selected by the estimator.
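
The estimated fidelities in Figure 12 (b) depend on per-operation error rates. The estimator itself (§ 8.1) is not reproduced in this section; as a rough illustration only, a common first-order model treats every operation as succeeding independently and multiplies the success probabilities. The sketch below uses that assumed model with the error rates of the hypothetical 1000-qubit QPU; the function name and the gate counts are hypothetical.

```python
def estimated_fidelity(n_1q, n_2q, n_meas, e_1q=1e-4, e_2q=1e-3, e_meas=1e-2):
    """Assumed first-order model: all operations succeed independently.

    n_1q, n_2q, n_meas: numbers of one-qubit gates, two-qubit gates, measurements.
    e_1q, e_2q, e_meas: error rates (defaults taken from the scalability setup).
    """
    return (
        (1.0 - e_1q) ** n_1q
        * (1.0 - e_2q) ** n_2q
        * (1.0 - e_meas) ** n_meas
    )


# Hypothetical gate counts before and after an optimization that removes CNOTs.
before = estimated_fidelity(n_1q=2000, n_2q=1500, n_meas=100)
after = estimated_fidelity(n_1q=2000, n_2q=600, n_meas=100)
print(f"before={before:.4f} after={after:.4f} ratio={after / before:.2f}x")
```

Under this model, two-qubit gates dominate the loss because their error rate is ten times that of one-qubit gates, which is consistent with the paper attributing fidelity gains to fewer CNOTs and lower depth.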
M/P) refers to large circuits that run solo, baseline multi-programming (Baseline M/P) refers to [17], and QOS’s multi-programming approach (QOS M/P). There is an average 9.6× improvement in fidelity compared to solo execution and an average 15% (1.15×) improvement compared to the baseline.

Effective Utilization. The results in Figure 13 (b) show that QOS achieves, on average, a 7.2% higher effective utilization (§ 8.2), with a maximum improvement of 10.1%.

Fidelity Penalty vs. Solo Execution. In Figure 13 (c), we evaluate the fidelity penalty of multi-programming vs. solo circuit execution for utilization of 30%, 60%, and 88%. The fidelity loss is 2%, 9%, and 18%, respectively. The average fidelity loss is 9.6% compared to solo execution, which is in line with previous studies [17, 47]. In the worst case (18%), the fidelity loss is caused by the restrictions in high-quality qubit allocations and the crosstalk errors.

RQ3 takeaway: The QOS multi-programmer improves fidelity by 1.15–9.6× and effective utilization by 7.2% compared to the baselines while incurring an acceptable fidelity penalty (< 10%) compared to solo execution.

9.5 Scheduler
RQ4: How well does QOS’s scheduler balance fidelity vs. waiting times and balance the load across QPUs? We evaluate our scheduler by generating a representative workload consisting of a dataset we collected during the development of QOS.

Dataset Collection. During our exploration of the motivational challenges (§ 3) and experimentation and evaluation of the QOS components and their policies, we collected a dataset of 70,000 benchmark circuits and more than 7,000 job runs in the quantum cloud. We use this dataset to simulate representative workloads, as we detail next.

Workload Generation. To generate realistic workloads, we monitored all available QPUs on the IBM Quantum Cloud [37] for ten days in November 2023 to estimate the hourly job arrival rate. The average rate is 1500 jobs per hour, which is the baseline system workload for our evaluation.

Fidelity vs. Waiting Time. Figure 14 (a) shows the performance of the formula-based scheduling policy. We show the average fidelity and waiting time as the fidelity weight, 𝑐, changes (§ 8.3). A weight of 0.7 achieves ∼5× lower waiting times than full priority of fidelity while sacrificing only ∼2% fidelity. Figure 14 (b) shows the Pareto front of scheduling solutions generated by the genetic algorithm policy. A weight 𝑐 = 0.5 achieves 2× lower waiting times with 4% lower fidelity.

QPU Load Balancing. Figure 14 (c) shows the QPU load as the total runtime each QPU was active, in seconds, for the formula-based policy. All QPUs handle similar loads, with a maximum difference of 15.2%.

RQ4 takeaway: The QOS scheduler balances the trade-off between waiting times and fidelity, reducing them by 5× and only 2%, respectively, while balancing the load across QPUs.

10 Related work
Quantum optimization techniques can be categorized as (1) qubit mapping and routing [6, 44, 50, 51, 58, 61, 81, 86, 88, 93, 96, 98], (2) instruction/pulse scheduling [15, 26, 52, 78, 82, 91, 97], (3) gate synthesis/decomposition [14, 46, 58, 62, 78, 79, 95], (4) execution post-processing and readout improvement [11, 13, 16, 48, 59, 60, 87], and (5) circuit compaction [4, 7, 9, 20, 34, 49, 54, 55, 63, 85]. These techniques are implemented standalone without a compiler infrastructure and, thus, are not composable. Instead, the QOS compiler offers a powerful IR that enables incorporating such techniques in a composable manner.

Moreover, application-specific optimizations focus only on specific algorithms to enhance fidelity but lack generality [1, 24, 25, 32, 40, 45, 69, 70, 74, 83, 92]. In contrast, the QOS compiler is a generic approach applicable to all applications.

The state-of-the-art in quantum multi-programming is fairly limited [17, 47] and focuses solely on high-quality mapping, overlooking the systematic selection of compatible programs for utilization or fidelity. Notably, key optimizations from [17] are integrated into the Qiskit transpiler workflow [66] and are therefore already used by the QOS Virtualizer (§ 7.2).
[Figure 13: three panels (higher is better). (a) Impact on Fidelity: fidelity vs. utilization [%] (30, 60, 88) for No M/P, Baseline M/P, and QOS M/P. (b) Effective Utilization: effective utilization [%] vs. ideal utilization [%] (30, 60, 88) for Baseline M/P and QOS M/P. (c) Relative Fidelity: relative fidelity vs. utilization [%] (30, 60, 88), with values 0.98, 0.91, and 0.82.]

Figure 13. QOS Multi-programmer (§ 9.4). (a) Impact of multi-programming on fidelity. There is a 9.6× and 1.15× improvement
compared to no multi-programming and the baseline, respectively. (b) Effective utilization. There is 7.2% higher effective utilization
on average. (c) Relative fidelity w.r.t. solo circuit execution. There is an average 9.6% drop in fidelity due to QOS’s multi-programming.
[Figure 14: three panels. (a) Formula-Based Policy: average fidelity and average waiting time [s] vs. fidelity weight (1.0 down to 0.5). (b) Genetic Algorithm Policy: average waiting time [s] vs. average fidelity, with points marked 𝑐 = 1, 𝑐 = 0.5, and 𝑐 = 0. (c) QPU Load as Total Runtime: total runtime [s] per QPU (kolkata, hanoi, cairo, guadalupe, mumbai, lagos, auckland, nairobi).]

Figure 14. QOS Scheduler (§ 9.5). (a) Formula-based scheduling policy: Average fidelity vs. average waiting times. A fidelity
weight 𝑐 = 0.7 achieves ∼ 5× lower waiting time for only ∼ 2% lower fidelity. (b) Genetic algorithm policy: it creates a Pareto
front of schedules, where a fidelity weight 𝑐 = 0.5 achieves 2× lower waiting times for ∼ 4% fidelity decrease. (c) QPU load as
the total runtime of each QPU for the formula-based policy. The maximum load difference between any two QPUs is 15.2%.
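
A Pareto front such as the one in Figure 14 (b) contains only schedules that no other schedule beats on both objectives at once. The sketch below filters a list of candidate (average fidelity, average waiting time) pairs down to that front. The candidate values are made up for illustration, and the brute-force filter is a generic technique, not the genetic algorithm policy itself.

```python
def pareto_front(schedules):
    """Keep schedules that are not dominated by any other schedule.

    Each schedule is a (fidelity, waiting_time) pair; higher fidelity and
    lower waiting time are better. A schedule is dominated if another one is
    at least as good on both objectives and strictly better on one of them.
    """
    front = []
    for i, (f_i, w_i) in enumerate(schedules):
        dominated = any(
            (f_j >= f_i and w_j <= w_i) and (f_j > f_i or w_j < w_i)
            for j, (f_j, w_j) in enumerate(schedules)
            if j != i
        )
        if not dominated:
            front.append((f_i, w_i))
    return sorted(front)


# Hypothetical candidates: (average fidelity, average waiting time in seconds).
candidates = [(0.84, 1200), (0.82, 600), (0.80, 300), (0.81, 900), (0.79, 400)]
print(pareto_front(candidates))
```

Here (0.81, 900) is dropped because (0.82, 600) is better on both objectives, and (0.79, 400) is dropped because of (0.80, 300). Picking a single schedule from the remaining front is then a matter of how much fidelity one is willing to trade for shorter waiting times.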
Lastly, current quantum scheduling methods [8, 73, 76] are limited because they (1) schedule circuits one at a time, (2) neglect QPU utilization, (3) lack fine control over waiting times versus fidelity, and (4) require manual input for final scheduling decisions. Work in the quantum cloud computing area [38, 41, 72] and in quantum serverless [23, 29, 53] describes quantum cloud characteristics or potential architectures, but QOS is the first end-to-end QPU management system.

11 Conclusion
We presented QOS, a system that composes cross-stack OS abstractions to address the challenges of quantum computing holistically. The synergy between compaction techniques, performance estimation, multi-programming, and scheduling systematically explores the tradeoff space associated with quantum computing. Specifically, QOS achieves up to 456.5× higher fidelity at a 12× overhead cost, up to 9.6× higher fidelity for a target utilization at 9.6% lower fidelity than solo execution, and up to 5× lower waiting times for 2% lower fidelity.

Contributions. Our main contributions include:
1. To our knowledge, QOS is the first attempt to combine circuit compaction with quantum resource management to tackle the challenges of QPUs holistically.
2. We leverage the QOS compiler infrastructure to compose optimizations that improve fidelity in a scalable manner, significantly outperforming their individual application (i.e., the current practice).
3. To our knowledge, we are the first to account for and improve both temporal and spatial QPU utilization when multi-programming quantum programs, while mitigating its associated fidelity penalties.
4. Our scheduler balances the inherent tradeoff between fidelity and waiting times, leading to better overall QoS.

Acknowledgements
We thank Karl Jansen and Stefan Kühn from the Center for Quantum Technology and Applications (CQTA), Zeuthen, for supporting this work by providing access to IBM quantum resources. We also thank Ahmed Darwish and Dmitry Lugovoy for their contributions to this work. Funded by the Bavarian State Ministry of Science and the Arts as part of the Munich Quantum Valley (MQV).

References
[1] Mahabubul Alam, Abdullah Ash-Saki, and Swaroop Ghosh. 2020. Circuit Compilation Methodologies for Quantum Approximate Optimization Algorithm. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 215–228. https://doi.org/10.1109/MICRO50266.2020.00029
[2] Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, Joseph C Bardin, Rami Barends, Rupak Biswas, Sergio Boixo, Fernando GSL Brandao, David A Buell, et al. 2019. Quantum supremacy using a programmable superconducting processor. Nature 574, 7779 (2019), 505–510.
[3] awsQuantum [n. d.]. AWS Braket. https://aws.amazon.com/braket/. Accessed: 2022-04-11.
[4] Ramin Ayanzadeh, Narges Alavisamani, Poulami Das, and Moinuddin Qureshi. 2023. FrozenQubits: Boosting Fidelity of QAOA by Skipping Hotspot Nodes. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (Vancouver, BC, Canada) (ASPLOS 2023). Association for Computing Machinery, New York, NY, USA, 311–324. https://doi.org/10.1145/3575693.3575741
[5] azurequantum [n. d.]. Azure Quantum. https://azure.microsoft.com/en-us/products/quantum. Accessed: 2022-04-11.
[6] Jonathan M. Baker, Andrew Litteken, Casey Duckering, Henry Hoffmann, Hannes Bernien, and Frederic T. Chong. 2021. Exploiting Long-Distance Interactions and Tolerating Atom Loss in Neutral Atom Quantum Architectures. In 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA). 818–831. https://doi.org/10.1109/ISCA52012.2021.00069
[7] Luciano Bello, Agata M. Brańczyk, Sergey Bravyi, Almudena Carrera Vazquez, Andrew Eddins, Daniel J. Egger, Bryce Fuller, Julien Gacon, James R. Garrison, Jennifer R. Glick, Tanvi P. Gujarati, Ikko Hamamura, Areeq I. Hasan, Takashi Imamichi, Caleb Johnson, Ieva Liepuoniute, Owen Lockwood, Mario Motta, C. D. Pemmaraju, Pedro Rivero, Max Rossmannek, Travis L. Scholten, Seetharami Seelam, Iskandar Sitdikov, Dharmashankar Subramanian, Wei Tang, and Stefan Woerner. 2023. Circuit Knitting Toolbox. https://github.com/Qiskit-Extensions/circuit-knitting-toolbox. https://doi.org/10.5281/zenodo.7987997
[8] Debasmita Bhoumik, Ritajit Majumdar, Amit Saha, and Susmita Sur-Kolay. 2023. Distributed Scheduling of Quantum Circuits with Noise and Time Optimization. arXiv:2309.06005 [quant-ph]
[9] Benjamin Bichsel, Maximilian Baader, Timon Gehr, and Martin Vechev. 2020. Silq: A High-Level Quantum Language with Safe Uncomputation and Intuitive Semantics. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (London, UK) (PLDI 2020). Association for Computing Machinery, New York, NY, USA, 286–300. https://doi.org/10.1145/3385412.3386007
[10] Sergey Bravyi, Oliver Dial, Jay M Gambetta, Dario Gil, and Zaira Nazario. 2022. The future of quantum computing with superconducting qubits. Journal of Applied Physics 132, 16 (2022), 160902.
[11] Sergey Bravyi, Sarah Sheldon, Abhinav Kandala, David C. Mckay, and Jay M. Gambetta. 2021. Mitigating measurement errors in multiqubit experiments. Phys. Rev. A 103 (Apr 2021), 042605. Issue 4. https://doi.org/10.1103/PhysRevA.103.042605
[12] Andrew W. Cross, Lev S. Bishop, Sarah Sheldon, Paul D. Nation, and Jay M. Gambetta. 2019. Validating quantum computers using randomized model circuits. Phys. Rev. A 100 (Sep 2019), 032328. Issue 3. https://doi.org/10.1103/PhysRevA.100.032328
[13] Siddharth Dangwal, Gokul Subramanian Ravi, Poulami Das, Kaitlin N. Smith, Jonathan M. Baker, and Frederic T. Chong. 2023. VarSaw: Application-tailored Measurement Error Mitigation for Variational Quantum Algorithms. arXiv:2306.06027 [quant-ph]
[14] Poulami Das, Eric Kessler, and Yunong Shi. 2023. The Imitation Game: Leveraging CopyCats for Robust Native Gate Selection in NISQ Programs. In 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 787–801. https://doi.org/10.1109/HPCA56546.2023.10071025
[15] Poulami Das, Swamit Tannu, Siddharth Dangwal, and Moinuddin Qureshi. 2021. ADAPT: Mitigating Idling Errors in Qubits via Adaptive Dynamical Decoupling. In MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture (Virtual Event, Greece) (MICRO ’21). Association for Computing Machinery, New York, NY, USA, 950–962. https://doi.org/10.1145/3466752.3480059
[16] Poulami Das, Swamit Tannu, and Moinuddin Qureshi. 2021. JigSaw: Boosting Fidelity of NISQ Programs via Measurement Subsetting. In MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture (Virtual Event, Greece) (MICRO ’21). Association for Computing Machinery, New York, NY, USA, 937–949. https://doi.org/10.1145/3466752.3480044
[17] Poulami Das, Swamit S. Tannu, Prashant J. Nair, and Moinuddin Qureshi. 2019. A Case for Multi-Programming Quantum Computers. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (Columbus, OH, USA) (MICRO ’52). Association for Computing Machinery, New York, NY, USA, 291–303. https://doi.org/10.1145/3352460.3358287
[18] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. 2002. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6, 2 (2002), 182–197. https://doi.org/10.1109/4235.996017
[19] Matthew DeCross, Eli Chertkov, Megan Kohagen, and Michael Foss-Feig. 2022. Qubit-reuse compilation with mid-circuit measurement and reset. arXiv:2210.08039 [quant-ph]
[20] Yongshan Ding, Xin-Chuan Wu, Adam Holmes, Ash Wiseth, Diana Franklin, Margaret Martonosi, and Frederic T. Chong. 2020. SQUARE: Strategic Quantum Ancilla Reuse for Modular Quantum Programs via Cost-Effective Uncomputation. In 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). 570–583. https://doi.org/10.1109/ISCA45697.2020.00054
[21] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. 2014. A Quantum Approximate Optimization Algorithm. arXiv:1411.4028 [quant-ph]
[22] fidelity-qiskit [n. d.]. Qiskit Hellinger fidelity. https://qiskit.org/documentation/stubs/qiskit.quantum_info.hellinger_fidelity.html. Accessed: 2022-04-11.
[23] Jose Garcia-Alonso, Javier Rojo, David Valencia, Enrique Moguel, Javier Berrocal, and Juan Manuel Murillo. 2022. Quantum Software as a Service Through a Quantum API Gateway. IEEE Internet Computing 26, 1 (Jan 2022), 34–41. https://doi.org/10.1109/MIC.2021.3132688
[24] Pranav Gokhale, Olivia Angiuli, Yongshan Ding, Kaiwen Gui, Teague Tomesh, Martin Suchara, Margaret Martonosi, and Frederic T. Chong. 2019. Minimizing State Preparations in Variational Quantum Eigensolver by Partitioning into Commuting Families. arXiv:1907.13623 [quant-ph]
[25] Pranav Gokhale, Yongshan Ding, Thomas Propson, Christopher Winkler, Nelson Leung, Yunong Shi, David I. Schuster, Henry Hoffmann, and Frederic T. Chong. 2019. Partial Compilation of Variational Algorithms for Noisy Intermediate-Scale Quantum Machines. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (Columbus, OH, USA) (MICRO ’52). Association for Computing Machinery, New York, NY, USA, 266–278. https://doi.org/10.1145/3352460.3358313
[26] Pranav Gokhale, Ali Javadi-Abhari, Nathan Earnest, Yunong Shi, and Frederic T. Chong. 2020. Optimized Quantum Compilation for Near-Term Algorithms with OpenPulse. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 186–200. https://doi.org/10.1109/MICRO50266.2020.00027
[27] google-nisq-properties [n. d.]. Quantum Computer Datasheet. https://quantumai.google/hardware/datasheet/weber.pdf. Accessed: 2023-07-17.
[28] googleQuantum [n. d.]. Google Cirq. https://quantumai.google/cirq. Accessed: 2022-04-11.
[29] M Grossi, L Crippa, A Aita, G Bartoli, V Sammarco, E Picca, N Said, F Tramonto, and F Mattei. 2021. A Serverless Cloud Integration For Quantum Computing. arXiv preprint arXiv:2107.02007 (2021).
[30] Laszlo Gyongyosi and Sandor Imre. 2019. A Survey on quantum computing technology. Computer Science Review 31 (2019), 51–71. https://doi.org/10.1016/j.cosrev.2018.11.002
[31] David Hanneke, JP Home, John D Jost, Jason M Amini, Dietrich Leibfried, and David J Wineland. 2010. Realization of a programmable two-qubit quantum processor. Nature Physics 6, 1 (2010), 13–16.
[32] Tianyi Hao, Kun Liu, and Swamit Tannu. 2023. Enabling High Performance Debugging for Variational Quantum Algorithms Using Compressed Sensing. In Proceedings of the 50th Annual International
Symposium on Computer Architecture (Orlando, FL, USA) (ISCA ’23). Association for Computing Machinery, New York, NY, USA, Article 9, 13 pages. https://doi.org/10.1145/3579371.3589044
[33] Ernst Hellinger. 1909. Neue begründung der theorie quadratischer formen von unendlichvielen veränderlichen. Journal für die reine und angewandte Mathematik 1909, 136 (1909), 210–271.
[34] Fei Hua, Yuwei Jin, Yanhao Chen, Suhas Vittal, Kevin Krsulich, Lev S Bishop, John Lapeyre, Ali Javadi-Abhari, and Eddy Z Zhang. 2023. CaQR: A Compiler-Assisted Approach for Qubit Reuse through Dynamic Circuit. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3. 59–71.
[35] H. Häffner, C.F. Roos, and R. Blatt. 2008. Quantum computing with trapped ions. Physics Reports 469, 4 (2008), 155–203. https://doi.org/10.1016/j.physrep.2008.09.003
[36] ibmq-calibration [n. d.]. IBM Quantum calibration jobs. https://quantum-computing.ibm.com/admin/docs/admin/calibration-jobs. Accessed: 2022-04-11.
[37] ibmQuantum [n. d.]. IBM Quantum. https://www.ibm.com/quantum-computing/. Accessed: 2022-04-11.
[38] Peter J Karalekas, Nikolas A Tezak, Eric C Peterson, Colm A Ryan, Marcus P da Silva, and Robert S Smith. 2020. A quantum-classical cloud platform optimized for variational hybrid algorithms. Quantum Science and Technology 5, 2 (mar 2020), 024003. https://doi.org/10.1088/2058-9565/ab7559
[39] P. V. Klimov, J. Kelly, Z. Chen, M. Neeley, A. Megrant, B. Burkett, R. Barends, K. Arya, B. Chiaro, Yu Chen, A. Dunsworth, A. Fowler, B. Foxen, C. Gidney, M. Giustina, R. Graff, T. Huang, E. Jeffrey, Erik Lucero, J. Y. Mutus, O. Naaman, C. Neill, C. Quintana, P. Roushan, Daniel Sank, A. Vainsencher, J. Wenner, T. C. White, S. Boixo, R. Babbush, V. N. Smelyanskiy, H. Neven, and John M. Martinis. 2018. Fluctuations of Energy-Relaxation Times in Superconducting Qubits. Phys. Rev. Lett. 121 (Aug 2018), 090502. Issue 9. https://doi.org/10.1103/PhysRevLett.121.090502
[40] Lingling Lao and Dan E. Browne. 2022. 2QAN: A Quantum Compiler for 2-Local Qubit Hamiltonian Simulation Algorithms. In Proceedings of the 49th Annual International Symposium on Computer Architecture (New York, New York) (ISCA ’22). Association for Computing Machinery, New York, NY, USA, 351–365. https://doi.org/10.1145/3470496.3527394
[41] Frank Leymann, Johanna Barzen, Michael Falkenthal, Daniel Vietz, Benjamin Weder, and Karoline Wild. 2020. Quantum in the Cloud: Application Potentials and Research Opportunities. arXiv:2003.06256 [quant-ph]
[42] Ang Li, Samuel Stein, Sriram Krishnamoorthy, and James Ang. 2021. QASMBench: A Low-level QASM Benchmark Suite for NISQ Evaluation and Simulation. arXiv preprint arXiv:2005.13018 (2021).
[43] Ang Li, Samuel Stein, Sriram Krishnamoorthy, and James Ang. 2023. QASMBench: A Low-Level Quantum Benchmark Suite for NISQ Evaluation and Simulation. ACM Transactions on Quantum Computing 4, 2, Article 10 (feb 2023), 26 pages. https://doi.org/10.1145/3550488
[44] Gushu Li, Yufei Ding, and Yuan Xie. 2019. Tackling the Qubit Mapping Problem for NISQ-Era Quantum Devices. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (Providence, RI, USA) (ASPLOS ’19). Association for Computing Machinery, New York, NY, USA, 1001–1014. https://doi.org/10.1145/3297858.3304023
[45] Gushu Li, Anbang Wu, Yunong Shi, Ali Javadi-Abhari, Yufei Ding, and Yuan Xie. 2022. Paulihedral: A Generalized Block-Wise Compiler Optimization Framework for Quantum Simulation Kernels. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (Lausanne, Switzerland) (ASPLOS ’22). Association for Computing Machinery, New York, NY, USA, 554–569. https://doi.org/10.1145/3503222.3507715
[46] Andrew Litteken, Lennart Maximilian Seifert, Jason D. Chadwick, Natalia Nottingham, Tanay Roy, Ziqian Li, David Schuster, Frederic T. Chong, and Jonathan M. Baker. 2023. Dancing the Quantum Waltz: Compiling Three-Qubit Gates on Four Level Architectures. In Proceedings of the 50th Annual International Symposium on Computer Architecture (Orlando, FL, USA) (ISCA ’23). Association for Computing Machinery, New York, NY, USA, Article 71, 14 pages. https://doi.org/10.1145/3579371.3589106
[47] Lei Liu and Xinglei Dou. 2021. QuCloud: A New Qubit Mapping Mechanism for Multi-programming Quantum Computing in Cloud Environment. In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 167–178. https://doi.org/10.1109/HPCA51647.2021.00024
[48] Filip B Maciejewski, Zoltán Zimborás, and Michał Oszmaniec. 2020. Mitigation of readout noise in near-term quantum devices by classical post-processing based on detector tomography. Quantum 4 (2020), 257.
[49] Kosuke Mitarai and Keisuke Fujii. 2021. Constructing a virtual two-qubit gate by sampling single-qubit operations. New Journal of Physics 23, 2 (2021), 023021.
[50] Abtin Molavi, Amanda Xu, Martin Diges, Lauren Pick, Swamit Tannu, and Aws Albarghouthi. 2022. Qubit Mapping and Routing via MaxSAT. In 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO). 1078–1091. https://doi.org/10.1109/MICRO56248.2022.00077
[51] Prakash Murali, Jonathan M. Baker, Ali Javadi-Abhari, Frederic T. Chong, and Margaret Martonosi. 2019. Noise-Adaptive Compiler Mappings for Noisy Intermediate-Scale Quantum Computers. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (Providence, RI, USA) (ASPLOS ’19). Association for Computing Machinery, New York, NY, USA, 1015–1029. https://doi.org/10.1145/3297858.3304075
[52] Prakash Murali, David C. Mckay, Margaret Martonosi, and Ali Javadi-Abhari. 2020. Software Mitigation of Crosstalk on Noisy Intermediate-Scale Quantum Computers. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (Lausanne, Switzerland) (ASPLOS ’20). Association for Computing Machinery, New York, NY, USA, 1001–1016. https://doi.org/10.1145/3373376.3378477
[53] Hoa T. Nguyen, Muhammad Usman, and Rajkumar Buyya. 2022. QFaaS: A Serverless Function-as-a-Service Framework for Quantum Computing. arXiv:2205.14845 [quant-ph]
[54] Alexandru Paler, Robert Wille, and Simon J. Devitt. 2016. Wire recycling for quantum circuit optimization. Phys. Rev. A 94 (Oct 2016), 042337. Issue 4. https://doi.org/10.1103/PhysRevA.94.042337
[55] Anouk Paradis, Benjamin Bichsel, Samuel Steffen, and Martin Vechev. 2021. Unqomp: Synthesizing Uncomputation in Quantum Circuits. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (Virtual, Canada) (PLDI 2021). Association for Computing Machinery, New York, NY, USA, 222–236. https://doi.org/10.1145/3453483.3454040
[56] James L Park. 1970. The concept of transition in quantum mechanics. Foundations of physics 1, 1 (1970), 23–33.
[57] Tirthak Patel, Abhay Potharaju, Baolin Li, Rohan Basu Roy, and Devesh Tiwari. 2020. Experimental Evaluation of NISQ Quantum Computers: Error Measurement, Characterization, and Implications. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. 1–15. https://doi.org/10.1109/SC41405.2020.00050
[58] Tirthak Patel, Daniel Silver, and Devesh Tiwari. 2022. Geyser: A Compilation Framework for Quantum Computing with Neutral Atoms. In Proceedings of the 49th Annual International Symposium on Computer Architecture (New York, New York) (ISCA ’22). Association for Computing Machinery, New York, NY, USA, 383–395. https://doi.org/10.1145/3470496.3527428
[59] Tirthak Patel and Devesh Tiwari. 2020. DisQ: A Novel Quantum Output State Classification Method on IBM Quantum Computers Using Openpulse. In Proceedings of the 39th International Conference on Computer-Aided Design (Virtual Event, USA) (ICCAD ’20). Association
for Computing Machinery, New York, NY, USA, Article 139, 9 pages. https://doi.org/10.1145/3400302.3415619
[60] Tirthak Patel and Devesh Tiwari. 2020. VERITAS: Accurately Estimating the Correct Output on Noisy Intermediate-Scale Quantum Computers. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. 1–16. https://doi.org/10.1109/SC41405.2020.00019
[61] Tirthak Patel and Devesh Tiwari. 2021. Qraft: Reverse Your Quantum Circuit and Know the Correct Program Output. In Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (Virtual, USA) (ASPLOS ’21). Association for Computing Machinery, New York, NY, USA, 443–455. https://doi.org/10.1145/3445814.3446743
[62] Tirthak Patel, Ed Younis, Costin Iancu, Wibe de Jong, and Devesh Tiwari. 2022. Quest: systematically approximating quantum circuits for higher output fidelity. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. 514–528.
[63] Tianyi Peng, Aram W Harrow, Maris Ozols, and Xiaodi Wu. 2020. Simulating large quantum circuits on a small quantum computer. Physical Review Letters 125, 15 (2020), 150504.
[64] John Preskill. 2018. Quantum Computing in the NISQ era and beyond. Quantum 2 (aug 2018), 79. https://doi.org/10.22331/q-2018-08-06-79
[65] Qiskit contributors. 2023. Qiskit: An Open-source Framework for Quantum Computing. https://doi.org/10.5281/zenodo.2573505
[66] qiskit-transpiler [n. d.]. Qiskit Transpiler. https://qiskit.org/documentation/apidoc/transpiler.html. Accessed: 2022-06-09.
[67] Nils Quetschlich, Lukas Burgholzer, and Robert Wille. 2022. MQT Bench: Benchmarking software and design automation tools for quantum computing. arXiv preprint arXiv:2204.13719 (2022).
[68] N. Quetschlich, L. Burgholzer, and R. Wille. 2023. MQT Predictor: Automatic Device Selection with Device-Specific Circuit Compilation for Quantum Computing. arXiv:2305.02337
[69] Gokul Subramanian Ravi, Pranav Gokhale, Yi Ding, William Kirby, Kaitlin Smith, Jonathan M. Baker, Peter J. Love, Henry Hoffmann, Kenneth R. Brown, and Frederic T. Chong. 2022. CAFQA: A Classical Simulation Bootstrap for Variational Quantum Algorithms. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1 (Vancouver, BC, Canada) (ASPLOS 2023). Association for Computing Machinery, New York, NY, USA, 15–29. https://doi.org/10.1145/3567955.3567958
[70] Gokul Subramanian Ravi, Kaitlin Smith, Jonathan M. Baker, Tejas Kannan, Nathan Earnest, Ali Javadi-Abhari, Henry Hoffmann, and Frederic T. Chong. 2023. Navigating the Dynamic Noise Landscape of Variational Quantum Algorithms with QISMET. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (Vancouver, BC, Canada) (ASPLOS 2023). Association for Computing Machinery, New York, NY, USA, 515–529. https://doi.org/10.1145/3575693.3575739
[71] Gokul Subramanian Ravi, Kaitlin N. Smith, Pranav Gokhale, and Frederic T. Chong. 2021. Quantum Computing in the Cloud: Analyzing job and machine characteristics. In 2021 IEEE International Symposium on Workload Characterization (IISWC). 39–50. https://doi.org/10.1109/IISWC53511.2021.00015
[72] Gokul Subramanian Ravi, Kaitlin N. Smith, Pranav Gokhale, and Frederic T. Chong. 2022. Quantum Computing in the Cloud: Analyzing job and machine characteristics. Archive (2022). arXiv:2203.13121 [quant-ph]
[73] Gokul Subramanian Ravi, Kaitlin N. Smith, Prakash Murali, and Frederic T. Chong. 2021. Adaptive job and resource management for the growing quantum cloud. In 2021 IEEE International Conference on Quantum Computing and Engineering (QCE). 301–312. https://doi.org/10.1109/QCE52317.2021.00047
[74] Salonik Resch, Anthony Gutierrez, Joon Suk Huh, Srikant Bharadwaj, Yasuko Eckert, Gabriel Loh, Mark Oskin, and Swamit Tannu. 2021. Accelerating Variational Quantum Algorithms Using Circuit Concurrency. arXiv:2109.01714 [cs.ET]
[75] Movahhed Sadeghi, Soheil Khadirsharbiyani, and Mahmut Taylan Kandemir. 2022. Quantum Circuit Resizing. arXiv:2301.00720 [cs.ET]
[76] Marie Salm, Johanna Barzen, Frank Leymann, and Benjamin Weder. 2022. Prioritization of Compiled Quantum Circuits for Different Quantum Computers. In 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). 1258–1265. https://doi.org/10.1109/SANER53432.2022.00150
[77] schrondinger [n. d.]. Schrödinger’s cat. https://en.wikipedia.org/wiki/Schr%C3%B6dinger%27s_cat. Accessed: 2022-04-11.
[78] Yunong Shi, Nelson Leung, Pranav Gokhale, Zane Rossi, David I. Schuster, Henry Hoffmann, and Frederic T. Chong. 2019. Optimized Compilation of Aggregated Instructions for Realistic Quantum Computers. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (Providence, RI, USA) (ASPLOS ’19). Association for Computing Machinery, New York, NY, USA, 1031–1044. https://doi.org/10.1145/3297858.3304018
[79] Yunong Shi, Nelson Leung, Pranav Gokhale, Zane Rossi, David I. Schuster, Henry Hoffmann, and Frederic T. Chong. 2019. Optimized Compilation of Aggregated Instructions for Realistic Quantum Computers. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (Providence, RI, USA) (ASPLOS ’19). Association for Computing Machinery, New York, NY, USA, 1031–1044. https://doi.org/10.1145/3297858.3304018
[80] Irfan Siddiqi. 2021. Engineering high-coherence superconducting qubits. Nature Reviews Materials 6, 10 (2021), 875–891.
[81] Marcos Yukio Siraichi, Vinícius Fernandes dos Santos, Caroline Collange, and Fernando Magno Quintao Pereira. 2018. Qubit Allocation. In Proceedings of the 2018 International Symposium on Code Generation and Optimization (Vienna, Austria) (CGO 2018). Association for Computing Machinery, New York, NY, USA, 113–125. https://doi.org/10.1145/3168822
[82] Kaitlin N. Smith, Gokul Subramanian Ravi, Prakash Murali, Jonathan M. Baker, Nathan Earnest, Ali Javadi-Abhari, and Frederic T. Chong. 2022. TimeStitch: Exploiting Slack to Mitigate Decoherence in Quantum Circuits. ACM Transactions on Quantum Computing 4, 1, Article 8 (oct 2022), 27 pages. https://doi.org/10.1145/3548778
[83] Samuel Stein, Nathan Wiebe, Yufei Ding, Peng Bo, Karol Kowalski, Nathan Baker, James Ang, and Ang Li. 2022. EQC: Ensembled Quantum Computing for Variational Quantum Algorithms. In Proceedings of the 49th Annual International Symposium on Computer Architecture (New York, New York) (ISCA ’22). Association for Computing Machinery, New York, NY, USA, 59–71. https://doi.org/10.1145/3470496.3527434
[84] Wei Tang, Teague Tomesh, Martin Suchara, Jeffrey Larson, and Margaret Martonosi. 2021. CutQC: Using Small Quantum Computers for Large Quantum Circuit Evaluations. In Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (Virtual, USA) (ASPLOS ’21). Association for Computing Machinery, New York, NY, USA, 473–486. https://doi.org/10.1145/3445814.3446758
[85] Wei Tang, Teague Tomesh, Martin Suchara, Jeffrey Larson, and Margaret Martonosi. 2021. Cutqc: using small quantum computers for large quantum circuit evaluations. In Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. 473–486.
[86] Swamit S. Tannu and Moinuddin Qureshi. 2019. Ensemble of Diverse Mappings: Improving Reliability of Quantum Computers by Orchestrating Dissimilar Mistakes. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (Columbus, OH, USA) (MICRO ’52). Association for Computing Machinery, New York, NY, USA, 253–265. https://doi.org/10.1145/3352460.3358257
[87] Swamit S Tannu and Moinuddin K Qureshi. 2019. Mitigating measurement errors in quantum computers by exploiting state-dependent bias. In Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture. 279–290.
[88] Swamit S. Tannu and Moinuddin K. Qureshi. 2019. Not All Qubits Are Created Equal: A Case for Variability-Aware Policies for NISQ-Era Quantum Computers. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (Providence, RI, USA) (ASPLOS ’19). Association for Computing Machinery, New York, NY, USA, 987–999. https://doi.org/10.1145/3297858.3304007
[89] Teague Tomesh, Pranav Gokhale, Victory Omole, Gokul Subramanian Ravi, Kaitlin N Smith, Joshua Viszlai, Xin-Chuan Wu, Nikos Hardavellas, Margaret R Martonosi, and Frederic T Chong. 2022. Supermarq: A scalable quantum benchmark suite. In 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 587–603.
[90] Caroline Tornow, Naoki Kanazawa, William E. Shanks, and Daniel J. Egger. 2022. Minimum Quantum Run-Time Characterization and Calibration via Restless Measurements with Dynamic Repetition Rates. Phys. Rev. Appl. 17 (Jun 2022), 064061. Issue 6. https://doi.org/10.1103/PhysRevApplied.17.064061
[91] Vinay Tripathi, Huo Chen, Mostafa Khezri, Ka-Wa Yip, E.M. Levenson-Falk, and Daniel A. Lidar. 2022. Suppression of Crosstalk in Superconducting Qubits Using Dynamical Decoupling. Phys. Rev. Appl. 18 (Aug 2022), 024068. Issue 2. https://doi.org/10.1103/PhysRevApplied.18.024068
[92] Cenk Tüysüz, Giuseppe Clemente, Arianna Crippa, Tobias Hartung, Stefan Kühn, and Karl Jansen. 2023. Classical splitting of parametrized quantum circuits. Quantum Machine Intelligence 5, 2 (2023), 34.
[93] Robert Wille, Lukas Burgholzer, and Alwin Zulehner. 2019. Mapping Quantum Circuits to IBM QX Architectures Using the Minimal Number of SWAP and H Operations. In Proceedings of the 56th Annual Design Automation Conference 2019 (Las Vegas, NV, USA) (DAC ’19). Association for Computing Machinery, New York, NY, USA, Article 142, 6 pages. https://doi.org/10.1145/3316781.3317859
[94] Nicolas Wittler, Federico Roy, Kevin Pack, Max Werninghaus, Anurag Saha Roy, Daniel J. Egger, Stefan Filipp, Frank K. Wilhelm, and Shai Machnes. 2021. Integrated Tool Set for Control, Calibration, and Characterization of Quantum Devices Applied to Superconducting Qubits. Phys. Rev. Appl. 15 (Mar 2021), 034080. Issue 3. https://doi.org/10.1103/PhysRevApplied.15.034080
[95] Amanda Xu, Abtin Molavi, Lauren Pick, Swamit Tannu, and Aws Albarghouthi. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (jun 2023), 835–859. https://doi.org/10.1145/3591254
[96] Chi Zhang, Ari B. Hayes, Longfei Qiu, Yuwei Jin, Yanhao Chen, and Eddy Z. Zhang. 2021. Time-Optimal Qubit Mapping. In Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (Virtual, USA) (ASPLOS ’21). Association for Computing Machinery, New York, NY, USA, 360–374. https://doi.org/10.1145/3445814.3446706
[97] Alexander Zlokapa and Alexandru Gheorghiu. 2020. A deep learning model for noise prediction on near-term quantum devices. arXiv:2005.10811 [quant-ph]
[98] Alwin Zulehner, Alexandru Paler, and Robert Wille. 2019. An Efficient Methodology for Mapping Quantum Circuits to the IBM QX Architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 38, 7 (July 2019), 1226–1236. https://doi.org/10.1109/TCAD.2018.2846658
