# Hardware-aware Synthesis in the Classiq Platform

### Introduction

In classical computing, physical aspects of memory management are critical for software performance. Most algorithm developers, however, do not bother themselves with such considerations and focus on simple programming. Such simplicity is allowed because industry-standard optimized libraries and automated tools carry the burden sufficiently for most use cases. It is, therefore, desirable for quantum algorithm development to benefit similarly. While quantum computers are still in the early stages, we can begin to apply analogous thinking. How? By considering the following two ingredients: quantum auxiliary (ancilla) qubits, which temporarily store intermediate computations, and the connectivity between qubits, a physical property of the memory. This blog post explains how they relate and explore this relation using Classiq’s platform to automate optimization.

### Overview

When Dmitri Maslov proposed using relative phase Toffoli gates to construct a new implementation for multi-controlled quantum gates, they showed the benefit of their method by comparing logical-level metrics against other implementations. The paper, however, states that these metrics do not consider “the connectivity pattern of the qubits. Indeed, physical space spans only three dimensions, and every qubit cannot be connected to every other qubit in a scalable fashion within a finite-dimensional space”. Maslov, an expert in quantum algorithm design, certainly recognizes the requirements imposed by the hardware. Here, we will use Classiq’s platform to explore Maslov’s remark and present an example of the effect of qubit connectivity on multi-controlled gates.

We will briefly explain multi-controlled operations and discuss the Classiq library's implementations. Next, we dig into hardware connectivity and its constraints on designing quantum circuits at the gate level. Finally, we demonstrate Classiq’s synthesis engine capabilities to select different implementations according to the hardware connectivity. This adjustment is an example of quantum circuit co-design, where low-level information from the hardware determines the circuit design at the logical level.

### Multi-controlled Quantum Operations

Multi-controlled quantum operations serve as a building block for canonical quantum algorithms, such as Grover’s and arithmetic operations. For simplicity, we will focus on multi-controlled NOT operations (MCX). Every multi-controlled arbitrary quantum operation can indeed be written using MCX gates and a single-controlled quantum operation (try this as a simple exercise).

Apart from Maslov’s implementation, different MCX gates have been proposed throughout the years and can be found in the literature and open-source libraries like Qiskit and TKet. A common pattern among these is the trade-off between the number of auxiliary qubits vs. the circuit depth and controlled not (CX) gate count (Rare examples require exponentially precise gates instead (see here), but will be avoided here for simplicity).

What are auxiliary qubits? They serve as temporary memory storage in quantum computation - they are allocated at the beginning and returned to their initial state (uncomputed) at the end. Auxiliary qubits can be necessary if they reduce other computational resources. For example, the circuit’s depth and CX gate count affect the program quality due to the resulting loss of fidelity, especially in noisy quantum devices.

Classiq’s library combines different implementations (this deserves a separate blog post) to construct a series that plays on this trade-off. The synthesis engine selects the implementation according to the constraints provided by the user.

### Hardware Connectivity

When designing a classical program, we usually don’t consider the physical device the software runs on in our design. This part is taken care of behind the scenes by automated intermediate and low-level optimizations and further tuned only by experts. An important aspect is the overhead due to physical communication between the processor (CPU) and the different memory devices: on-chip memory (cache) is smaller but closer to the CPU, whereas the main memory, the random access memory (RAM), is farther. Therefore, cache utilization is critical in determining the program's performance and sometimes carries more weight than minimizing the number of computational operations. Examples that tackle this problem are algorithmic implementations for matrix operations, compilation passes like loop blocking and unrolling, and considerations for allocating memory on the stack vs. on the heap.

Can we find a quantum analog? While quantum computers are premature for complex memory architectures, we can view the auxiliary qubits as memory. To explain this analogy further, we need to discuss the concept of routing as part of the compilation process.

In many quantum computing architectures (like superconducting qubits), the qubits are not directly connected. For example, the connectivity diagram of an IBM Eagle processor extracted from Classiq’s platform is shown below. This diagram is formed by vertically stacking lines of 14-15 qubits and connecting them via alternating 4-qubit layers.

Now, suppose that the logical circuit gates (for example, CX) are between two non-neighboring qubits, like the two marked in blue. To perform the gate operation, we introduce SWAP gates along a selected path that connects these qubits to route the information to the target qubit.

The SWAP gate operation is expensive: every SWAP gate accounts for three CX gates! Hence, optimizing the routing stage for a given logical circuit is extremely important for performance. Unfortunately, routing optimization is an NP-hard problem, so we cannot compute the number of CX gates in the compiled circuit in advance. Some heuristical routing algorithms have been developed and are available in the literature and open-source libraries.

### Hardware-aware Synthesis with Classiq

Now, we can ask the following question: due to routing, can we have diminishing returns using auxiliary qubits? More precisely, given an implementation of an MCX gate with auxiliaries that aims at reducing the CX gate count as much as possible, will this implementation still be considered the most optimal for hardware that is not fully connected, or will the routing overhead make auxiliary usage redundant?

Let’s ask Classiq’s synthesis engine to construct a 30-control qubit MCX gate to answer this question. We tune the basis gates to CX and U and optimize the CX gate count. You can try it yourself:

- First, we synthesize for all-to-all connectivity.

As expected, the engine chose a Maslov-like implementation with the maximal number of auxiliary qubits. The resulting circuit requires 45 qubits: 31 functional qubits, split into 30 control qubits and a single target qubit, and 14 auxiliary qubits. - Then, we generate the same MCX gate for a circuit with a 1-D nearest neighbor connectivity pattern of 50 qubits.

The resulting logical-level circuit now uses 42 qubits only. Indeed, using too many auxiliaries results in routing overhead bigger than the gain obtained from temporarily storing the quantum computations.

Note 1: for the second case, the 42 qubits represent the logical-level circuit. The number of qubits might increase after applying the routing algorithm. The critical distinction here is that the *logical-level implementation* is changed, resulting in a better circuit when using Classiq.

Note 2: This is not the last word, and we might find better implementations in the future. In addition, other considerations, such as the gate-level optimizations applied during the process, can also affect the final circuit. Nevertheless, this example highlights the diminishing returns behavior of adding too many auxiliaries, which is addressed using Classiq’s automated synthesis engine and is hard to keep track of naively.

As a challenge, you can try reproducing this behavior using real hardware connectivity instead of 1-D nearest neighbor. It’s not easy to find such an example for currently available architectures. Still, it will likely be more relevant the more complex they become, especially when considering multi-core architectures. Can you think of other quantum building blocks exhibiting qubit count vs. CX count trade-off?

### Conclusion

When designing quantum algorithms, including the hardware properties in performance considerations is important. Using Classiq’s synthesis engine allows developers to channel their efforts into the algorithmic structure of the model rather than tackling the intricacies of gate-level optimizations. The need for automated tools like Classiq will only increase as quantum circuits evolve to commercial-grade complexity and include many building blocks, each exhibiting hardware dependent trade-offs. For an overview of Classiq’s synthesis engine, see parts I and II.

As a final remark, qubit connectivity is just one of many possible constraints that can inspire the use of codesign for performance improvements. For example, one can adjust CX layers in a circuit to perform better error mitigation (see here). Other use cases might consider the limitations in classical control over the qubits.

### Introduction

In classical computing, physical aspects of memory management are critical for software performance. Most algorithm developers, however, do not bother themselves with such considerations and focus on simple programming. Such simplicity is allowed because industry-standard optimized libraries and automated tools carry the burden sufficiently for most use cases. It is, therefore, desirable for quantum algorithm development to benefit similarly. While quantum computers are still in the early stages, we can begin to apply analogous thinking. How? By considering the following two ingredients: quantum auxiliary (ancilla) qubits, which temporarily store intermediate computations, and the connectivity between qubits, a physical property of the memory. This blog post explains how they relate and explore this relation using Classiq’s platform to automate optimization.

### Overview

When Dmitri Maslov proposed using relative phase Toffoli gates to construct a new implementation for multi-controlled quantum gates, they showed the benefit of their method by comparing logical-level metrics against other implementations. The paper, however, states that these metrics do not consider “the connectivity pattern of the qubits. Indeed, physical space spans only three dimensions, and every qubit cannot be connected to every other qubit in a scalable fashion within a finite-dimensional space”. Maslov, an expert in quantum algorithm design, certainly recognizes the requirements imposed by the hardware. Here, we will use Classiq’s platform to explore Maslov’s remark and present an example of the effect of qubit connectivity on multi-controlled gates.

We will briefly explain multi-controlled operations and discuss the Classiq library's implementations. Next, we dig into hardware connectivity and its constraints on designing quantum circuits at the gate level. Finally, we demonstrate Classiq’s synthesis engine capabilities to select different implementations according to the hardware connectivity. This adjustment is an example of quantum circuit co-design, where low-level information from the hardware determines the circuit design at the logical level.

### Multi-controlled Quantum Operations

Multi-controlled quantum operations serve as a building block for canonical quantum algorithms, such as Grover’s and arithmetic operations. For simplicity, we will focus on multi-controlled NOT operations (MCX). Every multi-controlled arbitrary quantum operation can indeed be written using MCX gates and a single-controlled quantum operation (try this as a simple exercise).

Apart from Maslov’s implementation, different MCX gates have been proposed throughout the years and can be found in the literature and open-source libraries like Qiskit and TKet. A common pattern among these is the trade-off between the number of auxiliary qubits vs. the circuit depth and controlled not (CX) gate count (Rare examples require exponentially precise gates instead (see here), but will be avoided here for simplicity).

What are auxiliary qubits? They serve as temporary memory storage in quantum computation - they are allocated at the beginning and returned to their initial state (uncomputed) at the end. Auxiliary qubits can be necessary if they reduce other computational resources. For example, the circuit’s depth and CX gate count affect the program quality due to the resulting loss of fidelity, especially in noisy quantum devices.

Classiq’s library combines different implementations (this deserves a separate blog post) to construct a series that plays on this trade-off. The synthesis engine selects the implementation according to the constraints provided by the user.

### Hardware Connectivity

When designing a classical program, we usually don’t consider the physical device the software runs on in our design. This part is taken care of behind the scenes by automated intermediate and low-level optimizations and further tuned only by experts. An important aspect is the overhead due to physical communication between the processor (CPU) and the different memory devices: on-chip memory (cache) is smaller but closer to the CPU, whereas the main memory, the random access memory (RAM), is farther. Therefore, cache utilization is critical in determining the program's performance and sometimes carries more weight than minimizing the number of computational operations. Examples that tackle this problem are algorithmic implementations for matrix operations, compilation passes like loop blocking and unrolling, and considerations for allocating memory on the stack vs. on the heap.

Can we find a quantum analog? While quantum computers are premature for complex memory architectures, we can view the auxiliary qubits as memory. To explain this analogy further, we need to discuss the concept of routing as part of the compilation process.

In many quantum computing architectures (like superconducting qubits), the qubits are not directly connected. For example, the connectivity diagram of an IBM Eagle processor extracted from Classiq’s platform is shown below. This diagram is formed by vertically stacking lines of 14-15 qubits and connecting them via alternating 4-qubit layers.

Now, suppose that the logical circuit gates (for example, CX) are between two non-neighboring qubits, like the two marked in blue. To perform the gate operation, we introduce SWAP gates along a selected path that connects these qubits to route the information to the target qubit.

The SWAP gate operation is expensive: every SWAP gate accounts for three CX gates! Hence, optimizing the routing stage for a given logical circuit is extremely important for performance. Unfortunately, routing optimization is an NP-hard problem, so we cannot compute the number of CX gates in the compiled circuit in advance. Some heuristical routing algorithms have been developed and are available in the literature and open-source libraries.

### Hardware-aware Synthesis with Classiq

Now, we can ask the following question: due to routing, can we have diminishing returns using auxiliary qubits? More precisely, given an implementation of an MCX gate with auxiliaries that aims at reducing the CX gate count as much as possible, will this implementation still be considered the most optimal for hardware that is not fully connected, or will the routing overhead make auxiliary usage redundant?

Let’s ask Classiq’s synthesis engine to construct a 30-control qubit MCX gate to answer this question. We tune the basis gates to CX and U and optimize the CX gate count. You can try it yourself:

- First, we synthesize for all-to-all connectivity.

As expected, the engine chose a Maslov-like implementation with the maximal number of auxiliary qubits. The resulting circuit requires 45 qubits: 31 functional qubits, split into 30 control qubits and a single target qubit, and 14 auxiliary qubits. - Then, we generate the same MCX gate for a circuit with a 1-D nearest neighbor connectivity pattern of 50 qubits.

The resulting logical-level circuit now uses 42 qubits only. Indeed, using too many auxiliaries results in routing overhead bigger than the gain obtained from temporarily storing the quantum computations.

Note 1: for the second case, the 42 qubits represent the logical-level circuit. The number of qubits might increase after applying the routing algorithm. The critical distinction here is that the *logical-level implementation* is changed, resulting in a better circuit when using Classiq.

Note 2: This is not the last word, and we might find better implementations in the future. In addition, other considerations, such as the gate-level optimizations applied during the process, can also affect the final circuit. Nevertheless, this example highlights the diminishing returns behavior of adding too many auxiliaries, which is addressed using Classiq’s automated synthesis engine and is hard to keep track of naively.

As a challenge, you can try reproducing this behavior using real hardware connectivity instead of 1-D nearest neighbor. It’s not easy to find such an example for currently available architectures. Still, it will likely be more relevant the more complex they become, especially when considering multi-core architectures. Can you think of other quantum building blocks exhibiting qubit count vs. CX count trade-off?

### Conclusion

When designing quantum algorithms, including the hardware properties in performance considerations is important. Using Classiq’s synthesis engine allows developers to channel their efforts into the algorithmic structure of the model rather than tackling the intricacies of gate-level optimizations. The need for automated tools like Classiq will only increase as quantum circuits evolve to commercial-grade complexity and include many building blocks, each exhibiting hardware dependent trade-offs. For an overview of Classiq’s synthesis engine, see parts I and II.

As a final remark, qubit connectivity is just one of many possible constraints that can inspire the use of codesign for performance improvements. For example, one can adjust CX layers in a circuit to perform better error mitigation (see here). Other use cases might consider the limitations in classical control over the qubits.

## About "The Qubit Guy's Podcast"

Hosted by The Qubit Guy (Yuval Boger, our Chief Marketing Officer), the podcast hosts thought leaders in quantum computing to discuss business and technical questions that impact the quantum computing ecosystem. Our guests provide interesting insights about quantum computer software and algorithm, quantum computer hardware, key applications for quantum computing, market studies of the quantum industry and more.

If you would like to suggest a guest for the podcast, please contact us.