Optimize `PreparePureStateD` #3048

Open

swernli wants to merge 1 commit into main from swernli/optimize-state-prep

Collaborator

swernli commented Mar 23, 2026

This change introduces a different structure for `Std.StatePreparation.PreparePureStateD` and its underlying helper functions. It reduces compile time by offloading more of the classical computation into static functions that run on the full evaluator, rather than handling that computation inside the partial-evaluation layer.

In local testing, this reduced the QIR compilation time of a large state preparation on 16 qubits from 14 seconds to 4 seconds.


Optmizie PreparePureStateD

558f13d

fedimser self-assigned this

swernli marked this pull request as ready for review

March 31, 2026 22:05

swernli requested a review from orpuente-MS as a code owner

March 31, 2026 22:05

swernli requested a review from fedimser

March 31, 2026 22:06

github-actions bot commented Mar 31, 2026

Change in memory usage detected by benchmark.

Memory Report for `a5fb96c`

| Test | This Branch | On Main | Difference |
|---|---|---|---|
| compile core + standard lib | 24572242 bytes | 24250814 bytes | 321428 bytes |

fedimser changed the title ~~Optmizie PreparePureStateD~~ Optimize PreparePureStateD

fedimser requested changes

Contributor

fedimser left a comment

I see two improvements: (1) moving the recursion from the quantum operation into a classical function, and (2) using doubles instead of complex numbers.

My guess is that most of the performance gain comes from (1).
Did you measure how much of the gain comes from (2), and whether it is significant? If it is not, it might be better to leave it out.

Alternatively, I'd split this into two PRs, one per optimization.

library/std/src/Std/StatePreparation.qs

Comment on lines +265 to +273

+                          let abs0 = AbsD(coefficients[idxCoeff]);
+                          let abs1 = AbsD(coefficients[idxCoeff + 1]);
+                          let arg0 = coefficients[idxCoeff] < 0.0 ? PI() | 0.0;
+                          let arg1 = coefficients[idxCoeff + 1] < 0.0 ? PI() | 0.0;
+                          let r = Sqrt(abs0 * abs0 + abs1 * abs1);
+                          let t = 0.5 * (arg0 + arg1);
+                          let phi = arg1 - arg0;
+                          let theta = 2.0 * ArcTan2(abs1, abs0);
+                          (ComplexPolar(r, t), phi, theta)

Contributor

fedimser Apr 1, 2026

I think this can be factored out into a function `BlochSphereCoordinatesD` that takes two `Double`s.
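A minimal sketch of what such a helper might look like, assuming it simply wraps the arithmetic from the commented lines. The name `BlochSphereCoordinatesD` comes from the suggestion above; the signature, parameter names, and return tuple are assumptions and not code from this PR.

```qsharp
import Std.Math.*;

/// Hypothetical helper: computes the disentangling parameters for a pair of
/// real amplitudes (a0, a1) without building intermediate complex numbers.
/// Returns (new coefficient in polar form, phi, theta).
function BlochSphereCoordinatesD(a0 : Double, a1 : Double) : (ComplexPolar, Double, Double) {
    let abs0 = AbsD(a0);
    let abs1 = AbsD(a1);
    // A negative real number has complex argument PI; a non-negative one has argument 0.
    let arg0 = a0 < 0.0 ? PI() | 0.0;
    let arg1 = a1 < 0.0 ? PI() | 0.0;
    // Norm of the pair, which becomes the magnitude of the new coefficient.
    let r = Sqrt(abs0 * abs0 + abs1 * abs1);
    // Mean phase of the pair, and the relative phase between its two entries.
    let t = 0.5 * (arg0 + arg1);
    let phi = arg1 - arg0;
    // Polar angle on the Bloch sphere.
    let theta = 2.0 * ArcTan2(abs1, abs0);
    (ComplexPolar(r, t), phi, theta)
}
```

Under these assumptions, the call site would reduce to `BlochSphereCoordinatesD(coefficients[idxCoeff], coefficients[idxCoeff + 1])`.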

library/std/src/Std/StatePreparation.qs

-                  let coefficientsAsComplexPolar = Mapped(a -> ComplexAsComplexPolar(Complex(a, 0.0)), coefficients);
-                  ApproximatelyPreparePureStateCP(0.0, coefficientsAsComplexPolar, qubits);
+                  let nQubits = Length(qubits);
+                  // pad coefficients at tail length to a power of 2.

Contributor

fedimser Apr 1, 2026

Do we have a Q# style guide, and what does it say about comments?

Generally, when I write comments, I write them as sentences, starting with a capital letter and ending with a period. I'd suggest following this pattern.

library/std/src/Std/StatePreparation.qs

+                  // Since we know the coefficients are real, we can optimize the first round of adjoint approximate unpreparation by directly
+                  // computing the disentangling angles and the new coefficients on those doubles without producing intermediate complex numbers.
+                  // For each 2D block, compute disentangling single-qubit rotation parameters

Contributor

fedimser Apr 1, 2026

What does "2D" stand for? My first thought without context is "2-dimensional", but it's probably something else.

library/std/src/Std/StatePreparation.qs

+                  }
               }
+              // Provides the sequence of angles or entangling CNOTs to apply for the multiplex Z step of the state preparation procedure, given a set of coefficients and control and target qubits.

Contributor

fedimser Apr 1, 2026

I think it's worth calling out that there is a convention for interpreting the output:

- Every element contains one or two qubits.
- If it contains one qubit, it is a rotation by the given angle.
- If it contains two qubits, it is a CNOT, and the angle is ignored.
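One way to make this convention explicit in code, sketched here as an assumption rather than as part of the PR, is to build each entry through a small named constructor. `RotationStep` and `CnotStep` are hypothetical names; the `(Double, Qubit[])` entry type and the `[control, target]` ordering are taken from how the diff consumes the list (`CNOT(qs[0], qs[1])`).

```qsharp
/// Hypothetical constructor for a rotation entry: exactly one qubit,
/// and the angle is meaningful.
function RotationStep(angle : Double, target : Qubit) : (Double, Qubit[]) {
    (angle, [target])
}

/// Hypothetical constructor for a CNOT entry: exactly two qubits, ordered
/// as [control, target]. The angle slot is a placeholder that the consumer ignores.
function CnotStep(control : Qubit, target : Qubit) : (Double, Qubit[]) {
    (0.0, [control, target])
}
```

With helpers like these, the convention would live in one place (their doc comments) instead of being implied by the length check at the point of use.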

library/std/src/Std/StatePreparation.qs

+              // Provides the sequence of angles or entangling CNOTs to apply for the multiplex Z step of the state preparation procedure, given a set of coefficients and control and target qubits.
+              function GenerateMultiplexZParams(
+                  tolerance : Double,

Contributor

fedimser Apr 1, 2026

The tolerance is not really used.
I'd suggest either not passing it, or passing it and using it in the "termination case" to avoid emitting the rotation gate.

library/std/src/Std/StatePreparation.qs

Comment on lines +420 to +432

+                  controlled adjoint (controlRegister, ...) {
+                      // pad coefficients length to a power of 2.
+                      let coefficientsPadded = Padded(2^(Length(control) + 1), 0.0, Padded(-2^Length(control), 0.0, coefficients));
+                      let (coefficients0, coefficients1) = MultiplexZCoefficients(coefficientsPadded);
+                      if AnyOutsideToleranceD(tolerance, coefficients1) {
+                          within {
+                              Controlled X(controlRegister, target);
+                          } apply {
+                              Adjoint ApproximatelyMultiplexZ(tolerance, coefficients1, control, target);
+                          }
+                      }
+                      Adjoint ApproximatelyMultiplexZ(tolerance, coefficients0, control, target);
+                  }

Contributor

fedimser Apr 1, 2026

I am not sure why we need an explicit controlled adjoint specialization here and why it can't be auto-generated.

library/std/src/Std/StatePreparation.qs

Comment on lines +380 to 404

+                  body ... {
+                      // We separately compute the operation sequence for the multiplex Z steps in a function, which
+                      // provides a performance improvement during partial-evaluation for code generation.
+                      let multiplexZParams = GenerateMultiplexZParams(tolerance, coefficients, control, target);
+                      for (angle, qs) in multiplexZParams {
+                          if Length(qs) == 2 {
+                              CNOT(qs[0], qs[1]);
+                          } elif AbsD(angle) > tolerance {
+                              Exp([PauliZ], angle, qs);
                           }
-                      } else {
-                          // Compute new coefficients.
-                          let (coefficients0, coefficients1) = MultiplexZCoefficients(coefficientsPadded);
-                          ApproximatelyMultiplexZ(tolerance, coefficients0, Most(control), target);
-                          if AnyOutsideToleranceD(tolerance, coefficients1) {
-                              within {
-                                  CNOT(Tail(control), target);
-                              } apply {
-                                  ApproximatelyMultiplexZ(tolerance, coefficients1, Most(control), target);
-                              }
+                      }
+                  }
+                  adjoint ... {
+                      // We separately compute the operation sequence for the adjoint multiplex Z steps in a function, which
+                      // provides a performance improvement during partial-evaluation for code generation.
+                      let adjMultiplexZParams = GenerateAdjMultiplexZParams(tolerance, coefficients, control, target);
+                      for (angle, qs) in adjMultiplexZParams {
+                          if Length(qs) == 2 {
+                              CNOT(qs[0], qs[1]);
+                          } elif AbsD(angle) > tolerance {
+                              Exp([PauliZ], -angle, qs);
                           }
                       }
                   }

Contributor

fedimser Apr 1, 2026

I think we can move the tolerance check and the angle negation for the adjoint case into the classical functions.

Then both specializations share exactly the same quantum part, which can be factored into a separate operation to reduce code duplication.

operation ApproximatelyMultiplexZ(tolerance : Double, coefficients : Double[], control : Qubit[], target : Qubit) : Unit is Adj {
    // Controlled specializations are omitted in this sketch.
    body ... {
        ApplyRzCnots(GenerateMultiplexZParams(tolerance, coefficients, control, target));
    }
    adjoint ... {
        ApplyRzCnots(GenerateAdjMultiplexZParams(tolerance, coefficients, control, target));
    }
}

and then:

operation ApplyRzCnots(params : (Double, Qubit[])[]) : Unit {
    for (angle, qs) in params {
        // Two qubits encode a CNOT (the angle is ignored); one qubit encodes a Z rotation.
        if Length(qs) == 2 {
            CNOT(qs[0], qs[1]);
        } else {
            Exp([PauliZ], angle, qs);
        }
    }
}
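For the classical side of this suggestion, a rough sketch of a post-processing function is shown below. `FinalizedParams` and its `negate` flag are hypothetical and not part of the PR; the sketch assumes the generator functions keep returning `(Double, Qubit[])` entries and that the tolerance test matches the one in the current `body` and `adjoint` blocks.

```qsharp
import Std.Math.AbsD;

/// Hypothetical classical post-processing step: drops rotations whose angle
/// is within tolerance of zero and, for the adjoint case, negates the rest.
function FinalizedParams(tolerance : Double, negate : Bool, steps : (Double, Qubit[])[]) : (Double, Qubit[])[] {
    mutable kept = [];
    for (angle, qs) in steps {
        if Length(qs) == 2 {
            // CNOT entries pass through unchanged; their angle is ignored anyway.
            set kept += [(angle, qs)];
        } elif AbsD(angle) > tolerance {
            // Keep only non-negligible rotations, flipping the sign for the adjoint.
            set kept += [(negate ? -angle | angle, qs)];
        }
    }
    kept
}
```

If the generator functions applied a step like this before returning, the quantum operation would need neither the tolerance comparison nor the sign flip, and the same loop could serve both specializations.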

library/std/src/Std/StatePreparation.qs

Comment on lines +381 to +382

		// We separately compute the operation sequence for the multiplex Z steps in a function, which
		// provides a performance improvement during partial-evaluation for code generation.

Contributor

fedimser Apr 1, 2026

I think this is too much information for the reader, especially if they don't know what partial evaluation is; it is an implementation detail of the compiler.

I'd just call out that recursion in a quantum operation is more expensive than in a classical function.

Or just don't have any comment here; it's obvious that we separate quantum and classical computation.
