Optimize PreparePureStateD #3048

Open
swernli wants to merge 1 commit into main from swernli/optimize-state-prep

@swernli (Collaborator) commented Mar 23, 2026

This change introduces a different structure for `Std.StatePreparation.PreparePureStateD` and its underlying helper functions. It reduces compile time by offloading more of the classical computation into static functions that run on the full evaluator rather than inside the partial-evaluation layer.

In local testing, this reduced QIR compilation time for a large state preparation on 16 qubits from 14 seconds to 4 seconds.

@fedimser self-assigned this Mar 30, 2026
@swernli marked this pull request as ready for review March 31, 2026 22:05
@swernli requested a review from orpuente-MS as a code owner March 31, 2026 22:05
@swernli requested a review from fedimser March 31, 2026 22:06
@github-actions

Change in memory usage detected by benchmark.

Memory Report for a5fb96c

| Test | This Branch | On Main | Difference |
|---|---|---|---|
| compile core + standard lib | 24572242 bytes | 24250814 bytes | 321428 bytes |

@fedimser changed the title from "Optmizie PreparePureStateD" to "Optimize PreparePureStateD" Mar 31, 2026
@fedimser (Contributor) left a comment

I see 2 improvements: 1) moving recursion from a quantum operation to a classical function, and 2) using doubles instead of complex numbers.

My guess is that most of the performance gain comes from (1).
Did you evaluate how much of the gain comes from (2) and whether it's significant? If it's not significant, it might be better to leave it out.

Alternatively, I'd split this into 2 PRs, one per optimization.

Comment on lines +265 to +273
let abs0 = AbsD(coefficients[idxCoeff]);
let abs1 = AbsD(coefficients[idxCoeff + 1]);
let arg0 = coefficients[idxCoeff] < 0.0 ? PI() | 0.0;
let arg1 = coefficients[idxCoeff + 1] < 0.0 ? PI() | 0.0;
let r = Sqrt(abs0 * abs0 + abs1 * abs1);
let t = 0.5 * (arg0 + arg1);
let phi = arg1 - arg0;
let theta = 2.0 * ArcTan2(abs1, abs0);
(ComplexPolar(r, t), phi, theta)

I think this can be factored into a function BlochSphereCoordinatesD taking 2 Doubles.
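
The factoring suggested here can be sketched outside Q#. Below is a minimal Python model of the arithmetic in the highlighted lines. The name `bloch_sphere_coordinates_d` and the flat four-value return are assumptions made for illustration (the Q# snippet returns `(ComplexPolar(r, t), phi, theta)`); this is not the PR's actual helper.

```python
import math


def bloch_sphere_coordinates_d(a0: float, a1: float) -> tuple[float, float, float, float]:
    """Model of the real-valued Bloch sphere computation for one coefficient pair.

    Hypothetical helper: returns (r, t, phi, theta), where (r, t) plays the
    role of ComplexPolar(r, t) in the Q# snippet.
    """
    abs0, abs1 = abs(a0), abs(a1)
    # A real number has phase 0 when non-negative and pi when negative.
    arg0 = math.pi if a0 < 0.0 else 0.0
    arg1 = math.pi if a1 < 0.0 else 0.0
    r = math.sqrt(abs0 * abs0 + abs1 * abs1)
    t = 0.5 * (arg0 + arg1)
    phi = arg1 - arg0
    theta = 2.0 * math.atan2(abs1, abs0)
    return r, t, phi, theta
```

Because both inputs are real, the phases reduce to a sign test, which is why no intermediate complex values are needed.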

let coefficientsAsComplexPolar = Mapped(a -> ComplexAsComplexPolar(Complex(a, 0.0)), coefficients);
ApproximatelyPreparePureStateCP(0.0, coefficientsAsComplexPolar, qubits);
let nQubits = Length(qubits);
// pad coefficients at tail length to a power of 2.

Do we have a Q# style guide, and what does it say about comments?

Generally, when I write comments, I write them as sentences, starting with a capital letter and ending with a period. I'd suggest following this pattern.

// Since we know the coefficients are real, we can optimize the first round of adjoint approximate unpreparation by directly
// computing the disentangling angles and the new coefficients on those doubles without producing intermediate complex numbers.

// For each 2D block, compute disentangling single-qubit rotation parameters

What does "2D" stand for? My first thought without context is "2-dimensional", but it's probably something else.

}
}

// Provides the sequence of angles or entangling CNOTs to apply for the multiplex Z step of the state preparation procedure, given a set of coefficients and control and target qubits.

I think it's worth calling out that there is a convention for interpreting the output:

  • Every element has 1 or 2 qubits.
  • If it has 1 qubit, it's a rotation.
  • If it has 2 qubits, it's a CNOT, and the angle is ignored.
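
The convention listed above can be sketched as a small decoder. This is a Python illustration, not Q#: qubits are modeled as integer labels, and `describe_step` and the gate labels `ExpZ` and `CNOT` are hypothetical names chosen for this example.

```python
def describe_step(angle: float, qubits: list[int]) -> tuple:
    """Interpret one (angle, qubits) element under the convention above."""
    if len(qubits) == 2:
        # Two qubits: a CNOT from qubits[0] to qubits[1]; the angle is ignored.
        return ("CNOT", qubits[0], qubits[1])
    if len(qubits) == 1:
        # One qubit: a Z rotation by the given angle on that qubit.
        return ("ExpZ", angle, qubits[0])
    raise ValueError("each element must name 1 or 2 qubits")


# Example sequence: rotation, CNOT, rotation.
steps = [(0.25, [2]), (0.0, [0, 2]), (-0.25, [2])]
decoded = [describe_step(angle, qs) for angle, qs in steps]
```

Rejecting any other qubit count makes the implicit contract explicit instead of silently treating unexpected elements as rotations.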


// Provides the sequence of angles or entangling CNOTs to apply for the multiplex Z step of the state preparation procedure, given a set of coefficients and control and target qubits.
function GenerateMultiplexZParams(
tolerance : Double,

Tolerance is not really used.
I'd suggest either not passing it, or passing it and using it in the "termination case" to avoid emitting the rotation gate.
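
One way to read the second option is to drop negligible rotations in the classical step so the quantum side never sees them. The Python sketch below works under that assumption; `prune_small_rotations` is a hypothetical name, and qubits are modeled as integer labels.

```python
def prune_small_rotations(params, tolerance):
    """Keep every CNOT, and keep a rotation only if |angle| exceeds tolerance."""
    kept = []
    for angle, qubits in params:
        is_cnot = len(qubits) == 2
        if is_cnot or abs(angle) > tolerance:
            kept.append((angle, qubits))
    return kept
```

CNOTs are always kept because their angle carries no meaning under the output convention.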

Comment on lines +420 to +432
controlled adjoint (controlRegister, ...) {
    // pad coefficients length to a power of 2.
    let coefficientsPadded = Padded(2^(Length(control) + 1), 0.0, Padded(-2^Length(control), 0.0, coefficients));
    let (coefficients0, coefficients1) = MultiplexZCoefficients(coefficientsPadded);
    if AnyOutsideToleranceD(tolerance, coefficients1) {
        within {
            Controlled X(controlRegister, target);
        } apply {
            Adjoint ApproximatelyMultiplexZ(tolerance, coefficients1, control, target);
        }
    }
    Adjoint ApproximatelyMultiplexZ(tolerance, coefficients0, control, target);
}
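
The nested `Padded` call in this snippet is easier to follow with a small model. The Python sketch below assumes that a negative target length pads at the tail and a positive one pads at the head, which is how Q#'s `Padded` is documented; `padded` here is a hypothetical stand-in, not the library function.

```python
def padded(n_total: int, default, values: list) -> list:
    """Model of Padded: negative n_total pads the tail, positive pads the head."""
    missing = abs(n_total) - len(values)
    if missing < 0:
        raise ValueError("input is longer than the requested length")
    fill = [default] * missing
    return values + fill if n_total < 0 else fill + values


n_control = 2
coefficients = [0.1, 0.2, 0.3]
inner = padded(-(2 ** n_control), 0.0, coefficients)   # tail-pad to length 4
outer = padded(2 ** (n_control + 1), 0.0, inner)       # head-pad to length 8
```

Under that assumption, the original coefficients end up in the second half of the padded array and the first half is all zeros.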

I am not sure why we need an explicit controlled adjoint specialization here, or why it can't be auto-generated.

Comment on lines +380 to 404
body ... {
// We separately compute the operation sequence for the multiplex Z steps in a function, which
// provides a performance improvement during partial-evaluation for code generation.
let multiplexZParams = GenerateMultiplexZParams(tolerance, coefficients, control, target);
for (angle, qs) in multiplexZParams {
if Length(qs) == 2 {
CNOT(qs[0], qs[1]);
} elif AbsD(angle) > tolerance {
Exp([PauliZ], angle, qs);
}
} else {
// Compute new coefficients.
let (coefficients0, coefficients1) = MultiplexZCoefficients(coefficientsPadded);
ApproximatelyMultiplexZ(tolerance, coefficients0, Most(control), target);
if AnyOutsideToleranceD(tolerance, coefficients1) {
within {
CNOT(Tail(control), target);
} apply {
ApproximatelyMultiplexZ(tolerance, coefficients1, Most(control), target);
}
}
}

adjoint ... {
// We separately compute the operation sequence for the adjoint multiplex Z steps in a function, which
// provides a performance improvement during partial-evaluation for code generation.
let adjMultiplexZParams = GenerateAdjMultiplexZParams(tolerance, coefficients, control, target);
for (angle, qs) in adjMultiplexZParams {
if Length(qs) == 2 {
CNOT(qs[0], qs[1]);
} elif AbsD(angle) > tolerance {
Exp([PauliZ], -angle, qs);
}
}
}

I think we can move the tolerance check and the angle negation for the adjoint case into classical functions.

Then you have exactly the same quantum part in both cases, which can be factored into a separate operation to reduce code duplication.

operation ApproximatelyMultiplexZ(tolerance, coefficients, control, target) {
   body ... {
      ApplyRzCnots(GenerateMultiplexZParams(tolerance, coefficients, control, target));
   }
   adjoint ... {
      ApplyRzCnots(GenerateAdjMultiplexZParams(tolerance, coefficients, control, target));
   }
}

and then:

operation ApplyRzCnots(params : (Double, Qubit[])[]) : Unit {
    for (angle, qs) in params {
        if Length(qs) == 2 {
            CNOT(qs[0], qs[1]);
        } else {
            Exp([PauliZ], angle, qs);
        }
    }
}
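
For the classical side of this suggestion, a general fact may help: the adjoint of a gate sequence is the reversed sequence of adjoints, the adjoint of a Z rotation negates its angle, and CNOT is its own adjoint. The Python sketch below applies that to the `(angle, qubits)` list. It is an illustration only; `adjoint_params` is a hypothetical name, and the PR's `GenerateAdjMultiplexZParams` may construct its output differently.

```python
def adjoint_params(params):
    """Reverse the sequence and negate rotation angles; CNOTs are self-adjoint."""
    result = []
    for angle, qubits in reversed(params):
        if len(qubits) == 2:
            # CNOT: the angle is ignored, so keep the element unchanged.
            result.append((angle, qubits))
        else:
            result.append((-angle, qubits))
    return result
```

With the negation done classically, a single applier operation can serve both the body and the adjoint.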

Comment on lines +381 to +382
// We separately compute the operation sequence for the multiplex Z steps in a function, which
// provides a performance improvement during partial-evaluation for code generation.

I think this is too much information for the reader, especially if they don't know what partial evaluation is, which is an implementation detail of the compiler.

I'd just call out that recursion in a quantum operation is more expensive than in a classical function.

Or just don't have any comment here; it's obvious that we separate quantum and classical computation.
