I had problems with other data and models, too. I just tested that this change also removes the problems in two cases using Bernoulli and beta-binomial data models.
SteveBronder left a comment:
I'm going to make a few changes, but this is good!
```cpp
// When the Wolfe line search rejects, don't immediately terminate.
// Instead, let the Newton loop try at least one more iteration.
// The original code compared the stale curr.obj() (which equalled
// prev.obj() after the swap in update_next_step) and would always
// terminate on ANY Wolfe rejection — even on the very first Newton
// step. Now we only declare search_failed if the full Newton step
// itself didn't improve the objective.
bool search_failed;
if (!state.wolfe_status.accept_) {
  if (full_newton_obj > state.prev().obj()) {
    // The full Newton step (evaluated before Wolfe ran) improved
    // the objective. Re-evaluate scratch at the full Newton step
    // so we can accept it as the current iterate.
    scratch.eval_.alpha() = 1.0;
    update_fun(scratch, state.curr(), state.prev(), scratch.eval_,
               state.wolfe_info.p_);
    state.curr().update(scratch);
    state.wolfe_status.accept_ = true;
    search_failed = false;
  } else {
    search_failed = true;
  }
} else {
  search_failed = false;
}
```
I get what this is trying to do. It caught the main issue, that curr can be stale, which it called out but then did not do anything about. I was being memory greedy / too clever here and reusing the memory in curr in a few places I should not have.

Though I don't disagree with what it is doing here. If we are in a loop with a smallish step size and a problem with a weird geometry, it can be worth yolo'ing a large step size to try to jump out of it. The code actually used to have something like this in it, but at the time I dismissed it. We also test alpha = 1 a few lines above and should be reusing those results here instead of rerunning `update_fun`. `update_fun` checks whether the objective or gradient is NaN and reduces the step size from 1 until we get a finite objective and gradient.

I'm going to modify this, but will keep the backup Newton step check.
Summary
`laplace_marginal` was failing with a normal-normal model, in which case Newton should reach the mode in one step.
This fix was made with Claude's assistance; the prompt was based on a previous detailed analysis of the behavior of the Stan model included below.
All code changes are by Claude. Below I have also included Claude's summary of the fixes. I have looked at the changes and summaries and they make sense. All existing Stan Math tests for `laplace_marginal` pass, and my Stan model example now works well, too. The code should still be carefully checked.
Checklist
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

- the basic tests are passing: `./runTests.py test/unit`, `make test-headers`, `make test-math-dependencies`, `make doxygen`, `make cpplint`
- the code is written in idiomatic C++ and changes are documented in the doxygen
- the new changes are tested
Previously problematic model and data
R code and data
Stan code
Claude's summary of the fixes
Bug Report: `laplace_marginal_tol()` Erratically Fails to Find the Mode
Symptom
When `laplace_marginal_tol()` is called with a user-supplied non-zero `theta_init`, the Newton optimizer sometimes stays near the initial values, sometimes converges to `[0, 0]`, and only sometimes finds the mode. This happens across all three solvers (1, 2, 3) and is not related to the Laplace equations themselves: when the mode is found, the marginal density matches the analytic result.
Root Cause
The invariant `theta = Sigma * a` is violated at initialization.

The Newton solver operates in the `a`-parameterization, where `theta = Sigma * a`, and maximizes the objective. In `wolfe_line_search.hpp`, the `WolfeData` constructor (line ~449) unconditionally sets `a = 0` regardless of the user-supplied `theta_init`. When `theta_init != 0` and `a = 0`, the initial objective contains only the likelihood term: the prior penalty term `-0.5 * theta' * Sigma^{-1} * theta` (which equals `-0.5 * a' * theta` when `a = Sigma^{-1} * theta`) is entirely missing. The initial objective is therefore artificially inflated by `0.5 * theta_init' * Sigma^{-1} * theta_init`.
Why This Causes Three Failure Modes
Mode A ("Stays near initial values"): When `theta_init` is near the true mode, the first Newton step proposes a consistent `(a_new, theta_new)` with `theta_new ≈ theta_init`. But `obj_new` includes the prior penalty while `obj_init` does not, so `obj_new < obj_init`. The Wolfe line search rejects the step (`accept_ = false`), and `search_failed = true` fires immediately. The optimizer returns after 0–1 iterations.

Mode B ("Converges to `[0, 0]`"): When the rejected step causes `build_result` to return `state.prev()` with `a = [0, 0]`, downstream code that recomputes `theta = Sigma * a` gets `theta = [0, 0]`.

Mode C ("Sometimes finds the mode"): When `theta_init ≈ 0`, the inflation `0.5 * theta_init' * Sigma^{-1} * theta_init` is negligible, so the inconsistency doesn't prevent the solver from making progress.

The severity depends on the magnitude of `theta_init` relative to the prior, which varies across MCMC draws and groups, explaining the erratic pattern.
Secondary Issue: Corrupted First-Iteration Gradient
The first-iteration search-direction gradient includes a `-covariance * a` term. With `a = 0`, that term vanishes, so the gradient direction is purely likelihood-driven, with no prior pull. This can cause the Newton step to overshoot or go in a suboptimal direction even if the line search doesn't reject outright.
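Making this concrete (a sketch assuming the objective form implied by the prior penalty `-0.5 * a' * theta` above; not copied from the source code):

```latex
% Objective in the a-parameterization, with theta = Sigma a:
\Psi(a) = \log p(y \mid \theta) - \tfrac{1}{2} a^\top \theta,
\qquad \theta = \Sigma a
% Gradient with respect to a (Sigma symmetric), by the chain rule:
\nabla_a \Psi(a) = \Sigma \, \nabla_\theta \log p(y \mid \theta) - \Sigma a
% At a = 0 the prior term -\Sigma a vanishes, leaving a purely
% likelihood-driven search direction.
```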
Fix
Approach
Compute the consistent `a_init = Sigma^{-1} * theta_init` before constructing `NewtonState`, and pass it through the constructor chain so that both `a` and `theta` are initialized consistently.

Changed Files
`stan/math/mix/functor/wolfe_line_search.hpp`: Added a new `WolfeInfo` constructor overload that accepts both `a_init` and `theta_init`, forwarding to the existing `WolfeData` constructor that takes an explicit `a` argument. A trailing `int /*tag*/` parameter disambiguates it from the existing `(obj_fun, n, theta0, theta_grad_f)` overload.

`stan/math/mix/functor/laplace_marginal_density_estimator.hpp`: Added a new `NewtonState` constructor overload that accepts `(theta_size, obj_fun, theta_grad_f, a_init, theta_init)` and forwards to the new `WolfeInfo` constructor. In `laplace_marginal_density_est`, when `InitTheta = true`, compute `a_init` and construct `NewtonState` with both `a_init` and `theta_init`. When `InitTheta = false`, the existing zero-initialization path is unchanged.
What the Fix Does NOT Change
- The `InitTheta = false` path (zero initialization) is identical; the only additions are in the shared initialization code.
- The `update_fun` lambda already correctly enforces `theta = Sigma * a` for all subsequent proposals.
- The convergence criteria are all unchanged.
Verification
All Laplace test suites compile and pass:
- `laplace_marginal_lpdf_moto_test` (non-zero `theta_init`)
- `laplace_types_test`
- `wolfe_line_search_test`
- `generate_laplace_options_test`
- `aki_ex_test`
- `laplace_marginal_bernoulli_logit_lpmf_test`
- `laplace_marginal_poisson_log_lpmf_test`
- `laplace_marginal_neg_binomial_log_lpmf_test`
- `laplace_marginal_neg_binomial_log_summary_lpmf_test`

Other Observations (Not Fixed Here)
During the code examination, several additional issues were noted that are not related to the primary bug but may warrant separate attention:

- `llt_with_jitter` accumulates jitter. Each retry adds increasing diagonal jitter (`1e-10` to `1e-5`) without removing the previous jitter. The resulting `B` matrix differs from the theoretical `B`, but the log-determinant is computed from the jittered matrix without correction.
- `std::once_flag` masks repeated fallbacks. The solver fallback warning uses `static std::once_flag`, so it fires at most once per process lifetime. Repeated fallbacks across calls are silently suppressed.
- LU log-determinant ignores permutation sign. `LUSolver::compute_log_determinant()` takes `log()` of the LU diagonal entries. If any entry is negative (possible when `B` is not positive definite, the reason solver 3 was needed), this produces NaN.
- Absolute-only convergence tolerance. `|curr.obj - prev.obj| < tolerance` has no relative component. For objectives with large magnitude the threshold may trigger too easily; for near-zero objectives it may be too loose.
- `block_matrix_sqrt` uses `.isFinite().any()` instead of `.all()`. The check `!local_block.array().isFinite().any()` only catches blocks where every element is non-finite; a mix of finite and NaN values passes silently.
Fixes for `laplace_marginal_tol()` Newton Solver Convergence
Context
After applying the initial `a_init` fix (documented in `laplace_marginal_tol_bug.md`), many `log_lik_g` values from `laplace_marginal_tol()` still disagreed with the analytic closed-form marginal likelihood. The `a_init` fix corrected the initialization invariant (`theta = Sigma * a`), but three additional bugs in the Newton solver prevented reliable convergence to the true mode.

All fixes are in `stan/math/mix/functor/laplace_marginal_density_estimator.hpp` and `stan/math/mix/functor/wolfe_line_search.hpp`.

Fix 1: Consistent `a_init` initialization (prior fix, included for completeness)
Files: `laplace_marginal_density_estimator.hpp`, `wolfe_line_search.hpp`

When `InitTheta = true`, `WolfeData` unconditionally set `a = 0` regardless of the user-supplied `theta_init`, violating the invariant `theta = Sigma * a`. This inflated the initial objective by omitting the prior penalty `-0.5 * a' * theta`, causing the Wolfe line search to reject the first Newton step.

Fix: Compute `a_init = Sigma^{-1} * theta_init` via `covariance.llt().solve(theta_init)` and pass it through a new constructor overload chain (`NewtonState` → `WolfeInfo` → `WolfeData`) so both `a` and `theta` are consistent from the start.

Fix 2: Wolfe zoom-phase case [4] placement
File: `wolfe_line_search.hpp`

In the zoom phase of the Wolfe line search, case [4] (`high = mid`, for Armijo-OK but `f(mid) <= f(low)`) was placed outside the Armijo-true branch, causing it to execute unconditionally after cases [2] and [3]. This meant every zoom iteration that reached `mid.obj() <= low.obj()` would collapse the bracket incorrectly.

Fix: Nest case [4] inside the `else` of the `mid.obj() > low.obj()` check, so it only fires when Armijo holds but `f(mid) <= f(low)`.

Fix 3: Cumulative jitter in `llt_with_jitter`
File: `laplace_marginal_density_estimator.hpp`

Each retry in the jitter loop added `jitter_try` to the diagonal without removing the previous jitter. After the full retry sequence, the total perturbation was `1e-10 + 1e-9 + ... + 1e-5 ≈ 1.111e-5` rather than the intended `1e-5`. Since `B` is passed by reference and the jittered matrix is kept, the Cholesky factorization (and hence the log-determinant used in the final marginal density) was computed from an over-perturbed matrix.

Fix: Track `prev_jitter` and add only the incremental difference each iteration, so the total diagonal perturbation is exactly `jitter_try`.
Fix 4: `search_failed` used stale `curr.obj()`, terminating on every Wolfe rejection
File: `laplace_marginal_density_estimator.hpp`

The original convergence check was:

```cpp
bool search_failed = (!state.wolfe_status.accept_
                      && state.curr().obj() <= state.prev().obj());
```

When the Wolfe line search rejects a step (`accept_ = false`), it leaves `curr_` unchanged, so `curr.obj()` retains the stale value from the previous `update_next_step` swap, which is typically equal to `prev.obj()`. The `<= prev.obj()` comparison is therefore almost always true, making `search_failed` equivalent to `!accept_`. This caused the Newton loop to terminate immediately on any Wolfe rejection, even when the full Newton step would have improved the objective.
Fix: Before the Wolfe line search runs, save the full Newton step's objective (`full_newton_obj = scratch.eval_.obj()`). After a Wolfe rejection, check whether the full Newton step improved the objective. If so, re-evaluate scratch at `alpha = 1`, accept it as the current iterate, and let the Newton loop continue.

This is necessary because the Wolfe line search overwrites `scratch` with intermediate trial points; the saved scalar `full_newton_obj` is the only reliable record of whether the full Newton step was an improvement.
Fix 5: Absolute-only convergence tolerance
File: `laplace_marginal_density_estimator.hpp`

The Newton loop convergence check used only an absolute tolerance:

```cpp
bool objective_converged
    = std::abs(curr.obj() - prev.obj()) < options.tolerance;
```

For the Laplace objective (which can range from −50 to −200 or beyond), an absolute threshold of `1e-6` triggers trivially. The optimizer could declare convergence after a single iteration if the objective happened to change by less than `1e-6`, even though the relative change was still significant (e.g., a change of `5e-7` on an objective of `−50` is `1e-8` relative, but on an objective of `−0.001` would be `5e-4` relative).

Fix: Require both an absolute and a relative condition. This prevents premature convergence when the objective magnitude is large, while preserving the absolute floor for near-zero objectives.
How the Fixes Interact
- Bug 1: `a = 0` with `theta ≠ 0` at initialization, triggered when `theta_init` is far from 0.
- Bug 3: `log|B|` computed from an over-perturbed matrix.
- Bug 4: stale `curr.obj()` in `search_failed`.

Bugs 1 and 4 had the largest impact: together they meant the Newton solver would often run only 0–2 iterations for groups where `theta_init` was far from zero. Bug 1 prevented the first step from being accepted, and bug 4 terminated the loop as soon as the line search failed to find a point satisfying the Wolfe conditions, even when the objective was clearly improving.
Verification
After all five fixes, `sleep4.stan` (Laplace approximation) and `sleep4g.stan` (analytic marginal) produce matching `log_lik_g` values across all 18 groups and 4000 posterior draws.