fix: gracefully handle missing AKS cluster during postprovision hook#7501
fix: gracefully handle missing AKS cluster during postprovision hook#7501jongio wants to merge 7 commits intoAzure:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the AKS service target’s lifecycle hook behavior so that postprovision no longer fails fatally when an AKS cluster isn’t present yet (common in multi-phase provisioning), while keeping predeploy strict to avoid masking real deployment-time issues.
Changes:
- Add
postprovision-specific “graceful skip” behavior insetK8sContext()when the AKS target resource is missing/empty or Kubernetes context setup fails. - Extract a
skipPostprovisionK8sSetup()helper to consolidate the warning-and-continue logic. - Add new AKS service target tests covering postprovision skip scenarios and ensuring predeploy still fails when appropriate.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| cli/azd/pkg/project/service_target_aks.go | Implements postprovision-specific skip logic and a helper for warning/continue behavior. |
| cli/azd/pkg/project/service_target_aks_test.go | Adds table-driven tests for postprovision skipping, happy-path coverage, and a predeploy regression guard. |
71d1845 to
8383c6e
Compare
wbreza
left a comment
There was a problem hiding this comment.
Code Review — PR #7501
Nice work on this feature! The graceful degradation approach is well-designed: separating postprovision (skip gracefully) from predeploy (fail fatally) is a clean pattern. The test coverage is thorough and the context cancellation propagation fix is spot-on. A few suggestions below to harden edge cases and improve consistency.
✅ What Looks Good
- Excellent test coverage with 5 new tests covering all major skip paths, happy path, regression guard, and context cancellation
- Clean helper extraction (
postprovisionK8sError/skipPostprovisionK8sSetup) with clear separation of concerns - Proper context cancellation handling via
ctx.Err()check - Smart use of constant composition (
"post" + string(ProjectEventProvision)) to stay in sync with the project-level event name
Findings Summary
| Priority | Count |
|---|---|
| High | 1 |
| Medium | 4 |
| Low | 1 |
| Total | 6 |
See inline comments for details.
8383c6e to
f0d2c68
Compare
f0d2c68 to
4ff53a7
Compare
When infrastructure doesn't include AKS resources, the postprovision lifecycle hook in the AKS service target fails fatally trying to set up the Kubernetes context. This is expected in multi-phase provisioning workflows where AKS gets provisioned later. Modified setK8sContext() to detect the postprovision event and skip gracefully (with a user-visible warning) instead of failing when: - GetTargetResource returns an error (resource not found) - Target resource has an empty name (SupportsDelayedProvisioning) - Cluster credentials are unavailable (ensureClusterContext fails) - Namespace creation fails (ensureNamespace fails) The predeploy event path is unchanged and still fails fatally, ensuring deployment-time errors are not masked. Extracted skipPostprovisionK8sSetup() helper to eliminate repeated warning/log/return-nil pattern across all four skip points. Fixes Azure#3272 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- skipPostprovisionK8sSetup now checks ctx.Err() before returning nil to avoid swallowing context cancellation/timeouts (Ctrl+C) - Added Test_Postprovision_Skips_When_Namespace_Fails covering the ensureNamespace failure path during postprovision Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ation test - Replace magic string 'postprovision' with typed postProvisionEvent constant - Extract postprovisionK8sError() helper to eliminate 4 identical if-blocks - Add nil-guard on targetResource before calling ResourceName() - Fix log.Printf double newline and redundant .Error() call - Add Test_Postprovision_Propagates_Context_Cancellation to cover the ctx.Err() safety valve in skipPostprovisionK8sSetup Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Route targetResource==nil through event-aware postprovisionK8sError so predeploy doesn't silently skip (thread #1, High) - Add preDeployEvent constant replacing raw "predeploy" strings for consistency with postProvisionEvent (thread #2) - Add console message assertions to graceful-skip tests verifying the warning was actually emitted (thread #3) - Add predeploy failure tests for credential and namespace error paths confirming errors propagate for non-postprovision events (thread #4) - Refactor createAksServiceTarget to delegate to createAksServiceTargetWithResourceManager, eliminating ~40 lines of duplication (thread #5) - Add structured telemetry span (AksPostprovisionSkipEvent) to the graceful-skip path for production monitoring (thread Azure#6) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
4ff53a7 to
f01ea08
Compare
wbreza
left a comment
There was a problem hiding this comment.
Code Re-Review — PR #7501
All 6 findings from the previous review have been addressed — nicely done! The nil-guard fix, event constants, console assertions, predeploy failure tests, helper DRY refactor, and structured telemetry are all clean. A few minor suggestions below.
Prior Review Regression Check
| # | Prior Finding | Status |
|---|---|---|
| 1 | [High] targetResource==nil bypasses event guard | ✅ Resolved |
| 2 | [Medium] Inconsistent event constants | ✅ Resolved |
| 3 | [Medium] Tests don't verify skip behavior | ✅ Resolved |
| 4 | [Medium] Missing predeploy failure tests | ✅ Resolved |
| 5 | [Medium] Test helper duplication | ✅ Resolved |
| 6 | [Low] No structured telemetry | ✅ Resolved |
✅ What Looks Good
preDeployEventfollows the same composition pattern aspostProvisionEvent— correct and consistent- DRY refactor of
createAksServiceTargeteliminates ~40 lines of duplication - Dead code cleanup (
mockUserConfigManagerremoved) - Console output assertions validate the skip path was actually taken
- New predeploy failure tests close the regression gap
New Findings Summary
| Priority | Count |
|---|---|
| Medium | 1 |
| Low | 2 |
| Total | 3 |
See inline comments for details.
- Use span.End() with skip.reason attribute instead of EndWithStatus() to avoid marking intentional skips as errors in telemetry dashboards - Extract assertSkipWarningEmitted helper to deduplicate console assertion blocks across two test functions - Add ErrorContains assertion in Test_Predeploy_Fails_When_Credentials_Fail for consistency with sibling namespace test Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
wbreza
left a comment
There was a problem hiding this comment.
Code Re-Review — PR #7501
fix: gracefully handle missing AKS cluster during postprovision hook by @jongio
Re-review (3rd round)
Latest commit (3c75aaa) addresses all 3 findings from my Apr 7 re-review:
| # | Priority | Finding | Status |
|---|---|---|---|
| 1 | Medium | Telemetry: EndWithStatus() marks skips as errors |
✅ Fixed — span.End() with skip.reason attribute |
| 2 | Low | DRY: duplicate console assertion blocks | ✅ Fixed — extracted assertSkipWarningEmitted |
| 3 | Low | Consistency: missing ErrorContains assertion |
✅ Fixed |
Cumulative Review Status (all rounds)
| Round | Findings | Status |
|---|---|---|
| Round 1 (6 findings) | nil-guard, event constants, console assertions, predeploy tests, helper DRY, telemetry | All ✅ |
| Round 2 (3 findings) | telemetry span, DRY assertions, ErrorContains | All ✅ |
| Total: 9 findings | All resolved |
✅ What Looks Good
- Clean separation of postprovision (graceful skip) vs predeploy (fail fatally)
- Proper context cancellation handling with
ctx.Err()check postProvisionEvent/preDeployEventconstants follow composition pattern- 5 new tests covering all skip paths + regression guards
createAksServiceTargethelper eliminates ~40 lines of test duplication
Overall Assessment: Approve — all 9 cumulative findings resolved across 3 review rounds.
Review performed with GitHub Copilot CLI
Address @weikanglim's feedback: limit the graceful skip path to the explicit delayed-provisioning case (empty ResourceName) only. GetTargetResource, ensureClusterContext, and ensureNamespace errors now propagate during postprovision — real failures are no longer masked. Removed postprovisionK8sError helper. Fixed gofmt import ordering. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
wbreza
left a comment
There was a problem hiding this comment.
Code Re-Review — PR #7501 (Round 5)
Prior Review Regression Check — All 13 Findings Resolved
| Round | Reviewer | Findings | Status |
|---|---|---|---|
| 1 | copilot-bot | Context cancellation, warning message, namespace test | All ✅ |
| 2 | wbreza | nil guard, event constants, skip assertions, predeploy tests, helper DRY, telemetry | All ✅ |
| 3 | wbreza | span error status, DRY assertions, ErrorContains | All ✅ |
| 4 | wbreza | APPROVED (9/9 resolved) | ✅ |
| 5 | weikanglim | Graceful skip too broad — narrow to explicit not-provisioned case | ✅ Resolved |
The weikanglim feedback was the most significant change — correctly narrowed the graceful skip from "all postprovision errors" to only ResourceName() == "". The refactor removed postprovisionK8sError entirely and ensures credential, RBAC, and namespace failures propagate even during postprovision. New tests confirm this.
New Findings Summary
| Priority | Count |
|---|---|
| Medium | 1 |
| Low | 1 |
| Total | 2 |
See inline comments for details.
✅ What Looks Good
- Clean separation: postprovision graceful skip ONLY for
ResourceName() == ""— proper defense-in-depth - Proper context cancellation handling with
ctx.Err()check before any skip work - Telemetry span with
skip.reasonattribute (not error status) — no dashboard noise - 11 tests covering all paths: graceful skip, error propagation (3 paths x 2 events), happy path, context cancellation, predeploy regression
- Modern Go:
t.Context(),t.TempDir(),ostest.Chdir() - DRY
createAksServiceTargetWithResourceManagerwith delegation pattern
Overall Assessment: Approve — all 13 cumulative findings resolved across 5 review rounds. Code is clean and well-tested.
Review performed with GitHub Copilot CLI
Simplify Test_Postprovision_GracefulSkip from table-driven structure to a flat test. The cases that originally lived here (credential failure, GetTargetResource error) were moved to standalone tests that verify error propagation, leaving a single case that doesn't benefit from the table wrapper. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Azure Dev CLI Install InstructionsInstall scriptsMacOS/Linux
bash: pwsh: WindowsPowerShell install MSI install Standalone Binary
MSI
Documentationlearn.microsoft.com documentationtitle: Azure Developer CLI reference
|
Summary
Fixes #3272
When infrastructure doesn't include AKS resources, the
postprovisionlifecycle hook in the AKS service target fails fatally trying to set up the Kubernetes context. This is expected in multi-phase provisioning workflows where AKS gets provisioned later.Changes
cli/azd/pkg/project/service_target_aks.gosetK8sContext()to detect thepostprovisionevent and skip gracefully (with a user-visible warning) instead of failing only when the target resource has an empty name (viaSupportsDelayedProvisioning) - the signal that the AKS cluster hasn't been provisioned yet in a multi-phase workflowskipPostprovisionK8sSetup()helper with context cancellation propagation and structured telemetry (aks.postprovision.skipspan withskip.reasonattribute)predeployevent path is unchanged and still fails fatally - deployment-time errors are not maskedcli/azd/pkg/project/service_target_aks_test.goTest_Postprovision_GracefulSkipverifying graceful skip whenResourceName()is emptyTest_Postprovision_Skips_When_Namespace_Fails- namespace creation error during postprovision (NOTE: after weikanglim refactor, this now correctly propagates as an error)Test_Postprovision_Succeeds_When_Cluster_Available- happy path verifying normal flow when cluster existsTest_Predeploy_Still_Fails_When_Cluster_Not_Found- regression guard ensuring predeploy still errorsTest_Predeploy_Fails_When_Credentials_FailandTest_Predeploy_Fails_When_Namespace_Fails- regression guards for downstream error propagationcreateAksServiceTargetWithResourceManagerhelper;createAksServiceTargetnow delegates to itassertSkipWarningEmittedhelper for DRY console output assertionsTesting
All AKS-related tests pass (3 existing + 8 new), zero regressions.