Conversation
…sis specific to this scenario.
…ion app - Tidying up the test app.
There was a problem hiding this comment.
Pull request overview
This PR adds a new plans/ workspace containing markdown-formatted research artifacts on async performance issues in Microsoft.Data.SqlClient, along with a new repro app project that’s included in the main solution.
Changes:
- Introduces
plans/async-performance/research docs (root causes, issue inventory, prioritized recommendations, and detailed fix breakdowns). - Adds a new
ThreadStarvationrepro console app underplans/async-performance/apps/and includes it insrc/Microsoft.Data.SqlClient.slnx. - Adds/updates formatting and style guidance for markdown plan documents and C# coding style.
Reviewed changes
Copilot reviewed 72 out of 72 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/Microsoft.Data.SqlClient.slnx | Adds the ThreadStarvation repro app project to the solution. |
| policy/coding-style.md | Updates style guidance (wrapping rules; mandates braces for conditionals/loops). |
| plans/FORMATTING.md | Defines markdown formatting + lint workflow for plans/. |
| plans/.markdownlint.jsonc | Adds markdownlint configuration for plans/. |
| plans/async-performance/README.md | High-level index + key findings for async performance research. |
| plans/async-performance/issue-summary.md | Categorized inventory of relevant async performance issues/discussions. |
| plans/async-performance/root-causes.md | Narrative root-cause analysis of async performance problems. |
| plans/async-performance/recommendations.md | Prioritized recommendations and sequencing guidance. |
| plans/async-performance/connection-pool-redesign.md | Summary of ChannelDbConnectionPool design/progress. |
| plans/async-performance/executive-summary-feature-42261.md | Executive summary / delivery plan for Unix thread starvation feature. |
| plans/async-performance/apps/ThreadStarvation/ThreadStarvation.csproj | New repro console app project definition. |
| plans/async-performance/apps/ThreadStarvation/EntryPoint.cs | CLI parsing/dispatch to the repro harness. |
| plans/async-performance/apps/ThreadStarvation/SqlClientEventListener.cs | EventSource listener for SqlClient diagnostics output. |
| plans/async-performance/01-connection-pool/summary.md | P1 summary for completing ChannelDbConnectionPool. |
| plans/async-performance/01-connection-pool/priority-rationale.md | Rationale for ranking pool work as top priority. |
| plans/async-performance/01-connection-pool/01-factory-routing.md | Proposed fix plan for factory routing to pool v2. |
| plans/async-performance/01-connection-pool/02-lifecycle.md | Proposed fix plan for startup/shutdown lifecycle. |
| plans/async-performance/01-connection-pool/03-clear.md | Proposed fix plan for implementing Clear(). |
| plans/async-performance/01-connection-pool/04-warmup.md | Proposed fix plan for warmup behavior. |
| plans/async-performance/01-connection-pool/05-pruning.md | Proposed fix plan for pruning behavior. |
| plans/async-performance/01-connection-pool/06-transactions.md | Proposed fix plan for transaction enlistment support. |
| plans/async-performance/01-connection-pool/07-error-handling.md | Proposed fix plan for error handling/backoff. |
| plans/async-performance/01-connection-pool/08-rate-limiting.md | Proposed fix plan for opt-in connection creation rate limiting. |
| plans/async-performance/01-connection-pool/09-metrics.md | Proposed fix plan for EventSource/metrics integration. |
| plans/async-performance/01-connection-pool/10-integration-tests.md | Proposed plan for integration test coverage of pool v2. |
| plans/async-performance/02-tds-async-reads/summary.md | P2 summary for addressing snapshot/replay overhead. |
| plans/async-performance/02-tds-async-reads/priority-rationale.md | Rationale for ranking TDS async reads as P2. |
| plans/async-performance/02-tds-async-reads/01-continue-points.md | Proposed “continue points” expansion approach. |
| plans/async-performance/02-tds-async-reads/02-plp-streaming.md | Proposed PLP resume/streaming approach to avoid replay. |
| plans/async-performance/02-tds-async-reads/03-snapshot-buffer-pool.md | Proposed snapshot buffer pooling approach. |
| plans/async-performance/02-tds-async-reads/04-async-no-snapshot.md | Proposed “async without snapshot” for sequential access. |
| plans/async-performance/02-tds-async-reads/05-state-machine-parser.md | Long-term state machine / pipelines-based parser strategy. |
| plans/async-performance/03-async-transactions/summary.md | P3 summary for true async transaction APIs. |
| plans/async-performance/03-async-transactions/priority-rationale.md | Rationale for ranking async transactions as P3. |
| plans/async-performance/03-async-transactions/01-async-tds-transaction.md | Proposed async transaction manager request in TdsParser. |
| plans/async-performance/03-async-transactions/02-async-commit-rollback.md | Proposed overrides for CommitAsync/RollbackAsync. |
| plans/async-performance/03-async-transactions/03-async-begin-transaction.md | Proposed BeginTransactionAsync override guidance. |
| plans/async-performance/03-async-transactions/04-deferred-begin.md | Proposed deferred BEGIN TRANSACTION optimization. |
| plans/async-performance/04-async-sni-opens/summary.md | P4 summary for end-to-end async connection opening. |
| plans/async-performance/04-async-sni-opens/priority-rationale.md | Rationale for ranking async SNI opens as P4. |
| plans/async-performance/04-async-sni-opens/01-async-managed-connect.md | Proposed async factory/connect for managed SNI TCP. |
| plans/async-performance/04-async-sni-opens/02-async-ssl.md | Proposed async TLS handshake wiring. |
| plans/async-performance/04-async-sni-opens/03-async-prelogin.md | Proposed async prelogin handshake wiring. |
| plans/async-performance/04-async-sni-opens/04-async-login.md | Proposed async login plan. |
| plans/async-performance/04-async-sni-opens/05-async-open-pipeline.md | Proposed end-to-end async open orchestration. |
| plans/async-performance/05-allocation-reduction/summary.md | P5 summary for allocation reduction opportunities. |
| plans/async-performance/05-allocation-reduction/priority-rationale.md | Rationale for ranking allocation reduction as P5. |
| plans/async-performance/05-allocation-reduction/01-tds-buffer-pool.md | Proposed pooling for in/out TDS buffers. |
| plans/async-performance/05-allocation-reduction/02-snapshot-buffer-pool.md | Cross-reference for pooling snapshot packet data. |
| plans/async-performance/05-allocation-reduction/03-cancellation-optimization.md | Proposed CancellationToken registration optimizations. |
| plans/async-performance/05-allocation-reduction/04-setchars-buffer-pool.md | Proposed pooling for TVP char buffers. |
| plans/async-performance/05-allocation-reduction/05-semaphore-optimization.md | Proposed replacing ConcurrentQueueSemaphore allocation pattern. |
| plans/async-performance/06-packet-locking/summary.md | P6 summary for managed SNI packet locking redesign. |
| plans/async-performance/06-packet-locking/priority-rationale.md | Rationale for ranking packet locking as P6. |
| plans/async-performance/06-packet-locking/01-tcp-async-locks.md | Proposed SemaphoreSlim locking for SniTcpHandle. |
| plans/async-performance/06-packet-locking/02-np-async-locks.md | Proposed SemaphoreSlim locking for SniNpHandle. |
| plans/async-performance/06-packet-locking/03-mars-send-flow.md | Proposed MARS send flow control modernization. |
| plans/async-performance/06-packet-locking/04-mars-receive-lock.md | Proposed MARS receive queue lock replacement. |
| plans/async-performance/07-async-tvp/summary.md | P7 summary for async TVP data sources. |
| plans/async-performance/07-async-tvp/priority-rationale.md | Rationale for ranking async TVPs as P7. |
| plans/async-performance/07-async-tvp/01-async-enumerable-tvp.md | Proposed IAsyncEnumerable-based TVP acceptance. |
| plans/async-performance/07-async-tvp/02-async-row-writing.md | Proposed async TVP row writing pipeline changes. |
| plans/async-performance/07-async-tvp/03-async-bulkcopy.md | Proposed async BulkCopy enumeration support. |
| Directory.Packages.props | Adjusts central package versions (impacts restore/build behavior). |
| .github/prompts/ado-work-item-clarification.prompt.md | Updates ADO prompt guidance for forcing Markdown format updates. |
| .github/instructions/plans-formatting.instructions.md | Adds repo instructions for plans/**/*.md formatting/linting. |
| .github/instructions/coding-style.instructions.md | Adds repo instruction to follow policy/coding-style.md for C#. |
| @@ -0,0 +1,18 @@ | |||
| <Project Sdk="Microsoft.NET.Sdk"> | |||
There was a problem hiding this comment.
This .csproj appears to be saved with a UTF-8 BOM (the invisible character before <Project> on line 1). policy/coding-style.md specifies text files should be UTF-8 encoded without BOM; please resave without BOM to align with repo standards and avoid tooling diffs.
| <PackageReference Include="Microsoft.Data.SqlClient" /> | ||
| <PackageReference Include="Microsoft.Data.SqlClient.Extensions.Azure" /> | ||
| <PackageReference Include="System.CommandLine" Version="2.0.3" VersionOverride="2.0.3" /> | ||
| </ItemGroup> |
There was a problem hiding this comment.
PackageReference for System.CommandLine specifies both Version and VersionOverride. With central package management enabled, this is redundant and may cause restore warnings/errors depending on NuGet/MSBuild settings. Prefer a single mechanism: either define System.CommandLine via PackageVersion (possibly in a local Directory.Packages.props for this app) and use VersionOverride only when needed, or remove VersionOverride and keep Version.
| 3. **Large data reads are catastrophically slow in async** — the TDS snapshot/replay mechanism | ||
| causes exponential time growth with data size. |
There was a problem hiding this comment.
Key finding #3 says the snapshot/replay mechanism "causes exponential time growth with data size", but the root-cause analysis in this same plan describes it as O(n²) (quadratic) in packet count. Exponential is materially different/misleading; please correct the wording to "quadratic" (or "super-linear / quadratic") to keep the summary technically accurate.
| <PackageVersion Include="Microsoft.Data.SqlClient" Version="7.0.0" /> | ||
| <PackageVersion Include="Microsoft.Data.SqlClient.Extensions.Azure" Version="1.0.0" /> |
There was a problem hiding this comment.
PackageVersion entries for Microsoft.Data.SqlClient and Microsoft.Data.SqlClient.Extensions.Azure were added unconditionally. This creates two concrete restore/build hazards:
- When building with
-p:ReferenceType=Package, the conditionalItemGroupbelow also defines versions for these same package IDs, resulting in duplicatePackageVersionitems. doc/Directory.Packages.propsimports the root props and also definesPackageVersionfor these package IDs, which will now be duplicates as well.
To avoid breaking package-mode builds and doc builds, consider removing these unconditioned entries and instead scoping the versions to the plans/ app (e.g., add a Directory.Packages.props under plans/async-performance/apps/ThreadStarvation/ similar to doc/apps/AzureAuthentication/Directory.Packages.props, or use PackageVersion Update=... in the importing props where overriding is intended).
| <PackageVersion Include="Microsoft.Data.SqlClient" Version="7.0.0" /> | |
| <PackageVersion Include="Microsoft.Data.SqlClient.Extensions.Azure" Version="1.0.0" /> |
| version specified. | ||
| --> | ||
| <PackageVersion Include="Microsoft.Data.SqlClient" Version="7.0.0" /> | ||
| <PackageVersion Include="Microsoft.Data.SqlClient.Extensions.Azure" Version="1.0.0" /> |
There was a problem hiding this comment.
Why reference MDS v7.0.0 and Azure extension 1.0.0 in project reference?
| 4. **The key insight:** PLP data is append-only. We never need to "re-interpret" already-read bytes | ||
| — we just need to copy them from packets to the output buffer. On resume, we can skip the copy | ||
| for already-processed data. |
There was a problem hiding this comment.
I've also noticed that the PLP chunks are usually sized to fit into a single packet. If the reassembled TDS packet contains nothing other than a complete PLP chunk, then we can steal that packet buffer, slice it up to form a ReadOnlyMemory and append that ReadOnlyMemory instance to the list of PLP chunks. This saves a buffer copy (although we would still need to allocate a byte array when we replace the TDS packet buffer.)
| ### Phase 4: Remove Snapshot Mechanism | ||
|
|
||
| - Once all paths are async-native, remove `StateSnapshot` entirely | ||
| - Remove `SnapshotStatus`, `_snapshot`, `_cachedSnapshot` |
There was a problem hiding this comment.
What about the AAsyncCallContext-derived classes?
|
|
||
| ### Phase 3: Convert Token Processing to Async | ||
|
|
||
| - `TryProcessToken_*` methods become async |
There was a problem hiding this comment.
It's not strictly necessary for this phase, but I'd really like to see these methods turned into proper .NET types which read their data from (or write their data to) a PipeReader/PipeWriter. It's a very large part of TdsParser.cs, and the restructuring would make testing these changes a lot easier.
|
|
||
| ### Phase 2: Convert Leaf Methods to Async | ||
|
|
||
| - Start with `TryReadByte`, `TryReadInt16`, etc. |
There was a problem hiding this comment.
For the write path, it may also be worth refactoring TdsValueSetter beforehand - there's some duplicate write logic there.
|
|
||
| - `TryRun()` calls dozens of `TryRead*` methods — all must become async | ||
| - Sync callers need a sync wrapper (or separate sync path) | ||
| - Deep async call chains have overhead (state machine allocation per frame) |
There was a problem hiding this comment.
In (at least) netcore, if the ValueTask completes synchronously then there's no heap allocation. There may be less overhead here than we expect.
| ### 3. Cold-Start Amplification | ||
|
|
||
| Issue #601 reports that opening 100 connections in parallel takes minutes because connections are | ||
| created serially. P1 enables parallel creation, but if each creation still blocks a thread, the |
There was a problem hiding this comment.
Fixing P1 could actually result in worse application startup performance until P4 is fixed because we'd be blocking large parts of the entire application's thread pool while the connection pool populates. This is particularly likely to be a problem in cases where an application sets the minimum pool size to more than the number of threads (which is quite likely - people will expect an async SqlClient to be network-bound, so a large minimum pool size would make intuitive sense.)
Perhaps it'd be safer to temporarily limit the degree of parallelism on P1 until P4 is complete? Alternatively, steps 1 & 2 in the sequencing consideration could be reversed.
|
|
||
| ## Problem | ||
|
|
||
| `ConcurrentQueueSemaphore` (used in managed SNI for stream-level read/write locking) allocates a |
There was a problem hiding this comment.
Also: when wrapping an SniNetworkStream with an SniSslStream, we only need to lock on the outer ConcurrentQueueSemaphore.
| - `SqlSequentialTextReader.cs` lines 363-421 — text streaming | ||
| - `TdsParser.cs` lines 1426, 3560, 3573, 12962-13082 — packet and value buffers | ||
|
|
||
| ### Not Pooled — Allocation Hotspots |
There was a problem hiding this comment.
Separately, there's a remaining piece of work to remove the boxing of SqlGuid values.
|
|
||
| - Opt-in: users must explicitly provide `IAsyncEnumerable<SqlDataRecord>` | ||
| - Existing sync paths unchanged | ||
| - Requires `#if NET` guard — `IAsyncEnumerable` not available on .NET Framework |
There was a problem hiding this comment.
IAsyncEnumerable is available in Microsoft.Bcl.AsyncInterfaces, would this not work?
| - The property implements custom caching logic (not the standard `AcquireAndReturn` helper) and | ||
| also checks `OperatingSystem.IsWindows()` at runtime. Setting the switch on a non-Windows OS | ||
| has no effect. | ||
| - When `ILLink.Substitutions.xml` is used for trimming, the switch value may be baked in at |
There was a problem hiding this comment.
I introduced this - the plan will eventually be to have a custom .targets file which allows someone to specify set an MSBuild property in their project file and drop the dependency on the native SNI DLL entirely (or to enable single-file applications to work.)
Description
This PR contains lots of research, anaylsis, and summary artifacts related to a variety of async performance issues we currently have in the driver. It will be used and expanded on as we choose specific targets to work on.
The first effort will focus on Unix thread starvation due to interactions with login flows and query executions.
The PR also sets up the
plans/directory where we can start dumping plans and artifacts from other investigations, and tries to establish some style standards for Markdown.