Improve normalization of losses and metrics, fix bugs, run distributed model tests on cpu#477
Draft
jlamypoirier wants to merge 14 commits into main
25 tasks
fakeredis 2.34 introduced Resp3Writer, which is hardcoded for all TCP connections regardless of protocol negotiation. When XREADGROUP BLOCK times out on an empty stream, Resp3Writer.dump(None) sends the RESP3 null (b'_\r\n'), and the redis-py RESP2 parser (used by default) raises Protocol Error: b'_'.

Fix: monkey-patch TCPFakeRequestHandler.setup in fake_redis_server() to replace Resp3Writer with Resp2Writer, restoring the correct RESP2 null encoding (b'*-1\r\n') for blocking timeouts. The patch is guarded on the presence of Resp3Writer (2.34+ only) and raises explicitly if Resp2Writer is missing, so future breakage is immediately diagnosable.
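The guarded monkey-patch described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `Handler`, `patch_handler_writer`, and the two writer classes are hypothetical stand-ins defined locally, and the real fakeredis classes may be structured differently.

```python
# Stand-in classes: hypothetical simplifications, NOT fakeredis's real code.
class Resp2Writer:
    def dump(self, value):
        if value is None:
            return b"*-1\r\n"  # RESP2 null array, as sent on a blocking timeout
        raise NotImplementedError("this sketch only encodes None")


class Resp3Writer:
    def dump(self, value):
        if value is None:
            return b"_\r\n"  # RESP3 null, which a RESP2 parser rejects
        raise NotImplementedError("this sketch only encodes None")


class Handler:
    """Stand-in for a TCP request handler that hardcodes the RESP3 writer."""

    def setup(self):
        self.writer = Resp3Writer()


def patch_handler_writer(handler_cls, namespace):
    """Wrap handler_cls.setup so the handler ends up with a RESP2 writer.

    `namespace` plays the role of the library module: the patch is skipped
    when Resp3Writer is absent and fails loudly when Resp2Writer is missing.
    """
    if "Resp3Writer" not in namespace:
        return False  # older version: nothing to patch
    if "Resp2Writer" not in namespace:
        raise RuntimeError("Resp3Writer found but Resp2Writer is missing")
    original_setup = handler_cls.setup
    resp2_cls = namespace["Resp2Writer"]

    def setup(self):
        original_setup(self)
        self.writer = resp2_cls()  # override the hardcoded RESP3 writer

    handler_cls.setup = setup
    return True


patched = patch_handler_writer(
    Handler, {"Resp2Writer": Resp2Writer, "Resp3Writer": Resp3Writer}
)
handler = Handler()
handler.setup()
timeout_reply = handler.writer.dump(None)
```

Raising when only one of the two writers is present keeps a future library change from silently reintroducing the protocol error.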
- Add `divisor` parameter to fused loss functions (entropy, z-loss, grpo) to allow normalizing by actual token count rather than total sequence positions
- Fix `_get_grad_output` to not pre-divide by parallel/split factors (handled by divisor)
- Fix loss accumulation across cross-entropy splits in LM head
- Fix variable naming bug in `_set_distributed_reduction_map`
- Update tests to pass explicit divisor and match new normalization behavior

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
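To illustrate the effect of an explicit divisor, here is a small pure-Python sketch. The function names and the mask-based layout are assumptions made for illustration; the fused loss functions in this PR operate on tensors and may differ in detail.

```python
def summed_loss(per_token_losses, mask):
    """Sum the losses of positions that actually carry a label."""
    return sum(loss for loss, keep in zip(per_token_losses, mask) if keep)


def normalized_loss(per_token_losses, mask, divisor):
    """Normalize the masked loss sum by an explicit, caller-chosen divisor."""
    return summed_loss(per_token_losses, mask) / divisor


losses = [2.0, 4.0, 6.0, 0.0]
mask = [True, True, True, False]  # last position is padding

# Dividing by all sequence positions dilutes the loss with padding.
by_positions = normalized_loss(losses, mask, divisor=len(losses))  # 12 / 4

# Dividing by the actual token count gives the per-token mean.
token_count = sum(mask)
by_tokens = normalized_loss(losses, mask, divisor=token_count)  # 12 / 3

# With a shared divisor, per-split losses simply add up to the full loss,
# so no extra pre-division by the number of splits is needed.
split_total = (
    normalized_loss(losses[:2], mask[:2], divisor=token_count)
    + normalized_loss(losses[2:], mask[2:], divisor=token_count)
)
```

Passing the divisor in from the caller is what lets splits and parallel ranks share one normalization constant.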
…_losses_with_counts
- Fix schedule tflops divide by 1e12 (was reporting raw flops)
- Change loss reductions from AVG to SUM (needed with token-count weighting)
- Add CPU/gloo fallback support in distributed test configs
- Fix pp tied weight bias ignore_duplicates
- Adjust micro_batch_size and compare targets for distributed configs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
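The arithmetic behind the first two items can be sketched with plain numbers. The values and variable names below are made up for illustration; a Python `sum` stands in for the distributed reduction, and no actual process group is involved.

```python
# Two ranks hold different numbers of labeled tokens.
rank_loss_sums = [9.0, 1.0]   # sum of per-token losses on each rank
rank_token_counts = [3, 1]    # labeled tokens on each rank
total_tokens = sum(rank_token_counts)

# Token-count weighting: each rank divides by the GLOBAL token count,
# then the contributions are combined with a SUM reduction.
sum_reduced = sum(s / total_tokens for s in rank_loss_sums)

# An AVG reduction over per-rank means weights every rank equally,
# which is biased when token counts differ.
avg_of_means = sum(
    s / c for s, c in zip(rank_loss_sums, rank_token_counts)
) / len(rank_loss_sums)

# Reference value: the mean over all tokens.
true_mean = sum(rank_loss_sums) / total_tokens

# Unit conversion for throughput: 1 TFLOP/s is 1e12 FLOP/s.
flops_per_second = 3.5e14
tflops = flops_per_second / 1e12
```

Here the SUM-reduced value matches the true per-token mean (2.5), while the average of per-rank means gives 2.0.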
MTPLlamaModel uses mtp_norms[0] for the first prediction head instead of model.norm (as in standard Llama). The converter was inheriting the Llama mapping (head.final_norm → model.norm), so the native HuggingFace model loaded converted checkpoints with mtp_norms[0] uninitialized.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…g datasets

Add __getstate__/__setstate__ to DistributedDim to drop the process group when pickling, so DataLoader worker processes can be spawned even when the dataset or collate_fn captures a DistributedConfig with active process groups.

Also expand test_data_streaming to cover num_workers=1 and increase _NUM_BATCHES from 2 to 10 for better coverage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
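The pickling approach can be sketched as below. `DimSketch` is a hypothetical stand-in, not the real `DistributedDim`: a `threading.Lock` plays the role of the unpicklable process group, and resetting the attribute to `None` is an assumption about how the handle is dropped.

```python
import pickle
import threading


class DimSketch:
    """Hypothetical stand-in for a dimension object holding a live handle."""

    def __init__(self, name, size):
        self.name = name
        self.size = size
        # A lock cannot be pickled, much like an active process group.
        self.group = threading.Lock()

    def __getstate__(self):
        # Copy the instance dict and drop the unpicklable handle.
        state = self.__dict__.copy()
        state["group"] = None
        return state

    def __setstate__(self, state):
        # The restored copy has no group; a worker must not rely on it.
        self.__dict__.update(state)


dim = DimSketch("data", 4)
restored = pickle.loads(pickle.dumps(dim))
```

The original object keeps its handle; only the pickled copy sent to a worker process loses it.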
Add tests for padding, multi-token prediction, micro-batch splits, prediction mask, label counts, GRPO data, position index, inference phase, document count, and cumulative sequence lengths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…pand preprocessing tests

- Guard cross-document label masking against documents shorter than prediction distance
- Fix num_documents to exclude the padding pseudo-document from the count
- Add comprehensive test coverage: all split/target indices, predicted_tokens in (1,3), padding variants, and complex multi-document cases with loss masking spans and GRPO data
- Refactor test helpers into cached properties indexed by [split_index][target_index]

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
✨ Description