
Make unroll_length schedulable#1833

Open
QuantuMope wants to merge 1 commit into pytorch from PR/andrew/schedulable-unroll-length

Conversation


@QuantuMope (Contributor) commented Mar 30, 2026

This PR allows for a scheduled unroll length when running synced off-policy RL training, i.e. when:

  1. async_unroll=False
  2. whole_replay_buffer_training=False

It also allows a scheduled value of 0, which skips unrolling entirely so that the iteration trains purely from the replay buffer.

This allows us to "simulate" very diverse training strategies. E.g.,

unroll_length = StepScheduler("iterations", [
    # unroll to collect an "offline" dataset
    (1, int(initial_collect_steps / num_para_envs)),
    # perform offline training iterations; no unroll
    (offline_training_iters, 0),
    # continue with online RL
    (offline_training_iters + 1, desired_unroll_length)])
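For readers unfamiliar with step schedulers, a minimal sketch of the behavior assumed above is given below. This `StepScheduler` is an illustrative stand-in with hypothetical numbers, not ALF's actual implementation; it assumes each (boundary, value) pair takes effect once the progress counter reaches that boundary.

```python
import bisect


class StepScheduler:
    """Illustrative stand-in for a step-based scheduler (not ALF's real class).

    Returns the value of the last (boundary, value) pair whose boundary
    is <= the current progress counter.
    """

    def __init__(self, progress_type, schedule):
        self._progress_type = progress_type  # e.g. "iterations"
        self._boundaries = [b for b, _ in schedule]
        self._values = [v for _, v in schedule]
        self._counter = 0

    def advance(self):
        # one training iteration has elapsed
        self._counter += 1

    def __call__(self):
        # index of the last boundary <= the current counter
        i = bisect.bisect_right(self._boundaries, self._counter) - 1
        return self._values[max(i, 0)]


# Hypothetical numbers: collect 100 steps, then no unroll, then unroll 8.
sched = StepScheduler("iterations", [(1, 100), (5, 0), (6, 8)])
```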

Codex cleverly makes a minimal change with full backward compatibility by adding the following code:

    @property
    def unroll_length(self):
        return self._unroll_length()

    @unroll_length.setter
    def unroll_length(self, value):
        self._unroll_length = as_scheduler(value)
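To illustrate why this is backward compatible, here is a self-contained sketch of how the property pair keeps the old scalar-reading API working. The `ConstantScheduler` and `as_scheduler` below are simplified stand-ins for ALF's actual helpers, not their real implementations.

```python
class ConstantScheduler:
    """Wraps a fixed value behind the scheduler call interface."""

    def __init__(self, value):
        self._value = value

    def __call__(self):
        return self._value


def as_scheduler(value):
    """Pass schedulers (callables) through; wrap scalars in a ConstantScheduler."""
    return value if callable(value) else ConstantScheduler(value)


class Algorithm:
    def __init__(self, unroll_length):
        # goes through the setter, so ints and schedulers both work
        self.unroll_length = unroll_length

    @property
    def unroll_length(self):
        # evaluating the scheduler keeps the old scalar-reading API intact
        return self._unroll_length()

    @unroll_length.setter
    def unroll_length(self, value):
        self._unroll_length = as_scheduler(value)


algo = Algorithm(8)  # existing call sites that pass a plain int keep working
```

Callers that previously read `self.unroll_length` as an int see no change, while a scheduler assigned to the same attribute is evaluated transparently on every read.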

self.unroll_with_grad = unroll_with_grad
self.use_root_inputs_for_after_train_iter = use_root_inputs_for_after_train_iter
self.async_unroll = async_unroll
if not isinstance(self._unroll_length, ConstantScheduler):
ConstantScheduler --> should check against a base class, e.g. Scheduler?

QuantuMope (Contributor, Author) replied:

We need to check against ConstantScheduler here because a scalar input will be converted to one before this check due to the setter function on line 479.

Any non-constant scheduler should then raise an error if we're doing on-policy or async unroll.
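To make the distinction concrete, here is a minimal sketch of why a base-class check would be vacuous after the setter runs. The classes below are simplified stand-ins, assuming (as in ALF) that `ConstantScheduler` subclasses a common `Scheduler` base.

```python
class Scheduler:
    def __call__(self):
        raise NotImplementedError


class ConstantScheduler(Scheduler):
    def __init__(self, value):
        self._value = value

    def __call__(self):
        return self._value


class StepScheduler(Scheduler):
    def __init__(self, schedule):
        self._schedule = schedule

    def __call__(self):
        return self._schedule[0][1]  # simplified; real lookup omitted


def as_scheduler(value):
    # the setter applies this conversion before any isinstance check
    return value if isinstance(value, Scheduler) else ConstantScheduler(value)


a = as_scheduler(8)                        # scalar input
b = as_scheduler(StepScheduler([(1, 8)]))  # genuinely scheduled input
```

After conversion both `a` and `b` are `Scheduler` instances, so `isinstance(x, Scheduler)` is always true and cannot detect scheduling; only `isinstance(x, ConstantScheduler)` separates scalar inputs from true schedules.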

QuantuMope (Author) left a comment:

Hey Haichao, responded to your comment. Let me know if I misunderstood it.

