First, why is the number of warmup micro-batches given by Eq. 6 in the paper when interleaving is enabled? That is not consistent with Figure 2 in the paper: according to Eq. 6, the number of warmup micro-batches on the last PP rank is 5, but Figure 2 shows 9.
Second, when Seq1F1B-I is enabled, do we really not need to adjust which model chunk is computed according to the sequence chunk currently being processed?