Revert "Optionally use flash-attn's CE loss for metrics (#3394)"
#6059
Job | Run time |
---|---|
2m 33s | |
2m 33s |