benchmarks_51850.out
ERROR: Unable to locate a modulefile for 'cuda/11.7'
1.13.1
[2023-11-20 18:56:33,060] [INFO] [distributed.py:36:init_distributed] Not using the DeepSpeed or torch.distributed launchers, attempting to detect MPI environment...
[2023-11-20 18:56:33,625] [INFO] [distributed.py:83:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=26.0.144.205, master_port=6000
[2023-11-20 18:56:33,625] [INFO] [distributed.py:46:init_distributed] Initializing torch distributed with backend: nccl
[2023-11-20 18:56:36,253] [INFO] [checkpointing.py:223:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
num_attention_heads: 40, hidden_size: 5120, train_micro_batch_size_per_gpu: 4, tensor_mp_size: 1, pipeline_mp_size: 1, dp_size: 1
Actual
------
QKV Transform: 1.981074571609497
Flash: 0.023078203201293945
Attention linproj: 0.007080554962158203
QKV Transform: 0.005939006805419922
Flash: 0.04616570472717285
Attention linproj: 0.0017130374908447266
QKV Transform: 0.0059931278228759766
Flash: 0.009404420852661133
Attention linproj: 0.001708984375
QKV Transform: 0.006645679473876953
Flash: 0.009401321411132812
Attention linproj: 0.001703023910522461
QKV Transform: 0.005727291107177734
Flash: 0.010465145111083984
Attention linproj: 0.0017018318176269531
QKV Transform: 0.006586313247680664
Flash: 0.009397506713867188
Attention linproj: 0.0017006397247314453
QKV Transform: 0.005678892135620117
Flash: 0.00940251350402832
Attention linproj: 0.0017008781433105469
QKV Transform: 0.006659984588623047
Flash: 0.009417295455932617
Attention linproj: 0.0016982555389404297
QKV Transform: 0.00669097900390625
Flash: 0.009408950805664062
Attention linproj: 0.0017023086547851562
QKV Transform: 0.0059587955474853516
Flash: 0.00940084457397461
Attention linproj: 0.001722574234008789
QKV Transform: 0.006693124771118164
Flash: 0.009410858154296875
Attention linproj: 0.001699686050415039
QKV Transform: 0.005788564682006836
Flash: 0.014647960662841797
Attention linproj: 0.001714944839477539
QKV Transform: 0.006562709808349609
Flash: 0.009398698806762695
Attention linproj: 0.0017006397247314453
QKV Transform: 0.0057718753814697266
Flash: 0.009414911270141602
Attention linproj: 0.0016999244689941406
QKV Transform: 0.005827665328979492
Flash: 0.009400367736816406
Attention linproj: 0.0017018318176269531
QKV Transform: 0.005784749984741211
Flash: 0.00942087173461914
Attention linproj: 0.0017032623291015625
QKV Transform: 0.006652355194091797
Flash: 0.009413957595825195
Attention linproj: 0.0017027854919433594
QKV Transform: 0.006678342819213867
Flash: 0.009418487548828125
Attention linproj: 0.0016989707946777344
QKV Transform: 0.006597280502319336
Flash: 0.009401082992553711
Attention linproj: 0.0017001628875732422
QKV Transform: 0.0062103271484375
Flash: 0.009462594985961914
Attention linproj: 0.0016834735870361328
QKV Transform: 0.005795717239379883
Flash: 0.010497570037841797
Attention linproj: 0.0016834735870361328
QKV Transform: 0.0058345794677734375
Flash: 0.00945591926574707
Attention linproj: 0.001688241958618164
QKV Transform: 0.005779743194580078
Flash: 0.00945734977722168
Attention linproj: 0.0016837120056152344
QKV Transform: 0.0066111087799072266
Flash: 0.00946664810180664
Attention linproj: 0.0016903877258300781
QKV Transform: 0.005610227584838867
Flash: 0.009470939636230469
Attention linproj: 0.001688241958618164
QKV Transform: 0.0056591033935546875
Flash: 0.00945281982421875
Attention linproj: 0.001684427261352539
QKV Transform: 0.005719184875488281
Flash: 0.009450674057006836
Attention linproj: 0.001680612564086914
QKV Transform: 0.005784034729003906
Flash: 0.009456396102905273
Attention linproj: 0.0016906261444091797
QKV Transform: 0.0057446956634521484
Flash: 0.009459733963012695
Attention linproj: 0.0016896724700927734
QKV Transform: 0.005728483200073242
Flash: 0.009467601776123047
Attention linproj: 0.0016849040985107422
QKV Transform: 0.0057523250579833984
Flash: 0.009459733963012695
Attention linproj: 0.0016863346099853516
QKV Transform: 0.005760908126831055
Flash: 0.009460210800170898
Attention linproj: 0.0016891956329345703
QKV Transform: 0.005677461624145508
Flash: 0.009451866149902344
Attention linproj: 0.0016977787017822266
QKV Transform: 0.006618022918701172
Flash: 0.009455204010009766
Attention linproj: 0.0017096996307373047
QKV Transform: 0.0065991878509521484
Flash: 0.009452104568481445
Attention linproj: 0.0017082691192626953
QKV Transform: 0.005740642547607422
Flash: 0.009420633316040039
Attention linproj: 0.0016870498657226562
QKV Transform: 0.0057680606842041016
Flash: 0.009449481964111328
Attention linproj: 0.0016837120056152344
QKV Transform: 0.0056629180908203125
Flash: 0.009450674057006836
Attention linproj: 0.0016858577728271484
QKV Transform: 0.0066204071044921875
Flash: 0.00944972038269043
Attention linproj: 0.001682281494140625
QKV Transform: 0.005728244781494141
Flash: 0.009447336196899414
Attention linproj: 0.0016841888427734375
QKV Transform: 0.005759239196777344
Flash: 0.010497331619262695
Attention linproj: 0.0016870498657226562
QKV Transform: 0.00564122200012207
Flash: 0.012602567672729492
Attention linproj: 0.001695871353149414
QKV Transform: 0.005609035491943359
Flash: 0.009453773498535156
Attention linproj: 0.0016880035400390625
QKV Transform: 0.006627082824707031
Flash: 0.009455680847167969
Attention linproj: 0.0016853809356689453
QKV Transform: 0.005637168884277344
Flash: 0.00945591926574707
Attention linproj: 0.001697540283203125
QKV Transform: 0.006182432174682617
Flash: 0.009454965591430664
Attention linproj: 0.0017085075378417969
QKV Transform: 0.0064771175384521484
Flash: 0.010503530502319336
Attention linproj: 0.0016880035400390625
QKV Transform: 0.0062754154205322266
Flash: 0.011567354202270508
Attention linproj: 0.0016896724700927734
QKV Transform: 0.005675554275512695
Flash: 0.009446859359741211
Attention linproj: 0.0016894340515136719
QKV Transform: 0.005711078643798828
Flash: 0.009467601776123047
Attention linproj: 0.0016884803771972656
QKV Transform: 0.005688905715942383
Flash: 0.00947880744934082
Attention linproj: 0.0016889572143554688
QKV Transform: 0.005761384963989258
Flash: 0.009436368942260742
Attention linproj: 0.0017070770263671875
QKV Transform: 0.005697011947631836
Flash: 0.009458780288696289
Attention linproj: 0.0016939640045166016
QKV Transform: 0.0057218074798583984
Flash: 0.009455442428588867
Attention linproj: 0.0016851425170898438
QKV Transform: 0.005805492401123047
Flash: 0.009454011917114258
Attention linproj: 0.0016849040985107422
QKV Transform: 0.005661725997924805
Flash: 0.009451866149902344
Attention linproj: 0.001683950424194336
QKV Transform: 0.005734443664550781
Flash: 0.009459972381591797
Attention linproj: 0.0017096996307373047
QKV Transform: 0.006653785705566406
Flash: 0.009454727172851562
Attention linproj: 0.0017087459564208984
QKV Transform: 0.006639957427978516
Flash: 0.009454965591430664
Attention linproj: 0.001697540283203125
QKV Transform: 0.005805492401123047
Flash: 0.009444475173950195
Attention linproj: 0.001684427261352539
QKV Transform: 0.005806684494018555
Flash: 0.009446382522583008
Attention linproj: 0.00168609619140625
QKV Transform: 0.005658149719238281
Flash: 0.009447336196899414
Attention linproj: 0.0016889572143554688
QKV Transform: 0.00569915771484375
Flash: 0.009467601776123047
Attention linproj: 0.0016872882843017578
QKV Transform: 0.005751132965087891
Flash: 0.009457111358642578
Attention linproj: 0.0016894340515136719
QKV Transform: 0.005745887756347656
Flash: 0.009467840194702148
Attention linproj: 0.0016956329345703125
QKV Transform: 0.00573277473449707
Flash: 0.009447574615478516
Attention linproj: 0.0016922950744628906
QKV Transform: 0.00662541389465332
Flash: 0.009457826614379883
Attention linproj: 0.0016903877258300781
QKV Transform: 0.005618572235107422
Flash: 0.009448051452636719
Attention linproj: 0.0016901493072509766
QKV Transform: 0.006652116775512695
Flash: 0.009446859359741211
Attention linproj: 0.0016870498657226562
QKV Transform: 0.006650447845458984
Flash: 0.009445667266845703
Attention linproj: 0.001688241958618164
QKV Transform: 0.005655527114868164
Flash: 0.009439706802368164
Attention linproj: 0.0017096996307373047
QKV Transform: 0.0058672428131103516
Flash: 0.009428262710571289
Attention linproj: 0.0016906261444091797
QKV Transform: 0.005640983581542969
Flash: 0.009453773498535156
Attention linproj: 0.0016851425170898438
QKV Transform: 0.005696773529052734
Flash: 0.009461641311645508
Attention linproj: 0.0016875267028808594
QKV Transform: 0.005818843841552734
Flash: 0.009456872940063477
Attention linproj: 0.0016934871673583984
QKV Transform: 0.005800962448120117
Flash: 0.010488748550415039
Attention linproj: 0.0016934871673583984
QKV Transform: 0.005752086639404297
Flash: 0.009462356567382812
Attention linproj: 0.0016901493072509766
QKV Transform: 0.0066525936126708984
Flash: 0.009455680847167969
Attention linproj: 0.0016865730285644531
QKV Transform: 0.005746364593505859
Flash: 0.009462356567382812
Attention linproj: 0.0016925334930419922
QKV Transform: 0.005616188049316406
Flash: 0.009455442428588867
Attention linproj: 0.0016884803771972656
QKV Transform: 0.005767107009887695
Flash: 0.009450912475585938
Attention linproj: 0.001692056655883789
QKV Transform: 0.005710124969482422
Flash: 0.010499000549316406
Attention linproj: 0.0016911029815673828
QKV Transform: 0.006806612014770508
Flash: 0.012595891952514648
Attention linproj: 0.0016930103302001953
QKV Transform: 0.005651235580444336
Flash: 0.009443998336791992
Attention linproj: 0.0016906261444091797
QKV Transform: 0.0066912174224853516
Flash: 0.00944972038269043
Attention linproj: 0.0016870498657226562
QKV Transform: 0.005677938461303711
Flash: 0.009452581405639648
Attention linproj: 0.0016896724700927734
QKV Transform: 0.005828142166137695
Flash: 0.009463310241699219
Attention linproj: 0.001689910888671875
QKV Transform: 0.0057525634765625
Flash: 0.009442806243896484
Attention linproj: 0.0016870498657226562
QKV Transform: 0.005753040313720703
Flash: 0.009446859359741211
Attention linproj: 0.0016973018646240234
QKV Transform: 0.005671262741088867
Flash: 0.009458780288696289
Attention linproj: 0.001691579818725586
QKV Transform: 0.0057260990142822266
Flash: 0.00944375991821289
Attention linproj: 0.0016863346099853516
QKV Transform: 0.005711793899536133
Flash: 0.00944662094116211
Attention linproj: 0.0016918182373046875
QKV Transform: 0.005706310272216797
Flash: 0.009435176849365234
Attention linproj: 0.0016894340515136719
QKV Transform: 0.005789995193481445
Flash: 0.00944066047668457
Attention linproj: 0.0016894340515136719
QKV Transform: 0.005793094635009766
Flash: 0.009440898895263672
Attention linproj: 0.0016880035400390625
QKV Transform: 0.0057752132415771484
Flash: 0.0094451904296875
Attention linproj: 0.0016925334930419922
QKV Transform: 0.0057373046875
Flash: 0.009447097778320312
Attention linproj: 0.0016911029815673828
QKV Transform: 0.006791591644287109
Flash: 0.009436607360839844
Attention linproj: 0.0016942024230957031
QKV Transform: 0.00572514533996582
Flash: 0.009448528289794922
Attention linproj: 0.0016875267028808594
QKV Transform: 0.006656169891357422
Flash: 0.00946044921875
Attention linproj: 0.0016908645629882812
QKV Transform: 0.006829500198364258
Flash: 0.009455204010009766
Attention linproj: 0.0016868114471435547
QKV Transform: 0.005813121795654297
Flash: 0.01049494743347168
Attention linproj: 0.001688241958618164
QKV Transform: 0.006529331207275391
Flash: 0.009456872940063477
Attention linproj: 0.0016865730285644531
QKV Transform: 0.005784749984741211
Flash: 0.00944066047668457
Attention linproj: 0.0016949176788330078
QKV Transform: 0.005824089050292969
Flash: 0.009463071823120117
Attention linproj: 0.0016918182373046875
QKV Transform: 0.005652904510498047
Flash: 0.009450674057006836
Attention linproj: 0.0017120838165283203
QKV Transform: 0.0066301822662353516
Flash: 0.009453773498535156
Attention linproj: 0.0016908645629882812
QKV Transform: 0.00658726692199707
Flash: 0.00946044921875
Attention linproj: 0.0016930103302001953
QKV Transform: 0.0066432952880859375
Flash: 0.009443283081054688
Attention linproj: 0.001689910888671875
QKV Transform: 0.005662202835083008
Flash: 0.010502815246582031
Attention linproj: 0.0016922950744628906
QKV Transform: 0.0065686702728271484
Flash: 0.009453058242797852
Attention linproj: 0.0016868114471435547
QKV Transform: 0.005721569061279297
Flash: 0.009444713592529297
Attention linproj: 0.001688241958618164
QKV Transform: 0.005694389343261719
Flash: 0.009444713592529297
Attention linproj: 0.0016906261444091797
QKV Transform: 0.00585627555847168
Flash: 0.00944972038269043
Attention linproj: 0.0016925334930419922
QKV Transform: 0.006662607192993164
Flash: 0.009439468383789062
Attention linproj: 0.001691579818725586
QKV Transform: 0.00574183464050293
Flash: 0.009453535079956055
Attention linproj: 0.0016846656799316406
QKV Transform: 0.005774021148681641
Flash: 0.009464263916015625
Attention linproj: 0.0016911029815673828
QKV Transform: 0.005805253982543945
Flash: 0.009441375732421875
Attention linproj: 0.0016906261444091797
QKV Transform: 0.005738258361816406
Flash: 0.009465456008911133
Attention linproj: 0.0016937255859375
QKV Transform: 0.005704164505004883
Flash: 0.009444952011108398
Attention linproj: 0.0016918182373046875
QKV Transform: 0.005733966827392578
Flash: 0.00945425033569336
Attention linproj: 0.0016884803771972656
QKV Transform: 0.006680011749267578
Flash: 0.009457588195800781
Attention linproj: 0.0016894340515136719
QKV Transform: 0.005815744400024414
Flash: 0.009451150894165039
Attention linproj: 0.0016934871673583984
QKV Transform: 0.0056591033935546875
Flash: 0.009435653686523438
Attention linproj: 0.0016856193542480469
QKV Transform: 0.005759000778198242
Flash: 0.00944662094116211
Attention linproj: 0.0016927719116210938
QKV Transform: 0.006668567657470703
Flash: 0.00944972038269043
Attention linproj: 0.001697540283203125
QKV Transform: 0.005690097808837891
Flash: 0.009438037872314453
Attention linproj: 0.001689910888671875
QKV Transform: 0.005864143371582031
Flash: 0.009449958801269531
Attention linproj: 0.0016918182373046875
QKV Transform: 0.0057680606842041016
Flash: 0.00943899154663086
Attention linproj: 0.00168609619140625
QKV Transform: 0.005646705627441406
Flash: 0.009432792663574219
Attention linproj: 0.0017049312591552734
QKV Transform: 0.005754947662353516
Flash: 0.00944662094116211
Attention linproj: 0.0016942024230957031
QKV Transform: 0.005815744400024414
Flash: 0.009438276290893555
Attention linproj: 0.0016963481903076172
QKV Transform: 0.0066509246826171875
Flash: 0.009441852569580078
Attention linproj: 0.0016887187957763672
QKV Transform: 0.005726337432861328
Flash: 0.009442329406738281
Attention linproj: 0.0016906261444091797
QKV Transform: 0.006603240966796875
Flash: 0.010496854782104492
Attention linproj: 0.0016965866088867188
QKV Transform: 0.005703926086425781
Flash: 0.009444713592529297
Attention linproj: 0.0016875267028808594
QKV Transform: 0.005677223205566406
Flash: 0.009440898895263672
Attention linproj: 0.0016934871673583984
QKV Transform: 0.006634950637817383
Flash: 0.00945281982421875
Attention linproj: 0.0016903877258300781
QKV Transform: 0.00667881965637207
Flash: 0.009453773498535156
Attention linproj: 0.0016865730285644531
QKV Transform: 0.005694150924682617
Flash: 0.009436845779418945
Attention linproj: 0.0016918182373046875
QKV Transform: 0.005666494369506836
Flash: 0.009462118148803711
Attention linproj: 0.0016903877258300781
QKV Transform: 0.006478309631347656
Flash: 0.00944972038269043
Attention linproj: 0.0016927719116210938
QKV Transform: 0.005814552307128906
Flash: 0.010498046875
Attention linproj: 0.0016875267028808594
QKV Transform: 0.0057621002197265625
Flash: 0.011547565460205078
Attention linproj: 0.0017447471618652344
QKV Transform: 0.005667924880981445
Flash: 0.009394168853759766
Attention linproj: 0.001695871353149414
QKV Transform: 0.005610227584838867
Flash: 0.00945281982421875
Attention linproj: 0.0016944408416748047
QKV Transform: 0.006778717041015625
Flash: 0.009453773498535156
Attention linproj: 0.001695394515991211
QKV Transform: 0.005722522735595703
Flash: 0.009433746337890625
Attention linproj: 0.0016906261444091797
QKV Transform: 0.005657672882080078
Flash: 0.009441614151000977
Attention linproj: 0.0016922950744628906
QKV Transform: 0.005826473236083984
Flash: 0.009446382522583008
Attention linproj: 0.0016963481903076172
Attention duration (in seconds): 0.0191
Attention throughput (in TFLOP/s): 108.192
MLP_h_4h: 1.9207894802093506
MLP_4h_h: 0.006655454635620117
MLP_h_4h: 0.007151126861572266
MLP_4h_h: 0.006555318832397461
MLP_h_4h: 0.007151603698730469
MLP_4h_h: 0.006548881530761719
MLP_h_4h: 0.007161378860473633
MLP_4h_h: 0.006587028503417969
MLP_h_4h: 0.007266521453857422
MLP_4h_h: 0.00658869743347168
MLP_h_4h: 0.007382631301879883
MLP_4h_h: 0.006646156311035156
MLP_h_4h: 0.007417201995849609
MLP_4h_h: 0.006642341613769531
MLP_h_4h: 0.007420539855957031
MLP_4h_h: 0.006652116775512695
MLP_h_4h: 0.007421731948852539
MLP_4h_h: 0.0066449642181396484
MLP_h_4h: 0.007420778274536133
MLP_4h_h: 0.006662130355834961
MLP_h_4h: 0.007476806640625
MLP_4h_h: 0.006655216217041016
MLP_h_4h: 0.00741887092590332
MLP_4h_h: 0.006649494171142578
MLP_h_4h: 0.0074198246002197266
MLP_4h_h: 0.006602764129638672
MLP_h_4h: 0.0073320865631103516
MLP_4h_h: 0.00658726692199707
MLP_h_4h: 0.007385730743408203
MLP_4h_h: 0.006646394729614258
MLP_h_4h: 0.00741887092590332
MLP_4h_h: 0.0066471099853515625
MLP_h_4h: 0.007418155670166016
MLP_4h_h: 0.0066471099853515625
MLP_h_4h: 0.007418394088745117
MLP_4h_h: 0.006644725799560547
MLP_h_4h: 0.007429361343383789
MLP_4h_h: 0.006644487380981445
MLP_h_4h: 0.007421731948852539
MLP_4h_h: 0.006641864776611328
MLP_h_4h: 0.0074231624603271484
MLP_4h_h: 0.006642580032348633
MLP_h_4h: 0.007417440414428711
MLP_4h_h: 0.006644487380981445
MLP_h_4h: 0.0074236392974853516
MLP_4h_h: 0.0066449642181396484
MLP_h_4h: 0.007418632507324219
MLP_4h_h: 0.0066432952880859375
MLP_h_4h: 0.007420778274536133
MLP_4h_h: 0.006644010543823242
MLP_h_4h: 0.0074350833892822266
MLP_4h_h: 0.006642818450927734
MLP_h_4h: 0.007417201995849609
MLP_4h_h: 0.006643056869506836
MLP_h_4h: 0.007423877716064453
MLP_4h_h: 0.006644010543823242
MLP_h_4h: 0.007419109344482422
MLP_4h_h: 0.006644725799560547
MLP_h_4h: 0.007421016693115234
MLP_4h_h: 0.0066449642181396484
MLP_h_4h: 0.007416248321533203
MLP_4h_h: 0.006640911102294922
MLP_h_4h: 0.007421731948852539
MLP_4h_h: 0.00664520263671875
MLP_h_4h: 0.007386207580566406
MLP_4h_h: 0.006596565246582031
MLP_h_4h: 0.007338047027587891
MLP_4h_h: 0.006604433059692383
MLP_h_4h: 0.00733494758605957
MLP_4h_h: 0.00659632682800293
MLP_h_4h: 0.007380962371826172
MLP_4h_h: 0.006647825241088867
MLP_h_4h: 0.007422208786010742
MLP_4h_h: 0.006642818450927734
MLP_h_4h: 0.007421255111694336
MLP_4h_h: 0.00664520263671875
MLP_h_4h: 0.007416486740112305
MLP_4h_h: 0.0066449642181396484
MLP_h_4h: 0.007430076599121094
MLP_4h_h: 0.0066492557525634766
MLP_h_4h: 0.00741887092590332
MLP_4h_h: 0.006644725799560547
MLP_h_4h: 0.00741887092590332
MLP_4h_h: 0.006644010543823242
MLP_h_4h: 0.00741887092590332
MLP_4h_h: 0.0066432952880859375
MLP_h_4h: 0.007422685623168945
MLP_4h_h: 0.006643533706665039
MLP_h_4h: 0.007420539855957031
MLP_4h_h: 0.006640195846557617
MLP_h_4h: 0.007420539855957031
MLP_4h_h: 0.006643533706665039
MLP_h_4h: 0.007431983947753906
MLP_4h_h: 0.0066416263580322266
MLP_h_4h: 0.007416486740112305
MLP_4h_h: 0.00664210319519043
MLP_h_4h: 0.007421016693115234
MLP_4h_h: 0.006639719009399414
MLP_h_4h: 0.0074193477630615234
MLP_4h_h: 0.006640911102294922
MLP_h_4h: 0.0074176788330078125
MLP_4h_h: 0.006643056869506836
MLP_h_4h: 0.0074198246002197266
MLP_4h_h: 0.006644248962402344
MLP_h_4h: 0.007419586181640625
MLP_4h_h: 0.006647348403930664
MLP_h_4h: 0.007418394088745117
MLP_4h_h: 0.006636381149291992
MLP_h_4h: 0.007335186004638672
MLP_4h_h: 0.006602048873901367
MLP_h_4h: 0.007335186004638672
MLP_4h_h: 0.006601572036743164
MLP_h_4h: 0.00733494758605957
MLP_4h_h: 0.00660252571105957
MLP_h_4h: 0.00733184814453125
MLP_4h_h: 0.00660395622253418
MLP_h_4h: 0.007376194000244141
MLP_4h_h: 0.006647586822509766
MLP_h_4h: 0.007420539855957031
MLP_4h_h: 0.006646394729614258
MLP_h_4h: 0.007419109344482422
MLP_4h_h: 0.006646156311035156
MLP_h_4h: 0.007419109344482422
MLP_4h_h: 0.006644010543823242
MLP_h_4h: 0.007433414459228516
MLP_4h_h: 0.0066471099853515625
MLP_h_4h: 0.007418632507324219
MLP_4h_h: 0.0066454410552978516
MLP_h_4h: 0.0074176788330078125
MLP_4h_h: 0.00664520263671875
MLP_h_4h: 0.007464170455932617
MLP_4h_h: 0.0068776607513427734
MLP_h_4h: 0.007689237594604492
MLP_4h_h: 0.006922245025634766
MLP_h_4h: 0.007787466049194336
MLP_4h_h: 0.006966590881347656
MLP_h_4h: 0.007784366607666016
MLP_4h_h: 0.0069620609283447266
MLP_h_4h: 0.007776021957397461
MLP_4h_h: 0.006876230239868164
MLP_h_4h: 0.007742404937744141
MLP_4h_h: 0.006938934326171875
MLP_h_4h: 0.007608652114868164
MLP_4h_h: 0.006795644760131836
MLP_h_4h: 0.007597684860229492
MLP_4h_h: 0.00679469108581543
MLP_h_4h: 0.007599830627441406
MLP_4h_h: 0.0067291259765625
MLP_h_4h: 0.007505893707275391
MLP_4h_h: 0.006712198257446289
MLP_h_4h: 0.0075304508209228516
MLP_4h_h: 0.006713390350341797
MLP_h_4h: 0.007506608963012695
MLP_4h_h: 0.00670933723449707
MLP_h_4h: 0.0075054168701171875
MLP_4h_h: 0.0066928863525390625
MLP_h_4h: 0.007435798645019531
MLP_4h_h: 0.006644725799560547
MLP_h_4h: 0.0074198246002197266
MLP_4h_h: 0.006650209426879883
MLP_h_4h: 0.007417440414428711
MLP_4h_h: 0.006645917892456055
MLP_h_4h: 0.007423877716064453
MLP_4h_h: 0.006649017333984375
MLP_h_4h: 0.0074214935302734375
MLP_4h_h: 0.006645679473876953
MLP_h_4h: 0.007428169250488281
MLP_4h_h: 0.006650686264038086
MLP_h_4h: 0.0074310302734375
MLP_4h_h: 0.006655216217041016
MLP_h_4h: 0.0074312686920166016
MLP_4h_h: 0.006651401519775391
MLP_h_4h: 0.007429838180541992
MLP_4h_h: 0.0066525936126708984
MLP_h_4h: 0.0074291229248046875
MLP_4h_h: 0.0066606998443603516
MLP_h_4h: 0.007441282272338867
MLP_4h_h: 0.0066525936126708984
MLP_h_4h: 0.007432460784912109
MLP_4h_h: 0.006647586822509766
MLP_h_4h: 0.007421970367431641
MLP_4h_h: 0.006649017333984375
MLP_h_4h: 0.007418394088745117
MLP_4h_h: 0.006648063659667969
MLP_h_4h: 0.00741887092590332
MLP_4h_h: 0.006659269332885742
MLP_h_4h: 0.007421970367431641
MLP_4h_h: 0.006646394729614258
MLP_h_4h: 0.00741887092590332
MLP_4h_h: 0.0066525936126708984
MLP_h_4h: 0.007421970367431641
MLP_4h_h: 0.0066492557525634766
MLP_h_4h: 0.007416248321533203
MLP_4h_h: 0.0066471099853515625
MLP_h_4h: 0.007422685623168945
MLP_4h_h: 0.006646871566772461
MLP_h_4h: 0.0074193477630615234
MLP_4h_h: 0.0066449642181396484
MLP_h_4h: 0.007391929626464844
MLP_4h_h: 0.0066072940826416016
MLP_h_4h: 0.0073375701904296875
MLP_4h_h: 0.006608486175537109
MLP_h_4h: 0.0073375701904296875
MLP_4h_h: 0.006605625152587891
MLP_h_4h: 0.007380962371826172
MLP_4h_h: 0.006645917892456055
MLP_h_4h: 0.0074214935302734375
MLP_4h_h: 0.006652116775512695
MLP_h_4h: 0.007418155670166016
MLP_4h_h: 0.006645917892456055
MLP_h_4h: 0.007421255111694336
MLP_4h_h: 0.006650686264038086
MLP_h_4h: 0.007421731948852539
MLP_4h_h: 0.00664520263671875
MLP_h_4h: 0.007421970367431641
MLP_4h_h: 0.006649494171142578
MLP_h_4h: 0.007417440414428711
MLP_4h_h: 0.006646156311035156
MLP_h_4h: 0.007401227951049805
MLP_4h_h: 0.0066111087799072266
MLP_h_4h: 0.007334470748901367
MLP_4h_h: 0.0066070556640625
MLP_h_4h: 0.0073354244232177734
MLP_4h_h: 0.006613492965698242
MLP_h_4h: 0.0073702335357666016
MLP_4h_h: 0.006645917892456055
MLP_h_4h: 0.0074214935302734375
MLP_4h_h: 0.0066487789154052734
MLP_h_4h: 0.007416486740112305
MLP_4h_h: 0.006647586822509766
MLP_h_4h: 0.007420539855957031
MLP_4h_h: 0.006652116775512695
MLP_h_4h: 0.007422208786010742
MLP_4h_h: 0.0066492557525634766
MLP_h_4h: 0.007418632507324219
MLP_4h_h: 0.0066487789154052734
MLP_h_4h: 0.007434368133544922
MLP_4h_h: 0.00665283203125
MLP_h_4h: 0.007418155670166016
MLP_4h_h: 0.0066530704498291016
MLP_h_4h: 0.0074231624603271484
MLP_4h_h: 0.0066335201263427734
MLP_h_4h: 0.0073337554931640625
MLP_4h_h: 0.006610393524169922
MLP_h_4h: 0.007340908050537109
MLP_4h_h: 0.0066089630126953125
MLP_h_4h: 0.007335186004638672
MLP_4h_h: 0.006605863571166992
MLP_h_4h: 0.007338762283325195
MLP_4h_h: 0.006606340408325195
MLP_h_4h: 0.007379293441772461
MLP_4h_h: 0.0066530704498291016
MLP_h_4h: 0.007428169250488281
MLP_4h_h: 0.0066471099853515625
MLP_h_4h: 0.0074198246002197266
MLP_4h_h: 0.0066471099853515625
MLP_h_4h: 0.007421016693115234
MLP_4h_h: 0.006651639938354492
MLP_h_4h: 0.00741887092590332
MLP_4h_h: 0.00664830207824707
MLP_h_4h: 0.0074193477630615234
MLP_4h_h: 0.006654977798461914
MLP_h_4h: 0.007440090179443359
MLP_4h_h: 0.0066487789154052734
MLP_h_4h: 0.007451057434082031
MLP_4h_h: 0.006714820861816406
MLP_h_4h: 0.007512331008911133
MLP_4h_h: 0.006711244583129883
MLP_h_4h: 0.007505655288696289
MLP_4h_h: 0.006712436676025391
MLP_h_4h: 0.0075037479400634766
MLP_4h_h: 0.006714344024658203
MLP_h_4h: 0.0075054168701171875
MLP_4h_h: 0.006710529327392578
MLP_h_4h: 0.0075113773345947266
MLP_4h_h: 0.00670933723449707
MLP_h_4h: 0.0075037479400634766
MLP_4h_h: 0.006711006164550781
MLP_h_4h: 0.007512092590332031
MLP_4h_h: 0.006714344024658203
MLP_h_4h: 0.007505893707275391
MLP_4h_h: 0.006710052490234375
MLP_h_4h: 0.007509946823120117
MLP_4h_h: 0.0067138671875
MLP_h_4h: 0.007505178451538086
MLP_4h_h: 0.006712675094604492
MLP_h_4h: 0.0075075626373291016
MLP_4h_h: 0.006711483001708984
MLP_h_4h: 0.007517576217651367
MLP_4h_h: 0.006709575653076172
MLP_h_4h: 0.0075054168701171875
MLP_4h_h: 0.006710052490234375
MLP_h_4h: 0.007508754730224609
MLP_4h_h: 0.006715059280395508
MLP_h_4h: 0.007506370544433594
MLP_4h_h: 0.006690502166748047
MLP_h_4h: 0.0074198246002197266
MLP_4h_h: 0.006651639938354492
MLP_h_4h: 0.007420063018798828
MLP_4h_h: 0.006651163101196289
MLP duration (in seconds): 0.0141
MLP throughput (in TFLOP/s): 243.388
LN1: 0.0034399032592773438
QKV Transform: 0.005773782730102539
Flash: 0.004411935806274414
Attention linproj: 0.001750946044921875
Post-attention Dropout: 0.0627593994140625
Post-attention residual: 0.002286672592163086
LN2: 0.0002090930938720703
MLP_h_4h: 0.007080078125
MLP_4h_h: 0.0065076351165771484
Post-MLP residual: 0.002236604690551758
Attention layer time: 0.09699368476867676
LN1: 0.00021266937255859375
QKV Transform: 0.0060579776763916016
Flash: 0.009376049041748047
Attention linproj: 0.0016970634460449219
Post-attention Dropout: 0.0006148815155029297
Post-attention residual: 0.00020623207092285156
LN2: 0.00019216537475585938
MLP_h_4h: 0.00829005241394043
MLP_4h_h: 0.006514549255371094
Post-MLP residual: 0.0006029605865478516
Attention layer time: 0.03418779373168945
LN1: 0.0002090930938720703
QKV Transform: 0.006093502044677734
Flash: 0.009407281875610352
Attention linproj: 0.0016987323760986328
Post-attention Dropout: 0.0006132125854492188
Post-attention residual: 0.00020813941955566406
LN2: 0.0001926422119140625
MLP_h_4h: 0.008277177810668945
MLP_4h_h: 0.006516695022583008
Post-MLP residual: 0.0006177425384521484
Attention layer time: 0.034253835678100586
LN1: 0.00021004676818847656
QKV Transform: 0.005780935287475586
Flash: 0.009423017501831055
Attention linproj: 0.0016951560974121094
Post-attention Dropout: 0.0006153583526611328
Post-attention residual: 0.00020694732666015625
LN2: 0.00019216537475585938
MLP_h_4h: 0.008305549621582031
MLP_4h_h: 0.0065233707427978516
Post-MLP residual: 0.0006136894226074219
Attention layer time: 0.03396868705749512
LN1: 0.0002079010009765625
QKV Transform: 0.0058863162994384766
Flash: 0.009436368942260742
Attention linproj: 0.00171661376953125
Post-attention Dropout: 0.0006165504455566406
Post-attention residual: 0.00020503997802734375
LN2: 0.00019359588623046875
MLP_h_4h: 0.008260250091552734
MLP_4h_h: 0.006536722183227539
Post-MLP residual: 0.0006194114685058594
Attention layer time: 0.03409624099731445
LN1: 0.0002086162567138672
QKV Transform: 0.005858182907104492
Flash: 0.009432792663574219
Attention linproj: 0.0016987323760986328
Post-attention Dropout: 0.0006144046783447266
Post-attention residual: 0.00020766258239746094
LN2: 0.00019407272338867188
MLP_h_4h: 0.008289575576782227
MLP_4h_h: 0.006529331207275391
Post-MLP residual: 0.0006144046783447266
Attention layer time: 0.03404831886291504
LN1: 0.00020766258239746094
QKV Transform: 0.005697965621948242
Flash: 0.009424448013305664
Attention linproj: 0.001699209213256836
Post-attention Dropout: 0.0006196498870849609
Post-attention residual: 0.00020623207092285156
LN2: 0.0001914501190185547
MLP_h_4h: 0.008276224136352539
MLP_4h_h: 0.00651860237121582
Post-MLP residual: 0.0006170272827148438
Attention layer time: 0.03387951850891113
LN1: 0.00022721290588378906
QKV Transform: 0.005782604217529297
Flash: 0.00944209098815918
Attention linproj: 0.001695871353149414
Post-attention Dropout: 0.0006167888641357422
Post-attention residual: 0.00020742416381835938
LN2: 0.00019025802612304688
MLP_h_4h: 0.008278369903564453
MLP_4h_h: 0.0065114498138427734
Post-MLP residual: 0.0006177425384521484
Attention layer time: 0.03396344184875488
LN1: 0.00020837783813476562
QKV Transform: 0.006078004837036133
Flash: 0.009447574615478516
Attention linproj: 0.0016934871673583984
Post-attention Dropout: 0.0006144046783447266
Post-attention residual: 0.00020623207092285156
LN2: 0.0001926422119140625
MLP_h_4h: 0.008289098739624023
MLP_4h_h: 0.00651097297668457
Post-MLP residual: 0.0006189346313476562
Attention layer time: 0.03425407409667969
LN1: 0.00020742416381835938
QKV Transform: 0.0059244632720947266
Flash: 0.009418964385986328
Attention linproj: 0.0016934871673583984
Post-attention Dropout: 0.0006113052368164062
Post-attention residual: 0.0002067089080810547
LN2: 0.00020313262939453125
MLP_h_4h: 0.00829625129699707
MLP_4h_h: 0.006524085998535156
Post-MLP residual: 0.0006177425384521484
Attention layer time: 0.034110307693481445
LN1: 0.00020742416381835938
QKV Transform: 0.005948543548583984
Flash: 0.009440898895263672
Attention linproj: 0.001695394515991211
Post-attention Dropout: 0.0006322860717773438
Post-attention residual: 0.00020623207092285156
LN2: 0.00019311904907226562
MLP_h_4h: 0.008277177810668945
MLP_4h_h: 0.006532430648803711
Post-MLP residual: 0.0006163120269775391
Attention layer time: 0.03415846824645996
LN1: 0.0002079010009765625
QKV Transform: 0.005986928939819336
Flash: 0.00942087173461914
Attention linproj: 0.0016963481903076172
Post-attention Dropout: 0.0006182193756103516
Post-attention residual: 0.0002079010009765625
LN2: 0.0001919269561767578
MLP_h_4h: 0.008292675018310547
MLP_4h_h: 0.006520509719848633
Post-MLP residual: 0.000621795654296875
Attention layer time: 0.03416252136230469
LN1: 0.00020837783813476562
QKV Transform: 0.005887746810913086
Flash: 0.00944828987121582
Attention linproj: 0.0016994476318359375
Post-attention Dropout: 0.00061798095703125
Post-attention residual: 0.00020742416381835938
LN2: 0.00019168853759765625
MLP_h_4h: 0.008281946182250977
MLP_4h_h: 0.006525278091430664
Post-MLP residual: 0.0006189346313476562
Attention layer time: 0.03408312797546387
LN1: 0.000209808349609375
QKV Transform: 0.00580906867980957
Flash: 0.009432315826416016
Attention linproj: 0.0016939640045166016
Post-attention Dropout: 0.0006151199340820312
Post-attention residual: 0.00020599365234375
LN2: 0.00019121170043945312
MLP_h_4h: 0.008297204971313477
MLP_4h_h: 0.006514549255371094
Post-MLP residual: 0.00061798095703125
Attention layer time: 0.03399538993835449
LN1: 0.0002071857452392578
QKV Transform: 0.0058443546295166016
Flash: 0.00943613052368164
Attention linproj: 0.0016973018646240234
Post-attention Dropout: 0.0006148815155029297
Post-attention residual: 0.00020742416381835938
LN2: 0.0001938343048095703
MLP_h_4h: 0.008515596389770508
MLP_4h_h: 0.006525754928588867
Post-MLP residual: 0.0006220340728759766
Attention layer time: 0.03426003456115723
LN1: 0.00020837783813476562
QKV Transform: 0.0057239532470703125
Flash: 0.009424686431884766
Attention linproj: 0.0016949176788330078
Post-attention Dropout: 0.0006177425384521484
Post-attention residual: 0.0002067089080810547
LN2: 0.0001964569091796875
MLP_h_4h: 0.008295297622680664
MLP_4h_h: 0.006525754928588867
Post-MLP residual: 0.0006177425384521484
Attention layer time: 0.03391408920288086
LN1: 0.00020742416381835938
QKV Transform: 0.00583648681640625
Flash: 0.009437799453735352
Attention linproj: 0.0016913414001464844
Post-attention Dropout: 0.0006189346313476562
Post-attention residual: 0.00020813941955566406
LN2: 0.0001914501190185547
MLP_h_4h: 0.008297443389892578
MLP_4h_h: 0.006522178649902344
Post-MLP residual: 0.0006189346313476562
Attention layer time: 0.034027099609375
LN1: 0.00020933151245117188
QKV Transform: 0.005716800689697266
Flash: 0.009418725967407227
Attention linproj: 0.0016961097717285156
Post-attention Dropout: 0.0006167888641357422
Post-attention residual: 0.00020647048950195312
LN2: 0.00019073486328125
MLP_h_4h: 0.008304595947265625
MLP_4h_h: 0.0065155029296875
Post-MLP residual: 0.0006163120269775391
Attention layer time: 0.03388667106628418
LN1: 0.00020694732666015625
QKV Transform: 0.0056743621826171875
Flash: 0.009436607360839844
Attention linproj: 0.0016942024230957031
Post-attention Dropout: 0.0006144046783447266
Post-attention residual: 0.00020647048950195312
LN2: 0.00019097328186035156
MLP_h_4h: 0.00829935073852539
MLP_4h_h: 0.006518125534057617
Post-MLP residual: 0.0006170272827148438
Attention layer time: 0.0338742733001709
LN1: 0.00020837783813476562
QKV Transform: 0.005830049514770508
Flash: 0.009439468383789062
Attention linproj: 0.0016970634460449219
Post-attention Dropout: 0.0006136894226074219
Post-attention residual: 0.00020837783813476562
LN2: 0.00019240379333496094
MLP_h_4h: 0.008287906646728516
MLP_4h_h: 0.006523609161376953
Post-MLP residual: 0.0006165504455566406
Attention layer time: 0.03401756286621094
LN1: 0.0002071857452392578
QKV Transform: 0.0056836605072021484
Flash: 0.00943899154663086
Attention linproj: 0.0016984939575195312
Post-attention Dropout: 0.0006177425384521484
Post-attention residual: 0.0002090930938720703
LN2: 0.00019240379333496094
MLP_h_4h: 0.008266448974609375
MLP_4h_h: 0.006520271301269531
Post-MLP residual: 0.0006196498870849609
Attention layer time: 0.03386521339416504
LN1: 0.00020766258239746094
QKV Transform: 0.006057024002075195
Flash: 0.009433746337890625