benchmarks_51842.out
1.13.1
[2023-11-20 17:51:55,554] [INFO] [distributed.py:36:init_distributed] Not using the DeepSpeed or torch.distributed launchers, attempting to detect MPI environment...
[2023-11-20 17:51:56,362] [INFO] [distributed.py:83:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=26.0.148.159, master_port=6000
[2023-11-20 17:51:56,362] [INFO] [distributed.py:46:init_distributed] Initializing torch distributed with backend: nccl
[2023-11-20 17:51:59,316] [INFO] [checkpointing.py:223:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
num_attention_heads: 20, hidden_size: 2560, train_micro_batch_size_per_gpu: 4, tensor_mp_size: 1, pipeline_mp_size: 1, dp_size: 1
Actual
------
QKV Transform: 2.16180157661438
Flash: 0.019667625427246094
Attention linproj: 0.00694584846496582
QKV Transform: 0.002521514892578125
Flash: 0.007395505905151367
Attention linproj: 0.0005259513854980469
QKV Transform: 0.002070903778076172
Flash: 0.007428407669067383
Attention linproj: 0.0005309581756591797
QKV Transform: 0.0021505355834960938
Flash: 0.007422924041748047
Attention linproj: 0.0005233287811279297
QKV Transform: 0.002098560333251953
Flash: 0.007433414459228516
Attention linproj: 0.0005288124084472656
QKV Transform: 0.002145051956176758
Flash: 0.007421255111694336
Attention linproj: 0.0005202293395996094
QKV Transform: 0.002095460891723633
Flash: 0.007433414459228516
Attention linproj: 0.0005176067352294922
QKV Transform: 0.002071857452392578
Flash: 0.007438182830810547
Attention linproj: 0.0005185604095458984
QKV Transform: 0.002113819122314453
Flash: 0.0074312686920166016
Attention linproj: 0.000518798828125
QKV Transform: 0.0020904541015625
Flash: 0.008472681045532227
Attention linproj: 0.0005171298980712891
QKV Transform: 0.0020530223846435547
Flash: 0.0074291229248046875
Attention linproj: 0.0005211830139160156
QKV Transform: 0.0020771026611328125
Flash: 0.007427215576171875
Attention linproj: 0.0005216598510742188
QKV Transform: 0.002044200897216797
Flash: 0.00743412971496582
Attention linproj: 0.0005207061767578125
QKV Transform: 0.002208709716796875
Flash: 0.007400035858154297
Attention linproj: 0.0005190372467041016
QKV Transform: 0.002244710922241211
Flash: 0.00744318962097168
Attention linproj: 0.0005180835723876953
QKV Transform: 0.002679586410522461
Flash: 0.007431507110595703
Attention linproj: 0.0005204677581787109
QKV Transform: 0.0022525787353515625
Flash: 0.0074121952056884766
Attention linproj: 0.0005202293395996094
QKV Transform: 0.0022101402282714844
Flash: 0.007417201995849609
Attention linproj: 0.0005245208740234375
QKV Transform: 0.0021288394927978516
Flash: 0.007419586181640625
Attention linproj: 0.0005242824554443359
QKV Transform: 0.0021331310272216797
Flash: 0.007421016693115234
Attention linproj: 0.0005240440368652344
QKV Transform: 0.002000093460083008
Flash: 0.01058340072631836
Attention linproj: 0.0005183219909667969
QKV Transform: 0.00189208984375
Flash: 0.007416486740112305
Attention linproj: 0.000518798828125
QKV Transform: 0.002138853073120117
Flash: 0.015842914581298828
Attention linproj: 0.0005207061767578125
QKV Transform: 0.0021491050720214844
Flash: 0.007416248321533203
Attention linproj: 0.0005211830139160156
QKV Transform: 0.0020644664764404297
Flash: 0.007415771484375
Attention linproj: 0.0005202293395996094
QKV Transform: 0.002214193344116211
Flash: 0.007421731948852539
Attention linproj: 0.0005197525024414062
QKV Transform: 0.002188444137573242
Flash: 0.007444858551025391
Attention linproj: 0.0005209445953369141
QKV Transform: 0.0020875930786132812
Flash: 0.007420063018798828
Attention linproj: 0.0005233287811279297
QKV Transform: 0.002192974090576172
Flash: 0.007404327392578125
Attention linproj: 0.000522613525390625
QKV Transform: 0.002038717269897461
Flash: 0.007420778274536133
Attention linproj: 0.0005307197570800781
QKV Transform: 0.002088785171508789
Flash: 0.007403135299682617
Attention linproj: 0.0005190372467041016
QKV Transform: 0.0020291805267333984
Flash: 0.007432222366333008
Attention linproj: 0.0005185604095458984
QKV Transform: 0.002109050750732422
Flash: 0.007421016693115234
Attention linproj: 0.0005207061767578125
QKV Transform: 0.0021653175354003906
Flash: 0.0074388980865478516
Attention linproj: 0.0005180835723876953
QKV Transform: 0.002201557159423828
Flash: 0.007418632507324219
Attention linproj: 0.0005185604095458984
QKV Transform: 0.0022192001342773438
Flash: 0.007440090179443359
Attention linproj: 0.0005137920379638672
QKV Transform: 0.002146005630493164
Flash: 0.007421731948852539
Attention linproj: 0.000518798828125
QKV Transform: 0.0021665096282958984
Flash: 0.0074310302734375
Attention linproj: 0.0005421638488769531
QKV Transform: 0.0021080970764160156
Flash: 0.007440090179443359
Attention linproj: 0.000522613525390625
QKV Transform: 0.0020647048950195312
Flash: 0.0074405670166015625
Attention linproj: 0.0005116462707519531
QKV Transform: 0.002063751220703125
Flash: 0.007431983947753906
Attention linproj: 0.0005197525024414062
QKV Transform: 0.0020394325256347656
Flash: 0.007417917251586914
Attention linproj: 0.0005199909210205078
QKV Transform: 0.002108335494995117
Flash: 0.0074236392974853516
Attention linproj: 0.0005209445953369141
QKV Transform: 0.002054929733276367
Flash: 0.007432222366333008
Attention linproj: 0.0005202293395996094
QKV Transform: 0.002190113067626953
Flash: 0.007428646087646484
Attention linproj: 0.0005197525024414062
QKV Transform: 0.002157926559448242
Flash: 0.0074329376220703125
Attention linproj: 0.0005207061767578125
QKV Transform: 0.002213001251220703
Flash: 0.007435798645019531
Attention linproj: 0.0005207061767578125
QKV Transform: 0.0021774768829345703
Flash: 0.0074198246002197266
Attention linproj: 0.0005242824554443359
QKV Transform: 0.0021152496337890625
Flash: 0.007422208786010742
Attention linproj: 0.0005297660827636719
QKV Transform: 0.0021047592163085938
Flash: 0.00741887092590332
Attention linproj: 0.0005218982696533203
QKV Transform: 0.002147197723388672
Flash: 0.007436275482177734
Attention linproj: 0.0005252361297607422
QKV Transform: 0.0020978450775146484
Flash: 0.007436513900756836
Attention linproj: 0.0005240440368652344
QKV Transform: 0.0021746158599853516
Flash: 0.007429838180541992
Attention linproj: 0.0005228519439697266
QKV Transform: 0.0020503997802734375
Flash: 0.007425785064697266
Attention linproj: 0.0005218982696533203
QKV Transform: 0.0021483898162841797
Flash: 0.007428169250488281
Attention linproj: 0.0005209445953369141
QKV Transform: 0.0020809173583984375
Flash: 0.007432222366333008
Attention linproj: 0.0005192756652832031
QKV Transform: 0.002090930938720703
Flash: 0.010580301284790039
Attention linproj: 0.0005228519439697266
QKV Transform: 0.0022079944610595703
Flash: 0.007429361343383789
Attention linproj: 0.0005223751068115234
QKV Transform: 0.002155780792236328
Flash: 0.007425546646118164
Attention linproj: 0.0005238056182861328
QKV Transform: 0.0021631717681884766
Flash: 0.007420539855957031
Attention linproj: 0.0005238056182861328
QKV Transform: 0.00213623046875
Flash: 0.007436275482177734
Attention linproj: 0.0005233287811279297
QKV Transform: 0.002000570297241211
Flash: 0.007429838180541992
Attention linproj: 0.000518798828125
QKV Transform: 0.0020742416381835938
Flash: 0.007440328598022461
Attention linproj: 0.0005357265472412109
QKV Transform: 0.0019922256469726562
Flash: 0.0074307918548583984
Attention linproj: 0.0005197525024414062
QKV Transform: 0.002114534378051758
Flash: 0.0074329376220703125
Attention linproj: 0.0005214214324951172
QKV Transform: 0.002074718475341797
Flash: 0.007422447204589844
Attention linproj: 0.000518798828125
QKV Transform: 0.0027942657470703125
Flash: 0.007437467575073242
Attention linproj: 0.000518798828125
QKV Transform: 0.0021584033966064453
Flash: 0.007427692413330078
Attention linproj: 0.0005185604095458984
QKV Transform: 0.002139568328857422
Flash: 0.008482694625854492
Attention linproj: 0.0005183219909667969
QKV Transform: 0.002065896987915039
Flash: 0.007435321807861328
Attention linproj: 0.0005195140838623047
QKV Transform: 0.0021393299102783203
Flash: 0.007447719573974609
Attention linproj: 0.000518798828125
QKV Transform: 0.002131223678588867
Flash: 0.007433891296386719
Attention linproj: 0.0005211830139160156
QKV Transform: 0.0021238327026367188
Flash: 0.007436990737915039
Attention linproj: 0.0005214214324951172
QKV Transform: 0.0021812915802001953
Flash: 0.007432460784912109
Attention linproj: 0.0005185604095458984
QKV Transform: 0.0021202564239501953
Flash: 0.00742030143737793
Attention linproj: 0.0005247592926025391
QKV Transform: 0.0021457672119140625
Flash: 0.007423877716064453
Attention linproj: 0.0005233287811279297
QKV Transform: 0.0021359920501708984
Flash: 0.010580778121948242
Attention linproj: 0.0005261898040771484
QKV Transform: 0.002043485641479492
Flash: 0.007440805435180664
Attention linproj: 0.0005214214324951172
QKV Transform: 0.0022513866424560547
Flash: 0.007414340972900391
Attention linproj: 0.0005207061767578125
QKV Transform: 0.002095460891723633
Flash: 0.007431745529174805
Attention linproj: 0.0005195140838623047
QKV Transform: 0.00213623046875
Flash: 0.007435798645019531
Attention linproj: 0.0005190372467041016
QKV Transform: 0.002142190933227539
Flash: 0.007432222366333008
Attention linproj: 0.0005190372467041016
QKV Transform: 0.0021457672119140625
Flash: 0.0074193477630615234
Attention linproj: 0.0005207061767578125
QKV Transform: 0.0022182464599609375
Flash: 0.007428646087646484
Attention linproj: 0.0005180835723876953
QKV Transform: 0.0021347999572753906
Flash: 0.007419586181640625
Attention linproj: 0.0005252361297607422
QKV Transform: 0.0021169185638427734
Flash: 0.007436990737915039
Attention linproj: 0.0005254745483398438
QKV Transform: 0.0021119117736816406
Flash: 0.00743412971496582
Attention linproj: 0.0005238056182861328
QKV Transform: 0.0020890235900878906
Flash: 0.007431745529174805
Attention linproj: 0.0005238056182861328
QKV Transform: 0.0020911693572998047
Flash: 0.007421731948852539
Attention linproj: 0.0005214214324951172
QKV Transform: 0.0020656585693359375
Flash: 0.0074346065521240234
Attention linproj: 0.0005185604095458984
QKV Transform: 0.002109050750732422
Flash: 0.00744938850402832
Attention linproj: 0.0005180835723876953
QKV Transform: 0.002090930938720703
Flash: 0.007431507110595703
Attention linproj: 0.0005185604095458984
QKV Transform: 0.0021474361419677734
Flash: 0.007440090179443359
Attention linproj: 0.0005197525024414062
QKV Transform: 0.0021545886993408203
Flash: 0.0074384212493896484
Attention linproj: 0.0005230903625488281
QKV Transform: 0.0021076202392578125
Flash: 0.007422685623168945
Attention linproj: 0.0005233287811279297
QKV Transform: 0.002115011215209961
Flash: 0.009522438049316406
Attention linproj: 0.0005185604095458984
QKV Transform: 0.0020482540130615234
Flash: 0.00743556022644043
Attention linproj: 0.0005202293395996094
QKV Transform: 0.0018014907836914062
Flash: 0.007445573806762695
Attention linproj: 0.000518798828125
QKV Transform: 0.0020737648010253906
Flash: 0.007436275482177734
Attention linproj: 0.0005183219909667969
QKV Transform: 0.002213716506958008
Flash: 0.007447004318237305
Attention linproj: 0.0005230903625488281
QKV Transform: 0.002139568328857422
Flash: 0.0074160099029541016
Attention linproj: 0.0005218982696533203
QKV Transform: 0.002190113067626953
Flash: 0.007421016693115234
Attention linproj: 0.0005259513854980469
QKV Transform: 0.002214670181274414
Flash: 0.007425546646118164
Attention linproj: 0.0005238056182861328
QKV Transform: 0.0021276473999023438
Flash: 0.0074460506439208984
Attention linproj: 0.0005197525024414062
QKV Transform: 0.0020294189453125
Flash: 0.007416963577270508
Attention linproj: 0.0005192756652832031
QKV Transform: 0.0020194053649902344
Flash: 0.007432699203491211
Attention linproj: 0.0005216598510742188
QKV Transform: 0.0020265579223632812
Flash: 0.007428407669067383
Attention linproj: 0.0005192756652832031
QKV Transform: 0.0020422935485839844
Flash: 0.007431983947753906
Attention linproj: 0.000518798828125
QKV Transform: 0.0021572113037109375
Flash: 0.007441282272338867
Attention linproj: 0.0005211830139160156
QKV Transform: 0.002182483673095703
Flash: 0.007430076599121094
Attention linproj: 0.000518798828125
QKV Transform: 0.0021126270294189453
Flash: 0.007429599761962891
Attention linproj: 0.0005221366882324219
QKV Transform: 0.0019834041595458984
Flash: 0.007429599761962891
Attention linproj: 0.0005233287811279297
QKV Transform: 0.002113819122314453
Flash: 0.007428407669067383
Attention linproj: 0.0005316734313964844
QKV Transform: 0.002155303955078125
Flash: 0.007436275482177734
Attention linproj: 0.0005404949188232422
QKV Transform: 0.0019779205322265625
Flash: 0.0074312686920166016
Attention linproj: 0.0005345344543457031
QKV Transform: 0.0020220279693603516
Flash: 0.007426738739013672
Attention linproj: 0.0005185604095458984
QKV Transform: 0.0020720958709716797
Flash: 0.007444620132446289
Attention linproj: 0.0005195140838623047
QKV Transform: 0.0020210742950439453
Flash: 0.007428646087646484
Attention linproj: 0.0005207061767578125
QKV Transform: 0.002151966094970703
Flash: 0.007428407669067383
Attention linproj: 0.0005202293395996094
QKV Transform: 0.0021216869354248047
Flash: 0.007442951202392578
Attention linproj: 0.000518798828125
QKV Transform: 0.002246379852294922
Flash: 0.007406711578369141
Attention linproj: 0.0005228519439697266
QKV Transform: 0.001940011978149414
Flash: 0.007424116134643555
Attention linproj: 0.0005183219909667969
QKV Transform: 0.0022134780883789062
Flash: 0.007442951202392578
Attention linproj: 0.0005242824554443359
QKV Transform: 0.0019109249114990234
Flash: 0.007424831390380859
Attention linproj: 0.0005233287811279297
QKV Transform: 0.002093791961669922
Flash: 0.007447242736816406
Attention linproj: 0.0005235671997070312
QKV Transform: 0.002061128616333008
Flash: 0.0074307918548583984
Attention linproj: 0.0005204677581787109
QKV Transform: 0.001953601837158203
Flash: 0.007439136505126953
Attention linproj: 0.0005185604095458984
QKV Transform: 0.0020749568939208984
Flash: 0.00742030143737793
Attention linproj: 0.0005190372467041016
QKV Transform: 0.002124309539794922
Flash: 0.007438182830810547
Attention linproj: 0.0005192756652832031
QKV Transform: 0.002104520797729492
Flash: 0.007430076599121094
Attention linproj: 0.0005192756652832031
QKV Transform: 0.002179861068725586
Flash: 0.00742650032043457
Attention linproj: 0.000518798828125
QKV Transform: 0.002192258834838867
Flash: 0.0074253082275390625
Attention linproj: 0.0005192756652832031
QKV Transform: 0.0022346973419189453
Flash: 0.007409811019897461
Attention linproj: 0.0005247592926025391
QKV Transform: 0.0028848648071289062
Flash: 0.00772404670715332
Attention linproj: 0.0005323886871337891
QKV Transform: 0.0021219253540039062
Flash: 0.007417917251586914
Attention linproj: 0.0005192756652832031
QKV Transform: 0.002156496047973633
Flash: 0.0074329376220703125
Attention linproj: 0.0005249977111816406
QKV Transform: 0.002182483673095703
Flash: 0.007432460784912109
Attention linproj: 0.0005233287811279297
QKV Transform: 0.002177715301513672
Flash: 0.0074274539947509766
Attention linproj: 0.0005223751068115234
QKV Transform: 0.0021789073944091797
Flash: 0.007424354553222656
Attention linproj: 0.0005257129669189453
QKV Transform: 0.002171039581298828
Flash: 0.007428407669067383
Attention linproj: 0.0005249977111816406
QKV Transform: 0.0020439624786376953
Flash: 0.007431745529174805
Attention linproj: 0.0005211830139160156
QKV Transform: 0.002110719680786133
Flash: 0.007437944412231445
Attention linproj: 0.0005199909210205078
QKV Transform: 0.0020804405212402344
Flash: 0.00744175910949707
Attention linproj: 0.0005197525024414062
QKV Transform: 0.0020689964294433594
Flash: 0.007421016693115234
Attention linproj: 0.0005221366882324219
QKV Transform: 0.002100706100463867
Flash: 0.007435798645019531
Attention linproj: 0.0005192756652832031
QKV Transform: 0.002079010009765625
Flash: 0.00743556022644043
Attention linproj: 0.0005183219909667969
QKV Transform: 0.0021576881408691406
Flash: 0.007421255111694336
Attention linproj: 0.0005204677581787109
QKV Transform: 0.0021419525146484375
Flash: 0.007433652877807617
Attention linproj: 0.000522613525390625
QKV Transform: 0.002185344696044922
Flash: 0.007430076599121094
Attention linproj: 0.0005252361297607422
QKV Transform: 0.0021522045135498047
Flash: 0.007427692413330078
Attention linproj: 0.0005297660827636719
Attention duration (in seconds): 0.0116
Attention throughput (in TFLOP/s): 51.728
MLP_h_4h: 1.9001247882843018
MLP_4h_h: 0.001791238784790039
MLP_h_4h: 0.0022580623626708984
MLP_4h_h: 0.0017049312591552734
MLP_h_4h: 0.002227783203125
MLP_4h_h: 0.0016760826110839844
MLP_h_4h: 0.0022215843200683594
MLP_4h_h: 0.001674652099609375
MLP_h_4h: 0.002221822738647461
MLP_4h_h: 0.001676321029663086
MLP_h_4h: 0.0022377967834472656
MLP_4h_h: 0.0016837120056152344
MLP_h_4h: 0.0022361278533935547
MLP_4h_h: 0.0016880035400390625
MLP_h_4h: 0.0022344589233398438
MLP_4h_h: 0.0016846656799316406
MLP_h_4h: 0.00222015380859375
MLP_4h_h: 0.001695394515991211
MLP_h_4h: 0.0022232532501220703
MLP_4h_h: 0.0016896724700927734
MLP_h_4h: 0.0022215843200683594
MLP_4h_h: 0.0016853809356689453
MLP_h_4h: 0.0022242069244384766
MLP_4h_h: 0.0016868114471435547
MLP_h_4h: 0.0022416114807128906
MLP_4h_h: 0.0016894340515136719
MLP_h_4h: 0.002242565155029297
MLP_4h_h: 0.0016944408416748047
MLP_h_4h: 0.0022437572479248047
MLP_4h_h: 0.0017001628875732422
MLP_h_4h: 0.002239227294921875
MLP_4h_h: 0.0016896724700927734
MLP_h_4h: 0.002239704132080078
MLP_4h_h: 0.001691579818725586
MLP_h_4h: 0.002253293991088867
MLP_4h_h: 0.0016880035400390625
MLP_h_4h: 0.002253293991088867
MLP_4h_h: 0.0016968250274658203
MLP_h_4h: 0.0022537708282470703
MLP_4h_h: 0.0016932487487792969
MLP_h_4h: 0.002250194549560547
MLP_4h_h: 0.0016865730285644531
MLP_h_4h: 0.0022542476654052734
MLP_4h_h: 0.0016870498657226562
MLP_h_4h: 0.0022497177124023438
MLP_4h_h: 0.0016934871673583984
MLP_h_4h: 0.0022513866424560547
MLP_4h_h: 0.0016937255859375
MLP_h_4h: 0.0022530555725097656
MLP_4h_h: 0.0016930103302001953
MLP_h_4h: 0.002253293991088867
MLP_4h_h: 0.0017027854919433594
MLP_h_4h: 0.002254009246826172
MLP_4h_h: 0.0016932487487792969
MLP_h_4h: 0.002271413803100586
MLP_4h_h: 0.0016927719116210938
MLP_h_4h: 0.002263307571411133
MLP_4h_h: 0.0016987323760986328
MLP_h_4h: 0.002274036407470703
MLP_4h_h: 0.0016984939575195312
MLP_h_4h: 0.002271890640258789
MLP_4h_h: 0.0016987323760986328
MLP_h_4h: 0.002274036407470703
MLP_4h_h: 0.0016911029815673828
MLP_h_4h: 0.002268075942993164
MLP_4h_h: 0.0016977787017822266
MLP_h_4h: 0.002269268035888672
MLP_4h_h: 0.001707315444946289
MLP_h_4h: 0.0022728443145751953
MLP_4h_h: 0.0016949176788330078
MLP_h_4h: 0.00225830078125
MLP_4h_h: 0.0017042160034179688
MLP_h_4h: 0.0022711753845214844
MLP_4h_h: 0.0017023086547851562
MLP_h_4h: 0.002269268035888672
MLP_4h_h: 0.0017023086547851562
MLP_h_4h: 0.002274036407470703
MLP_4h_h: 0.0017025470733642578
MLP_h_4h: 0.0022695064544677734
MLP_4h_h: 0.0017006397247314453
MLP_h_4h: 0.002270936965942383
MLP_4h_h: 0.0016980171203613281
MLP_h_4h: 0.002269268035888672
MLP_4h_h: 0.0016925334930419922
MLP_h_4h: 0.002268552780151367
MLP_4h_h: 0.0016987323760986328
MLP_h_4h: 0.00226593017578125
MLP_4h_h: 0.001703500747680664
MLP_h_4h: 0.0022704601287841797
MLP_4h_h: 0.0016984939575195312
MLP_h_4h: 0.002270221710205078
MLP_4h_h: 0.0017001628875732422
MLP_h_4h: 0.002270221710205078
MLP_4h_h: 0.0016989707946777344
MLP_h_4h: 0.0022661685943603516
MLP_4h_h: 0.001697540283203125
MLP_h_4h: 0.002266407012939453
MLP_4h_h: 0.0016932487487792969
MLP_h_4h: 0.0022673606872558594
MLP_4h_h: 0.0017039775848388672
MLP_h_4h: 0.0022666454315185547
MLP_4h_h: 0.001703023910522461
MLP_h_4h: 0.002267122268676758
MLP_4h_h: 0.0016999244689941406
MLP_h_4h: 0.0022690296173095703
MLP_4h_h: 0.0016977787017822266
MLP_h_4h: 0.002268075942993164
MLP_4h_h: 0.0016984939575195312
MLP_h_4h: 0.002268552780151367
MLP_4h_h: 0.001699209213256836
MLP_h_4h: 0.0022699832916259766
MLP_4h_h: 0.0016987323760986328
MLP_h_4h: 0.002273082733154297
MLP_4h_h: 0.0016999244689941406
MLP_h_4h: 0.0022678375244140625
MLP_4h_h: 0.0016989707946777344
MLP_h_4h: 0.0022695064544677734
MLP_4h_h: 0.0017099380493164062
MLP_h_4h: 0.002286195755004883
MLP_4h_h: 0.00170135498046875
MLP_h_4h: 0.0022842884063720703
MLP_4h_h: 0.0016927719116210938
MLP_h_4h: 0.002287149429321289
MLP_4h_h: 0.0016908645629882812
MLP_h_4h: 0.0022699832916259766
MLP_4h_h: 0.0016903877258300781
MLP_h_4h: 0.0022759437561035156
MLP_4h_h: 0.0016937255859375
MLP_h_4h: 0.002271413803100586
MLP_4h_h: 0.0016930103302001953
MLP_h_4h: 0.0022716522216796875
MLP_4h_h: 0.0016918182373046875
MLP_h_4h: 0.0022726058959960938
MLP_4h_h: 0.001692056655883789
MLP_h_4h: 0.002271890640258789
MLP_4h_h: 0.001691579818725586
MLP_h_4h: 0.002270936965942383
MLP_4h_h: 0.0016908645629882812
MLP_h_4h: 0.0022733211517333984
MLP_4h_h: 0.0016913414001464844
MLP_h_4h: 0.0022716522216796875
MLP_4h_h: 0.0016908645629882812
MLP_h_4h: 0.0022726058959960938
MLP_4h_h: 0.0016937255859375
MLP_h_4h: 0.002261638641357422
MLP_4h_h: 0.0016989707946777344
MLP_h_4h: 0.002298593521118164
MLP_4h_h: 0.001699209213256836
MLP_h_4h: 0.002295255661010742
MLP_4h_h: 0.0016977787017822266
MLP_h_4h: 0.002297639846801758
MLP_4h_h: 0.0016994476318359375
MLP_h_4h: 0.0022974014282226562
MLP_4h_h: 0.0016987323760986328
MLP_h_4h: 0.002285480499267578
MLP_4h_h: 0.0016977787017822266
MLP_h_4h: 0.0022890567779541016
MLP_4h_h: 0.0016980171203613281
MLP_h_4h: 0.002295970916748047
MLP_4h_h: 0.0016980171203613281
MLP_h_4h: 0.0023059844970703125
MLP_4h_h: 0.0016977787017822266
MLP_h_4h: 0.0022935867309570312
MLP_4h_h: 0.0016984939575195312
MLP_h_4h: 0.0022962093353271484
MLP_4h_h: 0.001699686050415039
MLP_h_4h: 0.0022962093353271484
MLP_4h_h: 0.0017168521881103516
MLP_h_4h: 0.002302408218383789
MLP_4h_h: 0.0016989707946777344
MLP_h_4h: 0.0023012161254882812
MLP_4h_h: 0.001699686050415039
MLP_h_4h: 0.002300262451171875
MLP_4h_h: 0.0016982555389404297
MLP_h_4h: 0.0023038387298583984
MLP_4h_h: 0.001699686050415039
MLP_h_4h: 0.002302885055541992
MLP_4h_h: 0.0016989707946777344
MLP_h_4h: 0.002301931381225586
MLP_4h_h: 0.0017101764678955078
MLP_h_4h: 0.0022966861724853516
MLP_4h_h: 0.0017194747924804688
MLP_h_4h: 0.0022935867309570312
MLP_4h_h: 0.0017023086547851562
MLP_h_4h: 0.0022954940795898438
MLP_4h_h: 0.001703023910522461
MLP_h_4h: 0.00229644775390625
MLP_4h_h: 0.0017027854919433594
MLP_h_4h: 0.002299785614013672
MLP_4h_h: 0.0017039775848388672
MLP_h_4h: 0.002294778823852539
MLP_4h_h: 0.0017046928405761719
MLP_h_4h: 0.0022940635681152344
MLP_4h_h: 0.0017049312591552734
MLP_h_4h: 0.0022935867309570312
MLP_4h_h: 0.001695871353149414
MLP_h_4h: 0.0022661685943603516
MLP_4h_h: 0.0016970634460449219
MLP_h_4h: 0.002271890640258789
MLP_4h_h: 0.00170135498046875
MLP_h_4h: 0.002268552780151367
MLP_4h_h: 0.0016982555389404297
MLP_h_4h: 0.002271890640258789
MLP_4h_h: 0.0016965866088867188
MLP_h_4h: 0.0022695064544677734
MLP_4h_h: 0.0017018318176269531
MLP_h_4h: 0.0022978782653808594
MLP_4h_h: 0.0017044544219970703
MLP_h_4h: 0.0022945404052734375
MLP_4h_h: 0.0017025470733642578
MLP_h_4h: 0.0022940635681152344
MLP_4h_h: 0.0017042160034179688
MLP_h_4h: 0.002294301986694336
MLP_4h_h: 0.001705169677734375
MLP_h_4h: 0.0022945404052734375
MLP_4h_h: 0.0017049312591552734
MLP_h_4h: 0.0023086071014404297
MLP_4h_h: 0.0017075538635253906
MLP_h_4h: 0.002298593521118164
MLP_4h_h: 0.0017056465148925781
MLP_h_4h: 0.0022993087768554688
MLP_4h_h: 0.0017092227935791016
MLP_h_4h: 0.002302885055541992
MLP_4h_h: 0.0017058849334716797
MLP_h_4h: 0.002298593521118164
MLP_4h_h: 0.0017199516296386719
MLP_h_4h: 0.0022957324981689453
MLP_4h_h: 0.0017082691192626953
MLP_h_4h: 0.002287149429321289
MLP_4h_h: 0.0017075538635253906
MLP_h_4h: 0.0022962093353271484
MLP_4h_h: 0.0017170906066894531
MLP_h_4h: 0.002295255661010742
MLP_4h_h: 0.0017070770263671875
MLP_h_4h: 0.00229644775390625
MLP_4h_h: 0.001707315444946289
MLP_h_4h: 0.0022945404052734375
MLP_4h_h: 0.001706838607788086
MLP_h_4h: 0.002294301986694336
MLP_4h_h: 0.0017085075378417969
MLP_h_4h: 0.002292633056640625
MLP_4h_h: 0.0017168521881103516
MLP_h_4h: 0.002295255661010742
MLP_4h_h: 0.0017099380493164062
MLP_h_4h: 0.0022902488708496094
MLP_4h_h: 0.0017066001892089844
MLP_h_4h: 0.0022716522216796875
MLP_4h_h: 0.001708984375
MLP_h_4h: 0.0022704601287841797
MLP_4h_h: 0.0017061233520507812
MLP_h_4h: 0.002270936965942383
MLP_4h_h: 0.0017087459564208984
MLP_h_4h: 0.0022704601287841797
MLP_4h_h: 0.001707315444946289
MLP_h_4h: 0.0022711753845214844
MLP_4h_h: 0.0017054080963134766
MLP_h_4h: 0.0022716522216796875
MLP_4h_h: 0.0017087459564208984
MLP_h_4h: 0.0022716522216796875
MLP_4h_h: 0.0017087459564208984
MLP_h_4h: 0.0022673606872558594
MLP_4h_h: 0.001708984375
MLP_h_4h: 0.0022733211517333984
MLP_4h_h: 0.001707315444946289
MLP_h_4h: 0.002268075942993164
MLP_4h_h: 0.0017082691192626953
MLP_h_4h: 0.0022735595703125
MLP_4h_h: 0.0017116069793701172
MLP_h_4h: 0.002264261245727539
MLP_4h_h: 0.0017054080963134766
MLP_h_4h: 0.0022656917572021484
MLP_4h_h: 0.0017037391662597656
MLP_h_4h: 0.0022721290588378906
MLP_4h_h: 0.0016984939575195312
MLP_h_4h: 0.0022652149200439453
MLP_4h_h: 0.0017080307006835938
MLP_h_4h: 0.0022902488708496094
MLP_4h_h: 0.0017108917236328125
MLP_h_4h: 0.0022895336151123047
MLP_4h_h: 0.0017066001892089844
MLP_h_4h: 0.002287626266479492
MLP_4h_h: 0.0017161369323730469
MLP_h_4h: 0.0022902488708496094
MLP_4h_h: 0.0017201900482177734
MLP_h_4h: 0.002291440963745117
MLP_4h_h: 0.0017108917236328125
MLP_h_4h: 0.002290487289428711
MLP_4h_h: 0.0017104148864746094
MLP_h_4h: 0.0022890567779541016
MLP_4h_h: 0.0017142295837402344
MLP_h_4h: 0.002293825149536133
MLP_4h_h: 0.0017163753509521484
MLP_h_4h: 0.0022895336151123047
MLP_4h_h: 0.001714944839477539
MLP_h_4h: 0.002292156219482422
MLP_4h_h: 0.001714944839477539
MLP_h_4h: 0.0022895336151123047
MLP_4h_h: 0.0017108917236328125
MLP_h_4h: 0.002292156219482422
MLP_4h_h: 0.001714944839477539
MLP duration (in seconds): 0.0040
MLP throughput (in TFLOP/s): 212.697
LN1: 0.004076957702636719
QKV Transform: 0.0016138553619384766
Flash: 0.0022056102752685547
Attention linproj: 0.000537872314453125
Post-attention Dropout: 0.06640791893005371
Post-attention residual: 0.004055500030517578
LN2: 0.00018358230590820312
MLP_h_4h: 0.002364635467529297
MLP_4h_h: 0.0016968250274658203
Post-MLP residual: 0.0020263195037841797
Attention layer time: 0.08559989929199219
LN1: 0.00013256072998046875
QKV Transform: 0.0027561187744140625
Flash: 0.006319761276245117
Attention linproj: 0.0005259513854980469
Post-attention Dropout: 0.0003478527069091797
Post-attention residual: 0.00011515617370605469
LN2: 0.00011682510375976562
MLP_h_4h: 0.0029191970825195312
MLP_4h_h: 0.0017063617706298828
Post-MLP residual: 0.00033164024353027344
Attention layer time: 0.015591144561767578
LN1: 0.0001354217529296875
QKV Transform: 0.002033233642578125
Flash: 0.007412910461425781
Attention linproj: 0.0005230903625488281
Post-attention Dropout: 0.0003440380096435547
Post-attention residual: 0.00011444091796875
LN2: 0.00011754035949707031
MLP_h_4h: 0.003586292266845703
MLP_4h_h: 0.0017185211181640625
Post-MLP residual: 0.00033664703369140625
Attention layer time: 0.01663661003112793
LN1: 0.00013375282287597656
QKV Transform: 0.0026051998138427734
Flash: 0.00742650032043457
Attention linproj: 0.0005223751068115234
Post-attention Dropout: 0.000335693359375
Post-attention residual: 0.00011324882507324219
LN2: 0.00011539459228515625
MLP_h_4h: 0.0036079883575439453
MLP_4h_h: 0.0017113685607910156
Post-MLP residual: 0.0003407001495361328
Attention layer time: 0.01720142364501953
LN1: 0.00013184547424316406
QKV Transform: 0.002663135528564453
Flash: 0.008554220199584961
Attention linproj: 0.0005135536193847656
Post-attention Dropout: 0.0003383159637451172
Post-attention residual: 0.00011324882507324219
LN2: 0.00011587142944335938
MLP_h_4h: 0.0034999847412109375
MLP_4h_h: 0.0017132759094238281
Post-MLP residual: 0.00033974647521972656
Attention layer time: 0.0182955265045166
LN1: 0.00013136863708496094
QKV Transform: 0.0025413036346435547
Flash: 0.007406949996948242
Attention linproj: 0.0005199909210205078
Post-attention Dropout: 0.0003352165222167969
Post-attention residual: 0.00011277198791503906
LN2: 0.00011539459228515625
MLP_h_4h: 0.003621339797973633
MLP_4h_h: 0.0017216205596923828
Post-MLP residual: 0.0003387928009033203
Attention layer time: 0.017144441604614258
LN1: 0.0001323223114013672
QKV Transform: 0.0025320053100585938
Flash: 0.007413625717163086
Attention linproj: 0.0005252361297607422
Post-attention Dropout: 0.00034737586975097656
Post-attention residual: 0.00011396408081054688
LN2: 0.0001163482666015625
MLP_h_4h: 0.003582000732421875
MLP_4h_h: 0.0017137527465820312
Post-MLP residual: 0.0003342628479003906
Attention layer time: 0.01711583137512207
LN1: 0.00014257431030273438
QKV Transform: 0.002551555633544922
Flash: 0.007411003112792969
Attention linproj: 0.0005211830139160156
Post-attention Dropout: 0.0003445148468017578
Post-attention residual: 0.00011277198791503906
LN2: 0.00011706352233886719
MLP_h_4h: 0.003596067428588867
MLP_4h_h: 0.0017139911651611328
Post-MLP residual: 0.0003368854522705078
Attention layer time: 0.0171663761138916
LN1: 0.00013303756713867188
QKV Transform: 0.0026030540466308594
Flash: 0.007436037063598633
Attention linproj: 0.00051116943359375
Post-attention Dropout: 0.0003349781036376953
Post-attention residual: 0.00011205673217773438
LN2: 0.00011396408081054688
MLP_h_4h: 0.010118722915649414
MLP_4h_h: 0.0016980171203613281
Post-MLP residual: 0.0003376007080078125
Attention layer time: 0.023689746856689453
LN1: 0.0001323223114013672
QKV Transform: 0.0022585391998291016
Flash: 0.007416248321533203
Attention linproj: 0.0005235671997070312
Post-attention Dropout: 0.00034427642822265625
Post-attention residual: 0.00011301040649414062
LN2: 0.000133514404296875
MLP_h_4h: 0.003567934036254883
MLP_4h_h: 0.001712799072265625
Post-MLP residual: 0.00033593177795410156
Attention layer time: 0.016841888427734375
LN1: 0.000133514404296875
QKV Transform: 0.0025751590728759766
Flash: 0.007425785064697266
Attention linproj: 0.0005195140838623047
Post-attention Dropout: 0.0003342628479003906
Post-attention residual: 0.00011348724365234375
LN2: 0.00011515617370605469
MLP_h_4h: 0.0035932064056396484
MLP_4h_h: 0.0017082691192626953
Post-MLP residual: 0.000339508056640625
Attention layer time: 0.017147302627563477
LN1: 0.0001304149627685547
QKV Transform: 0.0025348663330078125
Flash: 0.0074329376220703125
Attention linproj: 0.0005214214324951172
Post-attention Dropout: 0.00033664703369140625
Post-attention residual: 0.00011277198791503906
LN2: 0.00011491775512695312
MLP_h_4h: 0.003604412078857422
MLP_4h_h: 0.0017104148864746094
Post-MLP residual: 0.0003368854522705078
Attention layer time: 0.017127275466918945
LN1: 0.00013065338134765625
QKV Transform: 0.002564668655395508
Flash: 0.007422447204589844
Attention linproj: 0.0005414485931396484
Post-attention Dropout: 0.00033855438232421875
Post-attention residual: 0.00011301040649414062
LN2: 0.0001163482666015625
MLP_h_4h: 0.003571033477783203
MLP_4h_h: 0.0017123222351074219
Post-MLP residual: 0.00033545494079589844
Attention layer time: 0.017145872116088867
LN1: 0.00013113021850585938
QKV Transform: 0.0025322437286376953
Flash: 0.0074269771575927734
Attention linproj: 0.0005211830139160156
Post-attention Dropout: 0.000347137451171875
Post-attention residual: 0.00011444091796875
LN2: 0.0001163482666015625
MLP_h_4h: 0.0035791397094726562
MLP_4h_h: 0.0017130374908447266
Post-MLP residual: 0.0003380775451660156
Attention layer time: 0.017121553421020508
LN1: 0.000133514404296875
QKV Transform: 0.002515077590942383
Flash: 0.00743412971496582
Attention linproj: 0.0005199909210205078
Post-attention Dropout: 0.00033283233642578125
Post-attention residual: 0.00011277198791503906
LN2: 0.00011515617370605469
MLP_h_4h: 0.003607511520385742
MLP_4h_h: 0.0017099380493164062
Post-MLP residual: 0.00033783912658691406
Attention layer time: 0.017107725143432617
LN1: 0.00013303756713867188
QKV Transform: 0.002299070358276367
Flash: 0.0074291229248046875
Attention linproj: 0.0005204677581787109
Post-attention Dropout: 0.00033593177795410156
Post-attention residual: 0.00011444091796875
LN2: 0.00011587142944335938
MLP_h_4h: 0.0035991668701171875
MLP_4h_h: 0.0017108917236328125
Post-MLP residual: 0.0003383159637451172
Attention layer time: 0.016889572143554688
LN1: 0.00013065338134765625
QKV Transform: 0.0026144981384277344
Flash: 0.007416486740112305
Attention linproj: 0.0005242824554443359
Post-attention Dropout: 0.0003383159637451172
Post-attention residual: 0.00011157989501953125
LN2: 0.00011515617370605469
MLP_h_4h: 0.003602266311645508
MLP_4h_h: 0.0017092227935791016
Post-MLP residual: 0.00033783912658691406
Attention layer time: 0.01719498634338379
LN1: 0.00013136863708496094
QKV Transform: 0.002560138702392578
Flash: 0.007429838180541992
Attention linproj: 0.0005230903625488281
Post-attention Dropout: 0.0003447532653808594
Post-attention residual: 0.00011491775512695312
LN2: 0.00011706352233886719
MLP_h_4h: 0.0035789012908935547
MLP_4h_h: 0.0017137527465820312
Post-MLP residual: 0.0003361701965332031
Attention layer time: 0.017151594161987305
LN1: 0.0001323223114013672
QKV Transform: 0.0025587081909179688
Flash: 0.007426023483276367
Attention linproj: 0.0005209445953369141
Post-attention Dropout: 0.0003345012664794922
Post-attention residual: 0.00011301040649414062
LN2: 0.00011444091796875
MLP_h_4h: 0.0036139488220214844
MLP_4h_h: 0.001708984375
Post-MLP residual: 0.0003497600555419922
Attention layer time: 0.017160654067993164
LN1: 0.0001308917999267578
QKV Transform: 0.0025517940521240234
Flash: 0.00743412971496582
Attention linproj: 0.0005202293395996094
Post-attention Dropout: 0.0003349781036376953
Post-attention residual: 0.00011277198791503906
LN2: 0.00011396408081054688
MLP_h_4h: 0.003612518310546875
MLP_4h_h: 0.0017087459564208984
Post-MLP residual: 0.0003371238708496094
Attention layer time: 0.01714801788330078
LN1: 0.00013256072998046875
QKV Transform: 0.002493143081665039
Flash: 0.007415294647216797
Attention linproj: 0.0005145072937011719
Post-attention Dropout: 0.00033736228942871094
Post-attention residual: 0.00011324882507324219
LN2: 0.00011610984802246094
MLP_h_4h: 0.003598451614379883
MLP_4h_h: 0.0017120838165283203
Post-MLP residual: 0.0003352165222167969
Attention layer time: 0.017078638076782227
LN1: 0.00013113021850585938
QKV Transform: 0.002537250518798828
Flash: 0.007415771484375
Attention linproj: 0.0005204677581787109