nsys_stat_rnnt

Generating SQLite file nsys_rnnt.sqlite from nsys_rnnt.nsys-rep
Exporting 5097333 events: [================================================100%]
Processing [nsys_rnnt.sqlite] with [/opt/nvidia/nsight-systems/2022.4.1/host-linux-x64/reports/nvtxsum.py]...
 
** NVTX Range Summary (nvtxsum):
 
Time (%)  Total Time (ns)  Instances     Avg (ns)     Med (ns)   Min (ns)    Max (ns)  StdDev (ns)  Style     Range
--------  ---------------  ---------  -----------  -----------  ---------  ----------  -----------  --------  -------------------------------------------------------------------------
    16.6     443864089899      29272   15163435.7   15296093.5     525460    75601741    3195537.9  StartEnd  ENC:run
    16.5     440570476479      29432   14969097.5   15184381.0       5190    75488589    3372848.4  PushPop   TensorRT:ExecutionContext::enqueue
    16.5     440524735477        475  927420495.7  944250581.0    2191669  1876032379  190676855.6  StartEnd  DALI:makeDaliBatch
    16.4     437102430431        475  920215643.0  937182594.0        500  1869174697  190548032.7  StartEnd  Dali::WaitForFreeBufIndexes
    12.0     320069221511      29271   10934686.9   10892785.0     170963    75684312    2995694.1  StartEnd  DEC: run
     8.4     223844401120      29272    7647048.4    7860396.0     233925    67161290     909296.1  PushPop   TensorRT:encoder_post_rnn
     8.1     215289766169      29272    7354802.1    7224804.0     163183    56677083    2946542.9  PushPop   TensorRT:encoder_pre_rnn
     4.7     126851557924      29272    4333546.0    4521646.0       4690  2375412792   14048370.3  StartEnd  DEC: wait for encoder transfer
     0.2       6170572950        474   13018086.4   12493323.0    2537545   177038603    7601351.0  StartEnd  DALI:RunDaliBatch
     0.2       4497367956        474    9488118.1    8508592.0    2094197   176062176    7750985.4  PushPop   DALI:[DALI][Executor] RunGPU
     0.1       2733228433      29272      93373.5        840.0        220  2363594280   13942702.0  StartEnd  ENC: wait for FE
     0.1       2509435816        474    5294168.4    5328020.0      68801     9909657     827584.6  PushPop   DALI:[DALI][GPU op] __Pad_10
     0.1       2286660980      29272      78117.7      39080.5      19760  1068415380    6244583.4  StartEnd  ENC: wait for decoder init
     0.1       2196720878      29272      75045.1      36640.0      28310  1068415660    6244599.1  StartEnd  DEC: tensor init
     0.0        417428989          1  417428989.0  417428989.0  417428989   417428989          0.0  StartEnd  Main::Warmup
     0.0        377987377        474     797441.7     622176.0     136502     6320663     549145.3  PushPop   DALI:[DALI][GPU op] __ArithmeticGenericOp_4
     0.0        364581006      29272      12454.9      13450.0       5680     1980116      12058.0  PushPop   TensorRT:(Unnamed Layer* 1) [ElementWise]
     0.0        307852380        474     649477.6     356526.5     332656     3074355     723197.1  PushPop   DALI:[DALI][GPU op] __Shapes_7
     0.0        240199753        474     506750.5     451243.0     348766     1164240     135254.8  PushPop   DALI:[DALI][GPU op] __PreemphasisFilter_2
     0.0        208167101        474     439171.1     536764.5      55231      757273     213270.2  PushPop   DALI:[DALI][GPU op] __Pad_13
     0.0        197100277      29272       6733.4       6890.0       4230       89671       1743.2  PushPop   TensorRT:(Unnamed Layer* 3) [ElementWise]
     0.0        186097762        474     392611.3     345846.0      40581     1429366     109014.1  PushPop   DALI:[DALI][GPU op] __MelFilterBank_5
     0.0        184406409        474     389043.1      17905.5      11710   171770449    7888491.7  PushPop   DALI:[DALI][GPU op] __Cast_1
     0.0        171980122        474     362827.3     321821.0      83002     3702316     231048.3  PushPop   DALI:[DALI][GPU op] __Spectrogram_3
     0.0        150291959      29272       5134.3       5100.0       1150       48791       2267.5  StartEnd  ENC: transfer state
     0.0         93187617        474     196598.3     103322.0      91382     1869553     351770.2  PushPop   DALI:[DALI][GPU op] __ArithmeticGenericOp_9
     0.0         80646412        474     170140.1     139597.0     100452      554030     104838.9  PushPop   DALI:[DALI][GPU op] __Normalize_12
     0.0         70611257      29272       2412.2       2610.0        530       20580       1193.0  PushPop   TensorRT:encoder_reshape
     0.0         60896172        474     128472.9     111212.0      91842      374417      38591.5  PushPop   DALI:[DALI][GPU op] __ArithmeticGenericOp_8
     0.0         57142150        474     120553.1      39151.0      21320      421878     143969.2  StartEnd  DALI:allocateDaliBatch
     0.0         39039489      29272       1333.7       1310.0        370      381737       3970.4  StartEnd  ENC: wait for decoder engine
     0.0         38015051        948      40100.3      34331.0      16451      363246      22619.6  PushPop   DALI:[DALI][C API] daliOutputCopySamples
     0.0         22590615        474      47659.5      15415.5      10040     1670410     191537.5  PushPop   DALI:[DALI][GPU op] __Cast_14
     0.0         20725409      29272        708.0        590.0        170       19210        586.4  StartEnd  DALI::ReleaseBatch
     0.0         18015637      29272        615.5        580.0        190       17251        563.6  PushPop   TensorRT:(Unnamed Layer* 0) [Constant]
     0.0         16307034        474      34403.0      29525.5      20570     1720801      78452.9  PushPop   DALI:[DALI][GPU op] __ToDecibels_6
     0.0         10605219       7342       1444.5       1450.0        370       21700        712.6  StartEnd  DALI::GetBatch
     0.0         10084700      29272        344.5        280.0        120       14550        511.0  PushPop   TensorRT:(Unnamed Layer* 2) [Constant]
     0.0          8260205        474      17426.6      14985.5      10720      378846      17587.4  PushPop   DALI:[DALI][GPU op] __Reshape_11
     0.0          8219893        474      17341.5      14215.0       9091       43690       7547.2  PushPop   DALI:[DALI][GPU op] INPUT_0
     0.0          7853677        474      16568.9      12985.0       4770      188204      12361.2  PushPop   DALI:[DALI][Executor] RunMixed
     0.0          5642008         12     470167.3     460413.0     420657      563290      39622.8  StartEnd  myelin-exec:myelinGraphLoadPersistent
     0.0          5370178         12     447514.8     435548.0     395577      645251      64397.7  StartEnd  myelinGraphDeserializeBinary
     0.0          4322640        474       9119.5       6895.0       2400      182163       9597.6  PushPop   DALI:[DALI][Executor] RunCPU
     0.0          2428473         40      60711.8      49621.0      34940      461098      65965.0  PushPop   TensorRT:decoder_rnn
     0.0          2152161        474       4540.4       4440.0       1630       14201       1010.7  PushPop   DALI:[DALI][ExternalSource] SetDataSource
     0.0          1439775        475       3031.1       2630.0       1080      140702       6426.3  StartEnd  SyncWorkQueue::GetBatch
     0.0          1113699         40      27842.5      20760.5      10920      152603      22541.7  PushPop   TensorRT:{ForeignNode[(Unnamed Layer* 0) [Constant]...decoder_embedding]}
     0.0           508534         40      12713.4      12520.5       9530       29261       3499.9  PushPop   TensorRT:joint_fc1_b
     0.0           256615         40       6415.4       5925.0       4200       15070       2199.1  PushPop   TensorRT:joint_fc1_a
     0.0           236525         40       5913.1       6170.0       3830       10770       1931.7  StartEnd  myelin-exec:myelinGraphExecute
     0.0           236214         40       5905.4       5090.0       2760       20420       3408.1  PushPop   TensorRT:[HostToDeviceCopy]
     0.0           159635         40       3990.9       3870.0       2560        9350       1439.0  PushPop   TensorRT:Select3
     0.0           157612         72       2189.1        490.0        330      117792      13819.4  StartEnd  myelin-exec:myelinTensorGetMemory
     0.0            33550         40        838.8        775.0        540        1630        253.4  StartEnd  myelin-exec:myelinGraphLoad
     0.0            30981         56        553.2        530.0        260        2850        346.8  StartEnd  myelin-exec:myelinTensorSetMemory
     0.0            25211          1      25211.0      25211.0      25211       25211          0.0  PushPop   DALI:[DALI][Executor] PresizeData
     0.0            17200         40        430.0        390.0        320        1030        132.1  PushPop   TensorRT:Reformatting CopyNode for Input Tensor 0 to joint_fc1_b
     0.0            16760         12       1396.7        960.0        840        4370       1069.8  StartEnd  myelinBinaryGraphCreate
     0.0            13980         40        349.5        315.0        270         620         74.9  PushPop   TensorRT:Reformatting CopyNode for Output Tensor 0 to joint_fc1_b
     0.0            13720         40        343.0        285.0        160        2830        410.4  StartEnd  myelin-exec:myelinGraphUnload
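
The NVTX summary is computed from the ranges in the exported `nsys_rnnt.sqlite`, and its derived columns can be cross-checked against the raw ones. Avg is Total Time divided by Instances (443864089899 / 29272 ≈ 15163435.7 ns for `ENC:run`). Time (%) is consistent with each range's share of the summed Total Time of all rows (about 2.67e12 ns here). Because nested ranges are each counted in full, the percentages describe summed range time rather than wall-clock time. The following is a minimal Python sketch that rebuilds these columns directly from the SQLite export. It assumes an `NVTX_EVENTS` table with `start`/`end` timestamps in nanoseconds, a range name stored either in `text` or through `textId` into `StringIds`, and event-type codes 59 (PushPop) and 60 (StartEnd). These schema details are assumptions to verify against your Nsight Systems version, and `nvtx_range_summary` is a hypothetical helper name.

```python
import sqlite3
import statistics

# ASSUMPTION: nsys export event-type codes for NVTX ranges
# (59 = push/pop range, 60 = start/end range). Verify against the
# export-schema documentation for your Nsight Systems version.
STYLE_BY_EVENT_TYPE = {59: "PushPop", 60: "StartEnd"}

# ASSUMPTION: NVTX_EVENTS stores start/end timestamps in ns, and the range
# name is either inline in `text` or referenced through `textId` -> StringIds.
NVTX_RANGES_QUERY = """
    SELECT e.eventType,
           COALESCE(e.text, s.value) AS name,
           e."end" - e.start AS duration
    FROM NVTX_EVENTS AS e
    LEFT JOIN StringIds AS s ON e.textId = s.id
    WHERE e.eventType IN (59, 60) AND e."end" IS NOT NULL
"""


def nvtx_range_summary(conn):
    """Return nvtxsum-style rows, sorted by total time (descending)."""
    # Group range durations by (style, range name).
    durations = {}
    for event_type, name, duration in conn.execute(NVTX_RANGES_QUERY):
        key = (STYLE_BY_EVENT_TYPE[event_type], name)
        durations.setdefault(key, []).append(duration)

    # Time (%) = share of the summed Total Time of all rows, so nested
    # ranges are counted once per nesting level.
    grand_total = sum(sum(d) for d in durations.values())
    rows = []
    for (style, name), d in durations.items():
        total = sum(d)
        rows.append({
            "time_pct": round(100.0 * total / grand_total, 1),
            "total_ns": total,
            "instances": len(d),
            "avg_ns": total / len(d),
            "med_ns": statistics.median(d),
            "min_ns": min(d),
            "max_ns": max(d),
            # ASSUMPTION: sample standard deviation; 0.0 for one instance.
            "stddev_ns": statistics.stdev(d) if len(d) > 1 else 0.0,
            "style": style,
            "range": name,
        })
    rows.sort(key=lambda row: row["total_ns"], reverse=True)
    return rows
```

Called as `nvtx_range_summary(sqlite3.connect("nsys_rnnt.sqlite"))`, it should list the heaviest ranges first if the schema assumptions hold. The sketch does not look up NVTX domain names, which the report may prepend to range names (for example `DALI:` or `TensorRT:`), so names can differ from the table.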
 
Processing [nsys_rnnt.sqlite] with [/opt/nvidia/nsight-systems/2022.4.1/host-linux-x64/reports/osrtsum.py]...
SKIPPED: nsys_rnnt.sqlite does not contain OS Runtime trace data.
 
Processing [nsys_rnnt.sqlite] with [/opt/nvidia/nsight-systems/2022.4.1/host-linux-x64/reports/cudaapisum.py]...
SKIPPED: nsys_rnnt.sqlite does not contain CUDA trace data.
 
Processing [nsys_rnnt.sqlite] with [/opt/nvidia/nsight-systems/2022.4.1/host-linux-x64/reports/gpukernsum.py]...
SKIPPED: nsys_rnnt.sqlite does not contain CUDA kernel data.
 
Processing [nsys_rnnt.sqlite] with [/opt/nvidia/nsight-systems/2022.4.1/host-linux-x64/reports/gpumemtimesum.py]...
SKIPPED: nsys_rnnt.sqlite does not contain GPU memory data.
 
Processing [nsys_rnnt.sqlite] with [/opt/nvidia/nsight-systems/2022.4.1/host-linux-x64/reports/gpumemsizesum.py]...
SKIPPED: nsys_rnnt.sqlite does not contain GPU memory data.
 
Processing [nsys_rnnt.sqlite] with [/opt/nvidia/nsight-systems/2022.4.1/host-linux-x64/reports/openmpevtsum.py]...
SKIPPED: nsys_rnnt.sqlite does not contain OpenMP event data.
 
Processing [nsys_rnnt.sqlite] with [/opt/nvidia/nsight-systems/2022.4.1/host-linux-x64/reports/khrdebugsum.py]...
SKIPPED: nsys_rnnt.sqlite does not contain KHR Extension (KHR_DEBUG) data.
 
Processing [nsys_rnnt.sqlite] with [/opt/nvidia/nsight-systems/2022.4.1/host-linux-x64/reports/khrdebuggpusum.py]...
SKIPPED: nsys_rnnt.sqlite does not contain GPU KHR Extension (KHR_DEBUG) data.
 
Processing [nsys_rnnt.sqlite] with [/opt/nvidia/nsight-systems/2022.4.1/host-linux-x64/reports/vulkanmarkerssum.py]...
SKIPPED: nsys_rnnt.sqlite does not contain Vulkan Debug Extension (Vulkan Debug Util) data.
 
Processing [nsys_rnnt.sqlite] with [/opt/nvidia/nsight-systems/2022.4.1/host-linux-x64/reports/vulkangpumarkersum.py]...
SKIPPED: nsys_rnnt.sqlite does not contain GPU Vulkan Debug Extension (GPU Vulkan Debug markers) data.
 
Processing [nsys_rnnt.sqlite] with [/opt/nvidia/nsight-systems/2022.4.1/host-linux-x64/reports/dx11pixsum.py]...
SKIPPED: nsys_rnnt.sqlite does not contain DX11 CPU debug markers.
 
Processing [nsys_rnnt.sqlite] with [/opt/nvidia/nsight-systems/2022.4.1/host-linux-x64/reports/dx12gpumarkersum.py]...
SKIPPED: nsys_rnnt.sqlite does not contain DX12 GPU debug markers.
 
Processing [nsys_rnnt.sqlite] with [/opt/nvidia/nsight-systems/2022.4.1/host-linux-x64/reports/dx12pixsum.py]...
SKIPPED: nsys_rnnt.sqlite does not contain DX12 CPU debug markers.
 
Processing [nsys_rnnt.sqlite] with [/opt/nvidia/nsight-systems/2022.4.1/host-linux-x64/reports/wddmqueuesdetails.py]...
SKIPPED: nsys_rnnt.sqlite does not contain WDDM context data.
 
Processing [nsys_rnnt.sqlite] with [/opt/nvidia/nsight-systems/2022.4.1/host-linux-x64/reports/unifiedmemory.py]...
SKIPPED: nsys_rnnt.sqlite does not contain CUDA memory transfers data.
 
Processing [nsys_rnnt.sqlite] with [/opt/nvidia/nsight-systems/2022.4.1/host-linux-x64/reports/unifiedmemorytotals.py]...
SKIPPED: nsys_rnnt.sqlite does not contain CUDA memory transfers data.
 
Processing [nsys_rnnt.sqlite] with [/opt/nvidia/nsight-systems/2022.4.1/host-linux-x64/reports/umcpupagefaults.py]...
SKIPPED: nsys_rnnt.sqlite does not contain CUDA Unified Memory CPU page faults data.
 
Processing [nsys_rnnt.sqlite] with [/opt/nvidia/nsight-systems/2022.4.1/host-linux-x64/reports/openaccsum.py]...
SKIPPED: nsys_rnnt.sqlite does not contain OpenACC event data.
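
Every report after nvtxsum was skipped because the export lacks the corresponding data, which suggests the capture recorded NVTX ranges but no CUDA API, kernel, memory, or OS runtime activity. In `nsys profile`, the traced APIs are selected with `--trace` (for example `--trace=cuda,nvtx,osrt`). To see what an export contains before running `nsys stats` on it, one can list its tables. The sketch below maps a few of the reports above to one representative source table each. The table names are assumptions based on the Nsight Systems SQLite export schema and may differ between versions, some reports read more than one table, and `available_reports` is a hypothetical helper.

```python
import sqlite3

# ASSUMPTION: one representative source table in the nsys SQLite export per
# report; names may differ between Nsight Systems versions, and some reports
# (for example gpumemtimesum) read more than one table.
REPORT_TABLES = {
    "nvtxsum": "NVTX_EVENTS",
    "osrtsum": "OSRT_API",
    "cudaapisum": "CUPTI_ACTIVITY_KIND_RUNTIME",
    "gpukernsum": "CUPTI_ACTIVITY_KIND_KERNEL",
    "gpumemtimesum": "CUPTI_ACTIVITY_KIND_MEMCPY",
}


def available_reports(conn):
    """Map each report name to whether its source table exists in the export."""
    tables = {
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    }
    return {report: table in tables for report, table in REPORT_TABLES.items()}
```

For this capture, `available_reports(sqlite3.connect("nsys_rnnt.sqlite"))` would be expected to mark only `nvtxsum` as available, matching the SKIPPED lines, provided the report scripts skip based on table presence rather than on empty tables.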