379 lines
15 KiB
Plaintext
Options:
Input size: N = 8, C = 8, H = 8, W = 8
Output size: N = 8, K = 8, OH = 6, OW = 6
Filter size: K = 8, C = 8, R = 3, S = 3
Number of iterations: 1
Validation: on

Initializing... done!
Initializing Convolution...
Using 4 devices
[GPU 0] NVIDIA GeForce RTX 3090
[GPU 1] NVIDIA GeForce RTX 3090
[GPU 2] NVIDIA GeForce RTX 3090
[GPU 3] NVIDIA GeForce RTX 3090
Calculating...(iter=0) 0.000106 sec
Validating...
output[2][0][0][0] : correct_value = 0.365192, your_value = 0.000000
output[2][0][0][1] : correct_value = -1.197354, your_value = 0.000000
output[2][0][0][2] : correct_value = 0.148179, your_value = 0.000000
output[2][0][0][3] : correct_value = 0.321454, your_value = 0.000000
output[2][0][0][4] : correct_value = 0.076720, your_value = 0.000000
output[2][0][0][5] : correct_value = 0.564233, your_value = 0.000000
output[2][0][1][0] : correct_value = -0.917365, your_value = 0.000000
output[2][0][1][1] : correct_value = 0.539823, your_value = 0.000000
output[2][0][1][2] : correct_value = 0.033946, your_value = 0.000000
output[2][0][1][3] : correct_value = 0.390882, your_value = 0.000000
Too many error, only first 10 values are printed.
Result: INVALID
Avg. throughput: 3.127122 GFLOPS
Options:
Input size: N = 8, C = 8, H = 8, W = 8
Output size: N = 8, K = 8, OH = 6, OW = 6
Filter size: K = 8, C = 8, R = 3, S = 3
Number of iterations: 1
Validation: on

Initializing... done!
Initializing Convolution...
Using 4 devices
[GPU 0] NVIDIA GeForce RTX 3090
[GPU 1] NVIDIA GeForce RTX 3090
[GPU 2] NVIDIA GeForce RTX 3090
[GPU 3] NVIDIA GeForce RTX 3090
Using 4 devices
[GPU 0] NVIDIA GeForce RTX 3090
[GPU 1] NVIDIA GeForce RTX 3090
[GPU 2] NVIDIA GeForce RTX 3090
[GPU 3] NVIDIA GeForce RTX 3090
Calculating...(iter=0) 0.000314 sec
Validating...
output[4][0][0][0] : correct_value = 0.062610, your_value = 0.000000
output[4][0][0][1] : correct_value = -0.539305, your_value = 0.000000
output[4][0][0][2] : correct_value = -1.391267, your_value = 0.000000
output[4][0][0][3] : correct_value = 0.877585, your_value = 0.000000
output[4][0][0][4] : correct_value = 1.212355, your_value = 0.000000
output[4][0][0][5] : correct_value = -0.208027, your_value = 0.000000
output[4][0][1][0] : correct_value = 0.371816, your_value = 0.000000
output[4][0][1][1] : correct_value = 0.381102, your_value = 0.000000
output[4][0][1][2] : correct_value = -0.378577, your_value = 0.000000
output[4][0][1][3] : correct_value = -0.433649, your_value = 0.000000
Too many error, only first 10 values are printed.
Result: INVALID
Avg. throughput: 1.056621 GFLOPS
Options:
Input size: N = 3, C = 3, H = 256, W = 256
Output size: N = 3, K = 3, OH = 129, OW = 129
Filter size: K = 3, C = 3, R = 128, S = 128
Number of iterations: 1
Validation: on

Initializing... done!
Initializing Convolution...
Using 4 devices
[GPU 0] NVIDIA GeForce RTX 3090
[GPU 1] NVIDIA GeForce RTX 3090
[GPU 2] NVIDIA GeForce RTX 3090
[GPU 3] NVIDIA GeForce RTX 3090
Calculating...(iter=0) 0.000753 sec
Validating...
output[0][0][0][0] : correct_value = -21.891909, your_value = 0.000000
output[0][0][0][1] : correct_value = -18.751963, your_value = 0.000000
output[0][0][0][2] : correct_value = 15.855617, your_value = 0.000000
output[0][0][0][3] : correct_value = 7.974744, your_value = 0.000000
output[0][0][0][4] : correct_value = 25.468287, your_value = 0.000000
output[0][0][0][5] : correct_value = 1.117465, your_value = 0.000000
output[0][0][0][6] : correct_value = -3.991683, your_value = 0.000000
output[0][0][0][7] : correct_value = -23.022726, your_value = 0.000000
output[0][0][0][8] : correct_value = 3.441098, your_value = 0.000000
output[0][0][0][9] : correct_value = 1.015278, your_value = 0.000000
Too many error, only first 10 values are printed.
Result: INVALID
Avg. throughput: 19554.238083 GFLOPS
Options:
Input size: N = 3, C = 3, H = 256, W = 256
Output size: N = 3, K = 3, OH = 129, OW = 129
Filter size: K = 3, C = 3, R = 128, S = 128
Number of iterations: 1
Validation: on

Initializing... done!
Initializing Convolution...
Using 4 devices
[GPU 0] NVIDIA GeForce RTX 3090
[GPU 1] NVIDIA GeForce RTX 3090
[GPU 2] NVIDIA GeForce RTX 3090
[GPU 3] NVIDIA GeForce RTX 3090
Using 4 devices
[GPU 0] NVIDIA GeForce RTX 3090
[GPU 1] NVIDIA GeForce RTX 3090
[GPU 2] NVIDIA GeForce RTX 3090
[GPU 3] NVIDIA GeForce RTX 3090
Calculating...(iter=0) 0.001323 sec
Validating...
output[0][0][0][0] : correct_value = -21.891909, your_value = 0.000000
output[0][0][0][1] : correct_value = -18.751963, your_value = 0.000000
output[0][0][0][2] : correct_value = 15.855617, your_value = 0.000000
output[0][0][0][3] : correct_value = 7.974744, your_value = 0.000000
output[0][0][0][4] : correct_value = 25.468287, your_value = 0.000000
output[0][0][0][5] : correct_value = 1.117465, your_value = 0.000000
output[0][0][0][6] : correct_value = -3.991683, your_value = 0.000000
output[0][0][0][7] : correct_value = -23.022726, your_value = 0.000000
output[0][0][0][8] : correct_value = 3.441098, your_value = 0.000000
output[0][0][0][9] : correct_value = 1.015278, your_value = 0.000000
Too many error, only first 10 values are printed.
Result: INVALID
Avg. throughput: 11126.537634 GFLOPS
Options:
Input size: N = 128, C = 128, H = 8, W = 8
Output size: N = 128, K = 64, OH = 1, OW = 1
Filter size: K = 64, C = 128, R = 8, S = 8
Number of iterations: 1
Validation: on

Initializing... done!
Initializing Convolution...
Using 4 devices
[GPU 0] NVIDIA GeForce RTX 3090
[GPU 1] NVIDIA GeForce RTX 3090
[GPU 2] NVIDIA GeForce RTX 3090
[GPU 3] NVIDIA GeForce RTX 3090
Calculating...(iter=0) 0.003022 sec
Validating...
output[32][0][0][0] : correct_value = -4.038291, your_value = 0.000000
output[32][1][0][0] : correct_value = 8.774968, your_value = 0.000000
output[32][2][0][0] : correct_value = 3.793368, your_value = 0.000000
output[32][3][0][0] : correct_value = 5.628757, your_value = 0.000000
output[32][4][0][0] : correct_value = -9.612059, your_value = 0.000000
output[32][5][0][0] : correct_value = -1.431466, your_value = 0.000000
output[32][6][0][0] : correct_value = -1.338276, your_value = 0.000000
output[32][7][0][0] : correct_value = -5.794408, your_value = 0.000000
output[32][8][0][0] : correct_value = -9.986221, your_value = 0.000000
output[32][9][0][0] : correct_value = -6.172217, your_value = 0.000000
Too many error, only first 10 values are printed.
Result: INVALID
Avg. throughput: 44.414198 GFLOPS
Options:
Input size: N = 128, C = 128, H = 8, W = 8
Output size: N = 128, K = 64, OH = 1, OW = 1
Filter size: K = 64, C = 128, R = 8, S = 8
Number of iterations: 1
Validation: on

Initializing... done!
Initializing Convolution...
Using 4 devices
[GPU 0] NVIDIA GeForce RTX 3090
[GPU 1] NVIDIA GeForce RTX 3090
[GPU 2] NVIDIA GeForce RTX 3090
[GPU 3] NVIDIA GeForce RTX 3090
Using 4 devices
[GPU 0] NVIDIA GeForce RTX 3090
[GPU 1] NVIDIA GeForce RTX 3090
[GPU 2] NVIDIA GeForce RTX 3090
[GPU 3] NVIDIA GeForce RTX 3090
Calculating...(iter=0) 0.004228 sec
Validating...
output[64][0][0][0] : correct_value = -9.491610, your_value = 0.000000
output[64][1][0][0] : correct_value = 1.862121, your_value = 0.000000
output[64][2][0][0] : correct_value = -2.617583, your_value = 0.000000
output[64][3][0][0] : correct_value = 2.565935, your_value = 0.000000
output[64][4][0][0] : correct_value = 11.778970, your_value = 0.000000
output[64][5][0][0] : correct_value = -1.517091, your_value = 0.000000
output[64][6][0][0] : correct_value = -1.629764, your_value = 0.000000
output[64][7][0][0] : correct_value = 9.531843, your_value = 0.000000
output[64][8][0][0] : correct_value = 3.416710, your_value = 0.000000
output[64][9][0][0] : correct_value = -3.197026, your_value = 0.000000
Too many error, only first 10 values are printed.
Result: INVALID
Avg. throughput: 31.744105 GFLOPS
Options:
Input size: N = 5, C = 4, H = 64, W = 64
Output size: N = 5, K = 4, OH = 49, OW = 49
Filter size: K = 4, C = 4, R = 16, S = 16
Number of iterations: 1
Validation: on

Initializing... done!
Initializing Convolution...
Using 4 devices
[GPU 0] NVIDIA GeForce RTX 3090
[GPU 1] NVIDIA GeForce RTX 3090
[GPU 2] NVIDIA GeForce RTX 3090
[GPU 3] NVIDIA GeForce RTX 3090
Calculating...(iter=0) 0.000185 sec
Validating...
output[1][0][0][0] : correct_value = 0.677187, your_value = 0.000000
output[1][0][0][1] : correct_value = -6.503040, your_value = 0.000000
output[1][0][0][2] : correct_value = 2.914294, your_value = 0.000000
output[1][0][0][3] : correct_value = 1.550453, your_value = 0.000000
output[1][0][0][4] : correct_value = 6.465028, your_value = 0.000000
output[1][0][0][5] : correct_value = -2.904995, your_value = 0.000000
output[1][0][0][6] : correct_value = 2.994764, your_value = 0.000000
output[1][0][0][7] : correct_value = 3.913258, your_value = 0.000000
output[1][0][0][8] : correct_value = 4.068929, your_value = 0.000000
output[1][0][0][9] : correct_value = -6.206963, your_value = 0.000000
Too many error, only first 10 values are printed.
Result: INVALID
Avg. throughput: 531.557550 GFLOPS
Options:
Input size: N = 5, C = 4, H = 64, W = 64
Output size: N = 5, K = 4, OH = 49, OW = 49
Filter size: K = 4, C = 4, R = 16, S = 16
Number of iterations: 1
Validation: on

Initializing... done!
Initializing Convolution...
Using 4 devices
[GPU 0] NVIDIA GeForce RTX 3090
[GPU 1] NVIDIA GeForce RTX 3090
[GPU 2] NVIDIA GeForce RTX 3090
[GPU 3] NVIDIA GeForce RTX 3090
Using 4 devices
[GPU 0] NVIDIA GeForce RTX 3090
[GPU 1] NVIDIA GeForce RTX 3090
[GPU 2] NVIDIA GeForce RTX 3090
[GPU 3] NVIDIA GeForce RTX 3090
Calculating...(iter=0) 0.000395 sec
Validating...
output[2][0][0][0] : correct_value = 3.029225, your_value = 0.000000
output[2][0][0][1] : correct_value = 0.363245, your_value = 0.000000
output[2][0][0][2] : correct_value = 0.399010, your_value = 0.000000
output[2][0][0][3] : correct_value = -3.041136, your_value = 0.000000
output[2][0][0][4] : correct_value = 4.128718, your_value = 0.000000
output[2][0][0][5] : correct_value = 0.399713, your_value = 0.000000
output[2][0][0][6] : correct_value = 1.838342, your_value = 0.000000
output[2][0][0][7] : correct_value = 4.219049, your_value = 0.000000
output[2][0][0][8] : correct_value = 3.028255, your_value = 0.000000
output[2][0][0][9] : correct_value = 4.631683, your_value = 0.000000
Too many error, only first 10 values are printed.
Result: INVALID
Avg. throughput: 248.937030 GFLOPS
Options:
Input size: N = 4, C = 5, H = 64, W = 64
Output size: N = 4, K = 4, OH = 49, OW = 49
Filter size: K = 4, C = 5, R = 16, S = 16
Number of iterations: 1
Validation: on

Initializing... done!
Initializing Convolution...
Using 4 devices
[GPU 0] NVIDIA GeForce RTX 3090
[GPU 1] NVIDIA GeForce RTX 3090
[GPU 2] NVIDIA GeForce RTX 3090
[GPU 3] NVIDIA GeForce RTX 3090
Calculating...(iter=0) 0.000199 sec
Validating...
output[1][0][0][0] : correct_value = 1.719762, your_value = 0.000000
output[1][0][0][1] : correct_value = 1.531765, your_value = 0.000000
output[1][0][0][2] : correct_value = 1.210176, your_value = 0.000000
output[1][0][0][3] : correct_value = -4.477330, your_value = 0.000000
output[1][0][0][4] : correct_value = 0.494131, your_value = 0.000000
output[1][0][0][5] : correct_value = -0.255764, your_value = 0.000000
output[1][0][0][6] : correct_value = 0.686315, your_value = 0.000000
output[1][0][0][7] : correct_value = -4.050873, your_value = 0.000000
output[1][0][0][8] : correct_value = -0.804008, your_value = 0.000000
output[1][0][0][9] : correct_value = 0.196441, your_value = 0.000000
Too many error, only first 10 values are printed.
Result: INVALID
Avg. throughput: 493.998394 GFLOPS
Options:
Input size: N = 4, C = 5, H = 64, W = 64
Output size: N = 4, K = 4, OH = 49, OW = 49
Filter size: K = 4, C = 5, R = 16, S = 16
Number of iterations: 1
Validation: on

Initializing... done!
Initializing Convolution...
Using 4 devices
[GPU 0] NVIDIA GeForce RTX 3090
[GPU 1] NVIDIA GeForce RTX 3090
[GPU 2] NVIDIA GeForce RTX 3090
[GPU 3] NVIDIA GeForce RTX 3090
Using 4 devices
[GPU 0] NVIDIA GeForce RTX 3090
[GPU 1] NVIDIA GeForce RTX 3090
[GPU 2] NVIDIA GeForce RTX 3090
[GPU 3] NVIDIA GeForce RTX 3090
Calculating...(iter=0) 0.000416 sec
Validating...
output[2][0][0][0] : correct_value = -0.896348, your_value = 0.000000
output[2][0][0][1] : correct_value = 1.049320, your_value = 0.000000
output[2][0][0][2] : correct_value = -0.101196, your_value = 0.000000
output[2][0][0][3] : correct_value = -2.969104, your_value = 0.000000
output[2][0][0][4] : correct_value = 1.388640, your_value = 0.000000
output[2][0][0][5] : correct_value = 2.128573, your_value = 0.000000
output[2][0][0][6] : correct_value = -1.974248, your_value = 0.000000
output[2][0][0][7] : correct_value = 3.362661, your_value = 0.000000
output[2][0][0][8] : correct_value = -0.045959, your_value = 0.000000
output[2][0][0][9] : correct_value = 0.739286, your_value = 0.000000
Too many error, only first 10 values are printed.
Result: INVALID
Avg. throughput: 236.383186 GFLOPS
Options:
Input size: N = 4, C = 2, H = 127, W = 67
Output size: N = 4, K = 4, OH = 112, OW = 52
Filter size: K = 4, C = 2, R = 16, S = 16
Number of iterations: 1
Validation: on

Initializing... done!
Initializing Convolution...
Using 4 devices
[GPU 0] NVIDIA GeForce RTX 3090
[GPU 1] NVIDIA GeForce RTX 3090
[GPU 2] NVIDIA GeForce RTX 3090
[GPU 3] NVIDIA GeForce RTX 3090
Calculating...(iter=0) 0.000141 sec
Validating...
output[1][0][0][0] : correct_value = 0.267448, your_value = 0.000000
output[1][0][0][1] : correct_value = 0.991923, your_value = 0.000000
output[1][0][0][2] : correct_value = -3.470256, your_value = 0.000000
output[1][0][0][3] : correct_value = -2.110252, your_value = 0.000000
output[1][0][0][4] : correct_value = -0.595913, your_value = 0.000000
output[1][0][0][5] : correct_value = -0.380152, your_value = 0.000000
output[1][0][0][6] : correct_value = 2.502929, your_value = 0.000000
output[1][0][0][7] : correct_value = 0.645218, your_value = 0.000000
output[1][0][0][8] : correct_value = 4.552518, your_value = 0.000000
output[1][0][0][9] : correct_value = 0.116878, your_value = 0.000000
Too many error, only first 10 values are printed.
Result: INVALID
Avg. throughput: 676.051068 GFLOPS
Options:
Input size: N = 4, C = 2, H = 127, W = 67
Output size: N = 4, K = 4, OH = 112, OW = 52
Filter size: K = 4, C = 2, R = 16, S = 16
Number of iterations: 1
Validation: on

Initializing... done!
Initializing Convolution...
Using 4 devices
[GPU 0] NVIDIA GeForce RTX 3090
[GPU 1] NVIDIA GeForce RTX 3090
[GPU 2] NVIDIA GeForce RTX 3090
[GPU 3] NVIDIA GeForce RTX 3090
Using 4 devices
[GPU 0] NVIDIA GeForce RTX 3090
[GPU 1] NVIDIA GeForce RTX 3090
[GPU 2] NVIDIA GeForce RTX 3090
[GPU 3] NVIDIA GeForce RTX 3090
Calculating...(iter=0) 0.003150 sec
Validating...
output[2][0][0][0] : correct_value = -2.424045, your_value = 0.000000
output[2][0][0][1] : correct_value = 0.212011, your_value = 0.000000
output[2][0][0][2] : correct_value = 3.081800, your_value = 0.000000
output[2][0][0][3] : correct_value = -1.424965, your_value = 0.000000
output[2][0][0][4] : correct_value = 1.231082, your_value = 0.000000
output[2][0][0][5] : correct_value = -0.647472, your_value = 0.000000
output[2][0][0][6] : correct_value = -3.381246, your_value = 0.000000
output[2][0][0][7] : correct_value = 1.497054, your_value = 0.000000
output[2][0][0][8] : correct_value = -0.003502, your_value = 0.000000
output[2][0][0][9] : correct_value = -1.916239, your_value = 0.000000
Too many error, only first 10 values are printed.
Result: INVALID
Avg. throughput: 30.292328 GFLOPS