Options:
  Input size: N = 32, C = 64, H = 256, W = 256
  Output size: N = 32, K = 64, OH = 241, OW = 241
  Filter size: K = 64, C = 64, R = 16, S = 16
  Number of iterations: 1
  Validation: off

Initializing... done!
Initializing Convolution...
Calculating...(iter=0) 40.041388 sec
Avg. throughput: 97.343027 GFLOPS
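
The reported figures can be cross-checked with a short calculation. The sketch below is not part of the benchmark. It assumes stride 1, no padding, no dilation, and that each multiply-accumulate counts as 2 floating-point operations. The log states none of these, but under them the output size and the throughput both match what was printed. The function names are hypothetical.

```python
# Sanity check of the logged numbers (assumptions: stride 1, no padding,
# no dilation, 1 multiply-accumulate = 2 FLOPs; none are stated in the log).

def conv_output_size(in_size, filter_size, stride=1, pad=0, dilation=1):
    """Standard output-length formula for one spatial dimension."""
    return (in_size + 2 * pad - dilation * (filter_size - 1) - 1) // stride + 1

def conv_flops(n, k, c, r, s, oh, ow):
    """FLOPs of a direct convolution: one MAC per (n, k, c, r, s, oh, ow)."""
    return 2 * n * k * c * r * s * oh * ow

# Values taken from the log above.
N, C, H, W = 32, 64, 256, 256
K, R, S = 64, 16, 16
elapsed_sec = 40.041388

OH = conv_output_size(H, R)
OW = conv_output_size(W, S)
flops = conv_flops(N, K, C, R, S, OH, OW)
gflops = flops / elapsed_sec / 1e9

print(f"OH = {OH}, OW = {OW}")
print(f"total FLOPs = {flops}")
print(f"throughput = {gflops:.3f} GFLOPS")
```

With these assumptions the script gives OH = OW = 241 and roughly 97.34 GFLOPS, which agrees with the log. That suggests the benchmark counts operations the same way.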