This post summarizes how 3D convolution is implemented as a 2D matrix multiplication in CAFFE and other popular CNN implementations.
We start by analyzing the code in conv_layer, specifically the forward_cpu function. Following the example of the first convolution layer, we know the convolution is a 3D tensor operation, but it is converted into a 2D matrix multiplication. The important dimensions are M_, N_, and K_.
Why are M_, N_, and K_ important? As we can see in forward_cpu, matrix A in the BLAS call is the weight matrix; it has M rows and K columns. B is the 3D input tensor linearized into a 2D matrix by the im2col operation; its dimension is K by N. (See the documentation for A, B, and the BLAS call below.)
So the convolution becomes
A (M by K) * B (K by N) = C (M by N)
The code that calculates M_, K_, and N_ is shown below:
// Prepare the matrix multiplication computation.
// Each input will be convolved as a single GEMM.
M_ = num_output_ / group_;
K_ = channels_ * kernel_h_ * kernel_w_ / group_;
N_ = height_out_ * width_out_;
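As a quick sanity check, here is a minimal sketch plugging in the hyperparameters of an AlexNet-style conv1 (96 filters of 11 x 11 over 3 input channels, group 1, 55 x 55 output; these concrete values are assumptions drawn from the example at the end of this post):

#include <cassert>

int main() {
  // Assumed AlexNet-style conv1 hyperparameters (not taken from the Caffe code above).
  const int num_output = 96, group = 1;
  const int channels = 3, kernel_h = 11, kernel_w = 11;
  const int height_out = 55, width_out = 55;

  const int M = num_output / group;                      // rows of A: one per filter
  const int K = channels * kernel_h * kernel_w / group;  // cols of A = rows of B
  const int N = height_out * width_out;                  // cols of B: one per output pixel

  assert(M == 96 && K == 363 && N == 3025);
  return 0;
}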
The main loop of Forward_cpu in the convolution layer:
for (int n = 0; n < num_; ++n) {
  // im2col transformation: unroll input regions for filtering
  // into column matrix for multiplication.
  im2col_cpu(bottom_data + bottom[i]->offset(n), channels_, height_,
             width_, kernel_h_, kernel_w_, pad_h_, pad_w_,
             stride_h_, stride_w_, col_data);
  // Take inner products for groups.
  for (int g = 0; g < group_; ++g) {
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, M_, N_, K_,
        (Dtype)1., weight + weight_offset * g, col_data + col_offset * g,
        (Dtype)0., top_data + (*top)[i]->offset(n) + top_offset * g);
  }
  // Add bias.
  if (bias_term_) {
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, num_output_,
        N_, 1, (Dtype)1., this->blobs_[1]->cpu_data(),
        bias_multiplier_.cpu_data(),
        (Dtype)1., top_data + (*top)[i]->offset(n));
  }
}
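For reference, here is a simplified sketch of what im2col_cpu does, modeled on Caffe's implementation but not a verbatim copy (single image, float only; im2col_sketch is a hypothetical name). Each row of the resulting matrix corresponds to one (channel, kernel row, kernel column) combination, and each column to one output pixel:

// Unroll a channels x height x width image into a
// (channels * kernel_h * kernel_w) x (height_out * width_out) matrix,
// stored row-major in data_col. Padded (out-of-bounds) pixels become 0.
void im2col_sketch(const float* data_im, int channels, int height, int width,
                   int kernel_h, int kernel_w, int pad_h, int pad_w,
                   int stride_h, int stride_w, float* data_col) {
  const int height_col = (height + 2 * pad_h - kernel_h) / stride_h + 1;
  const int width_col  = (width  + 2 * pad_w - kernel_w) / stride_w + 1;
  const int channels_col = channels * kernel_h * kernel_w;  // this is K_
  for (int c = 0; c < channels_col; ++c) {
    // Decompose row index c into (input channel, offset within kernel window).
    const int w_offset = c % kernel_w;
    const int h_offset = (c / kernel_w) % kernel_h;
    const int c_im = c / (kernel_w * kernel_h);
    for (int h = 0; h < height_col; ++h) {
      for (int w = 0; w < width_col; ++w) {
        // Source pixel covered by this kernel element at this output position.
        const int h_im = h * stride_h - pad_h + h_offset;
        const int w_im = w * stride_w - pad_w + w_offset;
        data_col[(c * height_col + h) * width_col + w] =
            (h_im >= 0 && h_im < height && w_im >= 0 && w_im < width)
                ? data_im[(c_im * height + h_im) * width + w_im]
                : 0.f;
      }
    }
  }
}

With this layout, row c of B lines up with element c of each flattened filter in A, so a single inner product over K_ sums over all channels and kernel positions at once.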
The Caffe wrapper around the BLAS call:
template<>
void caffe_cpu_gemm<float>(const CBLAS_TRANSPOSE TransA,
    const CBLAS_TRANSPOSE TransB, const int M, const int N, const int K,
    const float alpha, const float* A, const float* B, const float beta,
    float* C) {
  int lda = (TransA == CblasNoTrans) ? K : M;
  int ldb = (TransB == CblasNoTrans) ? N : K;
  cblas_sgemm(CblasRowMajor, TransA, TransB, M, N, K, alpha, A, lda, B,
      ldb, beta, C, N);
}

Note that Caffe uses row-major storage, so with no transposes the leading dimensions are simply the column counts of each matrix: lda = K, ldb = N, ldc = N.
The cblas_dgemm documentation (Caffe actually calls the single-precision variant, cblas_sgemm, which takes the same parameters with float in place of double):
void cblas_dgemm(const enum CBLAS_ORDER Order,
    const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_TRANSPOSE TransB,
    const int M, const int N, const int K, const double alpha,
    const double *A, const int lda, const double *B, const int ldb,
    const double beta, double *C, const int ldc);
Parameters
Order    Specifies row-major (C) or column-major (Fortran) data ordering.
TransA   Specifies whether to transpose matrix A.
TransB   Specifies whether to transpose matrix B.
M        Number of rows in matrices A and C.
N        Number of columns in matrices B and C.
K        Number of columns in matrix A; number of rows in matrix B.
alpha    Scaling factor for the product of matrices A and B.
A        Matrix A.
lda      The size of the first dimension of matrix A.
B        Matrix B.
ldb      The size of the first dimension of matrix B.
beta     Scaling factor for matrix C.
C        Matrix C.
ldc      The size of the first dimension of matrix C.
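To see these parameters in action, here is a small self-contained example of the same kind of row-major, no-transpose call that caffe_cpu_gemm makes (assuming a CBLAS implementation such as OpenBLAS is installed; the build command, e.g. g++ demo.cpp -lopenblas, is an assumption about your setup):

#include <cblas.h>
#include <cstdio>

int main() {
  // C (2x2) = 1.0 * A (2x3) * B (3x2) + 0.0 * C, all row-major.
  // As in caffe_cpu_gemm with no transposes: lda = K, ldb = N, ldc = N.
  const int M = 2, N = 2, K = 3;
  float A[] = {1, 2, 3,
               4, 5, 6};    // 2x3
  float B[] = {7,  8,
               9, 10,
               11, 12};     // 3x2
  float C[M * N] = {0};

  cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, M, N, K,
              1.0f, A, K, B, N, 0.0f, C, N);

  // Expected output:
  //   58.0   64.0
  //  139.0  154.0
  for (int i = 0; i < M; ++i)
    printf("%6.1f %6.1f\n", C[i * N + 0], C[i * N + 1]);
  return 0;
}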
So, for example, the first convolution layer is essentially a 96 by 363 matrix (A) times a 363 by 3025 matrix (B), and you get a 96 by 3025 matrix C, corresponding to the 96 * 55 * 55 output tensor. Here M_ = 96 is the number of filters, K_ = 3 * 11 * 11 = 363 is the number of weights per filter, and N_ = 55 * 55 = 3025 is the number of output pixels per filter.
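Finally, the 55s themselves come from the standard convolution output-size formula. This sketch assumes the usual 227 x 227 x 3 AlexNet input with stride 4 and no padding; the input size is an assumption, since it is not stated above:

#include <cstdio>

int main() {
  // Assumed AlexNet-style conv1: 227x227x3 input, 11x11 kernels,
  // stride 4, no padding, 96 output channels.
  const int height = 227, width = 227, channels = 3;
  const int kernel = 11, stride = 4, pad = 0;
  const int num_output = 96;

  // Standard output-size formula (the same one im2col uses above).
  const int height_out = (height + 2 * pad - kernel) / stride + 1;  // 55
  const int width_out  = (width  + 2 * pad - kernel) / stride + 1;  // 55

  printf("A: %d x %d\n", num_output, channels * kernel * kernel);   // 96 x 363
  printf("B: %d x %d\n", channels * kernel * kernel,
         height_out * width_out);                                   // 363 x 3025
  printf("C: %d x %d (the %d x %d x %d output tensor)\n",
         num_output, height_out * width_out,
         num_output, height_out, width_out);                        // 96 x 3025
  return 0;
}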