This post collects some helpful tutorials I found on convolution as used in image processing and convolutional neural nets. They really helped me understand the operator and how convolution works. Overall, I recommend a two-phase approach: first, understand how a convolution kernel works in image processing (2D convolution, easier to grasp), then move on to convolutional neural nets (3D convolution, harder to wrap your head around).
Great tutorials on convolution for image processing (I recommend skimming through 1, 2, and 3 first; they are very helpful for understanding the masking operation in a 2D convolution scenario).
 “Applications of Convolution in Image Processing with Matlab”, a very well written and relatively concise tutorial with good examples in MATLAB.
 the link is here “http://www.math.washington.edu/~wcasper/math326/projects/sung_kim.pdf”
 imagine convolution as shifting a matrix mask across the input image
 Simple Spatial Operations, a more comprehensive tutorial with more mathematical detail. It also contains more interesting content for in-depth learning.
 Apple’s tutorial, lots of pictures!
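To make the "shifting a matrix mask" picture concrete, here is a minimal NumPy sketch of 2D convolution. The function name `convolve2d_valid` and the box-blur example are my own illustration, not taken from any of the tutorials above, and the sketch only computes positions where the mask fits entirely inside the image (no border handling).

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the mask over the image, keeping only fully overlapping positions."""
    kh, kw = kernel.shape
    # Strict convolution flips the mask in both axes; drop this flip to get correlation.
    flipped = kernel[::-1, ::-1]
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for r in range(out_h):
        for c in range(out_w):
            window = image[r:r + kh, c:c + kw]    # patch currently under the mask
            out[r, c] = np.sum(window * flipped)  # multiply element-wise, then accumulate
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
box_blur = np.full((3, 3), 1.0 / 9.0)   # every output pixel is the mean of a 3x3 patch
blurred = convolve2d_valid(image, box_blur)
```

For a 4x4 image and a 3x3 mask the result is 2x2, because the mask only fits at four positions. The tutorials above cover the padding strategies you would need to get an output the same size as the input.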
Once I understood the basic convolution operations in image processing (mostly 2D), I moved on to researching how convolution works in convolutional neural nets.
 https://github.com/Yangqing/caffe/wiki/ConvolutioninCaffe:amemo

for w in 1..W
  for h in 1..H
    for x in 1..K
      for y in 1..K
        for m in 1..M
          for d in 1..D
            output(w, h, m) += input(w+x, h+y, d) * filter(m, x, y, d)
          end
        end
      end
    end
  end
end
 this example essentially builds on a previous example of 2D convolution shown here (http://en.wikipedia.org/wiki/Kernel_(image_processing)); the third dimension of the input and of the output is added through the two innermost for loops

for each image row in output image:
    for each pixel in image row:
        set accumulator to zero
        for each kernel row in kernel:
            for each element in kernel row:
                if element position corresponding* to pixel position then
                    multiply element value corresponding* to pixel value
                    add result to accumulator
                endif
        set output image pixel to accumulator
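The six-loop nest from the Caffe memo can be written out in Python more or less literally. This is a rough sketch under my own assumptions: indices are 0-based, the input is taken to have shape (W + K - 1, H + K - 1, D) so that `w + x` and `h + y` never run off the edge, and the name `conv_layer_naive` is made up for illustration. Note that the loop as written does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def conv_layer_naive(inp, filt):
    """inp: (W + K - 1, H + K - 1, D), filt: (M, K, K, D) -> output: (W, H, M)."""
    M, K, _, D = filt.shape
    W = inp.shape[0] - K + 1
    H = inp.shape[1] - K + 1
    out = np.zeros((W, H, M))
    for w in range(W):
        for h in range(H):
            for x in range(K):
                for y in range(K):
                    for m in range(M):        # one output channel per filter
                        for d in range(D):    # sum over input channels
                            out[w, h, m] += inp[w + x, h + y, d] * filt[m, x, y, d]
    return out

rng = np.random.default_rng(0)
inp = rng.standard_normal((5, 5, 3))       # D = 3 input channels
filt = rng.standard_normal((2, 3, 3, 3))   # M = 2 filters, K = 3
out = conv_layer_naive(inp, filt)          # shape (3, 3, 2)

# With M = 1 and D = 1 the two innermost loops run once each,
# which leaves exactly the 2D mask sum from the simpler pseudocode.
single = conv_layer_naive(inp[:, :, :1], filt[:1, :, :, :1])
expected = sum(inp[1 + x, 2 + y, 0] * filt[0, x, y, 0]
               for x in range(3) for y in range(3))
```

The last few lines are the part I found most helpful: collapsing the channel dimensions to size 1 shows that the neural-net version is the 2D version with two extra loops wrapped around the multiply-accumulate.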

 Visually, a good picture shown here illustrates the 4D weight matrix (4D tensor) used for convolution in convolutional neural nets
 (TODO: link is down, http://deeplearning.net/tutorial/lenet.html, grab the picture next time)
 the explanation that goes with the picture (the part about the 4 indices is good, the one paragraph above it is bad)
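Since the picture is missing for now, here is a small sketch of what the four indices of the weight tensor mean. I am assuming the index order (output feature map, input feature map, kernel row, kernel column); the sizes and variable names are made up for illustration.

```python
import numpy as np

n_out_maps, n_in_maps, k_h, k_w = 2, 3, 5, 5   # illustrative sizes only

# Assumed index order: (output map k, input map l, kernel row i, kernel column j).
weights = np.zeros((n_out_maps, n_in_maps, k_h, k_w))
weights[1, 0, 2, 3] = 0.5   # links input map 0 to output map 1 at kernel offset (2, 3)

# Fixing the first two indices gives one ordinary 2D mask.
mask = weights[1, 0]                      # shape (5, 5)

# One output pixel of map 1 sums over every input map and every kernel offset.
patch = np.ones((n_in_maps, k_h, k_w))    # stand-in for the input windows under the mask
pixel = np.sum(weights[1] * patch)

# The same weights rearranged to the filter(m, x, y, d) layout of the loop nest above.
caffe_style = np.transpose(weights, (0, 2, 3, 1))   # shape (2, 5, 5, 3)
```

So the 4D tensor is just a grid of 2D masks, one for each (output map, input map) pair, and different libraries mostly differ in the order in which they store the four axes.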
 The Wikipedia page is not bad either.
