Tutorial on convolution for Image Processing and Convolutional Neural Nets

This post collects some helpful tutorials I found on convolution as used in image processing and convolutional neural nets. They really helped me understand the operator and how convolution works. Overall, I recommend a two-phase approach: first, understand how a convolution kernel works in image processing (2D convolution, easier to grasp), then move on to convolutional neural nets (3D convolution, harder to wrap your head around).

Great tutorials on convolution for image processing (I recommend skimming through 1, 2, and 3 first; they are very helpful for understanding the masking operation in a 2D convolution scenario):

  1. “Applications of Convolution in Image Processing with Matlab”, a very well written and relatively concise tutorial with good examples in MATLAB.
    1. link: http://www.math.washington.edu/~wcasper/math326/projects/sung_kim.pdf
    2. the key idea: imagine convolution as shifting a matrix mask across the input image
  2. “Simple Spatial Operations”, a more comprehensive treatment with more mathematical detail, and more interesting content for in-depth learning.
    1. http://www.cs.uu.nl/docs/vakken/ibv/reader/chapter5.pdf
  3. Apple’s tutorial, lots of pictures!
    1. https://developer.apple.com/library/ios/documentation/Performance/Conceptual/vImage/ConvolutionOperations/ConvolutionOperations.html
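
The "shifting a mask across the input image" picture from these tutorials can be sketched in a few lines of plain Python. This is only a minimal illustration, not code from any of the tutorials: the function name `convolve2d` is my own, the image and kernel are plain lists of rows, the kernel is assumed to have odd dimensions, and pixels outside the image are assumed to be zero.

```python
def convolve2d(image, kernel):
    """Slide a kernel (mask) over a 2D image given as a list of rows.

    Assumptions of this sketch: the kernel has odd height and width,
    and pixels outside the image count as zero. The kernel is flipped,
    which is what distinguishes convolution from correlation.
    """
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    r_off, c_off = k_rows // 2, k_cols // 2  # distance from kernel centre to its edge
    output = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0  # accumulator for this output pixel
            for i in range(k_rows):
                for j in range(k_cols):
                    # The flipped kernel index picks the pixel under the mask.
                    rr = r + r_off - i
                    cc = c + c_off - j
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += kernel[i][j] * image[rr][cc]
            output[r][c] = acc
    return output


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
box = [[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]]
summed = convolve2d(image, box)  # each output pixel is the sum of its 3x3 neighbourhood
```

With a symmetric mask like `box` the flip makes no difference; it only matters for asymmetric kernels such as edge detectors.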

Once I understood the basic convolution operations in image processing (mostly 2D), I moved on to researching how convolution works in convolutional neural nets.

  1. https://github.com/Yangqing/caffe/wiki/Convolution-in-Caffe:-a-memo
    1. for w in 1..W
        for h in 1..H
          for x in 1..K
            for y in 1..K
              for m in 1..M
                for d in 1..D
                  output(w, h, m) += input(w+x, h+y, d) * filter(m, x, y, d)
      1. this example is essentially a build-up from the earlier 2D convolution example shown here (http://en.wikipedia.org/wiki/Kernel_(image_processing)); the third dimension of the input and of the output is added through the two innermost for loops
        1. for each image row in output image:
             for each pixel in image row:
                set accumulator to zero
                for each kernel row in kernel:
                   for each element in kernel row:
                      if element position corresponds to a pixel position then
                         multiply element value by the corresponding pixel value
                         add result to accumulator
                set output image pixel to accumulator
    2. Visually, a good picture here illustrates the 4D weight matrix (4D tensor) used for convolution in convolutional neural nets
      1. (TODO: link is down, http://deeplearning.net/tutorial/lenet.html, grab the picture next time)
      2. the explanation about the picture (4 indices are good, the one paragraph above is bad)
    3. The Wikipedia page is also a decent reference:
      1. http://en.wikipedia.org/wiki/Convolutional_neural_network
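
To make the six nested loops from the Caffe memo concrete, here is a minimal runnable sketch in plain Python. It follows the pseudocode's index layout (`output(w, h, m)`, `input(w+x, h+y, d)`, `filter(m, x, y, d)`), but the function name `conv_layer`, the nested-list data layout, and the choice of no padding with stride 1 are my own assumptions. Like the pseudocode, it does not flip the kernel.

```python
def conv_layer(inp, filters):
    """Naive multi-channel convolution following the six nested loops.

    inp[w][h][d]        : input with D channels
    filters[m][x][y][d] : M filters, each K x K x D
    Returns output[w][h][m]. Assumptions of this sketch: no padding and
    stride 1, so the output only covers positions where the filter fits
    entirely inside the input.
    """
    w_in, h_in, D = len(inp), len(inp[0]), len(inp[0][0])
    M, K = len(filters), len(filters[0])
    W, H = w_in - K + 1, h_in - K + 1  # output size without padding
    output = [[[0] * M for _ in range(H)] for _ in range(W)]
    for w in range(W):
        for h in range(H):
            for x in range(K):
                for y in range(K):
                    for m in range(M):      # one output channel per filter
                        for d in range(D):  # sum over input channels
                            output[w][h][m] += inp[w + x][h + y][d] * filters[m][x][y][d]
    return output


# A 3x3 input with 2 channels, all ones, and a single 2x2x2 filter of ones.
ones_input = [[[1, 1] for _ in range(3)] for _ in range(3)]
ones_filter = [[[[1, 1] for _ in range(2)] for _ in range(2)]]
result = conv_layer(ones_input, ones_filter)  # 2x2 output with 1 channel
```

Compared with the 2D case, the only additions are the `m` loop (one output channel per filter) and the `d` loop (a sum over the input channels), which is why the weights end up as a 4D tensor.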