Meeting Notes with Aditya on Convolutional Neural Nets with Caffe, Dec 22nd

This is a summary of meeting notes with Aditya on Dec 22nd.

We discussed the structure of and the different types of layers in Caffe, a framework for convolutional neural networks. The link to Caffe is here: http://caffe.berkeleyvision.org

  • At a high level, there are three different configuration files for Caffe (following the reference example at https://github.com/BVLC/caffe/tree/master/models/bvlc_reference_caffenet).
  • Different Types of Layers
    • Data Layer
      • Used for serializing images and storing them in a database
      • top: what comes out of the layer
      • different types of back end (LevelDB, in-memory, HDF5, raw images)
        • LevelDB is supposed to be the fastest, but it is not very flexible: if you want to change the type of one parameter, you have to recompile the whole thing and rerun the entire database conversion step.
        • HDF5: supposedly more flexible, but it might not be as efficient as LevelDB
        • raw images: take up too much space
      • The original images are in compressed JPEG format; they need to be decompressed and serialized.
      • Mean file: contains the per-pixel mean of all the images; it is currently generated outside the training run, and its path has to be hard-coded in the config file (BAD PRACTICE); see the mean-subtraction sketch after these notes.
      • INCLUDE: this means the same data layer definition can be used for both the TRAIN and TEST phases; the only part that differs is the data directory.
    • Convolutional Layer
      • bottom: data (means the output of the data layer is fed into the convolutional layer).
      • top: output is conv1
      • blobs_lr (per-blob learning-rate multipliers for the weights and the bias)
      • weight_decay: 1
      • weight_decay: 0 (no weight decay is applied to the bias term)
      • Convolution Param
        • num_output: 96 (there should be 96 filters)
        • kernel_size: 11 (the dimension of the sliding window in the original image)
        • stride: 4 (the sliding window shifts by 4 pixels every time)
        • In the end the output should be 55*55*96 (55 comes from (227 - 11)/4 + 1 for the 227×227 input crop, i.e. there are 55*55 positions of the 11×11 window); see the output-size sketch after these notes.
        • weight_filler: gaussian (initialize the filter weights from a Gaussian)
        • bias_filler: constant, value 0 (initialize the biases to zero)
    • ReLU (rectified linear unit; basically it does a max(0, value) for each output; sketch after these notes)
      • This layer is very fast and inexpensive; it is applied between a convolutional layer and a pooling layer, or between two convolutional layers
      • It adds nonlinearity to the network, which is what makes a stack of layers more expressive than a single linear transform
    • Pooling (sketch after these notes)
      • pool: MAX (take the maximum value in each window)
      • kernel_size: the size of the pooling window
      • stride: how far the window shifts each step
    • LRN (local response normalization)
      • similar to convolution in that a window slides across the responses; each value is normalized by the activity of its neighbors (sketch after these notes)
    • Inner Product
      • This is the traditional fully connected hidden layer of a neural net; it is one big matrix multiplication (sketch after these notes)
    • Loss
      • There are different ways to calculate the loss and propagate the errors backward to update the parameters (softmax-loss sketch after these notes)
  • Possible sequence between layers
    • convolution + ReLU + convolution
    • convolution + ReLU + pooling
  • Possible computation graphs
    • The computation graph is a DAG: it does not have to be linear, it can have multiple loss layers at the end, and it is acyclic.
    • We should keep this in mind when specifying the computation.
  • Other notes and ideas
    • If convolution is hard to do efficiently, then we should consider hybrid CPU and GPU computation
    • Can we do data or task parallelism?
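
Below are a few NumPy sketches of what the individual layers do conceptually; they are illustrative only, not Caffe's implementation. First, the mean file used by the data layer: a per-pixel mean image that gets subtracted from every input. Caffe computes the mean with its own tools and stores it as a binaryproto file, so the array shapes and functions here are assumptions made for the example.

```python
import numpy as np

def compute_mean_image(images):
    """images: (N, H, W, C) array; returns the per-pixel mean over the dataset."""
    return images.astype(np.float32).mean(axis=0)

def preprocess(image, mean_image):
    """Subtract the dataset mean from one image, as the data pipeline conceptually does."""
    return image.astype(np.float32) - mean_image

rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(10, 256, 256, 3))  # stand-in for the training set
mean_image = compute_mean_image(images)
print(preprocess(images[0], mean_image).shape)  # -> (256, 256, 3)
```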
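
Next, the conv1 output-size arithmetic. This assumes the 227×227 input crop used by the reference CaffeNet model; with an 11×11 kernel and stride 4 that gives (227 - 11)/4 + 1 = 55 window positions per side, and num_output: 96 filters yields a 55×55×96 blob.

```python
def conv_output_size(input_size, kernel_size, stride, pad=0):
    """Number of positions of the sliding window along one spatial dimension."""
    return (input_size + 2 * pad - kernel_size) // stride + 1

side = conv_output_size(input_size=227, kernel_size=11, stride=4)
print(side, side, 96)  # -> 55 55 96
```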
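
A one-line sketch of the ReLU nonlinearity, just to make the max(0, value) behavior concrete.

```python
import numpy as np

def relu(x):
    """Element-wise max(0, x); cheap, and often applied in place."""
    return np.maximum(x, 0.0)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # zeros out the negative entries
```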
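
A naive max-pooling sketch over a single channel, to show how kernel_size and stride play the same role as in convolution. Real implementations vectorize this and handle multiple channels and padding.

```python
import numpy as np

def max_pool2d(x, kernel_size, stride):
    """Naive max pooling over a single-channel 2-D array (no padding)."""
    h, w = x.shape
    out_h = (h - kernel_size) // stride + 1
    out_w = (w - kernel_size) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + kernel_size, c:c + kernel_size].max()
    return out

x = np.arange(16, dtype=np.float32).reshape(4, 4)
print(max_pool2d(x, kernel_size=2, stride=2))
# [[ 5.  7.]
#  [13. 15.]]
```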
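
A sketch of cross-channel local response normalization at a single spatial position, following the AlexNet-paper form of the formula; the parameter values (n, k, alpha, beta) are illustrative defaults, and Caffe's exact scaling convention may differ.

```python
import numpy as np

def lrn_across_channels(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """a: 1-D array of activations across channels at one spatial position.
    Each value is divided by a factor that grows with the squared activity
    of its n neighboring channels."""
    c = a.shape[0]
    out = np.empty_like(a, dtype=np.float64)
    for i in range(c):
        lo, hi = max(0, i - n // 2), min(c, i + n // 2 + 1)
        scale = k + alpha * np.sum(a[lo:hi] ** 2)
        out[i] = a[i] / scale ** beta
    return out

print(lrn_across_channels(np.array([1.0, 2.0, 3.0, 4.0, 5.0])))
```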
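
A sketch of the inner-product (fully connected) layer as one big matrix multiplication plus a bias; the layer sizes are made up for illustration.

```python
import numpy as np

def inner_product(x, weights, bias):
    """Fully connected layer: y = x * W^T + b."""
    return x @ weights.T + bias

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 128))          # batch of 2 flattened feature vectors
weights = rng.standard_normal((32, 128))   # 32 hidden units
bias = np.zeros(32)
print(inner_product(x, weights, bias).shape)  # -> (2, 32)
```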
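
Finally, a sketch of one common loss choice, the softmax (multinomial logistic) loss, together with its gradient with respect to the scores, which is what gets propagated backward to update the parameters. This is a generic example, not Caffe's code.

```python
import numpy as np

def softmax_loss(scores, labels):
    """scores: (N, C) raw class scores; labels: (N,) integer class indices.
    Returns the average loss and its gradient w.r.t. the scores."""
    shifted = scores - scores.max(axis=1, keepdims=True)   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    n = scores.shape[0]
    loss = -log_probs[np.arange(n), labels].mean()
    grad = np.exp(log_probs)
    grad[np.arange(n), labels] -= 1.0
    grad /= n
    return loss, grad

scores = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
labels = np.array([0, 1])
loss, grad = softmax_loss(scores, labels)
print(round(loss, 4), grad.shape)  # loss is a scalar; grad matches the scores' shape
```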