This is a summary of meeting notes with Aditya on Dec 22nd.
We discussed the structure and the different types of layers in Caffe, a framework for convolutional neural networks. Caffe is available at http://caffe.berkeleyvision.org
- At a high level, there are three different configuration files in Caffe (following the example at https://github.com/BVLC/caffe/tree/master/models/bvlc_reference_caffenet):
- Solver.prototxt (https://github.com/BVLC/caffe/blob/master/models/bvlc_reference_caffenet/solver.prototxt)
- general supervised learning parameters
- net: location of the train_val.prototxt
- test_iter: number of test iterations
- test_interval
- base_lr
- lr_policy (learning policy)
- ….
- Most of the parameters are required. They generally concern the learning task itself rather than the neural-net architecture.
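As a rough sketch, a solver.prototxt along the lines of the CaffeNet example looks like the following (values and paths are illustrative, not copied verbatim from the repo):

```protobuf
# solver.prototxt (sketch; values are illustrative)
net: "models/bvlc_reference_caffenet/train_val.prototxt"  # location of train_val.prototxt
test_iter: 1000         # number of test iterations
test_interval: 1000     # run a test pass every 1000 training iterations
base_lr: 0.01           # base learning rate
lr_policy: "step"       # learning-rate policy: drop the rate in steps
gamma: 0.1              # multiply the learning rate by this at each step
stepsize: 100000        # drop the learning rate every 100k iterations
momentum: 0.9
weight_decay: 0.0005
max_iter: 450000
snapshot_prefix: "models/bvlc_reference_caffenet/caffenet_train"
solver_mode: GPU
```

Note that none of these fields describe the network architecture itself; that lives entirely in train_val.prototxt.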
- Train_val.prototxt (https://github.com/BVLC/caffe/blob/master/models/bvlc_reference_caffenet/train_val.prototxt)
- It contains the information about each layer in the neural net, used for the feed-forward and back-propagation passes that update the parameters
- Deploy.prototxt (https://github.com/BVLC/caffe/blob/master/models/bvlc_reference_caffenet/deploy.prototxt)
- Used for deploying/validating the trained model. It usually has essentially the same layer architecture as the train_val.prototxt (minus the data and loss layers), because the learned parameters only fit a matching architecture.
- Different Types of Layers
- Data Layer
- Used for serializing images and storing them in a database
- top: what comes out of the layer
- different types of backend (LevelDB, in-memory, HDF5, image)
- LevelDB is supposed to be the fastest, but it is not very flexible. If you want to change the type of one parameter, you have to recompile the whole thing and rerun the entire database conversion.
- HDF5: supposedly more flexible, but it may not be as efficient as LevelDB
- image: takes too much space
- The original images are in compressed JPEG format; they need to be decompressed and serialized.
- Mean file: contains the mean of all the images. It is currently generated outside the framework, and a directory has to be hard-coded into the path file (BAD PRACTICE).
- INCLUDE: the same data-layer definition serves the TRAIN and TEST phases; the only part that differs is the source DIRECTORY.
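A sketch of what such a data layer looks like in train_val.prototxt (paths and batch size are illustrative; the mean file is the externally generated, hard-coded path mentioned above):

```protobuf
# Data layer (sketch); a second copy with include { phase: TEST }
# points at the test database directory instead.
layer {
  name: "data"
  type: "Data"
  top: "data"    # the image blob produced by this layer
  top: "label"   # the label blob
  include { phase: TRAIN }
  transform_param {
    mirror: true
    crop_size: 227
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"  # hard-coded path
  }
  data_param {
    source: "examples/imagenet/ilsvrc12_train_leveldb"  # backend-specific directory
    batch_size: 256
    backend: LEVELDB
  }
}
```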
- Convolutional Layer
- bottom: data (the output of the data layer is fed into the convolutional layer)
- top: the output blob is conv1
- blobs_lr: 1 (per-blob learning-rate multiplier; a second entry scales the bias term)
- weight_decay: 1
- weight_decay: 0 (no weight decay is applied to the bias term)
- Convolution Param
- num_output: 96 (there should be 96 filters)
- kernel_size: 11 (the dimension of the sliding window in the original image)
- stride: 4 (the sliding window shifts by 4 pixels every time)
- In the end the output should be 55*55*96 (with the 227×227 cropped input, an 11×11 window, and stride 4, there are (227 − 11)/4 + 1 = 55 window positions per dimension).
- weight_filler: gaussian
- bias_filler: constant 0
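The parameters above correspond to a layer definition roughly like this (newer-style prototxt syntax; the filler std is an illustrative value):

```protobuf
# conv1 (sketch); lr_mult/decay_mult are the newer spelling of the
# blobs_lr / weight_decay multipliers listed above.
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"   # consumes the data layer's output
  top: "conv1"
  param { lr_mult: 1 decay_mult: 1 }  # weights
  param { lr_mult: 2 decay_mult: 0 }  # bias: no weight decay
  convolution_param {
    num_output: 96   # 96 filters
    kernel_size: 11  # 11x11 sliding window
    stride: 4        # shift the window 4 pixels at a time
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
# Output spatial size: (227 - 11) / 4 + 1 = 55, giving a 55 x 55 x 96 blob.
```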
- ReLU (rectified linear unit; it computes max(0, x) for each output)
- This layer is very fast and inexpensive. It is applied between a convolutional layer and a pooling layer, or between two convolutional layers.
- It adds nonlinearity to the system, which makes the network more expressive.
- Pooling
- pool: MAX (max pool)
- kernel_size
- stride
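A sketch of the ReLU + pooling pair in prototxt (the kernel_size and stride values are illustrative; note that ReLU is typically applied in place, writing back into its input blob):

```protobuf
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"   # in-place: cheap, no extra blob allocated
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX       # max pooling
    kernel_size: 3  # illustrative
    stride: 2       # illustrative
  }
}
```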
- LRN (local response normalization)
- similar to convolution: a window slides across the input, normalizing responses locally
- Inner Product
- This is the traditional neural-net hidden (fully connected) layer; it is essentially a big matrix multiplication
- Loss
- Different ways to calculate the loss and propagate the errors backward to update the parameters
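An inner-product layer feeding a loss layer might be sketched like this (num_output: 1000 assumes the 1000-class ImageNet setting of the reference model):

```protobuf
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  inner_product_param {
    num_output: 1000  # one score per class (ImageNet assumption)
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"    # predicted scores
  bottom: "label"  # ground-truth labels from the data layer
  top: "loss"      # gradients flow backward from here
}
```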
- Possible sequence between layers
- convolution + RELU + convolution
- convolution + RELU + pooling
- Possible computation graphs
- The computation graph is a DAG: it is not necessarily linear, it can have multiple loss layers at the end, and it is acyclic.
- We should keep this in mind when specifying the computation.
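For example, two sibling layers can read the same blob, so the graph branches without becoming cyclic (sketch; the Accuracy layer here stands in for any second terminal layer):

```protobuf
# Both layers take fc8 as a bottom: the DAG branches at fc8,
# ending in two terminal nodes, and no edge points backward.
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8"
  bottom: "label"
  top: "accuracy"
  include { phase: TEST }  # only evaluated during testing
}
```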
- Other notes and ideas
- If it is hard to do convolution, then we should consider hybrid CPU and GPU computation
- Can we do data or task parallelism?