How to instrument CAFFE to get timing on each layer

This post shows how to instrument the CAFFE library to get timing information for each layer.

We start from caffe/src/caffe/solver.cpp. The Solver does not implement the net itself; it holds the network in its net_ member, which the constructor initializes:

Solver<Dtype>::Solver(const SolverParameter& param)
    : net_()

In Solve(), a single line runs the forward and backward propagation and returns the loss:

Dtype loss = net_->ForwardBackward(bottom_vec);

ForwardBackward() itself is defined inline in caffe/include/caffe/net.hpp:

Dtype ForwardBackward(const vector<Blob<Dtype>* > & bottom) {
  Dtype loss;
  Forward(bottom, &loss);
  Backward();
  return loss;
}

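The Forward() function is defined in net.cpp. ForwardBackward() calls the overload that takes a vector of input blobs; the overload below receives its inputs as serialized protobufs instead, but both end up calling ForwardPrefilled():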

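template <typename Dtype>
string Net<Dtype>::Forward(const string& input_blob_protos, Dtype* loss) {
  BlobProtoVector blob_proto_vec;
  if (net_input_blobs_.size()) {
    blob_proto_vec.ParseFromString(input_blob_protos);
    CHECK_EQ(blob_proto_vec.blobs_size(), net_input_blobs_.size())
        << "Incorrect input size.";
    for (int i = 0; i < blob_proto_vec.blobs_size(); ++i) {
      net_input_blobs_[i]->FromProto(blob_proto_vec.blobs(i));
    }
  }
  ForwardPrefilled(loss);
  blob_proto_vec.Clear();
  for (int i = 0; i < net_output_blobs_.size(); ++i) {
    net_output_blobs_[i]->ToProto(blob_proto_vec.add_blobs());
  }
  string output;
  blob_proto_vec.SerializeToString(&output);
  return output;
}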

The relevant call is ForwardPrefilled(loss); everything else just deserializes the input blobs and serializes the output blobs.

template <typename Dtype>
const vector<Blob<Dtype>*>& Net<Dtype>::ForwardPrefilled(Dtype* loss) {
  if (loss != NULL) {
    *loss = ForwardFromTo(0, layers_.size() - 1);
  } else {
    ForwardFromTo(0, layers_.size() - 1);
  }
  return net_output_blobs_;
}

As we continue to track down the code in net.cpp, we arrive at

ForwardFromTo(0, layers_.size() - 1)

template <typename Dtype>
Dtype Net<Dtype>::ForwardFromTo(int start, int end) {
  CHECK_GE(start, 0);
  CHECK_LT(end, layers_.size());
  Dtype loss = 0;
  for (int i = start; i <= end; ++i) {
    // LOG(ERROR) << "Forwarding " << layer_names_[i];
    layers_[i]->Reshape(bottom_vecs_[i], &top_vecs_[i]);
    Dtype layer_loss = layers_[i]->Forward(bottom_vecs_[i], &top_vecs_[i]);
    loss += layer_loss;
    if (debug_info_) { ForwardDebugInfo(i); }
  }
  return loss;
}

This is the piece of code we have been looking for: it finally makes the connection to each individual layer, through the call

layers_[i]->Forward(bottom_vecs_[i], &top_vecs_[i]);
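To see this call chain in isolation, here is a minimal standalone sketch that mirrors ForwardPrefilled() and ForwardFromTo() without any Caffe dependencies. ToyLayer, ConstantLossLayer and ToyNet are hypothetical stand-ins: real Caffe layers take bottom and top blob vectors, and Reshape(), debug output and the returned output blobs are all left out here.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for caffe::Layer. Real layers take bottom/top blob
// vectors; here Forward() only returns the loss this layer contributes.
class ToyLayer {
 public:
  virtual ~ToyLayer() = default;
  virtual float Forward() = 0;
};

// A layer with a fixed loss contribution (0 for ordinary, non-loss layers).
class ConstantLossLayer : public ToyLayer {
 public:
  explicit ConstantLossLayer(float loss) : loss_(loss) {}
  float Forward() override { return loss_; }

 private:
  float loss_;
};

// Hypothetical stand-in for caffe::Net that keeps only the forward call chain.
class ToyNet {
 public:
  void AddLayer(std::unique_ptr<ToyLayer> layer) {
    layers_.push_back(std::move(layer));
  }

  // Mirrors ForwardPrefilled(): run all layers, report the loss if asked.
  void ForwardPrefilled(float* loss) {
    float total = ForwardFromTo(0, static_cast<int>(layers_.size()) - 1);
    if (loss != nullptr) {
      *loss = total;
    }
  }

  // Mirrors ForwardFromTo(): dispatch to each layer in order, summing losses.
  float ForwardFromTo(int start, int end) {
    assert(start >= 0);
    assert(end < static_cast<int>(layers_.size()));
    float loss = 0;
    for (int i = start; i <= end; ++i) {
      loss += layers_[i]->Forward();
    }
    return loss;
  }

 private:
  std::vector<std::unique_ptr<ToyLayer>> layers_;
};
```

With one layer contributing 0 and a loss layer contributing 1.5, calling ForwardPrefilled(&loss) sets loss to 1.5: only the loss layers add to the total, exactly as in the summation inside ForwardFromTo().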

To get per-layer timings, I added a vector<long> timings member to the net in net.hpp and timed the execution every time Forward is called for a layer. In the Solve() method in solver.cpp, I print out the values of the vector after all iterations are done.
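Here is a minimal standalone sketch of that instrumentation. It assumes microsecond resolution measured with std::chrono::steady_clock; the post only says the timings are stored in a vector<long>, not how they are measured. TimedNet, timings() and PrintTimings() are hypothetical names, and each layer is modelled as a callable returning its loss rather than a Caffe layer.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <ostream>
#include <utility>
#include <vector>

// Hypothetical minimal net: each "layer" is a callable returning its loss.
// In Caffe this would be layers_[i]->Forward(bottom_vecs_[i], &top_vecs_[i]).
class TimedNet {
 public:
  explicit TimedNet(std::vector<std::function<float()>> layers)
      : layers_(std::move(layers)), timings_(layers_.size(), 0) {}

  // Same loop shape as Net::ForwardFromTo(), with each layer's Forward timed.
  // Assumes 0 <= start <= end < number of layers.
  float ForwardFromTo(int start, int end) {
    float loss = 0;
    for (int i = start; i <= end; ++i) {
      auto t0 = std::chrono::steady_clock::now();
      loss += layers_[i]();
      auto t1 = std::chrono::steady_clock::now();
      // Accumulate elapsed microseconds for layer i across all iterations.
      timings_[i] += static_cast<long>(
          std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0)
              .count());
    }
    return loss;
  }

  const std::vector<long>& timings() const { return timings_; }

  // Equivalent of the print at the end of Solve(): one line per layer.
  void PrintTimings(std::ostream& os) const {
    for (std::size_t i = 0; i < timings_.size(); ++i) {
      os << "layer " << i << ": " << timings_[i] << " us\n";
    }
  }

 private:
  std::vector<std::function<float()>> layers_;
  std::vector<long> timings_;  // accumulated microseconds per layer
};
```

In a patched Caffe the same pattern would wrap the layers_[i]->Forward(...) call in ForwardFromTo(), with the vector sized to layers_.size() when the net is set up and PrintTimings(std::cout) called after the iteration loop in Solve(). If you run in GPU mode, note that CUDA kernel launches are asynchronous, so host-side timings can be charged to the wrong layer unless you synchronize the device (for example with cudaDeviceSynchronize()) before reading the clock.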

This relies on the net being constructed only once during training, so the timings cover a single setup and run. If you reconstruct the layers for a different learning rate, the timings are printed twice: once at the end of the first phase and once at the end of the second learning-rate phase.
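The per-phase behaviour follows from where the timings live. Below is a minimal sketch, assuming, as in the post, that the timings are a member of the net object and that each learning-rate phase builds a new net. ProfiledNet, LayerProfile and RunPhase() are hypothetical names, the per-layer work is an empty placeholder, and a call counter is kept next to the time so the reset is easy to see.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <iostream>
#include <vector>

// Hypothetical per-layer profile; it lives inside the net object, so it is
// created and destroyed together with the net.
struct LayerProfile {
  long calls = 0;
  long micros = 0;
};

class ProfiledNet {
 public:
  explicit ProfiledNet(std::size_t num_layers) : profile_(num_layers) {}

  // Stand-in for one ForwardFromTo() pass; `work` simulates layer i's Forward.
  void ForwardAll(const std::function<void(std::size_t)>& work) {
    for (std::size_t i = 0; i < profile_.size(); ++i) {
      auto t0 = std::chrono::steady_clock::now();
      work(i);
      auto t1 = std::chrono::steady_clock::now();
      profile_[i].calls += 1;
      profile_[i].micros += static_cast<long>(
          std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0)
              .count());
    }
  }

  const std::vector<LayerProfile>& profile() const { return profile_; }

 private:
  std::vector<LayerProfile> profile_;
};

// Hypothetical learning-rate phase: build a fresh net, run it, print timings.
// Because the net is rebuilt, the printed numbers cover this phase only.
std::vector<LayerProfile> RunPhase(int phase, int iterations,
                                   std::size_t num_layers) {
  ProfiledNet net(num_layers);
  for (int iter = 0; iter < iterations; ++iter) {
    net.ForwardAll([](std::size_t) { /* layer work would go here */ });
  }
  for (std::size_t i = 0; i < num_layers; ++i) {
    std::cout << "phase " << phase << " layer " << i << ": "
              << net.profile()[i].micros << " us over "
              << net.profile()[i].calls << " calls\n";
  }
  return net.profile();
}
```

Calling RunPhase(1, 5, 3) and then RunPhase(2, 2, 3) reports 5 calls per layer for the first phase and 2, not 7, for the second, because the second net starts with fresh counters. If you would rather have one report covering the whole run, keep the profile outside the net (for example in the solver) and pass it in.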

This entry was posted in Convoluted Neural Nets.
