Developing High Performance Collaborative Filtering in Ligra

This is a post recording my effort on implementing ALS in Ligra.

The reference implementation that I started with is GraphChi’s Alternating Least Squares implementation. If I have the time, I will try add an Stochastic Gradient Descent implementation.

The bets paper I found that details how SGD and ALS collaborative filtering works is the following

We model our code after this particular implementation, which is supposed to be modeled after GraphLab (Datos)

To run this implementation with a small input, I followed the instruction of the following post

Ligra is a high performance graph framework created by Julian Shun at CMU for his Phd. It has performance comparable to Galois on many applications. Its PageRank is about 2-3x slower than best published results obtained by GraphMat as of Sep 2015. But the framework is very easy to work with and the API is much more flexible, making it a good target to try out my optimizations.

The git repository is here

One thing to note is that for collaborative filtering, we will be working with a weighted graph, that is each edge has a weight associated with it (the rating). The first thing I had to deal with is to get the right adjacency format into the framework,

(1) The header has to be WeightedAdjacencyGraph (2) The weights are at the end of the edges,

This is a sample weighted adjacency matrix I created with 5 nodes and 7 edges

To implement ALS, I decided to adopt a strategy of using vertexMap. But I plan to try out different ways, such as using an edge map to accumulate observation and latent vectors. But the problem I see with that approach is that a large intermediate array of data has to be created.

Now that I have some basic program working with Ligra here that can scan all vertices and their in-neighbors

I have to figure out a format that would work for both GraphChi and Ligra. GraphChi’s collaborative filtering uses a Matrix Market format, that is a little bit annoying. It is not an adjacency matrix format, it is not a N by N matrix (N is the number of vertices). Instead it is a N by M matrix (N is the number of users, M is the number of movies). The content of each cell is the rating for the particular movie by the user.

Well, it seems like we just have to do something special for collaborative filtering when it comes to graph format. Apparently, the original netflix dataset also comes in a similar to Matrix Market format, not really framed as a graph problem.


  • MovieIDs range from 1 to 17770 sequentially.
  • CustomerIDs range from 1 to 2649429, with gaps. There are 480189 users.

I have yet to figure out the Yahoo music dataset format. In this post, I would be happy if we can get Netflix data.

The way I think I am going to go about getting the datasets into graphs is writing a MatrixMarket to EdgeList converter. Unfortunately, EdgeList is still the best common data structure.

GraphChi does this in toolkilts/collaborative_filtering/io.hpp, convert_matrix_market

sharderobj.preprocessing_add_edge(I, (M==N && allow_square)?J:M + J, als_edge_type((float)val));

M is the number of rows. What this does is assign a new ID to the destination, to make sure destination doesn’t overlap with the source. M+J, is maximum user ID + the movie ID.

I plan to follow the same procedure in generating the graph edge list. I generated Intel format of edge graph and had a converter that generates Ligra weighted graph from our CSR representation of the graph. One more thing to note is that we need to make the graph symmetric, as ALS first go through the users and then the movies, the connections need to be both ways.

Additionally, the user IDs need to be contiguous and the movie IDs need to be contiguous too to make the algorithm work. I need to keep this in mind when we start sorting the vertices based on degree later. This would require a different set of sorting mechanisms that sort the first contiguous users’ vertices and then the movies’ vertices.

Now coming back to implement the update function and the compute function, I decided to reuse as much code from GraphChi as possible. There are a few simple operations, such as triangular form and solve it using Cholesky decomposition.

The original code is here

I decided to reuse the vertex data structures and the Eigen library, along with the Eigen Wrapper.

It turns out that Eigen is a really really easy to use library as it is all header files. All you need to do is copy the eigen directory in the apps directory and copy over the following wrapper file

The INSTALL instruction for Eigen is easy to read, use method 1 and copy the folder to the apps directory.

After that, include eigen_wrapper in the ALS file and it all compiles. There might be some warning messages, I ignored the warning messages for now for convenience.

Once we have the functions and everything, we can quickly copy over the relevant code to get ALS implemented as this version of ALS file

At this stage, we surpassingly have the functionalities of a complete collaborative filtering. However, we have no way of verifying correctness. As a result, the next steps will be computing and printing the least square errors and compare it against GraphChi’s output. Additionally, we might need a smaller common test dataset if we want to speed up the process.

The output for running on smallnetflix data with the command shown in the blog post

yunming@lanka:graphchi-cpp$ ./toolkits/collaborative_filtering/als –training=data/smallnetflix_mm –validation=data/smallnetflix_mme –lambda=0.065 –minval=1 –maxval=5 –max_iter=6 –quiet=1

WARNING:  common.hpp(print_copyright:214): GraphChi Collaborative filtering library is written by Danny Bickson (c). Send any  comments or bug reports to

[training] => [data/smallnetflix_mm]

[validation] => [data/smallnetflix_mme]

[lambda] => [0.065]

[minval] => [1]

[maxval] => [5]

[max_iter] => [6]

[quiet] => [1]

=== REPORT FOR sharder() ===


edata_flush: 0.782395s (count: 3, min: 0.095316s, max: 0.432129, avg: 0.260798s)

execute_sharding: 1.67286 s

finish_shard.sort: 0.019674 s

preprocessing: 0.301614 s

shard_final: 1.1544 s


app: sharder

6.75944) Iteration:   0 Training RMSE:   1.49727  Validation  RMSE:   1.25809

10.0998) Iteration:   1 Training RMSE:  0.791493  Validation  RMSE:   1.18716

13.4461) Iteration:   2 Training RMSE:  0.706001  Validation  RMSE:   1.16641

16.7715) Iteration:   3 Training RMSE:  0.675242  Validation  RMSE:   1.16117

20.0819) Iteration:   4 Training RMSE:  0.656819  Validation  RMSE:   1.15222

23.404) Iteration:   5 Training RMSE:  0.654305  Validation  RMSE:    1.1495

We used the square root RMSE and we only measure the training phase as we don’t have validation phase set up quite yet.

One last thing I had to fix is that when calculating RMSE, we really want to sum up for the users, not the movies. So we need have some way of identifying which nodes are users, which are movies. GraphChi does this by recording the out degree. The movie have 0 out degree. However, this is was difficult to implement in Ligra because I doubled the number of edges and movies have out degree as well.

One iteration in the program would be one iteration on users with movies fixed, then one iteration on movies with user fixed (alternating iteration).

Currently, I pass a command line argument -nusers to the program, so it will accumulate RMSE for only the user nodes.

One other caveat, the number of edges that we divide by in calculating RMSE in graphchi is the number of ratings. However, we doubled the number of edges in Ligra. As a result, we had to reduce the number of edges by two.

The relevant code in Graphchi for calculating RMSE is here

By default, the loss_type is 1 (starting from 0), so it would be RMSE, sqrt(rmse/num_edges)

We mimic the same way of calculating the RMSE.

Right now with all the components I have, my work flow is

First get the matrix market data, run it through my parser to convert it to a graph format. Run a page rank (weighted compilation, has to be a push version because Ligra’s input adjacency graph is a push graph) to generate a CSR. (In the future, there should some converter that converts an edgelist to a binary CSR format, I should put this on schedule).

Once I have the CSR binary, I can run writeAdj to write a adjacency format that Ligra can recognize. Then I can run my version of ALS and compare against GraphChi.

This entry was posted in High Performance Computing and tagged . Bookmark the permalink.

One Response to Developing High Performance Collaborative Filtering in Ligra

  1. Pingback: Quick documentation on ALS for Ligra | Yunming Zhang's Blog

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s