Monthly Archives: October 2015

The -e cycles:pp flag

This is a post documenting how and why you would use the -e cycles:pp flag with perf.

Posted in Tools | Tagged | Leave a comment

How to see if your machine is running AVX2 or AVX1

A short post documenting how to find out which generation of AVX instructions you are using.

Posted in Tools | Tagged | Leave a comment

Disassembly analysis for a simple vectorized loop

This is my post analyzing the AVX2 disassembly generated for a simple vectorized multiply-and-add loop over 20 doubles.

Posted in High Performance Computing | Tagged | Leave a comment

Quick documentation on ALS for Ligra

Here is a link to the ALS implementation in Ligra, along with some short documentation. Right now, it is about 1.5-2x faster than GraphChi, depending on the input. I believe that it is faster only because of the overhead in GraphChi. … Continue reading

Posted in High Performance Computing | Tagged | Leave a comment

How to use VTune on a Linux cluster with remote access from a Mac

This is a post documenting how to use the VTune analyzer on a Mac (which provides only the GUI) to control tasks remotely over SSH on a Linux cluster that has the full-blown version of VTune installed.

Posted in Tools | Tagged | Leave a comment

Notes on High Performance Collaborative SGD (actually it is Gradient Descent) in GraphMat and Ligra

Here are my notes on understanding, running, and testing high performance collaborative filtering in GraphMat, a state-of-the-art graph processing framework from Intel. I implemented an equivalent algorithm in Ligra, another high performance graph processing framework.

Posted in High Performance Computing | Tagged | Leave a comment

Use PMU tools to interactively monitor memory bandwidth

I learned to use PMU tools here to monitor the memory bandwidth on each channel of each socket. It is a great way to monitor the read and write memory bandwidth of an application in real time.

Posted in Tools | Tagged | 1 Comment