Using Top, PS and Awk for benchmarking CPU and memory utilization in a cluster environment

This post describes my experience building scripts that benchmark the CPU and memory utilization of Hadoop processes in a cluster, using the Linux top and awk commands.

You can get CPU utilization from either the ps or the top command (vmstat may work as well, but I did not study how it differs). This post focuses on the difference between top and ps.

The version of ps I used documents its CPU utilization field as follows:

%cpu       %CPU     cpu utilization of the process in "##.#" format.
                    Currently, it is the CPU time used divided by the
                    time the process has been running (cputime/realtime
                    ratio), expressed as a percentage. It will not add up
                    to 100% unless you are lucky. (alias pcpu).

Because this value is averaged over the whole lifetime of each process, adding it up across multiple processes did not give a meaningful picture of the current load. I used it as my CPU utilization indicator at first, then realized it was problematic and switched to the top command.
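To make the documented cputime/realtime ratio concrete, here is a small sketch of the arithmetic. The function name ps_style_pcpu and the sample numbers are made up for illustration; this is not how ps is implemented internally, only the formula its documentation describes.

```shell
# Hypothetical helper: lifetime-average CPU percentage, following the
# cputime/realtime ratio described in the ps documentation above.
# $1 = CPU seconds consumed so far, $2 = seconds since the process started.
ps_style_pcpu() {
    awk -v cpu="$1" -v real="$2" 'BEGIN { printf "%.1f\n", 100 * cpu / real }'
}

# A process that was busy for its first 50 s and idle for the next 150 s
# still reports 25.0, even though it is using no CPU right now.
ps_style_pcpu 50 200
```

The longer a process has been running, the less a recent burst or lull moves this number.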

The top command documents its %CPU field as follows:

 

%CPU  --  CPU usage
    The task's share of the elapsed CPU time since the last screen
    update, expressed as a percentage of total CPU time. In a true
    SMP environment, if 'Irix mode' is Off, top will operate in
    'Solaris mode' where a task's cpu usage will be divided by the
    total number of CPUs. You toggle 'Irix/Solaris' modes with the
    'I' interactive command.

 

So top's figure is much more current, because it reflects only the CPU utilization since the last update. The two graphs below plot CPU utilization over time for the same hash join application (same configuration), measured once with ps and once with top.

[Figures: ps-graph (CPU utilization over time measured with ps) and top-script (the same run measured with top)]

 

As you can see, because ps reports a cumulative average, its curve is smooth. top, on the other hand, records something closer to the instantaneous CPU utilization, so its curve drops and rises frequently. top is the much better choice in this case. (The ps behavior could be specific to the version I am using.)
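The interval-based figure that top's documentation describes can be sketched the same way. The helper name interval_pcpu, its arguments, and the sample values are assumptions for illustration only; the optional fourth argument mimics the Solaris-mode division by the number of CPUs mentioned in the man page excerpt.

```shell
# Hypothetical helper: CPU percentage over one sampling interval.
# $1, $2 = cumulative CPU seconds at the start and end of the interval,
# $3     = interval length in seconds,
# $4     = number of CPUs to divide by (default 1, i.e. no division;
#          passing the real CPU count mimics Solaris mode).
interval_pcpu() {
    awk -v c0="$1" -v c1="$2" -v dt="$3" -v ncpu="${4:-1}" \
        'BEGIN { printf "%.1f\n", 100 * (c1 - c0) / dt / ncpu }'
}

interval_pcpu 50 50 3      # no CPU time used in the interval: 0.0
interval_pcpu 50 53 3      # one core fully busy for the interval: 100.0
interval_pcpu 50 53 3 4    # same work divided over 4 CPUs: 25.0
```

Only the difference between the two samples matters here, which is why this number can jump between updates while the lifetime average barely moves.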

Next, it is useful to know how to use the top command in an ssh script. This way, you can benchmark the CPU and memory utilization of a remote node.

Here is the command I used:

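# Command run on each remote node: print %CPU ($9) and %MEM ($10) of every java process
CMD="top -b -n1 | awk '/java/ {print \"\t\t\" \$9 \"\t\" \$10}'"

# NODES holds the hostnames; each node's output is appended to a local file named after the node
for node in $NODES; do
    ssh -o ConnectTimeout=2 "$node" "$CMD" >> "$node"
done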
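The escaping in the CMD assignment is easier to follow if you print the string that the remote shell will actually receive. This is only a local sanity check and does not contact any node:

```shell
# Inside double quotes, \" becomes " and \$ becomes $, while \t stays as
# the two characters backslash-t (awk expands it to a tab later on).
CMD="top -b -n1 | awk '/java/ {print \"\t\t\" \$9 \"\t\" \$10}'"

# Show the command exactly as it will be handed to ssh.
printf '%s\n' "$CMD"
# prints: top -b -n1 | awk '/java/ {print "\t\t" $9 "\t" $10}'
```

The backslash before each $ keeps the local shell from expanding $9 and $10 while building the string, and the single quotes do the same job for the remote shell.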

 

Let me walk through this script. First, CMD extracts the lines of top's output that belong to the java processes.

top -b (batch mode) -n1 (run a single iteration) | awk '/java/' gives the following output:

3871 yz17      18   0 2646m 114m  15m S  0.0  0.5   0:03.57 java

4056 yz17      18   0 2673m  83m  15m S  0.0  0.3   0:02.46 java

4164 yz17      21   0 2643m  98m  15m S  0.0  0.4   0:03.79 java

 

These are all the lines of top's output that contain the word "java". Note that the pattern must be enclosed in two slashes: /java/.
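Because /java/ matches anywhere on the line, it would also pick up a row whose user name happens to contain "java". A stricter variant, sketched here under the assumption that COMMAND is the last column (as in the output above), compares only that field. The function name java_rows and the third sample row are invented for the example.

```shell
# Keep only rows whose last field (COMMAND in the layout shown above) is "java".
java_rows() {
    awk '$NF == "java"'
}

# Two rows from the output above plus one invented row (user "javadev"
# running bash) that the looser /java/ pattern would also match.
printf '%s\n' \
    '3871 yz17      18   0 2646m 114m  15m S  0.0  0.5   0:03.57 java' \
    '4056 yz17      18   0 2673m  83m  15m S  0.0  0.3   0:02.46 java' \
    '4999 javadev   20   0 10000 1200  900 S  0.0  0.0   0:00.01 bash' |
    java_rows
```

Only the two rows ending in java are printed. If top is configured to show full command lines, the last field will differ and this test would need adjusting.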

 

A good AWK tutorial I found is here:

http://www.thegeekstuff.com/2010/01/awk-introduction-tutorial-7-awk-print-examples/

 

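Once you have the matching lines, you can use awk '{ print $N }', where N is the column number, to access each column.

For example, $1 is the PID (and $9 and $10, used in the script above, are the %CPU and %MEM columns):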

top -b -n1 | awk '/java/ { print $1 }'

3871

4056

4164

 

Again, read the tutorial linked above if you want to understand why AWK works this way.
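The same column access can be used to aggregate across processes. As a minimal sketch (the function name sum_java_usage and the output layout are my own choices, not part of the original script), this sums the %CPU and %MEM columns of all java rows read from standard input:

```shell
# Print: number of java rows, total %CPU ($9), total %MEM ($10), tab-separated.
sum_java_usage() {
    awk '/java/ { cpu += $9; mem += $10; n++ }
         END    { printf "%d\t%.1f\t%.1f\n", n, cpu, mem }'
}

# Fed with the three sample rows shown earlier.
printf '%s\n' \
    '3871 yz17      18   0 2646m 114m  15m S  0.0  0.5   0:03.57 java' \
    '4056 yz17      18   0 2673m  83m  15m S  0.0  0.3   0:02.46 java' \
    '4164 yz17      21   0 2643m  98m  15m S  0.0  0.4   0:03.79 java' |
    sum_java_usage
```

For the sample rows this prints 3, 0.0 and 1.2. On a live node you would pipe top -b -n1 into the function instead of the printf.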

That should be enough to get you started creating your own CPU and memory benchmark script for a cluster of computers.

