Performance “stats” without PerfStat or Ops Mgr

April 1st, 2009

PerfStat is a great way to get some quite detailed performance information out of the filer when you have a performance or other issue that you can’t quite put your finger on. You need to have access to the PerfStat Viewer, or get someone to process this output for you, and then you need to trawl through it.

Operations Manager, and more specifically Performance Advisor, is brilliant and 99% of the time gives you the counters you need to diagnose the problem. Once you’ve found your way round it, it is completely indispensable!

But what if you don’t have Operations Manager, or you just want to quickly pull out information on one area of the system?

The first thing you want to look at is sysstat. It’s everyone’s best friend and a great way of seeing “Is my system busy?”. Whenever you run sysstat, make sure to throw it the “-s” modifier so that you get a summary at the end of the output. If you don’t define a number of iterations (-c <num>), then use ctrl+c to break the output. “-x” is great for giving all areas of output, but it can be a little wide sometimes. “-u” is my favourite as it gives you utilisation readings, and these are usually the most useful when troubleshooting.

Most of the columns are fairly self-explanatory. CPU is % busy; NFS, CIFS, HTTP, FCP and iSCSI are all protocol operations counters. Net kB/s in and out are obvious (for reference a single gigabit interface will happily sustain around 80MB/s, but can stretch to 110/120MB/s). Disk and Tape are in and out as well. Watch the cache age when it gets really low, but there are better counters for that. Cache hit is a counter you want as close to 100% as possible. The more data is getting read from cache the better! CP Type is Consistency Points; I won’t go into detail as to what these are, as there is a very good KB article on this already (https://now.netapp.com/Knowledgebase/solutionarea.asp?id=kb23471). And finally Disk Utilisation, which seems to cause some confusion. This is the reading from the single busiest disk in the system, and not an average. This reading can interestingly go above 100% (much like CPU can too), and this simply means the disks are doing more than they should!
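To put those gigabit figures in context, here is a minimal Python sketch of the arithmetic. It assumes sysstat's kB/s means 1000 bytes per second and it ignores protocol overhead, so treat the percentages as rough guides only. The function name is my own invention for illustration.

```python
# Theoretical ceiling of a 1 Gbit/s link, ignoring Ethernet/IP/TCP overhead.
LINK_BITS_PER_SEC = 1_000_000_000
LINK_MB_PER_SEC = LINK_BITS_PER_SEC / 8 / 1_000_000  # 125.0 MB/s


def link_utilisation(kb_per_sec: float) -> float:
    """Percentage of a gigabit link used by a sysstat-style kB/s reading.

    Assumption: 1 kB = 1000 bytes (sysstat may well mean 1024).
    """
    mb_per_sec = kb_per_sec / 1000
    return 100 * mb_per_sec / LINK_MB_PER_SEC


for kb in (80_000, 110_000, 120_000):
    print(f"{kb} kB/s is about {link_utilisation(kb):.0f}% of line rate")
```

So the "happy" 80MB/s is roughly two thirds of what the wire can carry in theory, and 110 to 120MB/s is very close to the ceiling.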

So sysstat is a great way to get a high level view of “Is my system busy” and also gives you a rough idea of where the bottleneck is. If the CPU is really high, but nothing else, then this is what is holding back the system. If the disk utilisation is very high, then again, here is the problem. But these aren’t conclusive figures, and don’t point directly at a culprit. For instance if disk utilisation is very high, you may need to run a wafl reallocate as you have added some new disks and these aren’t holding any data yet. If your CPU is very high, it may be that you are doing a lot of other processing like A-SIS and SnapVault, or it could be very random IO so the CPU is working harder at trying to make calculations around this.

The next step may be to look at statit. This is a “priv set advanced” command and not for the faint-hearted, but it is a great way to get a snapshot of details over a period. Simply run “statit -b” at the start of the monitoring period, and then “statit -e” at the end. Make sure to log your output window as you’ll get a lot from statit (more than the standard Windows and Putty buffer will show). There is a lot of statit output, and I won’t go into too much detail on it all here (but maybe another day). Most of it is pretty self-explanatory really.

This brings me onto the real reason for this article in the first place. One of my favourite commands, and certainly a largely overlooked one, is “stats”. This has a lot of information at its fingertips; pretty much anything you can see in Performance Advisor and anything you can report on in PerfStat is available in the stats command. And possibly a lot more! “stats” works very similarly to sysstat in that it reports counters based on the iterations. If you simply run it, it’ll report what the system is doing at that exact time. If you tell it to run every 5 seconds, it’ll report what happened over those 5 seconds.

So first up, don’t just jump in and run “stats show” without having a few minutes to spare. The output is very complete! First you want to see what counters are available. Stats is split into “Objects”, “Instances” and “Counters”. To show each, we can use “stats list …”

filer01> stats list objects

Objects:

dump

logical_replication_source

logical_replication_destination

vfiler

qtree

aggregate

iscsi

fcp

cifs

volume

lun

target

nfsv3

ifnet

processor

disk

system


 

filer01> stats list instances ifnet

Instances for object name: ifnet

B2net

Storage-101


 

filer01> stats list counters ifnet

Counters for object name: ifnet

recv_packets

recv_errors

send_packets

send_errors

collisions

recv_data

send_data

recv_mcasts

send_mcasts

recv_drop_packets

 

In the example above, I can show all the objects available to me, query all the networking instances I have set up (2 VIFs, 1 with a VLAN), and see what counters I can report on. So putting this together…

filer01> stats show ifnet:Storage-101:collisions

ifnet:Storage-101:collisions:0/s

 

Great, my storage interface doesn’t have any network collisions for the period this has run! That’s good news for me!
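If you capture this output and want to do something with it, the single-line format is easy to pull apart. The following is a minimal Python sketch, assuming every line has the shape object:instance:counter:value-plus-unit as shown above and that instance names contain no colons. The `Sample` type and `parse_stats_line` function are hypothetical names of mine, not anything shipped with the filer.

```python
import re
from typing import NamedTuple


class Sample(NamedTuple):
    obj: str
    instance: str
    counter: str
    value: float
    unit: str


# A number (integer or decimal) followed by whatever unit text remains.
_VALUE_RE = re.compile(r"^(-?\d+(?:\.\d+)?)(.*)$")


def parse_stats_line(line: str) -> Sample:
    """Split one 'object:instance:counter:value' line into its parts."""
    obj, instance, counter, raw = line.strip().split(":", 3)
    match = _VALUE_RE.match(raw)
    if match is None:
        raise ValueError(f"unrecognised value field: {raw!r}")
    return Sample(obj, instance, counter, float(match.group(1)), match.group(2))


print(parse_stats_line("ifnet:Storage-101:collisions:0/s"))
```

Non-numeric counters (strings, for instance) would raise a ValueError here, which seemed the safer default for a sketch.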

If I want to run this over several iterations, I can feed it some more options. Note: The options must go before the counter information!

filer01> stats show -n 5 -i 1 ifnet:Storage-101:collisions

Instance collisions

/s

Storage-101 0

Storage-101 0

Storage-101 0

Storage-101 0

Storage-101 0

 

Great, so over a period of 5 seconds I’m still not getting collisions!

You’ll notice from above that there are a lot of performance counters available, and not all of them have the most verbose names. You can query any of these by running “stats explain counters”.

filer01> stats explain counters ifnet collisions

Counters for object name: ifnet

Name: collisions

Description: Collisions per second on CSMA interfaces

Properties: rate

Unit: per_sec
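If you want to build your own quick reference of counter descriptions, the explain output is just "key: value" lines. A minimal Python sketch, assuming the output describes a single counter in the layout shown above (the `parse_explain` name is hypothetical):

```python
def parse_explain(text: str) -> dict:
    """Turn 'Key: value' lines from 'stats explain counters' into a dict.

    Assumption: one counter per output; repeated keys would overwrite.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip blank lines and anything without a colon
            info[key.strip()] = value.strip()
    return info


SAMPLE = """\
Counters for object name: ifnet
Name: collisions
Description: Collisions per second on CSMA interfaces
Properties: rate
Unit: per_sec
"""

details = parse_explain(SAMPLE)
print(details["Name"], "->", details["Description"])
```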

 

So let’s take another example. I want to look at latency readings on my Exchange system…

filer01> stats show -n 5 -i 1 volume:exch01_db:read_latency volume:exch01_db:write_latency volume:exch01_logs:read_latency volume:exch01_logs:write_latency

Instance read_latency write_latenc

ms ms

exch01_db 0 0

exch01_logs 0 0

exch01_db 0 0

exch01_logs 0 0

exch01_db 0 0

exch01_logs 0 0

exch01_db 0 0

exch01_logs 0 0

exch01_db 0 0

exch01_logs 0 0

It’s 8 in the morning, none of the sales team is awake yet! The column headings get a bit skewed, but we can see read latency in the first column, and write latency in the second.
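Since the column layout is a bit awkward to read by eye, here is a minimal Python sketch that groups the rows by instance and picks out the worst reading per counter. It assumes the layout shown above: a heading line, a units line, then one whitespace-separated row per instance per iteration. The sample values are made up so that there is something non-zero to find, and the function names are my own.

```python
from collections import defaultdict


def parse_stats_table(text: str):
    """Group column-format 'stats show -n/-i' output by instance and counter."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, _units, rows = lines[0], lines[1], lines[2:]
    columns = header[1:]  # drop the leading "Instance" heading
    samples = defaultdict(lambda: {col: [] for col in columns})
    for name, *values in rows:
        for col, value in zip(columns, values):
            samples[name][col].append(float(value))
    return samples


def worst_case(samples):
    """Highest reading seen for each counter of each instance."""
    return {
        name: {col: max(vals) for col, vals in cols.items()}
        for name, cols in samples.items()
    }


# Illustrative values only; headings are truncated exactly as the filer prints them.
SAMPLE = """\
Instance read_latency write_latenc
ms ms
exch01_db 0 16.00
exch01_logs 0 0
exch01_db 0 0
exch01_logs 0 0
exch01_db 0 8.00
exch01_logs 0 0
"""

print(worst_case(parse_stats_table(SAMPLE)))
```

Note the keys come straight from the (truncated) headings, so "write_latenc" is what you would look up, not "write_latency".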

One of my biggest complaints about sysstat is what happens if I want to keep it running over a period of time and log the output. Well, I can change “options autologout” and leave my laptop plugged in, but that’s never a good idea. “stats” gives you the ability to pipe all stats output directly to a file. Brilliant news!

filer01> stats show -n 5 -i 1 -o /etc/stats.txt volume:exch01_db:read_latency volume:exch01_db:write_latency volume:exch01_logs:read_latency volume:exch01_logs:write_latency

filer01> rdfile /etc/stats.txt

Instance read_latency write_latenc

ms ms

exch01_db 0 16.00

exch01_logs 0 0

exch01_db 0 0

exch01_logs 0 0

exch01_db 0 8.00

exch01_logs 0 0

exch01_db 0 0

exch01_logs 0 0

exch01_db 0 1.00

exch01_logs 0 0

Unfortunately this doesn’t free up the console, so scripting this from RSH or SSH may be the best bet, but be careful how long you run the iterations for!
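As a rough idea of what scripting this might look like, here is a minimal Python sketch that builds the command line (options first, counters last) and works out how long the run will take before anything is sent. It assumes you already have key-based SSH access to the filer and a local ssh client; the function names are hypothetical, and `run_over_ssh` is defined but deliberately not called here.

```python
import subprocess


def build_stats_argv(counters, interval_s=1, iterations=5, outfile=None):
    """Build a 'stats show' command with the options ahead of the counters."""
    argv = ["stats", "show", "-n", str(iterations), "-i", str(interval_s)]
    if outfile is not None:
        argv += ["-o", outfile]
    return argv + list(counters)


def expected_runtime_s(interval_s, iterations):
    """How long the filer will keep reporting, in seconds."""
    return interval_s * iterations


def run_over_ssh(host, argv, runtime_s, margin_s=30):
    """Run the command on the filer via the local ssh client (assumed set up)."""
    completed = subprocess.run(
        ["ssh", host, " ".join(argv)],
        capture_output=True,
        text=True,
        timeout=runtime_s + margin_s,  # give up if the session hangs
        check=True,
    )
    return completed.stdout


argv = build_stats_argv(
    ["volume:exch01_db:read_latency", "volume:exch01_db:write_latency"],
    interval_s=5,
    iterations=12,
)
print(" ".join(argv))
print("runs for", expected_runtime_s(5, 12), "seconds")
```

Calculating the runtime up front is the point: it is very easy to ask for a day's worth of iterations by accident.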

Another nice feature is that you can have some presets. So if you have 4 Exchange servers each with 3 databases, then you can load all the volume:<vol_name>:read/write_latency counters into a file and call this directly from the stats command. The preset files are XML files, so they take a little thought in the writing, but if you have seen XML before, then it’s not that tricky.

My XML file looks like this…

<?xml version = "1.0" ?>

<preset>

<object name="volume">

<instance name="exch01_db">

<counter name="read_latency">

</counter>

<counter name="write_latency">

</counter>

</instance>

<instance name="exch01_logs">

<counter name="read_latency">

</counter>

<counter name="write_latency">

</counter>

</instance>

</object>

</preset>
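Writing these by hand gets tedious once you have more than a couple of volumes, so here is a minimal Python sketch that generates a preset in the same shape as mine. It assumes Python 3.9 or later (for `ET.indent`) and that the filer is happy with the indentation it adds; the `build_preset` function is a hypothetical helper, not a NetApp tool.

```python
import xml.etree.ElementTree as ET


def build_preset(object_name, instances, counters):
    """Return preset XML text: one <instance> per name, each with every counter."""
    preset = ET.Element("preset")
    obj = ET.SubElement(preset, "object", name=object_name)
    for instance_name in instances:
        inst = ET.SubElement(obj, "instance", name=instance_name)
        for counter_name in counters:
            ET.SubElement(inst, "counter", name=counter_name)
    ET.indent(preset)  # pretty-print in place (Python 3.9+)
    # Keep explicit <counter></counter> pairs, matching the hand-written file.
    body = ET.tostring(preset, encoding="unicode", short_empty_elements=False)
    return '<?xml version = "1.0" ?>\n' + body + "\n"


xml_text = build_preset(
    "volume",
    ["exch01_db", "exch01_logs"],
    ["read_latency", "write_latency"],
)
print(xml_text)
```

Extending the instance list to cover every Exchange volume is then a one-line change rather than a lot of copy and paste.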

Once saved within /etc/stats/presets as an “.xml” file, I can call it directly from the stats command.

filer01> stats show -p exchange -i 1 -n 5

Instance read_latency write_latenc

ms ms

exch01_db 0 0

exch01_logs 0 0

exch01_db 0 0

exch01_logs 0 0

exch01_db 0 0.13

exch01_logs 0 0.12

exch01_db 0 0.00

exch01_logs 0 0.00

exch01_db 0 0

exch01_logs 0 0

The possibilities are huge for this, but this opens up something even better. We can now use “stats start” and “stats stop” to trigger this reporting and I get my console back!

filer01> stats start -p exchange

Stats identifier name is 'Ind0x6920b2f0'

 

filer01> stats show -I Ind0x6920b2f0

StatisticsID: Ind0x6920b2f0

volume:exch01_db:read_latency:0ms

volume:exch01_db:write_latency:5.14ms

volume:exch01_logs:read_latency:0ms

volume:exch01_logs:write_latency:0.00ms

 

filer01> stats stop -I Ind0x6920b2f0

StatisticsID: Ind0x6920b2f0

volume:exch01_db:read_latency:0ms

volume:exch01_db:write_latency:5.36ms

volume:exch01_logs:read_latency:0ms

volume:exch01_logs:write_latency:0.00ms
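If you script the start and stop, you need to grab that identifier from the "stats start" output. A minimal Python sketch, assuming the message wording is exactly as shown above (the helper names are my own):

```python
import re

# Skip any quote characters around the identifier, whatever style they are.
_ID_RE = re.compile(r"Stats identifier name is\W*(\w+)")


def extract_stats_id(start_output: str) -> str:
    """Pull the identifier out of the text printed by 'stats start'."""
    match = _ID_RE.search(start_output)
    if match is None:
        raise ValueError("no stats identifier found in output")
    return match.group(1)


def show_command(stats_id: str) -> str:
    return f"stats show -I {stats_id}"


def stop_command(stats_id: str) -> str:
    return f"stats stop -I {stats_id}"


stats_id = extract_stats_id("Stats identifier name is 'Ind0x6920b2f0'")
print(show_command(stats_id))
print(stop_command(stats_id))
```

Remember to actually issue the stop, otherwise the collection just sits there on the filer.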

Hopefully you are starting to realise why I like this command, and why the possibilities for using this are huge, and that it is very powerful indeed!

One final thing to add: there are a lot of counters available by default in the normal privilege mode, but try switching to advanced, or even diag, and see how many counters are available then! This is overwhelming, but with a bit of digging, very powerful.

One last thing: you can use wildcards in the “stats show” command, so to pull out all counters for my Exchange database…

filer01> stats show volume:exch01_db:*

volume:exch01_db:avg_latency:0.00ms

volume:exch01_db:total_ops:3/s

volume:exch01_db:read_data:0b/s

volume:exch01_db:read_latency:0ms

volume:exch01_db:read_ops:0/s

volume:exch01_db:write_data:12288b/s

volume:exch01_db:write_latency:0.00ms

volume:exch01_db:write_ops:3/s

volume:exch01_db:other_latency:0ms

volume:exch01_db:other_ops:0/s

Or to show all the read_latency for all my volumes…

 

filer01> stats show volume:*:read_latency

volume:vol0:read_latency:0ms

volume:exch01_db:read_latency:0ms

volume:home:read_latency:0ms

volume:backup:read_latency:0ms

volume:share:read_latency:0ms
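With a wildcard across a lot of volumes, the interesting ones are usually the handful at the top. Here is a minimal Python sketch that ranks instances by value, assuming the single-line output format shown above. The latency figures in the sample are made up for illustration, and `rank_instances` is a hypothetical name.

```python
import re

_NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def rank_instances(output: str, top: int = 3):
    """Return (instance, value) pairs, highest value first."""
    rows = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        _obj, instance, _counter, raw = line.split(":", 3)
        match = _NUMBER_RE.match(raw)
        if match is None:
            continue  # skip anything without a leading number
        rows.append((instance, float(match.group())))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


# Illustrative values only.
SAMPLE = """\
volume:vol0:read_latency:0ms
volume:exch01_db:read_latency:5.14ms
volume:home:read_latency:1.20ms
volume:backup:read_latency:0ms
"""

for instance, value in rank_instances(SAMPLE, top=2):
    print(instance, value)
```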

If you have any specific questions, or you want to query how to get specific counter information from the system, feel free to send me over a question. Hope this is useful for everyone!

  1. April 1st, 2009 at 14:06 | #1

    stats is a great command to use for collecting data for long-term trending too (if the same data is not exposed in the SNMP MIB) — for example, per volume performance data is not available via the SNMP MIB — I have a blog entry about how I collect that and provide some example graphs at http://aditya.grot.org/2009/02/netapp-ontap-per-volume-statistics.html

  2. July 9th, 2009 at 17:59 | #2

    Some great tools available for translating the output from “stats” available on the NetApp Communities – http://communities.netapp.com/docs/DOC-2092

  3. Chris M
    March 2nd, 2011 at 10:42 | #3

    great post Chris, very informative and certainly not something that’s covered in any of the ‘fundamentals’ docs – or even the technical reports I have read so far.


This site is not affiliated or sponsored in anyway by NetApp or any other company mentioned within.

© 2009-2013 Chris Kranz All Rights Reserved