Friday, November 19, 2004

Great Thread on Network Performance Troubleshooting

Now that FreeBSD 5.3 has arrived, users are trying to determine if any performance issues are caused by their hardware, OS, or applications. There's a great freebsd-stable thread discussing a user's attempt to improve NFS performance. One of Robert Watson's comments is especially useful, since he spells out five steps to troubleshoot network performance:

"I think the first thing you want to do is to try and determine whether the problem is a link layer problem, network layer problem, or application (file sharing) layer problem. Here's where I'd start looking:

(1) I'd first off check that there wasn't a serious interrupt problem on the box, which is often triggered by ACPI problems. Get the box to be as idle as possible, and then use vmstat -i or systat -vmstat to see if anything is spewing interrupts.

(2) Confirm that your hardware is capable of the desired rates: typically this involves looking at whether you have a decent card (most if_em cards are decent), whether it's 32-bit or 64-bit PCI, and so on. For unidirectional send on 32-bit PCI, be aware that it is not possible to achieve gigabit performance because the PCI bus isn't fast enough, for example.

(3) Next, I'd use a tool like netperf (see ports collection) to establish three characteristics: round trip latency from user space to user space (UDP_RR), TCP throughput (TCP_STREAM), and large packet throughput (UDP_STREAM). With decent boxes on 5.3, you should have no trouble at all maxing out a single gig-e with if_em, assuming all is working well hardware wise and there's no software problem specific to your configuration.

(4) Note that router latency (and even switch latency) can have a substantial impact on gigabit performance, even with no packet loss, in part due to stuff like ethernet flow control. You may want to put the two boxes back-to-back for testing purposes.

(5) Next, I'd measure CPU consumption on the end box -- in particular, use top -S and systat -vmstat 1 to compare the idle condition of the system and the system under load.

If you determine there is a link layer or IP layer problem, we can start digging into things like the error statistics in the card, negotiation issues, etc. If not, you want to move up the stack to try and characterize where it is you're hitting the performance issue."
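To put rough numbers on the PCI point in step (2), here is a back-of-the-envelope sketch in Python. The peak figure for a parallel PCI bus is simply bus width times clock rate. The 70% efficiency factor is my own placeholder assumption, not a measured value and not something from the thread; real numbers depend on the chipset, burst sizes, and whatever else shares the bus.

```python
# Rough PCI bandwidth check against gigabit Ethernet line rate.
# Peak rate of a parallel PCI bus = bus width (bits) * clock (MHz), in Mbit/s.

GIGABIT_MBPS = 1000.0      # one direction of gigabit Ethernet, in Mbit/s
ASSUMED_EFFICIENCY = 0.7   # hypothetical: fraction of peak left after bus overhead


def pci_peak_mbps(width_bits, clock_mhz):
    """Theoretical peak transfer rate of the bus in Mbit/s."""
    return width_bits * clock_mhz


def usable_mbps(width_bits, clock_mhz, efficiency=ASSUMED_EFFICIENCY):
    """Peak rate scaled by an assumed efficiency factor."""
    return pci_peak_mbps(width_bits, clock_mhz) * efficiency


def can_fill_gigabit(width_bits, clock_mhz, efficiency=ASSUMED_EFFICIENCY):
    """True if the estimated usable rate meets or exceeds gigabit line rate."""
    return usable_mbps(width_bits, clock_mhz, efficiency) >= GIGABIT_MBPS


if __name__ == "__main__":
    buses = [
        ("32-bit / 33 MHz PCI", 32, 33.33),
        ("64-bit / 66 MHz PCI", 64, 66.66),
    ]
    for name, width, clock in buses:
        peak = pci_peak_mbps(width, clock)
        usable = usable_mbps(width, clock)
        verdict = "enough" if can_fill_gigabit(width, clock) else "not enough"
        print(f"{name}: peak {peak:.0f} Mbit/s, "
              f"~{usable:.0f} Mbit/s usable (assumed), {verdict} for gig-e")
```

On paper, 32-bit/33 MHz PCI peaks only slightly above 1000 Mbit/s, so any realistic overhead pushes it below gigabit line rate, which is consistent with the warning in step (2). Change ASSUMED_EFFICIENCY to see how sensitive the conclusion is to that guess.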

Richard Blum's book Network Performance Open Source Toolkit, which I read and reviewed last year, also gives good tips.
