Re: [SLUG] Weird application "hang" or slowdown - not sure which

From: Ian C. Blenke (ian@blenke.com)
Date: Wed Feb 21 2007 - 16:11:37 EST


Bill Glidden wrote:
>
> I have a POS application which runs on several linux machines all on a
> network. Everything has been good until this past weekend.
>
> This system is in a building that has an internal network on
> 192.168.0. The POS application was using this network for its
> connectivity to the database server. This past weekend, I put these 6
> terminals and the database server on their own subnet which is
> 192.168.10, in anticipation of adding another 12 terminals. I thought
> this might be good for performance reasons. The switch that these
> terminals and server are connected to is connected to a linksys router
> which is in turn connected to the 192.168.0 network. Pretty much,
> everything works as expected.
>
>
>
> The other change I made, at the suggestion of my friend the Unix guru,
> was to add “bg,rw,soft” to the NFS mount entry on each POS terminal
> (they mount a file-system on the database server from which they
> read/write some data to/from regular files).
>

The "bg" means "if the first mount attempt fails, keep retrying it in
the background instead of blocking"
The "rw" means "mount this as read/write"
The "soft" means "give up on a file operation if it takes too long to
respond"
    The alternative to "soft" is "hard" which means "block forever and
never give up on any file operation, until it completes successfully"

You will find that "hard" mounts do work around some intermittent
network outages by blocking and then resuming once the network returns
and the file operation succeeds, but they tend to go "stale" now and
again and need some manual flogging to work again.
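
To confirm which of these options a terminal is actually running with,
you can read them back from /proc/mounts. A minimal Python sketch,
assuming the usual six-field /proc/mounts layout; the server name,
export and mount point in the sample are hypothetical placeholders:

```python
def nfs_mounts(mounts_text):
    """Return (mountpoint, options) pairs for NFS entries in /proc/mounts-style text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        found.append((fields[1], fields[3].split(",")))
    return found


def describe(options):
    """Summarise the soft/hard and transport choices seen in an option list."""
    mode = "soft" if "soft" in options else "hard"  # "hard" is the default
    proto = "unknown"
    for opt in options:
        if opt in ("tcp", "udp"):
            proto = opt
        elif opt.startswith("proto="):
            proto = opt.split("=", 1)[1]
    return mode, proto


# Hypothetical sample; on a real terminal, pass open("/proc/mounts").read() instead.
SAMPLE = (
    "/dev/sda1 / ext3 rw 0 0\n"
    "dbserver:/export/pos /mnt/pos nfs rw,soft,proto=udp,vers=2,addr=192.168.10.1 0 0\n"
)

for mountpoint, options in nfs_mounts(SAMPLE):
    print(mountpoint, describe(options))
```

How the transport is spelled in the option list varies between kernel
versions, which is why the sketch accepts both "tcp"/"udp" and
"proto=...".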
 
>
> But, ever since these changes, the POS application will sometimes
> temporarily freeze up. After 1-2 minutes, it continues on as if
> nothing wrong had happened. It looks to me like this occurs whenever
> the application is accessing either the NFS-mounted disk or the
> database (which, of course, uses the network).
>

That 1-2 minutes sounds like a soft mount "giving up", at which point
the file operation errors out in your application.

This is most likely a symptom of your network losing connectivity or
some other transient connectivity problem (your switch/router is
rebooting, etc).

Try going back to a "hard" mount and see if the errors stop, or if the
entire application seems to hang for long stretches of time until the
network returns.
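
One way to tell the two cases apart without a debugger is to time a
file operation against the NFS mount and record whether it failed or
merely stalled. A rough Python sketch: the 5-second "slow" threshold is
an arbitrary assumption, and while a soft-mount timeout typically
surfaces as an I/O error such as EIO, the exact errno you see is not
something I would rely on:

```python
import errno
import os
import tempfile
import time


def timed_read(path, slow_after=5.0):
    """Read a file, returning (seconds_taken, outcome).

    outcome is "ok", "slow" (took longer than slow_after seconds),
    or "error: <errno name>" if the operating system reported a failure.
    """
    start = time.monotonic()
    try:
        with open(path, "rb") as handle:
            handle.read()
        outcome = "ok"
    except OSError as exc:
        outcome = "error: " + errno.errorcode.get(exc.errno, str(exc.errno))
    elapsed = time.monotonic() - start
    if outcome == "ok" and elapsed > slow_after:
        outcome = "slow"
    return elapsed, outcome


# Demonstration on local files only; on a POS terminal, point this at a
# file under the NFS mount (the path is site-specific, so none is assumed).
with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"probe")
    tmp.flush()
    print(timed_read(tmp.name)[1])
print(timed_read(os.path.join(tempfile.gettempdir(), "no-such-probe-file"))[1])
```

Run from cron or a loop, a soft mount should show up as "error: ..."
entries and a hard mount as "slow" entries with long elapsed times.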

Also, consider using a TCP NFS v3 mount instead of a UDP NFS v2 mount.
Using "tcp" instead of the default "udp" will let layer 4 (TCP) handle
retransmission and hide short network outages from your RPC-based NFS
handshaking. NFS _does_ retry things with UDP, but if your network is
problematic, a TCP transport _may_ help.
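
For illustration, a small Python sketch that formats the kind of
/etc/fstab line this would mean. The server name, export path and mount
point are hypothetical placeholders, and the option list is only one
plausible combination ("tcp" and "nfsvers=3" are standard NFS mount
options), so check it against your own entry before using it:

```python
def fstab_entry(server, export, mountpoint, options):
    """Format one /etc/fstab line for an NFS mount."""
    return "{}:{}  {}  nfs  {}  0  0".format(
        server, export, mountpoint, ",".join(options)
    )


# "dbserver", "/export/pos" and "/mnt/pos" are hypothetical placeholders.
print(fstab_entry("dbserver", "/export/pos", "/mnt/pos",
                  ["rw", "bg", "hard", "tcp", "nfsvers=3"]))
```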

This, and much more, can be found on the "official" (or at least most
authoritative) Linux NFS site:

    http://nfs.sourceforge.net/

> I’m at a bit of a loss to understand why my changes would have this
> effect. I’m going to go to the customer tonight and attach a debugger
> to one of the application processes when it freezes up to see if I can
> narrow down a little better where it hangs up. But even if it’s hung
> up in some call related to the network, I still don’t know why doing
> what I did would have this effect.
>

Check your system logs to see if there are any network interface up/down
events.
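
A quick way to pull those events out of a large log is a pattern match
on "link ... up/down". This Python sketch assumes that wording; the
actual kernel message format depends on the NIC driver, and the sample
lines below are made up for illustration:

```python
import re

# Matches e.g. "link down", "Link is Up"; driver wording varies, so adjust as needed.
LINK_EVENT = re.compile(r"\blink\b.*\b(up|down)\b", re.IGNORECASE)


def link_events(lines):
    """Return the log lines that look like interface link up/down events."""
    return [line.rstrip("\n") for line in lines if LINK_EVENT.search(line)]


# Hypothetical sample; on a real system, pass an open log file instead,
# for example open("/var/log/messages") where that file exists.
SAMPLE_LOG = [
    "Feb 21 15:02:11 pos1 kernel: eth0: link down\n",
    "Feb 21 15:02:14 pos1 kernel: eth0: link up, 100Mbps, full-duplex\n",
    "Feb 21 15:03:00 pos1 sshd[812]: Accepted password for bill\n",
]

for event in link_events(SAMPLE_LOG):
    print(event)
```

Comparing the timestamps of these events with the times the POS
application froze should show quickly whether the two line up.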

Check your network interfaces to see if there are any persistent
interface errors.
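
On Linux the per-interface counters are in /proc/net/dev (ifconfig
shows the same numbers). A minimal Python sketch that extracts the
error-related columns, assuming the standard 16-column layout; the
sample figures are invented:

```python
# Column positions after the "iface:" prefix in /proc/net/dev.
COUNTERS = {"rx_errs": 2, "rx_drop": 3, "rx_frame": 5,
            "tx_errs": 10, "tx_drop": 11, "tx_colls": 13, "tx_carrier": 14}


def interface_errors(net_dev_text):
    """Map interface name to its error counters from /proc/net/dev-style text."""
    result = {}
    for line in net_dev_text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) < 16:
            continue
        result[name.strip()] = {key: int(fields[i]) for key, i in COUNTERS.items()}
    return result


# Invented sample; on a real terminal, pass open("/proc/net/dev").read() instead.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:  104832    1248    0    0    0     0          0         0   104832    1248    0    0    0     0       0          0
  eth0: 9238471   73512   42    3    0    17          0         0  5127733   60211    5    0    0    12       2          0
"""

for name, counts in interface_errors(SAMPLE).items():
    if any(counts.values()):
        print(name, counts)
```

Counters that keep climbing between two runs (frame errors, collisions,
carrier losses) point at cabling, duplex mismatch or a flaky port.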

Check your linksys device uptime to see if it is rebooting.

If you can configure your linksys network devices to syslog to a central
collector, you might catch some errors that way.

Consider running a network monitoring tool of some kind to verify
connectivity through your routed LAN segment and catch any spurious
networking issues.
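
If you have nothing better to hand, even a crude probe loop will do. A
Python sketch, assuming the database server accepts TCP connections on
the NFS port (2049); the function names, interval and round count are
arbitrary choices, not part of any existing tool:

```python
import socket
import time


def probe(host, port=2049, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch(host, port=2049, interval=5.0, rounds=3, log=print):
    """Probe repeatedly, logging each change between reachable and unreachable."""
    last = None
    changes = []
    for _ in range(rounds):
        state = probe(host, port)
        if state != last:
            log("{} {}:{} {}".format(
                time.strftime("%H:%M:%S"), host, port,
                "reachable" if state else "UNREACHABLE"))
            changes.append(state)
            last = state
        time.sleep(interval)
    return changes
```

Calling something like watch("dbserver", rounds=10000) from a terminal
on the 192.168.10 segment ("dbserver" being a placeholder for your
server) and again from the 192.168.0 side would show whether outages
are confined to the path through the router.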

In general, try and rule out a network layer 1/2 problem.

 - Ian C. Blenke <ian@blenke.com> http://ian.blenke.com/



