[SLUG] Linux mail server crashing

From: Doug Koobs (dkoobs@dkoobs.com)
Date: Sun Nov 09 2003 - 09:23:21 EST


Hello everyone,

Occasionally, my RH9 mail server crashes, and on the console (and in
/var/log/messages) the following messages are displayed over and over:

Nov 9 00:07:28 mail kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000018
Nov 9 00:07:28 mail kernel: printing eip:
Nov 9 00:07:28 mail kernel: c023f114
Nov 9 00:07:28 mail kernel: *pde = 00000000
Nov 9 00:07:28 mail kernel: Oops: 0000
Nov 9 00:07:28 mail kernel: e100 ipt_REJECT iptable_filter ip_tables st keybdev mousedev hid input usb-uhci usbcore ext3 jbd aic7xxx sd_mod scsi_mod
Nov 9 00:07:28 mail kernel: CPU: 0
Nov 9 00:07:28 mail kernel: EIP: 0060:[<c023f114>] Not tainted
Nov 9 00:07:28 mail kernel: EFLAGS: 00010246
Nov 9 00:07:28 mail kernel:
Nov 9 00:07:28 mail kernel: EIP is at unix_stream_connect [kernel] 0x2b4 (2.4.20-20.9)
Nov 9 00:07:28 mail kernel: eax: 00000000 ebx: 00000000 ecx: de288598 edx: c25a34c0
Nov 9 00:07:28 mail kernel: esi: d8cc4080 edi: dde2e580 ebp: 00000018 esp: cbf51ea8
Nov 9 00:07:28 mail kernel: ds: 0068 es: 0068 ss: 0068
Nov 9 00:07:28 mail kernel: Process flush (pid: 23167, stackpage=cbf51000)
Nov 9 00:07:28 mail kernel: Stack: cbf51ef4 00000018 00000001 0000006e cbf51ec4 7fffffff d8cc4580 fffffffe
Nov 9 00:07:28 mail kernel: 0000006e cb338d94 cbf51ef4 0000006e bfffeaa8 c01f0aed cb338d94 cbf51ef4
Nov 9 00:07:28 mail kernel: 0000006e 00000002 00000000 762f0001 722f7261 2e2f6e75 6463736e 636f735f
Nov 9 00:07:28 mail kernel: Call Trace: [<c01f0aed>] sys_connect [kernel] 0x7d (0xcbf51edc))
Nov 9 00:07:28 mail kernel: [<c01f06bd>] sys_socket [kernel] 0x3d (0xcbf51f64))
Nov 9 00:07:28 mail kernel: [<c01f15e5>] sys_socketcall [kernel] 0xb5 (0xcbf51f80))
Nov 9 00:07:28 mail kernel: [<c0131688>] sys_brk [kernel] 0x108 (0xcbf51f98))
Nov 9 00:07:28 mail kernel: [<c01173a0>] do_page_fault [kernel] 0x0 (0xcbf51fb0))
Nov 9 00:07:28 mail kernel: [<c0109630>] error_code [kernel] 0x34 (0xcbf51fb8))
Nov 9 00:07:28 mail kernel: [<c010953f>] system_call [kernel] 0x33 (0xcbf51fc0))
Nov 9 00:07:28 mail kernel:
Nov 9 00:07:28 mail kernel:
Nov 9 00:07:28 mail kernel: Code: c9 3c 24 e8 e4 45 fb ff 8b 44 24 18 85 c0 74 14 c7 44 24 04
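
For what it's worth, the last stack row can be read back as raw memory. Below is a minimal Python sketch, assuming x86 little-endian word order and assuming the words start a struct sockaddr_un (2-byte family, then the path). The helper names words_to_bytes and decode_sockaddr_un are hypothetical, for illustration only; they are not part of any existing tool.

```python
import struct


# Hypothetical helper: turn 32-bit stack words from an x86 oops back into
# the bytes they occupied in memory. x86 is little-endian, so each word is
# packed low byte first ("<I").
def words_to_bytes(words):
    return b"".join(struct.pack("<I", w) for w in words)


# Hypothetical helper: read the start of a struct sockaddr_un from those bytes.
# Assumed layout: 2-byte sa_family (AF_UNIX == 1 on Linux), then the path,
# which may be cut off where the stack dump ends.
def decode_sockaddr_un(words):
    raw = words_to_bytes(words)
    (family,) = struct.unpack_from("<H", raw, 0)
    path = raw[2:].split(b"\0", 1)[0].decode("ascii")
    return family, path


# The last five words of the third "Stack:" row in the oops above.
stack_tail = [0x762f0001, 0x722f7261, 0x2e2f6e75, 0x6463736e, 0x636f735f]

family, path = decode_sockaddr_un(stack_tail)
print(family, path)  # prints: 1 /var/run/.nscd_soc
```

This decodes to family 1 (AF_UNIX) and the truncated path /var/run/.nscd_soc, which looks like the start of a Unix socket path whose name suggests nscd. That fits an oops inside unix_stream_connect reached from sys_connect, and the repeated 0000006e (decimal 110) matches sizeof(struct sockaddr_un) on Linux, plausibly the address length passed to connect(). On its own, this does not tell whether bad RAM or a kernel bug caused the NULL dereference.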

I ran memtest, and every test failed with thousands of errors. I have
two 256 MB modules of 133 MHz memory. I removed one stick and ran a few
of the tests with no errors. I then ran some tests with only the other
stick, and again got no errors. I put them both back in, using the same
slots, and ran the tests again... Still no errors. What gives? Maybe one
stick wasn't fully seated originally? Maybe it's an overheating problem,
and everything cooled down with the case open? Do the errors in
/var/log/messages even indicate a memory problem?

Right now, the mail server is up and running. I'm going to let it run
for a few hours, get up to "operational temperature", and then run
memtest again without opening the case. Any pointers are greatly
appreciated!

Doug

-----------------------------------------------------------------------
This list is provided as an unmoderated internet service by Networked
Knowledge Systems (NKS). Views and opinions expressed in messages
posted are those of the author and do not necessarily reflect the
official policy or position of NKS or any of its employees.