peter_pclos
« on: December 13, 2011, 08:27:53 PM »
I've begun to encounter erratic behaviour with my fairly old desktop machine running a fully upgraded PCLinuxOS. On searching I find hints that this could possibly be caused by early signs of failure of the battery on the motherboard. Is this likely? The machine apparently boots reliably from a live pclos CD, and on one occasion I got it back to life by booting into the Windows option, then restarting the PC out of Windows, but this doesn't always work.
The eventual error message when the Linux boot stalls is "Kernel panic - not syncing: Fatal exception in interrupt".
Any suggestions welcome - I just hope the PC is still alive to receive them!
djohnston
« Reply #1 on: December 13, 2011, 10:32:57 PM »
It could be RAM or the boot media. There are other possibilities. What is displayed after "Fatal exception in interrupt"? That might give some clues.
Bare metal VBox AMD Athlon 7750 Dual-Core Single core 4GiB RAM 1GiB RAM nVidia GeForce FX 5200 64MB video LXDE 32bit KDE 64bit
Registered Linux User #416378
peter_pclos
« Reply #2 on: December 14, 2011, 06:59:23 AM »
In answer to your question: nothing. That's where it crashes. For what it's worth, the output immediately before the crash line was:
EIP: [<c026d323>] ioread16_rep+0x2b/0x3d SS:ESP 0068:c0507dec
CR2: 00000000fffb1000
---[ end trace ff6525aa4bcccf2c ]---
Intermittent faults are always a pain, but this morning I did a cold boot and left the GRUB list up for about 30 seconds before selecting PCLinuxOS, and it booted perfectly. Could this perhaps indicate that the BIOS was sorting itself out, but slowly?
djohnston
« Reply #3 on: December 15, 2011, 12:53:12 PM »
Quote from: peter_pclos
Intermittent faults are always a pain, but this morning I did a cold boot and left the GRUB list up for about 30 seconds before selecting PCLinuxOS, and it booted perfectly. Could this perhaps indicate that the BIOS was sorting itself out, but slowly?

I doubt it. The usual culprits are a faulty boot medium, a faulty RAM module or the power supply. Do a net search for "Kernel panic - not syncing: Fatal exception" and you'll see the possibilities. If the motherboard battery were going, you would lose the BIOS settings, such as the system clock's time and date. If you are shutting down the PC at night, I'd start by running overnight memory tests. Instead of shutting down, reboot from a PCLinuxOS live CD, select MemTest86 from the boot menu, and let it run. That would start a process of elimination. And you're right: intermittent hardware problems can be a pain to track down.
pags
« Reply #4 on: December 15, 2011, 03:40:00 PM »
Runs OK from cold, but dies randomly after it has been running for a while? Is this a reasonable summary? It could be cards or chips (RAM) that are dirty or not seated properly; heat expansion may mitigate the issue (hence, OK when cold). Disassemble completely, clean all contacts (isopropyl alcohol: LET IT DRY!, or a "red" pencil eraser: MAKE SURE IT'S CLEAN AFTER!) and re-seat all cards, chips, ribbon/cable connectors, etc. No guarantees, but it's a starting point.
peter_pclos
« Reply #5 on: December 15, 2011, 08:01:41 PM »
Thanks for the various recent replies. I've managed to get a bit more diagnostic information which precedes the error messages I reported earlier. It was copied manually from the monitor from a recent boot crash, but I think it's accurate:
[<c0104b55>] ? do_IRQ+0x46/0x9a
[<c0103bb0>] ? common_interrupt+0x30/0x38
[<c019136b>] ? page_cache_get_speculative+0xc/0x30
[<c0191cad>] ? find_get_page+0x53/0x7a
[<c0192475>] ? filemap_fault+0x7b/0x343
[<c01a71c3>] ? __do_fault+0x41/0x2f6
[<c01a7765>] ? handle_mm_fault+0x2ed/0x664
[<c03a036d>] ? do_page_fault+0x2b2/0x2c8
[<c03a00bb>] ? do_page_fault+0x0/0x2c8
[<c039e7e3>] ? errorcode+0x73/0x78
I'm not by any stretch of the imagination a systems guru, so if anyone can see anything here which might indicate the source of the intermittent booting problem, I'd be grateful to hear from them.
To clarify what's happening, the symptoms are that it is difficult to achieve the initial boot, but after success (usually involving a delay in selecting the GRUB menu option) the PC is perfectly stable (over several hours) and behaves completely normally.
Since starting this thread, I've become aware of discussion of a similar problem under the thread "Boot process still erratic" on the PCLinuxOS forum at http://www.pclinuxos.com/forum/index.php?topic=94269.0 and tomorrow will try texstar's suggestions there to see where I get to, but don't let that stop anybody with good ideas from replying to this post!
AS
Global Moderator
Hero Member
   
Posts: 4082
Have a nice ... night!
« Reply #6 on: December 15, 2011, 09:06:53 PM »
Quote from: peter_pclos
Thanks for the various recent replies. I've managed to get a bit more diagnostic information which precedes the error messages I reported earlier. It was copied manually from the monitor from a recent boot crash, but I think it's accurate:
[<c0104b55>] ? do_IRQ+0x46/0x9a
[<c0103bb0>] ? common_interrupt+0x30/0x38
[<c019136b>] ? page_cache_get_speculative+0xc/0x30
[<c0191cad>] ? find_get_page+0x53/0x7a
[<c0192475>] ? filemap_fault+0x7b/0x343
[<c01a71c3>] ? __do_fault+0x41/0x2f6
[<c01a7765>] ? handle_mm_fault+0x2ed/0x664
[<c03a036d>] ? do_page_fault+0x2b2/0x2c8
[<c03a00bb>] ? do_page_fault+0x0/0x2c8
[<c039e7e3>] ? errorcode+0x73/0x78
I'm not by any stretch of the imagination a systems guru, so if anyone can see anything here which might indicate the source of the intermittent booting problem, I'd be grateful to hear from them.

This is part of a "backtrace": roughly, the sequence of calls some process had made at the time of the crash. It is meant to help the people who developed the code recognize the exact flow that led to the crash. Unfortunately it lacks the first part, which usually states the name of the module or process that crashed. Even if that were present, we are not the developers of the code (most probably some kernel module), so don't worry much about it. The most useful info we can obtain here is the name of the process or module that crashed, just to have a hint about what went wrong. It would also be interesting to know whether the crash always happens the same way, e.g. always when plymouth starts, or when the X server starts.

Quote from: peter_pclos
To clarify what's happening, the symptoms are that it is difficult to achieve the initial boot, but after success (usually involving a delay in selecting the GRUB menu option) the PC is perfectly stable (over several hours) and behaves completely normally.
Since starting this thread, I've become aware of discussion of a similar problem under the thread "Boot process still erratic" on the PCLinuxOS forum at http://www.pclinuxos.com/forum/index.php?topic=94269.0 and tomorrow will try texstar's suggestions there to see where I get to, but don't let that stop anybody with good ideas from replying to this post!

Additionally, there have been reports in the past of crashes (even intermittent ones like yours) with some combinations of hardware and software, e.g. older video cards, older CPUs, older kernels. You state that your system is fully updated, but you don't mention the kernel, which doesn't update automatically. What's your kernel? (Run uname -r from a terminal.)
AS
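For anyone who wants to read such a trace, the function names can be pulled out mechanically. What follows is only a minimal sketch, assuming a shell with a grep that supports -o and -E (GNU grep does) and a trace written in the "[<address>] ? symbol+0xoffset/0xsize" form quoted in this thread. The name trace_symbols is made up for illustration; it is not a standard tool.

```shell
# trace_symbols: hypothetical helper. Reads backtrace text on stdin and
# prints one function name per line, in the order the frames appear.
trace_symbols() {
    # Each frame looks like "symbol+0xOFFSET/0xSIZE"; keep only "symbol".
    grep -oE '[A-Za-z_][A-Za-z0-9_.]*\+0x[0-9a-f]+/0x[0-9a-f]+' | sed 's/+.*//'
}

# Example: two frames from the trace quoted in this thread.
printf '%s\n' '[<c0104b55>] ? do_IRQ+0x46/0x9a [<c0103bb0>] ? common_interrupt+0x30/0x38' | trace_symbols
```

The example prints do_IRQ and common_interrupt on separate lines. The names only show which kernel code paths were active (here interrupt handling and page-fault handling); they do not by themselves identify the faulty component.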
peter_pclos
« Reply #7 on: December 16, 2011, 01:16:10 PM »
In answer to AS, the kernel is:
2.6.32.24-pclos1.bfs
and I'm running the latest KDE4 from the pclos repository.
Other facts about the hardware:
Motherboard: Abit KT7A (non-RAID)
Processor: Athlon Thunderbird 1.4 GHz
RAM: 1 GB Crucial DIMM (2 x 0.5 GB units)
Graphics card: not sure of the make, but it is AGP (4 MB, I think)
All the above have been performing well for some years; do you reckon this spec. is still adequate?
Unfortunately I haven't had time yet to carry out the tips from the other thread.
djohnston
« Reply #8 on: December 16, 2011, 01:46:23 PM »
Quote from: peter_pclos
In answer to AS, the kernel is:
2.6.32.24-pclos1.bfs

I would have to say, to begin with, update to a newer kernel. That one's fairly old and will definitely cause problems with package updates in the long run. Open Synaptic and mark the 2.6.38.8 bfs kernel for installation. Apply the changes, then reboot immediately after closing Synaptic. The default GRUB option (the one highlighted) will boot the new kernel. The first boot will take a little longer, due to the building of kernel dkms modules. If you don't boot in verbose mode, just press the Esc key when prompted on screen in order to see the boot process. The dkms module builds only have to be done once, but should not be interrupted while in progress. Leave the old kernel installed until you are sure you're satisfied with the new one.
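To confirm from a terminal which kernel actually booted, the running version can be compared with the target release. This is just a sketch, assuming a sort that supports -V (version sort, as in GNU coreutils); kernel_older_than is a hypothetical helper name, and stripping everything after the first "-" is a simplification that ignores suffixes such as "-pclos1.bfs".

```shell
# kernel_older_than: hypothetical helper. Succeeds (exit 0) when version $1
# sorts strictly before version $2. Any "-suffix" (e.g. "-pclos1.bfs") is
# stripped first so that only the numeric part is compared.
kernel_older_than() {
    cur=${1%%-*}
    want=${2%%-*}
    [ "$cur" != "$want" ] &&
        [ "$(printf '%s\n%s\n' "$cur" "$want" | sort -V | head -n 1)" = "$cur" ]
}

# Compare the running kernel with the 2.6.38.8 release mentioned in this post.
if kernel_older_than "$(uname -r)" "2.6.38.8"; then
    echo "Running kernel $(uname -r) is older than 2.6.38.8"
fi
```

With the kernel reported earlier, kernel_older_than 2.6.32.24-pclos1.bfs 2.6.38.8 succeeds, so the message would be printed until the new kernel has been booted.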
DeBaas
« Reply #9 on: December 16, 2011, 03:02:31 PM »
Just check your backup battery (life span 2-5 years). On the first cold boot, open the BIOS setup screen and check the time. If it is erratic, change the battery, usually a CR2032. After a first boot, or after booting Windows, the time is corrected and the next boot is OK. Happened to me before.
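If getting into the BIOS screen is awkward, a rough check is also possible from a running Linux system. The sketch below assumes the kernel exposes the hardware clock at /sys/class/rtc/rtc0/since_epoch (common on Linux, but not guaranteed on every machine), and it uses an arbitrary cutoff date: a clock that has fallen back to a date years in the past is the typical sign of a flat battery. The name rtc_looks_reset is made up for illustration.

```shell
# rtc_looks_reset: hypothetical helper. Succeeds when the clock value $1
# (seconds since 1970) is earlier than the cutoff $2, which suggests the
# clock has fallen back to a default date.
rtc_looks_reset() {
    [ "$1" -lt "$2" ]
}

cutoff=1293840000                          # 2011-01-01 00:00:00 UTC, arbitrary cutoff
rtc_file=/sys/class/rtc/rtc0/since_epoch   # assumed location of the RTC value
if [ -r "$rtc_file" ] && rtc_looks_reset "$(cat "$rtc_file")" "$cutoff"; then
    echo "Hardware clock is set before 2011: the battery may be failing"
fi
```

This is best run right after a cold boot, before anything has corrected the clock. A plausible date does not prove the battery is healthy; it only means the clock has not been reset.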
peter_pclos
« Reply #10 on: December 19, 2011, 06:33:51 PM »
Thanks to all who have contributed to this thread. I am cautiously optimistic that Texstar's advice in the "Boot process still erratic" thread has worked completely, as after following his instructions I have had four days of problem-free boots. I suspect, from the circumstances behind that thread, and djohnston's comments here about kernels, that the problem may well have been down to running Synaptic updates on an old kernel.
I'll leave it for a couple of days yet before marking this thread as 'solved', in case my optimism turns out to have been misplaced!
djohnston
« Reply #11 on: December 19, 2011, 10:39:36 PM »
Quote from: peter_pclos
I am cautiously optimistic that Texstar's advice in the "Boot process still erratic" thread has worked completely, as after following his instructions I have had four days of problem-free boots. I suspect, from the circumstances behind that thread, and djohnston's comments here about kernels, that the problem may well have been down to running Synaptic updates on an old kernel.

Well, Texstar's advice referred to running a filesystem check and turning off speedboot. I'm guessing that's what you've done. Filesystem errors can certainly cause more errors than an older kernel. In any case, you did some of your own research and problem solving. Nice going!
peter_pclos
« Reply #12 on: December 20, 2011, 06:47:13 AM »
Thanks, djohnston. But, what caused the filesystem error? My own experience and the existence of the other thread suggest that it's update related.
BTW, is fsck machine-specific? i.e. if I fsck a bootable USB SSD, which I use as an external PCLinuxOS OpenBox platform driving a Linutop2, on my main desktop machine, will this cause compatibility problems when I re-attach the SSD to the Linutop? I ask because my Linutop has also started exhibiting serious boot problems, and I'm not sure I can get PCLinuxOS/OpenBox up to fsck the SSD in situ.
The desktop machine still boots! If it is still behaving properly tomorrow, I'll mark this thread as solved.
AS
« Reply #13 on: December 20, 2011, 08:11:19 AM »
Quote from: peter_pclos
Thanks, djohnston. But, what caused the filesystem error? My own experience and the existence of the other thread suggest that it's update related.

The filesystem inconsistency was very probably due to the unclean shutdown that followed the kernel panic. When a kernel panic occurs, some data (and/or metadata) may still be in cache or buffers (RAM) and will never be written to disk.

Quote from: peter_pclos
BTW, is fsck machine-specific? i.e. if I fsck a bootable USB SSD, which I use as an external PCLinuxOS OpenBox platform driving a Linutop2, on my main desktop machine, will this cause compatibility problems when I re-attach the SSD to the Linutop? I ask because my Linutop has also started exhibiting serious boot problems, and I'm not sure I can get PCLinuxOS/OpenBox up to fsck the SSD in situ.

No. fsck automatically detects the filesystem type and works accordingly, i.e. it is able to clean up and fix ext2/ext3/ext4 and others. If a filesystem type is not supported, fsck will tell you so and will not perform any operation. You can safely use the PCLinuxOS fsck to check the Linutop filesystem; you only need to make sure you run fsck on unmounted partitions.

Quote from: peter_pclos
The desktop machine still boots! If it is still behaving properly tomorrow, I'll mark this thread as solved.

Good!
AS
djohnston
« Reply #14 on: December 20, 2011, 11:44:31 AM »
Quote from: AS
No. fsck automatically detects the filesystem type and works accordingly, i.e. it is able to clean up and fix ext2/ext3/ext4 and others. If a filesystem type is not supported, fsck will tell you so and will not perform any operation.
You can safely use the PCLinuxOS fsck to check the Linutop filesystem; you only need to make sure you run fsck on unmounted partitions.

To add to what AS said, the only filesystem I've encountered so far that the fsck -f command doesn't perform any operations on is xfs. Even then, fsck reported which xfs utility to use. Also, heed AS's advice and never run a filesystem check on a mounted partition. That's asking for trouble.
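One way to guard against that mistake is to look the device up in the mount table before checking it. The following is a minimal sketch, not a complete safety net: is_mounted is a hypothetical helper, /dev/sdb1 is a placeholder device name, and the test only matches the exact name in the first column of /proc/mounts, so a partition mounted under another name (for example /dev/root or a device-mapper path) would be missed.

```shell
# is_mounted: hypothetical helper. Succeeds when device $1 appears in the
# first column of a mount table ($2, defaulting to /proc/mounts).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# /dev/sdb1 is a placeholder device name; substitute the real partition.
target=/dev/sdb1
if is_mounted "$target"; then
    echo "$target is mounted: unmount it before running fsck"
else
    echo "$target is not listed as mounted: fsck -f $target could be run as root"
fi
```

The sketch only prints a message and never starts fsck itself, so the final decision stays with the person at the keyboard.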