5. Kernel Crash Dump
5.1. Introduction
A Kernel Crash Dump refers to a portion of the contents of volatile memory (RAM) that is copied to disk whenever the execution of the kernel is disrupted. The following events can cause a kernel disruption:
• Kernel Panic
• Non Maskable Interrupts (NMI)
• Machine Check Exceptions (MCE)
• Hardware failure
• Manual intervention
For some of those events (panic, NMI) the kernel will react automatically and trigger the crash dump mechanism through kexec. In other situations a manual intervention is required in order to capture the memory. Whenever one of the above events occurs, it is important to find out the root cause in order to prevent it from happening again. The cause can be determined by inspecting the copied memory contents.
5.2. Kernel Crash Dump Mechanism
When a kernel panic occurs, the kernel relies on the kexec mechanism to quickly boot a new instance of the kernel in a pre-reserved section of memory that was allocated when the system booted (see below). This permits the existing memory area to remain untouched in order to safely copy its contents to storage.
5.3. Installation
The kernel crash dump utility is installed with the following command:
sudo apt-get install linux-crashdump
A reboot is then needed.
5.4. Configuration
No further configuration is required in order to have the kernel dump mechanism enabled.
5.5. Verification
To confirm that the kernel dump mechanism is enabled, there are a few things to verify. First, confirm that the crashkernel boot parameter is present (note: the following line has been split into two to fit the format of this document):
cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.2.0-17-server root=/dev/mapper/PreciseS-root ro
 crashkernel=384M-2G:64M,2G-:128M
The crashkernel parameter has the following syntax:
crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
range=start-[end]
'start' is inclusive and 'end' is exclusive.
So for the crashkernel parameter found in /proc/cmdline we would have:
crashkernel=384M-2G:64M,2G-:128M
The above value means:
• if the RAM is smaller than 384M, then do not reserve anything (this is the "rescue" case)
• if the RAM size is between 384M and 2G (exclusive), then reserve 64M
• if the RAM size is 2G or larger, then reserve 128M
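The range rules above can be sketched as a small shell function. This is only an illustration of the documented syntax, not the kernel's own parser: the function names to_mib and crashkernel_reserve are hypothetical, only the K, M and G suffixes are handled, and the RAM size is passed in MiB as an assumption of this sketch.

```shell
#!/bin/bash
# Convert a size such as 384M or 2G to MiB.
# Assumption of this sketch: only the K, M and G suffixes are handled.
to_mib() {
    local v=$1
    case "$v" in
        *G) echo $(( ${v%G} * 1024 )) ;;
        *M) echo $(( ${v%M} )) ;;
        *K) echo $(( ${v%K} / 1024 )) ;;
        *)  echo $(( v / 1024 / 1024 )) ;;   # plain byte count
    esac
}

# Print the size selected by a range-based crashkernel value for a given
# amount of RAM (in MiB), or 0 when no range matches.
crashkernel_reserve() {
    local spec=${1%%@*}    # ignore an optional @offset
    local ram_mib=$2
    local entry range size start end
    local IFS=,            # entries are separated by commas
    for entry in $spec; do
        range=${entry%%:*}
        size=${entry#*:}
        start=$(to_mib "${range%%-*}")
        end=${range#*-}    # empty when the range is open-ended
        # start is inclusive, end is exclusive
        if [ "$ram_mib" -ge "$start" ] &&
           { [ -z "$end" ] || [ "$ram_mib" -lt "$(to_mib "$end")" ]; }; then
            echo "$size"
            return 0
        fi
    done
    echo 0
}

# Example: a machine with 1023 MiB of RAM falls in the 384M-2G range.
crashkernel_reserve "384M-2G:64M,2G-:128M" 1023   # prints 64M
```

With the same value, 256 MiB of RAM selects nothing (0) and 4096 MiB selects 128M.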
Second, verify that the kernel has reserved the requested memory area for the kdump kernel by doing:
dmesg | grep -i crash
...
[ 0.000000] Reserving 64MB of memory at 800MB for crashkernel (System RAM: 1023MB)
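Both checks can be wrapped in a short script, sketched below. The function names has_crashkernel_param and crash_kernel_loaded are hypothetical. The second check reads /sys/kernel/kexec_crash_loaded, a sysfs file that this guide does not mention; it is assumed here to be available on the running kernel and to contain 1 once a crash kernel has been loaded.

```shell
#!/bin/bash
# Succeed if a kernel command line contains a crashkernel= parameter.
# The file is an argument so the check can be run on a saved copy;
# it defaults to /proc/cmdline.
has_crashkernel_param() {
    local cmdline_file=${1:-/proc/cmdline}
    grep -qE '(^|[[:space:]])crashkernel=' "$cmdline_file" 2>/dev/null
}

# Succeed if the given flag file contains 1.
# Assumption: /sys/kernel/kexec_crash_loaded exists on this kernel.
crash_kernel_loaded() {
    local flag_file=${1:-/sys/kernel/kexec_crash_loaded}
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

if has_crashkernel_param; then
    echo "crashkernel parameter: present"
else
    echo "crashkernel parameter: missing"
fi

if crash_kernel_loaded; then
    echo "crash kernel: loaded"
else
    echo "crash kernel: not loaded"
fi
```

If the parameter is reported as missing right after installation, the most likely reason is that the system has not been rebooted yet.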
5.6. Testing the Crash Dump Mechanism
Testing the Crash Dump Mechanism will cause a system reboot. In certain situations, this can cause data loss if the system is under heavy load. If you want to test the mechanism, make sure that the system is idle or under very light load.
Verify that the SysRQ mechanism is enabled by looking at the value of the /proc/sys/kernel/sysrq kernel parameter:
cat /proc/sys/kernel/sysrq
If a value of 0 is returned the feature is disabled. Enable it with the following command:
sudo sysctl -w kernel.sysrq=1
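The SysRQ check can also be scripted. The sketch below uses a hypothetical helper named sysrq_state and treats any non-zero value as enabled. That is a simplification, since values above 1 are a bitmask that allows only some SysRQ functions.

```shell
#!/bin/bash
# Describe the SysRQ setting stored in a file.
# The file defaults to /proc/sys/kernel/sysrq.
sysrq_state() {
    local sysrq_file=${1:-/proc/sys/kernel/sysrq}
    local value
    value=$(cat "$sysrq_file" 2>/dev/null) || { echo "unknown"; return 1; }
    if [ "$value" = "0" ]; then
        echo "disabled"
    else
        # Simplification: every non-zero value is reported as enabled.
        echo "enabled"
    fi
}

echo "SysRQ is $(sysrq_state)"
```

The script only reports the state; it does not change the setting.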
Once this is done, you must become root, as just using sudo will not be sufficient: with sudo echo c > /proc/sysrq-trigger, the redirection is still performed by your unprivileged shell. As the root user, you will have to issue the command echo c > /proc/sysrq-trigger. If you are using a network connection, you will lose contact with the system. This is why it is better to do the test while being connected to the system console. This has the advantage of making the kernel dump process visible.
A typical test output should look like the following:
sudo -s
[sudo] password for ubuntu:
echo c > /proc/sysrq-trigger
[ 31.659002] SysRq : Trigger a crash
[ 31.659749] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 31.662668] IP: [<ffffffff8139f166>] sysrq_handle_crash+0x16/0x20
[ 31.662668] PGD 3bfb9067 PUD 368a7067 PMD 0
[ 31.662668] Oops: 0002 [#1] SMP
[ 31.662668] CPU 1
....
The rest of the output is truncated, but you should see the system rebooting, and somewhere in the log you will see the following line:
Begin: Saving vmcore from kernel crash ...
Once completed, the system will reboot to its normal operational mode. You will then find the Kernel Crash Dump file in the /var/crash directory:
ls /var/crash
linux-image-3.0.0-12-server.0.crash
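When several dumps accumulate, the most recent one can be picked out with a small helper. This is a minimal sketch: the function name latest_crash_dump is hypothetical, and the *.crash pattern is assumed from the file name shown above.

```shell
#!/bin/bash
# Print the most recently modified *.crash file in a directory.
# The directory defaults to /var/crash.
# Returns non-zero and prints nothing when no such file exists.
latest_crash_dump() {
    local dir=${1:-/var/crash}
    local newest= f
    for f in "$dir"/*.crash; do
        [ -e "$f" ] || continue    # the pattern matched nothing
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    [ -n "$newest" ] && echo "$newest"
}

latest_crash_dump || echo "no crash dump found"
```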
5.7. Resources
Kernel Crash Dump is a vast topic that requires good knowledge of the Linux kernel. You can find more information on the topic here:
• Kdump kernel documentation [13]
• The crash tool [14]
• Analyzing Linux Kernel Crash [15] (Based on Fedora, it still gives a good walkthrough of kernel dump analysis)

[13] http://www.kernel.org/doc/Documentation/kdump/kdump.txt
[14] http://people.redhat.com/~anderson/
[15] http://www.dedoimedo.com/computers/crash-analyze.html
Chapter 3. Package Management