debugging puppet and facter with strace

I worked on a slightly annoying problem yesterday and thought I’d share the process with you. On one of our linux boxes puppet stopped working with an error that was not very helpful:

[root@portrix]# puppet agent --test
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve local facts: private method `split' called for nil:NilClass
Error: Failed to apply catalog: Could not retrieve local facts: private method `split' called for nil:NilClass

All this told me was that something was wrong with facter during the puppet run. I was able to reproduce the error with just facter but even the debug output was not much more helpful:

[root@portrix]# facter --puppet --debug
Found no suitable resolves of 1 for ec2_metadata
value for ec2_metadata is still nil
private method `split' called for nil:NilClass

In fact, this was even more misleading because I was drawn to the message about ec2_metadata. This box is a physical server and should not have anything to do with EC2. I chased that rabbit for a while without any success. The next best idea was to run strace on the command and see whether that returned anything useful. Strace records and prints all system calls made during the execution of a program and, among other things, shows you which libraries, files and so on are being touched. It produces a ton of output and I will spare you the whole thing. The general idea is to scan the output from the bottom up: the first thing (from the bottom) will be the printing of the error message to your screen, and the stuff that caused the error will hopefully show up slightly above that. This is the (not very interesting) part where the error message is printed:

[root@portrix]# strace facter --puppet --debug
execve("/usr/bin/facter", ["facter", "--puppet", "--debug"], [/* 28 vars */]) = 0
brk(0)                                  = 0x18d9000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f78f4f7a000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
[... many, many lines deleted ...]
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
write(2, "private method `split' called fo"..., 46private method `split' called for nil:NilClass) = 46
write(2, "\n", 1
)                       = 1
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGINT, {SIG_DFL, [INT], SA_RESTORER|SA_RESTART, 0x30162326a0}, {0x3c6a295ef0, [INT], SA_RESTORER|SA_RESTART, 0x30162326a0}, 8) = 0
close(3)                                = 0
munmap(0x7fe785c7e000, 4096)            = 0
exit_group(12)                          = ?

And just a bit above that is the last file it read before failing:

stat("/etc/oratab", {st_mode=S_IFREG|0664, st_size=741, ...}) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
open("/etc/oratab", O_RDONLY)           = 4
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
fstat(4, {st_mode=S_IFREG|0664, st_size=741, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe785c7c000
read(4, "#\n\n\n\n# This file is used by ORAC"..., 4096) = 741

So it read the /etc/oratab file, and if we scan the output just a bit further up, we can also find the module that reads it:

stat("/var/lib/puppet/lib/facter/oracle_sid.rb", {st_mode=S_IFREG|0644, st_size=255, ...}) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
open("/var/lib/puppet/lib/facter/oracle_sid.rb", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=255, ...}) = 0
close(4)                                = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
open("/var/lib/puppet/lib/facter/oracle_sid.rb", O_RDONLY) = 4
close(4)                                = 0
open("/var/lib/puppet/lib/facter/oracle_sid.rb", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=255, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe785c7c000
read(4, "require 'facter'\n\nisOracle = Fil"..., 4096) = 255
read(4, "", 4096)                       = 0
close(4)                                = 0

The file oracle_sid.rb is a facter plugin that we wrote ourselves, and as soon as I saw it I knew where to look. This is the part of our code that caused the error in facter:

result = open('/etc/oratab') { |f| f.grep(/dbhome_1/) }
result = result[0].split(":").first

The code looks for the first line containing the string “dbhome_1” (which usually works for our databases) and then extracts the oracle SID from that line (the fields in oratab are separated by colons). But the oratab on that server contained no such entry, so result[0] was nil and we ended up calling split on nil, which produced the error “private method `split’ called for nil:NilClass”.

Once this was known, the fix was rather easy, and the fix itself is not what I wanted to show in this blog post anyway. But I hope you got the idea of using strace to debug a linux program that does not give you the level of logging you would need to find the error yourself.
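
As a small addition: instead of scanning the full trace by hand, strace can be told to log only the syscalls you care about. A minimal sketch (the syscall filter and the output file name are just examples, not part of the original debugging session):

# log only file-related syscalls to a file; the last files touched before the
# crash then show up right at the end of the log
strace -f -e trace=open,stat,read -o /var/tmp/facter.strace facter --puppet --debug
tail -50 /var/tmp/facter.strace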

quick tip: build debian package from sqldeveloper rpm

Oracle has made the installation of sqldeveloper easy on distributions that use rpm by providing a nice package. But we are using ubuntu for our development workstations at the office and thus need a debian .deb package. I did not want to use the ‘alien’ workaround, and we already build or rebuild some of our packages for our own repository anyway. After downloading the Linux rpm from OTN, the .deb can easily be built with the help of fpm like this:

fpm --no-auto-depends -d 'oracle-jdk-7' -s rpm -t deb -m "portrix Systems GmbH " /var/tmp/sqldeveloper-4.0.2.15.21-1.noarch.rpm

Upload the resulting package to your local repository and you can then install it with apt-get or simply show its details like this:

portrix@vm-04:~$ sudo apt-cache show sqldeveloper
Package: sqldeveloper
Version: 4.0.2.15.21-1
Architecture: all
Maintainer: portrix Systems GmbH 
Installed-Size: 305622
Depends: oracle-jdk-7
Provides: sqldeveloper
Filename: pool/main/sqldeveloper_4.0.2.15.21-1_all.deb
Size: 246647402
MD5sum: 1a7127bde4521e0a5bc7160ee3f891a0
SHA1: 55f12afe5075bcbd9223f61fe8f183fd0570c295
SHA256: aeb07cede84298c7c8a9bf8caf0c42ccf560840a857415be81bb4189c8939ba2
Section: extra
Priority: Priority
Homepage: http://example.com/no-uri-given
Description: Oracle SQL Developer is a new, free graphical tool that enhances productivity
             and simplifies database development tasks. With SQL Developer, you can browse database objects, 
             run SQL statements and SQL scripts, and edit and debug PL/SQL statements. 
             You can also run any number of provided reports, as well as create and save your own.
License: Oracle
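
If you do not run your own repository, the generated package can also be installed directly with dpkg. A sketch (assuming the file name that fpm produced above and that the oracle-jdk-7 dependency is already available):

# fpm itself is a ruby gem (assumption: rubygems is already installed)
gem install fpm
# install the generated .deb directly instead of going through a repository
sudo dpkg -i sqldeveloper_4.0.2.15.21-1_all.deb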

resize linux root disk in Oracle VM

I just needed to grow the root filesystem on an Oracle Linux guest VM running in OVM 3.2 and thought I’d share the process with you. A lot of the steps are taken from a blog post describing a similar procedure on vmware.

To start with, I have a linux vm which began as a clone of one of the Linux VM templates that Oracle provides. It comes with a 20G disk which is partitioned into two pieces: one for /boot, the other an lvm physical volume that holds the rest of the system, including the root filesystem.

The basic steps are:

– backup the existing system just in case
– resize virtual disk in OVM
– grow the partition to the new size
– grow the pv device
– grow the logical volume for my root / partition
– grow the filesystem on /

Instead of growing the original physical volume and partition I could also simply add a second disk to the lvm pool but I like to keep things simple and use as few devices as possible.
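
In command form, the whole procedure boils down to something like this sketch (device, volume group and logical volume names are the ones from this particular VM; every step is shown in detail below):

fdisk /dev/xvda                                # delete /dev/xvda2 and re-create it with the same start sector, type 8e
reboot                                         # let the kernel pick up the new partition table
pvresize /dev/xvda2                            # grow the physical volume
lvextend -l +100%FREE /dev/vg_ovmm32/lv_root   # grow the root logical volume
resize2fs /dev/mapper/vg_ovmm32-lv_root        # grow the ext4 filesystem online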

First you want to back up the VM in case something goes wrong later. If your VM is on an OCFS2 repository, you can simply clone the running VM. This VM was on an NFS repo, so I had to turn it off and create a clone from there (actually I simply made a copy of the virtual disk .img file once the VM was shut down).
[screenshot: resizing the virtual disk in OVM Manager]
Edit the VM in OVM Manager, navigate to the virtual disk and enter a new (bigger) size for it. Click apply and the resize on the OVM side is taken care of. Now the Linux guest needs to see the change, which is normally done either by rebooting the VM or by rescanning the devices. Since this is a paravirtualized VM (PVM), simply running ‘sync’ or rescanning the SCSI bus won’t work because the device (xvda) is not really SCSI. I have yet to find a way to do this online (if you know one, please leave a comment). Until then: reboot the VM so that fdisk recognizes the new size of the disk.

After that, start fdisk, check the new size and re-create the lvm partition with the new maximum size. I am always a bit scared of this step because you are actually deleting the partition. But deleting here only means removing the entry from the partition table, and your data stays intact as long as you re-create the partition with the same start sector.

[root@ovmm32 ~]# fdisk /dev/xvda

WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
         switch off the mode (command 'c') and change display units to
         sectors (command 'u').

Command (m for help): p

Disk /dev/xvda: 34.4 GB, 34359738368 bytes
255 heads, 63 sectors/track, 4177 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00082e48

    Device Boot      Start         End      Blocks   Id  System
/dev/xvda1   *           1          64      512000   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/xvda2              64        2611    20458496   8e  Linux LVM

Command (m for help): d
Partition number (1-4): 2

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (64-4177, default 64): 
Using default value 64
Last cylinder, +cylinders or +size{K,M,G} (64-4177, default 4177): 
Using default value 4177

Command (m for help): p

Disk /dev/xvda: 34.4 GB, 34359738368 bytes
255 heads, 63 sectors/track, 4177 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00082e48

    Device Boot      Start         End      Blocks   Id  System
/dev/xvda1   *           1          64      512000   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/xvda2              64        4177    33038728+  83  Linux

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): L

 0  Empty           24  NEC DOS         81  Minix / old Lin bf  Solaris        
 1  FAT12           39  Plan 9          82  Linux swap / So c1  DRDOS/sec (FAT-
 2  XENIX root      3c  PartitionMagic  83  Linux           c4  DRDOS/sec (FAT-
 3  XENIX usr       40  Venix 80286     84  OS/2 hidden C:  c6  DRDOS/sec (FAT-
 4  FAT16 <32M      41  PPC PReP Boot   85  Linux extended  c7  Syrinx         
 5  Extended        42  SFS             86  NTFS volume set da  Non-FS data    
 6  FAT16           4d  QNX4.x          87  NTFS volume set db  CP/M / CTOS / .
 7  HPFS/NTFS       4e  QNX4.x 2nd part 88  Linux plaintext de  Dell Utility   
 8  AIX             4f  QNX4.x 3rd part 8e  Linux LVM       df  BootIt         
 9  AIX bootable    50  OnTrack DM      93  Amoeba          e1  DOS access     
 a  OS/2 Boot Manag 51  OnTrack DM6 Aux 94  Amoeba BBT      e3  DOS R/O        
 b  W95 FAT32       52  CP/M            9f  BSD/OS          e4  SpeedStor      
 c  W95 FAT32 (LBA) 53  OnTrack DM6 Aux a0  IBM Thinkpad hi eb  BeOS fs        
 e  W95 FAT16 (LBA) 54  OnTrackDM6      a5  FreeBSD         ee  GPT            
 f  W95 Ext'd (LBA) 55  EZ-Drive        a6  OpenBSD         ef  EFI (FAT-12/16/
10  OPUS            56  Golden Bow      a7  NeXTSTEP        f0  Linux/PA-RISC b
11  Hidden FAT12    5c  Priam Edisk     a8  Darwin UFS      f1  SpeedStor      
12  Compaq diagnost 61  SpeedStor       a9  NetBSD          f4  SpeedStor      
14  Hidden FAT16 <3 63  GNU HURD or Sys ab  Darwin boot     f2  DOS secondary  
16  Hidden FAT16    64  Novell Netware  af  HFS / HFS+      fb  VMware VMFS    
17  Hidden HPFS/NTF 65  Novell Netware  b7  BSDI fs         fc  VMware VMKCORE 
18  AST SmartSleep  70  DiskSecure Mult b8  BSDI swap       fd  Linux raid auto
1b  Hidden W95 FAT3 75  PC/IX           bb  Boot Wizard hid fe  LANstep        
1c  Hidden W95 FAT3 80  Old Minix       be  Solaris boot    ff  BBT            
1e  Hidden W95 FAT1
Hex code (type L to list codes): 8e
Changed system type of partition 2 to 8e (Linux LVM)
Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table. The new table will be used at
the next reboot or after you run partprobe(8) or kpartx(8)
Syncing disks.

This calls for another reboot, which is unfortunate, but one more cycle won’t hurt us now. After that we can resize the pv:

[root@ovmm32 ~]# pvdisplay 
  --- Physical volume ---
  PV Name               /dev/xvda2
  VG Name               vg_ovmm32
  PV Size               19.51 GiB / not usable 2.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              4994
  Free PE               0
  Allocated PE          4994
  PV UUID               HPweY3-sK8N-qKRR-57qR-HimH-zPKK-WD4p6e
   
[root@ovmm32 ~]# pvresize /dev/xvda2
  Physical volume "/dev/xvda2" changed
  1 physical volume(s) resized / 0 physical volume(s) not resized
[root@ovmm32 ~]# pvdisplay 
  --- Physical volume ---
  PV Name               /dev/xvda2
  VG Name               vg_ovmm32
  PV Size               31.51 GiB / not usable 3.38 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              8065
  Free PE               3071
  Allocated PE          4994
  PV UUID               HPweY3-sK8N-qKRR-57qR-HimH-zPKK-WD4p6e
[root@ovmm32 ~]# vgdisplay 
  --- Volume group ---
  VG Name               vg_ovmm32
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  5
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               2
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               31.50 GiB
  PE Size               4.00 MiB
  Total PE              8065
  Alloc PE / Size       4994 / 19.51 GiB
  Free  PE / Size       3071 / 12.00 GiB
  VG UUID               Yi9o86-E9fU-ZbU0-Fvuj-9QQn-n1AJ-N04Hzb

So let’s look at the logical volumes and then resize the root volume to the new maximum size:

[root@ovmm32 ~]# lvdisplay 
  --- Logical volume ---
  LV Path                /dev/vg_ovmm32/lv_root
  LV Name                lv_root
  VG Name                vg_ovmm32
  LV UUID                Z3slmg-uIYv-0cRP-q1ZQ-8kmA-12Ja-WU6FfT
  LV Write Access        read/write
  LV Creation host, time ovmm32.portrix.net, 2013-06-07 14:00:44 +0200
  LV Status              available
  # open                 1
  LV Size                11.70 GiB
  Current LE             2994
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:0
   
  --- Logical volume ---
  LV Path                /dev/vg_ovmm32/lv_swap
  LV Name                lv_swap
  VG Name                vg_ovmm32
  LV UUID                6dfA32-ZH6r-Ulbn-ZzUT-uDOh-EwlP-y8E3PC
  LV Write Access        read/write
  LV Creation host, time ovmm32.portrix.net, 2013-06-07 14:01:55 +0200
  LV Status              available
  # open                 2
  LV Size                7.81 GiB
  Current LE             2000
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:1
   
[root@ovmm32 ~]# lvextend -l +100%FREE /dev/vg_ovmm32/lv_root
  Extending logical volume lv_root to 23.69 GiB
  Logical volume lv_root successfully resized
[root@ovmm32 ~]# lvdisplay 
  --- Logical volume ---
  LV Path                /dev/vg_ovmm32/lv_root
  LV Name                lv_root
  VG Name                vg_ovmm32
  LV UUID                Z3slmg-uIYv-0cRP-q1ZQ-8kmA-12Ja-WU6FfT
  LV Write Access        read/write
  LV Creation host, time ovmm32.portrix.net, 2013-06-07 14:00:44 +0200
  LV Status              available
  # open                 1
  LV Size                23.69 GiB
  Current LE             6065
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:0
   
  --- Logical volume ---
  LV Path                /dev/vg_ovmm32/lv_swap
  LV Name                lv_swap
  VG Name                vg_ovmm32
  LV UUID                6dfA32-ZH6r-Ulbn-ZzUT-uDOh-EwlP-y8E3PC
  LV Write Access        read/write
  LV Creation host, time ovmm32.portrix.net, 2013-06-07 14:01:55 +0200
  LV Status              available
  # open                 2
  LV Size                7.81 GiB
  Current LE             2000
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:1

Last step: resize the ext4 filesystem

[root@ovmm32 ~]# resize2fs /dev/mapper/vg_ovmm32-lv_root
resize2fs 1.43-WIP (20-Jun-2013)
Filesystem at /dev/mapper/vg_ovmm32-lv_root is mounted on /; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 2
The filesystem on /dev/mapper/vg_ovmm32-lv_root is now 6210560 blocks long.

[root@ovmm32 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_ovmm32-lv_root
                       24G  7.1G   16G  32% /
tmpfs                 3.9G     0  3.9G   0% /dev/shm
/dev/xvda1            477M  152M  296M  34% /boot

Linux Huge Pages in Oracle VM 3

My last post was about properly determining the number of hugepages needed with the newest version of the unbreakable enterprise kernel UEK R3 in Oracle Linux. Aside from the little issue with kernel version detection in the script, the process is quite simple and well documented on the web:

  • turn off automatic memory management (AMM) by setting the parameter MEMORY_TARGET to 0
  • determine the number of hugepages needed by running the hugepages_settings.sh script
  • add an entry to /etc/sysctl.conf
  • reload the config with sysctl -p or reboot the server
  • edit /etc/security/limits.conf to raise the memlock limit for the oracle user
  • bounce the instance(s)
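
For reference, steps three to five might look roughly like this (a sketch; the page count and the memlock values are placeholders and have to come from the script and your SGA size):

# /etc/sysctl.conf entry: number of huge pages as reported by hugepages_settings.sh
echo "vm.nr_hugepages = 1024" >> /etc/sysctl.conf
sysctl -p

# /etc/security/limits.conf: allow the oracle user to lock enough memory (values in kB)
echo "oracle soft memlock 8388608" >> /etc/security/limits.conf
echo "oracle hard memlock 8388608" >> /etc/security/limits.conf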

Unfortunately, this was not it for me. You can check the effect of the setting like this:

[root@linuxbox ~]# grep Huge /proc/meminfo
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

This should have shown more than 0 hugepages configured. There is, however, an error logged by dmesg:

[root@linuxbox ~]# dmesg | tail -1
Guest pages are not properly aligned to use hugepages

Alright, the problem is that while Oracle VM may be engineered to run the Oracle database best, someone actually forgot to add support for huge pages in Para-Virtualized (PVM) guests. This should work for HVM guests (I have not tried it), but the general recommendation is to use PVM for Linux guests. The workaround for OVM 3.2.4 and later is described in metalink note 1529373.1. You have to add the parameter ‘allowsuperpage’ to the xen kernel boot options in /etc/grub.conf on the Oracle VM server like this (the added parameter is the last one on the kernel /xen.gz line):

#boot=/dev/sda
default=0
timeout=5
title Oracle VM Server (2.6.39-300.32.5.el5uek)
        root (hd0,0)
        kernel /xen.gz dom0_mem=1160M allowsuperpage
        module /vmlinuz-2.6.39-300.32.5.el5uek ro root=UUID=d15aad1f-668b-4358-a54a-d7163a15198d
        module /initrd-2.6.39-300.32.5.el5uek.img

You need to do this on every node that you may want to run or live-migrate a VM with huge pages on, and then reboot the servers. I had to perform an update to 3.2.7 anyway, so a reboot was due in any case.

As the next step, you need to add the parameter “superpages=1” at the end of the vm.cfg file of this VM and stop/start the VM (a restart is not enough).

But you are not supposed to edit the vm.cfg manually, and editing the VM in the manager (for example changing the number of vcpus, memory or boot options) will remove that line again, which then leads to the VM not allocating any hugepages after the next boot. The note says they are working on it and I really hope they fix this sooner rather than later.
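
For completeness, this is roughly what the manual change looks like (a sketch; the path to the vm.cfg inside the storage repository is only an example and the repository and VM IDs will differ on your system):

# append the parameter to the VM configuration file in the storage repository
# (example path; look up the real repository and VM IDs on your OVM server)
echo 'superpages=1' >> /OVS/Repositories/<repo-id>/VirtualMachines/<vm-id>/vm.cfg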

And in case you are wondering what all this fuss is about and why I do not simply want to live with good old standard shared memory, let Mark Bobak convince you that if you are not using Huge Pages, you are doing it wrong.

setting up Huge Pages with UEK R3 (kernel 3.8)

I came across a little hiccup when configuring huge pages on my oracle linux 6.5 playground machine with the latest unbreakable enterprise kernel. The documentation and also this blog post by Tim Hall contain a nifty script that outputs the value one should use for the nr_hugepages parameter. Unfortunately, this script fails on the most recent version of the kernel shipped with Oracle Linux:

[root@linuxbox ~]# /usr/local/bin/hugepages_setting.sh 
Unrecognized kernel version 3.8. Exiting.

Fortunately, the fix is really easy. There is a wrapper, uname26, that fakes the output of uname back to 2.6, and the method of setting the parameter via vm.nr_hugepages has stayed the same.

[root@linuxbox ~]$ yum install uname26
[root@linuxbox ~]$ uname26 /usr/local/bin/hugepages_setting.sh 
Recommended setting: vm.nr_hugepages = 516

But even better than faking your way around this would be to modify the script so that it accepts 3.8 as a valid kernel version, like this:

#!/bin/bash
#
# hugepages_settings.sh
#
# Linux bash script to compute values for the
# recommended HugePages/HugeTLB configuration
#
# Note: This script does calculation for all shared memory
# segments available when the script is run, no matter it
# is an Oracle RDBMS shared memory segment or not.
# Check for the kernel version
KERN=`uname -r | awk -F. '{ printf("%d.%d\n",$1,$2); }'`
# Find out the HugePage size
HPG_SZ=`grep Hugepagesize /proc/meminfo | awk {'print $2'}`
# Start from 1 pages to be on the safe side and guarantee 1 free HugePage
NUM_PG=1
# Cumulative number of pages required to handle the running shared memory segments
for SEG_BYTES in `ipcs -m | awk {'print $5'} | grep "[0-9][0-9]*"`
do
   MIN_PG=`echo "$SEG_BYTES/($HPG_SZ*1024)" | bc -q`
   if [ $MIN_PG -gt 0 ]; then
      NUM_PG=`echo "$NUM_PG+$MIN_PG+1" | bc -q`
   fi
done
# Finish with results
case $KERN in
   '2.4') HUGETLB_POOL=`echo "$NUM_PG*$HPG_SZ/1024" | bc -q`;
          echo "Recommended setting: vm.hugetlb_pool = $HUGETLB_POOL" ;;
   '2.6' | '3.8' ) echo "Recommended setting: vm.nr_hugepages = $NUM_PG" ;;
      *) echo "Unrecognized kernel version $KERN. Exiting." ;;
esac

Btw: my test returned only 1 when I first ran it. After some advice from Frits Hoogland on twitter and a little study of this blog post by Tanel Poder it became quite obvious: I had forgotten to unset the MEMORY_TARGET parameter in my spfile, and thus the shared memory was allocated in chunks visible in /dev/shm but not through ‘ipcs -m’. I do not trust AMM anyway, so I set MEMORY_TARGET to 0 and went on.
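
If you want to check which of the two situations you are looking at, a quick sketch:

# with MEMORY_TARGET set (AMM), the SGA shows up as files in /dev/shm ...
ls -lh /dev/shm | head
# ... with MEMORY_TARGET=0 it appears as System V shared memory segments,
# which is what hugepages_settings.sh sums up via ipcs -m
ipcs -m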

troubleshooting a CPU eating java process

Just yesterday we had a situation where a java process (a JBoss app server) started to raise the CPU load on a machine significantly. The load average increased and top showed that the process was consuming a lot of CPU. The planned “workaround”? Blame something (garbage collection is always an easy victim), restart and hope for the best. After all, there is not much to diagnose if there is nothing in the logs, right? Wrong! There are a few things we can do with a running java VM to diagnose these issues from a solaris or linux command line and find the root cause.

This system is Solaris 11 with Java 7, but the troubleshooting steps and tools are the same on older versions of Solaris and Java as well as on Linux.

turn on thread-view in top

The first thing we can do is to turn on the thread-view in top. Just start top and hit “H” or start with the “-t” flag. Here is an output from my machine yesterday:

load averages:  5.11,  5.23,  5.27;               up 52+01:50:08 15:13:15
266 threads: 260 sleeping, 6 on cpu
CPU states: 36.7% idle, 62.8% user,  0.5% kernel,  0.0% iowait,  0.0% swap
Kernel: 1975 ctxsw, 168 trap, 2246 intr, 1399 syscall, 6 flt
Memory: 128G phys mem, 3915M free mem, 132G total swap, 132G free swap

   PID  LWP USERNAME PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 26313 9999 portrix    0    0   11G   11G cpu/1   22.6H 12.71% java
 26313 9999 portrix    0    0   11G   11G cpu/5   23.1H 12.64% java
 26313 9999 portrix    0    0   11G   11G cpu/0   22.9H 12.62% java
 26313 9999 portrix    0    0   11G   11G cpu/3   22.7H 12.54% java
 26313   65 portrix    0    0   11G   11G cpu/6   25.7H 12.54% java
 26313 9999 portrix   59    0   11G   11G sleep   69:41  0.19% java
 26313 9999 portrix   60    0   11G   11G sleep    0:02  0.05% java
 26313   35 portrix   59    0   11G   11G sleep  297:05  0.03% java
 26313 9999 portrix   59    0   11G   11G sleep   74:35  0.03% java
 26313 9999 portrix   59    0   11G   11G sleep   35:29  0.01% java
 26313   21 portrix   59    0   11G   11G sleep   46:46  0.01% java
  9705    1 portrix   59    0 4400K 2868K cpu/4    0:00  0.00% top
 26313   11 portrix   59    0   11G   11G sleep    5:51  0.00% java
 26313   18 portrix   59    0   11G   11G sleep   22:15  0.00% java
 25257    1 webservd  59    0 8024K 1188K sleep    6:23  0.00% nginx
 25258    1 webservd  59    0 8024K  996K sleep    6:22  0.00% nginx

What can we read from this?

  • The load average has been steady around 5 for a while, meaning that 5 processes or LWPs have been and still are running on CPU
  • process 26313 has 5 LWPs (threads) that are hogging CPU. This is an 8 core machine and 12.5% CPU utilization in top represents 100% of one of those CPU cores. So these 5 threads are truly stuck spinning around and burning as much CPU as they can.
  • I could not figure out why the LWP number is 9999 for some of these LWPs

Analyze GC behaviour with jstat

Java comes with a few command line utilities that can be used to investigate some properties of a running VM. One of those utilities is jstat, which can be used (among other things) to read statistics about the garbage collector. I printed those stats in intervals of 10 seconds:

portrix@jboss42 ~ $ /usr/java/bin/jstat -gc 26313 10s
S0C      S1C       S0U   S1U     EC        EU        OC         OU         PC       PU        YGC    YGCT     FGC   FGCT    GCT   
106816.0 108224.0  0.0   88819.4 3196032.0 2143395.7 6828032.0  6611615.9  131072.0 119782.1  11187  431.555  27    101.982  533.537
106816.0 108224.0  0.0   88819.4 3196032.0 2504908.0 6828032.0  6611615.9  131072.0 119782.1  11187  431.555  27    101.982  533.537
106816.0 108224.0  0.0   88819.4 3196032.0 2519750.7 6828032.0  6611615.9  131072.0 119782.1  11187  431.555  27    101.982  533.537
106816.0 108224.0  0.0   88819.4 3196032.0 2804560.2 6828032.0  6611615.9  131072.0 119782.1  11187  431.555  27    101.982  533.537
106816.0 108224.0  0.0   88819.4 3196032.0 3098403.7 6828032.0  6611615.9  131072.0 119782.1  11187  431.555  27    101.982  533.537
106816.0 108224.0  0.0   88819.4 3196032.0 3194522.2 6828032.0  6611615.9  131072.0 119782.1  11187  431.555  27    101.982  533.537

I don’t know a whole lot about the different pools and generations in java memory management and the way garbage collection works, but at this point I am merely looking at the level of activity. Notice how none of these numbers really changes by much. The numbers I am mostly interested in are the integer counters YGC (young generation garbage collections – these are the “cheap” ones) and FGC (full garbage collections – these usually halt the VM for a few seconds). Neither one increases by much – a good indicator that this java vm is not wasting a whole lot of time on garbage collection. When you see a process with heap memory problems and thus GC problems, you will notice these numbers changing a whole lot. But in this case GC had an alibi and we cannot blame it for the issue, so let’s move on to the next little troubleshooting tool.
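
As a side note, if you prefer percentages over raw sizes, jstat also has a -gcutil mode that shows the same counters in a more compact form (a sketch, using the same pid and interval as above):

# S0/S1/E/O/P are fill levels in percent, YGC/FGC/GCT are the same counters as in -gc
/usr/java/bin/jstat -gcutil 26313 10s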

Review running threads in a stack dump with jstack

We saw earlier that this system has 5 threads that are just spinning on CPU. It would be very useful to learn what these threads are doing and why. The best way (that I know of) to find this information is to take one or a few stack dumps of the running VM and check for threads being executed at that time. The jstack utility returns the stack of each java thread along with its state. Since 5 threads are on CPU almost all the time, I expect to find them in the running state in all my stack dumps and hopefully also to see a similar stack each time. This is how you run jstack; it usually returns quickly and does not seem to impact the performance of the running VM much. Wait a few seconds between the runs:

portrix@jboss42 ~ $ /usr/java/bin/jstack 26313 > /var/tmp/jstack.out
portrix@jboss42 ~ $ /usr/java/bin/jstack 26313 > /var/tmp/jstack2.out
portrix@jboss42 ~ $ /usr/java/bin/jstack 26313 > /var/tmp/jstack3.out

This saves the stack dumps to three files. Have a look at them and don’t be intimidated by the amount of information; we do not need to understand what all of it is about. All we are looking for is the 5 threads in the running state, so search for “RUNNABLE”. Ignore the first hit; it will just be the jstack utility itself:

"Attach Listener" daemon prio=3 tid=0x0000000004e43800 nid=0x505e waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

There may also be some other runnable threads that we can ignore. But further down, we can find 4-5 stacks with this information:

"http-0.0.0.0-8080-Processor108" daemon prio=3 tid=0x000000000300e000 nid=0x398f runnable [0xffff80ff9451d000]
   java.lang.Thread.State: RUNNABLE
        at java.util.HashMap.getEntry(HashMap.java:446)
        at java.util.HashMap.get(HashMap.java:405)
        at net.portrix.sw.backend.user.SomeCache.access(SomeCache.java:38)
...

And I could find these same stacks in the other dump files. So the threads we saw spinning in top earlier are burning their time in HashMap.getEntry. Remember, I know next to nothing about java, but I can enter this into google along with something like “100% CPU”. This yields plenty of results. This blog post by Pierre-Hugues Charbonneau is very similar to the one I am currently writing, and this article on stackoverflow explains things a bit more. Apparently HashMap is not threadsafe and can lead to locking or spinning issues when accessed concurrently; one should either use ConcurrentHashMap or serialize access to the object yourself. This information, together with the code in our software that calls it, should be enough for development to fix the issue, so we do not have to rely on turning things off and on again but can actually fix the root cause.
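
One more trick that is worth mentioning: you can tie the hot LWPs from the top output directly to threads in the jstack dump. top shows the LWP id in decimal, while jstack prints it as a hexadecimal nid. A quick sketch, using LWP 65 from the top output above:

# convert the LWP id from top to the hex nid format used by jstack
printf 'nid=0x%x\n' 65        # -> nid=0x41
# then look at exactly that thread in the dump
grep -A 10 'nid=0x41' /var/tmp/jstack.out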

Upgrading to Oracle Linux 6.5 and UEK3

OL 6.5 was released a few days ago to ULN and public yum, and of course I was going to upgrade some machines immediately. But first: Have I mentioned before that I really love Oracle Linux for providing ISOs and patches for free? One of our customers insists on using Red Hat instead of OL. No problem so far, but a few days ago we needed to clone that VM for some testing and staging. The only problem: to do this we needed another support subscription from RH, which the customer is paying for, but more importantly it delayed the project since we now had to wait for procurement. This is why I appreciate the license model of OL: you can use it for free and have the option of purchasing support for the machines that are most important to you. Also, licensing is per (hardware) server, not per VM. Read more about OL licensing straight from the (Belgian) horse’s mouth.

But enough of that. yum update downloaded and updated more than 400 packages. To get the latest UEK release 3 kernel, however, I had to run an additional yum install kernel-uek after that:

[root@tux ~]# yum install kernel-uek
Loaded plugins: security
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package kernel-uek.x86_64 0:3.8.13-16.2.2.el6uek will be installed
--> Processing Dependency: kernel-firmware = 3.8.13-16.2.2.el6uek for package: kernel-uek-3.8.13-16.2.2.el6uek.x86_64
--> Running transaction check
---> Package kernel-uek-firmware.noarch 0:3.8.13-16.2.2.el6uek will be installed
--> Finished Dependency Resolution
--> Running transaction check
---> Package kernel-uek.x86_64 0:2.6.39-400.17.1.el6uek will be erased
---> Package kernel-uek-firmware.noarch 0:2.6.39-400.17.1.el6uek will be erased
--> Finished Dependency Resolution

Dependencies Resolved

====================================================================================================================================================
 Package                             Arch                   Version                                   Repository                               Size
====================================================================================================================================================
Installing:
 kernel-uek                          x86_64                 3.8.13-16.2.2.el6uek                      public_ol6_UEKR3_latest                  41 M
Removing:
 kernel-uek                          x86_64                 2.6.39-400.17.1.el6uek                    @anaconda-UEK2/6.4                      101 M
 kernel-uek-firmware                 noarch                 2.6.39-400.17.1.el6uek                    @anaconda-UEK2/6.4                      5.0 M
Installing for dependencies:
 kernel-uek-firmware                 noarch                 3.8.13-16.2.2.el6uek                      public_ol6_UEKR3_latest                 1.6 M

Transaction Summary
====================================================================================================================================================
Install       2 Package(s)
Remove        2 Package(s)

Total download size: 43 M
Is this ok [y/N]: y
Downloading Packages:
(1/2): kernel-uek-3.8.13-16.2.2.el6uek.x86_64.rpm                                                                            |  41 MB     00:01     
(2/2): kernel-uek-firmware-3.8.13-16.2.2.el6uek.noarch.rpm                                                                   | 1.6 MB     00:00     
----------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                20 MB/s |  43 MB     00:02     
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : kernel-uek-firmware-3.8.13-16.2.2.el6uek.noarch                                                                                  1/4 
  Installing : kernel-uek-3.8.13-16.2.2.el6uek.x86_64                                                                                           2/4 
  Cleanup    : kernel-uek-2.6.39-400.17.1.el6uek.x86_64                                                                                         3/4 
  Cleanup    : kernel-uek-firmware-2.6.39-400.17.1.el6uek.noarch                                                                                4/4 
  Verifying  : kernel-uek-3.8.13-16.2.2.el6uek.x86_64                                                                                           1/4 
  Verifying  : kernel-uek-firmware-3.8.13-16.2.2.el6uek.noarch                                                                                  2/4 
  Verifying  : kernel-uek-2.6.39-400.17.1.el6uek.x86_64                                                                                         3/4 
  Verifying  : kernel-uek-firmware-2.6.39-400.17.1.el6uek.noarch                                                                                4/4 

Removed:
  kernel-uek.x86_64 0:2.6.39-400.17.1.el6uek                           kernel-uek-firmware.noarch 0:2.6.39-400.17.1.el6uek                          

Installed:
  kernel-uek.x86_64 0:3.8.13-16.2.2.el6uek                                                                                                          

Dependency Installed:
  kernel-uek-firmware.noarch 0:3.8.13-16.2.2.el6uek                                                                                                 

Complete!

And after a reboot I was greeted by the release banner of UEK R3:

[root@tux ~]# uname -a
Linux tux.portrix.net 3.8.13-16.2.2.el6uek.x86_64 #1 SMP Tue Nov 26 08:41:44 PST 2013 x86_64 x86_64 x86_64 GNU/Linux

The biggest new feature (to me) is the built-in support for DTrace. I know DTrace from Solaris and definitely need to play with it on Linux.