Chasing down why installing the kernel segfaulted

I’ve been running a server for continuous integration targeting a specific architecture. If you’ve been running servers recently, you’ll know about the constant treadmill of kernel patches due to widely publicized issues like Copy Fail. Usually, these go pretty smoothly (except for the previous kernel, which had a PowerPC-specific build regression). This time, though, when running make install for the kernel, I noticed a very curious message:

# make install
  INSTALL /boot
/usr/bin/dracut: line 3125: 3644490 Segmentation fault         hardlink "$initdir" 2>&1
     3644491 Done                       | ddebug
Generating grub configuration file ...
[...]

That’s pretty concerning. Now I’m worried about whether this will even work when I reboot into it. Let’s try to get more detail; the kernel makefile uses V=1 to show extra information.

# make install V=1 
make --no-print-directory -C /usr/src/linux-6.18.32-gentoo-r1 \
-f /usr/src/linux-6.18.32-gentoo-r1/Makefile install
# INSTALL /boot
  unset sub_make_done; ./scripts/install.sh
/usr/bin/dracut: line 3125: 3648755 Segmentation fault         hardlink "$initdir" 2>&1
     3648756 Done                       | ddebug
Generating grub configuration file ...

Well, that didn’t tell us much except it runs a script for the actual install part. Let’s take a look at the relevant part.

# User/arch may have a custom install script
for file in "${HOME}/bin/${INSTALLKERNEL}"              \
            "/sbin/${INSTALLKERNEL}"                    \
            "${srctree}/arch/${SRCARCH}/install.sh"     \
            "${srctree}/arch/${SRCARCH}/boot/install.sh"
do
        if [ ! -x "${file}" ]; then
                continue
        fi

        # installkernel(8) says the parameters are like follows:
        #
        #   installkernel version zImage System.map [directory]
        exec "${file}" "${KERNELRELEASE}" "${KBUILD_IMAGE}" System.map "${INSTALL_PATH}"
done

This is effectively a wrapper around installkernel, which is a custom distribution-specific program (the kernel supplies a generic equivalent that’s not as good if you lack it) that handles things like generating an initrd (in this case, delegating that to dracut) and updating the boot loader. In this case, Gentoo’s version takes a -v flag, so let’s add that and see if we can get more interesting information:

exec "${file}" -v "${KERNELRELEASE}" "${KBUILD_IMAGE}" System.map "${INSTALL_PATH}"

OK, let’s run make install V=1 again:

dracut[I]: *** Hardlinking files ***
/usr/bin/dracut: line 3125: 3403396 Segmentation fault         hardlink "$initdir" 2>&1
     3403397 Done                       | ddebug
dracut[I]: *** Hardlinking files done ***

Well, it looks like it’s contained in this specific step. It seems we need to take a look at dracut itself. The relevant chunk (with line 3125 annotated):

# Hardlink is mtime-sensitive; do it after the above clamp.
if [[ $do_hardlink == yes ]] && command -v hardlink > /dev/null; then
    dinfo "*** Hardlinking files ***"
    hardlink "$initdir" 2>&1 | ddebug
    dinfo "*** Hardlinking files done ***"

    # Hardlink itself breaks mtimes on directories as we may have added/removed
    # dir entries. Fix those up.
    if [[ ${SOURCE_DATE_EPOCH-} ]] && [[ $CPIO != 3cpio ]]; then
        clamp_mtimes "$initdir" -type d
    fi
fi # this is line 3125

It’s the fi that ends this block. Anyway, hardlink is clearly implicated here. A nicer person would replace hardlink on the path, but in this case, I’m just going to modify the dracut executable. (Don’t do this at home!) I’ll invoke it under gdb instead and remove the pipe to dracut’s debug logging:

    #hardlink "$initdir" 2>&1 | ddebug
    gdb --args hardlink "$initdir"
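
For anyone who would rather take the "nicer person" route and shadow hardlink on the path instead of editing dracut, a minimal sketch might look like the following. The make_wrapper helper and the /tmp/shim directory are made up for illustration, and whether a modified PATH actually survives the trip through make, installkernel, and dracut is an assumption that has not been checked here.

```shell
#!/bin/sh
# make_wrapper DIR NAME REAL [TRACER...]
# Hypothetical helper (not part of dracut or util-linux): writes DIR/NAME,
# a tiny script that runs REAL under TRACER with the original arguments.
# Paths and tracer words are pasted in unquoted, so keep them free of spaces.
make_wrapper() {
    dir=$1 name=$2 real=$3
    shift 3
    mkdir -p "$dir"
    {
        printf '#!/bin/sh\n'
        printf 'exec %s %s "$@"\n' "$*" "$real"
    } > "$dir/$name"
    chmod +x "$dir/$name"
}

# Example (not run here):
#   make_wrapper /tmp/shim hardlink /usr/bin/hardlink strace -o /tmp/hardlink.strace
#   PATH=/tmp/shim:$PATH make install
```

The upside is that nothing under /usr/bin gets edited, and deleting the shim directory undoes everything.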

Time to run make again. When we get the gdb prompt, let’s run it:

dracut[I]: *** Hardlinking files ***
GNU gdb (Gentoo 17.1 vanilla) 17.1
Copyright (C) 2025 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "powerpc64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from hardlink...
Reading symbols from /usr/lib/debug/usr/bin/hardlink.debug...
(gdb) catch signal 
Catchpoint 1 (standard signals)
(gdb) run
Starting program: /usr/bin/hardlink /var/tmp/dracut.dEcmlh1/initramfs
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb) 

Wait, where’d the program even go? gdb needs the process to still exist in order to inspect it, and... wait, did the kernel kill it, perhaps? If we check dmesg…

[1199626.054903] BUG: Unable to handle kernel data access at 0xc0403effffffffc8
[1199626.054921] Faulting instruction address: 0xc000000000396cb4
[1199626.054927] Oops: Kernel access of bad area, sig: 11 [#15]
[1199626.054932] BE PAGE_SIZE=4K MMU=Hash  SMP NR_CPUS=32 NUMA pSeries
[1199626.054939] Modules linked in: vsock_diag vmx_crypto ibmveth pseries_rng rng_core fuse vsock_loopback vmw_vsock_virtio_transport_common vsock sr_mod cdrom nx_crypto
[1199626.054969] CPU: 22 UID: 0 PID: 3545486 Comm: hardlink Tainted: G S    D             6.18.26-gentoo-ppc #1 VOLUNTARY 
[1199626.054981] Tainted: [S]=CPU_OUT_OF_SPEC, [D]=DIE
[1199626.054984] Hardware name: IBM,8286-42A POWER8 (architected) 0x4b0201 0xf000004 of:IBM,FW860.90 (SV860_226) hv:phyp pSeries
[1199626.054992] NIP:  c000000000396cb4 LR: c000000000396e68 CTR: c000000000396e40
[1199626.054998] REGS: c00000018a4a7840 TRAP: 0380   Tainted: G S    D              (6.18.26-gentoo-ppc)
[1199626.055006] MSR:  8000000000009032 <SF,EE,ME,IR,DR,RI>  CR: 44002242  XER: 20000000
[1199626.055023] CFAR: c000000000396e64 IRQMASK: 0 
                 GPR00: c0003d00002b0acc c00000018a4a7ae0 c0000000018ad100 fffffffffffffff0 
                 GPR04: fffffffffffffff0 0000000000000001 c000000637bc0c28 0000000000000000 
                 GPR08: 0000001ffd4cd000 c0003f0000000000 0000000000000000 c0003d00002b48a8 
                 GPR12: c000000000396e40 c00000002ec44800 0000000000000000 0000000000000000 
                 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
                 GPR20: 0000000000000000 0000000000000000 0000000000000000 00000001000300c8 
                 GPR24: 000000010002efe0 0000000000000000 0000000000000001 c00000011d91a480 
                 GPR28: 0000000000000000 c00000000271cd00 c0003d00002b80b8 c0403effffffffc0 
[1199626.055106] NIP [c000000000396cb4] __ksize+0x34/0x190
[1199626.055117] LR [c000000000396e68] kfree_sensitive+0x28/0x80
[1199626.055124] Call Trace:
[1199626.055127] [c00000018a4a7ae0] [c00000018a4a7b20] 0xc00000018a4a7b20 (unreliable)
[1199626.055136] [c00000018a4a7b10] [c00000018a4a7b80] 0xc00000018a4a7b80
[1199626.055143] [c00000018a4a7b40] [c0003d00002b0acc] nx_crypto_ctx_shash_exit+0x24/0x60 [nx_crypto]
[1199626.055154] [c00000018a4a7b70] [c00000000097af78] crypto_shash_exit_tfm+0x28/0x40
[1199626.055165] [c00000018a4a7b90] [c00000000096f168] crypto_destroy_tfm+0x98/0x140
[1199626.055176] [c00000018a4a7bd0] [c000000000978d60] crypto_exit_ahash_using_shash+0x20/0x40
[1199626.055186] [c00000018a4a7bf0] [c00000000096f168] crypto_destroy_tfm+0x98/0x140
[1199626.055196] [c00000018a4a7c30] [c000000000998b5c] hash_release+0x1c/0x30
[1199626.055207] [c00000018a4a7c50] [c000000000996f58] alg_sock_destruct+0x38/0x60
[1199626.055216] [c00000018a4a7c80] [c0000000010bed98] __sk_destruct+0x48/0x2b0
[1199626.055227] [c00000018a4a7cc0] [c0000000009970a8] af_alg_release+0x58/0xb0
[1199626.055237] [c00000018a4a7cf0] [c0000000010b3918] __sock_release+0x68/0x150
[1199626.055247] [c00000018a4a7d70] [c0000000010b3a20] sock_close+0x20/0x40
[1199626.055257] [c00000018a4a7d90] [c0000000004549b0] __fput+0x110/0x3a0
[1199626.055265] [c00000018a4a7de0] [c00000000044df48] sys_close+0x48/0xa0
[1199626.055275] [c00000018a4a7e10] [c000000000029d40] system_call_exception+0x140/0x2d0
[1199626.055284] [c00000018a4a7e50] [c00000000000c354] system_call_common+0xf4/0x258
[1199626.055295] ---- interrupt: c00 at 0x3ffff7def394
[1199626.055300] NIP:  00003ffff7def394 LR: 00003ffff7def3f0 CTR: 0000000000000000
[1199626.055305] REGS: c00000018a4a7e80 TRAP: 0c00   Tainted: G S    D              (6.18.26-gentoo-ppc)
[1199626.055312] MSR:  800000000280f032 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI>  CR: 24002242  XER: 00000000
[1199626.055334] IRQMASK: 0 
                 GPR00: 0000000000000006 00003fffffffb820 00003ffff7f87100 0000000000000003 
                 GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
                 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
                 GPR12: 0000000000000000 00003ffff7ff37e0 0000000000000000 0000000000000000 
                 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
                 GPR20: 0000000000000000 0000000000000000 0000000000000000 00000001000300c8 
                 GPR24: 000000010002efe0 0000000000000000 0000000000000001 0000000100030070 
                 GPR28: 00003fffffffc078 0000000000000002 00003ffff7f802d8 00000001000300c8 
[1199626.055412] NIP [00003ffff7def394] 0x3ffff7def394
[1199626.055417] LR [00003ffff7def3f0] 0x3ffff7def3f0
[1199626.055421] ---- interrupt: c00
[1199626.055425] Code: 38426480 28230010 418200c4 3d2200df fbe1fff8 f821ffd1 787fa402 7c641b78 3929c820 7bff3664 e9290000 7fe9fa14 <e95f0008> 71480001 408200a4 895f0030 
[1199626.055456] ---[ end trace 0000000000000000 ]---

[1199626.059073] note: hardlink[3545486] exited with irqs disabled
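
A trace like the one above is easier to skim with the timestamps and addresses stripped. Here is a rough sketch of a filter that keeps only the symbol names (plus the module tag, when there is one); oops_symbols is a made-up name, and the pattern is only fitted to the line format shown in this particular oops.

```shell
#!/bin/sh
# oops_symbols: read kernel log lines on stdin and print only the
# "symbol+0xOFF/0xLEN" frames, reduced to the symbol name and any
# trailing "[module]" tag. Frames without a symbol (bare addresses)
# are dropped. Hypothetical helper, tuned to the oops format above.
oops_symbols() {
    sed -n 's|^.*] \([A-Za-z_][A-Za-z0-9_.]*\)+0x[0-9a-f]*/0x[0-9a-f]*\(.*\)$|\1\2|p'
}

# Example (not run here):
#   dmesg | oops_symbols
```

Run over this oops, the NIP and LR lines come out as __ksize and kfree_sensitive, followed by the frames from nx_crypto_ctx_shash_exit [nx_crypto] down to the syscall entry.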

Oh no. Why is hardlink causing a kernel oops, and why is it doing it in the crypto subsystem? The kernel’s new nemesis AF_ALG shows up, best known for… Copy Fail. This almost certainly isn’t Copy Fail, but I wouldn’t be surprised if the fix for Copy Fail introduced a regression. Let’s figure out why hardlink is even using this. Instead of gdb, let’s try putting it under strace in our hacked-up dracut. At the end of our horribly long syscall trace:

close(5)                                = 0
close(0)                                = 0
close(0)                                = -1 EBADF (Bad file descriptor)
close(4)                                = 0
close(3)                                = ?
+++ killed by SIGSEGV +++
/usr/bin/dracut: line 3128: 3561819 Segmentation fault         (core dumped) strace hardlink "$initdir"

Well, the kernel oopsed in the middle of the close syscall, giving us a very funny SIGSEGV. What’s the last thing that created an fd #3?

socket(AF_ALG, SOCK_SEQPACKET, 0)       = 3
bind(3, {sa_family=AF_ALG, salg_type="hash", salg_feat=0, salg_mask=0, salg_name="sha256"}, 88) = 0
accept(3, NULL, NULL)                   = 4
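
Scrolling back through a long trace by hand gets old quickly, so here is a sketch of how that lookup could be automated against a saved strace log. The last_fd_origin name and the list of fd-creating syscalls are assumptions made for this example; the list is deliberately short and by no means exhaustive.

```shell
#!/bin/sh
# last_fd_origin LOG FD
# Prints the last fd-creating syscall that returned FD before the
# close(FD) that never returned ("= ?"). Only a hand-picked set of
# syscalls is considered, so a read() that happens to return the same
# number is not mistaken for the origin.
last_fd_origin() {
    awk -v fd="$2" '
        /^(open|openat|creat|socket|accept|accept4|dup|dup2|dup3)\(/ &&
            $0 ~ ("= " fd "[ \t]*$") { origin = $0 }
        index($0, "close(" fd ")") == 1 && /= [?][ \t]*$/ { print origin; exit }
    ' "$1"
}

# Example (not run here), assuming the trace was saved with strace -o:
#   last_fd_origin /tmp/hardlink.strace 3
```

Because it remembers only the most recent match, earlier open and close cycles that reused the same descriptor number are ignored.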

Oh great, it actually is using AF_ALG. Why would something that makes hardlinks want to use the kernel’s buggy crypto acceleration path? It’s not IPsec, after all. Well, if we look for AF_ALG in util-linux, the package where hardlink comes from, there’s a utility function for file comparisons (in lib/fileeq.c). If we look at the first big comment:

/*
 * compare file contents
 *
 * The goal is to minimize amount of data we need to read from the files and be
 * ready to compare large set of files, it means reuse the previous data if
 * possible. It never reads entire file if not necessary.
 *
 * The other goal is to minimize number of open files (imagine "hardlink /"),
 * the code can open only two files and reopen the file next time if
 * necessary.
 *
 * This code supports multiple comparison methods. The very basic step which is
 * generic for all methods is to read and compare an "intro" (a few bytes from
 * the beginning of the file). This intro buffer is always cached in 'struct
 * ul_fileeq_data', this intro buffer is addressed as block=0. This primitive
 * thing can reduce a lot ...
 *
 * The next steps depend on selected method:
 *
 *  * memcmp method: always read data to userspace, nothing is cached, directly
 *  compare file contents; fast for small sets of small files.
 *
 *  * Linux crypto API: zero-copy method based on sendfile(), data blocks are
 *  sent to the kernel hash functions (sha1, ...), and only hash digest is read
 *  and cached in userspace. Fast for large set of (large) files.
 *
 * [...]
 */

Cool. It’s an optimization path that exposes a fragile kernel subsystem to just do… hashing. The actual bit that sets up the socket in that file is in init_crypto_api, and the logic to use it is gated behind a USE_FILEEQ_CRYPTOAPI define. Since there’s a fallback, can we easily disable this and use the memcmp behaviour instead, which certainly should be OK? Well, if we check include/fileeq.h, which exposes the API surface for that:

#if defined(__linux__) && defined(HAVE_LINUX_IF_ALG_H)
# define USE_FILEEQ_CRYPTOAPI 1
#endif

Nice, it’s effectively hardcoded: on Linux, whenever the build detects the linux/if_alg.h header (which any reasonably recent kernel provides), the crypto API path gets compiled in. There are no build system options (and thus no USE flags either). Well, let’s just turn it off. Since I’m using Gentoo on this CI server, it’s trivial to fix this. Put a patch containing the below into /etc/portage/patches/sys-apps/util-linux/no-af-alg.patch and rebuild the package with emerge -av sys-apps/util-linux:

diff --git a/include/fileeq.h b/include/fileeq.h
index 90b8d5118..e4d2dfae2 100644
--- a/include/fileeq.h
+++ b/include/fileeq.h
@@ -11,7 +11,7 @@
 #include <stdbool.h>

 #if defined(__linux__) && defined(HAVE_LINUX_IF_ALG_H)
-# define USE_FILEEQ_CRYPTOAPI 1
+#// define USE_FILEEQ_CRYPTOAPI 1
 #endif

 /* Number of bytes from the beginning of the file we always

With our strace invocation still in our hacked up dracut, let’s run it again:

close(3)                                = 0                             
close(0)                                = 0                             
close(0)                                = -1 EBADF (Bad file descriptor)
fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0x2), ...}) = 0                                                                                    
write(1, "Mode:                     real\n", 31Mode:                     real                                                                            
) = 31                                                                      
write(1, "Method:                   memcmp"..., 33Method:                   memcmp                                                                       
) = 33                                                                                                                                                   
write(1, "Files:                    1038\n", 31Files:                    1038                                                                            
) = 31                                                                                                                                                   
write(1, "Linked:                   3 file"..., 34Linked:                   3 files                                                                      
) = 34                                                                      
write(1, "Compared:                 0 xatt"..., 35Compared:                 0 xattrs                                                                     
) = 35                                                                      
write(1, "Compared:                 416 fi"..., 36Compared:                 416 files                                                                    
) = 36                                                                                                                                                   
write(1, "Saved:                    5.74 K"..., 35Saved:                    5.74 KiB                                                                     
) = 35                                                                                                                                                   
write(1, "Duration:                 1.0702"..., 43Duration:                 1.070276 seconds                                                             
) = 43                                                                      
exit_group(0)                           = ?                                                                                                              
+++ exited with 0 +++                                                       
dracut[I]: *** Hardlinking files done *** 

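Yay, it works. Comparing the two traces, the original run also seems to have bombed out near the very end, during cleanup and just before the final report, so the hardlinking itself had probably already been done and the install probably would have worked all along. Now I need to figure out why the kernel oopsed at all…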

(I also suspect that in the efforts to defang AF_ALG, kernel people will stop it from being zero-copy in the future, which would make util-linux’s use of it pointless. Patches to remove AF_ALG from util-linux may be a good idea.)
