Illegal Instruction after 12.4 upgrade (i386).

Hi there. I have a WRAP board that was running 12.3 (i386). I recently upgraded it to 12.4, but unfortunately sudo doesn't work any more: it dies with a SIGILL (illegal instruction) in what looks to be /libexec/ld-elf.so.1.

I know technically sudo is a package/port, but I've re-installed the package a few times and no change and the backtrace looks to be in the installed libs. I've even built it from /usr/ports and the new binary also throws a core dump. The backtrace looks as follows:

Code:
# gdb ./sudo sudo.core
GNU gdb (GDB) 13.1 [GDB v13.1 for FreeBSD]
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "i386-portbld-freebsd12.4".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./sudo...
[New LWP 100295]
Core was generated by `./sudo id'.
Program terminated with signal SIGILL, Illegal instruction.
Privileged opcode.
#0  0x0040f590 in getenv ()
(gdb) where
#0  0x0040f590 in getenv ()
#1  0x2082d838 in ?? () from /lib/libc.so.7
#2  0x2082e45e in ?? () from /lib/libc.so.7
#3  0x2043ceef in ?? () from /libexec/ld-elf.so.1
#4  0x2043bbb5 in ?? () from /libexec/ld-elf.so.1
#5  0x2043961e in ?? () from /libexec/ld-elf.so.1

I've checked the libc and ld-elf shared libraries against the release and they both match the pristine i386 release in terms of size and sum. I suspect the next steps are to produce a libc and ld-elf with symbols but this isn't a development machine so it'll be a bit tricky. build-world would not be reasonable as the board has 512MB. I'm at a loss for why there's an illegal instruction in there and there's nothing untoward in ENV. I'll keep poking around, but I was hoping someone might have some suggestions.
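For reference, a comparison of that kind could be sketched roughly as below. This is only an illustration: the helper name compare_with_pristine and the /tmp/pristine directory are made up here, the member paths inside base.txz are an assumption, and cmp (a byte-for-byte comparison) stands in for the size-and-checksum check described above.

```shell
# Compare installed files against pristine copies kept under another root.
# Hypothetical helper: $1 is the directory holding the pristine tree, the
# remaining arguments are absolute paths of installed files to check.
compare_with_pristine() {
    pristine_root=$1
    shift
    status=0
    for path in "$@"; do
        # cmp -s is silent and only reports through its exit status.
        if cmp -s "$path" "$pristine_root$path"; then
            echo "match: $path"
        else
            echo "DIFFERS: $path"
            status=1
        fi
    done
    return "$status"
}

# Example (assumes base.txz from the 12.4 i386 release has been fetched and
# that its members are stored with a leading "./"):
# mkdir -p /tmp/pristine
# tar -xf base.txz -C /tmp/pristine ./lib/libc.so.7 ./libexec/ld-elf.so.1
# compare_with_pristine /tmp/pristine /lib/libc.so.7 /libexec/ld-elf.so.1
```

A non-zero return value means at least one file differs from the release copy.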
 
The fault happened in frame 0, in getenv(). One can never be 100% sure from a backtrace alone, but judging by the address (0x0040f590), this function lives inside the sudo binary itself rather than in a shared library; info proc map would confirm that. We also have the source code: sudo hooks this function (env_hooks.c).

Can you show the output of x/3i $pc after loading the core in gdb? That will give us an idea of what the illegal instruction is, although even then it may not be possible to tell how the error came to be. If you could share the sudo binary and the core dump here, I can take a closer look.

Did you try recompiling the sudo port after the upgrade? Did you try the one from packages too?

A few times when I helped somebody here with a similar problem on i386 ports, the cause was actually code incorrectly generated by clang (e.g. apache/php on i386).

I don't know why that is, but can you recompile sudo with gcc and test? I think it's worth a try.
Code:
pkg install gcc
export CC=/usr/local/bin/gcc
cd /usr/ports/security/sudo
make install clean
 
Wow. Thanks for the detailed response. I’m not much of a gdb user, hence the misunderstanding of the backtrace. I didn’t realise there were weird edge cases with clang. You’re right - that’s what was used to compile sudo. I’ll install gcc and retry as suggested. Thanks again.

I’ll also upgrade to 13.1 but will continue to investigate.
 
Yesterday I tested this on a 12.4 i386 VM: I compiled sudo normally using clang and all was OK. Maybe I should have kept the gcc test for later so as not to confuse you too much. I have seen edge cases where clang really was the problem and the solution was to use gcc, but in this case it may have been something else.

That's why I asked whether you did a proper upgrade and tried reinstalling the package, just to see if that helps. If it doesn't, sharing the binary and core dump would help a lot in troubleshooting this further.
Can you share the output of the gdb command x/4i $pc so we can see what the illegal instruction was?
 
Here are the instructions around the PC:

(gdb) x/4i $pc
=> 0x40f590 <getenv>:   endbr32
   0x40f594 <getenv+4>: push %ebp
   0x40f595 <getenv+5>: mov %esp,%ebp
   0x40f597 <getenv+7>: push %ebx

From what I can see, that's a 32-bit instruction so I don't know why it's illegal. I'll try recompiling with GCC.
 
Digging a bit further, the WRAP board uses an AMD Geode chip which doesn't seem to support that ENDBR32 instruction. I'm installing GCC and will try with that rather than clang.
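To check whether a given build of a binary contains this instruction at all, a rough byte scan can be sketched as follows. ENDBR32 is encoded as the four bytes f3 0f 1e fb, which agrees with the 4-byte gap between getenv and getenv+4 in the disassembly above. The function name count_endbr32 is made up for this illustration, and because this is a raw scan rather than a disassembly, data that happens to contain the same bytes is counted too.

```shell
# Count byte-aligned occurrences of the ENDBR32 encoding (f3 0f 1e fb) in a file.
# Hypothetical helper; a non-zero count only suggests the instruction is present.
count_endbr32() {
    # Dump every byte as two hex digits, join the dump into one line with
    # single spaces between bytes, then count the matching 4-byte sequences.
    od -An -v -tx1 "$1" |
        tr '\n' ' ' |
        tr -s ' ' |
        grep -o 'f3 0f 1e fb' |
        wc -l |
        tr -d ' '
}

# Example: scan the sudo binary in the current directory, if there is one.
if [ -f ./sudo ]; then
    count_endbr32 ./sudo
fi
```

If objdump is available, objdump -d ./sudo | grep endbr32 gives a disassembly-based view of the same thing. Both clang and GCC accept -fcf-protection=none, which is the switch that normally controls whether these markers are emitted; whether the sudo build on this system honours it has not been verified here.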
 
I didn't look up the details of your CPU, but yes, this instruction is only supported from certain CPU models onward (fairly old ones by now).
I actually opened a bug report with LLVM a year or so ago: somebody here on the forums had the same problem with an Intel P2 (I think) CPU.

You could play around with the -march/-mcpu flags and force the compiler to generate code for an even older CPU. Or test with gcc; that would probably be the fastest option.
 