mysql-server SIGILL on arm64/aarch64 with FreeBSD 14.1 for 8.0, 8.1, and 8.4 - no activity?

I first reported this as Bugzilla bug 280165 on July 6th for mysql80-server-8.0.35_1 and mysql81-server-8.1.0. On August 4th I added mysql84-server-8.4.0, reproduced starting from a brand-new image of the OS. The bug was assigned to someone, but I've heard nothing from them since. The daemon gets a SIGILL during startup, so it's pretty severe. I did some research and got stack traces. There's also a file ownership problem in the pkg. Is no one else experiencing this problem? Is there anything I can do to help move this along? What's the next step? I tried to reach out to Oracle, but all they would say is that they don't support FreeBSD.

I really need to resolve this because something very bizarre has happened on my FreeBSD 13.1 system. Even though the database is still up and running, the pkg and all its files (links) have disappeared. I assume the kernel keeps the inode alive as long as a running process references it, just as it does for an ordinary open file, because top/htop still shows /usr/local/libexec/mysqld even though "ls" can't see it. My only guess is that a pkg autoremove had unintended consequences. For the moment, I am afraid to reboot the system or touch it in any way, since I won't be able to restart the database. There's a fair chance I could reboot and reinstall mysql80-server, but it's not a risk I'm ready to take.
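The assumption above matches standard Unix semantics: removing a file's last directory entry does not free the inode while something still references it. As a minimal sketch (not from the thread), the C++ below uses an ordinary open descriptor as a stand-in for the reference a running mysqld holds on its own executable; the scratch path and the function name `survives_unlink` are hypothetical.

```cpp
#include <stdlib.h>   // mkstemp
#include <unistd.h>   // write, pread, unlink, access, close
#include <cstring>    // std::memcmp

// Returns true if data is still readable through an open descriptor after
// the file's only directory entry has been removed.
bool survives_unlink() {
    char path[] = "/tmp/unlink_demo_XXXXXX";   // hypothetical scratch file
    int fd = mkstemp(path);                    // creates and opens the file
    if (fd < 0) return false;

    const char msg[] = "still here";
    if (write(fd, msg, sizeof msg) != static_cast<ssize_t>(sizeof msg)) {
        close(fd);
        unlink(path);
        return false;
    }

    // Remove the file's only name, as pkg does when it deletes a binary.
    if (unlink(path) != 0) {
        close(fd);
        return false;
    }
    const bool name_gone = access(path, F_OK) != 0;   // "ls" can't see it now

    // The inode is still alive because fd still references it.
    char buf[sizeof msg] = {};
    const bool readable =
        pread(fd, buf, sizeof buf, 0) == static_cast<ssize_t>(sizeof buf) &&
        std::memcmp(buf, msg, sizeof msg) == 0;

    close(fd);   // last reference dropped: the kernel may now free the inode
    return name_gone && readable;
}
```

The same reasoning explains why the server keeps running: the data is only released once the last reference (here `fd`, in the real case the running process) goes away.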
 
This sounds like a good time to try and make a backup of your database(s)!

Do you still have the mysqldump utility?
 
Also, as long as you still have the datadir (by default /var/db/mysql/) then it's not such a big deal if you have lost the mysql packages.

You can always install the package(s) again without deleting your datadir. (Copy the datadir somewhere safe first, if you want to be sure. Just make sure mysql-server isn't running when you make the copy.)
 
Yes, I debated whether I should have mentioned that. Thanks for bringing it up, though - I appreciate your interest. I did use mysqldump to dump all the databases, and I do still have the data directory (I moved it to its own filesystem mounted as /mysql/mysql though that doesn't matter) - but not being 100% confident I can restart the server does make me a bit nervous.
 
SIGILL is the illegal instruction signal, of course.
So, you need the function name and the faulting instruction.
Maybe it's a miscompilation, maybe your CPU lacks a feature, etc.

BTW, your bugzilla link is not clickable ;)
 
Is this using packages, or are you compiling it via ports/poudriere?
If the latter, what's in /etc/make.conf, and are you passing anything else that's non-default?
 
Ah. The link is not clickable because it didn't occur to me to make it one. Here's the link: PR 280165
I am using packages (pkg). I do not have an environment set up to compile ports, but will certainly consider setting one up. I don't know how to get any more information out of the fault than there already is - the executable may be stripped, because there aren't many symbolic references in the stack dump. I am very open to suggestions. If you point me in the right direction, I'll build an environment to compile ports, but I assume there's value in getting the package fixed, too.

As an additional note: while my production instance has the database moved to /mysql/mysql, the test instance I set up still has it at /var/db/mysql. I tried to keep that one as "vanilla" as possible.
 
I cannot find any evidence of a core dump, even though kern.coredump = 1 and kern.corefile = %N.core, and I ran
find / -name '*.core' -print
I suspect this is because the program is catching the signal itself and calling its own stack trace routine. I'm not entirely sure how to run this under gdb, since it is started by an rc.d script, but I'll give that a try. Ah, in fact, I found an article:
7.9.1.4 Debugging mysqld under gdb
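The suspicion about the missing core is plausible: if a process catches SIGILL and then exits on its own after printing a stack trace, the default action (terminate with a core dump) never runs. mysqld has a --core-file option intended for this case. The C++ below is a hypothetical sketch, not mysqld's handler: it reports the faulting address from `si_addr` and still lets the kernel write a core, by installing the handler with SA_RESETHAND and simply returning. The names `report_sigill` and `install_sigill_reporter` are made up for illustration.

```cpp
#include <signal.h>   // sigaction, siginfo_t, SA_SIGINFO, SA_RESETHAND
#include <unistd.h>   // write, STDERR_FILENO
#include <cstddef>    // std::size_t
#include <cstdint>    // std::uintptr_t

// Hypothetical handler, NOT mysqld's: prints the faulting address using only
// async-signal-safe calls (no printf, no malloc).
static void report_sigill(int, siginfo_t* info, void*) {
    static const char hex[] = "0123456789abcdef";
    char buf[64] = "SIGILL at 0x";
    std::size_t len = 12;   // length of the prefix above
    const auto addr = reinterpret_cast<std::uintptr_t>(info->si_addr);
    for (int shift = static_cast<int>(sizeof addr * 8) - 4; shift >= 0; shift -= 4)
        buf[len++] = hex[(addr >> shift) & 0xf];
    buf[len++] = '\n';
    ssize_t ignored = write(STDERR_FILENO, buf, len);
    (void)ignored;
    // SA_RESETHAND already restored SIG_DFL. For a hardware fault, returning
    // re-executes the bad instruction, which now gets the default action
    // (terminate and dump core, subject to kern.coredump).
}

// Installs report_sigill for SIGILL; returns true on success.
bool install_sigill_reporter() {
    struct sigaction sa{};
    sa.sa_sigaction = report_sigill;
    sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGILL, &sa, nullptr) == 0;
}
```

Running the program under gdb, as the article describes, sidesteps the question entirely, because gdb stops at the signal before any handler runs.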
 
After a stupid typo that caused mysqld to run out of swap space, I was able to reproduce the problem under gdb. I modified /usr/local/etc/rc.d/mysql-server to capture the parameters it was being called with, and created a .gdb file with the options suggested in the MySQL reference manual section mentioned above. As I suspected from the lack of identifiers in the stack trace, there are no debugging symbols in the executable. I'll put the gdb output in the bug report, since it seems relevant, with all the gdb boilerplate edited out.

What's next?
 
Well, I suppose I can at least post the trapped fault itself from gdb here to save you having to go to the bug unless you need more detail.
Code:
Thread 3 received signal SIGILL, Illegal instruction.
Illegal trap.
[Switching to LWP 100262 of process 8872]
0x0000000003291848 in hardware::crc32_using_pclmul(unsigned char const*, unsigned long) ()
 
Oh. I thought "crc32_using_pclmul(unsigned char const*, unsigned long) ()" was the info we were looking for.
It disassembles into quite a bit of code. How much do you want? Here's what it shows near the fault address. But how can gdb disassemble something if it's an illegal instruction?
Code:
(gdb) disas
Dump of assembler code for function _ZN8hardware18crc32_using_pclmulEPKhm:
...
   0x0000000003291820 <+716>:   crc32cx w10, w10, x16
   0x0000000003291824 <+720>:   ldp     x15, x18, [x0, #320]
   0x0000000003291828 <+724>:   crc32cx w13, w13, x1
   0x000000000329182c <+728>:   ldp     x17, x16, [x0, #432]
   0x0000000003291830 <+732>:   fmov    d0, x10
   0x0000000003291834 <+736>:   crc32cx w13, w13, x15
   0x0000000003291838 <+740>:   crc32cx w10, w13, x18
   0x000000000329183c <+744>:   ldp     x13, x15, [x0, #448]
   0x0000000003291840 <+748>:   crc32cx w14, w14, x17
   0x0000000003291844 <+752>:   crc32cx w14, w14, x16
=> 0x0000000003291848 <+756>:   pmull   v0.1q, v1.1d, v0.1d
   0x000000000329184c <+760>:   fmov    d1, x10
   0x0000000003291850 <+764>:   crc32cx w10, w14, x13
   0x0000000003291854 <+768>:   ldp     x13, x14, [x0, #464]
   0x0000000003291858 <+772>:   crc32cx w10, w10, x15
   0x000000000329185c <+776>:   pmull   v1.1q, v2.1d, v1.1d
   0x0000000003291860 <+780>:   crc32cx w10, w10, x13
   0x0000000003291864 <+784>:   ldp     x15, x13, [x0, #480]
   0x0000000003291868 <+788>:   crc32cx w10, w10, x14
   0x000000000329186c <+792>:   eor     v0.16b, v1.16b, v0.16b
   0x0000000003291870 <+796>:   ldr     x14, [x0, #496]
   0x0000000003291874 <+800>:   add     x0, x0, #0x1f8
Should I put this in the bug report, or don't we know enough yet?
 
You can try to recompile mysql-server. Modify
databases/mysql84-server/files/patch-storage_innobase_ut_crc32.cc and replace
Code:
return capabilities & HWCAP_CRC32;
with
Code:
return false;
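Returning false disables the hardware CRC-32 path entirely, which avoids the SIGILL at some cost in speed. A narrower fix would test each feature the code actually uses. The C++ sketch below assumes the AArch64 HWCAP bit values from Linux's <asm/hwcap.h> (HWCAP_PMULL = 1 << 4, HWCAP_CRC32 = 1 << 7), which FreeBSD's <machine/elf.h> is believed to mirror. The enum, the function name, and the idea that MySQL has a separate CRC32-only code path are assumptions for illustration; on FreeBSD the mask itself would come from elf_aux_info(AT_HWCAP, ...).

```cpp
// Assumed AArch64 HWCAP bit values (Linux uapi <asm/hwcap.h>; FreeBSD's
// <machine/elf.h> is believed to use the same values). Verify before use.
constexpr unsigned long kHwcapPmull = 1UL << 4;
constexpr unsigned long kHwcapCrc32 = 1UL << 7;

// Hypothetical implementation choices.
enum class Crc32Impl { Software, Crc32Only, Crc32WithPmull };

// Hypothetical dispatcher: the fault in this thread came from PMULL being
// executed when only CRC32 support was checked, so gate each feature.
Crc32Impl select_crc32_impl(unsigned long hwcap) {
    const bool crc32 = (hwcap & kHwcapCrc32) != 0;
    const bool pmull = (hwcap & kHwcapPmull) != 0;
    if (crc32 && pmull) return Crc32Impl::Crc32WithPmull;
    if (crc32) return Crc32Impl::Crc32Only;
    return Crc32Impl::Software;
}
```

With a Cortex-A53 that reports CRC32 but not AES+PMULL, this would select `Crc32Only` instead of the PMULL path that faulted.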
 
Wow. A lot to process here. I'll try your suggestions, but it will take me a while. I've never tried to compile from ports before.

Should any of this be going into the bug report?

Does this give the information regarding features that you wanted?

May 31 11:25:25 generic kernel: CPU 0: ARM Cortex-A53 r0p4 affinity: 0
May 31 11:25:25 generic kernel: Cache Type = <64 byte D-cacheline,64 byte I-cacheline,VIPT ICache,64 byte ERG,64 byte CWG>
May 31 11:25:25 generic kernel: Instruction Set Attributes 0 = <CRC32>
May 31 11:25:25 generic kernel: Instruction Set Attributes 1 = <>
May 31 11:25:25 generic kernel: Instruction Set Attributes 2 = <>
May 31 11:25:25 generic kernel: Processor Features 0 = <AdvSIMD,FP,EL3 32,EL2 32,EL1 32,EL0 32>
May 31 11:25:25 generic kernel: Processor Features 1 = <>
May 31 11:25:25 generic kernel: Trying to mount root from ufs:/dev/ufs/rootfs [rw]...
May 31 11:25:25 generic kernel: Memory Model Features 0 = <TGran4,TGran64,SNSMem,BigEnd,16bit ASID,1TB PA>
May 31 11:25:25 generic kernel: Memory Model Features 1 = <8bit VMID>
May 31 11:25:25 generic kernel: Memory Model Features 2 = <32bit CCIDX,48bit VA>
May 31 11:25:25 generic kernel: Debug Features 0 = <DoubleLock,2 CTX BKPTs,4 Watchpoints,6 Breakpoints,PMUv3,Debugv8>
May 31 11:25:25 generic kernel: Debug Features 1 = <>
May 31 11:25:25 generic kernel: Auxiliary Features 0 = <>
May 31 11:25:25 generic kernel: Auxiliary Features 1 = <>
May 31 11:25:25 generic kernel: AArch32 Instruction Set Attributes 5 = <CRC32,SEVL>
May 31 11:25:25 generic kernel: AArch32 Media and VFP Features 0 = <FPRound,FPSqrt,FPDivide,DP VFPv3+v4,SP VFPv3+v4,AdvSIMD>
May 31 11:25:25 generic kernel: AArch32 Media and VFP Features 1 = <SIMDFMAC,FPHP DP Conv,SIMDHP SP Conv,SIMDSP,SIMDInt,SIMDLS,FPDNaN,FPFtZ>
May 31 11:25:25 generic kernel: CPU 1: ARM Cortex-A53 r0p4 affinity: 1
May 31 11:25:25 generic kernel: CPU 2: ARM Cortex-A53 r0p4 affinity: 2
May 31 11:25:25 generic kernel: CPU 3: ARM Cortex-A53 r0p4 affinity: 3
 
Your CPU doesn't have the AES+PMULL features, so getting a SIGILL is expected.
You can update the PR if the suggested fix works for you.
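To answer the same question on another board, check the feature list FreeBSD prints at boot. This hypothetical C++ helper tests whether a token appears in the <...> list of a dmesg line such as "Instruction Set Attributes 0 = <CRC32>". It assumes, as the reply implies, that a CPU with the Crypto Extension would list AES+PMULL in that same line; the function name is made up.

```cpp
#include <string>

// Returns true if `feature` appears as one comma-separated token inside the
// <...> list of a FreeBSD dmesg CPU-feature line.
bool dmesg_line_has_feature(const std::string& line, const std::string& feature) {
    const auto open = line.find('<');
    const auto close = line.rfind('>');
    if (open == std::string::npos || close == std::string::npos || close <= open)
        return false;
    const std::string list = line.substr(open + 1, close - open - 1);

    // Compare whole tokens so that, e.g., "CRC32" doesn't match a longer name.
    std::string::size_type start = 0;
    while (start <= list.size()) {
        auto comma = list.find(',', start);
        if (comma == std::string::npos) comma = list.size();
        if (list.compare(start, comma - start, feature) == 0) return true;
        start = comma + 1;
    }
    return false;
}
```

For the Cortex-A53 log in this thread, the "Instruction Set Attributes 0" line yields true for "CRC32" and false for "AES+PMULL".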
 
Right! It's a perfectly good instruction for ARM64 in general, just not for my particular silicon. That's why gdb can disassemble it. That and the code change you suggest make perfect sense. I'll be surprised if this doesn't fix it.

Unfortunately, this being my first attempt to compile anything from ports, I may have done some serious foot-shooting. I am following the instructions in Chapter 4 of the Handbook, after initially finding "Installing a Port on FreeBSD", which relies on the deprecated "portsnap" utility.

I pulled the git repository and checked out the branch, but I think I got "HEAD" rather than the one the pkg was built from, so I am compiling 8.4.2 rather than 8.4.0. I also changed some of the configuration options: I chose to link InnoDB statically (since that's all I use) and deselected NLS. All of this will make it SLIGHTLY less than an apples-to-apples comparison, assuming I ever get the darn thing compiled and linked.

It had an INSANE number of dependencies to build, and I swear we've compiled cmake several times. I suspect that is partly because I went with the latest version rather than 8.4.0, so the things I had installed from pkg didn't "count".

And, speaking of configuration options, is there a way to NOT get the dialog boxes? I probably would have finished this compile long ago if it didn't stop and pop up a dialog box asking about configuration options. With all these dependencies, I had NO idea what the config options should be for most of them, so for everything except mysqld itself I just left them alone. But that still means it crunches away for an unknown amount of time and then stops and waits for input from me, when I just want it to keep compiling.

Assuming this turns out to fix the problem, and I think there's a high probability that will be true, how can I make the update to the PR as helpful as possible? A diff of a diff? That doesn't seem very helpful.

BTW, for anyone who cares: it is doing a parallel make, so we're maxing out all four CPUs on the RPi. With the board mounted vertically along its long edge and the heatsinks also aligned vertically, the maximum CPU core temperature I have seen is 66.7C with totally passive cooling.
 
I'm concerned that something has gone drastically wrong. There are two instances of c++ that have each accumulated over five HOURS of CPU time. Other than that, virtually nothing is going on in the system: no disk I/O, memory only about half used, almost no page faults. I don't get it. The build did get interrupted last night (SIGHUP), and I tried to restart it by simply running "make install" again. Maybe I have to do a "make clean" and start over? I am quite baffled at the moment.
Code:
44989  0  IW+      0:00.00 make install
45059  0  IW+      0:00.00 make CONFIG_DONE_MYSQL84-SERVER=1 /usr/ports/databases/mysql84-server/work/.install_done.mysql._usr_local
48310  0  I+       0:00.00 /bin/sh -e -c (cd /usr/ports/databases/mysql84-server/work/.build; if ! /usr/bin/env -i HOME=/usr/ports/databases/mysql84-server/work  PWD="${PWD}"  __M
48311  0  S+       0:00.65 /usr/bin/make -f Makefile -j4 all
48314  0  IW       0:00.00 sh -ev
48316  0  S        0:01.02 /usr/bin/make -f CMakeFiles/Makefile2 all
50864  0  IW       0:00.00 sh -ev
50868  0  S        0:00.19 /usr/bin/make -f strings/CMakeFiles/strings_objlib.dir/build.make strings/CMakeFiles/strings_objlib.dir/build
50964  0  IW       0:00.00 sh -ev
50966  0  R      349:40.18 /usr/bin/c++ -DDISABLE_MYSQL_THREAD_H -DHAVE_CONFIG_H -DLZ4_DISABLE_DEPRECATE_WARNINGS -D_USE_MATH_DEFINES -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
51170  0  IW       0:00.00 sh -ev
51172  0  R      345:33.34 /usr/bin/c++ -DDISABLE_MYSQL_THREAD_H -DHAVE_CONFIG_H -DLZ4_DISABLE_DEPRECATE_WARNINGS -D_USE_MATH_DEFINES -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
 