Solved Portupgrade Building Rust stopped with signal: 11 (SIGSEGV) (core dumped)

Thanks. It seems the dump doesn't include the .text segments, so it's not possible to tell what the faulty (privileged) instruction was.
Out of curiosity I'll try to replicate on my old 13.x VM.
 
It would probably help to call gdb with the executable as the first argument.
Could be. It's not necessary to have the actual binary when analyzing a dump (.text can be included in the dump), but it wouldn't hurt to test with it too.

Rob215x, could you please try running gdb as gdb /usr/ports/lang/rust/work/_build/x86_64-unknown-freebsd/stage1/bin/rustc /path/to/core?
And then run the gdb commands. It could be that the text segment is not included in the core dump.
 
So, backup machine has 64G RAM and I have physical access to it.
Live machine has 32G RAM and I do not have physical access to it.

Both are set up mainly to serve web pages. Apache, PHP, and MySQL are customized, so I build those from ports. And so I still use portupgrade to update everything.

Idk if that background helps any or not. Like I said, I've never used gdb before so I installed it for this thread and I'm just copy/pasting your suggestions.
 

UPDATE: I got something new...

# gdb /usr/ports/lang/rust/work/_build/x86_64-unknown-freebsd/stage1/bin/rustc rustc.core
GNU gdb (GDB) 15.1 [GDB v15.1 for FreeBSD]
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd13.5".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/ports/lang/rust/work/_build/x86_64-unknown-freebsd/stage1/bin/rustc...
[New LWP 158801]
[New LWP 117833]
[New LWP 158414]
[New LWP 158415]
[New LWP 158417]
[New LWP 158521]
[New LWP 158522]
[New LWP 158802]

warning: Could not load shared library symbols for [vdso].
Do you need "set solib-search-path" or "set sysroot"?
Core was generated by `/usr/ports/lang/rust/work/_build/x86_64-unknown-freebsd/stage1/bin/rustc --crate'.
Program terminated with signal SIGILL, Illegal instruction.
Privileged opcode.
#0 0x00003abd41e970b8 in llvm::SelectionDAG::getVTList(llvm::EVT) ()
from /usr/ports/lang/rust/work/_build/x86_64-unknown-freebsd/stage1/lib/librustc_driver-44d6690783209a73.so
[Current thread is 1 (LWP 158801)]

 
Idk if that background helps any or not. Like I said, I've never used gdb before
You are using it correctly; yes, it's better to pass the binary (first argument) if available, I just didn't pick up on it. cracauer@ is helping you make it work; I was just curious to see why it failed.
So far I have not been able to reproduce it; I got a few different errors during the build but not this one (I'm not a rust user).

When using gdb you generally want to pass two arguments: first the binary (the executable that failed) and then the core dump. You are missing the actual core dump in your second argument. I specified it as /path/to/corefile because I didn't know where your rustc.core was.
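
The two-argument form can also be run non-interactively. Below is a minimal sketch, not a definitive recipe: analyze_core and the GDB override variable are hypothetical names used only for illustration, while -batch, -ex, bt and x/i $pc are standard gdb options and commands.

```shell
#!/bin/sh
# Sketch: print a backtrace and the faulting instruction from a core dump.
# analyze_core and the GDB override are hypothetical names, not part of gdb.
analyze_core() {
    binary=$1
    core=$2
    # Both the executable and the core file must exist before calling gdb.
    if [ ! -e "$binary" ] || [ ! -e "$core" ]; then
        echo "usage: analyze_core /path/to/binary /path/to/core" >&2
        return 1
    fi
    # -batch makes gdb exit after running the -ex commands;
    # "x/i $pc" prints the instruction at the program counter.
    "${GDB:-gdb}" -batch -ex 'bt' -ex 'x/i $pc' "$binary" "$core"
}
```

For the case in this thread it would be called as analyze_core /usr/ports/lang/rust/work/_build/x86_64-unknown-freebsd/stage1/bin/rustc rustc.core from the directory that holds rustc.core.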
 
See my updated reply. I just cd'd into the directory and passed rustc.core. I think there's some new info I haven't seen?
 
_martin I tried the command you suggested earlier...

Code:
(gdb) disass $pc-0x30, $pc+0x30
Dump of assembler code from 0x3abd41e97088 to 0x3abd41e970e8:
   0x00003abd41e97088 <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+248>:    mov    0x49000000(%rax),%eax
   0x00003abd41e9708e <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+254>:    mov    (%rsi),%esi
   0x00003abd41e97090 <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+256>:    mov    0x88(%rbx),%rdi
   0x00003abd41e97097 <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+263>:    call   0x3abd3fefd250 <_ZNSt3__127__tree_balance_after_insertB8sn190107IPNS_16__tree_node_baseIPvEEEEvT_S5_>
   0x00003abd41e9709c <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+268>:    incq   0x90(%rbx)
   0x00003abd41e970a3 <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+275>:    mov    %r15,%rax
   0x00003abd41e970a6 <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+278>:    add    $0x20,%rax
   0x00003abd41e970aa <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+282>:    mov    $0x1,%edx
   0x00003abd41e970af <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+287>:    pop    %rbx
   0x00003abd41e970b0 <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+288>:    pop    %r12
   0x00003abd41e970b2 <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+290>:    pop    %r14
   0x00003abd41e970b4 <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+292>:    pop    %r15
   0x00003abd41e970b6 <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+294>:    pop    %rbp
   0x00003abd41e970b7 <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+295>:    ret
=> 0x00003abd41e970b8 <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+296>:    ud2
   0x00003abd41e970ba <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+298>:    lea    0x5317467(%rip),%rdi        # 0x3abd471ae528 <_ZGVZN4llvm6SDNode16getValueTypeListENS_3MVTEE13SimpleVTArray>
   0x00003abd41e970c1 <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+305>:    mov    %esi,%ebx
   0x00003abd41e970c3 <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+307>:    call   0x3abd46c4a380 <__cxa_guard_acquire@plt>
   0x00003abd41e970c8 <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+312>:    mov    %ebx,%esi
   0x00003abd41e970ca <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+314>:    test   %eax,%eax
   0x00003abd41e970cc <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+316>:    je     0x3abd41e96faf <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+31>
   0x00003abd41e970d2 <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+322>:    call   0x3abd41ed41c0 <_ZN12_GLOBAL__N_18EVTArrayC2Ev>
   0x00003abd41e970d7 <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+327>:    lea    0x3d292(%rip),%rdi        # 0x3abd41ed4370 <_ZN12_GLOBAL__N_18EVTArrayD2Ev>
   0x00003abd41e970de <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+334>:    lea    0x531742b(%rip),%rsi        # 0x3abd471ae510 <_ZZN4llvm6SDNode16getValueTypeListENS_3MVTEE13SimpleVTArray>
   0x00003abd41e970e5 <_ZN4llvm12SelectionDAG9getVTListENS_3EVTE+341>:    lea    0x52aef24(%rip),%rdx        # 0x3abd47146010 <__dso_handle>
End of assembler dump.
 
Yep, this is better. The update you did above showed which function it failed in (llvm::SelectionDAG::getVTList). Whether execution reached the ud2 instruction on purpose or not is another story.
That does satisfy my curiosity, thanks.
I was trying to replicate this on my VM but I'm not able to hit it (I was dealing with other compilation issues that I haven't resolved yet).
 
I was not able to compile rust on my old 13.x VM; I decided to use a fresh 13.5 VM instead.

While rust is being compiled (I threw an i7-10700K, 12 cores/32GB mem, at it; it will take some time) I had another look at your question.
Here you listed a few core files. I'd say it's safe to ignore those in the test/API/ directory; those core files are the result of a test.

But you do have two core files which are due to the failed compilation:
Code:
/usr/ports/lang/rust/work/rustc-1.91.1-src/cargo.core
/usr/ports/lang/rust/work/rustc-1.91.1-src/rustc.core

It's hard to determine what caused the crash from your initial post. The cargo build command most likely executed several commands. It could be that one ended in SIGSEGV and the other in SIGILL. Can you run file, e.g.
Code:
file /usr/ports/lang/rust/work/rustc-1.91.1-src/cargo.core
on both of them? It should tell you which command each one was generated from.
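
If there are many core files, the same check can be looped over all of them. A minimal sketch, assuming core files end in .core as they do in this thread; list_cores and describe_cores are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: find core files below a directory and ask file(1) about each one.
# list_cores and describe_cores are hypothetical names for illustration.
list_cores() {
    # Print every regular file ending in .core below the given directory.
    find "$1" -type f -name '*.core' | sort
}

describe_cores() {
    # file(1) reports the command line that produced each core file.
    list_cores "$1" | while IFS= read -r corefile; do
        file "$corefile"
    done
}
```

For example, describe_cores /usr/ports/lang/rust/work would cover both cargo.core and rustc.core, as well as the ones left behind by tests.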

Looking on GitHub I saw a few posts discussing LLVM using ud2 as a way of marking unreachable code for various reasons. One of those reasons was to handle OOM (out of memory).

edit: I was able to build it on a fresh 13.5 without a problem.
 
Thanks for looking into this!


# file /usr/ports/lang/rust/work/rustc-1.91.1-src/cargo.core
/usr/ports/lang/rust/work/rustc-1.91.1-src/cargo.core: ELF 64-bit LSB core file, x86-64, version 1 (FreeBSD), FreeBSD-style, from '/usr/ports/lang/rust/work/bootstrap/bin/cargo build --target x86_64-unknown-free', pid=4918

# file /usr/ports/lang/rust/work/rustc-1.91.1-src/rustc.core
/usr/ports/lang/rust/work/rustc-1.91.1-src/rustc.core: ELF 64-bit LSB core file, x86-64, version 1 (FreeBSD), FreeBSD-style, from '/usr/ports/lang/rust/work/_build/x86_64-unknown-freebsd/stage1/bin/rustc --crate', pid=75997
 
random inexplicable segfaults building large programs can be caused by bad RAM. have you done a full overnight memtest?
I have not. I've been using this machine since July 2021. Obviously memory can go bad but isn't that kind of rare?

The next question is... wouldn't I have to kill all the applications to run memtest?
 

memtest86+ is something you boot into, so yes everything is down.

But your problem is reproducible, so it is unlikely to be a problem with RAM or CPU.

Bad memory isn't that rare, it is just that when you don't have ECC memory you can't tell.
 
In the ancient DOS era, memory errors were almost reliably reproducible,
because it was "single task, low amount of memory".
Doing exactly the same thing usually caused the same error, as a specific memory cell was almost always used to store the same data/code.
So when the code reached a specific point, code / data was stored into the broken memory cell, causing exactly the same error.

But FreeBSD is a multi-user, multi-tasking OS. A specific broken memory cell is basically not always used for the same code / data, especially when some kind of ASLR is in use. But if you enable the detailed memory test at UEFI / legacy BIOS boot time, broken memory cells would be found there (if you're lucky enough, or if the misbehavior of the broken cell is "stable").

But the detailed memory tests in UEFI firmware / legacy BIOS are usually "single pass", so they cannot detect unstable errors; thus something like memtest, which repeatedly tests all installed memory, is needed.

Contamination of electrical contacts exposed to open air can cause random, frequent errors that are hard to reproduce reliably. In these cases,
cleaning the contacts (including repeatedly pulling out and reseating; not limited to DIMMs) could help. Some (few) good user's manuals describe how to do it without breaking anything.
 
if you don't have ECC RAM we'd definitely run a memtest just to rule it out, but also we had a linux machine in early 200x that would reliably segfault while building gcc3 but not gcc 2.95, and it was due to bad RAM, we're obviously a little biased.
 
I agree with you in general T-Aoki but this:
especially when some kind of ASLR
It doesn't matter if ASLR is in place or not. ASLR is just virtual address randomization. You can map the same pfn into different vaddr.

While a HW issue is always a possibility, I don't think this is the case.

Rob215x, I can't reproduce the issue you're having. Doing remote debugging like this is hard and unfruitful. On a fresh VM I had no issues. If you need rust, it might be worth considering whether redeployment is an option.
 
It doesn't matter if ASLR is in place or not. ASLR is just virtual address randomization. You can map the same pfn into different vaddr.
Yes, just a possibility, but depending on how much dummy space is added to the start address for randomization, one extra page (possibly more, depending on implementation) would be needed. Increasing the number of memory pages could increase the chance that a faulty memory cell gets allocated.
 
, extra one (possibly more? depending on implementation) pages would be needed additionally.
I don't agree. vaddr will be different, yes. That means different pagedir entry (index) will be used (page translation will land into different entry). But it will be the same amount of physical memory and it will still be the same pfn, i.e. same physical address.
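
To illustrate the point about page-table indices, here is a minimal sketch that splits a virtual address into its table indices and page offset. It assumes amd64 4-level paging with 4 KiB pages, and split_vaddr is a hypothetical helper name. It only shows how the virtual address selects table entries; the physical frame number is stored in the page-table entry and cannot be derived from the address itself.

```shell
#!/bin/sh
# Sketch: decompose an amd64 virtual address (assumed 4-level paging,
# 4 KiB pages) into page-table indices and the offset within the page.
# split_vaddr is a hypothetical name for illustration.
split_vaddr() {
    # Each table index is 9 bits wide; the page offset is the low 12 bits.
    printf 'pml4=%d pdpt=%d pd=%d pt=%d offset=0x%03x\n' \
        "$(( ($1 >> 39) & 0x1ff ))" \
        "$(( ($1 >> 30) & 0x1ff ))" \
        "$(( ($1 >> 21) & 0x1ff ))" \
        "$(( ($1 >> 12) & 0x1ff ))" \
        "$(( $1 & 0xfff ))"
}
```

For the faulting pc from the disassembly, split_vaddr 0x3abd41e970b8 prints pml4=117 pdpt=245 pd=15 pt=151 offset=0x0b8. If the same code were mapped 2 MiB higher, only the pd index would change, while the entry could still point at the same physical frame.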
 
I still don't favor the bad RAM theory. Certainly not enough to take down a production machine for 12 hours.

In my opinion the best next step is testing this /usr/local and /var/db/pkg on the backup machine that currently works and see whether the same tree is broken over there, too.
 
Well bad hardware (including bad RAM) could've caused corruption in a library needed by rustc. The error would be reproducible if that's the case.

I agree with cracauer@ that the likely culprit is a corrupt file. Have you tried cleaning out the work directory between builds?
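
Cleaning the work directory between builds can be sketched as follows, assuming the standard ports tree layout. clean_port and the PORT_MAKE override are hypothetical names for illustration; the clean target itself is part of the ports framework and removes the port's work directory.

```shell
#!/bin/sh
# Sketch: remove a port's work directory via the ports "clean" target.
# clean_port and PORT_MAKE are hypothetical names, not part of the ports tree.
clean_port() {
    portdir=$1
    # Refuse to run outside a directory that looks like a port.
    if [ ! -f "$portdir/Makefile" ]; then
        echo "no Makefile in $portdir" >&2
        return 1
    fi
    # make -C changes into the port directory before running the target.
    "${PORT_MAKE:-make}" -C "$portdir" clean
}
```

For this thread the call would be clean_port /usr/ports/lang/rust, run as root before starting the next build.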
 
In my opinion the best next step is testing this /usr/local and /var/db/pkg on the backup machine that currently works and see whether the same tree is broken over there, too.
Another place to compare between the affected and sane computers would be /usr/local/lib/compat/pkg. If there are more libraries on the affected computer, move the additional ones to a place where libraries are NOT looked for and try again.
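
One way to do that comparison, as a rough sketch: save a listing of the directory on each machine (for example with ls /usr/local/lib/compat/pkg > listing.txt), copy both listings to one place, and print the entries that exist only on the affected machine. only_on_affected and the listing file names are hypothetical.

```shell
#!/bin/sh
# Sketch: print entries present in the affected machine's listing but not
# in the sane machine's listing. only_on_affected is a hypothetical name.
only_on_affected() {
    sane_sorted=$(mktemp) || return 1
    affected_sorted=$(mktemp) || return 1
    # comm(1) needs sorted input.
    sort "$1" > "$sane_sorted"
    sort "$2" > "$affected_sorted"
    # -13 suppresses lines unique to the first file and lines common to both,
    # leaving only the lines unique to the second file.
    comm -13 "$sane_sorted" "$affected_sorted"
    rm -f "$sane_sorted" "$affected_sorted"
}
```

Usage would be only_on_affected sane.txt affected.txt; anything it prints is a candidate for moving aside before retrying the build.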
 
I have not cleared the work directory between builds. How would I do that?

Also, at this point, I tend to agree with the last reply from cracauer@. I just need to schedule some down time for the server.

I don't host many sites on this server but I do have my own websites, including poestories.com which is probably my best contribution to the world. It's a free site about American author Edgar Allan Poe and it is used by schools, teachers, and students worldwide. It was one of the first sites I built, in 2005. This Wednesday, Nov 19, 2025, the site had 35,000 unique visitors.
 