Can you share the coredump of any failing program? From that we'd see which instruction it actually hit (and where).
http://94.180.119.80/new1/bsdtar.core
For program X failing:
gdb `which X` X.core
disassemble
This should give you the instruction which failed, provided gdb (or lldb) works. If those fail, you might need to take them from the base distribution.
On another machine disassemble gives nothing. I am not too familiar with the tool.
$ gdb -core /mnt/yury/git-remote-https.core
GNU gdb (GDB) 14.1 [GDB v14.1 for FreeBSD]
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd14.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
[New LWP 100217]
Core was generated by `/usr/local/libexec/git-core/git-remote-https origin https://git.FreeBSD.org/src.'.
Program terminated with signal SIGILL, Illegal instruction.
Privileged opcode.
#0 0x000000082d38b3ec in ?? ()
(gdb) disassemble
No function contains program counter for selected frame.
(gdb) q
You need the binary which threw the core dump. Code space is not part of the core dump, but you get the program counter. The code comes from the binary; it HAS to be the same.
Use x/3i $pc to see which instruction was at $pc (0x000000082d38b3ec) at the time of the crash. Edit note: but of course the binary is needed, so run it as gdb /path/to/binary /path/to/core.
O.K. then I'll try it next time I get to the machine in question, maybe half a day later.
Privileged opcode in git???
Errrr what?
Last time I heard that, it was 30 years ago. How did that happen?
CMP Ra,RB
Beq .+2
CMP Ra,#32
This smells surprisingly close to the recent issue with C++ programs, which was caused by assertions in LLVM being implemented by inserting these opcodes in the error case.
Previously, even here on the forums, I helped people debug certain ports which were segfaulting due to an LLVM bug (that was on 32-bit systems; ebx was trashed, causing issues). An incorrect jump in code can land you in the middle of an instruction, resulting in an invalid opcode (the instruction decoded is something totally different).
Script started on Tue Jul 2 11:50:25 2024
Command: gdb /usr/bin/bsdtar
GNU gdb (GDB) 14.1 [GDB v14.1 for FreeBSD]
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd13.2".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/bsdtar...
(No debugging symbols found in /usr/bin/bsdtar)
(gdb) r --version
Starting program: /usr/bin/bsdtar --version
Program received signal SIGILL, Illegal instruction.
Privileged opcode.
0x0000000801a8c3ec in ?? () from /lib/libcrypto.so.30
(gdb) where
#0 0x0000000801a8c3ec in ?? () from /lib/libcrypto.so.30
#1 0x000000080103e10c in ?? () from /libexec/ld-elf.so.1
#2 0x0000000000000000 in ?? ()
(gdb) disassemble
No function contains program counter for selected frame.
(gdb) x/4i $pc
=> 0x801a8c3ec: (bad)
0x801a8c3f2: insb (%dx),%es:(%rdi)
0x801a8c3f3: insb (%dx),%es:(%rdi)
0x801a8c3f4: jbe 0x801a8c463
(gdb) x/12tb $pc
0x801a8c3ec: 01100101 01100010 01110101 01100111 00000000 00101110 01101100 01101100
0x801a8c3f4: 01110110 01101101 01011111 01100001
(gdb) c
Continuing.
Program terminated with signal SIGILL, Illegal instruction.
The program no longer exists.
(gdb) q
Command exit status: 0
Script done on Tue Jul 2 11:51:04 2024
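The bytes printed by `x/12tb $pc` in the bsdtar session above can be read as ASCII, which gives a quick hint whether the program counter is sitting on code or on data. A minimal sketch in Python, with the byte values copied from the transcript (the helper name `decode_bits` is made up for illustration):

```python
# Bytes printed by `x/12tb $pc` in the bsdtar session, copied verbatim.
bits = (
    "01100101 01100010 01110101 01100111 00000000 00101110 01101100 01101100 "
    "01110110 01101101 01011111 01100001"
)


def decode_bits(dump: str) -> str:
    """Turn space-separated 8-bit binary strings into text.

    Printable ASCII is kept, NUL is shown as \\0, anything else as \\xNN.
    """
    raw = bytes(int(b, 2) for b in dump.split())
    out = []
    for c in raw:
        if 32 <= c < 127:
            out.append(chr(c))
        elif c == 0:
            out.append("\\0")
        else:
            out.append(f"\\x{c:02x}")
    return "".join(out)


decoded = decode_bits(bits)
print(decoded)
```

The result is printable text with an embedded NUL terminator rather than anything resembling machine code, which fits the `(bad)` that gdb reports when asked to disassemble at that address.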
Script started on Tue Jul 2 11:51:52 2024
Command: gdb openssl
Reading symbols from openssl...
(No debugging symbols found in openssl)
(gdb) r version
Starting program: /usr/bin/openssl version
Program received signal SIGILL, Illegal instruction.
Privileged opcode.
0x00000008015f83ec in ?? () from /lib/libcrypto.so.30
(gdb) where
#0 0x00000008015f83ec in ?? () from /lib/libcrypto.so.30
#1 0x000000080111010c in ?? () from /libexec/ld-elf.so.1
#2 0x0000000000000000 in ?? ()
(gdb) disassemble
No function contains program counter for selected frame.
(gdb) x/4i $pc
=> 0x8015f83ec: (bad)
0x8015f83f2: insb (%dx),%es:(%rdi)
0x8015f83f3: insb (%dx),%es:(%rdi)
0x8015f83f4: jbe 0x8015f8463
(gdb) x/12tb $pc
0x8015f83ec: 01100101 01100010 01110101 01100111 00000000 00101110 01101100 01101100
0x8015f83f4: 01110110 01101101 01011111 01100001
(gdb) c
Continuing.
Program terminated with signal SIGILL, Illegal instruction.
The program no longer exists.
(gdb) q
Command exit status: 0
Script done on Tue Jul 2 11:52:25 2024
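The faulting addresses seen so far differ from run to run, but a shared library is mapped at a page-aligned base, so the offset within the page survives relocation. A small sketch that compares those offsets, using the addresses from the core file and the bsdtar and openssl sessions above (the 4 KiB page size is an assumption, although it is the usual one on amd64):

```python
# Faulting addresses copied from the sessions above.
crash_pcs = {
    "git-remote-https core": 0x000000082D38B3EC,
    "bsdtar": 0x0000000801A8C3EC,
    "openssl": 0x00000008015F83EC,
}

PAGE_SIZE = 0x1000  # assumption: 4 KiB pages

# Offset of each faulting address within its page.
offsets = {name: pc % PAGE_SIZE for name, pc in crash_pcs.items()}
for name, off in offsets.items():
    print(f"{name}: page offset {off:#x}")

same_offset = len(set(offsets.values())) == 1
print("same offset in every crash:", same_offset)
```

An identical page offset in every crash is consistent with all three programs faulting at the same spot in the same library, loaded at different base addresses. It does not prove that on its own.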
Script started on Tue Jul 2 11:52:54 2024
Command: gdb /usr/bin/c++
Reading symbols from /usr/bin/c++...
(No debugging symbols found in /usr/bin/c++)
(gdb) r -version
Starting program: /usr/bin/c++ -version
Program received signal SIGSEGV, Segmentation fault.
Address not mapped to object.
0x0000000002251971 in ?? ()
(gdb) where
#0 0x0000000002251971 in ?? ()
#1 0x0000000000000000 in ?? ()
(gdb) disassemble
No function contains program counter for selected frame.
(gdb) x/4i $pc
=> 0x2251971: testb $0x0,(%rax)
0x2251974: insl (%dx),%es:(%rdi)
0x2251975: ds add %al,(%rax)
0x2251978: add $0x0,%al
(gdb) x/12tb $pc
0x2251971: 11110110 00000000 00000000 01101101 00111110 00000000 00000000 00000100
0x2251979: 00000000 00000000 00000000 00000010
(gdb) c
Continuing.
Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb) q
Command exit status: 0
Script done on Tue Jul 2 11:53:29 2024
Script started on Tue Jul 2 11:55:13 2024
Command: gdb /usr/local/libexec/git-core/git-remote-https
Reading symbols from /usr/local/libexec/git-core/git-remote-https...
(No debugging symbols found in /usr/local/libexec/git-core/git-remote-https)
(gdb) r origin https://git.FreeBSD.org/src
Starting program: /usr/local/libexec/git-core/git-remote-https origin https://git.FreeBSD.org/src
Program received signal SIGILL, Illegal instruction.
Privileged opcode.
0x00000008013933ec in ?? () from /lib/libcrypto.so.30
(gdb) whre
Undefined command: "whre". Try "help".
(gdb) where
#0 0x00000008013933ec in ?? () from /lib/libcrypto.so.30
#1 0x000000080042110c in ?? () from /libexec/ld-elf.so.1
#2 0x00007fffffffd9c0 in ?? ()
#3 0x000000080042805f in ?? () from /libexec/ld-elf.so.1
#4 0x00000000002017a8 in ?? ()
#5 0x00000008013fe9b8 in ?? () from /usr/lib/libprivateheimipcc.so.11
#6 0x00007fffffffd9d8 in ?? ()
#7 0x00007fffffffda50 in ?? ()
#8 0x00000008013fe7f4 in ?? () from /usr/lib/libprivateheimipcc.so.11
#9 0x00000008013fe7f4 in ?? () from /usr/lib/libprivateheimipcc.so.11
#10 0x0000000800445c08 in ?? ()
#11 0x0000000000000003 in ?? ()
#12 0x00007fffffffda40 in ?? ()
#13 0x00000008004234ab in ?? () from /libexec/ld-elf.so.1
#14 0x0000000000000000 in ?? ()
(gdb) disassemble
No function contains program counter for selected frame.
(gdb) x/4i $pc
=> 0x8013933ec: (bad)
0x8013933f2: insb (%dx),%es:(%rdi)
0x8013933f3: insb (%dx),%es:(%rdi)
0x8013933f4: jbe 0x801393463
(gdb) x/10tx
0x8013933f6: 0x6464615f 0x67697372 0x45504f00 0x4c53534e
0x801393406: 0x5f6b735f 0x756c6176 0x504f0065 0x53534e45
0x801393416: 0x6b735f4c 0x6c65645f
(gdb) c
Continuing.
Program terminated with signal SIGILL, Illegal instruction.
The program no longer exists.
(gdb) q
Command exit status: 0
Script done on Tue Jul 2 11:56:13 2024
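The 32-bit words printed by `x/10tx` in the git-remote-https session can be turned back into bytes to see what follows the faulting address. A minimal sketch, assuming only that the words are stored little-endian (which is the case on amd64); the values are copied from the transcript:

```python
import struct

# 32-bit words printed by `x/10tx` in the git-remote-https session.
words = [
    0x6464615F, 0x67697372, 0x45504F00, 0x4C53534E,
    0x5F6B735F, 0x756C6176, 0x504F0065, 0x53534E45,
    0x6B735F4C, 0x6C65645F,
]

# Little-endian: the lowest byte of each word comes first in memory.
raw = b"".join(struct.pack("<I", w) for w in words)

# Split on NUL terminators to recover the individual strings.
strings = [s.decode("ascii") for s in raw.split(b"\x00")]
print(strings)
```

The memory right after the faulting address holds NUL-terminated names, the last one cut off by the end of the dump. That looks like a string table rather than executable code, which would agree with the "incorrect jump" reading of these crashes.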
Just untar the userland tarfile base.tar.gz.
Well, the native bsdtar is not working, so if transferring from another machine I could lose permissions and ownership, couldn't I? Via FAT flash or from DVD1 (it is unpacked there; I suppose they are the usr, sbin, lib, bin, libexec directories?)
The bsdtar from the CD or USB stick should work, no?
Well, from a live CD it should, since if booted into the corrupted system, bsdtar from other media still uses the non-working system libs.
bsdtar, its .core file and system libs it is linked to
From the output you shared we see it's not an unsupported instruction causing the SIGILL; most likely it's an incorrect jump. Most of the examples you showed had the issue in libcrypto, though the c++ one jumped to incorrect code "on its own". I'm assuming straight from ld.
To know why, we'd need the binary and its core dump. bsdtar would be a good example. The simpler the command the better, especially if you don't have debug symbols.
When it comes to the system itself, I'm not sure what your goal is. If you want to rescue the system, I'd boot from older media and recover the data.
To test FreeBSD 14.x on this CPU, I'd live boot it from USB to see if it behaves the same.
Thanks for sharing that. It does confirm what I mentioned above: this illegal opcode is not caused by the compiler using an unsupported instruction. In the example you shared, libcrypto is the problematic library; i.e. if I use all of the libraries you provided except libcrypto, I'm able to execute your bsdtar just fine.
The problem occurs early in the init phase of the ELF binary (rtld init is screwed up). Why? I don't know. It's a bit strange that this would happen to just some of the libraries during compilation.
When I check:
# readelf -e libcrypto.so.30 |grep -w init
02 .text .init .fini .plt
[15] .init PROGBITS 00000000004043ec 004033ec
# gdb -q libcrypto.so.30 -ex 'x/3i 0x4043ec' -ex q
Reading symbols from libcrypto.so.30...
(No debugging symbols found in libcrypto.so.30)
0x4043ec: (bad)
0x4043f2: ins BYTE PTR es:[rdi],dx
0x4043f3: ins BYTE PTR es:[rdi],dx
#
It might be worth purging /usr/src, fetching a fresh tree and recompiling. If you hit the problem again, it might be worth looking deeper. If not, it may have happened because the tree was not in a 100% consistent state (speculation).
Well, I've cloned a new source tree, removed the object files, and replaced the base files by untarring base.txz from the 14.1 release.
make buildworld ended with this error:
--- all_subdir_lib/clang ---
--- llvm/Frontend/OpenMP/OMP.h.inc ---
llvm-tblgen --gen-directive-decl -I /usr/src/contrib/llvm-project/llvm/include -d llvm/Frontend/OpenMP/OMP.h.inc.d -o llvm/Frontend/OpenMP/OMP.h.inc /usr/src/contrib/llvm-project/llvm/include/llvm/Frontend/OpenMP/OMP.td
Illegal instruction (core dumped)
*** [llvm/Frontend/OpenMP/OMP.h.inc] Error code 132
make[6]: stopped in /usr/src/lib/clang/libllvm
1 error
make[6]: stopped in /usr/src/lib/clang/libllvm
make[5]: stopped in /usr/src/lib/clang
make[4]: stopped in /usr/src/lib
--- all_subdir_lib/libbluetooth ---
make[4]: stopped in /usr/src/lib
make[3]: stopped in /usr/src
make[2]: stopped in /usr/src
2091.16 real 3715.31 user 353.27 sys
make[1]: stopped in /usr/src
make: stopped in /usr/src
Command exit status: 2
Script done on Thu Jul 4 20:32:24 2024
gdb says it doesn't understand this llvm-tblgen file. So I had to use lldb, and now it gave me SIGILL as expected.
Script started on Fri Jul 5 13:35:54 2024
Command: lldb /usr/obj/usr/src/amd64.amd64/tmp/legacy/bin/llvm-tblgen --gen-directive-decl -I /usr/src/contrib/llvm-project/llvm/include -d llvm/Frontend/OpenMP/OMP.h.inc.d -o llvm/Frontend/OpenMP/OMP.h.inc /usr/src/contrib/llvm-project/llvm/include/llvm/Frontend/OpenMP/OMP.td
(lldb) target create "/usr/obj/usr/src/amd64.amd64/tmp/legacy/bin/llvm-tblgen"
Current executable set to '/usr/obj/usr/src/amd64.amd64/tmp/legacy/bin/llvm-tblgen' (x86_64).
(lldb) settings set -- target.run-args " --gen-directive-decl -I /usr/src/contrib/llvm-project/llvm/include -d llvm/Frontend/OpenMP/OMP.h.inc.d -o llvm/Frontend/OpenMP/OMP.h.inc /usr/src/contrib/llvm-project/llvm/include/llvm/Frontend/OpenMP/OMP.td"
(lldb) r
Process 24578 launched: '/usr/obj/usr/src/amd64.amd64/tmp/legacy/bin/llvm-tblgen' (x86_64)
Process 24578 stopped
* thread #1, name = 'llvm-tblgen', stop reason = signal SIGILL: privileged opcode
frame #0: 0x000000000065e632 llvm-tblgen`___lldb_unnamed_symbol0 + 3695410
llvm-tblgen`___lldb_unnamed_symbol0:
-> 0x65e632 <+3695410>: vdivsd -0x35400000(%rip), %xmm8, %xmm15
0x65e63a <+3695418>: addb (%rax), %al
0x65e63d <+3695421>: addb %al, (%rax)
0x65e63f <+3695423>: addb %bl, (%rdx)
(lldb) x/10x $pc
0x0065e632: 0x3d5e3fc5 0xcac00000 0x00000248 0x001a0000
0x0065e642: 0x00000000 0xe1d80000 0x3d5e3fc5 0xe1d80000
0x0065e652: 0x3d5e3fc5 0xcac00000
(lldb) c
Process 24578 resuming
Process 24578 exited with status = 4 (0x00000004)
(lldb) q
Command exit status: 0
Script done on Fri Jul 5 13:36:29 2024
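To see how lldb arrives at `vdivsd` from the raw bytes, the first eight bytes of the `x/10x $pc` dump above can be decoded by hand. The sketch below covers only the 2-byte VEX prefix form (first byte 0xC5) and only the fields needed here; the function name `decode_vex2` and the dictionary keys are made up for illustration.

```python
import struct

# First two words from `x/10x $pc` in the lldb session, copied verbatim.
words = [0x3D5E3FC5, 0xCAC00000]
code = b"".join(struct.pack("<I", w) for w in words)  # little-endian memory order


def decode_vex2(insn: bytes) -> dict:
    """Decode the fields of an instruction that starts with a 2-byte VEX prefix."""
    if insn[0] != 0xC5:
        raise ValueError("not a 2-byte VEX prefix")
    p = insn[1]
    r = ((p >> 7) & 1) ^ 1        # R is stored inverted; it extends ModRM.reg
    vvvv = (~(p >> 3)) & 0xF      # first source register, stored inverted
    pp = p & 0b11                 # implied prefix: 0 none, 1 = 66, 2 = F3, 3 = F2
    opcode = insn[2]
    modrm = insn[3]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    fields = {
        "opcode": opcode,
        "pp": pp,
        "src1": f"xmm{vvvv}",
        "dest": f"xmm{reg | (r << 3)}",
        "rip_relative": mod == 0 and rm == 5,
    }
    if fields["rip_relative"]:
        # A signed 32-bit displacement follows the ModRM byte.
        fields["disp32"] = struct.unpack("<i", insn[4:8])[0]
    return fields


fields = decode_vex2(code)
print(fields)
print(hex(fields["disp32"]))
```

Opcode 0x5E with the F2 prefix is DIVSD, and its VEX-encoded form is `vdivsd`, so the decoded fields match the line lldb printed. That only shows the bytes are a valid AVX encoding; it does not show they were meant to be executed. The same four bytes, c5 3f 5e 3d, appear again at 0x65e64a and 0x65e652 in the dump, which may point to data rather than code.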
And really, vdivsd is an AVX instruction, but my processor is only SSSE3! Nowhere do I see ARCHLEVEL set; now I'll try to set it explicitly to baseline.
Do you happen to have CPUTYPE or something else specifying CPU instruction sets in, for example, /etc/make.conf and/or /etc/src.conf? If something there mistakenly specifies a CPU, a mismatched ARCHLEVEL could be auto-detected and used, though I doubt it.
And if /usr/obj/ (the default place) is on ZFS, creating a snapshot of it and then deleting its whole contents would help for clean builds; if needed again, you can roll back to the snapshot. Of course, that clearly costs disk space.
I would start without a custom make.conf, i.e. without CPUTYPE optimization.
Your post is a bit messy (color escape sequences); pay attention to the disassembly output. It's bogus. While your CPU may not support vdivsd, and indeed gdb shows that as the first instruction, the little output after that instruction doesn't make much sense. Could it be suffering from the same issue you saw with libcrypto? Possibly.
I suggest you start with a proper fresh install. While unpacking base may have worked, did you check that all libs were overwritten by tar, including the ones with schg set (chflags(1))?
Don't use any optimization; start with an empty (and/or non-existent) make.conf and src.conf.
That last attempt was without any CPUTYPE set. I even brought back the debug files (I had WITHOUT_DEBUG_FILES in src.conf previously).
gdb from ports couldn't understand that file, so I had to use lldb. Yes, a wrong jump or entry point is a possibility; I'm not sure how to investigate it.
[...] schg flag (when booted from live CD).
[...] freebsd-update and try to reproduce these steps again. If it fails you have a good base for opening a PR.