Bash is broken, after upgrade to 5.2_3

FreeBSD 12.3, very up-to-date at RELEASE-p6. Two different machines, one i386, the other amd64. Did a package update today, which upgraded the bash package from 5.1_16 to 5.2_3. Problem is that the newly installed 5.2_3 version of bash doesn't work, immediately exits with "segmentation fault". On both machines, that was relatively easy to diagnose, by logging in on the toor account (which uses /bin/csh as a shell). On one machine, fixing this was relatively easy, since the older version of bash was still in /var/cache/pkg, so I could uninstall the current version and downgrade. On the second machine (where I'm a little disk-space constrained), I had unfortunately done "pkg clean", so downgrade is not possible; on that machine I have temporarily replaced all uses of /usr/local/bin/bash with /bin/sh, which at least allows me to log in and do productive work.

Won't have time to debug the root cause today (super busy), and the rest of the week is not looking good. So if you use bash, just be careful with package upgrades.
 
I've debugged this for somebody here. Please try LC_ALL=C bash and see if that works for you.

It was for this scenario here (static bash which had a failure in chroot); I suspect, though, that it might be a similar issue.
 
Thank you, might try to test that tonight, assuming we get home from orchestra rehearsal before my brain is completely used up for the day.
 
Had a short break, while colleagues took a snack break. Your suggestion works. You don't even need it in the invocation of bash (which after all is automated and comes from login); it's sufficient to add LC_ALL=C as the first line of .bashrc. That means some line in my .bashrc file triggers (tickles? titillates?) the bug in bash. This evening (again under brain survival assumptions) I will do some careful debugging of exactly which line in .bashrc causes the problem, and then open a PR and assign it to the maintainer of the bash port.

Thanks for the hint!
 
Yeah, LC_ALL=C bash syntax was just to do a PoC (proof of concept) that it is what I thought. There might be people though who do use LC* vars and this is not a solution for them.

You don't have a problem in your .bashrc, though; you may be using some sort of locale (even US UTF-8 counts as a locale). The locale command will tell you. But this is a bug in bash as distributed for FreeBSD. You are the first one to run into the actual trouble, AFAIK.
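To see which locale name actually ends up being used for LC_CTYPE, something along these lines may help. It is only a sketch: effective_ctype is a made-up helper name, and it simply applies the standard POSIX precedence (LC_ALL first, then LC_CTYPE, then LANG, else C). It does not check whether the name it prints is a valid locale.

```shell
# Print the locale name that applies to LC_CTYPE, following the POSIX
# precedence: LC_ALL, then LC_CTYPE, then LANG, else the default "C".
# effective_ctype is a hypothetical helper name, not a standard command.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' "C"
    fi
}

# Show the name for the current environment.
effective_ctype
```

Run it from a shell that still works (/bin/sh is fine) in the same environment that bash would inherit.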

In the thread I mentioned above, I was just trying to demonstrate that you can run bash-static without anything else. It turned out that bash-static compiled from ports is not the same as bash-static from binary packages (the latter actually drags in a lib dependency, which is a no-no).

I consider this a port bug. I did report several bugs this month, a few of them on FreeBSD, and got zero feedback. I'm starting to lose interest in reporting anything.
 
If you get some free time and/or your brain is still capable :) an easier test than going through your bash profile would be to create a new user with bash as the login shell and see whether it crashes or not.

For comparison my tests:
Code:
# LC_ALL=en_US.ISO8859-1 bash
[root@fbsd12 ~]#^D

# LC_ALL=en_AU.UTF-8 bash
[root@fbsd12 ~]#

# LC_ALL=en_AU bash
bash: warning: setlocale: LC_ALL: cannot change locale (en_AU): No such file or directory
Segmentation fault (core dumped)
#

# cksum /usr/local/bin/bash
194901059 1249368 /usr/local/bin/bash
#

# gdb -q `which bash` ./bash.core
Reading symbols from /usr/local/bin/bash...
(No debugging symbols found in /usr/local/bin/bash)
[New LWP 100123]
Core was generated by `bash'.
Program terminated with signal SIGSEGV, Segmentation fault.
Sent by kill() from pid 987 and user 0.
#0  kill () at kill.S:3
3    kill.S: No such file or directory.
(gdb) bt
#0  kill () at kill.S:3
#1  0x00000000002bae0d in termsig_handler ()
#2  0x00000000002ba8d3 in termsig_sighandler ()
#3  <signal handler called>
#4  strlen (str=0x0) at /usr/src/lib/libc/string/strlen.c:101
#5  0x0000000000320b09 in _rl_init_locale ()
#6  0x0000000000320b46 in _rl_init_eightbit ()
#7  0x00000000002fedf0 in rl_initialize ()
#8  0x00000000002c5d77 in initialize_readline ()
#9  0x000000000026a415 in ?? ()
#10 0x0000000000270a76 in ?? ()
#11 0x000000000026e0f4 in ?? ()
#12 0x00000000002687be in yyparse ()
#13 0x00000000002684ac in parse_command ()
#14 0x00000000002681c7 in read_command ()
#15 0x000000000026800e in reader_loop ()
#16 0x00000000002674ae in main ()
(gdb) f 5
#5  0x0000000000320b09 in _rl_init_locale ()
(gdb) disass $pc-0x10, $pc+0x10
Dump of assembler code from 0x320af9 to 0x320b19:
   0x0000000000320af9 <_rl_init_locale+169>:    mov    dh,0xc6
   0x0000000000320afb <_rl_init_locale+171>:    mov    DWORD PTR [rip+0x16853],eax        # 0x337354 <_rl_utf8locale>
   0x0000000000320b01 <_rl_init_locale+177>:    mov    rdi,rbx
   0x0000000000320b04 <_rl_init_locale+180>:    call   0x3289d0 <strlen@plt>
=> 0x0000000000320b09 <_rl_init_locale+185>:    lea    rdi,[rax+0x1]
   0x0000000000320b0d <_rl_init_locale+189>:    call   0x2d4120 <xmalloc>
   0x0000000000320b12 <_rl_init_locale+194>:    mov    rdi,rax
   0x0000000000320b15 <_rl_init_locale+197>:    mov    rsi,rbx
   0x0000000000320b18 <_rl_init_locale+200>:    call   0x3289e0 <strcpy@plt>
End of assembler dump.
(gdb) i r $rdi
rdi            0x0                 0
(gdb) disass _rl_init_locale
..
...
   0x0000000000320ab0 <+96>:    call   0x329180 <setlocale@plt>
   0x0000000000320ab5 <+101>:    mov    rbx,rax
   0x0000000000320ab8 <+104>:    test   rax,rax
   0x0000000000320abb <+107>:    je     0x320adf <_rl_init_locale+143>

   0x0000000000320adf <+143>:    xor    r14d,r14d
   0x0000000000320ae2 <+146>:    jmp    0x320af7 <_rl_init_locale+167>
..
   0x0000000000320af7 <+167>:    movzx  eax,r14b
   0x0000000000320afb <+171>:    mov    DWORD PTR [rip+0x16853],eax        # 0x337354 <_rl_utf8locale>
   0x0000000000320b01 <+177>:    mov    rdi,rbx
   0x0000000000320b04 <+180>:    call   0x3289d0 <strlen@plt>
So the crash happens in _rl_init_locale(): %rdi is 0 at the call to strlen, hence the segfault (strlen(NULL)). setlocale() comes from libc and can return NULL, which _rl_init_locale() needs to handle but doesn't. This function comes from the readline lib. Ports mark bash as broken if you use ports_readline, which is most likely the reason why.

To see this from the truss perspective:
Code:
# export LC_ALL=en_AU
# truss -f bash
 1003: open("/usr/share/locale/en_AU/LC_CTYPE",O_RDONLY,013720646057) ERR#2 'No such file or directory'
 1003: SIGNAL 11 (SIGSEGV) code=SEGV_MAPERR trapno=12 addr=0x0

# ls -la /usr/share/locale/en_AU/LC_CTYPE
ls: /usr/share/locale/en_AU/LC_CTYPE: No such file or directory
#
This is very likely the problem (and it was not a problem in bash 5.1).

I never used locales, so I'm not a good person to talk about them. But many programs fall back to the C locale if there's a problem. A good start would be to compare what changed between 5.1 and 5.2 for this to happen. Also, in my example above I was able to launch bash with en_AU.UTF-8, since it exists, but not with plain en_AU, which doesn't.

And on 12.3 with bash 5.1 built from ports:
Code:
$ LC_ALL=en_AU bash
bash: warning: setlocale: LC_ALL: cannot change locale (en_AU): No such file or directory
$
 
Did a little more debugging. The problem happens if the variable LC_CTYPE is not set or is invalid. Setting LC_ALL also fixes the problem, since it overrides all other LC_... variables. From my experimentation, it seems that none of the other LC_... variables or LANG or LANGUAGE matter, as long as LC_CTYPE or LC_ALL is set.

PR opened at https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267528

The workaround is simple: Have all bash users set LC_CTYPE=... (something valid, like C.UTF-8 or just C) as the first line of their .bashrc. Here "something valid" means the corresponding directory in /usr/share/locale exists and contains an LC_CTYPE file (with the special exception of C being valid too). And since the .bashrc files are easy to find (just look in all home directories), that's a relatively easy workaround. Setting it to C.UTF-8 is a safe choice for all languages, since the LC_CTYPE file for all locales that use the UTF-8 encoding is actually a soft-link to the C.UTF-8 version.
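For anyone who would rather not hard-code a value, here is a rough sketch of the same workaround that only steps in when the current setting looks invalid. Assumptions on my side: ctype_is_valid and LOCALE_ROOT are made-up names, POSIX is treated like C as a built-in locale, and the check is just the "directory with an LC_CTYPE file" rule described above, not what setlocale() really does internally.

```shell
# Directory holding the locale definitions; the post uses /usr/share/locale.
# LOCALE_ROOT is a hypothetical override (handy for testing), not a system variable.
: "${LOCALE_ROOT:=/usr/share/locale}"

# Succeed if $1 names a usable LC_CTYPE locale in the sense described above:
# either the built-in C/POSIX locale, or a directory under LOCALE_ROOT that
# contains an LC_CTYPE file (a soft-link counts, since -e follows it).
ctype_is_valid() {
    case "$1" in
        '')      return 1 ;;
        C|POSIX) return 0 ;;
    esac
    [ -e "$LOCALE_ROOT/$1/LC_CTYPE" ]
}

# LC_ALL overrides LC_CTYPE, so if LC_ALL is set that is the one to repair;
# otherwise repair LC_CTYPE. C.UTF-8 is the fallback suggested in the post.
if [ -n "${LC_ALL:-}" ]; then
    ctype_is_valid "$LC_ALL" || export LC_ALL=C.UTF-8
elif ! ctype_is_valid "${LC_CTYPE:-}"; then
    export LC_CTYPE=C.UTF-8
fi
```

It would go at the very top of the startup file, before anything else runs. Plain LC_CTYPE=C.UTF-8 on the first line remains the simpler option.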

One more little complication: When I say ".bashrc" above, I mean "the first script that bash will execute after logging in". If the system has a modified /etc/profile (with anything other than comments), or the user has a .profile or .bash_profile in their home directory, that is where the LC_CTYPE=... should go instead. I think those cases would be rare, but YMMV.
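A small sketch for finding the right file per user. It relies on the documented bash behaviour that a login shell, after /etc/profile, reads only the first readable file among ~/.bash_profile, ~/.bash_login and ~/.profile. first_startup_file is a made-up helper name, and falling back to .bashrc when none of those exist is my assumption, matching the usage above. It does not look at /etc/profile.

```shell
# Print the per-user startup file that a bash login shell would read first.
# first_startup_file is a hypothetical helper; $1 is a home directory.
first_startup_file() {
    for f in .bash_profile .bash_login .profile; do
        if [ -r "$1/$f" ]; then
            printf '%s\n' "$1/$f"
            return 0
        fi
    done
    # None of the login files exist: fall back to .bashrc, as in the post.
    printf '%s\n' "$1/.bashrc"
}

# Show the file for the current user.
first_startup_file "${HOME:-.}"
```

To cover every user, call it in a loop over the home directories (wherever they live on your system) and put the LC_CTYPE=... line at the top of each file it prints.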

Again, thanks for the debugging help.
 