Sshd process completely hangs up

I repeatedly experience the following problem - the sshd process completely hangs up:
  • I can't connect to it (neither from the same machine nor remotely):
    Code:
    # ssh 127.0.0.1
    <and nothing happens for a long time>
  • I can't kill or restart it:
    Code:
    # service sshd restart
    Stopping sshd.
    Waiting for PIDS: 2698
    <and nothing happens for a long time>
For about 1 to 5 hours after a reboot SSH works normally, then it hangs again. If I have a working SSH connection when the hang starts, that connection continues to work for some time, but I can't establish a new one.

My system: a clean FreeBSD 9.1 AMD64 install from CD1, no updates. I made no changes to /etc/ssh/sshd_config or any other configuration file except /etc/rc.conf, where I configure a lagg0 interface (based on igb0 and igb1) and receive Internet through the vlan3 interface, along with other services on vlan4. I also have a connection to my laptop via igb3 (see attachment).

Code:
sshd -v
OpenSSH_5.8p2_hpn13v11 FreeBSD-20110503, OpenSSL 0.9.8x 10 May 2012

I searched everywhere for a solution or suggestions, but nothing seems to work. Can you please spot a problem, or give any directions or suggestions?

Thank you!
 

Attachments

  • rc.conf.txt
I fixed a typo in the rc.conf file that crept in when I retyped it from the screen. Here is the updated one. Sorry.
 

Attachments

  • rc.conf.txt
Thanks for your quick reply. netstat -nr is in the attachment.
 

Attachments

  • netstat-rn.txt
This is the output taken just after a reboot, while the system is still working. I'll check the output of netstat -nr when sshd hangs again; something may be added there dynamically.
 
I fixed the gateway in the fourth row; it must be 172.2.2.1.
 

Attachments

  • netstat-rn.txt
Could you just paste the outputs inside CODE tags? It's not very convenient to download attachments.
 
No problem! I thought attachments would save some space :)

Here is my rc.conf:
Code:
hostname="TEST"

ifconfig_igb0=”media 100baseTX mediaopt full-duplex”
ifconfig_igb1=”media 100baseTX mediaopt full-duplex”
ifconfig_igb3="inet 10.0.0.1 netmask 255.255.255.0"
cloned_interfaces="lagg0 vlan1 vlan3 vlan4"
ifconfig_lagg0="laggproto lacp laggport igb0 laggport igb1"
ifconfig_vlan1="vlan 1 vlandev lagg0"
ifconfig_vlan3="inet 172.2.2.10 netmask 255.255.255.0 vlan 3 vlandev lagg0"
ifconfig_vlan4="inet 10.0.151.45 netmask 255.255.192.0 vlan 4 vlandev lagg0"
static_routes="default”
defaultrouter="172.2.2.1"

sshd_enable="YES"
And here is my netstat -rn:
Code:
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            172.2.2.1          UGS         0       40  vlan3
10.0.0.0/30        link#4             U           0        0   igb3
10.0.0.1           link#4             UHS         0        0    lo0
10.0.192.0/18      link#22            U           0        0  vlan4
10.0.221.45        link#22            UHS         0        0    lo0
127.0.0.1          link#16            UH          0       64    lo0
172.2.2.0/24       link#20            U           0        0  vlan3
172.2.2.10         link#20            UHS         0        0    lo0

Internet6:
Destination                       Gateway                       Flags      Netif Expire
::/96                             ::1                           UGRS        lo0
::1                               link#16                       UH          lo0
::ffff:0.0.0.0/96                 ::1                           UGRS        lo0
fe80::/10                         ::1                           UGRS        lo0
fe80::%igb0/64                    link#1                        U          igb0
fe80::92e2:baff:fe47:eff4%igb0    link#1                        UHS         lo0
fe80::%igb1/64                    link#2                        U          igb1
fe80::92e2:baff:fe47:eff5%igb1    link#2                        UHS         lo0
fe80::%igb3/64                    link#4                        U          igb3
fe80::92e2:baff:fe47:eff7%igb3    link#4                        UHS         lo0
fe80::%lo0/64                     link#16                       U           lo0
fe80::1%lo0                       link#16                       UHS         lo0
fe80::%lagg0/64                   link#17                       U         lagg0
fe80::92e2:baff:fe47:eff4%lagg0   link#17                       UHS         lo0
fe80::%vlan3/64                   link#20                       U         vlan3
fe80::92e2:baff:fe47:eff4%vlan3   link#20                       UHS         lo0
fe80::%vlan4/64                   link#21                       U         vlan4
fe80::92e2:baff:fe47:eff4%vlan4   link#21                       UHS         lo0
ff01::%igb0/32                    fe80::92e2:baff:fe47:eff4%igb0 U          igb0
ff01::%igb1/32                    fe80::92e2:baff:fe47:eff5%igb1 U          igb1
ff01::%igb3/32                    fe80::92e2:baff:fe47:eff7%igb3 U          igb3
ff01::%lo0/32                     ::1                           U           lo0
ff01::%lagg0/32                   fe80::92e2:baff:fe47:eff4%lagg0 U         lagg0
ff01::%vlan3/32                   fe80::92e2:baff:fe47:eff4%vlan3 U       vlan3
ff01::%vlan4/32                   fe80::92e2:baff:fe47:eff4%vlan4 U       vlan4
ff02::/16                         ::1                           UGRS        lo0
ff02::%igb0/32                    fe80::92e2:baff:fe47:eff4%igb0 U          igb0
ff02::%igb1/32                    fe80::92e2:baff:fe47:eff5%igb1 U          igb1
ff02::%igb3/32                    fe80::92e2:baff:fe47:eff7%igb3 U          igb3
ff02::%lo0/32                     ::1                           U           lo0
ff02::%lagg0/32                   fe80::92e2:baff:fe47:eff4%lagg0 U         lagg0
ff02::%vlan3/32                   fe80::92e2:baff:fe47:eff4%vlan3 U        vlan3
ff02::%vlan4/32                   fe80::92e2:baff:fe47:eff4%vlan4 U        vlan4
I compared netstat -rn before and after SSH stopped working; the outputs are the same. And I do not use IPv6, only IPv4.
 
Can you do this for us:

# cat /etc/nsswitch.conf

and:

# cat /etc/resolv.conf

and paste the outcome here? (Using those same CODE tags, good going!)

I have a hunch this might be DNS related.

Now, I'm going to check out those attachments in a moment, but in the meantime: what version of FreeBSD are you using? Also: are you using the SSH version as supplied by the base system or did you build one yourself using the Ports collection?

I have a hunch we'll get you through this one, intriguing issue.
 
Not an edit:

Figured I'd better not edit my previous post, considering the risk that we're all over this one :)

In addition to my previous questions:

# head -15 /etc/hosts.
 
@ShelLuser,

Thanks for the interest in my problem!
Here is my # cat /etc/nsswitch.conf
Code:
#
# nsswitch.conf(5) - name service switch configuration file
# $FreeBSD: release/9.1.0/etc/nsswitch.conf 224765 2011-08-10 20:52:02Z dougb $
#
group: compat
group_compat: nis
hosts: files dns
networks: files
passwd: compat
passwd_compat: nis
shells: files
services: compat
services_compat: nis
protocols: files
rpc: files
And here is my # cat /etc/resolv.conf
Code:

It's completely empty, but a file with this name does exist.

It may be DNS related, but I have not configured any domain for my machine, so I always refer to it by IP address. It is located in a datacenter and listens on the private IP 172.2.2.10 for Internet traffic. I can connect to it from outside via the public IP 85.5.5.10, or from another machine connected to it directly, via the IP 10.0.0.1.

FreeBSD version 9.1, a clean install from FreeBSD-9.1-RELEASE-amd64-memstick.img on a USB key prepared with win32diskimager-v0.9-binary. During installation I checked the option to install the sshd package, and I did not rebuild or reconfigure it. So it is the sshd from a clean install of the base system.

And here is my head -15 /etc/hosts
Code:
# $FreeBSD: release/9.1.0/etc/hosts 109997 2003-01-28 21:29:23Z dbaker $
#
# Host Database
#
# This file should contain the addresses and aliases for local hosts that
# share this file.  Replace 'my.domain' below with the domainname of your
# machine.
#
# In the presence of the domain name service or NIS, this file may
# not be consulted at all; see /etc/nsswitch.conf for the resolution order.
#
#
::1                     localhost localhost.my.domain
127.0.0.1               localhost localhost.my.domain
#
 
sshd will by default try to look up any remote host, see UseDNS in sshd_config(5).

IMHO you should at least add an entry to your /etc/hosts with your main IP address and your hostname, e.g.
Code:
10.0.0.1    TEST
If you want DNS resolution to work on your machine (most of the time you do), you have to add at least one DNS server to /etc/resolv.conf. This is probably your default gateway or a DNS resolver of your ISP. You can also use an open DNS server like Google's 8.8.8.8, e.g.
Code:
nameserver    172.2.2.1
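Since the /etc/resolv.conf posted earlier in the thread is empty, a quick check along these lines can confirm whether any resolver is configured at all. This is only a sketch: the function name has_nameserver is made up for illustration, and it only looks for a nameserver line; it does not test whether that server is reachable.

```shell
#!/bin/sh
# has_nameserver FILE
# Succeeds (exit 0) if FILE contains at least one non-comment
# "nameserver <address>" line, fails otherwise.
# The function name is hypothetical, chosen for this example.
has_nameserver() {
    grep -Eq '^[[:space:]]*nameserver[[:space:]]+[^[:space:]]+' "$1"
}

# Example use on the machine itself (not executed here):
#   has_nameserver /etc/resolv.conf && echo "resolver configured" || echo "no resolver"
```

On a host that cannot reach any DNS server, a configured but unreachable nameserver would still pass this check, so it only rules out the "nothing configured" case.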
 
One minor thing: what's with some of the quotes in your /etc/rc.conf?
  • Normal quotes show as ".
  • Some of yours show as ”.
  • On my machine, when I download your attachment, those latter ones show as â (an a with a caret/circumflex), while most quotes appear to be normal.
 
@glocke,

Thanks for the reply!

For my problematic machine, the only thing my ISP allows from the Internet is incoming SSH traffic. So I can connect to my machine from anywhere in the world by SSH, but from this machine I can't do any DNS lookup (except /etc/hosts of course).

Maybe in this case it would be better to switch 'UseDNS' option in /etc/hosts to 'no'? Can it be the root of the problem?

On the other hand, I have a direct cable connection to my problematic machine (it has IP 10.0.0.1, I have IP 10.0.0.2), so following your advice I added the following to /etc/hosts on both machines:
Code:
10.0.0.1                PROBLEMATIC
10.0.0.2                LAPTOP

BTW, this modification seems to change the behaviour of my problem slightly. When I issue:
Code:
ssh PROBLEMATIC
<connecting to 10.0.0.1, hangs for a while>
Password: <I entered password>
<hangs for a while>
Last login: Thu Sep 19 14:47:01 2013 from <my ip>
<hangs for a while>
Welcome to FreeBSD!
$

And when I issue:
Code:
ssh 85.5.5.10
<it hangs for several minutes and then>
Password:

What can it be?
 
@fonz:

Thanks for pointing it out!

To be honest, I simplified rc.conf, and while editing it I pasted portions of text from different sources; that's why there are encoding mismatches in the " symbol. I actually have about five VLANs in my rc.conf, but so as not to waste readers' time I cut it down to the two important ones. Sorry about that, I understand that it is not the smartest thing to do in my case.
 
strikki said:
Sorry about that, I understand that it is not the smartest thing to do in my case.
It's no big deal, I just pointed it out to make sure it's not part of the problem.
 
strikki said:
So I can connect to my machine from anywhere in the world by SSH, but from this machine I can't do any DNS lookup (except /etc/hosts of course).

Maybe in this case it would be better to switch 'UseDNS' option in /etc/hosts to 'no'? Can it be the root of the problem?
Taking everything into consideration I'm now quite convinced that your issues are DNS related. However, I'm not sure if the problem is strictly related to SSH and not the machine as a whole, but that's mostly because I've never been in a situation where a Unix machine wasn't able to do name lookups. One way to find out though...

@glocke already mentioned this; but disabling DNS lookups within sshd_config would be the best thing to do here.
 
strikki said:
Maybe in this case it would be better to switch 'UseDNS' option in /etc/hosts to 'no'? Can it be the root of the problem?

/etc/hosts is not the right place to do this. You should use the power of rc and unleash the magic it provides by adding the UseDNS flag to the sshd_flags in /etc/rc.conf, e.g. (if you do not already have a sshd_flags directive in /etc/rc.conf):
Code:
sshd_flags="-o UseDNS=no"
Then issue a
Code:
service sshd restart
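To check which value sshd would actually use, one can inspect its effective configuration. A minimal sketch, assuming the installed sshd supports the extended test mode (-T), which prints the effective settings as lowercase keyword/value pairs; the helper name usedns_setting is made up for illustration.

```shell
#!/bin/sh
# usedns_setting: read "keyword value" lines (as printed by `sshd -T`)
# on stdin and print the value of the usedns keyword, if present.
# The function name is hypothetical, chosen for this example.
usedns_setting() {
    awk 'tolower($1) == "usedns" { print $2 }'
}

# Example use as root (not executed here). sshd -T reads sshd_config but
# does not see flags set in rc.conf, so when the option is set through
# sshd_flags the same -o has to be repeated on the command line:
#   /usr/sbin/sshd -T | usedns_setting                 # value from sshd_config
#   /usr/sbin/sshd -T -o UseDNS=no | usedns_setting    # value with the flag
```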
 
Thanks for the DNS tip. I applied it, but unfortunately it didn't help much.

And yes, I mistyped the name of the file: I meant /etc/ssh/sshd_config, not /etc/hosts.

BTW, what do I gain by specifying sshd settings in the /etc/rc.conf file? Is the idea to always use the default configuration files for most programs and explicitly specify only custom settings in /etc/rc.conf? I'm new to FreeBSD, sorry.

But, I found something interesting! It turns out, I'm very popular and some guy tries his best to guess my logins:
Code:
May 19 04:38:32 PROBLEMATIC sshd[3731]: Invalid user newuser from 222.33.62.178
May 19 04:38:35 PROBLEMATIC sshd[3733]: Invalid user newuser1 from 222.33.62.178
May 19 04:38:37 PROBLEMATIC sshd[3735]: Invalid user nicholson from 222.33.62.178
May 19 04:38:39 PROBLEMATIC sshd[3737]: Invalid user norris from 222.33.62.178
May 19 04:38:42 PROBLEMATIC sshd[3739]: Invalid user payne from 222.33.62.178
May 19 04:38:44 PROBLEMATIC sshd[3741]: Invalid user petitto from 222.33.62.178
May 19 04:38:47 PROBLEMATIC sshd[3743]: Invalid user wendy from 222.33.62.178
May 19 04:38:49 PROBLEMATIC sshd[3745]: Invalid user will from 222.33.62.178
and many many funny characters...
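To see how many of these attempts come from each address, the log can be summarized per source IP. A minimal sketch, assuming the log lines have the same shape as the excerpt above; the function name count_invalid_by_ip is made up for illustration, and /var/log/auth.log is assumed to be where sshd messages are logged on this machine.

```shell
#!/bin/sh
# count_invalid_by_ip: read sshd log lines on stdin and print
# "<count> <ip>" for every source address that produced
# "Invalid user ... from <ip>" messages, busiest address first.
# The function name is hypothetical, chosen for this example.
count_invalid_by_ip() {
    awk '/sshd\[[0-9]+\]: Invalid user / {
             # the address is the field that follows the last "from"
             for (i = 1; i < NF; i++) if ($i == "from") ip = $(i + 1)
             count[ip]++
         }
         END { for (ip in count) print count[ip], ip }' | sort -rn
}

# Example use (not executed here; the log path is an assumption):
#   count_invalid_by_ip < /var/log/auth.log
```

The addresses at the top of the list are natural candidates for a firewall block rule.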

I figured out that after about 20 login attempts (even without specifying a password), just:
Code:
ssh 127.0.0.1
Password:^C
on about the 21st attempt ssh hangs:
Code:
ssh 127.0.0.1
^C^C^C
I can type here but can not exit ^Z^X^C
After that I can't log in, either locally or remotely. Even service sshd restart doesn't help.

So my guess is that this strange sshd behaviour is due to someone trying a great many usernames. I can use a firewall like PF to allow logins only from known IPs, but can I somehow increase the number of login attempts? Or reset the hanging attempts?

Thank you!
 
To be honest your setup has so many advanced features turned on, lagg(4) mixed with vlan(4)s, that it's very difficult to tell if the problems stem from a misconfiguration in those or something else. Try to revert to a simple setup with just physical interfaces involved and see if it makes a difference.

In general, if you're dealing with a system that is still unfamiliar to you, try to build things up in small steps and don't try to use all the advanced features at the same time. It makes troubleshooting much easier if you have a good picture of what you just changed when something goes wrong.
 
@kpa

Great advice, thank you! I'll disable all these features and try to reproduce the issue with sshd. I should mention that I can't reproduce this issue on my VMware Player FreeBSD box of the same version.
 
I ran several experiments to reproduce the SSH hang:
  • the issue is not reproduced on a system without the lagg0 interface configured;
  • the issue is not reproduced on a system with the lagg0 interface configured, as long as sshd does not listen on any VLAN of lagg0;
  • the issue is reproduced on a system with the lagg0 interface configured when sshd listens on the vlan4 (Internet) interface of lagg0.
There must be something wrong with my lagg(4) configuration combined with vlan(4), as @kpa points out.

Can I somehow build a test setup (maybe several virtual machines) to play with LAGG and VLAN interfaces? I feel that I lack understanding of this technology.

Thank you.
 
After doing some research:
Code:
/etc/rc.d/sshd stop
/usr/sbin/sshd -ddd
The line on which sshd actually hangs was the following:
Code:
debug1: SSH2_MSG_KEXINIT sent

According to http://old.nabble.com/sshd-hangs-after-SSH2_MSG_KEXINIT-sent---Fedora-Core-5-update-td8818375.html and http://www.snailbook.com/faq/mtu-mismatch.auto.html, this problem may be caused by packet fragmentation. My ISP's firewall is configured to allow traffic only on port 22, and sshd actually hangs when I try to read big files.

I'll report back once I can confirm or dismiss it.
Thank you for your attention.
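One common way to probe for an MTU problem of this kind is to send pings of a known size with fragmentation forbidden and see which sizes get through. A minimal sketch, assuming FreeBSD's ping (where -D sets the Don't Fragment bit and -s sets the ICMP payload size) and assuming ICMP is permitted on the path being tested, which may not hold through the ISP firewall described above. The helper name df_ping_cmd and the candidate MTU values are made up for illustration; 172.2.2.1 is the gateway from the rc.conf posted earlier.

```shell
#!/bin/sh
# df_ping_cmd HOST MTU
# Print a FreeBSD ping command that sends packets of exactly MTU bytes
# with the Don't Fragment bit set (-D). The ICMP payload is the MTU
# minus 20 bytes of IPv4 header and 8 bytes of ICMP header.
# The function name is hypothetical, chosen for this example.
df_ping_cmd() {
    host=$1
    mtu=$2
    payload=$((mtu - 28))
    echo "ping -D -c 3 -s $payload $host"
}

# Print the probe commands for a few candidate MTU values; run them by
# hand and note the largest size that still gets replies.
for mtu in 1500 1492 1400; do
    df_ping_cmd 172.2.2.1 "$mtu"
done
```

If the larger probes get no reply while the smaller ones do, that would be consistent with the MTU mismatch described in the linked FAQ.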
 