No network connection between specific mail servers

Hello!

I have an issue with network connections to OVH mail servers from my side.
I've changed ISPs and the problem still exists.
I don't have any issues with other services or with other mail servers.

I have one external interface with several IPs.
I run several VMs under bhyve on this server.
I use NAT in PF to map each IP one-to-one to a specific VM.

I can connect directly from the server (hypervisor).
Code:
hpv:~ % telnet -s 1st_ip mx3.mail.ovh.net 25
Trying 91.121.53.175...
Connected to mx3.mail.ovh.net.
Escape character is '^]'.
220-mx3.mail.ovh.net in43
QUIT
220 mx3.mail.ovh.net in43
221 2.0.0 Bye
Connection closed by foreign host.
hpv:~ % telnet -s 2nd_ip mx3.mail.ovh.net 25
Trying 91.121.53.175...
Connected to mx3.mail.ovh.net.
Escape character is '^]'.
220-mx3.mail.ovh.net in34
QUIT
220 mx3.mail.ovh.net in34
221 2.0.0 Bye
Connection closed by foreign host.

I cannot connect from the VM to the OVH mail servers.
Code:
vm:~ % telnet mx3.mail.ovh.net 25
Trying 91.121.53.175...
telnet: connect to address 91.121.53.175: Operation timed out
telnet: Unable to connect to remote host

ifconfig on hpv
Code:
hpv:~ % ifconfig igb0
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=e527bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        inet 1st_ip netmask 0xfffffff8 broadcast brc_ip
        inet 2nd_ip netmask 0xfffffff8 broadcast brc_ip
        inet 3rd_ip netmask 0xfffffff8 broadcast brc_ip
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

tcpdump captured on hpv, connection made from the VM
Code:
2019-04-03 18:54:06, ethertype IPv4 (0x0800), length 74: 2nd_ip.52727 > 91.121.53.175.25: Flags [S], seq 2485337155, win 65535, options [mss 4030,nop,wscale 9,sackOK,TS val 4245799571 ecr 0], length 0
2019-04-03 18:54:06, ethertype IPv4 (0x0800), length 74: 91.121.53.175.25 > 2nd_ip.52727: Flags [S.], seq 1380750155, ack 2485337156, win 17520, options [mss 1460,sackOK,wscale 12,TS val 25 ecr 4245799571,eol], length 0
2019-04-03 18:54:09, ethertype IPv4 (0x0800), length 74: 2nd_ip.52727 > 91.121.53.175.25: Flags [S], seq 2485337155, win 65535, options [mss 4030,nop,wscale 9,sackOK,TS val 4245802650 ecr 0], length 0
2019-04-03 18:54:09, ethertype IPv4 (0x0800), length 74: 91.121.53.175.25 > 2nd_ip.52727: Flags [S.], seq 1380750155, ack 2485337156, win 17520, options [mss 1460,sackOK,wscale 12,TS val 25 ecr 4245802650,eol], length 0

tcpdump captured on the VM
Code:
2019-04-03 19:14:25, ethertype IPv4 (0x0800), length 74: fw_local_ip.57278 > 91.121.53.175.25: Flags [S], seq 3197177173, win 65535, options [mss 4030,nop,wscale 9,sackOK,TS val 3089671621 ecr 0], length 0
2019-04-03 19:14:25, ethertype IPv4 (0x0800), length 74: mail_local_ip.59635 > 91.121.53.175.25: Flags [S], seq 3197177173, win 65535, options [mss 4030,nop,wscale 9,sackOK,TS val 2243058910 ecr 0], length 0
2019-04-03 19:14:25, ethertype IPv4 (0x0800), length 74: 91.121.53.175.25 > fw_local_ip.59635: Flags [S.], seq 3834360175, ack 3197177174, win 17520, options [mss 1460,sackOK,wscale 12,TS val 3340468235 ecr 2243058910,eol], length 0
2019-04-03 19:14:25, ethertype IPv4 (0x0800), length 74: 91.121.53.175.25 > mail_local_ip.57278: Flags [S.], seq 3834360175, ack 3197177174, win 17520, options [mss 1460,sackOK,wscale 12,TS val 2720487035 ecr 3089671621,eol], length 0

tcpdump captured on hpv, connection made from hpv
Code:
2019-04-03 18:51:37, ethertype IPv4 (0x0800), length 74: 1st_ip.44933 > 91.121.53.175.25: Flags [S], seq 3448230379, win 65535, options [mss 1460,nop,wscale 9,sackOK,TS val 1843780144 ecr 0], length 0
2019-04-03 18:51:37, ethertype IPv4 (0x0800), length 74: 91.121.53.175.25 > 1st_ip.44933: Flags [S.], seq 3531821347, ack 3448230380, win 17520, options [mss 1460,sackOK,wscale 12,TS val 25 ecr 1843780144,eol], length 0
2019-04-03 18:51:37, ethertype IPv4 (0x0800), length 66: 1st_ip.44933 > 91.121.53.175.25: Flags [.], ack 1, win 2050, options [nop,nop,TS val 1843780184 ecr 25], length 0
2019-04-03 18:51:37, ethertype IPv4 (0x0800), length 93: 91.121.53.175.25 > 1st_ip.44933: Flags [P.], seq 1:28, ack 1, win 512, options [nop,nop,TS val 25 ecr 1843780184], length 27: SMTP: 220-mx3.mail.ovh.net in76
2019-04-03 18:51:37, ethertype IPv4 (0x0800), length 66: 1st_ip.44933 > 91.121.53.175.25: Flags [.], ack 28, win 2050, options [nop,nop,TS val 1843780329 ecr 25], length 0
2019-04-03 18:51:44, ethertype IPv4 (0x0800), length 93: 91.121.53.175.25 > 1st_ip.44933: Flags [P.], seq 28:55, ack 1, win 512, options [nop,nop,TS val 1666 ecr 1843780329], length 27: SMTP: 220 mx3.mail.ovh.net in76
2019-04-03 18:51:44, ethertype IPv4 (0x0800), length 66: 1st_ip.44933 > 91.121.53.175.25: Flags [.], ack 55, win 2050, options [nop,nop,TS val 1843786898 ecr 1666], length 0
2019-04-03 18:51:54, ethertype IPv4 (0x0800), length 77: 1st_ip.44933 > 91.121.53.175.25: Flags [P.], seq 1:12, ack 55, win 2050, options [nop,nop,TS val 1843797418 ecr 1666], length 11: SMTP: EHLO test
2019-04-03 18:51:54, ethertype IPv4 (0x0800), length 162: 91.121.53.175.25 > 1st_ip.44933: Flags [P.], seq 55:151, ack 12, win 512, options [nop,nop,TS val 4334 ecr 1843797418], length 96: SMTP: 250-in76.mail.ovh.net
2019-04-03 18:51:54, ethertype IPv4 (0x0800), length 66: 1st_ip.44933 > 91.121.53.175.25: Flags [.], ack 151, win 2050, options [nop,nop,TS val 1843797574 ecr 4334], length 0

Again, I don't have any issues connecting to other mail servers and services.
Thanks for any suggestions.
 
Check your NAT configuration. When you design your network, try to avoid NAT where you can, as it has a bigger CPU/memory overhead than bridging or routing.
 
NAT works correctly, because other connections work perfectly.
CPU and memory are not overloaded.
 
Then why do you have two packets with the same seq number?

Code:
2019-04-03 19:14:25, ethertype IPv4 (0x0800), length 74: fw_local_ip.57278 > 91.121.53.175.25: Flags [S], seq 3197177173, win 65535, options [mss 4030,nop,wscale 9,sackOK,TS val 3089671621 ecr 0], length 0
2019-04-03 19:14:25, ethertype IPv4 (0x0800), length 74: mail_local_ip.59635 > 91.121.53.175.25: Flags [S], seq 3197177173, win 65535, options [mss 4030,nop,wscale 9,sackOK,TS val 2243058910 ecr 0], length 0

Is your tap interface in a bridge at the same time?
Or do you have some port address translation to that address? Can you show your PF config?
 
HPV (NAT to VM firewall) -> firewall (NAT to VM mailserver) -> VM mailserver
................................................... -> fw_local_ip to mail_local_ip -> ..............................

Both the firewall and the mail server are on the same bridge interface.
But that should not matter as far as PF is concerned, since all other connections in this setup work, except those to the OVH mail servers.
 
That explains why you see two packets, from both fw_local_ip and mail_local_ip, in the tcpdump. You should not bridge the VM interface with the WAN interface when you are using NAT.
According to this tcpdump
Code:
2019-04-03 19:14:25, ethertype IPv4 (0x0800), length 74: fw_local_ip.57278 > 91.121.53.175.25: Flags [S], seq 3197177173, win 65535, options [mss 4030,nop,wscale 9,sackOK,TS val 3089671621 ecr 0], length 0
2019-04-03 19:14:25, ethertype IPv4 (0x0800), length 74: mail_local_ip.59635 > 91.121.53.175.25: Flags [S], seq 3197177173, win 65535, options [mss 4030,nop,wscale 9,sackOK,TS val 2243058910 ecr 0], length 0
2019-04-03 19:14:25, ethertype IPv4 (0x0800), length 74: 91.121.53.175.25 > fw_local_ip.59635: Flags [S.], seq 3834360175, ack 3197177174, win 17520, options [mss 1460,sackOK,wscale 12,TS val 3340468235 ecr 2243058910,eol], length 0
2019-04-03 19:14:25, ethertype IPv4 (0x0800), length 74: 91.121.53.175.25 > mail_local_ip.57278: Flags [S.], seq 3834360175, ack 3197177174, win 17520, options [mss 1460,sackOK,wscale 12,TS val 2720487035 ecr 3089671621,eol], length 0
The SYN packet that is sent from fw_local_ip port 57278 to 91.121.53.175 port 25 comes back with an ACK from 91.121.53.175 port 25 to mail_local_ip.57278 instead of fw_local_ip port 57278, so the connection is never established. That's why you should check your NAT settings and also revisit your network topology. You can refer to this blog, which has a good example of bridging vs NAT.
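The pairing described above can be checked mechanically. Below is a minimal Python sketch, assuming the four quoted lines are representative and that the placeholder names (fw_local_ip, mail_local_ip) were substituted consistently by the poster. The regular expression and the function names are my own and only cover this tcpdump output format.

```python
import re

# Four lines copied from the firewall VM capture quoted above.
CAPTURE = """\
2019-04-03 19:14:25, ethertype IPv4 (0x0800), length 74: fw_local_ip.57278 > 91.121.53.175.25: Flags [S], seq 3197177173, win 65535, options [mss 4030,nop,wscale 9,sackOK,TS val 3089671621 ecr 0], length 0
2019-04-03 19:14:25, ethertype IPv4 (0x0800), length 74: mail_local_ip.59635 > 91.121.53.175.25: Flags [S], seq 3197177173, win 65535, options [mss 4030,nop,wscale 9,sackOK,TS val 2243058910 ecr 0], length 0
2019-04-03 19:14:25, ethertype IPv4 (0x0800), length 74: 91.121.53.175.25 > fw_local_ip.59635: Flags [S.], seq 3834360175, ack 3197177174, win 17520, options [mss 1460,sackOK,wscale 12,TS val 3340468235 ecr 2243058910,eol], length 0
2019-04-03 19:14:25, ethertype IPv4 (0x0800), length 74: 91.121.53.175.25 > mail_local_ip.57278: Flags [S.], seq 3834360175, ack 3197177174, win 17520, options [mss 1460,sackOK,wscale 12,TS val 2720487035 ecr 3089671621,eol], length 0
"""

LINE_RE = re.compile(
    r"length \d+: (?P<src>\S+)\.(?P<sport>\d+) > (?P<dst>\S+)\.(?P<dport>\d+): "
    r"Flags \[(?P<flags>[^\]]+)\], seq (?P<seq>\d+)(?:, ack (?P<ack>\d+))?"
)


def parse(capture):
    """Return one dict per line that matches the tcpdump format above."""
    packets = []
    for line in capture.splitlines():
        match = LINE_RE.search(line)
        if match:
            packets.append(match.groupdict())
    return packets


def mismatched_replies(packets):
    """List (syn, synack) pairs where the reply is addressed to another host.

    A SYN-ACK answers a SYN when its ack equals the SYN's seq + 1 and its
    destination port equals the SYN's source port.
    """
    syns = [p for p in packets if p["flags"] == "S"]
    synacks = [p for p in packets if p["flags"] == "S."]
    result = []
    for syn in syns:
        for reply in synacks:
            answers = (int(reply["ack"]) == int(syn["seq"]) + 1
                       and reply["dport"] == syn["sport"])
            if answers and reply["dst"] != syn["src"]:
                result.append((syn, reply))
    return result


for syn, reply in mismatched_replies(parse(CAPTURE)):
    print(f'SYN from {syn["src"]}.{syn["sport"]} answered to '
          f'{reply["dst"]}.{reply["dport"]}')
```

On these four lines both SYNs are reported, each answered to the other local address. That could point at the translation, or simply at swapped placeholders in the anonymised capture.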
 
Even if you are right, can you explain why all other such connections work, and only the connection to the OVH server does not?

Please look at the following tcpdump from the mail server. The previous one was from the firewall, which is why it contains the two entries you mentioned.
Code:
2019-04-03 22:38:22, ethertype IPv4 (0x0800), length 74: mail_local_ip.51484 > 91.121.53.175.25: Flags [S], seq 1100054662, win 65535, options [mss 4030,nop,wscale 9,sackOK,TS val 2227451120 ecr 0], length 0
2019-04-03 22:38:22, ethertype IPv4 (0x0800), length 74: 91.121.53.175.25 > mail_local_ip.51484: Flags [S.], seq 1606200551, ack 1100054663, win 17520, options [mss 1460,sackOK,wscale 12,TS val 4026222972 ecr 2227451120,eol], length 0
2019-04-03 22:38:25, ethertype IPv4 (0x0800), length 74: mail_local_ip.51484 > 91.121.53.175.25: Flags [S], seq 1100054662, win 65535, options [mss 4030,nop,wscale 9,sackOK,TS val 2227454158 ecr 0], length 0
2019-04-03 22:38:25, ethertype IPv4 (0x0800), length 74: 91.121.53.175.25 > mail_local_ip.51484: Flags [S.], seq 1606200551, ack 1100054663, win 17520, options [mss 1460,sackOK,wscale 12,TS val 4026222972 ecr 2227454158,eol], length 0

In short, for some reason my mail server does not ACK the last packet (the SYN-ACK) from OVH.
It should look like this. This is a connection to another mail server.
Code:
2019-04-03 22:42:37., ethertype IPv4 (0x0800), length 74: mail_local_ip.47177 > 212.77.101.4.25: Flags [S], seq 323460540, win 65535, options [mss 4030,nop,wscale 9,sackOK,TS val 790949328 ecr 0], length 0
2019-04-03 22:42:37., ethertype IPv4 (0x0800), length 74: 212.77.101.4.25 > mail_local_ip.47177: Flags [S.], seq 1513722540, ack 323460541, win 28960, options [mss 1460,sackOK,TS val 2869942082 ecr 790949328,nop,wscale 7], length 0
2019-04-03 22:42:37., ethertype IPv4 (0x0800), length 66: mail_local_ip.47177 > 212.77.101.4.25: Flags [.], ack 1, win 2050, options [nop,nop,TS val 790949344 ecr 2869942082], length 0
2019-04-03 22:42:37., ethertype IPv4 (0x0800), length 86: 212.77.101.4.25 > mail_local_ip.47177: Flags [P.], seq 1:21, ack 1, win 227, options [nop,nop,TS val 2869942118 ecr 790949344], length 20: SMTP: 220 mx.wp.pl ESMTP
2019-04-03 22:42:37., ethertype IPv4 (0x0800), length 66: mail_local_ip.47177 > 212.77.101.4.25: Flags [.], ack 21, win 2050, options [nop,nop,TS val 790949482 ecr 2869942118], length 0
2019-04-03 22:42:41., ethertype IPv4 (0x0800), length 77: mail_local_ip.47177 > 212.77.101.4.25: Flags [P.], seq 1:12, ack 21, win 2050, options [nop,nop,TS val 790953559 ecr 2869942118], length 11: SMTP: EHLO test
2019-04-03 22:42:41., ethertype IPv4 (0x0800), length 66: 212.77.101.4.25 > mail_local_ip.47177: Flags [.], ack 12, win 227, options [nop,nop,TS val 2869946318 ecr 790953559], length 0
2019-04-03 22:42:41., ethertype IPv4 (0x0800), length 155: 212.77.101.4.25 > mail_local_ip.47177: Flags [P.], seq 21:110, ack 12, win 227, options [nop,nop,TS val 2869946318 ecr 790953559], length 89: SMTP: 250-mx.wp.pl
2019-04-03 22:42:41., ethertype IPv4 (0x0800), length 66: mail_local_ip.47177 > 212.77.101.4.25: Flags [.], ack 110, win 2050, options [nop,nop,TS val 790953680 ecr 2869946318], length 0
2019-04-03 22:42:43., ethertype IPv4 (0x0800), length 72: mail_local_ip.47177 > 212.77.101.4.25: Flags [P.], seq 12:18, ack 110, win 2050, options [nop,nop,TS val 790955343 ecr 2869946318], length 6: SMTP: QUIT
2019-04-03 22:42:43., ethertype IPv4 (0x0800), length 80: 212.77.101.4.25 > mail_local_ip.47177: Flags [P.], seq 110:124, ack 18, win 227, options [nop,nop,TS val 2869948097 ecr 790955343], length 14: SMTP: 221 mx.wp.pl
2019-04-03 22:42:43., ethertype IPv4 (0x0800), length 66: 212.77.101.4.25 > mail_local_ip.47177: Flags [F.], seq 124, ack 18, win 227, options [nop,nop,TS val 2869948098 ecr 790955343], length 0
2019-04-03 22:42:43., ethertype IPv4 (0x0800), length 66: mail_local_ip.47177 > 212.77.101.4.25: Flags [.], ack 125, win 2050, options [nop,nop,TS val 790955358 ecr 2869948097], length 0
2019-04-03 22:42:43., ethertype IPv4 (0x0800), length 66: mail_local_ip.47177 > 212.77.101.4.25: Flags [F.], seq 18, ack 125, win 2050, options [nop,nop,TS val 790955358 ecr 2869948097], length 0
2019-04-03 22:42:43., ethertype IPv4 (0x0800), length 66: 212.77.101.4.25 > mail_local_ip.47177: Flags [.], ack 19, win 227, options [nop,nop,TS val 2869948119 ecr 790955358], length 0
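To make the difference between the two captures explicit, here is a small Python sketch that classifies a connection from the order of its tcpdump flags. The flag lists are copied from the two captures above (the OVH one in full, the first five segments of the other one). The function name and the wording of the results are my own.

```python
def handshake_state(flags_seen):
    """Classify one connection from its ordered tcpdump flag strings."""
    syn_count = 0
    synack_seen = False
    for flags in flags_seen:
        if flags == "S":
            syn_count += 1
        elif flags == "S." and syn_count:
            synack_seen = True
        elif flags == "." and synack_seen:
            # A plain ACK after the SYN-ACK completes the 3-way handshake.
            return "established"
    if synack_seen:
        return f"SYN-ACK never acknowledged, {syn_count} SYNs sent"
    if syn_count:
        return "no reply to SYN"
    return "no SYN seen"


# Flags from the capture of the OVH connection attempt.
OVH_FLAGS = ["S", "S.", "S", "S."]
# First five flags from the capture of the connection to the other server.
OTHER_FLAGS = ["S", "S.", ".", "P.", "."]

print("OVH:  ", handshake_state(OVH_FLAGS))
print("other:", handshake_state(OTHER_FLAGS))
```

The OVH attempt ends up as a SYN-ACK that is never acknowledged, with the SYN retransmitted, while the other connection is established on the third segment.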
 
Until I see your state table and NAT rules, I can't tell you exactly why the packets are sent at the same time. You may have a static NAT or a redirect to that IP address, or some wrong state in your NAT table. You can check them using
pfctl -s nat
pfctl -s state
and see how the NAT is performed during the connection.
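The state table can get long, so it may help to filter it for the remote address while a connection attempt is running. A rough Python sketch follows. It assumes nothing about the layout of the `pfctl -s state` output beyond the address appearing somewhere in the line, and the helper names are hypothetical.

```python
import re
import subprocess


def lines_mentioning(text, address):
    """Keep only the lines of `text` that contain `address` as a whole IPv4 address."""
    # The lookarounds stop 91.121.53.175 from matching inside 191.121.53.175.
    pattern = re.compile(r"(?<![\d.])" + re.escape(address) + r"(?!\d)")
    return [line for line in text.splitlines() if pattern.search(line)]


def pf_states_for(address):
    """Run `pfctl -s state` (needs root) and filter its output for `address`."""
    output = subprocess.run(
        ["pfctl", "-s", "state"],
        capture_output=True, text=True, check=True,
    ).stdout
    return lines_mentioning(output, address)
```

Calling `pf_states_for("91.121.53.175")` on the hypervisor and on the firewall VM during a telnet attempt should show which translated addresses and ports each PF instance has recorded for that connection.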
 
HPV (physical server):
binat on igb0 inet from fw_local_ip to any -> 1st_ip

FW (virtual machine):
nat on vtnet0 inet from mail_local_ip to ! <abc> -> fw_local_ip
rdr on vtnet0 inet proto tcp from any to fw_local_ip port = smtp -> mail_local_ip port 25
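For reference, this is how I read the two translation steps for an outbound connection from the mail server. It is a toy Python model, not PF itself: it ignores ports, the `! <abc>` exclusion and the rdr rule, and the names are the placeholders used in this thread.

```python
# Translations taken from the rules quoted above, in outbound order.
# Names such as "mail_local_ip" are the thread's placeholders, not real addresses.
OUTBOUND_RULES = [
    ("FW:  nat on vtnet0", "mail_local_ip", "fw_local_ip"),
    ("HPV: binat on igb0", "fw_local_ip", "1st_ip"),
]


def trace_outbound(source):
    """Return the source address seen after each translation step."""
    seen = [source]
    for _rule, before, after in OUTBOUND_RULES:
        if seen[-1] == before:
            seen.append(after)
    return seen


def trace_reply(destination):
    """Return the destination of a reply as each state undoes its translation."""
    seen = [destination]
    for _rule, before, after in reversed(OUTBOUND_RULES):
        if seen[-1] == after:
            seen.append(before)
    return seen


print(" -> ".join(trace_outbound("mail_local_ip")))
print(" -> ".join(trace_reply("1st_ip")))
```

Since the nat rule is applied on vtnet0, a capture on that interface can probably see the same SYN both before and after translation, which would explain two SYNs with an identical sequence number.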
 
:~ % openssl s_client -crlf -connect ssl0.ovh.net:465
34371039232:error:0200203C:system library:connect:Operation timed out:/usr/src/crypto/openssl/crypto/bio/b_sock2.c:110:
34371039232:error:2008A067:BIO routines:BIO_connect:connect error:/usr/src/crypto/openssl/crypto/bio/b_sock2.c:111:
connect:errno=60


That's a network problem for sure.
I'm wondering whether OVH does something unusual with the TCP segment, for example adding or changing a field or option, so that my server does not ACK the segment.
 
But the problem occurs only with OVH servers.
Other services like web, DNS and VPN work correctly, and so does mail, except for connections to OVH.

I can skip the binat+nat chain and connect via telnet from the firewall VM, which is behind binat only.
The same issue occurs.
 
Please look at the TCP segment in the response from OVH. I see a difference in the Timestamps option here.
I'm wondering whether PF binat changes something in this field.

Working connection from HPV (OVH):
Code:
TCP Option - Timestamps: TSval 25, TSecr 1137318648
...
[SEQ/ACK analysis]
    [This is an ACK to the segment in frame: 1]
    [The RTT to ACK the segment was: 0.041559000 seconds]
    [iRTT: 0.041598000 seconds]

Failing connection from VM (OVH):
Code:
TCP Option - Timestamps: TSval 3097997591, TSecr 1912445411
...
[SEQ/ACK analysis]
    [This is an ACK to the segment in frame: 1]
    [The RTT to ACK the segment was: 0.040179000 seconds]

Working connection from HPV (different public mail server):
Code:
TCP Option - Timestamps: TSval 1361857303, TSecr 3348961859
...
[SEQ/ACK analysis]
    [This is an ACK to the segment in frame: 1]
    [The RTT to ACK the segment was: 0.020897000 seconds]
    [iRTT: 0.020950000 seconds]

Working connection from VM (different public mail server):
Code:
TCP Option - Timestamps: TSval 2402880460, TSecr 1142755717
...
[SEQ/ACK analysis]
    [This is an ACK to the segment in frame: 1]
    [The RTT to ACK the segment was: 0.014864000 seconds]
    [iRTT: 0.014928000 seconds]

Based on the above, Wireshark shows no iRTT for the failing connection.

I looked up how the iRTT is obtained and found this.

Code:
 TimeStamp pseudocode () {

  #1 Client A  --> Server A - Initial SYN should contain TSVAL = Time Cx1 and TSECR = 0
  #2 Server A --> Client A - SYN + ACK should contain TSVAL = Time Cy1 and TSECR = Cx1
  #3 Client A  --> Server A - ACK should contain TSVAL = Cx2 and TSECR = Cy1
  #4 Server A --> Client A - Packet should contain TSVAL = Cy2 and TSECR = Cx2

  Cx2 - Cx1 = TCP RTT in ms

}
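The pseudocode above can be tried against the numbers in the captures. A minimal Python sketch, with (TSval, TSecr) pairs copied from the working hypervisor capture and from the failing mail server capture. The function names are my own, and the result is in timestamp clock ticks, which are milliseconds only if the sender's timestamp clock ticks once per millisecond. As far as I know, the iRTT shown by Wireshark is computed by the analyser from a completed handshake and is not a field carried in the segment.

```python
def timestamp_problems(syn, synack, ack=None):
    """Check the echo rules of the TCP Timestamps option for a handshake.

    Each argument is a (tsval, tsecr) pair as printed by tcpdump.
    """
    problems = []
    if syn[1] != 0:
        problems.append("SYN should carry TSecr 0")
    if synack[1] != syn[0]:
        problems.append("SYN-ACK does not echo the SYN's TSval")
    if ack is not None and ack[1] != synack[0]:
        problems.append("ACK does not echo the SYN-ACK's TSval")
    return problems


def rtt_ticks(syn, ack):
    """Cx2 - Cx1 from the pseudocode, allowing for 32-bit wrap-around."""
    return (ack[0] - syn[0]) % 2**32


# Working connection from the hypervisor (first three segments).
hpv_syn, hpv_synack, hpv_ack = (1843780144, 0), (25, 1843780144), (1843780184, 25)
# Failing connection seen on the mail server (no final ACK was ever sent).
vm_syn, vm_synack = (2227451120, 0), (4026222972, 2227451120)

print("hpv problems:", timestamp_problems(hpv_syn, hpv_synack, hpv_ack))
print("hpv RTT in ticks:", rtt_ticks(hpv_syn, hpv_ack))
print("vm problems:", timestamp_problems(vm_syn, vm_synack))
```

The echo rules hold in the failing capture too, so this check alone does not explain why the VM ignores the SYN-ACK. What does differ is the TSval of the SYN-ACK itself: 25 on the hypervisor side, 4026222972 inside the VM.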

The connection fails because the last ACK of the 3-way handshake is never sent from my side on the VM.
I suppose the VM does not get the iRTT from the TCP segment, which is why it retransmits the packet with the same failed result.
 
I think I found the problem and solution.

Code:
vm:~ % sysctl net.inet.tcp.rfc1323
net.inet.tcp.rfc1323: 1

This change solved the issue:
Code:
vm:~ % sudo sysctl net.inet.tcp.rfc1323=0
net.inet.tcp.rfc1323: 1 -> 0

OK, so let's start a deeper investigation.
  • net.inet.tcp.rfc1323 enables or disables the RFC 1323 TCP extensions, i.e. window scaling and timestamps.

It looks like I cannot connect to OVH servers from the VM while these extensions are enabled.
The question is whether OVH handles RFC 1323 improperly, or whether the FreeBSD/PF TCP stack does something weird during NAT to/from the VM.

Next, according to https://calomel.org/freebsd_network_tuning.html:
TCP Buffers: Larger buffers and TCP Large Window Extensions (RFC1323) can
help alleviate the long fat network (LFN) problem caused by insufficient
window size; limited to 65535 bytes without RFC 1323 scaling. Verify the
window scaling extension is enabled with net.inet.tcp.rfc1323=1, which is
default. Both the client and server must support RFC 1323 to take advantage
of scalable buffers.
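To put numbers on the 65535-byte limit mentioned in that quote, here is a short Python sketch using the win and wscale values from the working hypervisor capture in this thread. The function name is my own. Note that the window field in a SYN or SYN-ACK is itself never scaled; the shift only applies to later segments.

```python
MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit window field


def effective_window(win, wscale):
    """Receive window in bytes for a non-SYN segment under window scaling."""
    return win << wscale


# Client side: later ACKs advertise win 2050, and its SYN announced wscale 9.
client_window = effective_window(2050, 9)
# OVH side: data segments advertise win 512, and its SYN-ACK announced wscale 12.
server_window = effective_window(512, 12)

print("client window:", client_window, "bytes")
print("server window:", server_window, "bytes")
print("limit without scaling:", MAX_UNSCALED_WINDOW, "bytes")
```

With net.inet.tcp.rfc1323=0 neither side can advertise more than 65535 bytes, which is fine for SMTP but may cost throughput on high-latency bulk transfers.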


Last but not least, FreeBSD sets net.inet.tcp.rfc1323=1 by default.
Again, I do not have any issues with other servers.
 
By the way, can someone telnet from a bhyve VM to an OVH server to check whether the problem exists only in my environment or is global?
Thanks.

telnet mx3.mail.ovh.net 25
 
Code:
abishai@poudriere:~ %  telnet mx3.mail.ovh.net 25
Trying 91.121.53.175...
Connected to mx3.mail.ovh.net.
Escape character is '^]'.
220-mx3.mail.ovh.net in52
 
Let me ask again. Can someone telnet from a bhyve VM to an OVH server to check whether the problem exists only in my environment or is global?

telnet mx3.mail.ovh.net 25

Then please show me the output of the following command on that system.

sysctl net.inet.tcp.rfc1323
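If it is easier than typing into telnet, both checks can be scripted. A minimal Python sketch, assuming Python 3 is available in the VM. The function names are my own, and the `connect` parameter exists only so that the probe can be exercised without network access.

```python
import socket
import subprocess


def probe(host="mx3.mail.ovh.net", port=25, timeout=10.0,
          connect=socket.create_connection):
    """Return the first line of the SMTP banner, or a short failure note."""
    try:
        with connect((host, port), timeout=timeout) as conn:
            banner = conn.recv(512).decode("ascii", "replace")
    except OSError as exc:  # covers timeouts and refused connections
        return f"FAILED: {exc}"
    lines = banner.splitlines()
    return lines[0] if lines else "connected, no banner"


def rfc1323_setting():
    """Return the value printed by `sysctl -n net.inet.tcp.rfc1323`."""
    result = subprocess.run(
        ["sysctl", "-n", "net.inet.tcp.rfc1323"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Running `print(probe(), rfc1323_setting())` inside a bhyve guest would give me both answers in one line.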
 
I'm trying again.
Can someone test the connection from a bhyve VM with window scaling enabled?
 