Net traffic collision: jail IP aliases prevent TFTP

To get multiple Jails to run on a single NIC I was faced with two choices for alias IP creation:
  1. Alias directly on to the NIC
  2. Clone lo1 and create alias IPs onto lo1.
I chose the simpler method of aliasing the NIC (#1):
Code:
ifconfig alc0 inet 192.168.1.100/24 alias
I then ran into a problem: my diskless clients, which boot from alc0 proper (IP 192.168.2.1), started timing out on TFTP requests at boot. The clients get their IP from DHCP just fine, but I have to reset the switch several times to get TFTP working, and there are other complications.

Then I decided to shut down all the jails, and diskless booting went back to running smoothly. Conclusion: the IP addresses from the jail aliases are corrupting the topology. I have a GBit switch; it's a cheap one and I don't expect miracles from it, but it may be getting the addresses confused because all of them are serviced through the same single port on the box. I can think of two causes for this problem:
  • My PF/NAT/rdr settings might be wrong, and maybe proper use of packet TAG will correct this. Currently I only tag as INTNET and have no rdr set up for dhcp/tftp/nfs (these services should not need it I think?) My current pf.conf here:
  • I should in fact create the aliases on lo1 and let the FreeBSD network stack do the sorting instead of leaving the poor switch in a disoriented state. That way, the switch only communicates with alc0.
Is my analysis correct? Which is the better way?

Regards.
 
If you are running jails you need to make sure all network services are bound to a specific IP address. By default they will listen on all addresses, including the ones from your jails.
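As a rough sketch of what binding a host service can look like, here is an rc.conf fragment for inetd (which normally starts tftpd). It assumes the host address 192.168.2.1 from this thread and the stock FreeBSD inetd flags; adjust both to your setup.

```sh
# /etc/rc.conf on the host (sketch only, not a complete configuration)
inetd_enable="YES"
# -wW -C 60 are the usual defaults; -a makes inetd bind to a single address
# instead of listening on every address configured on the box.
inetd_flags="-wW -C 60 -a 192.168.2.1"
# Afterwards, "sockstat -4 -l" on the host shows which address each
# listening socket is actually bound to.
```

Anything that still shows up as listening on *:port is a candidate for answering on a jail's address too.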
 
Isn't the jailed service (http, mysql) by the very definition, bound only to the jail's IP?

Are you saying that un-bound services will saturate the switch?
 
Beeblebrox said:
Isn't the jailed service (http, mysql) by the very definition, bound only to the jail's IP?
No, it isn't.

Are you saying that un-bound services will saturate the switch?
No, I'm saying the services running on the host or jail might get confused and start responding to other IP addresses also defined on the host.
 
OK, for mysql I had apparently already placed the line below in jail-sql/etc/my.conf, which should take care of sql.
Code:
bind-address = 192.168.2.101
For http I have now placed the "-h <ip>" flag in etc/rc.conf inside its jail. I then ran my script, which creates the 2 aliases on alc0 and starts the jails. Then I started a diskless client, and the boot hangs waiting for TFTP.
Should I bind dhcp/inetd as well? dhcp.conf passes "next-server 192.168.2.1", I assumed that would suffice. NFS is already bound to 192.168.2.1.
 
Beeblebrox said:
  1. Alias directly on to the NIC
  2. Clone lo1 and create alias IPs onto lo1.
I chose the simpler method of aliasing the NIC (#1):
Code:
ifconfig alc0 inet 192.168.1.100/24 alias
I then ran into a problem: my diskless clients which boot from alc0 proper, (IP 192.168.2.1) started timing out on TFTP request at boot.
This may be caused by improper address resolution at link level. If the address of lo1 and alc0 share the same subnet, you need to configure alc0 to respond to ARP requests for the address which now is bound to lo1; otherwise other computers won't be able to send packets to it.
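One possible way to do that is a published (proxy) ARP entry. The sketch below only builds and prints the command so nothing is changed by accident. The jail address is the one used later in this thread, and the MAC is a placeholder, not a real value.

```sh
#!/bin/sh
# Dry run: print the command instead of touching the ARP table.
jail_ip="192.168.2.100"        # jail address placed on lo1 (assumption)
nic_mac="00:11:22:33:44:55"    # placeholder; substitute the real MAC of alc0

# "pub" publishes the entry, so the host answers ARP requests
# for jail_ip with nic_mac on behalf of the loopback address.
proxy_arp_cmd="arp -s ${jail_ip} ${nic_mac} pub"
echo "${proxy_arp_cmd}"
```

Run the printed command as root on the host once the values are right. Whether this is needed at all depends on how the lo1 addresses are routed in your setup.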

I should in fact create the aliases on lo1 and let the FreeBSD network stack do the sorting instead of leaving the poor switch in a disoriented state. That way, the switch only communicates with alc0.
Is my analysis correct? Which is the better way?

Regards.

Whether you bind the three addresses to the internal interface, to the loopback, or to the external interface has almost no impact on performance. The only difference is the use of the ARP protocol.
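For reference, the loopback variant could be sketched in rc.conf roughly like this. The .2.100 and .2.101 addresses are taken from this thread; the /32 netmask is my assumption for addresses that sit inside a subnet already configured on alc0.

```sh
# /etc/rc.conf on the host (sketch only)
cloned_interfaces="lo1"        # create lo1 at boot
# One jail address per entry, each with a host (/32) netmask.
ifconfig_lo1="inet 192.168.2.100 netmask 255.255.255.255"
ifconfig_lo1_alias0="inet 192.168.2.101 netmask 255.255.255.255"
```

Alias entries are numbered from alias0 upward without gaps.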
 
Thanks for the input, ecazamir.
If the address of lo1 and alc0 share the same subnet
I was trying to say OR not AND - so I would use alc0 clones OR lo1 clones, not both. NIC fxp0 and NIC alc0 are separate networks, clones go on alc0, and traffic between the two NICs should be bridged.

That said, I ran tcpdump after your post and found that pflog0 dumps nothing at all (neither internal nor external), but fxp0, alc0, and lo0 do dump data. I do suspect my pf is inaccurately configured, especially rule #2. In this setup, the clones (.2.100 & .2.101) are placed on alc0. dhcp/inetd/nfsd run on alc0 proper (not a clone), and diskless clients get addresses in the range 192.168.2.2 to 192.168.2.99.
Code:
TRANSLATION RULES:
1. nat on fxp0 from ! (fxp0) to any -> (fxp0:0)
2. nat on alc0 from (alc0:network) to any -> (alc0) round-robin
3. nat on fxp0 inet from 192.168.2.0/24 to any -> 192.168.1.10
4. rdr on alc0 inet proto tcp from any to 192.168.2.1 port = http -> 192.168.2.100 port 80
5. rdr on alc0 inet proto tcp from any to 192.168.2.1 port = 3306 -> 192.168.2.101 port 3306
6. rdr on fxp0 inet proto tcp from any to (alc0) port = http -> 192.168.2.100 port 80
7. no rdr all
you need to configure alc0 to respond to arp requests for the address which now is bound to lo1
Since I cloned on alc0, there is no lo1 and the clones show up on lo0. I thought that with such a setup, NAT would take care of it all? If I still need ARP, can I place static entries in /etc/hosts?

EDIT: I just noticed that the jails do not need to be running for this error to happen; the mere presence of the aliases is sufficient. So the error occurs even when the jails are stopped but the aliases have not been removed. To get it working again, I have to delete both aliases, bring alc0 down, and bring alc0 back up.
 
Beeblebrox said:
clones go on alc0, and traffic between the two NICs should be bridged.
Bridging the interfaces is like having a single NIC with all the addresses of both NICs, with some distribution regarding incoming traffic from your clients.

Beeblebrox said:
2. nat on alc0 from (alc0:network) to any -> (alc0) round-robin
This rule performs NAT for traffic whose source address is in the subnet assigned to alc0 when the outbound interface is alc0. This kind of traffic should not happen, unless you want to perform NAT for LAN clients trying to access a DMZ server located on a different subnet but physically in the same network as the LAN clients, AND you don't use named with a split-view configuration. That way, the DMZ server sees the connection as coming from your gateway instead of from the real LAN client.

Example: LAN_Client in subnet 10.0.0.0/24, DMZ_Server in subnet 10.0.1.0/24; the server is performing NAT on behalf of the client.
From my experience, NAT is needed only on external interfaces, or if your servers are accessed through port forwarding.
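To illustrate, a translation section with NAT only on the external side might look like the sketch below. It assumes fxp0 is the external NIC and alc0 the internal one, as described in this thread; the macro names are my own. The script only prints the candidate rules so they can be reviewed first.

```sh
#!/bin/sh
# Candidate pf.conf translation rules, kept in a variable for review.
# The quoted 'EOF' keeps the shell from expanding the pf macros.
rules=$(cat <<'EOF'
ext_if = "fxp0"
int_if = "alc0"
# Translate only traffic that leaves through the external interface.
nat on $ext_if inet from ($int_if:network) to any -> ($ext_if)
EOF
)
printf '%s\n' "$rules"
```

On the FreeBSD host, the printed rules can be saved to a scratch file and parsed without loading them using pfctl -nf followed by the file name. Any rdr rules you still need would go after the nat line.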


Beeblebrox said:
If I still need ARP, can I place static entries in /etc/hosts?
/etc/hosts and optionally /etc/ethers are used by the local machine to determine how to reach remote devices/servers. In addition, DNS is used. /etc/hosts (static) and DNS (dynamic) are used to find the IP addresses of remote machines; ARP is used to find the Ethernet (MAC) address. ARP uses dynamic resolution by default, but some entries can be assigned manually in /etc/ethers.

Other machines usually use the ARP protocol to find out how to communicate with the local machine. The default FreeBSD installation is set to respond to ARP requests for all the locally configured addresses on each network segment.
 