DNS External Resolution Problem

I recently set up a new server running FreeBSD 9.0-RELEASE to replace our somewhat outdated 7.1-RELEASE server, but I'm having a problem when I swap them. Both servers are set up with jails, one of which acts as the DNS server. I believe the problem has to do with BIND, but I'm getting no errors in /var/log/messages.

I can view the websites on the server from the local network, even when I use their domain names as I normally would while browsing. These same websites, however, are not reachable from the outside world. In fact, the server itself can't resolve any external domains and can only ping as far as the router. I called the router's support desk and gave them all of its settings, but they only replied that there should be no connectivity problems as long as the IP addresses are the same (which they are).

The old server runs BIND 9.4, while the new one runs 9.8. When setting up the new server, I simply used rsync to copy over all of the current config files and naively (read: stupidly) assumed that would be fine.

The new server is currently online on a different IP address, with its /etc/resolv.conf pointing to our current server as its DNS server. That rather defeats the purpose, though, since it needs to stand alone once we retire the old one.

I don't have much experience with BIND or DNS, but it seems to me that it can't contact the DNS root servers. That would explain why everything resolves without error locally but is invisible to the outside world, wouldn't it?

Any help is greatly appreciated.
This has had me on the brink of tears for the past few days. :\

(I did my best to follow the formatting rules, but I may have missed something.)
 
Does the jail that hosts the DNS server have full internet access? How is that jail's network set up: does it use a private or a public address, and which interface is the address bound to? If it uses a private address, what are the NAT settings?
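Those questions can be answered from the host with a few base-system tools. A minimal sketch, assuming FreeBSD 9's `jls`, `jexec`, `ifconfig`, `netstat` and the `dig` that ships with the base BIND; the function names are hypothetical. Running `show_jail_network` on both the old and the new server and comparing the output shows whether the address layout really matches. Ping from inside a jail normally fails because jails have no raw sockets by default, so the in-jail test uses `dig` against a root server's IP address instead, which sends the query from the jail's own address.

```sh
#!/bin/sh
# Sketch: collect jail network facts on the FreeBSD host (run as root).
# Function names are hypothetical.

# Print "address netmask" for every IPv4 address in ifconfig output
# read from stdin, e.g. `ifconfig fxp0 | inet4_addrs`.
inet4_addrs() {
    awk '$1 == "inet" { print $2, $4 }'
}

# Show running jails (JID, IP, hostname, path), the IPv4 addresses on
# fxp0 and the IPv4 routing table, so two servers can be compared.
show_jail_network() {
    jls
    ifconfig fxp0 | inet4_addrs
    netstat -rn -f inet
}

# From inside a jail (JID as $1), ask a root server for the root NS set.
# The query leaves from the jail's own address; no name lookup is
# needed because the server is given by IP. (ping usually fails in a
# jail because raw sockets are disabled by default.)
jail_dns_check() {
    jid="$1"
    jexec "$jid" dig @198.41.0.4 . NS +norecurse
}
```

For example, run `show_jail_network > new.txt` on the new server, the same into `old.txt` on the old one, then `diff old.txt new.txt`; `jail_dns_check` takes the JID that `jls` prints for the DNS jail.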
 
kpa said:
Does the jail that hosts the DNS server have full internet access?

Yes, it has full internet access.

kpa said:
How is that jail's network set up: does it use a private or a public address, and which interface is the address bound to? If it uses a private address, what are the NAT settings?

The DNS jail's local IP is 192.168.10.211, and the physical server's address is 192.168.10.210; both are on the fxp0 interface. Our router translates our public addresses to the local addresses listed below. Below that is the DNS jail's resolv.conf.

Code:
# ifconfig fxp0
fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=b<RXCSUM,TXCSUM,VLAN_MTU>
        ether 00:04:23:88:6a:7c
        inet 192.168.10.210 netmask 0xffffff00 broadcast 192.168.10.255
        inet 192.168.10.211 netmask 0xffffffff broadcast 192.168.10.211
        inet 192.168.10.212 netmask 0xffffffff broadcast 192.168.10.212
        inet 192.168.10.213 netmask 0xffffffff broadcast 192.168.10.213
        inet 192.168.10.214 netmask 0xffffffff broadcast 192.168.10.214
        inet 192.168.10.215 netmask 0xffffffff broadcast 192.168.10.215
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active

Code:
# less /usr/jails/jail_dns/etc/resolv.conf
domain  changed.for.privacy
nameserver      192.168.10.211

Note that the above output is actually from our currently running server (I can't access the troubled one because I shut it down before coming home from work). That shouldn't matter much, though, since both machines have identical hardware. The only differences are that the new server's ifconfig fxp0 output also lists an fe80:: inet6 address and its options line is slightly different. Would that make any difference?

Also, I should explain that if I change all of the new server's IP addresses to, for example, 192.168.10.230 through 192.168.10.234, I have no connectivity problems. I assume this is because the original DNS server at 192.168.10.211 is still running. When the new server uses these addresses, all of its resolv.conf files are also set to resolve using 192.168.10.231. If I set them all not to use the old server's DNS (192.168.10.211), does that mean the old server has no effect on the new one at all? Is there any reason why it would work with these different IP addresses? I always shut down and disconnect the old server before turning the new one on, so there are no IP conflicts.
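To answer the last questions it helps to know exactly which name server the host and each jail are using at any moment. A small sketch, assuming the jails live under `/usr/jails/` as in the `less` command above; the function names are hypothetical.

```sh
#!/bin/sh
# Sketch: report the nameserver(s) configured for the host and each jail.

# Print the address from every "nameserver" line of a resolv.conf on stdin.
resolv_nameservers() {
    awk '$1 == "nameserver" { print $2 }'
}

# For the host and each jail under a root directory (default /usr/jails,
# an assumption based on this thread), print "<file>: <nameserver>".
list_resolvers() {
    root="${1:-/usr/jails}"
    for f in /etc/resolv.conf "$root"/*/etc/resolv.conf; do
        [ -r "$f" ] || continue      # skips an unmatched glob, too
        resolv_nameservers < "$f" | while read -r ns; do
            echo "$f: $ns"
        done
    done
}
```

While the new server is on the .230 to .234 test addresses, `list_resolvers | grep 192.168.10.211` lists every file that still points at the old server's DNS jail.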
 
First of all, make sure you don't have any routing issues from your new server to the Internet, e.g. ping a server on the Internet by its IP address.
Then run
# dig @192.168.10.231 some.host.name +trace
to see how far your DNS requests get.
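The two checks can be kept in order with a couple of shell functions. A sketch, assuming FreeBSD's `ping` summary format (`N packets transmitted, N packets received, ...`) and `dig` from the base system; the function names are hypothetical, and 198.41.0.4 (a root server also used later in this thread) is just an example of a host reachable by IP.

```sh
#!/bin/sh
# Sketch of the two checks: route by IP first, then DNS with +trace.

# Extract N from "... N packets received, ..." in ping's summary on stdin.
ping_received() {
    sed -n 's/.* \([0-9][0-9]*\) packets received.*/\1/p'
}

# Check 1: can we reach an Internet host by IP (no DNS involved)?
# Succeeds if at least one of three echo replies came back.
check_route() {
    ip="${1:-198.41.0.4}"
    n=$(ping -c 3 "$ip" | ping_received)
    [ "${n:-0}" -gt 0 ]
}

# Check 2: follow the delegation chain for a name via a given server.
check_trace() {
    server="$1"; name="${2:-google.com}"
    dig @"$server" "$name" +trace
}
```

`check_route && check_trace 192.168.10.231 google.com` only runs the DNS trace once the plain IP route has been shown to work.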
 
In this example, does "some.host.name" refer to the host name belonging to the IP address I pinged beforehand?

Also, I noticed that the named.root file in BIND 9.8 is slightly different from its counterpart in 9.4. Is there any chance that such a small change is preventing my server from contacting the outside world?

I will be doing as much testing as possible when I go back to work, so any other hints/commands would be greatly appreciated.
 
some.host.name refers to an arbitrary host on the Internet, e.g. google.com.

Differences in named.root don't matter. The file only contains the IP addresses of the root name servers, and named needs just one reachable address from it to fetch the current list, so even an out-of-date copy will not stop DNS from working.
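If you want to see exactly what changed between the two named.root files, list the IPv4 root hints from each and compare them. A minimal sketch, assuming the usual hint-file layout of name, TTL, optional class, type and address on one line; the function name and file paths are hypothetical.

```sh
#!/bin/sh
# Sketch: print sorted "name address" pairs for the IPv4 (A) records in
# named.root-style hint files given as arguments (or on stdin).
root_hints_v4() {
    awk '$1 !~ /^;/ {                 # skip comment lines
        for (i = 2; i < NF; i++)
            if ($i == "A") { print tolower($1), $(i + 1); break }
    }' "$@" | sort
}
```

For example: `root_hints_v4 old-named.root > old.txt; root_hints_v4 new-named.root > new.txt; diff old.txt new.txt`.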

You should also check the named log files.
 
I have checked all of the named log files that I know of; by default, I think named logs to /var/log/messages. I have since added a logging clause to named.conf to produce a more detailed log, but haven't had a chance to test it yet.
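Once the detailed logging clause is active, a quick per-category summary shows whether named logged anything above `info`. This sketch assumes the log lines carry time, category and severity in that order, as in the named log excerpts posted further down this thread. `named-checkconf` and `rndc querylog` are standard BIND tools (`rndc querylog` without an argument toggles query logging); the function names, the JID argument, and the `/etc/namedb/named.conf` path inside the jail (FreeBSD's default location) are assumptions.

```sh
#!/bin/sh
# Sketch: summarise a named log written with print-time, print-category
# and print-severity, e.g.
#   26-Mar-2012 22:34:53.185 general: info: zone ... loaded serial ...
# Prints "<category> <severity> <count>" for each combination.
log_summary() {
    awk 'NF >= 4 {
        cat = $3; sev = $4
        sub(/:$/, "", cat); sub(/:$/, "", sev)
        n[cat " " sev]++
    }
    END { for (k in n) print k, n[k] }' "$@" | sort
}

# Print only lines whose severity is notice, warning, error or critical.
log_problems() {
    awk '$4 ~ /^(notice|warning|error|critical):$/' "$@"
}

# Inside the DNS jail (JID as $1): check the config syntax, then toggle
# query logging (assumes rndc is configured in the jail).
check_named() {
    jid="$1"
    jexec "$jid" named-checkconf /etc/namedb/named.conf
    jexec "$jid" rndc querylog
}
```

For example, `log_summary /path/to/named.log` and `log_problems /path/to/named.log` (path hypothetical).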

Are there any possible explanations other than DNS for such a strange problem? As I said, I set the IP addresses to be the same, confirmed that the router settings are not the problem, and am seeing no other errors in any of my logs. It's driving me crazy. :\
 
To make sure you have Internet connectivity to the root name servers, try pinging one of them:
# ping 198.41.0.4
If that works, try to resolve a host name starting from that root name server:
# dig @198.41.0.4 google.com +trace
If you don't get a positive response, your request is blocked somewhere or the response is not getting back to you.
If you do get a positive response, named on your system should also be able to contact the root name servers.
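There is also a third outcome worth recognising. In the dig shipped with BIND 9.8, `+trace` looks up the addresses of the name servers it is referred to through the host's normal resolver (the `nameserver` in `/etc/resolv.conf`), not through the server given with `@`. So `dig: couldn't get address for 'NAME': not found` means the root server did answer, but the locally configured resolver could not resolve the root server's name. The sketch below tells these cases apart in saved `dig +trace` output; the function name is hypothetical and only the two quoted error messages are checked.

```sh
#!/bin/sh
# Sketch: classify dig +trace output (stdout and stderr) read from stdin.
trace_verdict() {
    out=$(cat)
    # Name dig failed to look up through the system resolver, if any.
    name=$(printf '%s\n' "$out" |
        sed -n "s/^dig: couldn't get address for '\([^']*\)'.*/\1/p")
    if [ -n "$name" ]; then
        echo "resolver: could not look up $name via the system resolver"
    elif printf '%s\n' "$out" | grep -q 'connection timed out'; then
        echo "blocked: no reply from the queried server"
    else
        echo "no failure message found"
    fi
}
```

dig prints this error on stderr, so capture both streams: `dig @198.41.0.4 google.com +trace 2>&1 | trace_verdict`.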
 
I ran the commands and this is what I got.

Code:
# ping 198.41.0.4
PING 198.41.0.4 (198.41.0.4): 56 data bytes
64 bytes from 198.41.0.4: icmp_seq=0 ttl=52 time=75.926 ms
64 bytes from 198.41.0.4: icmp_seq=1 ttl=52 time=102.105 ms
64 bytes from 198.41.0.4: icmp_seq=2 ttl=52 time=86.483 ms
^C
--- 198.41.0.4 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 75.926/88.171/102.105/10.754 ms

Code:
# dig @198.41.0.4 google.com +trace
; <<>> DiG 9.8.1-P1 <<>> @198.41.0.4 google.com +trace
; (1 server found)
;; global options: +cmd
.                       518400  IN      NS      c.root-servers.net.
.                       518400  IN      NS      m.root-servers.net.
.                       518400  IN      NS      j.root-servers.net.
.                       518400  IN      NS      k.root-servers.net.
.                       518400  IN      NS      e.root-servers.net.
.                       518400  IN      NS      h.root-servers.net.
.                       518400  IN      NS      a.root-servers.net.
.                       518400  IN      NS      g.root-servers.net.
.                       518400  IN      NS      f.root-servers.net.
.                       518400  IN      NS      l.root-servers.net.
.                       518400  IN      NS      i.root-servers.net.
.                       518400  IN      NS      b.root-servers.net.
.                       518400  IN      NS      d.root-servers.net.
dig: couldn't get address for 'c.root-servers.net': not found

I also checked my detailed named log and found this at the beginning:

Code:
26-Mar-2012 22:34:53.185 general: info: zone 0.0.127.in-addr.arpa/IN/inside: loaded serial 20060131
26-Mar-2012 22:34:53.218 general: info: zone 10.168.192.in-addr.arpa/IN/inside: loaded serial 2009021801

There are also entries like the following for each domain configured in our named.conf. The "inside" and "outside" labels are DNS views declared there.

Code:
26-Mar-2012 22:34:53.244 general: info: zone changed.for.privacy/IN/inside: loaded serial 2010090201
26-Mar-2012 22:34:53.544 general: info: zone changed.for.privacy/IN/outside: loaded serial 2010090201

I also tried connecting directly to one of the jails running Apache through its external IP from a computer outside the LAN, and couldn't connect. Once I reconnected the old server, the connection worked without a hitch. I can test the new server every night, so any hints as to what my next step should be?

It really looks like DNS to me, since the server can't resolve any names. But if that's the case, why can't I reach the servers directly through their external IP addresses either? We also have a separate caching DNS server running on the network; could that explain some of this? Also, I recently learned that there are two types of DNS servers: caching and authoritative. Is there something special I need to do to make BIND authoritative? I can post my named.conf if that would help.
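In BIND, a server is authoritative for every zone declared with `type master` (or `slave`) in named.conf; the "loaded serial" log lines above suggest that is already the case here. Whether an answer is authoritative can be checked from the header flags: `aa` marks an authoritative answer, and `ra` means the server offers recursion to that client. Because of the inside/outside views, results may differ between LAN and external clients. A sketch; the function names are hypothetical, and `changed.for.privacy` stands in for the real zone as elsewhere in the thread.

```sh
#!/bin/sh
# Sketch: inspect the header flags of a DNS answer.

# Print the flags from dig's ";; flags: ...;" header line on stdin.
dig_flags() {
    sed -n 's/^;; flags: \([^;]*\);.*/\1/p'
}

# Succeed if the flag given as $1 (e.g. aa or ra) is set in the dig
# output read from stdin.
has_flag() {
    dig_flags | tr ' ' '\n' | grep -qx "$1"
}

# Ask a server for a zone's SOA without recursion and print the flags.
check_auth() {
    server="$1"; zone="$2"
    dig @"$server" "$zone" SOA +norecurse | dig_flags
}
```

From the LAN, `check_auth 192.168.10.211 changed.for.privacy` should print flags including `aa` if the inside view serves that zone.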
 
This looks more like a firewall problem to me. I would recommend checking your firewall log files for anything being rejected to or from your new host.
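Before searching the firewall logs, it may help to capture DNS traffic on the new host and see whether queries from the DNS jail leave fxp0 and whether replies come back; that separates a request blocked on the way out from a response that never returns. A sketch using `tcpdump` from the FreeBSD base system, with the interface and jail address taken from the `ifconfig` output earlier in the thread; the function names are hypothetical, and the parser assumes `tcpdump -n` lines of the form `TIME IP SRC.PORT > DST.PORT: ...`.

```sh
#!/bin/sh
# Sketch: watch DNS packets to/from the DNS jail's address (run as root).

# Capture up to 50 DNS packets involving the address on the interface.
watch_dns() {
    ip="${1:-192.168.10.211}"; ifc="${2:-fxp0}"
    tcpdump -n -l -i "$ifc" -c 50 "host $ip and port 53"
}

# Count packets sent from and sent to an address in tcpdump -n output
# on stdin; prints "from N to M".
dns_directions() {
    awk -v ip="$1" '
        $2 == "IP" && index($3, ip ".") == 1 { s++ }
        $2 == "IP" && index($5, ip ".") == 1 { r++ }
        END { printf "from %d to %d\n", s, r }'
}
```

Run `watch_dns | dns_directions 192.168.10.211` while repeating the `dig` tests in another terminal so the 50-packet capture finishes. Many packets from the jail but none to it would point at the reply path (queries from LAN clients to the jail also count as "to").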
 
Well, I asked the guy who originally set up the network and server to take a look, and he couldn't figure it out either. So we have decided that the most likely culprit is some setting that is different in FreeBSD 9.

The original reason we needed to change servers is that the old one is running out of space fast. I used RAID to copy the original contents over to a larger hard drive and tried growfs, but that didn't work. I hate to sidetrack the thread with a different problem, but what is the best way to add a new partition? I guess I'll try gparted and see what happens.
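On the partition question: `growfs` can only grow a file system into space that already belongs to its partition, which may be why it did not work after copying to the larger disk. Adding a separate partition in the free space is done with `gpart(8)` and `newfs(8)`. A cautious sketch, assuming a disk or slice (hypothetical names such as `ada1`) that already has a GPT or BSD partitioning scheme with free space; it only prints the commands unless `APPLY=1` is set, because these commands destroy data if pointed at the wrong device.

```sh
#!/bin/sh
# Sketch: add a UFS partition in free space. Dry run by default:
# commands are printed, and only executed when APPLY=1.
run() {
    if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# $1: disk or slice with free space and an existing GPT/BSD scheme
# (e.g. ada1, hypothetical).
add_ufs_partition() {
    disk="$1"
    run gpart show "$disk"
    run gpart add -t freebsd-ufs "$disk"
}

# $1: new partition name as reported by gpart add (e.g. ada1p2,
# hypothetical); $2: mount point.
format_and_mount() {
    part="$1"; mnt="$2"
    run newfs -U "/dev/$part"        # -U enables soft updates
    run mkdir -p "$mnt"
    run mount "/dev/$part" "$mnt"
}
```

Run `add_ufs_partition ada1` first to review the commands, then `APPLY=1 add_ufs_partition ada1`; `gpart add` reports the new partition's name, which is then passed to `format_and_mount`.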

Thanks for the help. ;)
 