Help with zabbix

Hi,

I installed Zabbix to monitor our production server, and my mailbox is flooded with emails from Zabbix saying:
Code:
OK: Zabbix agent on production.mydomain.co.uk is unreachable for 5 minutes

Trigger: Zabbix agent on production.mydomain.co.uk is unreachable for 5 minutes
Trigger status: OK
Trigger severity: Average
Trigger URL:

Item values:

1. Agent ping (production.mydomain.co.uk:agent.ping): Up (1)
2. *UNKNOWN* (*UNKNOWN*:*UNKNOWN*): *UNKNOWN*
3. *UNKNOWN* (*UNKNOWN*:*UNKNOWN*): *UNKNOWN*

Original event ID: 5917

We have several third-party monitoring tools that check every minute whether the website is running, and none of them have been triggered. This makes me believe that either
1. the Zabbix server has a networking issue, or
2. I misconfigured something.

The Zabbix server runs on a VPS in the USA and the production server is in the UK. Could the geographical location cause this problem?
 
Just a thought: I also monitor another (backup) server, also in the UK, from the same Zabbix server, and I do not get this message for it.
 
You need to open port 10050 to allow the server to contact the agent.

You can test things on the 'agent' machine: zabbix_agentd -t agent.ping
This should work and respond; if it does, the agent is configured correctly. Make sure Server and ServerActive are correctly configured in zabbix_agentd.conf.
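
A minimal sketch for double-checking those two parameters on the agent machine. The helper name conf_value is made up for illustration, and the config path in the usage comment is an assumption (a typical FreeBSD package location); adjust it to your install.

```shell
#!/bin/sh
# Print the value of every uncommented "KEY=VALUE" line for KEY in a
# Zabbix-style config file. conf_value is a hypothetical helper name.
conf_value() {
    # $1 = config file, $2 = parameter name (e.g. Server, ServerActive)
    awk -v key="$2" '
        /^[ \t]*#/ { next }                  # skip comment lines
        index($0, "=") > 0 {
            k = substr($0, 1, index($0, "=") - 1)
            v = substr($0, index($0, "=") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)   # trim whitespace around the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim whitespace around the value
            if (k == key) print v
        }
    ' "$1"
}

# Example (path is an assumption):
#   conf_value /usr/local/etc/zabbix32/zabbix_agentd.conf Server
#   conf_value /usr/local/etc/zabbix32/zabbix_agentd.conf ServerActive
```

Both values should contain the address the Zabbix server connects from.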

Then on the Zabbix server: zabbix_get -I <IP of zabbix host> -s <host to check> -k agent.ping
Set the -I parameter if the Zabbix host is multihomed.

It is also vitally important that the hostname of the checked host is correctly entered in Zabbix.
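
To separate a firewall problem from an agent problem, it can help to check the TCP port from the Zabbix server before running zabbix_get. This is only a sketch: agent_port_open is a made-up helper name, it assumes an nc(1) that supports -z and -w, and the address in the usage comment is a placeholder.

```shell
#!/bin/sh
# Return 0 if a TCP connection to HOST:PORT succeeds within 5 seconds.
# Assumes an nc(1) that supports -z (connect only, send no data) and
# -w (timeout in seconds).
agent_port_open() {
    # $1 = agent host or IP, $2 = port (defaults to the agent port 10050)
    nc -z -w 5 "$1" "${2:-10050}" >/dev/null 2>&1
}

# Example, run on the Zabbix server (203.0.113.10 is a placeholder IP):
#   if agent_port_open 203.0.113.10; then
#       zabbix_get -s 203.0.113.10 -k agent.ping
#   else
#       echo "port 10050 not reachable, check the firewall on the agent side"
#   fi
```

Using the IP address here sidesteps name resolution, so a failure points at the network or firewall rather than at DNS.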
 
In /etc/pf.conf I have the following
Code:
pass in log on $ExtIf inet proto tcp from 206.95.xx.xx to any port 10050
I do also receive data from the production server, so the production and Zabbix servers are talking to each other:
Code:
    production.mydomain.co.uk    CPU (10 Items)
            Context switches per second    2018-05-14 13:47:07    6.22 Ksps    +1.48 Ksps    Graph
            CPU idle time    2018-05-14 13:47:09    73 %    -4.55 %    Graph
            CPU interrupt time    2018-05-14 13:47:12    0.1 %    +0.01 %    Graph
            CPU nice time    2018-05-14 13:53:10    0.11 %    -0.01 %    Graph
            CPU system time    2018-05-14 13:53:11    5.71 %    -0.84 %    Graph
            CPU user time    2018-05-14 13:53:12    17.99 %    -3.69 %    Graph
            Interrupts per second    2018-05-14 13:52:03    208 ips    -129 ips    Graph
            Processor load (1 min average per core)    2018-05-14 13:52:23    0.25    -0.05    Graph
            Processor load (5 min average per core)    2018-05-14 13:52:06    0.25        Graph
            Processor load (15 min average per core)    2018-05-14 13:52:04    0.24        Graph
 
Check whether the server has enough pollers and such running; I've seen this happen when there aren't enough of them. Also, try restarting the Zabbix server process: service zabbix_server stop, then service zabbix_server start. Depending on the amount of historical data, this can take a really long time, long enough for the time-outs to happen. Leave the server running for a while and those errors should disappear once the checks are running again.
 
zabbix34/zabbix_server.conf
Code:
StartPollers=10
I currently monitor 2 FreeBSD servers and 1 pfSense box.
FreeBSD 1 has 5 jails
FreeBSD 2 has 20 jails

What would be the correct settings for pollers?
 
Check your queues. You'll want them all at 0 delay, ideally.

But I'd say, 10 is a bit overkill. I think I configured 7 pollers to monitor 25 physical hosts.
 
The queues are all green (see attached).
Any other ideas why I keep getting the email notifications?
Also, what does this error mean?
Code:
2. *UNKNOWN* (*UNKNOWN*:*UNKNOWN*): *UNKNOWN*
# zabbix_agentd -t agent.ping
Code:
agent.ping                                    [u|1]
Server and ServerActive are correctly configured in /usr/local/etc/zabbix32/zabbix_agentd.conf
But that just made me realise that the agent is zabbix32, not zabbix34... is that an issue?
zabbix_get -I <IP of zabbix host> -s <host to check> -k agent.ping
Code:
zabbix_get [31243]: Get value error: cannot resolve [production.mydomain.co.uk]
For <IP of zabbix host> I typed the IP of the Zabbix server. Is that correct?
Do I need to edit /etc/hosts to resolve the name?
 

Attachments

  • zabbix-queue.PNG
But that just made me realise that the agent is zabbix32, not zabbix34... is that an issue?
Ideally you want to have the same versions but it's fine if the agent is older. During the transition I had a 3.4 server running but most of the agents were still at 2.2.
For <IP of zabbix host> I typed the IP of the Zabbix server. Is that correct?
Yes, that's fine.
Do I need to edit /etc/hosts to resolve the name?
Well, I recommend fixing the name resolution, preferably with DNS. The hostname of the agent machine is used in Zabbix to link the data to the correct host, so it's imperative that these are correct.
 
I recommend fixing the name resolution, preferably with DNS. The hostname of the agent machine is used in Zabbix to link the data to the correct host, so it's imperative that these are correct.
How would I do that? Are we talking about local DNS (Unbound), or the DNS I use to redirect traffic (dnsmadeeasy.com)?
 
Are we talking about local DNS (Unbound)?
Assuming the Zabbix server and the hosts are all local, yes, I'm talking about local DNS. A local DNS (it doesn't matter which DNS service) is preferred because it's the least amount of work: adding hosts to a local DNS makes sure everything on the local network can resolve the names. You could do it with hosts files of course, but then you have to make sure all hosts have a complete hosts file, which is more work to keep in sync.
 
The Zabbix server is in the USA and the Zabbix agent is in the UK. Would you suggest I add production.mydomain.co.uk pointing to my public IP?
 
As you only have two servers, putting it in /etc/hosts will be fine. Adding entries to hosts is a good solution, but it scales poorly as the number of servers increases.
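
A rough sketch of adding such an entry without creating duplicates. The helper name ensure_host is made up for illustration, and 203.0.113.10 in the usage comment is only a placeholder for the production server's public IP.

```shell
#!/bin/sh
# Append "IP<TAB>NAME" to a hosts-format file unless NAME already appears
# as a hostname on an uncommented line. ensure_host is a hypothetical helper.
ensure_host() {
    # $1 = hosts file, $2 = IP address, $3 = hostname
    if awk -v name="$3" '
        /^[ \t]*#/ { next }                                  # skip comments
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"; then
        echo "$3 already present in $1"
    else
        printf '%s\t%s\n' "$2" "$3" >> "$1"
        echo "added $3 to $1"
    fi
}

# Example, run as root on the Zabbix server (the IP is a placeholder):
#   ensure_host /etc/hosts 203.0.113.10 production.mydomain.co.uk
```

After that, zabbix_get -s production.mydomain.co.uk -k agent.ping on the Zabbix server should no longer fail with "cannot resolve".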
 