Summary:
I recently installed Network UPS Tools (sysutils/nut) and have been experiencing issues with the USB driver losing communication to the UPS, resulting in stale data and connection failure errors. On system boot, NUT will start and operate correctly, but loses communication to the UPS after a few hours. Once communication is dead, attempts to restart the NUT service fail. The only means of recovery is to reboot or unplug and reconnect the UPS USB cable.
System/Software:
FreeBSD 13.2-p4 and NUT version 2.8.0_24 installed from the package repository.
UPS:
APC Smart-UPS 750
Setup:
FreeBSD runs in a virtual machine on VMware ESXi.
UPS connects to the ESXi host by USB cable and to the FreeBSD guest by USB pass-through.
ESXi host has NUT client software installed and communicates to the FreeBSD NUT server to shutdown guests and the host on power failure.
Details:
I had this setup (same server and UPS) running NUT on a CentOS 6 virtual machine and worked perfectly for years without UPS communication issues. When I moved the NUT server to FreeBSD I copied the working configuration (settings changed from defaults) from CentOS.
Settings applied in ups.conf:
Settings applied in upsd.conf:
Initial attempts to start the NUT service after configuration failed:
Apparently, this is caused by the kernel USB HID driver not detaching from the device and preventing NUT from binding to it. To overcome this I added the following to /boot/loader.conf:
The vendor and product ID come from usbconfig. After a reboot, NUT has correct permissions on the device and upsd starts successfully.
Output of usbconfig (when everything is working):
When the USB driver loses communication after a few hours, upsd marks the data as stale, and the following appears in the system log:
On loss of communication, upsmon is configured to raise a notify event (nocomm or commbad) to the scheduler, and a 30-second timer will restart the NUT service via the upssched-cmd script. This is how it was on CeotOS. But the service restart fails here because the driver refuses the connection, as shown in the system log:
This goes on and on until I unplug and plug in the UPS again.
I updated the UPS firmware, but this did not help the problem.
Searching for solutions on the Internet, I initially tried increasing the timeout threshold by adding to upsd.conf:
and to upsmon.conf:
This made no improvement. I then additionally extended the poll interval to 10 seconds in ups.conf:
This seems to have stopped the failures and NUT has now run continuously for over a week without a communication error. But that's only half the problem; if communication does fail for some reason, there is no automated recovery.
During my testing I discovered it was possible to reset a device with usbconfig. So, when NUT failed I could:
and this worked. So I changed part of the upssched-cmd script to:
It's a bit ugly, and not yet tested for real by somehow creating a failure, but it's the best workaround I have.
But I still see this as an issue with the usbhid-ups driver or libusb, or both. Perhaps the NUT maintainer can look into it.
Alternatives:
I also considered switching to a serial interface port/cable and using the apcsmart driver, but the driver list suggests this has a lower support level. Without going through the hassle of changing the setup I don't know if I would see the same or less functionality and data. If anyone has knowledge of this it would be useful. I note the USB driver does not report the load or in/out voltage, which is disappointing. I don't know if that's a limitation of the UPS or the USB driver.
I recently installed Network UPS Tools (sysutils/nut) and have been experiencing issues with the USB driver losing communication to the UPS, resulting in stale data and connection failure errors. On system boot, NUT will start and operate correctly, but loses communication to the UPS after a few hours. Once communication is dead, attempts to restart the NUT service fail. The only means of recovery is to reboot or unplug and reconnect the UPS USB cable.
System/Software:
FreeBSD 13.2-p4 and NUT version 2.8.0_24 installed from the package repository.
UPS:
APC Smart-UPS 750
Setup:
FreeBSD runs in a virtual machine on VMware ESXi.
UPS connects to the ESXi host by USB cable and to the FreeBSD guest by USB pass-through.
ESXi host has NUT client software installed and communicates to the FreeBSD NUT server to shutdown guests and the host on power failure.
Details:
I had this setup (same server and UPS) running NUT on a CentOS 6 virtual machine and worked perfectly for years without UPS communication issues. When I moved the NUT server to FreeBSD I copied the working configuration (settings changed from defaults) from CentOS.
Settings applied in ups.conf:
Code:
[smt750]
driver = usbhid-ups
port = auto
desc = "APC Smart-UPS SMT750I"
offdelay = 180
ondelay = 240
Settings applied in upsd.conf:
Code:
LISTEN 127.0.0.1 3493
LISTEN <the.vm.ip.address> 3493
Initial attempts to start the NUT service after configuration failed:
Code:
# service nut start
Network UPS Tools - UPS driver controller 2.8.0
Network UPS Tools - Generic HID driver 0.47 (2.8.0)
USB communication driver (libusb 1.0) 0.43
interrupt pipe disabled (add 'pollonly' flag to 'ups.conf' to get rid of this message)
Can't claim USB device [051d:0003]@0/0: Other error
Driver failed to start (exit status=1)
Code:
usb_quirk_load="YES"
hw.usb.quirk.0="0x051d 0x0003 0 0xffff UQ_HID_IGNORE"
Code:
# ll /dev/usb
crw-rw---- 1 root nut 0x84 Nov 9 17:51 0.4.0
Output of usbconfig (when everything is working):
Code:
# usbconfig
ugen0.4: <American Power Conversion Smart-UPS 750 FW:UPS 15.0 / ID18> at usbus0, cfg=0 md=HOST spd=FULL (12Mbps) pwr=ON (2mA)
# usbconfig -d ugen0.4 dump_device_desc
bLength = 0x0012
bDescriptorType = 0x0001
bcdUSB = 0x0200
bDeviceClass = 0x0000 <Probed by interface class>
bDeviceSubClass = 0x0000
bDeviceProtocol = 0x0000
bMaxPacketSize0 = 0x0040
idVendor = 0x051d
idProduct = 0x0003
bcdDevice = 0x0106
iManufacturer = 0x0001 <American Power Conversion >
iProduct = 0x0002 <Smart-UPS 750 FW:UPS 15.0 / ID=18>
iSerialNumber = 0x0003 <xxxxxxxxxxxx >
bNumConfigurations = 0x0001
When the USB driver loses communication after a few hours, upsd marks the data as stale, and the following appears in the system log:
Code:
upsd[3132]: Data for UPS [smt750] is stale - check driver
upsmon[5991]: Poll UPS [smt750@localhost] failed - Data stale
upsmon[5991]: Communications with UPS smt750@localhost lost
On loss of communication, upsmon is configured to raise a notify event (nocomm or commbad) to the scheduler, and a 30-second timer will restart the NUT service via the upssched-cmd script. This is how it was on CeotOS. But the service restart fails here because the driver refuses the connection, as shown in the system log:
Code:
upssched-cmd[34596]: Timer event comm-bad triggered: restarting NUT service
upsd[3132]: mainloop: Interrupted system call
upsmon[5991]: Poll UPS [smt750@localhost] failed - Write error: Broken pipe
upsmon[5991]: UPS [smt750@localhost]: connect failed: Connection failure: Connection refused
usbhid-ups[2912]: libusb1: Could not open any HID devices: insufficient permissions on everything
upsmon[5991]: UPS [smt750@localhost]: connect failed: Connection failure: Connection refused
root[37497]: /usr/local/etc/rc.d/nut: WARNING: failed precmd routine for nut
upsmon[5991]: UPS [smt750@localhost]: connect failed: Connection failure: Connection refused
upsmon[5991]: UPS smt750@localhost is unavailable
I updated the UPS firmware, but this did not help the problem.
Searching for solutions on the Internet, I initially tried increasing the timeout threshold by adding to upsd.conf:
Code:
MAXAGE 30
Code:
DEADTIME 30
Code:
pollinterval = 10
During my testing I discovered it was possible to reset a device with usbconfig. So, when NUT failed I could:
Code:
# service nut stop
# usbconfig -d ugen0.4 reset
# service nut start
Code:
case $1 in
...
no-comm|comm-bad)
if [ ! -z "`pgrep upsd`" ]; then
$LOGGER "Timer event $1 triggered: restarting NUT upsd service"
service nut stop
DEVICE=`usbconfig | grep $ID | sed 's/:.*//'`
if [ ! -z $DEVICE ]; then
usbconfig -d $DEVICE reset
service nut start
printf "%s\n" "Host $HOSTNAME UPS communication failure on $NOW" \
"The service has been restarted." | \
$MAIL -s "UPSMON: Communication Failure" $SYSADMIN
fi
fi
;;
...
esac
But I still see this as an issue with the usbhid-ups driver or libusb, or both. Perhaps the NUT maintainer can look into it.
Alternatives:
I also considered switching to a serial interface port/cable and using the apcsmart driver, but the driver list suggests this has a lower support level. Without going through the hassle of changing the setup I don't know if I would see the same or less functionality and data. If anyone has knowledge of this it would be useful. I note the USB driver does not report the load or in/out voltage, which is disappointing. I don't know if that's a limitation of the UPS or the USB driver.