HP ProLiant ML150 G6 server with FreeBSD: Internet keeps disconnecting, available memory seems too low, and corrupted ZFS RAID 1

Hi there,

I'm new to FreeBSD and have these 3 problems on my HP ProLiant ML150 G6, which is an old machine. I installed the latest version of FreeBSD amd64 (FreeBSD 13.1) and ran pkg upgrade (I hadn't upgraded before creating the ZFS RAID 1, just mentioning it in case it matters).

Internet disconnecting

For my Internet connection problem: after a certain time, I can no longer use the Internet, even though DHCP is still active and the bge0 status is active. So far, the only thing I've found that brings the Internet back is rebooting. I've seen an older post from someone who seems to have had the same issue, but there is no answer at the end:


I don't know if the answer is somewhere else. I've added these lines to /etc/rc.conf without any success:

Code:
gateway_enable="YES"
ifconfig_bge0="DHCP"
local_unbound_enable="YES"

I've tried these commands as suggested here, except for the final solution, because I don't understand it. I'm not sure it is related to my problem:


I didn't try to install the outbound package because I don't know what it is.
While writing this post, I found that dmesg shows this (while the connection is still working):
Code:
lo0: link state changed to UP
bge0: link state changed to DOWN
(...)
bge0: link state changed to UP
After the connection dropped, dmesg repeatedly shows:
Code:
bge0: watchdog timeout -- resetting
bge0: link state changed to DOWN
bge0: link state changed to UP
[...]
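A repeating "bge0: watchdog timeout -- resetting" usually means the driver's transmit watchdog fired and the driver reset the chip. One way to check whether those resets line up with each outage is to count them in the kernel messages. The helper below is a minimal sketch; count_bge_resets is a hypothetical name, not a FreeBSD tool, and it only reads text, so it is safe to run. As a possible stopgap instead of rebooting, service netif restart bge0 followed by service routing restart may bring the link back, though that has not been tested on this machine.

```shell
# Hypothetical helper: count "watchdog timeout" lines for one interface
# in text read from stdin (defaults to bge0).
# Usage on the server: dmesg | count_bge_resets bge0
count_bge_resets() {
    iface=${1:-bge0}
    # grep -c prints the number of matching lines; it prints 0 but exits
    # with status 1 when nothing matches, so "|| true" keeps the status 0.
    grep -c "^${iface}: watchdog timeout" || true
}
```

Running it a few times while the link is up and again after it drops shows whether the count grows with each outage.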

RAM available

When I look at the available RAM, I get ~10000 MB real memory but only ~1900 MB available. I have 3× 2 GB + 1× 4 GB RAM modules. The 4 GB module wasn't working in a previous Windows server, so I suppose it isn't compatible for some reason (it is a newer one), but the 2 GB modules all worked normally. Since I don't really have anything running, I assume it isn't normal to have only ~1900 MB available; I should have ~5000 MB.

RAID 1

I'm trying to put 2 drives in a ZFS RAID 1. I used this command: zpool create storage raidz1 ada0 ada3. When I check zpool list, it shows the space of ada0 + ada3 combined instead of half of it (as with a regular RAID 1 mirror). Are my 2 drives really in RAID 1? How can I verify?

In dmesg, I see:
Code:
GEOM: ada0: the primary GPT table is corrupt or invalid.
GEOM: ada0: using the secondary instead -- recovery strongly advised.
GEOM: ada3: the primary GPT table is corrupt or invalid.
GEOM: ada3: using the secondary instead -- recovery strongly advised.
I suppose this is not good, even though I can cd into the storage pool and add files without any problem. Note that I'm booting from a smaller disk that isn't part of the RAID. The ML150 spec sheet mentions a storage limit of ~2 TB, but as I understand it, since I boot from a 250 GB drive, this shouldn't be an issue.

Edit: It actually seems I never really got connected to the Internet (only locally), although I thought I had run pkg upgrade successfully.
 
Quote:
I'm trying to put 2 drives in a ZFS RAID 1. I used this command: zpool create storage raidz1 ada0 ada3. When I check zpool list, it shows the space of ada0 + ada3 combined instead of half of it (as with a regular RAID 1 mirror). Are my 2 drives really in RAID 1? How can I verify?
raidz1 is not RAID 1. You have to use the keyword mirror to create a RAID 1.
See: zpool-create(8) and zpoolconcepts(7)
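The numbers in the question are what ZFS reports for RAID-Z: zpool list shows the raw pool size, which for a raidz vdev includes the parity space, while for a mirror it shows the size of a single disk. The sketch below does that arithmetic for n equal disks. zfs_capacity is a hypothetical helper, sizes are in GB, and metadata, slop space and RAID-Z padding are ignored, so the results are only rough estimates.

```shell
# Hypothetical helper: rough capacity for n equal disks of size_gb each.
# Prints "<layout> raw=<what zpool list shows> usable=<approx. space for data>".
# Overhead (metadata, slop space, RAID-Z padding) is ignored.
zfs_capacity() {
    layout=$1 n=$2 size_gb=$3
    case $layout in
        mirror)
            # Every disk holds a full copy, so the pool is one disk in size.
            echo "mirror raw=$size_gb usable=$size_gb" ;;
        raidz1)
            # zpool list counts the parity; one disk's worth goes to parity.
            echo "raidz1 raw=$((n * size_gb)) usable=$(((n - 1) * size_gb))" ;;
        *)
            echo "unknown layout: $layout" >&2
            return 1 ;;
    esac
}
```

For two hypothetical 2000 GB disks, zfs_capacity raidz1 2 2000 prints raidz1 raw=4000 usable=2000, and zfs_capacity mirror 2 2000 prints mirror raw=2000 usable=2000. For an existing pool, zfs list shows the usable figure.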

If you use this machine as a workstation, you only need ifconfig_bge0="DHCP" in /etc/rc.conf.
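A minimal /etc/rc.conf sketch for a machine that only needs to be a DHCP client on the onboard port. The other two lines from the original post are kept commented out to show what they do: gateway_enable turns on IPv4 packet forwarding so the machine can route between interfaces, and local_unbound_enable starts a local caching DNS resolver. Neither should be needed just to reach the Internet.

```shell
# /etc/rc.conf (sketch): plain DHCP client on the onboard NIC
ifconfig_bge0="DHCP"

# Only needed if this machine forwards traffic between interfaces (a router):
#gateway_enable="YES"
# Only needed if you want a local caching DNS resolver:
#local_unbound_enable="YES"
```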

The disconnections of your Ethernet card make me think of a driver bug (or a hardware problem). Maybe someone knows a workaround. However, if I were you, I would use another card (preferably an Intel one) if possible.

You should try removing the 4 GB module to see if things improve.
 
Thanks a lot!
So I removed the possibly incompatible 4 GB module, and now I have this:
real memory = 6442450944 (6144 MB)
avail memory = 4088684544 (3899 MB)

I don't know whether these numbers make sense, but removing the 4 GB module definitely helped.
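To compare the two figures without converting bytes by hand, the relevant lines can be read from the boot messages, which FreeBSD keeps in /var/run/dmesg.boot. The sketch below uses a hypothetical helper name, mem_summary, and divides by 1048576, which matches the MB figures the kernel prints in parentheses.

```shell
# Hypothetical helper: summarize the "real memory" / "avail memory" boot lines.
# Usage on FreeBSD: mem_summary < /var/run/dmesg.boot
mem_summary() {
    awk '
        /^real memory/  { real  = $4 }   # 4th field is the byte count
        /^avail memory/ { avail = $4 }
        END {
            if (real == "" || avail == "") exit 1   # lines not found
            r = int(real / 1048576); a = int(avail / 1048576)
            printf "real=%d MiB avail=%d MiB gap=%d MiB\n", r, a, r - a
        }'
}
```

Fed the two lines quoted above, it prints real=6144 MiB avail=3899 MiB gap=2245 MiB.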

As for the Internet, I was actually able to download some apps, but now I can't anymore. I can only SSH in; ping and pkg don't work. As for the freeze, I suppose it is still there. I've got a couple of other Ethernet ports; maybe one of them works, so I'll give it a try.

For the drives, I removed the pool and rebuilt it with mirror. It seems to work fine, although I still get this message:
Code:
GEOM: ada0: the primary GPT table is corrupt or invalid.
GEOM: ada0: using the secondary instead -- recovery strongly advised.
GEOM: ada3: the primary GPT table is corrupt or invalid.
GEOM: ada3: using the secondary instead -- recovery strongly advised.
Should I do something about it?
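For the earlier "how can I verify?" question: zpool status shows the vdev layout, and a two-way mirror appears as a mirror-0 line with both disks indented beneath it, whereas the first pool would have shown raidz1-0 instead. The helper below is a hypothetical sketch that extracts those vdev type names so the check can be scripted.

```shell
# Hypothetical helper: print the top-level vdev types found in
# `zpool status` output (mirror, raidz1, raidz2, raidz3), one per line.
# Usage on the server: zpool status storage | vdev_types
vdev_types() {
    awk '$1 ~ /^(mirror|raidz[1-3])-[0-9]+$/ {
        sub(/-[0-9]+$/, "", $1)   # "mirror-0" -> "mirror"
        print $1
    }'
}
```

For the rebuilt pool, zpool status storage | vdev_types should print mirror.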
 
Having a spare Ethernet card is a very sensible option.
I have several spare genuine Intel Ethernet cards that I got, used, off eBay, for not a lot of money.
I also suspect that cards with Intel chips from quality vendors like StarTech would work.
Just watch the PCIe lane footprint: a single 10/100/1000 card should not need more than an x1 slot.
 
OK, so I ended up buying an Intel PCIe x1 card with 2 RJ-45 ports. I hope it arrives soon.
In the meantime, I will try USB tethering with my phone to get Internet access. Thanks a lot for these tips, they're very much appreciated!
 
Quote:
For the drives, I removed the pool and rebuilt it with mirror. It seems to work fine, although I still get this message:
GEOM: ada0: the primary GPT table is corrupt or invalid.
GEOM: ada0: using the secondary instead -- recovery strongly advised.
GEOM: ada3: the primary GPT table is corrupt or invalid.
GEOM: ada3: using the secondary instead -- recovery strongly advised.

Should I do something about it?

Yes.
Code:
gpart recover ada0
gpart recover ada3

See here: gpart(8)
 
Don't use gpart recover!

ZFS was configured to use the whole drive, which overwrote the partition table that existed on the disk, corrupting it. Restoring the partition table would overwrite the ZFS metadata (they occupy the same place on disk) and destroy the pool.

I would destroy this pool and start over, as it's not a mirror (RAID-Z is similar to RAID 5). Then either make sure there's no existing partition table (gpart destroy) if you want to use the whole drives, or create a freebsd-zfs partition on each drive and add the partitions, not the whole drives, to the pool.
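The partition-based option could look like the sequence below. It is a sketch under assumptions: the pool is named storage, the data disks are ada0 and ada3, and the GPT labels zdisk0 and zdisk1 are hypothetical names you can change. Every step destroys data, so the script only prints the commands unless RUN=1 is set; compare the printed plan with gpart show and zpool status before doing that. The zpool labelclear step is an extra precaution not mentioned in the thread, meant to wipe leftover ZFS labels from the old whole-disk pool.

```shell
#!/bin/sh
# Sketch: rebuild "storage" as a two-way mirror on labelled GPT partitions.
# DESTRUCTIVE when RUN=1: it wipes ada0 and ada3. Default is a dry run.
POOL=storage
DISKS="ada0 ada3"        # assumption: the two data disks

run() {
    echo "+ $*"                                   # always show the command
    if [ "${RUN:-0}" = 1 ]; then "$@" || exit 1; fi
}

rebuild_mirror() {
    run zpool destroy "$POOL"
    members=""
    i=0
    for d in $DISKS; do
        run zpool labelclear -f "$d"              # leftover whole-disk ZFS labels
        run gpart destroy -F "$d"                 # drop the old/broken GPT
        run gpart create -s gpt "$d"
        # One freebsd-zfs partition filling the disk, 1 MiB aligned, with a
        # GPT label so it also appears as /dev/gpt/zdiskN.
        run gpart add -a 1m -t freebsd-zfs -l "zdisk$i" "$d"
        members="$members gpt/zdisk$i"
        i=$((i + 1))
    done
    # $members is left unquoted on purpose: each label becomes one argument.
    run zpool create "$POOL" mirror $members
}
```

Saved as a file (for example rebuild.sh) and loaded with . ./rebuild.sh, rebuild_mirror prints the plan, and RUN=1 rebuild_mirror executes it. Labelled partitions also make the disks easy to identify later with gpart show -l.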
 
Thanks, that does exactly what I wanted!

I'm almost ready to go, but I still can't connect to the Internet. I'm trying USB tethering, but I don't really know where to start to set it up properly.

I have unplugged the Ethernet cable to make this easier and removed DHCP from rc.conf so I can launch it myself. I can't SSH in with the IP I get from ifconfig. I've tried changing the default route to the new IP address, without any success. I don't know what I'm supposed to do from here.
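Assuming the phone is an Android device sharing its connection over USB (RNDIS), FreeBSD drives it with urndis(4) and the link normally appears as ue0 (check with ifconfig -l). Two things follow. The address ue0 gets lives on the phone's own private network, so other machines on your LAN generally cannot SSH to it. And dhclient ue0 should install the default route from the phone's DHCP answer, so there should be no need to set it by hand. For a one-off test, kldload if_urndis followed by dhclient ue0 should be enough; the rc.conf lines below are a sketch for making it persistent, with the module and interface names as assumptions to verify.

```shell
# /etc/rc.conf additions (sketch), assuming an Android phone in USB tethering mode.
# If urndis(4) is already compiled into the kernel, the kld_list line is unnecessary.
kld_list="if_urndis"   # load the USB RNDIS driver at boot
ifconfig_ue0="DHCP"    # dhclient configures ue0 and installs the default route
```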
 
Quote:
Don't use gpart recover!

ZFS was configured to use the whole drive, which overwrote the partition table that existed on the disk, corrupting it. Restoring the partition table would overwrite the ZFS metadata (they occupy the same place on disk) and destroy the pool.

I would destroy this pool and start over, as it's not a mirror (RAID-Z is similar to RAID 5). Then either make sure there's no existing partition table (gpart destroy) if you want to use the whole drives, or create a freebsd-zfs partition on each drive and add the partitions, not the whole drives, to the pool.
Good catch, I didn't see that. However, note that you cannot destroy the pool: gpart answers "Invalid argument".

Yes, never use a whole disk to make a pool. Create a freebsd-zfs partition and use it for the pool, even if this partition takes up the whole disk. You will have far fewer problems.
 

Really? I removed the pool and the partition tables from both drives, then added the drives to a new pool with mirror; I no longer get the error message and everything seems fine. What do you mean when you say I would have far fewer problems if I created a partition first and added that to the pool?
 
Quote:
Yes, never use a whole disk to make a pool.
There's nothing wrong with using the whole disk. You just have to make sure there's no partition table left on the disk before you add it to the pool, or else you'll keep getting nagged about a "corrupt" partition table on disk adXX. The partition table and the ZFS metadata simply occupy the same space on the disk; it's not a major issue, just something to be aware of.

For identification purposes, for example if you move one of those disks to a different system, it's useful to be able to 'see' that there's a freebsd-zfs partition on it. You can also put a label on the partition, which makes it even easier to identify.
 
Many programs, including but not limited to gpart, don't understand ZFS and may suggest or do strange things. It's better to encapsulate ZFS in a partition scheme.
 