Hard drive activity freezes the whole system

Noner · Apr 28, 2022

I have:
FreeBSD 12.3 amd64, 2x Intel Xeon E5-2690, X79 chipset, 2x 1 TB HDD in a mirror (gmirror label -vb round-robin gm0)
The mirror carries a GPT layout with boot, UFS, and swap partitions. While a large file is being copied, all other disk reads and writes across the entire system freeze almost completely. I have never seen this on my servers before, but until now they all ran 8.4. I have now upgraded, and it has become almost impossible to work. Why is this happening, and is there any way to fix it?

PS: the kernel has the standard set of VirtualBox modules loaded: vboxdrv.ko vboxnetadp.ko vboxnetflt.ko

cracauer@ · Apr 28, 2022

So does it stay frozen or does it recover after a while?

Anything in `dmesg`?

Noner · Apr 28, 2022

cracauer@ said:
dmesg

Nothing. Silence. The slowdown happens only during disk activity and affects only disk access. It behaves like a single-tasking system: while a copy is running, only that one process gets the disk, and everything else accesses it at the speed of a drunken turtle. Sometimes software also slows down during the copy in ways that are not obviously tied to the disk. But as soon as the file has been copied, everything is as fast as before.

cracauer@ · Apr 29, 2022

Maybe there is simply not enough RAM left after your virtual machines?

You can check swapouts in `vmstat 3` during copy.
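
A rough way to watch for this, assuming "swapouts" here means the `po` (pages paged out) column of `vmstat`. The `flag_pageouts` helper is a hypothetical name, not a system tool; it locates the column by its header name rather than by a fixed position:

```shell
# Print a line for every vmstat sample that shows page-outs.
# The "po" column index is taken from the header row that starts with "r".
flag_pageouts() {
  awk '
    $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "po") col = i; next }
    col && $col ~ /^[0-9]+$/ && $col > 0 { print "pageouts:", $col }
  '
}

# Usage on the server while the copy runs (Ctrl-C to stop):
#   vmstat 3 | flag_pageouts
```

No output during the copy would suggest that paging is not the cause.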

Noner · Apr 29, 2022

cracauer@ said:
swapouts

No. This server is new and the load is gradually being transferred to it. At the moment, there is still a lot of free memory.

Phishfry · Apr 29, 2022

Noner said:
round-robin

This could be your problem. You don't want to use round robin for gmirror.

Are your drives the same model?

Phishfry · Apr 29, 2022

Noner said:
gmirror label -vb round-robin gm0

This syntax is incorrect. You need to label the drives, not gm0:
`gmirror label -v gm0 /dev/ada0 /dev/ada1`

Show us `gmirror status` or `gmirror list`.

Phishfry · Apr 29, 2022

I did miss that this was happening under virtualbox.

My experience with round robin gmirror was not positive.
Give the default 'load' setting a try. See if it helps.

Noner · Apr 29, 2022

Phishfry said:
round robin

It was once considered the fastest algorithm. Has something changed now? The drives are the same model from one batch, fully identical.

Phishfry said:
/dev/ada0

The syntax is fine, I just didn't finish writing the command here.
# gmirror list
Geom name: gm0
State: COMPLETE
Components: 2
Balance: round-robin

Phishfry said:
this was happening under virtualbox

No. This is a server that runs VirtualBox, not a virtual server. As I wrote, the VirtualBox modules are loaded into the server's kernel. There are 3 virtual workstations running on it (vboxheadless).

Phishfry said:
Give the default 'load' setting

Oh... Can this be done on an existing RAID? I can't rebuild everything from scratch right now; there wouldn't be enough disk space to move the data. I've been using round-robin since 2012 (I think), and it has worked fine so far. True, the newest version I ran before this upgrade was FreeBSD 8.3 :)

Phishfry · Apr 30, 2022

change gmirror balance algorithm

Hello all Is it possible to change the gmirror balance algorithm without destroying the mirror and do a rebuild. I use roundrobin, but want to see if load gives me some better performance. regards, Johan

forums.freebsd.org
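
On the question of doing this on an existing mirror: gmirror(8) documents a `configure` subcommand whose `-b` option sets the balance algorithm, so a relabel or rebuild should not be needed. A minimal sketch, assuming the mirror is named gm0 as in the `gmirror list` output above. The `run` wrapper, the `set_balance` function, and the `DRY_RUN` variable are hypothetical helpers for this example only:

```shell
# With DRY_RUN=1 (the default here) commands are printed, not executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Change the balance algorithm of an existing mirror in place.
set_balance() {
  mirror="$1"; algo="$2"
  run gmirror configure -b "$algo" "$mirror"
  run gmirror list "$mirror"    # the "Balance:" line should show the new value
}

set_balance gm0 load
```

Run as root with `DRY_RUN=0` to actually apply it; the same call with `round-robin` switches back, which makes a before/after comparison cheap.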

Noner · May 2, 2022

Switching the balance algorithm to load helps, but it is not a panacea. It can speed things up, but it cannot completely solve the problem with interrupts and load balancing on the operating system and hardware side. It became faster, but not by much. It was also found that without a software raid, the speed of overwriting to disk is up to 100 MB / sec, and with a raid it drops to 30. For a machine with 2 processors (8 cores each; 16 physical cores and 32 logical CPUs in total), that is not a serious result. I would like to run an experiment with Ubuntu Server; a colleague has already reported that Windows runs without problems or freezes on a server with the same configuration as mine. It's sad. Of course, the experiments should be carried out on the same hardware, but that will only be possible when I get another kit for the next server. Perhaps my hardware has a problem with its chips. I really don't want to believe that Windows works better than FreeBSD.

Phishfry · May 8, 2022

Noner said:
It was also found that without a software raid, the speed of overwriting to disk is up to 100 MB / sec, and with a raid it drops to 30.

I did some testing when setting up a fanless fileserver on an E3826 Atom:
gmirror of two 2TB Samsung mSATA drives running in SATA2 mode.
Single mSATA drive was ~270MB/sec.
gmirror pair of the same: a paltry loss of 2MB/sec, to 268MB/sec in RAID1.

Code:

/dev/mirror/gm0
    512             # sectorsize
    2000398933504    # mediasize in bytes (1.8T)
    3907029167      # mediasize in sectors
    0               # stripesize
    0               # stripeoffset
    Yes             # TRIM/UNMAP support
    Unknown         # Rotation rate in RPM



Transfer rates:
    outside:       102400 kbytes in   0.383345 sec =   267122 kbytes/sec
    middle:        102400 kbytes in   0.382090 sec =   268000 kbytes/sec
    inside:        102400 kbytes in   0.381931 sec =   268111 kbytes/sec

So you are not doing something right or your hardware choice is not ideal.
100MB/sec is less than SATA1 speeds.
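
To put the numbers side by side, a small sketch of the arithmetic. `rate` and `loss_pct` are hypothetical helper names; the inputs are the figures quoted in this thread:

```shell
# kbytes and seconds -> kbytes/sec, as in the "Transfer rates" lines
rate() {
  awk -v kb="$1" -v s="$2" 'BEGIN { printf "%.0f\n", kb / s }'
}

# Percentage of throughput lost going from a single disk to the mirror
loss_pct() {
  awk -v single="$1" -v mirror="$2" \
    'BEGIN { printf "%.1f\n", (single - mirror) * 100 / single }'
}

rate 102400 0.383345   # outside zone from the listing: 267122
loss_pct 270 268       # mSATA pair: 0.7
loss_pct 100 30        # HDD mirror reported earlier: 70.0
```

A loss under 1% against a loss of 70% is the gap that needs explaining.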

Noner · May 8, 2022

Phishfry said:
you are not doing something right or your hardware choice is not ideal

I don't use SSD drives on servers.

Reaperzx · May 13, 2022

Phishfry said:
This could be your problem. You don't want to use round robin for gmirror.

Are your drives the same model?

Unfortunately, all the how-tos on the internet tell you to use round robin.

Reaperzx · May 13, 2022

Noner said:
I don't use SSD drives on servers.

1TB SSDs are quite cheap nowadays. I use 3 SSDs of different models in a gmirror as system drives.

Phishfry · May 13, 2022

Reaperzx said:
Unfortunately, all the how-tos on the internet tell you to use round robin.

Yes even Lucas's books. All you can do is test them out. No two systems are alike.

I have found that round robin on LAGG was faster on my network.
It defies logic because the Cisco switch recognizes LACP but it is not as fast. Enough difference to change it.
Maybe that is a weak spot on my network I dunno.

Basic testing is all that is needed. Trust through verification.

Noner · May 20, 2022

Reaperzx said:
1TB SSDs are quite cheap nowadays. I use 3 SSDs of different models in a gmirror as system drives.

Yes, but they are short-lived. I build servers to run unattended for years. Uptime of 3-5 years is the norm in my work.

Noner · May 20, 2022

Phishfry said:
Yes even Lucas's books. All you can do is test them out. No two systems are alike.

I have found that round robin on LAGG was faster on my network.
It defies logic because the Cisco switch recognizes LACP but it is not as fast. Enough difference to change it.
Maybe that is a weak spot on my network I dunno.

Basic testing is all that is needed. Trust through verification.

The method of choosing which disk to read from cannot seriously change the speed, especially if both drives are from the same batch, as is most often the case.

mer · May 20, 2022

You say there's nothing in dmesg, but have you installed smartmontools and checked the individual devices with it? Typically mirrors "work to the slowest" when writing and "work to the fastest" on reading, so maybe one of the devices has a marginal cable that is causing issues under load.
Once you install smartmontools you can add some flags to /etc/periodic.conf to probe the devices and get that included in the daily.log
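
A sketch of what that could look like, assuming the mirror components are /dev/ada0 and /dev/ada1 (the names used in the example earlier in the thread, not confirmed by the original poster). The `daily_status_smart_devices` knob is the one described in the package message of sysutils/smartmontools:

```shell
# /etc/periodic.conf fragment (the file consists of plain sh variable assignments).
# Device names are assumptions; use the mirror's real components.
daily_status_smart_devices="/dev/ada0 /dev/ada1"

# One-off checks from a root shell would look like:
#   smartctl -a /dev/ada0
#   smartctl -a /dev/ada1
```

In the `smartctl -a` output, a growing UDMA CRC error count is the attribute that most often points at cabling rather than at the disk itself.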

chessguy64 · May 30, 2022

Why would you want to use UFS for servers? ZFS would be better for RAID no? Also why are you passing it through to virtualbox? You can try bhyve instead, but I think the best thing to do is run it bare metal. And a lot of package versions / changes have been made from 8.4 to 12.3, so it's probably freezing because of an incompatibility somewhere.

Noner · Jun 10, 2022

chessguy64 said:
ZFS would be better for RAID

Everything has its reasons. ZFS is too heavy and doesn't make sense for simple tasks.