Solved ZFS faulted

Hi guys! I started a scrub on my zpool two days ago and it should finish in a few hours, but I have a feeling I should stop it: my whole pool is failing. Any advice on what I should do next?
image.jpg


Sincerely,
- Vincent
 
Eventually, after a few minutes, I couldn't log on via SSH anymore, and I hard shut down the server as it was not responding even to the keyboard connected to it.

I inspected it and there was a lot of dust in the filter. I powered the server on again, and there was a long boot with a lot of errors too. Then I checked the status and it was resilvering. Could the heat buildup in the case cause the hard drives to get faulted? Should I replace the hard drive? It's my first time dealing with a failing zpool.
image_2.jpg
image_1.jpg


Thanks!

- Vincent
 
Hi gkontos, thank you for the quick reply! :)
11.1-RELEASE

I checked /var/log/messages and the output was:

Code:
Mar  1 21:00:00 FreeBSD01 newsyslog[6642]: logfile turned over due to size>100K
Mar  1 21:00:07 FreeBSD01 kernel: (da6:mps0:0:10:0): READ(10). CDB: 28 00 80 8f 92 e0 00 01 00 00
Mar  1 21:00:07 FreeBSD01 kernel: (da6:mps0:0:10:0): CAM status: SCSI Status Error
Mar  1 21:00:07 FreeBSD01 kernel: (da6:mps0:0:10:0): SCSI status: Check Condition
Mar  1 21:00:07 FreeBSD01 kernel: (da6:mps0:0:10:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
Mar  1 21:00:07 FreeBSD01 kernel: (da6:mps0:0:10:0): Retrying command (per sense data)
Mar  1 21:00:07 FreeBSD01 kernel: (da1:mps0:0:1:0): READ(16). CDB: 88 00 00 00 00 01 49 cf f4 a0 00 00 00 c0 00 00
Mar  1 21:00:07 FreeBSD01 kernel: (da1:mps0:0:1:0): CAM status: SCSI Status Error
Mar  1 21:00:07 FreeBSD01 kernel: (da1:mps0:0:1:0): SCSI status: Check Condition
Mar  1 21:00:07 FreeBSD01 kernel: (da1:mps0:0:1:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
Mar  1 21:00:07 FreeBSD01 kernel: (da1:mps0:0:1:0): Retrying command (per sense data)
Mar  1 21:00:07 FreeBSD01 kernel: (da4:mps0:0:8:0): READ(10). CDB: 28 00 80 90 42 70 00 01 00 00
Mar  1 21:00:07 FreeBSD01 kernel: (da4:mps0:0:8:0): CAM status: SCSI Status Error
Mar  1 21:00:07 FreeBSD01 kernel: (da4:mps0:0:8:0): SCSI status: Check Condition
Mar  1 21:00:07 FreeBSD01 kernel: (da4:mps0:0:8:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
Mar  1 21:00:07 FreeBSD01 kernel: (da4:mps0:0:8:0): Retrying command (per sense data)
Mar  1 21:00:07 FreeBSD01 kernel: (da1:mps0:0:1:0): READ(16). CDB: 88 00 00 00 00 01 49 d6 33 38 00 00 00 c0 00 00 length 98304 SMID 981 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
Mar  1 21:00:07 FreeBSD01 kernel: (da3:mps0:0:3:0): READ(16). CDB: 88 00 00 00 00 01 49 d6 32 f0 00 00 01 00 00 00 length 131072 SMID 436 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
Mar  1 21:00:07 FreeBSD01 kernel: (da5:mps0:0:9:0): READ(10). CDB: 28 00 80 90 60 40 00 01 00 00
Mar  1 21:00:07 FreeBSD01 kernel: (da5:mps0:0:9:0): CAM status: SCSI Status Error
Mar  1 21:00:07 FreeBSD01 kernel: (da5:mps0:0:9:0): SCSI status: Check Condition
Mar  1 21:00:07 FreeBSD01 kernel: (da5:mps0:0:9:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
Mar  1 21:00:07 FreeBSD01 kernel: (da5:mps0:0:9:0): Retrying command (per sense data)
Mar  1 21:00:07 FreeBSD01 kernel: (da1:mps0:0:1:0): READ(16). CDB: 88 00 00 00 00 01 49 d6 33 38 00 00 00 c0 00 00
Mar  1 21:00:07 FreeBSD01 kernel: (da1:mps0:0:1:0): CAM status: CCB request completed with an error
Mar  1 21:00:07 FreeBSD01 kernel: (da1:mps0:0:1:0): Retrying command
Mar  1 21:00:07 FreeBSD01 kernel: (da1:mps0:0:1:0): READ(16). CDB: 88 00 00 00 00 01 49 d6 32 38 00 00 01 00 00 00
Mar  1 21:00:07 FreeBSD01 kernel: (da1:mps0:0:1:0): CAM status: SCSI Status Error
Mar  1 21:00:07 FreeBSD01 kernel: (da1:mps0:0:1:0): SCSI status: Check Condition
Mar  1 21:00:07 FreeBSD01 kernel: (da1:mps0:0:1:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
Mar  1 21:00:07 FreeBSD01 kernel: (da1:mps0:0:1:0): Retrying command (per sense data)
Mar  1 21:00:07 FreeBSD01 kernel: (da3:mps0:0:3:0): READ(16). CDB: 88 00 00 00 00 01 49 d6 32 f0 00 00 01 00 00 00
Mar  1 21:00:07 FreeBSD01 kernel: (da3:mps0:0:3:0): CAM status: CCB request completed with an error
Mar  1 21:00:07 FreeBSD01 kernel: (da3:mps0:0:3:0): Retrying command
Mar  1 21:00:07 FreeBSD01 kernel: (da3:mps0:0:3:0): READ(16). CDB: 88 00 00 00 00 01 49 d6 31 f0 00 00 01 00 00 00
Mar  1 21:00:07 FreeBSD01 kernel: (da3:mps0:0:3:0): CAM status: SCSI Status Error
Mar  1 21:00:07 FreeBSD01 kernel: (da3:mps0:0:3:0): SCSI status: Check Condition
Mar  1 21:00:07 FreeBSD01 kernel: (da3:mps0:0:3:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
Mar  1 21:00:07 FreeBSD01 kernel: (da3:mps0:0:3:0): Retrying command (per sense data)
Mar  1 21:00:07 FreeBSD01 kernel: (da7:mps0:0:11:0): READ(10). CDB: 28 00 80 90 60 40 00 01 00 00
Mar  1 21:00:07 FreeBSD01 kernel: (da7:mps0:0:11:0): CAM status: SCSI Status Error
Mar  1 21:00:07 FreeBSD01 kernel: (da7:mps0:0:11:0): SCSI status: Check Condition
Mar  1 21:00:07 FreeBSD01 kernel: (da7:mps0:0:11:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
Mar  1 21:00:07 FreeBSD01 kernel: (da7:mps0:0:11:0): Retrying command (per sense data)
Mar  1 21:00:08 FreeBSD01 kernel: (da3:mps0:0:3:0): READ(16). CDB: 88 00 00 00 00 01 49 ce cc 30 00 00 00 40 00 00
Mar  1 21:00:08 FreeBSD01 kernel: (da4:mps0:0:8:0): READ(10). CDB: 28 00 80 91 97 70 00 00 c0 00
Mar  1 21:00:08 FreeBSD01 kernel: (da3:mps0:0:3:0): CAM status: SCSI Status Error
Mar  1 21:00:08 FreeBSD01 kernel: (da4:mps0:0:8:0): CAM status: SCSI Status Error
Mar  1 21:00:08 FreeBSD01 kernel: (da3:mps0:0:3:0): SCSI status: Check Condition
Mar  1 21:00:08 FreeBSD01 kernel: (da3:mps0:0:3:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
Mar  1 21:00:08 FreeBSD01 kernel: (da3:mps0:0:3:0): Retrying command (per sense data)
Mar  1 21:00:08 FreeBSD01 kernel: (da4:mps0:0:8:0): SCSI status: Check Condition
Mar  1 21:00:08 FreeBSD01 kernel: (da4:mps0:0:8:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
Mar  1 21:00:08 FreeBSD01 kernel: (da5:mps0:0:9:0): READ(10). CDB: 28 00 80 91 97 70 00 00 c0 00
Mar  1 21:00:08 FreeBSD01 kernel: (da4:(da5:mps0:0:9:0): CAM status: SCSI Status Error
Mar  1 21:00:08 FreeBSD01 kernel: (da5:mps0:0:9:0): SCSI status: Check Condition
Mar  1 21:00:08 FreeBSD01 kernel: (da5:mps0:0:9:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
Mar  1 21:00:08 FreeBSD01 kernel: (da5:mps0:0:9:0): Retrying command (per sense data)

Sorry it took so long; the error log was huge, and this is only about 1/10 of it. I couldn't post it all.
 
I tried checking the log file /var/log/messages again, but the file now contains only:
Code:
Mar  1 23:00:00 FreeBSD01 newsyslog[2154]: logfile turned over due to size>100K
Should I power off my FreeBSD server and wait for replacement drives for devices da0p4 and da2p4?

Thank you! :)
 
This is half of the first messages in /var/log/messages: Pastebin of /var/log/messages
There are now 5 rotated log files:
messages.0.bz2
messages.1.bz2
messages.2.bz2
messages.3.bz2
messages.4.bz2

The zpool status for zroot is
Code:
  pool: zroot
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Mar  1 22:06:14 2018
        4.91T scanned out of 25.3T at 1.03G/s, 5h35m to go
        55.2M resilvered, 19.43% done
config:

    NAME        STATE     READ WRITE CKSUM
    zroot       ONLINE       0     0     0
      raidz3-0  ONLINE       0     0     0
        ada2p4  ONLINE       0     0     0
        ada3p4  ONLINE       0     0     0
        ada0p4  ONLINE       0     0     0
        da0p4   ONLINE       0     0     1  (resilvering)
        da1p4   ONLINE       0     0     0
        da2p4   ONLINE       0     0     0  (resilvering)
        da3p4   ONLINE       0     0     0
      raidz3-1  ONLINE       0     0     0
        ada1    ONLINE       0     0     0
        ada4    ONLINE       0     0     0
        ada5    ONLINE       0     0     0
        da4     ONLINE       0     0     0
        da5     ONLINE       0     0     0
        da6     ONLINE       0     0     0
        da7     ONLINE       0     0     0

errors: No known data errors
 
The resilver finished, faster than expected:
Code:
pool: zroot
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: resilvered 57.2M in 1h35m with 0 errors on Thu Mar  1 23:41:15 2018
config:

    NAME        STATE     READ WRITE CKSUM
    zroot       ONLINE       0     0     0
      raidz3-0  ONLINE       0     0     0
        ada2p4  ONLINE       0     0     0
        ada3p4  ONLINE       0     0     0
        ada0p4  ONLINE       0     0     0
        da0p4   ONLINE       0     0     1
        da1p4   ONLINE       0     0     0
        da2p4   ONLINE       0     0     0
        da3p4   ONLINE       0     0     0
      raidz3-1  ONLINE       0     0     0
        ada1    ONLINE       0     0     0
        ada4    ONLINE       0     0     0
        ada5    ONLINE       0     0     0
        da4     ONLINE       0     0     0
        da5     ONLINE       0     0     0
        da6     ONLINE       0     0     0
        da7     ONLINE       0     0     0

errors: No known data errors

I don't know whether I should replace the devices or not. If anyone has any advice I would be very grateful: which ones should I replace? Thank you.
 
Use sysutils/smartmontools to check each disk. If the SMART data looks OK, the issues were likely caused by the heat buildup. If you have uncorrectable errors or other dodgy values, replace the affected disks.
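
For example, something along these lines (the device names below are just the disks behind da0p4 and da2p4 from your zpool output; adjust them to match your system):

Code:
# install the package if it is not there yet
pkg install smartmontools

# full SMART report for one of the suspect disks
smartctl -a /dev/da0
smartctl -a /dev/da2

# attributes worth looking at: Reallocated_Sector_Ct, Current_Pending_Sector,
# Offline_Uncorrectable, and UDMA_CRC_Error_Count (the last one points at
# cabling/transport problems rather than at the platters)

If all of that comes back clean you can reset the checksum counters with zpool clear zroot and just keep an eye on the pool.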
 
The errors you are getting are error code 47/03 (SCSI calls this the ASC/ASCQ of the error). That error translates to "INFORMATION UNIT iuCRC ERROR DETECTED" (cut and pasted from the SCSI standard), which means an error in data transmission between the disk and the controller, not an error on the disk.

If these are the only types of errors you were getting (please check), and if the smartmontools results for the "suspicious" disks da5 and da6 come back good, then the data on disk is likely fine and you had communication problems. Those can be caused by overheating, undervoltage, a stressed power supply, and bad wiring. The picture above looks like you have SAS wiring, but I see loose cables floating around, which in a busy case might cause trouble. You might want to organize your cables so they are not bent or kinked and have enough slack at the end to plug into the disks cleanly.

You seem to have found the older (compressed) /var/log/messages files. It might be a good idea to manually save older copies of the messages file for a while, so you have a historic record of disk errors (or their absence). With this many disks on a busy server it's OK to get an occasional disk error (from the disk itself), but a large number of communication errors like the ones above is both bad and easily fixable without risk to the data.

With RAID-Z3 you can tolerate three errors on the same data block or stripe (which may mean three dead disks). That is more than good enough to guard against the known error rate of modern disk drives. But communication errors such as you saw can overwhelm any RAID scheme, even a 3-fault-tolerant one, so you should check your logs for them and fix them when they occur.
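
If it helps, the rotated logs can be searched without unpacking them by hand; something like this (the grep pattern and the destination directory are only examples):

Code:
# count transport (iuCRC) errors in the current and the rotated logs
grep -c "iuCRC error detected" /var/log/messages
bzcat /var/log/messages.*.bz2 | grep -c "iuCRC error detected"

# keep copies somewhere safe so the history survives further rotations
mkdir -p /root/disk-error-logs
cp -p /var/log/messages /var/log/messages.*.bz2 /root/disk-error-logs/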
 
A filter's airflow resistance increases as it gets clogged, so replacing it might be a good idea.
And the design of the case does not give me the impression that it was built for good airflow.
So the errors might very well have been caused by overheating. Just watch the disk temperatures during operation.
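
A quick-and-dirty way to watch them while the pool is busy, assuming smartmontools is installed (most SATA disks report attribute 194, Temperature_Celsius; SAS disks print the temperature in a different format, so adjust accordingly):

Code:
# print each disk's temperature once a minute
while true; do
    date
    for d in /dev/ada? /dev/da?; do
        printf "%s: " $d
        smartctl -A $d | awk '/Temperature_Celsius/ {print $10}'
    done
    sleep 60
done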
 
The case could probably use a couple of extra fans too. This many disks can produce quite a bit of heat. Running a scrub is very I/O intensive, so I could imagine the disks heating up a bit more than usual. The rest of the system can suffer from that heat buildup too. Think about the heatsinks on top of controller chips, the CPU, etc. All that heat has to go somewhere.

The servers in my living room produce so much heat my central heating system rarely turns on, even with the icy cold weather we currently have down here :)
 
And while we are giving general good advice, which may or may not be related to the problem at hand ...

SirDice is, as usual, right. But also consider this: fans are by their nature (moving parts) unreliable. Having multiple fans not only gives you better cooling, it also gives you some cooling when a fan fails. And having a little cooling can make the difference between a system that has time to shut down cleanly, and one that self-destructs (anecdote below). Fans use very little energy. Even better: If you connect the fans to the motherboard fan connectors, you can usually monitor their fan speed, and with a little work make an alarm system that tells you when they run slow or stop. To be honest, I haven't done that myself yet (because the common FreeBSD motherboard-monitoring things don't seem to work for my unusual motherboard), but at least I monitor the disk drive temperature with smartd and get e-mails when something funny happens.
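
For reference, the smartd part is only a couple of lines in /usr/local/etc/smartd.conf; the thresholds and the mail address below are just examples:

Code:
# -a: monitor everything; -W 4,45,55: log temperature changes of 4 degrees
# or more, note at 45C, warn at 55C; -m: where to mail warnings
DEVICESCAN -a -W 4,45,55 -m you@example.com

and then the daemon gets enabled the usual way:

Code:
sysrc smartd_enable=YES
service smartd start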

Anecdote: I used to work on a system that used the highest-end LSI/Broadcom/Avago SAS controller cards. Those cards consume a lot of power, I think about 20W each. Due to a firmware problem in the motherboard, one day on a test system all the fans turned off, but the power supplies stayed on. Oops. We were monitoring the temperature of the controller chip, and the last reading we got from it was 109 degrees C (really C, not F). After that, the SAS controller remained dead; even rebooting the server and turning the fans on full speed remotely did not help. A technician who was sent to investigate reported that the system smelled bad, and that the PC board around the controller chip had turned brown. So the effect of a fan failure can be quite expensive.
 
Thank you ralphbsz, should I buy my SAS cables on Amazon instead of eBay? Or are they generally more or less the same?
- Vincent
 
Is there any way to monitor the LSI 9211-8i temperature from FreeBSD? 109C is smoking hot, and I don't want that to happen in my case. Would a PCI fan mount bracket help? I plan on migrating my system into a Rosewill 4U rackmount case for better cooling.
imageproxy.jpg


Thanks! :) - Vincent
 
I have no idea about SAS cables, and which brands are good and bad. I know bad SAS cables exist (we had some very amusing examples), but I was shielded from purchasing decisions.

Is there any way to monitor the LSI 9211-8i temperature from FreeBSD?
I think this information can be obtained from the cards with the normal "megacli" or "megaraid" commands. Find the correct card- and RAID-management software for your model card, install it, and look around in it.
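
On FreeBSD there is also the base-system mpsutil(8) for mps(4) HBAs such as the 9211-8i; whether that firmware actually exposes a temperature reading I don't know, but it is a cheap thing to check:

Code:
# adapter information (board name, firmware version, etc.)
mpsutil show adapter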

Would a PCI Fan mount bracket help?
Probably would help. But personally, I would much rather add some fans that pull cold air from outside the case and push it in, or that take hot air from inside the case and push it out. Just stirring the hot soup inside the case won't do that much.

I plan on migrating my system into a Rosewill Rackmount 4u for bigger cooling

In general, rackmount cases tend to be better engineered than deskside cases. Whether Rosewill is good or bad, I don't know. This is a very complex area, with enormous amounts of engineering work being done (there are conferences, trade shows, magazines, all about computer cooling). I know that the rackmount servers from the big companies (Oracle, HP, IBM, Lenovo, Dell, Supermicro...) are very well engineered, as is the stuff from the OpenCompute group. Fortunately, my own home server only dissipates 35W and runs very cool; unfortunately, that means I have not had to learn how to shop for this on the open market.
 
Thank you! I will look into what rackmount case I can buy that pushes a lot of air. I don't want Delta fans, they are crazy loud.
 
So you want your computer to be quiet? Ha ha!

Old engineering saying: "Good, fast, cheap, pick any two". With fans, you can have quiet, you can have high airflow, and you can have cheap, but you can't have all three. One way to get quieter with high airflow is to use very long fans (they are about 2" long) with more screw-shaped blades. Most cases don't have room for those fans though. If you open up a high-quality 1U or 2U server from a big-name brand, you might find a whole bank of these fans (sometimes as many as a dozen) in the middle of the case; that gives a lot of airflow in a very distributed fashion, some redundancy, and the placement of the fans in the middle dampens some of the noise.
Cases that have lots of fans often have the ability to replace the fans while the computer is on; the manufacturer mounts the fan on little carriers with connectors and snap releases, and trained service personnel can grab broken fans with their fingertips, pull them out, and replace with a spare part within a few seconds, without the rest of the case heating up. Untrained service personnel typically lose their fingertips in the process :eek:
Another fan technology that helps is to use a few big centrifugal blowers instead of small axial fans; but this can only be done if the whole case is designed around the fan.
Yet another technology that helps is to control multiple fans electronically. A friend of mine has a patent on this technology: measure the position of the fan blades, and synchronize fans so they turn exactly synchronously. This makes the system much quieter and more energy efficient, with minimal loss of airflow. The problem with this is: It requires a single company to design the sheet metal, the fan, and the motherboard that controls the fan, and today there are few companies that still have an integrated product like that.

My suggestion would be: get a decent-quality case with lots of fans (not too expensive), and ignore the noise. Or try to put the case into some part of the house where the noise doesn't matter. Or shield the noise with doors, blankets, foam (we had a discussion here in the forum recently). Or ask your housemate to start playing the trombone, then the computer noise won't matter any longer. OK, actually seriously: modern computer cases are designed to be installed in data centers, where cooling and energy efficiency are very important concerns. Data centers are nearly completely unattended, and personnel who work in them have to wear hearing protection (the noise level is amazingly bad), and sometimes also cold-weather gear (depending on how the cooling is done, it may be very cold, very hot, or very windy). Modern rackmount cases are not designed to be installed in homes, so you'll have to deal with unpleasant compromises.
 
ralphbsz is right...
The problem with temperature-controlled fans is that nobody notices when they become defective.
If you disable all that and run the fans at full speed, ventilation is always good and you notice when fans go bad, before the electronics get fried.
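
If the machine (or the rackmount chassis it moves into) has a BMC, the fan speeds can also be read over IPMI and checked from a cron job. This is only a sketch and assumes sysutils/ipmitool works with your hardware; the sensor name is made up, use whatever names your board reports:

Code:
pkg install ipmitool

# list all fan sensors with current RPM and thresholds
ipmitool sdr type Fan

# or query a single sensor (name is board-specific)
ipmitool sensor get "FAN1"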
 
I would love to research 2"-thick fans some more, thank you ralphbsz and Snurg. I will somehow put thicker fans in my case to move more air. That helped me a lot; I thought computer fans were only 25mm thick. I'll look for 120mm fans that are thicker than the normal ones :)
 