Solved: System suddenly started freezing

Hi all. Over the last few days I have started having problems with my FreeBSD machine. The system suddenly freezes and reboots. At first I believed it had something to do with my CPU, because it was freezing during compiles. But I had top open in a terminal, and at the moment the system froze during a compile the CPU was at only 25%.

Also, not often but sometimes, it freezes even when I am just using Firefox or when it is idle (I am asleep and the computer is on, doing nothing). I removed the BIOS battery, reset the BIOS, and checked the BIOS settings again. I checked the cables, the power supply, and all my hardware. Everything seems fine, but the system hangs even when I try to fetch my ports tree! Other times it compiles some ports fine and then hangs at some later point in the compile!

My conclusion is that the system will freeze at some point in any case, even if I am not doing anything. During compiles it just freezes within minutes, and the problem is not high CPU load or temperature! It is so strange! Any ideas?

My system is

Code:
FreeBSD FreeBSD 11.1-RELEASE-p1 FreeBSD 11.1-RELEASE-p1 #0: Wed Aug  9 11:55:48 UTC 2017     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

My temperature, even during compiles, does not go over 50 degrees Celsius.
 
My hard disk is quite new. It is an Intenso 128 GB SSD. I have 16 GB of RAM (4x4 GB Corsair Vengeance). I have no swap, because with this much RAM there is no need for it, and swap is not ideal on an SSD anyway. On huge compiles my system uses no more than 4 GB of RAM (top shows 12 GB free). So even if some part of my memory had a problem, the system has plenty of free space to write the data somewhere else (I don't know if an OS can do that, I am just guessing). Also, if some memory cell is bad, does the system always try to write data to that spot? And if it can't, does the whole kernel just freeze? These are just some thoughts that make me think RAM is not the suspect. I know that I can check the memory for problems, but that takes a long time!
 
It is very easy to insult other people from behind a keyboard, especially when you don't know them. And yes, I shared some thoughts. Maybe they are wrong. So you can help me. That is much better than this crappy answer.

This is my disk.

root@FreeBSD:/home/ember # smartctl -a /dev/ada0

Code:
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.1-RELEASE-p1 amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     INTENSO SSD 128GB
Serial Number:    F36201R01216
LU WWN Device Id: 0 000000 000000000
Firmware Version: O0213E
User Capacity:    128,035,676,160 bytes [128 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Sep 30 18:27:58 2017 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)   Offline data collection activity
                   was never started.
                   Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)   The previous self-test routine completed
                   without error or no self-test has ever
                   been run.
Total time to complete Offline
data collection:        (    0) seconds.
Offline data collection
capabilities:             (0x71) SMART execute Offline immediate.
                   No Auto Offline data collection support.
                   Suspend Offline collection upon new
                   command.
                   No Offline surface scan supported.
                   Self-test supported.
                   Conveyance Self-test supported.
                   Selective Self-test supported.
SMART capabilities:            (0x0002)   Does not save SMART data before
                   entering power-saving mode.
                   Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
                   General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   2) minutes.
Extended self-test routine
recommended polling time:     (  10) minutes.
Conveyance self-test routine
recommended polling time:     (   2) minutes.
SCT capabilities:           (0x0039)   SCT Status supported.
                   SCT Error Recovery Control supported.
                   SCT Feature Control supported.
                   SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0000   100   100   050    Old_age   Offline      -       0
  5 Reallocated_Sector_Ct   0x0002   100   100   050    Old_age   Always       -       0
  9 Power_On_Hours          0x0000   100   100   050    Old_age   Offline      -       625
 12 Power_Cycle_Count       0x0000   100   100   050    Old_age   Offline      -       837
160 Unknown_Attribute       0x0000   100   100   050    Old_age   Offline      -       0
161 Unknown_Attribute       0x0000   100   100   050    Old_age   Offline      -       258
162 Unknown_Attribute       0x0000   100   100   050    Old_age   Offline      -       1
163 Unknown_Attribute       0x0000   100   100   050    Old_age   Offline      -       71
164 Unknown_Attribute       0x0000   100   100   050    Old_age   Offline      -       357057
165 Unknown_Attribute       0x0000   100   100   050    Old_age   Offline      -       111
166 Unknown_Attribute       0x0000   100   100   050    Old_age   Offline      -       58
167 Unknown_Attribute       0x0000   100   100   050    Old_age   Offline      -       88
192 Power-Off_Retract_Count 0x0000   100   100   050    Old_age   Offline      -       77
194 Temperature_Celsius     0x0000   100   100   050    Old_age   Offline      -       28
195 Hardware_ECC_Recovered  0x0000   100   100   050    Old_age   Offline      -       2
196 Reallocated_Event_Count 0x0000   100   100   050    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0000   100   100   050    Old_age   Offline      -       0
241 Total_LBAs_Written      0x0000   100   100   050    Old_age   Offline      -       192364
242 Total_LBAs_Read         0x0000   100   100   050    Old_age   Offline      -       51040

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The RAM is not fully checked. I ran some passes with sysutils/memtest and it seems OK, but I don't have the time to examine 16 GB of RAM (my grandfather is in hospital). Can you tell me a proper and "fast" way to check that everything is OK, if one exists?
 
George, there is no fast way to check if memory is good. Memtest needs to run for at least 4 hours (I recommend letting it run overnight) to be useful. Also, memtest is only an indicator.
Your symptoms point to memory or a bad power supply as the most likely problems, but it might be something else. Do you have the option to replace the power supply (as a test to see if that is the problem)?

(I have had one case on a server where the SSD, after more than a year, developed a problem that resulted in random files, or all files, on it being corrupted. smartctl reported no problems on that SSD the whole time. I thought I had it fixed and reinstalled and restored everything, but a week or two later the problems showed up again, so I replaced that SSD. The SSD was Intel-branded, and I have others like it running without problems.)
 
I have the option of changing the power supply, but maybe it is better to test my RAM first. Tomorrow I will post the results :)
 
"Hi all. Over the last few days I have started having problems with my FreeBSD machine. The system suddenly freezes and reboots."
Hardware problems. I had a similar problem last week with one of my Supermicro Atom servers (less than 2 years old), which I use to host a bunch of jails. After I had tried every bloody gimmick and spent hours with my vendor's technical support, they just decided to send me a new server. We figured that an $800 machine was worth neither their time nor mine.
 
After a lot of tests I started to get errors:

root@FreeBSD:/home/ember # memtester 2GB 5
Code:
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 2048MB (2147483648 bytes)
got  2048MB (2147483648 bytes), trying mlock ...locked.
Loop 1/5:
  Stuck Address       : testing   0FAILURE: possible bad address line at offset 0x67369fa8.
Skipping to next test...
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
FAILURE: 0xc6a779291571e37c != 0xc6a779291171e37c at offset 0x27369fa8.
  Compare MUL         :   Compare DIV         : ok
FAILURE: 0xf150b08b7d9f3dd5 != 0xf150b08b799f3dd5 at offset 0x27369fa8.
  Compare OR          : FAILURE: 0x9150000b5d1b1805 != 0x9150000b591b1805 at offset 0x27369fa8.
  Compare AND         :   Sequential Increment: ok
  Solid Bits          : testing   3FAILURE: 0xffffffffffffffff != 0xfffffffffbffffff at offset 0x27369fa8.
  Block Sequential    : testing   4FAILURE: 0x404040404040404 != 0x404040400040404 at offset 0x27369fa8.
  Checkerboard        : testing   1FAILURE: 0x5555555555555555 != 0x5555555551555555 at offset 0x27369fa8.
  Bit Spread          : testing   0FAILURE: 0xfffffffffffffffa != 0xfffffffffbfffffa at offset 0x27369fa8.
  Bit Flip            : testing   1FAILURE: 0xfffffffffffffffe != 0xfffffffffbfffffe at offset 0x27369fa8.
  Walking Ones        : testing  25FAILURE: 0xfffffffffdffffff != 0xfffffffff9ffffff at offset 0x27369fa8.
  Walking Zeroes      : testing  26FAILURE: 0x04000000 != 0x00000000 at offset 0x27369fa8.
  8-bit Writes        : -FAILURE: 0xffbb681dd7df334f != 0xffbb681dd3df334f at offset 0x27369fa8.
  16-bit Writes       : ok

Loop 2/5:
  Stuck Address       : testing   0FAILURE: possible bad address line at offset 0x67369fa8.
Skipping to next test...
  Random Value        : ok
  Compare XOR         : ok
FAILURE: 0x1946d5eb9fb9b294 != 0x1946d5eb9bb9b294 at offset 0x27369fa8.
  Compare SUB         : FAILURE: 0x1c2273ee360dbb5c != 0xc6a3054b0a0dbb5c at offset 0x27369fa8.
  Compare MUL         : FAILURE: 0x00000000 != 0x00000001 at offset 0x27369fa8.
  Compare DIV         :   Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : testing   1FAILURE: 0xffffffffffffffff != 0xfffffffffbffffff at offset 0x27369fa8.
  Block Sequential    : testing   4FAILURE: 0x404040404040404 != 0x404040400040404 at offset 0x27369fa8.
  Checkerboard        : testing   1FAILURE: 0x5555555555555555 != 0x5555555551555555 at offset 0x27369fa8.
  Bit Spread          : testing   0FAILURE: 0xfffffffffffffffa != 0xfffffffffbfffffa at offset 0x27369fa8.
  Bit Flip            : testing   1FAILURE: 0xfffffffffffffffe != 0xfffffffffbfffffe at offset 0x27369fa8.
  Walking Ones        : testing  25FAILURE: 0xfffffffffdffffff != 0xfffffffff9ffffff at offset 0x27369fa8.
  Walking Zeroes      : testing  26FAILURE: 0x04000000 != 0x00000000 at offset 0x27369fa8.
  8-bit Writes        : -FAILURE: 0xbaab34c1ceea722e != 0xbaab34c1caea722e at offset 0x27369fa8.
  16-bit Writes       : -FAILURE: 0xdfcf45d3edf5711d != 0xdfcf45d3e9f5711d at offset 0x27369fa8.

Loop 3/5:
  Stuck Address       : testing   0FAILURE: possible bad address line at offset 0x67369fa8.
Skipping to next test...
  Random Value        : FAILURE: 0x7db51f267df7d5c2 != 0x7db51f2679f7d5c2 at offset 0x27369fa8.
  Compare XOR         : ok
FAILURE: 0x844e3d1a5f236232 != 0x844e3d1a5b236232 at offset 0x27369fa8.
  Compare SUB         : FAILURE: 0x6a9264c0ec6c59a2 != 0x6f1273df086c59a2 at offset 0x27369fa8.
  Compare MUL         :   Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
FAILURE: 0xf72e348784a14453 != 0xf72e348780a14453 at offset 0x27369fa8.
  Sequential Increment:   Solid Bits          : testing   1FAILURE: 0xffffffffffffffff != 0xfffffffffbffffff at offset 0x27369fa8.
  Block Sequential    : testing   4FAILURE: 0x404040404040404 != 0x404040400040404 at offset 0x27369fa8.
  Checkerboard        : testing   1FAILURE: 0x5555555555555555 != 0x5555555551555555 at offset 0x27369fa8.
  Bit Spread          : testing   0FAILURE: 0xfffffffffffffffa != 0xfffffffffbfffffa at offset 0x27369fa8.
  Bit Flip            : testing   1FAILURE: 0xfffffffffffffffe != 0xfffffffffbfffffe at offset 0x27369fa8.
  Walking Ones        : testing  25FAILURE: 0xfffffffffdffffff != 0xfffffffff9ffffff at offset 0x27369fa8.
  Walking Zeroes      : testing  26FAILURE: 0x04000000 != 0x00000000 at offset 0x27369fa8.
  8-bit Writes        : ok
  16-bit Writes       : -FAILURE: 0x7f7ddebb97df5f5b != 0x7f7ddebb93df5f5b at offset 0x27369fa8.

Loop 4/5:
  Stuck Address       : testing   0FAILURE: possible bad address line at offset 0x67369fa8.
Skipping to next test...
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
FAILURE: 0xee87f09d6c07a204 != 0xee87f09d6807a204 at offset 0x27369fa8.
  Compare MUL         :   Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : testing   1FAILURE: 0xffffffffffffffff != 0xfffffffffbffffff at offset 0x27369fa8.
  Block Sequential    : testing   4FAILURE: 0x404040404040404 != 0x404040400040404 at offset 0x27369fa8.
  Checkerboard        : testing   1FAILURE: 0x5555555555555555 != 0x5555555551555555 at offset 0x27369fa8.
  Bit Spread          : testing   0FAILURE: 0xfffffffffffffffa != 0xfffffffffbfffffa at offset 0x27369fa8.
  Bit Flip            : testing   1FAILURE: 0xfffffffffffffffe != 0xfffffffffbfffffe at offset 0x27369fa8.
  Walking Ones        : testing   0FAILURE: 0xfffffffffffffffe != 0xfffffffffbfffffe at offset 0x27369fa8.
  Walking Zeroes      : testing  26FAILURE: 0x04000000 != 0x00000000 at offset 0x27369fa8.
  8-bit Writes        : -FAILURE: 0x7fb66a1af6fb5f5b != 0x7fb66a1af2fb5f5b at offset 0x27369fa8.
  16-bit Writes       : -FAILURE: 0xd7fd992cfffe1e60 != 0xd7fd992cfbfe1e60 at offset 0x27369fa8.

Loop 5/5:
  Stuck Address       : testing   0FAILURE: possible bad address line at offset 0x67369fa8.
Skipping to next test...
  Random Value        : ok
FAILURE: 0xd61496ccef20dff6 != 0xd61496cceb20dff6 at offset 0x27369fa8.
  Compare XOR         : FAILURE: 0x17333976f3a5d664 != 0x17333976eba5d664 at offset 0x27369fa8.
  Compare SUB         : FAILURE: 0x4a46659beef54830 != 0x6eaa870c8af54830 at offset 0x27369fa8.
  Compare MUL         :   Compare DIV         : ok
  Compare OR          : ok
FAILURE: 0xbf2f031c065b94c4 != 0xbf2f031c025b94c4 at offset 0x27369fa8.
  Compare AND         : FAILURE: 0xffbbe2003c5520e4 != 0xffbbe200385520e4 at offset 0x27369fa8.
  Sequential Increment:   Solid Bits          : testing   1FAILURE: 0xffffffffffffffff != 0xfffffffffbffffff at offset 0x27369fa8.
  Block Sequential    : testing   4FAILURE: 0x404040404040404 != 0x404040400040404 at offset 0x27369fa8.
  Checkerboard        : testing   1FAILURE: 0x5555555555555555 != 0x5555555551555555 at offset 0x27369fa8.
  Bit Spread          : testing   0FAILURE: 0xfffffffffffffffa != 0xfffffffffbfffffa at offset 0x27369fa8.
  Bit Flip            : testing   1FAILURE: 0xfffffffffffffffe != 0xfffffffffbfffffe at offset 0x27369fa8.
  Walking Ones        : testing  25FAILURE: 0xfffffffffdffffff != 0xfffffffff9ffffff at offset 0x27369fa8.
  Walking Zeroes      : testing  26FAILURE: 0x04000000 != 0x00000000 at offset 0x27369fa8.
  8-bit Writes        : -FAILURE: 0x97cd4377eff38ee1 != 0x97cd4377ebf38ee1 at offset 0x27369fa8.
  16-bit Writes       : ok

Should I remove all the memory modules and run memtester on each one separately to find the faulty one? I have 4 slots with 4 GB each.
 
Yes, that's the basic idea. Test each one separately. Note that this can also be caused by a faulty *slot*. Hopefully not.

But before that, make sure all the slots are clean. Actually, make sure the entire motherboard and PSU are dust-free. Use a blow dryer if you don't have compressed air canisters. Hold the fan blades (using an insulating material) to keep them from spinning. Hold the modules by the edges to avoid damaging their components with static discharge.
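
As a side note on reading the memtester log above: most of the FAILURE lines show two values that differ in only one place. A quick way to see that is to XOR each pair and list the bits that differ. The following is only a rough sketch in Python, not part of memtester; the function name differing_bits and the SAMPLE text are made up for illustration, and the sample lines are copied from the log in this thread.

```python
import re

# Matches memtester lines such as
# "FAILURE: 0xffffffffffffffff != 0xfffffffffbffffff at offset 0x27369fa8."
FAILURE_RE = re.compile(
    r"FAILURE: (0x[0-9a-fA-F]+) != (0x[0-9a-fA-F]+) at offset (0x[0-9a-fA-F]+)"
)


def differing_bits(log_text):
    """Return one (offset, differing bit positions) tuple per FAILURE line."""
    results = []
    for left, right, offset in FAILURE_RE.findall(log_text):
        # XOR leaves a 1 only where the two values disagree.
        diff = int(left, 16) ^ int(right, 16)
        bits = [i for i in range(diff.bit_length()) if (diff >> i) & 1]
        results.append((int(offset, 16), bits))
    return results


# A few lines taken from the memtester output posted in this thread.
SAMPLE = """\
  Solid Bits          : testing   3FAILURE: 0xffffffffffffffff != 0xfffffffffbffffff at offset 0x27369fa8.
  Checkerboard        : testing   1FAILURE: 0x5555555555555555 != 0x5555555551555555 at offset 0x27369fa8.
  Walking Zeroes      : testing  26FAILURE: 0x04000000 != 0x00000000 at offset 0x27369fa8.
  8-bit Writes        : -FAILURE: 0xffbb681dd7df334f != 0xffbb681dd3df334f at offset 0x27369fa8.
"""

for offset, bits in differing_bits(SAMPLE):
    print(hex(offset), bits)
```

For these four lines the only differing bit is bit 26, always at the same offset, and in each pair one value has the bit set while the other has it cleared. That pattern looks like a single bad bit rather than random corruption, although it cannot tell a bad module from a bad slot. A few other lines in the log (Compare SUB, for example) differ in more bits, presumably because arithmetic on an already corrupted value spreads the error.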
 
I am back after a long time to close the topic. I tried another power supply and graphics card and had the same problems. I tried another disk with Linux, and even at 100% CPU everything ran perfectly. I checked the FreeBSD SSD and it was OK! So I finally reinstalled FreeBSD and the problem was solved! At least now I know that one of my four RAM modules has some issues.
 