empty passwd after upgrade to 12-STABLE

Have been running 12-RELEASE for a week or so, compiled about 1000 packages using Poudriere, set up a few jails, etc. Decided to upgrade to 12-STABLE to test a few upcoming features, which was not intimidating after a few years of building kernel/world manually on DragonFly.

After completing the mergemaster process, I did a "shutdown -p now" and went to sleep. The next day when powering up, none of the user accounts could log in. Booting into single-user mode, I found that the /etc/passwd and /etc/master.passwd files existed but were blank, containing no data. As far as I can tell those are the only two files affected, but there could be more.
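
A minimal sketch of how the blank files could be confirmed from single-user mode. The helper name `blank_files` is made up for illustration. The recovery commands in the comments assume the daily periodic job has left a copy in /var/backups, which may not be the case on a week-old install.

```shell
#!/bin/sh
# Report which of the given files exist but are zero-length.
# Prints one path per line; prints nothing if every file has content.
# (blank_files is a hypothetical helper, not a FreeBSD tool.)
blank_files() {
    for f in "$@"; do
        # -e: the file exists; ! -s: its size is zero
        if [ -e "$f" ] && [ ! -s "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Possible use once the filesystems are mounted read-write:
#   blank_files /etc/passwd /etc/master.passwd
#
# If master.passwd is blank, one possible recovery (assumption: a backup
# copy exists at this path) would be:
#   cp /var/backups/master.passwd.bak /etc/master.passwd
#   pwd_mkdb -p /etc/master.passwd   # rebuilds /etc/passwd and the password databases
```

The same check can be pointed at other files under /etc to see whether anything else came out zero-length.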

Question is, could I have done something wrong during the mergemaster process that would cause the master.passwd and passwd files to not be written? Or should I be looking at hardware?

It's a Dell R210 with ECC memory and a Samsung Enterprise SM863 SSD drive. A "zpool scrub" of that zroot pool checked out fine, and the standard Dell system and memory checks showed no errors, so it's more likely to be user error, but would like to make sure before going much further...
 
Created a snapshot and am upgrading to STABLE again. Will see if there is a way to create a blank passwd file running mergemaster.
 
I attempted to screw up mergemaster in every way possible, but when completed and issuing a "shutdown -p now", then rebooting, it came back up fine. Files that were not upgraded still show releng, but everything works as expected.

Guess I need to look more closely at the SM863. Just don't understand how a failing SSD won't cause ZFS to log checksum errors...
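
For reference, a rough sketch of how the per-device error counters could be pulled out of "zpool status" output. It assumes the usual NAME/STATE/READ/WRITE/CKSUM device table; `zpool_errors` is a made-up helper name.

```shell
#!/bin/sh
# Read `zpool status` output on stdin and print every device row whose
# READ, WRITE or CKSUM counter is not "0". Prints nothing for a clean pool.
# (zpool_errors is a hypothetical helper, not part of ZFS.)
zpool_errors() {
    awk '
        # The device table starts right after this header row.
        $1 == "NAME" && $5 == "CKSUM" { in_table = 1; next }
        # A blank line ends the table.
        in_table && NF == 0 { in_table = 0 }
        # Rows with counters have at least five fields.
        in_table && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1 ": read=" $3 " write=" $4 " cksum=" $5
        }
    '
}

# Possible use:
#   zpool status zroot | zpool_errors
```

An empty result only means ZFS has not seen bad reads, writes, or checksums on data it actually touched. It says nothing about data that was never written in the first place.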
 
Checked SSD with smartctl, it appears ok.

Code:
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       14376
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       211
177 Wear_Leveling_Count     0x0013   099   099   005    Pre-fail  Always       -       20
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
180 Unused_Rsvd_Blk_Cnt_Tot 0x0013   100   100   010    Pre-fail  Always       -       961
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
184 End-to-End_Error        0x0033   100   100   097    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   067   040   000    Old_age   Always       -       33
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
202 Exception_Mode_Status   0x0033   100   100   010    Pre-fail  Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       174
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       3887775449
242 Total_LBAs_Read         0x0032   099   099   000    Old_age   Always       -       1336775783
243 SATA_Downshift_Ct       0x0032   100   100   000    Old_age   Always       -       0
244 Thermal_Throttle_St     0x0032   100   100   000    Old_age   Always       -       0
245 Timed_Workld_Media_Wear 0x0032   100   100   000    Old_age   Always       -       65535
246 Timed_Workld_RdWr_Ratio 0x0032   100   100   000    Old_age   Always       -       65535
247 Timed_Workld_Timer      0x0032   100   100   000    Old_age   Always       -       65535
251 NAND_Writes             0x0032   100   100   000    Old_age   Always       -       4762970920

SMART Error Log Version: 1
No Errors Logged
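
A rough sketch of how a table like the one above could be screened automatically. The watch list is my own choice of counters, not a vendor recommendation, and `smart_flags` is a made-up helper name.

```shell
#!/bin/sh
# Read `smartctl -A` style attribute rows on stdin and print the ones that
# look suspicious: a normalized VALUE at or below a non-zero THRESH, or a
# non-zero RAW_VALUE for a counter in the watch list below.
# (smart_flags and the watch list are assumptions for illustration.)
smart_flags() {
    awk '
        BEGIN {
            watch["Reallocated_Sector_Ct"] = 1
            watch["Program_Fail_Cnt_Total"] = 1
            watch["Erase_Fail_Count_Total"] = 1
            watch["Runtime_Bad_Block"] = 1
            watch["Uncorrectable_Error_Cnt"] = 1
            watch["Current_Pending_Sector"] = 1
            watch["CRC_Error_Count"] = 1
        }
        # Attribute rows start with a numeric ID and have at least 10 fields:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            if ($6 + 0 > 0 && $4 + 0 <= $6 + 0)
                print $2 ": VALUE " $4 " at or below THRESH " $6
            else if (($2 in watch) && $10 + 0 != 0)
                print $2 ": raw count " $10
        }
    '
}

# Possible use (the device name is an assumption):
#   smartctl -A /dev/ada0 | smart_flags
```

No output means none of the watched counters moved and no attribute reached its threshold, which matches reading the table by eye.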

Unless anyone has any input, this appears to be a mystery. I do not understand how the passwd files could have been written as blank files, leaving the server inaccessible via ssh. It does not appear to be the SSD, the server hardware passes all tests, and ZFS shows the pool is fine.

Could it be the STABLE revision I downloaded?
 
Could it be the STABLE revision I downloaded?
Of course. You are aware that STABLE is basically a developer snapshot and doesn't imply a stable environment? While it works for a lot of people, it's still a snapshot in the end, which is the de facto place for bugs and other hiccups to surface.

If you want stability & reliability you're better off with the official release.
 
Could it be the STABLE revision I downloaded?

More likely there were changes to master.passwd and you pressed install instead of merge while running mergemaster.

EDIT: OK, I wasn't reading closely enough, and there don't seem to be any changes after 12.0-RELEASE, sorry.
 
Of course. You are aware that STABLE is basically a developer snapshot and doesn't imply a stable environment? While it works for a lot of people, it's still a snapshot in the end, which is the de facto place for bugs and other hiccups to surface.

If you want stability & reliability you're better off with the official release.

I upgraded to STABLE, not CURRENT, specifically to test features under development. However, the issue is that passwd and master.passwd were replaced with empty files during the upgrade process, which doesn't seem to be possible using mergemaster.

While instability with features was expected, instability with mergemaster during the upgrade process was not, which is why I am trying to figure out if there is something else going on.

EDIT: It seems safe to say the issue likely isn't FreeBSD, going to assume the hardware may be intermittently failing in some way. It seems there would be some checksum errors if the drive or hardware were malfunctioning, but it is an old machine so who knows...
 
EDIT: It seems safe to say the issue likely isn't FreeBSD, going to assume the hardware may be intermittently failing in some way. It seems there would be some checksum errors if the drive or hardware were malfunctioning, but it is an old machine so who knows...
So try with another machine, if it's still the same it means FreeBSD's update procedure sucks.
 
So try with another machine, if it's still the same it means FreeBSD's update procedure sucks.
I installed a new hard drive and tried it from the same machine a second time, which worked. The update procedure is not the issue.
 