ZFS takes forever to mount after unclean shutdown

Hi!

A little over a week ago I noticed that my FreeBSD box had crashed; after a restart it's now still stuck while trying to mount one of the pools. There is constant disk activity and the machine responds to the keyboard, so I guess it is still trying to fix the filesystem.

I believe the reason for the crash is that I put too much deduped data into the pool with too little memory (4 GB + some SSD L2ARC), and it is now trying to reconstruct the dedup tables.
The pool consists of 2 RAIDZ vdevs, 10 drives, around 12 TB in total. Of this I estimate around 500-700 GB is on a deduped filesystem, and about 10 TB is not deduped.


So a few questions for the ZFS experts:

* Is there any way of knowing approximately how long it will take if I just leave it? Will it even finish?
* What happens if I abort it? Will it restart from the beginning? If not, I could shut it down, add more memory, and try again. Will more memory even help with the recovery?
* The deduped filesystem (which I use for VM image backups) has 90 snapshots (3 months); will this affect the time it will take to recover?

Any other ideas?
 
gkontos said:
Can you provide some more details regarding your ZFS version, FreeBSD version and capacity of the pool?

FreeBSD 9.1-RC3. The system has 2 pools; the small root pool seems to be OK. The problematic one is described in my post: "The pool consists of 2 RAIDZ vdevs, 10 drives, around 12 TB in total. Of this I estimate around 500-700 GB is on a deduped filesystem, and about 10 TB is not deduped." Is there any other info about the pool that would be useful to know?
The ZFS version is the one shipped with 9.0 (28?), which was later upgraded to 9.1-RC3; no ZFS pool upgrades were done.
 
LasseKongo said:
FreeBSD 9.1-RC3. The system has 2 pools; the small root pool seems to be OK. The problematic one is described in my post: "The pool consists of 2 RAIDZ vdevs, 10 drives, around 12 TB in total. Of this I estimate around 500-700 GB is on a deduped filesystem, and about 10 TB is not deduped." Is there any other info about the pool that would be useful to know?
The ZFS version is the one shipped with 9.0 (28?), which was later upgraded to 9.1-RC3; no ZFS pool upgrades were done.


Q: Is there any other info about the pool that would be useful to know?

A: Yes! What is the status of the pools/drives?

Code:
zpool status -v

and

Code:
zpool iostat
 
User23 said:
Q: Is there any other info about the pool that would be useful to know?

A: Yes! What is the status of the pools/drives?

Code:
zpool status -v

and

Code:
zpool iostat

That is not possible to know, since the machine is stuck while booting. I guess it will not give me access until the pool is repaired; that's why I am asking about estimated recovery time etc.
 
The recovery has now been running for 8 days. I have seen other posts on the net confirming that this is not uncommon in my situation.
If nothing else turns up I plan to let it run for a few more days; if it's not done by then, I'll shut it down, replace the motherboard with one with 16 GB of memory, and see if that helps.

My guess is that since I probably used too much memory when the system was up, it will still be too much now that it's trying to recreate the DDT, so maybe more memory would avoid spooling the table back and forth between RAM and the spinning disks.
 
It really depends on how and when it crashed. Deduplication itself requires a lot of memory. If you have a pool with 12 TB capacity, then 4 GB of RAM is really low, even without dedup. Of course this also depends on the I/O operations. When I asked you before about the capacity, I also wanted to know what percentage of it is used.

How long should you wait? That is a good question, but I seriously think that 8 days is a lot.
 
kpa said:
I'm pretty sure you will need lots more memory to use dedup on your pool. See this thread for example:

http://forums.freebsd.org/showthread.php?t=31999

I know I'm on the low side; dedup consumes about 320 bytes of memory per deduped block. I only use dedup on a single dataset, which had around 200 GB of deduped data from the beginning. My mistake was that I added a few larger VM images to my VM server, which in turn got backed up to this dataset on my now-problematic backup server. I should of course have added more memory before doing that, since it is pretty easy to calculate how much dedup will consume.
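As a rough sketch of that calculation (assuming the commonly cited ~320 bytes of ARC per DDT entry and one entry per unique block at the default 128K recordsize; the real in-core cost varies by ZFS version and actual block sizes):

```python
# Rough estimate of RAM needed for the ZFS dedup table (DDT).
# Assumptions: ~320 bytes per DDT entry, one entry per unique
# block, and an average block size of the default 128K recordsize.

def ddt_ram_bytes(deduped_bytes, avg_block_size=128 * 1024,
                  bytes_per_entry=320):
    entries = deduped_bytes // avg_block_size
    return entries * bytes_per_entry

# ~700 GB of deduped data, the upper end of the estimate above:
ram = ddt_ram_bytes(700 * 1024**3)
print(f"{ram / 1024**2:.0f} MiB")  # prints: 1750 MiB
```

Under those assumptions the 500-700 GB deduped dataset alone would want roughly 1.3-1.75 GiB of ARC just for the DDT, a large slice of a 4 GB machine.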
 
gkontos said:
It really depends on how and when it crashed. Deduplication itself requires a lot of memory. If you have a pool with 12 TB capacity, then 4 GB of RAM is really low, even without dedup. Of course this also depends on the I/O operations. When I asked you before about the capacity, I also wanted to know what percentage of it is used.

How long should you wait? That is a good question, but I seriously think that 8 days is a lot.

I wish there were some way of monitoring the progress of the recovery operation.

I have around 10.5 TB of data out of 12 TB usable; that is not really optimal either, and I have noticed that ZFS gets substantially slower as the filesystem fills up.

I also found this when googling:
"DDT is considered metadata. Up to 25% of memory (zfs_arc_meta_limit) can be used to store metadata."

I wonder if this parameter is tunable? If it is a kernel parameter I could temporarily increase it to give the DDT more room in RAM; I don't really need the memory for any applications at the moment :)
 
LasseKongo said:
I wish there were some way of monitoring the progress of the recovery operation.

I have around 10.5 TB of data out of 12 TB usable; that is not really optimal either, and I have noticed that ZFS gets substantially slower as the filesystem fills up.

I also found this when googling:
"DDT is considered metadata. Up to 25% of memory (zfs_arc_meta_limit) can be used to store metadata."

I wonder if this parameter is tunable? If it is a kernel parameter I could temporarily increase it to give the DDT more room in RAM; I don't really need the memory for any applications at the moment :)
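For what it's worth, on FreeBSD this limit is exposed as the `vfs.zfs.arc_meta_limit` sysctl and can be set as a loader tunable. A sketch of what that might look like (the exact tunable names are an assumption from 9.x-era ZFS; verify with `sysctl -a | grep arc_meta` on your system first):

```shell
# /boot/loader.conf -- loader tunables read at boot.
# Raise the metadata (and thus DDT) share of the ARC:
vfs.zfs.arc_meta_limit="3G"
# Optionally raise the overall ARC cap as well:
vfs.zfs.arc_max="3584M"
```

Since these are loader tunables, a reboot is needed for them to take effect, which of course interacts badly with an import already in progress.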

Sorry, but you have really overdone it here! You are using 4 GB of RAM on a 12 TB pool that is well past 80% utilization, and you are also deduping.

I really can't suggest much here. What I would do is cut off the operation, upgrade to at least 16 GB of RAM, boot from installation media, and try importing the pool.
 
gkontos said:
Sorry, but you have really overdone it here! You are using 4 GB of RAM on a 12 TB pool that is well past 80% utilization, and you are also deduping.

I really can't suggest much here. What I would do is cut off the operation, upgrade to at least 16 GB of RAM, boot from installation media, and try importing the pool.

Yes, I think that is the way forward. I actually have a board with 16 GB that I planned to use for this server but never got around to installing. Looks pretty stupid now...

What nags me is that the current recovery process could be 10 minutes from finishing when I turn it off, but I will never know. I will probably shut it down tomorrow and switch the board.

Now, off for Christmas celebrations!
 
LasseKongo said:
I believe the reason for the crash is that I put too much deduped data into the pool with too little memory (4 GB + some SSD L2ARC), and it is now trying to reconstruct the dedup tables.
The pool consists of 2 RAIDZ vdevs, 10 drives, around 12 TB in total. Of this I estimate around 500-700 GB is on a deduped filesystem, and about 10 TB is not deduped.

Correct. It's building the DDT, and trying to complete the last transaction. Usually, it's a dataset destroy that was interrupted, so it's finishing that at pool import.


So a few questions for the ZFS experts:

* Is there any way of knowing approximately how long it will take if I just leave it? Will it even finish?

So long as you have enough RAM, it will finish eventually. Add as much RAM as possible to make it run faster. You can also upgrade to 9.1 and upgrade the pool to support feature flags. One of the features is background (async) destroy; that helped us in a similar situation.

* What happens if I abort it? Will it restart from the beginning? If not, I could shut it down, add more memory, and try again. Will more memory even help with the recovery?
It will continue from the previous location, not restart from the beginning. See above about RAM.
 
phoenix said:
Correct. It's building the DDT, and trying to complete the last transaction. Usually, it's a dataset destroy that was interrupted, so it's finishing that at pool import.

As I mentioned earlier, I have 90 snapshots in the deduped dataset; a new one is created every night and the oldest one is destroyed, so this is probably where it went wrong.


phoenix said:
So long as you have enough RAM, it will finish eventually. Add as much RAM as possible to make it run faster. You can also upgrade to 9.1 and upgrade the pool to support feature flags. One of the features is background (async) destroy; that helped us in a similar situation.

It is pretty obvious that I am low on memory, so I hope my motherboard replacement will do the trick. Since I'm running 9.1-RC3, would it be possible to boot into single-user mode and upgrade the feature flags? Since the pool is not clean, my guess is that it will not let me make changes to it.


phoenix said:
It will continue from the previous location, not restart from the beginning. See above about RAM.

That was a relief to hear!

Thanks for this useful information.
 
I did the motherboard swap today, to one with 16 GB of memory, and voilà: it took less than 30 minutes to mount all filesystems and get a login prompt.

When I ran zpool status -D on the affected pool, it showed a little over 6 million DDT entries; if I multiply that by 320 bytes it gives me around 1850 MB needed. Maybe a little bit too much if you have 4 GB of memory. ;)
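That arithmetic checks out (assuming the ~320 bytes per in-core DDT entry figure):

```python
# Sanity check: RAM footprint of the DDT size reported by
# `zpool status -D`, at an assumed ~320 bytes per entry.
entries = 6_000_000            # "a little over 6 million" entries
bytes_per_entry = 320          # assumed in-core cost per DDT entry
ram_mib = entries * bytes_per_entry / 1024**2
print(f"{ram_mib:.0f} MiB")    # prints: 1831 MiB
```

Call it roughly 1.8 GiB for the DDT alone, which on a 4 GB box would have been fighting the rest of the ARC, the kernel, and userland for memory.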

The system behaves a little sluggishly; some commands, for example top, take almost a minute to run, while others are normal. Hopefully this will fix itself.
 