Solved: When a resilver is in progress, removing a ZIL disk restarts the resilver

This happens because, as part of cleaning up the vdev namespace (spa_vdev_remove_from_namespace) after the ZIL disk is removed, we reopen all the vdevs (vdev_reopen). The reopen checks whether a leaf vdev needs resilvering and, if it does, requests a resilver. It doesn't check whether a resilver is already running, so a resilver that is already in progress gets restarted from the beginning.

My question is: why don't we check dsl_scan_resilvering in vdev_open, as we do in spa_load_impl? What would the side effects be if we added this check when opening the vdev?


Let me know if I am missing anything.

Thanks,
Pawan.

Code:
vdev_open() :-

	/*
	 * If a leaf vdev has a DTL, and seems healthy, then kick off a
	 * resilver.  But don't do this if we are doing a reopen for a scrub,
	 * since this would just restart the scrub we are already doing.
	 */
	if (vd->vdev_ops->vdev_op_leaf && !spa->spa_scrub_reopen &&
	    vdev_resilver_needed(vd, NULL, NULL))
		spa_async_request(spa, SPA_ASYNC_RESILVER);
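
To make the question concrete, here is a minimal standalone sketch of the two behaviours. It is not ZFS code: the mock_* types and functions are hypothetical stand-ins, and the resilver_in_progress flag only models what a dsl_scan_resilvering check would report. Whether adding such a check to vdev_open is safe is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real ZFS structures (assumption). */
struct mock_spa {
	bool scrub_reopen;         /* models spa->spa_scrub_reopen */
	bool resilver_in_progress; /* models a dsl_scan_resilvering check */
	int  resilver_requests;    /* counts SPA_ASYNC_RESILVER requests */
};

struct mock_vdev {
	struct mock_spa *spa;
	bool is_leaf;        /* models vd->vdev_ops->vdev_op_leaf */
	bool needs_resilver; /* models vdev_resilver_needed(vd, NULL, NULL) */
};

/* Models spa_async_request(spa, SPA_ASYNC_RESILVER). */
void
mock_request_resilver(struct mock_spa *spa)
{
	assert(spa != NULL);
	spa->resilver_requests++;
}

/* Condition as quoted from vdev_open: no check for a running resilver. */
void
mock_open_current(struct mock_vdev *vd)
{
	assert(vd != NULL && vd->spa != NULL);
	if (vd->is_leaf && !vd->spa->scrub_reopen && vd->needs_resilver)
		mock_request_resilver(vd->spa);
}

/* Same condition with the proposed extra guard (an assumption, untested
 * against real ZFS): skip the request if a resilver is already running. */
void
mock_open_guarded(struct mock_vdev *vd)
{
	assert(vd != NULL && vd->spa != NULL);
	if (vd->is_leaf && !vd->spa->scrub_reopen &&
	    !vd->spa->resilver_in_progress && vd->needs_resilver)
		mock_request_resilver(vd->spa);
}
```

With resilver_in_progress set, mock_open_current still increments the request count (the restart described above), while mock_open_guarded leaves it unchanged.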
 
I suggest posting this question to the freebsd-fs mailing list. There aren't a lot of developers on this board, so I doubt you'll get meaningful answers any time soon.
 
The response on the list was that removal of the log device checks all vdevs to see whether a resilver is needed. That check detects the resilvering situation (replaced disk, onlined disk, etc.) and starts a resilver without checking whether one is already underway. There is discussion underway about changing that behaviour.
 
Thanks, phoneix, for the reply. By the way, I did post this question to the freebsd-fs mailing list but didn't get any reply. Could you please point me to the list where this discussion happened?
 
Oh, I see that your post was the one that included the patch. :) Was going from memory and thought there was more to the thread than that.
 