ZFS Introducing a new ZFS exploration and data recovery tool: ZFS Spy

Ordoban · Sep 9, 2018

Last month I had a bad experience with ZFS. While I was browsing through directories, the system crashed with a kernel panic. Somehow a directory had become corrupted: as soon as I changed into that directory, the system panicked. zpool scrub did not solve the problem but made it worse: afterwards the panic occurred as soon as I imported the pool. Fortunately, it was still possible to import the pool with -o readonly=on, so I could save all important data except that one directory. The data in the destroyed directory I still had in a backup. After some sweaty hours, I had my data back.

After all this, I asked myself: is there no fsck for ZFS? I found old developer discussions about whether an fsck for ZFS should be written or not. They decided against it, because this kind of error should not actually occur, and an automatic fsck cannot deliver good results anyway.

My curiosity was aroused. I wanted to know what is wrong with the pool. I have seen that it is possible to investigate pools with zdb, but it is terribly awkward. Besides that, you have to have the pool imported to access it with zdb.
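
To illustrate what looking at a pool without importing it can involve, here is a minimal sketch, not code from ZFS Spy. It assumes the layout described in the ZFS On-Disk Specification draft: an uberblock begins with the 64-bit magic 0x00bab10c, followed by the version and then the transaction group number (ub_txg) at byte offset 16, and slots in the uberblock array are aligned to at least 1 KiB. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

class UberblockScan {
    // Magic number at the start of every valid uberblock.
    static final long UB_MAGIC = 0x00bab10cL;
    // Smallest uberblock slot size; larger slots are multiples of it.
    static final int STRIDE = 1024;

    // Returns the ub_txg of every uberblock found in the given raw bytes.
    static List<Long> findTxgs(byte[] region) {
        List<Long> txgs = new ArrayList<>();
        ByteOrder[] orders = {ByteOrder.LITTLE_ENDIAN, ByteOrder.BIG_ENDIAN};
        for (int off = 0; off + 24 <= region.length; off += STRIDE) {
            // The pool may have been written on either endianness,
            // so the magic is checked in both byte orders.
            for (ByteOrder order : orders) {
                ByteBuffer buf = ByteBuffer.wrap(region).order(order);
                if (buf.getLong(off) == UB_MAGIC) {
                    txgs.add(buf.getLong(off + 16)); // ub_txg, third 64-bit field
                    break;
                }
            }
        }
        return txgs;
    }
}
```

On a real disk, the input would be the 128 KiB uberblock array that starts 128 KiB into the first vdev label at the beginning of the device. Reading it needs read access to the device, but no import.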

I decided to write my own tool to study ZFS pools. I'm not really a good programmer, so I use the language I know best: Java. (No, please don't beat me, I'm still so small ...) Since there are no good GUI frameworks for Java, I decided on something unconventional: HTML. You use the tool in the browser. A nice side effect is that I can already show you some first impressions.
Feel free to take a look here: https://imoriath.com/zfsspy/
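
For readers wondering how a Java program can use the browser as its GUI, here is a minimal sketch using the HTTP server that ships with the JDK (com.sun.net.httpserver). It is an illustration only and not how ZFS Spy is necessarily built; the class name, the port 8080 and the page content are assumptions.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

class BrowserUi {
    // Builds the HTML for one request path; the content is a placeholder.
    static String page(String path) {
        return "<html><body><h1>Pool explorer</h1><p>You requested: "
                + escape(path) + "</p></body></html>";
    }

    // Escapes the characters that would otherwise be read as markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws IOException {
        // Listen on port 8080 (an arbitrary choice for this sketch).
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/", exchange -> {
            byte[] body = page(exchange.getRequestURI().getPath())
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
    }
}
```

After starting it, any browser that can reach the machine on port 8080 shows the page, so the browser does not have to run on the same host as the tool.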

Finally my questions:
- What do you think? Could this be a useful tool?
- Is there more recent documentation of the internal data structures than the "ZFS On-Disk Specification – Draft"?
- Whom can I ask if I get stuck at some point?

ShelLuser · Sep 9, 2018

Being a fan of the Java language myself I definitely applaud the effort you put into this and generally speaking this looks like an interesting project to me. I'm not too sure I agree about the GUI frameworks though, there's always the console which is where most of the administrators do their work anyway. To be honest the idea that you'd need to depend on a browser in order to perform low level maintenance on a file system doesn't sound appealing to me, also because it can become quite a hassle if you're working within a rescue environment.

Even so, it could be interesting. And I definitely agree that the requirement to import a pool before you can perform any kind of maintenance on it can be a massive drawback, especially in comparison to traditional filesystems, which you can even diagnose read-only (of course the results would be somewhat limited, but you could still perform some testing).

That doesn't compare to ZFS, where the only option you have to test the filesystem is scrubbing, and that can only be performed if the pool is imported read/write. Anything else won't do. I have experienced plenty of scenarios where it was the writing which started to cause problems, not the reading. In those situations you're pretty much screwed, because although you can still rescue all the active data, you can't perform any kind of filesystem maintenance (and thus diagnosis!) on it. That means you'll sometimes end up having to guess at the possible cause of the problems.

So from that perspective this definitely seems like an interesting idea. I just wonder how much different this would be from zdb(8).

Ordoban · Sep 11, 2018

ShelLuser said:
I'm not too sure I agree about the GUI frameworks though, there's always the console which is where most of the administrators do their work anyway.

A Midnight Commander-like UI could be an alternative, but I fear it would take much more time just to get the UI working. HTML is much easier to code: just throw some tables together and let the browser do all the tabbing, resizing, coloring and scrolling.
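
As a rough idea of what "throwing some tables together" can look like, here is a small sketch that renders raw bytes as an HTML hex-dump table. The class and method names are made up for this example and are not taken from ZFS Spy; perRow is assumed to be greater than zero.

```java
import java.nio.charset.StandardCharsets;

class HexTable {
    // Renders the bytes as an HTML table, perRow bytes per table row.
    // The first cell of each row holds the offset of the row's first byte.
    static String render(byte[] data, int perRow) {
        StringBuilder sb = new StringBuilder("<table>\n");
        for (int off = 0; off < data.length; off += perRow) {
            sb.append("<tr><th>").append(String.format("%08x", off)).append("</th>");
            int end = Math.min(off + perRow, data.length);
            for (int i = off; i < end; i++) {
                // Mask with 0xff so negative bytes print as two hex digits.
                sb.append("<td>").append(String.format("%02x", data[i] & 0xff)).append("</td>");
            }
            sb.append("</tr>\n");
        }
        return sb.append("</table>").toString();
    }

    public static void main(String[] args) {
        System.out.println(render("ZFS".getBytes(StandardCharsets.US_ASCII), 16));
    }
}
```

The browser then takes care of layout and scrolling; the program only has to produce the markup.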

ShelLuser said:
To be honest the idea that you'd need to depend on a browser in order to perform low level maintenance on a file system doesn't sound appealing to me, also because it can become quite a hassle if you're working within a rescue environment.

No, you don't need a browser in the rescue environment; you need a working network interface and a browser somewhere. It is more the Java runtime that will be missing. Is there such a thing as a portable Java environment? Or does GCJ still work?

ShelLuser said:
I just wonder how much different this would be from zdb(8).

Very different. With zdb you have to type block numbers and command options at every step of a file recovery. You have to know exactly what to type. With my approach you just click on links. Even monkeys can do this. (That's why so many people use Windows...)

By the way, today I saw the first file content from the disk through my tool.

Ordoban · Oct 7, 2018

It's now in a state where it can be used, so here is the GitHub link
