Yeah I mean, are you trying to just be obnoxious and repeat yourself?
Yes, I'm being obnoxious and repeating myself. Because in this thread, I haven't yet seen the information that would be required to diagnose the problem surgically and cleanly. There are vague descriptions of something going wrong (what exactly goes wrong?) when doing a complex series of operations (namely a find). I would like to see a single operation that goes wrong, and then see what exactly that "wrongness" is (user space hang, looping, kernel hang, return code, ...). And when I say "operation" in this context, I mean syscall. That's why I asked about relatively low-level programs (such as ls with various options), because they run a small and understandable set of syscalls.
I actually used folder and directory interchangeably because in layman's terms they pretty much mean the same thing, is layman not simple enough for you?
Actually, the thing called "folder" (typically a folder in a GUI) can also be a softlink, while the term "directory" is unambiguous.
When I mentioned seek, I prefaced it with the idea of any form of seek, whether it be cd'ing, ls'ing, or running something like find.
There is a world of difference between ls, ls, ls, cd, and find. And I mentioned "ls" several times because depending on options, ls runs different operations. It nearly always does opendir followed by readdir, but whether it runs the opendir on "." or on a named entity depends on the arguments. And whether it then runs stat depends on the options. I would like to see exactly which syscall fails, and in which fashion.
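To make "exactly which syscall fails" concrete, here is a minimal sketch in Python (not something posted in the thread) that performs roughly the steps `ls -l` would take, one at a time. The function name `probe_directory` is made up for illustration. It assumes CPython on a POSIX system, where `os.listdir` is implemented with opendir/readdir and `os.lstat` with an lstat-style call. Each step is announced before it runs, so if the process hangs, the last line printed shows where.

```python
import os
import stat


def probe_directory(path):
    """Hypothetical helper: list a directory, then lstat each entry, one step at a time."""
    results = []

    # Step 1: what plain `ls` does (opendir followed by readdir).
    print("trying listdir on %s" % path, flush=True)
    try:
        names = os.listdir(path)
    except OSError as exc:
        results.append(("listdir", path, "failed: errno %s (%s)" % (exc.errno, exc.strerror)))
        return results
    results.append(("listdir", path, "ok: %d entries" % len(names)))

    # Step 2: what `ls -l` adds on top (a stat of every entry).
    for name in sorted(names):
        full = os.path.join(path, name)
        print("trying lstat on %s" % full, flush=True)
        try:
            st = os.lstat(full)
        except OSError as exc:
            results.append(("lstat", full, "failed: errno %s (%s)" % (exc.errno, exc.strerror)))
            continue
        kind = "directory" if stat.S_ISDIR(st.st_mode) else "other"
        results.append(("lstat", full, "ok: %s" % kind))

    return results
```

Calling `probe_directory("/home/user/tmp")` and posting the printed lines plus the returned list would already say whether the readdir or one particular stat is the step that misbehaves.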
The term "seek" is used heavily in file system interfaces and implementation. It does not mean at all what you are using it for.
"Unable to" really means exactly what it might seem, "to not be able to"..
What exactly happens when you try (other than: the find hangs, and it isn't even clear yet where it hangs)? Can you reduce the complex find to a simpler operation? Can you give us more details about exactly in what fashion it fails?
Look, in the ideal world, if I were being paid to debug this, I would ask you to execute exactly the following series of syscalls or C library calls, with exactly the following arguments, and report exactly what happens on each step. And I'd e-mail you a small program (in a language du jour) that does exactly this. I don't get paid to help debug your problems, so I'm trying to get some clear and crisp information, using language that makes the information actionable, with the minimum hassle for everyone.
What exactly are you wanting by all of your starters... to get the debug information from truss? If so, why not just say that?
You can use truss to run any of the small examples; that wouldn't hurt, and it would probably even help. But it isn't even necessary. It would already be great if you could report something like "Step A worked with no problem, step B caused the following error message to be printed, and step C hung, didn't react to Control-C, and ps showed the hung process to be in D state".
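A report of that "step A, step B, step C" kind can also be produced mechanically. The following is only a sketch under assumptions: the helper name `run_step` and the timeout values are invented for illustration, and it uses nothing beyond Python's standard `subprocess` module. It runs one command, waits a bounded time, and says whether the command worked, failed with an error, or hung, and in the last case whether it went away when killed.

```python
import subprocess


def run_step(argv, timeout=10.0):
    """Hypothetical helper: run one command and classify the outcome."""
    proc = subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The command did not finish in time. Try to kill it and see
        # whether it actually goes away.
        proc.kill()
        try:
            proc.communicate(timeout=2.0)
            killable = True
        except subprocess.TimeoutExpired:
            # Still there after a kill: worth checking its state with ps.
            killable = False
        return {"argv": argv, "status": "hung", "killable": killable, "pid": proc.pid}

    status = "ok" if proc.returncode == 0 else "error"
    return {
        "argv": argv,
        "status": status,
        "returncode": proc.returncode,
        "stderr": err.strip(),
    }
```

Running it in turn on, say, `["ls", "/home/user/tmp"]`, then `["ls", "-l", "/home/user/tmp"]`, then the full find, and posting the three results, would show which is the first step that does not come back cleanly.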
Wouldn't they all be using system calls similar to the ones find did?
No, find uses a lot of different system calls, and a long sequence of them at that.
Here's a real question though, what do zdb and "wandering around in there" mean?
Sorry about not explaining that. Every file system has metadata, which is everything that is not "data", which is defined as the content of the files. So metadata includes things like
- directories (which are lists of names, and then pointers to what these names are),
- whether the object pointed to by a name is a file, directory, link, or something else,
- attributes of the object, such as mtime and atime, permissions, size (important for directories in this problem I suspect), and link counts,
- a few more uncommon things including ACLs (a more complex way to express permissions), EAs (extended attributes), and flags (is this object changeable or has it been archived),
- and file system internal things that make everything work, like inode numbers and allocation bitmaps.
Zdb is a program that allows a user to read that metadata in quite a raw format, and then use it to follow links, most importantly directory entries. That following of structures is what I meant by "wandering around". What I didn't mention is "take a look while you wander".
For example, if this were a file system I was familiar with, I would start by looking at the /home/user/tmp directory: Does it have a sensible number of entries? How many of the entries are directories? Is the link count of the directory 2 + number of subdirectories? Is one of the entries something called "Cache_Data", and is that entry an object of type file? Does the stat of that entry look like it would be readable, and does it have any suspicious looking flags, EAs, or ACLs? Is its size somewhat reasonable (0 or a huge number are implausible)? Where on disk is Cache_Data stored? Is that place on disk plausible, and is it not shared with any other object? If I look at these blocks on disk, does their content look like directory entries should look? Does the number of directory entries found on disk for Cache_Data match its size, reasonably or perfectly? If I try to read Cache_Data as a directory, do I get names and objects, and the correct number? Does it have . and .. entries? Is the link count on those entries good? How many subdirectories, and does that match the link count itself? And so on and so on.
Within just a few minutes, a ZFS internals expert (which I am not!) would be able to validate that the directory itself is in great health, or find a problem in the metadata structures. If they find a problem, how did that happen, and does the syndrome match a known cause? If there is no problem with the metadata, then why does reading it "not work" (whatever that might mean)?
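One of the checks above, "is the link count 2 + number of subdirectories", can be approximated from user space, as long as stat and readdir on the directory still work. The sketch below is not a replacement for zdb: it goes through the normal syscall interface rather than reading the raw metadata, and the function name `check_link_count` is invented for illustration. The 2 + subdirectories rule is the traditional Unix convention, and not every file system follows it, so a mismatch is a hint to look closer rather than proof of damage.

```python
import os


def check_link_count(path):
    """Hypothetical helper: compare a directory's link count to its subdirectory count."""
    st = os.stat(path)

    # Count entries that are real directories (symlinks to directories do
    # not contribute to the parent's link count).
    subdirs = 0
    entries = 0
    with os.scandir(path) as it:
        for entry in it:
            entries += 1
            if entry.is_dir(follow_symlinks=False):
                subdirs += 1

    return {
        "entries": entries,
        "subdirs": subdirs,
        "nlink": st.st_nlink,
        # Traditional convention: one link from the parent, one for ".",
        # and one ".." from each subdirectory.
        "expected_nlink": 2 + subdirs,
        "size": st.st_size,
    }
```

Running it on /home/user/tmp and on Cache_Data, and comparing `nlink` against `expected_nlink`, covers the first few questions in the list; the questions about on-disk placement and block contents still need a tool that reads the raw metadata.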