Help with Inode

GroupInode · Oct 26, 2011

Hi guys,

I am stuck on a very specific problem. I want to add a field to the existing inode data structure, but I am not sure where to make the changes. What I want to know is where and how to make changes to the inode data structure. Please also explain the problems that may arise from making such changes.

trasz@ · Nov 1, 2011

Assuming you really mean inode (i.e. the structure describing a file in the filesystem, stored on disk) and not vnode (the in-memory structure representing an inode that is in use): for UFS, the on-disk inode is defined in sys/ufs/ufs/dinode.h (struct ufs1_dinode and struct ufs2_dinode), and the in-core struct inode that wraps it is in sys/ufs/ufs/inode.h. An obvious problem with modifying it would be losing compatibility with the unmodified version: your modified code won't be able to mount normal UFS filesystems.

GroupInode · Nov 2, 2011

Thanks for the info. Can you also tell me which changes have to be made, if I add a field to the inode structure, so that:
1. all the metadata stays consistent for every file on the system, and
2. the functions that access and modify the inodes of all the files work the same way as if there had been no change?
Thanks in advance.

GroupInode · Dec 29, 2011

As of now we have understood that any program uses a struct stat to store temporary inode info and display it. struct stat gets its info from struct inode (defined in inode.h), which in turn gets its info from struct dinode. struct dinode is (or may be, I am not sure) the actual physical representation of the inode.

I also tried studying the ls.c source code. I found that it uses two other important structures, struct FTS and struct FTSENT, to traverse the filesystem hierarchy with functions defined in fts.c. The problem is that fts.c uses system calls like fstat() and open() to access inode information, and I can't find where the source of those system calls is or what language it is written in. I think they are written in Perl, but I have also found some .c files in the /lib/libstand folder, although the source code there is merely 30 lines.

I am stuck here, so any help would do, and if my information is wrong, please correct me. Thank you in anticipation.

monkeyboy · Jan 2, 2012

It might help if you gave more details of what you are actually trying to do. My knowledge is old and rusty, but dinode is the main piece if you are trying to modify the disk inode. Be careful not to break assumptions about the number of inodes per sector. Hopefully what you are trying to add is small and can be put in the padding of the dinode struct.

To be more explicit, in 8.2 there are 16 bytes of padding (di_spare) left unused. I don't know about FreeBSD 9 (check!), but presuming they are still free, you can put some stuff in there. I would strongly hesitate to add anything to the dinode beyond these 16 bytes, because you would basically be creating a new (and incompatible) filesystem format.

fstat and open are syscalls in the kernel, written in C, of course. Have you looked at vfs_syscalls.c? There are also C library wrappers in the library source that generate the traps into the kernel.

GroupInode · Jan 4, 2012

I'm trying to find out how the information is retrieved from the dinode and printed on the console or to a file. Once I know how it is retrieved, I'll try to add a field last_modified_by to the dinode structure, i.e. it would hold the username of whoever last modified that file. I'll probably also try to modify ls.c to display the information on the console.

monkeyboy · Jan 4, 2012

GroupInode said:
I'm trying to find out how the information is retrieved from the dinode and printed on the console or to a file. Once I know how it is retrieved, I'll try to add a field last_modified_by to the dinode structure, i.e. it would hold the username of whoever last modified that file. I'll probably also try to modify ls.c to display the information on the console.

OK, interesting project and very helpful to know what your motivation is...
First the easy stuff... If by "retrieved from dinode" and "printed" you mean displayed to the user, then yes, the most common way is the ls command. BUT it is far, far from the only mechanism. Many, many apps directly access file and inode information, e.g. tar, dump, chmod, even sh, just to name a few, never mind large apps like DBs, web servers, etc. So, for example, if you want to back up your new info with tar or dump, you will have to modify those utilities. And if you are successful, you will certainly want to modify find(1) so that you can search for all files that have been modified by specific users.

The simplest description of ls(1) is that it reads directories (opendir(), readdir(), etc.) and then calls stat() on each entry to grab its inode info.

As far as "username" goes, I presume and hope that you actually mean uid/gid, since it makes no sense to store usernames in inodes. Luckily, a uid/gid pair will fit in the spare entries of a dinode, so you are set there. I would strongly recommend not changing the size of a dinode.

OK, now for the harder stuff. Again, my knowledge is old and rusty; it's been years, even decades, since I've pored over the kernel. But I'll take a stab... My gut reaction is that this is an interesting idea (recording the uid/gid of the last inode modifier), but it is not going to be easy to make 100% airtight, and that is likely why it hasn't been done yet.

To be more explicit, I think what you want is to record the uid/gid each time an inode's mtime (and maybe ctime) is updated. Sounds straightforward (but...). The first thing I would do is grep for all occurrences of mtime and ctime in the kernel and study them.

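Here's where it gets more subtle: 1) user programs change inodes in several different ways, many involving the creation of file descriptors (e.g. open()). By the time a file descriptor exists, the information about which uid/gid is doing the modifying is lost: the file table entry records, at most, the credentials of whoever opened the file (in FreeBSD, the f_cred pointer in struct file), not those of whoever is writing through it now. 2) User programs aren't the only way in which inodes are modified.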

For 1), consider what happens when you use the su(1) command. It is a SUID program, so merely running it means that (a) a new process is created with a NEW set of uid/gid, yet (b) it inherits file descriptors from the parent shell, which has the original uid/gid. Both sets of descriptors point to the same file table entries, which in turn point to the same inodes. So you cannot depend on the file table to tell you the proper uid/gid to update the inode with. The only real source of appropriate uid/gid info is going to be the credentials of the process the kernel is acting on behalf of: struct user in old Unix kernels, struct ucred (reached via td->td_ucred) in current FreeBSD. I haven't even begun to consider what happens with threads, since I don't really know how threads are currently implemented. This means you will have to study the kernel very carefully to determine, at every point where mtime is changed, whether the credential info available there is actually the right info to use.

On the other hand, you will probably want to avoid, if possible, hacking every FS type (think UFS vs ZFS) to get the job done, unless you really don't care about anything other than UFS. So working at the vnode/vattr level (va_mtime) would be a strong candidate, I think. You would have to add muid/mgid fields to struct vattr. If you are lucky, something simple at the vnode layer might work, like checking for changes to va_mtime in vput() or similar (just guessing here). Otherwise you may still have to hack every filesystem you care about.

You might ask: why bother? Why not just tackle the problem at the syscall level? I would guess that doing that will miss many paths and situations in which inodes get changed. Now, you may not care about those situations, and 90% accuracy may be good enough. This second idea is simply to add, in each syscall routine that might change an inode, code to update the muid/mgid (let's call it the modifying uid/gid). I would consider this a much messier, less clean approach, and here is one possible major hole: consider how a file server like NFS or Samba works. (I read that NFS is now stateful, which may make things even worse.) I believe such server daemons do the appropriate authentication themselves and then operate on local filesystems as root. Therefore the proper muid/mgid info is likely never in the kernel at all: the syscalls issued by the user may come from another machine! It will be messy to carry that info around and inject it back in. You will probably have to hack every inode-modifying server daemon to get all of this 100% correct, unless you don't care about that level of accuracy.

Again, my knowledge is very old and rusty, so sorry for any inaccuracies. But I hope this helps. Overall, I'd guess that what you are trying to do will be moderately difficult to get 80-90% right and very difficult to get 100% right.

GroupInode · Jan 5, 2012

Thank you for the reply.
Well, I am a beginner at UNIX, and I am doing this as my undergraduate project. I appreciate all the points you listed above, although I am only halfway through studying the system calls.

I think the system call path is twofold:
1. Filesystem Independent system calls present in /usr/src/lib/libstand
2. Filesystem Dependent system calls present in /usr/src/lib/libstand/<file_system.c>

I have read /usr/src/lib/libstand/stand.h, which I think is the boundary of platform independence. It has a structure called struct fs_ops, which holds function pointers for system calls like *open(), *stat(), *fstat(), etc.
These function pointers are made to point to the actual system calls of the different platform-dependent filesystems, like ext2fs, ufs, etc.
This is done by the open system call (by reading from struct open_file files[] and struct fs_ops *file_system[]).
I think if we want to add this field for a particular filesystem, then I should make changes in filesystem C files like ufs.c and vfs.c (and so on...).

Now, I was thinking that modifying already implemented filesystems is messy, as you pointed out. So I was planning to make my own filesystem. Its code would only include the necessary fs_ops (filesystem operations, i.e. system calls), with the appropriate pointers put in stand.h. Am I thinking along the right lines?

I read an IBM developerWorks article on FUSE that explained how to create a filesystem in user space, which is something very similar. The link is:
http://www.ibm.com/developerworks/linux/library/l-fuse/

Can you please tell me whether I am on the right track or not? And could you also suggest some links I can read that might help me with this (i.e. with creating or modifying a filesystem)?

monkeyboy · Jan 5, 2012

GroupInode said:
Well, I am a beginner at UNIX, and I am doing this as my undergraduate project.

Now, I was thinking that modifying already implemented filesystems is messy, as you pointed out. So I was planning to make my own filesystem. Its code would only include the necessary fs_ops (filesystem operations, i.e. system calls), with the appropriate pointers put in stand.h. Am I thinking along the right lines?

I read an IBM developerWorks article on FUSE that explained how to create a filesystem in user space, which is something very similar.

OK, that gives a bit more clarity on where you're coming from. I had already guessed the "beginner at UNIX" part, but not the undergraduate project. So I would say it all depends on what your goals are.

When I talked about "messy", I meant that a solution that would work for all (or at least several major) filesystems would likely involve hacking those filesystems, which might get a little unwieldy. But if this is a school project, well, as you can probably tell, I'm a bit "old school", and I would personally just start with an implementation mostly for UFS and work with the UFS code and the vnode layer. I think an initial version could be done fairly quickly with perhaps 100-200 lines of added code: pretty minimal impact, and most existing utilities would work fine. Another 100-200 lines would go into programs like ls, find, etc., which would have to be done anyway for just about any approach.

The FUSE idea... well, I haven't had very good luck with FUSE, but that was mainly with the NTFS implementation, hitting it pretty hard. Nevertheless, this approach would probably mean more coding, perhaps A LOT more coding, and less actual understanding of Unix internals themselves: much more learning about the FUSE API, of course, and not so much about Unix. And I think you would end up with a "toy" rather than something that could actually be used. But if you want to do a FUSE version, I would try to find an existing FUSE filesystem and modify it instead, although most of the ones I have seen that would be good starting points seem geared toward Linux, not FreeBSD. Even with FUSE, you would still need to alter things at the vattr layer to bring the new muid/mgid info back out through stat().

I don't think the syscall level is the right place to put this, for some of the reasons I mentioned. It belongs between the vnode layer and the filesystem-specific code, pretty much where mtime is also updated.

Given what I said above, and since I believe every solution will need it, I suggest you start by adding the fields, and the code to support them, to vattr and bringing them out through stat(). You can just assign dummy uids or copy the inode owner into these fields. Then modify ls.c to print them out. That would be a good start, and you need to do it regardless of which internal solution you choose. That is, build the internal support for a modified stat() and ls(1), and leave the filesystem mods for later.

GroupInode · Jan 13, 2012

Thank you for your reply, monkeyboy.
I have decided to go with your suggestion. I am now stuck on how the VFS is actually triggered in the filesystem. I have read vfs_syscalls.c, vfs_nops.c and vfs_ops.c, etc.

Now the real puzzle for me is that, for the actual inode and dinode structures, system calls like stat() and open() take only two arguments, whereas the similar calls specified by the VFS take 5 arguments (most probably in the form of struct fileops).

I searched a lot but couldn't find the actual place where the distinction is made, i.e. where the 2 arguments become 5 arguments and back again. I really need some help with this.

Also, I have been trying to debug the kernel in the hope of finding answers. I have compiled a debug kernel on two machines and connected them through Ethernet, and I will also connect them with a null-modem cable (DB9 to DB9 serial cable). I tried to debug it, but I have been running into issues like:

Code:

MALFORMED PACKET ERROR : TIMEOUT OCCURED

Please tell me how to debug this so that I can see what actually happens during the execution of a command like ls or stat. I need to trace:

1. How the system calls are invoked
2. Which source code files are involved
3. Their arguments and return values
4. FINALLY, the transition from the 2-argument format to the 5-argument format

Any ideas?

monkeyboy · Jan 14, 2012

GroupInode said:
Thank you for your reply, monkeyboy.
I have decided to go with your suggestion. I am now stuck on how the VFS is actually triggered in the filesystem. I have read vfs_syscalls.c, vfs_nops.c and vfs_ops.c, etc.

Now the real puzzle for me is that, for the actual inode and dinode structures, system calls like stat() and open() take only two arguments, whereas the similar calls specified by the VFS take 5 arguments (most probably in the form of struct fileops).

I searched a lot but couldn't find the actual place where the distinction is made, i.e. where the 2 arguments become 5 arguments and back again. I really need some help with this.

I need to trace:

1. How the system calls are invoked
2. Which source code files are involved
3. Their arguments and return values
4. FINALLY, the transition from the 2-argument format to the 5-argument format

Any ideas?

I don't have ready access to the full source right now, so I can't answer all your questions. However, I will take a few stabs.

First, I don't think the answers to your questions are really needed to get done the work you want, assuming I understand what you are trying to do. The details of how a syscall ends up in vfs_syscalls.c shouldn't matter since, AFAIK, you won't need to change any syscalls, change their number of arguments, or add any syscalls. I believe all you really need to do is add a few fields to the struct returned by stat(2). The struct stat returned will then have to be larger, but I think that is the only change you need to make from a syscall standpoint. Then, of course, all utilities that call stat(2) will have to be recompiled, and you will want to add new code to ls(1) and perhaps, as you have the energy, to other user commands like find(1). The tricky ones will be things like dump/restore; I haven't looked at that issue at all.

I am going mostly from memory here and would need the full source to be certain and more detailed, but the path from the syscall to vfs_syscalls.c would be something like this:

A program like ls(1) makes a stat(2) call. stat() is a C library routine that pushes its arguments on the stack and then issues a trap instruction (machine-dependent). The kernel's trap handler gets control, and now you are in kernel space instead of user space. The trap handler determines which syscall was responsible for the trap, presuming it is legal. This is machine-dependent: some CPUs encode the syscall number right in the trap instruction, while others carry that number into the kernel in other ways, such as putting it in a register.

A list of syscall numbers is in /usr/include/sys/syscall.h. These numbers are actually indexes into the table sysent[], whose entries are described by the structure defined in /usr/include/sys/sysent.h.

The sysent[] table also contains a pointer to the kernel function that handles each syscall. It's all in the file init_sysent.c.

Remember that the real arguments to the syscall were pushed on the stack in user space, so the kernel has to go and fetch them from there. IIRC, some implementations have the C library support code build an argument block before issuing the trap (I think as pointers immediately following the trap instruction), but either way the arguments still live in user space, although short arguments can be passed in the user registers.

As for return values from the syscall, I believe the kernel sets the value of one of the registers, which is restored while returning from the trap. Other values are returned by having the kernel write directly into user memory, through the pointers passed as arguments to the syscall. This is how the struct stat is returned from the stat(2) call.

Again, although it's educational, I don't see why you need to know the details of any of this since, as I understand it, your mods shouldn't have to change any of it. Instead, look into the kernel side of stat(2) and the definition of struct stat, and start by adding extra fields (muid, mgid) to it, set by vn_stat(). It looks to me like the business end of stat(2) is indeed in vn_stat(). So that part looks pretty easy...

More later... Let's see if you can make the simple mods to struct stat and vn_stat and have ls(1) get the new info...

(To be really explicit: add the fields st_muid and st_mgid to struct stat, then in vn_stat() set them to something like sb->st_muid = 42; sb->st_mgid = 43;. Then recompile the kernel and at least ls.c, if not the world, which is probably safer, since all programs that call stat(2) will now be broken. If you don't want to change the size of struct stat at this early stage (wise), it looks like there is some extra padding currently there that could be used to start with.)

GroupInode · Jan 20, 2012

Thank you once again, monkeyboy.

Sir, I changed struct stat in sys/stat.h by adding a field st_muid in place of st_lspare, and tried to refer to this field from ls.c. But compiling ls fails with the error:

Code:

'struct stat' has no member st_muid

I really don't understand the reason for this error, because there is only one file named stat.h in the FreeBSD source code that contains struct stat, and this file is included by ls.c. This has really confused me. Please help me.

Thanks

GroupInode · Jan 21, 2012

monkeyboy, I found out why the error was coming up. Thanks.

GroupInode · Aug 13, 2012

Performance test

I made the changes to the inode structure, and the project is working fine, thanks to monkeyboy. Now I want to know whether the changes affect file read and write performance. I am using an application called iozone (http://www.iozone.org). Iozone offers numerous tests, as follows:

0=write/rewrite
1=read/re-read
2=random-read/write
3=Read-backwards
4=Re-write-record
5=stride-read
6=fwrite/re-fwrite
7=fread/Re-fread
8=random mix
9=pwrite/Re-pwrite
10=pread/Re-pread
11=pwritev/Re-pwritev
12=preadv/Re-preadv

I am not sure which test I should perform.