Solved: tar archive creation sort order

I have two identical directories, one mirroring the other through rsync -a. I'm trying to create a checksum of all of the files with tar, but tar stores the files in a different order in each directory, so the checksums never match.

Code:
bash# pwd; tar cf - . | tar tf - | head -n 5
/mnt/files/terancorp
./
./backup/
./code/
./ebooks/
./freebsd/

vs

Code:
bash# pwd; tar cf - . | tar tf - | head -n 5
/srv/files/terancorp
./
./ansible/
./websites/
./.tarsha256sum
./misc_source/

As you can see, the ordering is completely different. gtar has a sort option, but I would prefer to keep this FreeBSD-native, because this script needs to run before I have any ports installed. The whole purpose is to verify the integrity of the directory. Attached is the script, to give an idea of what I'm trying to do.

It seems to work in place, but if I cp -a or rsync -a the directory elsewhere, it breaks due to ordering. If tar could sort the order in which files are archived, I think it would work portably.

Do you have any recommendations on what I could try? Is there an existing utility (in base or a script that can run from base) that does this in a better way?
 


I got this working in a hackish way:

Code:
find -s . ! -name .tarsha256sum -type f -exec cat {} \;
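The find -s approach hashes file contents in a fixed order. One limitation is that only the bytes are concatenated, so renaming a file, or shifting data from the end of one file to the start of the next, leaves the digest unchanged. A minimal sketch of a variant, assuming POSIX sh and filenames without embedded newlines, hashes each file separately and then hashes a "digest  path" manifest. It sorts paths with LC_ALL=C sort rather than find -s so it also runs where find has no -s; that order is not identical to find -s, so both directories must be checked with the same script. tree_digest and sha256_stdin are hypothetical names, not base utilities.

```shell
#!/bin/sh
# Sketch only: tree_digest/sha256_stdin are hypothetical helper names.
# Assumes filenames contain no newlines.

# Read stdin and print only the hex SHA-256 digest: sha256(1) in
# FreeBSD base, falling back to GNU sha256sum(1) elsewhere.
sha256_stdin() {
    if command -v sha256 >/dev/null 2>&1; then
        sha256
    else
        sha256sum | cut -d ' ' -f 1
    fi
}

# Print one digest for the tree rooted at $1. Each regular file
# (except .tarsha256sum) contributes a "digest  path" line, in
# byte-wise sorted path order; the manifest itself is then hashed,
# so both file contents and file names affect the result.
tree_digest() (
    cd "$1" || exit 1
    find . -type f ! -name .tarsha256sum | LC_ALL=C sort |
    while IFS= read -r f; do
        printf '%s  %s\n' "$(sha256_stdin < "$f")" "$f"
    done | sha256_stdin
)
```

For example, tree_digest /mnt/files/terancorp and tree_digest /srv/files/terancorp should print the same digest when the two copies hold the same files with the same contents.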

Updated script attached.

Other ideas/suggestions appreciated.
 


Fantastic, thank you!

The only gotcha is that you need a file that lists your exclusions (passed with -X).

Code:
$ echo .mtree > .mtreeignore
$ mtree -c -X .mtreeignore -K sha256 > .mtree
$ cat .mtree | mtree -X .mtreeignore

Amazing tool. Much better than my silly script.
 
OK, I'm having some issues with mtree when going from ZFS to UFS.

I get this on all (or many?) of the files:

Code:
flags ("none" is not "uarch")

Not sure how to avoid that.
 
OK, it looks like I can work around that by removing the flags keyword with -R:

Code:
$ mtree -c -X .mtreeignore -K sha256 -R flags > .mtree

I also don't need cat for the comparison; -f reads the spec from a file instead of standard input.

Code:
mtree -X .mtreeignore -f .mtree
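The final in-tree workflow from this thread can be wrapped in two helpers. This is a sketch assuming FreeBSD's base mtree(8) and a POSIX sh; mtree_seal and mtree_check are hypothetical names. The commands are the ones posted above: .mtree is listed in .mtreeignore because the shell creates .mtree before mtree starts walking the tree, and -R flags drops the flags keyword that differed between the ZFS and UFS copies.

```shell
#!/bin/sh
# Sketch only: mtree_seal/mtree_check are hypothetical helper names.

# Record a spec for the tree rooted at $1: default keywords plus
# sha256, minus "flags". The spec file is excluded from itself.
mtree_seal() (
    cd "$1" || exit 1
    echo .mtree > .mtreeignore
    mtree -c -X .mtreeignore -K sha256 -R flags > .mtree
)

# Compare the tree rooted at $1 against its stored spec. mtree exits
# non-zero when the hierarchy does not match the specification.
mtree_check() (
    cd "$1" || exit 1
    mtree -X .mtreeignore -f .mtree
)
```

Because .mtree and .mtreeignore travel with the directory, running mtree_check on a copy made with rsync -a or cp -a checks it against the original's spec.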

Thank you for your help! Marking as solved.
 
Instead of excluding the master mtree file, why not just save that file outside of the folder you're duplicating?
To use that file to validate a copy of the duplicated structure, cd to the duplicate folder and run mtree -f </path/to/master mtree file>

And if all you're interested in is verifying the checksum (you don't care about file timestamps, ownership, permissions or other 'flags'), you should probably use -k rather than -K: -k replaces the default keyword set with the ones you name (plus type), while -K adds them to it.
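The out-of-tree suggestion can be sketched like this, assuming FreeBSD's base mtree(8). The spec lives outside both directories, -k sha256 limits the check to file type and content digest, and -p points mtree at a tree instead of cd-ing into it (equivalent to the cd approach, but the spec path no longer has to be valid relative to the copy). mtree_spec and mtree_verify are hypothetical names.

```shell
#!/bin/sh
# Sketch only: mtree_spec/mtree_verify are hypothetical helper names.

# Print a spec for the tree rooted at $1 containing only "type" and
# "sha256", so timestamps, ownership, modes and flags are ignored.
mtree_spec() {
    mtree -c -k sha256 -p "$1"
}

# Check the tree rooted at $1 against the spec file $2; mtree exits
# non-zero if a file is missing, extra, or has a different digest.
mtree_verify() {
    mtree -p "$1" -f "$2"
}
```

For example, mtree_spec /srv/files/terancorp > /var/tmp/terancorp.mtree followed by mtree_verify /mnt/files/terancorp /var/tmp/terancorp.mtree (the /var/tmp path is only an illustration).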
 

I like the idea of it being standalone, but that's certainly possible as well. In this case the -K method doesn't really hurt, so I'm good with it. I want to know if the data gets corrupted in any way, and while the file contents will be most of it, there's a tiny, tiny chance the corruption might be permissions-related. Since it's just as fast (relatively speaking), I think it's good like that.
 