dupd(1) and a look at man pages

I noticed some duplicate files under /usr/share/man so I installed and ran dupd(1).

Man pages are stored either as plain text or gzip-compressed, so the individual files are already quite small.

For any man page that describes two or more commands or functions, the file appears under each name so that it can be found by the man(1) utility. This results in a lot of duplicates. Just how much data was being duplicated? According to dupd it's about 31 megabytes. Half of the duplication is under section 3.

Play with the commands 'dupd scan', 'dupd report' and 'dupd ls' to see.

The duplicate files could be replaced with symlinks but the effort probably doesn't justify the savings.
 
For any man page that describes two or more commands or functions, the file appears under each name so that it can be found by the man(1) utility. This results in a lot of duplicates. Just how much data was being duplicated?
Actually, these 'duplicates' are not duplicates from the filesystem's perspective. They are hard links to the same file (see ln(1)). The filesystem stores the file's data once, and each name is just another reference (a link, or hard link) to that single object. Nothing is copied, and the disk space is occupied only once.

You can recognize hard links by their inode number (ls(1) prints it with the -i option): if two or more entries on the same filesystem have the same inode number, they are hard links to the same object.

As an example:
Code:
$ ls -li /usr/share/man/man3/strcmp.3.gz /usr/share/man/man3/strncmp.3.gz
13863528 -r--r--r--  2 root wheel 1467 Dec 24 12:14 /usr/share/man/man3/strcmp.3.gz
13863528 -r--r--r--  2 root wheel 1467 Dec 24 12:14 /usr/share/man/man3/strncmp.3.gz

strcmp(3) and strncmp(3) share the same man page, and their files have the same inode number (the first column). The link count of 2 (the third column) also shows that two names refer to that inode.
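If you don't want to compare inode numbers by eye, find(1) can do the matching. Here is a small sketch that builds its own scratch directory, so it is safe to run anywhere. The file names in it are made up for illustration, and -samefile is an extension that FreeBSD and GNU find both provide, so treat it as an assumption on other systems.

```shell
#!/bin/sh
# Scratch directory with hypothetical file names; removed again at the end.
dir=$(mktemp -d "${TMPDIR:-/tmp}/links.XXXXXX")
echo "dummy man page" > "$dir/strcmp.3"
ln "$dir/strcmp.3" "$dir/strncmp.3"   # hard link: a second name for the same inode
cp "$dir/strcmp.3" "$dir/copy.3"      # real copy: same content, different inode

# Show inode numbers: the first two entries match, the copy does not.
ls -li "$dir"

# Count the names that refer to the same file as strcmp.3 (itself included).
same=$(find "$dir" -samefile "$dir/strcmp.3" | wc -l | tr -d ' ')
echo "names sharing the inode: $same"

# Count regular files whose link count is greater than 1.
multi=$(find "$dir" -type f -links +1 | wc -l | tr -d ' ')
echo "files with link count > 1: $multi"

rm -rf "$dir"
```

Both counts should come out as 2 here, since only the original and its hard link share an inode. Pointing the second find at /usr/share/man instead of the scratch directory should give a rough idea of how many of the reported 'duplicates' are really hard links.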
 
I installed and ran dupd(1).
A relevant part of dupd(1):
Code:
OPTIONS
   [...]
       -I, --hardlink-is-unique
              Consider hard links to the same file content as unique. By
              default hard links are listed as duplicates. See HARD LINKS
              section below. Note that if this option is given during scan,
              it cannot be given during interactive operations.
Have you run it with this option?
 