ASX
Guest
Hello,
I've modified the 'mkuzip' utility to make use of multithreading where possible, essentially parallelizing the compression tasks.
The problem I ran into is that the compressed uzip file is a sequence of compressed blocks, and that sequence must remain unchanged, which limits the level of parallelization that can be achieved.
I used a relatively simple approach: read and compress N blocks in parallel (where N is the number of simultaneous threads), wait until all N threads have completed, then write the N blocks preserving the sequence, and loop around.
That approach prevents full use of the CPU cores, since every thread must wait at the batch barrier for the slowest block, but it is still a good performance improvement over the single-threaded version.
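For illustration, here is a minimal sketch of the batching scheme, assuming zlib and POSIX threads. It is not the actual mkuzip code: the uzip header and the table of block offsets are omitted, and error handling is reduced to the bare minimum.
Code:
/*
 * Sketch of the batched approach: read up to N blocks, compress each
 * one in its own thread, join all threads, then write the results in
 * the original order.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

#define NTHREADS	4
#define BLKSZ		65536

struct job {
	unsigned char	in[BLKSZ];
	unsigned char	out[BLKSZ + BLKSZ / 16 + 64 + 3]; /* upper bound */
	size_t		insz;
	uLongf		outsz;
};

static void *
compress_block(void *arg)
{
	struct job *j = arg;

	j->outsz = sizeof(j->out);
	if (compress2(j->out, &j->outsz, j->in, j->insz,
	    Z_BEST_COMPRESSION) != Z_OK)
		exit(1);
	return (NULL);
}

int
main(int argc, char *argv[])
{
	struct job jobs[NTHREADS];
	pthread_t tid[NTHREADS];
	FILE *in, *out;
	int i, n;

	if (argc != 3 || (in = fopen(argv[1], "rb")) == NULL ||
	    (out = fopen(argv[2], "wb")) == NULL)
		exit(1);
	for (;;) {
		/* Read the next batch of up to N blocks. */
		for (n = 0; n < NTHREADS; n++)
			if ((jobs[n].insz = fread(jobs[n].in, 1, BLKSZ,
			    in)) == 0)
				break;
		if (n == 0)
			break;
		/* Compress the whole batch in parallel. */
		for (i = 0; i < n; i++)
			pthread_create(&tid[i], NULL, compress_block,
			    &jobs[i]);
		/* Barrier: wait for every block of the batch. */
		for (i = 0; i < n; i++)
			pthread_join(tid[i], NULL);
		/* Write the batch preserving the block sequence. */
		for (i = 0; i < n; i++)
			fwrite(jobs[i].out, 1, jobs[i].outsz, out);
		if (n < NTHREADS)	/* short read: end of input */
			break;
	}
	fclose(in);
	fclose(out);
	return (0);
}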
On a Core i3 (dual core + HT) the speed gain is around 2.5x-3x, and on CPUs with more cores the gain is more noticeable.
On single-core CPUs, on the other hand, the threaded implementation has practically no effect.
The compressed files will be (and in my tests are) bit-for-bit identical to those produced by the original version.
Of course, other strategies could have been implemented, but they would have been significantly more complex, with the risk of introducing bugs; I preferred the safer route.
Primarily for test purposes, I've added a "-t N" option to force the use of N threads.
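For example, to compress the test image with 8 threads and 64k blocks:
Code:
/usr/bin/time ./mkuzip -t 8 -s 65536 test/kernel-x10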
src: https://github.com/zBSD/mkuzip
Below are some tests using 4k, 16k and 64k block sizes, on a test file made up of 10 kernels concatenated together.
Code:
CPU : Intel(R) Core(TM) i3-2120 CPU @ 3.30GHz
K Arch: amd64 i386
Cores : 4
Copying some files from various locations ...
Filesystem Type Size Used Avail Capacity Mounted on
tank/ROOT/initial zfs 478G 44G 434G 9% /
/usr/bin/time ./mkuzip -t N -s X test/kernel-x10
01 thr -s 65536 28.17 real 27.93 user 0.10 sys
02 thr -s 65536 15.63 real 28.11 user 0.16 sys
04 thr -s 65536 10.68 real 35.23 user 0.18 sys
06 thr -s 65536 10.96 real 34.41 user 0.15 sys
08 thr -s 65536 10.06 real 35.26 user 0.25 sys
12 thr -s 65536 9.88 real 35.55 user 0.29 sys
16 thr -s 65536 9.85 real 35.83 user 0.30 sys
24 thr -s 65536 9.62 real 35.87 user 0.27 sys
32 thr -s 65536 9.57 real 35.97 user 0.25 sys

01 thr -s 16384 15.38 real 14.94 user 0.44 sys
02 thr -s 16384 8.65 real 15.09 user 0.48 sys
04 thr -s 16384 6.25 real 20.05 user 0.50 sys
06 thr -s 16384 6.59 real 19.43 user 0.53 sys
08 thr -s 16384 6.01 real 20.29 user 0.69 sys
12 thr -s 16384 5.81 real 20.44 user 0.63 sys
16 thr -s 16384 5.76 real 20.29 user 0.56 sys
24 thr -s 16384 5.63 real 20.42 user 0.56 sys
32 thr -s 16384 5.73 real 20.42 user 0.76 sys

01 thr -s 4096 12.28 real 10.96 user 1.38 sys
02 thr -s 4096 7.26 real 11.37 user 1.80 sys
04 thr -s 4096 5.26 real 15.28 user 2.17 sys
06 thr -s 4096 5.67 real 14.57 user 2.23 sys
08 thr -s 4096 5.03 real 15.28 user 2.28 sys
12 thr -s 4096 5.05 real 15.34 user 2.25 sys
16 thr -s 4096 4.90 real 15.41 user 2.19 sys
24 thr -s 4096 5.02 real 15.32 user 2.32 sys
32 thr -s 4096 4.89 real 15.48 user 2.11 sys