Hi,
I am using FreeBSD 9.0, I am having trouble understanding ZFS performance in dedup mode. The setup is as follows:
Code:
test# zpool status
  pool: mainpool
 state: ONLINE
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        mainpool      ONLINE       0     0     0
          raidz1-0    ONLINE       0     0     0
            ada3p3    ONLINE       0     0     0
            ada4p1    ONLINE       0     0     0
            ada2p1    ONLINE       0     0     0
        logs
          mirror-1    ONLINE       0     0     0
            ada5p2    ONLINE       0     0     0
            ada1p2    ONLINE       0     0     0
        cache
          ada0p2      ONLINE       0     0     0

errors: No known data errors
dedup and compression are both ON, with 128k blocks. ada5p2 and ada1p2 are 45 GB SSD devices (the mirrored log). ada0p2 is a 45 GB SSD device; this is the secondary cache, which is set to metadata. ada3p3, ada4p1 and ada2p1 are regular hard drives. The computer has 16 GB of RAM. The primary cache is also set to metadata. The main pool has 1.3 TB of data in it. Nothing else is running on this computer apart from what I run.
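For reference, the configuration described above corresponds to property settings along these lines (a sketch only; the exact commands that were run are not in the post):

```shell
# Sketch of the dataset properties described above (for reference,
# not the exact command history).
zfs set dedup=on mainpool
zfs set compression=on mainpool
zfs set primarycache=metadata mainpool
zfs set secondarycache=metadata mainpool

# Verify the settings:
zfs get dedup,compression,primarycache,secondarycache mainpool
```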
I reboot the computer to clear all caches. I use an arc_summary.pl script to get the kernel memory used:
Code:
Kernel Memory: 728.27M
I write 5000M from /dev/random to the mainpool, then I check the memory size - I do this repeatedly:
Code:
( /root/arc_summary.pl | grep 'Kernel Memory' ) 2>/dev/null
Kernel Memory: 775.14M

dd if=/dev/random of=/mainpool/test.100.out bs=1M count=5000
5000+0 records in
5000+0 records out
5242880000 bytes transferred in 567.303935 secs (9241748 bytes/sec)

df -g /mainpool
Filesystem 1G-blocks Used Avail Capacity  Mounted on
mainpool        3333 1043  2289    31%    /mainpool

( /root/arc_summary.pl | grep 'Kernel Memory' ) 2>/dev/null
Kernel Memory: 1438.59M

dd if=/dev/random of=/mainpool/test.101.out bs=1M count=5000
5000+0 records in
5000+0 records out
5242880000 bytes transferred in 604.442479 secs (8673911 bytes/sec)

df -g /mainpool
Filesystem 1G-blocks Used Avail Capacity  Mounted on
mainpool        3333 1048  2285    31%    /mainpool

( /root/arc_summary.pl | grep 'Kernel Memory' ) 2>/dev/null
Kernel Memory: 2093.17M

dd if=/dev/random of=/mainpool/test.102.out bs=1M count=5000
5000+0 records in
5000+0 records out
5242880000 bytes transferred in 336.765594 secs (15568336 bytes/sec)

df -g /mainpool
Filesystem 1G-blocks Used Avail Capacity  Mounted on
mainpool        3333 1052  2280    32%    /mainpool

( /root/arc_summary.pl | grep 'Kernel Memory' ) 2>/dev/null
Kernel Memory: 2491.40M

dd if=/dev/random of=/mainpool/test.103.out bs=1M count=5000
5000+0 records in
5000+0 records out
5242880000 bytes transferred in 671.916393 secs (7802876 bytes/sec)

df -g /mainpool
Filesystem 1G-blocks Used Avail Capacity  Mounted on
mainpool        3333 1057  2275    32%    /mainpool

( /root/arc_summary.pl | grep 'Kernel Memory' ) 2>/dev/null
Kernel Memory: 3185.06M

dd if=/dev/random of=/mainpool/test.104.out bs=1M count=5000
5000+0 records in
5000+0 records out
5242880000 bytes transferred in 1151.609028 secs (4552656 bytes/sec)

df -g /mainpool
Filesystem 1G-blocks Used Avail Capacity  Mounted on
mainpool        3333 1062  2270    32%    /mainpool

( /root/arc_summary.pl | grep 'Kernel Memory' ) 2>/dev/null
Kernel Memory: 4244.58M
The kernel memory fills up. Performance is poor, but I'm assuming that is because the dedup table (DDT) is not yet in cache. I carry on doing this for an hour:
Code:
( /root/arc_summary.pl | grep 'Kernel Memory' ) 2>/dev/null
Kernel Memory: 13001.54M

dd if=/dev/random of=/mainpool/test.141.out bs=1M count=5000
5000+0 records in
5000+0 records out
5242880000 bytes transferred in 905.066026 secs (5792815 bytes/sec)

df -g /mainpool
Filesystem 1G-blocks Used Avail Capacity  Mounted on
mainpool        3332 1243  2089    37%    /mainpool

( /root/arc_summary.pl | grep 'Kernel Memory' ) 2>/dev/null
Kernel Memory: 13258.20M

dd if=/dev/random of=/mainpool/test.142.out bs=1M count=5000
5000+0 records in
5000+0 records out
5242880000 bytes transferred in 726.570730 secs (7215925 bytes/sec)

df -g /mainpool
Filesystem 1G-blocks Used Avail Capacity  Mounted on
mainpool        3332 1248  2084    37%    /mainpool

( /root/arc_summary.pl | grep 'Kernel Memory' ) 2>/dev/null
Kernel Memory: 13054.64M

dd if=/dev/random of=/mainpool/test.143.out bs=1M count=5000
5000+0 records in
5000+0 records out
5242880000 bytes transferred in 735.318146 secs (7130084 bytes/sec)

df -g /mainpool
Filesystem 1G-blocks Used Avail Capacity  Mounted on
mainpool        3332 1252  2079    38%    /mainpool

( /root/arc_summary.pl | grep 'Kernel Memory' ) 2>/dev/null
Kernel Memory: 13006.72M
It levels off at 13G; I guess the OS is holding onto the rest. So pretty much all of the kernel memory is used, and performance has not gotten any better - it hovers around 7 MB/s. To check that the problem really is deduplication, I turn dedup off on mainpool and do the same write:
Code:
test# zfs set dedup=off mainpool
test# dd if=/dev/random of=/mainpool/test.145.out bs=1M count=5000
5000+0 records in
5000+0 records out
5242880000 bytes transferred in 61.958975 secs (84618572 bytes/sec)
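As a sanity check on the dd numbers above, the reported rates are just bytes divided by seconds (pure arithmetic, nothing assumed):

```python
# Verify the two extreme dd throughput figures quoted in this post.
BYTES = 5242880000                      # 5000 x 1M blocks

dedup_on = BYTES / 567.303935           # first write with dedup=on
dedup_off = BYTES / 61.958975           # write with dedup=off

print(f"dedup=on : {dedup_on / 1e6:.1f} MB/s")   # ~9.2 MB/s
print(f"dedup=off: {dedup_off / 1e6:.1f} MB/s")  # ~84.6 MB/s
print(f"dedup off is {dedup_off / dedup_on:.0f}x faster")
```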
That's 84 MB/s, which is fast; this shows that the problem is the deduplication. I have the following questions:
- The performance of the writes fluctuates - sometimes 5 MB/s, sometimes 15 MB/s. Given that I am writing random data, I would have expected things to be highly consistent: each 128k block will be checksummed, ZFS will read the DDT (which will hit or miss), then write the data. The point is, each 5000M of random data I write should be approximately the same amount of work for the computer.
- I don't understand why ZFS gobbled all the RAM. I'm assuming the RAM usage went up to 13G because the primary cache was filling up. How is this possible? I don't understand how the DDT can be that big for 1.3 TB of data.
- Is there a way of seeing the size of the DDT?
- What exactly does primarycache=metadata mean? What else is stored apart from the DDT?
- When mainpool was empty, the random-data writes were fast, at about 64 MB/s; as more random data was added to mainpool, they got slower (all data is /dev/random data). I can understand that to some extent, because the DDT is getting bigger. The only additional overhead I can think of is the DDT read, and I was hoping that by having it all in RAM the process would be very fast. Is it normal to get this level of performance degradation?
- Are there any tools I can use to get to the bottom of what is going on? Are there ZFS logs? Can the code be compiled and instrumented?
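For what it's worth, a back-of-the-envelope DDT size estimate can be sketched in a few lines. The ~320 bytes per in-core entry is an assumption (a commonly quoted rule of thumb), not a figure from this post:

```python
# Rough upper-bound estimate of in-core DDT size for the pool described
# above: every 128k block of unique data needs one DDT entry.
# ASSUMPTION: ~320 bytes of RAM per entry (commonly quoted rule of thumb).
data_bytes = 1.3 * 2**40          # ~1.3 TB of data in the pool
block_size = 128 * 2**10          # 128k recordsize
bytes_per_entry = 320             # assumed per-entry in-core cost

n_blocks = data_bytes / block_size
ddt_bytes = n_blocks * bytes_per_entry

print(f"unique blocks (upper bound): {n_blocks:,.0f}")      # ~10.9M
print(f"estimated DDT size: {ddt_bytes / 2**30:.1f} GiB")   # ~3.2 GiB
```

On that assumption the DDT alone would be on the order of 3 GiB for 1.3 TB of unique 128k blocks, so it does not by itself account for 13 GB of kernel memory; the actual entry counts can be inspected with `zdb -D mainpool`.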
Many thanks.