Hello,
I am currently transferring data over the network to a PostgreSQL database (running in a jail) on a new system I've set up, and something has been happening that really has me scratching my head.
While the system is under heavy load from this PostgreSQL transfer, only one of the two drives in the mirrored pool will be writing data while the other remains idle (sometimes for up to 5 minutes); the drives then swap roles, and at other times both drives write data as would normally be expected.
Bash:
dT: 1.060s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
   20      2      0      0    0.0      1    117  15842   152.5| ada0
    0      1      0      0    0.0      0      0    0.0     0.0| ada1
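For reference, that snapshot is gstat output; the same per-device activity can also be followed with zpool iostat:
Bash:
# watch read/write activity per vdev on the pool, refreshing every 5 seconds
zpool iostat -v zroot 5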
This never occurs when data is written to the zpool randomly; even 500GB of randomly generated files do not create this situation, which leads me to believe it's related to some configuration I need to tweak in ZFS and/or PostgreSQL.
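For what it's worth, that random-write test was roughly the following (the target path and exact size are just illustrative):
Bash:
# write ~500GB of random data straight onto the pool to try to reproduce the stall
dd if=/dev/urandom of=/testfile bs=1m count=512000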
The system will also sometimes feel unresponsive. For example, after signing out of root it will just hang for 10-15 minutes before I can get back to a lower-privileged user.
Sometimes the situation resolves itself; other times it ends in a kernel panic:
Bash:
panic: I/O to pool 'zroot' appears to be hung on vdev guid 3397587246704100575 at '/dev/label/encroot0.eli'
The machine has 16GB of RAM. In an attempt to resolve this, I doubled the two variables below:
kern.ipc.shmall=262144 #System default was: 131072
kern.ipc.shmmax=1073741824 #System default was: 536870912
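For reference, the same values can also be applied at runtime with sysctl(8), so a reboot isn't strictly needed:
Bash:
# runtime equivalents of the doubled SysV shared memory limits
sysctl kern.ipc.shmall=262144
sysctl kern.ipc.shmmax=1073741824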
I also increased the PostgreSQL settings shared_buffers to 1638MB and effective_cache_size to 3GB, which delayed the issue. The table I'm transferring has around 100 million rows; before these changes the transfer failed at around 30 million rows, and afterward at around 90 million.
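In postgresql.conf terms, the relevant lines now read roughly:
Code:
# memory settings raised for the bulk load
shared_buffers = 1638MB
effective_cache_size = 3GB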
I've ruled out the hard drives by running smartmontools and by unplugging one drive at a time to confirm that the kernel panic still occurs. I know I could split the table and move it in smaller chunks, but I'd like to address the root cause so my server won't kernel panic in the future.
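The smartmontools check was nothing exotic, roughly the following against each mirror member:
Bash:
# full SMART report for each drive in the mirror
smartctl -a /dev/ada0
smartctl -a /dev/ada1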
Any suggestions are welcome.
Code:
  pool: zroot
 state: ONLINE
config:

        NAME                    STATE     READ WRITE CKSUM
        zroot                   ONLINE       0     0     0
          mirror-0              ONLINE       0     0     0
            label/encroot0.eli  ONLINE       0     0     0
            label/encroot1.eli  ONLINE       0     0     0
Code:
Filesystem    1K-blocks     Used      Avail Capacity  Mounted on
zroot        7411268064 74435124 7336832940     1%    /
/boot/loader.conf
Code:
kern.ipc.semmni=256
kern.ipc.semmns=512
kern.ipc.semmnu=256
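After a reboot, the tunables can be confirmed with:
Bash:
# verify the semaphore tunables took effect
sysctl kern.ipc.semmni kern.ipc.semmns kern.ipc.semmnu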