alsuki said:
Hi, everyone.
Just now I saw this thread. I had the same problem about a year back. At that time I didn't know the cause, so I reinstalled Linux. Back then I had an AMD64 Athlon X2 with 8 GB of memory and 5 TB of disk, and every 7 days the machine would hang during the night.
My point is downgrading will not solve the problem. But I would be very glad to see this problem solved, because I'm creating a script for a full install of FreeBSD 9.0 on my machine.
Sorry I can't be of any help, but keep up the good work.
Best regards,
Alsuki.
What version did you use? Lots of mysterious problems happen with ZFS in 8.2-RELEASE and earlier, and 8-STABLE until around October 2011. And I wouldn't try 9 in production until it is proven stable.
And some problems are mistaken for other problems. For example, if you have many snapshots and run a command that goes through all the .zfs/snapshot/* directories, such as:
# find /tank/.zfs/snapshot -type d
it is well known that your system will fail due to running out of memory. And yet you can run a perfectly stable system if you simply avoid doing that.
And to the OP, I forget whether the above type of problem was ever mentioned, but you could try to figure out if something like that is happening. (Not sure how... maybe just make a cronjob that logs "ps axl" and "ps aux" output, assuming the problem process runs long enough to be caught.)
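Something along these lines might do for the cronjob idea. It is only a rough sketch: the function name log_ps_snapshot, the PSLOG_DIR variable and the /var/tmp/pslog default are all made up for illustration, so adjust to taste.

```shell
#!/bin/sh
# Sketch: append timestamped "ps axl" and "ps aux" output to a daily log file.
# log_ps_snapshot, PSLOG_DIR and the default directory are illustrative
# assumptions, not anything from the thread.

log_ps_snapshot() {
    log_dir="$1"
    mkdir -p "$log_dir" || return 1
    # One file per day keeps individual logs manageable.
    log_file="$log_dir/ps-$(date +%Y%m%d).log"
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        echo "--- ps axl ---"
        ps axl
        echo "--- ps aux ---"
        ps aux
    } >> "$log_file" 2>&1
}

# The log directory can be overridden through the (assumed) PSLOG_DIR variable.
log_ps_snapshot "${PSLOG_DIR:-/var/tmp/pslog}"
```

If you saved that as, say, /root/log_ps.sh (a hypothetical path), a crontab line such as `*/5 * * * * /bin/sh /root/log_ps.sh` would take a sample every five minutes. After the next hang, look at the last few entries for a process whose memory use or run time stands out.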
Other random ideas that came to mind:
Maybe exporting and importing the pool or some other similar hack would clean up the memory.
Deleting all but a few snapshots would prevent the above type of problem. (I would not be happy with this solution.)
You should post a detailed question on the freebsd-current or freebsd-fs mailing lists to get the attention of the developers, because they would know how to track down the root cause of the problem. You could also try updating to the latest 9 code, in case that is what the freebsd-current people expect you to be running (but expect some people to be annoyed if you update between questions, because the diagnostic commands they ask you to run will no longer apply to your new version and will give inconsistent information).
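Before deleting anything, it may help to see how many snapshots there actually are. Here is a rough sketch, assuming the pool is called tank as in the find example; count_snapshots is just a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count the snapshots under a pool to judge whether "many snapshots"
# applies. "tank" is the pool name used in the find example;
# count_snapshots is a hypothetical helper.

count_snapshots() {
    # -H: no header, -t snapshot: snapshots only, -o name: name column only,
    # -r: include descendant datasets. Snapshot names always contain '@'.
    zfs list -H -t snapshot -o name -r "$1" | grep -c '@'
}

if command -v zfs >/dev/null 2>&1; then
    echo "snapshots under tank: $(count_snapshots tank)"
else
    echo "zfs command not found; nothing to count"
fi
```

This only reads the snapshot list and does not walk the .zfs/snapshot directories, so it should not trigger the memory problem described above. It would also be a useful number to include in a mailing list post.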