I read your description. It is lacking a lot of detail.
Are you claiming that ZFS accepted a write() system call, returned normally, did not return an error like ENOSPC, but did not write the data (in effect silently truncating the file)? Or are you claiming that Postgres ignored an error from the file system? In the first case, ZFS has a serious bug (sounds unlikely); in the second case, Postgres has a serious bug (also sounds unlikely). You ask "how can this be fixed"; the answer to that depends completely on whether the problem is in ZFS or in Postgres (or somewhere else).
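For reference, the POSIX contract under discussion: write(2) may either return fewer bytes than requested (a short write) or return -1 with errno set, e.g. ENOSPC or EDQUOT on a full or over-quota dataset. A minimal sketch of a caller that treats both cases as errors (`write_all` is a hypothetical helper name, not from either codebase):

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer or fail loudly.
 * On a full or over-quota ZFS dataset, write(2) should eventually
 * return -1 with errno set to ENOSPC or EDQUOT; a caller that
 * ignores short writes or the -1 return loses data silently. */
ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;              /* real error, e.g. ENOSPC/EDQUOT */
        }
        done += (size_t)n;          /* short write: loop and continue */
    }
    return (ssize_t)done;
}
```

If ZFS returned success here without persisting the bytes, that would be a filesystem bug; if the caller ignored the -1, that would be an application bug. That is exactly the fork in the road.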
Exactly that is the question. I am not claiming anything (claiming is a business for lawyers, and I am tired of the increasing lawyer talk in IT); I only see that something happened which quite certainly should not happen. Otherwise I agree with your description: either ZFS neither executed the write nor returned an error from the syscall, or Postgres did not honor the error and roll back the transaction. Either would be wrong behaviour.
The problem is, I know of no way to figure this out forensically. Therefore I just posted to both parties.
Concerning details, I could post piles of system configuration, but at the moment I do not see which parts would actually be useful.
Did you set a quota, or a refquota?
If quota, do you have snapshots?
If quota+snapshots, assumptions like "I can rewrite a file on a full disk, so long as it is not growing" may not hold true. Compression complicates the matter further.
Set quota, no refquota, no snapshots, no compression.
And the issue is *not* to successfully rewrite a file on an almost full disk. The issue is that the data *did* grow, probably over quota, and *should* have run into an error. In such a case the error must be caught at the application level, which here means the database is expected to roll back the current transaction and then (more or less) stall operation. This did not happen; instead, the database closed the transaction without error and tried to continue working on a broken file, thereby losing payload data.
So, just as ralphbsz described: either ZFS did not properly report the error condition up to the application, or the database did not properly recognize and react to the error condition.
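One more wrinkle worth noting: even when write(2) itself succeeds, a filesystem may defer the failure until fsync(2) or even close(2), so the application must check every step. A sketch of that discipline (this is not Postgres source, and `append_record_durably` is an invented name):

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Sketch (not Postgres code): append a record and require that
 * write(), fsync() and close() all succeed before the operation is
 * considered committed. If any step fails, the caller is expected
 * to abort (i.e. roll back the transaction) rather than carry on. */
int append_record_durably(const char *path, const char *rec)
{
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    size_t len = strlen(rec);
    ssize_t n = write(fd, rec, len);
    if (n < 0 || (size_t)n != len) {   /* error or short write: fail */
        close(fd);
        return -1;
    }
    if (fsync(fd) != 0) {              /* deferred errors surface here */
        close(fd);
        return -1;
    }
    return close(fd);                  /* close can fail, too */
}
```

If the error was only surfaced at fsync() time and then swallowed somewhere, that would look exactly like the symptom described here: a transaction "committed" without error on top of a broken file.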
Allow for some 'extra' space. Remember that ZFS is a Copy-on-Write filesystem.
Yes, I know this. But sometimes data grows, and some config is not adjusted beforehand, and then things run into limits. This is why we use databases with transactional safety, so that in such a case things stop in a defined fashion without data loss. If we were always perfect with all our configuration, there would probably be no need for transactional safety at all.