How to create a non-sparse file of 10 GB?

I want to create a file which is non-sparse, i.e. with real allocation on disk, but this does not work:
Code:
touch /zfile
dd if=/dev/null of=/zfile bs=1G count=10
Why doesn't it work, and what is the solution?
 
In your dd command, you are trying to read from /dev/null. That immediately returns EOF, so your dd command exits after zero bytes. You need to read from something that can actually be read from, like /dev/random (as VladiBG suggests), /dev/zero (which is readable and always returns binary zeros), or an existing file.
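For example, a corrected version of the original command reading from /dev/random could look like this (the block size and count are just one way of splitting up 10 GiB):
Code:
# read real data instead of /dev/null; 1m x 10240 = 10 GiB
dd if=/dev/random of=/zfile bs=1m count=10240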

Having said that: ZFS is actually an interesting beast, in that it automatically compresses whole blocks that contain only zeros and doesn't store them on disk. So I think reading 10G worth of zeros from /dev/zero and writing them to a ZFS file will still create a sparse file. As far as I know, there is no way to turn that "zero compression" off. So the easiest way to create a file that actually uses 10G of space is to use random data.

The underlying question is this: what are you actually trying to do? If you look at the POSIX-style file system API, you can't actually tell that some files are sparse. The original purpose of the sparse file mechanism was to not bother storing long runs of zeros on disk when the writing of the file leaves gaps. The classic example of a sparse file is an application that creates a file, seeks far ahead, and writes a small amount. The "size" of that file (which is defined as the location of the last readable byte) is now at the end of that write, and all the bytes that were skipped over read as zero, although they were never written. You can achieve the same effect using certain truncate calls.

But with normal file system calls (open, read, stat, ...) you can never find out whether the blocks that read as zero were actually written as zeros, or are sparse areas that are logically filled in. You can get a summary glimpse of that by comparing the total space allocation of the file (in st_blocks) to its size, but you can't find out which areas of the file have never been written.
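You can see that distinction from the shell (a quick sketch; the file name is arbitrary): truncate(1) grows the file's size without writing anything, while du(1) and stat(1) show how little is actually allocated.
Code:
truncate -s 10G sparsefile
ls -lh sparsefile                         # logical size: 10G
du -h sparsefile                          # allocated space: next to nothing
stat -f "size=%z blocks=%b" sparsefile    # st_size vs. st_blocks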

Fundamentally, this way of not storing data that has never been written is a very simple form of compression, which was super easy to implement in early Unix file systems, and reasonably effective. Now ZFS (and some other file systems) have added better forms of compression. One of them is suppressing long runs of zero, which is sort of a superset of the traditional sparse file mechanism. There are many other forms of compression today. Depending on why exactly you want a non-sparse file, you might want to do different things to defeat compression.

One example: databases are often implemented by organizing a big disk file into small blocks (commonly 4K pages) and writing them in a random access pattern, but always a 4K block at a time. If they start with a zero-size file, they will have highly variable write latencies: overwriting an existing block costs only the disk access, but extending the file (increasing its size) or writing a sparse block that has never been written takes longer, because the underlying file system also has to allocate new space and update (and write to disk) some allocation metadata. So what some databases do when creating a data file is actually write all the "empty" blocks to disk, with each block containing a little bit of non-zero information (for example the log serial number under which the block was last updated, to maintain consistency). This is a pretty good technique for creating a non-sparse file.
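A rough shell sketch of that trick (file name and sizes are made up; a real database would write its own page headers): every 4K block is written explicitly and starts with a non-zero serial number, so no block is all zeros.
Code:
# preallocate 4096 x 4K = 16 MB as explicitly written, non-zero blocks
n=0
while [ $n -lt 4096 ]
do
    printf 'block %08d' $n | \
        dd of=dbfile bs=4k seek=$n count=1 conv=sync,notrunc 2>/dev/null
    n=$(($n+1))
done
du -h dbfile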
 
I used /dev/random. I needed a fully allocated (non-sparse) file to create a zpool on.
FreeBSD needs a device like "/dev/almost_random"; that would be good enough for this purpose.
PS: For MongoDB, Redis & PostgreSQL I use a ZFS recordsize of 64K. For MariaDB data files, a recordsize of 16K.
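In case it helps anyone else, the file-backed pool and the recordsizes above look roughly like this (pool and dataset names are made up; file vdevs must be given as absolute paths):
Code:
zpool create testpool /zfile
zfs create -o recordsize=64K testpool/pgdata     # MongoDB/Redis/PostgreSQL data
zfs create -o recordsize=16K testpool/mariadb    # MariaDB data files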
 
Yes, the problem with /dev/random is that it can be a serious speed-limiting factor. On some implementations (not FreeBSD's) it can block for a long time, or return a short read.

Faced with your need to avoid compression, I would create a smaller file of random data (large enough to defeat compression algorithms) and re-use it.

This is on my VM, resident on SSD.

Firstly, make sure that du(1) is being honest:
Code:
[ritz.299] $ time dd if=/dev/zero of=zeros bs=1m count=100  
100+0 records in
100+0 records out
104857600 bytes transferred in 0.057730 secs (1816347697 bytes/sec)
    0.06s real     0.00s user     0.06s system
[ritz.300] $ du -h -s zeros
512B    zeros
So 512 bytes consumed for 100 MB of zeros. The file system is compressing, and du(1) knows it.

Next compose the file of identical random chunks:
Code:
[ritz.318] $  time dd if=/dev/random of=rand bs=1m count=1
1+0 records in
1+0 records out
1048576 bytes transferred in 0.014546 secs (72084617 bytes/sec)
    0.01s real     0.00s user     0.00s system
[ritz.319] $ n=10240
[ritz.320] $ time while [ $n -gt 0 ]
> do
> cat rand >>rand10G
> n=$(($n-1))
> done
   53.48s real     0.44s user     1.91s system
[ritz.321] $ du -h -s rand10G
 10G    rand10G
We have used random data to fill the 10 GB file at 192 MB/sec.

I'm not sure what the compression buffer size parameters are for ZFS, but, as du(1) indicates, the 1MB "chunk" used above is enough to defeat compression.
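The relevant knob is the dataset's recordsize, since ZFS compresses one record at a time; the current settings and the resulting ratio can be checked with zfs get (the dataset name is a placeholder):
Code:
zfs get recordsize,compression,compressratio pool/dataset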
 
If you just want a 10 GB file without actually allocating 10 GB worth of blocks (i.e. a sparse file), use this:

C:
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define SIZE ((off_t)10 * 1024 * 1024 * 1024 - 1)  /* 10 GB minus the final byte */
#define PATH "FILENAME"                            /* filename */

int
main(void)
{
        int fd;

        /* creat(2) takes a file mode, not open(2) flags */
        if ((fd = creat(PATH, 0644)) < 0) {
                perror("creat");
                exit(1);
        }

        /* seek almost 10 GB past the start without writing anything */
        if (lseek(fd, SIZE, SEEK_SET) == -1) {
                perror("lseek");
                exit(1);
        }

        /* write a single byte at the end; the skipped range stays sparse */
        if (write(fd, "", 1) != 1) {
                perror("write");
                exit(1);
        }

        exit(0);
}
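A quick way to confirm what the program produced (assuming it was saved as mkfile.c; FILENAME is the name hard-coded above):
Code:
cc -o mkfile mkfile.c
./mkfile
ls -lh FILENAME     # logical size: 10G
du -h FILENAME      # only the block holding the final byte is allocated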
 