C how db package handles large data

balaji18 · Jan 14, 2021

Hi,

I would like to understand how the db package's btree access method (dbopen) handles data that is larger than one block. I assume the block size is the page size, which is 4096 bytes. From the man page (man dbopen):

"Key and data byte strings may reference strings of essentially unlimited length although any two of them must fit into available memory at the same time."

For example, how would 4103 bytes of data be stored using the btree access method, assuming the key and the data are both 4103 bytes long?

--Thanks.

Emrion · Jan 14, 2021

I don't think you will find a specialist in the FreeBSD implementation of the btree algorithm here.
You can try to find the answer in the sources:
/usr/src/include/db.h
/usr/src/lib/libc/db/db/db.c
/usr/src/lib/libc/db/btree/bt_open.c
(...)

balaji18 · Jan 15, 2021

Thanks, Emrion.

Going through the sources, the page size is set once: it is a configuration item passed in by the client or calling code when the database is opened. The minimum page size is 128 bytes; if a smaller value is given, the code uses 128. So if the data is 4103 bytes and the page size is 128, the first 4096 bytes fill 32 pages (128 * 32 = 4096) and the remaining 7 bytes go into a 33rd page, leaving 121 bytes free. The larger the page size, the more free space is left over. So it is advisable to use a smaller page size, which means the page size has to be chosen up front as a design decision. Some free space is unavoidable.
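The arithmetic above can be sketched as a small C helper. It follows the simplified model described in this post, where each page stores the item's bytes back to back; the `hdr` parameter (a per-page header overhead) is a hypothetical knob added only for illustration, and whether the real btree overflow pages reserve space for a header should be checked in the sources listed earlier.

```c
#include <stddef.h>

/* Result of spreading one item across fixed-size pages. */
struct chain_usage {
    size_t pages;   /* number of pages the item occupies */
    size_t unused;  /* bytes left empty in the last page */
};

/*
 * Simplified model from the thread: each page stores (psize - hdr)
 * bytes of the item. hdr is a hypothetical per-page header size;
 * pass 0 to reproduce the thread's arithmetic exactly.
 * Precondition: psize > hdr.
 */
static struct chain_usage chain_for_item(size_t item, size_t psize, size_t hdr)
{
    struct chain_usage u = {0, 0};
    size_t payload = psize - hdr;              /* usable bytes per page */

    if (item == 0)
        return u;                              /* nothing to store */
    u.pages = (item + payload - 1) / payload;  /* ceiling division */
    u.unused = u.pages * payload - item;       /* slack in the final page */
    return u;
}
```

For example, `chain_for_item(4103, 128, 0)` gives 33 pages with 121 unused bytes, matching the figures above, while a 4096-byte page size gives 2 pages with 4089 unused bytes.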

I am open to any ideas for reducing the free space further, other than making the page size smaller. Assume the incoming data is already compressed.

Emrion · Jan 16, 2021

Glad that my post has been useful for you.

If I understand what you wrote correctly, at most 127 bytes are lost per operation. That seems very acceptable to me. Why do you want to optimize such a thing?
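Under the same simplified model as the earlier post (no per-page header overhead, which is an assumption), the 127-byte bound can be worked out as follows, with n the item size in bytes and p the page size:

```latex
\[
  s(n, p) = \left\lceil \frac{n}{p} \right\rceil p - n,
  \qquad 0 \le s(n, p) \le p - 1 .
\]
The worst case $s = p - 1$ occurs when $n = kp + 1$ for some integer $k \ge 0$.
For $n = 4103$ and $p = 128$: $\lceil 4103/128 \rceil = 33$, so
$s = 33 \cdot 128 - 4103 = 4224 - 4103 = 121 \le 127$.
```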

I would rather optimize the calls to dbopen() depending on what you want to achieve and what the constraints are.

balaji18 · Jan 18, 2021

With the falling cost of storage, 127 bytes isn't a big deal. The page size is set at the start. The default size is 4096, which is the page size (as returned by the pagesize command). So overflow or chain blocks are also 4096 bytes, hence more free space. The db package is modular, so the application design and code need to change: dynamic or variable-length data larger than a few hundred bytes should be put in smaller blocks to minimize the free space.
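Choosing a smaller block size up front, as suggested above, can be sketched as picking the candidate page size that wastes the least space over a sample of expected item sizes. This uses the thread's simplified model (no per-page header), and the function names and sample sizes are hypothetical, for illustration only.

```c
#include <stddef.h>

/* Unused bytes when one item of `item` bytes is split into pages of
 * `psize` bytes (thread's simplified model: no per-page header). */
static size_t item_slack(size_t item, size_t psize)
{
    size_t pages = (item + psize - 1) / psize;   /* ceiling division */
    return pages * psize - item;
}

/* Total slack for a sample of n item sizes at a given page size. */
static size_t total_slack(const size_t *items, size_t n, size_t psize)
{
    size_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += item_slack(items[i], psize);
    return sum;
}

/* Return the candidate page size with the least total slack; ties go
 * to the earlier candidate. Requires ncand > 0. */
static size_t pick_psize(const size_t *items, size_t n,
                         const size_t *cand, size_t ncand)
{
    size_t best = cand[0];
    size_t best_slack = total_slack(items, n, cand[0]);
    for (size_t j = 1; j < ncand; j++) {
        size_t s = total_slack(items, n, cand[j]);
        if (s < best_slack) {
            best = cand[j];
            best_slack = s;
        }
    }
    return best;
}
```

For items of 4103, 300 and 5000 bytes, the total slack is 11077 bytes at 4096, 837 at 512 and 325 at 128, so 128 wins. Slack is only one cost, though: smaller pages also mean more pages per large item, so the minimum-slack choice is not automatically the best one overall.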

Thanks to Emrion. I think this thread can be marked as resolved/closed.
