Unbound: Memory consumption very high, gets killed

Snurg · Jan 24, 2021

When I put my adblocker list (1.6M entries) into unbound in this format

Code:

local-data: "somedomain. IN A someip"

unbound consumes more than 600MB with a blocklist file size of 88MB.

I would like to try this format, which also blocks subdomains:

Code:

local-zone: "somedomain" redirect
local-data: "somedomain A someip"

But when I try to start unbound with such a blocklist (155MB), after a while I just get the message "Killed.".
Free memory according to top is ~1.7GB.

How much memory would unbound need for the second variant?
Could BIND possibly do that with less memory?
Do I have a chance at all to run that list on my small 4GB Atom router?
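
As a rough orientation, the figures quoted above can be turned into a per-entry cost. This is only back-of-the-envelope arithmetic: it assumes "MB" means 10**6 bytes and attributes the whole 600MB to the blocklist, which ignores unbound's own baseline. It does not predict what the second (local-zone plus local-data) variant needs.

```python
# Figures quoted in the post above; "MB" is taken as 10**6 bytes (an assumption).
entries = 1_600_000
resident_bytes = 600 * 10**6   # "more than 600MB", so this is a lower bound
file_bytes = 88 * 10**6        # size of the local-data blocklist file

bytes_per_entry_in_memory = resident_bytes / entries
bytes_per_entry_on_disk = file_bytes / entries

print(f"memory per entry: at least {bytes_per_entry_in_memory:.0f} bytes")  # 375
print(f"file size per entry: about {bytes_per_entry_on_disk:.0f} bytes")    # 55
```

So each entry costs several times its size on disk once loaded, which is why reducing the number of entries matters more than shortening the lines.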

obsigna · Jan 25, 2021

For DNS-based blocking, you want to use this directive in unbound.conf:

Code:

local-zone: "void.example.com" static

The reasons are explained in the GitHub repository of my dns/void-zones-tools:

GitHub - cyclaero/void-zones-tools: Prepare a list of void zones that can be readily feed into Unbound on FreeBSD


Even if unbound used as much memory per entry as it does for your directives, the total would be much lower, because with the void-zones approach you don't need to list subdomains, and the hosts2zones utility from these tools strips unnecessary subdomains from the input files.
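
To illustrate the idea of stripping unnecessary subdomains, here is a minimal Python sketch. It is not the hosts2zones implementation (that tool is written in C); the function names `minimal_zones` and `to_unbound_static` and the sample host names are made up for this example. It drops every host that is already covered by a parent domain in the same list and prints the remaining names as static zones.

```python
def minimal_zones(hosts):
    """Return the hosts that are not subdomains of another host in the input."""
    names = {h.strip().strip(".").lower() for h in hosts if h.strip()}
    kept = []
    for name in sorted(names):
        labels = name.split(".")
        # Every proper parent suffix, e.g. "b.c" and "c" for "a.b.c".
        parents = (".".join(labels[i:]) for i in range(1, len(labels)))
        if not any(p in names for p in parents):
            kept.append(name)
    return kept


def to_unbound_static(zones):
    """Format zones as unbound.conf local-zone lines of type static."""
    return [f'local-zone: "{z}" static' for z in zones]


# Hypothetical sample input: two duplicates and two redundant subdomains.
hosts = [
    "ads.example.com",
    "example.com",
    "EXAMPLE.com.",
    "tracker.example.net",
    "a.tracker.example.net",
]
for line in to_unbound_static(minimal_zones(hosts)):
    print(line)
```

With this sample, five input lines shrink to two zones, because a static zone for a domain already covers everything below it.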

Snurg · Jan 25, 2021

Wow, I never came across that "void zone" approach (the static zone type), even though I read the looong unbound man pages several times.
Probably I was trying to digest too much at once.

So, I'll try the method you described and compare the results.

I am particularly curious whether this results in overblocking or (hopefully) has no major negative consequences and just kills the last few ads creeping through.

However, my intention is actually to redirect the requests to a specifically configured webserver jail, which serves empty documents and logs the ad requests, maybe even archives the requested javascripts for possible later analysis. Some stuff like miners and session loggers might be particularly interesting to look at.
Background is, I always was curious which companies sell me how many ads, and what kind of stuff gets executed by my browser without my knowledge.
Also, in my subjective experience, serving an empty document is way less intrusive than "failing the connection", as the latter results in frequent annoying error messages littering the browser screen, while with the former usually ads are just quietly gone.
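
A web server of the kind described above could look roughly like the following Python sketch. It is only an illustration under assumptions: the class name, the `make_server` helper, the address and the port are all made up, it handles plain HTTP GET requests only, and it logs the request instead of archiving anything.

```python
import logging
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EmptyDocumentHandler(BaseHTTPRequestHandler):
    """Answer every GET with an empty 200 response and log what was asked for."""

    def do_GET(self):
        host = self.headers.get("Host", "-")
        logging.info("blocked request: host=%s path=%s", host, self.path)
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        # Silence the default stderr access log; logging.info above is used instead.
        pass


def make_server(address="127.0.0.1", port=8080):
    """Create the server; the address and port are placeholder assumptions."""
    return ThreadingHTTPServer((address, port), EmptyDocumentHandler)


# To run it in the foreground:
#   logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
#   make_server().serve_forever()
```

The Host header is what tells you which blocked domain the browser was trying to reach, since all redirected names end up at the same address.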

I was using the solution of @Beeblebrox using the yoyo.org blocklist for a while.
But the many ads seeping through led me to write my own blocklist script a year ago, which also allows whitelisting.
If I had known about your solution, I probably would have tried it.

My script is configured via two files, a blacklist and one whitelist file.
So I immediately looked through your blacklist to see whether I could find some good new blocklists.

And I finally found the URL of the unzipped version of the MVPS list there, which I had been looking for unsuccessfully before.

Currently there are ~55 active blacklists in my blacklist and one whitelist in my whitelist.
When reviewing the list yesterday I removed 6 blacklists that had become inactive.
Now I noticed that I haven't yet added the PiHole whitelist... one more todo.

While I am looking through your repo files, I learned some useful things, too.
So it seems 1.1.1.1 in blocklists is to be interpreted as whitelisted... thanks, I didn't know that. Will update my code

As I also need to improve my script's user documentation, I expect to update its repo in the next few days.
Thank you, obsigna, the information from you is extremely helpful.

obsigna · Jan 25, 2021

Snurg said:
However, my intention is actually to redirect the requests to a specifically configured webserver jail, which serves empty documents and logs the ad requests, maybe even archives the requested javascripts for possible later analysis. Some stuff like miners and session loggers might be particularly interesting to look at.

I open the web inspector of my browser, and because of the NXDOMAIN errors I see immediately which stuff has been blocked. For example, 50 of the 129 files in total on today's CNN front page:

[Screenshot attachment, 2021-01-25 08:42:06]

However, you're right, for offline statistics, this approach is a little bit cumbersome.

Already in my first answer, I doubted that switching to void zones alone would solve the memory consumption problem, so once again I would like to point you to the hosts2zones utility, which strips duplicates and unnecessary subdomains. I am almost sure that your 1.6 M hosts would end up as fewer than a million zones, which would give the same blocking results but consume fewer resources.

Snurg said:
So it seems 1.1.1.1 in blocklists is to be interpreted as whitelisted... thanks, I didn't know that.

This is my invention. The hosts2zones utility uses it for whitelisting, which actually means suppressing any whitelisted zones from the output.
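
The whitelisting convention can be sketched in Python as follows. This is an illustration, not the hosts2zones code: the function names and the sample lines are hypothetical, and the sketch assumes exact-name matching, whereas the real tool may treat parent and subdomains differently.

```python
WHITELIST_MARKER = "1.1.1.1"  # convention described in the post above


def split_hosts(lines):
    """Split hosts-file lines into (blocked, whitelisted) sets of names."""
    blocked, whitelisted = set(), set()
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address without a host name carries no entry
        address, names = fields[0], fields[1:]
        target = whitelisted if address == WHITELIST_MARKER else blocked
        target.update(n.lower() for n in names)
    return blocked, whitelisted


def apply_whitelist(lines):
    """Return the sorted blocked names that no whitelist entry suppresses."""
    blocked, whitelisted = split_hosts(lines)
    return sorted(blocked - whitelisted)


# Hypothetical sample: the third line whitelists the name blocked by the second.
sample = [
    "0.0.0.0 ads.example.com",
    "127.0.0.1 tracker.example.net  # comment",
    "1.1.1.1 tracker.example.net",
]
print(apply_whitelist(sample))  # ['ads.example.com']
```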

Snurg · Jan 25, 2021

That looks very nice!
NXDOMAIN is definitely a better solution than "connection refused".
I will add another option in my script to use the NXDOMAIN method instead of redirect.
Very good tip, thank you very much!

From looking at the sources, my feeling is that the merge subroutine in my script works very similarly to your hosts2zones utility.
Both build a tree/trie, yours using C pointers, mine using Perl hashes of hashes.

Your feedback also made me aware that I need to add another pass, if the option to block subdomains is given.
Just chop off all nodes above the first one and take what remains.
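
The extra pass could be sketched like this, with Python dicts of dicts standing in for the Perl hashes of hashes. It is an assumption about how such a pass might look, not code from either tool; `build_trie`, `prune` and the sample names are made up. Names are inserted label by label from the top-level domain down, and the walk stops at the first listed name on each branch.

```python
END = ""  # marker key: a listed name ends at this node (real labels are never empty)


def build_trie(names):
    """Insert each name label by label, starting from the top-level domain."""
    root = {}
    for name in names:
        labels = [l for l in name.lower().strip(".").split(".") if l]
        if not labels:
            continue
        node = root
        for label in reversed(labels):
            node = node.setdefault(label, {})
        node[END] = True
    return root


def prune(node, suffix=()):
    """Yield the shallowest listed names; anything deeper is cut off."""
    if END in node:
        yield ".".join(suffix)
        return  # do not descend: subdomains are covered by this zone
    for label in sorted(node):
        yield from prune(node[label], (label,) + suffix)


# Hypothetical sample with two redundant subdomains.
names = [
    "ads.example.com",
    "example.com",
    "cdn.tracker.example.net",
    "tracker.example.net",
    "other.org",
]
print(list(prune(build_trie(names))))
```

For this sample the walk yields example.com, tracker.example.net and other.org, so the two subdomain entries disappear.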

I am already curious whether the result will be small enough to fit.

Big thanks, obsigna!
