Many large websites ... only allow certain specific crawlers like Google and Bing to include their webpages ...
Why do webmasters deny privacy-oriented search engines access to their sites?
And I've been doing the same thing, in effect, for the last ~15 years. But as I'll explain below, you have the reasons for the denial all wrong. I look at my web server logs and find that a large fraction of the traffic comes from crawlers. Then I look at each crawler and make a decision. If it is a reputable search engine, I allow it access (in my case, not to everything; I don't let them load pictures). If it is clearly a hack attack (probing for scripts and the like), I deny it completely. I also deny all crawling from Russia and China, because my web page is intended for family and friends, I have no family or friends in those countries, and (for lack of language skills) I can't validate whether crawlers from those countries are reputable. If a crawler ignores robots.txt, it gets blocked immediately by IP address, and I don't bother being surgically accurate: I typically deny the whole IP range used by the crawler's company or hosting provider.
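The IP-range blocking doesn't need anything fancy; the web server's own access control is enough. As a rough sketch only (I'm assuming Apache 2.4 here purely for illustration, and 203.0.113.0/24 is a documentation placeholder, not a range I actually block), something like this in the relevant Directory or VirtualHost block denies a whole provider's range while leaving everyone else alone:

    # Deny an entire hosting provider's range; everything else stays allowed.
    # 203.0.113.0/24 is a placeholder documentation range, not a real target.
    <RequireAll>
        Require all granted
        Require not ip 203.0.113.0/24
    </RequireAll>

The same effect can be had with nginx deny rules or a firewall; the point is that robots.txt violators get handled below the robots.txt layer, since they have already shown they ignore it.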
When I say "reputable", I mean: the crawler honors robots.txt, it doesn't crawl excessively, it doesn't probe for vulnerabilities, and its operator's web site has clear instructions for webmasters about how to control crawling. In particular, I try not to block crawling by academic researchers, unless they get out of hand.
The real problem is that the crawler space is dominated by attackers and completely incompetent would-be search engines. Real search engines (Google, Microsoft, Yahoo when that was still a thing) are really good at crawling: very efficient, with minimal impact. They honor robots.txt, and typically give you feedback on what they see. The hacking attackers are obviously no good, and I block them quickly. I sometimes wonder whether some of the crawl entries were attempts at low-level DoS attacks; they were otherwise impossible to explain. I finally gave up and configured my robots.txt to allow only the Google and Microsoft crawlers (a sketch of that file follows the anecdotes below), since trying to stomp out all the evil/stupid ones was too much work. But here are two anecdotes:
Friends of mine founded the (failed) search engine CUIL: a former search engineer/manager from Google and her husband, a computer scientist from IBM who specialized in the semantic web. And I had to block CUIL, because their crawler was a god-awful mess: it ignored robots.txt and walked into directories it shouldn't have, it crawled the same few files every minute or two even though they hadn't changed, and so on. They were not evil, but they were incompetent.

Second anecdote: I used to work at IBM, and our research lab had a very high-speed connection to the internet. One day I saw an enormous number of crawls of my personal web site coming from the public IP address of the lab where I worked. Clearly it wasn't a web browser; it was a crawler gone insane. The first step was to block it at the IP level. A little investigation showed that it was a young researcher trying to crawl the web looking for some form of content, except that they had forgotten to rate-limit the spider, forgotten to avoid crawling the same resource multiple times (which is admittedly hard if you are using a large distributed system), and were just generally a big mess. Fortunately, colleagues got them to stop before they got into big trouble.
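For completeness, the allow-list robots.txt I mentioned above is nothing exotic. A minimal sketch (Googlebot and Bingbot are the real user-agent tokens for Google and Microsoft; /photos/ is just a stand-in for my picture directory, not my actual layout):

    # Let Google's and Microsoft's crawlers in, but keep them out of the pictures.
    User-agent: Googlebot
    Disallow: /photos/

    User-agent: Bingbot
    Disallow: /photos/

    # Everyone else: stay out entirely.
    User-agent: *
    Disallow: /

Well-behaved crawlers follow the most specific group that matches their user agent, so the two named bots get their own rules and everything else falls into the catch-all. Of course this only restrains crawlers that honor robots.txt in the first place, which is exactly why the misbehaving ones end up blocked at the IP level instead.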
So the answer to your question is: you are jumping to conclusions. I don't know of webmasters who deny privacy-oriented search engines access because they are anti-privacy. I know webmasters (including myself) who deny crawling that is damaging and pointless.