New hardware database for BSD systems

A new database of supported hardware for BSD systems, from the creators of Linux-Hardware.org, has opened. Its most popular features include driver search for devices, operability tests, anonymization of collected system logs, and statistical reports. The database can be used in several ways: you can simply list all the devices on board, send logs to developers to help fix a bug, or save a “snapshot” of the computer's current state so that you can compare against it after a system failure.

The driver search relies on lists of supported devices extracted from all FreeBSD, OpenBSD and NetBSD kernel versions.

As with Linux systems, the database is populated using the hw-probe program (version 1.6-BETA was released specifically for BSD). The program abstracts away the differences between BSD systems and displays the device list in a single format. Recall that, unlike Linux, BSD systems have no single way to list PCI, USB and other devices: FreeBSD uses pciconf / usbconfig, OpenBSD uses pcidump / usbdevs, and NetBSD uses pcictl / usbctl.
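
To illustrate what abstracting over these differences involves, here is a minimal sketch that maps the output of `uname -s` to the tool names given in this post. It is not taken from hw-probe; the function name `device_tools` is made up for illustration, and the flags each tool needs are deliberately left out.

```shell
#!/bin/sh
# Map an OS name (as printed by `uname -s`) to the PCI and USB listing
# tools named in the post. Only the tool names come from the post; the
# function name is hypothetical and no command-line flags are assumed.
device_tools() {
    case "$1" in
        FreeBSD) echo "pciconf usbconfig" ;;
        OpenBSD) echo "pcidump usbdevs" ;;
        NetBSD)  echo "pcictl usbctl" ;;
        *)       echo "unsupported OS: $1" >&2; return 1 ;;
    esac
}

# Report which of the expected tools are present on this machine.
# On an unsupported OS the list is empty and the loop does nothing.
for tool in $(device_tools "$(uname -s)"); do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found: $tool"
    else
        echo "missing: $tool"
    fi
done
```

Derivatives such as GhostBSD or HardenedBSD report their own kernel name, so a real tool would need a longer table than this one.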

Among the tested supported systems: FreeBSD, OpenBSD, NetBSD, MidnightBSD, DragonFly, GhostBSD, NomadBSD, FuryBSD, TrueOS, PC-BSD, FreeNAS, pfSense, HardenedBSD, FuguIta, OS108 (if your system is not listed, then please let us know).

Please take part in BETA testing and help populate the database. This will greatly help the project at an early stage. Instructions for installing the database client program and creating your hardware probe: https://github.com/linuxhw/hw-probe/blob/master/INSTALL.BSD.md

UPDATE 1: the telemetry package can now be installed by pkg install hw-probe and executed by hw-probe -all -upload.
UPDATE 2: the dump of the database is now available in the GitHub repository.
 
The github page advises this:

curl -s https://raw.githubusercontent.com/linuxhw/hw-probe/master/hw-probe.pl | perl

This is very bad form.
 
There is a giant fundamental problem: you ask people to take a large piece of code (~18K lines of pretty dense perl), and run it as root, knowing full well that the results will be uploaded. That requires a level of trust that is unrealistic. Matter-of-fact, if I did this on any computer owned by any of my employers (current or past), I would get fired - on the spot. Matter-of-fact, I strongly suspect that running this script at work would cause my phone to ring within a few minutes, because my employers typically have reasonable privacy mechanisms. So your results will be biased towards (a) people who don't care about security, and (b) computers that are either privately owned, or installed in places where they don't care.

On the other hand, I understand why you are doing this, and it makes perfect sense. What you need to do is to get trust, and demonstrate to the user a chain of trust. Let's begin with: Associate yourself with an organization that has a sterling reputation ... which github isn't (anything can be stored on github). On the web site and in the script, identify very clearly who you are (names, locations), and make an argument that you are trustworthy. As far as I can see, the web site has no identifying information whatsoever. By using whois, all I can see is that it is registered to an anonymous organization in Russia, not a country that has a sterling reputation in computer security circles. Looking at the author of the script, I see that they are employed by an organization that's incorporated in Switzerland, headquartered in Singapore, and staffed with Russians, again something that does not create a warm and fuzzy feeling. Now I happen to know what your employer A... is (and I don't have any reason to believe that they are evil), and I think you are trustworthy, and there is in reality nothing wrong with Switzerland, Singapore, or Russia, but: Guilt by association. You need to do a much better job demonstrating that you are of good will to people, since otherwise, all the alarm bells will go off.

Having said that ... let's test the script (I know how to sandbox a machine):
Code:
> mkdir foo
> ./hw-probe.pl -save foo
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LC_ALL = (unset),
    LC_COLLATE = "C",
    LANG = "en_US.utf-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
Wrong, my locale is perfectly supported and installed. Yes, I know, that's not your code, that's perl doing it. But you are responsible for the cosmetics of your product.

So try again as root:
Code:
# ./hw-probe.pl -save foo
ERROR: can't access '/root/HW_PROBE/LATEST/hw.info', please make probe first
I was expecting that the program would just run when run with minimal arguments ... instead of asking me to make a probe first, it could just do it. Also, the error message could be clearer, perhaps: "Before saving the results, you have to make a probe. The probe is usually stored in /root/..., and I can't find it. Please make a probe first, with the command ...".

So try it with -probe:
Code:
./hw-probe.pl -probe
ERROR: 'dmidecode' package is not installed
ERROR: 'hwstat' package is not installed
ERROR: 'lscpu' package is not installed
TIP: install missed packages by command (execute it by adding `-install-deps` option):

     pkg install dmidecode hwstat lscpu
No, those packages were never installed before. I have never needed them. Asking me to install packages is a pretty heavyweight step ... what if they break something (yes, I've had smartctl crash my computer before!). And offering to install packages for me, while a good idea from one viewpoint, is also quite troubling.

And this is where I stopped. Honestly, I don't think this will be going anywhere. Which is sad, it would be nice to have a good database of hardware configurations that people have had good experiences with.
 
I will not run this on any of my systems, for the reasons Ralphbsz explained so well.

Break it up into two steps: One runs the probe and generates a plain text report. Another uploads the report to your servers. Better still, allow me to upload the report using a standard tool like curl(1).

This would give me the chance to review and edit the report to make sure no details I don't want leaked are uploaded. Right now I'm expected to trust that you will do this correctly.
 
You can create the probe in three steps:

1. Collect data:

hw-probe -all

2. Verify & edit:

ls /root/HW_PROBE/LATEST/

3. Add to database:

hw-probe -upload
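
The "verify & edit" step can be partly automated. Below is a minimal sketch, assuming the probe is a directory of plain-text files such as /root/HW_PROBE/LATEST/ mentioned above. The function name `find_identifiers` and the two checks (MAC-address-shaped strings and the machine's hostname) are illustrative assumptions, not part of hw-probe, and they are far from a complete list of what might identify a machine.

```shell
#!/bin/sh
# Print every line under a directory that contains something shaped like a
# MAC address or a given hostname. Returns 0 if anything was found, 1 if not.
# $1 = directory to scan, $2 = hostname to look for (default: `uname -n`).
find_identifiers() {
    dir=$1
    host=${2:-$(uname -n)}
    found=1

    # MAC-address-shaped strings, e.g. 00:1b:21:3c:4d:5e
    if grep -rnE '([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}' "$dir"; then
        found=0
    fi

    # Literal occurrences of the hostname (-F: no regex interpretation)
    if [ -n "$host" ] && grep -rnF -- "$host" "$dir"; then
        found=0
    fi

    return "$found"
}
```

A possible use before the upload step: `find_identifiers /root/HW_PROBE/LATEST && echo "review these lines before uploading"`.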
 
While your suggestions are going in the right direction, they still leave many concerns unresolved.
One runs the probe and generates a plain text report.
That's still thousands of lines of code that need to be run as root. I will simply not download code to be run as root from a source that's not 110% trustworthy. Sure, I need to download the kernel ... but I trust Kirk and his friends. Why? Because I've known Kirk and his friends for decades, and I think I know what makes them tick, what rewards they work for. In this case, I get code from an opaque source, without clear chain of trust.

And: How are you going to make sure that this first program really only makes a list of installed hardware? It could clandestinely transmit information. It could install trojan horses. It could damage data. It could ... do so many things. Really large organizations might be able to validate that this software is "clean" by doing a line-by-line code inspection. For 10K lines, that's going to take hundreds of man-hours.

Another uploads the report to your servers. Better still, allow me to upload the report using a standard tool like curl(1).

This would give me the chance to review and edit the report to make sure no details I don't want leaked are uploaded. Right now I'm expected to trust that you will do this correctly.
The OP had said that the report will be anonymized, by keeping only the top few words of SHA-hashed data. You will not be able to understand those in the hex dump. It won't say "Intel FooLake CPU, 512gig RAM, and LSI disk controller model 12345", it will say "0xABCD, 0x4321, and 0x3579". And if we report the hardware in cleartext (like I did above), it will give away many secrets. Imagine you find that company X is buying all its disk drives from Seagate, in spite of the fact that it claims in press releases to have entered into an exclusive agreement with WD. That would be fascinating, or perhaps lawsuit worthy. There are so many more examples of information leakage, and of how hard it is to prevent, that they are not even worth discussing.
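
For readers unfamiliar with the idea, truncated-hash anonymization can be sketched as follows. This is only an illustration of the general technique, not hw-probe's actual scheme: the choice of SHA-256, the 12-character prefix, and the function name `anonymize` are all assumptions.

```shell
#!/bin/sh
# Replace an identifier (serial number, MAC address, ...) with the first
# 12 hex characters of its SHA-256 digest. Hash choice and prefix length
# are assumptions made for this sketch.
anonymize() {
    if command -v sha256sum >/dev/null 2>&1; then
        # sha256sum prints "<digest>  -"; cut keeps only the digest prefix
        printf '%s' "$1" | sha256sum | cut -c1-12
    else
        # FreeBSD's sha256(1) prints only the digest when reading stdin
        printf '%s' "$1" | sha256 | cut -c1-12
    fi
}

# The same input always yields the same token, so two probes of one
# machine can still be matched without storing the original value.
anonymize "SERIAL-123"
```

Note that a hash only hides a value if the value is hard to guess. Identifiers drawn from a small space can be recovered by hashing every candidate and comparing.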
 
First of all, thanks for such a great review. I agree with everything you said.

There is a giant fundamental problem: you ask people to take a large piece of code (~18K lines of pretty dense perl), and run it as root, knowing full well that the results will be uploaded. That requires a level of trust that is unrealistic. Matter-of-fact, if I did this on any computer owned by any of my employers (current or past), I would get fired - on the spot. Matter-of-fact, I strongly suspect that running this script at work would cause my phone to ring within a few minutes, because my employers typically have reasonable privacy mechanisms. So your results will be biased towards (a) people who don't care about security, and (b) computers that are either privately owned, or installed in places where they don't care.

This is why we remove from the collected logs ALL private info that could identify you or your hardware devices. The BETA version was released in order to fix all possible leaks of sensitive information.

On the other hand, I understand why you are doing this, and it makes perfect sense. What you need to do is to get trust, and demonstrate to the user a chain of trust. Let's begin with: Associate yourself with an organization that has a sterling reputation ... which github isn't (anything can be stored on github). On the web site and in the script, identify very clearly who you are (names, locations), and make an argument that you are trustworthy. As far as I can see, the web site has no identifying information whatsoever. By using whois, all I can see is that it is registered to an anonymous organization in Russia, not a country that has a sterling reputation in computer security circles. Looking at the author of the script, I see that they are employed by an organization that's incorporated in Switzerland, headquartered in Singapore, and staffed with Russians, again something that does not create a warm and fuzzy feeling. Now I happen to know what your employer A... is (and I don't have any reason to believe that they are evil), and I think you are trustworthy, and there is in reality nothing wrong with Switzerland, Singapore, or Russia, but: Guilt by association. You need to do a much better job demonstrating that you are of good will to people, since otherwise, all the alarm bells will go off.

This project is not related to any organization. It's just my personal contribution to the growth of Linux and BSD operating systems.

I can't name my current employer, to avoid the employer being erroneously associated with the project. All I can do is list the well-known open-source projects I have created in the past. I have done this on the contacts page of the site and in the script.

Having said that ... let's test the script (I know how to sandbox a machine):
Code:
> mkdir foo
> ./hw-probe.pl -save foo
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LC_ALL = (unset),
    LC_COLLATE = "C",
    LANG = "en_US.utf-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
Wrong, my locale is perfectly supported and installed. Yes, I know, that's not your code, that's perl doing it. But you are responsible for the cosmetics of your product.

No ideas yet. Looks like this can't be fixed in the script itself.

So try again as root:
Code:
# ./hw-probe.pl -save foo
ERROR: can't access '/root/HW_PROBE/LATEST/hw.info', please make probe first
I was expecting that the program would just run when run with minimal arguments ... instead of asking me to make a probe first, it could just do it. Also, the error message could be clearer, perhaps: "Before saving the results, you have to make a probe. The probe is usually stored in /root/..., and I can't find it. Please make a probe first, with the command ...".

Fixed in master.

So try it with -probe:
Code:
./hw-probe.pl -probe
ERROR: 'dmidecode' package is not installed
ERROR: 'hwstat' package is not installed
ERROR: 'lscpu' package is not installed
TIP: install missed packages by command (execute it by adding `-install-deps` option):

     pkg install dmidecode hwstat lscpu
No, those packages were never installed before. I have never needed them. Asking me to install packages is a pretty heavyweight step ... what if they break something (yes, I've had smartctl crash my computer before!). And offering to install packages for me, while a good idea from one viewpoint, is also quite troubling.

And this is where I stopped. Honestly, I don't think this will be going anywhere. Which is sad, it would be nice to have a good database of hardware configurations that people have had good experiences with.

This is what distinguishes hw-probe from bsdstats: it collects as much detail as possible about the computer's hardware configuration. The base system doesn't include enough utilities to do this.

I've added an experimental -nodeps option to skip missing dependencies. However, curl is still required to upload data securely.
 
That's still thousands of lines of code that need to be run as root. I will simply not download code to be run as root from a source that's not 110% trustworthy. Sure, I need to download the kernel ... but I trust Kirk and his friends. Why? Because I've known Kirk and his friends for decades, and I think I know what makes them tick, what rewards they work for. In this case, I get code from an opaque source, without clear chain of trust.
I don't know them and don't care. What I know is tens (hundreds?) of thousands of people have used their software, and many thousands have examined and modified the source code. It would be very difficult (but not impossible) for any malicious code in it to have gone unnoticed. This project lacks that pedigree, and I'm not going to run it on any machine I consider production in any role.

And: How are you going to make sure that this first program really only makes a list of installed hardware? It could clandestinely transmit information. It could install trojan horses. It could damage data. It could ... do so many things. Really large organizations might be able to validate that this software is "clean" by doing a line-by-line code inspection. For 10K lines, that's going to take hundreds of man-hours.
There are ways to detect this short of code inspection, but yeah, it would be very laborious. I'd be willing to help, but am not going to try to do it all myself.
 
I do appreciate the idea behind this tool. I think we should be making better notes about supported hardware (especially since my beloved Thinkpads are getting worse quality with each revision; I will likely need to jump ship one day and will be pretty lost).

However, why does this need to be run as root? Perhaps it was just quick 'n easy "web dev" instructions that overlooked correct security?

Strip the code out that requires root privileges and more people would be willing to risk it.
 
After all, this is a script. I can read it.

Fact is: Every computer (also: Smartphone) is just as secure as the least trustworthy software on it. And today there is a lot of software on such a computer… And absolutely every single software that I start as a user can read my mails and send itself home, garnished with my SSH keys (!) etc.; Whether root is at work or not, the exciting information in my practice cannot be found in the system configuration, but can be reached as user; And so I see less of a problem with root - running this as root is not more than a stupid feeling. And even on my servers: root is only for administration, everything else is unprivileged (and that's exactly how it should be).

I install a lot of packages in binary form - no idea what I'm getting…: Was this mass of code really checked by others, or simply compiled with a "cool, works!" and offered to me as a package? But still: This "hw-probe" is open source. And that is what I finally decided to trust. If it weren't, yes: I would be skeptical, too.
 
Good question. In a moment, I will ask a deep question, namely "what is really sensible to figure out". I wonder how far one could get from just userspace. Let's see: "pciconf -l" works great. That gives us a list of the graphics devices (which are the #1 through #10 problems in compatibility!), and the disk interfaces. Just doing this would give us 80% of the gain for 1% of the effort, and it works from user space. The next one: "usbconfig list" does not work, needs to be root. But how does it help to know that something is attached in hardware, since we don't know whether it actually works? Next one: dmesg works great from user space. That tells us which disks are attached, a lot of detail of the CPU and memory, network drivers. More importantly, it tells us whether the devices that were found have drivers attached.
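
As a rough illustration of how far unprivileged probing gets, here is a minimal sketch that condenses `pciconf -l` output into one "driver vendor device" line per device. It assumes the `vendor=` / `device=` field layout printed by recent FreeBSD releases (with any other layout the IDs come out as "?"), and the function name `summarize_pci` is hypothetical. pciconf lists a device with no driver attached under the name "none", which is exactly the signal of interest here.

```shell
#!/bin/sh
# Read `pciconf -l` output on stdin and print "driver vendor device" for
# each line. A device without an attached driver appears as "none".
summarize_pci() {
    awk '{
        split($1, name, "@")          # "em0@pci0:0:3:0:" -> "em0"
        driver = name[1]
        sub(/[0-9]+$/, "", driver)    # strip the unit number: "em0" -> "em"
        vendor = device = "?"
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^vendor=/) vendor = substr($i, 8)
            if ($i ~ /^device=/) device = substr($i, 8)
        }
        print driver, vendor, device
    }'
}
```

On a FreeBSD machine, `pciconf -l | summarize_pci | grep '^none '` would then list the PCI devices that no driver has claimed, without root.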

The tool the OP posted goes much much further though: it tries to identify the CPU chip exactly (which really doesn't matter, 99.99% of all CPUs are either Intel, AMD or Arm, and the support situation for those is crystal clear), measure the amount of memory (same argument, memory works, we don't need memtest to know whether the memory is fully functional), we don't need smartctl to get details of disks given that we already have the manufacturer, model and revision of the disks in dmesg, and so on. So here is one criticism of the OP's tool: It tries to get to 99% completeness, which requires things that are complex and dangerous and socially not acceptable, when it could have gotten 90% completeness with far fewer problems.

But the real fundamental problem is: What does it mean for something to be supported? For example, my machine at home contains 4gig of physical RAM (it is a 32-bit CPU). Is that RAM supported? Well, kinda sorta. It functions. But only 3gig of it is actually used on a standard 32-bit machine, which sort of bugs me (not enough to do something about, I could play around with some physical address extension, but I don't have time for that). So I would claim my RAM is not fully supported out of the box. Similar question: I used to have a PCIe attached WiFi card. It was "supported" in the sense that I could make my machine a WiFi client. But I really wanted my machine to be a WiFi AP ... and while the FreeBSD software does actually allow that, it was so buggy at the time that I gave up and switched to an external commercial AP (15 minute trip to the Apple store, done). So is the WiFi card "supported"? Hell no, and I'm actually reasonably upset at the low quality standards of FreeBSD WiFi which allowed half-broken software to get out into the field (it's OK Sam ... I don't mean you, I mean the absence of testers and bug fixers, you did what you could). But the OP's tool would have found it and claimed that it was supported.

So, if the OP's tool wants to know whether something is "supported", it really needs to ask the human user or administrator: Are you satisfied with this device, does it meet your expectations? That's something that is not scriptable. Just because a device is physically attached to the computer doesn't mean it is actually working. The definition of "supported" has to be a negotiation process, and is intersubjective and transactional. There is no single correct answer ... the same hardware that I find to be unsupported (the WiFi card, see example above) others will be happy with.

And: the tool goes far beyond listing hardware attributes. It looks at flatpaks, docker, apps, CPU frequencies. This is all very dangerously close to industrial espionage. I'm not claiming that the OP is working for a clandestine source, but I think in his zeal for completeness and correctness (which is, see above, unachievable) he has gone too far in ignoring privacy and security.
 
After all, this is a script. I can read it.
Try it. It's 17,895 lines long. And it relies on extensive Perl libraries (another hundreds of thousands of lines of code), and installation of about a dozen or two dozen packages (another few million lines of code). With unknown side effects. For example, it tries to run smartctl. While on the surface that might seem sensible (you can learn more about disk drives), it is also dangerous: A few FreeBSD releases back, running smartctl on my boot SSD would cause the system to immediately crash. Known bug in the SATA layer.

So no, you can't read it. A team of several dozen experienced engineers would be able to read it, and its dependencies, if they had a month. If a tool such as this were to be used in a commercial setting (large computer company), that's exactly what would happen: before we are allowed to use it, a team of security people would audit it, line by line.

Fact is: Every computer (also: Smartphone) is just as secure as the least trustworthy software on it. And today there is a lot of software on such a computer… And absolutely every single software that I start as a user can read my mails and send itself home, garnished with my SSH keys (!) etc.
Absolutely true. And that's why trust is necessary when installing software. Trust is mostly not a technical attribute, but a social one. For example, I trust the small group of people who write FreeBSD, because I've known them for a long time (I know a certain fraction of them personally, and many of them from experience with their work product). As a corporate software customer, I can trust certain vendors (such as RedHat), because I can reason about what their goals and methods are, and see that they have no incentive to do certain nasty things. And even if I don't have trust in them, my lawyers can create contracts that make it very unlikely that such vendors will do bad things ... because if they do, they will go bankrupt, to jail, or both. This doesn't work for software written by an unknown individual, in a jurisdiction where the long arm of the law doesn't reach. Which is why I won't install their software.

(Actually, I did: I did run the OP's hw-probe.pl script several times yesterday. But that's because I know what I'm doing, I know reasonably well how to sandbox things, and I did inspect the OPs background first to get a good level of confidence that he's not a bad person.)
 
So, if the OP's tool wants to know whether something is "supported", it really needs to ask the human user or administrator: Are you satisfied with this device, does it meet your expectations? That's something that is not scriptable. Just because a device is physically attached to the computer doesn't mean it is actually working. The definition of "supported" has to be a negotiation process, and is intersubjective and transactional. There is no single correct answer ... the same hardware that I find to be unsupported (the WiFi card, see example above) others will be happy with.
But then you'll have to trust the user to not deliberately include devices that are not well supported, maybe just for kicks. Maybe you work for the device manufacturer.

And: the tool goes far beyond listing hardware attributes. It looks at flatpaks, docker, apps, CPU frequencies. This is all very dangerously close to industrial espionage. I'm not claiming that the OP is working for a clandestine source, but I think in his zeal for completeness and correctness (which is, see above, unachievable) he has gone too far in ignoring privacy and security.
There are legitimate reasons to want to know CPU frequencies and model, and things like amount of RAM available. The Steam hardware survey comes to mind. Game developers like to have an idea of how large a potential audience they're likely to get for a given set of hardware requirements. I can think of use cases beyond games too. Maybe you're working on a video codec or a novel compression algorithm.
 
Yes, I agree with your reasoning: When serving individuals, who are using their computers for human interface tasks, things like CPU speed and RAM probably make sense.

Now take this tool to a commercial setting. You know how much computer company A (say IBM, Apple, Microsoft) would want to know how many CPUs and at what speed computer company B (say HP, Amazon, Google) has? They would really want to know! Imagine how interesting it would be to the CIO of Lehman Brothers to get a complete list of the CPU speeds and disk capacities that Smith-Barney is using? (I deliberately used two banks that no longer exist, so no feelings get hurt). And now do the same thing with the NSA and the FSB, and it stops being funny.

This tool is useful, and also extremely dangerous. The industrial and political espionage aspects are just scary. To get around a lot of those worries, dialing it down would really help: Just doing statistics on desktop users of FreeBSD, and only model numbers of video cards and motherboards.

There are also lots of techniques for correctly anonymizing information that the medical data science community has developed. I think the author might want to look at those technologies, before making this generally available.
 
Next one: dmesg works great from user space.
In general, yes, but it's advisable to set security.bsd.unprivileged_read_msgbuf=0 if you're concerned about security, and then dmesg no longer works for unprivileged users.
So I think it's reasonable for such a tool to be run with root privileges. Nevertheless, all the points you mentioned are very reasonable as well.
 
Agreed. The tool is just amazing...... I cannot believe how much information it takes from your computer.
Maybe it's too much information to share in a database, for security reasons.
Questions: Where is the database located? Who has access? What are the permissions?
 
No sensitive information is collected. If you take two different probes of the same computer model made by different people, you will not be able to distinguish or identify them.
 
I haven't had a look at the script yet, but I too would prefer it not to run as root. It's not just your code, it's the dependencies; even perl itself may have weird bugs. (I had one program that would send perl into an endless loop if it didn't like the locale. Had it been running as root, it might have had access to more resources.)

Anyway, as mentioned, dmesg. And also mentioned, dmesg could be disabled.

By default, /var/run/dmesg.boot is readable by all, and would be more useful for device probing than running dmesg, as the boot-time messages are lost once the buffer fills up.

Even if this file is locked down, it could be made accessible via ACLs or group membership.
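
A minimal sketch of that preference order: read the saved boot messages if the file is readable, and only otherwise fall back to the live buffer. The function name `boot_messages` is made up for illustration.

```shell
#!/bin/sh
# Print boot-time kernel messages, preferring the saved copy over the
# live message buffer. $1 overrides the default file location.
boot_messages() {
    file=${1:-/var/run/dmesg.boot}
    if [ -r "$file" ]; then
        cat "$file"
    else
        # Fallback: may be incomplete if older messages have been
        # overwritten, and may fail if unprivileged reads are disabled.
        dmesg
    fi
}
```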

usbcontrol - I have a homegrown script that needs raw USB access. Even then, I don't run it as root; I run it as a user in group "usbaccess", and then:

Code:
jamie@thompson% cat /etc/devfs.rules
# To give usbconfig(8) and libusb(3) enabled applications permission to all usb
# devices for their owner and the “usb” group, the following rule may be used.
# The first line declares and starts a new ruleset, with the name
# "thompson_local" and the number 999:
[thompson_local=999]
add path 'usb/*' mode 0660 group usbaccess
Even then it could be tightened down further.

Ok, I realise these things would require more installation configuration, but how about separating all the privilege-required commands from the script?

e.g., you could have

25 03 * * * root /usr/sbin/usbconfig list > /var/db/hd-probe/usbconfig.list

where /var/db/hd-probe would just have read access, so something like:

drwxr-x--- 2 root hd-probe 512 13 Aug 15:13 /var/db/hd-probe/

Rather than individual cron lines, you could have a minimal script to do this, that really contains nothing but a list of fully-pathed commands that produce the output you require.

Of course, for one-off runs, the documentation could mention to call the minimal script as root, and the non-minimal script as its own user (not any other user!)
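
Such a minimal script might look like the sketch below. The function names `collect` and `run_collectors` are hypothetical, the output directory is the one from my example above, and the command paths are where I'd expect FreeBSD to install these tools, so check them with `command -v` first.

```shell
#!/bin/sh
# Minimal privileged collector: nothing but a fixed list of fully-pathed
# commands and the files their output goes to. Paths are assumptions.
OUT=${OUT:-/var/db/hd-probe}

# $1 = output file name, remaining arguments = command to run.
collect() {
    name=$1; shift
    # Skip commands that are not installed instead of leaving empty files.
    [ -x "$1" ] || { echo "skipping $1: not found" >&2; return 1; }
    "$@" > "$OUT/$name"
}

run_collectors() {
    umask 027                  # new files: owner rw, group r, others none
    mkdir -p "$OUT"
    collect usbconfig.list /usr/sbin/usbconfig list
    collect pciconf.list   /usr/sbin/pciconf -l
}
```

Calling `run_collectors` as root (from cron or by hand) fills the directory. The large script then only needs read access to the files there and can run as its own unprivileged user.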

Just a few thoughts,

cheers, Jamie
 
made ssd and hd to plug in to send system hardware info
not taking chance makes us all less secure
binary system design BSD parts plus parts=SYSTEM
 
made ssd and hd to plug in to send system hardware info
not taking chance makes us all less secure
binary system design BSD parts plus parts=SYSTEM
Yep. Sending report from a LiveCD/LiveUSB (e.g. NomadBSD distribution) is secure too.

BTW Currently, the anonymization script in 1.6.b2 version of the tool is strong enough to remove all private info (if any) from collected logs.
 
BTW running the command at the OP linked site froze the mouse here
til Xorg was restarted... so if more than one tab is open, or it is not easy
to restart Xorg, be prepared to shutdown [ sometimes w/o visible
input, typing into a blank screen... ] and restart...

Not to dissuade from contributing.
 