FreeBSD on large-scale systems?

I'd like to tap your experiences with running FreeBSD on large-scale systems. What problems have you run into, and what solutions have been deployed?

We are trying to use multiple FreeBSD file servers (Samba & NFS) for a site with around 100k users in the AD database. Each server hosts around 16-20k per-user ZFS filesystems (using ZFS snapshots & rsync for server-to-server backups) and has hundreds of concurrent active users - and we are running into various interesting "hurdles".

We can't be the first ones to try this? Or am I wrong? :)


Some issues we've seen so far:

1. AD is not really usable as a direct data source (winbind in nsswitch.conf) due to the huge number of recursive group memberships that exist - looking up a user's group memberships takes forever, causing winbind requests to queue up until new requests start being denied.

(Looking at the Samba winbindd code, one sees that it calls listen() with a backlog of just 5, so even when we run multiple winbindds, things sometimes just take too long.)
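
For illustration, a minimal standalone sketch (not the actual winbindd code) of what a backlog of 5 means in practice: while the server is busy with slow lookups, only about five connections can sit in the queue, and further connect() attempts get refused or time out:

[code]
/* Sketch of a server with a tiny listen() backlog (illustrative
 * only, not Samba code). While the accept loop is stuck in a slow
 * "lookup", at most ~5 connections queue up; the rest are refused. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void) {
    int s = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in sin;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = htons(9999);
    if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1) {
        perror("bind");
        return 1;
    }
    listen(s, 5);               /* the problematic small backlog */
    for (;;) {
        int c = accept(s, NULL, NULL);
        if (c == -1)
            continue;
        sleep(2);               /* simulate a slow AD group lookup;
                                 * meanwhile the backlog fills up */
        close(c);
    }
}
[/code]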

We solved that issue by creating a local DB cache of the user and group information from AD, stored on each server (similar to "db" in nsswitch, but backed by a different database). We also considered using a local NIS server, but NIS had problems handling some of the huge groups that exist.
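
Roughly the idea behind the cache, as a simplified sketch (our real tool and record format are more involved; the path, user name and record layout below are made up): dump the AD users into a local Berkeley DB hash file that lookups can hit instead of winbindd:

[code]
/* Simplified sketch: store passwd-style records from AD in a local
 * Berkeley DB hash file, keyed by user name, for fast local lookups.
 * (Illustrative only; path and record format are invented.) */
#include <db.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    DB *db = dbopen("/var/db/ad_passwd.db", O_RDWR | O_CREAT, 0644,
                    DB_HASH, NULL);
    if (db == NULL) {
        perror("dbopen");
        return 1;
    }
    /* One passwd-style record, keyed by user name. */
    char *name = "jdoe";
    char *pwent = "jdoe:*:10123:10123:Jane Doe:/home/jdoe:/bin/sh";
    DBT key = { name, strlen(name) };
    DBT val = { pwent, strlen(pwent) + 1 };
    if (db->put(db, &key, &val, 0) != 0)
        perror("put");
    db->close(db);
    return 0;
}
[/code]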

2. Mountd has (had) an issue where, when we create 16k new snapshots every hour, it would go into a loop consuming 100% CPU, slowly growing an internal data structure one filesystem at a time and never really catching up. This is now fixed for a future release.

We still have to solve the mountd behaviour of "suspend all NFS, remove all exports from the kernel, reload all exports into the kernel, unsuspend all NFS" whenever something is added to or removed from /etc/exports & /etc/zfs/exports - with 16-20k filesystems that takes a little while, and the user experience isn't exactly nice while it happens...
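
To get a feel for the scale involved: even just enumerating the mounted filesystems takes measurable time at 16-20k mounts, and mountd rebuilds the kernel export state for all of them on every reload. A small sketch to count and time the enumeration:

[code]
/* Count and time enumerating all mounted filesystems with
 * getmntinfo(3). With 16-20k ZFS mounts even this simple walk is
 * noticeable - and mountd touches export state for each of them. */
#include <stdio.h>
#include <time.h>
#include <sys/param.h>
#include <sys/ucred.h>
#include <sys/mount.h>

int main(void) {
    struct statfs *mounts;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    int n = getmntinfo(&mounts, MNT_NOWAIT);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                (t1.tv_nsec - t0.tv_nsec) / 1e6;
    printf("%d filesystems enumerated in %.2f ms\n", n, ms);
    return 0;
}
[/code]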

3. read()/pread() calls suddenly taking around 2-3 ms instead of 0.01 ms for every read (including data on tmpfs filesystems), for an hour or so. When this happens some things get *really* slow - for example an "ls -l" on the directory holding the home directories, because the nsswitch code reads /etc/nsswitch.conf, /etc/passwd and then does 2 reads from the new DB - per filesystem. With every read taking 2-3 ms: 2-3 ms * 16-20k filesystems * 4 reads -> 3-4 minutes...
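
A quick way to spot when this is happening (a minimal sketch; the path is just an example): time a batch of small pread() calls and watch the average per-call latency - ~0.01 ms is normal, 2-3 ms means the slowdown has kicked in. The arithmetic above checks out: 2.5 ms * ~18k filesystems * 4 reads ≈ 180 s, i.e. about 3 minutes.

[code]
/* Minimal pread() latency probe: time N small reads of a file and
 * report the average per-call latency. At ~0.01 ms per read the
 * nsswitch lookups are free; at 2-3 ms they dominate everything. */
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv) {
    const char *path = argc > 1 ? argv[1] : "/etc/nsswitch.conf";
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    char buf[512];
    struct timespec t0, t1;
    int n = 10000;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < n; i++)
        (void)pread(fd, buf, sizeof(buf), 0);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                (t1.tv_nsec - t0.tv_nsec) / 1e6;
    printf("%d preads of %s: %.4f ms average per call\n",
           n, path, ms / n);
    close(fd);
    return 0;
}
[/code]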
 
At this year's sambaXP there was a talk about "Samba at scale":
https://sambaxp.org/
(Day 2, Track 2, near the bottom of the site)
Direct link to the slides:
https://sambaxp.org/archive_data/Sa...100,000 user AD Domains - Andrew Bartlett.pdf

User/group parsing, as in your first point, is one of the topics covered, with some explanation of what has already been done to improve performance there.

The overall recap, however: Samba is not yet ready to perform well at that scale, but they are working on it.
They (the Samba team) are seeking help from users who are running / trying to run Samba at such scale, so maybe you could contact the author (see slide #38) for some mutual assistance on that topic.
 