use GPT labels

oliver@

Developer
Hi,

how do I replace my "raw-labels" with GPT labels?

Code:
# zpool status
  pool: zroot
 state: ONLINE
  scan: scrub repaired 0 in 0h9m with 0 errors on Thu Jan 31 08:52:09 2013
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada0p3  ONLINE       0     0     0
            ada1p3  ONLINE       0     0     0

errors: No known data errors
#

For ada0p3 I have the label "disk0":

Code:
3. Name: ada0p3
   Mediasize: 995909818880 (927G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 82944
   Mode: r1w1e1
   rawuuid: 52ab6e74-d988-11e0-b15c-90e6baccba76
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: disk0
   length: 995909818880
   offset: 4295050240
   type: freebsd-zfs
   index: 3
   end: 1953525134
   start: 8388770

But below /dev I have no "gpt" directory, so when I try to use the label with zpool replace, it fails:

Code:
# zpool replace zroot ada0p3 gpt/disk0
cannot open 'gpt/disk0': no such GEOM provider
must be a full path or shorthand device name
#
 
Problem found: the GPT labels disappear while the raw device names are in use. So remove the disk from the ZFS pool and the label should show up.
 
Hey,

Did you specify the label back when you partitioned the disk, or did you modify the label afterwards? I've noticed that in the latter case you have to yank the drive out and plug it back in for the label to actually appear. I've also noticed that the label gets "hidden" while the real device is in use, to protect against the device being accessed from more than one place at a time. If you export the pool, try looking for the labels in:
# ls -lah /dev/gpt

If they're there, you can import the pool back in with:
# zpool import -d /dev/gpt zpool
And it's supposed to use the labels instead.
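
A minimal dry-run sketch of the export/import steps above, assuming a non-root pool (a pool the system is booted from presumably cannot be exported while it is running). The pool name `tank` and the `run` helper are hypothetical; with the default DRY_RUN=1 the commands are only printed, not executed.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print each command unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "# $*"
    else
        "$@"
    fi
}

# Re-import a pool so that its members are listed by GPT label.
# $1 is the pool name; "tank" below is only a placeholder.
reimport_by_label() {
    pool="$1"
    run zpool export "$pool"              # frees the devices so the labels reappear
    run ls -lah /dev/gpt                  # check that the labels are visible
    run zpool import -d /dev/gpt "$pool"  # search only /dev/gpt for pool members
    run zpool status "$pool"              # confirm the names shown
}

reimport_by_label tank
```

Review the printed commands first; setting DRY_RUN=0 would actually run them.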

/Sebulon
 
@Sebulon:
yeah, the label was hidden:

Code:
# zpool detach zroot ada1p3
# zpool attach zroot ada0p3 gpt/disk1
Make sure to wait until resilver is done before rebooting.

If you boot from pool 'zroot', you may need to update
boot code on newly attached disk 'gpt/disk1'.

Assuming you use GPT partitioning and 'da0' is your new boot disk
you may use the following command:

        gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0

# zpool status
  pool: zroot
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Jan 31 09:18:22 2013
        172M scanned out of 17.0G at 7.16M/s, 0h40m to go
        172M resilvered, 0.99% done
config:

        NAME           STATE     READ WRITE CKSUM
        zroot          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            ada0p3     ONLINE       0     0     0
            gpt/disk1  ONLINE       0     0     0  (resilvering)

errors: No known data errors
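
The same detach/attach pair should then work for the other half of the mirror once the resilver has finished. A dry-run sketch, assuming the label on ada0p3 is `disk0` as in the gpart output earlier in the thread; the `run` helper is hypothetical and only prints the commands.

```shell
run() { echo "# $*"; }   # hypothetical dry-run helper: print, do not execute

# Swap the remaining raw device name for its GPT label.
# Only do this after 'zpool status' reports the resilver as complete,
# otherwise the mirror loses its only full copy of the data.
relabel_second_half() {
    run zpool status zroot
    run zpool detach zroot ada0p3
    run zpool attach zroot gpt/disk1 gpt/disk0
}

relabel_second_half
```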

This is no longer about ZFS, but any idea why "gmirror status" shows the device names instead of the GPT labels?

Code:
# gmirror insert swap gpt/swap1
# gmirror status
       Name    Status  Components
mirror/swap  DEGRADED  ada0p2 (ACTIVE)
                       ada1p2 (SYNCHRONIZING, 9%)
 
oliver@ said:
This is no longer about ZFS, but any idea why "gmirror status" shows the device names instead of the GPT labels?

I've also noticed that, but I don't remember whether I ever found an answer. I think it's bad though, since it could lead to you replacing the wrong drive: what you thought was ada0 was actually ada1 at the time, and you trash the entire system because of it.

Why no ZFS any more?

/Sebulon
 
ZFS mirrored swap? I thought that was not possible, which is why I have gmirror for swap alongside ZFS for my data. (By "no longer ZFS" I meant that the ZFS question is settled, but now I have a gmirror question ;))
 
Of course you can! But you handle it a bit differently than you think. Just create one big partition over the disk, create your pool and then use a zvol as swap.

# zfs create -s -V 8G -o org.freebsd:swap=on -o primarycache=none -o secondarycache=none pool/swap

# swapon /dev/zvol/pool/swap
# echo 'zfs_enable="YES"' >> /etc/rc.conf

This way is a lot more dynamic :)

/Sebulon
 
Be aware that in low memory situations a ZFS swap volume may result in a deadlock where a separate swap partition would not.
 
@kpa

Could you please back that up with an example of where that has actually happened? Personally I've heard a lot of rumors but never seen any account of it. As for myself, I have an old raggedy P4 (i386, obviously) with 2GB RAM, serving Samba (domain member with winbind), AFP (Time Machine) and NFS at the same time:
Code:
# uptime
3:30PM  up 56 days,  6:47, 2 users, load averages: 0.85, 0.80, 0.65
# swapinfo
Device                1K-blocks     Used    Avail Capacity
/dev/zvol/pool1/swap    4194304   136676  4057628     3%

Solid as a rock, and it would have had an even longer uptime if not for that pesky power outage :)

/Sebulon
 
The trouble is that the zvol is cached in the ARC, so allocating more swap allocates more ARC, which leads to allocating more swap: a merry-go-round. That's how I remember it being explained. I can't remember right now where I read that; it could have been on the freebsd-fs or freebsd-stable mailing lists.

Edit: I believe manually restricting the size of the ARC prevents the above scenario. The ARC lives in "wired" memory and is never swapped out as far as I know, so setting an upper limit on its size should prevent it from growing so large that nothing else fits in physical memory.

Edit2: Never mind, in your example you turn off caching altogether, so what I'm saying does not apply... It's probably quite safe to use swap on a zvol with those settings if you also do the manual ARC size tuning :)
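
For reference, the ARC ceiling mentioned above is usually set through the `vfs.zfs.arc_max` loader tunable. A small sketch that only prints the line to add; the `arc_max_line` helper is hypothetical and the 512M value is an arbitrary placeholder, not a recommendation.

```shell
# Emit a loader.conf line capping the ARC; $1 is the size (placeholder value).
arc_max_line() {
    printf 'vfs.zfs.arc_max="%s"\n' "$1"
}

# Review the output, then add it to /boot/loader.conf by hand and reboot.
arc_max_line 512M
```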
 
You don't have to perfectly anticipate the swap partition size when you're partitioning your drives. You can delete and resize the zvol at any time. And on modern systems you really don't want to use swap at all; in practice it's more of a safety net in case of a memory leak.

And with a zvol just because it's allocated doesn't mean it's actually taking up any space. So you could have a zvol of 1TB if you wanted and it wouldn't take up any drive space until it was needed.
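
A dry-run sketch of what resizing could look like, assuming the `pool/swap` zvol from earlier in the thread; the 16G size and the `run` helper are hypothetical, and the commands are only printed. The last command compares the nominal size with what is actually allocated.

```shell
run() { echo "# $*"; }   # hypothetical dry-run helper: print, do not execute

# Resize a swap zvol; $1 = dataset, $2 = new size.
resize_swap_zvol() {
    vol="$1"; size="$2"
    run swapoff "/dev/zvol/$vol"          # needs enough free RAM to page back in
    run zfs set volsize="$size" "$vol"
    run swapon "/dev/zvol/$vol"
    # volsize is the nominal size; used shows the space really consumed.
    run zfs get volsize,used,refreservation "$vol"
}

resize_swap_zvol pool/swap 16G
```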
 
There are other cool features as well, like ZFS's checksumming and automatic compression.

Just like you should have ECC RAM for error correction, when the system really needs to swap something out, it's great to have checksumming and error correction there as well. Plus, with compression, the data that needs to be swapped gets smaller and takes less time to write, thus speeding up the system a little.

/Sebulon
 
OK, so all things you probably do not need for swap, except maybe checksumming.

Compression? For swap? Really? I doubt that this gains you anything in terms of speed. If it does, please prove it.
Dynamic sizing? What for? You should normally only have swap to be able to dump core if you run into a panic. For "production use", buy more memory. And in general, disk space is cheap, so who cares about a few GB more or less?

Checksumming... OK, but it is best to avoid swapping in general, and this is more for high-end professional systems. Or do you have ECC memory in your home servers? ;)

The benefits all sound a bit contrived ;)
 
And, as was recently pointed out to me by Jeremy Chadwick, in a panic it's really not a good idea to be writing to a filesystem at all. If you plan to get core dumps, put swap on a bare hard drive partition.
 
Swap in and of itself is contrived, with memory as cheap as it is and a 64-bit address space.

There is a post on this forum where someone ran performance tests, and sure enough, so long as you're using lzjb compression you're pretty much getting it for free: it takes longer to write the extra sectors to disk than to compress them. The only way to know is to test it for yourself.

Again, you really don't need swap unless you have a memory leak. Just add more RAM. But memory leaks do happen, particularly with databases and web servers, so a very large swap can be beneficial to stability until the bugs are fixed. It can give you more time in between daemon restarts and reboots, particularly if it's a slow leak. The leaked memory just gets swapped out to disk since it's not being actively used.

You can't use a zvol for a core dump. Not sure how many users really need it though. But if you really need it for core dumps there is no need to make the swap partition larger than memory. Just add a zvol for auxiliary swap.

Many systems are installed on SSDs with limited disk space, and ZFS is used on these disks because of its features such as compression and cloning. By using a zvol you essentially have swap if you ever need it (for swap, not a core dump) for free, as it won't take up any space on disk until it's actually used.

Zvols make sense where you are using ZFS and debating with yourself whether you even need a swap partition.
 
Agree with BlueCoder.

With memory prices today, swap should theoretically be unnecessary, if (a) applications were bug free, no memory leaks, and (b) we knew how to size our hardware to the task. In practice, a few dozen GB of swap will keep a system running (slowly) even with leaky programs, and even if there is a temporary overload condition.

On the other hand: if your servers (database, backup, web, ...) are leaking memory, you'll need to restart them regularly anyhow. If you can afford to restart them every N hours (say every 168 hours) by using swap and running the system a little more slowly, why can't you just afford to restart them every n hours (say 24) instead? And if your server can be restarted regularly (meaning your system is not doing its jobs for a small while), then why not reboot the whole system regularly, which clears out all manners of other lint that accumulates in an OS?

Another counter-argument: If the memory footprint of your workload is occasionally larger than the hardware RAM, swap allows you to crawl through that. But it also means that if someone presents a workload that is larger than RAM+swap, it will take much longer before the offending application (or the whole system) crashes. Example: You have 32GB of RAM, and 100GB of swap. If someone wants to run a 120GB memory usage program, it will run, but take forever, and other users of the system may be inconvenienced. If someone wants to run a 140GB memory usage program, it will crash, but it will first take forever to get to the point where it uses 132GB, and other users' legitimate programs might get crashed as a side effect. If you knew ahead of time that nobody should be using more than 10GB, and both the 120GB and the 140GB programs were a mistake (pilot error, unreasonable expectations, or buggy software), why did you even enable swap?
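
The arithmetic behind that example can be sketched as follows, taking RAM as 32GB, which is the figure the 132GB ceiling implies (32 + 100). The `classify_job` helper is hypothetical, and the model is deliberately crude: it ignores memory used by the kernel and by other processes.

```shell
# Classify a job by memory footprint; all sizes in GB.
# $1 = job size, $2 = RAM, $3 = swap.
classify_job() {
    job="$1"; ram="$2"; swap="$3"
    total=$((ram + swap))
    if [ "$job" -le "$ram" ]; then
        echo "fits in RAM"
    elif [ "$job" -le "$total" ]; then
        echo "runs, but swaps"
    else
        echo "fails after filling ${total} GB"
    fi
}

classify_job 120 32 100   # runs, but swaps
classify_job 140 32 100   # fails after filling 132 GB
classify_job 10 32 100    # fits in RAM
```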

I also like the argument that swap is used mostly in emergency low-memory situations, and I don't want anything complicated in my code path at that time. For that reason, I would prefer to swap through the simplest mechanism, for example directly to a partition. While it is certainly possible to swap through ZFS to a mirrored and compressed zvol, it seems to be taking unnecessary risks, in particular since the deadlock situation (when the very last page of memory needs to be swapped, but ZFS's ARC algorithm first needs one spare page of memory to allow IO to happen) might actually exist. I don't know the ZFS source code, so I can't prove or disprove that this deadlock exists, but it is common in file systems.

On the other hand, if your application's working set is large and variable enough that swap is a way of life, then by all means, use the best and most effective swap mechanism, even if that includes mirroring and compression.
 
BlueCoder said:
There is a post on this forum where someone ran performance tests, and sure enough, so long as you're using lzjb compression you're pretty much getting it for free: it takes longer to write the extra sectors to disk than to compress them. The only way to know is to test it for yourself.

That has been my experience as well; you basically get it for free, so you might as well use it :)

oliver@

Yes, I have ECC RAM in my home servers. It's not paranoia, it's just reality:
https://forums.freebsd.org/showpost.php?p=206888&postcount=7

And while zvols can't be used for dumps, any system occasionally needs to swap a little something out, and that's when a zvol fits in nicely.

/Sebulon
 
I generally agree that swap is "optional" today. But I still keep it, and on gmirror, as I still do not see the benefit of using ZFS for it. I probably never use my swap, but it does not hurt me to reserve 4GB of disk space permanently for it. For me, not being able to write a crash dump weighs more than 4GB of permanently reserved swap, given that my zroot uses 18GB out of 1TB.

Regarding SSDs: yes, I run an SSD system as well (but with UFS). Swap is turned off completely on it, as I fear the wear from the constant write cycles that swapping would push to my SSD.

ralphbsz said:
Agree with BlueCoder.
Another counter-argument: If the memory footprint of your workload is occasionally larger than the hardware RAM, swap allows you to crawl through that. But it also means that if someone presents a workload that is larger than RAM+swap, it will take much longer before the offending application (or the whole system) crashes. Example: You have 32GB of RAM, and 100GB of swap. If someone wants to run a 120GB memory usage program, it will run, but take forever, and other users of the system may be inconvenienced. If someone wants to run a 140GB memory usage program, it will crash, but it will first take forever to get to the point where it uses 132GB, and other users' legitimate programs might get crashed as a side effect. If you knew ahead of time that nobody should be using more than 10GB, and both the 120GB and the 140GB programs were a mistake (pilot error, unreasonable expectations, or buggy software), why did you even enable swap?

Because of ulimit, this is a bad argument, find a better one ;) SCNR
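
A small sketch of the ulimit idea, assuming a POSIX-style shell whose `ulimit` supports `-v` (virtual memory, in KB). The `capped_vmem_kb` helper is hypothetical; it sets the 10GB cap from the example inside a subshell and prints the limit a job started there would run under.

```shell
# Print the virtual-memory cap (in KB) a job would run under; $1 = limit in GB.
# The subshell keeps the limit from leaking into the calling shell.
capped_vmem_kb() {
    (
        ulimit -v $(( $1 * 1024 * 1024 ))
        ulimit -v
    )
}

capped_vmem_kb 10
```

A program started inside such a subshell would have its allocations refused once it hit the cap, instead of pushing the whole machine into swap.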
 