ZFS: quirks=0x1<4K> on an SSD reporting logical 512, physical 512. Why?

Hi,

I don't understand why SSDs that report a 512-byte sector size (both logical and physical) get the quirks=0x1<4K> quirk. For example, the Samsung SSD 840/850 family.

As a result, gpart treats the SSD as 4K, and so does ZFS (ashift: 12). TRIM is really slow when deleting 20GB+ files (L(q) in gstat climbs to around a million). And when the SSD is used as a ZFS cache device, the cache goes crazy once it has filled all the space: it shows a size of 16.0E and keeps filling more and more.
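To see why a 4K quirk ends up as ashift: 12, here is a simplified C sketch, modeled loosely on (not copied from) the way ZFS on FreeBSD derives an ashift from the GEOM provider's sectorsize and stripesize. The 512-byte floor and 8 KiB ceiling (MIN_ASHIFT/MAX_ASHIFT) are assumptions, and tunables such as vfs.zfs.min_auto_ashift are ignored.

```c
#include <stdint.h>

/* Assumed bounds for this sketch: 512-byte minimum, 8 KiB maximum. */
#define MIN_ASHIFT 9
#define MAX_ASHIFT 13

/* 1-based position of the highest set bit: highbit(4096) == 13. */
static int highbit(uint64_t x)
{
    int h = 0;
    while (x != 0) {
        h++;
        x >>= 1;
    }
    return h;
}

static int is_pow2(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* ashift a new vdev would get from the provider's geometry. */
int choose_ashift(uint64_t sectorsize, uint64_t stripesize, uint64_t stripeoffset)
{
    uint64_t logical = sectorsize < 512 ? 512 : sectorsize;
    int ashift = highbit(logical) - 1;          /* from the logical sector */

    /* Use the stripe size instead if it is a larger, sane power of two
     * and the provider is aligned to it. */
    if (stripesize > logical && is_pow2(stripesize) &&
        stripesize <= (1ULL << MAX_ASHIFT) && stripeoffset == 0)
        ashift = highbit(stripesize) - 1;       /* from the stripe size */

    return ashift < MIN_ASHIFT ? MIN_ASHIFT : ashift;
}
```

With this logic, choose_ashift(512, 512, 0) gives 9, as in the quirk-free zdb output later in the thread, while the same drive with a forced 4096 stripesize, choose_ashift(512, 4096, 0), gives 12.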

I found the offending commit, https://lists.freebsd.org/pipermail/svn-src-head/2014-October/063844.html, made by "sbruno". After removing this code and recompiling the kernel, everything started working fine, as it should with 512-byte sectors.

So what is the logic behind applying the 4K quirk to SSDs with 512-byte sectors?

Stock kernel output:
Code:
root@:~ # uname -a
FreeBSD  10.2-RELEASE FreeBSD 10.2-RELEASE #0 r286666: Wed Aug 12 15:26:37 UTC 2015  root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
root@:~ # dmesg | grep ada0
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <Samsung SSD 850 PRO 1TB EXM02B6Q> ACS-2 ATA SATA 3.x device
ada0: Serial Number S252NXAGA08719F
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
ada0: Command Queueing enabled
ada0: 976762MB (2000409264 512 byte sectors: 16H 63S/T 16383C)
ada0: quirks=0x1<4K>
ada0: Previously was known as ad4
root@:~ # camcontrol identify ada0
pass0: <Samsung SSD 850 PRO 1TB EXM02B6Q> ACS-2 ATA SATA 3.x device
pass0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)

protocol  ATA/ATAPI-9 SATA 3.x
device model  Samsung SSD 850 PRO 1TB
firmware revision  EXM02B6Q
serial number  S252NXAGA08719F
WWN  50025388400d00ff
cylinders  16383
heads  16
sectors/track  63
sector size  logical 512, physical 512, offset 0
LBA supported  268435455 sectors
LBA48 supported  2000409264 sectors
PIO supported  PIO4
DMA supported  WDMA2 UDMA6
media RPM  non-rotating

Feature  Support  Enabled  Value  Vendor
read ahead  yes   yes
write cache  yes   yes
flush cache  yes   yes
overlap  no
Tagged Command Queuing (TCQ)  no   no
Native Command Queuing (NCQ)  yes     32 tags
NCQ Queue Management  no
NCQ Streaming  no
Receive & Send FPDMA Queued  yes
SMART  yes   yes
microcode download  yes   yes
security  yes   no
power management  yes   yes
advanced power management  no   no
automatic acoustic management  no   no
media status notification  no   no
power-up in Standby  no   no
write-read-verify  yes   no   0/0x0
unload  no   no
general purpose logging  yes   yes
free-fall  no   no
Data Set Management (DSM/TRIM) yes
DSM - max 512byte blocks  yes  8
DSM - deterministic read  no
Host Protected Area (HPA)  yes  no  2000409264/2000409264
HPA - Security  no
 
The SSD probably has internal blocks of at least 4KiB but emulates a 512B sector size to stay compatible with the legacy storage stack. The firmware bug is that it lies about its optimal stripe size. The quirk entry matches this SSD and improves performance by treating it as if it had reported a 4KiB sector size.

It sounds like you have a larger problem than the optimal (automatic) ashift selection. The commit mentions that TRIM only works on 4KiB sectors. If TRIM always fails quickly on a misaligned partition this could hide your problem.
 
No. These SSDs really have a 512-byte sector size. All Samsung 830/840/850 SSDs report 512 logical/physical. CHEAP SSDs from other brands report 4K. I even checked this by filling the drive with a huge number of small files: the Samsungs with 512-byte partitions handle these files without any problems and don't lose read/write speed. All the quirk "detection" consists of hand-written patches matching SSD names such as "Samsung SSD 850*". As far as I can see in the FreeBSD sources, there is no detection of 512b/4K at all for drives/SSDs, just entries specified manually for each model that FreeBSD developers have used.

And about TRIM only working on 4K sectors: no again. For now I use sysctl kern.cam.ada.0.quirks="0".

Code:
root@qu3:~ # sysctl kstat.zfs.misc.zio_trim
kstat.zfs.misc.zio_trim.failed: 0
kstat.zfs.misc.zio_trim.unsupported: 0
kstat.zfs.misc.zio_trim.success: 165367
kstat.zfs.misc.zio_trim.bytes: 17964968448

root@qu3:~ # zdb
zroot:
    version: 5000
    name: 'zroot'
    state: 0
    txg: 116
    pool_guid: 8611031546651012242
    hostid: 808448454
    hostname: 'qu3'
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 8611031546651012242
        children[0]:
            type: 'disk'
            id: 0
            guid: 7964430097477469044
            path: '/dev/gpt/disk0'
            phys_path: '/dev/gpt/disk0'
            whole_disk: 1
            metaslab_array: 34
            metaslab_shift: 33
            ashift: 9
            asize: 1015614537728
            is_log: 0
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data

root@qu3:~ # gpart show
=>        34  2000409197  ada0  GPT  (954G)
          34         256     1  freebsd-boot  (128K)
         290    16777216     2  freebsd-swap  (8.0G)
    16777506  1983631725     3  freebsd-zfs  (946G)

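The misaligned-partition concern raised earlier can be checked against this gpart show output with simple arithmetic: with 512-byte sectors, a partition starts on a 4 KiB boundary only if its start LBA is a multiple of 8. A minimal C sketch (the helper name is hypothetical):

```c
#include <stdint.h>

/* Bytes by which a partition's start lies past the previous
 * `align_bytes` boundary; 0 means it is aligned. `start_lba` is in
 * logical sectors of `sector_bytes` bytes, as gpart show prints it. */
uint64_t misalignment(uint64_t start_lba, uint32_t sector_bytes, uint32_t align_bytes)
{
    return (start_lba * sector_bytes) % align_bytes;
}
```

For the freebsd-zfs partition, 16777506 × 512 mod 4096 = 1024, so it begins 1 KiB past a 4 KiB boundary; the boot (34) and swap (290) partitions are offset the same way. With ashift: 9 this is harmless, but on a device that really has 4K physical sectors such an offset could cost read-modify-write cycles.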
Deleting huge files works fine now. The problem was caused precisely by enabling the 4K quirk on SSDs that have 512-byte sectors.

So... I think all the junk that manually specifies 4K in sys/cam/ata/ata_da.c and sys/cam/scsi/scsi_da.c should be removed and replaced with autodetection that enables the quirk using the same mechanism camcontrol(8) identify adaX uses: if it reports logical 512, physical 512, no quirks; if physical is 4096, enable the quirk.
 
There is code in the FreeBSD kernel to detect the physical sector size, and it works when the disk reports true data. Unfortunately, many disks that really are 4K lie about it. I can't speak for these specific Samsung SSDs, since I didn't add those quirks, but I have no serious reason to doubt them.
 
I'm not a big specialist in code, but I can clearly see that in the kernel code all HDDs/SSDs get a 512-byte sector size by default, and only some, listed manually by name in sys/cam/ata/ata_da.c and sys/cam/scsi/scsi_da.c, have the "quirk" that means a 4K sector size.

Code:
  {
          /* Default: matches any vendor/product/revision, applies no quirks */
          {
            T_ANY, SIP_MEDIA_REMOVABLE|SIP_MEDIA_FIXED,
            /*vendor*/"*", /*product*/"*", /*revision*/"*"
          },
          /*quirks*/0
  },

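The matching described here (hand-written entries per model, with a catch-all default last) can be sketched in userland C, with POSIX fnmatch(3) standing in for the kernel's own CAM string matcher. The table is illustrative: the "Samsung SSD 850*" pattern comes from this thread, and real entries also match vendor, revision and device type.

```c
#include <fnmatch.h>
#include <stddef.h>

#define Q_4K 0x1  /* the bit dmesg shows as <4K> */

/* Illustrative entry: a glob on the model string plus quirk bits. */
struct quirk_entry {
    const char *product;  /* glob pattern, e.g. "Samsung SSD 850*" */
    unsigned int quirks;
};

/* Hypothetical table: one model-specific entry, then the default. */
static const struct quirk_entry quirk_table[] = {
    { "Samsung SSD 850*", Q_4K },
    { "*", 0 },  /* default: no quirks */
};

/* The first matching entry wins, so the default must stay last. */
unsigned int lookup_quirks(const char *model)
{
    for (size_t i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++) {
        if (fnmatch(quirk_table[i].product, model, 0) == 0)
            return quirk_table[i].quirks;
    }
    return 0;
}
```

A model such as "Samsung SSD 850 PRO 1TB" picks up Q_4K whatever the drive itself reports, while "ST6000NM0024-1HT17Z" falls through to the default entry.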
I have never seen camcontrol identify adaX report a wrong sector size on several dozen different HDDs/SSDs. And as you know, this "detect" code in sys/cam/ata/ata_da.c and sys/cam/scsi/scsi_da.c doesn't affect the camcontrol identify adaX results at all.

For example, one of my servers has new ST6000NM0024 (6TB) HDDs. The "detect code" doesn't detect that they are 4K drives, but camcontrol(8) does, and the ST6000NM0024 doesn't "lie". I'm not surprised, because the "detect code" doesn't have this model in its list.

Code:
root@as0:~ # camcontrol identify ada1
pass1: <ST6000NM0024-1HT17Z SN02> ACS-3 ATA SATA 3.x device
pass1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)

protocol  ATA/ATAPI-10 SATA 3.x
device model  ST6000NM0024-1HT17Z
firmware revision  SN02
serial number  Z4D0CXEP
WWN  5000c500793218de
cylinders  16383
heads  16
sectors/track  63
sector size  logical 512, physical 4096, offset 0
LBA supported  268435455 sectors
LBA48 supported  11721045168 sectors
PIO supported  PIO4
DMA supported  WDMA2 UDMA6
media RPM  7200

Feature  Support  Enabled  Value  Vendor
read ahead  yes   yes
write cache  yes   yes
flush cache  yes   yes
overlap  no
Tagged Command Queuing (TCQ)  no   no
Native Command Queuing (NCQ)  yes     32 tags
NCQ Queue Management  no
NCQ Streaming  no
Receive & Send FPDMA Queued  yes
SMART  yes   yes
microcode download  yes   yes
security  yes   no
power management  yes   yes
advanced power management  no   no
automatic acoustic management  no   no
media status notification  no   no
power-up in Standby  yes   no
write-read-verify  yes   no   0/0x0
unload  yes   yes
general purpose logging  yes   yes
free-fall  no   no
Data Set Management (DSM/TRIM) no
Host Protected Area (HPA)  no
root@as0:~ # dmesg | grep ada1
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <ST6000NM0024-1HT17Z SN02> ACS-3 ATA SATA 3.x device
ada1: Serial Number Z4D0CXEP
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 5723166MB (11721045168 512 byte sectors: 16H 63S/T 16383C)
ada1: Previously was known as ad1

Once again, here are my reasons not to approve this quirk (ZFS + ashift 12) for Samsung SSDs (and probably other SSDs with a 512-byte physical sector size):
1. The sector size is 512 bytes.
2. TRIM is slow and freezes the system when deleting huge files.
3. Free space is wasted with a huge number of small files (a 4K sector size uses more space per small file than 512 bytes).
4. After the ZFS cache device fills up, it shows a size of 16.0E, which keeps filling and GROWS beyond the SSD's total size.
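Point 3 is plain rounding arithmetic: every block is allocated in whole units of 2^ashift bytes. A simplified C sketch (the helper name is hypothetical; it ignores compression, metadata, RAID-Z padding, and the embedded_data feature listed in the zdb output, which can store very small compressible blocks without a separate allocation):

```c
#include <stdint.h>

/* Bytes allocated for `size` bytes of data when the smallest
 * allocation unit is 2^ashift bytes. */
uint64_t allocated_bytes(uint64_t size, unsigned ashift)
{
    uint64_t unit = 1ULL << ashift;
    if (size == 0)
        return 0;
    return (size + unit - 1) / unit * unit;  /* round up to whole units */
}
```

A 1000-byte file occupies 1024 bytes at ashift 9 but 4096 bytes at ashift 12, so a million such files take about 1.0 GB versus about 4.1 GB.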

Anyway, if this issue isn't fixed, I recommend that FreeBSD users always run camcontrol identify adaX before creating a filesystem. If it shows "sector size logical 512, physical 512", add kern.cam.ada.X.quirks="0" to /boot/loader.conf; if it shows "sector size logical 512, physical 4096", add kern.cam.ada.X.quirks="1". Do this for each SSD or HDD, where X is the drive's number in the system.
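The recommendation above can be sketched as a small C function that reads the "sector size" line of camcontrol identify and returns the value to put in kern.cam.ada.X.quirks. It only encodes this post's rule (the function name is hypothetical); as the kernel developer points out later in the thread, a drive that already reports physical 4096 gets a 4096 stripesize without the quirk.

```c
#include <stdio.h>

/* Parses a line such as
 *   "sector size  logical 512, physical 4096, offset 0"
 * and returns 1 for physical 4096, 0 otherwise, or -1 if the line
 * is not a sector size line. A space in the sscanf format matches
 * any run of whitespace, so column alignment does not matter. */
int suggested_quirks(const char *line)
{
    unsigned logical, physical, offset;

    if (sscanf(line, " sector size logical %u, physical %u, offset %u",
               &logical, &physical, &offset) != 3)
        return -1;
    return physical == 4096 ? 1 : 0;
}
```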
 
For example, one of my servers has new ST6000NM0024 (6TB) HDDs. The "detect code" doesn't detect that they are 4K drives, but camcontrol does.

And how do you see that in the output you provided? Have you tried running, for example, diskinfo -v /dev/ada1? Here is what I see for my 4K drive without quirks (look at stripesize):
Code:
# diskinfo -v /dev/da0
/dev/da0
        512             # sectorsize
        6001175126016   # mediasize in bytes (5.5T)
        11721045168     # mediasize in sectors
        4096            # stripesize
        0               # stripeoffset
        729601          # Cylinders according to firmware.
        255             # Heads according to firmware.
        63              # Sectors according to firmware.
             WD-WX11DA40HDPN    # Disk ident.
 
All your arguments against increasing the ZFS ashift may be valid (except the last one, which was just a code bug that AFAIK has been fixed). As for the specific case of Samsung SSDs, I'm not going to argue here; I recommend you talk to the original author of that commit.
 
Code:
root@as0:~ # diskinfo -v /dev/ada1
/dev/ada1
   512     # sectorsize
   6001175126016   # mediasize in bytes (5.5T)
   11721045168    # mediasize in sectors
   4096     # stripesize
   0     # stripeoffset
   11628021     # Cylinders according to firmware.
   16     # Heads according to firmware.
   63     # Sectors according to firmware.
   Z4D0CXEP     # Disk ident.
 
So, as I said, the kernel code detects the physical sector size properly when the drive reports it.
 
Oh. I think you don't understand what I mean. If the quirk is enabled, ZFS thinks the drive is 4K, which may not be true! And if the quirk is not enabled and the drive is 4K, there can be problems too: you need at least to use gnop when creating the zpool, and it's also good to enable the quirk in loader.conf, because otherwise gpart and zpool will complain about sector sizes.

I just want to push somebody to rewrite the code that enables quirks: if a drive (HDD/SSD) has a 4096 stripesize (or logical 512, physical 4096 in camcontrol, which I think is the same thing), enable the quirk; if 512, don't.

About the ZFS cache device code: the bug exists in 10.2. I haven't tested STABLE or CURRENT on production servers.
 
I wrote that code, so believe me, I know what it does. If a drive properly reports a 4K physical sector, the kernel detects it without any quirks and passes that knowledge to ZFS via the stripesize property shown above, which ZFS uses to calculate ashift. Quirks are needed only for drives that don't report the physical sector size "properly" (such as old WD Greens and others from the same era), to force the stripesize property.
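The detection described here can be sketched as follows. ATA drives report the physical/logical sector relationship in IDENTIFY DEVICE word 106, which is also where camcontrol's "physical 4096" comes from. This simplified sketch, not the ata_da.c source, derives a stripesize from that word and lets the 4K quirk only raise it; it assumes 512-byte logical sectors, and Q_4K is an illustrative name for bit 0 of the quirks mask.

```c
#include <stdint.h>

#define Q_4K 0x1  /* illustrative name for the <4K> quirk bit */

/* IDENTIFY DEVICE word 106 fields (ATA/ACS). */
#define SS_VALID_MASK   0xc000  /* bits 15:14 must read 01b */
#define SS_VALID        0x4000
#define SS_MULTI_SECTOR 0x2000  /* several logical sectors per physical */
#define SS_LOG2_MASK    0x000f  /* log2(logical sectors per physical) */

/* Physical sector size, assuming a 512-byte logical sector
 * (simplification: long logical sectors are not handled). */
uint32_t physical_sector_size(uint16_t word106)
{
    if ((word106 & SS_VALID_MASK) == SS_VALID && (word106 & SS_MULTI_SECTOR))
        return 512u << (word106 & SS_LOG2_MASK);
    return 512;
}

/* Stripesize handed to GEOM/ZFS: trust what the drive reports and let
 * the 4K quirk only raise it for drives that under-report. */
uint32_t stripesize_for(uint16_t word106, unsigned quirks)
{
    uint32_t stripe = physical_sector_size(word106);

    if ((quirks & Q_4K) && stripe < 4096)
        stripe = 4096;
    return stripe;
}
```

A drive reporting logical 512, physical 4096 typically has word 106 = 0x6003 (valid, multiple logical sectors per physical, 2^3 × 512 = 4096), so it gets 4096 with no quirk. A drive reporting 0x4000 gets 512 unless the quirk forces 4096, which is exactly the Samsung case discussed here.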

Whether applying this quirk to Samsung SSDs is correct is not a question for me; contact the author.

As for the bug: 10.3 should be released within a month and it should be fixed there, so it is simply not an argument about the ashift setting.
 
I just want to push somebody to rewrite the code that enables quirks: if a drive (HDD/SSD) has a 4096 stripesize (or logical 512, physical 4096 in camcontrol, which I think is the same thing), enable the quirk; if 512, don't.

If the drive is reporting a 4096 physical size, then there is no need for the quirk in the first place; FreeBSD will use 4K sectors anyway. The quirks are there for disks that are 4K but report 512b for everything. These do exist; the devs aren't stupid, and they wouldn't have needed to build the 4K quirk in the first place if every single 4K disk reported 4096 physical.

The issue you seem to have is that the Samsung disk really is 512b, but a dev has forced it to 4k with the quirks system. If that's the case, and it can be 100% confirmed the disk is 512b (Samsung helpfully don't specify it in their "datasheet"), then the quirk needs removing for that device. It's possible some devs have just got into the habit of adding the 4K quirk for all SSDs, on the assumption that they are all 4k optimised.

There's nothing wrong with the quirks system, it's a general purpose framework which allows various disk features to be overridden if it's known they are reported wrong, buggy, or cause other problems. We just need to make sure that the 4K quirk is only assigned to the disks out there that report 512/512, but are really 4096.
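The quirks value is a bitmask, which is why dmesg prints quirks=0x1<4K> and why kern.cam.ada.0.quirks="0" clears the override. A small C sketch of that formatting, using a hypothetical name table that only knows the 4K bit (bit 0) seen in this thread:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical name table: only bit 0 (4K), as seen in this thread. */
static const struct {
    unsigned bit;
    const char *name;
} quirk_names[] = {
    { 0x1, "4K" },
};

/* Formats `mask` like the dmesg line, e.g. 0x1 -> "0x1<4K>". Bits
 * without a known name appear only in the hex value. */
void format_quirks(unsigned mask, char *buf, size_t len)
{
    size_t used = (size_t)snprintf(buf, len, "0x%x", mask);
    const char *sep = "<";

    for (size_t i = 0; i < sizeof(quirk_names) / sizeof(quirk_names[0]); i++) {
        if ((mask & quirk_names[i].bit) && used < len) {
            used += (size_t)snprintf(buf + used, len - used, "%s%s",
                                     sep, quirk_names[i].name);
            sep = ",";
        }
    }
    if (sep[0] == ',' && used < len)  /* at least one name was printed */
        snprintf(buf + used, len - used, ">");
}
```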
 
OK. Thank you for the answers. Now the situation looks clearer.

Yeah, there's not much official information about Samsung SSD sector sizes, but I googled something:
http://www.samsung.com/us/pdf/memory-storage/840PRO_25_SATA_III_Spec.pdf
http://www.samsung.com/global/busin.../Samsung_SSD_845DC_PRO_Data_Sheet_ver_1_0.pdf
There, "Bytes per Sector" is 512 Bytes.

Is it possible for Samsung SSDs to be somehow 4K-optimized but have a 512-byte sector size? Maybe there is some confusion between 4K requests/optimization and 4K alignment?
 
They don't say whether that is native or emulated. A lot of that is likely boilerplate text to just imply "doesn't support odd sector sizes like 520 bytes (used on some expensive RAID boxes)".
I don't see why there would be a huge performance penalty for assuming a larger stripe size than the hardware's actual stripe size, other than a little CPU to coalesce writes. But I haven't looked at the driver / FS code so I can't say for sure.

The underlying flash definitely has a larger block (stripe) size (perhaps as large as 512KB, though 64KB would be more normal). That is hidden behind the flash controller chip in the drive and whatever buffer DRAM the drive includes. As long as the controller has RAM to buffer writes, you don't care what the block size is, since the controller says "OK, got it" when FreeBSD issues the write request(s).

If you send more data than the controller can buffer, it has to write some data to flash before it reports completion to FreeBSD. In that case it will probably do a write, a verify, and an update of its block pointer table, which takes a bit longer. If there are no already-erased blocks available in the free pool, it has to erase one, which takes (relatively) a very long time. Normally the controller does "garbage collection" (erasing blocks in the free pool) when it isn't processing requests from the host system.

What really slows things down is when the controller doesn't think there are any free blocks. For example, writing 0's to every block on the drive makes the drive think they're all in use. When that happens, it can't pre-erase blocks; it has to wait until the operating system says "overwrite this", at which point it can infer that the operating system no longer wants [some or all of] that block's contents and can erase it. The TRIM command is (optionally) sent by the operating system to tell the drive that [some or all of] a block or blocks is no longer needed. That puts the block back on the drive's free list and queues it for garbage collection at some time in the future.
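The mechanism described above can be illustrated with a deliberately tiny toy model in C (all names hypothetical; this is not how any real SSD firmware is written): writes consume pre-erased blocks, overwrites and TRIM mark old blocks stale, and idle-time garbage collection erases stale blocks so later writes don't have to wait for an erase.

```c
#include <stdbool.h>

/* Toy model only: real controllers map pages, keep spare
 * (over-provisioned) blocks and do wear levelling. */
#define NBLOCKS 8

enum blk_state { ERASED, VALID, STALE };

struct ftl {
    enum blk_state blk[NBLOCKS];  /* state of each physical erase block */
    int map[NBLOCKS];             /* logical -> physical, -1 if unmapped */
    int slow_erases;              /* erases forced inside a write request */
};

void ftl_init(struct ftl *f)
{
    for (int i = 0; i < NBLOCKS; i++) {
        f->blk[i] = ERASED;
        f->map[i] = -1;
    }
    f->slow_erases = 0;
}

static int find_block(const struct ftl *f, enum blk_state s)
{
    for (int i = 0; i < NBLOCKS; i++)
        if (f->blk[i] == s)
            return i;
    return -1;
}

/* Host writes (or overwrites) logical block `lba`. */
bool ftl_write(struct ftl *f, int lba)
{
    if (lba < 0 || lba >= NBLOCKS)
        return false;
    int old = f->map[lba];
    int phys = find_block(f, ERASED);     /* fast path: pre-erased block */
    if (phys < 0) {
        /* Slow path: erase something now while the host waits; prefer a
         * stale block, else the block being overwritten. */
        phys = find_block(f, STALE);
        if (phys < 0)
            phys = old;
        if (phys < 0)
            return false;                 /* nothing can be reused */
        f->slow_erases++;
    }
    if (old >= 0 && old != phys)
        f->blk[old] = STALE;              /* old copy no longer needed */
    f->blk[phys] = VALID;
    f->map[lba] = phys;
    return true;
}

/* TRIM: the host says logical block `lba` is no longer needed. */
void ftl_trim(struct ftl *f, int lba)
{
    if (lba < 0 || lba >= NBLOCKS || f->map[lba] < 0)
        return;
    f->blk[f->map[lba]] = STALE;
    f->map[lba] = -1;
}

/* Idle-time garbage collection: pre-erase every stale block. */
void ftl_gc(struct ftl *f)
{
    for (int i = 0; i < NBLOCKS; i++)
        if (f->blk[i] == STALE)
            f->blk[i] = ERASED;
}
```

In this model, filling every block and then overwriting one forces an erase inside the write. A TRIM followed by idle garbage collection lets the next write take the fast path, while a TRIM with no idle time in between still leaves the erase to the write itself.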
 