Intel X25-V SSD Sector Alignment Questions

I'm feeling thick. :\

I want to align my partitions and file system to the sectors in the physical FLASH inside the SSD. Although the buffer and controller will change the way the data is written, I think it's prudent and shouldn't be difficult. I'm not doing this for anything other than prolonging the life of the SSD.

So,

Hardware

Intel X25-V 40GB SSD.
I don't know which FLASH devices are in the SSD, but I would presume they have 4K sectors. Some Intel FLASH devices I've dealt with have a number of smaller sectors at the front of the device and some at the end, but I can't find any info on this.

According to the BIOS/FreeBSD:
  • 4865 (cylinders, i.e. track positions across all discs) * 255 (heads, i.e. sides of the discs) * 63 (sectors per track, the "cake slices") = 78,156,225 sectors.
  • 78,156,225 sectors * 512 Bytes/sector = 40,015,987,200 B.

Sector Alignment:

Q. What I don't understand is how this maps to the internal layout. For example, 40015987200 / 4096 = 9769528.125. Hmm? Is this because the memory map isn't all 4K sectors, or because the BIOS/FreeBSD is reporting a geometry that is compatible with legacy hardware/software? (78,156,225 is an odd sector count, so it can never divide evenly into 8-sector 4K units.)
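A quick shell check of that arithmetic:

Code:
$ echo $((4865 * 255 * 63))   # cylinders * heads * sectors/track
78156225
$ echo $((78156225 % 8))      # odd total, so it can't split into 4K (8-sector) units
1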

What am I aligning:

Revising the inner workings of a hard drive...
  • Platters are the physical spinning discs.
  • Heads are the physical mechanisms that read/write the sides of the discs.
  • Tracks are the circular strips on a disc that data is read from and written to.
  • A cylinder is the collection of all the tracks at the same position across the platters/discs.
  • Sectors are the small arcs of a track (where a track crosses a "cake slice" of the disc) and are the smallest addressable units.
  • Block is the term used to describe a sector or a group of sectors.

Q. All the other guides seem to suggest aligning to sector 63. At 512B per sector, why would aligning to 32256B (1 sector short of 32K) help?

The only way I can see it helping is keeping the partition boundaries at 4K and then making the file system blocks 4K. With 4K as the smallest unit and everything aligned to 4K, no block would span two flash sectors simply because of alignment issues.

Implementing...

fdisk -- PC slice table maintenance utility.
"Sector 0 of the disk must contain boot code, a slice table and magic number."
I presume this should be track 0, as FreeBSD sets aside 63 sectors (0-62) at the start?

So to align, I would either have to push the slice out to start at sector 64 (the next 4K multiple), or just align the partitions within the slice.

bsdlabel -- read and write BSD label.
"The first partition should start at offset 16, because the first 16 sectors are reserved for metadata."
Which is fine, as 16 sectors is exactly two 4K blocks (8 * 512 = 4096).

Q. So, taking the slice offset into account, should I start the root partition at slice-relative sector 17 (slice start 63 + 17 = absolute sector 80, a multiple of 8 and therefore 4K-aligned, while still leaving the first 16 sectors for metadata)?
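Checking that offset:

Code:
$ echo $((63 + 17))         # absolute start sector of the root partition
80
$ echo $(((63 + 17) % 8))   # 0 means it falls on a 4K boundary
0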

And to finish, newfs with 4K fragments and 32K blocks (maintaining the 8:1 block:fragment ratio).
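A minimal sketch of that newfs invocation, assuming the root partition comes up as /dev/ad4s1a:

Code:
# newfs -b 32768 -f 4096 /dev/ad4s1a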

Obviously I'm misunderstanding much of this...?
:\

Is it worth having a look at gpart?

Links:
http://forums.freebsd.org/showthread.php?t=7011
http://www.ocztechnologyforum.com/f...h-stuttering-and-increases-drive-working-life.
http://www.freebsd.org/cgi/man.cgi?query=newfs&sektion=8
http://www.freebsd.org/cgi/man.cgi?query=bsdlabel&sektion=8
http://www.freebsd.org/cgi/man.cgi?query=fdisk&sektion=8
 
Have a read through the archives for the -hackers mailing list. There's a thread on just this subject. The general consensus seems to be (for UFS, anyway) to future-proof things and start the first slice at 1 MB. That covers the 512B sectors of yesteryear, the 4K sectors of today/tomorrow, and up to 1 MB sectors in the future. :)

Sure, you "waste" 1 MB of disk space, but on a 40 GB drive, that's nothing.
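For reference, 1 MB expressed in 512B sectors:

Code:
$ echo $((1024 * 1024 / 512))
2048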
 
Ted Ts'o seems to think it's even more important to align your SSD to the erase block size: here, though later he has some doubts.

Though, I suppose starting your first slice at 1024K would also align it to a 128K erase block size. This stuff does make my brain hurt, though.
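That checks out: 1024K is an exact multiple of 128K, so every 1 MB boundary is also a 128K erase boundary:

Code:
$ echo $((1048576 % 131072))
0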
 
Phenoix, do you mean this thread?
http://lists.freebsd.org/pipermail/freebsd-hackers/2010-March/031154.html

The problem with fdisk or gpart is that, following the default MBR partitioning scheme, they snap every sector to a multiple-of-63 boundary (or at least they do in the way I'm using them?). So if I try to start the slice at 1MB (sector 2048), it defaults to 2079 (33 * 63). Neither can I extend the unused system boot info slice at the start (not that it matters much).

So, the only way I can see this working is to use the badly aligned slices and align the partitions within them.

fronclynne,
http://www.linuxfoundation.org/news...02/aligning-filesystems-ssd’s-erase-block-size
"However, with SSD’s (remember SSD’s? This is a blog post about SSD’s…) you need to align partitions on at least 128k boundaries for maximum efficiency." - Not sure where he got that from? Why 128K? Surely it depends on the FLASH ICs inside the SSD?
 
Well, the X25-M and X25-E appear to have 128K erase block sizes, but thanks to Intel's fabulous documentation (vorsicht! *.pdf!), I now know just how many international treaties their SSDs conform to and just how jolly the ridiculous things will make me (including, but not limited to, maximising my frobdignag in face-to-face real time quadrophenia, yay), but nary a peep about erase block sizes. The -V, being a "value" product, may use 32K. It does appear, though, that pretty much everything uses an erase block size much larger than 4K.

Addendum: aligning to a larger (multiple of the real) block size should never cause problems beyond wasted space.
 
Ah, I see. Well, I've aligned to MB boundaries after the root partition. Messing about with slices just doesn't seem to work out.

Code:
# gpart show ad4
=>      63  78165297  ad4  MBR  (37G)
        63  78165297    1  freebsd  [active]  (37G)

Partition Sizing:

Code:
	            Label   Start(ST) End(ST)   Size(ST)    (MB)
	/     Root  ___s1a  16*        1161215   1161200    566MB 
	swap  Swap  ___s1b  1161216   18966527  17805312   8694MB
	/var  Var   ___s1d  18966528  21030911   2064384   1008MB
	/tmp  Temp  ___s1e  21030912  23095295   2064384   1008MB
	/usr  User  ___s1f  23095296  54577151  31481856  15372MB
	/???                54577152  78156225  23579073  11513MB

	* Starts are slice-relative; add 63 sectors for the MBR slice offset.

Create the BSD partition scheme...
Code:
# gpart create -s bsd ad4s1 
/
# gpart add -i 1 -b 16 -s 1161200 -t freebsd-ufs ad4s1
swap
# gpart add -i 2 -b 1161216 -s 17805312 -t freebsd-swap ad4s1
/var
# gpart add -i 4 -b 18966528 -s 2064384 -t freebsd-ufs ad4s1
/tmp
# gpart add -i 5 -b 21030912 -s 2064384 -t freebsd-ufs ad4s1
/usr
# gpart add -i 6 -b 23095296 -s 31481856 -t freebsd-ufs ad4s1
 
fronclynne said:
but thanks to Intel's fabulous documentation (vorsicht! *.pdf!), I now know just how many international treaties their SSDs conform to and just how jolly the ridiculous things will make me (including, but not limited to, maximising my frobdignag in face-to-face real time quadrophenia, yay), but nary a peep about erase block sizes.

:e

I found the ICs they use, but they don't look to be commercially available, so no datasheet...
 
embeddedbob said:
Phenoix, do you mean this thread?
http://lists.freebsd.org/pipermail/freebsd-hackers/2010-March/031154.html

The problem with fdisk or gpart is that, following the default MBR partitioning scheme, they snap every sector to a multiple-of-63 boundary (or at least they do in the way I'm using them?). So if I try to start the slice at 1MB (sector 2048), it defaults to 2079 (33 * 63). Neither can I extend the unused system boot info slice at the start (not that it matters much).

Yeah, that's the thread.

Why are you using the defaults? gpart, at least, allows you to specify exactly which sector to start at. I believe fdisk does as well. Or, are you saying that when you specify the exact starting sector that gpart|fdisk rounds up behind the scenes?
 
Yeah, it rounds up to the nearest track, i.e. a multiple of 63 (sectors per track) * 512B (sector size). Well, in BIOS land.
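So a requested start of sector 2048 gets bumped to the next track boundary:

Code:
$ echo $((((2048 + 62) / 63) * 63))   # round 2048 up to a multiple of 63
2079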

This is what I ended up with:

Code:
            Label   Start(ST) End(ST)   Size(ST)    (MB)
/     Root  ___s1a  16         2193407   2193392    1070MB 	
swap  Swap  ___s1b  2193408   21030911  18837504    9198MB
/var  Var   ___s1d  21030912  25288703   4257792    2079MB
/tmp  Temp  ___s1e  25288704  29417471   4128768    2016MB
/usr  User  ___s1f  29417472  78165297  48747825   23751MB
 
So, the goal is to have the 4K filesystem sectors aligned with the 4K pages of the SSD flash. Note that most recent controllers have enough intelligence and 'decoupling' between LBA addresses and internal flash locations to almost completely resolve the performance and aging hit that comes with this misalignment.

If you want to make things easy for yourself, start using the GPT scheme and align at 1MB. The whole C/H/S story has been around for way too long; you should forget it altogether.

Remember to size every partition to a multiple of 1MB as well. Also note that using a GPT partition layout will prevent you from booting some other OSes; for example, current Windows versions do not support booting from GPT on non-EFI systems. If this does not bother you, I recommend GPT. You can use ZFS too if you like.

Partition your disk as follows:
Code:
Fixit# gpart create -s gpt ad0
Fixit# gpart add -s 64K -t freebsd-boot ad0
Fixit# gpart add -b 2048 -s 2G -t freebsd-swap -l swap0 ad0
Fixit# gpart add -t freebsd-zfs -l zfsdisk0 ad0

My disk as an example:
Code:
# gpart show
=>       34  156301421  ada0  GPT  (75G)
         34        128     1  freebsd-boot  (64K)
        162       1886        - free -  (943K)
       2048    4194304     2  freebsd-swap  (2.0G)
    4196352  100661215     3  freebsd-zfs  (48G)
  104857567   51443888        - free -  (25G)
(This is a normal HDD, but I'm planning to migrate to SSD soon)

Now all you need to do is add bootcode and an OS.
The only problem you have now is that sysinstall will not like your newly partitioned disk, so follow the installation directions for ZFS on GPT here. This can also be done without ZFS, of course.

Seems like a bit of a hassle, but you're completely legacy-free, so to speak.

If you ever need to grow your ZFS/UFS partition, simply delete the GPT partition and re-add a larger one at the same location. GPT will not touch the data, and ZFS will use the extra space instantly. UFS will need a little growfs.
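A sketch of that grow procedure for a UFS partition, borrowing the index, start sector, and device name from the example above (those values are assumptions; ZFS needs no growfs):

Code:
# gpart delete -i 3 ada0
# gpart add -b 4196352 -i 3 -t freebsd-ufs ada0   # same start sector; omitting -s grabs all free space
# growfs /dev/ada0p3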
 
If you want to stick to BIOS partitioning and align to erase blocks without offending any 63-sectors-per-track-sensitive applications, the trick is to find the track boundaries that also lie on the erase boundaries you're after. This script should help you calculate them.

Code:
#!/bin/sh

# erase boundary in 512 byte blocks
# eg. for a 128KiB erase boundary:
# 131072 / 512 = 256
ERASEB=256
TRACKB=63     # 63 sectors/track
#TRACKB=16065  # 255 tracks/cylinder, 63 sectors/track (linux wants cylinder boundaries)
# 63 and 256 share no common factors, so their product (16128 sectors)
# is the least common multiple of the track and erase boundaries
PARTB=$(( ${ERASEB} * ${TRACKB} ))

echo "enter byte offset of partition (append m/g for MiB/GiB)"
read boffset

case "" in
${boffset##*g})
	boffset=$(( ${boffset%*g} * 1073741824 ))
	;;
${boffset##*m})
	boffset=$(( ${boffset%*m} * 1048576 ))
	;;
esac

if [ -z "${boffset}" ]; then exit 1; fi

sectors=$(( ( ${boffset} - ( ${boffset} % 512 ) ) / 512 ))  # round down to whole 512B sectors
npb=$(( ${sectors} / ${PARTB} ))            # whole combined boundaries below the target
ssdlbahigh=$(( (${npb} + 1) * ${PARTB} ))   # next boundary above
ssdlbalow=$(( ${npb} * ${PARTB} ))          # nearest boundary at or below

echo
echo "Desired offset: ${boffset} bytes"
echo "Corrected offsets:"
echo "High: ${ssdlbahigh} blocks, $(( ${ssdlbahigh} / ${TRACKB} )) tracks @ ${TRACKB} s/t ($(( ${ssdlbahigh} * 512 )) bytes)"
echo "Low: ${ssdlbalow} blocks, $(( ${ssdlbalow} / ${TRACKB} )) tracks @ ${TRACKB} s/t ($(( ${ssdlbalow} * 512 )) bytes)"
 
I have two of the X25-V and have installed a mirrored root ZFS system onto them, as per instructions here. I read a lot of threads before I partitioned them. See here, here, here, and here.

From what I understand, for an SSD to perform well and have a long life the partitions should start on a multiple of 1MB to coincide with the erase blocks, e.g. 1024*1024 bytes, or 2048 sectors (each sector being 512 bytes). In doing so, the start of the partition will also naturally be a multiple of 4096 for the write blocks (e.g. 1024*1024/4096=256).

It is also easier for the OS to cope with the partition starting on a multiple of 63 sectors. So practically, this means the partition should start on a multiple of 63*1024*1024 bytes, as 1024*1024 shares no factors with 63. 63*1024*1024 bytes is the same as 129024 sectors (512 * 129024 = 63*1024*1024).
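Checking both constraints on that figure:

Code:
$ echo $((63 * 1024 * 1024 / 512))           # 63 MiB in 512B sectors
129024
$ echo $((129024 % 63)) $((129024 % 2048))   # track boundary and 1 MiB boundary
0 0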

I found out the hard way that if you try to give the boot partition 63MB, it complains that it's too big. So we just give it the standard size, and start the swap and root partitions on multiples of 63MB (129024 sectors). This wastes 63MB, but 63MB out of a total of 37GB is insignificant. My partition creation commands look like so:


# gpart add -b 34 -s 128 -t freebsd-boot da0
# gpart add -b 129024 -s 16773120 -t freebsd-swap -l swap0 da0
# gpart add -b 16902144 -t freebsd-zfs -l disk0 da0
# gpart add -b 34 -s 128 -t freebsd-boot ad8
# gpart add -b 129024 -s 16773120 -t freebsd-swap -l swap1 ad8
# gpart add -b 16902144 -t freebsd-zfs -l disk1 ad8


If I am wrong, please let me know.
 
carlton_draught said:
From what I understand, for an SSD to perform well and have a long life the partitions should start on a multiple of 1MB to coincide with the erase blocks, e.g. 1024*1024 bytes, or 2048 sectors (each sector being 512 bytes). In doing so, the start of the partition will also naturally be a multiple of 4096 for the write blocks (e.g. 1024*1024/4096=256).
The only thing that really matters is that the filesystem sectors are aligned with the flash pages in your SSD. These are usually both 4KB in size. If they don't match up, every write to your SSD will result in two 4KB writes instead of one, since the flash memory used can only write in blocks of 4KB.
The real problem hits when the SSD has to erase parts of both misaligned sectors 1 and 2 in the above scenario. Writing can be done per 4KB page; erasing usually occurs in 128KB blocks. So every time you do a misaligned write to your precious SSD, it has to: read to cache, erase, modify the data in cache, and rewrite the flash block. Not good for flash cell life.
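A worked example of the straddling (the starting offset is just an illustration): a 4KB write beginning at byte 32256 (sector 63) touches two 4KB flash pages:

Code:
$ echo $((32256 / 4096)) $(((32256 + 4095) / 4096))   # first and last page touched
7 8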

The same scenario is also very bad for RAID5- and RAID6-based storage, by the way; aligning there also benefits performance.

It is also easier for the OS to cope with the partition starting on a multiple of 63 sectors. So practically, this means the partition should start on a multiple of 63*1024*1024 bytes, as 1024*1024 shares no factors with 63. 63*1024*1024 bytes is the same as 129024 sectors (512 * 129024 = 63*1024*1024).
Normally the first partition starts at sector 63, for legacy reasons. The part about starting at a multiple of 63 has to do with aligning to a cylinder boundary. You can't satisfy both this alignment and the SSD alignment, if I'm not mistaken.
 
aragon said:
What makes you say that after I've shown how to do it?
Whoops, I overlooked your post. Thanks :)

Has anyone here actually encountered problems with ignoring the legacy layout? I've used 1MB offsets in both FreeBSD and Ubuntu without any problems. To me it's not worth the hassle.
 
p5ycho said:
Has anyone here actually encountered problems with ignoring the legacy layout? I've used 1MB offsets in both FreeBSD and Ubuntu without any problems. To me it's not worth the hassle.
You are probably right. Nowadays there should be almost no BIOSes left that rely on these track alignments. As pointed out above, though, the FreeBSD tools seem to auto-correct attempts to ignore track alignment...
 
p5ycho said:
Has anyone here actually encountered problems with ignoring the legacy layout? I've used 1MB offsets in both FreeBSD and Ubuntu without any problems.
By "ignoring the legacy layout", do you mean for example with the way I've set things up? As far as I am aware, it's working fine.

I realize that the Intel probably has 128k erase blocks, not 512k or 1024k, but since MS is standardizing on 1024k, which encompasses the others, I decided to do the same thing. And I realize that next-generation SSDs may have fixed this. From what I can tell, the justification for erase block alignment comes from Ted Ts'o, and MS have standardized on a 1024k alignment for Vista and Seven (presumably for the erase blocks). If you multiply that by 63, you satisfy the cylinder boundary requirement as well. So if future SSD manufacturers standardize on Microsoft's figure, doing it this way will probably still be OK. As far as I know, not all SSD manufacturers make their erase block sizes public.

At most I am wasting 63MB, which is 0.2% of the SSD's size. I'd rather do that mistakenly than risk the SSD behaving less than optimally.

Also, credit to Aragon: I first used his script to test a few different values and realized that all it was really doing (I think) was finding the sectors that satisfied 63 * whatever block size you wanted to use, and that these were naturally multiples of that first number. Rather than modify his script when I wasn't sure how it worked, I made a spreadsheet to do the calculations with a larger block size (e.g. the 1024k). I can post it if anyone wants it.

That Aragon hasn't corrected me so far suggests I'm probably not in error.
 
carlton_draught said:
Also, credit to Aragon: I first used his script to test a few different values and realized that all it was really doing (I think) was finding the sectors that satisfied 63 * whatever block size you wanted to use, and that these were naturally multiples of that first number. Rather than modify his script when I wasn't sure how it worked, I made a spreadsheet to do the calculations with a larger block size (e.g. the 1024k). I can post it if anyone wants it.

That Aragon hasn't corrected me so far suggests I'm probably not in error.
Yeah, the script takes a given byte offset and adjusts it to the closest byte offsets higher and lower that fall on both the 63 sectors/track boundary and the erase boundary, i.e. you tell it you want 150 meg, and it tells you you can have 149.6 meg or 157.5 meg (128k erase boundary).

I just wish SSD manufacturers would stick to a single erase boundary now.
 
carlton_draught said:
By "ignoring the legacy layout", do you mean for example with the way I've set things up? As far as I am aware, it's working fine.
In this case I meant conforming to the C/H/S boundaries. So you are still compatible with the '70s-'80s layout.
I realize that the Intel probably has 128k erase blocks, not 512k or 1024k, but since MS is standardizing on 1024k, which encompasses the others, I decided to do the same thing.
That doesn't really matter, except that the beginning and end of a partition may cross an erase block. For theoretical flash lifetime optimisation, only the sector/page alignment is really significant, assuming the SSD has a 'dumb' controller without built-in write amplification countermeasures.
 