Comparing old CD copies

I copied, I think about 15 years ago, five data CDs.
Years later I made copies of the copies.

Set 1 is Verbatim DataLifePlus Super AZO CD-R with printable surface, 52x.
Set 2 is Verbatim Super AZO Crystal, 48x.

For each disc I now ran:

dd if=/dev/cd0 of=fileN.iso bs=2048

The discs of Set 2 seem more difficult to read. Perhaps they are the older ones.
I was able to mount all ten CDs.
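
A minimal sketch, in Python, of the same block-by-block read that also reports whether the copy stopped at a clean end of input or at a read error. The function name `copy_sectors` and its return value are illustrative assumptions, not part of any tool mentioned in this thread; the 2048-byte block size is taken from the dd command above.

```python
def copy_sectors(src, dst, sector_size=2048):
    """Copy src to dst one block at a time.

    Returns (blocks_copied, error), where error is None after a clean
    end of input, or the OSError that interrupted the read.
    """
    copied = 0
    error = None
    # buffering=0 so that every read() maps to one request of sector_size bytes
    with open(src, "rb", buffering=0) as fin, open(dst, "wb") as fout:
        while True:
            try:
                block = fin.read(sector_size)
            except OSError as exc:
                error = exc      # read error: remember it and stop
                break
            if not block:        # empty read means clean end of input
                break
            fout.write(block)
            copied += 1
    return copied, error
```

Calling `copy_sectors("/dev/cd0", "fileN.iso")` would return the number of blocks copied together with the error, if any, that ended the read, which is the kind of record that helps later when two images differ in length.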

And then cmp of the corresponding pairs.

In three pairs cmp ends at the EOF of the image from set 1.

Code:
% ll pl1.iso PL1.iso
-rw-r--r--  1 root  wheel  628283392 Jan 13 17:20 pl1.iso
-rw-r--r--  1 root  wheel  628281344 Jan 13 16:31 PL1.iso
% cmp pl1.iso PL1.iso
pl1.iso PL1.iso differ: char 619767844, line 2869571

And

Code:
% ll pl2.iso PL2.iso
-rw-r--r--  1 root  wheel  611594240 Jan 13 17:53 pl2.iso
-rw-r--r--  1 root  wheel  611592192 Jan 13 16:40 PL2.iso
% cmp pl2.iso PL2.iso
pl2.iso PL2.iso differ: char 606191654, line 2681770

Here PLi are files of set 1, pli of set 2. In this case and only in it the files of set 1 are bigger than the ones of set 2.

What can be said? :)
 
CDs which are "burned" rather than "pressed" usually have a very limited lifespan. The CDs that already contain data when you purchase them (e.g. music CDs, game CDs, ...) are pressed much like vinyl records, i.e. the data is "engraved" into a metal mold (the tool) and a plastic disc is pressed against that. This leaves the impressions on the disc (the physical 1s and 0s).
The CDs you burn at home work differently: there is still a pre-stamped/pre-pressed structure (a spiral) on the disc which the laser optics use for tracking. However, it contains no data. Instead, the disc contains a dye. Your CD burner's laser physically burns the necessary spots, which changes the color of the dye there. That causes a difference in reflectivity which is ultimately interpreted as a 1 or a 0.

With that being said, those dyes have a limited lifespan. If I recall correctly, "back in the day" the officially advertised lifespan of self-burned CDs from reputable manufacturers was about 6 months for the cheap/standard discs. Empirical data would suggest that standard-quality CDs would last a couple of years before degrading beyond repair.

So I'd argue that what you're seeing here is typical data degradation of self burned CDs.
 
But the differences appear at the end. Can we say that CDs degrade at the end?

Here are the sizes:

Code:
% ll
total 4072570
-rw-r--r--  1 root  wheel  628283392 Jan 13 17:20 pl1.iso
-rw-r--r--  1 root  wheel  628281344 Jan 13 16:31 PL1.iso
-rw-r--r--  1 root  wheel  611594240 Jan 13 17:53 pl2.iso
-rw-r--r--  1 root  wheel  611592192 Jan 13 16:40 PL2.iso
-rw-r--r--  1 root  wheel  617816064 Jan 13 18:04 pl3.iso
-rw-r--r--  1 root  wheel  617814016 Jan 13 16:52 PL3.iso
-rw-r--r--  1 root  wheel  577814528 Jan 13 18:16 pl4.iso
-rw-r--r--  1 root  wheel  577812480 Jan 13 17:01 PL4.iso
-rw-r--r--  1 root  wheel  532441088 Jan 13 18:27 pl5.iso
-rw-r--r--  1 root  wheel  532439040 Jan 13 17:08 PL5.iso
-rw-r--r--  1 root  wheel        273 Jan 13 18:50 README

The first two differ as given in my previous post. The last three differ at EOF of PLi.
 
There is no fixed location where CDs will start degrading first. If they degrade, they do it at a random location.
I don't mean to be a bitch or general pain in the ass, but there is the notion of cheap CD burners having no temperature compensation in the laser's power path. Depending on the driver circuitry, the laser will output either slightly more or slightly less power as it heats up, which will almost certainly have an effect on degradation (the most typical driver circuits output slightly more power when they get warm). Therefore, it's not unlikely that, statistically speaking, you get faster degradation towards the end of the CD's data.

Just one of the reasons why professional CD burning equipment costs notably more than your typical cost-sensitive end-user product :)
 
CDs are read and written in units of 2048-byte sectors. So let's look at your data:

pl1: 628283392 bytes = 306779 sectors and no partial sectors.
PL1: 628281344 bytes = 306778 sectors and no partial sectors.
That immediately tells us that reading pl1 found one extra sector. Too bad you don't have access to log files of when these copies were made, whether they ended by a clean EOF on input, or by a read error.
cmp: finds a difference at byte 619767844, which is in sector 302621 (more than 4,000 sectors before the end!), at offset 36 in the sector. This confirms that at least one of the two reads is different from the original (what I would call a read error), by more than just the difference in length.

pl2: 611594240 bytes = 298630 sectors, no partial.
PL2: 611592192 bytes = 298629 sectors, no partial.
cmp: byte 606191654 = sector 295992 offset 38.
Exact same diagnosis.
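
The arithmetic above can be reproduced with a short Python sketch. The helper names `sector_count` and `locate` are made up for illustration. One assumption worth stating: plain cmp numbers bytes starting at 1, so the zero-based offsets printed here come out one lower than the 36 and 38 quoted above.

```python
SECTOR_SIZE = 2048  # bytes per data sector, as used in the post above

def sector_count(size_bytes, sector_size=SECTOR_SIZE):
    """Return (whole_sectors, leftover_bytes) for an image size."""
    return divmod(size_bytes, sector_size)

def locate(cmp_byte, sector_size=SECTOR_SIZE):
    """Map a byte number printed by plain cmp (counted from 1)
    to a zero-based (sector, offset_in_sector) pair."""
    return divmod(cmp_byte - 1, sector_size)

print(sector_count(628283392))  # pl1 -> (306779, 0)
print(sector_count(628281344))  # PL1 -> (306778, 0)
print(locate(619767844))        # pair 1 -> (302621, 35)
print(locate(606191654))        # pair 2 -> (295992, 37)
```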

And as others said above, this again illustrates why people don't consider CDs to be terribly reliable.

If you had some information about the content of the data (like if you had stored checksums), you might be able to identify which of the two copies is error-free, if any.
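
As a rough illustration of what "stored checksums" could look like, here is a small Python sketch that hashes an image in chunks and writes a manifest next to it. The function names and the "digest, two spaces, path" manifest layout are assumptions chosen for the example, not a format used anywhere in this thread.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(paths, manifest_path):
    """Write one 'digest  path' line per file (layout is an assumption)."""
    with open(manifest_path, "w") as out:
        for path in paths:
            out.write(f"{sha256_of(path)}  {path}\n")
```

Kept somewhere other than the disc itself, such a manifest would later show which of two differing copies still matches what was originally written.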
 
Typical end-user comfort problem: The technology seems cheap & easy to use. So we use it without giving it too much thought. I am sure the marketeers back then also contributed whatever they could to foster the use of CDs as data storage as much as possible.
I doubt that anybody who ever burned CDs for "permanent data storage" went through the trouble of keeping checksums separately :p

If anyone is interested in a (vastly) more enduring optical format which can still be burned at home: Look at M-DISC - we're talking 1000 years data retention.
 
Years ago I repaired a few Philips CDM drives. A typical issue was the distance between laser lens and CD surface. This should remain constant for correct focus while the laser arm moves from inside to outside. If that distance got outside the specifications, it would lead to errors either at the beginning or at the end of a CD. Now these were drives from Revox audio players, hence read-only. But this could be an issue in this case too.
 
And as others said above, this again illustrates why people don't consider CDs to be terribly reliable.
Well, me neither, but I am impressed that these very old CDs are still more or less readable.
I did not expect it.

And yes, I could have done checksums, but this is an accidental experiment. I did not
plan to write a CD and wait 15 years to see if they are still readable.

Is anyone out there planning a similar experiment with USB sticks?
 
No offense taken, I'm always happy to hear the opinion of hardware professionals and learn something new. ;)

Aside from that, I do remember that a rule of thumb in the glory days of CD writers was that the faster you burned discs, the faster they would degrade, because the laser has less time to work on the dye per dot.

Here's some research by the Library of Congress on the longevity of such media: https://www.loc.gov/preservation/resources/rt/NIST_LC_OpticalDiscLongevity.pdf

Their results: CDRs are typically around 30-45 years, DVD-R and the whole menagerie around DVD in the range from 15 up to 45 years depending on product.
 
Exactly what kind of data is on these CDs? I wouldn't be surprised if you have metadata changes between disks...
 
Exactly what kind of data is on these CDs?
These disks were exact copies of the iso images. Hence no metadata change.

I have other old CDs, among them Memorex and Sony, that are still readable, but I do not have copies of them to compare images.

Their results: CDRs are typically around 30-45 years, DVD-R and the whole menagerie around DVD in the range from 15 up to 45 years depending on product.
I find the results very optimistic.

I will now try with another CD device, an external (ATA) one. I have a lot: one SATA, many ATA, many SCSI, and one FireWire. The problem is connecting them. The quality of the devices also differs a lot, and they are very sensitive to dust, but the SATA one I used seems to be one of the most reliable.
 
In that case, compare the actual files instead
I will need something like mtree or find. I do not remember how exactly.

With the external CD drive it was impossible to read the CD of set 2. From one CD of set 1 I got an image identical to the earlier one. Perhaps it is more probable that set 2 is corrupted.
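
For comparing the actual files of two mounted images without mtree or find, a minimal Python sketch could look like the following. The function name `differing_files` is hypothetical, and the two arguments are assumed to be the mount points (or any two directory trees); it compares file contents only, not metadata.

```python
import filecmp
import os

def differing_files(root_a, root_b):
    """Return sorted relative paths of files that exist on only one
    side or whose contents differ byte for byte."""
    rel_paths = set()
    for root in (root_a, root_b):
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                full = os.path.join(dirpath, name)
                rel_paths.add(os.path.relpath(full, root))
    bad = []
    for rel in sorted(rel_paths):
        a = os.path.join(root_a, rel)
        b = os.path.join(root_b, rel)
        if not (os.path.isfile(a) and os.path.isfile(b)):
            bad.append(rel)                      # missing on one side
        elif not filecmp.cmp(a, b, shallow=False):
            bad.append(rel)                      # same name, different bytes
    return bad
```

With both images mounted, something like `differing_files("/mnt/a", "/mnt/b")` (mount points made up here) would list the files worth a closer look with cmp.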
 
Another comparison.

CD1: Platinum CD-R 52x
CD2: Verbatim CD-R 52x - copy of CD1 made on 12.01.2012

The cmp ends at EOF of the image of the older CD1.

I still have a very old image of CD1 on hard disk, perhaps the one used for making CD2.
It is identical to the new one. Here the older disc seems to be the one that is not corrupted.

Interestingly, from the probably corrupted disc I read the same data plus 4096 more bytes.
For three of the other five pairs it was similar, with only 2048 more bytes:

Code:
% tail -c 4099 w.iso | hd
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000

Any comments? :)
 
rhash --sha256 -r ./* -o moo.txt given that you have rhash installed :)
 
To avoid writing into the directory I am reading, I did the following:

Code:
cd dir
rhash --sha256 -r . -o ../dir.sha256
cd ..

I compared the signature files of each pair with diff. In each pair there was one file that differed (or, more precisely, whose checksum differed). Only one. The differing files are of the same size; perhaps only one character changed.

They are binary files of a piece of software that someone gave me. I never ran it,
and I do not know how to check whether it is functional.

I did not check if metadata changed.
 
… The technology seems cheap & easy to use. So we use it without giving it too much thought. …

I have a large number of low cost CDs, with some of my favourite music, that lose their outside edges too easily. Literally flaky. I'm not complaining, I probably have the music elsewhere. On, erm, probably an Apple FileVault 2-encrypted volume that's stored on a mobile hard disk drive that uses ZEVO ZFS, and I can't remember the password. Oops.
 
Good to know that cmp has the options -l and -x. This is how the files differ:

Code:
# cmp -x pl1-file   PL1-file
0c40d023 34 35
# cmp -x pl2-file   PL2-file
00195025 61 74
00195026 72 ba
0019502d 64 60
0019502e 65 85
00195030 69 c1
001952a5 65 cd
001952a6 72 52
001952ae 6e 6b
001953ad 0a 3c
001953b5 6d 75
001953be 72 6a

Does this give an idea of how bits mutated?
 
Code:
00195025 61 74 ^=15 bits=3
00195026 72 ba ^=c8 bits=3
0019502d 64 60 ^=04 bits=1
0019502e 65 85 ^=e0 bits=3
00195030 69 c1 ^=a8 bits=3
001952a5 65 cd ^=a8 bits=3
001952a6 72 52 ^=20 bits=1
001952ae 6e 6b ^=05 bits=2
001953ad 0a 3c ^=36 bits=4
001953b5 6d 75 ^=18 bits=2
001953be 72 6a ^=18 bits=2
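C:
#include <stdio.h>

/* Count the bits set in t, i.e. how many bits differ once t = b ^ c. */
static int cb(unsigned int t)
{
    int s = 0;
    while (t) {
        s += (t & 1);
        t >>= 1;
    }
    return s;
}

/* Read "offset byte1 byte2" hex triples (the cmp -x output) from stdin. */
int main(void)
{
    unsigned int a, b, c;
    while (scanf("%x%x%x", &a, &b, &c) == 3)
        printf("%08x %02x %02x ^=%02x bits=%d\n", a, b, c, b ^ c, cb(b ^ c));
    return 0;
}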
 
Something interesting:

Code:
# cmp -x pl1.iso PL1.iso
24f0e823 34 35
cmp: EOF on PL1.iso
# cmp -x pl2.iso PL2.iso
2421c025 61 74
2421c026 72 ba
2421c02d 64 60
2421c02e 65 85
2421c030 69 c1
2421c2a5 65 cd
2421c2a6 72 52
2421c2ae 6e 6b
2421c3ad 0a 3c
2421c3b5 6d 75
2421c3be 72 6a
cmp: EOF on PL2.iso
# ll pl1.iso PL1.iso pl2.iso PL2.iso
-rw-r--r--  1 user  user  628283392 Jan 13 17:20 pl1.iso
-rw-r--r--  1 user  user  628281344 Jan 13 16:31 PL1.iso
-rw-r--r--  1 user  user  611594240 Jan 13 17:53 pl2.iso
-rw-r--r--  1 user  user  611592192 Jan 13 16:40 PL2.iso
# hd -s 628281344 pl1.iso
2572d000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
2572d800
# hd -s 611592192 pl2.iso
24742800  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
24743000

The differences between the images are either the same as those between the files, or \0 bytes at the end of pl1 and pl2.
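
One way to check that the image differences really are the file differences is to subtract the offsets: if the file is stored contiguously in the image (an assumption of this sketch), every image offset minus the matching file offset gives the same constant, and that constant is a whole number of 2048-byte sectors. The offsets below are copied from the cmp -x listings in this thread; the function name is made up.

```python
SECTOR_SIZE = 2048

# Offsets of differing bytes, image first, then the file inside it.
pair1_image = [0x24f0e823]
pair1_file = [0x0c40d023]
pair2_image = [0x2421c025, 0x2421c026, 0x2421c02d, 0x2421c02e, 0x2421c030,
               0x2421c2a5, 0x2421c2a6, 0x2421c2ae, 0x2421c3ad, 0x2421c3b5,
               0x2421c3be]
pair2_file = [0x00195025, 0x00195026, 0x0019502d, 0x0019502e, 0x00195030,
              0x001952a5, 0x001952a6, 0x001952ae, 0x001953ad, 0x001953b5,
              0x001953be]

def file_start_sector(image_offsets, file_offsets, sector_size=SECTOR_SIZE):
    """Return the sector where the file starts inside the image, or None
    if the offsets do not fit one contiguous, sector-aligned file."""
    if len(image_offsets) != len(file_offsets):
        return None
    deltas = {i - f for i, f in zip(image_offsets, file_offsets)}
    if len(deltas) != 1:
        return None
    sector, rest = divmod(deltas.pop(), sector_size)
    return sector if rest == 0 else None

print(file_start_sector(pair1_image, pair1_file))  # 202243
print(file_start_sector(pair2_image, pair2_file))  # 295182
```

Both pairs give a single sector-aligned delta, which is consistent with the damaged bytes in each image lying inside the one file whose checksum differed.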

In the case of message #15, the newer CD had the \0 at the end. Perhaps they were added by cdrecord and the CDs are not spoiled.

In this example, the first one, I am not sure which set is older. Perhaps set 1 is older and better preserved, while set 2, with the \0 at the end added by cdrecord, is newer and spoiled by the few errors.
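
To run the same "is the surplus only zero bytes?" check over all pairs without reading hd output by eye, a small Python sketch might help. The function name is hypothetical, and it only inspects the bytes beyond the length of the shorter image; it does not compare the common part and says nothing about which tool wrote the padding.

```python
import os

def extra_tail_is_zero(longer, shorter, chunk_size=1 << 20):
    """Return True if `longer` has more bytes than `shorter` and every
    one of the extra bytes at its end is a zero byte."""
    short_size = os.path.getsize(shorter)
    if os.path.getsize(longer) <= short_size:
        return False
    with open(longer, "rb") as f:
        f.seek(short_size)               # skip the part both images share
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return True              # reached EOF, saw only zero bytes
            if any(chunk):
                return False             # found a non-zero byte in the tail
```

For the images here that would be calls such as `extra_tail_is_zero("pl1.iso", "PL1.iso")`.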
 
I think you can use dvd+rw-mediainfo in

CD: sysutils/cdrtools/ --> cdrecord -atip
DVD: sysutils/dvd+rw-tools/ --> dvd+rw-mediainfo
 