In data centers, ideally the disks get replaced before they break.
Yes and no. Big storage systems put a lot of effort into fault prediction. SMART data is one component of that, in particular for SCSI disks, where a disk can use SMART to explicitly raise a PFA (predictive failure analysis) fault. Other fault predictors are in use as well, for example the rate of transient errors, or performance aberrations. But in spite of the best efforts to predict that a disk will fail (or to listen to the disk saying so itself), and to use that to proactively drain disks and put in replacements, disks do fail in production and have to be replaced.
There are two really big factors that determine the disk replacement strategy. One is the availability of manpower. Imagine a typical large data center (often bigger than a gym), with hundreds of thousands of disks, of which several fail per day. Just removing and replacing failed disks immediately will keep a staff person busy. Some storage systems find it more efficient to leave failed disks in the system, re-replicate the data elsewhere, and only replace disks once enough of them have accumulated to make the trip worthwhile. For example, if a 100-disk enclosure has 3 or more failed disks, then send a human, but if it is 1 or 2 disks, then just leave it. This strategy is called FIP or "Fail In Place", and it has helped to significantly reduce the operating cost of storage systems. On the other hand, if performance is critical, some users prefer to replace disks as soon as possible. Where the tradeoff falls depends a lot on the replacement workflow. For example, large data center operators today use robotics to pull enclosures out of racks or disks out of enclosures, with no humans involved until the disk reaches a workbench for analysis; they can replace disks more frequently than data center operators that still rely on humans with lab carts and ladders going down the aisles.
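As a rough illustration of the fail-in-place rule above, here is a minimal Python sketch. The function name, the enclosure names, and the data layout are made up for the example; only the threshold of 3 failed disks per enclosure comes from the text, and real systems will weigh in other things (remaining redundancy, performance, spare capacity).

```python
def enclosures_to_visit(failed_by_enclosure: dict[str, int], threshold: int = 3) -> list[str]:
    """Return the enclosures that have accumulated enough failed disks
    to justify sending a human (hypothetical helper, not a real API)."""
    return sorted(
        name
        for name, failed in failed_by_enclosure.items()
        if failed >= threshold
    )


# Hypothetical snapshot: number of failed disks currently left in place.
failed = {"encl-a": 1, "encl-b": 3, "encl-c": 4, "encl-d": 0}
print(enclosures_to_visit(failed))
```

With these numbers, only encl-b and encl-c get a visit; the single failed disk in encl-a stays where it is until more disks in that enclosure fail.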
Second, whether it is economically advantageous to replace disks depends on the contract between the customer and the disk vendor. Large disk customers typically do not send removed disks back to the disk manufacturer for post-mortem and replacement, for a variety of reasons: the handling expenses would be very high (lots of FedEx packages, special software to track where each disk is physically in the logistics chain); used disks probably have sensitive data on them; and the disk manufacturers don't want to stockpile older models for warranty replacement. Instead, warranty replacements are typically handled by discounts and refunds. A typical contract between a large customer and Seagate or WD might look like the following:
Yoyodyne is buying one million disks model 12345ABC at $100 each. These disks have a 5-year warranty, and we expect up to 1% of the disks to fail every year during those 5 years, so Yoyodyne gets a 5% discount at initial purchase and only has to pay $95M up front. If more than 10,000 disks fail in any year of the first 5 years, Yoyodyne can request an additional credit of $100 per disk above the 10K that have failed, but then Yoyodyne has to either submit a dated copy of the SMART output showing that the disk has self-identified as failed, or it has to physically submit the failed disks to the manufacturer for analysis if the disk has failed so hard that SMART output can't be read any longer. For efficiency, warranty evaluation will only be processed in January, for all the disks that failed in the previous year.
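The arithmetic in that made-up contract can be sketched in a few lines of Python. The function names and the sample failure counts are invented for illustration; the prices, the 1% per year allowance, and the 5-year term are the ones from the example, not from any real contract. Percentages are kept as integers to avoid floating-point rounding.

```python
def upfront_price(units: int, unit_price: int, allowed_pct_per_year: int, warranty_years: int) -> int:
    """Purchase price after the failure allowance is taken as an up-front discount."""
    discount_pct = allowed_pct_per_year * warranty_years
    return units * unit_price * (100 - discount_pct) // 100


def annual_credit(failed: int, units: int, unit_price: int, allowed_pct_per_year: int) -> int:
    """Extra credit for one year: full unit price for each failure above the allowance."""
    allowance = units * allowed_pct_per_year // 100
    return max(0, failed - allowance) * unit_price


print(upfront_price(1_000_000, 100, 1, 5))      # 95000000
print(annual_credit(12_500, 1_000_000, 100, 1)) # 250000
print(annual_credit(8_000, 1_000_000, 100, 1))  # 0
```

So a year with 12,500 failures would be worth a $250,000 credit under these terms, while a year with 8,000 failures is already covered by the initial discount.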
And yes, I've been in situations where so many disks failed that lawyers got involved (the WEEKLY failure rate reached 1% for a particular batch of disks), and after some tense negotiations, a lot of money changed hands. And if I remember right, a pallet load of disk drives is between 500 and 2000 disks (depending on packaging), so returning 10K disk drives for warranty evaluation means 5 to 20 pallets, which is not something you do casually with a FedEx envelope.