ZFS: Don't ever buy from ACME hardware

The system arrived and:

[Attachment 2468: boot screen capture]

They sent a new controller; however, the situation is the same.
There was a regression (at least in 8-STABLE) a while ago that caused the "still waiting" message, but it was fixed within a few weeks at most. That seems unrelated to the linked zfsguru topic, as I ran into it on an Adaptec ahd(4) controller.

Since it only (from your screen capture) appears to be happening on unit 10 on the mps1 controller, I'd suggest removing that drive and seeing if the boot completes. I think you may have a bad drive.
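If it helps, here is a rough sketch of how one might confirm which device that unit maps to before pulling it. The da10 device name is only a guess on my part, and the last command assumes the sysutils/smartmontools port is installed:

Code:
# List CAM devices to map mps1 target 10 to its da(4) node
camcontrol devlist

# Look for timeouts or resets logged against that controller
dmesg | grep mps1

# If sysutils/smartmontools is installed, check the suspect drive's SMART data
smartctl -a /dev/da10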
 

Unfortunately it happens with all units. Eventually after a few hours the system will boot....
 
The system has been marked as DOA. It is being shipped back.

I have to say the following:

So far, iXsystems has provided immediate support. This could be due to a bad mobo or whatever. However, after so many months with that thing, any sort of "accident" can be catastrophic.
 
I'm sorry to hear that. While I have never purchased anything from iXsystems, a number of the folks there are people I knew back in the BSDI era, and I recommended them to you.

I am sure that they would be able to sort this out for you, but I can understand your frustration and desire to simply get out of a bad situation. You've been dealing with this issue since Christmas, and your customer probably expected the system to be in production by the New Year and that didn't happen.

If you change your mind and decide to go ahead with this system, I repeat my offer of either going to your colo site to work with you to resolve the issue, or for you to have the colo ship the system to me. Just let me know if you'd like to take me up on either offer.
 
Thanks, Terry. At this point I think it is best if the system goes back as they requested. They have already sent the FedEx return labels, and I am sure they will fix the problem.

Like you said, the system was planned to be in production last year. Those delays and the money lost in the process are really making my position look very bad.
 
UPDATE!!!

The system was delivered to iXsystems and I got the following response:

Unfortunately, it looks like the damage occurred during transit, essentially because of poor packaging. It appears remote hands just threw the server into an over-sized Dell box, on top of the server rails, and then just threw the spare parts in there as well. Nothing was secured inside of the box.
So far, the drive carriers are damaged beyond repair, and we are still assessing whether other components were affected as well.


[Attachment: photo-1.jpg]



iXsystems replaced the broken parts at no extra cost. Apparently the original problem was due to a SAS cable that needed to be replaced.

The DC supposedly replaced that cable when they received the new controller!!!

The system will be racked tomorrow and I will update you.
 
"It appears remote hands just threw the server into an over-sized Dell box, on top of the server rails, and then just threw the spare parts in there as well."

Some people just need a high-five, with a chair, in their face.
 
The system is in the rack. We are dealing with a few disk issues (apparently one drive is causing problems) and some multipathing issues. Other than that the system is very fast.
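As a side note for anyone hitting similar symptoms, a minimal sketch of the FreeBSD commands one might use to narrow down which disk or path is acting up (the pool name storage matches the iostat output further down; everything else is base-system tooling):

Code:
# State of every gmultipath device and its paths
gmultipath status

# Pool health plus per-vdev read/write/checksum error counters
zpool status -v storage

# Live per-disk latency and queue depth, physical providers only
gstat -p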
 
Final thoughts: I really want to thank iXsystems for their support in solving the problems. They took the extra step and provided support even when the DC remote hands screwed up our server. They also covered the cost of shipping the server back overnight with expedited shipment, and as it turned out, all they had to do was replace a SAS cable that the remote hands at the DC were supposed to replace but didn't.
 
Any chance of handing these remote hands the bill for all this? And I hope this clears your name with your client.
 

That's the idea. Fortunately, iXsystems took the bill; otherwise I was about to cover everything myself. Yes, my client is happy now.
 
If he is, you can be happy too I guess :D

Yes, I am happy after so many months. To be honest, since the problems started appearing I could not sleep well at night. This is a big account and I felt responsible, so on top of risking the loss of a big client, I was prepared to absorb all the costs myself.

Now, the only thing that remains to be resolved is getting LACP fixed by the "remote hands" at the DC. I know it is hard to believe, but now THEY CAN'T find the switch!!!!!!!

For those wondering which DC it is.... incompetence.


Code:
                        capacity     operations    bandwidth

pool                  alloc   free   read  write   read  write
--------------------  -----  -----  -----  -----  -----  -----
storage               14.7T   115T      0  1.17K      0   127M
  raidz2              3.68T  28.8T      0    263      0  32.0M
    multipath/disk1       -      -      0     72      0  8.04M
    multipath/disk2       -      -      0     72      0  8.04M
    multipath/disk25      -      -      0     72      0  8.04M
    multipath/disk4       -      -      0     71      0  8.03M
    multipath/disk5       -      -      0     72      0  8.03M
    multipath/disk6       -      -      0     72      0  8.04M
  raidz2              3.68T  28.8T      0    320      0  31.9M
    multipath/disk7       -      -      0     85      0  8.05M
    multipath/disk8       -      -      0     86      0  8.05M
    multipath/disk9       -      -      0     86      0  8.06M
    multipath/disk26      -      -      0     84      0  8.06M
    multipath/disk11      -      -      0     85      0  8.06M
    multipath/disk12      -      -      0     86      0  8.05M
  raidz2              3.68T  28.8T      0    315      0  31.4M
    multipath/disk13      -      -      0     83      0  7.93M
    multipath/disk14      -      -      0     82      0  7.92M
    multipath/disk15      -      -      0     86      0  7.92M
    multipath/disk16      -      -      0     87      0  7.93M
    multipath/disk17      -      -      0     84      0  7.92M
    multipath/disk18      -      -      0     84      0  7.94M
  raidz2              3.68T  28.8T      0    302      0  31.9M
    multipath/disk19      -      -      0     76      0  8.01M
    multipath/disk20      -      -      0     76      0  8.01M
    multipath/disk21      -      -      0     84      0  8.04M
    multipath/disk22      -      -      0     83      0  8.04M
    multipath/disk23      -      -      0     85      0  8.04M
    multipath/disk24      -      -      0     81      0  8.05M
logs                      -      -      -      -      -      -
  mirror               128K  31.7G      0      0      0      0
    gpt/zil0              -      -      0      0      0      0
    gpt/zil1              -      -      0      0      0      0
cache                     -      -      -      -      -      -
  gpt/cache0           218G   282G      0    126      0  15.8M
  gpt/cache1           218G   282G      0    189      0  23.7M
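
On the LACP piece: once the DC actually finds the switch, the host side is just a lagg(4) interface. A minimal rc.conf sketch, with the NIC names and the address purely as placeholder assumptions since I don't know what is in this box:

Code:
# /etc/rc.conf -- LACP aggregation of two NICs (ix0/ix1 and the address are placeholders)
ifconfig_ix0="up"
ifconfig_ix1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto lacp laggport ix0 laggport ix1 192.168.1.10 netmask 255.255.255.0"

# Once the switch ports are configured, `ifconfig lagg0` should list each
# laggport as ACTIVE,COLLECTING,DISTRIBUTING when LACP has negotiated.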
 
To be honest, since the problems started appearing I could not sleep well at night. This is a big account and I felt responsible, so on top of risking the loss of a big client, I was prepared to absorb all the costs myself.

Now, the only thing that remains to be resolved is getting LACP fixed by the "remote hands" at the DC. I know it is hard to believe, but now THEY CAN'T find the switch!!!!!!!

I know situations like that. At the point the iXsystems box arrived (after throwing money at it) and still wasn't working, I would have literally gone crazy. I am glad to see you managed it.

Tell the DC guys to look in all the oversized Dell boxes they can find; maybe the remote hands are in one of them too.
 
Interestingly enough, the system was tested before shipping, so I guess iX did feel a bit responsible for the issues you had from the start.
Anyhow, glad to hear that it got sorted in the end.
//Danne
 

They did a 48-hour disk burn-in. They also installed FreeBSD 10.1-RELEASE and ran some checks.
 
That should've been covered as a matter of course, not as a friendly gesture. The shipping, on the other hand, should've been covered by the responsible party (the DC), but this issue shouldn't have been there in the first place if the system had been tested before shipping.
//Danne
 
That should've been covered as a matter of course, not as a friendly gesture.

It was covered as part of the warranty.

The shipping, on the other hand, should've been covered by the responsible party (the DC)

The DC is in no way responsible for covering the cost of shipping the server back. Their responsibility is to provide professional "remote hands".

but this issue shouldn't have been there in the first place if the system had been tested before shipping.
//Danne

I am not sure if this is correct. During shipping a lot of things can go wrong.
 