HAST Unable to write synchronization data

thinkofit · Jun 20, 2011

Hello,

I had a working HAST configuration on two servers for two months: storclust01 and storclust02 with 3 resources (SATA1, SATA2, SATA3). Everything was in sync.

Now, after a reboot, storclust02 has the primary role but cannot sync to storclust01 (secondary role).

This is what I see in the log on the primary server:

Code:

Jun 20 15:27:32 storclust02 hastd[2180]: [SATA2] (primary) Unable to receive reply header: Resource temporarily unavailable.
Jun 20 15:27:32 storclust02 hastd[2180]: [SATA2] (primary) Disconnected from tcp4://10.0.0.1.
Jun 20 15:27:32 storclust02 hastd[2180]: [SATA2] (primary) Unable to write synchronization data: Invalid argument.
Jun 20 15:27:33 storclust02 hastd[2179]: [SATA1] (primary) Unable to receive reply header: Resource temporarily unavailable.
Jun 20 15:27:33 storclust02 hastd[2179]: [SATA1] (primary) Disconnected from tcp4://10.0.0.1.
Jun 20 15:27:33 storclust02 hastd[2179]: [SATA1] (primary) Unable to write synchronization data: Invalid argument.
Jun 20 15:27:33 storclust02 hastd[3727]: [SATA3] (primary) Unable to receive reply header: Resource temporarily unavailable.
Jun 20 15:27:33 storclust02 hastd[3727]: [SATA3] (primary) Disconnected from tcp4://10.0.0.1.
Jun 20 15:27:33 storclust02 hastd[3727]: [SATA3] (primary) Unable to write synchronization data: Invalid argument.

While on the secondary:

Code:

Jun 20 15:32:54 storclust01 hastd[2521]: [SATA2] (secondary) Unable to receive request header: Socket is not connected.
Jun 20 15:32:54 storclust01 hastd[2520]: [SATA3] (secondary) Unable to receive request header: Socket is not connected.
Jun 20 15:32:54 storclust01 hastd[2461]: [SATA2] (secondary) Worker process exited ungracefully (pid=2521, exitcode=75).
Jun 20 15:32:55 storclust01 hastd[2519]: [SATA1] (secondary) Unable to receive request header: Socket is not connected.
Jun 20 15:32:55 storclust01 hastd[2461]: [SATA3] (secondary) Worker process exited ungracefully (pid=2520, exitcode=75).
Jun 20 15:33:00 storclust01 hastd[2461]: [SATA1] (secondary) Worker process exited ungracefully (pid=2519, exitcode=75).

I'm running 8.2-STABLE on both servers, built from up-to-date sources.

Any hints would be appreciated.

Thank you

thinkofit · Jun 30, 2011

Hello, just a follow-up. I rebuilt kernel and world on both servers, initialized the resources again with hastctl create, then changed the roles accordingly, but now I get

Code:

Resource unique ID mismatch

On primary server:

Code:

Jun 30 15:43:42 storclust02 hastd[2129]: [SATA1] (primary) Resource unique ID mismatch (primary=17594788180985123033, secondary=16737981264370261780).
Jun 30 15:43:42 storclust02 hastd[2131]: [SATA3] (primary) Resource unique ID mismatch (primary=5719337528035064011, secondary=4022951110340696488).
Jun 30 15:43:42 storclust02 hastd[2130]: [SATA2] (primary) Resource unique ID mismatch (primary=1591740586151031606, secondary=14086390080612440148).

On secondary server:

Code:

Jun 30 15:44:22 storclust01 hastd[3248]: [SATA1] (secondary) Resource unique ID mismatch (primary=17594788180985123033, secondary=16737981264370261780).
Jun 30 15:44:22 storclust01 hastd[2759]: [SATA1] (secondary) Worker process exited ungracefully (pid=3248, exitcode=78).
Jun 30 15:44:22 storclust01 hastd[3249]: [SATA3] (secondary) Resource unique ID mismatch (primary=5719337528035064011, secondary=4022951110340696488).
Jun 30 15:44:23 storclust01 hastd[2759]: [SATA3] (secondary) Worker process exited ungracefully (pid=3249, exitcode=78).
Jun 30 15:44:23 storclust01 hastd[3250]: [SATA2] (secondary) Resource unique ID mismatch (primary=1591740586151031606, secondary=14086390080612440148).
Jun 30 15:44:28 storclust01 hastd[2759]: [SATA2] (secondary) Worker process exited ungracefully (pid=3250, exitcode=78).
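
To make the IDs easier to compare per resource, here is a minimal Python sketch that extracts them from log lines like the ones above. The regular expression and the name parse_mismatches are my own assumptions, based only on the log format shown here; nothing in it comes from HAST itself.

```python
import re

# Assumed pattern, derived only from the hastd log lines quoted in this thread.
MISMATCH_RE = re.compile(
    r"\[(?P<resource>[^\]]+)\] \((?P<role>primary|secondary)\) "
    r"Resource unique ID mismatch "
    r"\(primary=(?P<primary>\d+), secondary=(?P<secondary>\d+)\)"
)


def parse_mismatches(lines):
    """Return {resource: (primary_id, secondary_id)} for each mismatch line found."""
    found = {}
    for line in lines:
        match = MISMATCH_RE.search(line)
        if match:
            found[match.group("resource")] = (
                int(match.group("primary")),
                int(match.group("secondary")),
            )
    return found


if __name__ == "__main__":
    # One line copied from the primary's log above.
    sample = [
        "Jun 30 15:43:42 storclust02 hastd[2129]: [SATA1] (primary) "
        "Resource unique ID mismatch "
        "(primary=17594788180985123033, secondary=16737981264370261780).",
    ]
    for name, (primary_id, secondary_id) in sorted(parse_mismatches(sample).items()):
        print(name, "primary:", primary_id, "secondary:", secondary_id)
```

Running it over the logs from both nodes shows that each side reports the same pair of IDs for a given resource, so the two servers agree on what the IDs are; the IDs themselves simply differ between the nodes.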

I can't get them to synchronize again. Any ideas?

Thank you.
