MPI through a firewall

Hello,

I want to run MPI jobs through a firewall on my little "cluster" of machines, but MPI communication does not work. Here's
what happens:

If I disable the firewalls on both machines it works:

Code:
Process 0 of 2 is on hostA
Process 1 of 2 is on hostB
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.016866

When I turn on the firewall on the host on which I invoke the job (hostA):

Code:
MPIEXEC_PORT_RANGE=10000:10010 mpirun -f ~/machinefile -np 2 ./cpi
Abort(816441615) on node 0: Fatal error in internal_Init: Other MPI error, error stack:
internal_Init(70)...........................: MPI_Init(argc=0x820e88648, argv=0x820e88640) failed
MPII_Init_thread(282).......................:
MPIR_init_comm_world(34)....................:
MPIR_Comm_commit(817).......................:
MPID_Comm_commit_post_hook(222).............:
MPIDI_world_post_init(689)..................:
MPIDI_OFI_init_vcis(830)....................:
check_num_nics(883).........................:
MPIR_Allreduce_allcomm_auto(4732)...........:
MPIR_Allreduce_intra_recursive_doubling(115):
MPIC_Sendrecv(259)..........................:
MPID_Isend(60)..............................:
MPIDI_isend(32).............................:
MPIDI_NM_mpi_isend(780).....................:
MPIDI_OFI_send_fallback(483)................: OFI call tsendv failed (default nic=re0: No such file or directory)
Abort(280095119) on node 1: Fatal error in internal_Init: Other MPI error, error stack:
internal_Init(70)...........................: MPI_Init(argc=0x8211cb138, argv=0x8211cb130) failed
MPII_Init_thread(282).......................:
MPIR_init_comm_world(34)....................:
MPIR_Comm_commit(817).......................:
MPID_Comm_commit_post_hook(222).............:
MPIDI_world_post_init(689)..................:
MPIDI_OFI_init_vcis(830)....................:
check_num_nics(883).........................:
MPIR_Allreduce_allcomm_auto(4732)...........:
MPIR_Allreduce_intra_recursive_doubling(115):
MPIC_Sendrecv(263)..........................:
MPIC_Wait(90)...............................:
MPIR_Wait(751)..............................:
MPIR_Wait_state(708)........................:
MPIDI_progress_test(142)....................:
MPIDI_OFI_handle_cq_error(788)..............: OFI poll failed (default nic=bge0: Input/output error)

I'm running FreeBSD-13.5-RELEASE with MPICH-4.3.1 from packages on both machines. cpi is a simple example that comes with the MPICH source code.
The firewall is pf; I will post pf.conf if needed.

Thanks
sprock
 
One observation: there are two different Ethernet interfaces listed in the output, re0 and bge0. Could it be a configuration problem?

And the other question is obvious: do you know which protocol (UDP or TCP) and which ports MPI is trying to use? Have you checked your pf configuration?
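One way to answer the port question empirically is to probe the configured range from hostA. A minimal sketch in Python (the hostname hostB and the 10000:10010 range are taken from the mpirun invocation above; adjust to your setup):

```python
import socket

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Probe the range mpirun was told to use (MPIEXEC_PORT_RANGE=10000:10010).
    # A port shows "open" only if something is listening AND pf passes it,
    # so run this while an MPI job is (attempting to) start.
    for port in range(10000, 10011):
        print(port, "open" if port_open("hostB", port) else "closed/filtered")
```

Note this only covers TCP; it won't tell you anything about UDP or about traffic the OFI layer sends outside that range.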
 
Allow the TCP port range 10000 to 10010 (the one specified by MPIEXEC_PORT_RANGE=10000:10010) between hostA and hostB in your firewall.
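For reference, a pf.conf sketch of such rules (untested; the interface name re0 and the peer address are assumptions — substitute your own, and note that mpiexec's process-manager connections may use ports outside this range):

Code:
# sketch only -- adapt interface and peer address to your setup
ext_if = "re0"              # interface facing the cluster (assumption)
mpi_peer = "192.0.2.2"      # hostB's address (placeholder)
mpi_ports = "10000:10010"   # matches MPIEXEC_PORT_RANGE

pass in  quick on $ext_if proto tcp from $mpi_peer to any port $mpi_ports
pass out quick on $ext_if proto tcp from any to $mpi_peer port $mpi_ports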
 
Thanks for your reply.

On hostA:

Code:
pass in log proto { tcp udp } to port {10000:10010}
pass out log proto { tcp udp } to port {10000:10010}

hostB has pf disabled for testing.
 