MPI through a firewall

Hello,

I want to run MPI jobs through a firewall on my little "cluster" of machines, but MPI communication does not work. Here's
what happens:

If I disable the firewalls on both machines it works:

Code:
Process 0 of 2 is on hostA
Process 1 of 2 is on hostB
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.016866

When I turn on the firewall on the host on which I invoke the job (hostA):

Code:
MPIEXEC_PORT_RANGE=10000:10010 mpirun -f ~/machinefile -np 2 ./cpi
Abort(816441615) on node 0: Fatal error in internal_Init: Other MPI error, error stack:
internal_Init(70)...........................: MPI_Init(argc=0x820e88648, argv=0x820e88640) failed
MPII_Init_thread(282).......................:
MPIR_init_comm_world(34)....................:
MPIR_Comm_commit(817).......................:
MPID_Comm_commit_post_hook(222).............:
MPIDI_world_post_init(689)..................:
MPIDI_OFI_init_vcis(830)....................:
check_num_nics(883).........................:
MPIR_Allreduce_allcomm_auto(4732)...........:
MPIR_Allreduce_intra_recursive_doubling(115):
MPIC_Sendrecv(259)..........................:
MPID_Isend(60)..............................:
MPIDI_isend(32).............................:
MPIDI_NM_mpi_isend(780).....................:
MPIDI_OFI_send_fallback(483)................: OFI call tsendv failed (default nic=re0: No such file or directory)
Abort(280095119) on node 1: Fatal error in internal_Init: Other MPI error, error stack:
internal_Init(70)...........................: MPI_Init(argc=0x8211cb138, argv=0x8211cb130) failed
MPII_Init_thread(282).......................:
MPIR_init_comm_world(34)....................:
MPIR_Comm_commit(817).......................:
MPID_Comm_commit_post_hook(222).............:
MPIDI_world_post_init(689)..................:
MPIDI_OFI_init_vcis(830)....................:
check_num_nics(883).........................:
MPIR_Allreduce_allcomm_auto(4732)...........:
MPIR_Allreduce_intra_recursive_doubling(115):
MPIC_Sendrecv(263)..........................:
MPIC_Wait(90)...............................:
MPIR_Wait(751)..............................:
MPIR_Wait_state(708)........................:
MPIDI_progress_test(142)....................:
MPIDI_OFI_handle_cq_error(788)..............: OFI poll failed (default nic=bge0: Input/output error)

I am running FreeBSD-13.5-RELEASE with MPICH-4.3.1 from packages on both machines. cpi is a simple example that comes with the MPICH source code.
The firewall is pf; I can post pf.conf if needed.

Thanks
sprock
 
One observation: two different Ethernet interfaces are listed in the output, re0 (in the node 0 error) and bge0 (in the node 1 error). Could it be a configuration problem?

And the other question is obvious: Do you know what protocol (UDP or TCP) and what port MPI is trying to use? Have you checked your pf configuration?
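
One way to narrow this down is to test, from the other host, which TCP ports in the range actually accept connections through pf. The following is only a minimal sketch under some assumptions: the helper names (parse_port_range, tcp_port_open, scan) are made up for illustration, it probes TCP only (not UDP), and a port reports as reachable only if something is listening on it at the time (for example a test listener you start yourself). It does not tell you which ports MPICH's data path picks; whether those fall inside the MPIEXEC_PORT_RANGE you set has not been established in this thread.

```python
import socket


def parse_port_range(spec):
    """Parse a 'LOW:HIGH' string (the format passed to MPIEXEC_PORT_RANGE
    in the original post) into a list of port numbers, inclusive."""
    low_text, high_text = spec.split(":")
    low, high = int(low_text), int(high_text)
    if not (0 < low <= high <= 65535):
        raise ValueError(f"invalid port range: {spec!r}")
    return list(range(low, high + 1))


def tcp_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    False means refused, filtered, timed out, or the host did not resolve;
    this sketch does not distinguish between those cases.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def scan(host, spec, timeout=1.0):
    """Probe every port in the range and return {port: reachable}."""
    return {port: tcp_port_open(host, port, timeout)
            for port in parse_port_range(spec)}
```

For example, with a test listener running on hostA on one of the ports, calling scan("hostA", "10000:10010") from hostB and printing the result shows whether the pass rules let the connection through. Comparing that with the pf log on hostA should show whether the blocked traffic is on ports outside the range.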
 
Thanks for your reply.

On hostA:

Code:
pass in log proto { tcp udp } to port {10000:10010}
pass out log proto { tcp udp } to port {10000:10010}

hostB has pf disabled for testing.
 