This section describes how to configure OpenMPI to use InfiniBand.
OpenMPI uses IPoIB for job startup and tear-down. You should configure IPoIB on all of your hosts.
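The exact steps depend on your distribution. As a minimal sketch, on a Debian-style system an IPoIB interface might be defined in /etc/network/interfaces like this (ib0 and the 192.168.100.0/24 addresses are only placeholders; adjust them for your fabric):
auto ib0
iface ib0 inet static
    address 192.168.100.1
    netmask 255.255.255.0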
Ensure that the rdma_ucm module is loaded:
modprobe rdma_ucm
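On a Debian-style system you can also add the module to /etc/modules so that it is loaded automatically at boot, for example:
echo rdma_ucm >> /etc/modules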
Users who want to run MPI jobs need write permissions on the following devices:
/dev/infiniband/uverbs*
/dev/infiniband/rdma_cm*
The simplest way to do this is to add the users to the rdma group (see the example below). If that is not
suitable for your site, you can change the permissions and ownership of these devices by editing the
following udev rules:
/etc/udev/rules.d/50-udev.rules
/etc/udev/rules.d/91-permissions.rules
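For example, to add an existing user (the hypothetical account alice) to the rdma group on a Debian-style system:
adduser alice rdma
The user needs to log in again before the new group membership takes effect.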
OpenMPI needs to be able to pin (lock) large amounts of memory. Edit /etc/security/limits.conf and add the line:
* hard memlock unlimited
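The hard limit only caps what the soft limit may be raised to; depending on your PAM setup you may also need to raise the soft limit. A minimal sketch of the relevant limits.conf lines:
* soft memlock unlimited
* hard memlock unlimited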
Check that the mpitests package is installed:
aptitude install mpitests
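You can also verify that your OpenMPI build includes the openib BTL (the InfiniBand transport) with something like:
ompi_info | grep openib
This should print a line for the openib MCA component if InfiniBand support is compiled in.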
OpenMPI uses ssh to spawn jobs on remote hosts. Configure a public/private keypair so that you can ssh between hosts without entering a password, and make sure that logging in is silent (no MOTD banners or prompts), since stray output can confuse the remote startup.
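For example, to create a keypair and copy it to one of the test hosts (repeat for every host in your hostfile):
ssh-keygen -t rsa        # accept the defaults and leave the passphrase empty (or use ssh-agent)
ssh-copy-id HostB        # repeat for each remote host
ssh HostB hostname       # should print the hostname with no password prompt or extra output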
We will use the MPI PingPong benchmark for our testing. By default, OpenMPI should prefer InfiniBand networks over any TCP networks it finds. However, we will force MPI to ignore TCP networks to ensure that it is using the InfiniBand network.
#!/bin/bash
# Infiniband MPI test program
# Edit the hosts below to match your test hosts
cat > /tmp/hostfile.$$.mpi <<EOF
hostA slots=1
HostB slots=1
EOF
mpirun --mca btl_openib_verbose 1 --mca btl ^tcp -n 2 -hostfile /tmp/hostfile.$$.mpi IMB-MPI1 PingPong
If all goes well, you should see openib debugging messages from both hosts, together with the job output.
<snip>
# PingPong
[HostB][0,1,1][btl_openib_endpoint.c:992:mca_btl_openib_endpoint_qp_init_query] Set MTU to IBV value 4 (2048 bytes)
[HostB][0,1,1][btl_openib_endpoint.c:992:mca_btl_openib_endpoint_qp_init_query] Set MTU to IBV value 4 (2048 bytes)
[HostA][0,1,0][btl_openib_endpoint.c:992:mca_btl_openib_endpoint_qp_init_query] Set MTU to IBV value 4 (2048 bytes)
[HostA][0,1,0][btl_openib_endpoint.c:992:mca_btl_openib_endpoint_qp_init_query] Set MTU to IBV value 4 (2048 bytes)
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         1.53         0.00
            1         1000         1.44         0.66
            2         1000         1.42         1.34
            4         1000         1.41         2.70
            8         1000         1.48         5.15
           16         1000         1.50        10.15
           32         1000         1.54        19.85
           64         1000         1.79        34.05
          128         1000         3.01        40.56
          256         1000         3.56        68.66
          512         1000         4.46       109.41
         1024         1000         5.37       181.92
         2048         1000         8.13       240.25
         4096         1000        10.87       359.48
         8192         1000        15.97       489.17
        16384         1000        30.54       511.68
        32768         1000        55.01       568.12
        65536          640       122.20       511.46
       131072          320       207.20       603.27
       262144          160       377.10       662.96
       524288           80       706.21       708.00
      1048576           40      1376.93       726.25
      2097152           20      1946.00      1027.75
      4194304           10      3119.29      1282.34
If you encounter any errors, read the excellent OpenMPI troubleshooting guide at http://www.openmpi.org
If you want to compare InfiniBand performance with your Ethernet/TCP network, you can re-run the test with flags that tell OpenMPI to use the Ethernet network. (The example below assumes that your test nodes are connected via eth0.)
#!/bin/bash
# TCP MPI test program
# Edit the hosts below to match your test hosts
cat > /tmp/hostfile.$$.mpi <<EOF
hostA slots=1
HostB slots=1
EOF
mpirun --mca btl ^openib --mca btl_tcp_if_include eth0 --hostfile /tmp/hostfile.$$.mpi -n 2 IMB-MPI1 PingPong
You should notice significantly higher latencies than for the InfiniBand test.