MPI

Set up files and key for using MPI

MPI requires a way to access each machine in an MPI ring in order to set up and talk to the local MPI daemons and execute programs in parallel. This is usually done using SSH. Normally SSH requires a password for authentication, but no one wants to type a password ten times every time they start a parallel job. The answer is to use key-based authentication. For details see SSH. Note that if all machines in your ring have access to your home directory (e.g. the xphy machines in 704), you need only append the public key to authorized_keys instead of copying it to each machine with scp:

%>cd ~/.ssh
%>cat id_rsa.pub >> authorized_keys
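Running the cat line above more than once leaves duplicate entries in authorized_keys. A minimal sketch of a guard against that follows; append_key is a hypothetical helper name, not part of SSH, and it assumes the public key file holds a single line.

```shell
# Hypothetical helper (not part of SSH): append a public key to an
# authorized_keys file only if that exact line is not already present.
append_key() {
    pubkey_file=$1
    auth_file=$2
    touch "$auth_file"
    chmod 600 "$auth_file"
    # -x: match whole lines, -F: fixed strings, -f: read the pattern from the key file
    if ! grep -qxFf "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
}
```

It would be called as append_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys.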

MPI also requires a secret word of sorts to run. In your home directory create a file .mpd.conf, add your (arbitrary) secret word to it, and give it 600 permissions:

%>cd ~
%>echo "MPD_SECRETWORD=mysecretword" > .mpd.conf
%>chmod 600 .mpd.conf
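The same steps can be wrapped so that the file is never readable by others, even briefly. This is only a sketch: write_mpd_conf is a hypothetical helper name, and the directory and secret word are passed in as arguments.

```shell
# Hypothetical helper: write DIR/.mpd.conf containing the secret word,
# readable and writable by the owner only.
write_mpd_conf() {
    conf_dir=$1
    secret=$2
    # Subshell so the umask change does not leak into the caller.
    (
        umask 077
        printf 'MPD_SECRETWORD=%s\n' "$secret" > "$conf_dir/.mpd.conf"
    )
    # Also covers the case where the file already existed with looser permissions.
    chmod 600 "$conf_dir/.mpd.conf"
}
```

For the setup described here the call would be write_mpd_conf "$HOME" mysecretword.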


If the MPI binaries live in a nonstandard location, add the MPICH2 binary directory to the system path by adding the following line to the .bash_profile, .bashrc, or .profile file inside of your home directory. Note that this is unnecessary on the xphy computers.

%>export PATH=/usr/local/mpich2/bin:$PATH

For these changes to take effect, the file must be re-read, either by logging out and logging back in, or by sourcing the appropriate file with

%>source ~/.bashrc
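Each time the file is sourced, a plain export line prepends the directory again. One way to avoid that, sketched here with the hypothetical variable name MPICH2_BIN and the path used above, is to test for the directory first:

```shell
# Path taken from the text; adjust it to wherever MPICH2 is installed.
MPICH2_BIN=/usr/local/mpich2/bin

# Prepend the directory only if PATH does not already contain it.
case ":$PATH:" in
    *":$MPICH2_BIN:"*) ;;                          # already present, nothing to do
    *) PATH="$MPICH2_BIN:$PATH"; export PATH ;;
esac
```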

Basic MPI

Here we list the basic ingredients for running an MPI ring. First you need a file that lists the hosts that will be in your MPI ring. For example, in 704 you will want all xphy computers available for your ring. The xphy computers have IP addresses 192.168.2.x, where x ranges from 10 through 24. A quick way to make the list is the following:

%>for i in `seq 10 24`;do echo "192.168.2.$i";done > hosts

If the machines have multiple cores, you should tell MPI so that it can use every processor available (two processors means two programs running at the same time). You can check by running cat /proc/cpuinfo and counting the processors detected; they are labeled starting from 0. The count is given by appending :n to each hostname in your hosts file, where n is the number of processors. The xphy machines are dual core, so

%>for i in `seq 10 24`;do echo "192.168.2.$i:2";done > hosts

will do the trick.
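The two steps, counting processors and writing the hosts file, can be combined. The sketch below uses a hypothetical helper make_hosts and assumes every machine in the ring has the same number of processors as the one you run it on; if /proc/cpuinfo cannot be read it falls back to 1.

```shell
# Hypothetical helper: print one "address:ncpus" line per machine.
# Arguments: network prefix, first host number, last host number, CPUs per host.
make_hosts() {
    prefix=$1
    first=$2
    last=$3
    ncpus=$4
    i=$first
    while [ "$i" -le "$last" ]; do
        echo "$prefix.$i:$ncpus"
        i=$((i + 1))
    done
}

# /proc/cpuinfo has one "processor" line per detected processor (Linux only).
# Assumption: all machines in the ring match the local one.
cpus=$(grep -c '^processor' /proc/cpuinfo 2>/dev/null) || cpus=1

make_hosts 192.168.2 10 24 "$cpus" > hosts
```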

The command to start an MPI ring is mpdboot. Its basic form is

%>mpdboot -n n1 --ncpus=n2 -f hostfile

This specifies that n1 mpds will be started, n2 processors are available on the localhost (node 0), and hostfile will be used for the list of machines in the ring. If the hostfile contains fewer computers than n1, you will get an error. For example

%>mpdboot -n 2 -f hosts
%>mpdtrace
xphy1
xphy2

where mpdtrace tells you which computers are in your ring.
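Since asking for more mpds than the hostfile lists is an error, it can help to derive n1 from the file itself. The following is a sketch only: mpdboot_cmd is a hypothetical helper that prints the command rather than running it, and it assumes every non-blank line of the file is one machine.

```shell
# Hypothetical helper: print (do not run) an mpdboot command that
# starts one mpd for every host listed in the given file.
mpdboot_cmd() {
    hostfile=$1
    # Count the non-blank lines; each one names a machine in the ring.
    n=$(grep -c '[^[:space:]]' "$hostfile")
    echo "mpdboot -n $n -f $hostfile"
}
```

Once the printed command looks right, it can be run directly or piped to sh.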

Once a ring has been started, mpiexec is used to submit a job to the ring for parallel execution. This is done by

%>mpiexec -n n3 ./prog

which specifies that n3 processes of prog will be started (the ./ executes a program in the current directory, which is the most common scenario). Here, n3 need not equal the number n1 of physical machines in your ring. If it is smaller, only a subset of the MPI ring will be used. If it is larger, mpd will assign jobs in a round-robin fashion. If multiple CPUs are specified, each computer is given one job per CPU until all computers are taken, then the second CPU is assigned the second time through. If every CPU already has a process and there are still processes to assign, mpd will assign more than one job per CPU. This should generally be avoided as it will tend to hurt parallel performance.
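To stay at or below one job per CPU, n3 should not exceed the total number of CPUs in the ring. A rough way to get that number from the hosts file is sketched below; total_slots is a hypothetical helper, and it assumes the ring was booted with every host in the file and that entries without a :n suffix have one CPU.

```shell
# Hypothetical helper: sum the ":n" CPU counts in a hosts file.
# Lines without a ":n" suffix count as 1; blank lines are skipped.
total_slots() {
    awk -F: 'NF { sum += ($2 == "" ? 1 : $2) } END { print sum + 0 }' "$1"
}
```

A job that fills the ring exactly could then be started with mpiexec -n "$(total_slots hosts)" ./prog.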

Once an mpd ring is up (mpdboot), it stays up and many jobs can be submitted using mpiexec. The ring must then be shut down manually when you are done executing jobs. This is done with mpdallexit:

%>mpdallexit

MPE Graphics

The MPE graphics environment allows a single X screen to be written "in parallel" by all nodes in an MPI ring. For this to work, the nodes in the ring need to know which screen to write to and must have access to that screen. MPE uses a TCP connection to talk to the X server, so this must be enabled. If you are using gdm (Gnome Display Manager), allow TCP connections by editing the gdm.conf file thus:

DisallowTCP=false

A restart of gdm (/etc/init.d/gdm restart) is necessary for this change to take effect.

Now the nodes must know which screen to write to. This is done by passing the DISPLAY environment variable to the MPE program. It can either be set by hand in the MPE program, or the user environment can be set once and for all. Normally, the contents of DISPLAY will be

%>echo $DISPLAY
:0.0

which implicitly specifies localhost as the X server. If this variable is passed to the MPI ring, each node will then attempt to write to its local X server, which most likely it will not have permission to do. Even if it did, this is not the behavior we want. This can be remedied by explicitly specifying the actual hostname on which the X server lives. Thus

%>DISPLAY=`hostname``echo $DISPLAY`;export DISPLAY

will update the DISPLAY environment variable. If hostname is xphy1.physics.xterm.net and DISPLAY was originally :0.0, this sets DISPLAY to

xphy1.physics.xterm.net:0.0

This change can be set automatically at xterm logins by adding the above line to the .bashrc file (or the appropriate rc file for your shell). Now every node will know which machine to write to.
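One caveat with putting that line in .bashrc is that sourcing the file a second time would prepend the hostname again. A sketch of a guarded version follows; qualify_display is a hypothetical helper that prepends the host only when DISPLAY still begins with a colon.

```shell
# Hypothetical helper: return a DISPLAY value that names the X server host.
# Arguments: hostname, current DISPLAY value.
qualify_display() {
    host=$1
    display=$2
    case $display in
        :*) echo "$host$display" ;;   # no host part yet: prepend it
        *)  echo "$display" ;;        # already names a host (or is empty): leave as is
    esac
}
```

In .bashrc it would be used as DISPLAY=$(qualify_display "$(hostname)" "$DISPLAY"); export DISPLAY, which turns :0.0 into xphy1.physics.xterm.net:0.0 on the example machine and leaves an already qualified value alone.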

If nodes still have trouble writing to the X server, you may need to allow access via xhost by

%>xhost +

which grants everyone access, or more securely

%>xhost +username@host

where username is your username and host is some machine in your ring. This must then be repeated for every machine in your ring.
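Repeating the command by hand for each machine is tedious, so the list can be generated from the hosts file. This is a sketch under assumptions: xhost_cmds is a hypothetical helper, it reuses the username@host form shown above, and it only prints the commands so they can be inspected before being run.

```shell
# Hypothetical helper: print one xhost command per machine in a hosts file.
# Arguments: username, hosts file.
xhost_cmds() {
    user=$1
    hostfile=$2
    # Strip the optional ":n" CPU suffix and drop blank lines.
    sed -e 's/:.*$//' -e '/^[[:space:]]*$/d' "$hostfile" |
    while read -r host; do
        echo "xhost +$user@$host"
    done
}
```

Running xhost_cmds "$USER" hosts | sh on the machine that owns the X server would then execute them.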
