Cluster Time-saving Tricks
From Debian Clusters
SSH Keys
Both tricks below require authenticating on each of the worker nodes. In other words, when you SSH in to run a command, or run rsync (which runs over SSH), you'll be prompted for that machine's password. There's an easier way than typing the password every single time: setting up password-less SSH lets the machine you're connecting from authenticate automatically with the machine you're connecting to. This can be a real time saver!
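As a rough sketch of what that setup can look like from the head node, the functions below create a key pair and install the public key on each worker. Several things here are my assumptions rather than part of this page: the key type and location (ed25519 at ~/.ssh/id_ed25519), an empty passphrase, root logins, the ssh-copy-id helper being installed, and the helper names make_key and push_key, which are purely illustrative. You'll still be asked for each node's password one last time while the key is copied.

```shell
#!/bin/sh
# Sketch only: password-less SSH from the head node to every worker.
# Assumed (not from the original page): ed25519 key, empty passphrase,
# root logins, and ssh-copy-id being available.

KEY="${KEY:-$HOME/.ssh/id_ed25519}"        # assumed key location
SSH_KEYGEN="${SSH_KEYGEN:-ssh-keygen}"
SSH_COPY_ID="${SSH_COPY_ID:-ssh-copy-id}"

# Create the key pair once; skip if the private key already exists.
make_key() {
    mkdir -p -m 700 "$(dirname "$KEY")"
    [ -f "$KEY" ] || "$SSH_KEYGEN" -t ed25519 -f "$KEY" -N ""
}

# Install the public key on every host named in the file given as $1.
push_key() {
    for x in $(cat "$1"); do
        "$SSH_COPY_ID" -i "$KEY.pub" "root@$x"
    done
}

# Usage on the head node (hypothetical host list in ~/machines):
#   make_key && push_key ~/machines
```

An empty passphrase is convenient for scripting but means anyone who can read the private key can log in to the workers, so keep that file readable by root only.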
Running Commands with SSH
Whenever I need to do something on all of my nodes, say an apt-get update && apt-get upgrade, or run a command to see which MPI daemons are running, it takes a lot of time to ssh into each node, wait for the command to finish, try to remember which nodes I haven't done yet, and continue the process. I only have eight worker nodes and it's a pain; on a production cluster with tens or even hundreds of nodes, that would take much, much longer.
Rather than doing that, I use simple scripting to run commands. The first thing I do is keep a list of all my hosts in a file in root's home directory on the head node. It just looks like this:
eagle
goshawk
harrier
kestrel
kite
osprey
owl
peregrine
Pretty simple, but it saves a lot of time not having to recreate that list every time. Then, whenever I want to run, say, apt-get update, I write a little loop at the command line on the head node. You can type it as shown below, pressing Enter at the end of each line (the shell prints > as a continuation prompt until the loop is complete):
gyrfalcon:~# for x in `cat ~/machines`
> do
> ssh $x apt-get update
> done
or all on one line, like this:
for x in `cat ~/machines`; do ssh $x apt-get update; done
You can replace apt-get update with whatever command you want to run. If it's an interactive command (like apt-get upgrade), you'll have an interactive session with the host until that command finishes.
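If you find yourself retyping that loop a lot, one option is to wrap it in a small shell function. This is only a sketch: the name on_all, the MACHINES and SSH variables, and the per-host banner are my own additions, not something this page prescribes. It assumes the same ~/machines file described above.

```shell
#!/bin/sh
# Sketch only: run one command on every host listed in the machines file.
# on_all, MACHINES and SSH are illustrative names, not from the original page.

MACHINES="${MACHINES:-$HOME/machines}"   # host list, one name per word
SSH="${SSH:-ssh}"

on_all() {
    for x in $(cat "$MACHINES"); do
        echo "== $x =="                  # banner so output is easy to tell apart
        # Run the command; report a failure but keep going with the next host.
        "$SSH" "$x" "$@" || echo "!! $x failed (exit $?)" >&2
    done
}

# Usage on the head node:
#   on_all apt-get update
```

Printing the host name before each run makes it easier to see which node a given chunk of output (or an error) came from, which is the part I kept losing track of when doing this by hand.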
Copying a File to All the Nodes
I also use my ~/machines file on the head node for copying a file out to all the worker nodes with rsync. (You might have used rsync to image your worker nodes if you followed the Cloning Worker Nodes with Rsync tutorial.) If I want to copy, say, mycoolscript to all the nodes, I would run (from the head node):
gyrfalcon:~# for x in `cat ~/machines`
> do
> rsync -plarv ~/mycoolscript root@$x:~/
> done
or as one line,
for x in `cat ~/machines`; do rsync -plarv ~/mycoolscript root@$x:~/; done
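The same loop can be wrapped in a function so that any number of files can be pushed in one go. Again, this is just a sketch under my own assumptions: push_to_all, MACHINES and RSYNC are illustrative names, and I've shortened the flags to -av because rsync's -a (archive) option already implies -r, -l and -p.

```shell
#!/bin/sh
# Sketch only: copy the given files to root's home directory on every host.
# push_to_all, MACHINES and RSYNC are illustrative names, not from the page.

MACHINES="${MACHINES:-$HOME/machines}"   # host list, one name per word
RSYNC="${RSYNC:-rsync}"

push_to_all() {
    for x in $(cat "$MACHINES"); do
        # -a: archive mode (recursive, keeps links and permissions); -v: verbose
        "$RSYNC" -av "$@" "root@$x:~/" || echo "!! copy to $x failed" >&2
    done
}

# Usage on the head node:
#   push_to_all ~/mycoolscript
```

If you want to see what would be transferred before actually copying anything, rsync's -n (dry run) option can be added to the flags.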

