Server Tips

Check current utilization: gsemon

You can use the command gsemon on any of our clusters. It opens a figure showing the current utilization of the GSE CPU cluster or, if you give the flag --gpu, of the GSE GPU cluster. This is useful if you plan to run a heavy job and want to know which machines are currently busy. Type gsemon --help to see all the options that the command takes. (Note: You need to connect to the server with the -Y flag, i.e., ssh -Y user@server, so that the figure can be displayed on your computer.)

gsemon

Jupyter on the server

If you want to use Jupyter (or IPython in general) on the cluster, it is best to start a kernel on the cluster and connect to it from your local computer.

  1. Log in to the server (e.g., palmyra), and start your jupyter kernel in the following way (you need to have your own Python environment installed):

    nice jupyter lab --no-browser --port=xxxx & disown
    

    (=> Reminder: Always use nice in front of your heavy commands on the server!)
    xxxx is a four-digit number of your choice. The kernel will start, and you should see a long link of the form http://localhost:xxxx/lab?token=a-long-string-of-random-characters in the output. The important parts of it are:

    1. The four-digit number xxxx: if it differs from the one you entered, that port was already in use and you have to use the number shown instead.

    2. The entire link: copy it, you will need it in step 4.

  2. Once you have copied the link, you can log out of the server again.

  3. On your local machine, forward this port (the four-digit number) to the server in the following way:
    ssh -N -L xxxx:localhost:xxxx username@servername.citg.tudelft.nl
    => Replace xxxx with your four digits, username with your NetID, and servername with the name of the server where you started the kernel. Note: You need to be in the TUD network for this to work.

  4. Open your local browser and paste the long link. Voilà, you have a Jupyter instance in your local browser, computing on the server.
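
The tunnel command from step 3 can be wrapped in a small shell helper, sketched below. The function names valid_port and tunnel_cmd, the port 8642, and the NetID jdoe are hypothetical placeholders; only the ssh command line itself comes from the steps above. The port check additionally assumes an unprivileged port (1024 or higher), since lower ports are normally reserved for root.

```shell
# Hypothetical helper: succeed only for a four-digit, unprivileged port (1024-9999).
valid_port() {
    case "$1" in
        [0-9][0-9][0-9][0-9]) [ "$1" -ge 1024 ] ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print the local tunnel command from step 3.
# Arguments: port, NetID, server name (without the domain).
tunnel_cmd() {
    if ! valid_port "$1"; then
        echo "port must be a four-digit number of at least 1024" >&2
        return 1
    fi
    printf 'ssh -N -L %s:localhost:%s %s@%s.citg.tudelft.nl\n' "$1" "$1" "$2" "$3"
}

# Example values (placeholders): port 8642, NetID jdoe, server palmyra.
tunnel_cmd 8642 jdoe palmyra
```

The helper only prints the command, so you can inspect it before running it. If Jupyter reported a different port than the one you asked for, pass the reported one.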

Stopping the kernel is unfortunately a bit trickier.

  1. Log in to the server.

  2. Try jupyter lab stop xxxx (again, xxxx are your four digits). This can work, but often it does not, throwing an error.

  3. If the second step throws an error, type ps aux | grep xxxx, read the PID of your jupyter process from the second column, and kill it via kill PID.
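
Step 3 can be made a little less manual by letting awk pick the PID column out of the ps aux output. This is only a sketch: the function name jupyter_pids and the port 8642 are hypothetical, and it assumes the server was started with --port=xxxx as in the start command above.

```shell
# Hypothetical helper: read `ps aux` output on stdin and print the PID
# (second column) of every line that mentions --port=<port>.
# Lines containing "awk" are skipped so the filter does not report itself.
jupyter_pids() {
    awk -v pat="--port=$1" 'index($0, pat) && !/awk/ { print $2 }'
}

# Usage on the server (8642 is a placeholder port):
#   ps aux | jupyter_pids 8642      # prints the matching PID(s)
#   kill PID                        # replace PID with the number printed
```

Note that the match is on the port you typed when starting Jupyter, which is what appears in the process list even if Jupyter ended up on a different port.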

If .bashrc is not sourced: .profile

Logging in through ssh to a server might not source the .bashrc file. To fix this, create a file called .profile in your home directory with the following content:

# if running bash
if [ -n "$BASH_VERSION" ]; then
    # include .bashrc if it exists
    if [ -f "$HOME/.bashrc" ]; then
        . "$HOME/.bashrc"
    fi
fi
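
One way to check that this snippet behaves as intended is to try it in a throwaway home directory before touching your real one. The sketch below assumes bash is installed; the temporary directory and the variable BASHRC_LOADED are made up for the check.

```shell
# Build a throwaway home directory with a marker .bashrc and the .profile from above.
tmp_home=$(mktemp -d)
echo 'export BASHRC_LOADED=yes' > "$tmp_home/.bashrc"
cat > "$tmp_home/.profile" <<'EOF'
# if running bash
if [ -n "$BASH_VERSION" ]; then
    # include .bashrc if it exists
    if [ -f "$HOME/.bashrc" ]; then
        . "$HOME/.bashrc"
    fi
fi
EOF

# Start a login shell (as ssh does) with the throwaway home and report the marker.
profile_check=$(HOME="$tmp_home" bash --login -c 'echo "${BASHRC_LOADED:-no}"' | tail -n 1)
echo "BASHRC_LOADED=$profile_check"
rm -rf "$tmp_home"
```

Keep in mind that a bash login shell reads only the first of ~/.bash_profile, ~/.bash_login, and ~/.profile that it finds, so .profile is ignored if one of the other two exists in your home directory.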

Running jobs while terminal is closed or terminated

Starting a job/application in a terminal with your normal command (always use nice!) and finishing with an &,

[jthorbecke@samoa ~]$ nice ./application_run &

places the job in the background. The command jobs shows the jobs running in the background of the terminal. These jobs keep running while the terminal is active, but closing the terminal also terminates all jobs running in its background. To avoid this, you can use the command nohup (no hangup).

Starting the job with nohup and &,

[jthorbecke@samoa ~]$ nice nohup ./application_run &

also places the job in the background, and in addition the job is not terminated when the terminal is closed. The output of ./application_run is collected in the file nohup.out.
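
If you prefer a named log file over nohup.out, you can redirect the output yourself; nohup only writes to nohup.out when standard output is a terminal. A minimal sketch: the sh -c '...' command is a stand-in for ./application_run so that the lines can be tried anywhere, and the log path is a hypothetical choice.

```shell
log_file="${TMPDIR:-/tmp}/application_run.log"   # hypothetical log location

# Stand-in for ./application_run; stdout and stderr both go to the log file.
nice nohup sh -c 'echo "job started"; sleep 1; echo "job finished"' > "$log_file" 2>&1 &
job_pid=$!                 # PID of the background job, useful for kill later
echo "started job $job_pid"

# Only for this demonstration: wait for the job, then show its output.
wait "$job_pid"
cat "$log_file"
```

In real use you would drop the wait and cat lines, close the terminal, and inspect the log file later, for example with tail -f.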

There is also the command screen, which sets up a terminal (screen) session that you can enter and leave from your terminal. This screen session keeps running when the terminal is closed. After opening a new terminal, you can attach the screen session again, and all commands and running jobs of that session are still there. The manual page man screen has more information.