
climate-hpc

Documentation for getting started in the Climate Modeling group at Princeton University

Table of Contents

  1. Getting an Account on the Computers
  2. Using the Computers
    1. Which Computer Should I Use for My Research Project?
    2. Logging in to the Computers
      1. tigercpu or tigressdata: use an ssh client
      2. Jupyterhub
      3. Connecting from off-campus: use the VPN
      4. What is your username? Princeton NetID
    3. Using the Computers
      1. The Operating System: Linux
      2. Using a Remote Desktop on tigressdata: TurboVNC
    4. Data Storage
    5. Data Import
  3. References for Further Learning

Getting an Account on the Computers

To get an account on the Tigress system, have your faculty advisor send a request to cses@princeton.edu for a new account on tiger.

An account on the tigercpu cluster gives you access to:

Using the Computers

Which Computer Should I Use for My Research Project?

When you get an account on the clusters to collaborate with researchers in the climate group, you are in effect given access to three machines: tigercpu, tigressdata and jupyterhub.

So where should you start?

Logging in to the Computers

tigercpu or tigressdata: use an ssh client

You log in to either tigercpu or tigressdata through the ssh protocol. The remote machines run an ssh server, and you use an ssh client to connect to it.

Which ssh client to use depends on the operating system on your laptop or desktop:

When connecting to a remote host, you may need to use its FQDN (Fully Qualified Domain Name) in your ssh client application, for example tigressdata.princeton.edu.
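As a minimal sketch, a terminal login from macOS or Linux could be prepared as follows. The host name tigressdata.princeton.edu is the one used in the scp commands later in this document; the NetID jdoe is a placeholder. The snippet only prints the command so that it can be checked before it is run.

```shell
# Placeholder NetID (assumption for illustration): replace with your own.
netid="jdoe"
# FQDN of tigressdata, as used in the scp examples in this document.
host="tigressdata.princeton.edu"

# Build the login target in the usual user@host form.
target="${netid}@${host}"

# Print the command first; run it yourself once it looks right.
echo "ssh ${target}"
```

The same user@host pattern applies to the other machines, each with its own FQDN.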

Jupyterhub

Jupyterhub is web-based: you access it by going to https://jupyter.rc.princeton.edu/hub/home in a web browser.

The jupyter.rc section explains how to run Jupyter notebooks on jupyterhub.

Connecting from off-campus: use the VPN

You can only access tigercpu, tigressdata or jupyterhub in either of two scenarios:

  1. you are on campus, or,
  2. you are using the VPN. The instructions for installing the VPN on your machine are here:

    GlobalProtect VPN: Installation Instructions

    The OIT Tech Clinic in the Frist Campus Center can help you install the VPN on your machine.

What is your username? Princeton NetID

Your username on the Research Computing machines is your Princeton NetID.

Unless you have an alias, your NetID is the first part of your Princeton email address. For instance, if your Princeton email address is jdoe@princeton.edu, then your NetID is most likely jdoe.

To confirm your NetID, go to the University’s website (https://www.princeton.edu), search for your name, click on the People result, and look for the NetID field.

Using the Computers

The Operating System: Linux

The Operating System (OS) on the Research Computing (RC) computers is called Linux. The best way to interact with those computers is through the command line, which is a departure from the Graphical User Interfaces (GUIs) that come with macOS or Windows.

Spend some time learning the fundamentals of the command line: it will make you more efficient, and avoiding it will cost you a lot of time. There are many resources online for learning Linux; here are some recommendations:

Using a Remote Desktop on tigressdata: TurboVNC

You can get a full Linux desktop environment on tigressdata through remote desktop software called TurboVNC. The primary use of TurboVNC is to run visualization software remotely in an efficient manner. There are two added benefits:

  1. Your TurboVNC session stays open until tigressdata is rebooted. This means that you can start working in one location, close your laptop, go somewhere else and resume your work: the processes you started are still running. In contrast, when you connect through an ssh client, your processes are killed as soon as the ssh session is dropped.
  2. Having a full graphical desktop environment makes it easier to interact with the operating system. You can use the graphical interface to manipulate and edit files for example. But remember that TurboVNC is only available on tigressdata.

To use TurboVNC you need to install and configure it. One good reference on how to use it on the RC systems is "How do I use VNC on Tigressdata?". The OIT Tech Clinic can also help you install it and use it on tigressdata.

Data Storage

There are multiple places where you can store the data for your project. There are two major types of storage:

  1. storage that is reserved for a specific user,
  2. storage that is shared with the climate modeling group.

The storage locations reserved for user NetID are:

The storage locations shared by the group are:

The figure below shows the different storage locations as well as the machines that can access them. A machine can access a storage location if it has an arrow pointing to it.

[Figure: storage locations and the machines that can access them]

There are three factors that differentiate the filesystems /home, (/tigress, /projects) and /scratch/gpfs:

  1. size: /home is limited, (/tigress, /projects) and /scratch/gpfs are large.
  2. speed of access: /home and /scratch/gpfs are fast, /tigress and /projects are slow.
  3. backup: /home is backed up every day, /tigress and /projects are backed up weekly, /scratch/gpfs is not backed up.
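To see the size difference for yourself, a minimal sketch such as the following can be run on one of the machines. It assumes only the four mount points named above and the standard df tool, and it skips any location that the current machine does not mount.

```shell
# Filesystems named in this section.
checked=0
for fs in /home /tigress /projects /scratch/gpfs; do
    if [ -d "$fs" ]; then
        # df -h reports total size, used space and free space in human-readable units.
        df -h "$fs" || echo "could not query $fs"
    else
        echo "$fs is not available on this machine"
    fi
    checked=$((checked + 1))
done
echo "checked ${checked} locations"
```

Note that df reports the capacity of the whole filesystem, not the space reserved for an individual user.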

Selecting a location for your data can be overwhelming at first. To get started, assuming that you are working in the Resplandy group, follow these steps:

  1. Create your own directory in /projects/GEOCLIM/LRGROUP, e.g.:

    $ mkdir /projects/GEOCLIM/LRGROUP/$USER
    

    where the shell automatically replaces $USER with your NetID.

  2. Store your data there.
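The first step can also be sketched with a few safety checks. This is only an illustration: the variable names are made up for the example, and the fallback to `id -un` is an assumption for shells where $USER is not set.

```shell
# Group directory from the step above; it only exists on the RC machines.
group_dir="/projects/GEOCLIM/LRGROUP"
# $USER is normally set by the login shell; fall back to `id -un` if it is not.
user="${USER:-$(id -un)}"
my_dir="${group_dir}/${user}"

if [ -d "$group_dir" ]; then
    # -p makes the command safe to re-run if the directory already exists.
    mkdir -p "$my_dir"
    # Show the owner and permissions of the directory.
    ls -ld "$my_dir"
else
    echo "$group_dir not found: run this on a machine that mounts /projects"
fi
```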

Data Import

There are multiple ways to get datasets onto the /tigress filesystem (which can be accessed by all the machines above):

  1. Download to local machine and transfer to remote (easy, but only works for medium-sized datasets that fit on your local hard disk)

Download to local machine and transfer to remote

Download your dataset to a location on your hard drive (e.g. ~/Downloads).

From there you can copy the file to the remote filesystem by using:

scp ~/Downloads/<yourfile> <username>@tigressdata.princeton.edu:/tigress/<username>/

The words in <...> need to be replaced with specific file names and your Princeton username (NetID). If you have set up SSH keys (e.g. if you log into tigressdata with ssh tigressdata), you can simplify the command above to:

scp ~/Downloads/<yourfile> tigressdata:/tigress/<username>/
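The short host name tigressdata in the command above has to be resolvable by your ssh client. One common way to arrange this, shown here as a sketch rather than as the group's required setup, is a Host entry in ~/.ssh/config. The NetID jdoe is a placeholder, and the stanza is written to a scratch file so that it can be reviewed before being added to ~/.ssh/config by hand.

```shell
# Placeholder NetID (assumption): replace with your own.
netid="jdoe"

# Write the stanza to a scratch file instead of editing ~/.ssh/config directly.
stanza_file="$(mktemp)"
cat > "$stanza_file" <<EOF
Host tigressdata
    HostName tigressdata.princeton.edu
    User ${netid}
EOF

# Review the result before copying it into ~/.ssh/config.
cat "$stanza_file"
```

With such an entry in place, both ssh tigressdata and the shortened scp command pick up the full host name and the username automatically.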

Now the file is in your folder on tigress and you can load it into your Jupyter notebook by using the path /tigress/<username>/<yourfile>.

Always make a README_<yourfile>.txt file that describes where you got the data (links) and what is in the file. Copy that .txt file to the remote filesystem the same way you copied the data file.
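A README of this kind could be generated as in the sketch below. The file name, the source URL and the list of fields are hypothetical and only illustrate the idea; the snippet writes into a scratch directory, whereas in practice the README would sit next to the data file.

```shell
# Hypothetical file name and source URL, used only for illustration.
datafile="sst_monthly.nc"
source_url="https://example.org/data/sst_monthly.nc"

# Scratch directory for the example.
workdir="$(mktemp -d)"
readme="${workdir}/README_${datafile}.txt"

# Record where the data came from, when it was fetched, and what it contains.
cat > "$readme" <<EOF
File:        ${datafile}
Source:      ${source_url}
Downloaded:  $(date +%Y-%m-%d)
Contents:    (describe variables, units, grid and time range here)
EOF

cat "$readme"
```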

References for Further Learning