Table of Contents
- Getting an Account on the Computers
- Using the Computers
- References for Further Learning
Getting an Account on the Computers
To get an account on the Tigress system, have your faculty advisor send a request to cses@princeton.edu for a new account on tiger.
An account on the tigercpu cluster gives you access to:
- tigercpu: the cluster,
- tigressdata: a visualization node,
- jupyter.rc: a JupyterHub host.
Using the Computers
Which Computer Should I Use for My Research Project?
When you get an account on the clusters to collaborate with researchers in the climate group, you are in effect given access to three machines, tigercpu, tigressdata and jupyterhub:
- tigercpu is used to run simulations that require a lot of computational power.
- tigressdata is used for visualization and post-processing jobs. This machine runs a remote desktop, which can make it easier to interact with the data.
- jupyterhub is a Jupyter notebook server. It runs the JupyterHub software.
So where should you start?
- If all you need is to run some Jupyter notebooks on some climate data, then jupyterhub is the best place.
- If you need to run other programs such as Matlab, IDL, or ncview on climate data, then use tigressdata. One easy way to use tigressdata is through the remote desktop TurboVNC.
- When you get started, unless you are going to run simulations, you should only access tigressdata.
- If you need to run computationally intensive jobs, then you will need to use tigercpu. It’s best to talk with your research advisor to determine whether your work will require tiger.
Logging in to the Computers
tigercpu or tigressdata: use an ssh client
You log in to either tigercpu or tigressdata through the ssh protocol. The remote machines run the ssh server, and you use an ssh client to connect to it.
Which ssh client to use depends on the operating system on your laptop or desktop:
- macOS comes with an ssh client, so you don’t need to install anything. To access it you will need to start the Terminal application.
- Windows: Windows does not come with an ssh client installed, so you need to install one yourself. There are multiple ssh clients available for Windows. OIT recommends using MobaXterm. Other popular options are PuTTY and, on Windows 10 and higher, the Windows Subsystem for Linux (WSL).
If you need help installing or connecting to the remote computers, you can go to the OIT Tech Clinic in the Frist Campus Center.
When connecting to a remote host, you may need to use the FQDN (Fully Qualified Domain Name) in your ssh client application. The FQDNs are:
- tigercpu.princeton.edu,
- tigressdata.princeton.edu.
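As a minimal sketch of what you would type in a terminal-based ssh client, the snippet below builds the login command for each machine from its FQDN. The NetID jdoe is a placeholder, and the commands are printed rather than run, so you can check them before connecting.

```shell
# Hypothetical NetID for illustration; replace it with your own.
netid="jdoe"

# Short machine names from the list above; each FQDN ends in .princeton.edu.
for machine in tigercpu tigressdata; do
    fqdn="${machine}.princeton.edu"
    # Print the command instead of running it, so nothing connects by accident.
    echo "ssh ${netid}@${fqdn}"
done
```

To actually log in, type one of the printed commands yourself; this only works from campus or over the VPN.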
Jupyterhub
Jupyterhub is web based: you access it by going to https://jupyter.rc.princeton.edu/hub/home in a web browser.
The jupyter.rc section explains how to run Jupyter notebooks on jupyterhub.
Connecting from off-campus: use the VPN
You can only access tigercpu, tigressdata or jupyterhub in either of two scenarios:
- you are on campus, or
- you are using the VPN. The instructions for installing the VPN on your machine are here: GlobalProtect VPN: Installation Instructions
The OIT Tech Clinic in the Frist Campus Center can help you install the VPN on your machine.
What is your username? Princeton NetID
Your username on the Research Computing machines is your Princeton NetID.
Unless you have an alias, your NetID is the first part of your Princeton email address. For instance, if your Princeton email address is jdoe@princeton.edu, then your NetID is most likely jdoe.
To be sure what your NetID is, go to the University’s web site, https://www.princeton.edu, search for your name, click on the People result, and look for the NetID field.
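The rule of thumb above (NetID is the part of the email address before the @) can be sketched in the shell as follows. The address is the example one from the text, and the result is only a guess if you have an email alias.

```shell
# Example address from the text above; replace it with your own.
email="jdoe@princeton.edu"

# Strip everything from the first "@" onward to keep only the local part.
netid="${email%%@*}"

echo "$netid"
```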
Using the Computers
The Operating System: Linux
The Operating System (OS) on the Research Computing (RC) computers is called Linux. The best way to interact with those computers is through the command line, which is a departure from the Graphical User Interfaces (GUI) that come with macOS or Windows.
You should spend some time learning the fundamentals of the command line. Not only will it make you more efficient, but avoiding it will cost you a lot of time. There are a lot of resources online to learn Linux; here are some recommendations:
- Linux Tutorial - Learn the Bash Command Line: this is a well written tutorial that covers the basic operations. It is a good place to get started.
- LinuxCommand.org: Learning the shell:
  - This is written from the point of view of someone running Linux on their local machine, so the first part is about getting a shell on your local machine. In your case you do not need a Linux shell on your local machine; instead you connect to either tigressdata or tigercpu to access one. These sections are relevant and important:
    - Learning the shell - Lesson 2: Navigation
    - Learning the shell - Lesson 3: Looking around
    - You can ignore: Learning the shell - Lesson 4: A Guided Tour
    - Learning the shell - Lesson 5: Manipulating Files
    - Learning the shell - Lesson 6: Working with Commands
    - Initially, you can ignore: Learning the shell - Lesson 7: I/O Redirection
    - Learning the shell - Lesson 8: Expansion
    - Learning the shell - Lesson 9: Permissions
    - Learning the shell - Lesson 10: Job Control
  - You can ignore the rest at the beginning: LinuxCommand.org: Writing shell scripts.
- Software Carpentry: The Unix Shell
  - and a summary can be found here: Intro to Unix.
- If you prefer learning by watching videos, you can look at these:
  - LinkedIn: Unix for macOS Users: Princeton University has a subscription to LinkedIn Learning. Even though it is written for macOS, sections 1 through 8 are relevant to Linux.
  - O’Reilly: Linux command line: Princeton University has a subscription to O’Reilly.
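To give a flavour of the basic operations those tutorials cover (navigating, looking around, manipulating files), here is a short sketch you can paste into any shell. It works in a throwaway directory created with mktemp, and the directory and file names are made up for illustration.

```shell
# Create a scratch directory so nothing in your real folders is touched.
workdir="$(mktemp -d)"
cd "$workdir"

pwd                               # print the directory you are currently in
mkdir project                     # create a new directory
cd project                        # move into it
echo "hello" > notes.txt          # create a small text file
ls -l                             # list the files with sizes and dates
cp notes.txt notes_backup.txt     # copy a file
cat notes_backup.txt              # print the contents of the copy
cd ..                             # go back up one level
ls                                # the "project" directory is listed here
```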
Using a Remote Desktop on tigressdata: TurboVNC
You can get a full Linux desktop environment on tigressdata through remote desktop software called TurboVNC. The primary use of TurboVNC is to run visualization software remotely in an efficient manner. There are two added benefits:
- Your TurboVNC session stays open until tigressdata is rebooted. This means that you can start working in one location, close your laptop, go somewhere else and resume your work: the processes you started are still running. Contrast this with connecting through the ssh client, where your processes are killed as soon as the ssh session is dropped.
- Having a full graphical desktop environment makes it easier to interact with the operating system. You can use the graphical interface to manipulate and edit files, for example. But remember that TurboVNC is only available on tigressdata.
To use TurboVNC you need to install and configure it. One good reference on how to use it on the RC systems is: How do I use VNC on Tigressdata?
The OIT Tech Clinic can also help you install it and use it on tigressdata.
Data Storage
There are multiple places where you can store the data for your project. There are two major types of storage:
- storage that is reserved for a specific user,
- storage that is shared with the climate modelling group.
The storage locations reserved for user NetID are:
- /home/NetID: each of the three machines has its own /home partition that only that specific machine can access.
- /tigress/NetID and /scratch/gpfs/NetID: all three machines share these partitions. Note that /scratch/gpfs/NetID is accessed from tigressdata and jupyter as /tiger/scratch/gpfs/NetID.
The storage locations shared by the group are:
- /projects/GEOCLIM and /tigress/GEOCLIM: /tigress/GEOCLIM is an alias (a symbolic link) to /projects/GEOCLIM.
- /scratch/gpfs/GEOCLIM: Note that /scratch/gpfs/GEOCLIM is accessed from tigressdata and jupyter as /tiger/scratch/gpfs/GEOCLIM.
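The path difference for the scratch space is easy to get wrong, so here is a small sketch that returns the right path for a given machine. The function name scratch_path is made up for illustration, and it assumes, as the notes above imply, that tigercpu sees the plain /scratch/gpfs path.

```shell
# Hypothetical helper: print the scratch path as seen from a given machine.
# $1: machine name (tigercpu, tigressdata or jupyter)
# $2: NetID or group name (e.g. GEOCLIM)
scratch_path() {
    machine="$1"
    name="$2"
    case "$machine" in
        tigercpu*)             echo "/scratch/gpfs/${name}" ;;
        tigressdata*|jupyter*) echo "/tiger/scratch/gpfs/${name}" ;;
        *) echo "unknown machine: ${machine}" >&2; return 1 ;;
    esac
}

scratch_path tigercpu GEOCLIM
scratch_path tigressdata GEOCLIM
```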
The figure below shows the different storage locations as well as the machines that can access them. A machine can access a storage location if it has an arrow pointing to it.
There are three factors that differentiate the filesystems /home, (/tigress, /projects) and /scratch/gpfs:
- size: /home is limited, (/tigress, /projects) and /scratch/gpfs are large.
- speed of access: /home and /scratch/gpfs are fast, /tigress and /projects are slow.
- backup: /home is backed up every day, /tigress and /projects are backed up weekly, /scratch/gpfs is not backed up.
Selecting a location for your data can be overwhelming at first, so to get started, assuming that you are working in the Resplandy group, follow these steps:
- Create your own directory in /projects/GEOCLIM/LRGROUP, e.g.:
  $ mkdir /projects/GEOCLIM/LRGROUP/$USER
  where $USER is automatically replaced by your NetID.
- Store your data there.
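The first step can be sketched a little more defensively, as below. This is an illustration rather than the group's official procedure: the variable names are made up, and the check on the group directory is only there so that the snippet prints a hint instead of failing when it is run on a machine that does not mount /projects.

```shell
# Group directory named in the steps above.
group_dir="/projects/GEOCLIM/LRGROUP"

# $USER normally holds your NetID on the cluster; fall back to `id -un` if it is unset.
netid="${USER:-$(id -un)}"
my_dir="${group_dir}/${netid}"

if [ -d "$group_dir" ]; then
    # -p makes this safe to re-run: no error if the directory already exists.
    mkdir -p "$my_dir"
    ls -ld "$my_dir"
else
    echo "Cannot see ${group_dir}; are you logged in to tigressdata or tigercpu?"
fi
```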
Data import
Getting datasets onto the tigress filesystem (which can be accessed by all the machines above) can be done in multiple ways:
- Download to local machine and transfer to remote (easy, but only works for medium-sized datasets that fit onto your local hard disk)
Download to local machine and transfer to remote
Download your dataset to a location on your hard drive (e.g. ~/Downloads).
From there you can copy the file to the remote filesystem by using
scp ~/Downloads/<yourfile> <username>@tigressdata.princeton.edu:/tigress/<username>/
The words in <...> need to be replaced with specific file names and your Princeton username (your NetID).
If you have set up SSH keys (e.g. if you log into tigressdata with ssh tigressdata), you can simplify the command above to:
scp ~/Downloads/<yourfile> tigressdata:/tigress/<username>/
Now the file is in your folder on tigress and you can load it into your Jupyter notebook by using the path /tigress/<username>/<yourfile>.
Always make a README_<yourfile>.txt file that describes where you got the data (links) and what is in the file. Copy that .txt file like you did the data file.
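One possible way to write such a README from the command line is sketched below. The file name, the source link and the fields in the README are all placeholders chosen for illustration; adapt them to your dataset. The file is written to a temporary directory here, whereas in practice you would create it next to the data file (e.g. in ~/Downloads).

```shell
# Hypothetical file name and source link, for illustration only.
datafile="sst_monthly.nc"
source_url="https://example.org/data/sst_monthly.nc"

# Work in a throwaway directory; on your machine you would use ~/Downloads instead.
outdir="$(mktemp -d)"
readme="${outdir}/README_${datafile}.txt"

# Record where the data came from and what the file contains.
cat > "$readme" <<EOF
File: ${datafile}
Source: ${source_url}
Downloaded on: $(date +%Y-%m-%d)
Contents: describe the variables, units and time range here
EOF

cat "$readme"
```

You can then transfer the README with the same scp command you used for the data file.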
References for Further Learning
- This is an interesting online book that Julius Busecke contributed to: An Introduction to Earth and Environmental Data Science.