Bootcamp, Winter 2021

Getting Started with the Research Computing Clusters



Description

This session is a hands-on introduction to the research computing ecosystem at Princeton: the computing clusters (Perseus, Della, Tiger, Traverse), the Tigress storage system, and the Tigressdata machine for data visualization.

After an overview of the computing systems, their hardware, and the kinds of tasks each is geared toward, the session walks participants through connecting to the clusters, keeping programs running after disconnecting from a cluster, launching jobs through our scheduling software (SLURM), and accessing or installing any additional software they may need.

Participants will also learn the basic civics of working on Princeton’s shared systems, including where and when to store which sorts of files and data, guidance on how to request the right amount of memory and computing power (including choosing CPUs vs GPUs), and rules of thumb for avoiding delays in or interruptions of their computing jobs (some of which are Princeton-specific).

While this session covers the basics of launching and monitoring parallel jobs on Princeton’s systems, it does not teach how to write the code for such jobs. Parallel programming is a more extensive topic, covered in detail during Week 2 of the bootcamp.

Learning objectives

Attendees will come away with the basic skills needed to connect to a research computing cluster, navigate its environment and file system, run programs through the SLURM scheduler, and install and manage their software environment. Participants will also get a high-level overview of different parallel computing paradigms and guidance on how to assess their computing needs in order to use the Princeton resources judiciously.

Knowledge prerequisites

A working facility with the Linux command line is essential for this session. Prospective participants should complete the earlier Linux primer (or its equivalent) before taking this workshop. THERE WILL BE NO REVIEW OF COMMAND-LINE BASICS DURING THIS SESSION!

Hardware/software prerequisites

Participants in any PICSciE virtual workshop need a Princeton Zoom account. For this session, participants should also have an account on our Adroit cluster (an account on another cluster such as Tiger or Della is fine), and they should confirm at least 48 hours before the session that they can SSH into Adroit.

Details on all of the above can be found in the advance setup guide for PICSciE virtual workshops. Those who need extra help setting up should visit the “setup session” on Monday, January 18. THERE WILL BE LITTLE TO NO TROUBLESHOOTING DURING THE SESSION ITSELF!

Session format

Lecture, demo, and hands-on

Session Materials

All presentation materials are in this GitHub repo.

Session Recording

A recording of the session is here (requires an active Princeton NetID to view).