Over the past decade, computing architectures have evolved rapidly to keep increasing performance even though power constraints limit clock speeds. One growing trend is wider vector units, which let a single instruction process more data elements at once. To take advantage of this hardware-level vectorization, programmers need to know how to identify potentially vectorizable loops and how to optimize them for a given processor architecture.
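As a rough illustration of what "potentially vectorizable" means, the sketch below contrasts two loops. It is not taken from the workshop materials; the function names `scaled_add` and `running_sum` are hypothetical and chosen only for this example. In the first loop every iteration is independent, so a compiler is free to handle several elements per vector instruction. In the second, each iteration reads the result of the previous one, which typically prevents straightforward vectorization.

```c
#include <stddef.h>

/* Hypothetical example: y[i] = alpha * x[i] + y[i].
 * Each iteration touches only element i, and `restrict` promises the
 * compiler that x and y do not overlap, so iterations are independent
 * and the loop is a good candidate for vectorization. */
void scaled_add(size_t n, float alpha,
                const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = alpha * x[i] + y[i];
}

/* Hypothetical example: in-place running (prefix) sum.
 * Iteration i reads x[i - 1], which iteration i - 1 has just written.
 * This loop-carried dependency forces the iterations to run in order,
 * so the loop does not vectorize in this simple form. */
void running_sum(size_t n, float *x)
{
    for (size_t i = 1; i < n; ++i)
        x[i] += x[i - 1];
}
```

With GCC, for instance, compiling with `-O3 -fopt-info-vec` should report which loops were vectorized; other compilers offer comparable optimization reports, and the exact outcome depends on the compiler version and target architecture.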
This session provides a practical guide to making your code run faster on modern processor architectures through vectorization. After a brief introduction to the hardware, we will use Intel Advisor, a powerful profiling tool, to identify and then exploit vectorization opportunities in code. Hands-on examples will give attendees some familiarity with Advisor in a simple yet realistic setting.
This workshop is geared toward computational researchers who want to use the performance features of Intel hardware to speed up their C/C++ codes. Attendees will leave with a better understanding of the performance-boosting features of different computer architectures and will learn techniques for tuning their codes to take full advantage of them.
Attendees should have basic Linux skills, experience with C/C++, and a basic familiarity with the Princeton research computing clusters.
Participants in any PICSciE virtual workshop need a Princeton Zoom account. For this session, users should also have an account on the Adroit cluster, and they should confirm that they can SSH into Adroit at least 48 hours beforehand. Details on all of the above can be found in the advance setup guide for PICSciE virtual workshops.
Lecture, demo, and hands-on exercises
Presentation materials are here. Code samples for the hands-on exercises are in this GitHub repo.
A recording of the session is here (requires active Princeton NetID to view).