BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210916T132456Z
LOCATION:
DTSTART;TZID=Europe/Stockholm:20210706T173000
DTEND;TZID=Europe/Stockholm:20210706T190000
UID:submissions.pasc-conference.org_PASC21_sess182_post153@linklings.com
SUMMARY:P27 - Single- and Two-Level Dynamic Load Balancing of Scientific A
 pplications
DESCRIPTION:Poster\n\nP27 - Single- and Two-Level Dynamic Load Balancing
  of Scientific Applications\n\nMohammed\, Eleliemy\, Müller
  Korndörfer\, Cabezón\, Ciorba\n\nModern high-performance computing
  (HPC) systems exhibit increased hardware parallelism at node level
  (hundreds of compute cores) and at system level (hundreds of nodes).
  Exploiting this hardware parallelism across and within compute nodes\,
  optimally and concurrently\, is challenging. Computationally intensive
  scientific applications often consist of large\, data-parallel loops\,
  typically parallelized to exploit the multi-level hardware parallelism
  across nodes using MPI and within nodes using OpenMP. Efficient
  scheduling of work in such scientific applications is crucial to
  achieving high performance. Current scheduling approaches statically
  and/or repeatedly partition and assign the work across the allocated
  nodes using MPI and employ standard OpenMP work sharing methods among
  the cores within a node. In this work\, we show the impact of load
  balancing at both MPI and OpenMP levels on the overall performance of
  scientific applications. To this end\, we employ state-of-the-art
  dynamic loop scheduling methods implemented in a load balancing
  library for MPI processes (DLS4LB) as well as in a runtime load
  balancing library for OpenMP threads (eOMPRTL). Dynamic load balancing
  (DLB) at both MPI and OpenMP levels yielded significant performance
  gains over DLB at either level alone and revealed the interplay
  between load imbalance at the process and thread levels.
END:VEVENT
END:VCALENDAR
