BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210916T132455Z
LOCATION:
DTSTART;TZID=Europe/Stockholm:20210706T173000
DTEND;TZID=Europe/Stockholm:20210706T190000
UID:submissions.pasc-conference.org_PASC21_sess182_post107@linklings.com
SUMMARY:P02 - Porting Nek5000 on GPUs Using OpenACC and CUDA
DESCRIPTION:Poster\n\nP02 - Porting Nek5000 on GPUs Using OpenACC and CUDA
 \n\nJocksch, Gong, Jansson, Peplinski, Gray...\n\nNek5000 is a spectral el
 ement code for fluid dynamics applications. We revisit the existing OpenAC
 C port [1,2] and obtain a speedup of 40% by rearranging loops. A distin
 ctive feature of the code is small dense matrix-matrix multiplications lea
 ding to an irregular memory access pattern. The dominant operation of thi
 s kind occurs in three main subroutines used by the linear solvers for pre
 ssure and velocities. The key hardware feature is the "shared memo
 ry" on the so-called streaming multiprocessor (SMX). The code loads all da
 ta into shared memory before computing the multiplications. Every spectra
 l element therefore has to fit on one SMX, so the polynomial order is limi
 ted by the size of the shared memory. Using shared memory yields an additi
 onal speedup of 15%. To achieve this, time-critical routines are rewritte
 n in CUDA C. The alternati
 ve existing implementation, nekRS, uses the abstraction layer libocca. Ou
 r shared memory approach should also be used for that code. Our study is c
 arried out for pipe simulations.\n\n[1] Otero, E., Gong, J., Min, M., Fisc
 her, P., Schlatter, P., and Laure, E., J. Parallel Distrib. Comput., Vol
 . 132, 69–78 (2019)\n[2] Gong, J. et al., J. Supercom
 put. 72, 4160–4180 (2016)
END:VEVENT
END:VCALENDAR
