BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210916T132446Z
LOCATION:Lise Girardin
DTSTART;TZID=Europe/Stockholm:20210705T133000
DTEND;TZID=Europe/Stockholm:20210705T140000
UID:submissions.pasc-conference.org_PASC21_sess105_pap108@linklings.com
SUMMARY:Accelerating Parallel CFD Codes on Modern Vector Processors Using
  Blockettes
DESCRIPTION:Paper\n\nAccelerating Parallel CFD Codes on Modern Vector
  Processors Using Blockettes\n\nYildirim\, Mader\, Martins\n\nThe
  performance and scalability of computational fluid dynamics (CFD)
  solvers are essential for many applications\, including
  multidisciplinary design optimization. With the evolution of
  high-performance computing resources such as Intel's Knights Landing
  and Skylake architectures in the Stampede2 cluster\, CFD solver
  performance can be improved by modifying how the core computations are
  performed while keeping the mathematical formulation unchanged. In this
  work\, we introduce a cache-blocking method to improve memory-bound CFD
  codes that use structured grids. The overall idea is to split
  computational blocks into smaller\, fixed-sized blockettes that are
  sufficiently small to completely fit into the available cache size
  per-processor on each architecture. We can fully take advantage of
  modern vector instruction sets such as AVX2 and AVX512 on these modern
  architectures with this approach. Using this method\, we have achieved
  up to 3.27 times speedup in the core routines of the open-source CFD
  solver\, ADflow.\n\nDomain: CS and Math\, Engineering
END:VEVENT
END:VCALENDAR