BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210916T132446Z
LOCATION:Lise Girardin
DTSTART;TZID=Europe/Stockholm:20210705T133000
DTEND;TZID=Europe/Stockholm:20210705T140000
UID:submissions.pasc-conference.org_PASC21_sess105_pap108@linklings.com
SUMMARY:Accelerating Parallel CFD Codes on Modern Vector Processors Using 
 Blockettes
DESCRIPTION:Paper\n\nAccelerating Parallel CFD Codes on Modern Vector Proc
 essors Using Blockettes\n\nYildirim, Mader, Martins\n\nThe performance and
  scalability of computational fluid dynamics (CFD) solvers are essential f
 or many applications, including multidisciplinary design optimization. Wit
 h the evolution of high-performance computing resources such as Intel's Kn
 ights Landing and Skylake architectures in the Stampede2 cluster, CFD solv
 er performance can be improved by modifying how the core computations are 
 performed while keeping the mathematical formulation unchanged. In this wo
 rk, we introduce a cache-blocking method to improve memory-bound CFD codes
  that use structured grids. The overall idea is to split computational blo
 cks into smaller, fixed-size blockettes that are small enough to fit enti
 rely in the per-processor cache available on each architecture. This appr
 oach lets us take full advantage of modern vector instruction sets such a
 s AVX2 and AVX512 on these architectures. Using t
 his method, we have achieved up to 3.27 times speedup in the core routines
  of the open-source CFD solver, ADflow.\n\nDomain: CS and Math, Engineerin
 g
END:VEVENT
END:VCALENDAR
