BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210916T132447Z
LOCATION:Ernesto Bertarelli
DTSTART;TZID=Europe/Stockholm:20210709T120000
DTEND;TZID=Europe/Stockholm:20210709T123000
UID:submissions.pasc-conference.org_PASC21_sess189_pap121@linklings.com
SUMMARY:Algorithm-Hardware Co-design of a Discontinuous Galerkin Shallow-W
 ater Model for a Dataflow Architecture on FPGA
DESCRIPTION:Paper\n\nAlgorithm-Hardware Co-design of a Discontinuous Galer
 kin Shallow-Water Model for a Dataflow Architecture on FPGA\n\nKenter, Sha
 mbhu, Faghih-Naini, Aizinger\n\nWe present the first FPGA implementation o
 f the full simulation pipeline of a shallow water code based on the discon
 tinuous Galerkin method. Using OpenCL and following an algorithm-hardware 
 co-design approach, the software reference is transformed into a dataflow 
 architecture that can process a full mesh element per clock cycle. The nov
 el projection approach on the algorithmic level complements the pipeline a
 nd memory optimizations in the hardware design. With this, the FPGA kernel
 s for different polynomial orders outperform the CPU reference by 43x to 1
 44x in a strong scaling benchmark scenario. A performance model can explai
 n the measured FPGA performance of up to 717 GFLOPS accurately.\n\nDomain:
  CS and Math
END:VEVENT
END:VCALENDAR
