BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210916T132456Z
LOCATION:
DTSTART;TZID=Europe/Stockholm:20210706T173000
DTEND;TZID=Europe/Stockholm:20210706T190000
UID:submissions.pasc-conference.org_PASC21_sess182_post163@linklings.com
SUMMARY:P32 - StencilFlow: Mapping Large Stencil Programs to Distributed D
 ataflow Systems
DESCRIPTION:Poster\n\nP32 - StencilFlow: Mapping Large Stencil Programs
  to Distributed Dataflow Systems\n\nKuster\, de Fine Licht\, Hoefler\n\n
 Accurate and reliable weather forecasting is of vital importance to a
  broad range of industries and the general public. Highly regular and
  statically analyzable stencil operators on structured grids are used
  to numerically solve the partial differential equations of such weather
  prediction models. Since technology has hit the power wall for
  air-cooled CMOS fabrics\, future high-performance architectures must
  increase the fraction of energy spent on computations to continue
  scaling. By implementing custom dataflow architectures\, there is a
  potential to greatly reduce control and data movement overhead of such
  stencil programs. We introduce StencilFlow\, a framework for mapping
  stencil programs to FPGAs\, offering a complete toolchain from input
  data analysis\, optimization\, and simulation\, to generation of
  optimized code for reprogrammable devices. We formalize the input
  program as a directed acyclic graph of streaming modules\, and
  simultaneously optimize for maximum utilization of on-chip memory
  capacity and off-chip memory bandwidth. The optimized stencil program
  is mapped onto FPGA hardware using the DaCe framework. As a realistic
  use case\, we study the COSMO numerical weather forecasting model\,
  currently running in production on a hybrid CPU/GPU architecture.
  With StencilFlow\, high-level users can rapidly implement\, optimize\,
  debug\, and synthesize large stencil programs for distributed FPGA
  systems.
END:VEVENT
END:VCALENDAR
