BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210916T132449Z
LOCATION:Ernesto Bertarelli
DTSTART;TZID=Europe/Stockholm:20210706T120000
DTEND;TZID=Europe/Stockholm:20210706T123000
UID:submissions.pasc-conference.org_PASC21_sess128_msa139@linklings.com
SUMMARY:Performance Evaluation and Optimization of Cartesian CFD Code CUBE
  on Fugaku
DESCRIPTION:Minisymposium\n\nPerformance Evaluation and Optimization of Ca
 rtesian CFD Code CUBE on Fugaku\n\nAndo\, Onishi\, Li\, Kumahata\, Minam
 i...\n\nWe evaluated and optimized the performance of the Cartesian CFD c
 ode CUBE on the supercomputer Fugaku. CUBE is a simulation framework fo
 r complex industrial flow problems\, such as vehicle aerodynamics\, base
 d on a hierarchical Cartesian mesh. First\, we evaluated the compressibl
 e flow solver of CUBE using an idealized benchmark problem. The entire p
 rogram (time-stepping loop) achieves 215.84 GFLOPS (6.38% of peak) on a s
 ingle node\, and the weak scaling performance is 91.07% on 27\,648 node
 s (1\,327\,104 computational cores) with the idealized problem setting
 s. Most of the runtime is spent in two subroutines\, the convection ter
 m and the viscous term\, so we optimized them. As a result\, the computa
 tional performance improved 2.02-fold and 1.44-fold\, and these routine
 s achieve 106.75 GFLOPS (13.95%) and 123.98 GFLOPS (14.70%) on a singl
 e CMG (Core Memory Group) for the viscous term and convection term\, res
 pectively. We also evaluated the scaling performance of the incompressi
 ble flow solver using a production-run model (a vehicle model). The str
 ong scaling test shows 69.55% for 160 nodes (640 MPI processes) relativ
 e to 10 nodes (40 MPI processes). Furthermore\, the weak scaling test sh
 ows 74.11% for 51\,200 nodes (204\,800 MPI processes) relative to 100 no
 des (400 MPI processes).\n\nDomain: CS and Math\, Physics\, Engineering
END:VEVENT
END:VCALENDAR
