BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210916T132448Z
LOCATION:Michel Mayor
DTSTART;TZID=Europe/Stockholm:20210706T110000
DTEND;TZID=Europe/Stockholm:20210706T113000
UID:submissions.pasc-conference.org_PASC21_sess126_msa120@linklings.com
SUMMARY:New Linear-Scaling Methods and their Acceleration with FPGAs
DESCRIPTION:Minisymposium\n\nNew Linear-Scaling Methods and their Accelera
 tion with FPGAs\n\nSchade, Lass, Kühne, Plessl\n\nWe present the submatri
 x method, a novel linear-scaling DFT method, as well as its implementati
 on in CP2K. Although initially proposed for inverse p-th roots [1], the su
 bmatrix method has recently been recognized as a general method to appro
 ximate arbitrary matrix functions, such as the matrix-sign function, o
 f large sparse matrices. The matrix-sign function is the essential work
 horse of linear-scaling DFT, and we present an intuitive chemical justi
 fication for the accuracy of the submatrix method. We discuss the effic
 ient implementation of the submatrix method in CP2K, with a special foc
 us on limiting communication between compute nodes. The resulting compu
 te kernel is the sign function of a relatively small but dense matrix. I
 n initial results [2], our optimized implementation with a simple diago
 nalization-based evaluation of the sign function of the submatrices out
 performs the Newton-Schulz sign iteration, especially for larger cutoff
 s of matrix elements. This observation shows that the submatrix method w
 ill be a valuable tool in the context of approximate computing. The cha
 llenge of power efficiency in large-scale linear-scaling DFT calculatio
 ns can be addressed by employing the flexibility of FPGAs, or compute u
 nits such as tensor cores on GPUs, when performing calculations in lowe
 r precision than conventional double precision. We show that this is in
 deed the case for the compute kernel of the submatrix method and descri
 be its acceleration with FPGAs and GPU tensor cores.\n\n[1] https://doi
 .org/10.1145/3218176.3218231\n[2] https://arxiv.org/abs/2003.03868\n\nD
 omain: CS and Math, Chemistry and Materials, Physics, Engineering
END:VEVENT
END:VCALENDAR