BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210916T132448Z
LOCATION:Michel Mayor
DTSTART;TZID=Europe/Stockholm:20210706T110000
DTEND;TZID=Europe/Stockholm:20210706T113000
UID:submissions.pasc-conference.org_PASC21_sess126_msa120@linklings.com
SUMMARY:New Linear-Scaling Methods and their Acceleration with FPGAs
DESCRIPTION:Minisymposium\n\nNew Linear-Scaling Methods and their Accelera
 tion with FPGAs\n\nSchade, Lass, Kühne, Plessl\n\nWe present the submatri
 x method, a novel linear-scaling DFT method, as well as its implementati
 on in CP2K. Although initially proposed for inverse p-th roots [1], the su
 bmatrix method has recently been recognized as a general method to appro
 ximate arbitrary matrix functions, such as the matrix-sign function, o
 f large sparse matrices. The matrix-sign function is the essential work
 horse of linear-scaling DFT, and we present an intuitive chemical justi
 fication for the accuracy of the submatrix method. We discuss the effic
 ient implementation of the submatrix method in CP2K, with a special foc
 us on limiting communication between compute nodes. The resulting compu
 te kernel is the sign function of a relatively small but dense matrix. I
 n initial results [2], our optimized implementation with a simple diago
 nalization-based evaluation of the sign function of the submatrices out
 performs the Newton-Schulz sign iteration, especially for larger cutoff
 s of matrix elements. This observation shows that the submatrix method w
 ill be a valuable tool in the context of approximate computing. The cha
 llenge of power efficiency in large-scale linear-scaling DFT calculatio
 ns can be addressed by employing the flexibility of FPGAs, or compute u
 nits such as tensor cores on GPUs, when performing calculations in lowe
 r precision than conventional double precision. We show that this is in
 deed the case for the compute kernel of the submatrix method and descri
 be its acceleration with FPGAs and GPU tensor cores.\n\n[1] https://doi
 .org/10.1145/3218176.3218231\n[2] https://arxiv.org/abs/2003.03868\n\nD
 omain: CS and Math, Chemistry and Materials, Physics, Engineering
END:VEVENT
END:VCALENDAR