BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210916T132456Z
LOCATION:
DTSTART;TZID=Europe/Stockholm:20210706T173000
DTEND;TZID=Europe/Stockholm:20210706T190000
UID:submissions.pasc-conference.org_PASC21_sess182_post178@linklings.com
SUMMARY:P38 - Extreme-Scale Tile Low-Rank Cholesky Factorization Using the
  PaRSEC Task-Based Runtime
DESCRIPTION:Poster\n\nP38 - Extreme-Scale Tile Low-Rank Cholesky Factoriza
 tion Using the PaRSEC Task-Based Runtime\n\nCao, Pei, Akbudak, Mikhalev, B
 osilca...\n\nThis work investigates the necessary capabilities of a task-b
 ased runtime for efficient low-rank matrix computations. Unlike their dens
 e counterparts, variable tile ranks in low-rank computations generate a si
 gnificant computational, memory and communication load imbalance, dependen
 t on the input data. This highly dynamic behavior stresses many aspects of
  the underlying programming model, making it difficult to implement portab
 le solutions that remain efficient at any problem and platform size. T
 he target algorithm, tile low-rank (TLR) Cholesky, represents one of the m
 ost critical matrix factorizations widely used in large-scale scientific a
 pplications. Our implementation, over the PaRSEC task-based runtime system
 , provides a complete redesign of the tile-based dense Cholesky algorithm,
  starting from a hierarchical view of the input data, to a dynamic mapping
  of executions on heterogeneous resources. Moreover, our algorithm
  minimizes the tile rank growth, a necessary condition not only for
  bounding the computational intensity of the algorithm but also for
  limiting the memory required to store the temporary data, by taking
  advantage of the associativity of the matrix-update and reordering the
  operations. We describe these new algorithmic approaches and how they
  are mapped onto PaRSEC capabilities, providing a highly portable and
  efficient implementation on different homogeneous and heterogeneous
  architectures.
END:VEVENT
END:VCALENDAR
