[Defense] Matrix Computations on TensorCore GPU
Wednesday, April 20, 2022
4:00 pm - 7:00 pm
In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Shaoshuai Zhang
will defend his dissertation
Matrix Computations on TensorCore GPU
Abstract
The emergence of neural engines such as the Nvidia TensorCore GPU has revolutionized deep neural networks, because these engines can perform general matrix multiplications extremely fast. However, how to deploy other algorithms and applications on neural engines remains an open question. In this dissertation, I explore the possibilities of using TensorCore GPUs to accelerate BLAS3 operations, linear algebra algorithms, and machine learning algorithms on GPUs, hybrid CPU-GPU architectures, and distributed systems. Specifically, I design TensorCore-based matrix computation algorithms that can work on these different architectures. On a single GPU, my work includes implementing several basic linear algebra operations that can serve as building blocks for matrix factorizations. For matrix factorization, I focus on developing a recursive QR factorization that utilizes the TensorCore GPU efficiently. I also explore using TensorCore to accelerate the two-stage eigenvalue decomposition. In addition, I migrate a scalable CPU-based support vector machine (SVM) tool to TensorCore; the resulting implementation exhibits a significant speedup and outperforms state-of-the-art GPU-based SVM software. On the hybrid CPU-GPU architecture, I further investigate the recursive strategy and present a case study of out-of-core QR factorization using it; the results show that the recursive algorithm performs much better than the conventional algorithm. On distributed-memory systems, my current work is developing a unique data structure named Universal Distributed Array (UDA), which offers great programming flexibility and can also utilize TensorCore. In general, TensorCore-based algorithms typically achieve very high performance, but they suffer from accuracy loss because they use half precision.
Wednesday, April 20, 2022, 4PM - 7PM CT
Online via
Dr. Panruo Wu, dissertation advisor
Faculty, students and the general public are invited.
