[Defense] Computationally Efficient DNN Mapping Search Heuristic using Reinforcement Learning
Monday, November 28, 2022
12:00 pm - 1:00 pm CT
In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Suyash Bakshi will defend his proposal

Computationally Efficient DNN Mapping Search Heuristic using Reinforcement Learning
Abstract
Achieving high performance and energy efficiency for DNNs requires architecture-specific optimizations. With the number of possible loop tilings, loop orders, and assignments of related data to memories in a memory hierarchy and to processors in a parallel platform being of the order of 10^35 for many 3D data sets and convolution layers, hand-tuning is not realistic. Further, the search space for objective criteria such as execution time, power, or energy is non-smooth and non-convex.

In this research, we propose a Reinforcement Learning (RL) search heuristic for finding loop tilings, orderings, and parallelizations of perfect deep loop nests that make efficient use of memory hierarchies and parallel resources, as measured by execution time, energy, and their combination, the Energy-Delay Product (EDP). The aim of our RL approach is to reduce the computational effort required to find mappings of similar or better quality than those of published mapping methods. Our novel contributions include 1) an RL search heuristic that uses a reward function based on potential operand data reuse, and 2) a linearized representation of mappings that generalizes to n-deep nested loops and parallel hardware architectures with a per-processor m-level memory hierarchy and a global memory. We implement a stochastic ‘policy’ for the RL agent’s action selection, enabling exploration of the space of possible mappings. Computing the potential operand reuse for a mapping is inexpensive compared to determining the actual reuse, and it has been found to yield good-quality mappings with less total computational effort than the other evaluated methods for comparable mapping quality. The potential operand reuse is also used for pruning: mappings whose potential reuse falls below a threshold are not evaluated for quality. The linearized representation of loop tiles and orders is an effective way to determine the next mapping to assess against the mapping quality criteria and to ensure that only mappings feasible on the hardware are considered. Processor-associated memory is treated as scratchpad memory.

We compare the computational requirements and mapping qualities of our framework to those of three state-of-the-art published mapping search frameworks: Mind Mappings (supervised-learning search), Chameleon (RL-based search), and Timeloop (random search). The published Mind Mappings results rely on an extensive training dataset that is used for the search; our RL method, Chameleon, and Timeloop do not use any training data sets. Both our RL approach and Chameleon are guided search methods that expend computation in seeking good-quality mappings at lower cost than randomly selected mappings, as in Timeloop. Our reward function is computationally much less demanding than Chameleon’s execution-time-based reward function. In our current preliminary analysis, assessing 30,720 mappings for each of 19 3D convolution layers with search-space sizes varying between 2.4x10^18 and 2.6x10^35, our RL method yielded mappings with on average 10% lower EDP, and a maximum improvement of 35%, across the 19 layers relative to 10 random Timeloop trials. The computational expense of our RL method was on average 37.1% of that of one Timeloop trial assessing the 30,720 mappings. Thus, the pruning strategy of our RL method more than makes up for the computational expense of performing a guided search. A comprehensive statistical evaluation is planned.
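For reference, the Energy-Delay Product used above is simply the product of the two objectives it combines,

    EDP = E x T,

where E is the energy consumed executing the mapped loop nest and T is its execution time; lower values are better.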
Compared to the Chameleon RL-guided search method, preliminary analysis indicates a computational advantage for our method when similarly sized neural networks are used for the RL agent, owing to our computationally inexpensive reward function. A careful analysis is planned for the comparison with Chameleon as well. Finally, in comparison with Mind Mappings, the training set used in that method’s publication comprised 1 million mappings, whereas our method achieved good mappings after considering a few tens of thousands of mappings. A more rigorous comparison is planned.
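To make the search loop described above concrete, the short Python sketch below illustrates the general idea: a stochastic policy samples a linearized mapping (here reduced to a toy tile-size choice per loop of a 3-deep nest), a cheap potential-reuse score serves as the reward proxy, and mappings scoring below a threshold are pruned before any expensive quality evaluation. All names and values (potential_reuse, sample_mapping, TILE_SIZES, THRESHOLD) are illustrative assumptions, not the actual framework's code.

# Hypothetical sketch, not the authors' implementation: a stochastic policy
# samples tilings and a cheap reuse proxy prunes them before costly evaluation.
import math
import random

LOOPS = ["n", "h", "w"]          # toy 3-deep loop nest
TILE_SIZES = [1, 2, 4, 8, 16]    # legal tile sizes per loop (illustrative)

def potential_reuse(mapping):
    # Cheap proxy reward: larger tiles *could* expose more operand reuse.
    # Scoring the tiling directly avoids measuring actual execution behavior.
    return sum(math.log2(tile) for _, tile in mapping)

def sample_mapping(policy_logits):
    # Stochastic policy: softmax-sample one tile size per loop, enabling
    # exploration of the space of possible mappings.
    mapping = []
    for loop in LOOPS:
        weights = [math.exp(l) for l in policy_logits[loop]]
        tile = random.choices(TILE_SIZES, weights=weights)[0]
        mapping.append((loop, tile))
    return mapping

THRESHOLD = 6.0                  # illustrative pruning threshold
policy = {loop: [0.0] * len(TILE_SIZES) for loop in LOOPS}  # uniform start
sampled = [sample_mapping(policy) for _ in range(100)]
survivors = [m for m in sampled if potential_reuse(m) >= THRESHOLD]
print(len(survivors), "of 100 sampled mappings survive pruning")

The point of such a design is that the reuse score is a closed-form function of the tiling itself, so thousands of sampled mappings can be screened for roughly the cost of evaluating a handful with a detailed execution-time or EDP model.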
Online via
Dr. Lennart Johnsson, dissertation advisor
Faculty, students, and the general public are invited.
