Optimizing implementation of CNN inferences: change the model or the architecture?

Alexandre Honorat

Apr. 04, 2021

Zoom

Convolutional neural networks (CNNs) are now widely used, so it is necessary to implement them efficiently. CNNs are most commonly implemented on GPUs and, to a lesser extent, on FPGAs. In this talk, without going into the details, we will list some of the problems that arise when implementing CNN inference, especially on FPGAs. We will also relate these problems to the CNN models themselves and highlight a few general recommendations drawn from the following papers.


Xiangyu Zhang, Jianhua Zou, Kaiming He, and Jian Sun. 2016. Accelerating Very Deep Convolutional Networks for Classification and Detection.

Michaela Blott, Thomas Preusser, Nicholas Fraser, Giulio Gambardella, Kenneth O'Brien, and Yaman Umuroglu. 2018. FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks.

Aravind Vasudevan, Andrew Anderson, and David Gregg. 2017. Parallel Multi Channel Convolution using General Matrix Multiplication.