Finite Width, Large Depth Neural Networks as Perturbatively Solvable Models

Wed, Mar 10, 2021, 12:00 pm
One World Seminar Series on the Mathematics of Machine Learning

Abstract: Deep neural networks are often regarded as complicated "black boxes" for which a systematic analysis is not only out of reach but potentially impossible. In this talk, which is based on ongoing joint work with Dan Roberts and Sho Yaida, I will make the opposite claim: deep neural networks at initialization are perturbatively solvable models. The perturbative parameter is the inverse width 1/n of the network, and corrections can be obtained to all orders in 1/n. Our approach applies to networks at finite width n and large depth L. A key point is an emergent tension between depth and width. Large values of n make neural networks more like Gaussian processes, which are well behaved but incapable of feature learning because the neural tangent kernel (NTK) is frozen (at least with standard initialization schemes). Large values of L, in contrast, amplify higher cumulants and changes in the NTK, both of which scale with the network aspect ratio L/n.
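
As a rough illustration of the L/n scaling of higher cumulants, here is a toy calculation that is not taken from the talk. It assumes a deep linear network (no nonlinearity) with Gaussian weights of variance 1/n. The symbols h_ℓ, W_ℓ, w, s_ℓ, z, and κ_4 are introduced here only for this sketch.

```latex
% Toy model (an assumption for illustration): deep linear network of width n
% with L hidden layers, h_\ell = W_\ell h_{\ell-1}, entries of W_\ell i.i.d. N(0, 1/n),
% input normalized so that s_0 = \|h_0\|^2 / n = 1, and scalar output
% z = w^\top h_L with entries of w i.i.d. N(0, 1/n).
% Conditional on h_{\ell-1}, the entries of h_\ell are i.i.d. N(0, s_{\ell-1}),
% so each layer multiplies s by an independent \chi^2_n / n factor.
\begin{align*}
s_\ell &:= \frac{\|h_\ell\|^2}{n} = s_{\ell-1}\,\frac{\chi^2_{n,\ell}}{n},
  \qquad \mathbb{E}\!\left[\frac{\chi^2_n}{n}\right] = 1,
  \quad \mathbb{E}\!\left[\left(\frac{\chi^2_n}{n}\right)^{2}\right] = 1 + \frac{2}{n}, \\
\mathbb{E}[z^2] &= \mathbb{E}[s_L] = 1, \qquad
\mathbb{E}[z^4] = 3\,\mathbb{E}[s_L^2] = 3\left(1 + \frac{2}{n}\right)^{L}, \\
\kappa_4(z) &= \mathbb{E}[z^4] - 3\,\mathbb{E}[z^2]^2
  = 3\left[\left(1 + \frac{2}{n}\right)^{L} - 1\right]
  = \frac{6L}{n} + O\!\left(\frac{L^2}{n^2}\right).
\end{align*}
```

In this toy setting the fourth cumulant of the output vanishes as n grows at fixed L, recovering Gaussian behavior, and grows with L at fixed n. To leading order it depends on depth and width only through the ratio L/n.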