Extra AI Seminar: The Information Bottleneck Theory of Deep Learning: Towards Interpretable Deep Neural Networks

Speaker

Dr. Naftali Tishby is a professor of Computer Science and the incumbent of the Ruth and Stan Flinkman Chair for Brain Research at the Edmond and Lily Safra Center for Brain Science (ELSC) at the Hebrew University of Jerusalem.

Abstract

Over the past several years we have developed a comprehensive theory of large-scale learning with Deep Neural Networks (DNNs), when optimized with Stochastic Gradient Descent (SGD). The theory is built on three components: (1) rethinking the standard (PAC-like) distribution-independent worst-case generalization bounds, turning them into problem-dependent, typical (in the Information Theory sense) bounds that are independent of the model architecture.
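To make the contrast concrete: a standard PAC-style bound controls the generalization gap through the cardinality of the hypothesis class, roughly (for m samples and confidence 1 - \delta)

    \epsilon \;\le\; \sqrt{\frac{\log |H| + \log(1/\delta)}{2m}},

and the information-theoretic reworking replaces \log|H| with an effective quantity governed by how much a learned representation T compresses the input X, on the order of 2^{I(T;X)}, so that each bit of input compression roughly halves the effective hypothesis space. The displayed inequality is a standard textbook sketch of such a bound, not the specific bound presented in the talk.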

(2) The Information Plane theorem: for large-scale typical learning, the sample-complexity and accuracy trade-off is characterized by only two numbers: the mutual information that the representation (a layer in the network) maintains about the input patterns, and the mutual information each layer has about the desired output label. The information-theoretically optimal trade-off between these encoder and decoder information values is given by the Information Bottleneck (IB) bound for the rule-specific input-output distribution. (3) The layers of the DNN reach this optimal bound via standard SGD training, in high dimension (of both the input and the layers).
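For reference, the IB bound referred to here comes from the standard Information Bottleneck formulation (Tishby, Pereira, and Bialek): a representation T of the input X is chosen to be maximally compressive while remaining informative about the label Y,

    \min_{p(t \mid x)} \; I(X;T) \;-\; \beta\, I(T;Y), \qquad \beta > 0,

subject to the Markov chain Y \to X \to T. Sweeping \beta traces out the optimal curve in the information plane, i.e. the best achievable I(T;Y) for each value of I(X;T), against which the trajectories of the DNN layers are compared.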

In this (2-hour) talk I will review these results and discuss two new outcomes of this theory: (1) the computational benefit of the hidden layers, and (2) the emerging understanding of the features encoded by each layer, which follows from the convergence to the IB bound.

Based on joint work with Noga Zaslavsky, Ravid Ziv, and Amichai Painsky.

Bio

Dr. Naftali Tishby is one of the leaders of machine learning research and computational neuroscience in Israel, and his numerous former students hold key academic and industrial research positions all over the world. Prof. Tishby was the founding chair of the new computer engineering program and a director of the Leibniz research center in computer science at the Hebrew University. Tishby received his PhD in theoretical physics from the Hebrew University in 1985 and was a research staff member at MIT and Bell Labs from 1985 to 1991. Prof. Tishby was also a visiting professor at Princeton NECI, the University of Pennsylvania, UCSB, and IBM Research.

His current research is at the interface between computer science, statistical physics, and computational neuroscience. He pioneered various applications of statistical physics and information theory in computational learning theory. More recently, he has been working on the foundations of biological information processing and deep learning, and the connections between dynamics and information. With his colleagues, he has introduced new theoretical frameworks for optimal adaptation and efficient information representation in biology, such as the Information Bottleneck method and the Minimum Information principle for neural coding. This year, Prof. Tishby received the prestigious IBT award in Mathematical Neuroscience.