Abstracts for 2016 Challis Lectures
by Michael Jordan
of the University of California, Berkeley

Co-sponsored by the Department of Statistics and
the Informatics Institute of the University of Florida

General Lecture (3:30 p.m., March 29, 2017)

On Computational Thinking, Inferential Thinking and Data Science

The rapid growth in the size and scope of datasets in science and technology has created a need for novel foundational perspectives on data analysis that blend the inferential and computational sciences. That classical perspectives from these fields are not adequate to address emerging problems in "Big Data" is apparent from their sharply divergent nature at an elementary level: in computer science, the growth of the number of data points is a source of "complexity" that must be tamed via algorithms or hardware, whereas in statistics, the growth of the number of data points is a source of "simplicity" in that inferences are generally stronger and asymptotic results can be invoked. On a formal level, the gap is made evident by the lack of a role for computational concepts such as "runtime" in core statistical theory and the lack of a role for statistical concepts such as "risk" in core computational theory. I present several research vignettes aimed at bridging computation and statistics, including the problem of inference under privacy and communication constraints, and methods for trading off the speed and accuracy of inference.

Technical Lecture (2:30 p.m., March 30, 2017)

Communication-Avoiding Statistical Inference

Modern data analysis increasingly takes place on distributed computing platforms. In the distributed setting, procedures that minimize communication among processors can be orders of magnitude faster than naive procedures. This fact has revolutionized numerical linear algebra, but it has yet to have a significant impact on statistics. I discuss communication-avoiding approaches to statistical inference, including a novel form of the bootstrap, a primal-dual approach to M-estimation, a surrogate likelihood framework, and distributed forms of false discovery rate control.
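
As a rough illustration of what "minimizing communication" can mean, the sketch below simulates one-shot averaging: each worker computes an estimate on its own shard and sends back a single number, which a coordinator then averages. This is not one of the methods named in the abstract; it is a deliberately simple stand-in. The function names, the simulated Gaussian data, and the choice of ten workers are all assumptions made for illustration only.

```python
import random
import statistics


def split_into_shards(data, num_workers):
    """Partition data into num_workers contiguous shards of equal size.

    Assumes len(data) is divisible by num_workers (true in the demo below).
    """
    shard_size = len(data) // num_workers
    return [data[i * shard_size:(i + 1) * shard_size] for i in range(num_workers)]


def one_shot_average(shards, local_estimator):
    """Run local_estimator on every shard and average the results.

    Only one scalar per worker crosses the (simulated) network,
    regardless of how many data points each shard holds.
    """
    local_estimates = [local_estimator(shard) for shard in shards]
    return statistics.fmean(local_estimates), len(local_estimates)


# Hypothetical data: 10,000 draws from a Gaussian, split across 10 workers.
rng = random.Random(0)
data = [rng.gauss(5.0, 2.0) for _ in range(10_000)]
shards = split_into_shards(data, num_workers=10)

estimate, numbers_sent = one_shot_average(shards, statistics.fmean)
full_data_estimate = statistics.fmean(data)

print(f"one-shot estimate:  {estimate:.6f} ({numbers_sent} numbers communicated)")
print(f"full-data estimate: {full_data_estimate:.6f} ({len(data)} points centralized)")
```

For the sample mean with equal-sized shards, the averaged estimate coincides with the full-data estimate up to floating-point error, so nothing is lost by communicating ten numbers instead of ten thousand. For nonlinear estimators this equality generally fails, which is part of what makes communication-avoiding inference a nontrivial problem.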