There is immense hype, and immense promise, in machine learning for physics and astronomy. I use stellar astrophysics as an example area in which to explore these ideas. It is an ideal field because it has both very large data sets and incredibly detailed and successful physical models. Yet these models are strongly challenged (or even ruled out) by the data. That is, the data are better than the models; how can we benefit from this in ways that deliver new insights about fundamental physics? One of the main themes is that we want to pick and choose the parts of machine learning we do and don't want to use, because our objectives are very different from those of Amazon and Facebook. I'll put a lot of emphasis on generalizability and causal structure. (Oh, and by the way, data-driven models currently produce more precise measurements of stellar properties and compositions than any physical models.)
David W. Hogg's work has ranged from fundamental cosmological measurements to stellar dynamics to extra-solar planet searches. His work includes a significant engineering component in the areas of instrument calibration, automated data analysis, and statistical inference. He was a founding member of NYU's Center for Data Science; he spends part of each year at the Max Planck Institute for Astronomy in Heidelberg; and he is a group leader (Astronomical Data Group) at the Flatiron Institute of the Simons Foundation.