Collaborative Research: Statistical and Computational Models and Methods for Extracting Knowledge from Massive Disparate Data for Quantifying Uncertain Hazards
Eliza Calder, Principal Investigator
The investigators propose specific methodological advances in three areas of statistical science. Many conventional statistical methods break down for massive data sets because they do not scale well: the computational effort or memory they require grows as a power of the problem size, or even exponentially. The first area is statistical emulation of the output of computer simulation models. Here the investigators propose to generate adaptive subdesigns automatically, carefully selecting and using only the small subset of the data that bears on each specific computational goal, and to develop what they call "parallel partial emulation," in which emulation is performed simultaneously in parallel with some model inputs kept at a range of fixed values. The second area is multiple-scale stochastic models, which exploit infinitely divisible distributions for some model features to permit coupled parallel analyses at a range of scales; the coarser scales require less computational effort and run faster, which helps the finer scales reach equilibrium sooner. The third area is dynamic evolution models, in which computational effort is focused on those aspects that change most rapidly, while other aspects are treated as slowly varying or piecewise constant. All methods are applied to the same important application area, the quantitative assessment of geophysical hazard for volcanic events.

The investigators propose to develop new mathematical, statistical, and computational methods to address the problem of making principled statistical inference on the basis of massive data sets. The new methods are developed and applied in the context of a specific important societal problem: improving methods for the quantitative assessment of risk associated with volcanic activity.
In this application area, the product of this research would be maps indicating which areas face specified levels of hazard (say, 1000:1, 100:1, 10:1) for specified lengths of time (say, 1 month, 1 year, 1 decade), with estimates based on geophysical evidence and validated computational models. The methods are applicable in other areas of modern empirical science, both for making quantitative assessments of other geophysical hazards and, more broadly, in other scientific endeavors with large amounts of data.
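
To make the hazard levels and time windows mentioned above concrete, the following is a minimal arithmetic sketch, not the investigators' method. It rests on two assumptions that do not come from the abstract: a level such as 1000:1 is read as odds against an event occurring in the stated window, and events are taken to arrive at a constant rate (a Poisson process), so that a probability for one window can be rescaled to another. The function names are hypothetical and chosen only for illustration.

```python
import math


def odds_against_to_probability(odds_against: float) -> float:
    """Convert odds of `odds_against`:1 against an event to a probability.

    Assumption: "1000:1" is read as 1000 to 1 against, i.e. p = 1 / 1001.
    """
    return 1.0 / (odds_against + 1.0)


def rescale_probability(p: float, from_years: float, to_years: float) -> float:
    """Rescale an event probability from one time window to another.

    Assumption: events follow a constant-rate Poisson process, so
    p = 1 - exp(-rate * window). Real volcanic activity need not behave
    this way; this is only a simplifying illustration.
    """
    rate = -math.log1p(-p) / from_years   # events per year
    return -math.expm1(-rate * to_years)  # 1 - exp(-rate * to_years)


# Treat each hazard level as a one-year probability and rescale it
# to the other two windows named in the text.
for odds in (1000, 100, 10):
    p_year = odds_against_to_probability(odds)
    p_month = rescale_probability(p_year, 1.0, 1.0 / 12.0)
    p_decade = rescale_probability(p_year, 1.0, 10.0)
    print(f"{odds}:1  month={p_month:.6f}  year={p_year:.6f}  decade={p_decade:.6f}")
```

Under these assumptions the rescaling is equivalent to compounding the survival probability, so a one-year probability p becomes 1 - (1 - p)^10 over a decade. A hazard map of the kind described would report such a value for each location, but it would be estimated from geophysical evidence and computational models rather than from a fixed rate.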