2016
 Discretization, Solution, and Inversion for Large Systems of PDEs

Computer Science Department, University of Colorado Boulder, March 3, 2016
 We are often forced to make important decisions with imperfect and incomplete
data. In modelbased inference, our efforts to extract useful information
from data are aided by models of what occurs where we have no observations:
examples range from climate prediction to patientspecific medicine. In many
cases, these models can take the form of systems of PDEs with
criticalyetunknown parameter fields, such as initial conditions or material
coefficients of heterogeneous media.
A concrete example that I will present is to make predictions about the
Antarctic ice sheet from satellite observations, when we model the ice sheet
using a system of nonlinear Stokes equations with a Robintype boundary
condition, governed by a critical, spatially varying coefficient. This talk
will present three aspects of the computational stack used to efficiently
estimate statistics for this kind of inference problem.
At the top is an posteriordistribution approximation for Bayesian inference,
that combines Laplace's method with randomized calculations to compute an
optimal lowrank representation.
Below that, the performance of this approach to inference is highly dependent
on the efficient and scalable solution of the underlying model equation, and
its first and second adjoint equations. A highlevel description of a
problem (in this case, a nonlinear Stokes boundary value problem) may suggest
an approach to designing an optimal solver, but this is just the jumpingoff
point: differences in geometry, boundary conditions, and other considerations
will significantly affect performance. I will discuss how the peculiarities
of the ice sheet dynamics problem lead to the development of an anisotropic
multigrid method (available as a plugin to the PETSc library for scientific
computing) that improves on standard approaches.
At the bottom, to increase the accuracy per degree of freedom of discretized
PDEs, I develop adaptive mesh refinement (AMR) techniques for largescale
problems. I will present my algorithmic contributions to the p4est library
for parallel AMR that enable it to scale to concurrencies of O(10^6), as well
as recent work commoditizing AMR techniques in PETSc.
2015
 Uniting Performance and Extensibility in Adaptive Finite Element Computations

CAAM Colloqium, Rice University, Houston, Texas, September 14, 2015

This talk will begin with a presentation of recent work that applies
p4est—a library for parallel adaptive mesh refinement—to large scale
computations, and ice sheet dynamics with highly anisotropic meshs and
global mantle convection simulations with billions of degrees of freedom.
The former is an example where extension of the datatypes and algorithms in
a library can help it to address unanticipated problems; the latter is an
example of how extensibility is crucial for the same library that performs
well on one process to peform well on a million. Time permitting, this will
lead into a discussion of work to expand the reach of adaptive methods in
PETSc, including work to make DMPlex—already a flexible interface for
unstructured meshes—even for flexible by adding support for the
nonconformal meshes that arise in AMR applications.
 Multilevel Methods for Forward and Inverse Ice Sheet Modeling

SIAM Conference on Computational Science and Engineering, Salt Lake City, Utah, March 14, 2015

We present our work on recovering ice sheet modeling parameter fields
(such as basal slipperiness) from observations, using both
deterministic and Bayesian inversion. The scalability of our adjoint
and Hessianbased methods is determined by the scalability of two
subproblems: the solution of the state PDEs (Stokes equations of ice
sheet dynamics), and the approximation and preconditioning of the
parametertoobservation Hessian. For the former problem, we compare
the effectiveness of geometric and algebraic multigrid within the
solution of the state PDEs; for the latter, we discuss the use of
multilevel approximations to improve on Hessian approximation by
lowrank updates. The scalability of our work is tested on fullscale
models of the Antarctic ice sheet.
2014
 Statistical Inversion for Basal Parameters for the Antarctic Ice Sheet

3rd SIAM Conference on Uncertainty Quantification, Savannah, Georgia, April 3, 2014

We formulate a Bayesian inference problem for the friction field at the
base of the Antarctic ice sheet from distributions for the observed
surface velocities and for the prior knowledge of the basal friction.
The dimension of the parameter space is large, and the map from
parameters to observations requires the solution of a system of
implicit nonlinear 3D PDEs. We approximate the posterior distribution
with a Gaussian centered at the maximum a posteriori point, with
covariance given by the inverse Hessian of the log posterior. By using
a lowrank approximation of the log likelihood, we are able to scale up
to the problem size of interest.
 Hybrid Quadtree/Octree AMR for Anisotropic Domains

16th SIAM Conference on Parallel Processing for Scientific Computing, Portland, Oregon, February 19, 2014

Many problems in geosciences, such as ice sheets, oceans, and the
atmosphere, involve thin, anisotropic domains. Researchers in these
areas are using 3D models where once they used 2D approximations. The
p4est library for parallel adaptive mesh refinement (AMR), which uses a
forestofoctrees approach to AMR, only natively supports isotropic
mesh refinement, which is often insufficient for such problems. We
present an extension to this library with two refinement modes for
these anisotropic problems.
<2014 (Select)
 LowCost Parallel Algorithms for 2:1 Octree Balance

26th IEEE International Parallel and Distributed Processing Symposium, Shanghai, China, May 22, 2012

The logical structure of a forest of octrees can be used to create
scalable algorithms for parallel adaptive mesh refinement (AMR), which
has recently been demonstrated for several petascale applications.
Among various frequently used octree based mesh operations, including
refinement, coarsening, partitioning, and enumerating nodes, ensuring a
2:1 size balance between neighboring elements has historically been the
most expensive in terms of CPU time and communication volume. The 2:1
balance operation is thus a primary target to optimize. One important
component of a parallel balance algorithm is the ability to determine
whether any two given octants have a consistent distance/size relation.
Based on new logical concepts we propose fast algorithms for making
this decision for all types of 2:1 balance conditions in 2D and 3D.
Since we are able to achieve this without constructing any parent nodes
in the tree that would otherwise need to be sorted and communicated, we
can significantly reduce the required memory and communication volume.
In addition, we propose a lightweight col lective algorithm for
reversing the asymmetric communication pattern induced by nonlocal
octant interactions. We have implemented our improvements as part of
the open source “p4est” software. Benchmarking this code with both
synthetic and simulationdriven adapted meshes we are able to
demonstrate much reduced runtime and excellent weak and strong
scalability. On our largest benchmark problem with 5.13 x
10^{11} octants the new 2:1 balance algorithm executes in less
than 8 seconds on 112,128 CPU cores of the Jaguar Cray XT5
supercomputer.