Download notebook (Right click and save): https://github.com/NCAR/GeoCAT-examples/raw/main/docs/gallery-notebooks/Datashader/MPAS_Datashader_Trimesh.ipynb


MPAS with Datashader and GeoViews

To run this notebook in a local environment, install the following packages by running:

conda install -c conda-forge datashader holoviews geoviews
conda install -c ncar geocat-datafiles  # Only for reading-in datasets

Interactively plotting unstructured-grid MPAS data with Datashader and GeoViews

This example demonstrates several key points:

  1. Making use of the MPAS file’s connectivity information to render data on the native grid, which also avoids the costly Delaunay triangulation that is required when the MPAS connectivity information is not used.

  2. Rendering data that is sampled on both the ‘primal’ and ‘dual’ MPAS mesh.

  3. Using GeoViews/HoloViews for interactive plotting in a Jupyter Notebook. The plotting is interactive in the sense that you can pan and zoom the data; doing so reveals progressively greater data fidelity.

  4. Using Datashader to perform background rendering in place of Matplotlib. Unlike Matplotlib, Datashader was designed for performance with large data sets.

The data

The global data sets used in this example are from the same experiment but were run at several resolutions, from 30 km to 3.75 km. Due to their size, the higher-resolution data sets are distributed with only two variables:

  + relhum_200hPa: Relative humidity vertically interpolated to 200 hPa

  + vorticity_200hPa: Relative vorticity vertically interpolated to 200 hPa

relhum_200hPa is computed on the MPAS ‘primal’ mesh, while vorticity_200hPa is computed on the MPAS ‘dual’ mesh. Note that data may also be sampled on the edges of the primal mesh; however, this example does not cover edge-centered data.

These data are courtesy of NCAR’s Falko Judt, and were produced as part of the DYAMOND initiative: http://dx.doi.org/10.1186/s40645-019-0304-z.

[1]:
import cartopy.crs as ccrs
import numpy as np

import dask.dataframe as dd

import holoviews as hv
from holoviews import opts

from holoviews.operation.datashader import rasterize as hds_rasterize
import geoviews.feature as gf # only needed for coastlines

hv.extension("bokeh","matplotlib")

opts.defaults(
    opts.Image(width=1200, height=600),
    opts.RGB(width=1200, height=600))

Set up Dask

TO-DO: Dask parallelization will be investigated later…

[2]:
n_workers = 1

# from dask.distributed import LocalCluster, Client

# cluster = LocalCluster()
# cluster.scale(n_workers)
# client = Client(cluster)

Utility functions

[3]:
# This function splits a global mesh along longitude
#
# Examine the X coordinates of each triangle in 'tris'. Return only those triangles whose legs
# (measured in X from the first vertex) span less than the threshold 't', dropping triangles that straddle the seam.
#
def unzipMesh(x,tris,t):
    return tris[(np.abs((x[tris[:,0]])-(x[tris[:,1]])) < t) & (np.abs((x[tris[:,0]])-(x[tris[:,2]])) < t)]
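
# For intuition, a minimal sketch with made-up coordinates (not part of the workflow):
# the first triangle below straddles the dateline (|x0 - x1| = 358, which is not < 90)
# and is dropped; the second triangle is kept.
x_demo = np.array([-179.0, 179.0, 178.0, 0.0, 1.0, 2.0])
tris_demo = np.array([[0, 1, 2], [3, 4, 5]])
print(unzipMesh(x_demo, tris_demo, 90.0))  # [[3 4 5]]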
[4]:
# Compute (twice) the signed area of each triangle via the 2D cross product of its edge
# vectors. Only the sign is used here: a positive value indicates counterclockwise winding.
#
def triArea(x,y,tris):
    return ((x[tris[:,1]]-x[tris[:,0]]) * (y[tris[:,2]]-y[tris[:,0]])) - ((x[tris[:,2]]-x[tris[:,0]]) * (y[tris[:,1]]-y[tris[:,0]]))
[5]:
# Reorder triangles as necessary so they all have counterclockwise winding order. CCW is what Datashader and MPL
# require.
#
def orderCCW(x,y,tris):
    isCW = triArea(x, y, tris) < 0.0    # compute the winding test once
    tris[isCW,:] = tris[isCW,::-1]      # reverse the vertex order of clockwise triangles
    return(tris)
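
# A quick check with made-up data (not part of the workflow): the first triangle is
# wound clockwise and gets flipped; the second is already CCW and is left alone.
x_demo = np.array([0.0, 1.0, 0.0])
y_demo = np.array([0.0, 0.0, 1.0])
tris_demo = np.array([[0, 2, 1], [0, 1, 2]])
print(orderCCW(x_demo, y_demo, tris_demo))  # [[1 2 0] [0 1 2]] -- both now CCW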

[6]:
# Create a HoloViews Triangle Mesh suitable for rendering with Datashader
#
# This function returns a HoloViews TriMesh that is created from a list of coordinates, 'x' and 'y',
# an array of triangle indices that addresses the coordinates in 'x' and 'y', and a data variable 'var'. The
# data variable's values annotate the triangle vertices
#

import pandas as pd

def createHVTriMesh(x,y,triangle_indices, var,n_workers=1):
    # Declare verts array
    verts = np.column_stack([x, y, var])


    # Convert to pandas
    verts_df  = pd.DataFrame(verts,  columns=['x', 'y', 'z'])
    tris_df   = pd.DataFrame(triangle_indices, columns=['v0', 'v1', 'v2'])

    # Convert to dask
    verts_ddf = dd.from_pandas(verts_df, npartitions=n_workers)
    tris_ddf = dd.from_pandas(tris_df, npartitions=n_workers)

    # Declare HoloViews element
    tri_nodes = hv.Nodes(verts_ddf, ['x', 'y', 'index'], ['z'])
    trimesh = hv.TriMesh((tris_ddf, tri_nodes))
    return(trimesh)
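
# A toy usage sketch (made-up data, not part of the workflow): two triangles sharing
# an edge, with one scalar value per vertex.
x_demo = np.array([0.0, 1.0, 1.0, 0.0])
y_demo = np.array([0.0, 0.0, 1.0, 1.0])
var_demo = np.array([0.0, 1.0, 2.0, 3.0])
tris_demo = np.array([[0, 1, 2], [0, 2, 3]])
demo_mesh = createHVTriMesh(x_demo, y_demo, tris_demo, var_demo)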
[7]:
# Triangulate MPAS primal mesh:
#
# Triangulate each polygon in a heterogeneous mesh of n-gons by connecting
# each internal polygon vertex to the first vertex. Uses the MPAS
# auxiliary variables verticesOnCell and nEdgesOnCell.
#
# The function is decorated with Numba's just-in-time compiler so that it is translated into
# optimized machine code for better performance
#

from numba import jit

@jit(nopython=True)
def triangulatePoly(verticesOnCell, nEdgesOnCell):

    # Calculate the number of triangles. nEdgesOnCell gives the number of vertices for each cell (polygon)
    # The number of triangles per polygon is the number of vertices minus 2.
    #
    nTriangles = np.sum(nEdgesOnCell - 2)

    triangles = np.ones((nTriangles, 3), dtype=np.int64)
    nCells = verticesOnCell.shape[0]
    triIndex = 0
    for j in range(nCells):
        for i in range(nEdgesOnCell[j]-2):
            triangles[triIndex][0] = verticesOnCell[j][0]
            triangles[triIndex][1] = verticesOnCell[j][i+1]
            triangles[triIndex][2] = verticesOnCell[j][i+2]
            triIndex += 1

    return triangles
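
# A minimal sketch with made-up connectivity (not part of the workflow): a single
# pentagon, padded to a fixed width as MPAS pads verticesOnCell, fan-triangulates
# into three triangles anchored at its first vertex.
verts_demo = np.array([[0, 1, 2, 3, 4, -1]], dtype=np.int64)  # trailing -1 is padding
nedges_demo = np.array([5], dtype=np.int64)
print(triangulatePoly(verts_demo, nedges_demo))
# [[0 1 2]
#  [0 2 3]
#  [0 3 4]]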

Load data and coordinates

The dyamond_1 data set is available in several resolutions, ranging from 30 km to 3.75 km.

Currently, the 30-km resolution dataset used in this example is available from geocat-datafiles. The other resolutions are stored on NCAR's Glade file system because of their size.

[8]:
# Load data

import math as math

import geocat.datafiles as gdf  # Only for reading-in datasets

from xarray import open_mfdataset

datafiles = (gdf.get("netcdf_files/MPAS/FalkoJudt/dyamond_1/30km/diag.2016-08-20_00.00.00_subset.nc"),
             gdf.get("netcdf_files/MPAS/FalkoJudt/dyamond_1/30km/x1.655362.grid_subset.nc"))

primalVarName = 'relhum_200hPa'
dualVarName = 'vorticity_200hPa'
central_longitude = 0.0

ds = open_mfdataset(datafiles, decode_times=False)
primalVar = ds[primalVarName].isel(Time=0).values
dualVar = ds[dualVarName].isel(Time=0).values

# Fetch lat and lon coordinates for the primal and dual mesh, converting from radians
# to degrees and wrapping longitude into the range [-180.0, 180.0).
lonCell = ds['lonCell'].values * 180.0 / math.pi
latCell = ds['latCell'].values * 180.0 / math.pi
lonCell = ((lonCell - 180.0) % 360.0) - 180.0

lonVertex = ds['lonVertex'].values * 180.0 / math.pi
latVertex = ds['latVertex'].values * 180.0 / math.pi
lonVertex = ((lonVertex - 180.0) % 360.0) - 180.0
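
# Quick sanity check of the wrap expression (not part of the workflow): longitudes
# in [0, 360) map into [-180, 180).
for lon in (0.0, 10.0, 180.0, 350.0):
    print(lon, '->', ((lon - 180.0) % 360.0) - 180.0)
# 0.0 -> 0.0, 10.0 -> 10.0, 180.0 -> -180.0, 350.0 -> -10.0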

Example 1 - Using MPAS’s cell connectivity array to plot primal mesh data

In this example we use the MPAS cellsOnVertex auxiliary variable, which defines mesh connectivity for the MPAS grid. Specifically, for each vertex this variable gives the IDs of the cells that share it.

The benefits of this are twofold:

  1. We’re using the actual mesh description from the MPAS output file.

  2. For large grids this is much faster than synthesizing the connectivity information, as is done in the next example.

[9]:
# Get triangle indices for each vertex in the MPAS file. Note, indexing in MPAS starts from 1, not zero :-(
#
tris = ds.cellsOnVertex.values - 1

# The MPAS connectivity array unfortunately does not seem to guarantee consistent counterclockwise winding
# order, which is required by Datashader (and Matplotlib)
#
tris = orderCCW(lonCell,latCell,tris)

# Lastly, we need to "unzip" the mesh along a constant line of longitude so that when we project to PCS
# (projected coordinate system) coordinates, cells don't wrap around from east to west. The unzipMesh function
# defined above does the job, but it assumes that the central_longitude of the map projection is 0.0, i.e. it
# will cut the mesh where longitude wraps around from -180.0 to 180.0. We'll need to generalize this
#
tris = unzipMesh(lonCell,tris,90.0)


# Project verts from geographic to PCS coordinates. transform_points returns an (N, 3)
# array; transposing lets us unpack the projected x and y columns directly.
#
projection = ccrs.Robinson(central_longitude=central_longitude)
xPCS, yPCS, _ = projection.transform_points(ccrs.PlateCarree(), lonCell, latCell).T


trimesh = createHVTriMesh(xPCS,yPCS,tris, primalVar,n_workers=n_workers)
[10]:
# Use precompute so it caches the data internally
rasterized = hds_rasterize(trimesh, aggregator='mean', precompute=True)
rasterized.opts(tools=['hover'], colorbar=True, cmap='coolwarm') * gf.coastline(projection=projection)

[10]: (interactive Bokeh plot: relhum_200hPa rasterized on the MPAS primal mesh, with coastlines)
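
If you want a shareable snapshot of the plot, hv.save can write it to HTML. A minimal sketch follows (the file name is arbitrary); note that without a live Python process the saved plot keeps Bokeh’s pan/zoom tools but is not re-rasterized on zoom.

plot = rasterized * gf.coastline(projection=projection)
hv.save(plot, 'mpas_primal.html', backend='bokeh')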

Example 2 - Synthesizing triangles from points using Delaunay triangulation

In this second example we do not use the triangle connectivity information stored in the MPAS file. Instead, we use Delaunay triangulation to artificially create a triangle mesh. The benefit of this approach is that we do not need the MPAS cellsOnVertex variable if it is not available. Also, since the triangulation algorithm is run on the coordinates after they are projected to meters, we do not have to worry about wraparound. The downside is that for high-resolution data Delaunay triangulation is prohibitively expensive; the highest-resolution data set used with this notebook (3.75 km) will not triangulate in a reasonable amount of time, if at all.

[11]:
# Use Delaunay triangulation to generate the triangle connectivity. Note, it's important that the coordinate
# arrays already be in PCS coordinates (not lat-lon) for the triangulation to perform optimally
#

from matplotlib.tri import Triangulation

tris = Triangulation(xPCS,yPCS).triangles

trimesh = createHVTriMesh(xPCS,yPCS,tris, primalVar,n_workers=n_workers)

[12]:
# Use precompute so it caches the data internally
rasterized = hds_rasterize(trimesh, aggregator='mean', precompute=True)
rasterized.opts(tools=['hover'], colorbar=True, cmap='coolwarm') * gf.coastline(projection=projection)

[12]: (interactive Bokeh plot: relhum_200hPa rasterized on the Delaunay-triangulated mesh, with coastlines)

Example 3 - Using MPAS’s cell connectivity array to plot dual mesh data

In this example we use the MPAS verticesOnCell and nEdgesOnCell auxiliary variables, which define mesh connectivity for the MPAS dual grid.

As with the first example using the MPAS primal grid, data on the dual grid could be plotted by first triangulating them with, for example, Delaunay triangulation. But using the grid’s native connectivity information is faster.

[13]:
verticesOnCell = ds.verticesOnCell.values - 1
nEdgesOnCell = ds.nEdgesOnCell.values

# For the dual mesh the data are located on triangle centers, which correspond to cell (polygon) vertices. Here
# we decompose each cell into triangles
#
tris = triangulatePoly(verticesOnCell, nEdgesOnCell)

tris = unzipMesh(lonVertex,tris,90.0)

# Project verts from geographic to PCS coordinates
#
projection = ccrs.Robinson(central_longitude=central_longitude)
xPCS, yPCS, _ = projection.transform_points(ccrs.PlateCarree(), lonVertex, latVertex).T

trimesh = createHVTriMesh(xPCS,yPCS,tris, dualVar,n_workers=n_workers)

[14]:
rasterized = hds_rasterize(trimesh, aggregator='mean', precompute=True)
rasterized.opts(tools=['hover'], colorbar=True, cmap='coolwarm') * gf.coastline(projection=projection)


[14]: (interactive Bokeh plot: vorticity_200hPa rasterized on the MPAS dual mesh, with coastlines)