API

filtering

The main lagrangian-filtering module.

This module contains the crucial datastructure for lagrangian-filtering, LagrangeFilter. See project documentation for examples on how to construct a filtering workflow using this library.

class filtering.filtering.LagrangeFilter(name, filenames_or_dataset, variables, dimensions, sample_variables, c_grid=False, uneven_window=False, window_size=None, minimum_window=None, highpass_frequency=5e-05, advection_dt=datetime.timedelta(seconds=300), **kwargs)

The main class for a Lagrangian filtering workflow.

The workflow is set up using the input files and the filtering parameters. Filtering can be performed all at once, or on individual time levels.

Data must contain horizontal velocity components U and V to perform the Lagrangian frame transformation. Any variables that should be filtered must be specified in the sample_variables list (this includes velocity).

Note

We use the OceanParcels convention for variable names. This means that U, V, lon, lat, time and depth are canonical names for properties required for particle advection. The mapping from the actual variable name in your data files to these canonical names is contained in the variables and dimensions dictionaries. When specifying filenames or sample_variables, the canonical names must be used, however any other variables may use whatever name you would like.

Once the LagrangeFilter has been constructed, you may call it as a function to perform the filtering workflow. See filter() for documentation.

Example

A straightforward filtering workflow:

f = LagrangeFilter(
    name, filenames, variables, dimensions, sample_variables,
)
f()

Would result in a new file with the given name and an appropriate extension containing the filtered data for each of the sample_variables.

Parameters:
  • name (str) – The name of the workflow
  • filenames_or_dataset (Union[Dict[str, str], xarray.Dataset]) –

    Either a mapping from data variable names to the files containing the data, or an xarray Dataset containing the input data.

    Filenames can contain globs if the data is spread across multiple files.

  • variables (Dict[str, str]) – A mapping from canonical variable names to the variable names in your data files.
  • dimensions (Dict[str, str]) – A mapping from canonical dimension names to the dimension names in your data files.
  • sample_variables (List[str]) – A list of variable names that should be sampled into the Lagrangian frame of reference and filtered.
  • mesh (Optional[str]) – The OceanParcels mesh type, either “flat” or “spherical”. “flat” meshes are expected to have dimensions in metres, and “spherical” meshes in degrees.
  • c_grid (Optional[bool]) – Whether to interpolate velocity components on an Arakawa C grid (defaults to no).
  • indices (Optional[Dict[str, List[int]]]) – An optional dictionary specifying the indices to which a certain dimension should be restricted.
  • uneven_window (Optional[bool]) – Whether to allow different lengths for the forward and backward advection phases.
  • window_size (Optional[float]) – The nominal length of the both the forward and backward advection windows, in seconds. A longer window may better capture the low-frequency signal to be removed.
  • minimum_window (Optional[float]) – If provided, particles will be filtered if they successfully advected for at least this long in total. This can increase the yield of filtered data by salvaging particles that would otherwise be considered dead.
  • highpass_frequency (Optional[float]) – The 3dB cutoff frequency for filtering, below which spectral components will be attenuated. This should be an angular frequency, in [rad/s].
  • advection_dt (Optional[datetime.timedelta]) – The timestep to use for advection. May need to be adjusted depending on the resolution/frequency of your data.
  • **kwargs (Optional) – Additional arguments are passed to the Parcels FieldSet constructor.
advection_step(time, output_time=False)

Perform forward-backward advection at a single point in time.

This routine is responsible for creating a new ParticleSet at the given time, and performing the forward and backward advection steps in the Lagrangian transformation.

Parameters:
  • time (float) – The point in time at which to calculate filtered data.
  • output_time (Optional[bool]) – Whether to include “time” as a numpy array in the output dictionary, for doing manual analysis.

Note

If output_time is True, the output object will not be compatible with the default filtering workflow, filter_step()!

If output_dt has not been set on the filtering object, it will default to the difference between successive time steps in the first grid defined in the parcels FieldSet. This may be a concern if using data which has been sampled at different frequencies in the input data files.

Returns:
A dictionary of the advection
data, mapping variable names to a pair. The first element is the index of the sampled timestep in the data, and the second element is a lazy dask array concatenating the forward and backward advection data.
Return type:Dict[str, Tuple[int, dask.array.Array]]
create_out(clobber=False)

Create a netCDF dataset to hold filtered output.

Here we create a new netCDF4.Dataset for filtered output. For each sampled variable in the input files, a corresponding variable in created in the output file, with the dimensions of the output grid.

Parameters:clobber (Optional[bool]) – Whether to overwrite any existing output file with the same name as this experiment. Default behaviour will not clobber an existing output file.
Returns:
A tuple containing a single
dataset that will hold all filtered output and the name of the time dimension in the output file.
Return type:Tuple[netCDF4.Dataset, str]
filter(*args, **kwargs)

Run the filtering process on this experiment.

Note

Instead of f.filter(…), you can call f(…) directly.

This is main method of the filtering workflow. The timesteps to filter may either be specified manually, or determined from the window size and the timesteps within the input files. In this latter case, only timesteps that have the full window size on either side are selected.

Note

If absolute is True, the times must be the same datatype as those the input data. For dates with a calendar, this is likely numpy.datetime64 or cftime.datetime. For abstract times, this may simply be a number.

Parameters:
  • times (Optional[List[float]]) – A list of timesteps at which to run the filtering. If this is omitted, all timesteps that are fully covered by the filtering window are selected.
  • clobber (Optional[bool]) – Whether to overwrite any existing output file with the same name as this experiment. Default behaviour will not clobber an existing output file.
  • absolute (Optional[bool]) – If times is provided, this argument determines whether to interpret them as relative to the first timestep in the input dataset (False, default), or as absolute, following the actual time dimension in the dataset (True).
filter_step(advection_data)

Perform filtering of a single step of advection data.

The Lagrangian-transformed data from advection_step() is high-pass filtered in time, leaving only the signal at the origin point (i.e. the filtered forward and backward advection data is discarded).

Note

If an inertial filter object hasn’t been attached before this function is called, one will automatically be created.

Parameters:advection_data (Dict[str, Tuple[int, dask.array.Array]]) – A dictionary of particle advection data from a single timestep, returned from advection_step().
Returns:
A dictionary mapping sampled
variable names to a 1D dask array containing the filtered data at the specified time. This data is not lazy, as it has already been computed out of the temporary advection data.
Return type:Dict[str, dask.array.Array]
make_meridionally_periodic(width=None)

Mark the domain as meridionally periodic.

This will add a halo to the northern and southern edges of the domain, so that they may cross over during advection without being marked out of bounds. If a particle ends up within the halo after advection, it is reset to the valid portion of the domain.

If the domain has already been marked as meridionally periodic, nothing happens.

Due to the method of resetting particles that end up in the halo, this is incompatible with curvilinear grids.

Parameters:width (Optional[int]) – The width of the halo, defaults to 5 (per parcels). This needs to be less than half the number of points in the grid in the y direction. This may need to be adjusted for small domains, or if particles are still escaping the halo.

Note

This causes the kernel to be recompiled to add another stage which resets particles that end up in the halo to the main domain.

If the kernel has already been recompiled for zonal periodicity, it is again reset to include periodicity in both directions.

make_zonally_periodic(width=None)

Mark the domain as zonally periodic.

This will add a halo to the eastern and western edges of the domain, so that they may cross over during advection without being marked out of bounds. If a particle ends up within the halo after advection, it is reset to the valid portion of the domain.

If the domain has already been marked as zonally periodic, nothing happens.

Due to the method of resetting particles that end up in the halo, this is incompatible with curvilinear grids.

Parameters:width (Optional[int]) – The width of the halo, defaults to 5 (per parcels). This needs to be less than half the number of points in the grid in the x direction. This may need to be adjusted for small domains, or if particles are still escaping the halo.

Note

This causes the kernel to be recompiled to add another stage which resets particles that end up in the halo to the main domain.

If the kernel has already been recompiled for meridional periodicity, it is again reset to include periodicity in both directions.

particleset(time)

Create a ParticleSet initialised at the given time.

Parameters:time (float) – The origin time for forward and backward advection on this ParticleSet.
Returns:
A new ParticleSet
containing a single particle at every gridpoint, initialised at the specified time.
Return type:parcels.particlesets.particlesetsoa.ParticleSetSOA
seed_lat

The 2D grid of seed particle latitudes.

Note

This is determined by set_particle_grid() and seed_subdomain().

Returns:The seed particle latitudes.
Return type:numpy.ndarray
seed_lon

The 2D grid of seed particle longitudes.

Note

This is determined by set_particle_grid() and seed_subdomain().

Returns:The seed particle longitudes.
Return type:numpy.ndarray
seed_subdomain(min_lon=None, max_lon=None, min_lat=None, max_lat=None, skip=None)

Restrict particle seeding to a subdomain.

This uses the full set of available data for advection, but restricts the particle seeding, and therefore data filtering, to specified latitude/longitude.

Points in the output dataset that fall outside this seeding range will not be written, and will thus have a missing value.

Parameters:
  • min_lon (Optional[float]) – The lower bound on longitude for which to seed particles. If not specifed, seed from the western edge of the domain.
  • max_lon (Optional[float]) – The upper bound on longitude for which to seed particles. If not specifed, seed from the easter edge of the domain.
  • min_lat (Optional[float]) – The lower bound on latitude for which to seed particles. If not specifed, seed from the southern edge of the domain.
  • max_lat (Optional[float]) – The upper bound on latitude for which to seed particles. If not specifed, seed from the northern edge of the domain.
  • skip (Optional[int]) – The number of gridpoints to skip from the edge of the domain.
set_output_compression(complevel=None)

Enable compression on variables in the output NetCDF file.

This enables zlib compression on the output file, which can significantly improve filesize at a small expense to computation time.

Parameters:complevel (Optional[int]) – If specified as a value from 1-9, this overrides the default compression level (4 for the netCDF4 library).
set_particle_grid(field)

Set the grid for the sampling particles by choosing a field.

By default, particles are seeded on the gridpoints of the first field in the Parcels FieldSet (usually U velocity). To use another grid, such as a tracer grid, pass the relevant field name to this function. This field name should be in Parcels-space, i.e. the keys in the variables dictionary.

Note

Because we get the particle grid from the Parcels gridset, and changing halos alters the underlying grids, this needs to be called before make_zonally_periodic() or make_meridionally_periodic().

Parameters:field (str) – The name of the field whose grid to use for particles.
filtering.filtering.ParticleFactory(variables, name='SamplingParticle', BaseClass=<class 'parcels.particle.JITParticle'>)

Create a Particle class that samples the specified variables.

The variables that should be sampled will be prepended by var_ as class attributes, in case there are any namespace clashes with existing variables on the base class.

Parameters:
  • variables (List[str]) – A list of variable names which should be sampled.
  • name (str) – The name of the generated particle class.
  • BaseClass (Type[parcels.particle._Particle]) – The base particles class upon which to append the required variables.
Returns:

The new particle class

Return type:

Type[parcels.particle._Particle]

filter

Inertial filter objects.

This definition allows for the definition of inertial filters. These may be as simple as constant frequency, or may vary depending on latitude or even arbitrary conditions like vorticity.

class filtering.filter.Filter(frequency, fs, **kwargs)

The base class for inertial filters.

This holds the filter state, and provides an interface for applying the filter to advected particle data.

Parameters:
  • frequency (Union[float, Tuple[float, float]]) – The low-pass or high-pass cutoff frequency of the filter in [/s], or a pair denoting the band to pass.
  • fs (float) – The sampling frequency of the data over which the filter is applied in [s].
  • **kwargs (Optional) – Additional arguments are passed to the create_filter() method.
apply_filter(data, time_index, min_window=None)

Apply the filter to an array of data.

Parameters:
  • data (dask.array.Array) – An array of (time x particle) of advected particle data. This can be a dask array of lazily-loaded temporary data.
  • time_index (int) – The index along the time dimension corresponding to the central point, to extract after filtering.
  • min_window (Optional[int]) – A minimum window size for considering particles valid for filtering.
Returns:

An array of (particle) of the filtered particle data, restricted to the specified time index.

Return type:

dask.array.Array

static create_filter(frequency, fs, order=4, filter_type='highpass')

Create a filter.

This creates an analogue Butterworth filter with the given frequency and sampling parameters.

Parameters:
  • frequency (float) – The high-pass angular cutoff frequency of the filter in [/s].
  • fs (float) – The sampling frequency of the data in [s].
  • order (Optional[int]) – The filter order, default 4.
  • filter_type (Optional[str]) – The type of filter, one of (“highpass”, “bandpass”, “lowpass”), defaults to “highpass”.
static pad_window(x, centre_index, min_window)

Perform minimum window padding of an array.

Note

This performs in-place modification of x.

Parameters:
  • x (numpy.ndarray) – An array of (time x particle) of particle dat.a
  • centre_index (int) – The index of the seeding time of the particles, to identify the forward and backward advection data.
  • min_window (int) – The minimum window size; particles with at least this many non-NaN datapoints are padded with the last valid value in each direction.
class filtering.filter.FrequencySpaceFilter(frequency, fs)

A filter defined and applied in frequency space.

This may be used, for example, to implement a sharp cutoff filter, without the possible imprecision of representing the cutoff as a time-domain sinc function.

Parameters:
  • frequency (float) – The high-pass cutoff frequency of the filter in [/s].
  • fs (float) – The sampling frequency of the daat over which the filter is applied in [s].
apply_filter(data, time_index, min_window=None)

Apply the filter to an array of data.

class filtering.filter.SpatialFilter(frequencies, fs, **kwargs)

A filter with a programmable, variable frequency.

Example

A Coriolis parameter-dependent filter:

Ω = 7.2921e-5
f = 2*Ω*np.sin(np.deg2rad(ff.seed_lat))
filt = SpatialFilter(f, 1.0 / ff.output_dt)
Parameters:
  • frequencies (numpy.ndarray) – An array with the same number of elements as seeded particles, containing the cutoff frequency to be used for each particle, in [/s].
  • fs (float) – The sampling frequency of the data over which the filter is applied in [s].
  • **kwargs (Optional) – Additional arguments are passed to the create_filter() method.
apply_filter(data, time_index, min_window=None)

Apply the filter to an array of data.

static create_filter(frequencies, fs, order=4, filter_type='highpass')

Create a series of filters.

This creates an analogue Butterworth filter with the given array of frequencies and sampling parameters.

Parameters:
  • frequencies (numpy.ndarray) – The high-pass cutoff frequencies of the filters in [/s].
  • fs (float) – The sampling frequency of the data in [s].
  • order (Optional[int]) – The filter order, default 4.
  • filter_type (Optional[str]) – The type of filter, one of (“highpass”, “lowpass”), defaults to “highpass”. Note that bandpass spatial filters aren’t supported.

file

ParticleFile implementations for receiving particle advection data from OceanParcels.

Parcels defines the ParticleFile class, which has a write method to write a ParticleSet to disk. The frequency at which this is called is determined by the outputdt property.

class filtering.file.LagrangeParticleArray(particleset, outputdt=inf, variables=None)

A ParticleFile for in-memory caching of advected data.

For smaller spatial extents, or sufficient memory, it is easier to work with in-memory arrays to cache advection data.

Important

This requires a bit more management than LagrangeParticleFile: after forward advection, reverse the data with reverse_data() then skip the first output of backward advection (the sampling of initial particle positions) with set_skip().

Parameters:
  • particleset (parcels.particlesets.particlesetsoa.ParticleSetSOA) – The particle set for which advection data will be cached. We use this to get the names and types of sampled variables.
  • outputdt (Optional[float]) – The frequency at which advection data should be saved. If not specified, or infinite, the data will be saved at the first timestep only.
  • variables (Optional[List[parcels.particle.Variable]]) – An explicit subset of variables to output. If not specified, all variables belonging to the particleset’s particletype that are to_write are written.
reverse_data()

Reverse all cached advection data.

This can be used before and after a backward advection to make sure the data is correctly ordered.

set_skip(n)

Skip a number of subsequent output steps.

This is particularly useful to ignore the first advection output, which is the values of particles before the kernel is called, and often contains junk unless an explicit zero-time sampling kernel is used.

Parameters:n (int) – The number of output steps to skip.
write(particleset, time, deleted_only=False)

Write data from a particle set.

This is intended to be called from a particle execution kernel. The frequency of writes is determined by the outputdt attribute on the class.

Particles which have been deleted (due to becoming out of bounds, for example) are masked with NaN.

Parameters:
  • particleset (parcels.particlesets.particlesetsoa.ParticleSetSOA) – Particle set with data to write to disk.
  • time (float) – Timestamp into particle execution at which this write was called.
  • deleted_only (Optional[bool]) – Whether to only write deleted particles (does not do anything, only present for compatibility with the call signature on the parcels version of the class).
class filtering.file.LagrangeParticleFile(particleset, outputdt=inf, variables=None, output_dir='.')

A ParticleFile for on-disk caching in a temporary HDF5 file.

A temporary HDF5 file is used to store advection data. Data is stored in 2D, with all particles contiguous as they appear in their particle set. The time dimension is extendable, and is appended for each write operation. This means that the number of writes does not need to be known ahead-of-time.

Important

Before calling the particle kernel, a group must be created in the file by calling set_group().

Example

Create an instance of the class, and run forward advection on a ParticleSet:

f = LagrangeParticleFile(ps, output_dt)
f.set_group("forward")
ps.execute(kernel, dt=advection_dt, output_file=f)

Note

Advection data is stored to a NamedTemporaryFile that is scoped with the same lifetime as this ParticleFile. This should ensure that upon successful completion, the temporary files are cleaned up, yet they will remain if an error occurs that causes an exception.

Parameters:
  • particleset (parcels.particlesets.particlesetsoa.ParticleSetSOA) – The particle set for which this file should cache advection data on disk. It’s assumed the number of particles contained within the set does not change after initialisation.
  • outputdt (Optional[float]) – The frequency at which advection data should be saved. If not specified, or infinite, the data will be saved at the first timestep only.
  • variables (Optional[List[parcels.particle.Variable]]) – An explicit list subset of particletype variables to output. If not specified, all variables belonging to the particletype that are to_write are written.
  • output_dir (Optional[str]) – The directory in which to place the temporary output file.
data(group)

Return a group from the HDF5 object.

Each variable saved from particle advection is available as a Dataset within the group, as well the time attribute.

Parameters:group (str) – The name of the group to retrieve.
Returns:
The group from the underlying HDF5 file.
If the group hasn’t been initialised with set_group(), it will be empty.
Return type:h5py.Group
set_group(group)

Set the group for subsequent write operations.

This will create the group, and datasets for all variables that will be written by this object, if they do not already exist. Otherwise, the group will simply be selected without overwriting existing data.

Parameters:group (str) – The name of the group.
write(particleset, time, deleted_only=False)

Write data from a particle set.

This is intended to be called from a particle execution kernel. The frequency of writes is determined by the outputdt attribute on the class.

Particles which have been deleted (due to becoming out of bounds, for example) are masked with NaN.

Parameters:
  • particleset (parcels.particlesets.particlesetsoa.ParticleSetSOA) – Particle set with data to write to disk.
  • time (float) – Timestamp into particle execution at which this write was called.
  • deleted_only (Optional[bool]) – Whether to only write deleted particles (does not do anything, only present for compatibility with the call signature on the parcels version of the class).

analysis

Analysis routines for Lagrangian particle data.

These routines take an already-configured ‘LagrangeFilter’ and produce diagnostic output.

filtering.analysis.power_spectrum(filter, time)

Compute the mean power spectrum over all particles at a given time.

This routine gives the power spectrum (power spectral density) for each of the sampled variables within filter, as a mean over all particles. It will run a single advection step at the specified time. The resulting dictionary contains a freq item, with the FFT frequency bins for the output spectra.

Parameters:
  • filter (filtering.LagrangeFilter) – The pre-configured filter object to use for running the analysis.
  • time (float) – The time at which to perform the analysis.
Returns:

A dictionary of power spectra for each of

the sampled variables on the filter.

Return type:

Dict[str, numpy.ndarray]