API¶
filtering¶
The main lagrangian-filtering module.
This module contains the crucial datastructure for lagrangian-filtering, LagrangeFilter. See project documentation for examples on how to construct a filtering workflow using this library.
-
class
filtering.filtering.
LagrangeFilter
(name, filenames_or_dataset, variables, dimensions, sample_variables, c_grid=False, uneven_window=False, window_size=None, minimum_window=None, highpass_frequency=5e-05, advection_dt=datetime.timedelta(seconds=300), **kwargs)¶ The main class for a Lagrangian filtering workflow.
The workflow is set up using the input files and the filtering parameters. Filtering can be performed all at once, or on individual time levels.
Data must contain horizontal velocity components U and V to perform the Lagrangian frame transformation. Any variables that should be filtered must be specified in the sample_variables list (this includes velocity).
Note
We use the OceanParcels convention for variable names. This means that
U
,V
,lon
,lat
,time
anddepth
are canonical names for properties required for particle advection. The mapping from the actual variable name in your data files to these canonical names is contained in the variables and dimensions dictionaries. When specifying filenames or sample_variables, the canonical names must be used, however any other variables may use whatever name you would like.Once the LagrangeFilter has been constructed, you may call it as a function to perform the filtering workflow. See
filter()
for documentation.Example
A straightforward filtering workflow:
f = LagrangeFilter( name, filenames, variables, dimensions, sample_variables, ) f()
Would result in a new file with the given name and an appropriate extension containing the filtered data for each of the sample_variables.
Parameters: - name (str) – The name of the workflow
- filenames_or_dataset (Union[Dict[str, str], xarray.Dataset]) –
Either a mapping from data variable names to the files containing the data, or an xarray Dataset containing the input data.
Filenames can contain globs if the data is spread across multiple files.
- variables (Dict[str, str]) – A mapping from canonical variable names to the variable names in your data files.
- dimensions (Dict[str, str]) – A mapping from canonical dimension names to the dimension names in your data files.
- sample_variables (List[str]) – A list of variable names that should be sampled into the Lagrangian frame of reference and filtered.
- mesh (Optional[str]) – The OceanParcels mesh type, either “flat” or “spherical”. “flat” meshes are expected to have dimensions in metres, and “spherical” meshes in degrees.
- c_grid (Optional[bool]) – Whether to interpolate velocity components on an Arakawa C grid (defaults to no).
- indices (Optional[Dict[str, List[int]]]) – An optional dictionary specifying the indices to which a certain dimension should be restricted.
- uneven_window (Optional[bool]) – Whether to allow different lengths for the forward and backward advection phases.
- window_size (Optional[float]) – The nominal length of the both the forward and backward advection windows, in seconds. A longer window may better capture the low-frequency signal to be removed.
- minimum_window (Optional[float]) – If provided, particles will be filtered if they successfully advected for at least this long in total. This can increase the yield of filtered data by salvaging particles that would otherwise be considered dead.
- highpass_frequency (Optional[float]) – The 3dB cutoff frequency for filtering, below which spectral components will be attenuated. This should be an angular frequency, in [rad/s].
- advection_dt (Optional[datetime.timedelta]) – The timestep to use for advection. May need to be adjusted depending on the resolution/frequency of your data.
- **kwargs (Optional) – Additional arguments are passed to the Parcels FieldSet constructor.
-
advection_step
(time, output_time=False)¶ Perform forward-backward advection at a single point in time.
This routine is responsible for creating a new ParticleSet at the given time, and performing the forward and backward advection steps in the Lagrangian transformation.
Parameters: Note
If
output_time
is True, the output object will not be compatible with the default filtering workflow,filter_step()
!If
output_dt
has not been set on the filtering object, it will default to the difference between successive time steps in the first grid defined in the parcels FieldSet. This may be a concern if using data which has been sampled at different frequencies in the input data files.Returns: - A dictionary of the advection
- data, mapping variable names to a pair. The first element is the index of the sampled timestep in the data, and the second element is a lazy dask array concatenating the forward and backward advection data.
Return type: Dict[str, Tuple[int, dask.array.Array]]
-
create_out
(clobber=False)¶ Create a netCDF dataset to hold filtered output.
Here we create a new
netCDF4.Dataset
for filtered output. For each sampled variable in the input files, a corresponding variable in created in the output file, with the dimensions of the output grid.Parameters: clobber (Optional[bool]) – Whether to overwrite any existing output file with the same name as this experiment. Default behaviour will not clobber an existing output file. Returns: - A tuple containing a single
- dataset that will hold all filtered output and the name of the time dimension in the output file.
Return type: Tuple[ netCDF4.Dataset
, str]
-
filter
(*args, **kwargs)¶ Run the filtering process on this experiment.
Note
Instead of f.filter(…), you can call f(…) directly.
This is main method of the filtering workflow. The timesteps to filter may either be specified manually, or determined from the window size and the timesteps within the input files. In this latter case, only timesteps that have the full window size on either side are selected.
Note
If absolute is True, the times must be the same datatype as those the input data. For dates with a calendar, this is likely
numpy.datetime64
orcftime.datetime
. For abstract times, this may simply be a number.Parameters: - times (Optional[List[float]]) – A list of timesteps at which to run the filtering. If this is omitted, all timesteps that are fully covered by the filtering window are selected.
- clobber (Optional[bool]) – Whether to overwrite any existing output file with the same name as this experiment. Default behaviour will not clobber an existing output file.
- absolute (Optional[bool]) – If times is provided, this argument determines whether to interpret them as relative to the first timestep in the input dataset (False, default), or as absolute, following the actual time dimension in the dataset (True).
-
filter_step
(advection_data)¶ Perform filtering of a single step of advection data.
The Lagrangian-transformed data from
advection_step()
is high-pass filtered in time, leaving only the signal at the origin point (i.e. the filtered forward and backward advection data is discarded).Note
If an inertial filter object hasn’t been attached before this function is called, one will automatically be created.
Parameters: advection_data (Dict[str, Tuple[int, dask.array.Array]]) – A dictionary of particle advection data from a single timestep, returned from advection_step()
.Returns: - A dictionary mapping sampled
- variable names to a 1D dask array containing the filtered data at the specified time. This data is not lazy, as it has already been computed out of the temporary advection data.
Return type: Dict[str, dask.array.Array]
-
make_meridionally_periodic
(width=None)¶ Mark the domain as meridionally periodic.
This will add a halo to the northern and southern edges of the domain, so that they may cross over during advection without being marked out of bounds. If a particle ends up within the halo after advection, it is reset to the valid portion of the domain.
If the domain has already been marked as meridionally periodic, nothing happens.
Due to the method of resetting particles that end up in the halo, this is incompatible with curvilinear grids.
Parameters: width (Optional[int]) – The width of the halo, defaults to 5 (per parcels). This needs to be less than half the number of points in the grid in the y direction. This may need to be adjusted for small domains, or if particles are still escaping the halo. Note
This causes the kernel to be recompiled to add another stage which resets particles that end up in the halo to the main domain.
If the kernel has already been recompiled for zonal periodicity, it is again reset to include periodicity in both directions.
-
make_zonally_periodic
(width=None)¶ Mark the domain as zonally periodic.
This will add a halo to the eastern and western edges of the domain, so that they may cross over during advection without being marked out of bounds. If a particle ends up within the halo after advection, it is reset to the valid portion of the domain.
If the domain has already been marked as zonally periodic, nothing happens.
Due to the method of resetting particles that end up in the halo, this is incompatible with curvilinear grids.
Parameters: width (Optional[int]) – The width of the halo, defaults to 5 (per parcels). This needs to be less than half the number of points in the grid in the x direction. This may need to be adjusted for small domains, or if particles are still escaping the halo. Note
This causes the kernel to be recompiled to add another stage which resets particles that end up in the halo to the main domain.
If the kernel has already been recompiled for meridional periodicity, it is again reset to include periodicity in both directions.
-
particleset
(time)¶ Create a ParticleSet initialised at the given time.
Parameters: time (float) – The origin time for forward and backward advection on this ParticleSet. Returns: - A new ParticleSet
- containing a single particle at every gridpoint, initialised at the specified time.
Return type: parcels.particlesets.particlesetsoa.ParticleSetSOA
-
seed_lat
¶ The 2D grid of seed particle latitudes.
Note
This is determined by
set_particle_grid()
andseed_subdomain()
.Returns: The seed particle latitudes. Return type: numpy.ndarray
-
seed_lon
¶ The 2D grid of seed particle longitudes.
Note
This is determined by
set_particle_grid()
andseed_subdomain()
.Returns: The seed particle longitudes. Return type: numpy.ndarray
-
seed_subdomain
(min_lon=None, max_lon=None, min_lat=None, max_lat=None, skip=None)¶ Restrict particle seeding to a subdomain.
This uses the full set of available data for advection, but restricts the particle seeding, and therefore data filtering, to specified latitude/longitude.
Points in the output dataset that fall outside this seeding range will not be written, and will thus have a missing value.
Parameters: - min_lon (Optional[float]) – The lower bound on longitude for which to seed particles. If not specifed, seed from the western edge of the domain.
- max_lon (Optional[float]) – The upper bound on longitude for which to seed particles. If not specifed, seed from the easter edge of the domain.
- min_lat (Optional[float]) – The lower bound on latitude for which to seed particles. If not specifed, seed from the southern edge of the domain.
- max_lat (Optional[float]) – The upper bound on latitude for which to seed particles. If not specifed, seed from the northern edge of the domain.
- skip (Optional[int]) – The number of gridpoints to skip from the edge of the domain.
-
set_output_compression
(complevel=None)¶ Enable compression on variables in the output NetCDF file.
This enables zlib compression on the output file, which can significantly improve filesize at a small expense to computation time.
Parameters: complevel (Optional[int]) – If specified as a value from 1-9, this overrides the default compression level (4 for the netCDF4 library).
-
set_particle_grid
(field)¶ Set the grid for the sampling particles by choosing a field.
By default, particles are seeded on the gridpoints of the first field in the Parcels FieldSet (usually U velocity). To use another grid, such as a tracer grid, pass the relevant field name to this function. This field name should be in Parcels-space, i.e. the keys in the
variables
dictionary.Note
Because we get the particle grid from the Parcels gridset, and changing halos alters the underlying grids, this needs to be called before
make_zonally_periodic()
ormake_meridionally_periodic()
.Parameters: field (str) – The name of the field whose grid to use for particles.
-
filtering.filtering.
ParticleFactory
(variables, name='SamplingParticle', BaseClass=<class 'parcels.particle.JITParticle'>)¶ Create a Particle class that samples the specified variables.
The variables that should be sampled will be prepended by
var_
as class attributes, in case there are any namespace clashes with existing variables on the base class.Parameters: Returns: The new particle class
Return type: Type[
parcels.particle._Particle
]
filter¶
Inertial filter objects.
This definition allows for the definition of inertial filters. These may be as simple as constant frequency, or may vary depending on latitude or even arbitrary conditions like vorticity.
-
class
filtering.filter.
Filter
(frequency, fs, **kwargs)¶ The base class for inertial filters.
This holds the filter state, and provides an interface for applying the filter to advected particle data.
Parameters: - frequency (Union[float, Tuple[float, float]]) – The low-pass or high-pass cutoff frequency of the filter in [/s], or a pair denoting the band to pass.
- fs (float) – The sampling frequency of the data over which the filter is applied in [s].
- **kwargs (Optional) – Additional arguments are passed to the
create_filter()
method.
-
apply_filter
(data, time_index, min_window=None)¶ Apply the filter to an array of data.
Parameters: - data (dask.array.Array) – An array of (time x particle) of advected particle data. This can be a dask array of lazily-loaded temporary data.
- time_index (int) – The index along the time dimension corresponding to the central point, to extract after filtering.
- min_window (Optional[int]) – A minimum window size for considering particles valid for filtering.
Returns: An array of (particle) of the filtered particle data, restricted to the specified time index.
Return type:
-
static
create_filter
(frequency, fs, order=4, filter_type='highpass')¶ Create a filter.
This creates an analogue Butterworth filter with the given frequency and sampling parameters.
Parameters: - frequency (float) – The high-pass angular cutoff frequency of the filter in [/s].
- fs (float) – The sampling frequency of the data in [s].
- order (Optional[int]) – The filter order, default 4.
- filter_type (Optional[str]) – The type of filter, one of (“highpass”, “bandpass”, “lowpass”), defaults to “highpass”.
-
static
pad_window
(x, centre_index, min_window)¶ Perform minimum window padding of an array.
Note
This performs in-place modification of
x
.Parameters: - x (numpy.ndarray) – An array of (time x particle) of particle dat.a
- centre_index (int) – The index of the seeding time of the particles, to identify the forward and backward advection data.
- min_window (int) – The minimum window size; particles with at least this many non-NaN datapoints are padded with the last valid value in each direction.
-
class
filtering.filter.
FrequencySpaceFilter
(frequency, fs)¶ A filter defined and applied in frequency space.
This may be used, for example, to implement a sharp cutoff filter, without the possible imprecision of representing the cutoff as a time-domain sinc function.
Parameters: -
apply_filter
(data, time_index, min_window=None)¶ Apply the filter to an array of data.
-
-
class
filtering.filter.
SpatialFilter
(frequencies, fs, **kwargs)¶ A filter with a programmable, variable frequency.
Example
A Coriolis parameter-dependent filter:
Ω = 7.2921e-5 f = 2*Ω*np.sin(np.deg2rad(ff.seed_lat)) filt = SpatialFilter(f, 1.0 / ff.output_dt)
Parameters: - frequencies (numpy.ndarray) – An array with the same number of elements as seeded particles, containing the cutoff frequency to be used for each particle, in [/s].
- fs (float) – The sampling frequency of the data over which the filter is applied in [s].
- **kwargs (Optional) – Additional arguments are passed to the
create_filter()
method.
-
apply_filter
(data, time_index, min_window=None)¶ Apply the filter to an array of data.
-
static
create_filter
(frequencies, fs, order=4, filter_type='highpass')¶ Create a series of filters.
This creates an analogue Butterworth filter with the given array of frequencies and sampling parameters.
Parameters: - frequencies (numpy.ndarray) – The high-pass cutoff frequencies of the filters in [/s].
- fs (float) – The sampling frequency of the data in [s].
- order (Optional[int]) – The filter order, default 4.
- filter_type (Optional[str]) – The type of filter, one of (“highpass”, “lowpass”), defaults to “highpass”. Note that bandpass spatial filters aren’t supported.
file¶
ParticleFile implementations for receiving particle advection data from OceanParcels.
Parcels defines the ParticleFile class, which has a write
method
to write a ParticleSet to disk. The frequency at which this is
called is determined by the outputdt
property.
-
class
filtering.file.
LagrangeParticleArray
(particleset, outputdt=inf, variables=None)¶ A ParticleFile for in-memory caching of advected data.
For smaller spatial extents, or sufficient memory, it is easier to work with in-memory arrays to cache advection data.
Important
This requires a bit more management than
LagrangeParticleFile
: after forward advection, reverse the data withreverse_data()
then skip the first output of backward advection (the sampling of initial particle positions) withset_skip()
.Parameters: - particleset (parcels.particlesets.particlesetsoa.ParticleSetSOA) – The particle set for which advection data will be cached. We use this to get the names and types of sampled variables.
- outputdt (Optional[float]) – The frequency at which advection data should be saved. If not specified, or infinite, the data will be saved at the first timestep only.
- variables (Optional[List[parcels.particle.Variable]]) – An explicit
subset of variables to output. If not specified, all
variables belonging to the particleset’s particletype that
are
to_write
are written.
-
reverse_data
()¶ Reverse all cached advection data.
This can be used before and after a backward advection to make sure the data is correctly ordered.
-
set_skip
(n)¶ Skip a number of subsequent output steps.
This is particularly useful to ignore the first advection output, which is the values of particles before the kernel is called, and often contains junk unless an explicit zero-time sampling kernel is used.
Parameters: n (int) – The number of output steps to skip.
-
write
(particleset, time, deleted_only=False)¶ Write data from a particle set.
This is intended to be called from a particle execution kernel. The frequency of writes is determined by the
outputdt
attribute on the class.Particles which have been deleted (due to becoming out of bounds, for example) are masked with NaN.
Parameters: - particleset (parcels.particlesets.particlesetsoa.ParticleSetSOA) – Particle set with data to write to disk.
- time (float) – Timestamp into particle execution at which this write was called.
- deleted_only (Optional[bool]) – Whether to only write deleted particles (does not do anything, only present for compatibility with the call signature on the parcels version of the class).
-
class
filtering.file.
LagrangeParticleFile
(particleset, outputdt=inf, variables=None, output_dir='.')¶ A ParticleFile for on-disk caching in a temporary HDF5 file.
A temporary HDF5 file is used to store advection data. Data is stored in 2D, with all particles contiguous as they appear in their particle set. The time dimension is extendable, and is appended for each write operation. This means that the number of writes does not need to be known ahead-of-time.
Important
Before calling the particle kernel, a group must be created in the file by calling
set_group()
.Example
Create an instance of the class, and run forward advection on a ParticleSet:
f = LagrangeParticleFile(ps, output_dt) f.set_group("forward") ps.execute(kernel, dt=advection_dt, output_file=f)
Note
Advection data is stored to a NamedTemporaryFile that is scoped with the same lifetime as this ParticleFile. This should ensure that upon successful completion, the temporary files are cleaned up, yet they will remain if an error occurs that causes an exception.
Parameters: - particleset (parcels.particlesets.particlesetsoa.ParticleSetSOA) – The particle set for which this file should cache advection data on disk. It’s assumed the number of particles contained within the set does not change after initialisation.
- outputdt (Optional[float]) – The frequency at which advection data should be saved. If not specified, or infinite, the data will be saved at the first timestep only.
- variables (Optional[List[parcels.particle.Variable]]) – An explicit list subset of
particletype variables to output. If not specified, all variables
belonging to the particletype that are
to_write
are written. - output_dir (Optional[str]) – The directory in which to place the temporary output file.
-
data
(group)¶ Return a group from the HDF5 object.
Each variable saved from particle advection is available as a Dataset within the group, as well the
time
attribute.Parameters: group (str) – The name of the group to retrieve. Returns: - The group from the underlying HDF5 file.
- If the group hasn’t been initialised with
set_group()
, it will be empty.
Return type: h5py.Group
-
set_group
(group)¶ Set the group for subsequent write operations.
This will create the group, and datasets for all variables that will be written by this object, if they do not already exist. Otherwise, the group will simply be selected without overwriting existing data.
Parameters: group (str) – The name of the group.
-
write
(particleset, time, deleted_only=False)¶ Write data from a particle set.
This is intended to be called from a particle execution kernel. The frequency of writes is determined by the
outputdt
attribute on the class.Particles which have been deleted (due to becoming out of bounds, for example) are masked with NaN.
Parameters: - particleset (parcels.particlesets.particlesetsoa.ParticleSetSOA) – Particle set with data to write to disk.
- time (float) – Timestamp into particle execution at which this write was called.
- deleted_only (Optional[bool]) – Whether to only write deleted particles (does not do anything, only present for compatibility with the call signature on the parcels version of the class).
analysis¶
Analysis routines for Lagrangian particle data.
These routines take an already-configured ‘LagrangeFilter’ and produce diagnostic output.
-
filtering.analysis.
power_spectrum
(filter, time)¶ Compute the mean power spectrum over all particles at a given time.
This routine gives the power spectrum (power spectral density) for each of the sampled variables within filter, as a mean over all particles. It will run a single advection step at the specified time. The resulting dictionary contains a freq item, with the FFT frequency bins for the output spectra.
Parameters: - filter (filtering.LagrangeFilter) – The pre-configured filter object to use for running the analysis.
- time (float) – The time at which to perform the analysis.
Returns: - A dictionary of power spectra for each of
the sampled variables on the filter.
Return type: Dict[str, numpy.ndarray]