Developer’s Guide
Installing HoloPy for developers
If you are going to hack on HoloPy, you will probably want to compile the scattering extensions.
First, download or clone the latest version of HoloPy from GitHub at https://github.com/manoharan-lab/holopy.
To gather all the dependencies needed to build HoloPy, the simplest approach is to use the included environment.yml file to make a new conda environment:
conda env create -f ./environment.yml
Then activate the new environment:
conda activate holopy-devel
Now you can build and install the package. Let’s say you downloaded or cloned
HoloPy to /home/me/holopy. Then open a terminal, cd to
/home/me/holopy and run:
python -m pip install --no-build-isolation --editable .
This will build the package and scattering extensions, and it will install a stub in your current environment that loads the package from the build directory. If you change the code and re-import holopy, it will be automatically rebuilt by meson.
Notes: Because the environment file includes the gfortran package, which
packages the compilers needed, this procedure should work on all platforms. You
should not need to install any compilers prior to building the package. But if
this procedure doesn’t work, or you find something else that does, please let
us know so that we can
improve these instructions.
How HoloPy stores data
Images in HoloPy are stored in the format of xarray DataArrays. Spatial
information is tracked in the DataArray’s dims and coords fields
according to the HoloPy Coordinate System. Additional dimensions are
sometimes specified to account for different z-slices, times, or field
components, for example. Optical parameters like refractive index and
illumination wavelength are stored in the DataArray’s attrs field.
The detector_grid() function simply creates a 2D image composed entirely
of zeros. In contrast, the detector_points() function creates a DataArray
with a single dimension named ‘point’. Spatial coordinates (in either Cartesian
or spherical form) track this dimension, so that each data value in the array
has its own set of coordinates unrelated to its neighbours. This type of
one-dimensional organization is sometimes used for 2D images as well. Inference
and fitting methods typically use only a subset of points in an image (see
[Dimiduk2014]), and so it makes sense for them to keep track of lists of
location coordinates instead of a grid. Furthermore, HoloPy’s scattering
functions accept coordinates as a 3xN array. In both
of these cases, the 2D image is flattened into a 1D DataArray like that created
by detector_points(). In that case, the single dimension is named ‘flat’ instead
of ‘point’. HoloPy treats arrays with these two named dimensions identically,
except that the ‘flat’ dimension can be unstacked to restore a 2D image or 3D
volume.
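The relationship between a 2D image and its flattened form can be illustrated with plain NumPy. This is a sketch under stated assumptions: HoloPy itself does the flattening with xarray (stacking the spatial dimensions into ‘flat’), whereas the helpers `flatten_image` and `unflatten_image` below are hypothetical and only mimic the idea of a 3xN coordinate array that travels alongside a 1D array of values.

```python
import numpy as np

def flatten_image(values, x, y, z=0.0):
    """Flatten a 2D image into 1D values plus a 3xN coordinate array.

    `values` has shape (len(x), len(y)); `z` is a single detector plane.
    (Hypothetical helper, not part of HoloPy.)
    """
    xx, yy = np.meshgrid(x, y, indexing="ij")  # both have shape (len(x), len(y))
    zz = np.full_like(xx, z, dtype=float)
    coords = np.vstack([xx.ravel(), yy.ravel(), zz.ravel()])  # shape (3, N)
    return values.ravel(), coords

def unflatten_image(flat_values, shape):
    """Restore the 2D image; this works because ravel() kept row-major order."""
    return flat_values.reshape(shape)

x = np.arange(3) * 0.1  # pixel spacing of 0.1 (arbitrary units)
y = np.arange(4) * 0.1
image = np.arange(12, dtype=float).reshape(3, 4)
flat, coords = flatten_image(image, x, y)
print(coords.shape)  # (3, 12)
restored = unflatten_image(flat, image.shape)
```

Each column of `coords` belongs to the value at the same index of `flat`, so a fitting routine can drop or subsample points freely while the unflattening step still only needs the original shape.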
HoloPy sometimes stores smaller DataArrays in attrs,
which can lead to problems when saving data to a file. When saving a DataArray
to file, HoloPy converts any DataArrays in attrs to numpy arrays, and keeps
track of their dimension names separately. HoloPy’s save_image() writes a
yaml dump of attrs (along with spacing information) to the
imagedescription field of .tif file metadata.
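A minimal sketch of the attrs conversion, assuming only that array-valued attributes expose `.values` and `.dims` (as xarray DataArrays do). The function name `serializable_attrs` and the `_dims` key suffix are hypothetical, and the sketch emits JSON from the standard library rather than the YAML that save_image() actually writes.

```python
import json
from types import SimpleNamespace

def serializable_attrs(attrs):
    """Return a copy of `attrs` in which DataArray-like values become plain
    lists, with their dimension names recorded under a separate key.
    (Hypothetical helper; HoloPy's own key names may differ.)"""
    out = {}
    for key, value in attrs.items():
        if hasattr(value, "dims") and hasattr(value, "values"):
            # Keep the raw numbers and remember the dimension names separately.
            values = value.values
            out[key] = values.tolist() if hasattr(values, "tolist") else list(values)
            out[key + "_dims"] = list(value.dims)
        else:
            out[key] = value
    return out

# Stand-in for a small DataArray stored in attrs (example keys only).
polarization = SimpleNamespace(values=[1.0, 0.0], dims=("vector",))
attrs = {"medium_index": 1.33, "illum_wavelen": 0.66, "illum_polarization": polarization}
text = json.dumps(serializable_attrs(attrs))
```

Recording the dimension names next to the values is what makes it possible to rebuild labelled arrays when the file is loaded again.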
MCMC sampling (see Bayesian Parameter Estimation) returns a lot of information, which is
stored in the form of a SamplingResult object. This object stores the
model and EmceeStrategy that were used in the inference calculation as
attributes. An additional attribute named dataset is an xarray Dataset that
contains both the data used in the inference calculation, as well as the raw
output. The parameter values at each step of the sampling chain and the
calculated log-probabilities at each step are stored here under the samples
and lnprobs namespaces.
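To make the layout concrete, here is a NumPy sketch of how a chain of samples and the matching log-probabilities can be used to pick out the highest-probability parameter set. The array shapes (walker, step, parameter), the parameter names, and the helper `best_sample` are assumptions for illustration; in a real SamplingResult these arrays live in the dataset attribute, and the exact dimension names may differ.

```python
import numpy as np

def best_sample(samples, lnprobs, names):
    """Return the parameter set with the highest log-probability.

    samples: array of shape (n_walkers, n_steps, n_params)
    lnprobs: array of shape (n_walkers, n_steps)
    names:   parameter names, one per entry of the last axis of `samples`
    """
    walker, step = np.unravel_index(np.argmax(lnprobs), lnprobs.shape)
    return dict(zip(names, samples[walker, step]))

rng = np.random.default_rng(0)
samples = rng.normal(size=(4, 10, 2))   # 4 walkers, 10 steps, 2 parameters
lnprobs = -np.sum(samples**2, axis=-1)  # toy log-probability
best = best_sample(samples, lnprobs, ["n", "r"])
```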
Adding a new scattering theory
Adding a new scattering theory is relatively straightforward. You just need to define a new scattering theory class and implement one or two methods to compute the raw scattering values:
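from holopy.scattering.theory.scatteringtheory import ScatteringTheory

class YourTheory(ScatteringTheory):
    def can_handle(self, scatterer):
        # Return True if this theory can compute scattering from `scatterer`
        raise NotImplementedError

    def raw_fields(self, positions, scatterer, medium_wavevec, medium_index, illum_polarization):
        # Your code here
        raise NotImplementedError

    def raw_scat_matrs(self, scatterer, pos, medium_wavevec, medium_index):
        # Your code here
        raise NotImplementedError

    def raw_cross_sections(self, scatterer, medium_wavevec, medium_index, illum_polarization):
        # Your code here
        raise NotImplementedError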
You can get away with defining only one of raw_scat_matrs or
raw_fields if you only want holograms, fields, or intensities. If
you want scattering matrices you will need to implement
raw_scat_matrs, and if you want cross sections, you will need to
implement raw_cross_sections. We separate out raw_fields from
raw_scat_matrs to allow for faster fields calculation for specific
cases, such as the Mie, MieLens, and Multisphere theories (and you might
want to do so for your theory as well); the base
ScatteringTheory class calculates the fields from the
scattering matrices by default.
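The fallback described above, where fields are derived from scattering matrices unless a subclass supplies its own raw_fields, is an ordinary template-method pattern. The sketch below uses hypothetical stand-in classes and deliberately toy arithmetic in place of the real relation between scattering matrices and fields; it only shows which method ends up being called.

```python
class BaseTheory:
    """Hypothetical stand-in for ScatteringTheory's dispatch logic."""

    def raw_fields(self, positions):
        # Default path: build fields from scattering matrices.
        matrices = self.raw_scat_matrs(positions)
        return [self._fields_from_matrix(m) for m in matrices]

    def raw_scat_matrs(self, positions):
        raise NotImplementedError("implement raw_scat_matrs or raw_fields")

    @staticmethod
    def _fields_from_matrix(matrix):
        # Toy placeholder for the real matrix-to-field conversion.
        return sum(matrix)


class MatrixOnlyTheory(BaseTheory):
    # Implements only scattering matrices; fields come from the base class.
    def raw_scat_matrs(self, positions):
        return [[p, 2 * p] for p in positions]


class FastFieldsTheory(MatrixOnlyTheory):
    # Overrides raw_fields with a (pretend) faster direct calculation.
    def raw_fields(self, positions):
        return [3 * p for p in positions]
```

Both subclasses give the same fields here; the point is that FastFieldsTheory never calls raw_scat_matrs on the fields path, which is where a specialized theory saves time.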
You can look at the Mie theory in HoloPy for an example of calling Fortran functions to compute scattering (C functions will look similar from the Python side) or DDA for an example of calling out to an external command line tool by generating files and reading output files.
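For the external-tool route, the general pattern is to write an input file into a scratch directory, run the program there, and then parse the file it produces. This is a generic standard-library sketch, not HoloPy's DDA code; the file names input.txt and output.txt and the function `run_external_tool` are hypothetical, and the "tool" in the example is just a one-line Python script.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_external_tool(command, input_text, output_name="output.txt"):
    """Run `command` in a temporary directory containing input.txt and
    return the contents of `output_name` written by the tool.
    (Hypothetical helper illustrating the file-based pattern.)"""
    with tempfile.TemporaryDirectory() as workdir:
        Path(workdir, "input.txt").write_text(input_text)
        # check=True raises CalledProcessError if the tool exits with an error.
        subprocess.run(command, cwd=workdir, check=True)
        return Path(workdir, output_name).read_text()

# Stand-in "tool": doubles each number found in input.txt.
script = (
    "nums = open('input.txt').read().split();"
    "open('output.txt', 'w').write(' '.join(str(2 * float(n)) for n in nums))"
)
result = run_external_tool([sys.executable, "-c", script], "1 2.5")
```

Using a fresh temporary directory per call keeps concurrent calculations from overwriting each other's input and output files.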
If you want to fit parameters in your scattering theory, you also need
to define a class attribute parameter_names that contains the fittable
attributes of the scattering theory. Once you do this, fitting should
work natively with your new scattering theory: you should be able to
specify the parameters as a prior.Prior object and HoloPy’s
inference Model will auto-detect them as fittable parameters.
For an example of this, see the Lens, MieLens, or
AberratedMieLens classes.
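A minimal sketch of what declaring parameter_names makes possible: code that inspects an object can find which of its fittable attributes were given priors rather than fixed values. The `Prior` and `LensTheory` classes and the `find_fittable` function are hypothetical stand-ins, not HoloPy's prior.Prior or its inference Model.

```python
class Prior:
    """Hypothetical stand-in for a prior distribution."""
    def __init__(self, guess):
        self.guess = guess


class LensTheory:
    # Names of attributes that may be fitted.
    parameter_names = ("lens_angle",)

    def __init__(self, lens_angle, n_quad_points=100):
        self.lens_angle = lens_angle
        self.n_quad_points = n_quad_points  # not listed, so never fitted


def find_fittable(theory):
    """Return {name: prior} for listed attributes that hold a Prior."""
    return {
        name: getattr(theory, name)
        for name in theory.parameter_names
        if isinstance(getattr(theory, name), Prior)
    }
```

Passing a plain number keeps an attribute fixed, while passing a Prior marks it for fitting, without the theory class needing any fitting-specific code.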
Adding a new inference model
To perform inference, you need a noise model. You can make a new noise model by
inheriting from NoiseModel. This class
has all the machinery to compute likelihoods of observing data given some set of
parameters and assuming Gaussian noise.
To implement a new model, you just need to implement one function: forward.
This function receives a dictionary of parameter values and a data shape schema
(defined by detector_grid(), for example) and must return simulated
data of the specified shape. See the _forward function in
AlphaModel for an example of how to do
this.
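As a toy illustration of the forward contract (parameters in as a dictionary, simulated data out in the requested shape), the hypothetical model below returns a uniform background plus a Gaussian spot. It takes a plain shape tuple instead of a HoloPy detector schema and does not subclass NoiseModel, so treat it as a sketch of the interface only.

```python
import numpy as np

class ToySpotModel:
    """Hypothetical model: background plus a Gaussian spot centred in the image."""

    def forward(self, pars, shape):
        rows, cols = np.indices(shape)
        r0, c0 = (shape[0] - 1) / 2, (shape[1] - 1) / 2
        r2 = (rows - r0) ** 2 + (cols - c0) ** 2
        spot = pars["amplitude"] * np.exp(-r2 / (2 * pars["width"] ** 2))
        return pars["background"] + spot
```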
If you want to use a noise model other than Gaussian, you may need to override _lnlike
and define the probability of the data given your uncertainty. You can use the _lnlike method
of NoiseModel as a reference.
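For reference, the Gaussian log-likelihood and one non-Gaussian alternative (Laplace noise, which penalizes outlier pixels less heavily) can be written as below. These are textbook formulas in a standalone NumPy sketch; the function names are hypothetical, and the signature and normalization used by NoiseModel._lnlike may differ.

```python
import numpy as np

def gaussian_lnlike(data, model, sigma):
    """log L for independent Gaussian noise with standard deviation sigma."""
    resid = (data - model) / sigma
    return -0.5 * np.sum(resid ** 2) - resid.size * np.log(sigma * np.sqrt(2 * np.pi))

def laplace_lnlike(data, model, scale):
    """log L for independent Laplace noise, p(r) = exp(-|r|/b) / (2b)."""
    resid = np.abs(data - model)
    return -np.sum(resid) / scale - resid.size * np.log(2 * scale)
```

The constant terms do not change which parameters maximize the likelihood, but they matter if the noise scale itself is being fitted.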
Running tests
HoloPy comes with a suite of tests that ensure everything has been built
correctly and that it can perform all of the calculations it is designed to do.
To run these tests, navigate to the root of the source tree (e.g.,
/home/me/holopy) and run
python run_tests.py
or you can just run pytest (or pytest -v for verbose output) directly
from the command line. It will automatically discover all the tests and run
them.
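When adding tests, note that pytest's default discovery collects files named test_*.py (or *_test.py) and, inside them, functions whose names start with test. A minimal test in that style, using NumPy's testing helpers, might look like this; the file path in the comment and the function under test are hypothetical.

```python
# Hypothetical file: holopy/scattering/tests/test_example.py
import numpy as np
from numpy.testing import assert_allclose

def normalize(image):
    """Toy function under test: scale an image so its mean is 1."""
    return image / image.mean()

def test_normalize_has_unit_mean():
    image = np.array([[1.0, 2.0], [3.0, 4.0]])
    assert_allclose(normalize(image).mean(), 1.0)
```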
Note that you can download the full test holograms by installing git lfs and
doing:
git lfs pull
You don’t need to do this, but it can be helpful to inspect differences between the test calculations and the expected holograms if the tests fail.
Notes on the build system
We use the meson build system with the
meson-python extension to
build the scattering extensions and install the package. Previously we used
numpy.distutils, but that module has been deprecated. There are a few things
you need to know about the build system in order to ensure that your changes to
HoloPy will work properly.
1. All meson builds are done “out of tree”, meaning that compiled extensions are not installed into the same directory as their sources. However, HoloPy has several extensions that must be installed in a way that makes it possible to do (for example)
import holopy.scattering.theory.mie_f.scsmfo_min
To enable this functionality, we need to tell meson to copy the extensions to
the appropriate point in the installation tree, and to copy the python files
too. In the subdirectories, you’ll see meson.build files that call
install_sources(), which installs the .py files of HoloPy, and that call
extension_module() with a subdir: keyword argument, which tells meson where to install the
compiled scattering extensions. All files have to be specified, so if you add a
Python file somewhere, you need to update the relevant meson.build file to
include it in the installation. Having to specify all the files is a big change
from how we did things with numpy.distutils, but it’s supposed to make the
build process more efficient.
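Because every Python file must be listed explicitly, it is easy to forget one. A rough check is to compare the .py files in a directory against the quoted file names in its meson.build. The helper below is a hypothetical convenience script, not part of HoloPy; it matches names by plain text search, so it will not understand variables or files listed from a parent directory's meson.build.

```python
import tempfile
from pathlib import Path

def unlisted_python_files(directory):
    """Return sorted names of .py files in `directory` whose quoted names do
    not appear in that directory's meson.build (text search only)."""
    directory = Path(directory)
    build_file = directory / "meson.build"
    listed = build_file.read_text() if build_file.exists() else ""
    return sorted(
        p.name for p in directory.glob("*.py")
        if f"'{p.name}'" not in listed and f'"{p.name}"' not in listed
    )

# Demo on a throwaway directory with one listed and one forgotten file.
with tempfile.TemporaryDirectory() as demo:
    Path(demo, "listed.py").write_text("")
    Path(demo, "forgotten.py").write_text("")
    Path(demo, "meson.build").write_text("py.install_sources('listed.py', subdir: 'pkg')\n")
    missing = unlisted_python_files(demo)
```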
Currently we do not add the test files or the example data to the installation, because we’d need to specify a lot of files, and it would add an extra step to writing new tests.
2. Unlike numpy.distutils, meson doesn’t automatically run f2py when building
the scattering extensions. There is some code in the meson.build
file in the mie_f directory that will automatically run f2py to generate the
C and Fortran wrappers for the scattering extensions. Have a look at this file
if you’re adding a new Fortran extension.
Gotchas
If you open a Python interpreter or Jupyter notebook in the root of the repository, remember that Python will see the subdirectory
holopy as a package. So even if you haven’t built the package with meson, import holopy might work, and will probably give you a lot of unexpected results (like the scattering theories being missing). Remember that meson builds do not happen in the source tree. To check whether you have actually built and installed the package, try to import it from a directory that does not have the holopy source tree as a subdirectory.
To run the tests, however, you do need your current working directory to be inside the source tree. This is because the tests are not installed with the package.
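Following the advice about importing from a directory outside the source tree, a small helper can automate the check by starting a fresh interpreter in an empty temporary directory and reporting where the module was loaded from. The helper name is hypothetical, and the check can still be fooled if PYTHONPATH points at the source tree.

```python
import subprocess
import sys
import tempfile

def import_origin_outside_tree(module):
    """Import `module` in a fresh interpreter whose working directory is an
    empty temporary directory and return the file it was loaded from.
    Raises subprocess.CalledProcessError if the import fails there."""
    code = f"import {module}; print({module}.__file__)"
    with tempfile.TemporaryDirectory() as empty_dir:
        done = subprocess.run(
            [sys.executable, "-c", code],
            cwd=empty_dir, check=True, capture_output=True, text=True,
        )
    return done.stdout.strip()

origin = import_origin_outside_tree("json")  # any installed package works here
```

Calling it with "holopy" (or a compiled submodule such as holopy.scattering.theory.mie_f.scsmfo_min) raises an error if the package is only importable because of the source tree in your current directory.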
All Python files that include tests that use multiprocessing must be added to
install_sources() in the relevant meson.build file. This is because the multiprocessing module needs to do some pickling, and it tries to import the test file as a module. This is the exception to the rule that we do not include test files in the installation. If you don’t install the file containing the tests, you might see that pytest hangs on the test. Doing pytest -s is a good way to debug any hanging tests: it runs pytest but shows all the output (stdout and stderr) from the code. If you see a ModuleNotFoundError from the multiprocessing package, you need to include your test file in the installation. So, for example, we have to include /holopy/inference/tests/test_cma.py in /holopy/inference/tests/meson.build because it relies on the cmaes module, which uses multiprocessing.
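The underlying failure is easy to reproduce without multiprocessing: pickle stores a function as a reference (its module name plus qualified name), so whichever process unpickles it must be able to import that module. The sketch below uses a throwaway in-memory module (the name ephemeral_test_module is made up) to show the resulting ModuleNotFoundError.

```python
import pickle
import sys
import types

def unpickle_after_module_disappears():
    """Pickle a function, make its module unimportable, then unpickle it.

    Returns the exception raised by pickle.loads, or None if it succeeded."""
    module_name = "ephemeral_test_module"  # made-up name for this demo
    module = types.ModuleType(module_name)
    exec("def work(x):\n    return x * x\n", module.__dict__)
    sys.modules[module_name] = module
    payload = pickle.dumps(module.work)  # records only the module name and "work"
    # Simulate a worker process in which the module cannot be imported.
    del sys.modules[module_name]
    try:
        pickle.loads(payload)
    except ModuleNotFoundError as err:
        return err
    return None

error = unpickle_after_module_disappears()
```

A test file that is not installed is in the same position as this module: the worker process cannot import it, so unpickling the test's functions fails.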