Extra modules

dm_sheet

This is a module for ARES/HADES/BORG. It adds the algorithms dm_sheet to compute cosmological fields from the dark matter phase-space sheet (in particular, density and velocity fields from tetrahedra formalism).

borg_forward supports the use of dm_sheet when it is available.

Setup

To use this module, clone the repository in $ARES_ROOT/extra/ (where $ARES_ROOT represents the root source directory of ARES on your computer).

For example, you can do:

cd $ARES_SOURCE/extra
git clone git@bitbucket.org:/bayesian_lss_team/dm_sheet.git dm_sheet

and rebuild.

Use

To use dm_sheet in borg_forward, use the flag --dmsheet. New fields are then added to the output files.

Contributors

The main authors of this module are:

  • Florent Leclercq

  • Guilhem Lavaux

To add more features, please contact these people, or submit pull requests.

Additional contributions from:

  • James Prideaux-Ghee

References

    1. Abel, O. Hahn, R. Kaehler (2012), Tracing the Dark Matter Sheet in Phase Space, arXiv:1111.3944

    1. Hahn, R. Angulo, T. Abel (2015), The Properties of Cosmic Velocity Fields, arXiv:1404.2280

    1. Leclercq, J. Jasche, G. Lavaux, B. Wandelt, W. Percival (2017), The phase-space structure of nearby dark matter as constrained by the SDSS, arXiv:1601.00093

hmclet

Guilhem has developped a much smaller variant of the Hamiltonian Markov Chain algorithm to jointly sample a limited set of parameters (like < 100).

This is HMCLET: a small extra HMC framework for ARES to allow sampling a bunch of model parameters together. It provides a self calibration step to estimate the masses for the HMC.

Setup

The code is available in “hmclet” module . To use it, clone this repository into extra/hmclet in ARES source tree. You can for example do:

cd $ARES_SOURCE/extra
git clone https://bitbucket.org/bayesian_lss_team/hmclet.git hmclet

Once it is checked out you can move to the build directory and run cmake ., then make and you will have the new module compiled.

You can run libLSS/tests/test_hmclet to check that no error is triggered and verify the content of “test_sample.h5”. It must contain a chain with 2 parameters for which the first one oscillates around 1 with a variance of 10, and the other oscillates around 4 with a variance of 2.

Use

The Little HMC (HMClet, like Applet) framework consists in two classes in the namespace LibLSS::HMCLet:

  • JointPosterior, which is the one acting like a parent to your class describing the log-posterior,

  • SimpleSampler, which is using an instance of JointPosterior to generate samples using the HMC algorithm.

There is a demonstration (and test case) available in libLSS/tests/test_hmclet.cpp, please have a look at it.

To use SingleSampler you have to make a new class derivative of JointPosterior and implement three functions:

  • getNumberOfParameters() which returns an integer corresponding to the number of parameters supported by your posterior

  • evaluate(parameters) which returns the opposite of the log-posterior (i.e. like chi2/2)

  • adjointGradient(parameters, adjoint_gradient) which fills the adjoint gradient vector corresponding to the given parameters.

An example is as follow:

class MyPosterior: virtual public JointPosterior {
public:
   /* Bla bla for constructor and destructor */
   virtual size_t getNumberOfParameters() const  {
    return 1;
  }

  virtual double evaluate(VectorType const& params) {
    return 0.5 * square(params[0]-1)/10.;
  }

  virtual void adjointGradient(VectorType const& params, VectorType& params_gradient) {
    params_gradient[0] = (params[0]-1)/10.;
  }
};

The above posterior will represent a Gaussian distribution centered on one, with a variance of 10. It depends on a single parameter.

The sampling would occur like this:

auto posterior = std::make_shared<MyPosterior>();
SimpleSampler sampler(posterior);

/* Calibrate the mass matrix.
 *   comm: MPI communication
 *   rgen: Random number generator
 *   steps: number of steps to attempt for calibration
 *   init_params: initial parameters to start calibration
 *   init_step: typical step size to start with
 */
sampler.calibrate(comm, rgen, steps, init_params, init_step);

/* Generate a sample with HMC
 *   comm: MPI communication
 *   rgen: Random number generator
 *   params: current parameter state
 */
sampler.newSample(comm, rgen, init_params);

Contributors

  • Guilhem Lavaux

  • Jens Jasche

You can submit pull requests to the BLSS team admin.

virbius

To be written…

Python

This pages presents the features of the ARES/BORG Python module

Installation

bash get-aquila-modules.sh --clone automatically retrieves the module.

Use the --python flag in build.sh (see building). The python package installation is automatic if you run make install. At the end of the make phase, a python module will be installed in the user site-package directory and made available to python VM. If you also require to run with python defined likelihood (see how to write a likelihood in python) with hades then you also need to append --hades-python while executing build.sh. This requirement will probably go away later.

Note

If you compile with MPI support the Python binding interface will look for the MPI4PY package. If it is not found, it will just proceed as usual. However, if it is found, the MPI4PY must have been compiled with the same MPI framework as ARES/BORG. Not doing so will very likely result in a segmentation fault when importing borg. A succesfull import will look like the following:

>>> import borg
Initializing console.
[INFO   ] libLSS version v2.0.0alpha-47-g7d560cc built-in modules ares_fg;borg;dm_sheet;hades;hmclet;python
[INFO S ] Registered forward models:
[INFO S ]   - 2LPT_CIC
[INFO S ]   - 2LPT_CIC_OPENMP
[INFO S ]   - 2LPT_DOUBLE
[INFO S ]   - 2LPT_NGP
[INFO S ]   - Downgrade
[INFO S ]   - EnforceMass
[INFO S ]   - HADES_LOG
[INFO S ]   - HADES_PT
[INFO S ]   - Haar
[INFO S ]   - LPT_CIC
[INFO S ]   - LPT_CIC_OPENMP
[INFO S ]   - LPT_DOUBLE
[INFO S ]   - LPT_NGP
[INFO S ]   - PATCH_MODEL
[INFO S ]   - PM_CIC
[INFO S ]   - PM_CIC_OPENMP
[INFO S ]   - PRIMORDIAL
[INFO S ]   - PRIMORDIAL_FNL
[INFO S ]   - Softplus
[INFO S ]   - TRANSFER_EHU
[INFO S ]   - Transfer
[INFO S ]   - Upgrade
[INFO S ]   - bias::BrokenPowerLaw
[INFO S ]   - bias::DoubleBrokenPowerLaw
[INFO S ]   - bias::EFT
[INFO S ]   - bias::EFT_Thresh
[INFO S ]   - bias::Linear
[INFO S ]   - bias::ManyPower_1^1
[INFO S ]   - bias::ManyPower_1^2
[INFO S ]   - bias::ManyPower_1^4
[INFO S ]   - bias::ManyPower_2^2
[INFO S ]   - bias::Noop
[INFO S ]   - bias::PowerLaw
[INFO   ] Found MPI4PY.
[INFO   ] CPU features: MMX [!AVX] [!AVX2] SSE SSE2 [!SSE3] [!SSE4.1] [!SSE4.2]
>>>

As you can see there is a line “Found MPI4PY”.

Usage

First step:

import borg

# This retrieve the console management object
console = borg.console()
# This prints at the STD level
console.print_std("Hello!")
# Reduce verbosity
console.setVerboseLevel(2)

Building your first chain

The BORG python pipeline closely follow the BORGForwardModel v2 API. This means that the input is assumed to be Gaussian random number with unit variance in Fourier space. Fortunately the generation of such numbers is easy:

import numpy as np

# Define a physical box (that is optional for this step, but it will be useful later
box = borg.forward.BoxModel()
box.L = (200,200,200)
box.N = (64,64,64)

# Generate gaussian random numbers, Fourier transform them, and rescale to ensure unit-variance
ic = np.fft.rfftn(np.random.randn(*box.N))/box.N[0]**(1.5)

In the above code snippet we have also defined a BORG box, which is at the moment limited to 3d. box.L is the physical size (in Mpc/h) of the box in each direction, while box.N is the grid size. In the above you see that the Fourier transformed density has been rescaled by \(1/\sqrt{N^3}\). This comes because of simple linear algebraic properties, and the requirement of unit variance in the Fourier representation.

Now we need to create a new chain object:

chain = borg.forward.ChainForwardModel(box)
chain.addModel(borg.forward.models.HermiticEnforcer(box))

We have immediately added an element that enforces that the elements of the input Fourier density field to be self-complex conjugated. This is not strictly required here as ic was generated by np.fft.rfftn.

Our first real element of the chain is the injection of primordial gravity fluctuation:

chain.addModel(borg.forward.models.Primordial(box, 0.1))

This multiplies in Fourier space the input density with a function: \(A(k) \propto -k^{n_S/2-2}\) The exact constant of proportionality depends on \(\sigma_8\) (or \(A_S\)), the volume and the Hubble constant. Note the 0.1 which indicates the scale factor at which the potential is seeded in the chain. The next elements depend on that number.

The next element is to add a physical transfer function to produce density fluctuations out of this gravitational potential:

chain.addModel(borg.forward.models.EisensteinHu(box))

This is a simple Einsenstein & Hu power spectrum, which does not change the scale factor of the universe.

Now we need to add a real gravity solver. One simple solver is provided by “BorgLpt” (BORG 1-Lagrangian Perturbation Theory, also known as Zel’dovich approximation).

lpt = borg.forward.models.BorgLpt(box=box, box_out=box, ai=0.1, af=1.0, supersampling=4)
chain.addModel(lpt)

(Question from Andrija: What does the supersampling param control? The ai and af look intuitive enough, for initial scale factor and final one essentially controlling the time, but supersampling I don’t understand. Also doing help(borg.forward.models.BorgLpt) didn’t help me much in understanding)

In the above case we keep the object lpt in the current scope to be able to access more internal state later.

We can now setup the cosmology:

cosmo_par = borg.cosmo.CosmologicalParameters()
cosmo_par.default()
print(repr(cosmo_par))
chain.setCosmoParams(cosmo_par)

We have used some sane defaults for the cosmology in the above. The values of the parameters are printed using the print statement. All the elements of the chain are being updated with the last statement. They try to do this “lazily”, i.e. if the cosmology has not changed nothing will happen (as updating the internal cached state may be very costly).

The model is run with chain.forwardModel_v2(ic), which goes through the entire chain. The final density field is not yet produced. To do this we need to request it explicitly:

rho = np.empty(chain.getOutputBoxModel().N)
chain.getDensityFinal(rho)

rho holds now density contrast of the simulation. In IPython, one can show check a slice using:

from matplotlib import pyplot as plt
plt.imshow(rho[:,:,chain.getOutputBoxModel().N[2]//2])
plt.show()

Computing the adjoint gradient

The evaluation of the adjoint gradient follows the same pattern as for the forward evaluation. Instead of the pair forwardModel_v2 and getDensityFinal, one must use adjointModel_v2 and getAdjointModel. However keep in mind the shapes of the arrays are reversed: adjointModel_v2 requires an array according to the output of the forward model. Thus we have:

dlogL_drho = np.empty(chain.getOutputBoxModel().N)
# Here fill up dlogL_drho from the gradient of the likelihood
chain.adjointModel_v2(dlogL_drho)
ic = np.empty(chain.getBoxModel().N)
chain.getAdjointModel(ic)

Note also that we have requested the initial conditions in real representation (and not Fourier). A Fourier representation may be requested by providing an adequate sized complex array.

Computing the velocity field

BORG comes pre-bundled with velocity field estimator (along with their adjoint gradient of course). A very simple estimator is provided by the CIC density estimator. It requires a particle based simulator to estimate the velocity field from. Such particle based simulators are for example BorgLpt, Borg2Lpt or BorgPM. If the types are not compatible, an exception will be thrown.

The usage is simple, here is an example:

vmodel = borg.forward.velocity.CICModel(box, lpt)
out_v = np.empty((3,)+box.N)
vmodel.getVelocityField(out_v)

The first statement creates the velocity field estimator, with the requested box to be produced and the particle based forward model lpt (same variable as in the section “Building your first chain”). The second statement allocates the required memory. The last statement triggers the computation. The above statements shall be run after executing forwardModel_v2 on the chain object.

One can then show a slice (here of the x-component), and the check the compatibility with the density field:

plt.imshow(out_v[0,:,:,chain.getOutputBoxModel().N[2]//2])
plt.show()

Computing some bias models directly

PyBORG has a submodule called “bias” which provides a direct route to some of the bundled bias models (in C++ those are the generic bias models). Not all models are linked in though. The usage is relatively straightforward. We will use EFTBiasDefault as an example:

import numpy as np
import borg

boxm = borg.forward.BoxModel()
model = borg.forward.models.HadesLinear(boxm, 0.1, 1.0)

bias_model = borg.bias.EFTBiasDefault(0.1)

density = np.random.normal(size=boxm.N)
biased_density = np.zeros(boxm.N)

params = np.ones(7)

bias_model.compute(model, 1.0, params, density, biased_density)

The example starts by loading the borg module. Then we just construct a forward model element for the example using HadesLinear. In your code that should be a reasonable element that you used to produce the matter density field. The bias model may try to discuss directly with that element so it is a good practice to really provide meaningful elements. Then we construct a bias model object EFTBiasDefault. This one has a mandatory argument to specify the Lambda parameter in that specific model, which we set to \(0.1h \mathrm{Mpc}^{-1}\) here. The next steps are just initialization of the field used for bias_model.compute. As can be directly inferred from the call the following arguments are required:

  • a borg forward model (model)

  • the value of nmean (though it could be ignored depending on the specific bias model)

  • a 1d numpy array of float64 for the parameters of the model

  • the 3d density contrast (density)

  • the output 3d biased density (biased_density)

Running with MPI

Using MPI requires some care that is not completely handled automatically.

One may initialize the python with MPI like this:

import numpy as np
import borg
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

In rank and size you will now have the rank of the current process in the MPI communicator, and size will hold the total size. Then a typical initialization chain of forward models may be constructed as indicated there. Assuming that chain is such an object one may query the expected slabs with getMPISlice()

(for the input) and getOutputMPISlice() (for the output):

startN0,localN0,in_N1,in_N2 = chain.getMPISlice()
out_startN0,out_localN0,out_N1,out_N2 = chain.getOutputMPISlice()

These may be used like this:

x = np.zeros((localN0,in_N1,in_N2))
if localN0 > 0:
    x[:,:,:] = ref_data[startN0:(startN0+localN0),:,:]

with ref_data some array that covers the entire box. As you can see, the x array requires only the part between startN0 and startN0+localN0 of that array. In practice that array (ref_data) may not have to exist in memory.

Then x may be directly provided to chain.forwardModel_v2 as a first argument. The output density field follow the same rule as the input density field.

Writing a new forward model

The interface of the forward model in python closely follow the one in C++. The basic skeleton is given by the following lines of code:

import jax

class MyModel(borg.forward.BaseForwardModel):
   # Constructor
   def __init__(self, box):
     super().__init__(box, box)

   # IO "preferences"
   def getPreferredInput(self):
     return borg.forward.PREFERRED_REAL

   def getPreferredOutput(self):
     return borg.forward.PREFERRED_REAL

   # Forward part

   def forwardModel_v2(self, input_array):
     self.save = jax.numpy.array(input_array)

   def getDensityFinal(self, output_array):
     output_array[:] = self.save**2

   # Adjoint part

   def adjointModel_v2(self, input_ag):
     self.ag = input_ag

   def getAdjointModel(self, output_ag):
     output_ag[:] = 2 * self.ag * self.save

There are four main group in the function that needs be implemented:

  • the constructor. It is crucial that the constructor of the parent is explicitly called. Otherwise the interface will not work. The parent constructor takes two argument: the input box (of type borg.forward.BoxModel) and the output box (same type).

  • the function providing the “preferred IO” for the forward and adjoint functions. In practice the preferrence is enforced for python. This means that the value indicated here will change the kind of arrays that are provided to the forward and adjoint part. At the moment two type of IO are possible:

    • PREFERRED_REAL: the model wants a 3d real space representation as an argument

    • PREFERRED_FOURIER: the model wants a 3d fourier space representation as an argument

  • then the forward evaluation part itself has to be implemented in two pieces: forwardModel_v2 and getDensityFinal (it is optional depending on what is put after that model). It is expected that forwardModel_v2 executes the main part of the computation but it is not fully required.

  • finally the computation of the adjoint gradient follows the same pattern as the forward computation. The difference is that the types and shapes of arrays are reversed. input_ag has a shape/type corresponding to the output and output_ag to the input.

Finally, as shown above, the input/output array are using a numpy interface. They can thus be used in JAX/Tensorflow/whatever. In the example code above the input array is saved in a jax array and evaluated later. This is legit, though bear in mind that means there will be memory that will not be freed while you retain that reference.

Build a python likelihood script

Ini file

[python]
likelihood_path=test_likelihood.py
bias_sampler_type=slice

The hades_python initializers

A typical python likelihood requires three initialization function. They must be registered using the helper decorators borg.registerGravityBuilder (for the forward model), borg.registerLikelihoodBuilder (for the bias+likelihood part), borg.registerSamplerBuilder (for extra sampling strategies).

An example of their use is the following piece of code:

import borg

@borg.registerGravityBuilder
def build_gravity_model(state, box):
    global model
    chain = borg.forward.ChainForwardModel(box)
    chain.addModel(borg.forward.models.HermiticEnforcer(box))
    chain.addModel(borg.forward.models.Primordial(box, 1.0))
    chain.addModel(borg.forward.models.EisensteinHu(box))
    model = chain
    return chain


@borg.registerLikelihoodBuilder
def build_likelihood(state, info):
    boxm = model.getBoxModel()
    return MyLikelihood(model, boxm.N, boxm.L)

@borg.registerSamplerBuilder
def build_sampler(state, info):
    return []

The build_gravity_model function returns a BORGForwardModel object, and take a MarkovState and a BoxModel as parameters. The build_likelihood function must return a Likelihood3d object (check help(borg.likelihood.Likelihood3d)). Finally build_sampler must return a list of sampler object.

The forward model elements can be either the C++ or Python object and both work transparently. Likelihoods may also be written in pure python though MPI is still untested at this time (August 2020).

Writing a likelihood

In the previous section we have seen how to build the objects required by hades_python to analyze data. We have not approached how to write a likelihood in python. A lot of likelihood and bias are already available from the C++ side, for example borg.likelihood.GaussianPassthrough, borg.likelihood.GaussianLinear or borg.likelihood.PoissonPowerLaw. To create new ones easily in python, one has to write a class inheriting from borg.likelihood.BaseLikelihood and implement a number of functions. An example of a simple gaussian likelihood is shown herebelow:

import borg

cons = borg.console()

myprint = lambda x: cons.print_std(x) if type(x) == str else cons.print_std(
    repr(x))

class MyLikelihood(borg.likelihood.BaseLikelihood):
    def __init__(self, fwd, N, L):
        myprint(f" Init {N}, {L}")
        super().__init__(fwd, N, L)

    def initializeLikelihood(self, state):
        myprint("Init likelihood")
        self.data = state['galaxy_data_0']
        state.newArray3d("my_density_field", True, self.data.shape[0],
                         self.data.shape[1], self.data.shape[2])

    def updateMetaParameters(self, state):
        cpar = state['cosmology']
        myprint(f"Cosmology is {cpar}")
        self.getForwardModel().setCosmoParams(cpar)

    def generateMockData(self, s_hat, state):

        fwd = self.getForwardModel()
        output = np.zeros(fwd.getOutputBoxModel().N)
        fwd.forwardModel_v2(s_hat)
        fwd.getDensityFinal(output)

        state['galaxy_data_0'][:] = output + np.random.normal(
            size=output.shape) * sigma_noise
        state['my_density_field'][:] = output
        like = ((state['galaxy_data_0'][:] - output)**2).sum() / sigma_noise**2
        myprint(
            f"Initial log_likelihood: {like}, var(s_hat) = {np.var(s_hat)}")

    def logLikelihoodComplex(self, s_hat, gradientIsNext):
        fwd = self.getForwardModel()

        output = np.zeros(fwd.getBoxModel().N)
        fwd.forwardModel_v2(s_hat)
        fwd.getDensityFinal(output)
        L = 0.5 * ((output - self.data)**2).sum() / sigma_noise**2
        myprint(f"var(s_hat): {np.var(s_hat)}, Call to logLike: {L}")
        return L

    def gradientLikelihoodComplex(self, s_hat):
        fwd = self.getForwardModel()
        output = np.zeros(fwd.getOutputBoxModel().N)
        fwd.forwardModel_v2(s_hat)
        fwd.getDensityFinal(output)
        mygradient = (output - self.data) / sigma_noise**2
        fwd.adjointModel_v2(mygradient)
        mygrad_hat = np.zeros(s_hat.shape, dtype=np.complex128)
        fwd.getAdjointModel(mygrad_hat)
        return mygrad_hat

The function myprint is an helper to create nice output that streams correctly with the rest of the C++ code. It is not mandatory but strongly recommended to use the borg.console() object as it will seemlessly integrate with other BORG tools.

We will now look at each function one after the other:

  • __init__ is the constructor. It is crucial that the base constructor is called in the constructor of the new class: it will not be done implicitly by the python virtual machine. The base constructor takes a BORGForwardModel object, and the grid specifications N and L as tuples.

  • initializeLikelihood is called at the initialization of the chain and before restoration. If you want to store additional fields in the mcmc, you should allocate them at that moment in the state object. In the above example, a new 3d array is allocated to store the density field after the forward model evaluation.

Note that the forward model has to be evaluated in the log likelihood and its gradient. Though it is in principle required to implement logLikelihood and gradientLikelihood (the real counterpart of the complex functions hereabove), in practice they are not used for the run.

More python jupyter tutorials

  • A notebook to showcase tCOLA and its convergence by considering at \(P(k)\) is here.