mcmc
sampler¶
Note
If you use this sampler, please cite it as:
A. Lewis and S. Bridle, “Cosmological parameters from CMB and other data: A Monte Carlo approach”
(arXiv:astroph/0205436)
A. Lewis, “Efficient sampling of fast and slow cosmological parameters”
(arXiv:1304.4473)
If you use fastdragging, you should also cite
R.M. Neal, “Taking Bigger Metropolis Steps by Dragging Fast Variables”
(arXiv:math/0502099)
This is the Markov Chain Monte Carlo Metropolis sampler used by CosmoMC, and described in Lewis, “Efficient sampling of fast and slow cosmological parameters” (arXiv:1304.4473). It works well on simple unimodal (or only weakly multimodal) distributions.
The proposal pdf is a gaussian mixed with an exponential pdf in random directions,
which is more robust to misestimation of the width of the proposal than a pure gaussian.
The scale width of the proposal can be specified per parameter with the property
proposal
(it defaults to the standard deviation of the reference pdf, if defined,
or the prior’s one, if not). However, initial performance will be much better if you
provide a covariance matrix, which overrides the default proposal scale width set for
each parameter.
Note
The proposal
size for a certain parameter should be close to its conditional
posterior, not it marginalized one, since for strong degeneracies, the latter being
wider than the former, it could cause the chain to get stuck.
If the distribution being sampled is known have tight strongly nonlinear parameter degeneracies, redefine the sampled parameters to remove the degeneracy before sampling (linear degeneracies are not a problem, esp. if you provide an approximate initial covariance matrix).
Progress monitoring – new in 2.1¶
When writing to the hard drive, the MCMC sampler produces an additional [output_prefix].progress
file containing the acceptance rate and the Gelman \(R1\) diagnostics (for means and confidence level contours) per checkpoint, so that the user can monitor the convergence of the chain. In interactive mode (when running inside a Python script of in the Jupyter notebook), an equivalent progress
table in a pandas.DataFrame
is returned among the products
.
The mcmc
modules provides a plotting tool to produce a graphical representation of convergence, see plot_progress()
. An example plot can be seen below:
from cobaya.samplers.mcmc import plot_progress
# Assuming chain saved at `chains/gaussian`
plot_progress("chains/gaussian", fig_args={"figsize": (6,4)})
import matplotlib.pyplot as plt
plt.tight_layout()
plt.show()
When writing to the hard drive (i.e. when an [output_prefix].progress
file exists), one can produce these plots even if the sampler is still running.
Callback functions¶
A callback function can be specified through the callback_function
option. It must be a function of a single argument, which at runtime is the current instance of the mcmc
sampler. You can access its attributes and methods inside your function, including the collection
of chain points and the model
(of which prior
and likelihood
are attributes). For example, the following callback function would print the points added to the chain since the last callback:
def my_callback(sampler):
print(sampler.collection[sampler.last_point_callback:])
The callback function is called every callback_every
points have been added to the chain, or at every checkpoint if that option has not been defined.
Initial point and covariance of the proposal pdf¶
The initial points for the chains are sampled from the reference pdf (see Parameters and priors). The reference pdf can be a fixed point, and in that case the chain starts always from that same point. If there is no reference pdf defined for a parameter, the initial sample is drawn from the prior instead.
Example parameters block:
params:
a:
ref:
min: 1
max: 1
prior:
min: 2
max: 2
latex: \alpha
proposal: 0.5
b:
ref: 2
prior:
min: 1
max: 4
latex: \beta
proposal: 0.25
c:
ref:
dist: norm
loc: 0
scale: 0.2
prior:
min: 1
max: 1
latex: \gamma
a
– the initial point of the chain is drawn from an uniform pdf between 1 and 1, and its proposal width is 0.5.b
– the initial point of the chain is always 2, and its proposal width is 0.25.c
– the initial point of the chain is drawn from a gaussian centred at 0 with standard deviation 0.2; its proposal width is not specified, so it is taken to be that of the reference pdf, 0.2.
Fixing the initial point is not usually recommended, since to assess convergence it is useful to run multiple chains (which you can do using MPI), and use the difference between the chains to assess convergence: if the chains all start in exactly the same point, the chains could appear to have converged just because they started at the same place. On the other hand if your initial points are spread much more widely than the posterior it could take longer for chains to converge.
A good initial covariance matrix for the proposal is useful for faster and more reliable convergence.
It can be specified either with the property proposal
of each parameter, as shown
above, or through mcmc
’s property covmat
, as a file name (including path,
if not located at the invocation folder).
The first line of the covmat
file must start with #
, followed by a list of parameter
names, separated by a space. The rest of the file must contain the covariance matrix,
one row per line. It does not need to contain the same parameters as the sampled ones:
where sampled parameters exist in the file the they override the proposal
(and add covariance information),
nonsampled ones are ignored, and for missing parameters the specified input proposal
is used, assuming no correlations.
An example for the case above:
# a b
0.1 0.01
0.01 0.2
In this case, internally, the final covariance matrix of the proposal would be:
# a b c
0.1 0.01 0
0.01 0.2 0
0 0 0.04
If the option learn_proposal
is set to True
, the covariance matrix will be updated
regularly. This means that accuracy of the initial covariance is not critical, and even if you do not initially know
the covariance, it will be adaptively learnt (just make sure your proposal
widths are sufficiently small that
chains can move and hence explore the local shape; if your widths are too wide the parameter may just remain stuck).
If you are not sure that your posterior has one single mode, or if its shape is very
irregular, you should probably set learn_proposal: False
; however the MCMC sampler is not likely to work
well in this case and other samplers designed for multimodal distributions may be much more efficient.
If you don’t know how good your initial guess for the starting point and covariance is, a number of initial burn in samples
can be ignored from the start of the chains (e.g. 10 per dimension). This can be specified with the parameter burn_in
.
These samples will be ignored for all purposes (output, convergence, proposal learning…). Of course there may well
also be more burn in after these points are discarded, as the chain points converge (and, using learn_proposal
, the proposal estimates
also converge). Often removing the first 30% the entire final chains gives good results (using ignore_rows=0.3
when analysing with getdist).
Taking advantage of a speed hierarchy¶
The proposal pdf is blocked by speeds, i.e. it allows for efficient sampling of a mixture of fast and slow parameters, such that we can avoid recomputing the slowest parts of the likelihood when sampling along the fast directions only. This is often very useful when the likelihoods have large numbers of nuisance parameters, but recomputing the likelihood for different sets of nuisance parameters is fast.
Two different sampling schemes are available to take additional advantage from a speed hierarchy:
 Dragging the fast parameters: implies a number of intermediate steps when jumping between fast+slow combinations, such that the jump in the fast parameters is optimized with respect to the jump in the slow parameters to explore any possible degeneracy between them. If enabled (
drag: True
), tries to spend the same amount of time doing dragging steps as it takes to compute a jump in the slow direction (make sure your likelihoodsspeed
’s are accurate; see below).  Oversampling the fast parameters: consists simply of taking a larger proportion of steps in the faster directions, useful when exploring their conditional distributions is cheap. If enabled (
oversample: True
), it tries to spend the same amount of time in each block.
In general, the dragging method is the recommended one if there are nontrivial degeneracies between fast and slow parameters. Oversampling can potentially produce very large output files; dragging outputs smaller chain files since fast parameters are effectively partially marginalized over internally. For a thorough description of both methods and references, see A. Lewis, “Efficient sampling of fast and slow cosmological parameters” (arXiv:1304.4473).
The relative speeds can be specified per likelihood/theory, with the option speed
,
preferably in evaluations per second (approximately).
To measure the speed of your likelihood, set timing: True
at the highest level of your input (i.e. not inside any of the blocks), set the mcmc
options burn_in: 0
and max_samples
to a reasonably large number (so that it will be done in a few minutes), and check the output: it should have printed, towards the end, computation times for the likelihoods and the theory code in seconds, the inverse of which are the speeds.
If the speed has not been specified for a likelihood, it is assigned the slowest one in the set. If two or more likelihoods with different speeds share a parameter, said parameter is assigned to a separate block with a speed that takes into account the computation time of all the likelihoods it belongs to.
For example:
theory:
theory_code:
speed: 2
likelihood:
lik_a:
lik_b:
speed: 4
Here, evaluating the theory code is the slowest step, while the lik_b
is faster.
Likelihood lik_a
is assumed to be as slow as the theory code.
Manual specification of speedblocking – new in 1.1¶
Automatic speedblocking takes advantage of differences in speed per likelihood (or theory). If the parameters of your likelihood or theory have some internal speed hierarchy that you would like to exploit (e.g. if your likelihood internally caches the result of a computation depending only on a subset of the likelihood parameters), you can specify a finegrained list of parameter blocks and their speeds under the mcmc
option blocking
.
E.g. if a likelihood depends of parameters a
, b
and c
and the cost of varying a
is twice as big as the other two, your mcmc
block should look like
mcmc:
blocking:
 [1, [a]]
 [2, [b,c]]
oversampling: True # if desired
# or `drag: True`, if 2blocks only (put fastest last)
# [other options...]
Warning
The cost of a parameter block should be the total cost of varying one parameter in the block, i.e. it needs to take into account the time needed to recompute every part of the code that depends (directly or indirectly) on it.
For example, if varying parameter a
in the example above would also force a recomputation of the part of the code associated to parameters b
and c
, then the relative cost of varying the parameters in each block would not be 2to1, but (2+1)to1, meaning relative speeds would be 1 and 3.
Note
If blocking
is specified, it must contain all the sampled parameters.
Note
If automatic learning of the proposal covariance is enabled, after some checkpoint the proposed steps will mix parameters from different blocks, but always towards faster ones. Thus, it is important to specify your blocking in ascending order of speed, when not prevented by the architecture of your likelihood (e.g. due to internal caching of intermediate results that require some particular order of parameter variation).
Options and defaults¶
Simply copy this block in your input yaml
file and modify whatever options you want (you can delete the rest).
# Default arguments for the Markov Chain Monte Carlo sampler
# ('Xd' means 'X steps per dimension')
sampler:
mcmc:
# Number of discarded burnin samples
burn_in: 20d
# Error criterion: max attempts (= weight1) before deciding that the chain
# is stuck and failing. Set to `.inf` to ignore this kind of errors.
max_tries: 40d
# File (including path) or matrix defining a covariance matrix for the proposal:
#  null (default): will be generated from params info (prior and proposal)
#  matrix: remember to set `covmat_params` to the parameters in the matrix
#  "auto" (cosmology runs only): will be looked up in a library
covmat:
covmat_params:
# Overall scale of the proposal pdf (increase for longer steps)
proposal_scale: 2.4
# Number of steps between convergence checks & proposal learn
check_every: 40d
# Update output file(s) every X accepted samples
output_every: 20
# Proposal covariance matrix learning
# 
learn_proposal: True
# Don't learn if convergence is worse than...
learn_proposal_Rminus1_max: 2.
# (even earlier if a param is not in the given covariance matrix)
learn_proposal_Rminus1_max_early: 30.
# ... or if it is better than... (no need to learn, already good!)
learn_proposal_Rminus1_min: 0.
# Convergence and stopping
# 
# Maximum number of posterior evaluations
max_samples: .inf
# GelmanRubin R1 on means
Rminus1_stop: 0.01
# GelmanRubin R1 on std deviations
Rminus1_cl_stop: 0.2
Rminus1_cl_level: 0.95
# When no MPI used, number of fractions of the chain to compare
Rminus1_single_split: 4
# Exploiting speed hierarchy
# 
# Method I: Oversampling of each parameters block relative to it speed
oversample: False # produces potentially MANY samples
# Method II: Dragging: simulates jumps on slow params when varying fast ones
# Set to True, or a factor of the time of a jump on the slow block
drag: False
# Min and max number of times the fast block is iterated
drag_limits : [1, 10]
# Manual blocking
# 
# To take full advantage of speedblocking, sort the blocks by ascending speeds
# (unless not possible due to the architecture of your likelihood)
blocking:
#  [speed_1, [params_1]]
#  etc.
# Callback function
# 
callback_function:
callback_every: # default: every checkpoint
# Seeding runs
# 
seed: # integer between 0 and 2**32  1
# Checkpointing  do not modify!
Rminus1_last: .inf
converged:
blocks:
oversampling_factors:
i_last_slow_block:
mpi_size:
Module documentation¶
Synopsis:  Blocked fastslow Metropolis sampler (Lewis 1304.4473) 

Author:  Antony Lewis (for the CosmoMC sampler, wrapped for cobaya by Jesus Torrado) 
Sampler class¶

class
samplers.mcmc.
mcmc
(info_sampler, model, output, resume=False, modules=None)¶ 
initialize
()¶ Initializes the sampler: creates the proposal distribution and draws the initial sample.

initial_proposal_covmat
(slow_params=None)¶ Build the initial covariance matrix, using the data provided, in descending order of priority: 1. “covmat” field in the “mcmc” sampler block. 2. “proposal” field for each parameter. 3. variance of the reference pdf. 4. variance of the prior pdf.
The covariances between parameters when both are present in a covariance matrix provided through option 1 are preserved. All other covariances are assumed 0.

run
()¶ Runs the sampler.

n
(burn_in=False)¶ Returns the total number of steps taken, including or not burnin steps depending on the value of the burn_in keyword.

get_new_sample_metropolis
()¶ Draws a new trial point from the proposal pdf and checks whether it is accepted: if it is accepted, it saves the old one into the collection and sets the new one as the current state; if it is rejected increases the weight of the current state by 1.
Returns: True
for an accepted step,False
for a rejected one.

get_new_sample_dragging
()¶ Draws a new trial point in the slow subspace, and gets the corresponding trial in the fast subspace by “dragging” the fast parameters. Finally, checks the acceptance of the total step using the “dragging” pdf: if it is accepted, it saves the old one into the collection and sets the new one as the current state; if it is rejected increases the weight of the current state by 1.
Returns: True
for an accepted step,False
for a rejected one.

metropolis_accept
(logp_trial, logp_current)¶ Symmetricproposal MetropolisHastings test.
Returns: True
orFalse
.

process_accept_or_reject
(accept_state, trial=None, derived=None, logpost_trial=None, logprior_trial=None, loglikes_trial=None)¶ Processes the acceptance/rejection of the new point.

check_all_ready
()¶ Checks if the chain(s) is(/are) ready to check convergence and, if requested, learn a new covariance matrix for the proposal distribution.

check_convergence_and_learn_proposal
()¶ Checks the convergence of the sampling process (MPI only), and, if requested, learns a new covariance matrix for the proposal distribution from the covariance of the last samples.

products
()¶ Auxiliary function to define what should be returned in a scripted call.
Returns: The sample Collection
containing the accepted steps.

Progress monitoring¶

samplers.mcmc.
plot_progress
(progress, ax=None, index=None, figure_kwargs={}, legend_kwargs={})¶ Plots progress of one or more MCMC runs: evolution of R1 (for means and c.l. intervals) and acceptance rate.
Takes a
progress
instance (actually apandas.DataFrame
, returned as part of the samplerproducts
), a chainoutput
prefix, or a list of any of those for plotting progress of several chains at once.You can use
figure_kwargs
andlegend_kwargs
to pass arguments tomatplotlib.pyplot.figure
andmatplotlib.pyplot.legend
respectively.Return a subplots axes array. Display with
matplotlib.pyplot.show()
.
Proposal¶
Synopsis:  proposal distributions 

Author:  Antony Lewis (from CosmoMC) 
Using the covariance matrix to give the proposal directions typically significantly increases the acceptance rate and gives faster movement around parameter space.
We generate a random basis in the eigenvectors, then cycle through them proposing changes to each, then generate a new random basis. The distance proposal in the random direction is given by a twoD Gaussian radial function mixed with an exponential, which is quite robust to wrong width estimates
See https://cosmologist.info/notes/CosmoMC.pdf

class
samplers.mcmc.proposal.
CyclicIndexRandomizer
(n)¶ 
next
()¶ Get the next random index
Returns: index


class
samplers.mcmc.proposal.
RandDirectionProposer
(n)¶ 
propose_vec
(scale=1)¶ propose a random ndimension vector
Parameters: scale – units for the distance Returns: array with vector

propose_r
()¶ Radial proposal. By default a mixture of an exponential and 2D Gaussian radial proposal (to make wider tails and more mass near zero, so more robust to scale misestimation)
Returns: random distance (unit scale)


class
samplers.mcmc.proposal.
BlockedProposer
(parameter_blocks, oversampling_factors=None, i_last_slow_block=None, proposal_scale=2.4)¶ 
set_covariance
(propose_matrix)¶ Take covariance of sampled parameters (propose_matrix), and construct orthonormal parameters where orthonormal parameters are grouped in blocks by speed, so changes in slowest block changes slow and fast parameters, but changes in the fastest block only changes fast parameters
Parameters: propose_matrix – covariance matrix for the sampled parameters.
