Output

The results of a cobaya run are in all cases an updated information dictionary (interactive call) or file (shell call), plus the products generated by the sampler used.

Interactive call

The updated information and products mentioned above are returned by the run function of the cobaya.run module, which performs the sampling process.

from cobaya import run
updated_info, sampler = run(your_input)

sampler here is the sampler instance that just ran, e.g. the mcmc sampler. The results of the sampler can be obtained as sampler.products(), which returns a dictionary whose contents depend on the sampler used, e.g. one chain for the mcmc sampler.

If the input information contains a non-null output, products are written to the hard drive too, as described below.

Shell call

When called from the shell, cobaya most commonly generates the following output files:

  • [prefix].input.yaml: a file with the same content as the input file.

  • [prefix].updated.yaml: a file containing the input information plus the default values used by each component.

  • [prefix].[number].txt: one or more sample files, containing one sample per line, with values separated by spaces. The first line specifies the columns.

Note

Some samplers produce additional output, e.g.

  • MCMC produces an additional [prefix].progress file monitoring the convergence of the chain, that can be inspected or plotted.

  • PolyChord produces native output, which is translated into cobaya’s output format with the usual file names, but also kept under a sub-folder within the output folder.

To specify the folder where the output files will be written and their name, use the option output at the top-level of the input file (i.e. not inside any block, see the example input in the Quickstart example):

  • output: something: the output will be written into the current folder, and all output file names will start with something.

  • output: somefolder/something: similar to the last case, but writes into the folder somefolder, which is created at that point if necessary.

  • output: somefolder/: writes into the folder somefolder, which is created at that point if necessary, with no prefix for the file names.

  • output: null: will produce no output files whatsoever – the products will be just loaded in memory. Use only when invoking from the Python interpreter.
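The way an output value maps onto a folder and a file name prefix can be sketched with the standard library alone. The helper name split_output is hypothetical and unix-style paths are assumed; cobaya's own helper is output.split_prefix, whose behaviour in edge cases may differ from this illustration.

```python
import os


def split_output(output):
    """Split an `output` value into (folder, file name prefix).

    Hypothetical helper for illustration only; not cobaya's implementation.
    """
    if output is None:
        # `output: null`: nothing is written to disk
        return None, None
    folder, prefix = os.path.split(output)
    # An empty folder part means "the current folder"
    return folder or ".", prefix


for value in ("something", "somefolder/something", "somefolder/", None):
    print(repr(value), "->", split_output(value))
```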

If calling cobaya-run from the command line, you can also specify the output prefix with an --output [something] flag (it takes precedence over the output defined inside the yaml file, if it exists).

Note

When calling from the command line, if output has not been specified, it defaults to the first case, using as a prefix the name of the input file without the yaml extension.

Instead, when calling from a Python interpreter, if output has not been specified, it is understood as output: null.

In all cases, the output folder is based on the invocation folder if cobaya is called from the command line, or the current working directory (i.e. the output of import os; os.getcwd()) if invoked within a Python script or a Jupyter notebook.

Warning

If cobaya output files already exist with the given prefix, cobaya will raise an error, unless you explicitly request to resume or overwrite the existing sample (see Resuming or overwriting an existing run).

Note

When the output is written into a folder different from the invocation one, the value of output in the output .yaml file(s) is updated so that it no longer mentions that folder.

Sample files or SampleCollection instances

Samples are stored in files (if text output requested) or SampleCollection instances (in interactive mode). A typical sample file will look like the one presented in the quickstart example:

# weight  minuslogpost         a         b  derived_a  derived_b  minuslogprior  minuslogprior__0      chi2  chi2__gaussian
    10.0      4.232834  0.705346 -0.314669   1.598046  -1.356208       2.221210          2.221210  4.023248        4.023248
     2.0      4.829217 -0.121871  0.693151  -1.017847   2.041657       2.411930          2.411930  4.834574        4.834574

Both sample files and collections contain the following columns, in this order:

  • weight: the relative weight of the sample.

  • minuslogpost: minus the log-posterior, unnormalized.

  • a, b...: sampled parameter values for each sample.

  • derived_a, derived_b: derived parameter values for each sample. They appear after the sampled ones, but cannot be distinguished from them by name (they just happen to start with derived_ in this particular example, but can have any name).

  • minuslogprior: minus the log-prior (unnormalized if external priors have been defined), sum of the individual log-priors.

  • minuslogprior__[...]: individual priors; the first of which, named 0, corresponds to the separable product of 1-dimensional priors defined in the params block, and the rest to external priors, if they exist.

  • chi2: total effective \(\chi^2\), equals twice minus the total log-likelihood.

  • chi2__[...]: individual effective \(\chi^2\)’s, adding up to the total one.
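As a minimal sketch, the text format can be read with plain Python, using the two rows of the quickstart sample shown earlier. The function name parse_samples is hypothetical. The final loop checks a relation that follows from the column definitions (minus log-posterior equals minus log-prior plus half the total chi-squared), up to the rounding of the printed values.

```python
def parse_samples(text):
    """Parse sample-file text into a list of {column: value} dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    # The first line names the columns and starts with "#"
    columns = lines[0].lstrip("#").split()
    return [dict(zip(columns, map(float, line.split()))) for line in lines[1:]]


sample_text = """\
# weight  minuslogpost         a         b  derived_a  derived_b  minuslogprior  minuslogprior__0      chi2  chi2__gaussian
    10.0      4.232834  0.705346 -0.314669   1.598046  -1.356208       2.221210          2.221210  4.023248        4.023248
     2.0      4.829217 -0.121871  0.693151  -1.017847   2.041657       2.411930          2.411930  4.834574        4.834574
"""

rows = parse_samples(sample_text)
for row in rows:
    # minuslogpost = minuslogprior + chi2 / 2, since chi2 = -2 * log-likelihood
    assert abs(row["minuslogpost"] - (row["minuslogprior"] + 0.5 * row["chi2"])) < 1e-5
```

For real runs, loading through cobaya's own tools is preferable, since they also handle weights, skipping and thinning.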

output module documentation

Synopsis:

Generic output class and output drivers

Author:

Jesus Torrado

output.split_prefix(prefix)

Splits an output prefix into folder and file name prefix.

If on Windows, allows for unix-like input.

output.get_info_path(folder, prefix, infix=None, kind='updated', ext='.yaml')

Gets path to info files saved by Output.

output.get_output(*args, **kwargs)

Auxiliary function to retrieve the output driver (e.g. whether to get the MPI-wrapped one, or a dummy output driver).

Return type:

Output

class output.OutputReadOnly(prefix, infix=None)

A read-only output driver: it tracks the naming of input and collection files, and can load them. Unlike output.Output, this class is not MPI-aware, which makes it suitable for performing these operations within isolated MPI processes.

is_prefix_folder()

Returns True if the output prefix is a bare folder, e.g. chains/.

updated_prefix()

Updated path: drops the folder, so that the path is now relative to the chain’s location.

separator_if_needed(separator)

Returns the given separator if there is an actual file name prefix (i.e. the output prefix is not a bare folder), or an empty string otherwise.

Useful to add custom suffixes to output prefixes (may want to use Output.add_suffix for that).

sanitize_collection_extension(extension)

Returns the extension without the leading dot, if given, or the default one Output.ext otherwise.

add_suffix(suffix, separator='_')

Returns the full output prefix (folder and file name prefix) combined with a given suffix, inserting a given separator in between (default: _) if needed.
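The interplay between separator_if_needed and add_suffix can be illustrated with two standalone functions. These are hypothetical re-implementations that take the prefix as an explicit argument and assume unix-style paths; the real methods operate on the driver instance.

```python
def separator_if_needed(prefix, separator):
    """Return `separator` unless `prefix` is a bare folder such as 'chains/'."""
    return "" if prefix.endswith("/") else separator


def add_suffix(prefix, suffix, separator="_"):
    """Append `suffix` to `prefix`, inserting the separator only if needed."""
    return prefix + separator_if_needed(prefix, separator) + suffix


with_prefix = add_suffix("chains/run", "post")  # file name prefix present
bare_folder = add_suffix("chains/", "post")     # bare folder: no separator
```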

get_updated_info(use_cache=False, cache=False)

Returns the version of the input file updated with defaults, loading it if it has not been previously cached, or if loading is forced by use_cache=False.

If loading is forced and cache=True, the loaded input will be cached for future calls.

Return type:

Optional[InputDict]
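The caching rules can be pictured with a toy class. Everything here is hypothetical (the class name, the loader callable and the n_loads counter); it only mirrors the behaviour stated in the description and is not cobaya code.

```python
class UpdatedInfoCache:
    """Toy stand-in for the caching behaviour described above."""

    def __init__(self, loader):
        self._loader = loader  # hypothetical callable that reads the updated info
        self._cached = None
        self.n_loads = 0       # counts how often the loader is called

    def reload_updated_info(self, cache=False):
        self.n_loads += 1
        info = self._loader()
        if cache:
            self._cached = info
        return info

    def get_updated_info(self, use_cache=False, cache=False):
        # Serve from the cache only if asked to and something is cached
        if use_cache and self._cached is not None:
            return self._cached
        return self.reload_updated_info(cache=cache)


driver = UpdatedInfoCache(lambda: {"sampler": {"mcmc": None}})
driver.get_updated_info(cache=True)      # forced load, result is cached
driver.get_updated_info(use_cache=True)  # served from the cache, no reload
```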

reload_updated_info(cache=False)

Reloads and returns the version of the input file updated with defaults.

If none is found, returns None without raising an error.

If cache=True, the loaded input will be cached for future calls.

Return type:

Optional[InputDict]

prepare_collection(name=None, extension=None)

Generates a file name for the collection, as [folder]/[prefix].[name].[extension].

Notice that name=None generates a date, but name="" removes the name field, making it simply [folder]/[prefix].[extension].

collection_regexp(name=None, extension=None)

Returns a regexp for collections compatible with these output settings.

Use name for particular types of collections (default: any number). Pass False to mean there is nothing between the output prefix and the extension.

is_collection_file_name(file_name, name=None, extension=None)

Check if a file_name is a collection compatible with this Output instance.

Use name for particular types of collections (default: any number). Pass False to mean there is nothing between the output prefix and the extension.

find_collections(name=None, extension=None)

Returns all collection files found which are compatible with this Output instance, including their path in their name.

Use name for particular types of collections (default: matches any number). Pass False to mean there is nothing between the output prefix and the extension.
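A sketch of what such a pattern might look like for a non-empty file name prefix, following the [prefix].[name].[extension] convention. This standalone collection_regexp is a hypothetical illustration and not the regexp cobaya builds internally; the default extension "txt" is also an assumption.

```python
import re


def collection_regexp(prefix, name=None, extension="txt"):
    """Regexp for `[prefix].[name].[extension]` file names (illustrative only).

    name=None matches any integer; name=False means that nothing sits
    between the prefix and the extension.
    """
    if name is None:
        middle = r"\.\d+"
    elif name is False:
        middle = ""
    else:
        middle = r"\." + re.escape(str(name))
    pattern = "^" + re.escape(prefix) + middle + r"\." + re.escape(extension) + "$"
    return re.compile(pattern)


files = ["run.1.txt", "run.2.txt", "run.progress", "run.updated.yaml"]
chains = [f for f in files if collection_regexp("run").match(f)]
```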

load_collections(model, skip=0, thin=1, combined=False, name=None, extension=None, concatenate=None)

Loads all collection files found which are compatible with this Output instance, including their path in their name.

Use name for particular types of collections (default: any number). Pass False to mean there is nothing between the output prefix and the extension.

Notes

Unless you know what you are doing, use the cobaya.output.load_samples() function instead to load samples.

class output.Output(prefix, resume=False, force=False, infix=None)

Basic output driver. It takes care of creating the output files, checking compatibility with old runs when resuming, cleaning up when forcing, preparing SampleCollection files, etc.

create_folder(folder)

Creates the given folder (MPI-aware).

load_updated_info(cache=False, use_cache=False)

Returns the version of the input file updated with defaults, loading it if necessary.

WARNING: This method has been deprecated in favor of get_updated_info and reload_updated_info, depending on the use case (see their docstrings).

Return type:

Optional[InputDict]

reload_updated_info(cache=False, **kwargs)

Reloads and returns the version of the input file updated with defaults.

If none is found, returns None without raising an error.

If cache=True, the loaded input will be cached for future calls.

Return type:

Optional[InputDict]

check_and_dump_info(input_info, updated_info, check_compatible=True, cache_old=False, use_cache_old=False, ignore_blocks=())

Saves the info in the chain folder twice:

  • the input info.

  • the same info, populated with the components’ defaults.

If resuming a sample, checks first that old and new infos and versions are consistent.

load_collections(model, skip=0, thin=1, combined=False, name=None, extension=None, concatenate=None)

Loads all collection files found which are compatible with this Output instance, including their path in their name.

Use name for particular types of collections (default: any number). Pass False to mean there is nothing between the output prefix and the extension.

Notes

Unless you know what you are doing, use the cobaya.output.load_samples() function instead to load samples.

delete_with_regexp(regexp, root=None)

Deletes all files compatible with the given regexp.

If regexp is None and root is defined, deletes the root folder.

delete_file_or_folder(filename)

Deletes a file or a folder. Fails silently.

class output.OutputDummy(*args, **kwargs)

Dummy output class. Does nothing. Evaluates to ‘False’ as a class.

collection module documentation

Synopsis:

Classes to store the Monte Carlo samples and single points.

Author:

Jesus Torrado and Antony Lewis

class collection.SampleCollection(model, output=None, cache_size=200, name=None, extension=None, file_name=None, resuming=False, load=False, temperature=None, onload_skip=0, onload_thin=1, sample_type=None, is_batch=False)

Holds a collection of samples, stored internally into a pandas.DataFrame.

The DataFrame itself is accessible as the SampleCollection.data property, but slicing can be done on the SampleCollection itself (returns a copy, not a view).

When temperature is different from 1, weights and log-posterior (but not priors or likelihoods’ chi squared’s) are those of the tempered sample, obtained assuming a posterior raised to the power of 1/temperature. Functions returning statistics, e.g. cov(), will return the statistics of the original (untempered) posterior, unless indicated otherwise with a keyword argument.
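To make the relation between tempered and untempered weights concrete: a sample drawn from p**(1/temperature) can be reweighted towards p with importance weights proportional to p / p**(1/temperature). The sketch below assumes the stored minus log-posterior is the tempered one, as stated above. The function name is hypothetical and the code is not necessarily how cobaya computes detempered weights.

```python
import math


def detempered_weights(weights, minuslogpost_tempered, temperature):
    """Importance-reweight a tempered sample towards the original posterior."""
    # log p - log p**(1/T) = -(T - 1) * (-log p**(1/T))
    log_ratio = [-(temperature - 1) * m for m in minuslogpost_tempered]
    shift = max(log_ratio)  # subtract the maximum to avoid overflow
    return [w * math.exp(lr - shift) for w, lr in zip(weights, log_ratio)]


# With temperature=1 the weights are left untouched
unchanged = detempered_weights([1, 2, 3], [0.5, 1.0, 2.0], temperature=1)
# With temperature=2, points with lower posterior are down-weighted
hot = detempered_weights([1.0, 1.0], [0.0, 1.0], temperature=2)
```

The resulting weights are defined only up to an overall factor, and are in general no longer integers.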

Note for developers: when expanding this class or inheriting from it, always access the underlying DataFrame as self.data and not self._data, to ensure the cache has been dumped. If you really need to access the actual attribute self._data in a method, make sure to decorate it with @ensure_cache_dumped.

reset()

Create/reset the DataFrame.

add(values, logpost=None, logpriors=None, loglikes=None, derived=None, weight=1)

Adds a point to the collection.

logpost can be a LogPosterior, a float or None (in which case logpriors and loglikes are both required).

If the weight is not specified, it is assumed to be 1.

property is_tempered: bool

Whether the sample was obtained by drawing from a different-temperature distribution.

property has_int_weights: bool

Whether weights are integer.

reset_temperature(with_batch=None)

Drops the information about sampling temperature: weight and minuslogpost columns will now correspond to those of a unit-temperature posterior sample.

If this sample is part of a batch, call this method passing the rest of the batch as a list using the argument with_batch (otherwise inconsistent weights between samples will be introduced). If additional chains are passed with with_batch, their temperature will be reset in-place.

This cannot be undone (e.g. the original integer tempered weights cannot be recovered). You may want to call this method on a copy (see SampleCollection.copy()).

property n_last_out

Index of the last point saved to the output.

property data

Pandas’ DataFrame containing the sample collection.

to_numpy(dtype=None, copy=False)

Returns the sample collection as a numpy array.

Return type:

ndarray

copy(empty=False)

Returns a copy of the collection.

If empty=True (default False), returns an empty copy.

Return type:

SampleCollection

mean(first=None, last=None, weights=None, derived=False, tempered=False)

Returns the (weighted) mean of the parameters in the chain, between first (default 0) and last (default last obtained), optionally including derived parameters if derived=True (default False).

Custom weights can be passed with the argument weights.

If derived is True (default False), the means of the derived parameters are included in the returned vector.

If tempered=True (default False) returns the mean of the tempered posterior p**(1/temperature).

NB: For tempered samples, if passed tempered=False (default), detempered weights are computed on-the-fly. If this or any other function returning untempered statistical quantities of a tempered sample is expected to be called repeatedly, it would be more efficient to detemper the collection first with SampleCollection.reset_temperature(), and call these methods on the detempered collection.

Return type:

ndarray

cov(first=None, last=None, weights=None, derived=False, tempered=False)

Returns the (weighted) covariance matrix of the parameters in the chain, between first (default 0) and last (default last obtained), optionally including derived parameters if derived=True (default False).

Custom weights can be passed with the argument weights.

If derived is True (default False), the covariances of/with the derived parameters are included in the returned matrix.

If tempered=True (default False) returns the covariances of the tempered posterior p**(1/temperature).

NB: For tempered samples, if passed tempered=False (default), detempered weights are computed on-the-fly. If this or any other function returning untempered statistical quantities of a tempered sample is expected to be called repeatedly, it would be more efficient to detemper the collection first with SampleCollection.reset_temperature(), and call these methods on the detempered collection.

Return type:

ndarray
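For reference, the weighted statistics returned by mean() and cov() can be written out in plain Python. The function names are hypothetical, and the covariance is normalised here by the sum of the weights, which is an assumption: the actual methods may use a different normalisation.

```python
def weighted_mean(points, weights):
    """Weighted mean of a list of parameter vectors."""
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[i] for p, w in zip(points, weights)) / total
            for i in range(dim)]


def weighted_cov(points, weights):
    """Weighted covariance matrix, normalised by the sum of weights."""
    total = sum(weights)
    mean = weighted_mean(points, weights)
    dim = len(mean)
    return [[sum(w * (p[i] - mean[i]) * (p[j] - mean[j])
                 for p, w in zip(points, weights)) / total
             for j in range(dim)]
            for i in range(dim)]


# Sampled parameters (a, b) and weights of the two quickstart rows
points = [[0.705346, -0.314669], [-0.121871, 0.693151]]
weights = [10.0, 2.0]
mean_ab = weighted_mean(points, weights)
cov_ab = weighted_cov(points, weights)
```

A point with weight 10 contributes exactly as ten identical unit-weight points would.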

reweight(importance_weights, with_batch=None, check=True)

Reweights the sample in-place with the given importance_weights.

Temperature information is dropped.

If this sample is part of a batch, call this method passing the rest of the batch as a list using the argument with_batch (otherwise inconsistent weights between samples will be introduced). If additional chains are passed with with_batch, they will also be reweighted in-place. In that case, importance_weights needs to be a list of weight vectors, the first of which will be applied to the current instance, and the rest to the collections passed with with_batch.

This cannot be fully undone (e.g. recovering original integer weights). You may want to call this method on a copy (see SampleCollection.copy()).

For the sake of speed, length and positivity checks on the importance weights can be skipped with check=False (default True).
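The core of importance reweighting is an element-wise product of the current weights and the importance weights. The standalone function below is a hypothetical sketch: unlike the method, it returns a new list instead of modifying anything in place, and the exact checks cobaya performs may differ.

```python
def reweight(weights, importance_weights, check=True):
    """Return the weights multiplied by the given importance weights."""
    if check:
        if len(importance_weights) != len(weights):
            raise ValueError("importance_weights must have one entry per sample")
        if any(iw < 0 for iw in importance_weights):
            raise ValueError("importance_weights must not be negative")
    return [w * iw for w, iw in zip(weights, importance_weights)]


new_weights = reweight([10.0, 2.0], [0.5, 2.0])
```

Note that the product is generally non-integer even when the original weights were integers, which is why the operation cannot be fully undone.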

filtered_copy(where)

Returns a copy of the collection with some condition where imposed.

Return type:

SampleCollection

skip_samples(skip, inplace=False)

Skips some initial samples, or an initial fraction of them.

For collections coming from a Nested Sampler, prints a warning and does nothing.

Parameters:
  • skip (float) – Specifies the amount of initial samples to be skipped, either directly if skip>1 (rounded up to next integer), or as a fraction if 0<skip<1.

  • inplace (bool, default: False) – If True, modifies the collection in place; otherwise, returns a modified copy of the collection.

Returns:

The original collection with skipped initial samples (inplace=True) or a copy of it (inplace=False).

Return type:

SampleCollection

Raises:

LoggedError – If badly defined skip value.
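The two interpretations of skip can be sketched on a plain list. The function below is hypothetical; rounding the fractional case to the nearest integer and treating skip=1 as a count of one sample are assumptions, since the description leaves both unspecified.

```python
import math


def skip_samples(samples, skip):
    """Return a copy of `samples` without the initial `skip` samples."""
    if skip < 0:
        raise ValueError("skip must be non-negative")
    if 0 < skip < 1:
        # Fraction of the total (rounding to nearest is an assumption)
        n_skip = round(skip * len(samples))
    else:
        # Number of samples, rounded up to the next integer
        n_skip = math.ceil(skip)
    return samples[n_skip:]


chain = list(range(10))
second_half = skip_samples(chain, 0.5)   # drops the first 5 samples
after_three = skip_samples(chain, 2.3)   # 2.3 is rounded up to 3
```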

thin_samples(thin, inplace=False)

Thins the sample collection by some factor thin>1.

Parameters:
  • thin (int) – Thin factor, must be >1.

  • inplace (bool, default: False) – If True, modifies the collection in place; otherwise, returns a modified copy of the collection.

Returns:

Thinned version of the original collection (inplace=True) or a copy of it (inplace=False).

Return type:

SampleCollection

Raises:

LoggedError – If badly defined thin value.
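One common way to thin a chain with integer weights is to treat each sample as weight consecutive unit-weight copies, keep every thin-th entry of that expanded chain, and merge the survivors back into weighted samples. The sketch below follows that idea. It is an assumption about the general technique and not necessarily cobaya's algorithm; the function name is hypothetical and thin=1 is accepted here as a no-op.

```python
from collections import Counter


def thin_samples(samples, weights, thin):
    """Thin a chain with integer weights; returns (sample, new_weight) pairs."""
    if not isinstance(thin, int) or thin < 1:
        raise ValueError("thin must be a positive integer")
    # Index of the originating sample for each entry of the expanded chain
    expanded = [i for i, w in enumerate(weights) for _ in range(w)]
    # Keep every thin-th entry and count survivors per original sample
    counts = Counter(expanded[::thin])
    return [(samples[i], n) for i, n in sorted(counts.items())]


thinned = thin_samples(["first", "second"], [10, 2], thin=5)
```

Samples whose copies are all dropped disappear from the thinned chain, as happens to "second" when thin=3 in this example.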

bestfit()

Best fit (maximum likelihood) sample. Returns a copy.

MAP()

Maximum-a-posteriori (MAP) sample. Returns a copy.
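In terms of the columns of a sample file, the best fit is the row with the smallest chi2 (highest likelihood) and the MAP is the row with the smallest minuslogpost. A minimal sketch on the two quickstart rows, with plain dictionaries standing in for collection rows:

```python
rows = [
    {"weight": 10.0, "minuslogpost": 4.232834, "chi2": 4.023248},
    {"weight": 2.0, "minuslogpost": 4.829217, "chi2": 4.834574},
]

# Maximum likelihood = minimum chi2; dict(...) makes a copy of the row
bestfit = dict(min(rows, key=lambda row: row["chi2"]))
# Maximum a posteriori = minimum minus log-posterior
map_point = dict(min(rows, key=lambda row: row["minuslogpost"]))
```

The two points coincide in this tiny example, but they differ in general whenever the prior is not flat over the sampled region.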

to_getdist(label=None, model=None, combine_with=None)

Parameters:
  • label (str, optional) – Legend label in GetDist plots (name_tag in GetDist parlance).

  • model (cobaya.model.Model, optional) – Model with which the sample was created. Needed only if parameter labels or aliases have changed since the collection was generated.

  • combine_with (list of cobaya.collection.SampleCollection, optional) – Additional collections to be added when creating a getdist object. Compatibility between the collections is assumed and not checked.

Returns:

This collection’s equivalent getdist.MCSamples object.

Return type:

getdist.MCSamples

Raises:

LoggedError – Errors when processing the arguments.

out_update()

Update the output file to the current state of the Collection.

class collection.OnePoint(model, output=None, cache_size=200, name=None, extension=None, file_name=None, resuming=False, load=False, temperature=None, onload_skip=0, onload_thin=1, sample_type=None, is_batch=False)

Wrapper of SampleCollection to hold a single point, e.g. the best-fit point of a minimization run (not used by default MCMC).

class collection.OneSamplePoint(model, temperature=1, output_thin=1)

Wrapper to hold a single point, e.g. the current point of an MCMC. Alternative to OnePoint, faster but with less functionality.

For tempered samples, stores the weight and -logp of the tempered posterior (but untempered priors and likelihoods).