Utility for calculating distance matrices at, in between, or excluding variants.
Usage
DistMat(
fwd,
bck,
type = "raw",
M = NULL,
beta.theta.opts = NULL,
nthreads = min(parallel::detectCores(logical = FALSE), fwd$to_recipient -
fwd$from_recipient + 1)
)
Arguments
- fwd
a forward table as returned by
MakeForwardTable()
and propagated to a target variant byForward()
. Must be at the same variant asbck
(unlessbck
is in "beta-theta space" in which case if must be downstream ... seeBackward()
for details).- bck
a backward table as returned by
MakeBackwardTable()
and propagated to a target variant byBackward()
. Must be at the same variant asfwd
(unlessbck
is in "beta-theta space" in which case if must be downstream ... seeBackward()
for details).- type
a string; must be one of
"raw"
,"std"
or"minus.min"
. See Details.- M
a pre-existing matrix into which to write the distances. This can yield substantial speed up but requires special attention, see Details.
- beta.theta.opts
a list; see Details.
- nthreads
the number of CPU cores to use. By default no parallelism is used. By default uses the
parallel
package to detect the number of physical cores.
Value
A matrix of distances. The \((j,i)\)-th element of of the returned matrix is the inferred distance \(d_{ji}\) to haplotype \(j\) from haplotype \(i\) at the current variant. Each column encodes the output of an independent HMM: in column \(i\), haplotype \(i\) is taken as the observed recipient haplotype and painted as a mosaic of the other \(N-1\) haplotypes. Hence, the distances are asymmetric.
If you wish to plot this matrix or perform clustering, you may want to symmetrize the matrix first.
Details
This computes a local probability or distance matrix based on the forward and backward tables at a certain variant. The default usage is provide forward and backward tables at the same variant \(l\) so that the \((j,i)\)-th element of of the returned matrix is the inferred distance \(d_{ji}\) between haplotypes \(j\) and \(i\) at the current variant, \(l\), of the two tables given the haplotypes observed (over the whole sequence).
In particular,
$$d_{ji} = -log(p_{ji})$$
where \(p_{ji}\) is the posterior marginal probability that \(j\) is coped by \(i\) at the current variant of the two tables, \(l\), given the haplotypes observed (over the whole sequence).
By convention, \(d_{ii} = 0\) for all \(i\).
The above is returned when the type
argument is "raw"
(the default).
However, for convenience, a user may change the type
argument to "std"
in order for the distances to be mean and variance normalized before returning.
Changing type
to "minus.min"
will subtract the min distance in each column from that column before returning (this is the default in RELATE, see bottom of the RELATE paper Supplement Page 7 (Speidel et al, 2019)).
This function also allows users to calculate distance matrices in between variants and also to calculate matrices that exclude a set of consecutive variants by passing a backward table in "beta-theta space."
If in "beta-theta space", bck$l
may be greater than but not equal to fwd$l
.
beta.theta.opts
provides is required in this case to set how much of a recombination distance to propagate each matrix before combining them into distances. See Details below.
Notes on beta.theta.opts
In order to obtain distance matrices between variants fwd$l
and bck$l
, then bck
must be in "beta-theta space", see Backward()
for details.
This allows the forward and backward tables to be transitioning both tables to some genomic position between fwd$l
and bck$l
.
The precise recombination distance by which each table is propagated can be controlled by passing optional arguments in a list via beta.theta.opts
.
The recombination distances used can be specified in one of two ways.
Manually. In this case,
beta.theta.opts
is a list containing two named elements:"rho.fwd"
\(\in (0,1)\) specifies the transition probabilityrho
for propagating the forward table."rho.bck"
\(\in (0,1)\) specifies the transition probabilityrho
for propagating the backward table.
Implicitly. In this case,
beta.theta.opts
is a list containing two named elements:
"pars"
: akalisParameters
object that implicitly defines the recombination distance \(\rho^\star\) betweenfwd$l
andbck$l
"bias"
\(\in (0,1)\). The forward table is propagated a distance of(bias)
\(\rho^\star\) and the backward table is propagated a distance of(1-bias)
\(\rho^\star\).
Performance notes
If calculating many posterior probability matrices in succession, providing a pre-existing matrix M
that can be updated in-place can dramatically increase speed by eliminating the time needed for memory allocation.
Be warned, since the matrix is updated in-place, if any other variables point to the same memory address, they will also be simultaneously overwritten.
For example, writing
will update M
and P
simultaneously.
When provided, M
must have dimensions matching that of fwd$alpha
.
Typically, that is simply \(N \times N\) for \(N\) haplotypes.
However, if kalis is being run in a distributed manner, M
will be a \(N \times R\) matrix where \(R\) is the number of recipient haplotypes on the current machine.
References
Speidel, L., Forest, M., Shi, S., & Myers, S. (2019). A method for genome-wide genealogy estimation for thousands of samples. Nature Genetics, 51(1321–1329).
See also
PostProbs()
to calculate the posterior marginal probabilities \(p_{ji}\);
Forward()
to propagate a Forward table to a new variant;
Backward()
to propagate a Backward table to a new variant.
Examples
# To get the posterior probabilities at, say, variants 100 of the toy data
# built into kalis
data("SmallHaps")
data("SmallMap")
CacheHaplotypes(SmallHaps)
#> Warning: haplotypes already cached ... overwriting existing cache.
rho <- CalcRho(diff(SmallMap))
pars <- Parameters(rho)
fwd <- MakeForwardTable(pars)
bck <- MakeBackwardTable(pars)
Forward(fwd, pars, 100)
Backward(bck, pars, 100)
p <- PostProbs(fwd, bck)
d <- DistMat(fwd, bck)
# \donttest{
plot(d)
# }