Reading Haplotype Data
Louis Aslett & Ryan Christ
2024-11-13
Source:vignettes/Reading_Haplotype_Data.Rmd
Reading_Haplotype_Data.Rmd
kalis
calculates pairwise genetic distances at loci of
interest for a set of phased haplotypes stored in a L x N
matrix H
where H[l,i] = 1
if haplotype
i
carries the alternative allele at locus l
and H[l,i] = 0
if it carries the reference allele. For
efficiency kalis
internally must load and store
H
as a matrix of bits.
For all of the data input types below, it assumed that the
haplotypes have already been phased, as required to run
kalis
.
Reading from BCF/VCF
To read phased haplotypes stored in a compressed or uncompressed BCF
or VCF, the file must first be converted to HAP/SAMPLE/LEGEND format.
For example, for a given compressed VCF, we simply call
bcftools
as follows.
Then, from R
, we read in the haplotypes by calling
require(kalis)
CacheHaplotypes("my.hap.gz")
See http://samtools.github.io/bcftools/bcftools.html#convert for more details.
For increased reading efficiency CacheHaplotypes
look
will look for the my.legend
file that was produced by
bcftools
in the same directory as my.hap.gz
so
its worthwhile keeping the .legend
files.
bcftools
can read from many different formats into
BCF/VCF, making it an easy tool for conversion into HAP/SAMPLE/LEGEND
format.
kalis
http://hgdownload.cse.ucsc.edu/gbdb/hg19/1000Genomes/phase3/
ALL.chr21.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.hap.gz