Reading Haplotype Data
Louis Aslett & Ryan Christ
2025-07-28
Source:vignettes/Reading_Haplotype_Data.Rmd
Reading_Haplotype_Data.Rmdkalis calculates pairwise genetic distances at loci of
interest for a set of phased haplotypes stored in a L x N
matrix H where H[l,i] = 1 if haplotype
i carries the alternative allele at locus l
and H[l,i] = 0 if it carries the reference allele. For
efficiency kalis internally must load and store
H as a matrix of bits.
For all of the data input types below, it assumed that the
haplotypes have already been phased, as required to run
kalis.
Reading from BCF/VCF
To read phased haplotypes stored in a compressed or uncompressed BCF
or VCF, the file must first be converted to HAP/SAMPLE/LEGEND format.
For example, for a given compressed VCF, we simply call
bcftools as follows.
Then, from R, we read in the haplotypes by calling
require(kalis)
CacheHaplotypes("my.hap.gz")See http://samtools.github.io/bcftools/bcftools.html#convert for more details.
For increased reading efficiency CacheHaplotypes look
will look for the my.legend file that was produced by
bcftools in the same directory as my.hap.gz so
its worthwhile keeping the .legend files.
bcftools can read from many different formats into
BCF/VCF, making it an easy tool for conversion into HAP/SAMPLE/LEGEND
format.
kalis http://hgdownload.cse.ucsc.edu/gbdb/hg19/1000Genomes/phase3/
ALL.chr21.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.hap.gz