| Title: | Get proxy SNPs for a SNP in the 1000 Genomes Project |
|---|---|
| Description: | This package implements functions to query remote VCF files. You can use it to find proxy SNPs in linkage disequilibrium with SNPs of interest or to calculate allele frequencies in different populations. |
| Authors: | Kamil Slowikowski [aut, cre] |
| Maintainer: | Kamil Slowikowski |
| License: | MIT + file LICENSE |
| Version: | 0.0.1 |
| Built: | 2026-05-28 06:10:33 UTC |
| Source: | https://github.com/slowkow/proxysnps |
Compute R.squared and D.prime for two binary numeric vectors.
compute_ld(x, y)
| Argument | Description |
|---|---|
| x | a numeric vector of ones and zeros |
| y | a numeric vector of ones and zeros |
Find more details here: https://en.wikipedia.org/wiki/Linkage_disequilibrium
A list with two items:
R.squared: Squared Pearson correlation coefficient.
D.prime: Coefficient of linkage disequilibrium D divided by its theoretical maximum.
compute_ld(c(0,0,0,1,1,1), c(1,1,1,1,0,0))
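To make the two quantities concrete, here is a minimal R sketch of the standard formulas from the Wikipedia page linked above, assuming x and y are haplotype vectors coded 1 for the allele of interest. The function name ld_sketch is hypothetical, and this is not necessarily how compute_ld is implemented; in particular, D_max follows Wikipedia's sign convention, which makes D.prime non-negative.

```r
# Hypothetical re-implementation for illustration only; not the package source.
ld_sketch <- function(x, y) {
  p_a <- mean(x)                 # frequency of the allele coded 1 in x
  p_b <- mean(y)                 # frequency of the allele coded 1 in y
  p_ab <- mean(x == 1 & y == 1)  # frequency of the haplotype carrying both
  d <- p_ab - p_a * p_b          # coefficient of linkage disequilibrium D
  # Theoretical maximum of D given the allele frequencies (sign follows D).
  d_max <- if (d > 0) {
    min(p_a * (1 - p_b), (1 - p_a) * p_b)
  } else {
    max(-p_a * p_b, -(1 - p_a) * (1 - p_b))
  }
  list(
    R.squared = d^2 / (p_a * (1 - p_a) * p_b * (1 - p_b)),
    D.prime = d / d_max
  )
}

res <- ld_sketch(c(0,0,0,1,1,1), c(1,1,1,1,0,0))
```

For the example vectors above, the sketch gives R.squared = 0.5 (equal to cor(x, y)^2) and D.prime = 1. A monomorphic vector (all 0s or all 1s) makes the denominators zero, so both values come out as NaN.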
Returns a dataframe with proxy SNPs.
get_proxies(chrom = NA, pos = NA, query = NA, window_size = 1e+05, pop = NA)
| Argument | Description |
|---|---|
| chrom | a chromosome name (1-22, X) without "chr" |
| pos | a positive integer indicating the position of a SNP |
| window_size | a positive integer indicating the size of the genomic window (in base pairs) to search for proxy SNPs |
| pop | the name of a 1000 Genomes population or super-population (e.g. AFR, AMR, EAS, EUR, SAS). Set this to NA to use all populations. |
Currently, this is hard-coded to access 1000 Genomes phase3 data hosted by Brian Browning (author of BEAGLE):
http://bochet.gcc.biostat.washington.edu/beagle/1000_Genomes_phase3_v5a/
This implementation discards multi-allelic markers that have a "," in the ALT column.
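The multi-allelic filter amounts to a single grepl() call on the ALT column. This is a sketch assuming SNPs are held in a data frame with VCF-style column names; the snps table and the helper drop_multiallelic are hypothetical.

```r
# Hypothetical SNP table with VCF-style columns.
snps <- data.frame(
  ID  = c("rs1", "rs2", "rs3"),
  REF = c("A", "C", "G"),
  ALT = c("G", "T,A", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only sites with a single ALT allele.
drop_multiallelic <- function(snps) {
  # A multi-allelic site lists several ALT alleles separated by commas.
  snps[!grepl(",", snps$ALT, fixed = TRUE), , drop = FALSE]
}

biallelic <- drop_multiallelic(snps)
```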
The pop argument can be any of: ACB, ASW, BEB, CDX, CEU, CHB, CHS, CLM, ESN,
FIN, GBR, GIH, GWD, IBS, ITU, JPT, KHV, LWK, MSL, MXL, PEL, PJL, PUR, STU,
TSI, YRI. It can also be any super-population: AFR, AMR, EAS, EUR, SAS.
Find more details here: http://www.1000genomes.org/faq/which-populations-are-part-your-study
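For reference, the 26 populations group into the five super-populations as shown below. The lookup is a sketch; super_pops and expand_pop are hypothetical names, not functions exported by this package.

```r
# Super-population membership of the 26 1000 Genomes phase 3 populations.
super_pops <- list(
  AFR = c("ACB", "ASW", "ESN", "GWD", "LWK", "MSL", "YRI"),
  AMR = c("CLM", "MXL", "PEL", "PUR"),
  EAS = c("CDX", "CHB", "CHS", "JPT", "KHV"),
  EUR = c("CEU", "FIN", "GBR", "IBS", "TSI"),
  SAS = c("BEB", "GIH", "ITU", "PJL", "STU")
)

# Hypothetical helper: expand a pop code into the population codes it covers.
expand_pop <- function(pop) {
  if (pop %in% names(super_pops)) {
    return(super_pops[[pop]])
  }
  if (pop %in% unlist(super_pops, use.names = FALSE)) {
    return(pop)
  }
  stop("Unknown 1000 Genomes population: ", pop)
}
```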
A dataframe with the following columns:
Chromosome name, e.g. "1"
Position, e.g. 583090
Identifier, e.g. "rs11063140"
Reference allele, e.g. "A"
Alternative allele, e.g. "G"
Minor allele frequency, e.g. 0.1
Squared Pearson correlation coefficient, e.g. 1.0
D prime value, e.g. 1.0
Binary indicator set to TRUE for the SNP of interest
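The frequency and correlation columns can be understood as per-row summaries of a haplotype matrix, each row compared against the row of the SNP of interest. The sketch below assumes such a matrix (rows are SNPs, columns are haplotypes coded 0 for REF and 1 for ALT); the data, the helper proxy_table, and its column names (id, maf, r_squared, is_query) are hypothetical and need not match the package's internals. A D prime column could be added in the same way by calling compute_ld() on each row.

```r
# Hypothetical haplotype matrix: rows are SNPs, columns are haplotypes (0/1).
geno <- rbind(
  rs_query = c(0, 0, 0, 1, 1, 1),
  rs_a     = c(0, 0, 0, 1, 1, 1),
  rs_b     = c(1, 1, 1, 1, 0, 0),
  rs_c     = c(0, 1, 0, 1, 0, 1)
)

# Hypothetical helper: summarise every SNP relative to the query SNP.
proxy_table <- function(geno, query) {
  alt_freq <- rowMeans(geno)  # ALT allele frequency per SNP
  target <- geno[query, ]
  data.frame(
    id = rownames(geno),
    maf = unname(pmin(alt_freq, 1 - alt_freq)),  # minor allele frequency
    r_squared = unname(apply(geno, 1, function(h) cor(h, target)^2)),
    is_query = rownames(geno) == query,
    stringsAsFactors = FALSE
  )
}

res <- proxy_table(geno, "rs_query")
```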
d <- get_proxies(chrom = "12", pos = 583090, window_size = 1e5, pop = "AFR")
head(d)
Returns a list with three dataframes for individuals, SNPs, and genotypes.
get_vcf(chrom, start, end, pop = NA)
| Argument | Description |
|---|---|
| chrom | a chromosome name (1-22, X) without "chr" |
| start | a positive integer indicating the start of a genomic region |
| end | a positive integer indicating the end of a genomic region |
| pop | the name of a 1000 Genomes population or super-population (e.g. AFR, AMR, EAS, EUR, SAS) |
Currently, this is hard-coded to access 1000 Genomes phase3 data hosted by Brian Browning (author of BEAGLE):
http://bochet.gcc.biostat.washington.edu/beagle/1000_Genomes_phase3_v5a/
This implementation discards multi-allelic markers that have a "," in the ALT column.
The pop argument can be any of: ACB, ASW, BEB, CDX, CEU, CHB, CHS, CLM, ESN,
FIN, GBR, GIH, GWD, IBS, ITU, JPT, KHV, LWK, MSL, MXL, PEL, PJL, PUR, STU,
TSI, YRI. It can also be any super-population: AFR, AMR, EAS, EUR, SAS.
Find more details here: http://www.1000genomes.org/faq/which-populations-are-part-your-study
A list with three dataframes:
A dataframe with information about individuals: Family.ID, Individual.ID, Paternal.ID, Maternal.ID, Gender, Population, Relationship, Siblings, Second.Order, Third.Order, Other.Comments, SuperPopulation
First 8 columns of the VCF file: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
Columns 10 onward of the VCF file. All genotypes are converted to 0s and 1s representing REF and ALT alleles. This dataframe has two columns for each individual.
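Phased VCF genotypes such as "0|1" carry one allele per haplotype, which is why the genotype dataframe has two columns per individual. A minimal sketch of that conversion, assuming phased, bi-allelic genotype strings; the sample data, the helper haplotypes_from_gt, and the _1/_2 column suffixes are hypothetical.

```r
# Hypothetical phased genotype strings: rows are SNPs, columns are individuals.
gt <- data.frame(
  HG00096 = c("0|0", "0|1"),
  HG00097 = c("1|0", "1|1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split each genotype column into two 0/1 haplotype columns.
haplotypes_from_gt <- function(gt) {
  cols <- lapply(names(gt), function(sample) {
    parts <- strsplit(gt[[sample]], "|", fixed = TRUE)
    # vapply gives a 2 x n matrix; transpose to n SNPs x 2 haplotypes.
    hap <- t(vapply(parts, as.integer, integer(2)))
    colnames(hap) <- paste0(sample, c("_1", "_2"))
    hap
  })
  as.data.frame(do.call(cbind, cols))
}

h <- haplotypes_from_gt(gt)
```

Unphased genotypes written with "/" are not handled here; they make vapply() stop with an error.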
vcf <- get_vcf(chrom = "12", start = 533090, end = 623090, pop = "AFR")
names(vcf)
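Per-population allele frequencies, mentioned in the package description, can be derived by averaging haplotype columns within each population. This sketch assumes a 0/1 haplotype matrix with two adjacent columns per individual, in the same order as a vector of population labels (for instance taken from the Population column of the individuals dataframe); all names and data here are hypothetical.

```r
# Hypothetical haplotypes: 2 SNPs x 8 haplotype columns (4 individuals).
haps <- matrix(
  c(0, 1, 1, 1, 0, 0, 0, 1,
    1, 1, 0, 1, 0, 0, 0, 0),
  nrow = 2, byrow = TRUE
)
# Hypothetical population label for each individual, in column order.
pop <- c("YRI", "YRI", "CEU", "CEU")

alt_freq_by_pop <- function(haps, pop) {
  # Label each haplotype column with its individual's population.
  hap_pop <- rep(pop, each = 2)
  # Group column indices by population, then average within each group.
  sapply(split(seq_len(ncol(haps)), hap_pop), function(cols) {
    rowMeans(haps[, cols, drop = FALSE])
  })
}

freq <- alt_freq_by_pop(haps, pop)
```

The result has one row per SNP and one column per population (columns sorted alphabetically by split()). With a single SNP, sapply() simplifies to a named vector rather than a matrix.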