Package: plinkQC 1.1.0

Hannah Meyer

plinkQC: Genotype Quality Control with 'PLINK'

Genotyping arrays enable the direct measurement of an individual's genotype at thousands of markers. 'plinkQC' facilitates genotype quality control for genetic association studies as described by Anderson and colleagues (2010) <doi:10.1038/nprot.2010.116>. It makes 'PLINK' basic statistics (e.g. missing genotyping rates per individual, allele frequencies per genetic marker) and relationship functions accessible from 'R' and generates a per-individual and per-marker quality control report. Individuals and markers that fail the quality control can subsequently be removed to generate a new, clean dataset. Removal of individuals based on relationship status is optimised to retain as many individuals as possible in the study. Additionally, the package includes a trained classifier to predict the genomic ancestry of human samples.
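To make the statistics named above more concrete, the following is a minimal base-R sketch on a toy genotype matrix. It does not call 'plinkQC' or 'PLINK' and is not the package's implementation: the matrix `geno`, the thresholds `imiss_threshold` and `maf_threshold`, the `related_pairs` table, and the greedy removal loop are all hypothetical and chosen purely for illustration.

```r
# Toy genotype matrix (hypothetical data): rows = individuals, columns = markers.
# Entries count copies of the alternate allele (0, 1, 2); NA marks a missing call.
geno <- matrix(
  c( 0,  1, 2, NA,
     1,  1, 0,  0,
    NA, NA, 2, NA,
     0,  0, 0,  1,
     2,  1, 0,  0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("ind", 1:5), paste0("snp", 1:4))
)

# Per-individual missing genotype rate: fraction of markers without a call.
imiss <- rowMeans(is.na(geno))

# Per-marker minor allele frequency, computed from non-missing calls only.
alt_freq <- colMeans(geno, na.rm = TRUE) / 2
maf <- pmin(alt_freq, 1 - alt_freq)

# Illustrative thresholds (assumptions, not package defaults).
imiss_threshold <- 0.5
maf_threshold <- 0.2

fail_individuals <- names(imiss)[imiss > imiss_threshold]
fail_markers <- names(maf)[maf < maf_threshold]

# "Clean" dataset: drop failing individuals and markers.
clean <- geno[!rownames(geno) %in% fail_individuals,
              !colnames(geno) %in% fail_markers,
              drop = FALSE]

# Relatedness filtering, sketched as a simple greedy heuristic (an assumption,
# not the package's algorithm): repeatedly remove the individual involved in
# the most related pairs until no related pair remains.
related_pairs <- data.frame(
  id1 = c("A", "A", "A", "D"),
  id2 = c("B", "C", "D", "E"),
  stringsAsFactors = FALSE
)
remove_ids <- character(0)
while (nrow(related_pairs) > 0) {
  counts <- table(c(related_pairs$id1, related_pairs$id2))
  worst <- names(counts)[which.max(counts)]
  remove_ids <- c(remove_ids, worst)
  keep <- related_pairs$id1 != worst & related_pairs$id2 != worst
  related_pairs <- related_pairs[keep, , drop = FALSE]
}
```

In this toy example, removing one member of every related pair naively could discard up to four individuals, whereas the greedy loop removes only two. This is the kind of saving the description refers to when it says that removal is optimised to retain as many individuals as possible.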

Authors: Hannah Meyer [aut, cre], Caroline Walter [ctb], Maha Syed [ctb]


# Install 'plinkQC' in R:
install.packages('plinkQC', repos = c('https://meyer-lab-cshl.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/meyer-lab-cshl/plinkqc/issues

Pkgdown/docs site: https://meyer-lab-cshl.github.io

8.11 score | 68 stars | 76 scripts | 744 downloads | 1 mention | 33 exports | 45 dependencies

Last updated from: 800cce0534. Checks: 9 OK. Indexed: yes.

Target | Result | Time
linux-devel-x86_64 | OK | 181
source / vignettes | OK | 207
linux-release-x86_64 | OK | 174
macos-release-arm64 | OK | 189
macos-oldrel-arm64 | OK | 142
windows-devel | OK | 108
windows-release | OK | 116
windows-oldrel | OK | 139
wasm-release | OK | 159

Exports: ancestry_prediction, check_het_and_miss, check_hwe, check_maf, check_relatedness, check_sex, check_snp_missingness, checkFiltering, checkLoadingMat, checkPlink, checkPlink2, checkRF_path, cleanData, convert_from_vcf, convert_to_plink2, evaluate_ancestry_prediction, evaluate_check_het_and_miss, evaluate_check_relatedness, evaluate_check_sex, overviewPerIndividualQC, overviewPerMarkerQC, perIndividualQC, perMarkerQC, pruning_ld, relatednessFilter, rename_variant_identifiers, run_ancestry_format, run_ancestry_prediction, run_check_heterozygosity, run_check_missingness, run_check_relatedness, run_check_sex, testNumerics

Dependencies: cli, cowplot, cpp11, data.table, dplyr, farver, generics, ggplot2, ggrepel, glue, gridExtra, gtable, igraph, isoband, labeling, lattice, lifecycle, magrittr, Matrix, optparse, pillar, pkgconfig, plyr, purrr, R.methodsS3, R.oo, R.utils, R6, randomForest, RColorBrewer, Rcpp, rlang, S7, scales, stringi, stringr, sys, tibble, tidyr, tidyselect, UpSetR, utf8, vctrs, viridisLite, withr

Training a Random Forest Classifier for Population Structure Identification
Download reference data | Set-up | Match study genotypes and reference data | Filter reference and study data for non A-T or G-C SNPs | Renaming variant identifiers | Filtering out shared SNPs between study and reference dataset | Conducting markerQC, pruning LD, and individual QC | PCA | Training a random forest classifier in R | Predicting ancestries of new study data | Evaluating and Tuning of Classification Model | Parameter Tuning via Grid Search | Evaluating/Interpreting the RF | References

Last update: 2026-03-27
Started: 2018-10-23

Genotype quality control with plinkQC
Introduction | Per-individual quality control | Per-marker quality control | Clean data | Workflow | Create QC-ed dataset | Step-by-step | Individuals with discordant sex information | Individuals with outlying missing genotype and/or heterozygosity rates | Related individuals | Ancestry Predictions of Data | Markers with excessive missingness rate | Markers with deviation from HWE | Markers with low minor allele frequency | References

Last update: 2026-03-27
Started: 2018-10-20

Processing 1000 Genomes reference data for ancestry estimation
Introduction | Workflow | Set-up | PLINK software | Download and decompress 1000 Genomes phase 3 data | Convert 1000 Genomes phase 3 data to plink 1 binary format | References

Last update: 2026-02-10
Started: 2018-10-31

Processing HapMap III reference data for ancestry estimation
Introduction | Workflow | Set-up | Download and convert Hapmap phase III data | Update annotation | Update the reference data | References

Last update: 2026-02-10
Started: 2018-10-24

my-vignette

Last update: 2025-11-17
Started: 2025-11-17

Readme and manuals

Help Manual

Help page | Topics
Predicting sample superpopulation ancestry | ancestry_prediction
Identification of individuals with outlying missing genotype or heterozygosity rates | check_het_and_miss
Identification of SNPs showing a significant deviation from Hardy-Weinberg equilibrium (HWE) | check_hwe
Identification of SNPs with low minor allele frequency | check_maf
Identification of related individuals | check_relatedness
Identification of individuals with discordant sex information | check_sex
Identification of SNPs with high missingness rate | check_snp_missingness
Check and construct PLINK sample and marker filters | checkFiltering
Checking the path of the loading matrix | checkLoadingMat
Check PLINK software access | checkPlink
Check PLINK2 software access | checkPlink2
Check and construct individual IDs to be removed | checkRemoveIDs
Checking the path of user-inputted random forest | checkRF_path
Create plink dataset with individuals and markers passing quality control | cleanData
Converting VCF data files into PLINK v1.9 and PLINK v2.0 data files | convert_from_vcf
Converting PLINK v1.9 data files into PLINK v2.0 data files | convert_to_plink2
Predicting sample superpopulation ancestry | evaluate_ancestry_prediction
Evaluate results from PLINK missing genotype and heterozygosity rate check. | evaluate_check_het_and_miss
Evaluate results from PLINK IBD estimation. | evaluate_check_relatedness
Evaluate results from PLINK sex check. | evaluate_check_sex
Overview of per sample QC | overviewPerIndividualQC
Overview of per marker QC | overviewPerMarkerQC
Quality control for all individuals in plink-dataset | perIndividualQC
Quality control for all markers in plink-dataset | perMarkerQC
Pruning of SNPs in Linkage Disequilibrium | pruning_ld
Remove related individuals while keeping maximum number of individuals | relatednessFilter
Renaming variants | rename_variant_identifiers
Running functions to format data for ancestry prediction | run_ancestry_format
Projecting the study data set onto the PC space of the reference dataset | run_ancestry_prediction
Run PLINK heterozygosity rate calculation | run_check_heterozygosity
Run PLINK missingness rate calculation | run_check_missingness
Run PLINK IBD estimation | run_check_relatedness
Run PLINK sexcheck | run_check_sex
Test lists for different properties of numerics | testNumerics