Training a Random Forest Classifier for Population Structure Identification
3 months ago
Download reference data | Set-up | Match study genotypes and reference data | Filter reference and study data for non A-T or G-C SNPs | Renaming variant identifiers | Filtering out shared SNPs between study and reference dataset | Conducting markerQC, pruning LD, and individual QC | PCA | Training a random forest classifier in R | Predicting ancestries of new study data | Evaluating and Tuning of Classification Model | Parameter Tuning via Grid Search | Evaluating/Interpreting the RF | References
