SNP calling for the Illumina Infinium Omni5-4 SNP BeadChip kit using the butterfly method

Research output: Working paper › Preprint › Research

Standard

SNP calling for the Illumina Infinium Omni5-4 SNP BeadChip kit using the butterfly method. / Andersen, Mikkel Meyer; Christiansen, Steffan; Andersen, Jeppe Dyrberg; Eriksen, Poul Svante; Morling, Niels.

bioRxiv, 2022.

Harvard

Andersen, MM, Christiansen, S, Andersen, JD, Eriksen, PS & Morling, N 2022 'SNP calling for the Illumina Infinium Omni5-4 SNP BeadChip kit using the butterfly method' bioRxiv. https://doi.org/10.1101/2022.01.17.476594

APA

Andersen, M. M., Christiansen, S., Andersen, J. D., Eriksen, P. S., & Morling, N. (2022). SNP calling for the Illumina Infinium Omni5-4 SNP BeadChip kit using the butterfly method. bioRxiv. https://doi.org/10.1101/2022.01.17.476594

Vancouver

Andersen MM, Christiansen S, Andersen JD, Eriksen PS, Morling N. SNP calling for the Illumina Infinium Omni5-4 SNP BeadChip kit using the butterfly method. bioRxiv. 2022 Jan 20. https://doi.org/10.1101/2022.01.17.476594

Author

Andersen, Mikkel Meyer ; Christiansen, Steffan ; Andersen, Jeppe Dyrberg ; Eriksen, Poul Svante ; Morling, Niels. / SNP calling for the Illumina Infinium Omni5-4 SNP BeadChip kit using the butterfly method. bioRxiv, 2022.

Bibtex

@techreport{031a8e0637f74881b1a3ea7e2d3db533,
title = "SNP calling for the Illumina Infinium Omni5-4 SNP BeadChip kit using the butterfly method",
abstract = "We introduce the “butterfly method” for SNP calling with the Illumina Infinium Omni5-4 BeadChip kit without the use of Illumina GenomeStudio software. The method is a within-sample method and uses neither other samples nor population frequencies to call SNPs. The butterfly method is based on a three-component mixture of normal distributions whose parameters are easily estimated using the open-source statistical software R. This makes the method transparent, makes it straightforward to change parameters according to the user{\textquoteright}s needs, and makes it easy to analyse the data within R after the SNPs have been called. We contribute two open-source R packages that make SNP calling easy by helping with bookkeeping and by giving easy access to meta-information about the SNPs on the Illumina Infinium Omni5-4 BeadChip kit (including chromosome, probe type, and SNP bases). We test our method on more than 4 million SNPs and compare the results with those obtained with the GenTrain method used by Illumina GenomeStudio as well as with SNPs obtained by PCR-free whole genome sequencing (WGS). We demonstrate two variants of our method: one that accounts for potential probe type bias by estimating a separate model for each probe type (type I and type II), and another that uses a general model such that the model{\textquoteright}s parameter estimates do not depend on the sample being analysed. We focus on varying the no-call rate and show how it changes the concordance with WGS. This is done by using a threshold on the a posteriori probability of belonging to a SNP cluster and by using the number of beads to adjust the stringency of the no-call mechanism. With the butterfly method, we achieve a SNP call rate of around 99% and a SNP concordance of around 99% with the WGS data. By lowering the a posteriori probability threshold for no-calls, we can obtain a higher call rate than GenomeStudio, and by using a higher a posteriori probability threshold, we can achieve a higher concordance with the WGS data than GenomeStudio.",
author = "Andersen, {Mikkel Meyer} and Steffan Christiansen and Andersen, {Jeppe Dyrberg} and Eriksen, {Poul Svante} and Niels Morling",
year = "2022",
month = jan,
day = "20",
doi = "10.1101/2022.01.17.476594",
language = "English",
publisher = "bioRxiv",
type = "WorkingPaper",
institution = "bioRxiv",

}

RIS

TY  - UNPB
T1  - SNP calling for the Illumina Infinium Omni5-4 SNP BeadChip kit using the butterfly method
AU  - Andersen, Mikkel Meyer
AU  - Christiansen, Steffan
AU  - Andersen, Jeppe Dyrberg
AU  - Eriksen, Poul Svante
AU  - Morling, Niels
PY  - 2022/1/20
Y1  - 2022/1/20
N2  - We introduce the “butterfly method” for SNP calling with the Illumina Infinium Omni5-4 BeadChip kit without the use of Illumina GenomeStudio software. The method is a within-sample method and uses neither other samples nor population frequencies to call SNPs. The butterfly method is based on a three-component mixture of normal distributions whose parameters are easily estimated using the open-source statistical software R. This makes the method transparent, makes it straightforward to change parameters according to the user’s needs, and makes it easy to analyse the data within R after the SNPs have been called. We contribute two open-source R packages that make SNP calling easy by helping with bookkeeping and by giving easy access to meta-information about the SNPs on the Illumina Infinium Omni5-4 BeadChip kit (including chromosome, probe type, and SNP bases). We test our method on more than 4 million SNPs and compare the results with those obtained with the GenTrain method used by Illumina GenomeStudio as well as with SNPs obtained by PCR-free whole genome sequencing (WGS). We demonstrate two variants of our method: one that accounts for potential probe type bias by estimating a separate model for each probe type (type I and type II), and another that uses a general model such that the model’s parameter estimates do not depend on the sample being analysed. We focus on varying the no-call rate and show how it changes the concordance with WGS. This is done by using a threshold on the a posteriori probability of belonging to a SNP cluster and by using the number of beads to adjust the stringency of the no-call mechanism. With the butterfly method, we achieve a SNP call rate of around 99% and a SNP concordance of around 99% with the WGS data. By lowering the a posteriori probability threshold for no-calls, we can obtain a higher call rate than GenomeStudio, and by using a higher a posteriori probability threshold, we can achieve a higher concordance with the WGS data than GenomeStudio.
AB  - We introduce the “butterfly method” for SNP calling with the Illumina Infinium Omni5-4 BeadChip kit without the use of Illumina GenomeStudio software. The method is a within-sample method and uses neither other samples nor population frequencies to call SNPs. The butterfly method is based on a three-component mixture of normal distributions whose parameters are easily estimated using the open-source statistical software R. This makes the method transparent, makes it straightforward to change parameters according to the user’s needs, and makes it easy to analyse the data within R after the SNPs have been called. We contribute two open-source R packages that make SNP calling easy by helping with bookkeeping and by giving easy access to meta-information about the SNPs on the Illumina Infinium Omni5-4 BeadChip kit (including chromosome, probe type, and SNP bases). We test our method on more than 4 million SNPs and compare the results with those obtained with the GenTrain method used by Illumina GenomeStudio as well as with SNPs obtained by PCR-free whole genome sequencing (WGS). We demonstrate two variants of our method: one that accounts for potential probe type bias by estimating a separate model for each probe type (type I and type II), and another that uses a general model such that the model’s parameter estimates do not depend on the sample being analysed. We focus on varying the no-call rate and show how it changes the concordance with WGS. This is done by using a threshold on the a posteriori probability of belonging to a SNP cluster and by using the number of beads to adjust the stringency of the no-call mechanism. With the butterfly method, we achieve a SNP call rate of around 99% and a SNP concordance of around 99% with the WGS data. By lowering the a posteriori probability threshold for no-calls, we can obtain a higher call rate than GenomeStudio, and by using a higher a posteriori probability threshold, we can achieve a higher concordance with the WGS data than GenomeStudio.
U2  - 10.1101/2022.01.17.476594
DO  - 10.1101/2022.01.17.476594
M3  - Preprint
BT  - SNP calling for the Illumina Infinium Omni5-4 SNP BeadChip kit using the butterfly method
PB  - bioRxiv
ER  - 

ID: 302456466
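The abstract describes calling genotypes within a single sample by fitting a three-component mixture of normal distributions and declaring a no-call when the a posteriori probability of the most likely cluster falls below a threshold. The Python sketch below illustrates only that general idea; it is not the authors' R implementation, and the abstract does not specify their features or fitting procedure. Assumptions: each SNP of one sample is summarized by a single hypothetical feature x (for example an allele-B intensity fraction that clusters near 0, 0.5 and 1 for AA, AB and BB), the mixture is fitted with a plain EM algorithm, and the names `fit_mixture`, `posteriors` and `call_genotypes` are hypothetical. The bead-count adjustment of the no-call stringency is not modelled.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical genotype labels for the three components, ordered by mean.
GENOTYPES = ("AA", "AB", "BB")


@dataclass
class Mixture:
    weights: List[float]
    means: List[float]
    sds: List[float]


def normal_pdf(x: float, mu: float, sd: float) -> float:
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posteriors(x: float, model: Mixture) -> List[float]:
    """A posteriori probability that x belongs to each of the three clusters."""
    dens = [w * normal_pdf(x, m, s)
            for w, m, s in zip(model.weights, model.means, model.sds)]
    total = sum(dens)
    if total == 0.0:  # all densities underflowed: treat as maximally uncertain
        return [1.0 / 3] * 3
    return [d / total for d in dens]


def fit_mixture(xs: List[float], init_means=(0.0, 0.5, 1.0),
                n_iter: int = 100, min_sd: float = 1e-3) -> Mixture:
    """Fit a three-component normal mixture to one sample's SNPs with EM."""
    model = Mixture([1.0 / 3] * 3, list(init_means), [0.1] * 3)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each SNP.
        resp = [posteriors(x, model) for x in xs]
        # M-step: responsibility-weighted updates of weight, mean and sd.
        for j in range(3):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:  # empty component: keep its previous parameters
                continue
            mu = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mu) ** 2 for r, x in zip(resp, xs)) / nj
            model.weights[j] = nj / len(xs)
            model.means[j] = mu
            model.sds[j] = max(math.sqrt(var), min_sd)
        total_w = sum(model.weights)
        model.weights = [w / total_w for w in model.weights]
    # Order components by mean so that index 0/1/2 maps to AA/AB/BB.
    order = sorted(range(3), key=lambda j: model.means[j])
    return Mixture([model.weights[j] for j in order],
                   [model.means[j] for j in order],
                   [model.sds[j] for j in order])


def call_genotypes(xs: List[float], model: Mixture,
                   threshold: float = 0.99) -> List[Optional[str]]:
    """Most probable genotype per SNP, or None (no-call) if its posterior < threshold."""
    calls: List[Optional[str]] = []
    for x in xs:
        p = posteriors(x, model)
        j = max(range(3), key=p.__getitem__)
        calls.append(GENOTYPES[j] if p[j] >= threshold else None)
    return calls


if __name__ == "__main__":
    # Simulated feature values for one sample (illustrative data only).
    rng = random.Random(0)
    xs = [rng.gauss(mu, 0.04) for mu in (0.05, 0.5, 0.95) for _ in range(200)]
    model = fit_mixture(xs)
    for t in (0.5, 0.99, 0.9999):
        calls = call_genotypes(xs, model, threshold=t)
        print(f"threshold={t}: call rate={1 - calls.count(None) / len(calls):.3f}")
```

Raising `threshold` can only turn calls into no-calls, which mirrors the trade-off between call rate and concordance described in the abstract. The probe-type variant could be approximated in this sketch by calling `fit_mixture` separately on the type I and type II SNPs.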