From c516eb05db17d75db9e202750989085cfdd1bd02 Mon Sep 17 00:00:00 2001
From: BonfaceKilz
Date: Wed, 24 Mar 2021 16:17:19 +0300
Subject: Add extra genotype test file

* tests/unit/test_data/genotype.txt: New file.
* tests/unit/test_file_utils.py: Update failing test.
---
 tests/unit/test_data/genotype.txt | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)
 create mode 100644 tests/unit/test_data/genotype.txt

(limited to 'tests/unit/test_data')

diff --git a/tests/unit/test_data/genotype.txt b/tests/unit/test_data/genotype.txt
new file mode 100644
index 0000000..368f003
--- /dev/null
+++ b/tests/unit/test_data/genotype.txt
@@ -0,0 +1,29 @@
+# File name: BXD_Geno-19Jan2017b_forGN.xls
+# Metadata: Please retain this header information with file.
+# Data Source and Contact: GeneNetwork at http://www.genenetwork.org/webqtl/main.py?FormID=sharinginfo&GN_AccessionId=600 . Please contact Robert W. Williams (rwilliams@uthsc.edu or labwilliams@gmail.com) regarding this file.
+# Citation: For information on the BXD family of strains please see http://www.genenetwork.org/reference.html and Wang X, Pandey AK, Mulligan MK, Williams EG, Mozhui K, Li Z, Jovaisaite V, Quarles LD, Xiao Z, Huang J, Capra JA, Chen Z, Taylor WL, Bastarache L, Niu X, Pollard KS, Ciobanu DC, Reznik AO, Tishkov AV, Zhulin IB, Peng J, Nelson SF, Denny JC, Auwerx J, Lu L, Williams RW (2016) Joint mouse-human phenome-wide association to test gene function and disease risk. Nature Communications 7:10464
+# Date Modified: Dec 19, 2018 by Rob, David Ashbrook, and Danny Arends to remove excessive cross-over events in strains BXD42 (Chr9), BXD81 (Chrs1, 5, 10), BXD99 (Chr1), and BXD100 (Chrs2 and 6); and to add Taar1 maker on Chr 10 for T. Phillips-Richards. Jan 19, 2017: Danny Arends computed BXD cM values and recombinations between markers. Rob W. Williams fixed errors on most chromosomes and added Affy eQTL markers.
+# File entered in GN database on Dec 19, 2018 by Arthur Centeno.
+# Coordinates of Markers and Assembly: Megabase and basepair positions of markers before 2017 used mm9 NCBI Build 37 coordinates. In Jan 2017 we converted to mm10, GRCm38. Note that some coordinates are approximate, particularly eQTL markers (e.g. Affy_PC2_15; aka Affymetrix Principal component, Chr 2 at about 15 Mb).
+# Genotypes: This file provides consensus genotypes for 198 BXD strains and for the two progenitor strains and the reciprocal F1s. Of the 198 BXD strains, 191 are independent, whereas 7 are substrains (e.g., BXD48 and BXD48a). This file provides approximate locations of 10300 recombinations, an average of 52 per strain. Genotypes were generated using Affymetrix, MUGA, MegaMUGA, and GigaMUGA Illumina platforms. Microsatellites and eQTL genotypes were generated by the Williams/Lu laboratory. Unknown genotypes were imputed as B or D, or were called as H (heterozygous) if the genotype was uncertain. Genotypes were manually curated by RW Williams. Genotypes were smoothed to remove unlikely recombination events. Almost all recombinations are supported by multiple markers, although only one or two representative markers may be provided in this file. The original parent file (BXD_El_Grande_Master_Used_to_Proof_Final_Genotypes_2016.xlxs) contains data for approximately 37000 markers. A subset of the most informative 7300 markers are included here. Genotypes for Chr Y and Chr M are provisional and will be verified in 2017. As of 2016, many strains with higher numbers (BXD100 and above) are not fully inbred.
+# Material and Cases: 155 BXD strains and substrains are available as live stock from JAX or UTHSC (Jan 2017). 148 are independent strains (excludes 7 substrains). Most samples were were obtained from the UTHSC colony. BXD24a is also known as BXD24/TyJ-Cep290/J and its genotypes are identical to those of BXD24.
Forty-three of the BXD strains listed here are extinct as of Dec 2016, including: BXD23 BXD30 BXD35 BXD37 BXD41 BXD52 BXD54 BXD59 BXD72 BXD76 BXD93 BXD94 BXD104 BXD106 BXD107 BXD108 BXD110 BXD115 BXD116 BXD117 BXD119 BXD120 BXD121 BXD126 BXD130 BXD133 BXD134 BXD135 BXD136 BXD138 BXD139 BXD142 BXD145 BXD153 BXD165 BXD188 BXD193 BXD197 BXD203 BXD206 BXD207 BXD208 BXD220
+# Breeding: BXD1 to BXD30 generated by BA Taylor at the Jackson Laboratory starting in 1971 from F2 stock. BXD31 and BXD32 generated by BA Taylor at the Jackson Laboratory in the late 1970s from other stock. BXD32 involved a backcross to DBA/2J and has a D mitochondrial genome. BXD33 to BXD42 were generated by BA Taylor at the Jackson Laboratory in the early 1990s from F2 stock. BXD43 to BXD102 were generated by RW Williams and L Lu at UTHSC in the early 2000s from G8 to G10 advanced intercross stock. BXD104 to BXD157 were generated by RW Williams and L Lu at UTHSC starting in 2008 from F2 stock. BXD160 to BXD186 were generated by RW Williams and L Lu at UTHSC starting in 2011 from G8 to G9 advanced intercross stock (gifted by Abraham Palmer). BXD187 to BXD220 and above were generated by RW Williams and L Lu at UTHSC starting in 2014 from F2 stock.
+# Errors and Corrections: This file contains genotyping errors and imprecision in the locations of recombination events. Locations of most of the 10000 recombinations are typically accurate to better than 0.5 Mb. Strains were genotyped between 1998 and 2016. All DNA samples from cases above BXD102 were obtained August 2015. Some regions reported to be heterozygous may actually homozygous (and vice versa). Please report possible problems to RW Williams (rwilliams@uthsc.edu). To resolve errors, the BXD strains are being sequenced in early 2017 at 30X using 10X Chromium libraries at Hudson Alpha (supported by the UTHSC CITG, Williams, Lu, and colleagues at UTHSC and by Jonathan Pritchard at Stanford).
+# cM position: cM positions were estimated by Danny Arends, Dec 2016. One recombination on both chromosomes (e.g., a switch from B to D or D to B on both chromosomes) is equivalent to about 0.1272 cM on the corrected scale (cM_BXD). H genotypes were ignored, and this will reduce map lengths slightly. Raw recombination fractions, R. are computed by dividing the number of recombination by the number of strains/cases. The original RI recombination fraction R is adjusted to the equivalent single generation cross cM value using the correction equation r = R/(4 - 6 * R). For comparison the original version of this file also contains sex-average cM values estimated by Cox et al. (2009, see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2728870/ and http://cgd.jax.org/mousemapconverter )
+# Acknowledgments We thank Lu Lu for generating many of these strains (BXD43 and higher). We thank Casey Bohl, Melinda McCarty, and Jesse Ingels for supporting the UTHSC colony and preparing samples for genotyping. Thanks to Abraham Palmer, Jeremy Peirce and Lee Silver for providing advanced intercross stock for BXD strains. We thank Cat Lutz and colleagues at the Jackson Laboratory for holding and distributing many of these strains, for cryopreserving strains. We thank Danny Arends, Andrew P. Morgan, Fernando Pardo-Manuel de Villena, Arthur Centeno, and Lei Yan for processing genotype files used in GeneNetwork. Our thanks to Allison Knoll and Jonsong Huang for error checking early versions of this file.
+# Funding: This work and GeneNetwork have been funded by The UTHSC Center for Integrative and Translational Genomics, and grants from NIGMS (R01GM123489), NIAAA (U01AA16662), NIDA(P20DA21131), NCI MMHCC (U01CA105417), and NCRR (U24 RR021760)
+# Column Heads: Chr = chromosome, Locus=marker name, Mb_mm9= megabase position mouse genome assembly mm9, cM_Cox and cM_BXD are centimorgan positions of markers, nRecP = number of recombinations between site and more proximal marker.
If shown: nRecS = sum of recombinations over chromosome, cMraw= uncorrected cM values.
+@name:BXD
+@type:riset
+@mat:B
+@pat:D
+@het:H
+@unk:U
+Chr Locus cM Mb BXD1 BXD2
+1 rs31443144 1.50 3.010274 B B
+2 rs27644551 93.26 173.542999 D D
+3 rs31187985 17.12 41.921845 D D
+4 rs30254612 2.15 3.718812 B D
+5 UNCHS047057 3.10 4.199559 B B
+X ChrXp_no_data 1.40 3.231738 D B
+X Affy_17539964 1.40 7.947581 D B
--
cgit v1.2.3
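
Reviewer note: the new fixture has three kinds of lines ('#' comments, '@key:value' metadata, and a whitespace-separated marker table), so a rough sketch of how a test might read it could look like the following. This is not the project's actual loader: `parse_genotype` and `ri_to_cm` are hypothetical names, the sketch assumes every comment line starts with '#', and `SAMPLE` is a shortened copy of the fixture. The second helper applies the header's correction r = R/(4 - 6 * R). Using 198 strains as the denominator and multiplying by 100 to get cM are assumptions, though together they reproduce the 0.1272 cM figure quoted in the header.

```python
import io


def parse_genotype(handle):
    """Split a genotype file into (metadata, header, rows).

    Hypothetical helper: assumes '#' starts a comment line, '@key:value'
    lines carry metadata, and the first remaining line is the column header.
    """
    metadata, header, rows = {}, None, []
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comment lines
        if line.startswith("@"):
            key, _, value = line[1:].partition(":")
            metadata[key] = value
            continue
        fields = line.split()  # handles tabs or spaces
        if header is None:
            header = fields
        else:
            rows.append(dict(zip(header, fields)))
    return metadata, header, rows


def ri_to_cm(n_recombinations, n_strains):
    """Apply r = R / (4 - 6 * R) and express the result in cM."""
    R = n_recombinations / n_strains  # raw RI recombination fraction
    return 100.0 * R / (4.0 - 6.0 * R)


# Shortened copy of the fixture, for illustration only.
SAMPLE = """\
# File name: BXD_Geno-19Jan2017b_forGN.xls
@name:BXD
@type:riset
@mat:B
@pat:D
@het:H
@unk:U
Chr Locus cM Mb BXD1 BXD2
1 rs31443144 1.50 3.010274 B B
X Affy_17539964 1.40 7.947581 D B
"""

metadata, header, rows = parse_genotype(io.StringIO(SAMPLE))
print(metadata["name"], header[4:], rows[1]["BXD1"])
print(round(ri_to_cm(1, 198), 4))  # 0.1272, assuming 198 strains
```

Splitting on any whitespace keeps the sketch indifferent to whether the fixture uses tabs or spaces between columns. Marker names containing spaces would need a stricter delimiter.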