<p id="__p17">All sequenced B6, D2, F1, and BXD H3K4me3 ChIP libraries, as well as all control input DNA samples, were aligned using bwa version 0.7.9a (<a aria-expanded="false" aria-haspopup="true" href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6404261/#bib37" id="__tag_751284694" rid="bib37" role="button">Li and Durbin 2009</a>). B6 parental samples were aligned to the Genome Reference Consortium Mouse Build 38 (mm10), and D2 parental samples were aligned to the&nbsp;<em>de novo</em>&nbsp;REL-1509 assembly, including all unplaced scaffolds, from the Mouse Genomes Project (<a aria-expanded="false" aria-haspopup="true" href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6404261/#bib70" id="__tag_751284713" rid="bib70" role="button">Yalcin&nbsp;<em>et al.</em>&nbsp;2012</a>).</p>

<p id="__p18">To ensure that H3K4me3 peaks were properly quantified across divergent genomes, we began by building a comprehensive &ldquo;peakome&rdquo; representing all potential H3K4me3 peaks found in the two parents. H3K4me3 peaks for B6 and D2 were called independently, using alignment data from three replicate samples and one DNA input sample. Reads were retained only if they had a mapq alignment metric of 60 and aligned with no indels across the entire length of the sequencing read, typically 100 bp. H3K4me3 peaks were called using MACS version 1.4.2 (<a aria-expanded="false" aria-haspopup="true" href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6404261/#bib73" id="__tag_751284765" rid="bib73" role="button">Zhang&nbsp;<em>et al.</em>&nbsp;2008</a>), and peaks having a false discovery rate (FDR) of &lt;1% that were found in two out of three replicates were accepted. Final genomic intervals for each H3K4me3 peak for each strain were derived by merging the peaks from the corresponding replicate samples using bedtools (<a aria-expanded="false" aria-haspopup="true" href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6404261/#bib49" id="__tag_751284750" rid="bib49" role="button">Quinlan and Hall 2010</a>). To link syntenic regions between the B6 and D2 assemblies, which each have their own coordinate system, sequences from these genomic intervals were aligned to their alternative genome using reciprocal BLAST. In some cases, a sequence interval comprising an H3K4me3 peak in one strain aligned to multiple adjacent intervals in the alternative genome. If the sequences of these peaks in the alternative strain all fell within the boundaries of the single peak, they were merged. The boundaries of these merged peaks included the incorporated sequences from both strains.
Because some H3K4me3 peaks are strain specific, these peaks were accepted if, and only if, the mapped interval had a unique sequence that was found in the proper syntenic order within the alternative genome lacking that H3K4me3 peak. The final combined peakome between B6 and D2 mice was created by selecting only peaks appropriately linked across each strain, ensuring that each H3K4me3 peak reciprocally aligned to only one peak in the alternative genome after merging, and that all peaks were in the same order along the chromosomes in both genomes (Supplemental Material, Table S2). Using the H3K4me3 peak locations derived from the parental strains, final read counts for B6, D2, F1&nbsp;hybrids, and BXDs were obtained by counting reads within the coordinate boundaries of the peakome intervals.</p>
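
<p>The read filter, the two-out-of-three replicate rule, and the per-interval counting described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study, which relied on MACS and bedtools. The <code>Read</code> record, the function names, the single-chromosome simplification, and the way "found in two out of three replicates" is implemented (a merged region is kept when peaks from at least two replicates overlap it) are all assumptions made for illustration.</p>

```python
import re
from collections import namedtuple

# Hypothetical minimal read record: position is 0-based, CIGAR as in SAM.
Read = namedtuple("Read", "chrom pos mapq cigar")


def passes_filter(read, min_mapq=60):
    """Keep reads with MAPQ >= min_mapq whose CIGAR has no insertion (I) or deletion (D)."""
    return read.mapq >= min_mapq and re.search(r"[ID]", read.cigar) is None


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def consensus_peaks(replicates, min_support=2):
    """Merge peaks across replicates; keep regions supported by >= min_support replicates.

    Assumed rule: a replicate supports a merged region if any of its peaks overlaps it.
    """
    union = merge_intervals([iv for rep in replicates for iv in rep])
    kept = []
    for start, end in union:
        support = sum(
            any(s < end and e > start for s, e in rep) for rep in replicates
        )
        if support >= min_support:
            kept.append((start, end))
    return kept


def count_reads(reads, peaks):
    """Count filtered reads whose start position falls inside each peak interval."""
    counts = [0] * len(peaks)
    for read in reads:
        if not passes_filter(read):
            continue
        for i, (start, end) in enumerate(peaks):
            if start <= read.pos < end:
                counts[i] += 1
    return counts


if __name__ == "__main__":
    # Toy data for one chromosome: three replicate peak lists and four reads.
    reps = [[(100, 200), (500, 600)], [(150, 250)], [(900, 1000)]]
    peaks = consensus_peaks(reps)
    reads = [
        Read("chr1", 120, 60, "100M"),      # kept
        Read("chr1", 130, 60, "50M2I48M"),  # dropped: contains an insertion
        Read("chr1", 140, 30, "100M"),      # dropped: low mapping quality
        Read("chr1", 700, 60, "100M"),      # kept, but outside every peak
    ]
    print(peaks, count_reads(reads, peaks))
```

<p>In this toy example only the region covered by the first two replicates survives the consensus step, and only the first read contributes to its count.</p>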

<p id="__p19">To improve mapping accuracy and make use of known sequence variation between strains, all BXD and F1&nbsp;hybrid samples were aligned separately to both the mm10 reference and the&nbsp;<em>de novo</em>&nbsp;D2 assembly. To reduce error in quantification of H3K4me3 levels due to genomic regions containing repetitive sequences, we removed reads with multiple alignments and retained reads with a mapq alignment metric of 60 that lacked small indels, which can often indicate misalignment. Subsequently, for each genomic interval in the peakome, final read counts were obtained by summing the reads that mapped uniquely to one of the assemblies along with those that mapped equally well to both the B6 and D2 genome assemblies.</p>
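
<p>As a rough illustration of this final summation, the sketch below combines per-read peak assignments from the two alignments so that each read is counted once. It simplifies the procedure: a read that passes the filters in both assemblies is treated as mapping "equally well" to both, and a read assigned to different peakome intervals in the two assemblies is discarded. The source does not describe how such conflicts were handled, so that rule, the function name <code>combine_counts</code>, and the dictionary inputs are hypothetical.</p>

```python
from collections import Counter


def combine_counts(b6_hits, d2_hits):
    """Sum reads per peakome interval across two alignments, counting each read once.

    b6_hits / d2_hits map read ID -> peakome interval ID for reads that passed
    the filters in the mm10 (B6) or D2 alignment, respectively.
    """
    counts = Counter()
    for read_id in b6_hits.keys() | d2_hits.keys():
        b6_peak = b6_hits.get(read_id)
        d2_peak = d2_hits.get(read_id)
        if b6_peak is not None and d2_peak is not None and b6_peak != d2_peak:
            # Assumed rule: drop reads whose two alignments disagree on the interval.
            continue
        counts[b6_peak if b6_peak is not None else d2_peak] += 1
    return counts


if __name__ == "__main__":
    b6 = {"r1": "peak1", "r2": "peak1", "r4": "peak2"}
    d2 = {"r2": "peak1", "r3": "peak2", "r4": "peak1"}
    # r1: B6 only; r2: both, same peak; r3: D2 only; r4: conflicting, dropped.
    print(dict(combine_counts(b6, d2)))
```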