general/datasets/Niaaa_bxd_hip_ctl_rnaseq1020/processing.rtf



<p>Generation of RNA-seq data</p>

<p>1&nbsp;&micro;g of RNA was used for cDNA library construction at Novogene using an NEBNext<sup>&reg;</sup>&nbsp;Ultra RNA Library Prep Kit for Illumina<sup>&reg;</sup>&nbsp;(cat# E7420S, New England Biolabs, Ipswich, MA, USA) according to the manufacturer&rsquo;s protocol. Briefly, mRNA was enriched using oligo(dT) beads with two rounds of purification and then randomly fragmented in fragmentation buffer. First-strand cDNA was synthesized using random hexamer primers, after which a custom second-strand synthesis buffer (Illumina, San Diego, CA, USA), dNTPs, RNase H, and DNA polymerase I were added to generate the second strand, yielding double-stranded cDNA (ds cDNA). After terminal repair, A-tailing, and sequencing adaptor ligation, the library was completed by size selection and PCR enrichment. The resulting libraries, with 250&ndash;350 bp inserts, were quantified using a Qubit 2.0 fluorometer (Thermo Fisher Scientific, Waltham, MA, USA) and quantitative PCR. Size distribution was analyzed using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Libraries that passed quality control were sequenced on an Illumina NovaSeq platform (Illumina, San Diego, CA, USA) in paired-end 150 mode (2&times;150 bases). On average, 40 million raw reads were generated per library.</p>

<p>&nbsp;</p>

<p>Read mapping and normalization</p>

<p>The <em>Mus musculus</em>&nbsp;(mouse) reference genome (GRCm38) and gene model annotation files were downloaded from the Ensembl genome browser (<a href="https://useast.ensembl.org/">https://useast.ensembl.org/</a>). Reference genome indices were built using STAR v2.5.0a, and paired-end reads were aligned to the reference genome. STAR uses the Maximal Mappable Prefix (MMP) method, which produces precise mapping results for reads that span splice junctions.&nbsp;featureCounts v0.6.1 was used to count the number of reads mapped to each gene. Transcripts Per Million (TPM) was calculated for each gene from the length of the gene and the number of reads mapped to it. Under this normalization, the gene-level TPM values of each sample sum to 1,000,000. TPM values were then transformed to log<sub>2</sub>(TPM + 1).</p>
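
<p>The TPM normalization and log<sub>2</sub> rescaling described above can be illustrated with a minimal Python sketch. It is not the pipeline that was actually run; the function names, the gene identifiers, the counts, and the use of a single length per gene in base pairs are all assumptions made for illustration.</p>

```python
import math


def tpm(counts, lengths_bp):
    """Convert per-gene read counts to Transcripts Per Million (TPM).

    counts     -- dict mapping gene ID to the number of reads assigned to it
    lengths_bp -- dict mapping gene ID to gene length in base pairs
                  (assumption: one length value per gene)
    """
    # Step 1: length-normalize each count to reads per kilobase (RPK).
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}

    # Step 2: rescale so that the values of one sample sum to 1,000,000.
    scale = sum(rpk.values()) / 1_000_000
    if scale == 0:
        # No reads were assigned to any gene in this sample.
        return {gene: 0.0 for gene in counts}
    return {gene: value / scale for gene, value in rpk.items()}


def log2_tpm_plus_one(tpm_values):
    """Apply the log2(TPM + 1) transformation to every gene."""
    return {gene: math.log2(value + 1.0) for gene, value in tpm_values.items()}


if __name__ == "__main__":
    # Hypothetical counts and lengths for three made-up genes.
    example_counts = {"GeneA": 100, "GeneB": 300, "GeneC": 0}
    example_lengths = {"GeneA": 1000, "GeneB": 3000, "GeneC": 2000}

    sample_tpm = tpm(example_counts, example_lengths)
    sample_log = log2_tpm_plus_one(sample_tpm)
    for gene in example_counts:
        print(gene, sample_tpm[gene], sample_log[gene])
```

<p>In this made-up example GeneA and GeneB receive the same TPM even though GeneB has three times as many reads, because GeneB is also three times as long. Adding 1 before taking the logarithm keeps genes with zero reads at a value of 0 instead of an undefined logarithm.</p>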