general/datasets/UVA_HIV_1Tg_Str_rlog_0720/processing.rtf



<p>Paired-end 50-bp reads were extracted using CASAVA (Illumina Pipeline v1.38). For each sample, reads that passed filtering and had a quality score of &ge;Q30 were used to generate a complete FASTQ file. This file was then mapped to the UCSC rat reference genome (build rn4; <a href="ftp://ftp.cbcb.umd.edu/pub/data/bowtie_indexes/rn4.ebwt.zip">ftp://ftp.cbcb.umd.edu/pub/data/bowtie_indexes/rn4.ebwt.zip</a>) using TopHat with its default settings of up to 40 alignments per read and up to 2 mismatches per alignment. The resulting sequence alignment (BAM) files were assessed with the RSeQC package&nbsp;<a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0059582#pone.0059582-Wang1">[36]</a>&nbsp;for quality control. This included analysis of mRNA fragment insert size, base quality distribution, read mapping distribution, and splicing distribution.</p>
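<p>A short Python sketch can illustrate the Q30 criterion. It assumes four-line FASTQ records with Phred+33 quality encoding. It applies a hypothetical rule to decide whether a read is kept: the mean base quality must be &ge; 30. The function names are illustrative only. The sketch does not model Illumina's pass-filter (chastity) flag or CASAVA's actual filtering logic.</p>

```python
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def parse_fastq(lines: list[str]) -> Iterator[FastqRecord]:
    """Yield records from 4-line FASTQ text (no line-wrapped sequences assumed)."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        yield FastqRecord(header[1:], seq, qual)  # drop the leading '@'


def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string into Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]


def passes_q30(record: FastqRecord, threshold: int = 30) -> bool:
    """Hypothetical rule: keep a read whose mean base quality is >= threshold."""
    scores = phred_scores(record.qual)
    return bool(scores) and sum(scores) / len(scores) >= threshold


# Usage: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
reads = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "####",
]
kept = [r.name for r in parse_fastq(reads) if passes_q30(r)]
print(kept)  # ['read1']
```

<p>A per-base rule, such as requiring every base to be &ge;Q30, would be stricter. The description above does not state which variant was used.</p>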

<p><a id="article1.body1.sec2.sec5.p2" name="article1.body1.sec2.sec5.p2"></a></p>

<p>The aligned reads were then analyzed with the Cufflinks suite (<a href="http://cufflinks.cbcb.umd.edu/">http://cufflinks.cbcb.umd.edu</a>)&nbsp;<a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0059582#pone.0059582-Trapnell1">[37]</a>, which assembles aligned reads into transcripts and estimates their relative abundance. Transcript expression was quantified as fragments per kilobase of exon per million fragments mapped (FPKM). FPKM is the number of fragments mapped to a transcript, divided by the transcript's exonic length in kilobases and by the total number of mapped fragments in millions. All junctions identified by Cufflinks were compared against the junctions and splice sites in the reference transcript annotation (GTF) files to classify them as known or novel. Cuffcompare then merged the transcripts from all samples into a final transcript annotation GTF file. It also reported changes in the relative abundance of transcripts sharing a common transcription start site, and the relative abundance of the primary transcripts of each gene across all samples.</p>
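<p>The FPKM definition and the known/novel junction comparison described above can be sketched in Python. The <code>fpkm</code> function is a direct transcription of the formula. Cufflinks itself assigns fragments to transcripts by maximum likelihood and corrects for effective transcript length, so its values will not match this naive calculation exactly. The junction helpers are hypothetical. They derive introns from 1-based GTF exon coordinates and call an observed junction "known" only if its chromosome, both splice sites, and strand exactly match an annotated intron.</p>

```python
from collections import defaultdict

# A junction is (chromosome, first intron base, last intron base, strand),
# using the 1-based inclusive coordinates of GTF files.
Junction = tuple[str, int, int, str]


def fpkm(fragment_count: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon per million fragments mapped."""
    length_kb = exon_length_bp / 1_000
    millions_mapped = total_mapped_fragments / 1_000_000
    return fragment_count / length_kb / millions_mapped


def reference_junctions(exons: list[tuple[str, str, int, int, str]]) -> set[Junction]:
    """Derive annotated introns from GTF exon rows (hypothetical helper).

    Each row is (transcript_id, chrom, start, end, strand); the intron between
    two consecutive exons of a transcript runs from end + 1 to next start - 1.
    """
    by_transcript: dict[str, list[tuple[int, int, str, str]]] = defaultdict(list)
    for tx_id, chrom, start, end, strand in exons:
        by_transcript[tx_id].append((start, end, chrom, strand))
    junctions: set[Junction] = set()
    for rows in by_transcript.values():
        rows.sort()  # order each transcript's exons by start coordinate
        for (_, end1, chrom, strand), (start2, _, _, _) in zip(rows, rows[1:]):
            junctions.add((chrom, end1 + 1, start2 - 1, strand))
    return junctions


def classify_junctions(observed: list[Junction], known: set[Junction]) -> dict[Junction, str]:
    """Label each observed junction 'known' (exact match) or 'novel'."""
    return {j: "known" if j in known else "novel" for j in observed}


# Usage: 500 fragments on a 2,000-bp transcript with 25 million mapped fragments.
print(fpkm(500, 2_000, 25_000_000))  # 10.0

# One annotated three-exon transcript, plus an observed exon-skipping junction.
annotation = [
    ("tx1", "chr1", 100, 200, "+"),
    ("tx1", "chr1", 300, 400, "+"),
    ("tx1", "chr1", 500, 600, "+"),
]
known = reference_junctions(annotation)
observed = [("chr1", 201, 299, "+"), ("chr1", 201, 499, "+")]
labels = classify_junctions(observed, known)
print(labels)
```

<p>Exact matching is the simplest criterion. A finer scheme could also flag junctions in which only one of the two splice sites is annotated.</p>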