1 files changed, 5 insertions, 0 deletions
diff --git a/general/datasets/INIA_LCMB_1215/processing.rtf b/general/datasets/INIA_LCMB_1215/processing.rtf
new file mode 100644
index 0000000..f04ab87
--- /dev/null
+++ b/general/datasets/INIA_LCMB_1215/processing.rtf
@@ -0,0 +1,5 @@
+<p>RNA from tissue trapped in the CapSure LCM caps was extracted with the PicoPure RNA isolation kit (Life Technologies) according to the manufacturer&#39;s instructions, with RNA eluted from the provided columns in 13.5&nbsp;&micro;L of nuclease-free water. RNA quality was assessed on a Bioanalyzer (Model 100, Agilent, Foster City, CA). Only samples with an RNA integrity number (RIN) greater than 6 were used for RNA sequencing.</p>
+
+<p>&nbsp;</p>
+
+<p>Poly-A-enriched mRNA was sequenced on two platforms: ABI SOLiD 5500xl Wildfire (65 samples) and Ion Proton (39 samples). Read length was 50 nt on the SOLiD system; the average read length on the Proton system was ~180 nt. SOLiD reads were aligned to the mm10 reference genome with the LifeScope aligner, and BAM&nbsp;files were then generated with custom scripts for third-party downstream analysis. Proton reads were aligned to the same mm10 (Ensembl GRCm38) reference genome with TopHat2 using the following settings: <code>-p 15 -N 4 --read-gap-length 6 --read-edit-dist 8 --max-insertion-length 6 --max-deletion-length 6 --max-intron-length 300000 --b2-very-sensitive</code>. Alignments on both platforms were splice-aware. RSeQC-2.6.1 (RPKM_count.py) was used to generate count data based on the mm10 GENCODE Basic transcript annotation (43,320 transcripts detected). We selected this annotation for greater reproducibility with existing microarray data sets and to simplify downstream analysis by limiting the number of transcript models per gene. On average, 2.5 million and 7.8 million reads aligned uniquely to transcript models on the SOLiD and Proton platforms, respectively. Data were further filtered to remove transcripts with fewer than 1 count in 90% or more of samples. After filtering, 24,597 transcripts representing 12,011 unique genes remained (Supplemental Table 1). The variance stabilizing transform (R package DESeq2, fitType&nbsp;=&nbsp;local) was applied to the count data, and the transformed data were corrected by dividing by transcript length to generate log2&nbsp;reads per kilobase gene model (RPK) values. Differences between the two sequencing platforms were corrected by batch correction with ComBat (Supplemental Fig. 3). All data filtering, transformations, and batch correction were performed with custom R scripts. Batch-corrected, transformed log2&nbsp;RPK and log2&nbsp;count values are available in Supplemental Tables 2 and 3, and log2&nbsp;RPK data are also available at GeneNetwork [http://www.genenetwork.org/webqtl/main.py?FormID=sharinginfo&amp;GN_AccessionId=772; Group&nbsp;=&nbsp;Chronic Intermittent Ethanol, Type&nbsp;=&nbsp;LCM Brain Regions mRNA, Data set&nbsp;=&nbsp;INIA LCM (11 Regions) CIE/AIR RNA-seq Transcript Level (Dec15)].</p>
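The low-count filter and the length correction described above can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' custom R scripts (which are not shown). It assumes counts are held in a dict keyed by transcript ID, and that the DESeq2 VST output is on an approximately log2 scale. On that scale, dividing by transcript length in kilobases corresponds to subtracting log2(length_kb). The function names `filter_low_count` and `log2_rpk` are hypothetical. DESeq2's VST and ComBat are not reproduced here.

```python
import math
from typing import Dict, List


def filter_low_count(counts: Dict[str, List[float]],
                     min_count: float = 1.0,
                     max_low_fraction: float = 0.9) -> Dict[str, List[float]]:
    """Hypothetical helper: drop transcripts whose count is below `min_count`
    in at least `max_low_fraction` of samples (i.e. < 1 count in >= 90%)."""
    kept = {}
    for transcript, values in counts.items():
        if not values:
            continue  # no samples: nothing to evaluate
        low = sum(1 for v in values if v < min_count)
        # Keep only transcripts whose fraction of low-count samples is under 90%.
        if low / len(values) < max_low_fraction:
            kept[transcript] = values
    return kept


def log2_rpk(log2_values: List[float], length_bp: int) -> List[float]:
    """Hypothetical helper: convert log2-scale expression values to log2
    reads per kilobase. Dividing by length in kb on the linear scale equals
    subtracting log2(length_kb) on the log2 scale (assumption: VST output
    is treated as log2-scale)."""
    shift = math.log2(length_bp / 1000.0)
    return [v - shift for v in log2_values]


if __name__ == "__main__":
    counts = {
        "tA": [0, 0, 0, 0, 0, 0, 0, 0, 0, 5],  # 9/10 samples < 1 -> removed
        "tB": [3, 0, 4, 2, 0, 1, 6, 0, 2, 8],  # 3/10 samples < 1 -> kept
    }
    print(sorted(filter_low_count(counts)))  # ['tB']
    print(log2_rpk([10.0, 12.0], 2000))      # [9.0, 11.0]
```

Working in log space keeps the correction a simple per-transcript offset, so it can be applied after the VST without returning to raw counts.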