From b2feda451ccfbeaed02dce9088d6dd228cf15861 Mon Sep 17 00:00:00 2001 From: Bonface Date: Tue, 13 Feb 2024 23:52:26 -0600 Subject: Update dataset RTF Files. --- general/datasets/Gtex_nucle_0414/cases.rtf | 5 + general/datasets/Gtex_nucle_0414/contributors.rtf | 1 + .../datasets/Gtex_nucle_0414/experiment-design.rtf | 1 + general/datasets/Gtex_nucle_0414/platform.rtf | 3 + general/datasets/Gtex_nucle_0414/processing.rtf | 102 +++++++++++++++++++++ general/datasets/Gtex_nucle_0414/summary.rtf | 3 + general/datasets/Gtex_nucle_0414/tissue.rtf | 3 + 7 files changed, 118 insertions(+) create mode 100644 general/datasets/Gtex_nucle_0414/cases.rtf create mode 100644 general/datasets/Gtex_nucle_0414/contributors.rtf create mode 100644 general/datasets/Gtex_nucle_0414/experiment-design.rtf create mode 100644 general/datasets/Gtex_nucle_0414/platform.rtf create mode 100644 general/datasets/Gtex_nucle_0414/processing.rtf create mode 100644 general/datasets/Gtex_nucle_0414/summary.rtf create mode 100644 general/datasets/Gtex_nucle_0414/tissue.rtf (limited to 'general/datasets/Gtex_nucle_0414') diff --git a/general/datasets/Gtex_nucle_0414/cases.rtf b/general/datasets/Gtex_nucle_0414/cases.rtf new file mode 100644 index 0000000..3453126 --- /dev/null +++ b/general/datasets/Gtex_nucle_0414/cases.rtf @@ -0,0 +1,5 @@ +
The Genotype-Tissue Expression project (GTEx) aims to create a comprehensive public atlas of gene expression and regulation across multiple human tissues. The resource will provide valuable insights into the mechanisms of gene regulation, aid in the interpretation of genome-wide association studies, and enable studies of expression quantitative trait loci (eQTLs), alternative splicing, and the tissue specificity of gene regulatory mechanisms.
+ +The GTEx project recently completed an initial pilot phase during which >185 donor DNAs were genotyped using high density SNP and exome arrays. RNA expression was profiled on multiple tissues from these donors (from 9 to 30) by both array-based methods and RNA sequencing, to an average depth of 50 million reads. These pilot phase data have been made available to the public through the database of Genotype and Phenotype (dbGaP).
+ +For more information about the GTEx project, please visit the About GTEx page, view the Consortium Members, or read the Publication Policy. Additional GTEx resources such as funding opportunities and information for donors are also available on the NIH Common Fund and NHGRI websites respectively.
diff --git a/general/datasets/Gtex_nucle_0414/contributors.rtf b/general/datasets/Gtex_nucle_0414/contributors.rtf new file mode 100644 index 0000000..aae39a0 --- /dev/null +++ b/general/datasets/Gtex_nucle_0414/contributors.rtf @@ -0,0 +1 @@ +Please review and cite: John Lonsdale, Jeffrey Thomas, Mike Salvatore, Rebecca Phillips, Edmund Lo, Saboor Shad, Richard Hasz, Gary Walters, Fernando Garcia, Nancy Young, Barbara Foster, Mike Moser, Ellen Karasik, Bryan Gillard, Kimberley Ramsey, Susan Sullivan, Jason Bridge, Harold Magazine, John Syron, Johnelle Fleming, Laura Siminoff, Heather Traino, Maghboeba Mosavel, Laura Barker, Scott Jewell, Dan Rohrer, Dan Maxim, Dana Filkins, Philip Harbach, Eddie Cortadillo, Bree Berghuis, Lisa Turner, Eric Hudson, Kristin Feenstra, Leslie Sobin, James Robb, Phillip Branton, Greg Korzeniewski, Charles Shive, David Tabor, Liqun Qi, Kevin Groch, Sreenath Nampally, Steve Buia, Angela Zimmerman, Anna Smith, Robin Burges, Karna Robinson, Kim Valentino, Deborah Bradbury, Mark Cosentino, Norma Diaz-Mayoral, Mary Kennedy, Theresa Engel, Penelope Williams, Kenyon Erickson, Kristin Ardlie, Wendy Winckler, Gad Getz, David DeLuca, Daniel MacArthur, Manolis Kellis, Alexander Thomson, Taylor Young, Ellen Gelfand, Molly Donovan, Yan Meng, George Grant, Deborah Mash, Yvonne Marcus, Margaret Basile, Jun Liu, Jun Zhu, Zhidong Tu, Nancy J Cox, Dan L Nicolae, Eric R Gamazon, Hae Kyung Im, Anuar Konkashbaev, Jonathan Pritchard, Matthew Stevens, Timothèe Flutre, Xiaoquan Wen, Emmanouil T Dermitzakis, Tuuli Lappalainen, Roderic Guigo, Jean Monlong, Michael Sammeth, Daphne Koller, Alexis Battle, Sara Mostafavi, Mark McCarthy, Manual Rivas, Julian Maller, Ivan Rusyn, Andrew Nobel, Fred Wright, Andrey Shabalin, Mike Feolo, Nataliya Sharopova, Anne Sturcke, Justin Paschal, James M Anderson, Elizabeth L Wilder, Leslie K Derr, Eric D Green, Jeffery P Struewing, Gary Temple, Simona Volpi, Joy T Boyer, Elizabeth J Thomson, Mark S Guyer, Cathy Ng, Assya Abdallah, 
Deborah Colantuoni, Thomas R Insel, Susan E Koester, A Roger Little, Patrick K Bender, Thomas Lehner, Yin Yao, Carolyn C Compton, Jimmie B Vaught, Sherilyn Sawyer, Nicole C Lockhart, Joanne Demchok & Helen F Moore. Nature Genetics 45, 580–585 (2013).
diff --git a/general/datasets/Gtex_nucle_0414/experiment-design.rtf b/general/datasets/Gtex_nucle_0414/experiment-design.rtf new file mode 100644 index 0000000..5e876e0 --- /dev/null +++ b/general/datasets/Gtex_nucle_0414/experiment-design.rtf @@ -0,0 +1 @@ +GTEx samples are collected from deceased donors at low post-mortem intervals and preserved in PAXgene fixative prior to DNA and RNA extraction.
diff --git a/general/datasets/Gtex_nucle_0414/platform.rtf b/general/datasets/Gtex_nucle_0414/platform.rtf new file mode 100644 index 0000000..f276bf8 --- /dev/null +++ b/general/datasets/Gtex_nucle_0414/platform.rtf @@ -0,0 +1,3 @@ +Expression
+ +RPKM data are used as produced by RNA-SeQC. Genes are retained only if at least 10 individuals have >0.1 RPKM. Expression values are log transformed and quantile normalized across all samples. Outlier correction: for each gene, values are ranked across samples and then mapped to a standard normal distribution.
diff --git a/general/datasets/Gtex_nucle_0414/processing.rtf b/general/datasets/Gtex_nucle_0414/processing.rtf new file mode 100644 index 0000000..6635893 --- /dev/null +++ b/general/datasets/Gtex_nucle_0414/processing.rtf @@ -0,0 +1,102 @@ +Analysis Methods
+ +Preprocessing
+ +RNA-seq
+ +RNA-seq was performed using the Illumina TruSeq library construction protocol. This is a non-strand-specific, polyA+ selected library. The sequencing produced 76-bp paired-end reads.
+ +See also: How to Evaluate and Use Human and Mouse mRNA Data Sets (e.g. GTEx)
+ +Alignment to the HG19 human genome was performed using TopHat v1.1.4 assisted by the GENCODE v12 transcriptome definition. In a post-processing step, unaligned reads are reintroduced into the BAM file. The final BAM file contains aligned and unaligned reads, with duplicates marked and quality scores recalibrated. Note that TopHat produces multiple mappings for some reads, but in post-processing one mapping per read is flagged as the primary alignment.
+ +Genotyping
+ +DNA samples sent to the Broad Institute Genetic Analysis Platform for genotyping are placed on 96-well plates and genotyped using the Illumina HumanOmni5-4v1_B SNP array. Omni genotypes are called using GenomeStudio v2010.3 with the calling algorithm/genotyping module version 1.8.4 and the default cluster file HumanOmni5-4v1-Multi_B.egt. Called genotypes are run through a standard QC pipeline, and only samples that pass a call rate threshold of 97% and pass genetic fingerprint and gender concordance checks are retained. For the final eQTL analysis, variants were filtered out if they had a low call rate (< 90%), deviated from HWE (p-value < 1E-6), or were monomorphic.
+ +Expression Quantification
+ +Gene/Transcript Model
+ +Gencode Version 12
+Contig names modified to match the reference genome used for alignment
+Procedure for collapsing transcript model into gene model
Primary source: gencode.v12
+List exons as a set of intervals, discarding any labeled as 'retained_intron' and retaining only coding and lincRNA transcripts.
+Create a separate bin for other types of transcripts and process them independently.
+Merge overlapping intervals.
+Discard intervals associated with multiple genes.
+Map intervals back to gene identifiers and output in GTF format.
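The collapsing procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GTEx pipeline code: the function name `collapse_to_gene_model`, the tuple layout, and the transcript type labels `protein_coding` and `lincRNA` are hypothetical choices made for the example. Transcripts of other types are simply skipped here rather than placed in a separate bin, and GTF output is omitted.

```python
from collections import defaultdict

# Assumed labels for "coding and lincRNA" transcripts (hypothetical for this sketch).
KEPT_TYPES = {"protein_coding", "lincRNA"}


def collapse_to_gene_model(exons):
    """Collapse exon records into non-overlapping, single-gene intervals.

    exons: iterable of (chrom, start, end, gene_id, transcript_type),
           with inclusive coordinates.
    Returns a list of (chrom, start, end, gene_id).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, gene_id, transcript_type in exons:
        # Drop retained_intron and anything that is not coding or lincRNA.
        if transcript_type not in KEPT_TYPES:
            continue
        by_chrom[chrom].append((start, end, gene_id))

    collapsed = []
    for chrom in sorted(by_chrom):
        merged = []  # each entry: [start, end, set_of_gene_ids]
        for start, end, gene_id in sorted(by_chrom[chrom]):
            if merged and start <= merged[-1][1]:
                # Overlaps the previous interval: extend it and record the gene.
                merged[-1][1] = max(merged[-1][1], end)
                merged[-1][2].add(gene_id)
            else:
                merged.append([start, end, {gene_id}])
        for start, end, genes in merged:
            # Discard merged intervals associated with more than one gene.
            if len(genes) == 1:
                collapsed.append((chrom, start, end, next(iter(genes))))
    return collapsed
```

In this sketch a merged interval is dropped as soon as it touches two genes, which is one reading of "discard intervals associated with multiple genes"; the original procedure may resolve such cases differently.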
+Quantification
For gene/exon level read count and gene level RPKM values, we filter reads based on the requirements:
+ +Reads must be uniquely mapped (for TopHat this is mapping quality > 3; == 255).
+Reads must have proper pairs.
+Alignment distance must be <=6.
+Reads must be contained 100% within exon boundaries. Reads overlapping introns are not counted.
+Exon
For exon read counts, if a read overlaps multiple exons, then each exon is allotted a fractional value equal to the portion of the read contained within that exon.
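The fractional allotment can be illustrated with a short sketch. It assumes a read is represented as a list of aligned blocks in half-open coordinates (so a spliced read has one block per exon it touches); the function name `fractional_exon_counts` and the data layout are hypothetical and not taken from RNA-SeQC.

```python
def fractional_exon_counts(read_blocks, exons):
    """Split one read's count across the exons it overlaps.

    read_blocks: list of (start, end) aligned blocks, half-open coordinates.
    exons: dict mapping exon name -> (start, end), half-open coordinates.
    Returns a dict mapping exon name -> fraction of the read inside that exon.
    """
    read_length = sum(end - start for start, end in read_blocks)
    counts = {}
    for name, (exon_start, exon_end) in exons.items():
        # Total number of aligned bases that fall inside this exon.
        overlap = sum(
            max(0, min(block_end, exon_end) - max(block_start, exon_start))
            for block_start, block_end in read_blocks
        )
        if overlap > 0:
            counts[name] = overlap / read_length
    return counts
```

For a 76-bp read with 38 aligned bases in each of two exons, each exon receives 0.5, so the fractions for a fully exonic read sum to 1.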
+ +Transcript
+ +Transcript-level quantification is provided by Flux Capacitor.
+ +eQTL Analysis
+ +QC and Sample Exclusion Process
+ +D statistic outliers are removed.
+Gender-specific expression outliers are removed.
+Samples with less than 10 million mapped reads are removed.
+In the case of replicates, the samples with the greater number of reads are chosen.
+Covariates
3 Genotyping PCs.
+15 PEER factors:
+The input to PEER is the post-normalization expression values described below.
+Gender.
+Expression
RPKM data are used as produced by RNA-SeQC.
+Filter on >=10 individuals having >0.1 RPKM.
+Log and quantile normalize the expression values across all samples.
+Outlier correction: for each gene, rank values across samples then map to a standard normal.
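The expression steps listed above (filter, log, quantile normalize, rank to standard normal) might look roughly like the following pure-Python sketch. Several details are assumptions, since the text does not specify them: the log is taken as log2(x + 1), ranks are mapped to quantiles with rank / (n + 1), and ties are broken by sample order. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def passes_filter(gene_rpkm, min_rpkm=0.1, min_individuals=10):
    """Keep a gene if at least `min_individuals` samples exceed `min_rpkm`."""
    return sum(value > min_rpkm for value in gene_rpkm) >= min_individuals


def log_transform(matrix):
    """Assumed transform: log2(RPKM + 1). The source only says 'log'."""
    return [[math.log2(value + 1.0) for value in row] for row in matrix]


def quantile_normalize(matrix):
    """Give every sample (column) the same distribution of values.

    matrix: list of gene rows, each a list of per-sample values.
    """
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    sorted_columns = [sorted(column) for column in columns]
    # Mean of the k-th smallest value across all samples.
    mean_by_rank = [
        sum(column[k] for column in sorted_columns) / n_samples
        for k in range(n_genes)
    ]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for j, column in enumerate(columns):
        order = sorted(range(n_genes), key=column.__getitem__)
        for rank, gene_index in enumerate(order):
            normalized[gene_index][j] = mean_by_rank[rank]
    return normalized


def rank_to_normal(gene_values):
    """Rank one gene's values across samples and map ranks to N(0, 1) quantiles."""
    n = len(gene_values)
    order = sorted(range(n), key=gene_values.__getitem__)
    standard_normal = NormalDist()
    transformed = [0.0] * n
    for rank, sample_index in enumerate(order, start=1):
        transformed[sample_index] = standard_normal.inv_cdf(rank / (n + 1))
    return transformed
```

A typical call order under these assumptions would be: filter rows with `passes_filter`, apply `log_transform`, then `quantile_normalize`, and finally `rank_to_normal` on each gene row.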
+Genotypes
Imputation-based genotypes:
+Call Rate Threshold 95%.
+Info score Threshold 0.4.
+Minor Allele Frequency >= 5%.
+Sex chromosomes have been excluded.
+Matrix eQTL Parameters
Produced for a radius of +/-1 Mb from the TSS.
+P value threshold set to 1 to emit all p-values.
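As a rough illustration of the cis window, the sketch below selects SNP-gene pairs within 1 Mb of a gene's TSS. It does not use Matrix eQTL itself (an R package); the function name `cis_pairs` and the dictionary layout are hypothetical, and the boundary is assumed to be inclusive.

```python
def cis_pairs(snps, genes, radius=1_000_000):
    """Return (snp_id, gene_id) pairs whose SNP lies within `radius` bp of the TSS.

    snps:  dict mapping snp_id -> (chrom, position)
    genes: dict mapping gene_id -> (chrom, tss_position)
    """
    pairs = []
    for gene_id, (gene_chrom, tss) in genes.items():
        for snp_id, (snp_chrom, position) in snps.items():
            # Same chromosome and within the window on either side of the TSS.
            if snp_chrom == gene_chrom and abs(position - tss) <= radius:
                pairs.append((snp_id, gene_id))
    return pairs
```

With the p-value threshold set to 1, every pair returned by such a window would be tested and reported, leaving significance filtering to the later FDR step.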
+Storey FDR
The Storey q-value method was applied using the public R package with default values.
+eQTLs were filtered for an FDR <=5%.
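To make the q-value step concrete, here is a simplified sketch of a Storey-style q-value calculation. It is not the R `qvalue` package and does not reproduce its defaults: this version estimates the null proportion pi0 from a single fixed cutoff (`lam=0.5`, an assumption for the example), whereas the package by default estimates pi0 across a grid of cutoffs. The function name `storey_qvalues` is hypothetical.

```python
def storey_qvalues(pvalues, lam=0.5):
    """Simplified Storey q-values with a fixed-lambda pi0 estimate.

    pvalues: list of p-values in [0, 1].
    lam: cutoff above which p-values are assumed to come mostly from nulls.
    Returns q-values in the same order as the input.
    """
    m = len(pvalues)
    # Estimated proportion of true null hypotheses, capped at 1.
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))
    order = sorted(range(m), key=pvalues.__getitem__)
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone in p.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[index] / rank)
        qvalues[index] = running_min
    return qvalues
```

Filtering at an FDR of 5% then amounts to keeping tests whose q-value is at most 0.05. When pi0 is estimated as 1 this reduces to the Benjamini-Hochberg adjustment.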
+Tissues
There are 9 tissues that have sufficient sample numbers (n > 80).
+ +Adipose_Subcutaneous
+Artery_Tibial
+Heart_Left_Ventricle
+Lung
+Muscle_Skeletal
+Nerve_Tibial
+Skin_Sun_Exposed_Lower_leg
+Thyroid
+Whole_Blood
Note: The original RPKM values entered in GeneNetwork were log2 transformed after adding 1; values lower than 2.0 were then set to 0 (zero).
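The transformation in the note can be written as a one-line rule. The sketch below is an illustration of that rule only; the function name `genenetwork_value` is hypothetical and is not taken from the GeneNetwork code base.

```python
import math


def genenetwork_value(rpkm, floor=2.0):
    """Apply log2(RPKM + 1), then set values lower than `floor` to 0."""
    value = math.log2(rpkm + 1.0)
    return value if value >= floor else 0.0
```

Under this rule an RPKM of 3 maps to exactly 2.0 and is kept, while any RPKM below 3 maps to 0.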
+ +Read Bits of DNA blog: "GTEx is throwing away 90% of their data" and the response to "GTEx is throwing away 90% of their data" by Manolis Dermitzakis, Gad Getz, Kristin Ardlie, Roderic Guigo for the GTEx consortium.
+ +See also: How to Evaluate and Use Human and Mouse mRNA Data Sets (e.g. GTEx)
diff --git a/general/datasets/Gtex_nucle_0414/summary.rtf b/general/datasets/Gtex_nucle_0414/summary.rtf new file mode 100644 index 0000000..dff008b --- /dev/null +++ b/general/datasets/Gtex_nucle_0414/summary.rtf @@ -0,0 +1,3 @@ +The Genotype-Tissue Expression (GTEx) project is a collaborative effort that aims to identify correlations between genotype and tissue-specific gene expression levels that will help identify regions of the genome that influence whether and how much a gene is expressed. GTEx is funded through the Common Fund, and managed by the NIH Office of the Director in partnership with the National Human Genome Research Institute, National Institute of Mental Health, the National Cancer Institute, the National Center for Biotechnology Information at the National Library of Medicine, the National Heart, Lung and Blood Institute, the National Institute on Drug Abuse, and the National Institute of Neurological Diseases and Stroke, all part of NIH. This series of 837 samples represents multiple tissues collected from 102 GTEx donors and 1 control cell line. In total, 30 tissue sites are represented including Adipose, Artery, Heart, Lung, Whole Blood, Muscle, Skin, and 11 brain subregions. RNA-seq expression data, robust clinical data, pathological annotations, and genotypes are also available for these samples from dbGaP (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000424.v2.p1) and the GTEx portal (www.broadinstitute.org/gtex). While GTEx is no longer generating Affymetrix expression data, donor enrollment continues and is expected to reach 1,000 by the end of 2015. Updates to the GTEx data in dbGaP and the GTEx Portal will be made periodically. contributor: GTEx Laboratory, Data Analysis, and Coordinating Center (LDACC) contributor: The Broad Institute of MIT and Harvard (LDACC PIs: Kristin Ardlie and Gaddy Getz).
+ + diff --git a/general/datasets/Gtex_nucle_0414/tissue.rtf b/general/datasets/Gtex_nucle_0414/tissue.rtf new file mode 100644 index 0000000..7e60b80 --- /dev/null +++ b/general/datasets/Gtex_nucle_0414/tissue.rtf @@ -0,0 +1,3 @@ + + + -- cgit v1.2.3