Matrix style correlation

master
Pjotr Prins 6 months ago
commit b74f7377b2
2 changed files with 30 additions and 4 deletions
  1. README.md (+5 -1)
  2. doc/03-using-tidyverse.Rmd (+25 -3)

README.md (+5 -1)

@@ -9,7 +9,11 @@ Install RStudio with R.

With GNU Guix install

- guix package -i r r-markdown r-rmarkdown r-tidyverse r-hash
+ guix package -i r r-markdown r-rmarkdown r-tidyverse r-hash r-hmisc

## Under GNU Guix

options(download.file.method="wget")

# Data



doc/03-using-tidyverse.Rmd (+25 -3)

@@ -6,22 +6,29 @@ output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
# The following sets up the working directory for
# the data files. Make sure to amend it to your
# setup
data <- "/home/wrk/iwrk/closed/kemri/Francis_Final_TregData_Jan2020/Data/"
setwd(data)
knitr::opts_knit$set(echo = TRUE, root.dir=data)
```

- ## R Tidyverse
+ ## Using R Tidyverse

- The new way of analysing R data is the tidyverse https://www.tidyverse.org/ which includes the online book 'R for Data Science'.
+ The new (and hot) way of analysing data with R is the tidyverse https://www.tidyverse.org/ which includes the online book 'R for Data Science'. Please check it out!

Instead of data frames we now use tibbles. First import the data using the File menu in RStudio, making sure FACTORS is off and that tables and columns show. This is the old way (remember) of plotting the individuals_attributes data frame:

```{r}
ind_attr=read.csv("Individual_attributes.csv")
plot(ind_attr$ELISA ~ ind_attr$Time_to_diagnosis)
```

we want to turn ind_attr into a tibble with

```{r}
library(tidyverse)
tb = as_tibble(ind_attr)
tb
```
@@ -34,3 +41,18 @@ ggplot(data = tb) + geom_point(mapping = aes(y=ELISA, x = Time_to_diagnosis))

which shows that all high [ELISA](https://en.wikipedia.org/wiki/ELISA) values occur only at late diagnosis. ELISA uses a solid-phase enzyme immunoassay (EIA) to detect the presence of a ligand (commonly a protein) in a liquid sample using antibodies directed against the protein to be measured.

## Correlation

Let's try a simple correlation. This site has some
interesting [ideas](https://paulvanderlaken.com/2018/09/10/simpler-correlation-analysis-in-r-using-tidyverse-priciples/) which we may visit later. Let's correlate
using the pipes from dplyr:

```{r}
library(dplyr)
library(Hmisc)

cs = cbind(tb$Time_to_diagnosis,tb$ELISA,tb$Age)
rcorr(cs)
```

From this it is clear that correlations between time to diagnosis, age and ELISA are low.
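
Not part of this commit: the text mentions dplyr pipes, while the chunk builds the matrix with `cbind`. A minimal sketch of what a piped variant could look like is given here, assuming the three column names used in the chunk. The `toy` data frame and its values are hypothetical stand-ins for `tb`, made up purely so the example runs on its own, and base R `cor()` is swapped in for `rcorr()`.

```r
library(dplyr)

# Hypothetical stand-in for `tb`; these values are invented for illustration
# and say nothing about the real Individual_attributes.csv data.
toy <- data.frame(
  Time_to_diagnosis = c(1, 2, 3, 4, 5, 6),
  ELISA             = c(0.2, 0.1, 0.4, 0.3, 0.9, 0.8),
  Age               = c(30, 25, 41, 38, 29, 35)
)

# Select the columns of interest and pipe them into cor(), which returns
# a labelled Pearson correlation matrix.
cm <- toy %>%
  select(Time_to_diagnosis, ELISA, Age) %>%
  cor(use = "pairwise.complete.obs")

cm
```

Unlike `Hmisc::rcorr()`, `cor()` returns only the coefficients and no p-values. To keep the p-values, the selected columns can be piped through `as.matrix()` into `rcorr()` instead.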
