---
title: "04-merge-cytokines"
author: "Pjotr"
date: "11/03/2020"
output: html_document
---
```{r setup, include=FALSE}
# The following sets up the working directory for
# the data files. Make sure to amend it to your
# setup
data_dir <- "/home/wrk/iwrk/closed/kemri/Francis_Final_TregData_Jan2020/Data/"
setwd(data_dir)
# echo is a chunk option, root.dir is a knit option
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_knit$set(root.dir = data_dir)
```

## Merging data
Because the data is distributed across multiple CSV files, we need a merging strategy that follows the tidyverse relational model (see https://r4ds.had.co.nz/relational-data.html).
First we load the two files:
```{r}
library(tidyverse)
data <- read_csv("data-202003/final_chmi_covariates.csv")
cytokines <- read_csv("data-202003/cytokines.csv")
cytokines
```
The cytokines table contains repeated measurements for the same sample ID:
```{r}
sort(cytokines$SampleID)[1:10]
```
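Counting rows per ID is another quick way to see which samples are repeated. The following is a minimal sketch on a made-up table: `toy_cytokines` and all of its values are invented for illustration and are not taken from the study data.

```{r}
library(tidyverse)

# Hypothetical stand-in for the cytokines table; all values are invented
toy_cytokines <- tibble(
  SampleID  = c("A01", "A01", "A02", "A03", "A03", "A03"),
  Timepoint = c("C-1", "C+7", "C-1", "C-1", "C+7", "C+9"),
  IFNg      = c(1.2, 3.4, 0.8, 2.1, 2.5, 4.0)
)

# Count rows per sample and keep only the IDs that occur more than once
repeats <- toy_cytokines %>%
  count(SampleID, name = "n_rows") %>%
  filter(n_rows > 1)
repeats
```

Any ID listed in `repeats` will contribute more than one row to a join on that key.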
```{r}
cytokines %>%
  filter(SampleID == "16K0007")
```
You can see that patient 16K0007 was treated and had measurements taken at time points C-1 and C+7. Time series are always tricky, so let's start simple by merging the measurements using an inner join, which keeps only the rows whose keys match in both tables:
```{r, error=TRUE}
c1 <- data %>%
  inner_join(cytokines, by = "SampleID")
```
Oh, an error, because the key column has a different name in each table! Let's fix that by renaming `SampleID` to `subjectid`:
```{r}
cytokines <- cytokines %>%
  rename(subjectid = SampleID)
c1 <- data %>%
  inner_join(cytokines, by = "subjectid")
```
We now have a table with every observation whose subject appears in both tables, so we should be able to plot:
```{r}
ggplot(data = c1, aes(Timepoint, IFNa, colour = location)) +
  geom_point()
```
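As an aside on the join itself: dplyr can also match key columns that have different names, so renaming is one option rather than a requirement. The sketch below uses two made-up tables (`toy_covariates` and `toy_cytokines`, with invented values) to show this, and to show that an inner join drops unmatched keys while repeated keys yield one output row per measurement.

```{r}
library(tidyverse)

# Hypothetical stand-ins for the two tables; all values are invented
toy_covariates <- tibble(
  subjectid = c("A01", "A02", "A04"),
  location  = c("site1", "site2", "site1")
)
toy_cytokines <- tibble(
  SampleID  = c("A01", "A01", "A02", "A03"),
  Timepoint = c("C-1", "C+7", "C-1", "C-1"),
  IFNg      = c(1.2, 3.4, 0.8, 2.1)
)

# Join on differently named keys: left-hand name = right-hand name
toy_joined <- toy_covariates %>%
  inner_join(toy_cytokines, by = c("subjectid" = "SampleID"))
toy_joined
```

In this toy case A03 and A04 are dropped because each appears in only one table, and A01 appears twice because it has two time points. The key column keeps the left-hand name, `subjectid`.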
The time points are not ordered correctly on the x-axis, because they are sorted as text. Let's fix that by recoding them to zero-padded labels that sort in chronological order:
```{r}
c1 %>%
  # recode() comes with dplyr; revalue() would need the plyr package
  mutate(t = recode(Timepoint,
                    "C-1"  = "00",
                    "C+5"  = "05",
                    "C+7"  = "07",
                    "C+9"  = "09",
                    "C+14" = "14",
                    "C+21" = "21",
                    "DoD"  = "30")) %>%
  ggplot(aes(t, IFNg, colour = phenotype)) +
  geom_point()
```
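An alternative to recoding the labels is to turn the time point column into a factor with explicit levels, which keeps the original labels on the axis. This is a sketch on made-up data: `toy_times` and its values are invented, and the level order simply follows the recoding step in this document.

```{r}
library(tidyverse)

# Hypothetical time point labels in arbitrary order; values are invented
toy_times <- tibble(
  Timepoint = c("C+14", "C-1", "DoD", "C+7", "C+5"),
  IFNg      = c(2.0, 1.0, 3.5, 1.8, 1.4)
)

# Chronological order, following the recoding step in this document
time_levels <- c("C-1", "C+5", "C+7", "C+9", "C+14", "C+21", "DoD")

# A factor with explicit levels keeps the labels and fixes their order
toy_times <- toy_times %>%
  mutate(Timepoint = factor(Timepoint, levels = time_levels))

levels(toy_times$Timepoint)
ggplot(toy_times, aes(Timepoint, IFNg)) +
  geom_point()
```

The trade-off is that a factor is categorical: it fixes the order but not the spacing between time points, so a numeric day column is still needed wherever elapsed time matters.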
Now let's try a linear regression, starting with a plot of IFNg by phenotype:
```{r}
ggplot(c1, aes(phenotype, IFNg, colour = Timepoint)) +
  geom_point()
```
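For reference, a linear model in R is fitted with `lm()`. The sketch below is an assumption-laden illustration only: `toy_fit_data` is invented, it treats time as a numeric `day` column, and it does not establish that a straight line is an appropriate model for the real measurements.

```{r}
# Hypothetical data: day is numeric time, IFNg values are invented
toy_fit_data <- data.frame(
  day  = c(0, 5, 7, 9, 14, 21),
  IFNg = c(1.1, 1.9, 2.4, 2.6, 3.8, 5.2)
)

# Fit IFNg as a linear function of day and inspect the coefficients
fit <- lm(IFNg ~ day, data = toy_fit_data)
coef(fit)
```

`coef(fit)` returns the intercept and the slope for `day`, and `summary(fit)` adds standard errors and p-values.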