Heatmap combining bacterial abundance determined in experiments with patient clinical data annotation

Biological data visualization

Author

Ivan Osinnii

Published

April 22, 2024

This is a Quarto document which nicely combines both R code, its results and text explanation to enable an easy and interactive access for both learners, readers and supervisors to such analyses. To learn more about Quarto see https://quarto.org. By default all code lines are blended, but you can show them by clicking on the code button.

1 Introduction

A snippet of the fully formated figure from real publication

This is a sample plot mapping the clinical information collected from literature meta-analysis. In this demonstration the data were taken from this study and reduced for convenience. Authors manually extracted bacterial abundance measurment data (16S rRNA seq analysis) for given taxonomic units and transformed an increase or decrease in certain categories from integers -3 up to 3 where 0 means no change in diseases. This is a basis for making a simple heatmap based on bacterial data only.

In rows we see different taxa and in columns we see different studies (papers) numbered from 1 to 20. The power of this figure is hovewer in combining visual bacterial changes information with visual clinical parameter trends extracted from publications manually. The latter would allow us to see whether any results are caused or associated with particular patient characteristics of study techniques.

2 Loading data and dependencies

For this data analysis we will need next R packages: tidyverse, pheatmap, RColorBrewer, readxl, gridExtra

Code
library(pheatmap)
library(RColorBrewer)
library(tidyverse)
library(readxl)
library(gridExtra)

Let us load a data directly from Excel file using read_xlsx function from readxl package This data is taken from this publication and us used as a small subset. Table Taxa contains the name of taxonomic units in rows and different publication numbers in columns. As experimental outcome values the bacterial abundance change on the scale from -3 to 3 is presented. -3, -2, -1 can be interpreted as bacterial decrease in very severe/severe/light disease stage and so positive values. Alternatively, one can assign to different decrease degrees different diseases or other clinical parameters.

Code
Taxa <-read_xlsx("./../input/Literature_HeatMap_Data_Ivan2.xlsx", sheet = 1)
Taxa <- tibble::column_to_rownames(Taxa, var = names(Taxa)[1])

annotation_table <- read_xlsx("./../input/Literature_HeatMap_Data_Ivan2.xlsx", sheet = 2)
annotation_table <- tibble::column_to_rownames(annotation_table, var = names(annotation_table)[1])

annotation_table contains different clinical parameters as rows and again publication numbers as columns. As values different types of clinical parameters are used (as designated on the figure legend) - they will be handled as categorical variables in our analysis. These values were extracted from publications manualy.

3 Making plot with the help of pheatmap package

3.1 Plotting bacterial abundance without annotation table

Code
rwbcols <- c("#D33F6A","#E07B91","#E6AFB9","#f4fcfc","#B5BBE3", "#8595E1", "#4A6FE3")
pheatmap(Taxa, 
         fontsize_row = 8, fontsize_col = 5, cluster_cols = TRUE, cluster_rows = F,
         color = colorRampPalette(rev(rwbcols))(100),
         cellwidth = 7, cellheight = 12)

3.1.1 We set up annotation categories and their colors

Code
ann_colors = list(
  Gender = c(Less="#3399FF", More="#cc6666", Unknown="#f4fcfc"), 
  Sample_Type = c(Biopsy="#efe9d0", Stool="#ccbb74", Stool_Biopsy="#bc5607"),
  Sample_Size = c(LessThanFifty="#cff7d5",Between50_100="#a5f7b0", Between101_200="#78e887", MoreThan200="#47a354"),
  Median_Age = c(BetweenTenTwenty="#e1e8f4",BetweenThirtyFourty="#c1d3f2", BetweenThirtyFifty="#94b6ef", BetweenFiftySeventy="#6d98e0", Unknown="#f4fcfc"),
  RaceEthnicity = c(MostlyCaucasian="#aaf2ff", Indian="brown", Japanese="#e5f98b", Han= "#efb94c", Korean = "#cc6666", Unknown="#f4fcfc"),
  Method = c(RT_PCR="#e9e3ef", R454="#c3a9e5",  MiSeq="#a276db", HiSeq="#7b45c1", Other="#f4fcfc"))

3.2 plot heatmap with annotation table

Code
pheatmap(Taxa, 
         fontsize_row = 8, fontsize_col = 5, cluster_cols = FALSE, cluster_rows = FALSE,
         color = colorRampPalette(rev(rwbcols))(100),
         cellwidth = 7, cellheight = 12,
         annotation_col=annotation_table, annotation_colors = ann_colors, border_color = "grey60"
         )

4 Replicating this figure using ggplot2

We slightly adjust the data changing some annotation value names

Code
Taxa<-read_xlsx("./../input/Literature_HeatMap_Data_Ivan3.xlsx", sheet = 1) 
Taxa <- tibble::column_to_rownames(Taxa, var = names(Taxa)[1])

annotation_table <- read_xlsx("./../input/Literature_HeatMap_Data_Ivan3.xlsx", sheet = 2)
annotation_table <- tibble::column_to_rownames(annotation_table, var = names(annotation_table)[1])

Define your color palettes and the rwbcols. Prepare Taxa data (transform into long format)

Code
rwbcols <- brewer.pal(3, "RdBu")

  Taxa <- rownames_to_column(Taxa, "RowNames")
  Taxa_long <- pivot_longer(Taxa, cols = -RowNames, names_to = "ColumnName", values_to = "Value")
Taxa_long <- Taxa_long %>%
  arrange(desc(RowNames))

Prepare annotation data color scheme

Code
ann_colors <- list(Less="#3399FF", More="#cc6666", Unknown="#f4fcfc", Biopsy="#efe9d0", 
Stool="#ccbb74", Stool_Biopsy="#bc5607", LessThanFifty="#cff7d5", Between50_100="#a5f7b0", 
Between101_200="#78e887", MoreThan200="#47a354", 
BetweenTenTwenty="#e1e8f4", 
BetweenThirtyFourty="#c1d3f2", 
BetweenThirtyFifty="#94b6ef", BetweenFiftySeventy="#6d98e0", NotKnown="#f4fcfc", 
MostlyCaucasian="#aaf2ff",
Indian="brown", Japanese="#e5f98b", Han= "#efb94c", Korean = "#cc6666", Not_known="#f4fcfc", 
RT_PCR="#e9e3ef", R454="#c3a9e5",  MiSeq="#a276db", HiSeq="#7b45c1", Other="#f4fcfc")

Prepare Annotations data. Flatten the list of vectors into a single named vector for use in ggplot. Ensure that the ‘Value’ column is a factor and its levels are in the order of ‘all_colors’

Code
annotation_table_long <- annotation_table %>%
  rownames_to_column("Publication") %>%
  pivot_longer(cols = -Publication, names_to = "Parameter", values_to = "Value")

all_colors <- unlist(ann_colors)

annotation_table_long$Value <- factor(annotation_table_long$Value, levels = names(all_colors))

4.0.1 Plotting main heatmap

Code
p1 <- ggplot(Taxa_long, aes(x = ColumnName, y = RowNames, fill = Value)) +
  geom_tile(width = 1, height = 1,color = "grey50") +
  scale_fill_gradientn(colors = colorRampPalette(rev(rwbcols))(100)) +
  theme_minimal() +
  theme(
    plot.margin = unit(c(0, 0, 0, 0), "cm"),  # Adjust margins around the plot
    axis.text.x = element_text(size = 5, angle = 45, hjust = 1),
        axis.text.y = element_text(size = 8),
        axis.title = element_blank(),
        legend.title = element_text(size = 7),  # Smaller legend title
        legend.text = element_text(size = 7),  # Smaller legend text
        legend.key.size = unit(0.4, "cm")) +  # Smaller legend keys
    coord_fixed(ratio = 12 / 7) +             # Adjust ratio based on cellheight and cellwidth
    scale_y_discrete(position = "right")  # Position the y-axis on the right
p1

4.0.2 Adding the annotation data

Code
p2 <- ggplot(annotation_table_long, aes(x = Publication, y = Parameter, fill = Value)) +
  geom_tile(width = 1.2, height = 0.8,color = "black") +  # Width is larger than height
  scale_fill_manual(values = all_colors) +  # Use the flattened color vector
  theme_void() +  # Start with a blank theme
  theme(
    plot.margin = unit(c(0, 6.4, 0, 0), "cm"),  # Adjust margins around the plot
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 1, size = 6, color = "black"),  # Customize x axis labels
    axis.text.y = element_text(angle = 0, hjust = 1, size = 8, color = "black"),  # Customize y axis labels
    axis.title.x = element_text(size = 12, face = "bold"),  # X-axis title
    axis.title.y = element_text(size = 14, face = "bold"),   # Y-axis title
    legend.title = element_text(size = 7),  # Smaller legend title
    legend.text = element_text(size = 6),  # Smaller legend text
    legend.key.size = unit(0.25, "cm")  # Smaller legend keys
  ) +
  labs(
    x = "",  # Rename X axis
    y = "",  # Rename Y axis
    fill = ""  # Legend title
  ) +
  coord_fixed(ratio = 2) +  # Adjust ratio based on cellheight and cellwidth
  scale_x_discrete(position = "top") + # Position the x-axis at the top
  scale_y_discrete(position = "right")  # Position the y-axis on the right
ggplot2.plot <- grid.arrange(p2, p1, ncol = 1, heights = c(1, 4.1))

4.1 Acknowledgements

Many thanks to Prof. Yilmaz (UniBern) for data sample and help with learning this type of plot

Code
sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.3

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Zurich
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] gridExtra_2.3      readxl_1.4.3       lubridate_1.9.3    forcats_1.0.0     
 [5] stringr_1.5.1      dplyr_1.1.4        purrr_1.0.2        readr_2.1.5       
 [9] tidyr_1.3.1        tibble_3.2.1       ggplot2_3.5.0      tidyverse_2.0.0   
[13] RColorBrewer_1.1-3 pheatmap_1.0.12   

loaded via a namespace (and not attached):
 [1] gtable_0.3.4      jsonlite_1.8.8    compiler_4.3.3    tidyselect_1.2.1 
 [5] scales_1.3.0      yaml_2.3.8        fastmap_1.1.1     R6_2.5.1         
 [9] labeling_0.4.3    generics_0.1.3    knitr_1.45        htmlwidgets_1.6.4
[13] munsell_0.5.1     tzdb_0.4.0        pillar_1.9.0      rlang_1.1.3      
[17] utf8_1.2.4        stringi_1.8.3     xfun_0.43         timechange_0.3.0 
[21] cli_3.6.2         withr_3.0.0       magrittr_2.0.3    digest_0.6.35    
[25] grid_4.3.3        rstudioapi_0.16.0 hms_1.1.3         lifecycle_1.0.4  
[29] vctrs_0.6.5       evaluate_0.23     glue_1.7.0        farver_2.1.1     
[33] cellranger_1.1.0  fansi_1.0.6       colorspace_2.1-0  rmarkdown_2.26   
[37] tools_4.3.3       pkgconfig_2.0.3   htmltools_0.5.8.1