This function performs hierarchical clustering on gene sets (pathways) based on the Jaccard similarity of their gene members, that is, the number of genes two pathways share divided by the number of genes in their union. It subsets the pathways in pathway_table by the specified contrast, computes the pairwise similarity matrix, performs hierarchical clustering, and saves the resulting dendrogram as a PDF. The function then annotates the pathways with cluster information (using annotate_clusters()) and writes the result to the worksheet for this contrast in an openxlsx workbook via add_table_to_workbook().
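The core idea can be sketched in base R. This is a minimal illustration only, not the function's actual implementation: the gene sets are invented, and the average-linkage method and the cut into two clusters are assumptions made for the example.

```r
# Toy gene sets; pathway names and members are made up for illustration.
gene_sets <- list(
  PATHWAY_A = c("TP53", "MDM2", "CDKN1A", "BAX"),
  PATHWAY_B = c("TP53", "MDM2", "CDKN1A", "ATM"),
  PATHWAY_C = c("MYC", "MAX", "CCND1"),
  PATHWAY_D = c("MYC", "MAX", "CDK4")
)

# Jaccard similarity: size of the intersection over size of the union.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Pairwise similarity matrix across all gene sets.
n <- length(gene_sets)
sim <- matrix(0, n, n, dimnames = list(names(gene_sets), names(gene_sets)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    sim[i, j] <- jaccard(gene_sets[[i]], gene_sets[[j]])
  }
}

# Cluster on Jaccard distance (1 - similarity).
# The linkage method and k = 2 are assumptions for this sketch.
hc <- hclust(as.dist(1 - sim), method = "average")
clusters <- cutree(hc, k = 2)
```

Here PATHWAY_A and PATHWAY_B share three of five distinct genes (similarity 0.6) and fall into one cluster, while PATHWAY_C and PATHWAY_D form the other.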

Usage

cluster_jaccard(contrast, contrast_color, wb, pathway_table)

Arguments

contrast

A character string specifying the contrast (e.g., "WT vs. GFP") used to subset the pathways.

contrast_color

A character string representing the tab color (as a hex code, e.g., "#FAE1DD") for the worksheet corresponding to this contrast.

wb

A workbook object created by openxlsx::createWorkbook() to which the clustered results are added.

pathway_table

A data.table containing the pathways to be clustered. This table must include the following columns: Contrast, MEMBERS_SYMBOLIZED, NAME, Comparison, Regulation, SIZE, ES, NES, NOM.p.val, FDR.q.val, FWER.p.val, CONTRIBUTOR, SUB_CATEGORY_CODE, EXACT_SOURCE, DESCRIPTION_BRIEF, and MEMBERS_EZID.
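To illustrate how wb and contrast_color relate, the sketch below creates a workbook, adds a worksheet with a colored tab, and writes a small table to it using public openxlsx calls. How add_table_to_workbook() does this internally is not documented here, so treat this as one plausible equivalent; the toy table and its cluster column are hypothetical.

```r
# Runs only if openxlsx is installed.
if (requireNamespace("openxlsx", quietly = TRUE)) {
  wb <- openxlsx::createWorkbook()

  contrast <- "WT vs. GFP"
  contrast_color <- "#FAE1DD"

  # One worksheet per contrast, with the tab colored by the hex code.
  openxlsx::addWorksheet(wb, sheetName = contrast, tabColour = contrast_color)

  # Hypothetical clustered result; the real table has many more columns.
  toy <- data.frame(
    NAME = c("PATHWAY_A", "PATHWAY_B"),
    cluster = c(1L, 1L)
  )
  openxlsx::writeDataTable(wb, sheet = contrast, x = toy)

  # Worksheet names currently in the workbook.
  sheet_names <- names(wb)
}
```

Because the workbook is only modified in memory, a separate call such as openxlsx::saveWorkbook() is still needed to write it to disk.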

Value

A data.table containing the pathways annotated with cluster information.

Details

The dendrogram is saved as a PDF in the "Results/GSEA_preranked/pathways" directory relative to the project root.
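Saving a dendrogram to a PDF in that directory layout might look like the following sketch. It writes under tempdir() rather than a real project root, and the file name, page size, and toy distances are assumptions for the example, not what cluster_jaccard() actually uses.

```r
# Toy clustering of four points on a line, just to have something to plot.
pts <- matrix(c(0, 0.1, 1, 1.1), ncol = 1,
              dimnames = list(c("A", "B", "C", "D"), NULL))
hc <- hclust(dist(pts))

# Stand-in for the project root; the real function resolves this itself.
out_dir <- file.path(tempdir(), "Results", "GSEA_preranked", "pathways")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Hypothetical file name.
pdf_path <- file.path(out_dir, "example_dendrogram.pdf")

# Open the PDF device, draw the dendrogram, and close the device.
pdf(pdf_path, width = 8, height = 6)
plot(hc, main = "Example dendrogram", xlab = "", sub = "")
dev.off()
```

Creating the directory with recursive = TRUE avoids a failure when the nested output folders do not exist yet.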