R/pathway-clustering-functions.R
cluster_jaccard.Rd
This function performs hierarchical clustering on gene sets (pathways) based on the Jaccard similarity of their gene members. It subsets pathways from the supplied pathway_table using the specified contrast, computes the similarity matrix, performs hierarchical clustering, and generates a dendrogram saved as a PDF. The function then annotates the pathways with cluster information (using annotate_clusters()) and writes the result to a specified worksheet in an openxlsx workbook via add_table_to_workbook().
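The core idea (Jaccard similarity followed by hierarchical clustering) can be illustrated with a minimal base-R sketch. This is not the package's implementation: the toy gene sets, the helper name jaccard(), the "average" linkage method, and the choice of k = 2 clusters are all assumptions made for illustration.

```r
# Toy gene sets; names and members are made up for illustration only.
gene_sets <- list(
  PATHWAY_A = c("TP53", "MDM2", "CDKN1A"),
  PATHWAY_B = c("TP53", "MDM2", "BAX"),
  PATHWAY_C = c("MYC", "MAX")
)

# Jaccard similarity: size of the intersection divided by size of the union.
# jaccard() is a hypothetical helper, not a function from the package.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Build the pairwise similarity matrix.
n <- length(gene_sets)
sim <- matrix(0, n, n, dimnames = list(names(gene_sets), names(gene_sets)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    sim[i, j] <- jaccard(gene_sets[[i]], gene_sets[[j]])
  }
}

# hclust() expects dissimilarities, so convert similarity to distance (1 - J).
# The linkage method here is an assumption; the documentation does not state it.
hc <- hclust(as.dist(1 - sim), method = "average")

# Cut the tree into two clusters (k = 2 is an arbitrary choice for this toy data).
clusters <- cutree(hc, k = 2)
```

In this toy example PATHWAY_A and PATHWAY_B share two of four distinct genes, so they end up in the same cluster, while PATHWAY_C shares none and is separated.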
cluster_jaccard(contrast, contrast_color, wb, pathway_table)
contrast: A character string specifying the contrast (e.g., "WT vs. GFP") used to subset the pathways.
contrast_color: A character string representing the tab color (as a hex code, e.g., "#FAE1DD") for the worksheet corresponding to this contrast.
wb: A workbook object created by openxlsx::createWorkbook() where the clustered results will be added.
pathway_table: A data.table containing the pathways to be clustered. This table must include the following columns: Contrast, MEMBERS_SYMBOLIZED, NAME, Comparison, Regulation, SIZE, ES, NES, NOM.p.val, FDR.q.val, FWER.p.val, CONTRIBUTOR, SUB_CATEGORY_CODE, EXACT_SOURCE, DESCRIPTION_BRIEF, and MEMBERS_EZID.
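Because the required column list is long, it may help to check a table before passing it in. The following is a small sketch under stated assumptions: check_pathway_columns() is a hypothetical helper and not part of the package, and the documentation does not say whether cluster_jaccard() performs such a check itself. The check relies only on names(), so it should behave the same for a data.table or a plain data.frame.

```r
# Column names taken from the pathway_table description.
required_cols <- c(
  "Contrast", "MEMBERS_SYMBOLIZED", "NAME", "Comparison", "Regulation",
  "SIZE", "ES", "NES", "NOM.p.val", "FDR.q.val", "FWER.p.val",
  "CONTRIBUTOR", "SUB_CATEGORY_CODE", "EXACT_SOURCE",
  "DESCRIPTION_BRIEF", "MEMBERS_EZID"
)

# Hypothetical helper: stop with an informative message if columns are missing.
check_pathway_columns <- function(tbl) {
  missing_cols <- setdiff(required_cols, names(tbl))
  if (length(missing_cols) > 0) {
    stop("pathway_table is missing columns: ",
         paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# A deliberately incomplete table to show the failure mode.
incomplete <- data.frame(Contrast = "WT vs. GFP", NAME = "PATHWAY_A")
result <- tryCatch(check_pathway_columns(incomplete),
                   error = function(e) conditionMessage(e))
```

Here result holds the error message naming the 14 columns absent from the incomplete table.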
A data.table containing the pathways annotated with cluster information.
The dendrogram is saved as a PDF in the "Results/GSEA_preranked/pathways" directory relative to the project root.
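As a rough sketch of how a dendrogram can be written to a PDF under that directory using base R graphics: the file name example_dendrogram.pdf, the page dimensions, the toy data, and the use of tempdir() as a stand-in project root are all assumptions, since the documentation does not specify how the function names or sizes its output.

```r
# Toy clustering so there is something to plot; the values are arbitrary.
toy <- matrix(c(1, 2, 6, 7, 15), ncol = 1,
              dimnames = list(paste0("PATHWAY_", 1:5), NULL))
hc <- hclust(dist(toy))

# tempdir() stands in for the project root in this sketch.
project_root <- tempdir()
out_dir <- file.path(project_root, "Results", "GSEA_preranked", "pathways")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Hypothetical file name; the real naming scheme is not documented here.
out_file <- file.path(out_dir, "example_dendrogram.pdf")

# Open a PDF device, draw the dendrogram, and close the device.
pdf(out_file, width = 8, height = 6)
plot(hc, main = "Pathway clusters", xlab = "", sub = "")
dev.off()
```

Creating the directory up front avoids a failure in pdf() when the output path does not yet exist.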