Large-scale heterogeneous biomedical knowledge graphs (KGs) use graph structures to represent and study multi-typed relational information in biological systems. Network relationships in a KG can be quantified by similarity search methods; however, such methods must account for the diversity of node types within that KG. To distinguish between node types, we leverage meta paths, a general graph-theoretic approach for flexible similarity search in large networks. A meta path is a sequence of node types that specifies a walk from an origin node to a destination node; meta paths are widely used in biomedical network analysis.
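
To make the definition concrete, here is a minimal base-R sketch of meta path traversal over a toy edge list. It does not use the metapaths API: the helper `find_meta_path_walks()`, the node names, and the node types are all hypothetical, and edges are assumed to be directed from `from` to `to`.

```r
# Toy heterogeneous graph as an edge list (all node names are made up).
edges <- data.frame(
  from = c("drugA", "drugA", "drugB", "protein1", "protein2", "protein2"),
  to   = c("protein1", "protein2", "protein2", "pathwayX", "pathwayX", "pathwayY"),
  stringsAsFactors = FALSE
)

# Node type lookup: a named character vector mapping node -> type.
node_types <- c(
  drugA = "drug", drugB = "drug",
  protein1 = "protein", protein2 = "protein",
  pathwayX = "pathway", pathwayY = "pathway"
)

# Hypothetical helper: enumerate all walks starting at `origin` whose node
# types match `meta_path` position by position, following edges from -> to.
find_meta_path_walks <- function(origin, meta_path, edges, node_types) {
  if (node_types[[origin]] != meta_path[1]) return(list())
  walks <- list(origin)
  for (next_type in meta_path[-1]) {
    extended <- list()
    for (walk in walks) {
      last_node <- walk[length(walk)]
      neighbors <- edges$to[edges$from == last_node]
      # Keep only neighbors whose type matches the next step of the meta path.
      neighbors <- neighbors[node_types[neighbors] == next_type]
      for (nb in neighbors) {
        extended[[length(extended) + 1]] <- c(walk, nb)
      }
    }
    walks <- extended
  }
  walks
}

walks <- find_meta_path_walks(
  "drugA", c("drug", "protein", "pathway"), edges, node_types
)

# Keep only the walks that end at a chosen destination node.
to_pathwayX <- Filter(function(w) w[length(w)] == "pathwayX", walks)
```

In this toy graph, `walks` holds the three drug-protein-pathway walks that start at `drugA`, and `to_pathwayX` keeps the two that end at `pathwayX`.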

To support meta paths in R, we present metapaths, the first R software package to perform meta path-based similarity search in heterogeneous KGs. The metapaths package offers various in-built similarity metrics for node pair comparison by querying KGs represented as either edge or adjacency lists, as well as auxiliary aggregation methods to measure set-level relationships. This framework facilitates the scalable and flexible modeling of network similarities in KGs with applications across biomedical KG learning.
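
As an illustration of what a meta path-based similarity metric can look like, the sketch below counts meta path instances with a matrix product and applies a PathSim-style normalization. This is an assumption for illustration only: the text above does not say which metrics the package ships, and the matrix, the node names, and the `path_sim()` helper are hypothetical.

```r
# Toy incidence matrix: rows are drugs, columns are proteins.
# A 1 means the drug is linked to the protein (values are illustrative only).
drug_protein <- matrix(
  c(1, 1, 0,
    0, 1, 1,
    0, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("drugA", "drugB", "drugC"), c("p1", "p2", "p3"))
)

# Commuting matrix for the symmetric meta path drug -> protein -> drug:
# entry [i, j] counts the walks from drug i to drug j along that meta path.
path_counts <- drug_protein %*% t(drug_protein)

# PathSim-style normalization: 2 * count(x, y) / (count(x, x) + count(y, y)).
path_sim <- function(x, y, counts) {
  2 * counts[x, y] / (counts[x, x] + counts[y, y])
}

sim_ab <- path_sim("drugA", "drugB", path_counts)  # 2 * 1 / (2 + 2) = 0.5
sim_ac <- path_sim("drugA", "drugC", path_counts)  # no shared protein, so 0
```

The raw path count rewards highly connected nodes, whereas the normalized form scales each score by how many meta path instances loop back to the two nodes themselves.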

## Installation

metapaths is designed for the R programming language and statistical computing environment. To install the latest version of this package, please run the following line in your R console:

devtools::install_github("ayushnoori/metapaths")

## Custom Similarity Metrics

In addition to the in-built similarity metrics, users may also define their own custom metrics. To define a custom similarity metric, please complete the following steps:

1. Add a new function to `similarity-metrics.R`, following the `get_<similarity-metric>()` naming convention.

2. Edit the `get_similarity_function()` function to add your metric to the list of allowed similarity metrics.

3. Submit a pull request for approval.
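
The first two steps might look roughly like the sketch below. The real signatures in `similarity-metrics.R` are not shown in this README, so the metric name, its arguments, and the `lookup_similarity_metric()` stand-in for `get_similarity_function()` are all hypothetical placeholders.

```r
# Hypothetical custom metric following the get_<similarity-metric>() naming
# pattern. The arguments are placeholders, not the package's real signature:
# count_xy is the number of meta path instances from x to y, and
# count_x / count_y are the totals leaving x and arriving at y.
get_normalized_path_count <- function(count_xy, count_x, count_y) {
  denominator <- count_x + count_y
  if (denominator == 0) return(0)  # avoid dividing by zero for isolated nodes
  2 * count_xy / denominator
}

# Stand-in for the registration step: a lookup that only accepts known
# metric names and returns the matching function.
lookup_similarity_metric <- function(metric_name) {
  allowed_metrics <- list(
    normalized_path_count = get_normalized_path_count
  )
  if (!metric_name %in% names(allowed_metrics)) {
    stop("Unknown similarity metric: ", metric_name)
  }
  allowed_metrics[[metric_name]]
}

metric <- lookup_similarity_metric("normalized_path_count")
score <- metric(count_xy = 3, count_x = 4, count_y = 8)  # 2 * 3 / 12 = 0.5
```

Keeping the allowed names in a single lookup means an unsupported metric name fails early with a clear error instead of silently falling through.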

## Custom Aggregation Methods

Akin to custom similarity metrics, users may also define custom aggregation methods for set-level comparison. To define a custom aggregation method, please complete the following steps:

1. Add a new function to `aggregation-methods.R`, following the `get_<aggregation-method>()` naming convention.

2. Edit the `get_aggregation_function()` function to add your method to the list of allowed aggregation methods.

3. Submit a pull request for approval.
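
As a rough sketch of these steps, an aggregation method can be thought of as a function that reduces a matrix of pairwise similarity scores to one set-level score. The function names, the input format, and the `lookup_aggregation_method()` stand-in for `get_aggregation_function()` are assumptions for illustration; the package's actual signatures may differ.

```r
# Toy matrix of pairwise similarity scores between two node sets
# (rows: nodes in set A, columns: nodes in set B; values are illustrative).
pairwise_scores <- matrix(
  c(0.2, 0.8,
    0.4, 0.6),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("a1", "a2"), c("b1", "b2"))
)

# Hypothetical aggregation methods following the
# get_<aggregation-method>() naming pattern.
get_mean_aggregation <- function(scores) mean(scores, na.rm = TRUE)
get_max_aggregation  <- function(scores) max(scores, na.rm = TRUE)

# Stand-in for the registration step: map an allowed method name to its function.
lookup_aggregation_method <- function(method_name) {
  allowed_methods <- list(
    mean = get_mean_aggregation,
    max  = get_max_aggregation
  )
  if (!method_name %in% names(allowed_methods)) {
    stop("Unknown aggregation method: ", method_name)
  }
  allowed_methods[[method_name]]
}

set_mean <- lookup_aggregation_method("mean")(pairwise_scores)  # average over all pairs
set_max  <- lookup_aggregation_method("max")(pairwise_scores)   # best-matching pair
```

The mean summarizes how similar the two sets are overall, while the max reports only the strongest single pair; which is appropriate depends on the question being asked.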

## Evaluation on a Biomedical KG

Evaluating the metapaths package on ogbl-biokg, an open-source biomedical KG available from the Open Graph Benchmark, recovered meaningful drug- and disease-associated relationships, as quantified by high similarity scores. For example, the meta path traversal function identified three paths that follow the specified meta path and connect donepezil, a drug used to treat Alzheimer’s disease (AD), with the "regulation of amyloid fibril formation" pathway, which is implicated in AD.

Additional usage examples are available in the ogbl-biokg vignette.

## Documentation

- The metapaths R package is freely available under MPL 2.0 via GitHub.
- Package documentation and usage examples are available here.
- For more information, please visit the metapaths project website.

## Citation

If you find metapaths useful, please cite our forthcoming paper:

@article{noori2022metapaths,
  title={metapaths: similarity search in heterogeneous knowledge graphs via meta paths},
  author={Noori, Ayush and Tan, Amelia L.M. and Li, Michelle M. and Zitnik, Marinka},
  journal={arXiv: 2209.0000},
  volume={},
  number={},
  pages={},
  year={2022},
  publisher={}
}

## Contact

Should any questions arise, please open a GitHub issue or contact anoori@college.harvard.edu.