bionetslab/scxmatch

scXMatch

scXMatch (single-cell cross match) is a Python package that implements Rosenbaum's cross-match test, which uses distance-based matching to test whether two groups of high-dimensional samples come from the same multivariate distribution. This makes it particularly useful for structured data such as single-cell RNA-seq.

This package provides a Python implementation inspired by the methodology described in Rosenbaum (2005).
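
Rosenbaum's statistic can be illustrated independently of the package. The sketch below is not scXMatch's code: it evaluates the exact null distribution from Rosenbaum (2005) for a perfect matching of N samples (N even), m of which belong to the test group, and returns the one-sided p-value P(A1 ≤ a1), because too few cross-matches indicate that the groups differ. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def crossmatch_null_pmf(N: int, m: int) -> dict[int, Fraction]:
    """Exact null distribution of A1 for N samples (N even), m of them in the test group.

    Rosenbaum (2005): P(A1 = a1) = 2**a1 * I! / (C(N, m) * a0! * a1! * a2!),
    with I = N/2 pairs, a2 = (m - a1)/2 test-test pairs and a0 = I - a1 - a2 reference-reference pairs.
    """
    if N % 2:
        raise ValueError("the exact test assumes a perfect matching, so N must be even")
    I = N // 2
    pmf = {}
    for a1 in range(m + 1):
        if (m - a1) % 2:
            continue  # a1 must have the same parity as m
        a2 = (m - a1) // 2
        a0 = I - a1 - a2
        if a0 < 0:
            continue  # more cross-matches than reference samples allow
        pmf[a1] = Fraction(2**a1 * factorial(I),
                           comb(N, m) * factorial(a0) * factorial(a1) * factorial(a2))
    return pmf


def crossmatch_p_value(a1: int, N: int, m: int) -> Fraction:
    """One-sided p-value P(A1 <= a1) under the null of identical distributions."""
    return sum((p for k, p in crossmatch_null_pmf(N, m).items() if k <= a1), Fraction(0))
```

For example, with N = 4 and m = 2 there are three perfect matchings, one of which pairs the two test samples together, so P(A1 = 0) = 1/3.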


Installation

Because it depends on graph-tool, this package can only be installed via conda, not from PyPI. The conda-forge and bioconda channels must be specified explicitly:

conda install scxmatch -c conda-forge -c bioconda

Requirements

  • Python ≥ 3.9
  • anndata
  • scanpy
  • scipy
  • graph-tool ≥ 2.92

API Documentation

scxmatch.test

scxmatch.test(
    adata,
    group_by,
    test_group,
    reference=None,
    metric="sqeuclidean",
    rank=False,
    k=100
)

Description

Performs Rosenbaum’s matching-based test to determine whether there is a statistically significant difference between two groups of samples, using a distance-based graph matching approach.
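
For intuition, here is a brute-force version of the matching step on a full distance matrix (the k=None case). It is a sketch, not the package's implementation: scXMatch relies on graph-tool, whereas the hypothetical helpers below find an exact minimum-weight perfect matching by bitmask dynamic programming, which is only feasible for a handful of samples, and then count the cross-matches a1.

```python
from functools import lru_cache


def sqeuclidean(u, v):
    """Squared Euclidean distance (the default metric name in scxmatch.test)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def min_weight_perfect_matching(points):
    """Exact minimum-weight perfect matching via bitmask DP.

    Illustration only: exponential in the number of samples, and a substitute
    for the graph-tool based matching the package actually uses.
    """
    n = len(points)
    if n % 2:
        raise ValueError("a perfect matching needs an even number of samples")
    dist = [[sqeuclidean(p, q) for q in points] for p in points]
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def best(mask):
        # mask marks already matched samples; returns (total cost, pairs)
        if mask == full:
            return 0.0, ()
        i = next(j for j in range(n) if not mask >> j & 1)  # lowest unmatched sample
        options = []
        for j in range(i + 1, n):
            if not mask >> j & 1:
                cost, pairs = best(mask | 1 << i | 1 << j)
                options.append((cost + dist[i][j], ((i, j),) + pairs))
        return min(options)

    return list(best(0)[1])


def count_cross_matches(pairs, labels, test_group):
    """a1: number of matched pairs with exactly one member in the test group."""
    return sum((labels[i] == test_group) != (labels[j] == test_group) for i, j in pairs)
```

Two well-separated clusters that each contain one test and one reference sample give a1 = 2, the maximum possible here; clusters that separate the groups give a1 = 0.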

Parameters

  • adata (anndata.AnnData): The input data. Features should be in adata.X, and group labels in adata.obs[group_by].
  • group_by (str): Column in adata.obs indicating group labels.
  • test_group (str or list of str): The group(s) to be tested.
  • reference (str or list of str, optional): The reference group(s). If None, all non-test samples are used as reference.
  • metric (str, default "sqeuclidean"): Distance metric for matching. Metric names follow the conventions of scipy.spatial.distance.cdist.
  • rank (bool, default False): If True, features are rank-transformed before distance computation.
  • k (int or None, default 100): Number of nearest neighbors to use for graph construction. If None, a full distance matrix is used.
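
The rank and k parameters shape the graph that is matched. The sketch below assumes that rank=True ranks each feature across samples (ties receive their average rank) and that the k-nearest-neighbour graph links every sample to its k closest samples, keeping each undirected edge once. Neither detail is confirmed by this documentation, and the helper names are hypothetical.

```python
def sqeuclidean(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def rank_transform(X):
    """Assumption: rank each feature (column) across samples, averaging tied ranks."""
    n, d = len(X), len(X[0])
    ranked = [[0.0] * d for _ in range(n)]
    for f in range(d):
        order = sorted(range(n), key=lambda i: X[i][f])
        pos = 0
        while pos < n:
            end = pos
            while end + 1 < n and X[order[end + 1]][f] == X[order[pos]][f]:
                end += 1  # extend the block of tied values
            avg = (pos + end) / 2 + 1  # 1-based average rank of the tie block
            for r in range(pos, end + 1):
                ranked[order[r]][f] = avg
            pos = end + 1
    return ranked


def knn_edges(X, k, metric=sqeuclidean):
    """Assumption: undirected k-NN graph as {(i, j): distance} with i < j."""
    n = len(X)
    edges = {}
    for i in range(n):
        dists = sorted((metric(X[i], X[j]), j) for j in range(n) if j != i)
        for d, j in dists[:k]:
            edges[(min(i, j), max(i, j))] = d  # mutual neighbours yield one edge
    return edges
```

A sparse k-NN graph keeps memory manageable for large N, but it may leave some samples without a partner, which is why the matching coverage is reported.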

Returns

  • result (dict): Dictionary containing the p-value, the observed number of cross-matches a1, the expected value and the variance of the cross-match count A1 under the null hypothesis, the z-score, the matching coverage, and the effect strength ratio.
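
The z-score can be derived from the null moments of A1 given in Rosenbaum (2005), E[A1] = mn/(N − 1) and Var[A1] = 2n(n − 1)m(m − 1)/((N − 3)(N − 1)²), with m test and n = N − m reference samples. The sketch below assumes that the z-score is standardised this way and that coverage is the fraction of samples that received a partner; both are assumptions about the returned values, and the helper names are hypothetical. The effect strength ratio is not reproduced because its definition is not given here.

```python
from math import sqrt


def crossmatch_moments(N, m):
    """Null mean and variance of A1 (Rosenbaum 2005); requires N > 3 and m, N - m >= 2."""
    n = N - m
    mean = m * n / (N - 1)
    var = 2 * n * (n - 1) * m * (m - 1) / ((N - 3) * (N - 1) ** 2)
    return mean, var


def crossmatch_z(a1, N, m):
    """Assumed z-score: negative values mean fewer cross-matches than expected."""
    mean, var = crossmatch_moments(N, m)
    return (a1 - mean) / sqrt(var)


def coverage(n_pairs, n_samples):
    """Assumed definition: fraction of samples that were matched to a partner."""
    return 2 * n_pairs / n_samples
```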

Raises

  • TypeError: If the input adata is not an AnnData object.
  • ValueError: If test_group or reference contains values not present in adata.obs[group_by].
  • ValueError: If k is not an integer or None.

Modifies

  • Modifies adata.obs in place by adding the following column:
    • XMatch_partner_<test_group>_vs_<reference>: The index of each sample’s matched partner in the minimum-weight maximum-cardinality matching (MWMCM).
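
After a run, the partner column can be used to recount the cross-matches. This sketch assumes that the column stores the obs name of each sample's partner and None or NaN for unmatched samples; the function name is hypothetical.

```python
def cross_matches_from_partners(partner, group, test_group):
    """Count a1 from a sample -> partner mapping and a sample -> group mapping.

    Assumption: unmatched samples map to None or NaN. Each pair appears twice
    (s -> t and t -> s), so pairs are de-duplicated before counting.
    """
    seen = set()
    a1 = 0
    for s, t in partner.items():
        if t is None or t != t:  # None or NaN marks an unmatched sample
            continue
        pair = frozenset((s, t))
        if pair in seen:
            continue
        seen.add(pair)
        if (group[s] == test_group) != (group[t] == test_group):
            a1 += 1
    return a1
```

With AnnData, the inputs could be built as adata.obs[col].to_dict() and adata.obs[group_by].to_dict(), where col is the XMatch_partner_<test_group>_vs_<reference> column.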

scxmatch.estimate_peak_RAM_GB

scxmatch.estimate_peak_RAM_GB(
    N, 
    k
)

Description

Estimates the worst-case peak RAM usage of a scxmatch.test run with k nearest neighbors on an AnnData object with N samples.

Parameters

  • N (int): Number of samples in the AnnData object.
  • k (int): Number of nearest neighbors to use for graph construction.

Returns

  • peak_ram_gb (float): The expected worst-case peak RAM in gigabytes.

Raises

  • ValueError: For invalid (negative) N or k.
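
Because memory grows with both N and k, estimate_peak_RAM_GB can be used to pick k before running the test. The hypothetical helper below binary-searches for the largest k whose estimate stays within a RAM budget; it assumes that the estimate never decreases as k grows and caps k at N − 1.

```python
def largest_k_within_budget(N, budget_gb, estimate, k_max=None):
    """Largest k with estimate(N, k) <= budget_gb, or None if even k = 1 does not fit.

    `estimate` is meant to be scxmatch.estimate_peak_RAM_GB (or any callable with
    the same (N, k) signature). Assumption: the estimate is non-decreasing in k.
    """
    hi = N - 1 if k_max is None else min(k_max, N - 1)
    lo = 1
    if hi < 1 or estimate(N, lo) > budget_gb:
        return None
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if estimate(N, mid) <= budget_gb:
            lo = mid
        else:
            hi = mid - 1
    return lo
```

For example, largest_k_within_budget(adata.n_obs, 16.0, scxmatch.estimate_peak_RAM_GB) would return the largest k whose estimated worst-case peak RAM is at most 16 GB.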

Example Usage

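import anndata as ad
import scanpy as sc
import scxmatch as xm

# Load your own AnnData object, or use a scanpy example dataset
# adata = ad.read_h5ad("your_data.h5ad")
adata = sc.datasets.krumsiek11()

# Run test: compare monocytes ("Mo") against progenitor cells.
# For your own data, use your label column and groups instead,
# e.g. group_by="condition", test_group="treated", reference="control".
result = xm.test(
    adata=adata,
    group_by="cell_type",
    test_group="Mo",
    reference="progenitor",
    metric="sqeuclidean",
    rank=False,
    k=100
)

# Single quotes inside the f-string keep this valid on Python < 3.12
print(f"P-value: {result['p_value']:.4f}, Z-score: {result['z_score']:.2f}, Coverage: {result['coverage']:.2%}")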

Citation

If you use scXMatch in your research, please cite the original paper and our publication:

Rosenbaum, P. R. (2005). An exact distribution-free test comparing two multivariate distributions based on adjacency. Journal of the Royal Statistical Society: Series B, 67(4), 515–530.

Moeller, A., Schnitzerlein, M., Greto, E., Zaburdaev, V., Uderhardt, S., & Blumenthal, D. B. (2025). Quantifying distribution shifts in single-cell data with scXMatch. bioRxiv 2025.06.25.661473. https://doi.org/10.1101/2025.06.25.661473


License

MIT License
