troutpy.tl.cluster_distribution_from_source

troutpy.tl.cluster_distribution_from_source(sdata, gene_key='gene', distance_key='distance', n_clusters=3, n_bins=20, copy=False)

Cluster genes by the distribution of their transcripts’ distances to source cells.

For each gene found in the gene_key column of sdata["source_score"].obs, the function computes a normalized histogram (with n_bins bins) of that gene's distance_key values. It then standardizes these histogram vectors and clusters them with KMeans into n_clusters groups. The resulting cluster labels are stored in sdata["xrna_metadata"].var["kmeans_distribution"].
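The steps described above can be sketched on a plain pandas DataFrame. This is a minimal illustration, not troutpy's actual implementation: the helper name cluster_distance_histograms, the shared bin edges, the sum-to-one normalization, the use of StandardScaler, and the random_state argument are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_distance_histograms(obs, gene_key="gene", distance_key="distance",
                                n_clusters=3, n_bins=20, random_state=0):
    """Hypothetical helper: cluster genes by their distance histograms."""
    # Assumption: every gene shares the same bin edges, spanning the observed range.
    edges = np.linspace(obs[distance_key].min(), obs[distance_key].max(), n_bins + 1)

    genes, rows = [], []
    for gene, distances in obs.groupby(gene_key, observed=True)[distance_key]:
        counts, _ = np.histogram(distances.to_numpy(), bins=edges)
        # Assumption: "normalized" means bin fractions that sum to 1 per gene.
        rows.append(counts / counts.sum())
        genes.append(gene)

    # Standardize each bin (column) across genes, then cluster the gene vectors.
    features = StandardScaler().fit_transform(np.vstack(rows))
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = model.fit_predict(features)
    return pd.Series(labels, index=genes, name="kmeans_distribution")


# Synthetic data: two genes concentrated near the source, two far from it.
rng = np.random.default_rng(0)
obs = pd.DataFrame({
    "gene": np.repeat(["near_a", "near_b", "far_a", "far_b"], 500),
    "distance": np.concatenate([
        rng.exponential(5.0, 500), rng.exponential(5.0, 500),
        rng.normal(60.0, 5.0, 500), rng.normal(60.0, 5.0, 500),
    ]),
})
labels = cluster_distance_histograms(obs, n_clusters=2, n_bins=10)
print(labels)
```

In this sketch, genes whose transcripts share a similar distance profile receive the same label. The label values themselves are arbitrary cluster identifiers and carry no ordering.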

Parameters:
  • sdata (SpatialData) – SpatialData object containing a "source_score" table with an obs DataFrame.

  • gene_key (str (default: 'gene')) – Column in sdata["source_score"].obs containing gene identifiers.

  • distance_key (str (default: 'distance')) – Column in sdata["source_score"].obs containing the distance from the source cell.

  • n_clusters (int (default: 3)) – Number of KMeans clusters to form.

  • n_bins (int (default: 20)) – Number of histogram bins used to represent each gene’s distance distribution.

  • copy (bool (default: False)) – If True, return a modified copy of sdata. Otherwise modify in place.

Returns:

A modified copy of sdata if copy=True. Otherwise None, and sdata is modified in place.