troutpy.tl.cluster_distribution_from_source


troutpy.tl.cluster_distribution_from_source(sdata, gene_key='gene', distance_key='distance', n_clusters=3, n_bins=20, copy=False)

Clusters genes based on the distribution of distances of extracellular transcripts from their source cell.

For each gene in sdata['source_score'].obs, the function computes a normalized histogram of its distances, using n_bins bins that span the full distance range and are shared by every gene. These histogram vectors are then standardized and clustered into n_clusters groups using KMeans.
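The pipeline described above can be sketched on a plain pandas DataFrame standing in for sdata['source_score'].obs. This is a minimal illustration under stated assumptions, not troutpy's implementation: the helper name cluster_distance_profiles is hypothetical, "normalized" is assumed to mean each gene's counts divided by its total, standardization is assumed to be a per-bin z-score (as sklearn's StandardScaler does), and a small NumPy Lloyd's-algorithm KMeans with farthest-point initialization stands in for a library KMeans such as sklearn.cluster.KMeans.

```python
import numpy as np
import pandas as pd


def cluster_distance_profiles(obs, gene_key="gene", distance_key="distance",
                              n_clusters=3, n_bins=20, max_iter=100):
    """Hypothetical sketch: per-gene distance histograms -> z-score -> KMeans."""
    distances = obs[distance_key].to_numpy(dtype=float)
    # One set of edges spanning the full distance range, shared by all genes.
    bin_edges = np.histogram_bin_edges(distances, bins=n_bins)

    rows = {}
    for gene, group in obs.groupby(gene_key, observed=True):
        counts, _ = np.histogram(group[distance_key].to_numpy(dtype=float),
                                 bins=bin_edges)
        # Assumption: "normalized" means fraction of the gene's transcripts per bin.
        rows[gene] = counts / counts.sum()
    hist_df = pd.DataFrame.from_dict(rows, orient="index")

    # Per-bin (column-wise) z-score, as sklearn's StandardScaler would do.
    X = hist_df.to_numpy()
    std = X.std(axis=0)
    std[std == 0] = 1.0  # bins with no variance across genes are left at 0
    Z = (X - X.mean(axis=0)) / std

    # Farthest-point initialization: start from the first gene, then repeatedly
    # add the gene farthest from all centroids chosen so far.
    centroids = [Z[0]]
    for _ in range(1, n_clusters):
        d2 = np.min([((Z - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(Z[d2.argmax()])
    centroids = np.array(centroids)

    # Lloyd's iterations: assign to nearest centroid, then recompute means.
    labels = np.full(len(Z), -1)
    for _ in range(max_iter):
        d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for k in range(n_clusters):
            members = Z[labels == k]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[k] = members.mean(axis=0)

    gene_cluster_df = pd.DataFrame({"gene": hist_df.index.to_numpy(),
                                    "cluster": labels})
    return gene_cluster_df, hist_df, bin_edges
```

Under the per-bin z-score assumption, every bin gets unit variance across genes, so sparsely populated tail bins weigh as much as dense ones; with many bins and few genes, small tail differences can drive the clustering, which is worth keeping in mind when choosing n_bins.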

Parameters:
  • sdata (spatialdata.SpatialData) – Spatial data object containing a ‘source_score’ table whose obs DataFrame holds the gene and distance columns.

  • gene_key (str) – Column in the obs DataFrame that contains the gene names. Default: 'gene'.

  • distance_key (str) – Column in the obs DataFrame that contains each transcript's distance from its source cell. Default: 'distance'.

  • n_clusters (int) – Number of clusters to form. Default: 3.

  • n_bins (int) – Number of bins for the histogram representation. Default: 20.

Returns:
  • gene_cluster_df (DataFrame) – A DataFrame with columns ‘gene’ and ‘cluster’ giving the cluster assignment of each gene.

  • hist_df (DataFrame) – A DataFrame in which each row is a gene and the columns are its normalized histogram counts.

  • bin_edges (ndarray) – The bin edges shared by all histograms (n_bins + 1 values).
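The three return values can be combined to summarize each cluster's typical distance profile. The sketch below assumes hist_df is indexed by gene name with one column per bin, in the same order as bin_edges; the function summarize_clusters and the small input tables are hypothetical, for illustration only.

```python
import numpy as np
import pandas as pd


def summarize_clusters(gene_cluster_df, hist_df, bin_edges):
    """Hypothetical helper: mean normalized histogram per cluster."""
    # Bin centers give each histogram column a distance value.
    centers = (bin_edges[:-1] + bin_edges[1:]) / 2
    labels = gene_cluster_df.set_index("gene")["cluster"]
    # Align cluster labels to hist_df rows by gene name, then average per cluster.
    profiles = hist_df.groupby(labels.reindex(hist_df.index)).mean()
    profiles.columns = centers
    return profiles


# Hypothetical outputs for three genes and four bins, for illustration only.
bin_edges = np.array([0.0, 5.0, 10.0, 15.0, 20.0])
hist_df = pd.DataFrame(
    [[0.7, 0.2, 0.1, 0.0], [0.6, 0.3, 0.1, 0.0], [0.0, 0.1, 0.3, 0.6]],
    index=["GeneA", "GeneB", "GeneC"],
)
gene_cluster_df = pd.DataFrame({"gene": ["GeneA", "GeneB", "GeneC"],
                                "cluster": [0, 0, 1]})

profiles = summarize_clusters(gene_cluster_df, hist_df, bin_edges)
# Center of the bin where each cluster's mean profile peaks.
peak_distance = profiles.idxmax(axis=1)
```

Here peak_distance gives, for each cluster, the center of the bin where its mean profile is highest: a rough indication of whether that cluster's transcripts tend to stay near their source cell or travel farther.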