troutpy.pp.find_optimal_segmentation_free_bin_size

troutpy.pp.find_optimal_segmentation_free_bin_size(sdata, bin_sizes=(1, 2, 4, 8, 16, 32), cell_type_key='leiden', roi_size=120, min_div=0.85)

Find the optimal bin size for transcript aggregation by maximizing separability between intracellular and extracellular gene signatures.

The function computes cosine similarity between binned transcript counts and known cell-type signatures, then uses Jensen-Shannon (JS) Divergence to identify the bin size that best distinguishes “cellular” signals from background noise. A spatial visualization is produced automatically.
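The two quantities named above can be illustrated with a minimal, standard-library sketch. The helper names `cosine_similarity` and `js_divergence` are hypothetical and are not part of troutpy. How the function applies the divergence internally is not documented here, so comparing histograms of cosine similarities for cellular and extracellular bins is an assumption made for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # An empty bin has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    Inputs are non-negative weights (e.g. histogram counts) with a positive
    total; they are normalised to sum to 1 before comparison.
    """
    p_total, q_total = sum(p), sum(q)
    p = [x / p_total for x in p]
    q = [x / q_total for x in q]
    m = [(x + y) / 2 for x, y in zip(p, q)]  # mixture distribution

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a == 0 contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Assumed usage: histograms of cosine similarity for the two groups of bins.
# The counts below are made up for illustration.
cellular_hist = [1, 2, 5, 12]
extracellular_hist = [10, 6, 3, 1]
separability = js_divergence(cellular_hist, extracellular_hist)
```

With a base-2 logarithm the divergence lies between 0 (identical distributions) and 1 (non-overlapping distributions), so a larger value means the two groups of bins are easier to tell apart.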

Parameters:
  • sdata (spatialdata.SpatialData) – SpatialData object containing a "table" AnnData with cell-type annotations and a "transcripts" points layer with "gene", "x", "y", and "overlaps_cell" columns.

  • bin_sizes (tuple of int, optional) – Pixel/unit sizes to test for spatial binning. Defaults to (1, 2, 4, 8, 16, 32).

  • cell_type_key (str, optional) – Key in sdata["table"].obs containing cell-type labels used to build reference signatures. Defaults to "leiden".

  • roi_size (int, optional) – Side length (in the same units as transcript coordinates) of the square Region of Interest centred on the median transcript position. Defaults to 120.

  • min_div (float, optional) – Fraction of the maximum observed JS Divergence used as the acceptance threshold when selecting the optimal bin size. Defaults to 0.85.
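To make roi_size and bin_sizes concrete, the sketch below crops a list of transcript coordinates to a square centred on the median position and then assigns each remaining point to a grid bin. The helpers `crop_to_roi` and `assign_bins` are hypothetical, and the floor-division binning is an assumption about how a bin size maps to bin indices, not a description of troutpy's internals.

```python
import math
from statistics import median


def crop_to_roi(points, roi_size=120):
    """Keep (x, y) points inside a square of side roi_size centred on the median position."""
    cx = median(x for x, _ in points)
    cy = median(y for _, y in points)
    half = roi_size / 2
    return [(x, y) for x, y in points
            if abs(x - cx) <= half and abs(y - cy) <= half]


def assign_bins(points, bin_size):
    """Map each (x, y) point to an integer (bin_x, bin_y) grid index (assumed scheme)."""
    return [(math.floor(x / bin_size), math.floor(y / bin_size))
            for x, y in points]


# Made-up coordinates for illustration.
transcripts = [(0, 0), (100, 100), (110, 90), (500, 500), (95, 105)]
roi_points = crop_to_roi(transcripts, roi_size=120)
bins = assign_bins(roi_points, bin_size=16)
```

Centring on the median rather than the mean keeps the region over the bulk of the transcripts even when a few coordinates lie far away.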

Returns:
  • results_df (DataFrame) – Long-format DataFrame with columns "bin_x", "bin_y", "cosine_sim", "is_cellular", and "bin_size" for every bin across all tested bin sizes.

  • metrics_df (DataFrame) – DataFrame with columns "bin_size" and "js_divergence" for each tested size.

  • optimal_bin (int) – The smallest bin size that achieves at least min_div of the maximum observed JS Divergence.
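The selection rule for optimal_bin can be sketched as follows. `select_optimal_bin` is a hypothetical helper that restates the rule from the description, and the divergence values are made up for illustration.

```python
def select_optimal_bin(js_by_bin_size, min_div=0.85):
    """Return the smallest bin size whose JS divergence reaches min_div * max."""
    threshold = min_div * max(js_by_bin_size.values())
    return min(size for size, js in js_by_bin_size.items() if js >= threshold)


# Illustrative values only: bin size -> JS divergence.
example_metrics = {1: 0.10, 2: 0.22, 4: 0.35, 8: 0.41, 16: 0.40, 32: 0.30}
best = select_optimal_bin(example_metrics, min_div=0.85)
```

Here the maximum is 0.41, so the threshold is 0.85 × 0.41 = 0.3485. Bin sizes 4, 8, and 16 all reach it, and the smallest of them, 4, is selected. Preferring the smallest qualifying size keeps spatial resolution as high as possible while giving up at most 15% of the best separability.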