troutpy.pp.find_optimal_segmentation_free_bin_size
- troutpy.pp.find_optimal_segmentation_free_bin_size(sdata, bin_sizes=(1, 2, 4, 8, 16, 32), cell_type_key='leiden', roi_size=120, min_div=0.85)
Find the optimal bin size for transcript aggregation by maximizing separability between intracellular and extracellular gene signatures.
The function computes cosine similarity between binned transcript counts and known cell-type signatures, then uses Jensen-Shannon (JS) Divergence to identify the bin size that best distinguishes “cellular” signals from background noise. A spatial visualization is produced automatically.
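The scoring idea described above can be sketched in plain Python. This is an illustration under stated assumptions, not troutpy's implementation: the helper names (`cosine_similarity`, `histogram`, `js_divergence`), the 10-bin histogram, and the base-2 logarithm are all choices made for this example, and the toy count vectors are invented.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty bin has no defined direction; treat as dissimilar
    return dot / (norm_u * norm_v)

def histogram(values, n_bins=10):
    """Normalised histogram of a non-empty list of values in [0, 1] (assumed binning)."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x == 0 contribute nothing to the KL sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Invented toy data: one cell-type signature over three genes.
signature = [10, 0, 5]
cellular_bins = [[8, 1, 4], [9, 0, 6]]        # bins overlapping cells
extracellular_bins = [[0, 7, 1], [1, 9, 0]]   # bins outside cells

sim_cell = [cosine_similarity(b, signature) for b in cellular_bins]
sim_extra = [cosine_similarity(b, signature) for b in extracellular_bins]
js = js_divergence(histogram(sim_cell), histogram(sim_extra))
print(f"JS divergence between cellular and extracellular bins: {js:.3f}")
```

A larger divergence means the cosine-similarity distributions of cellular and extracellular bins overlap less, which is the property the function looks for when comparing bin sizes.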
- Parameters:
  - sdata (spatialdata.SpatialData) – SpatialData object containing a "table" AnnData with cell-type annotations and a "transcripts" points layer with "gene", "x", "y", and "overlaps_cell" columns.
  - bin_sizes (tuple of int, optional) – Pixel/unit sizes to test for spatial binning. Defaults to (1, 2, 4, 8, 16, 32).
  - cell_type_key (str, optional) – Key in sdata["table"].obs containing cell-type labels used to build reference signatures. Defaults to "leiden".
  - roi_size (int, optional) – Side length (in the same units as transcript coordinates) of the square Region of Interest centred on the median transcript position. Defaults to 120.
  - min_div (float, optional) – Fraction of the maximum observed JS Divergence used as the acceptance threshold when selecting the optimal bin size. Defaults to 0.85.
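To make roi_size and bin_sizes concrete, the following sketch crops a square region around the median transcript position and counts transcripts per bin. It is only an illustration: `select_roi` and `bin_counts` are hypothetical helpers, and the inclusive ROI boundary and the bin grid anchored at the coordinate origin are assumptions, since the documentation does not specify either.

```python
from collections import Counter
from statistics import median

def select_roi(points, roi_size):
    """Keep (x, y) points inside a square of side roi_size centred on the median position."""
    cx = median(x for x, _ in points)
    cy = median(y for _, y in points)
    half = roi_size / 2
    # Assumption: points exactly on the boundary are kept.
    return [(x, y) for x, y in points
            if abs(x - cx) <= half and abs(y - cy) <= half]

def bin_counts(points, bin_size):
    """Count points per square bin, keyed by (bin_x, bin_y) grid indices."""
    # Assumption: the grid starts at the coordinate origin.
    return Counter((int(x // bin_size), int(y // bin_size)) for x, y in points)

# Invented transcript coordinates; the last point is a far outlier.
transcripts = [(0, 0), (10, 10), (12, 11), (13, 9), (500, 500)]

roi = select_roi(transcripts, roi_size=20)
for size in (4, 16):
    print(size, dict(bin_counts(roi, size)))
```

Small bins spread the same transcripts over many sparse bins, while large bins merge them into a few dense ones, which is why several sizes are tested.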
- Returns:
  - results_df (DataFrame) – Long-format DataFrame with columns "bin_x", "bin_y", "cosine_sim", "is_cellular", and "bin_size" for every bin across all tested bin sizes.
  - metrics_df (DataFrame) – DataFrame with columns "bin_size" and "js_divergence" for each tested size.
  - optimal_bin (int) – The smallest bin size that achieves at least min_div of the maximum observed JS Divergence.
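The selection rule for optimal_bin can be written out as a short sketch. The helper `pick_optimal_bin` is hypothetical and the divergence values are invented for illustration; only the rule itself (the smallest bin size reaching at least min_div of the maximum divergence) comes from the description above.

```python
def pick_optimal_bin(metrics, min_div=0.85):
    """Return the smallest bin size whose JS divergence is at least
    min_div times the maximum observed divergence.

    metrics: iterable of (bin_size, js_divergence) pairs, mirroring the
    "bin_size" and "js_divergence" columns of metrics_df.
    """
    metrics = list(metrics)
    max_js = max(js for _, js in metrics)
    threshold = min_div * max_js
    return min(size for size, js in metrics if js >= threshold)

# Invented divergence values for the default bin sizes.
example_metrics = [(1, 0.10), (2, 0.25), (4, 0.52), (8, 0.58), (16, 0.60), (32, 0.41)]

optimal = pick_optimal_bin(example_metrics, min_div=0.85)
print(optimal)  # 4: the threshold is 0.85 * 0.60 = 0.51, first met at bin size 4
```

Preferring the smallest qualifying size rather than the size with the highest divergence keeps spatial resolution as fine as possible while giving up only a small share of separability.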