troutpy.tl.segment_protrusions

troutpy.tl.segment_protrusions(sdata, layer='transcripts', table_name='structure_table', cluster_eps=1.0, min_samples=10, connectivity_threshold=2.0, max_distance=30, global_distance=500, n_neighbors=50, lmbda=0.01, copy=False)

Identify extracellular RNA structures, quantify their morphology, and assign them to parent cells.

Transcripts in sdata.points[layer] that do not overlap a cell and are either classified as "High Density" or intracellular-like are clustered with DBSCAN into candidate structures (e.g. protrusions). For each resulting structure, the function computes:

  • Morphology (area, perimeter, circularity) from the convex hull of its transcripts, and whether it is physically connected to a segmented cell.

  • An individual parent_score for the most likely parent cell, based on gene-expression overlap weighted by an exponential distance decay, plus an assignment_confidence margin and an is_ambiguous flag.

  • A neighborhood_score describing how much of the structure’s gene content is explained by the collective expression of its nearby cells.
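The morphology step can be illustrated with a small dependency-free sketch. It assumes the common circularity definition 4πA/P², which this page does not confirm, and the helper names convex_hull and hull_morphology are hypothetical, not part of troutpy.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one
    return lower[:-1] + upper[:-1]


def hull_morphology(points):
    """Area, perimeter and circularity (assumed: 4*pi*A / P**2) of the convex hull."""
    hull = convex_hull(points)
    n = len(hull)
    if n < 3:
        # Fewer than three hull vertices (e.g. collinear transcripts): no area
        return {"area": 0.0, "perimeter": 0.0, "circularity": 0.0}
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1          # shoelace formula
        perimeter += math.hypot(x2 - x1, y2 - y1)
    area = abs(twice_area) / 2.0
    circularity = 4.0 * math.pi * area / perimeter**2
    return {"area": area, "perimeter": perimeter, "circularity": circularity}


# Transcript coordinates of one structure; the interior point does not affect the hull
print(hull_morphology([(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]))
```

Under this definition a circle has circularity 1 and a unit square has π/4, so elongated structures such as thin protrusions score closer to 0.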

Results are stored as a new table sdata.tables[table_name], and sdata.points[layer] is annotated with a structure_id column (-1 for transcripts not assigned to any structure).
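To show how cluster_eps and min_samples interact, and why unassigned transcripts receive -1, here is a deliberately simple pure-Python DBSCAN. It is an illustrative stand-in with quadratic running time, not the clustering code troutpy actually calls, and it assumes that a point counts as its own neighbor when compared against min_samples.

```python
import math

NOISE = -1  # same convention as structure_id: -1 means "not in any structure"


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; NOISE for points in no cluster."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def region(i):
        # Indices of all points within eps of point i (including i itself)
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
        cluster += 1
    return labels


# Four tightly packed transcripts and one isolated transcript
coords = [(0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5), (10, 10)]
structure_id = dbscan(coords, eps=1.0, min_samples=3)
print(structure_id)
```

With these toy coordinates the four neighboring points form one structure and the isolated point is labeled -1. Raising min_samples above 4 would leave every point unassigned.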

Parameters:
  • sdata (spatialdata.SpatialData) – SpatialData object in which sdata.points[layer] contains x, y, gene, enrichment_class, extracellular, and overlaps_cell columns, and which holds a "table" AnnData with .obsm["spatial"].

  • layer (str, optional) – Points layer to read transcripts from and annotate with structure_id. Defaults to "transcripts".

  • table_name (str, optional) – Key under which the resulting structure table is stored in sdata.tables. Defaults to "structure_table".

  • cluster_eps (float, optional) – DBSCAN eps (maximum distance between two points to be considered neighbors) used to cluster candidate structure transcripts. Defaults to 1.0.

  • min_samples (int, optional) – DBSCAN min_samples required to form a structure. Defaults to 10.

  • connectivity_threshold (float, optional) – Maximum distance to a cell-assigned transcript for a structure to be considered physically connected to a cell. Defaults to 2.0.

  • max_distance (float, optional) – Currently unused; reserved for future filtering by structure-to-cell distance. Defaults to 30.

  • global_distance (float, optional) – Maximum distance for a cell to be considered a neighbor of a structure when computing parent and neighborhood scores. Defaults to 500.

  • n_neighbors (int, optional) – Number of nearby cells to consider per structure for parent assignment. Defaults to 50.

  • lmbda (float, optional) – Exponential decay rate applied to structure-to-cell distance when scoring candidate parent cells. Defaults to 0.01.

  • copy (bool, optional) – If True, return the modified SpatialData object; otherwise modify sdata in place and return None. Defaults to False.
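The roles of global_distance, n_neighbors, and lmbda in parent assignment can be sketched as follows. The exact formula is not given on this page, so everything here is an assumption made for illustration: the overlap measure (fraction of the structure's genes found in the cell), the score overlap × exp(-lmbda × distance), the confidence margin (best score minus runner-up), and the ambiguity_margin threshold. The function score_parents is hypothetical and not part of troutpy.

```python
import math


def score_parents(structure_genes, structure_xy, cells,
                  lmbda=0.01, global_distance=500, n_neighbors=50,
                  ambiguity_margin=0.1):
    """cells maps cell_id -> ((x, y), set_of_expressed_genes)."""
    if not structure_genes:
        return None
    # 1. Keep only cells within global_distance of the structure
    candidates = []
    for cell_id, (xy, genes) in cells.items():
        d = math.dist(structure_xy, xy)
        if d <= global_distance:
            candidates.append((d, cell_id, genes))
    # 2. Restrict to the n_neighbors closest cells
    candidates.sort(key=lambda c: c[0])
    candidates = candidates[:n_neighbors]
    if not candidates:
        return None
    # 3. Assumed score: gene overlap weighted by exponential distance decay
    scores = {}
    for d, cell_id, genes in candidates:
        overlap = len(structure_genes & genes) / len(structure_genes)
        scores[cell_id] = overlap * math.exp(-lmbda * d)
    # 4. Assumed confidence: margin between the best and second-best score
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_id, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    confidence = best_score - runner_up
    return {
        "predicted_parent": best_id,
        "parent_score": best_score,
        "assignment_confidence": confidence,
        "is_ambiguous": confidence < ambiguity_margin,
    }


cells = {
    "c1": ((10, 0), {"A", "B"}),    # close, full gene overlap
    "c2": ((100, 0), {"A"}),        # farther, partial overlap
    "c3": ((1000, 0), {"A", "B"}),  # beyond global_distance, ignored
}
print(score_parents({"A", "B"}, (0, 0), cells))
```

Under these assumptions a larger lmbda makes the score fall off faster with distance, so nearby cells dominate even when a distant cell matches the gene content better.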

Return type:

SpatialData | None

Returns:

spatialdata.SpatialData or None – If copy=True, the modified sdata, in which sdata.tables[table_name] contains per-structure morphology, predicted_parent, parent_score, neighborhood_score, assignment_confidence, and is_ambiguous columns and sdata.points[layer]["structure_id"] is set; otherwise None.