pyccapt.calibration.clustering package

Submodules

pyccapt.calibration.clustering.clustering module

Clustering helpers for calibrated APT datasets.

class pyccapt.calibration.clustering.clustering.MinMaxClusterResult(labels: ndarray, selected_mask: ndarray, selected_indices: ndarray, centers: ndarray, ion_names: tuple[str, ...], cluster_column: str, algorithm: str = 'min-max', parameters: dict[str, float | int | bool] | None = None)

Bases: object

Result of a clustering pass on a selected ion population.

algorithm: str = 'min-max'

centers: ndarray

cluster_column: str

property counts: tuple[int, ...]

ion_names: tuple[str, ...]

labels: ndarray

property n_clusters: int

parameters: dict[str, float | int | bool] | None = None

selected_indices: ndarray

selected_mask: ndarray
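
The field list above can be read as a plain dataclass. The sketch below is an illustration only, not pyccapt's implementation. The class name ClusterResultSketch is hypothetical, and the sketch assumes that labels holds integer cluster ids with -1 marking noise, that n_clusters counts the distinct non-negative labels, that counts gives the member count of each of them, and that selected_indices lists the positions where selected_mask is True.

```python
from __future__ import annotations

from dataclasses import dataclass

import numpy as np


@dataclass
class ClusterResultSketch:
    """Hypothetical stand-in mirroring the documented fields."""

    labels: np.ndarray
    selected_mask: np.ndarray
    selected_indices: np.ndarray
    centers: np.ndarray
    ion_names: tuple[str, ...]
    cluster_column: str
    algorithm: str = "min-max"
    parameters: dict | None = None

    @property
    def n_clusters(self) -> int:
        # Assumption: -1 marks noise and is not counted as a cluster.
        return int(np.unique(self.labels[self.labels >= 0]).size)

    @property
    def counts(self) -> tuple[int, ...]:
        # Assumption: one member count per non-negative label, in label order.
        valid = self.labels[self.labels >= 0]
        return tuple(int(c) for c in np.bincount(valid))


# Eight ions in the dataset, six of them selected for clustering.
mask = np.array([True, True, False, True, True, True, False, True])
result = ClusterResultSketch(
    labels=np.array([0, 0, 1, -1, 1, 1]),
    selected_mask=mask,
    selected_indices=np.flatnonzero(mask),
    centers=np.zeros((2, 3)),
    ion_names=("Fe", "Mn"),
    cluster_column="cluster_minmax",
)
print(result.n_clusters, result.counts)
```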

pyccapt.calibration.clustering.clustering.build_cluster_context_trace(variables, *, mask: ndarray, name: str, color: str = 'rgba(160,160,160,0.55)', opacity: float = 0.12, marker_size: float = 1.0, showlegend: bool = True) → Scatter3d | None

Build a faint context trace to show specimen geometry around clusters.

pyccapt.calibration.clustering.clustering.build_cluster_scatter_traces(variables, cluster_result: MinMaxClusterResult, *, opacity: float = 0.9, marker_size: float = 2.5, valid_mask: ndarray | None = None) → list[Scatter3d]

Build Plotly traces for clustered precipitate segments.

pyccapt.calibration.clustering.clustering.estimate_maximum_separation_distance(points: ndarray, *, kth_neighbor: int = 3, percentile: float = 50.0) → float

Estimate d_max from the kth-nearest-neighbor distance distribution.
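
As an illustration of the idea, the sketch below takes each point's distance to its kth nearest neighbor and returns a percentile of those distances. It is a brute-force approximation written for this page, not the library's code. The function name estimate_d_max_sketch is hypothetical, and the choice of Euclidean distance is an assumption.

```python
import numpy as np


def estimate_d_max_sketch(points, kth_neighbor=3, percentile=50.0):
    """Percentile of the kth-nearest-neighbor distances (brute force)."""
    points = np.asarray(points, dtype=float)
    # Full pairwise distance matrix: O(n^2) memory, fine for small inputs only.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the zero self-distance,
    # so column k holds the distance to the kth nearest neighbor.
    kth = np.sort(dist, axis=1)[:, kth_neighbor]
    return float(np.percentile(kth, percentile))


# Five points spaced one unit apart along x.
line = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0], [4, 0, 0]])
d_max = estimate_d_max_sketch(line, kth_neighbor=1, percentile=50.0)
print(d_max)
```

For datasets with millions of ions the pairwise matrix will not fit in memory; a k-d tree query such as scipy.spatial.cKDTree.query with k = kth_neighbor + 1 is one common substitute.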

pyccapt.calibration.clustering.clustering.maximum_separation_clustering(points: ndarray, *, d_max: float, n_min: int) → tuple[ndarray, ndarray]

Cluster points by connected components with maximum edge length d_max.
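
The connected-components rule can be sketched with a small union-find: two points are linked when they lie within d_max of each other, and every linked group with at least n_min members becomes a cluster. This is a minimal brute-force sketch, not pyccapt's implementation. The name max_sep_sketch is hypothetical, and the return value (labels with -1 for noise, plus per-cluster centroids) is an assumption about what the documented tuple contains.

```python
import numpy as np


def max_sep_sketch(points, d_max, n_min):
    """Connected components under a maximum edge length d_max."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of points closer than d_max (upper triangle only).
    diff = points[:, None, :] - points[None, :, :]
    close = (diff ** 2).sum(axis=-1) <= d_max ** 2
    for i, j in zip(*np.nonzero(np.triu(close, k=1))):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    roots = np.array([find(i) for i in range(n)])
    labels = np.full(n, -1, dtype=int)
    centers = []
    for root in np.unique(roots):
        members = roots == root
        if members.sum() >= n_min:  # smaller groups stay labelled -1
            labels[members] = len(centers)
            centers.append(points[members].mean(axis=0))
    return labels, np.array(centers).reshape(-1, points.shape[1])


pts = np.array([[0, 0, 0], [0.5, 0, 0], [1, 0, 0],
                [10, 0, 0], [10.5, 0, 0], [50, 0, 0]])
labels, centers = max_sep_sketch(pts, d_max=0.6, n_min=2)
print(labels, centers)
```

Note that the first and third points are one unit apart, farther than d_max, yet they end up in the same cluster because both link to the point between them.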

pyccapt.calibration.clustering.clustering.min_max_clustering(points: ndarray, n_clusters: int = 2, max_iter: int = 50, n_min: int | None = None) → tuple[ndarray, ndarray]

Segment points with a deterministic Min-Max initialization plus centroid refinement.

Parameters:

n_min (int, optional) – When provided, clusters with fewer than n_min members are relabelled as noise (-1) and the surviving labels are compacted. This is consistent with the HDBSCAN, DBSCAN, and maximum-separation algorithms in this module, which all drop tiny clusters. The default (None) preserves the legacy behaviour of returning exactly n_clusters partitions.

Notes

NaN-coordinate rows (partially recovered ions) are dropped before clustering and re-inserted afterwards with label -1. Previously, the NaN values flowed through np.linalg.norm / argmin and silently produced garbage labels.
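
The behaviour described above (deterministic seeding, centroid refinement, n_min filtering, and NaN handling) can be sketched as follows. This is not the library's code: the name min_max_sketch is hypothetical, the seeding is assumed to be a farthest-first (maximin) rule started from the point farthest from the overall centroid, and the refinement is assumed to be a plain nearest-centroid update.

```python
import numpy as np


def min_max_sketch(points, n_clusters=2, max_iter=50, n_min=None):
    """Maximin seeding plus centroid refinement; NaN rows get label -1."""
    points = np.asarray(points, dtype=float)
    finite = np.isfinite(points).all(axis=1)
    pts = points[finite]  # cluster only rows with finite coordinates

    # Assumed seeding: start from the point farthest from the centroid, then
    # repeatedly add the point whose nearest seed is farthest away.
    seeds = [int(np.argmax(np.linalg.norm(pts - pts.mean(axis=0), axis=1)))]
    min_dist = np.linalg.norm(pts - pts[seeds[0]], axis=1)
    for _ in range(n_clusters - 1):
        seeds.append(int(np.argmax(min_dist)))
        min_dist = np.minimum(
            min_dist, np.linalg.norm(pts - pts[seeds[-1]], axis=1))
    centers = pts[seeds].copy()

    def assign(c):
        d = np.linalg.norm(pts[:, None, :] - c[None, :, :], axis=2)
        return d.argmin(axis=1)

    for _ in range(max_iter):
        sub = assign(centers)
        new_centers = np.array([
            pts[sub == k].mean(axis=0) if np.any(sub == k) else centers[k]
            for k in range(n_clusters)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    sub = assign(centers)

    if n_min is not None:
        # Relabel undersized clusters as noise and compact surviving labels.
        counts = np.bincount(sub, minlength=n_clusters)
        keep = np.flatnonzero(counts >= n_min)
        remap = np.full(n_clusters, -1)
        remap[keep] = np.arange(keep.size)
        sub, centers = remap[sub], centers[keep]

    labels = np.full(len(points), -1, dtype=int)
    labels[finite] = sub  # NaN rows keep the noise label
    return labels, centers


pts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0],
                [10, 0, 0], [11, 0, 0], [10, 1, 0],
                [100, 0, 0], [np.nan, 0, 0]])
labels, centers = min_max_sketch(pts, n_clusters=3, n_min=2)
print(labels, centers.shape)
```

In this example the lone point at x = 100 forms a one-member cluster that n_min=2 relabels as noise, and the NaN row is never passed to the distance computation.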

pyccapt.calibration.clustering.clustering.normalize_clustering_method(method: str) → str

Return the canonical clustering method identifier.

pyccapt.calibration.clustering.clustering.parse_label_selection(selection: str | Iterable[str]) → tuple[str, ...]

Normalize a comma-separated label selection.
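
One plausible reading of this normalization is sketched below. The name parse_selection_sketch is hypothetical, and the exact rules (trimming whitespace, dropping empty entries, removing duplicates while preserving order) are assumptions rather than documented behaviour.

```python
from __future__ import annotations

from typing import Iterable


def parse_selection_sketch(selection: str | Iterable[str]) -> tuple[str, ...]:
    """Split on commas, trim whitespace, drop empties and duplicates."""
    if isinstance(selection, str):
        parts = selection.split(",")
    else:
        # Each item of an iterable may itself contain commas.
        parts = [p for item in selection for p in str(item).split(",")]
    names: list[str] = []
    for part in parts:
        name = part.strip()
        if name and name not in names:
            names.append(name)
    return tuple(names)


print(parse_selection_sketch(" Fe, Mn ,,Fe"))
print(parse_selection_sketch(["Fe", "Cr, Mn"]))
```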

pyccapt.calibration.clustering.clustering.segment_ions(variables, ion_names: Sequence[str] | str, *, method: str = 'min-max', n_clusters: int = 2, d_max: float | None = None, n_min: int = 25, auto_d_max: bool = True, kth_neighbor: int = 3, percentile: float = 50.0, voxel_size: float = 1.0, seed_labels: Sequence[str] | str | None = None, cluster_column: str | None = None) → MinMaxClusterResult

Cluster a selected ion population with the requested algorithm.

pyccapt.calibration.clustering.clustering.segment_ions_by_maximum_separation(variables, ion_names: Sequence[str] | str, *, d_max: float | None = None, n_min: int = 25, auto_d_max: bool = True, kth_neighbor: int = 3, percentile: float = 50.0, cluster_column: str = 'cluster_maxsep') → MinMaxClusterResult

Cluster a selected ion population with a fast maximum-separation rule.

pyccapt.calibration.clustering.clustering.segment_ions_by_min_max(variables, ion_names: Sequence[str] | str, *, n_clusters: int = 2, cluster_column: str = 'cluster_minmax') → MinMaxClusterResult

Cluster a selected ion population into n_clusters precipitate segments.

pyccapt.calibration.clustering.isosurface module

pyccapt.calibration.clustering.isosurface.bin_vectors_from_distance(dist, bin_values, mode='distance')

Create a set of grid vectors to be used in nD binning. The bounds are calculated such that they don’t go beyond the size of the dataset.

Parameters:

dist (numpy.ndarray) – The distance variable to be binned. One column per dimension. It is the generalized distance.
bin_values (list or numpy.ndarray) – The bin size, given either as a distance or as a count depending on mode. Non-isometric bins are possible.
mode (str) – Mode can be ‘distance’ (constant distance) or ‘count’ (constant count). Default is ‘distance’.

Returns:

bin_centers (list of numpy.ndarray): The bin centers of each bin.
bin_edges (list of numpy.ndarray): The edges of each bin.

Return type:

tuple
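
For the 'distance' mode, the construction of bin edges and centers can be sketched as below. This is an illustration under stated assumptions, not the library's code: the name bin_vectors_sketch is hypothetical, one bin width per dimension is assumed, and "not going beyond the dataset" is interpreted as dropping any trailing partial bin so that all edges lie within the data range. The 'count' mode is not covered.

```python
import numpy as np


def bin_vectors_sketch(dist, bin_values):
    """Constant-width bin centers and edges per dimension ('distance' mode)."""
    dist = np.asarray(dist, dtype=float)
    if dist.ndim == 1:
        dist = dist[:, None]  # treat a flat array as a single dimension
    # A scalar bin width is broadcast to every dimension.
    widths = np.broadcast_to(
        np.asarray(bin_values, dtype=float), (dist.shape[1],))
    bin_centers, bin_edges = [], []
    for column, width in zip(dist.T, widths):
        # Assumption: keep only whole bins that fit inside [min, max].
        n_bins = int(np.floor((column.max() - column.min()) / width))
        edges = column.min() + width * np.arange(n_bins + 1)
        bin_edges.append(edges)
        bin_centers.append(0.5 * (edges[:-1] + edges[1:]))
    return bin_centers, bin_edges


# Two dimensions spanning 0..10 and 0..4, with different bin widths.
dist = np.array([[0.0, 0.0], [10.0, 4.0]])
bin_centers, bin_edges = bin_vectors_sketch(dist, [2.0, 1.0])
print(bin_centers[0], bin_edges[1])
```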

pyccapt.calibration.clustering.isosurface.isosurface(gridVec, data, isovalue)

Extract isosurface using pyvista for a custom 3D grid.

Parameters:

gridVec (list of np.ndarray) – List of 3 arrays representing the grid points in x, y, and z.
data (np.ndarray) – 3D scalar field (same shape as the meshgrid defined by gridVec).
isovalue (float) – Scalar value to extract the isosurface.

Returns:

Isosurface with faces and vertices.

Return type:

pyvista.PolyData

pyccapt.calibration.clustering.isosurface.pos_to_voxel(data, grid_vec, species=None)

Creates a voxelization of the data in ‘pos’ based on the bin centers in ‘grid_vec’ for the atoms/ions in the specified species.

Parameters:

data (pyccapt DataFrame) – The data to be voxelized. When species is given, ranges must already be allocated. A decomposed DataFrame is also possible; use range_to_pyccapt to decompose the data.
grid_vec (list of numpy.ndarray) – Grid vectors for the voxel grid. These are the bin centers.
species (list, str, or numpy.ndarray, optional) – The species to filter by. Can be:

List of species names (e.g., [‘Fe’, ‘Mn’]).

Boolean array matching the length of pos.

None, to include all atoms/ions.

Returns:

A 3D array representing the voxelized data.

Return type:

numpy.ndarray
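
The voxelization step can be illustrated with numpy.histogramdd. The sketch below is not pyccapt's implementation: it works on a plain (N, 3) coordinate array instead of a pyccapt DataFrame, the names centers_to_edges and voxelize_sketch are hypothetical, bin centers are assumed to be uniformly spaced, and species filtering is reduced to an optional boolean mask.

```python
import numpy as np


def centers_to_edges(centers):
    """Convert uniformly spaced bin centers to bin edges."""
    centers = np.asarray(centers, dtype=float)
    step = centers[1] - centers[0]  # assumes uniform spacing, >= 2 centers
    return np.concatenate(([centers[0] - step / 2], centers + step / 2))


def voxelize_sketch(xyz, grid_vec, mask=None):
    """Count points per voxel on a grid defined by bin centers."""
    xyz = np.asarray(xyz, dtype=float)
    if mask is not None:
        # Boolean mask with one entry per point, standing in for species.
        xyz = xyz[np.asarray(mask, dtype=bool)]
    edges = [centers_to_edges(c) for c in grid_vec]
    counts, _ = np.histogramdd(xyz, bins=edges)
    return counts


grid_vec = [np.array([0.5, 1.5])] * 3  # a 2 x 2 x 2 grid of unit voxels
xyz = np.array([[0.2, 0.2, 0.2], [0.3, 0.4, 0.1], [1.5, 1.5, 1.5]])
voxels = voxelize_sketch(xyz, grid_vec)
masked = voxelize_sketch(xyz, grid_vec, mask=[True, False, True])
print(voxels[0, 0, 0], voxels[1, 1, 1], masked[0, 0, 0])
```

Points outside the outermost edges are silently ignored by numpy.histogramdd, so the voxel counts may sum to fewer than the number of input points.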

Module contents