pyccapt.calibration.data_tools package

Submodules

pyccapt.calibration.data_tools.ato_tools module

pyccapt.calibration.data_tools.ato_tools.ato_to_ccapt(file_path: str, mode: str) → DataFrame[source]

Read data from a version 6 .ato file and convert it into a pandas DataFrame.

Parameters:

file_path – Path to the .ato file
mode – Conversion mode (oxcart or ato)

Returns:

Pandas DataFrame containing the converted data

pyccapt.calibration.data_tools.ato_tools.ccapt_to_ato(data: DataFrame, path: str | None = None, name: str | None = None) → bytes[source]

Convert a PyCCAPT dataframe to the ATO v6 binary layout used by this project.

pyccapt.calibration.data_tools.data_loadcrop module

pyccapt.calibration.data_tools.data_loadcrop.build_event_group_mapping(dld_start_counter: ndarray, tdc_start_counter: ndarray) → tuple[ndarray, ndarray, ndarray][source]

Assign shared event-group ids linking dld rows to their tdc rows.

The start_counter wraps around, so its values are not unique on their own. Within each pulse trigger, however, all related tdc rows are consecutive in time order, and so are any dld rows for that pulse. A two-pointer sweep over the consecutive runs is enough to pair them: each tdc run either matches the next unmatched dld run (same counter value), or it is an “orphan” pulse that did not produce a reconstructible dld event.

Returns:

dld_gid (int64 array, parallel to dld_start_counter) – Group id assigned to each dld row.
tdc_gid (int64 array, parallel to tdc_start_counter) – Group id for matched tdc rows; -1 for orphan tdc rows.
tdc_has_match (bool array, parallel to tdc_start_counter) – True iff the tdc row’s pulse trigger has at least one dld row.
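The pairing described above can be sketched with plain Python lists. This is only an illustration of the two-pointer idea, not the library's implementation: the names runs and pair_runs are hypothetical, and the sketch uses lists where the real function returns NumPy arrays.

```python
from itertools import groupby


def runs(values):
    """Collapse a sequence into (value, start, length) runs of equal neighbours."""
    out, pos = [], 0
    for value, grp in groupby(values):
        n = sum(1 for _ in grp)
        out.append((value, pos, n))
        pos += n
    return out


def pair_runs(dld_counter, tdc_counter):
    """Hypothetical helper: pair tdc runs with dld runs in a single sweep."""
    dld_runs, tdc_runs = runs(dld_counter), runs(tdc_counter)
    dld_gid = [-1] * len(dld_counter)
    tdc_gid = [-1] * len(tdc_counter)
    tdc_has_match = [False] * len(tdc_counter)
    j = 0    # pointer to the next unmatched dld run
    gid = 0  # next group id to hand out
    for value, start, n in tdc_runs:
        if j < len(dld_runs) and dld_runs[j][0] == value:
            _, d_start, d_n = dld_runs[j]
            dld_gid[d_start:d_start + d_n] = [gid] * d_n
            tdc_gid[start:start + n] = [gid] * n
            tdc_has_match[start:start + n] = [True] * n
            gid += 1
            j += 1
        # Otherwise the tdc run is an orphan pulse: gid stays -1.
    return dld_gid, tdc_gid, tdc_has_match


dld = [5, 5, 7, 2]
tdc = [5, 5, 5, 5, 6, 6, 7, 7, 7, 7, 2, 2, 2, 2]
dld_gid, tdc_gid, tdc_has_match = pair_runs(dld, tdc)
```

In this example the tdc run with counter value 6 has no dld counterpart, so its rows keep group id -1 and has-match False.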

pyccapt.calibration.data_tools.data_loadcrop.calculate_ppi_and_ipp(data, max_start_counter)[source]

Calculate pulses since the last event pulse and ions per pulse.

Parameters:

data (dict) – A dictionary containing the ‘start_counter’ data.
max_start_counter (int) – The maximum start counter value.

Returns:

A tuple containing two numpy arrays: delta_p and multi.

Return type:

tuple

Raises:

IndexError – If the length of counter is less than 1.
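A minimal sketch of how the two quantities could be derived from a wrapping counter is given here. It rests on several assumptions that the documentation above does not confirm: one row per ion in acquisition order, a counter that wraps modulo max_start_counter, and a convention where only the first ion of a pulse carries the pulse gap and the ion count. The function name is hypothetical.

```python
def pulses_and_multiplicity(start_counter, max_start_counter):
    """Hypothetical sketch; not the library's calculate_ppi_and_ipp."""
    if len(start_counter) < 1:
        raise IndexError("start_counter is empty")
    n = len(start_counter)
    delta_p = [0] * n  # pulses since the previous event pulse
    multi = [0] * n    # ions detected on the same pulse
    run_start = 0
    for i in range(1, n + 1):
        # A run ends when the data ends or the counter value changes.
        if i == n or start_counter[i] != start_counter[run_start]:
            multi[run_start] = i - run_start
            if i < n:
                # The modulo handles the counter wrapping back to zero
                # (assumed wrap period: max_start_counter).
                gap = start_counter[i] - start_counter[run_start]
                delta_p[i] = gap % max_start_counter
            run_start = i
    return delta_p, multi


delta_p, multi = pulses_and_multiplicity([3, 3, 5, 9, 1], 10)
```

Here the last row wraps from 9 to 1, which the modulo turns into a gap of 2 pulses instead of a negative number.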

pyccapt.calibration.data_tools.data_loadcrop.concatenate_dataframes_of_dld_grp(dataframe_list: list) → DataFrame[source]

Concatenates dataframes into a single dataframe.

Parameters:

dataframe_list – List of different information from dld group.

Returns:

Single concatenated dataframe containing all relevant information.

Return type:

DataFrame

pyccapt.calibration.data_tools.data_loadcrop.create_pandas_dataframe(data_crop, mode='dld', flag_old_pyccpat_data=False)[source]

Create a pandas dataframe from the cropped data.

Parameters:

data_crop – Cropped dataset
mode – Mode of extraction. dld: Extracts data from dld group. tdc_sc: Extracts data from tdc for Surface Concept. tdc_ro: Extracts data from tdc for RoentDek detector.
flag_old_pyccpat_data – Flag indicating that the data has already been converted from bin to ns and mm (old PyCCAPT data)

Returns:

Dataframe to be inserted in the HDF file

Return type:

hdf_dataframe

pyccapt.calibration.data_tools.data_loadcrop.crop_data_after_selection(data_crop, variables)[source]

Crop the dataset after the region of interest has been selected.

Parameters:

data_crop – Original dataset to be cropped
variables – Variables object

Returns:

Cropped dataset

Return type:

data_crop

pyccapt.calibration.data_tools.data_loadcrop.crop_dataset(dld_master_dataframe, variables)[source]

Crop the dataset based on the selected region of interest.

Parameters:

dld_master_dataframe – Concatenated dataset
variables – Variables object

Returns:

Cropped dataset

Return type:

data_crop

pyccapt.calibration.data_tools.data_loadcrop.elliptical_shape_selector(axisObject, figureObject, variables, mode='circle')[source]

Enable the creation of an elliptical box to select the region of interest.

Parameters:

axisObject – Object to create the axis of the plot
figureObject – Object to create the figure
variables – Variables object
mode – Mode of selection (circle or ellipse)

Returns:

None

pyccapt.calibration.data_tools.data_loadcrop.fetch_dataset_from_dld_grp(filename: str, extract_mode='dld', *, lazy: bool = False)[source]

Fetches dataset from HDF5 file.

Parameters:

filename – Path to the HDF5 file.
extract_mode – Mode of extraction. dld: Extracts data from dld group. tdc_sc: Extracts data from tdc for Surface Concept. tdc_ro: Extracts data from tdc for RoentDek detector.
lazy – When True, return a pyccapt.calibration.data_tools.lazy_io.LazyTable view of the requested group (/dld or /tdc) instead of a materialized DataFrame. Use this on small-RAM machines: the file is opened with h5py and downstream analyses can iterate chunks via LazyTable.iter_chunks(). The caller is responsible for closing the table (preferably via a with block).

Returns:

Contains relevant information from the requested group, materialized or lazy depending on lazy.

Return type:

DataFrame | LazyTable

pyccapt.calibration.data_tools.data_loadcrop.fetch_dataset_with_tdc(filename: str, tdc_extract_mode: str = 'tdc_sc') → tuple[DataFrame, DataFrame][source]

Load both the dld and tdc dataframes and add a shared event_group_id column to each.

The dld dataframe gets a single new integer column event_group_id that survives drop/iloc/reset_index. The tdc dataframe gets the same column plus has_dld_match (bool) so orphan tdc rows can be preserved at save time even after dld filtering.

pyccapt.calibration.data_tools.data_loadcrop.filter_tdc_by_dld(dld_df: DataFrame, tdc_df: DataFrame) → DataFrame[source]

Return tdc rows whose dld counterpart still exists, plus all orphan rows.

Rule: drop a tdc row only when its linked dld row was deleted. Orphan tdc rows (pulses that never produced a dld event) are always preserved.
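The rule can be illustrated with plain dictionaries standing in for DataFrame rows. This is a sketch under assumptions: the function name is hypothetical, and it relies only on the event_group_id and has_dld_match columns described for fetch_dataset_with_tdc.

```python
def filter_tdc_rows(dld_rows, tdc_rows):
    """Hypothetical list-based analogue of the filtering rule."""
    # Group ids of dld rows that survived cropping or filtering.
    surviving = {row["event_group_id"] for row in dld_rows}
    return [
        row for row in tdc_rows
        # Orphans are always kept; matched rows need a surviving dld row.
        if not row["has_dld_match"] or row["event_group_id"] in surviving
    ]


dld_rows = [{"event_group_id": 0}, {"event_group_id": 2}]  # group 1 was deleted
tdc_rows = [
    {"event_group_id": 0, "has_dld_match": True},
    {"event_group_id": -1, "has_dld_match": False},  # orphan pulse
    {"event_group_id": 1, "has_dld_match": True},
    {"event_group_id": 2, "has_dld_match": True},
]
kept = filter_tdc_rows(dld_rows, tdc_rows)
kept_ids = [row["event_group_id"] for row in kept]
```

Only the tdc row linked to the deleted dld group 1 is dropped; the orphan row passes through untouched.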

pyccapt.calibration.data_tools.data_loadcrop.plot_FDM(ax, fig, data, bins=(256, 256), save_name=None)[source]

Backward-compatible wrapper for legacy plot_FDM API.

pyccapt.calibration.data_tools.data_loadcrop.plot_crop_FDM(ax, fig, data, bins=(256, 256), save_name=None)[source]

Backward-compatible wrapper for legacy camelCase API.

Notes

The ax and fig arguments are kept for compatibility with older call sites and are not used directly because the modern API creates its own figure.

pyccapt.calibration.data_tools.data_loadcrop.plot_crop_experiment_history(data: DataFrame, variables, max_tof, frac=1.0, bins=(1200, 800), figure_size=(8, 3), draw_rect=False, data_crop=True, pulse_plot=False, dc_plot=True, pulse_mode='voltage', save=True, figname='')[source]

Plots the experiment history.

Parameters:

data – DataFrame containing info about the dld group.
max_tof – The maximum tof to be plotted.
frac – Fraction of the data to be plotted.
figure_size – The size of the figure.
data_crop – Flag to control if only the plot should be shown or cropping functionality should be enabled.
draw_rect – Flag to draw a rectangle over the selected area.
pulse_plot – Flag to choose whether to plot pulse.
pulse_mode – Flag to choose whether to plot pulse voltage or pulse.
dc_plot – Flag to choose whether to plot dc voltage.
save – Flag to choose whether to save the plot or not.
figname – Name of the figure to be saved.

Returns:

None.

pyccapt.calibration.data_tools.data_loadcrop.plot_crop_fdm(x, y, bins=(256, 256), frac=1.0, axis_mode='normal', figure_size=(5, 4), variables=None, range_sequence=[], range_mc=[], range_detx=[], range_dety=[], range_x=[], range_y=[], range_z=[], range_vol=[], data_crop=False, draw_circle=False, mode_selector='circle', save=False, figname='FDM')[source]

Plot and crop the FDM with the option to select a region of interest.

Parameters:

x – x-axis data
y – y-axis data
bins – Number of bins for the histogram as a tuple or a single float as the bin size
frac – Fraction of the data to be plotted
axis_mode – Flag to choose whether to plot axis or scalebar: ‘normal’ or ‘scalebar’
variables – Variables object
range_sequence – Range of sequence
range_mc – Range of mc
range_detx – Range of detx
range_dety – Range of dety
range_x – Range of x-axis
range_y – Range of y-axis
range_z – Range of z-axis
range_vol – Range of voltage
figure_size – Size of the plot
draw_circle – Flag to enable circular region of interest selection
mode_selector – Mode of selection (circle or ellipse)
save – Flag to choose whether to save the plot or not
data_crop – Flag to control whether only the plot is shown or cropping functionality is enabled
figname – Name of the figure to be saved

Returns:

None

pyccapt.calibration.data_tools.data_loadcrop.rectangle_box_selector(axisObject, variables)[source]

Enable the creation of a rectangular box to select the region of interest.

Parameters:

axisObject – Object to create the rectangular box
variables – Variables object

Returns:

None

pyccapt.calibration.data_tools.data_tools module

Data loading and persistence helpers for calibration workflows.

pyccapt.calibration.data_tools.data_tools.convert_mat_to_df(hdf5_file_response: dict) → DataFrame[source]

Convert loaded .mat content to a dataframe and persist it as HDF5.

pyccapt.calibration.data_tools.data_tools.extract_data(data, variables, flightPathLength_d, max_mc)[source]

Extract common calibrated arrays and metadata into shared variables.

pyccapt.calibration.data_tools.data_tools.load_data(dataset_path, data_type, mode='processed', *, load_tdc=False, tdc_extract_mode='tdc_sc')[source]

Load supported dataset formats into the pyccapt dataframe convention.

When load_tdc=True (only supported for data_type='pyccapt' in mode='raw'), also reads the /tdc group and returns (dld_df, tdc_df). Both dataframes carry a shared event_group_id column so the dld→tdc link survives downstream cropping.

pyccapt.calibration.data_tools.data_tools.pyccapt_raw_to_processed(data)[source]

Convert a raw pyccapt dataframe to the processed schema.

Existing calibrated columns are preserved when present:

If the input already has mc (Da) (e.g. it came from a partly-processed bundle), it is copied through untouched.
If the input already has mc_uc (Da), that is also copied through.
Otherwise, when the inputs needed for the uncalibrated mc formula are all present (t (ns), high_voltage (V), x_det (cm), y_det (cm)), mc_uc (Da) is computed on the fly using tof2mc(t0=0, V_pulse=0, flightPathLength=110, mode='voltage') — the same uncalibrated formula the legacy raw-data notebook used to produce Figure 6A in the PyCCAPT paper. This makes raw acquisition files (which have never been through calibration) usable in downstream M/C plots.

Columns that have no obvious raw equivalent (x/y/z (nm), t_c (ns), delta_p, multi) are zero-initialized as before.
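For orientation, the textbook time-of-flight relation behind an uncalibrated mass-to-charge value can be sketched as below. This is not the library's tof2mc: the function name is hypothetical, the flight path of 110 is assumed to be in millimetres, and any correction factors the real formula may apply are left out.

```python
E_CHARGE = 1.602176634e-19  # elementary charge in coulombs
AMU = 1.66053906660e-27     # atomic mass unit in kilograms


def mc_uncalibrated(t_ns, v_volts, x_cm, y_cm, flight_path_mm=110.0):
    """Hypothetical sketch: m/n = 2 e V t^2 / L^2, expressed in Da."""
    t = t_ns * 1e-9                  # ns -> s
    x = x_cm * 1e-2                  # cm -> m
    y = y_cm * 1e-2                  # cm -> m
    d = flight_path_mm * 1e-3        # mm -> m (assumed unit)
    path_sq = d * d + x * x + y * y  # squared tip-to-hit distance
    return 2.0 * E_CHARGE * v_volts * t * t / (path_sq * AMU)


# Flight time (in seconds) of a 27 Da singly charged ion at 5 kV, on axis.
t_s = 0.110 * (27.0 * AMU / (2.0 * E_CHARGE * 5000.0)) ** 0.5
on_axis = mc_uncalibrated(t_s * 1e9, 5000.0, 0.0, 0.0)
off_axis = mc_uncalibrated(t_s * 1e9, 5000.0, 1.0, 0.0)
```

For the same flight time, a hit away from the detector centre implies a longer path and therefore a smaller computed mass-to-charge value.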

pyccapt.calibration.data_tools.data_tools.read_hdf5(filename: str | Path, *, lazy: bool = False)[source]

Read non-pandas HDF5 content into a dictionary of dataframes.

Parameters:

filename – HDF5 file path.
lazy – When True, return a pyccapt.calibration.data_tools.lazy_io.LazyTable view that reads each /group/dataset on demand. Use this on small-RAM machines: a 2-3 GB pyccapt-raw file opens with essentially zero resident memory and downstream analyses can call LazyTable.iter_chunks() to stream the rows. The caller is responsible for closing the table (preferably with a with block).

Returns:

dict[str, pd.DataFrame] | LazyTable | None

pyccapt.calibration.data_tools.data_tools.read_mat_files(filename: str | Path)[source]

Read a .mat file and return its dictionary contents.

pyccapt.calibration.data_tools.data_tools.read_range(filename: str | Path) → DataFrame[source]

Read saved range definitions from .h5, .rrng, or legacy .rng files.

pyccapt.calibration.data_tools.data_tools.remove_invalid_data(dld_group_storage: DataFrame, max_tof: float, detector_zero_epsilon: float = 0.001) → DataFrame[source]

Remove invalid TOF and detector rows and return the cleaned dataframe.

pyccapt.calibration.data_tools.data_tools.save_data(data, variables, name=None, hdf=True, epos=False, pos=False, ato_6v=False, csv=False, temp=False, start_index=0, end_index=-1, save_tdc=False, save_range=False)[source]

Persist data in one or more supported export formats.

Parameters:

save_tdc (bool, default False) – If True and hdf=True and variables.data_tdc was loaded, also write a filtered /tdc group into the same HDF5 file. The tdc rows are filtered so that only those whose linked dld row still exists in data are kept; “orphan” tdc rows (pulses that never produced a dld event in the first place) are always preserved.
save_range (bool, default False) – If True and hdf=True and variables.range_data is non-empty, also write the current range table under /range so the calibrated, raw, and ranging information all live in a single h5 file.

pyccapt.calibration.data_tools.data_tools.save_range(variables) → None[source]

Save range data as both HDF5 and CSV.

pyccapt.calibration.data_tools.data_tools.store_df_to_csv(data: DataFrame, path: str | Path) → None[source]

Store dataframe to CSV with project-default encoding and separator.

pyccapt.calibration.data_tools.data_tools.store_df_to_hdf(dataframe, key, filename, *, format: str = 'fixed')[source]

Store dataframe to HDF5.

Parameters:

dataframe – Pandas DataFrame to serialize.
key – HDF5 key (e.g. "df").
filename – Destination .h5 path.
format – Pytables format. "fixed" (default, fast full-file reads, single-write) or "table" (slightly slower but supports pd.read_hdf(..., iterator=True, chunksize=...) and is more permissive about mixed-dtype columns). Use "table" for big raw outputs you’ll later want to stream back without loading the whole file into RAM.

Supports both modern argument order (dataframe, key, filename) and the legacy order (filename, dataframe, key) for backwards compatibility.
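One way to accept both calling conventions is to inspect the type of the first positional argument. The sketch below assumes that a path-like first argument signals the legacy order; how the library actually tells the two apart is not stated here, and the helper name is hypothetical.

```python
from pathlib import Path


def normalize_store_args(first, second, third):
    """Hypothetical helper: return (dataframe, key, filename) for either order."""
    if isinstance(first, (str, Path)):
        # Legacy order: (filename, dataframe, key)
        return second, third, first
    # Modern order: (dataframe, key, filename)
    return first, second, third


frame = {"x": [1, 2, 3]}  # stand-in for a DataFrame in this sketch
modern = normalize_store_args(frame, "df", "out.h5")
legacy = normalize_store_args("out.h5", frame, "df")
```

Both calls resolve to the same triple, so the body of the storing function only has to handle one argument layout.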

pyccapt.calibration.data_tools.dataset_path_qt module

Qt file/directory pickers for the dataset-path Browse buttons.

PyQt6 is imported lazily inside the dialog functions so that headless CI environments (which don’t ship PyQt6 by default) can still import this module to read the filter constants – the dialog itself is only ever invoked from the Jupyter notebooks, where PyQt6 is present.

pyccapt.calibration.data_tools.dataset_path_qt.gui_dirname(initial_directory)[source]

Select an existing directory via a dialog and return its path.

pyccapt.calibration.data_tools.dataset_path_qt.gui_fname(initial_directory, file_kind='any')[source]

Select a file via a dialog and return the file name.

Parameters:

initial_directory (str) – Path to the initial directory.

Returns:

Path to the chosen file.

Return type:

chosen_file (str)

pyccapt.calibration.data_tools.merge_range module

pyccapt.calibration.data_tools.merge_range.merge_by_range(data_df, range_df, full=False)[source]

Merge data with range definitions by matching each row whose ‘mc’ value falls within a range’s ‘mc_low’ and ‘mc_up’ bounds. Uses vectorized operations for performance.

Parameters:

data_df (pd.DataFrame) – The dataframe containing the data to be merged.
range_df (pd.DataFrame) – The dataframe containing the range values ‘mc_low’ and ‘mc_up’.
full (bool) – If True, the merged dataframe will contain all columns from the range_df. Default is False.

Returns:

The merged dataframe with the range data attached.

Return type:

pd.DataFrame
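The interval lookup at the heart of such a merge can be sketched with a binary search over sorted lower bounds, which is the scalar counterpart of a vectorized sorted search. The sketch assumes non-overlapping ranges with inclusive bounds; the function name and the tuple layout are hypothetical and not taken from the library.

```python
from bisect import bisect_right


def label_by_range(mc_values, ranges):
    """Hypothetical sketch; ranges is a list of (mc_low, mc_up, label)."""
    ordered = sorted(ranges)          # sort by mc_low
    lows = [r[0] for r in ordered]
    labels = []
    for mc in mc_values:
        k = bisect_right(lows, mc) - 1  # last range with mc_low <= mc
        if k >= 0 and mc <= ordered[k][1]:
            labels.append(ordered[k][2])
        else:
            labels.append(None)         # mc falls outside every range
    return labels


ranges = [(26.9, 27.1, "Al"), (0.9, 1.1, "H")]
labels = label_by_range([1.0, 27.0, 13.5, 0.5], ranges)
```

Each lookup costs a logarithmic number of comparisons in the number of ranges, instead of testing every range for every row.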

pyccapt.calibration.data_tools.plot_vline_draw module

class pyccapt.calibration.data_tools.plot_vline_draw.HorizontalZoom(ax, fig)[source]

Bases: object

on_key_press(event)[source]

on_key_release(event)[source]

on_scroll(event)[source]

class pyccapt.calibration.data_tools.plot_vline_draw.VerticalLineManager(variables, ax, fig, x, y)[source]

Bases: object

on_key_press(event)[source]

on_key_release(event)[source]

on_motion(event)[source]

on_press(event)[source]

on_release(event)[source]

on_scroll(event)[source]

remove_all_lines()[source]

update_variables(line_pos)[source]

pyccapt.calibration.data_tools.raw_data_surface_concept module

pyccapt.calibration.data_tools.raw_data_surface_concept.find_consecutive_sequences(start_counter, channel, time_data, high_voltage, pulse, print_stats=False)[source]

Find the consecutive sequences of the start counter and the corresponding channels.

Parameters:

start_counter – list of start counter values
channel – list of channel values
time_data – list of time data values
high_voltage – list of high voltage values
pulse – list of pulse values
print_stats – bool, print the statistics of the sequences

Returns:

list of dictionaries containing the sequences

Return type:

result

pyccapt.calibration.data_tools.raw_data_surface_concept.find_consecutive_sequences_seperatly(start_counter, channel, time_data, high_voltage, pulse)[source]

Find the consecutive sequences of the start counter and the corresponding channels.

Parameters:

start_counter – list of start counter values
channel – list of channel values
time_data – list of time data values
high_voltage – list of high voltage values
pulse – list of pulse values

Returns:

result_4 – list of dictionaries containing the valid sequences of 4 channels
result_4_invalid – list of dictionaries containing the invalid sequences of 4 channels
result_3_invalid – list of dictionaries containing the invalid sequences of 3 channels
result_2_invalid – list of dictionaries containing the invalid sequences of 2 channels
result_1_invalid – list of dictionaries containing the invalid sequences of 1 channel
result_other_odd – list of dictionaries containing the sequences of odd length
result_other_even – list of dictionaries containing the sequences of even length

pyccapt.calibration.data_tools.raw_data_surface_concept.find_nth_max_repeated_indices(nums, n)[source]

Find the start/end indices of the n-th longest repeated run (1-based).

Parameters:

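nums – Sequence of values to search for repeated runs.
n – 1 returns the longest run, 2 the second-longest, and so on.

Returns:

The start and end indices of the n-th longest repeated run.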
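A possible reading of this behaviour is sketched below. The details are assumptions rather than documented facts: every run counts (including runs of length 1), the end index is inclusive, ties go to the earliest run, and an out-of-range n yields None. The function name is hypothetical.

```python
from itertools import groupby


def nth_longest_run(nums, n):
    """Hypothetical sketch: (start, end) of the n-th longest run, 1-based."""
    found, pos = [], 0
    for _, grp in groupby(nums):
        length = sum(1 for _ in grp)
        found.append((length, pos))
        pos += length
    # Longest first; ties resolved by earliest start (an assumption).
    found.sort(key=lambda r: (-r[0], r[1]))
    if not 1 <= n <= len(found):
        return None
    length, start = found[n - 1]
    return start, start + length - 1  # inclusive end index


nums = [1, 1, 2, 2, 2, 3, 1, 1, 1, 1]
longest = nth_longest_run(nums, 1)
second = nth_longest_run(nums, 2)
```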

pyccapt.calibration.data_tools.raw_data_surface_concept.iter_consecutive_sequences(start_counter, channel, time_data, high_voltage, pulse, show_progress=False)[source]

Generator version of find_consecutive_sequences().

Yields one pulse-record dict at a time instead of building the full list, so peak memory is O(1) records rather than O(N). The yielded dicts have the same schema as find_consecutive_sequences returns.
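The memory argument can be illustrated with a small generator that emits one record per run of equal counter values. The record keys and the function name are hypothetical; the real schema is whatever find_consecutive_sequences returns, and the high_voltage and pulse inputs are left out to keep the sketch short.

```python
from itertools import groupby


def iter_runs(start_counter, channel, time_data):
    """Hypothetical sketch: yield one dict per run of equal counter values."""
    pos = 0
    for counter, grp in groupby(start_counter):
        n = sum(1 for _ in grp)
        # Only the current record is built; nothing is accumulated.
        yield {
            "start_counter": counter,
            "channels": list(channel[pos:pos + n]),
            "time_data": list(time_data[pos:pos + n]),
        }
        pos += n


gen = iter_runs([1, 1, 2], [0, 1, 0], [10, 11, 20])
first = next(gen)      # produced on demand
remaining = list(gen)  # the rest of the records
```

A consumer that processes each record and discards it never holds more than one record at a time, which is the O(1) behaviour described above.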

pyccapt.calibration.data_tools.run_dataset_path_qt module

pyccapt.calibration.data_tools.selectors_data module

class pyccapt.calibration.data_tools.selectors_data.CircleSelector(ax, onselect=None, *, minspanx=0, minspany=0, useblit=False, props=None, spancoords='data', button=None, grab_range=10, handle_props=None, interactive=False, state_modifier_keys=None, drag_from_anywhere=False, ignore_event_outside=False, use_data_coordinates=False)[source]

Bases: RectangleSelector

Select a circular region of an Axes.

For the cursor to remain responsive you must keep a reference to it.

Press and release events triggered at the same coordinates outside the selection will clear the selector, except when ignore_event_outside=True.

Examples

See the Matplotlib gallery example /gallery/widgets/rectangle_selector.

pyccapt.calibration.data_tools.selectors_data.line_select_callback(eclick, erelease, variables)[source]

Callback function for line selection event.

Parameters:

eclick (MouseEvent) – Event object representing the press event.
erelease (MouseEvent) – Event object representing the release event.
variables (object) – Object containing the variables.

pyccapt.calibration.data_tools.selectors_data.onselect(eclick, erelease, variables)[source]

Callback function for the selection event.

Parameters:

eclick (MouseEvent) – Event object representing the click event.
erelease (MouseEvent) – Event object representing the release event.
variables (object) – Object containing the variables.

pyccapt.calibration.data_tools.selectors_data.toggle_selector(event)[source]

Toggles the rectangle selector based on the key press event.

Parameters:

event (KeyEvent) – Event object representing the key press event.

Module contents

Data loading, cropping, and format conversion utilities.