pyccapt.calibration.data_tools package
Submodules
pyccapt.calibration.data_tools.ato_tools module
- pyccapt.calibration.data_tools.ato_tools.ato_to_ccapt(file_path: str, mode: str) DataFrame[source]
Read data from a version 6 .ato file and convert it into a pandas DataFrame.
- Parameters:
file_path – Path to the .ato file
mode – Type of mode (oxcart/ato)
- Returns:
Pandas DataFrame containing the converted data
pyccapt.calibration.data_tools.data_loadcrop module
- pyccapt.calibration.data_tools.data_loadcrop.build_event_group_mapping(dld_start_counter: ndarray, tdc_start_counter: ndarray) tuple[ndarray, ndarray, ndarray][source]
Assign shared event-group ids linking dld rows to their tdc rows.
The start_counter wraps; values are not unique by themselves. But within each pulse trigger, all related tdc rows are consecutive in time order, and so are any dld rows for that pulse. A two-pointer sweep over the consecutive runs is enough to pair them: each tdc run either matches the next unmatched dld run (same counter value), or it is an “orphan” pulse that did not produce a reconstructible dld event.
- Returns:
dld_gid (int64 array, parallel to dld_start_counter) – Group id assigned to each dld row.
tdc_gid (int64 array, parallel to tdc_start_counter) – Group id for matched tdc rows; -1 for orphan tdc rows.
tdc_has_match (bool array, parallel to tdc_start_counter) – True iff the tdc row’s pulse trigger has at least one dld row.
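The two-pointer sweep described above can be sketched as follows. This is a minimal illustration and not the package's implementation: the names pair_runs_sketch and run_bounds are hypothetical, and the sketch assumes that both inputs are in time order and that every dld run has a corresponding tdc run.

```python
import numpy as np


def run_bounds(counter):
    """Boundaries of consecutive runs of equal values (hypothetical helper)."""
    counter = np.asarray(counter)
    if counter.size == 0:
        return np.array([0], dtype=np.int64)
    change = np.flatnonzero(counter[1:] != counter[:-1]) + 1
    return np.concatenate(([0], change, [counter.size])).astype(np.int64)


def pair_runs_sketch(dld_start_counter, tdc_start_counter):
    """Pair tdc runs with dld runs; illustrative only."""
    dld = np.asarray(dld_start_counter)
    tdc = np.asarray(tdc_start_counter)
    dld_b = run_bounds(dld)
    tdc_b = run_bounds(tdc)
    n_dld_runs = len(dld_b) - 1

    # Every dld row gets the index of its run as group id.
    dld_gid = np.repeat(np.arange(n_dld_runs), np.diff(dld_b)).astype(np.int64)
    tdc_gid = np.full(tdc.size, -1, dtype=np.int64)
    tdc_has_match = np.zeros(tdc.size, dtype=bool)

    j = 0  # next unmatched dld run
    for i in range(len(tdc_b) - 1):
        t0, t1 = tdc_b[i], tdc_b[i + 1]
        if j < n_dld_runs and dld[dld_b[j]] == tdc[t0]:
            tdc_gid[t0:t1] = j          # same counter value: matched pulse
            tdc_has_match[t0:t1] = True
            j += 1
        # otherwise the tdc run is an orphan and keeps gid -1
    return dld_gid, tdc_gid, tdc_has_match
```

Because only the next unmatched dld run is ever compared, a wrapped counter value that reappears later cannot be paired with an earlier pulse.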
- pyccapt.calibration.data_tools.data_loadcrop.calculate_ppi_and_ipp(data, max_start_counter)[source]
Calculate pulses since the last event pulse and ions per pulse.
- Parameters:
- Returns:
A tuple containing two numpy arrays: delta_p and multi.
- Return type:
- Raises:
IndexError – If the length of counter is less than 1.
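As a rough illustration of the two quantities, the sketch below computes them from a wrapping start counter. It is not the package's code. It assumes that the counter takes values in [0, max_start_counter), that rows sharing a pulse are consecutive, and that both values are stored on the first row of each pulse with zeros on the remaining rows. The name ppi_and_ipp_sketch is hypothetical.

```python
import numpy as np


def ppi_and_ipp_sketch(start_counter, max_start_counter):
    """Pulses since the last event pulse and ions per pulse (illustrative)."""
    counter = np.asarray(start_counter, dtype=np.int64)
    if counter.size < 1:
        raise IndexError("counter must contain at least one value")

    # Mark the first row of every pulse (run of equal counter values).
    is_first = np.ones(counter.size, dtype=bool)
    is_first[1:] = counter[1:] != counter[:-1]
    first_idx = np.flatnonzero(is_first)

    # Ions per pulse are the run lengths.
    run_len = np.diff(np.append(first_idx, counter.size))

    # Pulses since the previous event pulse; the modulo undoes the wrap.
    first_vals = counter[first_idx]
    gaps = np.zeros(first_idx.size, dtype=np.int64)
    gaps[1:] = (first_vals[1:] - first_vals[:-1]) % max_start_counter

    delta_p = np.zeros(counter.size, dtype=np.int64)
    multi = np.zeros(counter.size, dtype=np.int64)
    delta_p[first_idx] = gaps
    multi[first_idx] = run_len
    return delta_p, multi
```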
- pyccapt.calibration.data_tools.data_loadcrop.concatenate_dataframes_of_dld_grp(dataframe_list: list) DataFrame[source]
Concatenates dataframes into a single dataframe.
- Parameters:
dataframe_list – List of different information from dld group.
- Returns:
Single concatenated dataframe containing all relevant information.
- Return type:
DataFrame
- pyccapt.calibration.data_tools.data_loadcrop.create_pandas_dataframe(data_crop, mode='dld', flag_old_pyccpat_data=False)[source]
Create a pandas dataframe from the cropped data.
- Parameters:
data_crop – Cropped dataset
mode – Mode of extraction. dld: Extracts data from the dld group. tdc_sc: Extracts data from tdc for the Surface Concept detector. tdc_ro: Extracts data from tdc for the RoentDek detector.
flag_old_pyccpat_data – Flag indicating whether the data has already been converted from bins to ns and mm (old pyccapt data)
- Returns:
Dataframe to be inserted in the HDF file
- Return type:
hdf_dataframe
- pyccapt.calibration.data_tools.data_loadcrop.crop_data_after_selection(data_crop, variables)[source]
Crop the dataset after the region of interest has been selected.
- Parameters:
data_crop – Original dataset to be cropped
variables – Variables object
- Returns:
Cropped dataset
- Return type:
data_crop
- pyccapt.calibration.data_tools.data_loadcrop.crop_dataset(dld_master_dataframe, variables)[source]
Crop the dataset based on the selected region of interest.
- Parameters:
dld_master_dataframe – Concatenated dataset
variables – Variables object
- Returns:
Cropped dataset
- Return type:
data_crop
- pyccapt.calibration.data_tools.data_loadcrop.elliptical_shape_selector(axisObject, figureObject, variables, mode='circle')[source]
Enable the creation of an elliptical box to select the region of interest.
- Parameters:
axisObject – Object to create the axis of the plot
figureObject – Object to create the figure
variables – Variables object
mode – Mode of selection (circle or ellipse)
- Returns:
None
- pyccapt.calibration.data_tools.data_loadcrop.fetch_dataset_from_dld_grp(filename: str, extract_mode='dld', *, lazy: bool = False)[source]
Fetches dataset from HDF5 file.
- Parameters:
filename – Path to the HDF5 file.
extract_mode – Mode of extraction. dld: Extracts data from the dld group. tdc_sc: Extracts data from tdc for the Surface Concept detector. tdc_ro: Extracts data from tdc for the RoentDek detector.
lazy – When True, return a pyccapt.calibration.data_tools.lazy_io.LazyTable view of the requested group (/dld or /tdc) instead of a materialized DataFrame. Use this on small-RAM machines: the file is opened with h5py and downstream analyses can iterate chunks via LazyTable.iter_chunks(). The caller is responsible for closing the table (preferably via a with block).
- Returns:
Contains relevant information from the requested group, materialized or lazy depending on lazy.
- Return type:
DataFrame | LazyTable
- pyccapt.calibration.data_tools.data_loadcrop.fetch_dataset_with_tdc(filename: str, tdc_extract_mode: str = 'tdc_sc') tuple[DataFrame, DataFrame][source]
Load both the dld and tdc dataframes and add a shared event_group_id.
The dld dataframe gets a single new integer column event_group_id that survives drop/iloc/reset_index. The tdc dataframe gets the same column plus has_dld_match (bool) so orphan tdc rows can be preserved at save time even after dld filtering.
- pyccapt.calibration.data_tools.data_loadcrop.filter_tdc_by_dld(dld_df: DataFrame, tdc_df: DataFrame) DataFrame[source]
Return tdc rows whose dld counterpart still exists, plus all orphan rows.
Rule: drop a tdc row only when its linked dld row was deleted. Orphan tdc rows (pulses that never produced a dld event) are always preserved.
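The rule can be expressed as a boolean mask in pandas. The sketch below is an illustration under the assumption that both dataframes carry the event_group_id column and that the tdc dataframe carries has_dld_match, as described for fetch_dataset_with_tdc. The name filter_tdc_sketch is hypothetical and the package's own function may differ.

```python
import pandas as pd


def filter_tdc_sketch(dld_df: pd.DataFrame, tdc_df: pd.DataFrame) -> pd.DataFrame:
    """Keep tdc rows whose dld group survives, plus all orphans (illustrative)."""
    # Rows whose linked dld group is still present after cropping/filtering.
    surviving = tdc_df["event_group_id"].isin(dld_df["event_group_id"])
    # Rows whose pulse never produced a dld event are always kept.
    orphan = ~tdc_df["has_dld_match"].astype(bool)
    return tdc_df[surviving | orphan]
```

With this mask, a tdc row is dropped only when it once had a dld match and that match is no longer in dld_df.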
- pyccapt.calibration.data_tools.data_loadcrop.plot_FDM(ax, fig, data, bins=(256, 256), save_name=None)[source]
Backward-compatible wrapper for legacy plot_FDM API.
- pyccapt.calibration.data_tools.data_loadcrop.plot_crop_FDM(ax, fig, data, bins=(256, 256), save_name=None)[source]
Backward-compatible wrapper for legacy camelCase API.
Notes
The ax and fig arguments are kept for compatibility with older call sites and are not used directly because the modern API creates its own figure.
- pyccapt.calibration.data_tools.data_loadcrop.plot_crop_experiment_history(data: DataFrame, variables, max_tof, frac=1.0, bins=(1200, 800), figure_size=(8, 3), draw_rect=False, data_crop=True, pulse_plot=False, dc_plot=True, pulse_mode='voltage', save=True, figname='')[source]
Plots the experiment history.
- Parameters:
data – DataFrame containing info about the dld group.
max_tof – The maximum tof to be plotted.
frac – Fraction of the data to be plotted.
figure_size – The size of the figure.
data_crop – Flag to control if only the plot should be shown or cropping functionality should be enabled.
draw_rect – Flag to draw a rectangle over the selected area.
pulse_plot – Flag to choose whether to plot pulse.
pulse_mode – Flag to choose whether to plot pulse voltage or pulse.
dc_plot – Flag to choose whether to plot dc voltage.
save – Flag to choose whether to save the plot or not.
figname – Name of the figure to be saved.
- Returns:
None.
- pyccapt.calibration.data_tools.data_loadcrop.plot_crop_fdm(x, y, bins=(256, 256), frac=1.0, axis_mode='normal', figure_size=(5, 4), variables=None, range_sequence=[], range_mc=[], range_detx=[], range_dety=[], range_x=[], range_y=[], range_z=[], range_vol=[], data_crop=False, draw_circle=False, mode_selector='circle', save=False, figname='FDM')[source]
Plot and crop the FDM with the option to select a region of interest.
- Parameters:
x – x-axis data
y – y-axis data
bins – Number of bins for the histogram as a tuple or a single float as the bin size
frac – Fraction of the data to be plotted
axis_mode – Flag to choose whether to plot axis or scalebar: ‘normal’ or ‘scalebar’
variables – Variables object
range_sequence – Range of sequence
range_mc – Range of mc
range_detx – Range of detx
range_dety – Range of dety
range_x – Range of x-axis
range_y – Range of y-axis
range_z – Range of z-axis
range_vol – Range of voltage
figure_size – Size of the plot
draw_circle – Flag to enable circular region of interest selection
mode_selector – Mode of selection (circle or ellipse)
save – Flag to choose whether to save the plot or not
data_crop – Flag to control whether only the plot is shown or cropping functionality is enabled
figname – Name of the figure to be saved
- Returns:
None
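The histogram behind such a plot can be sketched with NumPy alone. The example below is an assumption-laden illustration, not the package's plotting code: it shows how bins given as a tuple (number of bins) or as a single float (bin size) might be handled, and how frac might subsample the data. The name fdm_histogram_sketch and the seed parameter are hypothetical.

```python
import numpy as np


def fdm_histogram_sketch(x, y, bins=(256, 256), frac=1.0, seed=0):
    """2D detector-hit histogram (illustrative)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    if frac < 1.0:
        # Keep roughly a fraction `frac` of the points.
        rng = np.random.default_rng(seed)
        keep = rng.random(x.size) < frac
        x, y = x[keep], y[keep]

    if np.isscalar(bins):
        # A single number is treated as the bin size: build explicit edges.
        def edges(values):
            n = max(1, int(np.ceil((values.max() - values.min()) / bins)))
            return values.min() + bins * np.arange(n + 1)

        bins = (edges(x), edges(y))

    counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    return counts, x_edges, y_edges
```

The returned counts array could then be passed to a plotting call such as matplotlib's pcolormesh together with the edges.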
pyccapt.calibration.data_tools.data_tools module
Data loading and persistence helpers for calibration workflows.
- pyccapt.calibration.data_tools.data_tools.convert_mat_to_df(hdf5_file_response: dict) DataFrame[source]
Convert loaded .mat content to a dataframe and persist it as HDF5.
- pyccapt.calibration.data_tools.data_tools.extract_data(data, variables, flightPathLength_d, max_mc)[source]
Extract common calibrated arrays and metadata into shared variables.
- pyccapt.calibration.data_tools.data_tools.load_data(dataset_path, data_type, mode='processed', *, load_tdc=False, tdc_extract_mode='tdc_sc')[source]
Load supported dataset formats into the pyccapt dataframe convention.
When load_tdc=True (only supported for data_type='pyccapt' in mode='raw'), also reads the /tdc group and returns (dld_df, tdc_df). Both dataframes carry a shared event_group_id column so the dld→tdc link survives downstream cropping.
- pyccapt.calibration.data_tools.data_tools.pyccapt_raw_to_processed(data)[source]
Convert a raw pyccapt dataframe to the processed schema.
Existing calibrated columns are preserved when present:
If the input already has mc (Da) (e.g. it came from a partly-processed bundle), it is copied through untouched.
If the input already has mc_uc (Da), that is also copied through.
Otherwise, when the inputs needed for the uncalibrated mc formula are all present (t (ns), high_voltage (V), x_det (cm), y_det (cm)), mc_uc (Da) is computed on the fly using tof2mc(t0=0, V_pulse=0, flightPathLength=110, mode='voltage'), the same uncalibrated formula the legacy raw-data notebook used to produce Figure 6A in the PyCCAPT paper. This makes raw acquisition files (which have never been through calibration) usable in downstream M/C plots.
Columns that have no obvious raw equivalent (x/y/z (nm), t_c (ns), delta_p, multi) are zero-initialized as before.
- pyccapt.calibration.data_tools.data_tools.read_hdf5(filename: str | Path, *, lazy: bool = False)[source]
Read non-pandas HDF5 content into a dictionary of dataframes.
- Parameters:
filename – HDF5 file path.
lazy – When True, return a pyccapt.calibration.data_tools.lazy_io.LazyTable view that reads each /group/dataset on demand. Use this on small-RAM machines: a 2-3 GB pyccapt-raw file opens with essentially zero resident memory and downstream analyses can call LazyTable.iter_chunks() to stream the rows. The caller is responsible for closing the table (preferably with a with block).
- Returns:
dict[str, pd.DataFrame] | LazyTable | None
- pyccapt.calibration.data_tools.data_tools.read_mat_files(filename: str | Path)[source]
Read a .mat file and return its dictionary contents.
- pyccapt.calibration.data_tools.data_tools.read_range(filename: str | Path) DataFrame[source]
Read saved range definitions from .h5, .rrng, or legacy .rng files.
- pyccapt.calibration.data_tools.data_tools.remove_invalid_data(dld_group_storage: DataFrame, max_tof: float) DataFrame[source]
Remove invalid TOF and detector rows and return the cleaned dataframe.
- pyccapt.calibration.data_tools.data_tools.save_data(data, variables, name=None, hdf=True, epos=False, pos=False, ato_6v=False, csv=False, temp=False, start_index=0, end_index=-1, save_tdc=False, save_range=False)[source]
Persist data in one or more supported export formats.
- Parameters:
save_tdc (bool, default False) – If True and hdf=True and variables.data_tdc was loaded, also write a filtered /tdc group into the same HDF5 file. The tdc rows are filtered so that only those whose linked dld row still exists in data are kept; “orphan” tdc rows (pulses that never produced a dld event in the first place) are always preserved.
save_range (bool, default False) – If True and hdf=True and variables.range_data is non-empty, also write the current range table under /range so the calibrated, raw, and ranging information all live in a single h5 file.
- pyccapt.calibration.data_tools.data_tools.save_range(variables) None[source]
Save range data as both HDF5 and CSV.
- pyccapt.calibration.data_tools.data_tools.store_df_to_csv(data: DataFrame, path: str | Path) None[source]
Store dataframe to CSV with project-default encoding and separator.
- pyccapt.calibration.data_tools.data_tools.store_df_to_hdf(dataframe, key, filename, *, format: str = 'fixed')[source]
Store dataframe to HDF5.
- Parameters:
dataframe – Pandas DataFrame to serialize.
key – HDF5 key (e.g. "df").
filename – Destination .h5 path.
format – PyTables format. "fixed" (default, fast full-file reads, single-write) or "table" (slightly slower but supports pd.read_hdf(..., iterator=True, chunksize=...) and is more permissive about mixed-dtype columns). Use "table" for big raw outputs you’ll later want to stream back without loading the whole file into RAM.
Supports both the modern argument order (dataframe, key, filename) and the legacy order (filename, dataframe, key) for backwards compatibility.
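One way to accept both argument orders is to inspect the type of the first positional argument. The sketch below illustrates that idea only. The helper name normalize_store_args is hypothetical, and the package may detect the legacy order differently.

```python
from pathlib import Path

import pandas as pd


def normalize_store_args(first, second, third):
    """Return (dataframe, key, filename) for either call order (illustrative)."""
    if isinstance(first, pd.DataFrame):
        # Modern order: (dataframe, key, filename)
        return first, second, third
    if isinstance(first, (str, Path)) and isinstance(second, pd.DataFrame):
        # Legacy order: (filename, dataframe, key)
        return second, third, first
    raise TypeError("expected (dataframe, key, filename) or (filename, dataframe, key)")
```

Checking for the DataFrame first keeps the modern call path free of any string inspection, so a key that happens to look like a path is never misread.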
pyccapt.calibration.data_tools.dataset_path_qt module
Qt file/directory pickers for the dataset-path Browse buttons.
PyQt6 is imported lazily inside the dialog functions so that headless CI environments (which don’t ship PyQt6 by default) can still import this module to read the filter constants – the dialog itself is only ever invoked from the Jupyter notebooks, where PyQt6 is present.
pyccapt.calibration.data_tools.merge_range module
- pyccapt.calibration.data_tools.merge_range.merge_by_range(data_df, range_df, full=False)[source]
Optimized merging function based on the ‘mc’ column value falling within the ‘mc_low’ and ‘mc_up’ range. Uses vectorized operations for performance.
- Parameters:
data_df (pd.DataFrame) – The dataframe containing the data to be merged.
range_df (pd.DataFrame) – The dataframe containing the range values ‘mc_low’ and ‘mc_up’.
full (bool) – If True, the merged dataframe will contain all columns from the range_df. Default is False.
- Returns:
The merged dataframe with the range data attached.
- Return type:
pd.DataFrame
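A vectorized range lookup of this kind can be sketched with numpy.searchsorted. The example is illustrative and is not the package's implementation. It assumes the ranges do not overlap and that both bounds are inclusive. The output column range_index is hypothetical and refers to positions in the ranges sorted by mc_low.

```python
import numpy as np
import pandas as pd


def merge_by_range_sketch(data_df: pd.DataFrame, range_df: pd.DataFrame) -> pd.DataFrame:
    """Tag each row with the index of the range containing its 'mc' (illustrative)."""
    out = data_df.copy()
    if range_df.empty:
        out["range_index"] = -1
        return out

    ranges = range_df.sort_values("mc_low").reset_index(drop=True)
    lows = ranges["mc_low"].to_numpy()
    ups = ranges["mc_up"].to_numpy()
    mc = data_df["mc"].to_numpy()

    # Index of the last range whose lower bound is <= mc (or -1 if none).
    idx = np.searchsorted(lows, mc, side="right") - 1
    has_candidate = idx >= 0
    safe = np.where(has_candidate, idx, 0)

    # The candidate range matches only if mc is also below its upper bound.
    inside = has_candidate & (mc <= ups[safe])
    out["range_index"] = np.where(inside, safe, -1)
    return out
```

The lookup costs O(n log m) for n rows and m ranges and avoids a Python-level loop over the rows.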
pyccapt.calibration.data_tools.plot_vline_draw module
pyccapt.calibration.data_tools.raw_data_surface_concept module
- pyccapt.calibration.data_tools.raw_data_surface_concept.find_consecutive_sequences(start_counter, channel, time_data, high_voltage, pulse, print_stats=False)[source]
Find the consecutive sequences of the start counter and the corresponding channels.
- Parameters:
start_counter – list of start counter values
channel – list of channel values
time_data – list of time data values
high_voltage – list of high voltage values
pulse – list of pulse values
print_stats – bool, print the statistics of the sequences
- Returns:
list of dictionaries containing the sequences
- Return type:
result
- pyccapt.calibration.data_tools.raw_data_surface_concept.find_consecutive_sequences_seperatly(start_counter, channel, time_data, high_voltage, pulse)[source]
Find the consecutive sequences of the start counter and the corresponding channels.
- Parameters:
start_counter – list of start counter values
channel – list of channel values
time_data – list of time data values
high_voltage – list of high voltage values
pulse – list of pulse values
- Returns:
result_4 – list of dictionaries containing the valid sequences of 4 channels
result_4_invalid – list of dictionaries containing the invalid sequences of 4 channels
result_3_invalid – list of dictionaries containing the invalid sequences of 3 channels
result_2_invalid – list of dictionaries containing the invalid sequences of 2 channels
result_1_invalid – list of dictionaries containing the invalid sequences of 1 channel
result_other_odd – list of dictionaries containing the sequences of odd length
result_other_even – list of dictionaries containing the sequences of even length
- pyccapt.calibration.data_tools.raw_data_surface_concept.find_nth_max_repeated_indices(nums, n)[source]
Find the start/end indices of the n-th longest repeated run (1-based).
- Parameters:
nums
n – 1 returns the longest run, 2 the second-longest, and so on.
Returns:
- pyccapt.calibration.data_tools.raw_data_surface_concept.iter_consecutive_sequences(start_counter, channel, time_data, high_voltage, pulse, show_progress=False)[source]
Generator version of find_consecutive_sequences().
Yields one pulse-record dict at a time instead of building the full list, so peak memory is O(1) records rather than O(N). The yielded dicts have the same schema as find_consecutive_sequences returns.
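The generator pattern can be sketched as below. This is an illustration of streaming runs of equal start-counter values one record at a time. The function name iter_runs_sketch and the dict keys are hypothetical and do not reproduce the schema used by the package.

```python
def iter_runs_sketch(start_counter, channel, time_data):
    """Yield one record per consecutive run of equal start_counter (illustrative)."""
    n = len(start_counter)
    if n == 0:
        return
    begin = 0
    for i in range(1, n + 1):
        # Close the current run at the end of input or when the counter changes.
        if i == n or start_counter[i] != start_counter[begin]:
            yield {
                "start_counter": start_counter[begin],
                "channels": list(channel[begin:i]),
                "time_data": list(time_data[begin:i]),
            }
            begin = i
```

A caller can consume the records in a for loop and discard each one after use, so only the current run is held in memory at any time.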
pyccapt.calibration.data_tools.run_dataset_path_qt module
pyccapt.calibration.data_tools.selectors_data module
- class pyccapt.calibration.data_tools.selectors_data.CircleSelector(ax, onselect=None, *, minspanx=0, minspany=0, useblit=False, props=None, spancoords='data', button=None, grab_range=10, handle_props=None, interactive=False, state_modifier_keys=None, drag_from_anywhere=False, ignore_event_outside=False, use_data_coordinates=False)[source]
Bases: RectangleSelector
Select a circular region of an Axes.
For the cursor to remain responsive you must keep a reference to it.
Press and release events triggered at the same coordinates outside the selection will clear the selector, except when ignore_event_outside=True.
Examples
/gallery/widgets/rectangle_selector
- pyccapt.calibration.data_tools.selectors_data.line_select_callback(eclick, erelease, variables)[source]
Callback function for line selection event.
- Parameters:
eclick (MouseEvent) – Event object representing the press event.
erelease (MouseEvent) – Event object representing the release event.
variables (object) – Object containing the variables.
- pyccapt.calibration.data_tools.selectors_data.onselect(eclick, erelease, variables)[source]
Callback function for the selection event.
- Parameters:
eclick (MouseEvent) – Event object representing the click event.
erelease (MouseEvent) – Event object representing the release event.
variables (object) – Object containing the variables.
Module contents
Data loading, cropping, and format conversion utilities.