Overview of IFCB raw data format
IFCB collects data for a sample and stores it in three files:
- The header file, containing metadata
- The ADC file, containing non-image data
- The ROI file, containing image data
File names include the date and time that the sample was run, and the instrument number identifying the IFCB that ran the sample.
Here's an example set of file names:
D20150314T203456_IFCB102.hdr
D20150314T203456_IFCB102.adc
D20150314T203456_IFCB102.roi
These three files contain all data for a sample run by IFCB 102 on March 14, 2015 at 20:34:56 UTC. The .hdr file contains metadata, the .adc file contains non-image data, and the .roi file contains image data.
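The naming convention above can be decoded with a short script. The following is a minimal sketch rather than pyifcb's own implementation: the pattern is inferred from the example file names, and the names `NEW_STYLE` and `parse_new_style` are hypothetical.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the example names (an assumption, not an official spec):
# "D", a timestamp, "_IFCB", the instrument number, and an optional extension.
NEW_STYLE = re.compile(r"^D(\d{8}T\d{6})_IFCB(\d+)(?:\.(hdr|adc|roi))?$")


def parse_new_style(name):
    """Return (UTC timestamp, instrument number) for a file name or base name."""
    match = NEW_STYLE.match(name)
    if match is None:
        raise ValueError(f"not a recognized IFCB file name: {name!r}")
    timestamp = datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")
    return timestamp.replace(tzinfo=timezone.utc), int(match.group(2))


timestamp, instrument = parse_new_style("D20150314T203456_IFCB102.hdr")
print(timestamp.isoformat(), instrument)  # 2015-03-14T20:34:56+00:00 102
```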
To be easily accessible, all three files for each sample should be placed in the same directory, which is what IFCB does on its internal hard drive.
These data files are called "raw" data because they contain the data collected by the instrument, rather than the results of processing that data in any way.
Because each set of three data files is a "bin" of target and image data, pyifcb uses the term "bin" to refer to a file set.
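To illustrate the idea of a bin as a file set, the sketch below groups file names by their shared base name and reports which sets have all three files. It does not use pyifcb; `RAW_EXTENSIONS`, `group_into_bins`, and `complete_bins` are hypothetical names chosen for this example.

```python
from collections import defaultdict
from pathlib import PurePath

# The three raw file types described above.
RAW_EXTENSIONS = {".hdr", ".adc", ".roi"}


def group_into_bins(file_names):
    """Map each shared base name to the set of raw extensions found for it."""
    bins = defaultdict(set)
    for name in file_names:
        path = PurePath(name)
        if path.suffix in RAW_EXTENSIONS:
            bins[path.stem].add(path.suffix)
    return dict(bins)


def complete_bins(file_names):
    """Return the sorted base names that have all three raw files."""
    grouped = group_into_bins(file_names)
    return sorted(stem for stem, exts in grouped.items() if exts == RAW_EXTENSIONS)


names = [
    "D20150314T203456_IFCB102.hdr",
    "D20150314T203456_IFCB102.adc",
    "D20150314T203456_IFCB102.roi",
]
print(complete_bins(names))  # ['D20150314T203456_IFCB102']
```

In practice the list of names could come from a directory listing, since all three files for a sample are expected to sit in the same directory.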
Non-image data is tabular data where each row represents a target. Data for each target includes fluorescence and scattering measurements along with technical information such as trigger number and image size. Not all targets are associated with an image, but each image is associated with a target. Targets are numbered consecutively in each file set starting with target number 1.
Images are not all the same size. Each image is a crop of a single video frame, and because it represents a region of interest in the frame, IFCB images are referred to as "regions of interest" or "ROIs".
Every set of raw data files conceptually has a persistent identifier, or PID. This identifier is derived from the filename or, in the case of web-accessible files, the URL. A PID typically consists of the raw data filename without its extension. For the example above, the PID is:
D20150314T203456_IFCB102
A PID in this form is also called a local identifier, or LID. For raw data access, the PID and LID are usually the same. For URL-based access, the LID includes the part of the URL after the final slash and before the extension.
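The rule for deriving a LID can be sketched as follows. This is an illustration only, not pyifcb's API: `local_id` is a hypothetical helper, the `example.org` URL is made up, and the sketch assumes POSIX-style paths with forward slashes.

```python
import posixpath
from urllib.parse import urlparse


def local_id(path_or_url):
    """Return the part after the final slash and before the extension."""
    # urlparse leaves a plain file name or path unchanged in .path,
    # and drops the scheme and host from a URL.
    path = urlparse(path_or_url).path
    base_name = posixpath.basename(path)
    stem, _extension = posixpath.splitext(base_name)
    return stem


print(local_id("D20150314T203456_IFCB102.adc"))
print(local_id("https://example.org/data/D20150314T203456_IFCB102.roi"))
# Both calls print: D20150314T203456_IFCB102
```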
Older pre-commercial IFCBs have a different file naming convention. Here's an example filename for one of these older IFCBs:
IFCB5_2016_032_012345.adc
This file would contain the non-image data for a sample taken by IFCB 5 on February 1, 2016 at 01:23:45 UTC.
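The older convention can be decoded in a similar way. The sketch below assumes, based only on the example above, that the fields are instrument number, four-digit year, three-digit day of year, and time as HHMMSS; `OLD_STYLE` and `parse_old_style` are hypothetical names.

```python
import re
from datetime import datetime, timezone

# Field layout inferred from the single example above (an assumption):
# IFCB<instrument>_<year>_<day of year>_<HHMMSS>, with an optional extension.
OLD_STYLE = re.compile(r"^IFCB(\d+)_(\d{4})_(\d{3})_(\d{6})(?:\.(hdr|adc|roi))?$")


def parse_old_style(name):
    """Return (UTC timestamp, instrument number) for an old-style name."""
    match = OLD_STYLE.match(name)
    if match is None:
        raise ValueError(f"not a recognized old-style IFCB file name: {name!r}")
    instrument, year, day_of_year, time_of_day = match.group(1, 2, 3, 4)
    # %j parses the day of the year (001-366), so 032 becomes February 1.
    timestamp = datetime.strptime(
        f"{year} {day_of_year} {time_of_day}", "%Y %j %H%M%S"
    )
    return timestamp.replace(tzinfo=timezone.utc), int(instrument)


timestamp, instrument = parse_old_style("IFCB5_2016_032_012345.adc")
print(timestamp.isoformat(), instrument)  # 2016-02-01T01:23:45+00:00 5
```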
Also, in some cases the data for one sample is split into two sets of three files; the timestamp on the second set indicates when the split occurred rather than when the initial sample run started.
PIDs can contain information above and beyond what is in the filename. They can include target numbers, extensions, and product names. These are typically not relevant for raw data access, but pyifcb provides access to them.