Overview of IFCB raw data format

Joe Futrelle edited this page Nov 14, 2016 · 14 revisions

Files, filenames, and bins

IFCB collects data for a sample and stores it in three files:

  • The header file, containing metadata
  • The ADC file, containing non-image data
  • The ROI file, containing image data

File names include the date and time that the sample was run, and the instrument number identifying the IFCB that ran the sample.

Here's an example set of file names:

D20150314T203456_IFCB102.hdr
D20150314T203456_IFCB102.adc
D20150314T203456_IFCB102.roi

These three files contain all data for a sample run by IFCB 102 on March 14, 2015 at 20:34:56 UTC. The .hdr file contains metadata, the .adc file contains non-image data, and the .roi file contains image data.
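The naming convention can be decoded with a few lines of Python. The following is a minimal sketch, not part of pyifcb: the function name `parse_new_style` and the regular expression are assumptions based only on the example file names above.

```python
import re
from datetime import datetime, timezone

# Assumed pattern, inferred from names like D20150314T203456_IFCB102.hdr.
# The extension is optional so that a bare identifier also matches.
NEW_STYLE = re.compile(r"^D(\d{8}T\d{6})_IFCB(\d+)(?:\.(hdr|adc|roi))?$")


def parse_new_style(name):
    """Return (UTC timestamp, instrument number, extension or None)."""
    match = NEW_STYLE.match(name)
    if match is None:
        raise ValueError(f"not a new-style IFCB file name: {name!r}")
    stamp, instrument, extension = match.groups()
    # The timestamp in the file name is in UTC.
    timestamp = datetime.strptime(stamp, "%Y%m%dT%H%M%S").replace(tzinfo=timezone.utc)
    return timestamp, int(instrument), extension


timestamp, instrument, extension = parse_new_style("D20150314T203456_IFCB102.hdr")
print(timestamp.isoformat(), instrument, extension)
```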

To be easily accessible, all three files for each sample should be placed in the same directory, which is what IFCB does on its internal hard drive.
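When files have been copied off the instrument, it can be useful to check which file sets in a directory are complete. The sketch below works on a list of file names; the helper `complete_file_sets` is hypothetical and is not how pyifcb itself locates data.

```python
from collections import defaultdict
from pathlib import PurePath

# Every sample needs all three of these files.
REQUIRED = {".hdr", ".adc", ".roi"}


def complete_file_sets(names):
    """Return the sorted base names that have .hdr, .adc, and .roi files."""
    found = defaultdict(set)
    for name in names:
        path = PurePath(name)
        if path.suffix in REQUIRED:
            found[path.stem].add(path.suffix)
    return sorted(stem for stem, suffixes in found.items() if suffixes == REQUIRED)


names = [
    "D20150314T203456_IFCB102.hdr",
    "D20150314T203456_IFCB102.adc",
    "D20150314T203456_IFCB102.roi",
    "D20150314T205812_IFCB102.hdr",  # incomplete set: .adc and .roi are missing
]
print(complete_file_sets(names))
```

For a real directory, the names could come from `os.listdir` or `pathlib.Path.iterdir`.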

Terminology

These data files are called "raw" data because they contain the data collected by the instrument, rather than the results of processing that data in any way.

Because each set of three data files is a "bin" of target and image data, pyifcb uses the term "bin" to refer to a file set.

Targets and images

Non-image data is tabular data where each row represents a target. Data for each target includes fluorescence and scattering measurements along with technical information such as trigger number and image size. Not all targets are associated with an image, but each image is associated with a target. Targets are numbered consecutively in each file set starting with target number 1.

Images are not all the same size. Each image is a crop of a single video frame, and because it represents a region of interest in the frame, IFCB images are referred to as "regions of interest" or "ROIs".

Persistent identifiers (PIDs) and local identifiers (LIDs)

Every set of raw data files conceptually has a persistent identifier, or PID. This identifier is derived from the filename or, for web-accessible files, from the URL. A PID typically includes the raw data filename without its extension. For the example above, the PID is:

D20150314T203456_IFCB102

A PID in this form is also called a local identifier, or LID. For raw data access, the PID and LID are usually the same. For URL-based access, the LID includes the part of the URL after the final slash and before the extension.
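The rule for deriving a LID (take what follows the final slash and drop the extension) can be sketched as below. The function `lid_from` is a hypothetical helper, the `example.org` URL is a placeholder, and the code assumes POSIX-style paths; pyifcb's own PID handling may differ.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def lid_from(location):
    """Return the LID for a file name, a POSIX-style path, or a URL."""
    # For a plain file name or path, urlparse leaves the whole string in .path.
    path = urlparse(location).path
    # stem is the final path component with its last extension removed.
    return PurePosixPath(path).stem


print(lid_from("D20150314T203456_IFCB102.adc"))
print(lid_from("https://example.org/data/D20150314T203456_IFCB102.roi"))
```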

Caveats

Older file formats

Older pre-commercial IFCBs have a different file naming convention. Here's an example filename for one of these older IFCBs:

IFCB5_2016_032_012345.adc

This file would contain the non-image data for a sample taken by IFCB 5 on February 1, 2016 at 01:23:45 UTC.
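Judging from this single example, the older names appear to encode the instrument number, the year, the day of the year, and the time of day. The sketch below decodes a name under that assumption; the field layout, the pattern, and the function name `parse_old_style` are inferred from the example rather than taken from a specification.

```python
import re
from datetime import datetime, timezone

# Assumed layout: IFCB<instrument>_<year>_<day of year>_<HHMMSS>[.<extension>]
OLD_STYLE = re.compile(r"^IFCB(\d+)_(\d{4})_(\d{3})_(\d{6})(?:\.(hdr|adc|roi))?$")


def parse_old_style(name):
    """Return (UTC timestamp, instrument number, extension or None)."""
    match = OLD_STYLE.match(name)
    if match is None:
        raise ValueError(f"not an old-style IFCB file name: {name!r}")
    instrument, year, day_of_year, time_of_day, extension = match.groups()
    # %j parses the day of the year (001 to 366).
    timestamp = datetime.strptime(
        f"{year} {day_of_year} {time_of_day}", "%Y %j %H%M%S"
    ).replace(tzinfo=timezone.utc)
    return timestamp, int(instrument), extension


timestamp, instrument, extension = parse_old_style("IFCB5_2016_032_012345.adc")
print(timestamp.isoformat(), instrument, extension)
```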

Also, in some cases the data for one sample is split across two sets of three files. The timestamp on the second set then indicates when the split occurred rather than when the sample run started.

PIDs and LIDs for targets and products

PIDs can carry information beyond what is in the filename, including target numbers, extensions, and product names. These are typically not relevant for raw data access, but pyifcb provides access to them.