
Files and directories

Joe Futrelle edited this page Jun 22, 2017 · 13 revisions

Overview

pyifcb provides several ways to access and locate IFCB data files.

Accessing files

If you know the pathnames of a set of three raw data files (or the pathname of any one of them), you can access the data from the files using the open_raw function. For example:

import ifcb

PATHNAME = '/mnt/ifcb/data/D20150101T123456_IFCB102.adc'

sample_bin = ifcb.open_raw(PATHNAME)
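The three raw files of a bin share one base name and differ only in extension. As a minimal sketch, assuming the usual IFCB extensions .adc, .hdr, and .roi, the hypothetical helper below (sibling_paths is not part of pyifcb) derives all three pathnames from any one of them:

```python
import os

# Assumption: the three raw files of a bin use these extensions.
RAW_EXTENSIONS = ('.adc', '.hdr', '.roi')

def sibling_paths(pathname):
    """Hypothetical helper: map each raw extension to its pathname."""
    base, ext = os.path.splitext(pathname)
    if ext not in RAW_EXTENSIONS:
        raise ValueError('not a raw data file: {}'.format(pathname))
    # Same base name, one entry per raw extension.
    return {e: base + e for e in RAW_EXTENSIONS}

paths = sibling_paths('/mnt/ifcb/data/D20150101T123456_IFCB102.adc')
```

This only illustrates the naming relationship; open_raw already accepts the pathname of any one of the files, so you do not need such a helper to open a bin.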

open_raw also works as a context manager, which is the recommended way to use it if you are going to access image data.

with ifcb.open_raw(PATHNAME) as sample_bin:
    ...

For more on how to use the context manager support, see Opening and closing bins.

Directory structure

An IFCB data collection that includes many samples comprises a large number of files. Keeping all of these files in a single directory can become inconvenient, so pyifcb provides ways of accessing the files of a set of sample bins even when they are organized into a directory hierarchy.

pyifcb assumes that directory hierarchies are organized by date and time. The best practice is to use directory names that are prefixes of the names of the files they contain. For example, a directory called D2016 should contain only data from 2016 (i.e., files whose names start with D2016). Inside that directory there could be a directory called D20161020, which would contain only data from October 20, 2016 (i.e., files whose names start with D20161020). pyifcb does not require this year/day layout: you could also organize files by year alone, by year/month/day, or even by year/month/day/hour.
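The prefix convention can be sketched in a few lines of plain Python. Both helpers below are hypothetical (they are not part of pyifcb), and the example LID is made up for illustration. The first builds the directory for a bin from prefixes of its name; the second checks that every directory name in a relative path is a prefix of the filename.

```python
import posixpath

def bin_directory(root, lid, prefix_lengths=(5, 9)):
    """Hypothetical helper: directory for a bin in a prefix-named hierarchy.

    prefix_lengths=(5, 9) gives a year/day layout such as D2016/D20161020;
    (5, 7, 9) would give year/month/day.
    """
    parts = [lid[:n] for n in prefix_lengths]
    return posixpath.join(root, *parts)

def follows_prefix_convention(relative_path):
    """True if each directory name is a prefix of the filename."""
    parts = relative_path.split('/')
    filename = parts[-1]
    return all(filename.startswith(d) for d in parts[:-1])

lid = 'D20161020T123456_IFCB102'  # made-up example LID
year_day = bin_directory('/mnt/ifcb/data', lid)
year_month_day = bin_directory('/mnt/ifcb/data', lid, (5, 7, 9))
ok = follows_prefix_convention('D2016/D20161020/' + lid + '.adc')
bad = follows_prefix_convention('D2015/D20161020/' + lid + '.adc')
```

Here year_day is /mnt/ifcb/data/D2016/D20161020, and bad is False because a file from 2016 sits under a D2015 directory.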

For small data collections, organizing files into directories is usually overkill.

If you know the LID of a file set (its identifier, such as D20150101T123456_IFCB102) that is located somewhere in a directory structure, you can access it using DataDirectory:

import ifcb

data_dir = ifcb.DataDirectory('/mnt/ifcb/data')

sample_bin = data_dir['D20150101T123456_IFCB102']

If you want to access all the file sets in a data directory, you can iterate over a DataDirectory:

for sample_bin in data_dir:
    number_of_images = len(sample_bin.images)
    lid = sample_bin.lid
    print('{} has {} image(s)'.format(lid, number_of_images))

Note that if you are iterating over the bins in a data directory and want to access images from each bin, it is still best to use the context manager interface for each bin. For example, the following computes the average intensity of every image in every sample in a data directory:

import numpy as np

for sample_bin in data_dir:
    lid = sample_bin.lid
    with sample_bin:
        for roi_number in sample_bin.images:
            avg_intensity = np.mean(sample_bin.images[roi_number])
            print('{} ROI #{} has average intensity {}'.format(lid, roi_number, avg_intensity))