Lab 3. Using Neural Nets on the UrbanSound Dataset

Update to get new lab files

In your terminal, change to the directory where you keep your workshop repository. Use git pull to get the new files.

XXXX:01_Spectrum Generation xxxx$ git pull

Git will warn you if pulling would overwrite any files you have changed. One simple way to make sure nothing gets overwritten is to rename your files first.

XXXX:01_Spectrum Generation xxxx$ mv Standard.SpecVar WendyStandard.SpecVar

Git will update the tracked files in the repository but leave your own files alone, as long as they don't share a name with files in the repository.

Install ipywidgets

To enable the pulldown menus in the new .ipynb files, install ipywidgets:

conda install -c conda-forge ipywidgets

The UrbanSound dataset

For this lab, we're repeating the process we used for Cats vs. Dogs, but using the UrbanSound dataset. If you would like, you have the option of retraining the whole network instead of just the last layer; this will take longer.

The Data

The UrbanSound Dataset contains 1302 labeled sound recordings of sound events from 10 classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music. The audio codec, sampling rate, bit depth, and number of channels are the same as those of the original file uploaded to Freesound (and hence may vary from file to file).

Take a little time to look at the number of files in each class, and listen to a few of them.
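
If you want to explore from Python, a minimal sketch like the one below works, assuming the class folders still live under UrbanSound/data as downloaded. The paths and the use of librosa here are assumptions for illustration, not part of the lab code:

```python
from pathlib import Path
import librosa

root = Path("UrbanSound/data")  # assumed download location

# Count the .wav files in each class folder
for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
    print(f"{class_dir.name}: {len(list(class_dir.glob('*.wav')))} files")

# Inspect one file; sampling rate and duration vary from file to file
example = next(root.glob("*/*.wav"))
y, sr = librosa.load(example, sr=None)  # sr=None keeps the native sampling rate
print(example.name, "-", sr, "Hz,", round(len(y) / sr, 2), "s")
```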

Process

As with Cats vs. Dogs, the process for performing classification is:

Organizing Data -> Generating Spectrums -> Training the Neural Network -> Running the Neural Net.

Organizing Data

The UrbanSound dataset comes with most of the data in a folder called data. To use the same notebook code that we used in Cats vs. Dogs, the folders in UrbanSound/data need to be moved up one level so that the directory tree looks like this:

.
├── Cats-Vs-Dogs
│   ├── Cats
│   └── Dogs
└── UrbanSound
    ├── air_conditioner
    ├── car_horn
    ├── children_playing
    ├── data
    ├── dog_bark
    ├── drilling
    ├── engine_idling
    ├── gun_shot
    ├── jackhammer
    ├── siren
    └── street_music
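
If you prefer to do the move from Python rather than by hand, a minimal sketch (assuming the download put the class folders in UrbanSound/data) would look like this:

```python
import shutil
from pathlib import Path

data = Path("UrbanSound/data")  # assumed download location

# Move each class folder out of data/ and up one level
for class_dir in data.iterdir():
    if class_dir.is_dir():
        shutil.move(str(class_dir), str(data.parent / class_dir.name))
```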

Generating Spectrums

The next step is to compute images from the audio data.

The notebook GeneratingSpectrums2.ipynb in the 01_Spectrum Generation folder will allow you to select which dataset in the 'AudioData' folder you wish to use.

GeneratingSpectrums2.ipynb also lets you select your Spectrum Variables (.SpecVar) file. You can add to Standard.SpecVar by opening SpectrumsSettingTool2.ipynb, experimenting with the spectrum settings, and saving them.
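
If you are curious what spectrum generation involves under the hood, here is a minimal sketch of turning one audio file into a mel-spectrogram image with librosa. The file name and parameter values are assumptions for illustration, not the notebook's actual settings:

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical input file; substitute any file from your dataset
y, sr = librosa.load("UrbanSound/dog_bark/example.wav", sr=None)

# Mel spectrogram on a decibel scale -- typical settings, not the lab's exact ones
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max)

# Save the spectrogram as an image the neural net can train on
fig, ax = plt.subplots()
librosa.display.specshow(S_db, sr=sr, ax=ax)
ax.set_axis_off()
fig.savefig("example_spectrogram.png", bbox_inches="tight", pad_inches=0)
plt.close(fig)
```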

Training the Neural Network

As before, we will be using the ResNet CNN.

Please open the notebook TrainingResNets2 in the folder 02_Training. This notebook has been updated to allow the selection of other GeneratedData sets of spectrograms, and it has more labels between cells to help you understand what is happening at different points in the code.

The training will take much longer than Cats vs. Dogs, especially if you enable training of the whole network instead of just the last layer.
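
For reference, the difference between the two options looks roughly like this in PyTorch. This is a sketch of the general transfer-learning pattern, not the notebook's actual code:

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")  # pretrained ResNet

# Option 1 (fast): freeze everything, then retrain only the last layer
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with one output per UrbanSound class;
# a freshly created layer has requires_grad=True, so only it will train
model.fc = nn.Linear(model.fc.in_features, 10)

# Option 2 (much slower): skip the freezing loop above, so every
# parameter keeps requires_grad=True and the whole network is fine-tuned
```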

Running the Neural Net (Inference)

Try out your neural net using ResNetInferenceInteractive in the folder 03_Running! Does it work as well as Cats vs. Dogs? Why or why not?

As promised, this version of the inference code provides more detail than ResNetInference on how the code responds to the predictions made by the neural net.
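
As a rough sketch, running a single spectrogram image through a trained model looks like this (the transform values and file name are assumptions, and `model` is the trained network from the previous step):

```python
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # the input size ResNet expects
    transforms.ToTensor(),
])

img = Image.open("example_spectrogram.png").convert("RGB")  # hypothetical file
batch = preprocess(img).unsqueeze(0)  # add a batch dimension

model.eval()  # disable dropout/batch-norm updates for inference
with torch.no_grad():
    logits = model(batch)
    predicted_class = logits.argmax(dim=1).item()
print("Predicted class index:", predicted_class)
```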

Train a NN with the Stanford Sounds Dataset

Try this again with the Stanford Sounds Dataset! Also, feel free to pad out that dataset with other sounds from Freesound.