Improve ApplyImpulseResponse documentation

iver56 · Sep 7, 2023 · a3544a1
1 parent 498a7d4
commit a3544a1
Showing 7 changed files with 76 additions and 5 deletions.
diff --git a/demo/generate_examples_for_doc.py b/demo/generate_examples_for_doc.py
@@ -279,6 +279,27 @@ def generate_example(self):
         return sound, transformed_sound, sample_rate
 
 
+@register
+class ApplyImpulseResponseExample(TransformUsageExample):
+    transform_class = ApplyImpulseResponse
+
+    def generate_example(self):
+        random.seed(42)
+        np.random.seed(42)
+        transform = ApplyImpulseResponse(
+            ir_path=os.path.join(DEMO_DIR, "ir", "rir48000.wav"), p=1.0
+        )
+
+        sound, sample_rate = load_sound_file(
+            os.path.join(DEMO_DIR, "p286_011.wav"), sample_rate=None
+        )
+        sound = sound[..., int(0.5 * sample_rate) : int(2.9 * sample_rate)]
+
+        transformed_sound = transform(sound, sample_rate)
+
+        return sound, transformed_sound, sample_rate
+
+
 @register
 class LimiterExample(TransformUsageExample):
     transform_class = Limiter

diff --git a/demo/ir/rir48000.wav b/demo/ir/rir48000.wav
diff --git a/docs/waveform_transforms/ApplyImpulseResponse.webp b/docs/waveform_transforms/ApplyImpulseResponse.webp
diff --git a/docs/waveform_transforms/ApplyImpulseResponse_input.flac b/docs/waveform_transforms/ApplyImpulseResponse_input.flac
diff --git a/docs/waveform_transforms/ApplyImpulseResponse_transformed.flac b/docs/waveform_transforms/ApplyImpulseResponse_transformed.flac
diff --git a/docs/waveform_transforms/apply_impulse_response.md b/docs/waveform_transforms/apply_impulse_response.md
@@ -2,23 +2,73 @@
 
 _Added in v0.7.0_
 
-Convolve the audio with a randomly selected impulse response.
+This transform convolves the audio with a randomly selected (room) impulse response file.
+
+`ApplyImpulseResponse` is commonly used as a data augmentation technique to **add
+realistic-sounding reverb to recordings**, for example to make denoisers and speech
+recognition systems more robust to different acoustic environments as well as a range of
+different distances between the main sound source and the microphone. It could also be
+used to generate roomy audio examples for the training of dereverberation models.
+
+Convolution with an impulse response is a powerful technique in signal processing that
+can be employed to emulate the acoustic characteristics of specific environments or
+devices. This process can transform a dry recording, giving it the sonic signature of
+being played in a specific location or through a particular device.
+
+**What is an impulse response?** An impulse response (IR) captures the unique acoustical
+signature of a space or object. It's essentially a recording of how a specific
+environment or system responds to an impulse (a short, sharp sound). By convolving
+an audio signal with an impulse response, we can simulate how that signal would sound in
+the captured environment.
+
+Note that some impulse responses, especially those captured in larger spaces or from
+specific equipment, can introduce a noticeable delay when convolved with an audio
+signal. In some applications, this delay is a desirable property. However, in some other
+applications, the convolved audio should not have a delay compared to the original
+audio. If this is the case for you, you can align the audio afterwards with
+[fast-align-audio](https://github.com/nomonosound/fast-align-audio), for example.
+
 Impulse responses can be created using e.g. [http://tulrich.com/recording/ir_capture/](http://tulrich.com/recording/ir_capture/)
 
 Some datasets of impulse responses are publicly available:
-- [EchoThief](http://www.echothief.com/) containing 115 impulse responses acquired in a
+
+* [EchoThief](http://www.echothief.com/) containing 115 impulse responses acquired in a
  wide range of locations.
-- [The MIT McDermott](https://mcdermottlab.mit.edu/Reverb/IR_Survey.html) dataset
+* [The MIT McDermott](https://mcdermottlab.mit.edu/Reverb/IR_Survey.html) dataset
  containing 271 impulse responses acquired in everyday places.
 
 Impulse responses are represented as audio (ideally wav) files in the given `ir_path`.
 
+Also check that your IR files have the same sample rate as your audio inputs. If the
+sample rates differ, the internal resampling slows down execution, and some high
+frequencies may get lost.
+
+## Input-output example
+
+Here we make a dry speech recording quite reverberant by convolving it with a room impulse response.
+
+![Input-output waveforms and spectrograms](ApplyImpulseResponse.webp)
+
+| Input sound                                                                                 | Transformed sound                                                                                 |
+|---------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------|
+| <audio controls><source src="../ApplyImpulseResponse_input.flac" type="audio/flac"></audio> | <audio controls><source src="../ApplyImpulseResponse_transformed.flac" type="audio/flac"></audio> | 
+
+## Usage example
+
+```python
+from audiomentations import ApplyImpulseResponse
+
+transform = ApplyImpulseResponse(ir_path="/path/to/sound_folder", p=1.0)
+
+augmented_sound = transform(my_waveform_ndarray, sample_rate=48000)
+```
+
 ## ApplyImpulseResponse API
 
 [`ir_path`](#ir_path){ #ir_path }: `Union[List[Path], List[str], str, Path]`
 :   :octicons-milestone-24: A path or list of paths to audio file(s) and/or folder(s) with
     audio files. Can be `str` or `Path` instance(s). The audio files given here are
-    supposed to be impulse responses.
+    supposed to be (room) impulse responses.
 
 [`p`](#p){ #p }: `float` • range: [0.0, 1.0]
 :   :octicons-milestone-24: Default: `0.5`. The probability of applying this transform.
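
Note on the documentation text added above: the statement that convolving a signal with an impulse response simulates how the signal would sound in the captured environment can be illustrated with a small NumPy sketch. This is not the audiomentations implementation. The helper name `convolve_with_ir`, the synthetic two-tap impulse response, and the choice to truncate the output to the input length are all assumptions made for illustration.

```python
import numpy as np


def convolve_with_ir(signal: np.ndarray, ir: np.ndarray) -> np.ndarray:
    """Hypothetical helper: convolve a mono signal with an impulse response.

    The full convolution is len(signal) + len(ir) - 1 samples long. Truncating
    to the input length is a choice made for this sketch only.
    """
    wet = np.convolve(signal, ir, mode="full")
    return wet[: len(signal)]


sample_rate = 48000

# Dry signal: a single click (unit impulse) followed by silence, 0.1 s long.
dry = np.zeros(sample_rate // 10, dtype=np.float32)
dry[0] = 1.0

# Synthetic IR (an assumption, not a measured room): the direct sound plus
# one echo that arrives 10 ms later at half the amplitude.
delay = int(0.01 * sample_rate)
ir = np.zeros(int(0.02 * sample_rate), dtype=np.float32)
ir[0] = 1.0
ir[delay] = 0.5

wet = convolve_with_ir(dry, ir)

# The click now appears twice: once directly and once as the delayed echo.
echo_positions = np.flatnonzero(wet)
```

A measured room impulse response contains many such reflections with decaying amplitude, which is what produces the reverb described in the documentation. An IR whose first strong tap is not at index 0 shifts the whole output later in time, which is the delay that the documentation warns about.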

diff --git a/docs/waveform_transforms/high_pass_filter.md b/docs/waveform_transforms/high_pass_filter.md
@@ -3,7 +3,7 @@
 _Added in v0.18.0, updated in v0.21.0_
 
 Apply high-pass filtering to the input audio of parametrized filter steepness (6/12/18... dB / octave).
-Can also be set for zero-phase filtering (will result in a 6db drop at cutoff).
+Can also be set for zero-phase filtering (will result in a 6 dB drop at cutoff).
 
 # HighPassFilter API