[ENHANCEMENT] Default Modulators #205

Open
spessasus opened this issue Jul 29, 2024 · 11 comments

Comments

@spessasus
Contributor

spessasus commented Jul 29, 2024

EDIT

New version of the proposal is here:
https://github.com/spessasus/soundfont-proposals/blob/main/default_modulators.md

@spessasus
Contributor Author

I've created an MD file with the proper proposal:
https://github.com/spessasus/soundfont-proposals/blob/main/default_modulators.md

@derselbst

Just to clarify: The proposal suggests that DMOD contains modulators that would be applied on instrument level only? Not on preset level?

@spessasus
Contributor Author

Just to clarify: The proposal suggests that DMOD contains modulators that would be applied on instrument level only? Not on preset level?

I worded that poorly. It just works exactly like the stock SF2 default modulators list. I've changed the MD now.

@derselbst

Some details are still not clear to me. (Sorry for abusing this issue as a discussion.)

Current state as per SF-spec

The SF Spec essentially defines 3 levels of modulators:

  1. Default modulators dictated by the SF Spec. These are applied to every instrument. By any SF2 synth. Always.
  2. Instrument level modulators.
  3. Preset level modulators.

Number 3. can override 2., and 2. can override 1. How that works in detail is clearly defined by the spec.


When reading your proposal, the role of the DMOD chunk is unfortunately not yet clear to me.

One possible interpretation: You're trying to introduce an additional level of modulators between 1. and 2.

Another possible interpretation: DMOD chunk shall fully replace or supersede modulators mentioned in no. 1.

Yet another possible interpretation: The DMOD chunk shall only serve for informational purposes to Soundfont editors, allowing them to more cleanly separate "general-apply-to-all-instrument"-modulators, and "instrument-specific" modulators.

The section "Default modulator behavior" is not really helpful, because it doesn't add anything beyond what the spec already says (whether you remove a zero-amount modulator or keep the amount at zero is IMO just an implementation detail). The final sentence

The default modulator list is altered at load time and then it acts exactly like the default SF2 modulator list.

suggests that my third interpretation is unlikely, but it doesn't help me to decide between my first or second interpretation.

So I think you need to update the description of the DMOD chunk, by pointing out its role, its purpose, and how it differentiates itself from the default modulators dictated by the SF spec (no. 1). The "stakeholder" of this chunk must become clear, i.e. is it just SoundFont editor apps, or do synths also have to account for it?

And a final remark: Please keep in mind that SF2 is an old yet well-established standard. Things will not change. Implementations will not change. If you want this feature to be generally accepted and usable, this customization must be backward-compatible. By that I mean that applying, i.e. structurally saving, those DMOD modulators to each and every instrument zone in IMOD for backward compatibility reasons should be seriously considered, IMO.

@spessasus
Contributor Author

spessasus commented Sep 19, 2024

@derselbst, what I meant is replacing the level 1 modulators.

Essentially, with or without DMOD chunk, the default SF2 modulators are there. When the DMOD chunk is read, the modulators are added to that list, and the identical ones override the sf2 default ones.

For example, assume a DMOD chunk of 2 modulators:

  • MIDI CC 1 to vibratoToPitch, linear unipolar positive, no controller, amount 100.
  • Poly Pressure to vibratoToPitch, linear unipolar positive, no controller, amount 50.

The default modulators for this soundfont will be:

  • Velocity to attenuation (unchanged)
  • Velocity to filter (unchanged, though some synths disable that, like mine and yours)
  • Channel pressure to vibrato (unchanged)
  • Volume to attenuation (unchanged)
  • Expression to attenuation (unchanged)
  • Pan to pan (unchanged)
  • Pitch wheel by pitch wheel range to initialPitch or fineTune (unchanged)
  • reverb to reverb (unchanged)
  • chorus to chorus (unchanged)
  • Mod wheel to vibrato will change the amount from the default 50 cents to 100 cents, since the DMOD modulator is identical to it, overriding its amount.
  • A new modulator, poly pressure to vibrato, 50 cents depth.

I hope this clears things up.
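The merge described above can be sketched roughly as follows. This is a minimal illustration, not code from any existing synth: the `Mod` struct, `mod_identical` and `merge_dmod` are hypothetical names, and the sketch assumes that two modulators count as "identical" when source, destination, amount source and transform all match (amount excluded).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory modulator record; the fields follow the usual
   SF2 modulator layout, but the names are illustrative only. */
typedef struct {
    uint16_t src;       /* source controller and curve flags */
    uint16_t dest;      /* destination generator */
    int16_t  amount;    /* modulation amount */
    uint16_t amt_src;   /* amount source (0 = no controller) */
    uint16_t transform; /* transform type */
} Mod;

/* Assumption: identity ignores the amount and compares everything else. */
static int mod_identical(const Mod *a, const Mod *b) {
    return a->src == b->src && a->dest == b->dest &&
           a->amt_src == b->amt_src && a->transform == b->transform;
}

/* Merge n DMOD entries into list[] (count entries used, cap entries total).
   An identical entry has its amount overridden; anything else is appended
   while there is room. Returns the new entry count. */
size_t merge_dmod(Mod *list, size_t count, size_t cap,
                  const Mod *dmod, size_t n) {
    for (size_t i = 0; i < n; i++) {
        size_t j;
        for (j = 0; j < count; j++) {
            if (mod_identical(&list[j], &dmod[i])) {
                list[j].amount = dmod[i].amount; /* override the default */
                break;
            }
        }
        if (j == count && count < cap) {
            list[count++] = dmod[i];             /* new default modulator */
        }
    }
    return count;
}
```

With the example above, the mod wheel entry would keep its slot and only change its amount, while the poly pressure entry would be appended at the end of the list.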

@derselbst

Ok, thanks for the example. So it turns out my first interpretation is the correct one. The purpose of DMOD is to introduce an extra level of modulators. This leads to the following modulator hierarchy:

  1. Default modulators dictated by the SF Spec. These are applied to every instrument. By any SF2 synth. Always.
  2. Default modulators defined in a specific SoundFont file, and only applied in the scope of that particular file.
  3. Instrument level modulators.
  4. Preset level modulators.

Yet, for best compatibility, I would again recommend saving all the modulators of the DMOD chunk to each and every instrument zone in the IMOD chunk. Old synthesizers would then play the soundfont correctly, while Soundfont editors could more easily recognize file-specific default modulators.

IIRC, Polyphone does already go through all modulators in IMOD to figure out which are meant as default ones. So, if my last backward compatibility idea is considered, one might raise the question what the added value of a DMOD chunk would be. If it is not considered (since that is essentially "The Problem" you're trying to solve with DMOD) the question remains if this solution would be adopted by a range of SF2 implementations such that it ultimately finds acceptance by the users. I'm having twisted minds here...

@spessasus
Contributor Author

the question remains if this solution would be adopted by a range of SF2 implementations such that it ultimately finds acceptance by the users. I'm having twisted minds here...

Well, that's what I'm hoping for. This is essentially an extension like the vorbis sf3 extension. Many players (like meltysynth, SF2Lib or tinysoundfont, for example) don't support it at all. Actually, I know only 3 players with sf3 support: fluid, bass, and spessasynth...

Since most people probably only use BASS or fluid anyway, these three (poly, fluid, bass) implementing this should be enough for it to become widely adopted. After all, MuseScore invented the sf3 format, and since MuseScore is popular, the format became widely supported.

And that's what I would like to happen. The three major sf2 tools supporting this chunk would make other players add support for that. It's also what I hope will happen with the SF2 RMIDI format, but that's unrelated.

Maybe we could only use DMOD with wMajor set to 3? Since the sf3 format can contain uncompressed samples, it will act like a regular sf2, but since wMajor is 3, it automatically rules out incompatible synths.

@davy7125
Owner

Sorry for not being more reactive on this interesting subject. I had quite a lot of work with the previous version of Polyphone and I am currently busy with various life projects...

So globally there is a need for a soundfont format update and I completely agree with this. Some years ago I started this (based on the sfz capabilities and the different user feedbacks):
https://github.com/davy7125/soundfont-standard-v3

And now I am discovering:
https://github.com/SFe-Team-was-taken/SFe

And I still need to correctly understand this ticket:
#179

Aside from this, MuseScore created the sf3 format, which is sf2 with compressed sample data, as you know well. From my side, I recently added the "release mode" for samples inside an instrument, so that playback starts when the key is released (this is a personal wish since I use soundfonts to play organ). Viena (SynthFont) also added a property, but I don't remember it well (something like the velocity modifying the attack).

This context shows that the different actors should agree on a common target. My position is that I will show no resistance to upgrading the format, but I unfortunately lack the time to manage the whole process, and I am also a bit afraid of implementing updates and forcing others to follow the movement, thus creating tensions.

Now, back to the default modulator subject: Polyphone could display them when clicking on the soundfont header, as you proposed by email. This is a very good idea, so that we know exactly how a soundfont is played. It would be possible to change the default modulators without changing the 2.04 format, though, with extra processing for displaying the content of a soundfont within Polyphone. If all instruments have all default modulators defined as their first modulators, Polyphone can gather all common instrument modulators and then display them at the soundfont level instead of the instrument level. Other soundfont editors would however display all modulators for each instrument, and it might be harder to distinguish the default modulators from the others, but... this may not be that important if the use of Polyphone is kept. This system has the advantage of staying supported by all soundfont readers, and I thus have the same recommendation as @derselbst: changing the sf2 format for this particular purpose is not needed.

The other, proper way is to update the soundfont format as you propose, but I need a common, well-specified target (including other upgrade needs) with at least the agreement of the fluidsynth team. Maybe we should still use the .sf extension and simply increase the internal version number while progressively supporting more chunks and more sound properties?
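The "gather all common instrument modulators" step could look roughly like the sketch below. It is only an illustration under simplifying assumptions: modulators are modelled per instrument rather than per zone, a modulator counts as common only if every field (including the amount) matches, and `Mod`, `Instrument` and `common_mods` are hypothetical names, not Polyphone code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical modulator record (illustrative field names). */
typedef struct {
    uint16_t src, dest;
    int16_t  amount;
    uint16_t amt_src, transform;
} Mod;

/* Simplification: one flat modulator list per instrument. */
typedef struct {
    const Mod *mods;
    size_t     count;
} Instrument;

static int same_mod(const Mod *a, const Mod *b) {
    return a->src == b->src && a->dest == b->dest &&
           a->amount == b->amount &&
           a->amt_src == b->amt_src && a->transform == b->transform;
}

static int has_mod(const Instrument *inst, const Mod *m) {
    for (size_t i = 0; i < inst->count; i++) {
        if (same_mod(&inst->mods[i], m)) return 1;
    }
    return 0;
}

/* Copies into out[] (capacity cap) every modulator of the first instrument
   that also appears in all other instruments. Returns how many were found. */
size_t common_mods(const Instrument *insts, size_t n, Mod *out, size_t cap) {
    size_t found = 0;
    if (n == 0) return 0;
    for (size_t i = 0; i < insts[0].count && found < cap; i++) {
        const Mod *m = &insts[0].mods[i];
        int everywhere = 1;
        for (size_t k = 1; k < n && everywhere; k++) {
            everywhere = has_mod(&insts[k], m);
        }
        if (everywhere) out[found++] = *m;
    }
    return found;
}
```

An editor could then show the returned list at the soundfont level and hide those entries in each instrument's view.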

@spessasus
Contributor Author

Thanks for this response, Davy.

About the Default modulators:

Derselbst's approach

It relies on Polyphone grouping modulators from all instrument zones into default modulators

pros

  • all synthesizers that support modulators will work fine

cons

  • using a soundfont editor other than polyphone will mess up default modulators

Spessasus's approach

It relies on a custom DMOD chunk

pros

  • Other soundfont editors will not mess up the default modulators

cons

  • Requires a synthesizer that supports the DMOD chunk

To be fair, if Polyphone remains the dominant sf editor (which it probably will), option 1 might be the best approach. If that gets implemented, I'll remove the proposal from the repo.

Stgiga's wBank proposal

Here's how I understand it:

SF2 always had a problem of lacking support for the bank LSB message, making it incompatible with XG and GM2 bank selection systems. Some synthesizers include hacks to circumvent this, like reacting to bank MSB as a drum toggle and LSB as the soundfont's bank select. But this solution isn't perfect.

What #179 discovers is that the wBank field in the preset selection is a WORD. This means that there are two bytes used, despite only one being needed for storing bank select.

So what stgiga suggested was to use the top byte of the wBank field as the bank select LSB and the bottom one as the MSB:

// storing bank LSB 60 and MSB 5 as 7-bit fields: (60 << 7) | 5 = 7685
int wBank = 7685;
char bankMSB = wBank & 127; // low 7 bits: 5
char bankLSB = wBank >> 7;  // remaining high bits: 60

And that's it. The drum toggle means that bank 128 (either one) still means a drum channel.
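For completeness, here is a small sketch of the packing direction that matches the arithmetic in the snippet above (7-bit fields, LSB shifted left by 7). The function names are hypothetical. Note that a value of 128 does not fit in seven bits, so this sketch simply rejects it; how the drum bank 128 would be stored under this layout is left open here.

```c
#include <stdint.h>

/* Pack two 7-bit bank select values into one 16-bit wBank word.
   Returns 1 on success, or 0 (leaving *out untouched) if a value
   does not fit in 7 bits. */
int pack_bank(uint8_t msb, uint8_t lsb, uint16_t *out) {
    if (msb > 127 || lsb > 127) return 0;
    *out = (uint16_t)((lsb << 7) | msb);
    return 1;
}

/* Low 7 bits carry the bank select MSB. */
uint8_t bank_msb(uint16_t wBank) {
    return (uint8_t)(wBank & 127);
}

/* The next 7 bits carry the bank select LSB. */
uint8_t bank_lsb(uint16_t wBank) {
    return (uint8_t)((wBank >> 7) & 127);
}
```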

I hope this helps, @davy7125

@spessasus
Contributor Author

These two proposals achieve the most needed features (default modulators and bank LSB) without needing a new format or file structure.

So adding them as soon as possible would extend the life of the sf2 format while giving sf v3 (or SFe) time to develop.

@sylvia-leaf

sylvia-leaf commented Nov 13, 2024

Hello! My name is Sylvia. I am the lead developer of the SFe standard that you linked here. I've got a silver badge on the Polyphone forums but haven't been posting there for a while due to mental health issues.

However even so, I'm very happy to help you understand what's going on with the soundfont enhancement projects that we (and other people) have been working on!


Default modulators

The first thing that we're talking about is the default modulator issue, right? Well, Spessasus's solution involves adding a DMOD subchunk. This allows the bank developer to define a few modulators that apply to all instruments and/or presets in the bank. This can make life much easier for the bank developer, because they won't need to define the same modulator multiple times. This also solves the problems with the default modulator system found in legacy SF2.0x, for example "Velocity -> Filter Cutoff".

I'm not too familiar with the situation that Derselbst has suggested, but from Spessasus's summary of the solutions, it seems that they suggest that Polyphone would "intelligently" detect modulators that are in all instrument zones, and then list them as additional "default modulators" that can be added to or removed. The main disadvantage listed is that it would break if edited with a non-Polyphone editor. However, because non-Polyphone editors (such as Viena or Swami) remain popular, I don't think that it's an acceptable tradeoff.

The last thing that we want is proprietary extensions that only work properly with one soundfont editor or player; this has happened already with other features. By formally defining custom default modulators, we can prevent this issue. Therefore, I'm giving my support to spessasus's DMOD subchunk proposal. We can of course use both strategies; the "intelligent" default modulator detection maximises compatibility with legacy players, while the DMOD subchunk reduces modulator complexity and simplifies modulator parsing for programs that implement the feature. If we can get fluidsynth to adopt it then it would likely be good enough!

As an aside, we've seen other proposals for similar "intelligent" features that would auto-detect when data is formatted in a particular way, but none of these features has seen much success. One of these was a way to randomise samples. We were evaluating "intelligent" features, but we concluded that formally defined structures will always be a better solution than attempting to "unpick" implicitly-defined data.

Ultimately, it depends on whether or not you want to implement this chunk. If you don't think that there is enough use for a DMOD chunk, then we can just not implement it.


Two bank selects on one file

It looks like another thing is the bank select LSB implementation. When stgiga and I were looking through SFSPEC24.PDF, we noticed that the value used for banks (wBank) was a 16-bit value (WORD) instead of an 8-bit value (BYTE/CHAR), as spessasus said. Therefore, we concluded that it would be possible to achieve the 16384 bank system consisting of both bank select MSB and LSB without significant modifications to the SF format.

Spessasus explains very well that the used (in SF2.04) byte of the wBank would be used as the MSB value and the unused (in SF2.04) byte the LSB value. Because the unsigned WORD value can be up to 65535, we can easily represent any of the 16384 combinations of bank select MSB and LSB using just the wBank. No extra fields need to be declared.

The elegant property of this solution that spessasus didn't mention is that to an SF2.04 player, any bank number above 128 is ignored. In other words, the unused byte is ignored. Therefore, banks that use LSB bank selects are completely transparent to legacy players that don't support the feature. Additional programs could also be used to remove such unused presets to reduce preset generator usage.
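The byte-wise reading described in this section can be sketched as follows, assuming the low byte of wBank holds the MSB (the byte already used in SF2.04) and the high byte holds the LSB. All function names are hypothetical, and `legacy_player_sees` only models the claim above that a legacy player ignores bank numbers above 128; it is not taken from any real player.

```c
#include <stdint.h>

/* Assumed layout: high byte = bank select LSB, low byte = bank select MSB. */
uint16_t pack_bank_bytes(uint8_t msb, uint8_t lsb) {
    return (uint16_t)((lsb << 8) | msb);
}

uint8_t low_byte_msb(uint16_t wBank) {
    return (uint8_t)(wBank & 0xFF);
}

uint8_t high_byte_lsb(uint16_t wBank) {
    return (uint8_t)(wBank >> 8);
}

/* Hypothetical model of a legacy player: only banks 0..128 are reachable,
   so any preset with a nonzero high byte is never selected. */
int legacy_player_sees(uint16_t wBank) {
    return wBank <= 128;
}
```

Under this layout any preset with a nonzero LSB has a wBank of at least 256, so it falls outside the range a legacy player would look at, while presets with LSB 0 keep their SF2.04 bank numbers.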

User interface modifications in Polyphone for this feature would be very simple. All you would have to do is to split the bank field into two bank fields, "Bank (MSB)" and "Bank (LSB)". The preset listings would be slightly modified from "Bank:Preset" to "MSB:LSB:Preset".


File extensions

This is something that I've not got one single answer on. While keeping the .SF2 file extension for updated versions of the format is simple and doesn't need any effort, it may mislead an end-user into attempting to play an enhanced SF bank on a non-enhanced player. This simply causes complaints that the bank developer must address.

With the current loose definition of legacy SF2.0x, there is currently a problem of bank developers saying that their bank only runs on one player (for example fluidsynth or bassmidi). All keeping the .SF2 file extension does is exacerbate this issue. Therefore, we suggest using a different file extension unless the features included are completely backwards compatible with legacy players, i.e. the lack of these features does not affect the rest of the bank.

The file extension would be selectable. If the features used by the bank are completely backwards compatible, then the extension .SF2 can be used, otherwise the new extension would be used.


Internal version number

As I've mentioned before, I propose that our enhancements start with the wMajor version of 4. The wMajor version of 3 used to indicate WernerSF3 may not be sufficient, as there may be a program that implements the SF3 format compression, but can't use any enhanced features that we come up with.

You mentioned that we can keep one extension but increase the internal version number when extra fields, chunks and subchunks are added. This is a good idea; we might want to add some more chunks to the "hydra structure", but this is disallowed by legacy SF2.04. However, SFSPEC24.PDF never states that we can't make such changes at all. My understanding of what SFSPEC24.PDF says is that as long as the wMajor value is changed, we can do whatever the **** we want with the structure as long as the ifil version is still formatted in the same way. If we do so, then this will have to be combined with a different file extension.
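As a rough illustration of how a reader could branch on the ifil version, the sketch below parses the two little-endian 16-bit words of the ifil payload (wMajor, wMinor) and classifies the file. The type and function names are hypothetical, and treating wMajor 4 as "enhanced" reflects the proposal in this comment, not an adopted standard.

```c
#include <stdint.h>

typedef struct {
    uint16_t major;
    uint16_t minor;
} SfVersion;

typedef enum {
    FMT_LEGACY_SF2,      /* wMajor == 2 */
    FMT_SF3_COMPRESSED,  /* wMajor == 3, compressed-sample variant */
    FMT_ENHANCED,        /* wMajor == 4, as proposed in this thread */
    FMT_UNKNOWN
} SfFormat;

/* The ifil payload is two little-endian 16-bit words: wMajor, wMinor. */
SfVersion parse_ifil(const uint8_t data[4]) {
    SfVersion v;
    v.major = (uint16_t)(data[0] | (data[1] << 8));
    v.minor = (uint16_t)(data[2] | (data[3] << 8));
    return v;
}

SfFormat classify_version(SfVersion v) {
    switch (v.major) {
    case 2:  return FMT_LEGACY_SF2;
    case 3:  return FMT_SF3_COMPRESSED;
    case 4:  return FMT_ENHANCED;
    default: return FMT_UNKNOWN;
    }
}
```

A player that gets `FMT_UNKNOWN` or a format it does not implement could refuse the file or warn the user, which is the behaviour the version bump is meant to enable.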

My proposal for this is that we just keep the 32-bit version of any enhanced SF format mostly compatible with SF2.04, with a subset being fully compatible. Structural changes that aren't compatible with SF2.04 are limited to 64-bit banks, which is something that stgiga has been researching for a long time.


A "common well-specified target"

Guess what? We've got this common well-specified target that you request. It's called SFe! The name is short for "SF enhanced". I'll start by saying that it seems that there was a misunderstanding about the purpose of SFe. It is not a competitor to anyone's proposals, but rather a way to combine as many of the known SF extensions together to create a complete specification that if followed, will allow programs to be compatible with any bank that may use existing extensions.

The current draft specification should be sent to FluidSynth for approval, and if we can make the necessary changes to make the specification implementable in a practical amount of time, then we can promote the draft specification to a final specification.

However, before the final specification can be released, FluidSynth would need to complete the Werner SF3 specification, as it is an integral part of SFe.

The initial version of the SFe specification, 4.00, includes many things, including some topics that are beyond what's discussed here:

  • ifil versioning rules
    • wMajor=2 and wMajor=3 are still planned to get some use
  • isng rules
    • beyond EMU8000, so we can assume different sound engine parameters
  • UTF-8 support
    • many of the text string fields in SF2.04 were designed to work with ASCII
    • however, it is simple to switch to the UTF-8 format
    • this allows users to use kana/CJK characters (kanji) in these fields
  • an extra ISFe chunk in the info-chunk
    • including everything that's in SFe to make it easier to tell if something is an enhanced feature
    • also includes feature flags so the player can communicate to the end-user what features are supported
    • programs can warn the user if they're not able to play the bank with 100% accuracy
    • more features will come up
    • right now, the DMOD subchunk is not in this sub-chunk, but we are thinking of moving it into the sub-chunk
  • WernerSF3 compression
    • initially OGG only but with more compression formats soon
    • according to WernerSF3 draft specification, any compression format can be used
    • examples of other formats in the future that could be supported include FLAC, OPUS or BWTC32Key
    • proprietary compression formats are forbidden, but read-only support and conversion to WernerSF3 is permitted
  • sm32 sub-chunk
    • 32-bit samples likely aren't going to be used in 32-bit banks, but will be good for 64-bit banks
  • 8-bit samples
    • if only the sm24 subchunk is used, then samples can be stored in 8-bit
  • Bank select LSB support
    • something that we mentioned here
    • there are some things related to the unused bit of wPreset that will be added soon (planned for version 4.04)
  • Program specification
    • well-defined guidelines for program developers to meet
  • Compatibility specification
    • information list to describe the differences between SFe and legacy SF2.04
  • Optional AWE ROM emulator plus reference samples

The default modulators feature is planned for the next version after 4.00, 4.01. We did a feature freeze a few months ago precisely to ensure that FluidSynth or similar playback programs would not have to implement a massive number of features. Overwhelming the program developers with a ton of features that must be added would not be a good idea. Therefore, it may be a good idea to get a few more opinions on what solution we should use for default modulators once the first version of SFe is implemented in an SF player.

Other features that will be coming in future versions of SFe include MIDI lyrics, SynthFont Custom Features (if Kenneth Rundt cooperates), preset library management systems, true 64-bit support with expanded or removed field size limits and round robin sampling.

If you think that we should include more or fewer features in the draft specification, please tell us and then we'll move forward or back some of the features that we're planning to include.

Right now, we are planning to develop a reference implementation of SFe based on SpessaSynth; however, if another program implements SFe, then this would negate the need for such a reference implementation!


What do we need to do?

  • Decide on a file extension to use for banks that aren't fully compatible with SF2.04
  • Decide on how we're going to change the ifil value
  • Communicate with FluidSynth to get a well-specified target specification for WernerSF3
  • Agree with FluidSynth about a version of SFe to implement
  • Listen to feedback and release the next draft milestone
  • Repeat until the final specification is ready

Questions to be answered

  • What file extensions should be used and when?
  • How do we handle the ifil value changes for new SF versions?
  • Should the SFe specification be full or partial?
  • What SFe features should be moved forward or backward?
  • If we can't get a formal specification for SFCF, can we reverse engineer it?
  • When should the final specification be ready?

Sorry for this wall of text! If you have any concerns, please reply.
