User Story
As a tool developer, I'd like to be able to override the category classification given to my tool. Specifically, I'd like https://github.com/mnahkies/openapi-code-generator to be labelled as a "Code Generator" rather than a "Parser".
Context
Currently the category is assigned using https://www.npmjs.com/package/bayes, which essentially compares the frequency of tokens in a provided text against the frequency of tokens in already classified text to assign a class.
However, because the current category/class distributions are pretty uneven (>30% are assigned to "Parsers"), the classifier seems to have ended up overly biased towards "Parsers". For example, Redoc is assigned "User Interfaces" and "Parsers", but not "Documentation".
And these are all assigned to "Parsers" as well:
OpenAPI Server Code Generator (oapi-codegen)
OpenAPI Mocker
docs
php-openapi-faker
...
Rather than "Code Generator" / "Mock" / "Documentation" / "Testing Tools"
I'm not sure if this is inherent to the classification approach / problem space (e.g. does the written language used for different types of tool lack enough distinguishing tokens to give a good signal?), or a self-reinforcing feedback loop from the existing classifications, but either way I think it would be good to have a way to override this behavior.
I'm hopeful that introducing this would over time improve the accuracy of the classification using bayes as a result of the accurate manually labelled data.
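To make the suspected bias concrete, here is a minimal sketch of a generic multinomial naive Bayes classifier with add-one smoothing. It is not the source of the `bayes` package, and the training texts, the `usePrior` flag, and all function names are made up for illustration. With four "Parsers" documents and one "Code Generator" document, the class prior alone is enough to pull a code generator description into "Parsers".

```typescript
// Generic multinomial naive Bayes with add-one smoothing.
// Illustrative only: this is NOT the source of the `bayes` npm package.
type Doc = { text: string; category: string };

interface Model {
  docCount: Map<string, number>;                // documents per category
  tokenCount: Map<string, Map<string, number>>; // token frequency per category
  totalTokens: Map<string, number>;             // total tokens per category
  vocabulary: Set<string>;                      // distinct tokens seen in training
  totalDocs: number;
}

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

function train(docs: Doc[]): Model {
  const model: Model = {
    docCount: new Map<string, number>(),
    tokenCount: new Map<string, Map<string, number>>(),
    totalTokens: new Map<string, number>(),
    vocabulary: new Set<string>(),
    totalDocs: docs.length,
  };
  for (const doc of docs) {
    model.docCount.set(doc.category, (model.docCount.get(doc.category) || 0) + 1);
    let counts = model.tokenCount.get(doc.category);
    if (!counts) {
      counts = new Map<string, number>();
      model.tokenCount.set(doc.category, counts);
    }
    for (const token of tokenize(doc.text)) {
      model.vocabulary.add(token);
      counts.set(token, (counts.get(token) || 0) + 1);
      model.totalTokens.set(doc.category, (model.totalTokens.get(doc.category) || 0) + 1);
    }
  }
  return model;
}

// usePrior is a hypothetical switch, added only to show the effect of class imbalance.
function classify(model: Model, text: string, usePrior: boolean = true): string {
  const tokens = tokenize(text);
  let best = "";
  let bestScore = -Infinity;
  model.docCount.forEach((count, category) => {
    // Log prior: the share of training documents already in this category.
    let score = usePrior ? Math.log(count / model.totalDocs) : 0;
    const counts = model.tokenCount.get(category) || new Map<string, number>();
    const total = model.totalTokens.get(category) || 0;
    for (const token of tokens) {
      // Smoothed log likelihood of the token given the category.
      score += Math.log(((counts.get(token) || 0) + 1) / (total + model.vocabulary.size));
    }
    if (score > bestScore) {
      bestScore = score;
      best = category;
    }
  });
  return best;
}

// Made-up training data: 4 of 5 documents are "Parsers".
const model = train([
  { text: "parse openapi documents", category: "Parsers" },
  { text: "validate and parse openapi specification", category: "Parsers" },
  { text: "openapi parser library", category: "Parsers" },
  { text: "parser for openapi yaml", category: "Parsers" },
  { text: "generate typescript client code from openapi", category: "Code Generator" },
]);

console.log(classify(model, "openapi code generator"));        // Parsers
console.log(classify(model, "openapi code generator", false)); // Code Generator
```

In this toy data set the description only lands in "Code Generator" once the prior is ignored, which is consistent with the idea that more accurately labelled examples in the smaller categories should help over time.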
Detailed Requirement
Propose adding a way to manually label a primary category for a tool. I see two main options:
A field on the tools.yaml entries, such as manualCategoryOverride
New topics on the source entries, similar to the existing openapi3 / openapi31 ones, that indicate the primary category
I see the primary benefit of the first option being that it gives control of curation to the maintainers of this repository, whilst the second option allows tool writers to self-serve. It's possible that both might be desirable, especially to account for entries that aren't scraped from GitHub (though I guess their categories are essentially manually configured already).
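If both options were supported, the resolution order could look roughly like the sketch below. Only `manualCategoryOverride` comes from this proposal; the `ToolEntry` shape, the `openapi-category-*` topic names, the `resolveCategory` function, and the precedence (maintainer override, then topic, then classifier) are assumptions for illustration and not existing behavior of this repository.

```typescript
// Hypothetical shape of a tool entry; only manualCategoryOverride is from the proposal.
interface ToolEntry {
  name: string;
  manualCategoryOverride?: string; // option 1: set by maintainers in tools.yaml
  topics?: string[];               // option 2: repository topics set by tool authors
}

// Hypothetical topic names; the issue does not define what these would be called.
const TOPIC_CATEGORIES = new Map<string, string>([
  ["openapi-category-code-generator", "Code Generator"],
  ["openapi-category-mock", "Mock"],
  ["openapi-category-documentation", "Documentation"],
]);

type CategorySource = "manual" | "topic" | "classifier";

function resolveCategory(
  tool: ToolEntry,
  classifyFallback: (tool: ToolEntry) => string, // e.g. the existing bayes-based step
): { category: string; source: CategorySource } {
  // Assumed precedence: the maintainer override wins over everything else.
  if (tool.manualCategoryOverride) {
    return { category: tool.manualCategoryOverride, source: "manual" };
  }
  // Next, the first recognised category topic set by the tool author.
  for (const topic of tool.topics || []) {
    const category = TOPIC_CATEGORIES.get(topic);
    if (category !== undefined) {
      return { category, source: "topic" };
    }
  }
  // Otherwise fall back to the automatic classification.
  return { category: classifyFallback(tool), source: "classifier" };
}

// Stand-in for the current classifier, hard-coded for the example.
const alwaysParsers = (_tool: ToolEntry): string => "Parsers";

console.log(
  resolveCategory(
    { name: "openapi-code-generator", topics: ["openapi3", "openapi-category-code-generator"] },
    alwaysParsers,
  ),
); // { category: 'Code Generator', source: 'topic' }
```

Recording the source alongside the category would also make it possible to feed only the manually labelled entries back into the classifier as training data.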
I think some amount of rationalization (eg: Testing vs Testing Tools) of the existing categories may be useful as well, and potentially adding a description of each category explaining what is in/out of scope for it.
@SensibleWood do you have any thoughts on this? I'm open to attempting an implementation, but would appreciate some feedback on whether it would be likely to be accepted before investing the effort.
@mnahkies thanks for raising this issue and sorry for the delay in replying. Work on this website has taken a hiatus as there have been other priorities.
I am very open to agreeing an approach and an implementation. There is a need to uplift the repository for Arazzo (which already lives under #157) so now is a good time to rethink categorisation. The original categories and approach were spawned from other initiatives and sources and, whilst they got this site going, need refinement.
I would suggest we agree a time to talk with voices and take it from there. Thanks again for raising this.