Allow topics to override primary category #103

Open
mnahkies opened this issue Nov 12, 2023 · 2 comments

@mnahkies

User Story

As a tool developer, I'd like to be able to override the category classification given to my tool. Specifically, I'd like https://github.com/mnahkies/openapi-code-generator to be labelled as a "Code Generator" rather than a "Parser".

Context

Currently the category is assigned using https://www.npmjs.com/package/bayes, a naive Bayes classifier: it essentially compares the frequency of tokens in a provided text against the frequency of tokens in already classified text to assign a class.
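To make the mechanism concrete, here is a minimal sketch of a multinomial naive Bayes text classifier with add-one smoothing. It is an illustration under assumptions, not the code of the `bayes` package or of this site: the class name `NaiveBayesSketch`, the tokenizer, and the training sentences are all hypothetical.

```typescript
// Illustrative multinomial naive Bayes text classifier with add-one
// smoothing. A sketch only; not the `bayes` package's implementation.

type Counts = Record<string, number>;

// Null-prototype object, so tokens such as "constructor" are safe keys.
function emptyCounts(): Counts {
  return Object.create(null);
}

class NaiveBayesSketch {
  private docCount: Counts = emptyCounts();    // documents seen per category
  private tokenCount: Record<string, Counts> = Object.create(null); // token frequencies per category
  private tokenTotal: Counts = emptyCounts();  // total tokens per category
  private vocabulary: Counts = emptyCounts();  // distinct tokens seen overall
  private vocabularySize = 0;
  private totalDocs = 0;

  // Lowercase the text and split it into alphanumeric tokens.
  static tokenize(text: string): string[] {
    return text.toLowerCase().match(/[a-z0-9]+/g) || [];
  }

  // Record the tokens of an already classified text under its category.
  learn(text: string, category: string): void {
    if (!(category in this.docCount)) {
      this.docCount[category] = 0;
      this.tokenCount[category] = emptyCounts();
      this.tokenTotal[category] = 0;
    }
    this.docCount[category] += 1;
    this.totalDocs += 1;
    for (const token of NaiveBayesSketch.tokenize(text)) {
      if (!(token in this.vocabulary)) {
        this.vocabulary[token] = 1;
        this.vocabularySize += 1;
      }
      this.tokenCount[category][token] = (this.tokenCount[category][token] || 0) + 1;
      this.tokenTotal[category] += 1;
    }
  }

  // Pick the category with the highest log prior + summed token log likelihoods.
  categorize(text: string): string {
    const tokens = NaiveBayesSketch.tokenize(text);
    let best: string | null = null;
    let bestScore = -Infinity;
    for (const category of Object.keys(this.docCount)) {
      let score = Math.log(this.docCount[category] / this.totalDocs);
      for (const token of tokens) {
        const seen = this.tokenCount[category][token] || 0;
        score += Math.log((seen + 1) / (this.tokenTotal[category] + this.vocabularySize));
      }
      if (score > bestScore) {
        bestScore = score;
        best = category;
      }
    }
    if (best === null) {
      throw new Error("classifier has no training data");
    }
    return best;
  }
}

// Hypothetical training data, for illustration only.
const sketchClassifier = new NaiveBayesSketch();
sketchClassifier.learn("parse and validate openapi documents", "Parsers");
sketchClassifier.learn("generate client code from openapi", "Code Generator");
const sketchPrediction = sketchClassifier.categorize("generate typescript client code");
console.log(sketchPrediction); // "Code Generator"
```

With one training document per class the priors are equal, so the token counts alone decide the result.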

However, because the current category/class distribution is pretty uneven (>30% of tools are assigned to "Parsers"), the classifier seems to have ended up overly biased towards "Parsers". For example, Redoc is assigned "User Interfaces" and "Parsers", but not "Documentation".

And these are all assigned to "Parsers" as well:

  • OpenAPI Server Code Generator (oapi-codegen)
  • OpenAPI Mocker
  • docs
  • php-openapi-faker
  • ...

Rather than "Code Generator" / "Mock" / "Documentation" / "Testing Tools"

I'm not sure if this is inherent to the classification approach / problem space (e.g. the written language used to describe different types of tool may lack enough distinguishing tokens to give a good signal), or a self-reinforcing feedback loop from the existing classifications, but either way I think it would be good to have a way to override this behavior.
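As a rough illustration of how an uneven class distribution can outweigh token evidence, the sketch below scores two classes the way a standard naive Bayes classifier would (log prior plus summed token log likelihoods). It assumes the classifier uses class priors; the 0.3 prior mirrors the ">30%" figure above, while the 0.05 prior and all token likelihoods are made-up values.

```typescript
// Log-posterior score (up to a constant) of one class under naive Bayes:
// log prior plus the sum of per-token log likelihoods.
function logScore(prior: number, tokenLikelihoods: number[]): number {
  let score = Math.log(prior);
  for (const p of tokenLikelihoods) {
    score += Math.log(p);
  }
  return score;
}

// 0.3 reflects the ">30% Parsers" figure from the issue;
// 0.05 for "Code Generator" is a hypothetical value.
const PARSERS_PRIOR = 0.3;
const CODEGEN_PRIOR = 0.05;

// Hypothetical per-token likelihoods for a three-token description.
const parsersScore = logScore(PARSERS_PRIOR, [0.01, 0.01, 0.01]);
const noSignal = logScore(CODEGEN_PRIOR, [0.01, 0.01, 0.01]);    // same likelihoods as Parsers
const weakSignal = logScore(CODEGEN_PRIOR, [0.05, 0.01, 0.01]);  // one token 5x likelier
const strongSignal = logScore(CODEGEN_PRIOR, [0.1, 0.01, 0.01]); // one token 10x likelier

console.log(noSignal > parsersScore);     // false: the prior alone decides
console.log(weakSignal > parsersScore);   // false: 5x evidence is below the 6x prior gap
console.log(strongSignal > parsersScore); // true: 10x evidence overcomes the prior
```

Under these assumed numbers, a description needs tokens that are collectively more than six times likelier under the smaller class before it beats "Parsers". That is consistent with both explanations above: weakly distinguishing vocabulary and skewed existing labels each push results towards the largest class.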

I'm hopeful that introducing this would, over time, improve the accuracy of the bayes classification, since the manually assigned labels would provide accurate training data.

Detailed Requirement

I propose adding a way to manually label a primary category for a tool. I see two main options:

  • Field on the tools.yaml entries like manualCategoryOverride
  • Looking for new topics on the source entries like the existing openapi3 / openapi31 ones that indicate the primary category

I see the primary benefit of the first option being that it gives control of curation to the maintainers of this repository, whilst the second option allows tool writers to self-serve. It's possible that both might be desirable, especially to account for entries that aren't scraped from GitHub (though I guess their categories are essentially manually configured already).
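One way the two options could be combined is sketched below, assuming a precedence of maintainer override first, then a category topic, then the classifier. Only `manualCategoryOverride` and the idea of category topics come from this issue; the `ToolEntry` shape, the topic names in `CATEGORY_TOPICS`, and `resolvePrimaryCategory` are hypothetical.

```typescript
// Hypothetical shape of a tool entry after scraping; only
// `manualCategoryOverride` is named in the issue.
interface ToolEntry {
  name: string;
  description: string;
  topics: string[];                // repository topics from the source
  manualCategoryOverride?: string; // option 1: field on the tools.yaml entry
}

type CategorySource = "manual" | "topic" | "classifier";

interface ResolvedCategory {
  category: string;
  source: CategorySource;
}

// Option 2: hypothetical topic names mapped to primary categories.
// The real topic names would need to be agreed by the maintainers.
const CATEGORY_TOPICS: Record<string, string> = {
  "openapi-category-code-generator": "Code Generator",
  "openapi-category-mock": "Mock",
  "openapi-category-documentation": "Documentation",
};

// Assumed precedence: maintainer override, then topic, then classifier.
function resolvePrimaryCategory(
  tool: ToolEntry,
  classify: (text: string) => string
): ResolvedCategory {
  if (tool.manualCategoryOverride) {
    return { category: tool.manualCategoryOverride, source: "manual" };
  }
  for (const topic of tool.topics) {
    if (Object.prototype.hasOwnProperty.call(CATEGORY_TOPICS, topic)) {
      return { category: CATEGORY_TOPICS[topic], source: "topic" };
    }
  }
  return { category: classify(tool.description), source: "classifier" };
}

// Stub standing in for the existing classifier.
const alwaysParsers = (_text: string): string => "Parsers";

const resolved = resolvePrimaryCategory(
  {
    name: "openapi-code-generator",
    description: "generates typescript clients and servers",
    topics: ["openapi3", "openapi-category-code-generator"],
  },
  alwaysParsers
);
console.log(resolved.category, resolved.source); // Code Generator topic
```

Recording the `source` alongside the category would make it possible to treat only "manual" and "topic" results as trusted labels when retraining the classifier, which is a design choice rather than a requirement.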

I think some amount of rationalization (eg: Testing vs Testing Tools) of the existing categories may be useful as well, and potentially adding a description of each category explaining what is in/out of scope for it.

@mnahkies
Author

@SensibleWood do you have any thoughts on this? I'm open to attempting an implementation, but would appreciate some feedback on whether it would be likely to be accepted before investing the effort.

@SensibleWood
Collaborator

@mnahkies thanks for raising this issue and sorry for the delay in replying. Work on this website has taken a hiatus as there have been other priorities.

I am very open to agreeing an approach and an implementation. There is a need to uplift the repository for Arazzo (which already lives under #157) so now is a good time to rethink categorisation. The original categories and approach were spawned from other initiatives and sources and, whilst they got this site going, they need refinement.

I would suggest we agree a time to talk on a call and take it from there. Thanks again for raising this.
