The OBO Impact score #65
While I did include GitHub stars in the OBO Community Health report (by external suggestion, mind you), I'd be hard-pressed to say it's an actual indicator of impact. E.g., there are some trash ontologies with near 10 stars, but then again, the Protein Ontology also has close to 10 stars. It's basically impossible to tease apart how popularity affects stars vs. people who want to engage with the issue tracker and leave a star while they're there vs. anything else. I'd suggest abandoning the impact score, because it will likely be difficult to create meaningful, objective metrics that aren't confounded by the willingness of the ontology owners to follow some community best practices, e.g., using GitHub as their primary place of doing curation. I'd much rather focus on "usefulness" and "goodness" metrics. Final thought: impact is a lot like obscenity/pornography -- I know it when I see it.
Unless I'm under some misapprehension, the purpose of the dashboard score is to provide an evaluation of the "fitness" of an ontology as that notion pertains to adherence to the principles. In my opinion, an impact score doesn't help evaluate this (but see caveat below). So what does an impact score tell you? When the score is high, one can probably assume the ontology is 'good' and usable content-wise. But a low impact score cannot imply an ontology is 'bad'. Thus, for the aspect it is intended to evaluate, it fails.

One could argue that the impact score does indeed reflect the principle of Users. On this I would agree. However, a sliding scale for this principle is unnecessary. Put another way, does having 20 users indicate your ontology is a better adherent to this principle than one with 4? I don't think so. Number of users is more a reflection of whether or not the ontology covers a domain that has widespread need. Indeed, given the purpose of the Users principle "to ensure that the ontology tackles a relevant scientific area and does so in a usable and sustainable fashion", adherence to it is a binary function; either the ontology has enough users to indicate that it is useful, or it doesn't. The above, coupled with the obvious issues in evaluation, says to abandon this.
[Sorry, I edited this a bunch immediately after posting.]

I think that the context for this is "Remove foundry distinction on front page" OBOFoundry/OBOFoundry.github.io#1140. Currently the table of OBO ontologies has groups: reviewed foundry ontologies are at the top of the list, followed by library ontologies, then inactive, orphaned, obsolete. Within each group we sort alphabetically. When the foundry distinction is dropped, we will combine foundry with active. Maybe we then split active into "domain", "project", "member". Inactive/orphaned/obsolete stay at the bottom of the list, and I guess project/member are lower on the list, but let's talk about the 100+ domain ontologies that claim some distinct scope in OBO.

So a new user comes to obofoundry.org and looks at the table to try to decide which ontology to use. If we actually had one domain ontology for each distinct scope, the decision would be easier, but we often have many. How does the user decide? They can sort by OBO Dashboard score (#64). A good Dashboard score helps the new user choose, but it doesn't capture the benefit the user gets from an ontology that is widely (re)used.

I will foolishly pick an example that I care about: OBI vs. CMO (Clinical Measurement Ontology). OBI has a slightly better Dashboard score (at least on the last version of the formula), but it would be easy enough for CMO to beat OBI by fixing a few things. CMO has a similar scope to OBI and comes first alphabetically. OBI no longer has its foundry review status to bump it to the top of the list. As a new user, I would be likely to pick CMO. However, OBI terms are reused in 75 other OBO ontologies, while CMO is used in about 8. That's a benefit to using OBI that I would like to see reflected somehow on the obofoundry.org list.

But when we just use OBO "internal" reuse, we miss important "external" (re)use. And some ontologies get a lot of reuse via xrefs rather than PURLs -- shouldn't that count for something?
We've discussed and tested various options over the past few years, and haven't found anything that makes everyone happy. Maybe that's reason enough to abandon the attempt, but either way it seems like some projects will "win" and some will "lose".
I want to throw another metric out there for discussion: citations, according to Google Scholar, to the primary publication(s). That comes with all the caveats associated with citations, but I think it broadly reflects real-world use and impact of ontologies better. Citations will typically be given by someone who wants to acknowledge that they really used an ontology, and not in the sense of, e.g., reusing the 'organism' term from OBI. It will put ontologies like HPO and GO very high. We could limit citations to a time period like 'in the last 5 years' to avoid this being age-dominated.

Most of all, when I pull citations to test-rank a few ontologies, it reflects my intuition, at least in terms of orders of magnitude. Note that this is *crudely* picking the first paper I can find and listing total citations. And note that the vast majority of ontologies are in the
GO - 2158
RO - 1310
DO - 842
BFO - 788
HPO - 425
OBI - 229
PR - 148
PATO - 144
XAO - 60 (xenopus ontology)
ZFA - 45 (zebrafish)
CMO - 22
MRO - 13 (MHC restriction ontology, one of mine)
Note that for someone working on MHC restriction or zebrafish, having a low impact rating for the ontology will not be considered a failure of that ontology; it just reflects how broadly important the field is.
And we can convert to log10(1 + citations) to display meaningful differences.
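As a quick illustration, here is a minimal Python sketch of that log10(1 + citations) transform, applied to the raw counts listed above (the `impact_score` name is just illustrative, not anything agreed on in this thread):

```python
import math

# Raw Google Scholar citation counts quoted earlier in this thread
# (first paper found per ontology, total citations; crude by design).
citations = {
    "GO": 2158, "RO": 1310, "DO": 842, "BFO": 788, "HPO": 425,
    "OBI": 229, "PR": 148, "PATO": 144, "XAO": 60, "ZFA": 45,
    "CMO": 22, "MRO": 13,
}

def impact_score(n_citations: int) -> float:
    # log10(1 + n) keeps order-of-magnitude differences visible
    # and maps zero citations to a score of exactly 0.
    return math.log10(1 + n_citations)

for ontology, n in citations.items():
    print(f"{ontology:5s} {n:5d} -> {impact_score(n):.2f}")
```

On this scale, GO (2158 citations) scores about 3.33 while MRO (13) scores about 1.15, so the display compresses a roughly 160x raw difference into a factor of about three.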
Very curious what others think.
I edited the first post to ensure that it's clear that this issue is really for giving everyone the opportunity to say their piece and raise concerns. After a while, I will compile all the arguments in the issue and call a vote! @bpeters42 I will add your citation idea; it's probably better than stars, but not ideal because it favours older over newer ontologies. But keep these ideas coming! And also the concerns.
- Any impact score will favor older ontologies. And that is not wrong: impact builds over time; something created this second cannot have impact (by definition).
- I was proposing citations in the last 5 years, but didn't do that in the numbers I put together. Neither did I take into account that, e.g., HPO seems to publish a new paper every year.
Ah ok, yeah, citations in the last 5 years is probably much better; sorry I missed that! I am personally game for whatever the community decides on. I think all these measures have something in favour of them! One thing we could do is capture all these metrics separately anyway, and not compile an OBO score from them, just use them to sort the table. Another idea is to include at least the metadata-verified usage count in the #64 OBO Dashboard score, to alleviate some of the bias created by small, formally correct ontologies that are not used anywhere. I could register nico.owl in the OBO Foundry, make sure the metadata is all perfect, and then get a score of 100% -- risky business.
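The "capture the metrics separately and only use them to sort the table" idea could look something like this minimal sketch. The reuse counts (75 vs. about 8) and the citation figures (229 vs. 22) come from this thread; the dashboard score values and all field names are made-up placeholders:

```python
# Hypothetical table rows; dashboard_score values are illustrative only.
ontologies = [
    {"id": "OBI", "dashboard_score": 0.91, "obo_reuse": 75, "citations": 229},
    {"id": "CMO", "dashboard_score": 0.89, "obo_reuse": 8,  "citations": 22},
]

def sort_table(rows, metric, descending=True):
    # Sort by a single user-chosen column instead of one compiled score.
    return sorted(rows, key=lambda row: row[metric], reverse=descending)

print([row["id"] for row in sort_table(ontologies, "obo_reuse")])
```

Keeping each metric as its own sortable column sidesteps the weighting debate entirely: the user decides which notion of quality or impact matters for their use case.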
Alongside our quality metrics #64, we want to take some notion of impact into account when providing our final OBO score #26.
There are basically two elements of raw data available to us to determine impact:

- usage data recorded as part of the metadata

Neither of them is perfect. Usage data cannot be verified with 100% certainty, and also, we currently do not allow recording usage in private systems, which makes it somewhat incomplete. However, usage information does go through (pull request) review, so it is not that bad. Basing the score on this could help with people recording their usages more diligently.

I am very torn with all of this. Some people (like @cthoyt) may come and suggest GitHub stars. @cmungall will suggest his ontology API, which can crawl some key resources in the biomedical domain for the number of times a term is used in biocuration, another great metric. @bpeters42 later in this thread suggests number of citations.

None of the above is truly 100% satisfactory.

My personal tendency right this moment is to rely on the usage data field, and apply stricter rules for its review (i.e., only count the entries with websites including term IDs that also resolve).

Looking forward to your ideas!
EDIT: This thread is for discussion only, not for deciding anything. Everyone will want to promote the impact metric that makes their ontologies look the best, which is fine, but as the OBO Foundry we want to decide this in a neutral way. @mellybelly, regarding your question about governance: no decision will be reached here. Once all arguments are heard, I will compile a list of options, and then we will call a vote!