SLEP024 Guideline for external posts on scikit-learn blog #92

Draft
wants to merge 3 commits into
base: main

glemaitre
Member

This SLEP opens a discussion to define a clear guideline for accepting external contributions to the scikit-learn blog.

@glemaitre glemaitre marked this pull request as draft August 9, 2024 13:14
Comment on lines +26 to +27
When it comes to technical content, up to now, the content is only limited to the
scikit-learn library. However, the scikit-learn community is going beyond the
Member

@jjerphan jjerphan Aug 9, 2024

Thank you for opening the discussion.

If I can interject, some content is also hosted on external websites and only linked from the blog. This is, for instance, the case for the series on performance improvements.

I think it would be better to have this content be part of scikit-learn's blog directly.

This was not done originally because of lack of time and for convenience (it was easy for me to publish it on my website and to iterate on it), but I do not mean to be the sole owner of the knowledge shared there.

Should such external content be discussed as part of this SLEP?

Member Author

You are faster than me at writing the SLEP :).

For this particular case, this is not even a question, since the content is already linked to scikit-learn internals. So it would go de facto in the scikit-learn blog, if you ask me.

But let's imagine a topic that is related to scikit-learn but somehow outside of the library itself. Then I would consider this case as part of the SLEP. The guidelines should answer such questions to some extent, notably whether the content is eligible for inclusion.

Note that my first thought with this SLEP was more along these lines: someone has a shiny compatible package and is seeking some visibility; is it possible to advertise it and, if so, what are the few requirements from our side?

Member

OK, I was unsure whether my remark was relevant to the subject of this SLEP (and, based on your reply, it is not).

Should I open an issue, or PRs directly, to integrate this content into scikit-learn's blog?

Member Author

Yep I would find it relevant.

Member

I have just opened scikit-learn/blog#191.

Comment on lines +50 to +59
To accept an external contribution, the blog post should be related to the scikit-learn
ecosystem. When it comes to presenting a compatible tool, the criteria are the
following:

- The tool should be compatible with scikit-learn.
- The tool should be under an open-source license.
- The tool should be actively maintained.
- The tool should have a clear documentation.
- The tool should be well tested.
- The tool should not be a commercial product or serve advertisement for a company.
Member

This is very similar to what we had when I was on NumFOCUS's affiliated project committee. These criteria are quite hard to assess, and also quite hard to maintain. They seem like a low bar, but they are in fact quite a high bar to pass.

I would probably modify this to something like:

The scikit-learn blog is not an opinionated place when it comes to tools. Posts are included if somebody takes the effort of writing them. However, we don't want the blog to be flooded by companies trying to advertise their products. Therefore we have the following requirements:

  • The post needs to cover the basic theoretical background required to understand the post. Our blog posts are educational resources.
  • The tools used in the blog need to be under an OSI-approved or similar license.
  • The content of the post should be limited to the tool at hand, rather than using the tool to advertise a commercial ecosystem / tool.
  • The proposing community needs to adhere to our code of conduct standards. We reserve the right to reject posts from communities where there are issues that we'd like to stay away from.

Member

The current content focusses a lot on the tool/library that the blog post is covering. Adrin's suggestion is more about the content of the blog post. I think that is the right direction to go. I've been thinking about how to express my thoughts and so far the best I've come up with is that we should try and highlight what we do want to see in the blog posts (as opposed to focussing on what we don't want).

What do we want? My thoughts: the blog, like the rest of the documentation, should aim to be recognised as authoritative and high quality. It is better not to write a blog post if it would be just average or a repeat of what can be read elsewhere. The content should be obviously correct. By this I mean a statement should be easy to read, easy to understand, and easy to draw the right conclusion from. This is as opposed to statements which are hard to parse, or which might make you conclude A when in reality you should be concluding B and the author would really rather you conclude A. I guess a phrase for this kind of language is "technically correct, but misleading".

Like scikit-learn, the blog should cover things which are well established and "old". They might be less well known or have been forgotten, but the content shouldn't be "trail blazing".

Like a Wikipedia article, there should be sources you can link to in order to back up your claims and give the reader more details, etc. For me it would be fine to link to code that you can run as "sources" for, say, benchmarks. You wouldn't have to publish your results elsewhere first.

You should stick to the spirit of these guidelines; sticking to the "letter of the law" is not enough. This means there will be an element of human judgement.

Clearly mark "paid content"? I am a bit less sure about this one. I know that in a lot of newspapers, magazines, YouTube, Instagram, etc., people are obviously and not so obviously rewarded for talking about something. This can be a ski resort inviting a reporter and paying for the trip in the hope that they write about their time in the resort, a free pizza oven sent to a cooking YouTuber, or straight-up paid advertising in a newspaper. I don't think there is a fundamental problem with this kind of "paid content" (using a broad definition here). And you could argue that it is covered under "take care that the reader understands" from above. Depending on how big the influence is, something very prominent like "I work for company X and this post is about things we make, so take what I say with a grain of salt" or a note at the end like "Project Y invited and paid for me to travel to their sprint" is appropriate. As long as the facts are with you, there seems to be very little downside to declaring your possible conflicts. It might even make you more credible.

4 participants