SLEP024 Guideline for external posts on scikit-learn blog #92

Draft
wants to merge 3 commits into
base: main
1 change: 1 addition & 0 deletions index.rst
@@ -25,6 +25,7 @@
    slep012/proposal
    slep017/proposal
    slep019/proposal
    slep024/proposal

.. toctree::
    :maxdepth: 1
93 changes: 93 additions & 0 deletions slep024/proposal.rst
@@ -0,0 +1,93 @@
.. _slep_024:

=======================================================================
SLEP024: Guidelines for external contributions to the scikit-learn blog
=======================================================================

:Author: Guillaume Lemaitre, François Goupil
:Status: Draft
:Type: Standards Track
:Created: 2024-08-09

Abstract
--------

This SLEP proposes guidelines for writing and reviewing external contributions
to the scikit-learn blog.

Detailed description
--------------------

Scikit-learn has a blog available at the following URL:
https://blog.scikit-learn.org/. Since its inception, the blog has been used to relay
information on diverse subjects such as sprints, interviews with contributors,
collaborations, and technical content.

So far, the technical content has been limited to the scikit-learn library
itself. However, the scikit-learn community goes beyond the
Comment on lines +26 to +27
Member

@jjerphan jjerphan Aug 9, 2024

Thank you for opening the discussion.

If I can interject, some content is also hosted on external websites and only linked from the blog. This is for instance the case of the series on performance improvements.

I think it would be better to have this be part of scikit-learn's blog directly.

This was not done originally because of lack of time and for convenience (it was easy for me to publish it on my website and to iterate on it), but I do not mean to be the sole owner of the knowledge shared there.

Should such external content be discussed as part of this SLEP?

Member Author

You are faster than me writing the SLEP :).

For this particular case, this is not even a question since this is already linked to some internals of scikit-learn. So it would go de facto in the scikit-learn blog if you ask me.

But let's imagine a topic that is related to scikit-learn but somehow outside of the library itself. Then, I would consider this case as part of the SLEP. The guidelines should answer some questions to some extent, notably whether the topic is eligible for inclusion.

Note that my first thought with this SLEP was more along these lines: someone has a shiny compatible package and is seeking visibility; is it possible to advertise it and, if so, what are the requirements on our side?

Member

OK, I was unsure whether my remark was relevant to the subject of this SLEP (and, based on your remark, it is not).

Should I open an issue or PRs directly to integrate its content in scikit-learn's blog?

Member Author

Yep I would find it relevant.

Member

I have just opened scikit-learn/blog#191.

library itself and has developed compatible tools for years. As an example, the
scikit-learn-contrib repository [2]_ hosts a collection of tools that are not
part of the main library but are still compatible with scikit-learn.

This SLEP proposes to extend the scope of the blog's technical content to
accept contributions related to the scikit-learn ecosystem, not only to the
scikit-learn library itself. However, it is necessary to define some guidelines to
manage the expectations of contributors and readers.

Here, we define the guidelines to follow when writing and reviewing external
contributions to the scikit-learn blog.

Guidelines
----------

In this section, we provide a set of guidelines to ease the discussions when reviewing
external contributions to the scikit-learn blog. They should help both the authors
and the reviewers.

Inclusion criteria
^^^^^^^^^^^^^^^^^^

To be accepted, an external blog post should be related to the scikit-learn
ecosystem. When a post presents a compatible tool, the tool should meet the
following criteria:

- The tool should be compatible with scikit-learn.
- The tool should be under an open-source license.
- The tool should be actively maintained.
- The tool should have clear documentation.
- The tool should be well tested.
- The tool should not be a commercial product or serve as an advertisement for a company.
Comment on lines +50 to +59
Member

This is very similar to what we had when I was on NumFOCUS's affiliated project committee. These criteria are quite hard to assess, and also quite hard to maintain. They seem like a low bar, but they are in fact quite a high bar to pass.

I would probably modify this to something like:

The scikit-learn blog is not an opinionated place when it comes to tools. Posts are included if somebody takes the effort of writing them. However, we don't want the blog to be flooded by companies trying to advertise their products. Therefore we have the following requirements:

  • The post needs to cover the basic theoretical background required to understand the post. Our blog posts are educational resources.
  • The tools used in the blog need to be under an OSI approved or similar license.
  • The content of the post should be limited to the tool at hand, rather than using the tool to advertise a commercial ecosystem / tool.
  • The proposing community needs to adhere to our code of conduct standards. We reserve the right to reject posts from communities where there are issues that we'd like to stay away from.

Member

The current content focusses a lot on the tool/library that the blog post is covering. Adrin's suggestion is more about the content of the blog post. I think that is the right direction to go. I've been thinking about how to express my thoughts and so far the best I've come up with is that we should try and highlight what we do want to see in the blog posts (as opposed to focussing on what we don't want).

What do we want? My thoughts: the blog, like the rest of the documentation, should aim to be recognised as authoritative and high quality. Better to not write a blog post if it would be just average or a repeat of what can be read elsewhere. The content should be obviously correct. By this I mean a statement should be easy to read, easy to understand, and should lead easily to its conclusion, as opposed to statements which are hard to parse and might make you conclude A when in reality you should conclude B, because the author would really rather you conclude A. I guess a phrase for this kind of language is "technically correct, but misleading".

Like scikit-learn, the blog should cover things which are well established and "old". They might be less well known or have been forgotten, but the content shouldn't be "trail blazing".

Like a Wikipedia article, there should be sources you can link to in order to back up your claims and give the reader more details, etc. For me it would be fine to link to code that you can run as "sources" for, say, benchmarks. You wouldn't have to publish your results elsewhere first.

You should stick to the spirit of these guidelines; sticking to the "letter of the law" is not enough. This means there will be an element of human judgement.

Clearly mark "paid content"? I am a bit less sure about this one. I know that in a lot of newspapers, magazines, YouTube, Instagram, etc., people are obviously and not so obviously rewarded for talking about something. This can be a ski resort inviting a reporter and paying for the trip in the hope that they write about their time in the resort, a free pizza oven sent to a cooking YouTuber, or straight-up paid advertising in a newspaper. I don't think there is a fundamental problem with this kind of "paid content" (using a broad definition here). And you could argue that it is covered under "take care that the reader understands" from above. Depending on how big the influence is, something very prominent like "I work for company X and this post is about things we make, so take what I say with a grain of salt" or a note at the end like "Project Y invited and paid for me to travel to their sprint" is appropriate. As long as the facts are with you, there seems to be very little downside to declaring your possible conflicts. It might even make you more credible.


Reproducibility requirements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the scikit-learn documentation, our continuous integration ensures that the examples
are reproducible and can be executed. For the scikit-learn blog, reaching the same
level of integration is difficult, if not impossible.

However, we should ensure that the given examples or code snippets are reproducible by
the readers. We therefore recommend the following:

- Provide a link to a repository hosting the code or notebook on which the blog post
  is based.
- The repository should contain a way to reproduce the environment (e.g.
  ``requirements.txt``, ``environment.yml``, or ``pixi.toml``).
- If possible, continuous integration should check that the code or notebook can
  be executed. We understand that this step is sometimes impossible due to limited
  resources.
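
As an illustration of the second recommendation, a reviewer could check for an
environment file with a few lines of Python. This is only a sketch: the helper names
``find_environment_files`` and ``check_reproducibility`` are hypothetical and not part
of any scikit-learn tooling, and the list of file names simply mirrors the examples
given above.

```python
from pathlib import Path

# File names taken from the recommendation above; other tools may use other names.
ENVIRONMENT_FILES = ("requirements.txt", "environment.yml", "pixi.toml")


def find_environment_files(repo_root):
    """Return the environment files present at the root of the repository."""
    root = Path(repo_root)
    return [name for name in ENVIRONMENT_FILES if (root / name).is_file()]


def check_reproducibility(repo_root):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    if not find_environment_files(repo_root):
        problems.append(
            "no environment file found (expected one of: "
            + ", ".join(ENVIRONMENT_FILES)
            + ")"
        )
    return problems
```

Such a check only verifies that an environment file exists. It does not guarantee
that the environment can actually be rebuilt or that the code runs, which is what
the continuous integration recommendation is meant to cover.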

References and Footnotes
------------------------

.. [1] Each SLEP must either be explicitly labeled as placed in the public
   domain (see this SLEP as an example) or licensed under the `Open
   Publication License`_.

.. [2] `scikit-learn-contrib repository <https://github.com/scikit-learn-contrib>`__

.. _Open Publication License: https://www.opencontent.org/openpub/

Copyright
---------

This document has been placed in the public domain. [1]_