broken sitemap.xml #57

Open
sneumann opened this issue May 31, 2020 · 6 comments
Comments

@sneumann
Contributor

https://www.bioconductor.org/sitemap.xml gives

XML Parsing Error: not well-formed
Location: https://www.bioconductor.org/sitemap.xml
Line Number 1, Column 2:
<%= xml_sitemap %>
-^

from https://github.com/Bioconductor/bioconductor.org/blob/master/content/sitemap.xml
There was a suggestion in a discussion with @egonw to add a sitemap.xml summarising site content for crawlers, including Google et al. and TeSS.
Yours,
Steffen

@mtmorgan
Contributor

Since this has been there, unchanged, since March 15, 2010 without comment, maybe the most expeditious solution is to simply remove it?

@egonw

egonw commented May 31, 2020

I suppose something is meant to replace the placeholder with content. Yes, it would be awesome if it contained a list of all vignette (HTML) web pages and/or all packages. Indeed, that sitemap.xml could then be used by ELIXIR services to pick up content, e.g. ELIXIR TeSS but also BioSchemas (cc @AlasdairGray).
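
For illustration only, a minimal Python sketch of what "replacing the placeholder with content" could look like: it turns a list of page URLs into a sitemap in the sitemaps.org 0.9 format. The function name `build_sitemap` and the example URLs are assumptions made for this sketch; this is not how the site's own build fills in `<%= xml_sitemap %>`.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap.xml document (as a str) listing the given URLs.

    Hypothetical helper for illustration; it emits only the required
    <loc> element for each <url> entry.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as '&' on serialisation.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Example URLs are placeholders chosen for this sketch.
    print(build_sitemap([
        "https://example.org/packages/pkgA.html",
        "https://example.org/packages/pkgB.html",
    ]))
```

The list of URLs would have to come from somewhere, for instance the package index or the rendered vignette pages; that part is left out here.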

@mtmorgan
Contributor

The site is more than the repository of packages, so sitemap.xml doesn't sound appropriate for this purpose.

For what it's worth, package metadata is already available in machine-readable format at https://bioconductor.org/packages/3.12/bioc/VIEWS and presumably also on individual pages if #25 were completed. I can't see the need for a third source of this information.
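
As a rough sketch of how a consumer might read such a file: VIEWS uses the DCF ("Debian control file") layout, i.e. `Field: value` lines, continuation lines starting with whitespace, and blank lines between records. The parser below works on text that has already been downloaded; the function name `parse_dcf` and the sample record are made up for illustration.

```python
def parse_dcf(text):
    """Parse DCF-style text into a list of dicts, one per record.

    Hypothetical helper: records are separated by blank lines, and a
    line starting with whitespace continues the previous field.
    """
    records = []
    current = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record.
            if current:
                records.append(current)
                current = {}
                last_key = None
            continue
        if line[0] in " \t" and last_key is not None:
            # Continuation of the previous field's value.
            current[last_key] += " " + line.strip()
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Field: value" line; skipped in this sketch
        last_key = key.strip()
        current[last_key] = value.strip()
    if current:
        records.append(current)
    return records


if __name__ == "__main__":
    # Made-up sample in the same layout; not real VIEWS content.
    sample = "Package: pkgA\nVersion: 1.0.0\n\nPackage: pkgB\nVersion: 2.1.0\n"
    for record in parse_dcf(sample):
        print(record["Package"], record["Version"])
```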

@egonw

egonw commented May 31, 2020

The sitemap.xml is not critical, I agree. (Any sitemap.xml has redundant information.)

@sneumann
Contributor Author

It is a form of search engine optimisation. OTOH, all content on BioC can be considered well-linked: we don't have dynamically generated content, and there are no dark corners of non-linked stuff we'd want to be found. In that case, removing a broken sitemap.* is no loss.

https://support.google.com/webmasters/answer/156184?hl=en&topic=8476&ctx=topic
has more information on when a sitemap is needed.

Yours, Steffen

@AlasdairGray

While a sitemap is not necessarily essential for the likes of Google, who have "unlimited" resources to follow links and hopefully traverse a whole site, it is more difficult for others to do the same. For example, we have started scraping Bioschemas content but do not have the resources to do a full web crawl for it, so we are reliant on sitemaps.
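
To make the consumer side concrete, here is a small Python sketch of how a scraper without crawl capacity might take its list of pages from a sitemap. The function name `sitemap_urls` is an assumption for this example, and it is not the actual Bioschemas scraper. Fetching the file is left out; the function takes the already-downloaded bytes.

```python
import xml.etree.ElementTree as ET

# Namespace prefix mapping for the sitemaps.org 0.9 format.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_bytes):
    """Return the <loc> URLs listed in a sitemap document.

    Hypothetical helper; raises xml.etree.ElementTree.ParseError if the
    input is not well-formed XML.
    """
    root = ET.fromstring(xml_bytes)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", NS)
        if loc.text
    ]


if __name__ == "__main__":
    good = (
        b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        b"<url><loc>https://example.org/page.html</loc></url>"
        b"</urlset>"
    )
    print(sitemap_urls(good))

    # The unrendered template from the report is not well-formed XML,
    # so a consumer like this one gets a parse error instead of URLs.
    try:
        sitemap_urls(b"<%= xml_sitemap %>")
    except ET.ParseError as err:
        print("not well-formed:", err)
```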
