
After disabling "indexing in frontend" (indexed_search), nothing is indexed by the crawler anymore #1029

Open
mediaessenz opened this issue Feb 17, 2024 · 5 comments

Comments

@mediaessenz

Bug Report

Current Behavior
I'm not sure if this is a bug in crawler or in indexed_search. Maybe it's just a missing configuration on my side, given the very outdated and/or incomplete documentation on both sides.

My problem is:
If I disable "indexing in frontend" (disableFrontendIndexing) in indexed_search, clear all index_* database tables plus crawler_queue and crawler_process, and restart the indexing tasks, nothing is added to the index anymore. The crawler_queue is filled and its entries get e.g. result_data and execution time values, but the index tables stay empty.

Expected behavior/output
The content should be indexed even with disableFrontendIndexing enabled.

Steps to reproduce
disable "indexing in frontend" (disableFrontendIndexing) of indexed_search
clear all index_* db tables + crawler_queue + crawler_process
start indexing / crawling tasks
See empty index tables
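
For reference, a minimal sketch of where the "indexing in frontend" switch lives in a Composer-mode TYPO3 12 installation; the key name matches the setting referenced above, but verify it against your own config/system/settings.php (or toggle it in Admin Tools > Settings > Extension Configuration):

```php
<?php
// config/system/settings.php (excerpt) – sketch only, merge into your existing array.
return [
    'EXTENSIONS' => [
        'indexed_search' => [
            // "Disable Indexing in Frontend": with this enabled, indexing is
            // expected to happen only via the crawler queue, not on page hits.
            'disableFrontendIndexing' => '1',
        ],
    ],
];
```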

Environment

  • Crawler version: dev-main
  • TYPO3 version: 12.4.11
  • Is your TYPO3 installation set up with Composer (Composer Mode): yes

Hi there, thank you for taking the time to create your first issue. Please give us a bit of time to review it.

@mediaessenz
Author

mediaessenz commented Feb 17, 2024

After some more research I found out that the problem only occurs if I add multiple crawler configurations (argument "conf") to a crawler:buildQueue task.

In my case I added three configurations as a comma-separated list of their names (pages,news,press): one for the normal pages, one for news (tx_news) and one for press releases (also tx_news).

If I only enter the configuration for the normal pages, indexing works as expected, even with frontend indexing disabled.

Having realised this, I created separate tasks for news and press indexing, but it seems that they are not indexed when frontend indexing is disabled.
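
For illustration, the comma-separated "conf" value is presumably split into individual configuration keys roughly like this (an assumption, not checked against the crawler source):

```php
<?php
// Assumption: the "conf" task argument "pages,news,press" is resolved into
// individual crawler configuration keys, roughly like this.
use TYPO3\CMS\Core\Utility\GeneralUtility;

$conf = 'pages,news,press';
$configurationKeys = GeneralUtility::trimExplode(',', $conf, true);
// ['pages', 'news', 'press'] – one crawler configuration record per key
```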

The crawler configuration for news looks like this:

```
name:          news
pidsonly:      880
configuration: &tx_news_pi1[controller]=News&tx_news_pi1[action]=detail&tx_news_pi1[news]=[_TABLE:tx_news_domain_model_news; _PID:879; _WHERE: hidden = 0]
```

pid 880 is the news detail page. pid 879 is the sysfolder with the news records.
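
For illustration only (this is not the crawler's actual code), the [_TABLE:tx_news_domain_model_news; _PID:879; _WHERE: hidden = 0] expression conceptually expands to one detail URL per news record, roughly like this:

```php
<?php
// Rough, hypothetical illustration of what the [_TABLE:...; _PID:879; _WHERE: hidden = 0]
// part expands to: one queue entry / detail URL per news record stored in pid 879.
use TYPO3\CMS\Core\Database\Connection;
use TYPO3\CMS\Core\Database\ConnectionPool;
use TYPO3\CMS\Core\Utility\GeneralUtility;

$queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
    ->getQueryBuilderForTable('tx_news_domain_model_news');

$newsUids = $queryBuilder
    ->select('uid')
    ->from('tx_news_domain_model_news')
    ->where(
        $queryBuilder->expr()->eq('pid', $queryBuilder->createNamedParameter(879, Connection::PARAM_INT)),
        // mirrors the "_WHERE: hidden = 0" part of the configuration
        $queryBuilder->expr()->eq('hidden', $queryBuilder->createNamedParameter(0, Connection::PARAM_INT))
    )
    ->executeQuery()
    ->fetchFirstColumn();

// Each uid becomes one URL for the detail page (pid 880):
foreach ($newsUids as $uid) {
    echo sprintf(
        '?id=880&tx_news_pi1[controller]=News&tx_news_pi1[action]=detail&tx_news_pi1[news]=%d',
        $uid
    ) . PHP_EOL;
}
```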

In the corresponding crawler:buildQueue task, I entered the detail pid 880 into the argument "page" field.

I will dig a bit deeper now and hopefully find the cause of the problem ...

@tomasnorre
Owner

Thanks for your report. We will look into this. My response time is currently longer than usual, but I will get back to you.

@nhovratov

Hey, I had a little crawler configuration session today and ran into the same constellation with disableFrontendIndexing enabled. This flag is really useful, as we have cached pagination pages which should not be indexed when the user browses through them. The problem, however, is that as soon as a page is cached, crawling it will not trigger re-indexing. I don't know if there is already a solution for this, but in theory the page cache would have to be flushed before each crawl of the page. Correct me if I'm wrong.
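
As a rough sketch of that idea, using the core cache API (an assumption, not an existing crawler feature as far as I know), the cached frontend output of a page could be dropped before it is crawled:

```php
<?php
// Sketch only: drop the cached "pages" entry of one page before crawling it,
// so indexed_search sees an uncached render and can re-index the content.
use TYPO3\CMS\Core\Cache\CacheManager;
use TYPO3\CMS\Core\Utility\GeneralUtility;

$pageUid = 880; // hypothetical page that is about to be crawled

GeneralUtility::makeInstance(CacheManager::class)
    ->getCache('pages')
    ->flushByTag('pageId_' . $pageUid);
```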

So maybe the problem described here is also caused by pages that are already cached?

@tomasnorre
Owner

I don't recall this off the top of my head, but I know there is a "hook" the other way around: when a page's cache is flushed, the page is automatically added to the crawler queue again.

But it sounds like that could be the issue, as the pages are already cached. I will have to think about a solution for this.
