This repository has been archived by the owner on Dec 14, 2023. It is now read-only.

Fix solr collections #842

Open
wants to merge 18 commits into
base: master
from

Conversation

thepsalmist

Fix the java.lang.ClassCastException bug by querying the mediacloud32 and mediacloud64 Solr collections separately, then combining the responses.
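
As a rough illustration of the approach described above, the sketch below queries each collection on its own and merges the results. It is not the code from this PR: merge_solr_responses, query_collections, and the query_fn callback are hypothetical names, and the merge assumes the usual Solr JSON layout with a "response" object holding "numFound" and "docs".

```python
from typing import Any, Callable, Dict, List


def merge_solr_responses(responses: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Combine per-collection Solr responses into one response dict.

    Assumes each response looks like {"response": {"numFound": N, "docs": [...]}}.
    """
    merged: Dict[str, Any] = {"response": {"numFound": 0, "docs": []}}
    for resp in responses:
        body = resp.get("response", {})
        merged["response"]["numFound"] += body.get("numFound", 0)
        merged["response"]["docs"].extend(body.get("docs", []))
    return merged


def query_collections(
    query_fn: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    collections: List[str],
    params: Dict[str, Any],
) -> Dict[str, Any]:
    """Run the same query against each collection separately, then merge.

    query_fn is a hypothetical callback that sends one request to one collection.
    """
    return merge_solr_responses([query_fn(name, params) for name in collections])
```

Note that a simple concatenation like this does not re-sort documents or merge facet counts across collections; whether that matters depends on the query.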

philbudne and others added 14 commits March 6, 2022 17:37
	(can be overridden via MC_NO_DEDUP_SENTENCES env var)

apps/common/src/python/mediawords/util/config/__init__.py: add env_bool function
apps/extract-and-vector/bin/extract_and_vector_worker.py: honor MC_NO_DEDUP_SENTENCES
…processing backlog

up FETCH_BLOCK_SIZE from 100 to 200:
amortizes citus connection startup time by fetching larger blocks of text.

Honor skip_update_snapshot option (defaults to true):
skip setting snapshots.searchable=true
(Jinja2 2.11.3 can't cope with the new MarkupSafe 2.1.1)
PG server running on postgresql EC2 server w/o docker
Removed extra AND to try to fix mediacloud#833 as suggested by Rahul.
abs_uri = furl(f"{solr_url}/{collection}/{path}")
abs_uri = abs_uri.set(params)
abs_url = str(abs_uri)
request = Request(method='POST', url=abs_url)

We can drop abs_url = str(abs_uri), unless abs_url is used somewhere else, and convert the URI inline:

request = Request(method='POST', url=str(abs_uri))
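
For reference, here is a minimal sketch of the same URL construction without the intermediate variable. It uses urllib.parse from the standard library as a stand-in for furl, and build_solr_url is a hypothetical helper name, not something defined in this PR.

```python
from typing import Any, Dict
from urllib.parse import urlencode


def build_solr_url(solr_url: str, collection: str, path: str, params: Dict[str, Any]) -> str:
    """Return '<solr_url>/<collection>/<path>?<query>' as a plain string."""
    base = f"{solr_url.rstrip('/')}/{collection}/{path.lstrip('/')}"
    # doseq=True expands list values into repeated query parameters.
    query = urlencode(params, doseq=True)
    return f"{base}?{query}" if query else base
```

The returned string can be passed straight to the url argument, so no separate abs_url variable is needed.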

abs_uri = abs_uri.set(params)
abs_url = str(abs_uri)
request = Request(method='POST', url=abs_url)
request.set_header(name='Content-Type', value=content_type)

@esirK esirK Nov 9, 2022

Following on from this, I think we can combine all the headers when creating the Request, something like:

headers = {
    'Content-Type': content_type,
    'Content-Length': str(len(content_encoded)),
}

request = Request(method='POST', url=abs_url, headers=headers)
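
To show the suggestion end to end, here is a small self-contained sketch using urllib.request.Request from the standard library, which does accept a headers argument. Whether the Request class used in this PR accepts headers in its constructor is an assumption that should be checked; build_post_request is a hypothetical helper name.

```python
from urllib.request import Request


def build_post_request(abs_url: str, content: str, content_type: str) -> Request:
    """Build a POST request with all headers supplied at construction time."""
    content_encoded = content.encode("utf-8")
    headers = {
        "Content-Type": content_type,
        # Length is measured in bytes of the encoded body, not characters.
        "Content-Length": str(len(content_encoded)),
    }
    return Request(url=abs_url, data=content_encoded, headers=headers, method="POST")
```

With the standard library class, header names are normalized on storage, so they are read back as "Content-type" and "Content-length".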

@thepsalmist thepsalmist marked this pull request as ready for review November 10, 2022 12:53
3 participants