Data/import dennis 12 neural pdfs #43

Merged: 4 commits merged on Oct 9, 2023.
6 changes: 3 additions & 3 deletions mind_palace/app.py
@@ -6,11 +6,11 @@
 from llama_index.query_engine import CitationQueryEngine

 openai.api_key = st.secrets.openai_key
-xml_dir = "./resources/xmls/12-pdfs-from-steve-aug-22/"
+xml_dir = "./resources/xmls/dennis-oct-10/"
 gpt_model = "gpt-3.5-turbo"

-st.set_page_config(page_title="Q&A with Steve's PDFs")
-st.title("Q&A with Steve's PDFs 💬")
+st.set_page_config(page_title="Q&A with Dennis's PDFs")
+st.title("Q&A with Dennis's PDFs 💬")

 with st.sidebar:
     st.markdown("Conversation History")
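Switching corpora in app.py means editing three hard-coded strings in step: the XML directory, the page title and the heading. As a hypothetical refactor that is not part of this PR, a minimal sketch that derives all three from one owner name and one folder name; `corpus_settings` is an illustrative name, and the `./resources/xmls/` root is taken from the diff.

```python
# Hypothetical helper (not in this PR): derive the three values that app.py
# currently edits by hand from a single owner name and folder name.
XML_ROOT = "./resources/xmls"


def corpus_settings(owner: str, folder: str) -> dict[str, str]:
    """Return the XML directory plus the Streamlit page title and heading."""
    label = f"Q&A with {owner}'s PDFs"
    return {
        "xml_dir": f"{XML_ROOT}/{folder}/",
        "page_title": label,
        "title": f"{label} 💬",
    }


settings = corpus_settings("Dennis", "dennis-oct-10")
```

In app.py the result could then feed `st.set_page_config(page_title=settings["page_title"])` and `st.title(settings["title"])`, so the next corpus swap is a one-line change.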
4 changes: 3 additions & 1 deletion mind_palace/extract.py
@@ -7,7 +7,9 @@
 def _gen_document_dict(file_path) -> dict[str, TextNode]:
     xml = docs.load_tei_xml(file_path)
     doi = xml.header.doi
-    assert doi is not None
+    if doi is None:
+        print(f"DOI is None for {file_path}. Replacing with title instead.")
+        doi = xml.header.title

     try:
         title_node = docs.title(xml, doi)
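The change to `_gen_document_dict` replaces a hard `assert` with a fallback: use the DOI when present, otherwise the title. A minimal, self-contained sketch of that rule as a pure function, assuming a stand-in `Header` dataclass in place of the real parsed header (`Header` and `document_key` are hypothetical names). One addition is not in the PR: if the title is also missing, the sketch raises instead of silently using `None` as the key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Header:
    # Stand-in for the parsed TEI header used in extract.py; only the two
    # fields the fallback reads are modelled here (an assumption).
    doi: Optional[str]
    title: Optional[str]


def document_key(header: Header, file_path: str) -> str:
    """Prefer the DOI; fall back to the title, as the PR does."""
    if header.doi is not None:
        return header.doi
    print(f"DOI is None for {file_path}. Replacing with title instead.")
    if header.title is None:
        # Not handled by the PR: with neither field present the key would
        # become None, so fail loudly instead.
        raise ValueError(f"Neither DOI nor title found in {file_path}")
    return header.title
```

Keeping the fallback in one small function makes it easy to test against headers with and without a DOI, independently of the XML parser.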
5 changes: 5 additions & 0 deletions resources/xmls/README.md
@@ -0,0 +1,5 @@
These XMLs were generated by parsing the original PDFs with Grobid. Follow the official installation instructions [to build](https://grobid.readthedocs.io/en/latest/Install-Grobid/) and [run Grobid in command-line batch mode](https://grobid.readthedocs.io/en/latest/Grobid-batch/) using this command:

```sh
$ java -Xmx4G -Djava.library.path=grobid-home/lib/lin-64:grobid-home/lib/lin-64/jep -jar grobid-core/build/libs/grobid-core-0.7.3-onejar.jar -gH grobid-home -dIn ~/Downloads/neural_decoding_papers -dOut ../mind-palace/resources/xmls/ -exe processFullText -ignoreAssets
```
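If the conversion is scripted rather than typed, a minimal Python sketch can assemble the same argument list for `subprocess.run`. `grobid_batch_cmd` is a hypothetical name. Every flag is copied from the command above, including the `lin-64` native-library paths and the 0.7.3 jar name. One difference from the shell version matters: `subprocess` does not expand `~`, so the sketch expands it explicitly.

```python
import os


def grobid_batch_cmd(input_dir: str, output_dir: str,
                     version: str = "0.7.3") -> list[str]:
    """Build the argv for the Grobid batch command shown in the README."""
    lib = "grobid-home/lib/lin-64"
    return [
        "java", "-Xmx4G",
        f"-Djava.library.path={lib}:{lib}/jep",
        "-jar", f"grobid-core/build/libs/grobid-core-{version}-onejar.jar",
        "-gH", "grobid-home",
        # Unlike the shell, subprocess leaves "~" untouched, so expand it here.
        "-dIn", os.path.expanduser(input_dir),
        "-dOut", os.path.expanduser(output_dir),
        "-exe", "processFullText",
        "-ignoreAssets",
    ]
```

From the root of the Grobid checkout (the same working directory the shell command assumes), `subprocess.run(grobid_batch_cmd("~/Downloads/neural_decoding_papers", "../mind-palace/resources/xmls/"), check=True)` would then run the batch.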
1,079 changes: 1,079 additions & 0 deletions resources/xmls/dennis-oct-10/1705.00857.tei.xml

2,487 changes: 2,487 additions & 0 deletions resources/xmls/dennis-oct-10/1802.06441.tei.xml

1,976 changes: 1,976 additions & 0 deletions resources/xmls/dennis-oct-10/2208.01178.tei.xml

791 changes: 791 additions & 0 deletions resources/xmls/dennis-oct-10/2304.07362.tei.xml

3,212 changes: 3,212 additions & 0 deletions resources/xmls/dennis-oct-10/2305.15767.tei.xml

702 changes: 702 additions & 0 deletions resources/xmls/dennis-oct-10/PhysRevA.102.042411.tei.xml

996 changes: 996 additions & 0 deletions resources/xmls/dennis-oct-10/PhysRevLett.119.030501-accepted.tei.xml

1,056 changes: 1,056 additions & 0 deletions resources/xmls/dennis-oct-10/PhysRevLett.128.080505.tei.xml

2,354 changes: 2,354 additions & 0 deletions resources/xmls/dennis-oct-10/PhysRevResearch.2.023230.tei.xml

1,180 changes: 1,180 additions & 0 deletions resources/xmls/dennis-oct-10/q-2018-01-29-48.tei.xml

1,355 changes: 1,355 additions & 0 deletions resources/xmls/dennis-oct-10/q-2019-09-02-183.tei.xml

610 changes: 610 additions & 0 deletions resources/xmls/dennis-oct-10/s41598-017-11266-1.tei.xml

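Before running extraction on a new batch such as these 12 files, it can help to see which ones lack a DOI and will therefore take the title fallback in extract.py. The sketch below uses only the standard library. It assumes Grobid writes the DOI as `<idno type="DOI">` and the title under `titleStmt` in the TEI header, and that the repository's `docs.load_tei_xml` reads the same fields. This PR does not confirm either assumption. `header_fields` and `report` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}


def _text(el: Optional[ET.Element]) -> Optional[str]:
    # Treat a missing element or an empty element as "no value".
    return el.text.strip() if el is not None and el.text else None


def header_fields(path: Path) -> tuple[Optional[str], Optional[str]]:
    """Return (doi, title) from a Grobid TEI header, or None where missing."""
    header = ET.parse(path).getroot().find("tei:teiHeader", TEI)
    if header is None:
        return None, None
    doi = header.find(".//tei:idno[@type='DOI']", TEI)
    title = header.find(".//tei:titleStmt/tei:title", TEI)
    return _text(doi), _text(title)


def report(directory: Path) -> list[tuple[str, str]]:
    """Classify each *.tei.xml file by the key the fallback would end up using."""
    rows = []
    for path in sorted(directory.glob("*.tei.xml")):
        doi, title = header_fields(path)
        status = "DOI" if doi else ("title fallback" if title else "no key")
        rows.append((path.name, status))
    return rows
```

Calling `report(Path("resources/xmls/dennis-oct-10"))` from the repository root would list each file with `DOI`, `title fallback`, or `no key`. Files marked `no key` are the ones the fallback alone cannot handle.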