Merge pull request #43 from TheDataGuild/data/import-dennis-12-neural-pdfs

Data/import dennis 12 neural pdfs
Quantisan authored Oct 9, 2023
2 parents de3488f + d034dde commit 986f832
Showing 15 changed files with 17,809 additions and 4 deletions.
6 changes: 3 additions & 3 deletions mind_palace/app.py
@@ -6,11 +6,11 @@
 from llama_index.query_engine import CitationQueryEngine

 openai.api_key = st.secrets.openai_key
-xml_dir = "./resources/xmls/12-pdfs-from-steve-aug-22/"
+xml_dir = "./resources/xmls/dennis-oct-10/"
 gpt_model = "gpt-3.5-turbo"

-st.set_page_config(page_title="Q&A with Steve's PDFs")
-st.title("Q&A with Steve's PDFs 💬")
+st.set_page_config(page_title="Q&A with Dennis's PDFs")
+st.title("Q&A with Dennis's PDFs 💬")

 with st.sidebar:
     st.markdown("Conversation History")
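Since app.py now points `xml_dir` at `./resources/xmls/dennis-oct-10/`, a quick check that the configured directory exists and actually contains TEI files can catch a path typo early. A minimal sketch follows; `list_tei_files` is a hypothetical helper, not part of this repository, and it assumes the Grobid outputs keep the `.tei.xml` suffix used by the files added in this commit.

```python
from pathlib import Path


def list_tei_files(xml_dir: str) -> list[Path]:
    """Return the *.tei.xml files in xml_dir, sorted by file name.

    Hypothetical fail-fast helper: raises FileNotFoundError if the directory
    is missing and ValueError if it holds no TEI files.
    """
    directory = Path(xml_dir)
    if not directory.is_dir():
        raise FileNotFoundError(f"xml_dir does not exist: {xml_dir}")
    # Only Grobid TEI outputs count; other files in the folder are ignored.
    files = sorted(directory.glob("*.tei.xml"))
    if not files:
        raise ValueError(f"No *.tei.xml files found in {xml_dir}")
    return files
```

Called right after the `xml_dir` assignment, `list_tei_files(xml_dir)` would be expected to return the twelve TEI files listed in this commit.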
4 changes: 3 additions & 1 deletion mind_palace/extract.py
@@ -7,7 +7,9 @@
 def _gen_document_dict(file_path) -> dict[str, TextNode]:
     xml = docs.load_tei_xml(file_path)
     doi = xml.header.doi
-    assert doi is not None
+    if doi is None:
+        print(f"DOI is None for {file_path}. Replacing with title instead.")
+        doi = xml.header.title

     try:
         title_node = docs.title(xml, doi)
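The change above replaces the hard `assert` with a fallback to the paper title when Grobid finds no DOI. A standalone sketch of that decision is below, assuming the DOI and title arrive as optional strings (as `xml.header.doi` and `xml.header.title` appear to). `choose_document_id` is a hypothetical helper; unlike the committed code, it also treats blank strings as missing and raises when neither value is usable, so `None` is never passed on as an identifier. It logs through `logging` rather than `print`.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def choose_document_id(doi: Optional[str], title: Optional[str], file_path: str) -> str:
    """Return the DOI if present, otherwise fall back to the title.

    Hypothetical helper: blank strings count as missing, and ValueError is
    raised when neither a DOI nor a title is available.
    """
    if doi is not None and doi.strip():
        return doi.strip()
    # Same fallback as the commit, logged instead of printed.
    logger.warning("No DOI for %s; falling back to the title.", file_path)
    if title is not None and title.strip():
        return title.strip()
    raise ValueError(f"Neither a DOI nor a title was found for {file_path}")
```

If the identifier ends up as a lookup key, keep in mind that titles are not guaranteed to be unique across papers the way DOIs are; the committed fallback shares that tradeoff.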
5 changes: 5 additions & 0 deletions resources/xmls/README.md
@@ -0,0 +1,5 @@
These XMLs are generated from parsing the original PDFs with Grobid. Follow the official installation instructions [to build](https://grobid.readthedocs.io/en/latest/Install-Grobid/) and [run Grobid in command line batch mode](https://grobid.readthedocs.io/en/latest/Grobid-batch/) using this command:

```sh
$ java -Xmx4G -Djava.library.path=grobid-home/lib/lin-64:grobid-home/lib/lin-64/jep -jar grobid-core/build/libs/grobid-core-0.7.3-onejar.jar -gH grobid-home -dIn ~/Downloads/neural_decoding_papers -dOut ../mind-palace/resources/xmls/ -exe processFullText -ignoreAssets
```
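To see what a generated file holds before indexing it, for instance which of the papers lack a DOI, the sketch below reads the DOI and title from a Grobid TEI header using only the standard library. It assumes Grobid's usual layout (title under `fileDesc/titleStmt/title`, DOI as an `idno` element with `type="DOI"` somewhere in the header) and the TEI namespace `http://www.tei-c.org/ns/1.0`. `read_tei_header` is a hypothetical helper, separate from the repository's own `docs.load_tei_xml`.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Grobid writes TEI elements in this default namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def _text_of(el: Optional[ET.Element]) -> Optional[str]:
    """Join all text inside an element; None if the element or its text is missing."""
    if el is None:
        return None
    text = "".join(el.itertext()).strip()
    return text or None


def read_tei_header(xml_text: str) -> tuple[Optional[str], Optional[str]]:
    """Return (doi, title) from a Grobid TEI document, using None for missing fields."""
    root = ET.fromstring(xml_text)
    header = root.find("tei:teiHeader", TEI_NS)
    if header is None:
        return None, None
    title_el = header.find("tei:fileDesc/tei:titleStmt/tei:title", TEI_NS)
    # Search only inside the header so DOIs from the reference list are not picked up.
    doi_el = header.find(".//tei:idno[@type='DOI']", TEI_NS)
    return _text_of(doi_el), _text_of(title_el)
```

For a file on disk, pass its text, e.g. `read_tei_header(Path(p).read_text(encoding="utf-8"))` with `Path` from `pathlib`.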
1,079 changes: 1,079 additions & 0 deletions resources/xmls/dennis-oct-10/1705.00857.tei.xml
2,487 changes: 2,487 additions & 0 deletions resources/xmls/dennis-oct-10/1802.06441.tei.xml
1,976 changes: 1,976 additions & 0 deletions resources/xmls/dennis-oct-10/2208.01178.tei.xml
791 changes: 791 additions & 0 deletions resources/xmls/dennis-oct-10/2304.07362.tei.xml
3,212 changes: 3,212 additions & 0 deletions resources/xmls/dennis-oct-10/2305.15767.tei.xml
702 changes: 702 additions & 0 deletions resources/xmls/dennis-oct-10/PhysRevA.102.042411.tei.xml
996 changes: 996 additions & 0 deletions resources/xmls/dennis-oct-10/PhysRevLett.119.030501-accepted.tei.xml
1,056 changes: 1,056 additions & 0 deletions resources/xmls/dennis-oct-10/PhysRevLett.128.080505.tei.xml
2,354 changes: 2,354 additions & 0 deletions resources/xmls/dennis-oct-10/PhysRevResearch.2.023230.tei.xml
1,180 changes: 1,180 additions & 0 deletions resources/xmls/dennis-oct-10/q-2018-01-29-48.tei.xml
1,355 changes: 1,355 additions & 0 deletions resources/xmls/dennis-oct-10/q-2019-09-02-183.tei.xml
610 changes: 610 additions & 0 deletions resources/xmls/dennis-oct-10/s41598-017-11266-1.tei.xml

Large diffs are not rendered by default.
