Many useful tools for computer scientists!
Join our Discord if you need help.
- Full list of available datasets: HAMU Dataset
- How to iterate over all data
from hamu_tool.dataset import DataLoader
loader = DataLoader.load('beir/arguana')
for doc in loader.get_docs():
    print(doc.id, doc.text, doc.title)
    break  # show only the first document
for query in loader.get_queries():
    print(query.id, query.title)
    break  # show only the first query
for qrel in loader.get_qrels('[mode]'):
    print(qrel.qid, qrel.did, qrel.score)
    break  # show only the first relevance judgment
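When evaluating a retriever it is often handy to have the judgments grouped by query rather than as a flat stream. The sketch below shows one way to do that. It assumes only the `qid`, `did` and `score` attributes printed above; the `Qrel` namedtuple and the `group_qrels` helper are hypothetical stand-ins so the example runs without the library, not part of hamu_tool.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the objects yielded by loader.get_qrels(...).
# Only the qid, did and score attributes are assumed.
Qrel = namedtuple("Qrel", ["qid", "did", "score"])


def group_qrels(qrels):
    """Collect relevance judgments into a nested dict: {qid: {did: score}}."""
    grouped = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.qid][qrel.did] = qrel.score
    return dict(grouped)


# Made-up judgments, used only to exercise the helper.
sample = [Qrel("q1", "d1", 1), Qrel("q1", "d2", 0), Qrel("q2", "d3", 1)]
grouped = group_qrels(sample)
print(grouped["q1"])
```

With a real loader, the same helper would presumably be called as `group_qrels(loader.get_qrels('[mode]'))`.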
- How to fetch a single item
from hamu_tool.dataset import DataLoader
loader = DataLoader.load('beir/arguana')
doc = loader.get_doc('[did]')
print(doc)
query = loader.get_query('[qid]')
print(query)
qrel = loader.get_qrel('[mode]', '[qid]')
print(qrel)
- How to build and load an index
from hamu_tool.utils import CorpusReader
CorpusReader.build_index('data_file.jsonl', 'index_file.idx')
reader = CorpusReader('index_file.idx')
print(reader[0]) # get document by index
print(reader['[did]']) # get document by id
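To give an intuition for why an index makes both lookups cheap, here is a minimal sketch of one common approach: recording the byte offset of every line in the JSONL file. This is an illustration only, not how `CorpusReader` is actually implemented; the `build_offsets` and `read_at` helpers are hypothetical, and the sample file is written to a temporary directory.

```python
import json
import os
import tempfile


def build_offsets(jsonl_path):
    """Return (byte offset of each line, map from document id to line position)."""
    offsets, id_to_pos = [], {}
    with open(jsonl_path, "rb") as f:
        while True:
            start = f.tell()
            line = f.readline()
            if not line:
                break  # end of file
            if not line.strip():
                continue  # skip blank lines
            id_to_pos[json.loads(line)["id"]] = len(offsets)
            offsets.append(start)
    return offsets, id_to_pos


def read_at(jsonl_path, offset):
    """Seek straight to a recorded offset and parse that single line."""
    with open(jsonl_path, "rb") as f:
        f.seek(offset)
        return json.loads(f.readline())


# Write a tiny sample corpus so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "data_file.jsonl")
with open(path, "w", encoding="utf-8") as f:
    f.write('{"id": "doc_1", "text": "doc_text_1"}\n')
    f.write('{"id": "doc_2", "text": "doc_text_2"}\n')

offsets, id_to_pos = build_offsets(path)
by_position = read_at(path, offsets[1])             # lookup by position
by_id = read_at(path, offsets[id_to_pos["doc_1"]])  # lookup by id
print(by_position, by_id)
```

Because each lookup seeks directly to one line, the corpus never has to be loaded into memory in full.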
- Format of data_file.jsonl: each line of data_file.jsonl should be a JSON object (dictionary), for example:
{"id": "doc_1", "text": "doc_text_1"}
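A file in this format can be produced with the standard `json` module. The sketch below is one way to do it, assuming documents are plain dictionaries with the `id` and `text` keys from the example line; the `write_jsonl` helper and the sample documents are hypothetical, and the file is written to a temporary directory.

```python
import json
import os
import tempfile

# Made-up documents following the {"id": ..., "text": ...} layout.
docs = [
    {"id": "doc_1", "text": "doc_text_1"},
    {"id": "doc_2", "text": "doc_text_2"},
]


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            # json.dumps escapes newlines inside strings, so each record stays on one line.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


out_path = os.path.join(tempfile.mkdtemp(), "data_file.jsonl")
write_jsonl(docs, out_path)

# Read the file back to confirm that every line parses on its own.
with open(out_path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
print(loaded)
```

The resulting path can then be passed to `CorpusReader.build_index` in place of `'data_file.jsonl'`.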