waylight3/hamu-tool

hamu-tools

Many useful tools for computer scientists!

Please join our Discord if you need any help.

Dataset

DataLoader

  • Full list of available datasets: HAMU Dataset

  • How to iterate over all data

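from hamu_tool.dataset import DataLoader

# Load a dataset by name
loader = DataLoader.load('beir/arguana')

# Print the first document, then stop
for doc in loader.get_docs():
    print(doc.id, doc.text, doc.title)
    break

# Print the first query, then stop
for query in loader.get_queries():
    print(query.id, query.title)
    break

# '[mode]' is a placeholder; replace it with the qrels mode you need
for qrel in loader.get_qrels('[mode]'):
    print(qrel.qid, qrel.did, qrel.score)
    break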
  • How to fetch a single item
from hamu_tool.dataset import DataLoader

loader = DataLoader.load('beir/arguana')

doc = loader.get_doc('[did]')
print(doc)

query = loader.get_query('[qid]')
print(query)

qrel = loader.get_qrel('[mode]', '[qid]')
print(qrel)
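
A common next step after reading qrels is to group them by query. The block below is a minimal sketch of that idea, not part of hamu_tool: `group_qrels` and the `Qrel` stand-in are hypothetical names, and the only assumption about the real records is that they expose the `qid`, `did`, and `score` attributes printed in the iteration example above. The stand-in lets the sketch run without downloading a dataset.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the records yielded by loader.get_qrels(...).
# Only the qid / did / score attributes from the snippets above are assumed.
Qrel = namedtuple("Qrel", ["qid", "did", "score"])


def group_qrels(qrels):
    """Map each query id to a {document id: score} dictionary."""
    grouped = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.qid][qrel.did] = qrel.score
    return dict(grouped)


# Made-up sample data, for illustration only.
sample = [Qrel("q1", "d1", 1), Qrel("q1", "d2", 0), Qrel("q2", "d3", 1)]
relevance = group_qrels(sample)
print(relevance["q1"])
```

With a real loader, the same function should accept the iterable returned by `loader.get_qrels('[mode]')`, provided the records behave as assumed.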

Utils

CorpusReader

  • How to build and load an index
from hamu_tool.utils import CorpusReader

CorpusReader.build_index('data_file.jsonl', 'index_file.idx')
reader = CorpusReader('index_file.idx')
print(reader[0]) # get document by index
print(reader['[did]']) # get document by id
  • Format of data_file.jsonl: each line must be a single JSON object (dictionary), for example:
{"id": "doc_1", "text": "doc_text_1"}
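
To produce a file in this layout, something like the following sketch may help. It uses only the Python standard library; `write_jsonl` and `read_jsonl` are hypothetical helper names, not hamu_tool functions, and the check for `id` and `text` keys is an assumption based on the example line above.

```python
import json
import os
import tempfile


def write_jsonl(docs, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            f.write(json.dumps(doc, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the file back, checking that every line has 'id' and 'text'."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            record = json.loads(line)  # raises ValueError on malformed JSON
            if not isinstance(record, dict) or not {"id", "text"} <= record.keys():
                raise ValueError(f"line {line_no}: expected an object with 'id' and 'text'")
            records.append(record)
    return records


docs = [
    {"id": "doc_1", "text": "doc_text_1"},
    {"id": "doc_2", "text": "doc_text_2"},
]
path = os.path.join(tempfile.mkdtemp(), "data_file.jsonl")
write_jsonl(docs, path)
records = read_jsonl(path)
print(len(records), "documents written to", path)
```

The resulting file can then be passed to `CorpusReader.build_index` as shown above.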
