Expand Scribe-Data Hindi queries #212

Open
5 of 8 tasks
andrewtavis opened this issue Oct 3, 2024 · 19 comments
Assignees
Labels
feature New feature or request good first issue Good for newcomers hacktoberfest Included as a part of Hacktoberfest help wanted Extra attention is needed

Comments

@andrewtavis
Member

andrewtavis commented Oct 3, 2024

Terms

Description

This issue would expand the queries for Hindi that are found in src/scribe_data/language_data_extraction/Hindi. As of now the nouns query is likely fairly good, but we need to add verb conjugations to the verbs query as is done in other languages :)

Data types to include:

  • Nouns
  • Verbs
  • Adjectives
  • Adverbs
  • Prepositions
  • Emoji keywords

Contribution

Happy to support with this and answer any questions that come up! Also happy to review when it's time 😊

@andrewtavis andrewtavis added feature New feature or request help wanted Extra attention is needed good first issue Good for newcomers hacktoberfest Included as a part of Hacktoberfest labels Oct 3, 2024
@KesharwaniArpita
Contributor

Hi @andrewtavis, while I am working on this, could you assign this issue to me so there's no confusion later?

@andrewtavis
Member Author

Yes definitely, @KesharwaniArpita :) Thanks for your willingness to help!

@KesharwaniArpita
Contributor

Hi @andrewtavis, I’ve raised a PR for expanding the Hindi verb extraction query. Initially, I was working with verb tenses, but after digging into the data on Wikidata through lexeme IDs, I realized that for Hindi, the available data focuses more on "कारक" (Kārak) forms (which express the relationships between words in a sentence) rather than tenses. I believe the same is going to be the case with other languages (like Urdu and Bengali).

So, I shifted gears and built a modified query based on these Kārak forms—things like direct case, gerund, intransitive phase, and more. I’ve tested the updated query, and saved the results too.

Would love to get your thoughts on this approach and any suggestions you might have!

Also, can I check out the other languages?
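
As a rough illustration of the form-based approach described above, the sketch below generates one OPTIONAL clause per form. It is not the query from the PR: the helper name `optional_form_block` and the `Q_...` IDs are hypothetical placeholders that would need to be swapped for the real Wikidata QIDs of each grammatical feature. The predicates (`ontolex:lexicalForm`, `ontolex:representation`, `wikibase:grammaticalFeature`) are the standard ones for Wikidata lexeme forms.

```python
def optional_form_block(var_name: str, feature_qids: list[str], lemma_lang: str) -> str:
    """Return an OPTIONAL clause selecting the form that has all given features."""
    form_var = f"?{var_name}Form"
    # Several features on one form are written as a comma-separated object list.
    features = ", ".join(f"wd:{qid}" for qid in feature_qids)
    return "\n".join(
        [
            "  OPTIONAL {",
            f"    ?lexeme ontolex:lexicalForm {form_var} .",
            f"    {form_var} ontolex:representation ?{var_name} ;",
            f"      wikibase:grammaticalFeature {features} .",
            f'    FILTER(lang(?{var_name}) = "{lemma_lang}")',
            "  }",
        ]
    )


# Placeholder IDs (assumptions): replace with the real QID of each feature.
print(optional_form_block("directCase", ["Q_DIRECT_CASE"], "hi"))
print(optional_form_block("gerund", ["Q_GERUND"], "hi"))
```

Keeping each form in its own OPTIONAL clause means a verb that lacks one of the forms on Wikidata still appears in the results, just with that column empty.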

@andrewtavis
Member Author

Thanks so much for your hard work, @KesharwaniArpita! Do you want to put a note about this in the Urdu issue and check Bengali as well? I'm pretty sure that Bengali verbs are modeled the way that they should be, as the Wikidata Bengali community is very good 😊

By all means check out other languages as you already have!

We'll get to the review in the coming days :)

@KesharwaniArpita
Contributor

KesharwaniArpita commented Oct 4, 2024

@andrewtavis Sure. Thank you!!!! Should I raise the issue for Bengali?

@andrewtavis
Member Author

The Bengali verbs were already checked a while ago, so maybe you can check that query and see if you'd change/expand it in any way :)

CC @mhmohona, who originally wrote the Bengali query :)

@KesharwaniArpita
Contributor

Sure!!! I'll look into that.

@SethiShreya
Contributor

I want to participate in this issue too. Can I do that? I am new to SPARQL but could collaborate and contribute. Thanks! 😊

@KesharwaniArpita
Contributor

KesharwaniArpita commented Oct 5, 2024

Hi @SethiShreya! 😊 I'd love to collaborate. I'm new to SPARQL too. Let's work together! Looking forward to your thoughts. Thanks! 🙌

@SethiShreya
Contributor

Yeah, that would be great. Let's connect sometime to collaborate further.

@SethiShreya
Contributor

It would be helpful if you could tell me how much progress has been made on this issue and which features still need to be added.

@KesharwaniArpita
Contributor

So far, I have worked on verbs (enhanced the query here) and adjectives (created the query here) in Hindi. You can check them out. I was just looking at query_nouns.sparql; maybe we can extend the query to include gender too. As of now, it only includes number.
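
A minimal sketch of what adding gender to a nouns query could look like, written as a small Python helper that assembles the SPARQL text. This is an illustration under assumptions, not the contents of query_nouns.sparql: the helper name `build_noun_query` is hypothetical, and the language QID passed in the usage line (Q11051, Hindustani) should be checked against the existing query. P5185 is Wikidata's "grammatical gender" property and Q1084 is "noun".

```python
def build_noun_query(language_qid: str, lemma_lang: str, include_gender: bool = True) -> str:
    """Build a SPARQL query for noun lexemes, optionally selecting grammatical gender."""
    select_lines = [
        '  (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)',
        "  ?noun",
    ]
    where_lines = [
        f"  ?lexeme dct:language wd:{language_qid} ;",
        "    wikibase:lexicalCategory wd:Q1084 ;  # noun",
        "    wikibase:lemma ?noun .",
        f'  FILTER(lang(?noun) = "{lemma_lang}")',
    ]
    if include_gender:
        select_lines.append("  ?gender")
        where_lines += [
            "  # OPTIONAL keeps nouns that have no gender statement yet.",
            "  OPTIONAL {",
            "    ?lexeme wdt:P5185 ?nounGender .  # P5185: grammatical gender",
            "    ?nounGender rdfs:label ?gender .",
            '    FILTER(lang(?gender) = "en")',
            "  }",
        ]
    return "\n".join(["SELECT", *select_lines, "WHERE {", *where_lines, "}"])


# Assumption: Hindi lexemes are filed under Hindustani (Q11051) with "hi" lemmas.
print(build_noun_query("Q11051", "hi"))
```

The gender label is fetched in English here only to keep the output readable; the filter language is a choice, not a requirement.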

@SethiShreya
Contributor

Thanks for sharing, @KesharwaniArpita. I will look into it.

@SethiShreya
Contributor

As discussed with @KesharwaniArpita, there are things that we can expand for the Hindi language: gender for nouns, adjectives, prepositions, adverbs, etc. We have discussed collaborating, so I will be working on gender for nouns and she on adjectives. Is that correct, @andrewtavis?

@andrewtavis
Member Author

Sounds great, @SethiShreya! Thank you both for the collaboration and coordination!

@SethiShreya
Contributor

@KesharwaniArpita, I reviewed the files for Hindi nouns, and gender is already done, right?

@SethiShreya
Contributor

SethiShreya commented Oct 6, 2024

@andrewtavis I want to work on a query for the Punjabi language (an Indian language). Can you please open an issue for that?

@KesharwaniArpita
Contributor

@SethiShreya, you can try working on conjunctions or prepositions and their cases, if you'd like.

@andrewtavis
Member Author

Just added a list of data types that we want to include to this issue :) Have marked those that are already done or have PRs open, and we can work on the others 😊 If the data type can't work, then we can move to the others and open up specific issues later :)

Projects
Status: Todo

3 participants