Expand Scribe-Data Hindi verb queries #240

Open
wants to merge 20 commits into
base: main

KesharwaniArpita
Contributor

@KesharwaniArpita KesharwaniArpita commented Oct 4, 2024

Contributor checklist


Description

This PR enhances the Hindi verb extraction query by incorporating various grammatical forms based on "कारक" (Kārak) categories. The updated query now retrieves verb forms corresponding to specific cases and phases, including:

  • Direct Case: Extracts verbs in their direct forms.
  • Gerund: Captures gerund forms as verbal nouns.
  • Intransitive Phase: Includes verbs that do not take a direct object.
  • Basic Phase: Retrieves the infinitive form of verbs.
  • Conjunctive Participle: Gathers forms used to connect clauses.
  • Adverbial: Captures adverbial forms of the verbs.
  • Absolute Construction: Retrieves forms indicative of absolute constructions in Hindi.
  • Oblique Case: Includes both accusative and ergative forms.
  • Additive Phase: Gathers forms indicative of the additive phase.
The enhancements aim to improve the comprehensiveness and accuracy of verb form retrieval, allowing for better representation of Hindi verb conjugations based on syntactic roles.
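
To illustrate the general shape of such a query, here is a minimal Python sketch that assembles a SPARQL string with one OPTIONAL block per grammatical feature. It is not the query from this PR: `build_verb_query`, the variable names, and the placeholder IDs (`Q_LANGUAGE`, `Q_DIRECT_CASE`, `Q_OBLIQUE_CASE`) are hypothetical and would need to be replaced with the real Wikidata item IDs. Only Q24905 (the Wikidata item for "verb") and the `ontolex:`, `wikibase:`, and `dct:` predicates are taken from the Wikidata lexeme data model.

```python
VERB_QID = "Q24905"  # Wikidata item for "verb"


def build_verb_query(language_qid: str, features: dict[str, str]) -> str:
    """Return a SPARQL query with one OPTIONAL form lookup per feature.

    `features` maps a result variable name (e.g. "directCase") to the QID
    of a grammatical feature. All QIDs are supplied by the caller; this
    sketch does not know the real ones used in the PR.
    """
    select_vars = " ".join(f"?{name}" for name in features)
    optional_blocks = [
        "  OPTIONAL {\n"
        f"    ?lexeme ontolex:lexicalForm ?{name}Form .\n"
        f"    ?{name}Form ontolex:representation ?{name} ;\n"
        f"      wikibase:grammaticalFeature wd:{qid} .\n"
        "  }"
        for name, qid in features.items()
    ]
    header = (
        f"SELECT ?lexemeID ?infinitive {select_vars}\n"
        "WHERE {\n"
        f"  ?lexeme dct:language wd:{language_qid} ;\n"
        f"    wikibase:lexicalCategory wd:{VERB_QID} ;\n"
        "    wikibase:lemma ?infinitive .\n"
        '  BIND(STRAFTER(STR(?lexeme), "entity/") AS ?lexemeID)\n'
    )
    return header + "\n".join(optional_blocks) + "\n}"


if __name__ == "__main__":
    # Placeholder IDs only: substitute the actual Wikidata items.
    print(
        build_verb_query(
            "Q_LANGUAGE",
            {"directCase": "Q_DIRECT_CASE", "obliqueCase": "Q_OBLIQUE_CASE"},
        )
    )
```

Wrapping each form in its own OPTIONAL block keeps a verb in the results even when some of its forms are missing from Wikidata.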

The query has been tested on the Wikidata Query Service, and the generated outputs are stored at https://github.com/KesharwaniArpita/Scribe-Data/tree/Hindi/scribe_data_json_export/Hindi/ along with the outputs from the nouns query.

Would love to hear what everyone thinks! Also, if any contributors have time, feel free to check it out and share feedback. Thanks! 😊

Related issue


github-actions bot commented Oct 4, 2024

Thank you for the pull request!

The Scribe team will do our best to address your contribution as soon as we can. The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)

If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Data rooms once you're in. Also consider joining our bi-weekly Saturday dev syncs. It'd be great to have you!

Maintainer checklist

  • The commit messages for the remote branch should be checked to make sure the contributor's email is set up correctly so that they receive credit for their contribution

    • The contributor's name and icon in remote commits should be the same as what appears in the PR
    • If there's a mismatch, the contributor needs to make sure that the email they use for GitHub matches what they have for git config user.email in their local Scribe-Data repo
  • The linting and formatting workflow within the PR checks do not indicate new errors in the files changed

  • The CHANGELOG has been updated with a description of the changes for the upcoming release and the corresponding issue (if necessary)

@KesharwaniArpita KesharwaniArpita changed the title from "Hindi" to "Expand Scribe-Data Hindi verb queries" Oct 4, 2024
KesharwaniArpita and others added 7 commits October 4, 2024 11:29
This SPARQL query extracts Hindi adjectives (lexeme) along with their associated grammatical features, specifically the direct case, singulative numeral, collective numeral, and oblique case, from Wikidata. It groups the results to ensure that multiple values for these features are concatenated into a single result for each lexeme. 

This query is loosely based on the [query](scribe-org@3acbe1a#diff-845bdab161353ea336224aeb6ed0234768f56202c3f5fe24c89cd2ff05c040f2) written by @Ekikereabasi-Nk, who was working on Urdu.
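
The grouping described in this commit message can be pictured outside SPARQL. The sketch below is a Python analogue of what `GROUP BY` with `GROUP_CONCAT(DISTINCT ...)` does to the result rows, not the query itself; `merge_rows`, the row layout, and the sample values are hypothetical.

```python
from collections import defaultdict


def merge_rows(rows, key="lexemeID", sep=", "):
    """Collapse rows sharing `key` into one row, joining distinct values."""
    merged = defaultdict(lambda: defaultdict(list))
    for row in rows:
        fields = merged[row[key]]  # keep the lexeme even if it has no values
        for field, value in row.items():
            if field == key or value is None:
                continue
            if value not in fields[field]:
                fields[field].append(value)
    return [
        {key: lexeme_id, **{f: sep.join(v) for f, v in fields.items()}}
        for lexeme_id, fields in merged.items()
    ]


if __name__ == "__main__":
    # Made-up rows: one lexeme with two direct-case forms, one with a single form.
    rows = [
        {"lexemeID": "L1", "directCase": "form_a"},
        {"lexemeID": "L1", "directCase": "form_b"},
        {"lexemeID": "L2", "directCase": "form_c"},
    ]
    print(merge_rows(rows))
```

With the sample rows, the two `L1` rows become a single row whose `directCase` value is the two forms joined by the separator.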
@andrewtavis
Member

Minor edits in my commit, @KesharwaniArpita! We don't need the JSONs included (no stress that they were). The first verbs query isn't finishing, so do we want to split it up more?

Also, really appreciate your interest in citing your program participants! It's a great instinct :) Scribe-Data is written under a permissive license, so we don't need to cite people directly. For now, let's not include these lines so the code is as lean as can be :)

@KesharwaniArpita
Contributor Author

Thank you @andrewtavis for your review and commit🥺. I'll keep this in mind from now on.
I have shifted a case from query_verbs_1.sparql to query_verbs_2.sparql. It should work fine now.
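
One way to picture the split discussed here is to divide the list of requested forms into smaller groups and run one query per group, so that each query has less work to do and is more likely to finish. This is a minimal sketch under that assumption: `split_features`, the chunk size, and the feature names are hypothetical and do not reflect the actual contents of query_verbs_1.sparql or query_verbs_2.sparql.

```python
def split_features(features, max_per_query):
    """Split a list of feature names into consecutive chunks."""
    if max_per_query < 1:
        raise ValueError("max_per_query must be at least 1")
    return [
        features[i : i + max_per_query]
        for i in range(0, len(features), max_per_query)
    ]


if __name__ == "__main__":
    # Placeholder feature names, grouped three per query file.
    features = ["feature_a", "feature_b", "feature_c", "feature_d", "feature_e"]
    for n, chunk in enumerate(split_features(features, 3), start=1):
        print(f"query_verbs_{n}.sparql -> {chunk}")
```

The results of the separate queries can then be joined on the lexeme ID after they are downloaded.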

@andrewtavis andrewtavis self-requested a review October 5, 2024 18:43
@andrewtavis andrewtavis added the hacktoberfest-accepted Accepted as a part of Hacktoberfest label Oct 7, 2024
Labels
hacktoberfest-accepted Accepted as a part of Hacktoberfest
2 participants