
pooja-gera/hackYourWay-BITS


Team G-PKTSS's approach to generating family trees from the available electoral rolls is broken down below.

We have written scripts specific to different states. More can be added easily, but in the interest of time we were only able to cover 14 states: Assam, Chhattisgarh, Delhi, Goa, Haryana, Jammu and Kashmir, Jharkhand, Manipur, Meghalaya, Mizoram, Nagaland, Punjab, Sikkim and Telangana. These scripts fetch the PDFs of all parts of all assembly constituencies of these 14 states.

We then use Tesseract OCR to extract the data from these PDFs, since the PDFs are not searchable: they contain images rather than text. After extracting the text, we separate out the information of interest: Name, House Number, Father's Name, Mother's Name and Husband's Name. We generate a separate CSV file for every state, and these files serve as our master database. To create the family tree, the user has to enter Name, House Number, City and Assembly Constituency Name. We use all four parameters to ensure there is no ambiguity. Based on these details, we create nodes for all the people mentioned in the row pertaining to an entry.
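As an illustration of the field-extraction step, here is a minimal sketch that turns the OCR text of one voter entry into a record with the five fields of interest. The label spellings ("Name", "Father's Name", "House Number" and so on), the `VoterRecord` type and the `parseVoterBlock` function are assumptions made for this example; the actual rolls vary by state and the real parsing code may look quite different.

```typescript
// One row of the per-state CSV; null marks a field the entry does not list.
interface VoterRecord {
  name: string | null;
  houseNumber: string | null;
  fatherName: string | null;
  motherName: string | null;
  husbandName: string | null;
}

// Hypothetical label patterns (assumed, not taken from the real rolls).
// Each pattern is anchored at the line start, so "Father's Name" never
// matches the plain "Name" pattern.
const LABELS: Array<[keyof VoterRecord, RegExp]> = [
  ["fatherName", /^Father['’]?s?\s*Name\s*[:;-]\s*(.+)$/i],
  ["motherName", /^Mother['’]?s?\s*Name\s*[:;-]\s*(.+)$/i],
  ["husbandName", /^Husband['’]?s?\s*Name\s*[:;-]\s*(.+)$/i],
  ["houseNumber", /^House\s*(?:Number|No\.?)\s*[:;-]\s*(.+)$/i],
  ["name", /^Name\s*[:;-]\s*(.+)$/i],
];

// Parse the OCR output of a single entry, one label per line.
function parseVoterBlock(ocrText: string): VoterRecord {
  const record: VoterRecord = {
    name: null,
    houseNumber: null,
    fatherName: null,
    motherName: null,
    husbandName: null,
  };
  for (const rawLine of ocrText.split(/\r?\n/)) {
    const line = rawLine.trim();
    for (const [field, pattern] of LABELS) {
      const value = pattern.exec(line)?.[1];
      if (value !== undefined) {
        record[field] = value.trim();
        break; // a line carries at most one label
      }
    }
  }
  return record;
}
```

Lines that carry none of the assumed labels (age, gender and so on) are simply ignored, and missing relations stay null, which matches the NULL values described for the CSV rows.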

We believe our approach is efficient because we do not run OCR on the entire image; instead, we break the image into smaller sections. This lets us extract the details of one particular person at a time and prevents the information of two different people from getting mixed.
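The sectioning idea can be sketched as follows, assuming the entries on a page sit in a regular grid. The grid dimensions, the `Rect` type and the `splitIntoCells` function are hypothetical; real pages also have headers and margins that would need to be trimmed first, and each resulting rectangle would then be cropped and passed to Tesseract separately (not shown here).

```typescript
// A crop rectangle in pixel coordinates.
interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Split a page of pageWidth x pageHeight pixels into rows x cols boxes,
// listed row by row. Boundaries are floored so that adjacent cells share
// edges exactly and the cells cover the whole page without overlap.
function splitIntoCells(
  pageWidth: number,
  pageHeight: number,
  rows: number,
  cols: number,
): Rect[] {
  if (!Number.isInteger(rows) || !Number.isInteger(cols) || rows <= 0 || cols <= 0) {
    throw new RangeError("rows and cols must be positive integers");
  }
  const cells: Rect[] = [];
  for (let r = 0; r < rows; r++) {
    const top = Math.floor((r * pageHeight) / rows);
    const bottom = Math.floor(((r + 1) * pageHeight) / rows);
    for (let c = 0; c < cols; c++) {
      const left = Math.floor((c * pageWidth) / cols);
      const right = Math.floor(((c + 1) * pageWidth) / cols);
      cells.push({ left, top, width: right - left, height: bottom - top });
    }
  }
  return cells;
}
```

For example, `splitIntoCells(900, 1000, 10, 3)` yields 30 rectangles of 300 by 100 pixels, one per assumed voter entry.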

We have relied on the assumption that people living in the same house are part of the same family tree. For now, family members living in different houses and the maternal side of the family are out of scope, because the dataset we have gives no way to identify them without more information.

For every row pertaining to an entry, we create nodes for all the people mentioned in it. For example, suppose an entry exists for "Ashok Kumar" whose mother's name and husband's name are NULL but whose father's name is "Sukha Ram". We have to make sure that "Sukha Ram" is also added as a node to our hashmap, which stores the names and the addresses of these nodes. We loop over the people living in the same household and create all the necessary households. We then identify six major relations: self, spouse, children, siblings, parents and grandparents. Each Node contains the name of the person, a vector of children, a vector of siblings, a vector of parents, a vector of grandparents and the address of the spouse node.
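The node creation and relation linking described above can be sketched in TypeScript as follows. This is not the code from familyTree.cpp; the `HouseholdRow` type, the helper names and the two-pass structure are assumptions for illustration. Nodes are keyed by name, so two people with the same name in one house would collide in this simplified version.

```typescript
// Mirrors the Node described above: arrays stand in for vectors and
// object references stand in for node addresses.
interface PersonNode {
  name: string;
  spouse: PersonNode | null;
  children: PersonNode[];
  siblings: PersonNode[];
  parents: PersonNode[];
  grandparents: PersonNode[];
}

// Hypothetical shape of one CSV row for a single house.
interface HouseholdRow {
  name: string;
  fatherName: string | null;
  motherName: string | null;
  husbandName: string | null;
}

// Return the node for a name, creating it on first sight. This is how a
// father such as "Sukha Ram" gets a node even without a row of his own.
function getOrCreate(nodes: Map<string, PersonNode>, name: string): PersonNode {
  let node = nodes.get(name);
  if (!node) {
    node = { name, spouse: null, children: [], siblings: [], parents: [], grandparents: [] };
    nodes.set(name, node);
  }
  return node;
}

function addUnique(list: PersonNode[], node: PersonNode): void {
  if (list.indexOf(node) === -1) list.push(node);
}

function buildHousehold(rows: HouseholdRow[]): Map<string, PersonNode> {
  const nodes = new Map<string, PersonNode>();
  // Pass 1: links stated directly in the rows (parents, children, spouse).
  for (const row of rows) {
    const person = getOrCreate(nodes, row.name);
    for (const parentName of [row.fatherName, row.motherName]) {
      if (parentName === null) continue;
      const parent = getOrCreate(nodes, parentName);
      addUnique(person.parents, parent);
      addUnique(parent.children, person);
    }
    if (row.husbandName !== null) {
      const husband = getOrCreate(nodes, row.husbandName);
      person.spouse = husband;
      husband.spouse = person;
    }
  }
  // Pass 2: derived links (siblings share a parent; grandparents are
  // the parents of a parent).
  nodes.forEach((person) => {
    for (const parent of person.parents) {
      for (const sibling of parent.children) {
        if (sibling !== person) addUnique(person.siblings, sibling);
      }
      for (const grandparent of parent.parents) {
        addUnique(person.grandparents, grandparent);
      }
    }
  });
  return nodes;
}
```

Running the derived relations as a second pass means the order of rows within a household does not matter: every direct link exists before siblings and grandparents are computed.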

All of this is connected in the form of a graph, and the output is generated in JSON format, outlining each relation and the person it refers to. You can find the logic for this in familyTree.cpp and src/index.ts. We have also used Express, Node.js, TypeScript, Prisma ORM and PlanetScale. We have created an Express server that talks to our PlanetScale database, which holds the data from the generated CSVs. The relationship logic needed to display the family tree is also implemented within the Node.js server. The code runs in a Docker container on an Oracle virtual machine, with nginx as the reverse proxy.
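To make the JSON output step concrete, here is a small sketch of how a node of the graph could be flattened into a list of relations. The output shape (`person` plus a `relations` array) and the function names are assumptions; the actual format produced by src/index.ts may differ. The node type is repeated so that the example stands alone.

```typescript
// Same node shape as described above, repeated so this block is self-contained.
interface PersonNode {
  name: string;
  spouse: PersonNode | null;
  children: PersonNode[];
  siblings: PersonNode[];
  parents: PersonNode[];
  grandparents: PersonNode[];
}

// One line of output: a relation and the name of the related person.
interface RelationEntry {
  relation: string;
  name: string;
}

// Flatten a node to names only. The graph is cyclic (a spouse points back,
// a child points to its parent), so passing the node itself to
// JSON.stringify would throw.
function describeRelations(person: PersonNode): RelationEntry[] {
  const entries: RelationEntry[] = [{ relation: "self", name: person.name }];
  if (person.spouse) {
    entries.push({ relation: "spouse", name: person.spouse.name });
  }
  const groups: Array<[string, PersonNode[]]> = [
    ["child", person.children],
    ["sibling", person.siblings],
    ["parent", person.parents],
    ["grandparent", person.grandparents],
  ];
  for (const [relation, members] of groups) {
    for (const member of members) {
      entries.push({ relation, name: member.name });
    }
  }
  return entries;
}

function toJson(person: PersonNode): string {
  return JSON.stringify(
    { person: person.name, relations: describeRelations(person) },
    null,
    2,
  );
}
```

Emitting names instead of nested nodes keeps the response flat and avoids the circular-reference error, at the cost of the client having to request another person's relations separately.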

We would like to extend our gratitude to the problem setters for presenting us with such a challenging and engaging problem statement.
