Research writing can be technical and difficult to understand, and manually assigning research areas and relevant faculty to proposals is time-consuming and error-prone. The first goal of this project is therefore to build an unsupervised model that categorizes unseen project proposals. To cluster proposals into categories, we explored unsupervised methods such as K-means and topic models, as well as a combination of Latent Dirichlet Allocation (LDA), Bidirectional Encoder Representations from Transformers (BERT), and K-means. The second goal is to recommend papers for a given project proposal based on their similarity to it. We assume that the authors of the most similar papers may be suitable supervisors for the project. We compared several features for representing documents as numerical vectors and used cosine similarity to find matching paper-proposal pairs; TF-IDF features gave the most accurate matches.
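The recommendation step (TF-IDF vectors ranked by cosine similarity) can be illustrated with a minimal, dependency-free sketch. It assumes plain-text abstracts, a simple lowercase regex tokenizer with no stop-word removal, and a smoothed IDF of the form used by default in scikit-learn's `TfidfVectorizer`. The function names `tfidf_vectors`, `cosine`, and `recommend` are hypothetical, and the project's actual preprocessing and libraries are not described in this abstract.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and keep alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)  # raw term counts within this document
        vectors.append({t: cnt * idf[t] for t, cnt in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(proposal, papers, top_k=3):
    """Rank paper abstracts by cosine similarity to the proposal.

    Returns a list of (paper_index, score), highest score first.
    """
    # Fit IDF on the proposal together with the candidate papers.
    vectors = tfidf_vectors([proposal] + papers)
    query, paper_vecs = vectors[0], vectors[1:]
    scored = [(cosine(query, vec), i) for i, vec in enumerate(paper_vecs)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # ties broken by paper index
    return [(i, score) for score, i in scored[:top_k]]


if __name__ == "__main__":
    papers = [
        "Topic models such as LDA for clustering scientific abstracts",
        "Convolutional networks for image segmentation",
        "Recommending supervisors by matching proposal text to publications",
    ]
    proposal = "Clustering research proposals with topic models and recommending publications"
    for idx, score in recommend(proposal, papers, top_k=3):
        print(idx, round(score, 3))
```

In this toy example the image-segmentation abstract shares no terms with the proposal, so its score is zero. The authors of the top-ranked papers would then be the supervisor candidates. For real corpora, scikit-learn's `TfidfVectorizer` and `sklearn.metrics.pairwise.cosine_similarity` cover the same computation.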
Fig 1: Proposed approach for clustering projects
Fig 2: Categories visualization
Fig 3: Examples of recommended papers based on the content of project and paper abstracts