The SEART Data Hub platform makes it easy to create large-scale datasets that can be used either to run empirical MSR studies or to train deep learning models that automate software engineering tasks.
This project contains several modules:
dl4se-model
: A module containing domain model classes used for mapping the relational database structure to the programming environment;
dl4se-analyzer
: A module containing implementations of code analysis operations running on tree-sitter;
dl4se-transformer
: A module containing implementations of code transformation operations running on tree-sitter;
dl4se-crawler
: A standalone crawler application that we use to mine source code from GitHub repositories indexed by GitHub Search;
dl4se-server
: A Spring Boot server application that acts as our platform back-end;
dl4se-spring
: Common Spring Boot configuration and utilities used in both the server and the crawler;
dl4se-website
: A front-end web application written in Vue.
This section details the steps needed to set up and run the project locally on your machine.
The heuristics used to identify test code in Java and Python can be found here and here, respectively. The heuristics used to identify boilerplate code in the same two languages can be found here and here.
If you have ideas for a feature you would like to see implemented or if you have any questions, we encourage you to create a new discussion. By initiating a discussion, you can engage with the community and our team, and we'll respond promptly to address your queries or consider your feature requests.
To report any issues or bugs you encounter, please create a new issue. Detailed information about the problem will help us understand and address it more effectively. We are committed to reviewing and responding to the issues you raise promptly, and to working with you to resolve bugs and improve the overall user experience.