The EAGE has generously made available ALL of the 2023 conference abstracts, which gives this hackathon a huge dataset to work with, and there are many ideas you could pursue with it. First, though, you will need to do some preparation (OCR, and perhaps some labelling). Once everything is in plain text, it is time to apply a language model on top. That is where the challenge comes in: the language in this dataset is highly technical and specific to the geosciences and engineering, so which language model do you choose? And how do you build a way to interact with it? Another idea is to look for patterns in this text dataset: what are the trends in the EAGE 2023 abstracts? What about the additional metadata that comes with the abstracts (authors, companies, etc.)? Does it tell us anything about the state of our science?
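For the trends idea, one possible starting point is to count how many abstracts each term appears in. Below is a minimal sketch, assuming the abstracts have already been OCR'd into plain-text strings. The function names `tokenize` and `top_terms` and the tiny stopword list are hypothetical placeholders, not part of any EAGE tooling. A real analysis would want a proper stopword list and probably n-grams for multi-word terms like "full waveform inversion".

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; swap in a fuller one for real use.
STOPWORDS = {
    "the", "and", "of", "a", "in", "to", "for", "we", "is", "on", "with",
    "this", "by", "are", "from", "an", "as", "that", "be", "using",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and pull out alphabetic tokens (hyphens allowed)."""
    return re.findall(r"[a-z][a-z\-]+", text.lower())


def top_terms(abstracts: list[str], n: int = 10) -> list[tuple[str, int]]:
    """Return the n terms that appear in the most abstracts (document frequency).

    Each term is counted at most once per abstract, so one long abstract that
    repeats a word many times does not dominate the result.
    """
    counts: Counter[str] = Counter()
    for text in abstracts:
        terms = {t for t in tokenize(text) if t not in STOPWORDS}
        counts.update(terms)
    return counts.most_common(n)


if __name__ == "__main__":
    # Toy strings for illustration only, not real EAGE abstracts.
    abstracts = [
        "Full waveform inversion for carbon storage monitoring",
        "Machine learning for fault interpretation",
        "Carbon storage site characterisation with machine learning",
    ]
    print(top_terms(abstracts, 4))
```

To get at trends rather than a single snapshot, the same function could be run separately on each group of abstracts (per session, per company, or per year if past abstracts are added) and the rankings compared.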
I am keen to try out the idea of using the dataset to understand trends. I also have a notebook that crawls Google Scholar for past EAGE abstracts. It needs a short pause between requests so that it conforms with Google's bot policy, which means it might need to run overnight, and even then Google might block its requests. Whatever it does collect could be used alongside what EAGE is offering.
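The "mini pause" idea can be sketched as a generic throttled fetch loop. This is not the notebook described above; it is a minimal sketch, assuming a caller-supplied `fetch` function (for example one built on `urllib.request.urlopen`). The name `polite_crawl` and the default delay values are hypothetical; the actual delays should come from the target site's own policy. It stops at the first network error (which is what a block typically looks like) and returns whatever was collected up to that point.

```python
import random
import time
from typing import Callable, Iterable


def polite_crawl(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    min_delay: float = 5.0,    # hypothetical default; set per the site's policy
    max_jitter: float = 5.0,   # extra random wait so requests are not evenly spaced
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, str]:
    """Fetch URLs one at a time with a pause between requests.

    Stops at the first network error (e.g. being rate-limited or blocked)
    and returns the partial results gathered so far.
    """
    results: dict[str, str] = {}
    for i, url in enumerate(urls):
        if i > 0:
            # Wait a fixed minimum plus random jitter before every request after the first.
            sleep(min_delay + random.uniform(0, max_jitter))
        try:
            results[url] = fetch(url)
        except OSError as exc:  # urllib's URLError and HTTPError are OSError subclasses
            print(f"stopping at {url}: {exc}")
            break
    return results
```

Passing `sleep` as a parameter keeps the loop easy to test without actually waiting; in a real overnight run the default `time.sleep` is used.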
Please use this thread to post any ideas you have around natural language processing for geoscience or engineering. Teams should form around these ideas.