
[I am forking this repo from the organization's repo to my own.] This repository contains tutorials, materials for testing purposes, and other documents relating to natural language processing and machine learning. It was first created by Titi KH while working primarily with Ian Scott.

koutiany/nlp-ml-repo

nlp-ml-repo

Introduction

This repository contains tutorials, materials for testing purposes, and other documents relating to natural language processing and machine learning. It was first created by Titi KH while working primarily with Ian Scott (with a lot of support from Mike Thicke, and of course from other super duper cool co-workers: Cassie Lem, Dimitrios Tzouris, Brian Boggan, and Bonnie Russell!).

Stage 1: 2023 Fall/Winter

After gaining a basic understanding of KC and related topics (e.g., Docker environments and using an API to access stats), my short-term and long-term goals became clearer:

  • test Python (.py) libraries for extracting text from different types of deposited files
  • clean the files and build one or more structured dataframes
  • perform topic modeling or other analyses on the data at hand
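As a rough illustration of the second goal, here is a minimal standard-library sketch that cleans raw extracted text and turns it into structured rows. It assumes the text has already been extracted into (filename, raw_text) pairs; the cleaning rules and the column names (`filename`, `n_words`, `text`) are hypothetical choices for illustration, not the rules used in this repo. The list of dicts it returns can be passed straight to `pandas.DataFrame(rows)` if pandas is available.

```python
import csv
import io
import re

# Hypothetical column layout for the structured output.
FIELDS = ["filename", "n_words", "text"]

def clean_text(raw: str) -> str:
    """Normalize extracted text (illustrative rules only)."""
    # Rejoin words split by a hyphen at a line break, e.g. "lan-\nguage".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse every run of whitespace (newlines, tabs, repeated spaces) to one space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()

def build_rows(docs):
    """Turn (filename, raw_text) pairs into one dict per file."""
    rows = []
    for filename, raw in docs:
        cleaned = clean_text(raw)
        rows.append({
            "filename": filename,
            "n_words": len(cleaned.split()),
            "text": cleaned,
        })
    return rows

def rows_to_csv(rows) -> str:
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

if __name__ == "__main__":
    rows = build_rows([("a.pdf", "Natural lan-\nguage  processing\n\nis fun.")])
    print(rows[0]["text"])  # Natural language processing is fun.
    print(rows_to_csv(rows))
```

One trade-off worth noting: the hyphen rule also merges genuine compounds that happen to break at a line end (`well-\nknown` becomes `wellknown`), so a fuller pipeline might check the joined word against a vocabulary first.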

All related materials can be found in the subfolder "text4test". The script comparing text extraction libraries can be found in "stage1/tutorial1-textout.md".
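The comparison itself lives in "stage1/tutorial1-textout.md". As a hedged illustration of what a single extraction path can look like without third-party libraries, the sketch below pulls plain text out of a `.docx` file, which is a ZIP archive whose body text sits in `w:t` elements of `word/document.xml` (Office Open XML WordprocessingML). The `EXTRACTORS` dispatcher and all function names are hypothetical; formats such as PDF would need one of the dedicated libraries being compared.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used in the .docx body XML.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def extract_docx(path) -> str:
    """Return the main body text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    # Each <w:p> is a paragraph; its visible text is split across <w:t> runs.
    for p in root.iter(f"{W_NS}p"):
        text = "".join(t.text or "" for t in p.iter(f"{W_NS}t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)

def extract_txt(path) -> str:
    """Read a plain-text file, replacing undecodable bytes."""
    return Path(path).read_text(encoding="utf-8", errors="replace")

# Hypothetical dispatcher: file extension -> extractor function.
EXTRACTORS = {".docx": extract_docx, ".txt": extract_txt}

def extract_text(path) -> str:
    """Pick an extractor based on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in EXTRACTORS:
        raise ValueError(f"no extractor registered for {suffix!r}")
    return EXTRACTORS[suffix](path)
```

This only reads the main document part; headers, footers, footnotes, and comments are stored in other parts of the archive and are skipped. Another format can be supported by registering one more entry in `EXTRACTORS`.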

Resources I still need to check out

Stage 2: 2024 Spring/Summer

In this stage, I have been focusing on accessing and downloading files through the Invenio API and then extracting text from all downloaded files. These steps can be found in the script "apiinvenio-9th.py" (in the "stage2" folder). The next step is to clean all the extracted text, currently saved as CSV, and prepare it for machine learning.
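"apiinvenio-9th.py" is not reproduced here. As a hedged sketch of the general flow, the code below assumes an InvenioRDM-style REST API, where `GET /api/records` returns search results under `hits.hits` and `GET /api/records/{id}/files` lists a record's files under `entries`, each with a `key` downloadable at `.../files/{key}/content`. The base URL, page size, and function names are placeholders; check the target instance's API documentation (and whether it requires an access token) before relying on them. The HTTP call is passed in as `fetch` so the parsing can be tried without network access.

```python
import json
import shutil
import urllib.parse
import urllib.request

BASE_URL = "https://example.org"  # hypothetical Invenio instance

def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body (no auth, no retries)."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

def list_record_ids(base_url, query="", size=25, fetch=fetch_json):
    """Return record IDs from the first page of a search (assumed hits.hits layout)."""
    params = urllib.parse.urlencode({"q": query, "size": size})
    data = fetch(f"{base_url}/api/records?{params}")
    return [hit["id"] for hit in data["hits"]["hits"]]

def list_file_urls(base_url, record_id, fetch=fetch_json):
    """Map each file key in a record to its assumed download URL."""
    data = fetch(f"{base_url}/api/records/{record_id}/files")
    urls = {}
    for entry in data.get("entries", []):
        key = entry["key"]
        quoted = urllib.parse.quote(key)
        urls[key] = f"{base_url}/api/records/{record_id}/files/{quoted}/content"
    return urls

def download(url: str, dest: str) -> None:
    """Stream a remote file to disk without loading it all into memory."""
    with urllib.request.urlopen(url, timeout=60) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```

Only the first page of search results is fetched; a full harvest would loop over pages (InvenioRDM search responses typically include a `links.next` URL) and pause between requests to avoid overloading the server.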


Languages

  • Python 100.0%