Detecting and anonymizing PII (personally identifiable information) in log files using NLP

praj-tarun/PII_Log_Data-NLP

PII Anonymization in Log Files

Overview

This repository contains Jupyter notebooks that demonstrate four approaches to anonymizing Personally Identifiable Information (PII) in log files: regular expressions (regex), the Presidio library, an NLP model, and a custom deep learning NLP model. Sample log files are also provided for testing and experimentation.

Approach

The project follows these approaches:

Regex Approach: Uses regular expressions to identify and anonymize PII patterns in log files.

Presidio Lib Approach: Uses the Presidio library to detect and anonymize PII in log files.

NLP Model Approach: Develops an NLP model that detects and anonymizes PII using techniques such as tokenization and sequence labeling.

Custom NLP Model Approach: Builds a custom deep learning NLP model tailored to PII detection and anonymization in log files.
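
As a rough illustration of the regex approach, the sketch below masks a few common PII patterns in a single log line. The pattern set, the placeholder format (`<EMAIL>`, `<IPV4>`, `<PHONE>`), the function name `anonymize_line`, and the sample log line are all assumptions made for this example; the notebooks in this repository may use different patterns and replacement tokens.

```python
import re

# Hypothetical patterns chosen for illustration only; they are simplified
# and are not taken from the notebooks in this repository.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def anonymize_line(line: str) -> str:
    """Replace every match of each pattern with a <LABEL> placeholder."""
    for label, pattern in PII_PATTERNS.items():
        line = pattern.sub(f"<{label}>", line)
    return line


# Made-up log line, not one of the repository's sample files.
sample = "2024-01-05 12:00:01 INFO login user=alice@example.com ip=192.168.1.10"
print(anonymize_line(sample))
```

Keeping the patterns in a dictionary keyed by label makes it easy to add or remove PII types, and the placeholder records which kind of value was removed. Simple patterns like these will miss unusual formats and can produce false positives, which is one reason to compare them against the library-based and model-based approaches.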
