PII Anonymization in Log Files

Overview

This repository contains Jupyter notebooks that demonstrate different approaches to anonymizing Personally Identifiable Information (PII) in log files: regular expressions (regex), the Presidio library, an NLP model, and a custom deep learning NLP model. Sample log files are also provided for testing and experimentation.

Approach

The project covers four approaches:

Regex Approach: Uses regular expressions to identify PII patterns in log files and anonymize the matched text.

Presidio Library Approach: Uses Microsoft's Presidio library to detect and anonymize PII in log files.

NLP Model Approach: Develops an NLP model that detects PII through techniques such as tokenization and sequence labeling, then anonymizes the detected entities.

Custom NLP Model Approach: Builds a custom deep learning NLP model tailored specifically to PII detection and anonymization in log files.
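
A minimal sketch of the Regex Approach, assuming each match is replaced by a placeholder tag such as <IP_ADDRESS>. The names and regexes in PII_PATTERNS, and the helper functions, are illustrative assumptions, not the patterns used in PII_Log_Data.ipynb or pii_log_data.py.

```python
import re

# Illustrative PII patterns (assumptions, not the notebook's actual rules).
# Dict order matters: patterns are applied top to bottom.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    # Token right after "user=" or "user "; [^\s<] skips placeholders
    # already inserted by earlier patterns.
    "USER": re.compile(r"(?<=user[= ])[^\s<]\S*"),
}


def anonymize_line(line: str) -> str:
    """Replace every match of each PII pattern with a <LABEL> placeholder."""
    for label, pattern in PII_PATTERNS.items():
        line = pattern.sub(f"<{label}>", line)
    return line


def anonymize_file(src_path: str, dst_path: str) -> None:
    """Anonymize a log file line by line (hypothetical helper; paths are caller-supplied)."""
    with open(src_path, encoding="utf-8", errors="replace") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(anonymize_line(line))


if __name__ == "__main__":
    print(anonymize_line("sshd[19939]: authentication failure; rhost=218.188.2.4 user=alice"))
```

Because patterns run in dictionary order, more specific patterns (such as EMAIL) should come before broader ones that could match part of them.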
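
In the Presidio approach, detection and anonymization are two steps: AnalyzerEngine.analyze(text=..., language="en") returns results carrying an entity type, start and end offsets, and a confidence score, and AnonymizerEngine.anonymize(text=..., analyzer_results=...) rewrites those spans (by default with a placeholder such as <PERSON>). The stdlib-only sketch below imitates only the second step so it runs without Presidio installed. The Span class and the rule "keep the higher-scoring span when two overlap" are assumptions for illustration, not a reproduction of Presidio's own conflict handling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    """Hypothetical stand-in for a detector result: entity type, offsets, score."""
    entity_type: str
    start: int
    end: int
    score: float


def resolve_overlaps(spans: List[Span]) -> List[Span]:
    """Keep the highest-scoring span among any overlapping ones (assumed policy)."""
    kept: List[Span] = []
    # Visit higher-confidence spans first so they win conflicts.
    for span in sorted(spans, key=lambda s: -s.score):
        if all(span.end <= k.start or span.start >= k.end for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)


def replace_spans(text: str, spans: List[Span]) -> str:
    """Rebuild the text, replacing each kept span with an <ENTITY_TYPE> placeholder."""
    pieces = []
    cursor = 0
    for span in resolve_overlaps(spans):
        pieces.append(text[cursor:span.start])
        pieces.append(f"<{span.entity_type}>")
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    text = "Login by alice from 10.0.0.5"
    detected = [Span("PERSON", 9, 14, 0.85), Span("IP_ADDRESS", 20, 28, 0.95)]
    print(replace_spans(text, detected))
```

Rebuilding the string from left to right, rather than editing it in place, keeps every original offset valid while placeholders of different lengths are inserted.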
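
For the NLP Model Approach, sequence labeling assigns each token a tag, often in the BIO scheme (B- begins an entity, I- continues it, O is outside any entity). The sketch below assumes whitespace tokenization and BIO tags produced by some trained tagger; the tags are hard-coded because no model ships with the example, and the labels PERSON, IP and USER are hypothetical. It merges tagged tokens back into character spans and masks them.

```python
import re
from typing import List, Tuple

Token = Tuple[str, int, int]          # (token text, start offset, end offset)
EntitySpan = Tuple[str, int, int]     # (label, start offset, end offset)


def tokenize(line: str) -> List[Token]:
    """Split a log line into non-whitespace tokens, keeping character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", line)]


def bio_to_spans(tokens: List[Token], tags: List[str]) -> List[EntitySpan]:
    """Merge B-/I- tagged tokens into (label, start, end) character spans."""
    if len(tokens) != len(tags):
        raise ValueError("need exactly one tag per token")
    spans: List[EntitySpan] = []
    current = None  # [label, start, end] of the entity being built
    for (_, start, end), tag in zip(tokens, tags):
        continues = tag.startswith("I-") and current is not None and current[0] == tag[2:]
        if tag.startswith(("B-", "I-")) and not continues:
            # A new entity begins; a stray I- tag is treated like B-.
            if current:
                spans.append(tuple(current))
            current = [tag[2:], start, end]
        elif continues:
            current[2] = end  # extend the open entity over this token
        else:
            if current:
                spans.append(tuple(current))
            current = None
    if current:
        spans.append(tuple(current))
    return spans


def mask(line: str, spans: List[EntitySpan]) -> str:
    """Replace spans right to left so earlier offsets stay valid."""
    for label, start, end in sorted(spans, key=lambda s: s[1], reverse=True):
        line = line[:start] + f"<{label}>" + line[end:]
    return line


if __name__ == "__main__":
    line = "session opened for user John Smith from 10.0.0.5"
    tags = ["O", "O", "O", "O", "B-PERSON", "I-PERSON", "O", "B-IP"]
    print(mask(line, bio_to_spans(tokenize(line), tags)))
```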
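
Training a custom model for the Custom NLP Model Approach requires labeled examples. One common layout, accepted by spaCy's Example.from_dict, pairs each text with character-offset entities: (text, {"entities": [(start, end, label)]}). The sketch below bootstraps such annotations from hypothetical regex rules (weak labeling) and drops overlapping spans, because an NER model expects non-overlapping entities. It is only an assumption about how training data could be prepared; this README does not say how labeled_data.spacy or tokenized_data.json were produced.

```python
import json
import re
from typing import Dict, List, Tuple

Annotation = Tuple[str, Dict[str, List[Tuple[int, int, str]]]]

# Hypothetical weak-labeling rules, used only to bootstrap annotations.
RULES = {
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def annotate(line: str) -> Annotation:
    """Return (text, {"entities": [(start, end, label), ...]}) for one log line."""
    entities = []
    for label, pattern in RULES.items():
        for m in pattern.finditer(line):
            entities.append((m.start(), m.end(), label))
    entities.sort()
    # Keep a span only if it starts after the previous kept span ends.
    non_overlapping: List[Tuple[int, int, str]] = []
    for ent in entities:
        if not non_overlapping or ent[0] >= non_overlapping[-1][1]:
            non_overlapping.append(ent)
    return line, {"entities": non_overlapping}


def build_training_set(lines: List[str]) -> List[Annotation]:
    """Annotate every non-blank line (trailing newlines stripped)."""
    return [annotate(line.rstrip("\n")) for line in lines if line.strip()]


def to_json(dataset: List[Annotation]) -> str:
    """Serialize annotations; JSON turns the tuples into lists."""
    return json.dumps(dataset, indent=2)


if __name__ == "__main__":
    print(to_json(build_training_set(["rhost=218.188.2.4\n", "mail bob@example.com\n"])))
```

Rule-generated labels should be reviewed by hand before training; otherwise the model can only learn to reproduce the rules it was bootstrapped from.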

Files

Linux_2k.log
NLP_anonymized.txt
PII_Log_Data.ipynb
Presidio_anonymized.txt
README.md
example.json
labeled_data.spacy
pii_log_data.py
re_anonymized.txt
tokenized_data.json

Repository: praj-tarun/PII_Log_Data-NLP
