Not able to detect PII properly #168

Open
Hir98 opened this issue Aug 7, 2024 · 1 comment
Hir98 commented Aug 7, 2024

My text is:

Detect SSN,DOB,CreditCard,CVV,Expiration, and Gender in the following and anonymize them by replacing with fictitious data. Note Do not not mask the data: 

Michael Brown,345-67-8901,07/30/1978,3400 0000 0000 009,789,12/23,Male,Photography, Cooking 
Jessica Davis,456-78-9012,11/05/1982,6011 0000 0000 0004,012,03/26,Female,Painting, Cycling 
Emily Johnson,234-56-7890,03/22/1990,5500 0000 0000 0004,456,08/24,Female,Traveling, Yoga 
John Smith,123-45-6789,01/15/1985,4111 1111 1111 1111,123,11/25,Male,Reading, Hiking 
David Wilson,567-89-0123,05/17/1975,3000 0000 0000 04,345,06/25,Male,Golf, Music

and my labels are:

 ["person","username",
     "email","email address",
     "address",
     "phone number","mobile phone number","landline phone number","mobile_phone_number","phone_number",
     "credit card CVV", "credit card CVC","CVV","credit card cvv"
     "social security number","security code","credit card security number","social_security_number","credit card security code","bank_account_number","bank account number",
     "driver's license number","US_SSN",
     "birth date","birthdate","date","date_of_birth","expiration date","departure date","arrival date",
     "credit card expiration date","passport issue date","card expiration date","passport expiration date","datetime"]

but in the response I am getting:


[
  {
    'start': 8,
    'end': 11,
    'text': 'SSN',
    'label': 'social_security_number',
    'score': 0.5743955373764038
  },
  {
    'start': 27,
    'end': 30,
    'text': 'CVV',
    'label': 'CVV',
    'score': 0.5180321335792542
  },
  {
    'start': 158,
    'end': 171,
    'text': 'Michael Brown',
    'label': 'person',
    'score': 0.9997666478157043
  },
  {
    'start': 184,
    'end': 194,
    'text': '07/30/1978',
    'label': 'date_of_birth',
    'score': 0.5781332850456238
  },
  {
    'start': 214,
    'end': 217,
    'text': '789',
    'label': 'credit card CVV',
    'score': 0.3783819377422333
  },
  {
    'start': 218,
    'end': 223,
    'text': '12/23',
    'label': 'card expiration date',
    'score': 0.32416781783103943
  },
  {
    'start': 251,
    'end': 264,
    'text': 'Jessica Davis',
    'label': 'person',
    'score': 0.9983078241348267
  },
  {
    'start': 265,
    'end': 276,
    'text': '456-78-9012',
    'label': 'credit card CVV',
    'score': 0.3885277211666107
  },
  {
    'start': 277,
    'end': 287,
    'text': '11/05/1982',
    'label': 'date_of_birth',
    'score': 0.5117868781089783
  },
  {
    'start': 308,
    'end': 311,
    'text': '012',
    'label': 'credit card CVV',
    'score': 0.3981240391731262
  },
  {
    'start': 344,
    'end': 357,
    'text': 'Emily Johnson',
    'label': 'person',
    'score': 0.9977560639381409
  },
  {
    'start': 370,
    'end': 380,
    'text': '03/22/1990',
    'label': 'date_of_birth',
    'score': 0.4856725335121155
  },
  {
    'start': 435,
    'end': 445,
    'text': 'John Smith',
    'label': 'person',
    'score': 0.9992269277572632
  },
  {
    'start': 446,
    'end': 457,
    'text': '123-45-6789',
    'label': 'credit card CVV',
    'score': 0.5499760508537292
  },
  {
    'start': 458,
    'end': 468,
    'text': '01/15/1985',
    'label': 'date_of_birth',
    'score': 0.40841439366340637
  },
  {
    'start': 521,
    'end': 533,
    'text': 'David Wilson',
    'label': 'person',
    'score': 0.9988730549812317
  },
  {
    'start': 546,
    'end': 556,
    'text': '05/17/1975',
    'label': 'date_of_birth',
    'score': 0.3618524372577667
  }
]

The value "123-45-6789" is actually an SSN, but it is detected as a credit card CVV.

@urchade, can you please tell me what is wrong here?

@hari-ag00

Maybe you could try generating synthetic data, fine-tuning the model, and adding post-processing validators.
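As a rough sketch of what such post-processing validators might look like, the snippet below relabels a predicted entity whenever its text matches a strict, well-known format. It assumes entities shaped like the response above (dicts with `text`, `label`, `score`). The regular expressions, the `relabel` and `luhn_valid` helpers, and the `"credit card number"` label are illustrative assumptions, not part of the library or of the label list in this issue.

```python
import re

# Hypothetical format rules; adjust them to the data at hand.
SSN_RE = re.compile(r"\d{3}-\d{2}-\d{4}")          # e.g. 123-45-6789
DOB_RE = re.compile(r"\d{2}/\d{2}/\d{4}")          # e.g. 01/15/1985
EXPIRY_RE = re.compile(r"(0[1-9]|1[0-2])/\d{2}")   # e.g. 11/25


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Double every second digit, counting from the right.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def relabel(entity: dict) -> dict:
    """Override the model's label when the span text matches a strict format."""
    text = entity["text"].strip()
    if SSN_RE.fullmatch(text):
        label = "social_security_number"
    elif DOB_RE.fullmatch(text):
        label = "date_of_birth"
    elif EXPIRY_RE.fullmatch(text):
        label = "card expiration date"
    elif luhn_valid(text):
        label = "credit card number"  # assumed label, not in the original list
    else:
        label = entity["label"]  # no rule matched: keep the model's label
    return {**entity, "label": label}


# One of the mislabeled entities from the response above.
entity = {"start": 446, "end": 457, "text": "123-45-6789",
          "label": "credit card CVV", "score": 0.5499760508537292}
print(relabel(entity)["label"])  # social_security_number
```

Rules like these only correct spans the model already found; values it missed entirely (several SSNs and all card numbers in the response above) would need a separate regex scan over the raw text. Separately, if the label list was passed as a Python list exactly as posted, the missing comma after `"credit card cvv"` makes Python join it with the next string into the single label `"credit card cvvsocial security number"`.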
