- Authors: Jan Lindquist, Dativa; Paul Knowles, Dativa; Mark Lizar, OpenConsent; Harshvardhan J. Pandit, ADAPT Centre, Trinity College Dublin
- Status: PROPOSED
- Since: 2019-08-07 (updated 2019-10-03)
- Status Note: This RFC is under development. The reference implementation was recently added.
- Supersedes: Indy HIPE PR #55: Consent Receipt
- Start Date: 2018-11-08
- Tags: concept
- Summary
- Motivation
- Overview
- Concepts
- Use Cases
- Implementation Guidelines
- Implementation Reference
- Reference
- Annex A: PDP Schema mapping to Kantara Consent Receipt
- Prior art
- Unresolved questions
- Plan
- ToDo
- Comments
This RFC describes a reference implementation for data privacy agreements, in which people initiate personal information sharing with the use of DLT and privacy rights as defined in the GDPR, thus leveraging the GDPR to innovate beyond compliance, toward personal data control and explicit consent.
It takes advantage of advances in standards and privacy practice that operationally begin with privacy by design and by default. Within this technical context in the design of DLT, this RFC separates personal data protection processing from the information overlays process and modularizes both, so that the components can be used independently with the Hyperledger Aries framework.
A key challenge with privacy and personal data sharing and self-initiated consent is to establish trust. There is no trust in the personal data based economy. GDPR Article 25, Data Protection by Design and by Default, lists recommendations on how private data is processed. Here we list the technology changes required to implement that GDPR article. Note the RFC focuses on formalizing the processing agreement associated with the consent, rather than on informal consent dialogue.
Hyperledger Aries provides a well-suited framework for managing personal data, especially personally identifiable information (PII), where the necessary data is restricted to protect the identity of the individual or data subject. Currently, the privacy policy that is agreed to when signing up for a new service dictates how personal data is processed and for which purpose. There is no clear technology to hold a company accountable for its privacy policy. By using blockchain and the data consent receipt, accountability for a privacy policy can be achieved. The data consent is not limited to a single data controller (or institution) and data subject (or individual), but extends to a series of institutions that process the data from the original data subject. The beauty of the proposal in this RFC is that accountability is extended to all parties using the data subject's personal data. When the data subject withdraws consent, the data consent receipt agreement is withdrawn, too.
GDPR lacks specifics regarding how technology should be or can be used to enforce obligations. This RFC provides a viable alternative, with mechanisms that bring accountability while protecting personal data.
Three key components need to be in place:
- Schema bases/overlays
- Consent Lifecycle
- Wallet
Schema bases/overlays describe a standard approach to data capture that separates raw schema building blocks from additional semantic layers such as data entry business logic and constraints, knowledge about data sensitivity, and so forth (refer to [RFC 0013: Overlays] for details). The data consent lifecycle covers the data consent receipt certificate, proof request and revocation. The wallet is where all data is stored, which requires a high level of security and control by the individual or institution. This RFC covers the consent lifecycle.
The Concepts section below explains the RFC in GDPR terms. There is an attempt to align with the vocabulary in the W3C Data Privacy Vocabulary specification.
The consent lifecycle will be based on self-sovereign identity (SSI) to ensure that the individual (data subject) has full control of their personal information. To help illustrate how SSI is applied, several use cases along with a reference implementation show the relation between the data subject, data controller and data processor.
These are some concepts that are important to understand when reviewing this RFC.
Secondary Data Controller: The terms "data subject" and "data controller" (see GDPR Article 4, items 1 and 7) should be well understood. The data controller is responsible for the data that is shared beyond their control. A data controller that does not itself collect data but receives it from another controller is termed a 'secondary' data controller. Even though the secondary data controller is independent in its processing of personal data, GDPR requires the primary or original data controller to be responsible for sharing data under the given consent. The 3rd party becomes a secondary controller under the responsibility of the original data controller. It is important to note that if a 3rd party does not share the collected data back to the original data controller, then the 3rd party is considered an independent data controller (add reference to CIEU).
Opt-in / Opt-out: These terms describe a request to use personal data beyond the limits of legitimate reasons for conducting a service. If, for example, the data is shared with a 3rd party, a consent or opt-in is required. At any point, the data subject may withdraw the consent through an opt-out.
Expiration: The consent may have a time limit, after which it must be renewed; it does not renew automatically. For example, the data subject may have a yearly subscription or a trial, so there needs to be a mechanism to ensure the consent is limited to the duration of the service.
Storage limitation: PII data should not be stored indefinitely and needs to have a clear storage limitation. Storage limitation as defined by GDPR limits how long PII data is kept to what is needed to fulfill the legitimate reasons of a service.
Processing TTL: Indy currently supports proofs that are valid only for a specific point in time. For companies that collect data over time, checking for proof every minute is not a viable solution. The processing TTL allows data ingestion to continue for an extended period without requiring a new proof request. Examples will be given that explain the usage of the term.
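As an illustration of the processing TTL, the sketch below decides whether a data controller must perform a new proof request before continuing ingestion. It is a minimal example under stated assumptions: the function name `needs_new_proof` and its parameters are hypothetical and are not part of Indy or Aries.

```python
from datetime import datetime, timedelta


def needs_new_proof(last_proof_at: datetime,
                    processing_ttl: timedelta,
                    now: datetime) -> bool:
    """Return True once the last successful proof is older than the TTL.

    Hypothetical helper: while the TTL has not elapsed, ingestion may
    continue without a new proof request.
    """
    return now - last_proof_at >= processing_ttl


# Example: a proof obtained on 1 October with a 30-day processing TTL.
last_proof = datetime(2019, 10, 1, 12, 0)
ttl = timedelta(days=30)

for moment in (datetime(2019, 10, 15, 12, 0), datetime(2019, 11, 1, 12, 0)):
    print(moment.date(), "new proof required:", needs_new_proof(last_proof, ttl, moment))
```

With a TTL of one month, the controller checks for withdrawn consent roughly once a month instead of on every ingestion event.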
These are the use cases to help understand the implementation guide. A reference implementation will help in the development.
- Alice (data subject) gives data consent by accepting a privacy agreement.
- Acme (3rd party data controller) requests proof that data consent was given.
- Alice terminates privacy agreement, thus withdrawing her data consent.
Note: additional use cases may be developed based on contributions to this RFC.
These are the steps covered when collecting personal data:
- list personally identifiable data points
- list quasi-identifiable data points
The [Blinding Identity Taxonomy] provides a comprehensive list of data points that are considered sensitive and shall be handled with a higher level of security.
This section will be expanded with an explanation of the terms "personally identifiable" and "quasi-identifiable".
The personal data processing (PDP) schema captures attributes used to define the conditions for collecting data and the conditions under which data may be shared or used.
These are the PDP schema attributes:
Category | Attribute | Brief description | Comment |
---|---|---|---|
Data subset | DID of associated schema or overlay | Data object identifier | All data objects |
Data subset | Industry Scope [1] | A predefined description of the industry scope of the issuer. | All data objects |
Storage (raw) | Expiration Date | The definitive date on which data revocation throughout the chain of engaged private data lockers of all Data Controllers and sub-Data Controllers will automatically occur. In other words, when the PDP expires. | Access-Window |
Storage (raw) | Limitation (Restricted-Time) | How long data is kept in the system before being removed. Unlike the Expiration Date attribute, the limitation indicates how long personal data may be used after the PDP expires. A request to be forgotten supersedes the limitation. | Access-Window |
Storage (raw) | PII pseudonymization | Data is stored with pseudonymization. Conditions of access are given under the Purpose attribute of the "Access" category. | Encryption |
Storage (raw) | Method of pseudonymization | Specifies the acceptable algorithm for performing pseudonymization. | Encryption |
Storage (raw) | Geographic restriction | The data storage has geolocation restrictions (country). | Demarcation |
Storage (raw) | No share | The data shall not be shared outside of the Data Controller's responsibility. When set, no 3rd party or Secondary Data Controller is allowed. | Demarcation |
Access (1-n) | Purpose | The purpose for processing data shall be specified (refer to GDPR Article 4, clause 2, for the definition of processing). Applies to both a Data Controller and a Secondary Data Controller. | Access-Window |
Access (1-n) | policyUrl | Reference to the privacy policy URL that describes the policy in human-readable form. | Access-Window |
Access (1-n) | Requires 3PP PDP [2] | A PDP is required between the Data Controller and the Secondary Data Controller in the form of a code of conduct agreement. | Access-Window |
Access (1-n) | Single Use | The data is shared only for the purpose of completing the interaction at hand. "Expiration-Date" is set to the date of interaction completion. | Access-Window |
Access (1-n) | PII anonymisation | Data is stored with no PII association. | Encryption [3] |
Access (1-n) | Method of anonymisation | Specifies the acceptable algorithm for performing anonymisation. | Encryption |
Access (1-n) | Multi-attribute anonymisation | Quasi-identifiable data may be combined to create a fingerprint of the data subject. When set, a method of multi-attribute anonymisation is applied to the data. | Encryption |
Access (1-n) | Method of multi-attribute anonymisation | Specifies the acceptable algorithm for performing anonymisation (K-anonymity). | Encryption |
Access (1-n) | Ongoing Use | The data is shared for repeated use by the recipient, with no end date or end conditions. However, the data subject may alter the terms of use in the future, and if the alteration in terms is unacceptable to the data controller, the data controller acknowledges that it will thereby incur a duty to delete. In other words, the controller uses the data at the ongoing sufferance of its owner. | Access-Window |
Access (1-n) | Collection Frequency (Refresh) | How frequently the data can be accessed. The collection may be limited to, for example, once a day or once an hour. The purpose of this attribute is to protect the data subject from having a profile of their behavior created. | Access-Window |
Access (1-n) | Validity TTL | If collection is continuous, the validity TTL specifies when to perform a new verification. The verification checks whether the customer has withdrawn consent. Note that this is a method for revocation. | Access-Window |
Access (1-n) | No correlation | No correlation is allowed for the subset. This means no external data shall be combined with it, for example a public data record of the data subject. | Correlation |
Access (1-n) | Inform correlation | Correlation is allowed, and the data subject is informed of it and of what data related to them was combined. | Correlation |
Access (1-n) | Open correlation | Correlation is open and does not need to be reported to the data subject. | Correlation |
As the PDP schema may be the only compulsory linked schema specified in every schema metadata block, we have an opportunity to store the "Framework Description" - a description of the business framework of the issuer.
Predefined values could be imported from the GICS "Description" entries, or, where missing, NECS "Description" entries, courtesy of filtration through the Global Industry Classification Standard (GICS) or New Economy Classification Standard (NECS) ontologies.
The predefined values could be determined by the next highest level code to the stored GICS "Sub-industry" code (or NECS "SubSector" code) held in the associated metadata attribute of the primary schema base to add flexibility of choice for the Issuer.
If a PDP is required between the Data Controller (Issuer) and sub-Data Controller, we should have a field(s) to store the Public DID (or Private Data Locker ID) of the sub-Data Controller(s). This will be vital to ensure auto-revocation from all associated private data lockers on the date of expiry.
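The auto-revocation idea can be sketched as follows: once the expiration date is reached, every private data locker named in the PDP is scheduled for revocation. This is only an illustration under assumptions; the `PdpRecord` type, its field names and the `did:example:` identifiers are hypothetical and not defined by this RFC.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class PdpRecord:
    """Hypothetical PDP record holding the DIDs needed for auto-revocation."""
    controller_did: str
    expiration: date
    sub_controller_dids: List[str] = field(default_factory=list)


def lockers_to_revoke(pdp: PdpRecord, today: date) -> List[str]:
    """Return the DIDs whose data lockers must revoke on or after expiry."""
    if today < pdp.expiration:
        return []
    # The issuer's own locker is revoked together with all sub-controllers.
    return [pdp.controller_did, *pdp.sub_controller_dids]


pdp = PdpRecord(
    controller_did="did:example:issuer",
    expiration=date(2020, 1, 1),
    sub_controller_dids=["did:example:sub1", "did:example:sub2"],
)
print(lockers_to_revoke(pdp, date(2019, 12, 31)))
print(lockers_to_revoke(pdp, date(2020, 1, 1)))
```

Storing the sub-controller DIDs in the PDP itself is what makes the second call able to reach every locker in the chain.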
As the "PII Attribute" schema object is already in place for issuers to flag sensitive data according to the Blinding Identity Taxonomy (BIT), we already have a mechanism in place for PII. Once flagged, sensitive data can be encrypted. Some considerations post PII flagging: (i.) In the Issuer's Private Data Locker: the default position should be to encrypt all sensitive elements. However, the issuer should be able to specify whether any of the flagged sensitive elements should remain unencrypted in their private locker. (ii.) In a Public Data Store: all sensitive elements should always be encrypted.
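The two considerations can be expressed as a small policy function. The sketch below is illustrative only: the function name, the `"private"`/`"public"` store labels and the `keep_plain` parameter are assumptions introduced for this example.

```python
from typing import AbstractSet, Set


def attributes_to_encrypt(flagged: AbstractSet[str],
                          store: str,
                          keep_plain: AbstractSet[str] = frozenset()) -> Set[str]:
    """Select which BIT-flagged attributes must be encrypted.

    flagged    -- attribute names flagged as sensitive by the issuer
    store      -- "private" (issuer's private data locker) or "public"
    keep_plain -- flagged attributes the issuer chose to leave unencrypted;
                  only honoured in the private locker
    """
    if store == "public":
        # (ii.) In a public data store every sensitive element is encrypted.
        return set(flagged)
    if store == "private":
        # (i.) Encrypt by default, minus the issuer's explicit exceptions.
        return set(flagged) - set(keep_plain)
    raise ValueError(f"unknown store type: {store!r}")


flagged = {"brthd", "ageic"}
print(sorted(attributes_to_encrypt(flagged, "private", keep_plain={"ageic"})))
print(sorted(attributes_to_encrypt(flagged, "public", keep_plain={"ageic"})))
```

Note that the issuer's exceptions are ignored for the public store, which matches the rule that sensitive elements there are always encrypted.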
When defining a schema there will be a consent schema associated with it.
SCHEMA = {
did: "did:sov:3214abcd",
name: 'Demographics',
description: "Created by Faber",
version: '1.0',
# MANDATORY KEYS
attr_names: {
brthd: Date,
ageic: Integer
},
consent: "did:schema:27312381238123", # reference to consent schema
# Attributes flagged according to the Blinding Identity Taxonomy
# by the issuer of the schema
# OPTIONAL KEYS
frmsrc: "DEM"
}
The original schema will have a consent schema reference.
CONSENT_SCHEMA = {
did: "did:schema:27312381238123",
name: 'Consent schema for consumer behaviour data',
description: "Created by Faber",
version: '1.0',
# MANDATORY KEYS
attr_names: {
expiration: Date,
limitation: Date,
dictatedBy: String,
validityTTL: Integer
}
}
The consent schema will have specific attributes for managing data.
Attribute | Purpose | Type |
---|---|---|
expiration | How long consent valid for | Date |
limitation | How long is data kept | Date |
dictatedBy | Who sets expiration and limitation | String |
validityTTL | Duration proof is valid for purposes of data processing | Integer |
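To show how these attributes might interact, the sketch below derives a coarse consent state from `expiration` and `limitation`. It assumes both are absolute dates and that `limitation` falls on or after `expiration`; the `ConsentTerms` and `ConsentState` names and the `did:example:` identifier are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class ConsentState(Enum):
    ACTIVE = "active"                  # consent valid, data may be collected
    RETENTION_ONLY = "retention_only"  # consent expired, data may still be kept
    MUST_DELETE = "must_delete"        # data has to be removed


@dataclass(frozen=True)
class ConsentTerms:
    expiration: date         # how long the consent is valid
    limitation: date         # how long the data is kept
    dictated_by: str         # who sets expiration and limitation
    validity_ttl: timedelta  # how long a proof stays valid for processing


def consent_state(terms: ConsentTerms, today: date,
                  forget_requested: bool = False) -> ConsentState:
    # A request to be forgotten supersedes the storage limitation.
    if forget_requested or today > terms.limitation:
        return ConsentState.MUST_DELETE
    if today > terms.expiration:
        return ConsentState.RETENTION_ONLY
    return ConsentState.ACTIVE


terms = ConsentTerms(date(2022, 10, 3), date(2024, 10, 3),
                     "did:example:issuer", timedelta(days=30))
for day in (date(2020, 1, 1), date(2023, 1, 1), date(2025, 1, 1)):
    print(day, consent_state(terms, day).value)
```

The `validity_ttl` field is carried along but not used here; it governs how often a proof request has to be repeated, not the state of the consent itself.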
The issuer may optionally define an overlay that sets the consent schema values without input from the data subject.
CONSENT_RECEIPT_OVERLAY = {
did: "did:sov:5678abcd",
type: "spec/overlay/1.0/consent_entry",
name: "Consent receipt entry overlay for clinical trial",
default_values: [
:expiration => "3 years",
:limitation => "2 years",
:dictatedBy => "<reference to issuer>", # ??? Should the issuer's DID be used?
:validityTTL => "1 month"
]
}
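One way to read the overlay is as a set of issuer-dictated defaults that are merged into the consent schema attributes. The Python sketch below illustrates that reading. It assumes that overlay values take precedence over anything supplied by the data subject, and the function name `resolve_consent_values` is hypothetical.

```python
from typing import Dict, Iterable, Optional


def resolve_consent_values(attr_names: Iterable[str],
                           overlay_defaults: Dict[str, str],
                           subject_values: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Combine overlay defaults with optional data subject input."""
    attr_names = list(attr_names)
    unknown = set(overlay_defaults) - set(attr_names)
    if unknown:
        raise KeyError(f"overlay sets attributes not in the schema: {sorted(unknown)}")
    values = dict(subject_values or {})
    # Assumption: values dictated by the issuer's overlay win over subject input.
    values.update(overlay_defaults)
    missing = [name for name in attr_names if name not in values]
    if missing:
        raise ValueError(f"no value for consent attributes: {missing}")
    return values


consent_attrs = ["expiration", "limitation", "dictatedBy", "validityTTL"]
overlay = {
    "expiration": "3 years",
    "limitation": "2 years",
    "dictatedBy": "<reference to issuer>",
    "validityTTL": "1 month",
}
print(resolve_consent_values(consent_attrs, overlay))
```

When the overlay covers every attribute, as in this example, no input from the data subject is needed at all.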
If some attributes are identified as sensitive based on the Blinding Identity Taxonomy, a sensitivity overlay is created.
SENSITIVE_OVERLAY = {
did: "did:sov:12idksjabcd",
type: "spec/overlay/1.0/bit",
name: "Sensitive data for private entity",
attributes: [
:ageic
]
}
To finalise a consent, a proof schema has to be created that lists which schemas and overlays were applied, together with their values. The proof is kept off ledger in the wallet.
PROOF_SCHEMA = {
did: "did:schema:12341dasd",
name: 'Credential Proof schema',
description: "Created by Rosche",
version: '1.0',
# MANDATORY KEYS
attr_names: {
createdAt: DateTime, # When the proof was created.
proof_key: "<crypto asset>", # Cryptographic material backing the proof.
# Include all the schema did that were agreed upon
proof_of: [ "did:sov:3214abcd", "did:sov:1234abcd"]
}
}
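The sketch below shows how a proof record listing the agreed schemas and overlays could be assembled before being stored in the wallet. It is an illustration only: the SHA-256 digest is a stand-in for the `<crypto asset>` placeholder and is not how Indy produces proofs, and the `build_proof` function and the `values` field are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, List


def build_proof(agreed_dids: List[str], values: Dict[str, str],
                created_at: datetime) -> Dict[str, object]:
    """Assemble a proof record over the agreed schema and overlay DIDs."""
    payload = {
        "createdAt": created_at.isoformat(),
        "proof_of": sorted(agreed_dids),
        "values": values,
    }
    # Stand-in for the real cryptographic asset: a digest of the canonical
    # JSON form, so any change to the agreed terms changes the key.
    canonical = json.dumps(payload, sort_keys=True).encode("utf-8")
    return {**payload, "proof_key": hashlib.sha256(canonical).hexdigest()}


proof = build_proof(
    ["did:sov:3214abcd", "did:sov:1234abcd"],
    {"expiration": "3 years", "limitation": "2 years"},
    datetime(2019, 10, 3, tzinfo=timezone.utc),
)
print(proof["proof_of"])
print(len(proof["proof_key"]))
```

Because the record is kept off ledger, only the holder of the wallet can present it; the digest merely lets either party detect that the listed terms were altered.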
These are the considerations when setting up the ledger:
- Ledger setup
- Wallet setup
- Types of entities
- Agents (working on behalf of entity)
- Communication between entities
- Onboarding
- Onboarding of data subject
These are the steps covered by the data consent receipt certificate:
- initial agreement of privacy agreement
- update agreement with changed terms of use
- update agreement with new opt-in
- terminate agreement
- apply option to forget
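The steps above can be read as events in a small state machine. The following sketch is one possible interpretation, not a normative definition: the state names, the event names and the assumption that "forget" is allowed both before and after termination are all introduced for this example.

```python
from enum import Enum
from typing import Iterable


class State(Enum):
    PROPOSED = "proposed"
    ACTIVE = "active"
    TERMINATED = "terminated"
    FORGOTTEN = "forgotten"


# (current state, event) -> next state
TRANSITIONS = {
    (State.PROPOSED, "agree"): State.ACTIVE,          # initial agreement
    (State.ACTIVE, "update_terms"): State.ACTIVE,     # changed terms of use
    (State.ACTIVE, "opt_in"): State.ACTIVE,           # new opt-in
    (State.ACTIVE, "terminate"): State.TERMINATED,    # terminate agreement
    (State.ACTIVE, "forget"): State.FORGOTTEN,        # option to forget
    (State.TERMINATED, "forget"): State.FORGOTTEN,
}


def next_state(state: State, event: str) -> State:
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed in state {state.name}") from None


def run(events: Iterable[str], state: State = State.PROPOSED) -> State:
    for event in events:
        state = next_state(state, event)
    return state


print(run(["agree", "opt_in", "terminate"]).name)
print(run(["agree", "update_terms", "forget"]).name)
```

Rejecting events that are not listed, such as an opt-in after termination, keeps the agreement history auditable.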
The following flow diagram shows how a privacy agreement is set up.
These are the steps covered with proof request:
- performing proof request
- performing proof request without personal data
- auditing of proof request by Data Authority
The proof request serves multiple purposes. The main one is that the conditions of access are auditable. If a data controller needs to show that consent was given and that the conditions for accessing data are met, the proof request provides the evidence. The data subject also has more control through the proof request, and in situations where the revocation of the certificate is not performed, it becomes an extra safeguard. An important aspect of the proof request is that it can be done without sharing any personal data.
The following flow diagram shows how a privacy agreement is set up.
These are the steps covered by certificate revocation:
- data subject initiated revocation
- data controller initiated revocation
- data authority initiated revocation (if owns schema base)
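The rule for who may initiate revocation can be captured in a short check. This is a minimal sketch; the role strings and the `may_revoke` function are hypothetical names chosen for illustration.

```python
def may_revoke(role: str, owns_schema_base: bool = False) -> bool:
    """Decide whether a party may initiate revocation of the certificate.

    The data subject and the data controller may always revoke; a data
    authority may revoke only if it owns the schema base.
    """
    if role in ("data_subject", "data_controller"):
        return True
    if role == "data_authority":
        return owns_schema_base
    return False


for role, owns in (("data_subject", False),
                   ("data_controller", False),
                   ("data_authority", False),
                   ("data_authority", True)):
    print(role, owns, may_revoke(role, owns))
```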
A Python Jupyter notebook is available as a reference implementation. It is based on the getting-started Jupyter notebook. To run the example, take the following steps.
- Clone indy-sdk \
  git clone https://github.com/hyperledger/indy-sdk.git
- Copy over the following files to doc/getting-started \
  Note * - The reason for changing docker-compose.yml is to be able to view consent-flow.ipynb.
- Start docker-compose \
  docker-compose up
- Open the HTML link and run consent-flow.ipynb
Provide guidance for implementers, procedures to inform testing, interface definitions, formal function prototypes, error codes, diagrams, and other technical details that might be looked up. Strive to guarantee that:
- Interactions with other features are clear.
- Implementation trajectory is well defined.
- Corner cases are dissected by example.
- General Data Protection Regulation
  REGULATION (EU) 2016/679 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 27 April 2016
- Data Privacy Vocabulary v0.1
  Draft Community Group Report 26 July 2019
  [https://www.w3.org/ns/dpv]
- Hyperledger Aries is infrastructure for blockchain-rooted, peer-to-peer interactions
  [https://github.com/hyperledger/aries-rfcs]
- Overlays
  [../0013-overlays/README.md]
- Blinding Identity Taxonomy
  Created by Andrew Hughes, last modified on Jun 19, 2019
  [https://kantarainitiative.org/confluence/display/infosharing/Blinding+Identity+Taxonomy]
- W3C Verifiable Credential
  [https://www.w3.org/TR/vc-data-model/]
Kantara has defined a Consent Receipt with a list of mandatory and optional attributes. This annex maps the attributes to the PDP. Many of the attributes are supported through the ledger and are not directly included in the PDP.
Note: The draft used for this annex was file "Consent receipt annex for 29184.docx".
Kantara attribute | Hyperledger Indy mapping |
---|---|
Version | Schema registration |
Jurisdiction | Agent registration |
Consent Timestamp | PDP signed certificate |
Collection Method | - |
Consent Receipt ID | PDP signed certificate |
Public Key | Ledger |
Language | Overlays |
PII Principal ID | Schema/Agent registration |
PII Controller | Agent registration |
On Behalf | Agent registration (1) |
PII Controller Contact | Agent registration (2) |
PII Controller Address | Agent registration |
PII Controller Email | Agent registration |
PII Controller Phone | Agent registration |
PII Controller URL [OPTIONAL] | - |
Privacy Policy | PDP |
services | PDP |
purposes | PDP |
Purpose Category | - |
Consent Type | PDP |
PII Categories | - |
Primary Purpose | PDP |
Termination | Ledger |
Third Party Name | PDP |
Sensitive PII | Schema base |
Notes
(1) An agent may be of type Cloud Agent, which works on behalf of an Issuer (Data Controller). When an institution registers on the blockchain, it should make clear on whose behalf it is registering.
(2) The Controller Contact may change over time and is not a good reference to use when accepting a consent. If required, we suggest including it as part of the Agent registration (or as a requirement).
Current data processing of PII data is not based on blockchain. Data is processed through ETL routines (e.g., AWS API Gateway and Lambda) with a data warehouse (e.g., AWS Redshift). The enforcement of GDPR is based on adding configuration routines to enforce storage limitations. Most data warehouses do not implement pseudonymization and may instead opt for a very short storage limitation of a couple of months. The current practice is to collect as much data as possible, which goes against data minimisation.
The Customer Commons initiative (customercommons.org) has developed [terms and conditions] for personal data usage. The implementation of these terms and conditions will be tied to the schema and overlay definitions. The overlay will specify the conditions of sharing. For broader conditions, the schema will have new attributes for the actual consent to data sharing. The work by Hyperledger Aries and Customer Commons complement each other.
The following lists the implementations (if any) of this RFC. Please do a pull request to add your implementation. If the implementation is open source, include a link to the repo or to the implementation within the repo. Please be consistent in the "Name" field so that a mechanical processing of the RFCs can generate a list of all RFCs supported by an Aries implementation.
Name / Link | Implementation Notes |
---|---|
- Upload consent demo to Indy-sdk
- Update schema examples
Question | From | Date | Answer |
---|---|---|---|
Where is consent recorded? | Harsh | 2019-07-31 | There are several types of consent listed below. Where the actual consent is recorded needs Specialised Consent (legal); Generic Consent (legal); General Data Processing Consent |