Core Principles


This practice area addresses the following Markle Connecting for Health Core Principles for a Networked Environment:

  • Purpose Specification

  • Collection Limitation and Data Minimization

  • Use Limitation

  • Security Safeguards and Controls

See Architecture for Privacy in a Networked Health Information Environment for more information.

There are significant risks if business partners of Consumer Access Services are permitted to combine data with other databases to identify individuals or create a more complete profile of the consumer's health. Such practices have the potential to create unauthorized third-party relationships of which the consumer may be completely unaware. Chain-of-trust agreements should prohibit this type of activity. (See CP4: Chain-of-Trust Agreements.) In addition, Consumer Access Services can further protect consumers – as well as themselves – by ensuring that the identifying information they expose to partners is the minimum necessary. For example, in some cases, a Consumer Access Service could share a consumer's age, but not date of birth, with a third party because age is less revealing of identity than a specific date of birth.
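
As a minimal sketch of this kind of data minimization – the record layout, field names, and minimized_view helper below are hypothetical, not part of the Common Framework – a Consumer Access Service might derive the coarser attribute at the point of disclosure:

    from datetime import date

    # Hypothetical record held by a Consumer Access Service; the field
    # names are illustrative only.
    consumer = {
        "name": "Jane Q. Sample",
        "date_of_birth": date(1970, 3, 14),
        "conditions": ["asthma"],
    }

    def age_in_years(dob, today=None):
        """Derive age from date of birth so the full date need not be disclosed."""
        today = today or date.today()
        return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))

    def minimized_view(record):
        """Expose only what the partner needs: age instead of date of birth,
        and no direct identifiers such as name."""
        return {
            "age": age_in_years(record["date_of_birth"]),
            "conditions": record["conditions"],
        }

    print(minimized_view(consumer))  # e.g., {'age': 55, 'conditions': ['asthma']}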

In the Internet Age, information is increasingly difficult to classify as "identified" or "de-identified," particularly as it is copied, exchanged, or recombined with other information. With rapidly evolving technologies and databases, it is more appropriate to describe a spectrum of "identifiability," rather than a binary classification of information as identifiable or not. The question could then become not whether de-identified information might be made re-identifiable, but rather which entities would be able to re-identify the information, how much effort they would have to expend, and what limits are placed on their doing so.

HIPAA Regulations (45 C.F.R. § 164.514) provide standards for de-identification, including a "safe harbor" method that lists 18 "identifier" data elements that must be stripped out in order for information to qualify as "de-identified."1
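
As a simplified sketch of this "safe harbor" stripping – the field names below are illustrative and cover only a few of the 18 categories, and a real implementation must handle all of them, including identifiers embedded in free text – the core operation is a field-level filter:

    # Illustrative subset of the 18 identifier categories in 45 C.F.R.
    # § 164.514(b)(2); this is not the complete regulatory list.
    IDENTIFIER_FIELDS = {
        "name", "street_address", "email", "phone", "ssn",
        "medical_record_number", "date_of_birth", "ip_address",
    }

    def strip_identifiers(record):
        """Drop fields that fall into the safe-harbor identifier categories."""
        return {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}

    record = {
        "name": "Jane Q. Sample",
        "date_of_birth": "1970-03-14",
        "zip3": "021",          # safe harbor generally permits the first three ZIP digits
        "diagnosis": "asthma",
    }
    print(strip_identifiers(record))  # {'zip3': '021', 'diagnosis': 'asthma'}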

The Privacy Rule also allows a second way to de-identify information by having a qualified statistician determine, using generally accepted statistical principles and methods, that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by the anticipated recipient to identify the subject of the information. The qualified statistician must document the methods and results of the analysis that justify such a determination.

This HIPAA regulation remains a reasonable industry standard for defining information as "de-identified" in many circumstances today. However, it may not be fully identity-protective in some contexts, such as when applied to very small subsets of populations, or with the ever-increasing amounts of "partially identifying information" gathered in electronic environments. (See Appendix A for more on partially identifying information.) This reality will necessitate frequent monitoring of risk by policymakers in both the public and private sectors.

Recommended Practice

Consumer Access Services should limit disclosures of identifying data to only those data that are necessary to perform the specified function(s) that the recipient is authorized to perform.

Care should be taken to limit the release or exposure of information that can be directly or indirectly tied to an individual, including electronic identifiers such as IP addresses, cookies, and web beacons.

Any release of such indirectly or directly identifying information should be consistent with all nine Connecting for Health Privacy Principles and all of the Practice Areas of this Common Framework, particularly purpose specification, limitation of use to the specified purpose(s), and the prohibition on unauthorized combining of data to create a more complete profile of individuals.

Appendix A: "Partially Identifying Data"

In today's web environment, much of what consumers do is recorded and tracked by the sites they visit. Even when consumers are not logged in, various pieces of information are collected about them. These little bits of data are often not personally identifying at the time and point of collection. But in some cases, these bits of information can be combined with other bits of information to build a more complete profile of each user. When enough information is collected and combined, it can be used to identify individuals. Hence, we call this information "partially identifying." Examples include cookies, web beacons, and even search keywords.

For illustration, "persistent cookies" are little pieces of text deposited in the web browsers of consumers by the web sites they visit. Much as a ticket from the dry cleaner lets the proprietor match the customer out front with the right clothes held in the back, a cookie contains lookup information that lets a web site link a user to other information held about him in a database, such as preferences, search history, or "checkout" items for purchase on the site.
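
A minimal server-side sketch of this lookup mechanism follows; the token format and in-memory store are hypothetical (real sites typically use a web framework's session machinery), but the "ticket" pattern is the same:

    import secrets

    # Server-side store keyed by the opaque token placed in the cookie. The
    # cookie itself carries no personal data; like the dry cleaner's ticket,
    # it is only a lookup key.
    sessions = {}

    def issue_cookie(profile):
        """Create an opaque token and associate it with the user's profile."""
        token = secrets.token_urlsafe(16)
        sessions[token] = profile
        return token  # sent to the browser, e.g., Set-Cookie: uid=<token>

    def lookup(token):
        """On a later visit, the returned cookie links the browser back to the profile."""
        return sessions.get(token)

    cookie = issue_cookie({"display_name": "Jane", "cart": ["item-42"]})
    print(lookup(cookie))  # {'display_name': 'Jane', 'cart': ['item-42']}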

When the consumer returns to a web site at a later time, persistent cookies such as these can tell the web browser to display the user name, show whatever the user has specified to appear on the site's homepage, allow for access to previously entered search queries, or display information about items the user had previously added to a shopping cart.

When search engine companies collect user search query history "anonymously" (i.e., not tied to a specific user identity), the partially identifying information the user provides can be identifying in and of itself if a consumer searches for information about her name, address, telephone number, and/or personal identifiers. When this information is combined with additional search queries that detail the user's interests, hobbies, health conditions, etc., a very personal picture can be assembled quite easily. For example, in the summer of 2006, America Online released 20 million "de-identified" search queries of more than 650,000 of its users with the intention of helping researchers design better search engines. AOL initially claimed the search data had been made anonymous by replacing each user's AOL username with a unique numeric ID. But because each ID still linked together all of a given user's queries, those queries that included identifying information revealed not only some users' identities, but also intimate details about their personal lives.
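
A small sketch of why this substitution fails – the log entries below are invented, not actual AOL data – shows that because every query from the same person carries the same pseudonym, one self-identifying query exposes the whole history:

    from collections import defaultdict

    # Invented sample log. Each username has been replaced with an opaque
    # pseudonym, mirroring the substitution AOL performed.
    log = [
        ("u-10734", "pharmacies near 123 elm st springfield"),
        ("u-10734", "jane q. sample"),  # a user searching for her own name
        ("u-10734", "support groups for depression"),
        ("u-20981", "best hiking boots"),
    ]

    # Grouping by pseudonym re-links each person's queries; a single
    # identifying query then unmasks everything grouped with it.
    by_user = defaultdict(list)
    for pseudonym, query in log:
        by_user[pseudonym].append(query)

    for pseudonym, queries in by_user.items():
        print(pseudonym, "->", queries)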

Another example of unintentional identification occurred as a result of an airline's practice of printing customers' frequent-flyer numbers on boarding passes in addition to names and seat numbers. An investigative reporter doing a story on identity theft retrieved a passenger's discarded ticket stub and used the information to purchase another ticket from the same airline (in this case from British Airways). In doing so, the reporter was granted access to additional pieces of the passenger's identity, including "passport number, date of birth, and nationality."

The above cases, in which partially identifying information is used by external parties to identify an individual, occurred outside of contractual agreements. However, they do illustrate how the identifiability of information can change over time.

__________

  1. Accessed online on January 2, 2008, at the following URL: http://edocket.access.gpo.gov/cfr_2002/octqtr/pdf/45cfr164.514.pdf.

 

©2008-2011, Markle Foundation

This work was originally published as part of a compendium called The Markle Connecting for Health Common Framework for Networked Personal Health Information. It is made available free of charge, but subject to the terms of a License. You may make copies of this work; however, by copying or exercising any other rights to the work, you accept and agree to be bound by the terms of the License. All copies of this work must reproduce this copyright information and notice.