Is my health data safe on the Internet?
Is my health data safe on the Internet? That is a question we hear frequently when describing web-based Electronic Health Record (EHR) systems, as well as the newly evolving field of mobile health (mHealth) applications.
Privacy and security of health information are a central plank in the policy framework surrounding Meaningful Use and the encouragement to move health care out of a paper-based legacy and onto an e-platform. Banking made this move a decade or more ago; now it is time for health data to move into the modern era.
As the Internet has grown, web services and applications have accumulated more and more data about individuals: click tracking to refine web searches and customize commerce, and subscription lists for e-publications (with member lists often sold on secondary markets). All of this generates fear that health data, if it is also on the web, will be subject to the kind of privacy intrusions that happen elsewhere.
Health data vs. Internet data
It is important to recognize that there is a difference between Internet data and health data. By health data, we mean personal information created by healthcare professionals about patients they see (EHR data), the electronic equivalent of information in a doctor’s paper charts. Health data is protected against unauthorized disclosure by stiff laws like HIPAA; regular Internet data has no such protection, and secondary uses of that data do not involve HIPAA-level identity-scrubbing. When individuals (patients) receive unrequested ads or other targeted messaging about health conditions they may have, it is generally because the condition was self-disclosed somewhere else, such as when signing up for a subscription or entering data on a self-help web site focused on a particular condition. It is not from an EHR.
HIPAA actually works very well in protecting individual identity from unauthorized disclosure. This is especially important when it comes to de-identifying data for secondary use. There is an entire academic discipline around data de-identification, and attempts to re-identify data that has been stripped of personal identifiers are a focus of national policy as well as academic study (we have participated in these kinds of discussions at the national level). The result? When HIPAA-level stripping of a set of 18 different identifiers is done, the probability of re-identifying individuals by cross-matching against outside, publicly available databases is extraordinarily low. Ordinary data that is stripped of fewer identifiers, as is often found on the Internet outside the purview of HIPAA, can be re-identified more readily (though it is still difficult). EHR health data is protected at a much higher level.
For a data set to meet HIPAA’s de-identification standard before it is exposed publicly, 18 elements must be removed from it. These are: (1) name; (2) location beyond a 3-digit zip (or more blurred if the 3-digit zip has 20,000 or fewer people in it); (3) all dates other than year (date of birth, date of service, etc.); (4) phone number; (5) fax number; (6) email address; (7) social security number; (8) medical record numbers; (9) health plan ID numbers; (10) account numbers; (11) certificate or license numbers; (12) vehicle ID numbers (e.g. license plate numbers); (13) device ID numbers; (14) web URLs; (15) Internet IP addresses; (16) biometric identifiers (e.g. fingerprints); (17) full-face photos or images; (18) any other unique identifier.
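To make the list above concrete, here is a minimal Python sketch of how such stripping might look for a single record. It is an illustration only, not a compliant implementation: the field names, the `deidentify` function, and the contents of `SMALL_POPULATION_ZIP3` are all hypothetical, and a real system would also have to handle identifiers buried in free text, which this sketch ignores.

```python
from datetime import date

# Hypothetical field names for direct identifiers; a real EHR schema will differ.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "phone", "fax", "email", "ssn", "medical_record_number",
    "health_plan_id", "account_number", "license_number", "vehicle_id",
    "device_id", "url", "ip_address", "biometric_id", "photo",
}

# Placeholder values only. The real set of low-population 3-digit zips
# has to be derived from current census data.
SMALL_POPULATION_ZIP3 = {"036", "059"}


def deidentify(record: dict) -> dict:
    """Return a copy of `record` with identifiers removed or blurred."""
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIER_FIELDS:
            # Direct identifiers are dropped entirely.
            continue
        if field == "zip":
            # Keep only the first three digits; blur further (assumed here
            # to mean "000") when that area has too few residents.
            prefix = str(value)[:3]
            cleaned["zip3"] = "000" if prefix in SMALL_POPULATION_ZIP3 else prefix
        elif isinstance(value, date):
            # Dates are reduced to the year alone.
            cleaned[field + "_year"] = value.year
        else:
            cleaned[field] = value
    return cleaned


patient = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "zip": "90210",
    "date_of_birth": date(1980, 5, 17),
    "diagnosis": "asthma",
}
print(deidentify(patient))
```

In this sketch the clinical content (the diagnosis) survives, while the name and email are dropped, the zip is cut to three digits, and the date of birth is reduced to a year.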
The importance of using de-identified data in research
Why be concerned about the re-use of de-identified health data? Because that is where dramatic medical discoveries are made. Post-market drug surveillance, for example, can make use of web-housed EHR data to find unexpected problems long before the traditional “adverse event reporting” mechanism finds something. Syndromic surveillance, part of the anti-terrorism activities that the CDC undertook after the anthrax scare that followed 9/11, can operate much closer to real time when using de-identified web-based EHR data. The examples are bounded only by the imagination.
EHR data, properly de-identified, is a tremendous resource never before available. Medical discovery, both in finding new cures for human ills and in finding unanticipated problems from current therapies, can grow at an unprecedented rate. Even the emerging science of Comparative Effectiveness, which looks at the outcomes and total system cost of different approaches to treating a given medical problem, can be dramatically enhanced when using real-time data from physician EHRs.
And, unlike general Internet data and the issues of privacy in realms outside HIPAA, health data is protected at a level that is, in fact, unparalleled. The security safeguards that are part of EHR product Certification (necessary for access to Meaningful Use incentive money) result in data safety that far outstrips other ways of housing data of any sort.
Better than paper?
Is it better than paper? Health data on paper has two vulnerabilities: safety and security. In terms of safety, paper-based charts are susceptible to damage, loss and theft. Natural disasters destroy not only offices with paper records but also locally housed electronic data (locally installed EHRs on servers within a medical practice). Web-based data (from a Certified product), however, is available again as soon as Internet connectivity is re-established. In terms of security, HIPAA’s technical safeguards apply to electronic data; paper charts, although still covered by HIPAA’s privacy requirements, have no comparable protections. There is no “encryption” for information in paper charts. There is no audit log to see who has been looking at which patient’s records. In short, moving from paper to an e-platform (especially a web platform) improves health data safety, as well as security.
As with banking, we are moving into an era where on-line health data (just like on-line banking) will be regarded as an ordinary matter of course. The rigorous privacy and security protections that are such a central focus of the EHR industry and of the health policy and regulatory communities ensure that health data, unlike ordinary Internet data, is in a class by itself.