By Ross Anderson

[21 June 1996] An agreement has been reached between the British Medical Association and the UK's main providers of healthcare analysis information services - CHKS Ltd (a subsidiary of HCIA Inc), the SEMA group, IMG and Reuters - to set minimum standards for the de-identification of medical records. These records are used in analysing hospital readmission rates, referral patterns and casemix, and in epidemiological research generally.

Such studies require records of hospital care episodes to be linked, but they should still not be identifiable to individuals outside the hospital or other care provider (or else patient consent must be sought). A problem had arisen in that some (though not all) healthcare information companies had been identifying patients, and linking episodes, by their postcode and date of birth. This combination is enough to identify over 99% of UK residents.

It has therefore been agreed that in future, de-identified medical records will contain neither the last two symbols of the postcode nor the day and month of birth. Thus, for example,

        CB5 9HF         15/09/1956

will become

        CB59            56

This is sufficient information for age-related casemix studies, and to identify deprived areas. However, it is very rarely enough to identify individuals; there are on average six individuals with each combination of year of birth and postcode sector.
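
The truncation rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated here, not the code used by any of the parties; the function name deidentify and the DD/MM/YYYY input format are assumptions taken from the example above.

```python
from typing import Tuple


def deidentify(postcode: str, date_of_birth: str) -> Tuple[str, str]:
    """Return (truncated postcode, two-digit year of birth).

    Hypothetical helper: drops the last two symbols of the postcode and
    the day and month of a DD/MM/YYYY date of birth.
    """
    # Remove internal spaces so "CB5 9HF" and "CB59HF" are treated alike.
    compact = "".join(postcode.split()).upper()
    if len(compact) < 5:
        raise ValueError("postcode too short: %r" % postcode)
    truncated = compact[:-2]

    # Keep only the year, written with two digits as in the example.
    day, month, year = date_of_birth.split("/")
    if len(year) != 4 or not year.isdigit():
        raise ValueError("expected DD/MM/YYYY: %r" % date_of_birth)
    return truncated, year[-2:]


print(deidentify("CB5 9HF", "15/09/1956"))  # ('CB59', '56')
```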

In order that episodes can still be linked, a pseudonym for the patient will also be stored, such as a hospital number, a practice number or a cryptographic hash of the patient's name and date of birth (in the last case, the hash will use a key unique to each provider).
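
As a rough sketch of the keyed-hash option, the pseudonym might be derived as follows. The agreement does not name a hash function or a normalisation rule; HMAC with SHA-256, the upper-casing of the name, and the names pseudonym and provider_key are all assumptions made for this illustration.

```python
import hashlib
import hmac


def pseudonym(provider_key: bytes, name: str, date_of_birth: str) -> str:
    """Derive a stable pseudonym from a patient's name and date of birth.

    Assumption: HMAC-SHA-256 stands in for the unspecified keyed hash.
    provider_key is the secret key unique to one care provider.
    """
    # Normalise spacing and case so trivial variations in the name
    # still map to the same pseudonym (an assumed rule).
    normalised = " ".join(name.split()).upper() + "|" + date_of_birth.strip()
    digest = hmac.new(provider_key, normalised.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Two episodes for the same patient at one provider yield the same value.
key_a = b"example key for provider A"
first = pseudonym(key_a, "Jane  Smith", "15/09/1956")
second = pseudonym(key_a, "jane smith", "15/09/1956")
print(first == second)  # True
```

Because the key differs for each provider, the same patient receives unrelated pseudonyms at different providers, and someone who knows a name and date of birth cannot recompute the pseudonym without the key.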

This arrangement is not entirely sufficient for the secure handling of health information - further access control and statistical security measures are needed to foil inferencing and other attacks. However, it brings the following immediate benefits:

  1. The threat to patient privacy is reduced by orders of magnitude. In particular, the databases are no longer of value to banks, credit reference agencies, insurers and law enforcement;
  2. The accuracy of the statistics is significantly improved. At present, figures such as hospital readmission rates are skewed by the correlation between ill health and frequent address changes. Using a pseudonym rather than the postcode as the primary key will enable systems to discount address changes within the same district hospital's catchment area;
  3. The standards are in line with the existing guidelines of the Royal College of General Practitioners and the General Medical Services Committee of the BMA, which state that no data may be sent outside a general practice without patient consent unless patients cannot be identified by persons external to the practice. This will enable data collected in primary care to be used together with the secondary care records, subject to any inference controls that may be needed.

It should be noted that the use of pseudonyms in medical research and clinical audit systems is well established in a number of countries, including Denmark, Germany and New Zealand. Papers on the systems in use in the latter two countries were given at the workshop on personal information held at Cambridge on the 21st and 22nd June 1996.