Clustering Multiple Long-Term Conditions

Background and rationale

What is MLTC?

Multiple Long-Term Conditions (MLTC) is usually defined as the co-existence (or co-occurrence) of two or more long-term conditions (LTCs) in one person. It has traditionally been called multimorbidity, a term I avoid because it is unnecessary medical jargon, is less clear to patients and has negative connotations.

Worldwide, more people are living with MLTC, partly because people are living to older ages and partly because rates of many LTCs are increasing. Health systems tend to be designed around the care of single diseases, so people with MLTC often have to see multiple healthcare specialists. Their care can become "fragmented", which makes care harder to access, creates inefficiencies and can lead to poor health outcomes.

Why cluster?

One of the biggest challenges in addressing MLTC is that it is a crude marker: there is huge variety among the people who have it. For example, the health and care needs of a person with arthritis and thyroid disease might be very different from those of a person who has previously had a heart attack and a stroke. This makes it difficult to identify and design strategies to help people with MLTC. If, instead, we could find patterns of diseases that often occur together, we might be able to provide care tailored to each pattern.

To better understand this complexity, we might start by trying to identify and explain every pattern of diseases that occurs in a population. However, Stokes and colleagues found in a 2021 study that there were over 63,000 unique combinations of 28 conditions in a population of 8 million people, clearly far beyond human comprehension! Clustering offers a middle ground, giving us meaningful insights from a smaller, interpretable number of groups.
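For scale, the number of distinct combinations of two or more conditions that could in principle be formed from 28 conditions is the number of subsets of size k ≥ 2. This is a theoretical upper bound worked out here for illustration, not a figure reported by Stokes and colleagues:

```latex
\sum_{k=2}^{28} \binom{28}{k} = 2^{28} - \binom{28}{0} - \binom{28}{1} = 268{,}435{,}456 - 1 - 28 = 268{,}435{,}427
```

Only a tiny fraction of these possible combinations actually occur, yet even the observed 63,000 or so are too many to examine one by one.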

Clustering is a data-driven method of grouping similar data together. Applied to diseases, this means grouping together diseases that are similar; applied to people, it means grouping together people who are similar. There are of course many attributes we could use to decide whether two diseases or two people are 'similar', but from the perspective of MLTC, our interest is in diseases that co-exist. So two diseases are similar if they commonly occur together in the same person, and dissimilar if they are rarely seen together. By extension, two people are similar if they have similar diseases. In this research, the clusters are unsupervised, i.e., they are determined from information on the diseases alone, without training algorithms to fit already known patterns. The purpose of working in an unsupervised way is to help identify patterns that we do not already know.
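The two notions of similarity above can be made concrete in several ways. As a minimal sketch, assuming Jaccard similarity as the measure (an illustrative choice, not necessarily the one used in this work) and a hypothetical toy dataset, two diseases can be compared by the sets of people who have them, and two people by the sets of diseases they have:

```python
from itertools import combinations

# Hypothetical toy data: each person's set of long-term conditions.
people = {
    "p1": {"asthma", "eczema"},
    "p2": {"asthma", "eczema", "hay_fever"},
    "p3": {"heart_attack", "stroke"},
    "p4": {"heart_attack", "stroke", "diabetes"},
    "p5": {"asthma", "hay_fever"},
}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Invert the data: for each disease, the set of people who have it.
people_with = {}
for person, conditions in people.items():
    for condition in conditions:
        people_with.setdefault(condition, set()).add(person)

# Disease-to-disease similarity: high when two diseases occur in the same people.
disease_similarity = {
    (d1, d2): jaccard(people_with[d1], people_with[d2])
    for d1, d2 in combinations(sorted(people_with), 2)
}

# Person-to-person similarity: high when two people have the same diseases.
person_similarity = {
    (a, b): jaccard(people[a], people[b])
    for a, b in combinations(sorted(people), 2)
}
```

In this toy data, heart_attack and stroke always occur together (similarity 1.0), while asthma and eczema share two of the three people who have either (about 0.67). Note that this person-level measure only credits identical diseases; a measure that also credited similar, frequently co-occurring diseases would need to combine the two levels.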

The challenge is how to evaluate whether the clusters are 'good' when we do not know in advance what they should be (we give more detail of our approach here).

Generating clusters requires a two-step process:

  1. Create a measure of similarity between each pair of diseases (or people).
  2. Run a clustering algorithm using the similarity measures.
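The two steps can be sketched end to end on hypothetical toy data. This assumes Jaccard distance (1 minus Jaccard similarity) for step 1 and average-linkage agglomerative clustering for step 2; both are illustrative choices rather than the methods used in this work. The clustering is written out naively so that each merge is visible; in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would be used instead.

```python
from itertools import combinations

# Hypothetical toy data: each person's set of long-term conditions.
people = {
    "p1": {"asthma", "eczema"},
    "p2": {"asthma", "eczema", "hay_fever"},
    "p3": {"heart_attack", "stroke"},
    "p4": {"heart_attack", "stroke", "diabetes"},
    "p5": {"asthma", "hay_fever"},
}

# Step 1: a distance between every pair of diseases.
# Assumption: Jaccard distance on the sets of people who have each disease.
people_with = {}
for person, conditions in people.items():
    for condition in conditions:
        people_with.setdefault(condition, set()).add(person)


def jaccard_distance(d1: str, d2: str) -> float:
    a, b = people_with[d1], people_with[d2]
    return 1 - len(a & b) / len(a | b)


# Step 2: agglomerative clustering with average linkage. Start with one
# cluster per disease and repeatedly merge the two closest clusters.
def average_linkage(items, distance, n_clusters):
    clusters = [[item] for item in items]

    def cluster_distance(a, b):
        # Mean pairwise distance between members of the two clusters.
        return sum(distance(x, y) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


disease_clusters = average_linkage(sorted(people_with), jaccard_distance, n_clusters=2)
print(disease_clusters)  # [['asthma', 'eczema', 'hay_fever'], ['diabetes', 'heart_attack', 'stroke']]
```

The number of clusters is fixed in advance here purely for illustration; choosing it is part of the evaluation problem described above.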

Classification versus clustering

Throughout history, doctors have tried to make sense of diseases by classifying them into usable groups. One of the most widely used classifications today is the International Classification of Diseases. Existing classifications tend to group diseases by the body system they affect. For example, asthma and pulmonary fibrosis are both in the chapter 'Diseases of the respiratory system', while eczema is found under 'Diseases of the skin'. In contrast, when clustering conditions that co-occur, we might expect asthma and eczema to be in the same cluster, as they frequently co-exist. Clustering diseases by their co-occurrence is therefore different from the traditional anatomical classifications we have used.

Co-occurrence versus sequence

Consider two people who have the same two diseases but developed them in a different order:

[Figure: Co-occurrence versus sequence]

Intuitively, we would expect the order to be relevant to the causes of their conditions, to how they interact with health services, and to what happens next (such as developing another disease). So one of the premises of this work is that finding ways to incorporate and handle the order of diseases may help us understand the complexity of MLTC.

Methods developed for natural language are well suited to this purpose. In language, the order of words matters, and natural language processing (NLP) algorithms are designed to learn patterns in the order of words within documents. In this work, I treat the order of diseases in a person's history as analogous to the order of words in a document, and adapt similar methods to healthcare data.
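As a minimal sketch of the word-order analogy, assuming each person's record can be reduced to a chronologically ordered list of diagnoses (the toy sequences below are hypothetical), the simplest order-aware language model, a bigram model, counts which disease follows which and turns those counts into conditional probabilities. This only illustrates the analogy; it is not the method used in this work.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each person's diagnoses in the order they were recorded.
# The first two people have the same diseases, developed in a different order.
sequences = [
    ["asthma", "eczema"],
    ["eczema", "asthma"],
    ["heart_attack", "stroke"],
    ["heart_attack", "stroke", "diabetes"],
    ["asthma", "eczema", "hay_fever"],
]

# Count ordered pairs of consecutive diagnoses (bigrams), as an n-gram
# language model does with consecutive words.
transitions = defaultdict(Counter)
for seq in sequences:
    for current, following in zip(seq, seq[1:]):
        transitions[current][following] += 1


def next_disease_probs(disease: str) -> dict:
    """Maximum-likelihood estimate of P(next diagnosis | current diagnosis)."""
    counts = transitions.get(disease, Counter())
    total = sum(counts.values())
    return {d: n / total for d, n in counts.items()} if total else {}


print(next_disease_probs("heart_attack"))  # {'stroke': 1.0}
print(next_disease_probs("eczema"))        # {'asthma': 0.5, 'hay_fever': 0.5}
```

As sets of diagnoses the first two people are indistinguishable, but the bigram counts keep the direction: asthma is followed by eczema twice, while eczema is followed by asthma only once.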