SANTTU
12 Feb 2017

Automatic Clustering of Irregularly Sampled Time Series with Unequal Lengths: An Application to eGFR Data

Estimated glomerular filtration rate (eGFR) is a derived measurement that characterises how well the kidneys are functioning. It plays a central role both in the management of people with chronic diseases and in epidemiological research involving longitudinal data with ten or more years of observations. eGFR time series often exhibit irregularities such as missing values and unequal lengths. Missing values are an inevitable consequence of the difficulty of ensuring that patients return for regular follow-up measurements, while unequal lengths arise because patients of differing ages and with differing conditions are measured at different frequencies. A patient’s eGFR is therefore observed at irregular time intervals, and the number of observations varies with the patient’s age and the conditions they suffer from.

eGFR dataset

For a clinician, an easily understandable summary of a patient’s eGFR trend can be useful for determining the progression of diseases ranging from diabetes to chronic kidney disease. In many cases, simply distinguishing between stable (non-decreasing) and unstable (decreasing) trends is sufficient. Armed with this information, a clinician can identify the patients who are most at risk of a deterioration in their renal function. At present, this trend differentiation is performed by a nephrologist who manually analyses and labels each eGFR time series. Although manual labelling is time-consuming and expensive, the process cannot be automated by applying standard supervised classifiers directly, because they require the same number of input features for every example and eGFR time series are of unequal length. Overcoming this requires either developing a framework for classifying the irregular and unequal eGFR time series, or using interpolation to make the time series amenable to standard classification methods.

In spite of the complexity of eGFR trends, it was possible to use a machine learning approach to determine automatically whether an eGFR trend is stable or unstable. The approach equalises the lengths of the patients’ eGFR time series and uniformly resamples new data points, after which the eGFR trend can be classified.
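To make the resampling idea concrete, here is a minimal sketch of how an irregular series might be mapped onto a fixed number of evenly spaced points. It assumes plain linear interpolation, which the post does not name as the method actually used. The function name `resample_uniform`, the choice of `n_points`, and the patient values are all hypothetical and purely for illustration.

```python
from bisect import bisect_right


def resample_uniform(times, values, n_points):
    """Linearly interpolate an irregular series onto n_points evenly spaced times.

    Assumption: linear interpolation stands in for whatever resampling
    scheme was actually used; the post does not specify one.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) pairs")
    if n_points < 2:
        raise ValueError("n_points must be at least 2")

    # Sort observations by time so neighbours can be found by bisection.
    pairs = sorted(zip(times, values))
    ts = [t for t, _ in pairs]
    vs = [v for _, v in pairs]
    start, end = ts[0], ts[-1]
    if start == end:
        raise ValueError("observations must span a non-zero time range")

    step = (end - start) / (n_points - 1)
    resampled = []
    for i in range(n_points):
        # Pin the last grid point to the final observation time.
        t = end if i == n_points - 1 else start + i * step
        # j is the index of the right-hand neighbour, clamped to a valid range.
        j = min(max(bisect_right(ts, t), 1), len(ts) - 1)
        t0, t1 = ts[j - 1], ts[j]
        v0, v1 = vs[j - 1], vs[j]
        if t1 == t0:
            # Duplicate timestamps: fall back to the later value.
            resampled.append(v1)
        else:
            weight = (t - t0) / (t1 - t0)
            resampled.append(v0 + weight * (v1 - v0))
    return resampled


# Made-up example: two patients with different numbers of observations
# (times in years, eGFR in mL/min/1.73 m^2).
patient_a = resample_uniform([0.0, 1.0, 4.0], [90.0, 80.0, 50.0], n_points=5)
patient_b = resample_uniform([0.0, 0.5, 2.0, 3.5, 6.0],
                             [75.0, 76.0, 74.0, 77.0, 75.0], n_points=5)
print(len(patient_a), len(patient_b))
```

After resampling, every patient is represented by a vector of the same length, so the vectors could be passed to a standard classifier that expects a fixed number of input features.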

health care • kidney disease • machine learning
