SANTTU
12 Feb 2017

Can we analyse the clinical diabetic survey questionnaires using machine learning?

Type-1 diabetes is a major health problem in the present generation, with 10% of all adults diagnosed with diabetes. Many factors must be considered to manage it effectively: daily insulin injections, a healthy diet, regular physical activity and the other self-care factors listed in the table below.

Label             Self-care factors
CBG-Monitor       Check blood glucose with monitor
RBG-Results       Record blood glucose results
CKGL-High         Check ketones when the glucose level is high
TCD-Insulin       Take the correct dose of insulin
TI-Time           Take insulin at the right time
EC-Food Portions  Eat the correct food portions
Eat-Timely        Eat meals/snacks on time
KF-Records        Keep food records
RF-Labels         Read food labels
Rec-Carbs         Treat low blood glucose with just the recommended amount of carbohydrate
Carry-Sugar       Carry quick acting sugar to treat low blood glucose
Clin-Appoint      Come in for clinic appointments
WM-Alert          Wear a medic alert ID
Exercise          Exercise
AIDGFE            Adjust insulin dosage based on glucose values, food, and exercise

Type-1 diabetes can develop at any age but usually appears before the age of 40. It is the most common type of diabetes found in children. It is often influenced by the lifestyle of the patient, and treatment requires well-managed self-care. Awareness of medication adherence also plays an important role in the treatment. In this study we therefore aim to locate, define, analyse and interpret, via statistical machine-learning approaches, patterns in patients' habits and behaviour regarding their medication, in order to motivate treatment suggestions and determine the most suitable treatment plan.

Analysing data in the form of pictures is called information visualisation. It supports decision making in numerous fields, including health-care surveys. Visualising large amounts of heterogeneous survey data in order to find interesting patterns is a difficult task, but data-mining techniques (clustering) coupled with artificial neural networks in the form of self-organising maps (SOMs) render it tractable.

In particular, clinicians often conduct surveys to better understand their patients. Traditional descriptive statistics such as mean, variance, skewness and frequency may lead to overly simplified conclusions. Hence, clinicians require statistical machine-learning tools that can be deployed as a ‘black box’ for carrying out data analysis. For these reasons, we use the SOM algorithm to mine correlations and cluster similar responses within the surveys. The responses, clustered in the original high-dimensional space, are then visualised on a 2-dimensional grid, which reduces the complexity of the data. This reduction reveals more meaningful relationships and makes the dependencies among the survey responses easier to understand. Previously, SOMs have been used to visually explore data in areas such as health, lifestyle, nutrition, finance, gene expression, marine safety and linguistics. Recently, SOMs have also been used to explore questionnaire-based loneliness survey data.
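
The SOM procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names `train_som` and `best_matching_unit`, the grid size, the linear decay schedules and the assumption that survey responses have been scaled to the range [0, 1] are all choices made for this example.

```python
import numpy as np

def train_som(data, rows=6, cols=6, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular SOM with online updates.

    data: array of shape (n_samples, n_features), assumed scaled to [0, 1].
    Returns weights of shape (rows, cols, n_features).
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_features = data.shape[1]
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0  # assumed starting neighbourhood radius
    weights = rng.random((rows, cols, n_features))
    # Grid coordinates of every node, shape (rows, cols, 2)
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                     # learning rate decays to 0
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac  # radius shrinks towards 0.5
        x = data[rng.integers(len(data))]           # pick one response vector
        # Best matching unit: the node whose weights are closest to x
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Gaussian neighbourhood around the BMU, measured on the 2-D grid
        grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        # Pull the BMU and its grid neighbours towards x
        weights += lr * h[..., None] * (x - weights)
    return weights

def best_matching_unit(weights, x):
    """Return the (row, col) grid position of the node closest to x."""
    dists = np.linalg.norm(weights - np.asarray(x, dtype=float), axis=-1)
    return tuple(int(k) for k in np.unravel_index(np.argmin(dists), dists.shape))
```

Once trained, each patient's response vector can be placed on the grid with `best_matching_unit`; patients who land on the same or neighbouring cells gave similar answers, which is what makes the 2-dimensional picture readable.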

The main objectives of this article are two-fold. First, from the computational perspective, we examine the feasibility of using a self-organising map as a means of extracting useful information from survey questionnaires. Second, from the scientific perspective, we want to understand whether the responses collected from the Type-1 diabetes survey are reasonable and correspond to what domain experts and clinicians would expect. For instance, it is desirable to answer the following exploratory questions:

  1. Can we identify co-morbidities from the survey?
  2. What are the self-care factors or behaviours that are dependent on each other?
  3. How are the individual patients grouped together?

Self-organising maps

By comparing component planes one can see whether two components correlate. If two planes look similar, the components are strongly correlated. For example, high BP and high cholesterol correlate with each other; accordingly, the bottom left of the U-matrix in Figure b shows high BP and high cholesterol clustered close together. Similarly, the correlated co-morbidities (see Figure a) are clustered nearby in the U-matrix (Figure 3). For example, we observe the following natural clustering of variables: (1) high BP and high cholesterol; (2) anxiety and depression; (3) heart disease and vascular disease; and (4) kidney damage and protein albumin.
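
The comparison of component planes and the U-matrix can be made concrete with a short NumPy sketch. It assumes a trained SOM weight array of shape (rows, cols, n_features), with one feature per survey variable (for example, one per co-morbidity). The function names are hypothetical, Pearson correlation is only one reasonable way to quantify "looking similar" (the article compares the planes visually), and the U-matrix here uses the four directly adjacent grid neighbours.

```python
import numpy as np

def component_plane_correlation(weights):
    """Pearson correlation between every pair of component planes.

    weights: array of shape (rows, cols, n_features).
    Returns an (n_features, n_features) correlation matrix.
    """
    rows, cols, n_features = weights.shape
    # Each column is one component plane flattened to a vector of node values
    planes = weights.reshape(rows * cols, n_features)
    return np.corrcoef(planes, rowvar=False)

def u_matrix(weights):
    """Mean Euclidean distance from each node to its 4-connected neighbours.

    High values mark boundaries between clusters; low values mark
    regions where neighbouring nodes are similar.
    """
    rows, cols, _ = weights.shape
    u = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            neighbours = [
                (i + di, j + dj)
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= i + di < rows and 0 <= j + dj < cols
            ]
            u[i, j] = np.mean(
                [np.linalg.norm(weights[i, j] - weights[n]) for n in neighbours]
            )
    return u
```

A correlation close to 1 between two planes corresponds to the "similar outlook" described above, while a ridge of high values in the U-matrix separates groups of nodes that represent different response patterns.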

The questions posed here cannot be readily answered using classical methods such as generalised linear models and their variants; hence the motivation for using SOMs to explore the data. The aim is twofold: to provide patient-level analytics by identifying patient profiles with co-morbidities associated with Type-1 diabetes, and to identify the profiles of patients who need to adjust their self-care behaviours. The complete article can be found here.
