Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

A medical researcher wants to use data mining techniques to be able to predict w

ID: 3576568 • Letter: A

Question

A medical researcher wants to use data mining techniques to be able to predict whether or not a patient has a specific lung failure. As a domain export he knows that this probability is combination of personal traits like smoking, exercise level, age, gender and also the area that the patient lives in since population and air quality is big part of it. He was given the historical data of 10,000 patients with or without disease who live in various parts of Houston. Which Data mining technique(s) or combination of techniques should he use so that he has a high chance of high accuracy prediction? (Hint: think how you can involve location info)

Explanation / Answer

The researcher must use a clustering algorithm like K-means to cluster patients in and around a particular location. This could lead to fascinating discoveries like 40-50% people living in a 5 mile radius of a location have thedisease. Th location could be a factory polluting air and we can find out the culprit. Although the researcher wanted to know if a person has lung failure or not. For that after vlustering on the location he can use a classifier like naive bayes which allows to predict the outcome after certain inputs are provided. So the data would be used to train the algorithm to build a model and prediction would follow.

The flow would be

Clustering on location ----> classifier using age, smoking, exercise level, sex -----> model

Query the model by providing input like age, sex, smoking, exercise level -----> prediction in the form of probability, the liklihood of the disease happening to the particular person.