
Proceedings of the IConSSE FSM SWCU 2015, pp. MA.26–41, ISBN: 978-602-1047-21-7, SWUP

Random forest of modified risk factor on ischemic and hemorrhagic
Case study: Medicum Clinic, Tallinn, Estonia

Ria Dhea Layla Nur Karisma a, Alexandr Kormitsõn b, Heri Kuswanto c

a,c Sepuluh Nopember Institute of Technology, Jl. Arief Rahman Hakim, Surabaya 60117, Indonesia
b Tallinn University of Technology, Ehitaja Tee 5, Tallinn 19086, Estonia

Abstract

Estonia is a European Union country in the Baltic region whose capital is Tallinn. It has a population of 1312300 and faces health problems such as stroke (cerebrovascular disease), which is the second biggest cardiovascular cause of death. The aim of this study is to classify Ischemic and Hemorrhagic patients by modified risk factors using an ensemble method. The study uses random forest, a classifier formed from a set of tree-structured classifiers in which each tree depends on an independent, identically distributed random vector and each tree casts a unit vote for the most popular class. In general, the method is more accurate than an individual classifier. The observations are 420 patients, including records with missing data, from Medicum Clinic, Tallinn, Estonia. The independent variables are the modified risk factors of Ischemic and Hemorrhagic patients: alcohol habit, diet habit, smoking habit, physical activity, and body mass index. The ratio of training to testing data is 85:15, following the proportions of the original data set. This research uses bootstrap sampling with replacement (2015 times) with 300 replications and a combination of 3 predictor variables, which gives a misclassification rate of 1.7%. The important modified risk factors are diet habit and alcohol habit. The variables that influence Ischemic stroke are smoking habit, diet habit, and physical activity, whereas for Hemorrhagic stroke the influential variable is diet habit. Because the response variable is imbalanced, the quality of the classification is also assessed through sensitivity and specificity. The accuracy of the prediction model is 98.32% and the validation accuracy is 95.23%; sensitivity and specificity are 98.6% and 97.2%, respectively.

Keywords: stroke, ischemic, hemorrhagic, modified risk factor, ensemble method, random forest
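The abstract reports sensitivity and specificity alongside accuracy because the response classes are imbalanced. The following minimal Python sketch illustrates why, under stated assumptions: the labels are made-up (90 ischemic and 10 hemorrhagic cases, not the study data), the choice of "ischemic" as the positive class is an assumption, and the helper name `classification_metrics` is hypothetical. It is not the authors' implementation.

```python
def classification_metrics(y_true, y_pred, positive):
    """Return accuracy, sensitivity and specificity for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    total = tp + tn + fp + fn
    return {
        # Share of all cases classified correctly.
        "accuracy": (tp + tn) / total,
        # Share of positive cases that are detected.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # Share of negative cases that are detected.
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical, made-up labels (NOT the study data): 90 vs 10 cases.
y_true = ["ischemic"] * 90 + ["hemorrhagic"] * 10
# A trivial classifier that always predicts the majority class.
y_pred = ["ischemic"] * 100

# Assumption: "ischemic" is treated as the positive class.
metrics = classification_metrics(y_true, y_pred, positive="ischemic")
print(metrics)
```

In this toy case the trivial classifier reaches an accuracy of 0.9 while its specificity is 0.0, so accuracy alone would hide the fact that no hemorrhagic case is recognized.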

1. Introduction

Estonia is a European Union country in Northern Europe whose capital is Tallinn. Together with Latvia and Lithuania, it is one of the Baltic states. According to statistics for January 2015, the population of Estonia was 1312300, which is 3600 fewer than the previous year (Estonian Statistics, 2015). Health problems are one of the factors behind the population decrease in Estonia. Cardiovascular disease is one of the major causes of death in Estonia, with 786.5 deaths per 100000 people in 2010 (European Commission, 2010). Stroke is the second biggest cardiovascular cause of death in Estonia (Estonian Statistics, 2015). Based on its cause, stroke is divided into two types: Ischemic and Hemorrhagic. This study discusses the classification of Ischemic and Hemorrhagic stroke based on controllable risk factors. The controllable risk factors used as variables in this case are alcohol habit, smoking habit, physical activity, body mass index, dietary habit, weight, and height.

Researchers have used algorithms such as classification trees and regression trees for classification. CART (Classification and Regression Trees), one of the classification methods in data mining, was used by Takahashi et al. (2006) to identify four groups for mortality in intracerebral haemorrhage (ICH) patients. Rajagopal et al. (2013) used random forest for enhancer identification from chromatin state and concluded that random forest is informative for classification, because it can predict enhancers accurately genome-wide based on chromatin modifications.

The problems addressed in this case study are to characterize Ischemic and Hemorrhagic patients and then to classify them by modified risk factors. The sample consists of data on Ischemic and Hemorrhagic patients from 2009 to 2014, which is expected to represent Ischemic and Hemorrhagic patients. Hospital data for cardiovascular disease in Estonia show 3245 per 100000 population in 2001, increasing to 3327 per 100000 population in 2009 (Nichols et al., 2014).

Corresponding author. Tel.: +62 812 160 02783.

2. Materials and methods