K-Means Execution Time Sentiment Analysis Performance

TELKOMNIKA ISSN: 1693-6930  Cluster Analysis for SME Risk Analysis Documents Based on Pillar K-Means Irfan Wahyudin 681 The second additional condition has been added because high silhouette score does not always represent a solution with good quality for each cluster within. For example, for k=7, α=0.65, and β=0.8, from the silhouette score, they may be considered one of best cluster solutions, but when it is observed deeper, most of its objects have negative silhouette score. Our exploration found that the best cluster solution is k=6, α=0.65, and β=0.8, because it has the highest silhouette score, s=0.30205, and fulfills both additional criteria.

3.5.2. Sum Squared of Error

This evaluation also helps to understand the nature of the cluster solution. It has been noticed that the greater the number of clusters in a cluster solution is, the lower the SSE is resulted. For instance, for k=6, the top 5 cluster solutions are listed in Table 3 above. Table 3 empowers our reason to add the additional conditions since the cluster solutions that have not fulfilled both additional conditions tend to have higher SSE. Thus, those are not recommended as the best solutions, despite having high silhouette score. Table 3. List of the SSE of first top 5 cluster solutions for k=6, and the best cluster solution α β Number of Cluster with Negative Avg Silhouette Number of Empty Cluster SSE 0.75 0.75 1 3 7786.063 0.7 0.75 3 7802.287 0.6 0.8 2 7358.057 0.55 0.8 2 7358.057 0.5 0.8 2 7358.057

0.65 0.8 3974.671

3.5.3. K-Means Execution Time

We also performed some comparison between two clustering task where the first task was performed without SVD decompisition, and the second task was performed with SVD decomposition. From the result we can see that by reduce the dimension using SVD we can save execution time. As seen in Figure 6, the performance of clustering task with SVD up to 50 ms is surpass the other task that not using SVD 400 ms to 460 ms. The comparison was taken for these following parameters: 6 ≤ K ≤ 7 , 0.5 ≤ α ≤ 1.0, and 0.5 ≤ β ≤ 1.0. Figure 6. Clustering performance comparison by execution time in miliseconds on dataset that decomposed with SVD and without SVD

3.5.4. Sentiment Analysis Performance

Djatna T. and Morimoto T. [21] used sortation and rank method to select featured numerical attributes that contain correlation in databases. We used the same idea to populate and rank the most important terms, but the difference is that in this research the sortation and ranking were based on the weight of sentiment score. The sortation was limited up to 200 mostly presented terms which represent the character of the cluster, and Table 4 shows the most weighted terms by selecting the terms with negative polarity then accumulating those TF- IDF and sentiment weight in each cluster. To get proportional measurement, the Risk Score was gained by dividing the total term score with the total number of documents in the cluster.  ISSN: 1693-6930 TELKOMNIKA Vol. 14, No. 2, June 2016 : 674 – 683 682 For instance, in cluster 1, the total term weight is -6262.070, and the total number of documents is 221; thus, the Risk Score is -28.335. To indicate how big the risk is, the sum of TF-IDF score and negative sentiment score was accumulated in each cluster. While each cluster has unique characteristics, on the contrary, each of them has variation sector of business. This indicates that specific business does not always has a specific risk, so, the bank may be more aware and more thoroughly in analyzing every loan proposal that comes in, not treating it in the same way as analyzing previous proposals with the same business sector. The result also shows that the type of risk found in the cluster solution is related to four of 5Cs Character, Capacity, Capital, Condition, and Collateral Credit criteria [20] that are commonly used to make lending decision. The mostly found criterion in the corpus is Capacity, while Character and Capital are not considered too significant, as the top ranked terms do not reflect these criteria. This result can be used by the bank to resharp the risk analysis since only three of the 5Cs criteria are exposed significantly in at least one cluster. The analysts may have difficulities in analyzing Character since it needs more in-depth investigation in the field. However, they must also improve the Character analysis since it is the most important criterion. Capital is the criterion that the analysts may rely on the scoring system, so that they will not be too concerned with delivering the opinions. Table 4. Risk cluster analysis and its corresponding 5Cs criteria Cluster Rank Risk Analysis Number of Documents Risk Score Corresponding 5Cs Criteria 12 Related to collateral and asset fix asset and current asset, e.g. “asset”, “piutang” claim, “tanah”land, “jaminan”collateral 221 -28.335 Collateral 23 Related to income, e.g. “net profit”, “copat” cash operating profit after tax, “pendapatan”profit, “leverage” gain and loss ratio 213 -27.502 Capacity 34 Related to production capacity, e.g. “persediaan”stock, “kapasitas”capacity, “penjualan”sales 47 -27.447 Capacity 45 Related to both income and financial measurements, e.g. “net profit”, “copat” cash operating profit after tax, “pendapatan” profit, “leverage”gain and loss ratio, “equity”, and “roe”return of equity 34 -26.102 Capacity 51 Related to business condition, e.g. “persaingan” business competition, “wilayah”territory, “demonstrasi warga” protest from local residents 2 -35.064 Condition 66 Related to business financial measurement, e.g. “perputaran” business cycle, “quick ratio”, “equity”, “return of equity”, “roe”return of equity and “profit” 2 -24.548 Capacity

K-Means Execution Time Sentiment Analysis Performance

3.5.2. Sum Squared of Error

0.65 0.8 3974.671

3.5.3. K-Means Execution Time

3.5.4. Sentiment Analysis Performance

3.5.5. Comparison with The Conventional Risk Analysist

Parts

Dokumen yang terkait

Modeling Risk Cluster based on Sentiment Analysis in Bahasa Indonesia for SME Business Risk Analysis Documents

PENDAHULUAN Pemetaan Industri Kreatif Kecamatan Jebres Kota Surakarta Dengan Metode K-Means Cluster Analysis.

K-Means Based on Neighbors with Latent Semantic Indexing for Documents Clustering.

Risk Analysis | Karya Tulis Ilmiah Risk Analysis

Analisis Fungsi Algoritma c-means dan Cluster Analysis (variance)

Analisis Fungsi Algoritma c-means dan Cluster Analysis (variance)

Analisis Fungsi Algoritma c-means dan Cluster Analysis (variance)

Analisis Fungsi Algoritma c-means dan Cluster Analysis (variance)

PENERAPAN METODE K-MEANS CLUSTER ANALYSIS UNTUK MENDETEKSI HAMA ATAU PENYAKIT PADA TANAMAN MELON

SISTEM PENDUKUNG KEPUTUSAN MINAT PEMILIHAN JURUSAN SMA DENGAN METODE K-MEANS CLUSTER ANALYSIS

Dukungan

Links

K-Means Execution Time Sentiment Analysis Performance

3.5.2. Sum Squared of Error

0.65 0.8 3974.671

3.5.3. K-Means Execution Time

3.5.4. Sentiment Analysis Performance

3.5.5. Comparison with The Conventional Risk Analysist

Parts

Dokumen yang terkait

Modeling Risk Cluster based on Sentiment Analysis in Bahasa Indonesia for SME Business Risk Analysis Documents

PENDAHULUAN Pemetaan Industri Kreatif Kecamatan Jebres Kota Surakarta Dengan Metode K-Means Cluster Analysis.

K-Means Based on Neighbors with Latent Semantic Indexing for Documents Clustering.

Risk Analysis | Karya Tulis Ilmiah Risk Analysis

Analisis Fungsi Algoritma c-means dan Cluster Analysis (variance)

Analisis Fungsi Algoritma c-means dan Cluster Analysis (variance)

Analisis Fungsi Algoritma c-means dan Cluster Analysis (variance)

Analisis Fungsi Algoritma c-means dan Cluster Analysis (variance)

PENERAPAN METODE K-MEANS CLUSTER ANALYSIS UNTUK MENDETEKSI HAMA ATAU PENYAKIT PADA TANAMAN MELON

SISTEM PENDUKUNG KEPUTUSAN MINAT PEMILIHAN JURUSAN SMA DENGAN METODE K-MEANS CLUSTER ANALYSIS

Dokumen yang Anda mencari sudah siap untuk unduhkan