Database Implementation
Document Clustering Implementation
1. G-Means Implementation
        cluster1.setCentroid(util.convert1DPCAArrayToRecord(centroid1));
        cluster2.setRecordList(tempRecordListCluster2);
        cluster2.setCentroid(util.convert1DPCAArrayToRecord(centroid2));
        break a;
    }

    if (tempRecordListCluster2.isEmpty() == false) {
        centroid2 = calculateNewCentroid(tempRecordListCluster2);
    } else {
        System.out.println("empty cluster");
        cluster1.setRecordList(tempRecordListCluster1);
        cluster1.setCentroid(util.convert1DPCAArrayToRecord(centroid1));
        cluster2.setRecordList(tempRecordListCluster2);
        cluster2.setCentroid(util.convert1DPCAArrayToRecord(centroid2));
        break a;
    }

    // compare the new centroids with the old ones; if newCentroid == oldCentroid,
    // k-means has converged: stop the while loop
    boolean checkCluster1 = checkNewWithOldCentroid(centroid1,
            util.convert1DPCARecordToArray(cluster1.getCentroid()));
    boolean checkCluster2 = checkNewWithOldCentroid(centroid2,
            util.convert1DPCARecordToArray(cluster2.getCentroid()));

    // move the contents of each tempRecordListCluster into the matching Cluster
    cluster1.setRecordList(tempRecordListCluster1);
    cluster1.setCentroid(util.convert1DPCAArrayToRecord(centroid1));
    cluster2.setRecordList(tempRecordListCluster2);
    cluster2.setCentroid(util.convert1DPCAArrayToRecord(centroid2));

    if (checkCluster1 && checkCluster2) {
        break a;
    }
    if (index == limit) {
        break a;
    }

    index++;
}

Cluster[] arrayCluster = new Cluster[2];
arrayCluster[0] = cluster1;
arrayCluster[1] = cluster2;

return arrayCluster;
}
Listing 4.3. Selecting child clusters with k-means
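The stopping condition in Listing 4.3 relies on a helper, checkNewWithOldCentroid, whose body is not shown in the listing. A minimal sketch of such a check is given below; the element-wise comparison and the tolerance value are assumptions, not taken from the thesis code.

```java
public class CentroidCheck {

    // Returns true when the new centroid equals the old one within a small
    // tolerance, i.e. the k-means iteration has converged for this cluster.
    static boolean checkNewWithOldCentroid(double[] newCentroid, double[] oldCentroid) {
        if (newCentroid.length != oldCentroid.length) {
            return false;
        }
        for (int i = 0; i < newCentroid.length; i++) {
            if (Math.abs(newCentroid[i] - oldCentroid[i]) > 1e-9) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] oldCentroid = {1.0, 2.0};
        double[] moved = {1.5, 2.0};
        System.out.println(checkNewWithOldCentroid(oldCentroid, oldCentroid)); // true
        System.out.println(checkNewWithOldCentroid(moved, oldCentroid));       // false
    }
}
```

In Listing 4.3 the loop stops only when both clusters report convergence (checkCluster1 && checkCluster2) or the iteration limit is reached.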
4. Compute the vector v = c1 − c2, then project X onto v to obtain X′ using the following formula:

   x′_i = ⟨x_i, v⟩ / ‖v‖²
After that, X′ is normalized so that it has mean 0 and variance 1.

public double[] step4(double[] centroid1, double[] centroid2, Cluster cluster) {
    Utility util = new Utility();

    Calculation calc = new Calculation();

    if (centroid1.length != centroid2.length) {
        throw new IllegalArgumentException();
    }

    double[] v = new double[centroid1.length];

    for (int i = 0; i < v.length; i++) {
        v[i] = centroid1[i] - centroid2[i];
    }

    List<Record> recordList = cluster.getRecordList();
    int recListSize = recordList.size();

    double[][] xi = new double[recListSize][recordList.get(0).getDataList().size()];

    for (int i = 0; i < recListSize; i++) {
        xi[i] = util.convert1DPCARecordToArray(recordList.get(i));
    }

    double[] xiac = new double[recordList.size()];

    double dotProduct = 0;
    double norm = 0;
    for (int i = 0; i < xi.length; i++) {
        dotProduct = calc.dotProduct(xi[i], v);
        norm = Math.sqrt(calc.dotProduct(v, v));
        xiac[i] = dotProduct / norm; // careful here: we do not know whether the
        // paper's "2-norm" means the Euclidean 2-norm or its square; this code
        // assumes the paper means the Euclidean 2-norm
    }

    double[] transform = calc.zScoreNormalization(xiac);

    // sort xiac (ordered xi)
    double[] sortTransf = util.sortArray(transform);
    double[] z = new double[sortTransf.length];
    for (int i = 0; i < z.length; i++) {
        z[i] = calc.calculateCDF(sortTransf[i]);
    }

    return z;
}
Listing 4.4. Computing the data direction vector, projecting dataset X onto it, and normalizing the projection with z-score normalization
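The projection and z-score normalization of Listing 4.4 can be illustrated on a toy dataset. The sketch below is self-contained; the class name, the sample data, and the choice of dividing by the Euclidean norm (mirroring the assumption stated in the listing's comment) are illustrative, not part of the thesis code.

```java
public class ProjectionDemo {

    // Project each row of x onto v = c1 - c2, then z-score normalize the
    // resulting 1-D values (population standard deviation).
    static double[] zProject(double[][] x, double[] c1, double[] c2) {
        double[] v = new double[c1.length];
        for (int i = 0; i < v.length; i++) {
            v[i] = c1[i] - c2[i];
        }

        // Euclidean norm of v
        double norm = 0;
        for (double vi : v) {
            norm += vi * vi;
        }
        norm = Math.sqrt(norm);

        // x'_i = <x_i, v> / ||v||
        double[] proj = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            double dot = 0;
            for (int j = 0; j < v.length; j++) {
                dot += x[i][j] * v[j];
            }
            proj[i] = dot / norm;
        }

        // z-score normalization: subtract the mean, divide by the std deviation
        double mean = 0;
        for (double p : proj) {
            mean += p;
        }
        mean /= proj.length;
        double var = 0;
        for (double p : proj) {
            var += (p - mean) * (p - mean);
        }
        double std = Math.sqrt(var / proj.length);

        double[] z = new double[proj.length];
        for (int i = 0; i < z.length; i++) {
            z[i] = (proj[i] - mean) / std;
        }
        return z;
    }

    public static void main(String[] args) {
        double[][] x = {{1.0, 1.0}, {3.0, 1.0}, {5.0, 1.0}};
        double[] z = zProject(x, new double[]{2.0, 0.0}, new double[]{0.0, 0.0});
        for (double zi : z) {
            System.out.println(zi);
        }
    }
}
```

After this transformation the projected values have mean 0 and variance 1, which is the form required by the Anderson-Darling test in the next step.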
5. Compute the Anderson-Darling statistic A². If A² falls in the non-critical region, H0 is accepted. Otherwise, if A² falls in the critical region, H1 is accepted and the new cluster centers are c1 and c2.

public boolean step5(double[] x, double alpha) {
    boolean stat = false;
    double ad = NormalityTest.anderson_darling_statistic(x);

    // adjustment for few data points -- when tried, this feature actually made
    // the number of clusters inaccurate (too few)
    ad = ad * (1 + 4.0 / x.length - 25.0 / Math.pow(x.length, 2));

    double pValue = NormalityTest.anderson_darling_pvalue(ad, x.length);

    if (pValue >= alpha) {
        stat = true;
    } else {
        stat = false;
    }
    return stat;
}
Listing 4.5. Anderson-Darling statistical test to check whether a cluster is normally distributed
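The small-sample adjustment applied in Listing 4.5 corresponds to A*² = A²(1 + 4/n − 25/n²), where n is the number of data points. A minimal sketch of that arithmetic follows; the class name and the sample values are illustrative.

```java
public class AdAdjustment {

    // Small-sample correction applied to the Anderson-Darling statistic in
    // Listing 4.5: A*^2 = A^2 * (1 + 4/n - 25/n^2)
    static double adjust(double a2, int n) {
        return a2 * (1.0 + 4.0 / n - 25.0 / ((double) n * n));
    }

    public static void main(String[] args) {
        // for n = 50 the correction factor is 1 + 0.08 - 0.01 = 1.07
        System.out.println(adjust(0.5, 50)); // ~0.535
    }
}
```

Note the explicit 4.0 and 25.0 literals: with integer literals, 4 / x.length would be evaluated in integer arithmetic and silently truncate to 0 for n > 4, which is a pitfall when transcribing the formula into Java.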
The source code of the document clustering subsystem is attached in the appendix of this document.
4.4. Document Search Implementation
4.4.1. Query Preprocessing Implementation
Preprocessing is performed to remove stopwords from the query, reduce the query to its base form, and weight the query with TF-IDF.
Preprocess prep = new Preprocess();
Utility util = new Utility();

DatabaseRWOperation dbrw = new DatabaseRWOperation();

DataStructuring ds = new DataStructuring();

// ========== CLUSTER QUERYING ========================
String query = userQuery;

// fetch from the database
ModifiedLinkedList masterLinkedList = null;
try {
    masterLinkedList = dbrw.readMasterTerm();
} catch (SQLException ex) {
    Logger.getLogger(MainRetrieval.class.getName()).log(Level.SEVERE, null, ex);
}

prep.preprocessQuery(query);
String[] queryArray = prep.getContent();
String[] wordArray = queryArray[0].split("\\s+");

String[] matchQuery = util.sentenceTermContainInList(masterLinkedList, wordArray);
System.out.println("matchQuery length: " + matchQuery.length);

if (util.emptyStringArray(matchQuery)) {
    System.out.println("no document found for that keyword");
    return null;
} else {
    System.out.println("exist");
}

// CONVERT THE QUERY INTO A RECORD
Record recQuery = ds.tokenizeQuery(matchQuery);

// MERGE SO THE QUERY HAS THE SAME COLUMN DIMENSION AS THE OTHER RECORDS
recQuery = ds.mergeQuery(recQuery, masterLinkedList);

// fetch from the database
HashMap<String, Integer> mapPointDf = null;
try {
    mapPointDf = dbrw.mapPoint();
} catch (SQLException ex) {
    Logger.getLogger(MainRetrieval.class.getName()).log(Level.SEVERE, null, ex);
}

// fetch from the database
int documentCount = 0;
try {
    documentCount = dbrw.allDocumentCount();
} catch (SQLException ex) {
    Logger.getLogger(MainRetrieval.class.getName()).log(Level.SEVERE, null, ex);
}

// COMPUTE THE TF-IDF WEIGHT OF THE QUERY
ModifiedLinkedList pointList = recQuery.getPointList();
Iterator<Point> iterator = pointList.iterator();
while (iterator.hasNext()) {
    Point next = iterator.next();
    String term = next.getTerm();
    boolean containsKey = mapPointDf.containsKey(term);
    if (containsKey) {
        next.setDf(mapPointDf.get(term));
        double w = next.getTf() * Math.log10((double) documentCount / next.getDf());
        next.setWNorm(w);
    }
}
Listing 4.6. Preprocessing of the search query
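The weighting at the end of Listing 4.6 is the standard tf-idf scheme w = tf × log10(N/df), where N is the total number of documents and df the document frequency of the term. A small standalone illustration of that arithmetic (the class name and the numbers are made up):

```java
public class TfIdfDemo {

    // w = tf * log10(N / df); the cast to double avoids integer division,
    // which would truncate the ratio before the logarithm is taken
    static double weight(int tf, int documentCount, int df) {
        return tf * Math.log10((double) documentCount / df);
    }

    public static void main(String[] args) {
        // a term occurring 3 times in the query, present in 10 of 1000 documents:
        // w = 3 * log10(100) = 3 * 2 = 6
        System.out.println(weight(3, 1000, 10));
    }
}
```

Rarer terms (smaller df) thus receive larger weights, so they dominate the similarity score when the weighted query is later compared against the cluster documents.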