4.2. Document Preprocessing Implementation
4.2.1. Reading Document Contents
In this step the system reads the contents of a document file. The method takes two parameters: path, the folder where the answer documents are stored, and fileName, the name of the document. List Code 4.2.1 shows the openFile method.
// Requires: import java.io.*;
public static String openFile(String path, String fileName) throws FileNotFoundException, IOException {
    String text = "", teks = "";
    FileReader fr = new FileReader(path + "/" + fileName);
    BufferedReader br = new BufferedReader(fr);
    // read line by line until the end of the file
    while ((teks = br.readLine()) != null) {
        text = text + teks + "\n";
    }
    br.close();
    fr.close();
    return text;
}
List Code 4.2.1 openFile
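For comparison, the same read can be written with try-with-resources and a StringBuilder, so that the reader is closed even when an exception occurs and no string is rebuilt on every line. This is a minimal sketch, not the application's code: the class name ReadFileSketch is hypothetical, and UTF-8 is assumed as the file encoding (openFile above uses the platform default).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class; not part of the original application.
class ReadFileSketch {

    // Returns the file contents with every line followed by "\n",
    // which is the same shape of result that openFile produces.
    static String openFile(String path, String fileName) throws IOException {
        Path file = Paths.get(path, fileName);
        StringBuilder text = new StringBuilder();
        // try-with-resources closes the reader automatically
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }
}
```

Building the path with Paths.get also avoids hard-coding a separator between the folder and the file name.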
4.2.2. Tokenization, Punctuation Removal, and Case Folding
In this step the system removes punctuation (filterTandaBaca), splits the text into tokens (tokenisasi), and converts it to lowercase (caseFolding).
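// replace(text, target, replacement) is a helper that is not shown in this listing.
public static String filterTandaBaca(String doc) throws FileNotFoundException, IOException {
    String cek = "";
    String tandaBaca = openFile("src/aplikasi", "tanda baca.txt");
    for (int i = 0; i < tandaBaca.length(); i++) {
        cek = tandaBaca.substring(i, i + 1);   // one punctuation character
        // Assumption: the string literals in the calls below are not legible in the source.
        // Replacing with a space and then collapsing double spaces is assumed.
        doc = replace(doc, cek, " ");
        doc = replace(doc, "  ", " ");
        doc = replace(doc, "  ", " ");
        doc = replace(doc, "  ", " ");
    }
    return doc;
}
public static String tokenisasi(String doc) {
    doc = replace(doc, " ", "\n");   // one token per line
    return doc;
}
public static String caseFolding(String doc) {
    doc = doc.toLowerCase();
    return doc;
}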
List Code 4.2.2 Tokenization, Punctuation Removal, Case Folding
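The three steps can also be expressed with regular expressions instead of a punctuation file and repeated replace calls. The sketch below is an alternative under stated assumptions, not the application's code: the class name NormalizeSketch is hypothetical, and every character matched by \p{Punct} (ASCII punctuation) is treated as punctuation, which may differ from the contents of tanda baca.txt.

```java
import java.util.Locale;

// Hypothetical helper class; not part of the original application.
class NormalizeSketch {

    // Replaces ASCII punctuation with spaces, then collapses runs of whitespace.
    static String filterTandaBaca(String doc) {
        return doc.replaceAll("\\p{Punct}", " ").replaceAll("\\s+", " ").trim();
    }

    // Puts each whitespace-separated token on its own line.
    static String tokenisasi(String doc) {
        return doc.trim().replaceAll("\\s+", "\n");
    }

    // Lowercases the text independently of the default locale.
    static String caseFolding(String doc) {
        return doc.toLowerCase(Locale.ROOT);
    }
}
```

Note that \p{Punct} also matches the hyphen. If hyphenated reduplicated words should reach the stemmer intact, the hyphen would have to be excluded from the pattern.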
4.2.3. Stopword Removal Implementation
In this stage the system removes stopwords, that is, words that do not affect the information retrieval process. The removed words are common words that are not suitable as keywords. The stopWord method below performs this step.
// Requires: import java.util.StringTokenizer;
public static String stopWord(String doc) throws FileNotFoundException, IOException {
    // load the stoplist into an array
    String stoplist = openFile("src/aplikasi", "stoplist.txt");
    StringTokenizer stop = new StringTokenizer(stoplist);
    String[] stopA = new String[stop.countTokens()];
    for (int i = 0; i < stopA.length; i++) {
        stopA[i] = stop.nextToken();
    }
    // load the document tokens into an array
    StringTokenizer token = new StringTokenizer(doc);
    String[] tokenA = new String[token.countTokens()];
    for (int i = 0; i < tokenA.length; i++) {
        tokenA[i] = token.nextToken();
    }
    String kataPenting = "";
    // blank out every token that matches a stopword
    for (int i = 0; i < tokenA.length; i++) {
        for (int j = 0; j < stopA.length; j++) {
            if (tokenA[i].equalsIgnoreCase(stopA[j])) {
                tokenA[i] = "";
            }
        }
    }
    // keep the remaining tokens, one per line
    for (int i = 0; i < tokenA.length; i++) {
        if (!tokenA[i].isEmpty()) {
            kataPenting = kataPenting + tokenA[i] + "\n";
        }
    }
    return kataPenting;
}
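List Code 4.2.3 Stopword
4.2.4. Stemming Implementation
In the stemming step the system reduces each word to its root word according to the stemming algorithm. The Java methods pass each token to a Perl script, which applies the affix rules and checks the result against a dictionary.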
public static String stemDoc(String doc) throws FileNotFoundException, IOException {
    String hasil = "";
    StringTokenizer docToken = new StringTokenizer(doc);
    // stem every word
    while (docToken.hasMoreTokens()) {
        hasil = hasil + stem(docToken.nextToken()) + "\n";
    }
    return hasil;
}
public static String stem(String word) {
    // run the Perl stemmer with the word as its only argument
    String[] cmd = {"C:/Perl64/bin/perl",
        "C:/Users/win7/Documents/NetBeansProjects/Aplikasi/src/aplikasi/stemWord.pl", word};
    Process process;
    String line = "";
    try {
        process = Runtime.getRuntime().exec(cmd);
        BufferedReader output = new BufferedReader(new InputStreamReader(process.getInputStream()));
        line = output.readLine();   // the script prints the stem on its first output line
        output.close();
    } catch (Exception e) {
        System.out.println("Exception: " + e.toString());
    }
    return line;
}
List Code 4.2.4a Stemming
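The Perl script that follows applies affix rules in tiers and accepts a candidate as soon as it is found in the dictionary. The idea can be sketched in Java as shown here. This is a simplified illustration under loud assumptions, not a port of the script: the class name AffixStemSketch, the two-word dictionary, and the reduced rule tiers (a few pairs borrowed from the rule tables in the listing) are all hypothetical, and the control flow is reduced to "prefix only, suffix only, then suffix followed by prefix".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; names, rules and dictionary are assumptions.
class AffixStemSketch {
    // Toy dictionary of root words (the Perl script loads kamus.txt instead).
    static final Set<String> DICT = new HashSet<>(Arrays.asList("tulis", "sapu"));

    // Each tier maps an affix to its substitution; tiers are tried in order.
    static final List<Map<String, String>> PREFIX_TIERS =
            Arrays.asList(rules("di", "", "ny", "s"), rules("n", "t"));
    static final List<Map<String, String>> SUFFIX_TIERS =
            Arrays.asList(rules("an", ""), rules("e", ""));

    // Builds one tier from alternating affix/substitution arguments.
    static Map<String, String> rules(String... pairs) {
        Map<String, String> tier = new LinkedHashMap<>();
        for (int i = 0; i < pairs.length; i += 2) {
            tier.put(pairs[i], pairs[i + 1]);
        }
        return tier;
    }

    // Replaces a matching prefix; returns the first candidate found in DICT, else null.
    static String stripPrefix(String w) {
        for (Map<String, String> tier : PREFIX_TIERS) {
            for (Map.Entry<String, String> rule : tier.entrySet()) {
                if (w.startsWith(rule.getKey())) {
                    String candidate = rule.getValue() + w.substring(rule.getKey().length());
                    if (DICT.contains(candidate)) {
                        return candidate;
                    }
                }
            }
        }
        return null;
    }

    // Replaces a matching suffix and returns every candidate, in tier order.
    static List<String> suffixCandidates(String w) {
        List<String> out = new ArrayList<>();
        for (Map<String, String> tier : SUFFIX_TIERS) {
            for (Map.Entry<String, String> rule : tier.entrySet()) {
                if (w.endsWith(rule.getKey())) {
                    out.add(w.substring(0, w.length() - rule.getKey().length()) + rule.getValue());
                }
            }
        }
        return out;
    }

    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() < 3 || DICT.contains(w)) {
            return w;                       // too short, or already a root word
        }
        String byPrefix = stripPrefix(w);
        if (byPrefix != null) {
            return byPrefix;
        }
        List<String> candidates = suffixCandidates(w);
        for (String c : candidates) {       // suffix only
            if (DICT.contains(c)) {
                return c;
            }
        }
        for (String c : candidates) {       // suffix first, then prefix
            String both = stripPrefix(c);
            if (both != null) {
                return both;
            }
        }
        return w;                           // no rule produced a dictionary word
    }
}
```

With this toy dictionary, stem("nyapu") returns "sapu", stem("nulis") returns "tulis", and stem("tulisan") returns "tulis". A word that matches no rule is returned unchanged.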
# 1. make a rule
# 2. open text file
# 3. get one word
# 4. stem
# 5. compare with the real root word
# 6. count the true word stem
local (%suffix_1, %suffix_2, %suffix_3, %suffix_4, %suffix_5);
local (%prefix_1, %prefix_2, %prefix_3, %prefix_4, %prefix_5);
local (%prefix_6, %prefix_7, %prefix_8, %prefix_9, %prefix_10);
local (%infix_1, %infix_2);
local %dict;

my $word = $ARGV[0];
# Assumption: the batch evaluation loop (steps 2 to 6) is commented out here,
# because the Java caller passes one word and reads one line of output.
# The open modes ("<", ">") are also assumed; they are not legible in the source.
# my $fileOp;
# $fileOp = "D:\\test.txt";
# open(FILE, "<", $fileOp) or die "Cant open";
# my $fileOut = "D:\\hasilStem2.txt";
# open(FILEOUT, ">", $fileOut) or die "";
my $fileTest = "D:\\testhasil2.txt";    # trace file written by the subs below
open(FILETESTH, ">", $fileTest) or die "";
initial();
# $right = 0;
# while ($line = <FILE>) {
#     @splLine = split(/\s+/, $line);
#     print $splLine[0]." ".$splLine[1]."\n";
#     $word = lc $splLine[0];
#     my $stemWord = stem($word);
my $stemWord = stem(lc $word);
print $stemWord;
#     print $stemWord."\n"; # ." ".$splLine[1]."\n";
#     if ($stemWord eq lc $splLine[1]) {
#         print FILEOUT $stemWord." ".$word."\n";
#         $right++;
#     }
#     else {
#         print FILEOUT "1 ".$stemWord." 2 ".$splLine[1]." 3 ".$word."\n";
#     }
# }
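# print $right;
sub initial {
    # dictionary
    # hashes of substitution pairs for the prefix, suffix, and infix lists
    $fileOp = "C:\\Users\\win7\\Documents\\NetBeansProjects\\Aplikasi\\src\\aplikasi\\kamus.txt";
    open(FILEDIC, "<", $fileOp) or die "Cant open";
    while (<FILEDIC>) {
        chomp;
        $dict{$_} = $_;
    }
    # rule tiers and their substitutions
    # The listing assigns suffix_1..5 and prefix_1..5 twice. Both assignments are
    # kept in source order; in Perl the later assignment replaces the earlier one.
    %suffix_1 = ("ekaken"=>"i", "okaken"=>"u", "ekake"=>"i", "okake"=>"u", "oni"=>"u", "eni"=>"i",
                 "wa"=>"", "ya"=>"", "ning"=>"", "nipun"=>"", "okna"=>"u", "ekna"=>"i", "onana"=>"u", "enana"=>"i",
                 "onen"=>"u", "enen"=>"i", "enan"=>"i", "on"=>"u", "onan"=>"u", "ku"=>"", "mu"=>"");
    %suffix_2 = ("kake"=>"", "kaken"=>"", "ni"=>"", "ing"=>"", "nana"=>"", "nane"=>"", "nan"=>"",
                 "nen"=>"", "ipun"=>"", "kna"=>"");
    %suffix_3 = ("kaken"=>"n", "kake"=>"n", "kna"=>"n", "ana"=>"", "an"=>"", "en"=>"");
    %suffix_4 = ("ake"=>"", "aken"=>"", "en"=>"i", "na"=>"", "ne"=>"");
    %suffix_5 = ("ke"=>"", "ken"=>"", "n"=>"", "a"=>"", "i"=>"");
    %suffix_6 = ("e"=>"");
    %suffix_1 = ("ekake"=>"i", "okake"=>"u", "oni"=>"u", "eni"=>"i", "wa"=>"",
                 "ya"=>"", "ning"=>"", "okna"=>"u", "onana"=>"u", "onane"=>"u", "enan"=>"i", "ean"=>"i",
                 "on"=>"u", "onan"=>"u", "onen"=>"u", "ku"=>"", "mu"=>"", "nipun"=>"");
    %suffix_2 = ("kake"=>"", "ni"=>"", "ing"=>"", "ana"=>"", "nan"=>"", "nen"=>"", "ipun"=>"", "nane"=>"", "nana"=>"");
    %suffix_3 = ("kake"=>"n", "i"=>"", "en"=>"i", "an"=>"", "ane"=>"");
    %suffix_4 = ("ake"=>"", "en"=>"", "na"=>"", "ne"=>"");
    %suffix_5 = ("e"=>"", "n"=>"", "a"=>"");
    %prefix_1 = ("m"=>"", "nge"=>"a", "ny"=>"s", "di"=>"", "dak"=>"", "tak"=>"", "kok"=>"", "tok"=>"", "ka"=>"",
                 "ke"=>"", "ku"=>"", "ang"=>"", "sa"=>"", "se"=>"", "pa"=>"", "peng"=>"", "pang"=>"",
                 "ing"=>"", "u"=>"");
    %prefix_2 = ("m"=>"p", "ng"=>"", "ny"=>"c", "ke"=>"i", "pe"=>"", "an"=>"", "pen"=>"t", "pan"=>"t");
    %prefix_3 = ("m"=>"w", "ng"=>"k", "k"=>"", "pe"=>"", "pa"=>"");
    %prefix_4 = ("n"=>"", "a"=>"", "p"=>"");
    %prefix_5 = ("n"=>"t");
    %prefix_1 = ("dipun"=>"", "peng"=>"", "peny"=>"", "pem"=>"", "pam"=>"", "pany"=>"", "pra"=>"", "kuma"=>"",
                 "kapi"=>"", "bok"=>"", "mbok"=>"", "dak"=>"", "tak"=>"", "kok"=>"", "tok"=>"", "ing"=>"",
                 "ang"=>"", "any"=>"", "am"=>"", "sak"=>"", "se"=>"", "mang"=>"", "meng"=>"", "nge"=>"",
                 "nya"=>"", "pi"=>"", "ge"=>"", "ke"=>"", "u"=>"", "po"=>"u", "ke"=>"u");
    %prefix_2 = ("mer"=>"", "mi"=>"", "sa"=>"", "ku"=>"", "an"=>"", "ka"=>"", "ny"=>"s", "ng"=>"k", "di"=>"",
                 "peng"=>"k", "pang"=>"k", "pam"=>"p", "ke"=>"i", "mang"=>"k", "meng"=>"k");
    %prefix_3 = ("a"=>"", "k"=>"", "pam"=>"w", "pan"=>"t", "pen"=>"t", "mang"=>"w", "meng"=>"w",
                 "ny"=>"c", "ng"=>"");
    %prefix_4 = ("n"=>"t", "pan"=>"s", "pen"=>"s", "man"=>"s", "men"=>"s");
    %prefix_5 = ("pan"=>"", "pen"=>"", "man"=>"t", "men"=>"t", "n"=>"");
    %prefix_6 = ("pa"=>"", "pe"=>"", "man"=>"", "men"=>"");
    %prefix_7 = ("p"=>"", "ma"=>"", "me"=>"");
    %prefix_8 = ("m"=>"w");
    %prefix_9 = ("m"=>"p");
    %prefix_10 = ("m"=>"");
    %infix_1 = ("gum"=>"b", "gem"=>"b", "kum"=>"p");
    %infix_2 = ("kum"=>"w");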
}

# Remove a prefix: try each tier in turn and return the first result found in the dictionary.
# $1 is the matched prefix and $' is the rest of the word after it.
sub hilangPref {
    my $word = $_[0];
    my $w = $word;
    if ($w =~ /^(dipun|peng|peny|pem|pam|pany|pra|kuma|kapi|bok|mbok|dak|tak|kok|tok|ing|ang|any|am|sak|se|mang|meng|nge|nya|pi|ge|ke|u|po|ke)/) {
        $stem = $prefix_1{$1}.$';
        print FILETESTH $stem." p1 ".$w."\n";
        if (exists $dict{$stem}) { return $stem; }
    }
    if ($w =~ /^(mer|mi|sa|ku|an|ka|ny|ng|di|peng|pang|pam|ke|mang|meng)/) {
        $stem = $prefix_2{$1}.$';
        print FILETESTH $stem." p2 ".$w."\n";
        if (exists $dict{$stem}) { return $stem; }
    }
    if ($w =~ /^(a|k|pam|pan|pen|mang|meng|ny|ng)/) {
        $stem = $prefix_3{$1}.$';
        print FILETESTH $stem." p3 ".$w."\n";
        if (exists $dict{$stem}) { return $stem; }
    }
    if ($w =~ /^(n|pan|pen|man|men)/) {
        $stem = $prefix_4{$1}.$';
        print FILETESTH $stem." p4 ".$w."\n";
        if (exists $dict{$stem}) { return $stem; }
    }
    if ($w =~ /^(pan|pen|man|men|n)/) {
        $stem = $prefix_5{$1}.$';
        print FILETESTH $stem." p5 ".$w."\n";
        if (exists $dict{$stem}) { return $stem; }
    }
    if ($w =~ /^(pa|pe|man|men)/) {
        $stem = $prefix_6{$1}.$';
        print FILETESTH $stem." p6 ".$w."\n";
        if (exists $dict{$stem}) { return $stem; }
    }
    if ($w =~ /^(p|ma|me)/) {
        $stem = $prefix_7{$1}.$';
        print FILETESTH $stem." p7 ".$w."\n";
        if (exists $dict{$stem}) { return $stem; }
    }
    if ($w =~ /^(m)/) {
        # tiers 8 to 10 all handle the prefix "m" with different substitutions
        $stem = $prefix_8{$1}.$';
        print FILETESTH $stem." p8 ".$w."\n";
        if (exists $dict{$stem}) { return $stem; }
        $stem = $prefix_9{$1}.$';
        print FILETESTH $stem." p9 ".$w."\n";
        if (exists $dict{$stem}) { return $stem; }
        $stem = $prefix_10{$1}.$';
        print FILETESTH $stem." p10 ".$w."\n";
        if (exists $dict{$stem}) { return $stem; }
    }
    return $w;
}

# Remove a suffix: only the first matching tier is applied.
# $1 is the matched suffix and $` is the part of the word before it.
sub hilangSuf {
    my $word = $_[0];
    my $w = $word;
    if ($w =~ /(ekaken|okaken|ekake|okake|oni|eni|wa|ya|ning|nipun|okna|ekna|onana|enana|onen|enen|enan|on|onan|ku|mu)$/) {
        $stem = $`.$suffix_1{$1};
        print FILETESTH $stem." 1 ".$w."\n";
    }
    # remove suffix, tier 2
    elsif ($w =~ /(kake|kaken|ni|ing|nana|nane|nan|nen|ipun|kna)$/) {
        $stem = $`.$suffix_2{$1};
        print FILETESTH $stem." 2 ".$w."\n";
    }
    # remove suffix, tier 3
    elsif ($w =~ /(kaken|kake|kna|ana|an|en)$/) {
        $stem = $`.$suffix_3{$1};
        print FILETESTH $stem." 3 ".$w."\n";
    }
    # remove suffix, tier 4
    elsif ($w =~ /(ake|aken|en|na|ne)$/) {
        $stem = $`.$suffix_4{$1};
        print FILETESTH $stem." 4 ".$w."\n";
    }
    # remove suffix, tier 5
    elsif ($w =~ /(ke|ken|n|a|i)$/) {
        $stem = $`.$suffix_5{$1};
        print FILETESTH $stem." 5 ".$w."\n";
    }
    # remove suffix, tier 6
    elsif ($w =~ /(e)$/) {
        $stem = $`.$suffix_6{$1};
        print FILETESTH $stem." 5 ".$w."\n";
    }
    if (exists $dict{$stem}) {
        return $stem;
    }
    else {
        # remove prefix
        my $stemPref = hilangPref($stem);
        if (exists $dict{$stemPref}) { return $stemPref; }
    }
}

sub stem {
    my $word = $_[0];
    # if the word is shorter than 3 characters, return it unchanged
    if (length($word) < 3) { return $word; }
    # print $word."\n";
    # loop:
    #   remove tier-1 suffix, check dictionary, break if found
    #   remove tier-1 prefix, check dictionary, break if found
    #   restore tier-1 suffix, check dictionary, break if found
    my $w = $word;
    if (exists $dict{$w}) { return $w; }
    # remove infix
    if (index($w,"in") == 1 || index($w,"um") == 1 || index($w,"em") == 1 || index($w,"el") == 1 || index($w,"er") == 1) {
        $_ = $w;
        s/in|um|em|el|er//;
        print FILETESTH $_." i1 ".$w."\n";
        if (exists $dict{$_}) { return $_; }
        elsif ($w =~ /(gum|kum|gem)/) {
            $stem = $infix_1{$1}.$';
            print FILETESTH $stem." i2 ".$w."\n";
            if (exists $dict{$stem}) { return $stem; }
        }
        else {
            my $stemPref = hilangPref($_);
            if (exists $dict{$stemPref}) { return $stemPref; }
            # remove suffix
            my $hs = hilangSuf($_);
            if (exists $dict{$hs}) { return $hs; }
        }
        if ($_ =~ /(an|ne)$/) {
            $stem = $`;
            if (exists $dict{$stem}) { return $stem; }
        }
    }
    # reduplicated word
    # Assumption: the variable in this block is not legible in the source;
    # $' (the text after the hyphen) is assumed throughout.
    if ($w =~ m/[-]/) {
        $_ = $w;
        split(/-/);
        if (exists $dict{$'}) { return $'; }
        else {
            # remove suffix
            if (exists $dict{hilangSuf($')}) { return $'; }
            $w = $';
        }
    }
    # remove prefix only
    my $stemPref = hilangPref($w);
    if (exists $dict{$stemPref}) { return $stemPref; }
    # remove suffix
    my $hs = hilangSuf($w);
    if (exists $dict{$hs}) { return $hs; }
    # remove reduplication written without a hyphen
    if (index($w,"e") == 1 || index($w, substr($w,0,1), 2) == 2) {
        $dua = substr($w,0,2);
        $_ = $w;
        s/$dua//;
        if (exists $dict{$_}) { return $_; }
        else { $w = $_; }
    }
    return $w;
}
List Code 4.2.4b Stemming
4.2.5. Saving the Preprocessed Document
public static void save(String doc, String filePath, String fileName) throws IOException {
    FileWriter fw = new FileWriter(filePath + "/" + fileName);
    fw.write(doc);   // write the preprocessed text
    fw.flush();
    fw.close();
}
List Code 4.2.5 Preprocessing
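Taken together, the methods of this section form a pipeline: case folding, punctuation removal, tokenization, stopword removal, and saving. The sketch below chains simplified versions of these steps. It is not the application's code: the class name PipelineSketch, the in-memory stopword set (instead of stoplist.txt), the use of \p{Punct}, and UTF-8 output are assumptions, and stemming is left out because it depends on the external Perl script.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class; not part of the original application.
class PipelineSketch {

    // Lowercases, strips ASCII punctuation, tokenizes, and drops stopwords.
    // Returns the remaining tokens, each followed by "\n".
    // The stopword set is expected to contain lowercase words.
    static String preprocess(String doc, Set<String> stopwords) {
        String cleaned = doc.toLowerCase(Locale.ROOT).replaceAll("\\p{Punct}", " ").trim();
        StringBuilder out = new StringBuilder();
        for (String token : cleaned.split("\\s+")) {
            // a set lookup replaces the nested loop over the stoplist
            if (!token.isEmpty() && !stopwords.contains(token)) {
                out.append(token).append('\n');
            }
        }
        return out.toString();
    }

    // Writes the preprocessed text to filePath/fileName as UTF-8.
    static void save(String doc, String filePath, String fileName) throws IOException {
        Path target = Paths.get(filePath, fileName);
        Files.write(target, doc.getBytes(StandardCharsets.UTF_8));
    }
}
```

For example, with a stopword set that contains only the hypothetical entry "lan", preprocess("Aku lan kowe, lunga!", stopwords) returns "aku\nkowe\nlunga\n".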
4.3. Document Classification Implementation