Combining a Rule-based Classifier with Ensemble of Feature Sets and Machine Learning Techniques for Sentiment Analysis on Microblog

  

Combining a Rule-based Classifier with Ensemble of

Feature Sets and Machine Learning Techniques for

Sentiment Analysis on Microblog

  Experimental results demonstrate the effectiveness of our method over the baseline and known related work.

  The rest of the paper is structured as follows: Section II describes the state-of-the-art of sentiment analysis task while our proposed framework is described in detail in Section III. Next, in Section IV, the experimental setup and results are discussed to show the effectiveness of our proposed method. Finally, some concluded remarks and future directions of our work described in Section V.

  The main contributions of this paper include: (1) We propose a rule-based classifier based on the occurrences of emoticons and sentiment-bearing words and combine it with a majority voting count based ensemble of supervised classifiers. (2) We extract a set of features, experiment with, and evaluate them for sentiment analysis. (3) Construct sentiment, emoticon, and intensifier lexicons by aggregating several lexical re- sources. (4) We investigate the impact of using the ensemble of sentiment lexicons to train the supervised classifier instead of the large training dataset. (5) We conduct a set of experiments on sentiment140 dataset to compare our proposed method with the baseline and known related work. Experimental results demonstrate that our proposed approach yields significant improvements in sentiment classification task.

  twitter to estimate the extent of product acceptance and to determine the strategies for improving product quality. Sports fans are always fascinated to know the reaction of other people about their favorite players and teams. Moreover, a strong correlation revealed between the twitter posts and the outcomes of political elections. Both voters and political parties are increasingly seeking ways to get an overview of the support and opposition candidates before the elections [1]. Therefore, a comprehensive sentiment analysis on twitter will provide an effective solution to address these scenarios. In this paper, we have proposed a method for sentiment analysis on twitter. To classify the tweets sentiment either positive or negative, we combine a rule-based classifier with the ensemble of supervised classifiers. Our rule-based classifier is based on the occurrences of emoticons and sentiment-bearing words. To train the supervised classifiers, we extract the ensemble of feature sets grouped into five different categories, including twitter specific features, textual features, parts-of-speech (POS) features, lexicon based features, and bag-of-words (BoW) feature. Moreover, a supervised feature selection method is applied to select the best feature combination and estimate their contribution to sentiment classification. Experimental results with Stanford sentiment140 dataset showed that our method improves the sentiment classification performance.

  https://twitter.com

  is now the most popular, where millions of users spread millions of tweets on a daily basis. As these tons of tweets reflect people’s opinions and attitudes, sentiment analysis in twitter has made a hit with a lot of complaisance. Monitoring and analyzing tweets sentiment from twitter provide enormous opportunities for the public and private sectors as well as individual users. For example, people are eager to know the feelings of others about Apple’s new product “iPhone 7” and it will provide great convenience for them if the opinions are collected from massive tweets. As the reputation of a certain product is highly affected by the people’s opinions posted on twitter, companies are now monitoring and analyzing sentiments of public opinions from 1

  1

  I. INTRODUCTION Nowadays, microblog websites are not only the places of maintaining social relationships but also act as a valu- able information source. Every day lots of users turn into microblog sites for expressing their feelings, experiences, views, and opinions about several issues or entities. Among several microblog sites, Twitter

  Keywords: Microblog, sentiment analysis, sentiment lexi- con, feature extraction, rule-based classifier, machine learning.

  2 ) and information gain (IG) is applied to select the best feature combination. We conducted our experiments on Stanford sentiment140 dataset.

  Umme Aymun Siddiqua ∗

  Abstract—Microblog, especially Twitter, have become an in- tegral part of our daily life, where millions of users sharing their thoughts daily because of its short length characteristics and simple manner of expression. Monitoring and analyzing sentiments from such massive twitter posts provide enormous opportunities for companies and other organizations to estimate the user acceptance of their products and services. But the ever-growing unstructured and informal user-generated posts in twitter demands sentiment analysis tools that can automatically infer sentiments from twitter posts. In this paper, we propose an approach for sentiment analysis on twitter, where we combine a rule-based classifier with a majority voting based ensemble of su- pervised classifiers. We introduce a set of rules for the rule-based classifier based on the occurrences of emoticons and sentiment- bearing words. To train the supervised classifiers, we extract a set of features grouped into twitter specific features, textual features, parts-of-speech (POS) features, lexicon based features, and bag-of-words (BoW) feature. A supervised feature selection method based on the chi-square statistics (χ

  Toyohashi University of Technology Toyohashi, Aichi, Japan aymun@kdml.org, tanveer.ahsan@gmail.com, and nowshed@kde.cs.tut.ac.jp

  †

  International Islamic University Chittagong Chittagong-4314, Bangladesh

  ∗

  Department of Computer Science & Engineering

  , and Abu Nowshed Chy †

  

, Tanveer Ahsan

  19 th International Conference on Computer and Information Technology, December 18-20, 2016, North South University, Dhaka, Bangladesh

  II. RELATED WORK Sentiment analysis in a microblogging environment, such as twitter, is one of the state of the art research task in informa- tion retrieval domain, where the major goal is to classify the polarity of a tweet sentiment either positive, negative or neutral based on its contents. Sentiment analysis tasks can be done at several levels of granularity, namely, word level, phrase or sen- tence level, document level, and feature level [2]. Prior works on twitter sentiment analysis mostly based on the noisy labels or distant supervision, for example, considering emoticons to decide the tweets sentiment, to train the supervised classifiers, and so on [3][4][5]. Nowadays, several researchers adapted feature engineering with the supervised learning approach like Support Vector Machine (SVM), Naive-Bayes (NB), Maxi- mum Entropy (Max Ent.), etc. to enhance the classification efficacy [6][7]. Whereas some researchers introduced a number of unsupervised approaches by utilizing lexicons [8][9] as well as the combination of lexicons and emotional signals [10], to overcome the limitation of annotating sentiment labels in large training corpora. Moreover, contents in twitter are informal, fast-evolving, and unstructured too. Along with this direction, current studies has shown that outcomes of twitter sentiment analysis can be applied in a wide range of social applications, including natural disaster management [11], trend identification of political elections [12], etc.

  3

  http://www.lemurproject.org/stopwords/stoplist.dft 3 http://sourceforge.net/p/lemur/wiki/KrovetzStemmer/ 4 http://provalisresearch.com/Download/WSD.zip

  , SentiWordnet [19], and NRC emotion lexicon [20]. We extract the positive and negative sentiment words from these lexicons to build our own positive and negative sentiment lexicons, which are then used in our rule-based classifier. We also utilize these lexicons to extract lexicon based features for training the supervised classifiers. 2

  4

  A sentiment lexicon is a collection of sentiment words such as “good”, “excellent”, “poor”, and “bad”, that are used to indi- cate positive or negative sentiments. For our sentiment analysis task, we construct a strong sentiment lexicon by utilizing seven publicly available sentiment lexicons, including the Bing Liu lexicon [15], SentiStrength lexicon [16], EffectWordNet [17], subjectivity clues from [18], WordStat Sentiment Dictionary

  B. Lexical Resources for Sentiment Analysis

  toriously varied in contents and compositions, it often con- tains non-standard words and domain-specific entities. For instance, people sometimes use “lappy” instead of “laptop”, “Fineeeeeee” instead of “Fine”, “Jelous” instead of “Jealous”, and so on. To normalize such kind of non-standard words into their canonical forms, we utilize two lexical normalization dic- tionaries, collected from [13] and [14]. The first one contains 41181 words and the second one contains 3802 words. For URL normalization, we replace all the links embedded in the tweet with just the word “URL”.

  Lexical Normalization: As the words in tweets are no-

  to reduce the word variants to a common form.

  , because it contains some words that have sentiment information, such as like, useful, thank, etc. We manually inspect the list and discard these words from the list. After that, the remaining words are then stemmed by using Krovetz stemmer

  III. PROPOSED SENTIMENT ANALYSIS SYSTEM In this section, we describe the details of our proposed sentiment analysis system. The goal of our proposed system is to assign positive or negative sentiment label to a tweet document based on its contents. The overview of our proposed framework depicted in Fig. 1.

  2

  Data preprocessing stage is initiated with tokenization. As tweets are informal user generated contents, people use lots of emoticons and twitter specific syntax (RT, @, #hashtag, etc.). But meaningful English words do not contain these characters. After indexing necessary information as metadata, we remove these characters from tweet documents as well as removing the single-letter word. Moreover, in sentiment classification task, stop-words play a negative role because they do not carry any sentiment information and may actually damage the performance of the classifier. To remove the stop-words, we utilize the refined form of Indri’s standard stop-list

  A. Data Preprocessing

  ) statistics and information gain (IG) is applied to select the best feature combination. Next, our proposed rule-based classifier is applied to classify the tweets sentiment as positive, negative or unknown. For the tweets that are labeled as unknown by the rule-based classifier, we consider the majority voting count based predictions from several configurations of the ensemble of classifiers, including probabilistic and multinomial naive- bayes classifier, support vector machine (SVM) classifier, and sequential minimal optimization (SMO) classifier. Results of both rule-based classifier and the ensemble of classifiers are then combined and the set of labeled tweets return to the user.

  2

  At first, our system fetches the set of tweets from the user and indexed them for further processing. In the prepro- cessing stage, we perform the tokenization, lexical normal- ization, URL normalization, stemming, stop-word removal, single-letter word removal, and special character removal. Next, in the feature extraction stage, we extract several fea- tures which are broadly grouped into different categories, including twitter specific features, textual features, parts-of- speech (POS) features, lexicon based features, and bag-of- words (BoW) feature. For extracting lexicon based features, we construct strong sentiment lexicons by combining several publicly available sentiment lexicons. We also construct an emoticon lexicon, where we combine several publicly available emoticon lists. Once our proposed features are extracted, a supervised feature selection method based on chi-square (χ

  Tweets Fig. 1: Proposed sentiment analysis system.

  Prediction Results Training Data Indexing & Preprocessing Lexical Normalization + URL Normalization + Stemming + Stop Word Removal + Special Character Removal Ensemble of Classifiers Naïve Bayes (Probabilistic & Multinomial) + Multiclass SVM + Multiclass SMO Supervised Feature Selection Chi-Square ( χ 2 ) Statistics + Information Gain (IG) Lexical Resources Given Tweets Labeled

  Feature Extraction Twitter Specific Features + Parts-of-Speech Features Lexicon Based Features + Textual Features + Bag-of-Words Rule-Based Classifier Based Prediction Majority Voting Count Based Prediction Combine

  19 th International Conference on Computer and Information Technology, December 18-20, 2016, North South University, Dhaka, Bangladesh Like sentiment lexicons, we also construct an emoticon lexicon, as people increasingly use emoticons on twitter in order to express their feelings. For our emoticon lexicon, we combine four publicly available emoticon lists, including the SentiStrength emoticon lexicon [16], emoticon list from Wikipedia

  5

  and N

  1

  : if N P E > 0 and N N E = 0 then, Class = P ositive R

  2

  : if N N E > 0 and N P E = 0 then, Class = N egative

  R

  3 : if N P W

  − N

  N W

  > P then, Class = P ositive

  R

  4

  : if N N W − N P W > P then, Class = N egative where N

  P E

  N E

  rules. Our unsupervised rule-based classifier casts the senti- ment analysis problem as a multi-class classification problem and labeled each tweet as positive, negative or unknown. To achieve this, we define the following set of rules based on the occurrences of emoticons and sentiment-bearing words:

  denote the number of positive and negative emoticons, N

  P W

  and N

  N W

  denote the number of positive and negative words, and P is a positive integer. of these rules, are labeled with the corresponding class and the remaining rules are discarded. Whereas, tweets that do not satisfy any rules, are labeled with unknown class. We utilize the original tweets in R

  1

  and R

  2

  rule, while preprocessed tweets (as described in section III-A) in R

  3

  and R

  4 rule.

  F. Ensemble of Supervised Learning Approach

  To improve the classification performance, we combine the classification results from different supervised classification models. To achieve this; we make use of probabilistic Naive- Bayes (NB) classifier, multinomial Naive-Bayes classifier, and

  R

  F 23 Number of positive words available in the tweet. F 24 Number of negative words available in the tweet. F 25 Number of positive words with adjective POS. F 26 Number of negative words with adjective POS. F 27 Number of positive words with verb POS. F 28 Number of negative words with verb POS. F 29 Number of positive words with adverb POS. F 30 Number of negative words with adverb POS. F 31 Percentage of positive words with adjective POS. F 32 Percentage of negative words with adjective POS. F 33 Percentage of positive words with verb POS. F 34 Percentage of negative words with verb POS. F 35 Percentage of positive words with adverb POS. F 36 Percentage of negative words with adverb POS. F 37 Number of intensifier words available in the tweet.

  , Sharpened-text based emoticons

  is A j

  6

  , and DataGenet- ics

  7

  popular emoticons list on twitter. Moreover, we construct an intensifier lexicon based on the SentiStrength [16] inten- sifier list and several online dictionaries.

  C. Feature Extraction

  For our sentiment analysis system, we extract a set of features broadly grouped into different categories, including twitter specific features, textual features, parts-of-speech (POS) features, lexicon based features, and bag-of-words (BoW) feature as described in Table I. We exploit some special characteristics of tweets like #hashtag, retweet etc. to extract the twitter specific features. Textual features are extracted using only explicitly contained information in the text, like word count, occurrences of quotes, exclamation, emoticon, and so on. We take into account the presence of certain POS tags and utilize our constructed lexicons (as described in section III-B) to extract the POS features and lexicon based features. To identify the POS of each tokenized word, we use the CMU ARK POS tagger [21]. For BoW feature, we utilize the unigram feature extractor and TF-IDF as feature weight.

  D. Supervised Feature Selection

  By applying supervised feature selection methods, we can identify and remove unnecessary, irrelevant, and redundant features that do not contribute to the accuracy of the predictive model or decrease the accuracy of the model. In general, a relatively simple and quite useful method for doing feature se- lection is to independently rank feature attributes according to their predictive abilities for the category under consideration. In this approach, we can simply select the top-ranking features. The predictive ability of a feature attribute can be measured by a certain quantity that indicates how correlated a feature is with the class label. In our tweet sentiment analysis system, we apply the chi-square (χ

  2

  ) statistics [22] and information gain (IG) [22] techniques to select the best feature combination and estimate the importance of each selected feature.

  E. Rule-based Classifier

  In a rule-based classifier, we usually construct a set of rules that determine a certain combination of patterns, which are most likely to be related to the different classes. Each antecedent part corresponds to a pattern and the consequent part corresponds to a class label. We can define a rule as follows:

  R j : if x

  1

  1

  Le xicon Based Features

  and . ...... x n is A jn then Class = C j

  , j = 1, . ....., N where R j is a rule label, j is a rule index, A j

  1

  is an antecedent set, C j is a consequent class, and N is the total number of 5

  https://en.wikipedia.org/wiki/List of emoticons 6 http://www.sharpened.net/emoticons/ 7 http://www.datagenetics.com/blog/october52012/index.html

  Table I: Features used in our sentiment analysis system.

  Type

  ID Feature Description T witter

  Specific F 1 Whether the tweet contains a #hashtag or not.

  F 2 Whether the tweet is a retweet or not. F 3 Whether the tweet contains a user name or not. F 4 Whether the tweet contains a URL or not.

  T extual Features

  F 5 TweetLength: Number of words in the tweet. F 6 AvgWordLength: Average character length of words. F 7 Number of question marks available in the tweet. F 8 Number of exclamation marks available in the tweet. F 9 Number of quotes available in the tweet. F 10 Number of words start with the uppercase letter in tweet. F 11 Whether the tweet contains a positive emoticon or not. F 12 Whether the tweet contains a negative emoticon or not.

  P arts-of-Speech (POS) Features

  F 13 Number of noun POS available in the tweet. F 14 Number of adjective POS available in the tweet. F 15 Number of verb POS available in the tweet. F 16 Number of adverb POS available in the tweet. F 17 Number of interjection POS available in the tweet. F 18 Percentage of noun POS in the tweet. F 19 Percentage of adjective POS in the tweet. F 20 Percentage of verb POS in the tweet. F 21 Percentage of adverb POS in the tweet. F 22 Percentage of interjection POS in the tweet.

  19 th International Conference on Computer and Information Technology, December 18-20, 2016, North South University, Dhaka, Bangladesh sequential minimal optimization (SMO) for Support Vector Machine (SVM) implemented in WEKA [23]. We also utilize the multiclass SVM implementation from [24]. In our system, we train these classification models with several configurations and use an ensemble approach to incorporate their prediction results.

  G. Combining the Classifiers

  6

  C. Parameter Settings

  In order to determine the optimal value of parameter, P used in the R 3 and R4 rule in our rule-based classifier, we observed the sensitivity (in terms of accuracy) of our Run

  4 method (as described in section IV-D) for different values of P . The result is illustrated in Fig. 3. Based on the observation, the optimal value of parameter P is set to as

  3.

  78

  80

  82

  84

  86

  1

  2

  3

  4

  5

  7

  ) statistics and information gain (IG), we applied the publicly available package WEKA [23]. In both cases, we got the similar kind of result. The result of our supervised feature selection process indicates that feature F 2 (Retweet) and feature F 9 (Quotes Count) are irrelevant. To illustrate the importance of each selected feature, we applied the WEKA’s Ranker algorithm to rank the features according to their chi-square (χ

  1 score, and accuracy.

  https://www.cs.cornell.edu/people/tj/svm light/svm multiclass.html

  and Run4, respectively. At Run5, we used the training dataset of sentiment140 dataset (containing 1.6 million tweets) with BoW feature to train the multiclass SVM classifier. At Run6, we combined our proposed rule-based classifier with the setup 9

  Run2 . The results of these experiments are showed in Run3

  rule-based classifier is combined with the setup of Run1 and

  Run2 . Next, to enhance the classification performance, our

  The summarized results of our experiments are presented in Table III. At first, we showed the classification performance processing with baseline method resulted in Run1. After that, several sentiment lexicons were utilized to train the Naive- Bayes classifier with BoW feature, which has resulted in

  As a baseline system, we utilized the training set of 1.6 million tweets to train the probabilistic Naive-Bayes classifier with bag-of-words (BoW) feature for classifying the sentiment of the test set. To evaluate the performance of our sentiment analysis system, we applied four standard evaluation measures, including recall, precision, F

  8

  D. Results with Sentiment Classification

  9 API.

  SVM-Multiclass

  Fig. 3: Accuracy estimation for different values of, P . For the training of each supervised classification model, we used the default parameters setting given by WEKA [23] and

  10 P Acc ura cy (%)

  9

  2 ) statistics score as depicted in Figure 2.

  2

  After developing our proposed rule-based classifier and training several supervised classifiers, we combine them to classify the tweets sentiment. At first, our rule-based classifier is applied to classify the tweets sentiment as positive, negative or unknown. But, our goal is to classify the tweets sentiment only positive or negative class. That is why; for the tweets that are labeled as unknown by the rule-based classifier, we consider the majority voting count based predictions from several classifiers as the final labels.

  26

  The basic idea behind the majority voting based classifi- cation scheme is that classification of an unlabeled sample is estimated according to the class that gets the highest number of votes [25]. This approach has frequently been used as a combining method to overcome the limitation of a single classifier. Mathematically, it can be written as: class

  (X ) = argmax

  c i ∈ dom (y) X k

  g (y k (x), c i ) ! where y k (x) is the classification of the k’th classifier and g (y, c) is an indicator function defined as: g

  (y, c) = 1, if y = c 0, if y 6 = c

  IV. EXPERIMENTS AND EVALUATION

  A. Dataset Collection

  For evaluating our sentiment analysis system, we made use of Stanford twitter sentiment140

  8

  dataset [3]. The training set consists of 1.6 million tweets with the equal number of positive and negative tweets, while the test set consists of 182 positive and 177 negative tweets. The test set was col- lected with specific queries chosen from different categories, including person, movies, events, etc. and annotated manually. For each category, the total number of corresponding tweets, positive tweets, and negative tweets were presented in Table II. Each training and test tweet are composed of the tweet id, tweet text, query, screen name, and tweet polarity.

  Category #Tweets #Positive #Negative Company 119

  33

  86 Misc.

  67

  41 Person

  For supervised feature selection by using chi-square (χ

  3 Location

  B. Results with Supervised Feature Selection

  8 Total 359 182 177 8 http://help.sentiment140.com/for-students

  8

  14 Events

  4

  18

  16

  65

  19

  16 Movies

  47

  63

  17 Product

  48

  19 th International Conference on Computer and Information Technology, December 18-20, 2016, North South University, Dhaka, Bangladesh th

  19 International Conference on Computer and Information Technology, December 18-20, 2016, North South University, Dhaka, Bangladesh

  80000 e

  60000 Scor ²) (χ

  40000 re

  20000

  • Squa Chi

  Feature ID Fig. 2: Features importance.

  100

  of Run5. Next, we trained the WEKA’s [23] sequential minimal optimization (SMO) and multinomial Naive-Bayes classifier

  80

  with our selected 35 features on sentiment140 dataset and combined our proposed rule-based classifier with these two

  60 Baseline

  setups. The results of these experiments are presented in Run7

  racy (%) Run9

  40

  and Run8, respectively. Finally, at Run9, at first our rule-based classifier is applied to classify the tweet sentiment as positive,

  Accu

  20

  negative, or unknown. For the tweets that are labeled as unknown by the rule-based classifier, we consider the majority voting count based predictions from the ensemble of classifiers setup, including Run3, Run4, Run6, Run7, and Run8.

  Table III: Classification results (in %) on sentiment140 dataset.

  Run ID Recall Precision F Score Accuracy

  Baseline

  63.19

  80.99

  70.99

  73.82 Run1

  80.77

  81.67

  81.22

  81.06 E. Comparison with Related Work Run2

  89.56

  74.43

  81.30

  79.11 The comparative results of our proposed method with the Run3

  85.16

  82.45

  83.78

  83.29

  known related works [26] [27] [3] [16] [6] on sentiment140

  Run4

  91.76

  81.07

  86.08

  84.96

  test set are described in Table IV. It shows that in terms

  Run5

  85.71

  81.68

  83.65

  83.01 Run6

  88.46

  82.56

  85.41 84.68 of accuracy, our proposed method outperforms the related Run7

  80.77

  73.87

  77.17

  75.77 methods and baseline. Run8

  82.97

  75.12

  78.85

  77.44 Table IV: Comparative results with related work. Run9

  92.31

  84.35

  88.42

  87.74 Method Accuracy (%) Our Method

  87.7 Results showed that our rule-based classifier yielded signif- Sentiment-topic-features [26]

  86.3

  icant improvements in all cases, which indicates the comple-

  Label propagation [27]

  84.7

  mentary importance of our proposed rule-based classifier. At

  MaxEnt. (Unigram+Bigram) [3]

  83.0 Run4 , we utilized several sentiment lexicons to train the Naive- SentiStrength [16]

  81.7 Bayes classifier to alleviate the construction of large annotated Meta-level-features [6] 81.6 training dataset and combined it with the rule-based classifier. Semantria (Online System)

  78.1 As Run4 outperform other runs (Run1 to Run8) in terms of Baseline

  73.8

  recall, F score, and accuracy, we can deduce that utilizing

  1

  the sentiment lexicons instead of large training dataset is Saif et al. [26] achieved 86.3% sentiment classification effective while considering single supervised classifier. Finally, accuracy by utilizing the sentiment-topic-features, where they we achieved the best result in terms of all four evaluation extracted latent topics and the associated topic sentiment from measures at Run9, which indicates that combining a rule-based tweets. Whereas Speriosu et al. [27] conducted their exper- classifier with the ensemble of supervised classifiers improves iment on a subset of the sentiment140 test set (75 negative the sentiment classification performance. and 108 positive tweets) and reported 84.7% accuracy by

  We also compared the category-wise performance of our using label propagation on a rather complicated graph that proposed method (Run9) with the baseline. Fig. 4 showed has users, tweets, word unigrams, word bigrams, hashtags, and the category-wise comparative results. For all types of tweet emoticons as its nodes. Go et al. [3] reported 83% accuracy categories, our proposed Run9 method outperforms the base- with Maximum Entropy (MaxEnt.) classifier trained with the line significantly in terms of accuracy. This indicates the combination of unigrams and bigrams. Bravo et al. [6] utilized applicability of our proposed system for all categories of different sentiment dimensions as meta-level-features, where tweets. they combined aspects such as opinion strength, emotion, and polarity indicators. They reported 81.6% classification accuracy. We also evaluate the test set with the two well- known online sentiment classification system, named as Sen- tiStrength

  10

  [16] M. Thelwall, K. Buckley, and G. Paltoglou, “Sentiment strength detec- tion for the social web,” Journal of the American Society for Information Science and Technology (JASIST) , vol. 63, no. 1, pp. 163–173, 2012.

  [12] O. Almatrafi, S. Parack, and B. Chavan, “Application of location-based sentiment analysis using twitter for identifying trends towards indian general elections 2014,” in Proceedings of the 9th International Con- ference on Ubiquitous Information Management and Communication (IMCOM) . ACM, 2015, p. 41.

  [13]

  B. Han, P. Cook, and T. Baldwin, “Automatically constructing a normalisation dictionary for microblogs,” in Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Process- ing and Computational Natural Language Learning (EMNLP-CoNLL) .

  Association for Computational Linguistics (ACL), 2012, pp. 421–432. [14]

  F. Liu, F. Weng, and X. Jiang, “A broad-coverage normalization system for social media language,” in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1 . Association for Computational Linguistics, 2012, pp. 1035–1044.

  [15]

  B. Liu, M. Hu, and J. Cheng, “Opinion observer: analyzing and com- paring opinions on the web,” in Proceedings of the 14th International Conference on World Wide Web (WWW) . ACM, 2005, pp. 342–351.

  [17] Y. Choi, L. Deng, and J. Wiebe, “Lexical acquisition for opinion inference: A sense-level lexicon of benefactive and malefactive events,” in Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA) , 2014, pp. 107–112. [18] T. Wilson, J. Wiebe, and P. Hoffmann, “Recognizing contextual polarity in phrase-level sentiment analysis,” in Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT/EMNLP) . Association for Computational Linguistics, 2005, pp. 347–354. [19] S. Baccianella, A. Esuli, and F. Sebastiani, “Sentiwordnet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining.” in LREC, vol. 10, 2010, pp. 2200–2204. [20] S. M. Mohammad and P. D. Turney, “Crowdsourcing a word–emotion association lexicon,” Computational Intelligence, vol. 29, no. 3, pp.

  [11]

  436–465, 2013. [21] O. Owoputi, B. O’Connor, C. Dyer, K. Gimpel, N. Schneider, and N. A.

  Smith, “Improved part-of-speech tagging for online conversational text with word clusters.” Association for Computational Linguistics, 2013. [22] Y. Yang and J. O. Pedersen, “A comparative study on feature selection in text categorization,” in ICML, vol. 97, 1997, pp. 412–420. [23]

  A. Bifet, G. de Francisci Morales, J. Read, G. Holmes, and

  B. Pfahringer, “Efficient online evaluation of big data stream classifiers,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . ACM, 2015, pp. 59–68.

  [24]

  I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun, “Support vec- tor machine learning for interdependent and structured output spaces,” in Proceedings of the twenty-first International Conference on Machine Learning (ICML) . ACM, 2004, p. 104. [25] L. Rokach, “Ensemble-based classifiers,” Artificial Intelligence Review, vol. 33, no. 1-2, pp. 1–39, 2010.

  [26]

  H. Saif, Y. He, and H. Alani, “Alleviating data sparsity for twitter sentiment analysis.” CEUR Workshop Proceedings (CEUR-WS. org), 2012. [27] M. Speriosu, N. Sudan, S. Upadhyay, and J. Baldridge, “Twitter polarity classification with label propagation over lexical links and the follower graph,” in Proceedings of the First Workshop on Unsupervised Learning in NLP . Association for Computational Linguistics, 2011, pp. 53–63.

  D. Buscaldi and I. Hernandez-Farias, “Sentiment analysis on microblogs for natural disasters management: a study on the 2014 genoa floodings,” in Proceedings of the 24th International Conference on World Wide Web (WWW) Companion . International World Wide Web Conferences Steering Committee, 2015, pp. 1185–1188.

  X. Hu, J. Tang, H. Gao, and H. Liu, “Unsupervised sentiment analysis with emotional signals,” in Proceedings of the 22nd International Conference on World Wide Web (WWW) . International World Wide Web Conferences Steering Committee, 2013, pp. 607–618.

  [16] and Semantria

  [3]

  11

  . However, our proposed rule-based classifier, the ensemble of supervised classifiers trained on different kind of features addressed the sentiment classification problem effectively and yielded the significant improvements over the known related work.

  V. CONCLUSION AND FUTURE DIRECTION In this paper, we proposed an efficient and effective method for sentiment analysis on twitter. We introduced a rule-based classifier based on emoticons and sentiment-bearing words and combined it with a majority voting based ensemble of supervised classifiers to classify the tweets sentiment. We constructed several lexicons by utilizing several lexical re- sources and extracted several features for sentiment analysis. After supervised feature selection and ranked the features based on importance, we see that our lexicon based features have significant importance in the sentiment analysis task. We also found that utilizing the several sentiment lexicons for training rather than large annotated dataset is effective, while single supervised classifier combined with the rule- based classifier. Experimental results on Stanford sentiment140 dataset demonstrate the effectiveness of our proposed method over the baseline and known related methods.

  In future, we have a plan to incorporate several rules based on complex semantics and introduce more features to improve the classification efficacy. We also have a plan to handle the neutral sentiment class and extend the applicability of our proposed method to other languages like Bangla.

  R

  EFERENCES [1]

  X. Wang, F. Wei, X. Liu, M. Zhou, and M. Zhang, “Topic senti- ment analysis in twitter: a graph-based hashtag sentiment classification approach,” in Proceedings of the 20th International Conference on Information & Knowledge Management . ACM, 2011, pp. 1031–1040. [2] R. Feldman, “Techniques and applications for sentiment analysis,” Communications of the ACM (CACM) , vol. 56, no. 4, pp. 82–89, 2013.

  A. Go, R. Bhayani, and L. Huang, “Twitter sentiment classification using distant supervision,” CS224N Project Report, Stanford, vol. 1, p. 12, 2009. [4] L. Barbosa and J. Feng, “Robust sentiment detection on twitter from biased and noisy data,” in Proceedings of the 23rd International Con- ference on Computational Linguistics (COLING): Posters . Association for Computational Linguistics (ACL), 2010, pp. 36–44.

  [10]

  [5]

  D. Davidov, O. Tsur, and A. Rappoport, “Enhanced sentiment learning using twitter hashtags and smileys,” in Proceedings of the 23rd Inter- national Conference on Computational Linguistics (COLING): Posters .

  Association for Computational Linguistics, 2010, pp. 241–249. [6]

  F. Bravo-Marquez, M. Mendoza, and B. Poblete, “Combining strengths, emotions and polarities for boosting twitter sentiment analysis,” in Proceedings of the 2nd International Workshop on Issues of Sentiment Discovery and Opinion Mining (WISDOM) . ACM, 2013, p. 2.

  [7] P. Chikersal, S. Poria, and E. Cambria, “Sentu: Sentiment analysis of tweets by combining a rule-based classifier with supervised learning,” in Proceedings of the International Workshop on Semantic Evaluation (SemEval 2015) , 2015. [8]

  B. O’Connor, R. Balasubramanyan, B. R. Routledge, and N. A. Smith, “From tweets to polls: Linking text sentiment to public opinion time series.” ICWSM, vol. 11, no. 122-129, pp. 1–2, 2010. 10 http://sentistrength.wlv.ac.uk/ 11 https://www.lexalytics.com/semantria

  [9]

  G. Paltoglou and M. Thelwall, “Twitter, myspace, digg: Unsupervised sentiment analysis in social media,” ACM Transactions on Intelligent Systems and Technology (TIST) , vol. 3, no. 4, p. 66, 2012.

  19 th International Conference on Computer and Information Technology, December 18-20, 2016, North South University, Dhaka, Bangladesh