Research Article

Emotion Detection with n-stage Latent Dirichlet Allocation for Turkish Tweets

Year 2019, Volume: 7 Issue: 3, 467 - 472, 28.09.2019
https://doi.org/10.21541/apjes.459447

Abstract

Understanding the reasons behind the emotions expressed on social media plays a key role in learning to characterize the mood of previously unseen texts. Being able to classify mood makes this technology useful in a variety of fields. In this study, Latent Dirichlet Allocation (LDA), a topic modeling algorithm, was used to determine which emotions tweets on Twitter express. The dataset consists of 4000 tweets categorized into 5 emotions: anger, fear, happiness, sadness, and surprise. The Zemberek, Snowball, and first-5-letters root extraction methods were used to create the models. The generated models were tested using the proposed n-stage LDA method. With the proposed method, we aimed to increase the model's success rate by decreasing the number of words in the dictionary. Using multi-stage LDA, we achieved higher success rates (2-stage: 70.5%, 3-stage: 76.4%) than the state-of-the-art result (60.4%), which was achieved using plain LDA for 5 classes.
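As a rough illustration of the first-5-letters root extraction mentioned in the abstract, the sketch below truncates each token to at most five letters and counts the resulting dictionary entries. It is a minimal sketch, not the authors' implementation: the function names `first5_root` and `build_dictionary`, the whitespace tokenization, and the sample tweets are all assumptions made for this example, and the LDA stages themselves are not shown.

```python
from collections import Counter


def first5_root(word: str) -> str:
    """Approximate a root by keeping at most the first 5 letters (hypothetical helper)."""
    return word[:5]


def build_dictionary(tweets):
    """Count root occurrences over whitespace-tokenized tweets (assumed tokenization)."""
    counts = Counter()
    for tweet in tweets:
        counts.update(first5_root(token) for token in tweet.split())
    return counts


# Made-up sample tweets, used only to show the effect of truncation.
tweets = ["bu mutluluk", "ben mutluyum", "ben korkuyorum"]

surface_forms = {token for tweet in tweets for token in tweet.split()}
dictionary = build_dictionary(tweets)

print(len(surface_forms), "distinct surface forms")
print(len(dictionary), "distinct roots after truncation")
```

Here "mutluluk" and "mutluyum" collapse to the same entry, so the dictionary is smaller than the set of surface forms. This is only the simplest of the three root extraction methods named above; Zemberek and Snowball perform proper morphological analysis and stemming.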


Details

Primary Language English
Subjects Engineering
Journal Section Articles
Authors

Zekeriya Anıl Güven 0000-0002-7025-2815

Banu Diri 0000-0002-4052-0049

Tolgahan Çakaloğlu 0000-0002-4711-7287

Publication Date September 28, 2019
Submission Date September 12, 2018
Published in Issue Year 2019 Volume: 7 Issue: 3

Cite

IEEE Z. A. Güven, B. Diri, and T. Çakaloğlu, “Emotion Detection with n-stage Latent Dirichlet Allocation for Turkish Tweets”, APJES, vol. 7, no. 3, pp. 467–472, 2019, doi: 10.21541/apjes.459447.