Large scale implementations for Twitter sentiment classification (Q1662631): Difference between revisions

Summary: Sentiment analysis on Twitter data is a challenging problem due to the nature, diversity and volume of the data. People tend to express their feelings freely, which makes Twitter an ideal source for collecting a vast number of opinions on a wide spectrum of topics. This information offers huge potential and can be harnessed to gauge the sentiment tendency towards these topics. However, since no one can read through all of these tweets manually, an automated decision-making approach is necessary. Most existing solutions are limited to centralized environments and can therefore process at most a few thousand tweets. Given the massive number of tweets published daily, such a sample is not representative enough to determine the sentiment polarity towards a topic. In this work, we develop two systems: the first in the MapReduce framework and the second in the Apache Spark framework for programming with Big Data. The algorithm exploits all hashtags and emoticons inside a tweet as sentiment labels and classifies diverse sentiment types in a parallel and distributed manner. Moreover, the sentiment analysis tool is based on machine learning methodologies alongside natural language processing techniques and utilizes Apache Spark's machine learning library, MLlib. To address the nature of Big Data, we introduce pre-processing steps for achieving better results in sentiment analysis, as well as Bloom filters to compact the storage size of intermediate data and boost the performance of our algorithm. Finally, the proposed system was trained and validated with real data crawled from Twitter, and, through an extensive experimental evaluation, we show that our solution is efficient, robust and scalable while confirming the quality of our sentiment identification.
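
The summary mentions Bloom filters as a way to compact intermediate data. The following is a minimal illustrative sketch of that data structure, not the authors' implementation: the class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, the SHA-256 based double hashing, and the sample features are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array probed at k positions per item (no deletions)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into bytes.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k positions from two base hashes (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step non-zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Hypothetical tweet features (hashtags and emoticons) stored compactly.
features = ["#happy", ":)", "#fail"]
bf = BloomFilter(num_bits=1024, num_hashes=3)
for feature in features:
    bf.add(feature)
```

A membership test such as `"#happy" in bf` never yields a false negative for an added item, while an item that was never added may occasionally be reported as present. That trade-off is what allows the filter to occupy a fixed 128 bytes here regardless of how long the stored strings are.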

zbMATH Keywords

big data

Bloom filters

Twitter

describes a project that uses

Stanford Tagger

MapReduce

Hadoop

MLlib

Spark

kdANN+

Apache Spark

MaRDI publication profile

full work available at URL

https://doi.org/10.3390/a10010033

cites work

Manipulating market sentiment

Large scale implementations for Twitter sentiment classification

Q2810821

Space/time trade-offs in hash coding with allowable errors

Identifiers

zbMATH Open document ID

1461.62204

DOI

10.3390/a10010033

Mathematics Subject Classification ID

Sitelinks

Mathematics (1 entry)

mardi Publication:1662631

@@ Property / author @@
-Athanasios K. Tsakalidis
@@ Property / author: Athanasios K. Tsakalidis / rank @@
-Normal rank
@@ Property / author @@
+Athanasios K. Tsakalidis
@@ Property / author: Athanasios K. Tsakalidis / rank @@
+Normal rank
@@ Property / describes a project that uses @@
+Stanford Tagger
@@ Property / describes a project that uses: Stanford Tagger / rank @@
+Normal rank
@@ Property / describes a project that uses @@
+MapReduce
@@ Property / describes a project that uses: MapReduce / rank @@
+Normal rank
@@ Property / describes a project that uses @@
+Hadoop
@@ Property / describes a project that uses: Hadoop / rank @@
+Normal rank
@@ Property / describes a project that uses @@
+MLlib
@@ Property / describes a project that uses: MLlib / rank @@
+Normal rank
@@ Property / describes a project that uses @@
+Spark
@@ Property / describes a project that uses: Spark / rank @@
+Normal rank
@@ Property / describes a project that uses @@
+kdANN+
@@ Property / describes a project that uses: kdANN+ / rank @@
+Normal rank
@@ Property / describes a project that uses @@
+Apache Spark
@@ Property / describes a project that uses: Apache Spark / rank @@
+Normal rank
@@ Property / MaRDI profile type @@
+MaRDI publication profile
@@ Property / MaRDI profile type: MaRDI publication profile / rank @@
+Normal rank
@@ Property / full work available at URL @@
+https://doi.org/10.3390/a10010033
@@ Property / full work available at URL: https://doi.org/10.3390/a10010033 / rank @@
+Normal rank
@@ Property / OpenAlex ID @@
+W2592794863
@@ Property / OpenAlex ID: W2592794863 / rank @@
+Normal rank
@@ Property / Wikidata QID @@
+Q57395653
@@ Property / Wikidata QID: Q57395653 / rank @@
+Normal rank
@@ Property / cites work @@
+Manipulating market sentiment
@@ Property / cites work: Manipulating market sentiment / rank @@
+Normal rank
@@ Property / cites work @@
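+Large scale implementations for Twitter sentiment classification
@@ Property / cites work: Large scale implementations for Twitter sentiment classification / rank @@
+Normal rank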
@@ Property / cites work @@
+Q2810821
@@ Property / cites work: Q2810821 / rank @@
+Normal rank
@@ Property / cites work @@
+Space/time trade-offs in hash coding with allowable errors
@@ Property / cites work: Space/time trade-offs in hash coding with allowable errors / rank @@
+Normal rank
@@ links / mardi / name / links / mardi / name @@
+Publication:1662631