Sentiment Analysis Resources

Introduction

As more and more people around the globe come online, they are increasingly voicing their opinions on social media, review sites, and similar platforms.

This makes it important for organizations to pay attention to what is being said about them in different markets, while also leading to substantial opportunities for data analytics to enable a better consumer experience.

In light of this, it is crucial to develop sentiment and emotion analysis methods that work not only for English, but also for the thousands of other languages that people use online.

Multilingual Sentiment Analysis Benchmark Datasets

We offer the following benchmark datasets for download:

Czech
Download our Czech travel review sentiment analysis dataset.
Dutch
Download the Dutch SemEval-2016 Task 5 restaurant review sentiment analysis dataset.
German
Download our German travel review sentiment analysis dataset.
English (Fine Food)
Download the English language Fine Food Reviews sentiment analysis dataset.
English (SST)
Download the English language Stanford Sentiment Treebank movie review sentiment analysis dataset.
Spanish
Download the Spanish SemEval-2016 Task 5 restaurant review sentiment analysis dataset.
French
Download the French Allocine movie review sentiment analysis dataset.
Italian
Download our Italian travel review sentiment analysis dataset.
Japanese
Download our Japanese travel review sentiment analysis dataset.
Russian
Download our Russian travel review sentiment analysis dataset.


For further details, please refer to and cite the papers listed under References below.

Multilingual Sentiment Vectors for Deep Sentiment Analysis

We learn sentiment embeddings by training different supervised models. The weights for a word across different models are concatenated into a single sentiment embedding vector for that word.
Method to create word embeddings capturing sentiment
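
The idea of concatenating per-model word weights can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation from the papers: it assumes one bag-of-words logistic regression per domain, and the function names (`train_linear_model`, `build_sentiment_vectors`) and the toy training data are hypothetical.

```python
import math
from collections import defaultdict

def train_linear_model(docs, labels, epochs=200, lr=0.5):
    """Fit a bag-of-words logistic regression with SGD; return {word: weight}.

    Assumption: a simple linear classifier stands in for the supervised models.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for tokens, y in zip(docs, labels):
            score = bias + sum(weights[t] for t in tokens)
            pred = 1.0 / (1.0 + math.exp(-score))
            grad = pred - y  # gradient of the log loss w.r.t. the score
            bias -= lr * grad
            for t in tokens:
                weights[t] -= lr * grad
    return dict(weights)

def build_sentiment_vectors(domain_data):
    """Concatenate each word's per-domain weight into a single vector."""
    domains = sorted(domain_data)
    models = {d: train_linear_model(*domain_data[d]) for d in domains}
    vocab = set().union(*(m.keys() for m in models.values()))
    # Words unseen in a domain get a neutral weight of 0.0 for that dimension.
    vectors = {w: [models[d].get(w, 0.0) for d in domains] for w in vocab}
    return domains, vectors

# Toy, made-up data: (tokenized documents, labels), where 1 means positive.
domain_data = {
    "laptops": ([["hot", "loud"], ["fast", "quiet"], ["hot", "slow"], ["fast", "light"]],
                [0, 1, 0, 1]),
    "music": ([["hot", "catchy"], ["dull", "boring"], ["hot", "fresh"], ["dull", "flat"]],
              [1, 0, 1, 0]),
}

domains, vectors = build_sentiment_vectors(domain_data)
print(domains)         # order of the vector dimensions
print(vectors["hot"])  # one learned weight per domain
```

In this toy setup, "hot" occurs only in negative laptop reviews and only in positive music reviews, so its weight comes out negative in the first dimension and positive in the second.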

Is the word "hot" negative or positive?

It depends! Clearly, "hot" may be quite positive when talking about music, but it tends to be more negative when talking about laptops.

We create word vector representations such that each dimension stores the sentiment polarity of the word in a different domain, so we can store that "hot" is positive in some domains and negative in others, and we can also quantify how positive or negative it tends to be in each domain.

Unlike traditional sentiment lexicons, our word vectors can capture the differences between different domains. Unlike regular word vectors, our sentiment vectors explicitly store sentiment polarity information about words.
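
To make the contrast with a single-score lexicon concrete, here is a minimal lookup sketch. The domain order, the example values, and the helper names (`polarity`, `score_text`) are assumptions chosen for illustration; they are not taken from the released data.

```python
# Hypothetical domain order and made-up values, for illustration only.
DOMAINS = ["electronics", "music"]
SENTIMENT_VECTORS = {
    "hot": [-0.6, 0.8],
    "reliable": [0.9, 0.3],
}

def polarity(word, domain):
    """Return the polarity of `word` in `domain`, or 0.0 if the word is unknown."""
    vector = SENTIMENT_VECTORS.get(word)
    if vector is None:
        return 0.0
    return vector[DOMAINS.index(domain)]

def score_text(tokens, domain):
    """Average the domain-specific polarities of the tokens."""
    if not tokens:
        return 0.0
    return sum(polarity(t, domain) for t in tokens) / len(tokens)

print(polarity("hot", "electronics"))  # -0.6
print(polarity("hot", "music"))        # 0.8
```

A traditional lexicon would have to commit to one score for "hot", whereas here the same word contributes differently depending on the domain passed to `score_text`.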

Download

Our sentiment vectors for the languages in the paper are available for download below. The data provides sentiment vectors for words in multiple languages. The vectors are 26-dimensional, where each dimension captures the sentiment polarity of the word in a specific domain (e.g., electronics, beauty, automotive, music).
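
A loader for such vectors might look like the sketch below. Note that the file format is an assumption: the page does not specify it, so this sketch supposes a plain-text layout with one word per line followed by 26 whitespace-separated numbers. The function name `load_sentiment_vectors` is hypothetical, and the sample rows are made up.

```python
import io

def load_sentiment_vectors(handle, expected_dim=26):
    """Parse lines of the ASSUMED form '<word> <v1> ... <v26>' into a dict."""
    vectors = {}
    for line_no, line in enumerate(handle, start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word, values = parts[0], [float(x) for x in parts[1:]]
        if len(values) != expected_dim:
            raise ValueError(
                f"line {line_no}: expected {expected_dim} values, got {len(values)}"
            )
        vectors[word] = values
    return vectors

# Stand-in for a downloaded file: two made-up 26-dimensional rows.
sample = io.StringIO(
    "hot " + " ".join(["0.1"] * 26) + "\n"
    "cold " + " ".join(["-0.2"] * 26) + "\n"
)
vectors = load_sentiment_vectors(sample)
print(len(vectors["hot"]))  # number of dimensions read for "hot"
```

Checking the dimensionality on load is a cheap way to notice early if the actual files use a different layout than the one assumed here.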

Sentiment Vectors
Download our Sentiment Vectors for multiple languages.
Domain Lexicons
Download our high-coverage Domain-Specific Sentiment Lexicons for English based on fastText CommonCrawl induction.

License: CC BY-NC-SA 4.0 (non-commercial use only)

For further languages or other uses, please get in touch with us.


For the high-coverage domain-specific lexicons, please cite the following publication:

Domain-Specific Sentiment Lexicons Induced from Labeled Documents
SM Mazharul Islam, Xin Dong, Gerard de Melo (2020)
In: Proc. COLING 2020.


Emotion Analysis Resources


Often, it is useful to go beyond just positive vs. negative towards fine-grained emotion analytics, e.g. when we want to know whether a customer is angry, disappointed, or terrified.

AffectVec
AffectVec provides fine-grained emotion intensity scores for 70,000 English words covering over 200 emotions. We also provide more limited data for over 350 further languages.
Emoji resources
We have also developed a series of emoji-based emotion vectors for words.

References

DCM-CNN neural model to exploit sentiment embeddings

For more information about the datasets and related deep learning methods, please consult our publications:

Cross-Lingual Propagation for Deep Sentiment Analysis
Xin Dong, Gerard de Melo (2018)
In: Proc. AAAI 2018. AAAI Press.
Acceptance rate: 25%

A Helping Hand: Transfer Learning for Deep Sentiment Analysis
Xin Dong, Gerard de Melo (2018)
In: Proc. ACL 2018.
Acceptance rate: 24.9%

 



© 2020 Gerard de Melo