Contributor(s): Lu, Qin (COMP)
xx, 167 pages : color illustrations
PolyU Library Call No.: [THS] LG51 .H577P COMP 2018 Li
Abstract:
With the rapid development of the Internet, a large amount of text data is generated every day, and people use the Internet to express their feelings and emotions. Analyzing emotion in text is essential for understanding public sentiment and emotion, especially on social media, and is a major step towards giving machines affective intelligence. This thesis focuses on emotion analysis from text and studies four areas: (1) high-quality emotion corpus construction, (2) construction of more comprehensive multi-dimensional emotion lexicons, (3) phrase-level emotion analysis, and (4) fine-grained emotion prediction based on event roles in context. The contributions consist of five parts.

The first part is on high-quality emotion corpus construction. Because of the prohibitive cost of annotation, earlier work on emotion corpus annotation has had very limited success. Many works instead use automatic methods based on natural labels such as hashtags, which can be very noisy. In this thesis, a three-step selection framework is proposed to improve the quality of corpora built from natural labels by filtering noise from microblog data. The framework includes both automatic and semi-automatic noise removal. Evaluation shows that the automatically acquired corpus is of high quality, with a Kappa value of 0.92. The framework reduces the manual annotation workload by 45.5% and yields a 23.0% relative improvement in quality as measured by macro F-score.

The second part is on word-level emotion analysis, namely the construction of multi-dimensional emotion lexicons, which are more comprehensive and theoretically sounder. The biggest problem with emotion lexicons that use discrete labels is their limited computability and extensibility. We propose to construct emotion lexicons based on multi-dimensional emotion models, such as Valence-Arousal-Dominance (VAD) and Evaluation-Potency-Activity (EPA), which assign a continuous value to each dimension.
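The corpus evaluation in the first part reports two kinds of figures: an inter-annotator agreement (Kappa) and a macro F-score. The abstract does not say which Kappa variant was used, so the sketch below assumes Cohen's two-annotator kappa; the emotion labels are hypothetical toy data, not data from the thesis. It is only meant to show how each number is computed.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both used one identical label throughout (kappa undefined; treated as 1)
    return (p_o - p_e) / (1.0 - p_e)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    scores = []
    for c in sorted(set(gold) | set(pred)):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical emotion labels for six microblog posts.
annotator_1 = ["joy", "joy", "anger", "sadness", "joy", "anger"]
annotator_2 = ["joy", "joy", "anger", "joy", "joy", "anger"]
kappa = cohen_kappa(annotator_1, annotator_2)
macro = macro_f1(annotator_1, annotator_2)
print(f"kappa={kappa:.3f} macro-F1={macro:.3f}")
```

Macro averaging weights every emotion class equally, so a rare class that is never predicted correctly (here "sadness") pulls the score down as much as a frequent one would.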
Then, a regression-based method is proposed to infer the affective meanings of words from word embeddings. Evaluation on various emotion lexicons shows that the proposed method outperforms state-of-the-art methods by large margins on all the lexicons and under different evaluation metrics. Compared with other state-of-the-art methods, the proposed method also has a computational advantage. The emotion lexicons obtained using our methods are publicly available.
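The abstract does not name the regressor used to map word embeddings to affective values. The sketch below assumes closed-form ridge regression as a stand-in: it fits a linear map from each word's embedding to its three VAD scores on a set of seed words, then predicts scores for held-out words and checks them with Pearson correlation. The words, 4-dimensional "embeddings", and VAD scores are synthetic and hypothetical. It is written in plain Python to stay dependency-free; in practice one would likely use NumPy or scikit-learn's `Ridge`.

```python
import random


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting (A: n x n, B: n x k)."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]  # augmented matrix [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        pivot_val = M[col][col]
        M[col] = [v / pivot_val for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def fit_ridge(X, Y, alpha=1e-3):
    """Closed-form ridge regression W = (X^T X + alpha I)^-1 X^T Y, with a bias column appended."""
    Xb = [list(x) + [1.0] for x in X]
    d, k = len(Xb[0]), len(Y[0])
    # For simplicity the bias weight is penalised too.
    A = [[sum(r[i] * r[j] for r in Xb) + (alpha if i == j else 0.0) for j in range(d)] for i in range(d)]
    B = [[sum(r[i] * y[t] for r, y in zip(Xb, Y)) for t in range(k)] for i in range(d)]
    return solve(A, B)


def predict(W, x):
    xb = list(x) + [1.0]
    return [sum(xi * W[i][t] for i, xi in enumerate(xb)) for t in range(len(W[0]))]


def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


# Synthetic stand-ins: 30 hypothetical words with 4-d "embeddings" and VAD scores
# generated from a hidden linear map plus small noise.
rng = random.Random(0)
DIM = 4
hidden_map = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(DIM)]
emb = {f"w{i}": [rng.gauss(0, 1) for _ in range(DIM)] for i in range(30)}
vad = {w: [sum(v[i] * hidden_map[i][t] for i in range(DIM)) + rng.gauss(0, 0.01) for t in range(3)]
       for w, v in emb.items()}

words = list(emb)
train, test = words[:20], words[20:]  # seed lexicon vs. words to be scored
W = fit_ridge([emb[w] for w in train], [vad[w] for w in train])
r_valence = pearson([predict(W, emb[w])[0] for w in test], [vad[w][0] for w in test])
print(f"Pearson r on held-out valence: {r_valence:.3f}")
```

Because the fit is a single linear solve whose size depends only on the embedding dimension, this family of methods is cheap to train, which is one plausible reading of the computational advantage mentioned above.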
The third part investigates phrase-level emotion analysis. Given vector representations of words, compositional models can be used to infer vector representations of larger text units. In this work, we first investigate the effectiveness of different word representations in compositional models for phrases on a phrase sentiment analysis task. The representations include multi-dimensional emotion lexicons, a sentiment lexicon, and word embeddings. Results show that word embeddings clearly outperform the special-purpose emotion lexicons, even though the lexicons are cognitively grounded in theory. Second, we investigate how phrase embeddings can be learned so that the emotions of phrases can be inferred directly from their embedding representations. A hybrid method is proposed that learns phrase embeddings from both the external context and the component words under a compositionality constraint, so as to reduce the data sparseness problem while also mitigating the semantic problem posed by non-compositional phrases, whose meanings cannot be derived from their component words. Evaluation on four datasets shows that this hybrid method is more robust and improves the phrase embeddings.

The fourth part investigates fine-grained emotion analysis. Most studies on emotion analysis focus on the sentiment or emotion expressed by a whole sentence or document. In this work, a novel task is proposed: predicting the emotion states of event roles in a specific event context, where an event role can be the subject, act, or object involved in the described event. The task is grounded in Affect Control Theory (ACT), a cognitive theory holding that emotion states are context dependent. The main idea is to use automatically obtained word embeddings as word representations and a Long Short-Term Memory (LSTM) network as the prediction model. The proposed method outperforms the linear model used in ACT, which relies on a manually annotated EPA lexicon, and word embeddings also perform better than the EPA lexicon as word representations.
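For the event-role task in the fourth part, the sketch below shows the general shape of an LSTM regressor: the embeddings of the subject, act, and object are fed in order, and a linear head on the final hidden state emits real-valued EPA scores. The abstract does not describe the architecture, so several choices here are assumptions: a single layer, one head emitting EPA triples for all three roles at once, and random 5-dimensional embeddings standing in for pretrained ones. Only the forward pass is shown; the weights are untrained, so the printed values carry no meaning until the model is trained (for example with a squared-error loss against EPA ratings) in a deep-learning framework.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LSTMRegressor:
    """Single-layer LSTM over a sequence of word vectors, with a linear head producing real values."""

    def __init__(self, input_dim, hidden_dim, out_dim, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g),
        # each applied to the concatenation [x_t; h_{t-1}].
        self.W = {gate: init(hidden_dim, input_dim + hidden_dim) for gate in "ifog"}
        self.b = {gate: [0.0] * hidden_dim for gate in "ifog"}
        self.b["f"] = [1.0] * hidden_dim  # a common forget-gate bias initialisation
        self.W_out = init(out_dim, hidden_dim)
        self.b_out = [0.0] * out_dim

    def forward(self, sequence):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in sequence:
            z = list(x) + h  # [x_t; h_{t-1}]
            pre = {gate: [s + b for s, b in zip(matvec(self.W[gate], z), self.b[gate])] for gate in "ifog"}
            i = [sigmoid(s) for s in pre["i"]]
            f = [sigmoid(s) for s in pre["f"]]
            o = [sigmoid(s) for s in pre["o"]]
            g = [math.tanh(s) for s in pre["g"]]
            c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, g)]  # new cell state
            h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]  # new hidden state
        # Linear regression head on the final hidden state.
        return [s + b for s, b in zip(matvec(self.W_out, h), self.b_out)]


# Hypothetical 5-d embeddings for one event (subject, act, object).
rng = random.Random(1)
roles = ("subject", "act", "object")
event = {role: [rng.gauss(0, 1) for _ in range(5)] for role in roles}
model = LSTMRegressor(input_dim=5, hidden_dim=8, out_dim=9)  # 3 roles x (E, P, A)
out = model.forward([event[role] for role in roles])
epa = {role: out[3 * k: 3 * k + 3] for k, role in enumerate(roles)}
print(epa)
```

Unlike the linear ACT equations, which combine fixed EPA lexicon values, the recurrent state lets the prediction for each role depend on the whole event sequence, which is the context dependence the task is built on.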
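Returning to the third part, the sketch below illustrates two classic compositional models (additive and element-wise multiplicative) and one possible form of a hybrid objective. The abstract does not give the actual objective; here the external-context term is simplified, as an assumption, to a squared distance from a context-derived phrase vector (a real system would more likely use a skip-gram style loss), and the compositionality constraint is a penalty with hypothetical weight `lam` pulling the phrase vector towards the composition of its words. With this simplification the minimiser has a closed form, which makes the trade-off easy to see.

```python
def additive(u, v):
    """Additive composition: p = u + v."""
    return [a + b for a, b in zip(u, v)]


def multiplicative(u, v):
    """Element-wise multiplicative composition: p = u * v."""
    return [a * b for a, b in zip(u, v)]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def hybrid_loss(phrase_vec, context_vec, composed_vec, lam):
    """Hypothetical hybrid objective: fit the context-derived vector, plus a
    compositionality penalty (weight lam) pulling towards the composed vector."""
    return squared_distance(phrase_vec, context_vec) + lam * squared_distance(phrase_vec, composed_vec)


def hybrid_minimiser(context_vec, composed_vec, lam):
    """Closed-form minimiser of hybrid_loss: p* = (context + lam * composed) / (1 + lam)."""
    return [(t + lam * c) / (1.0 + lam) for t, c in zip(context_vec, composed_vec)]


# Toy 2-d vectors for a hypothetical two-word phrase.
w1, w2 = [1.0, 2.0], [3.0, 4.0]
composed = additive(w1, w2)   # [4.0, 6.0]
context_only = [0.0, 0.0]     # stand-in for a vector estimated from the phrase's own contexts
for lam in (0.0, 1.0, 10.0):
    p = hybrid_minimiser(context_only, composed, lam)
    print(lam, p, hybrid_loss(p, context_only, composed, lam))
```

At `lam = 0` the phrase vector relies only on its own contexts, which is unreliable for rare phrases (data sparseness); as `lam` grows it collapses onto the composition of its words, which is wrong for non-compositional phrases such as idioms. An intermediate weight interpolates between the two sources.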
Together, our work shows that (1) a high-quality emotion corpus can be obtained from natural labels with a proper noise elimination process; (2) multi-dimensional emotion lexicons can be obtained by a sound, automatic method; (3) under different compositional models, word embedding representations perform better than dimensional emotion representations; (4) both the external context and the component words are useful for learning phrase embeddings; and (5) emotions in a specific context can be inferred more effectively by an LSTM with word embeddings. As a general semantic representation, word embedding is a promising word representation even in domain-specific applications such as emotion analysis.
Department of Computing
Ph.D., Department of Computing, The Hong Kong Polytechnic University, 2018