RAFRAFI Abdelhalim

PhD student at Sorbonne University
Team : MLIA
https://fr.linkedin.com/in/abdelhalimrafrafi

Supervision : Patrick GALLINARI

Co-supervision : GUIGUE Vincent

Sentiment classification on the Web 2.0

The Internet has become an essential medium in everyday life: we use it to check the news, to do our shopping, to shape our opinions, and to share our feelings and feedback on our experiences. This process generates a large amount of data on our personalities and lifestyles. Faced with this amount of information, we are quickly overwhelmed. "Looks like the overload of information gives a sense of emptiness." (translated from a French quotation by Jean-Pierre April). Automated filtering and analysis tools are therefore required to make the information accessible to everybody. In this general context, our work focuses on sentiment analysis, and on sentiment classification in particular.
Classical algorithms for text categorization such as SVM, NB, PLSA or LDA show several limitations for sentiment analysis. These limitations are related to the particularity of the task: sentiment classification requires taking into account the structure of the text (negations, for instance), and modeling the lexical field alone is not sufficient to understand user messages. However, considering the text structure requires complex representations and/or algorithms that can hardly scale up. The second challenge consists in optimizing classifiers in a large functional space (to describe sentiments efficiently) while preserving generality. Indeed, we would like to be able to deal with documents on various topics gathered from different media (Twitter, blogs, reviews...).
We investigated several solutions to tackle these conflicting objectives simultaneously. First, we focused on regularized formulations adapted to sentiment classification in order to perform efficient feature selection in the N-gram space. Then, we explored an orthogonal research axis: given a basic classifier, we simply increased the size of the learning set, using the Web 2.0 as a virtually unlimited source of labeled data. Finally, we tried to combine the advantages of both solutions using an original neural network architecture.
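
As a small illustration of why N-gram features matter for negation, the following Python sketch compares unigram and bigram features on two toy sentences. It is only an assumption-laden example: the function names (`ngrams`, `features`), the whitespace tokenization, and the sentences are hypothetical and are not taken from the thesis.

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as space-joined strings) of a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(text, max_n=2):
    """Return the set of all 1..max_n-gram features of a text.

    Tokenization is a naive lowercase whitespace split (an assumption
    made for illustration only).
    """
    tokens = text.lower().split()
    feats = set()
    for n in range(1, max_n + 1):
        feats.update(ngrams(tokens, n))
    return feats


positive = "this movie is good"
negative = "this movie is not good"

# With unigrams only, the two sentences differ by the single word "not";
# the word "good" appears in both and carries the same weight.
unigram_diff = features(negative, max_n=1) - features(positive, max_n=1)

# With bigrams, the negated form "not good" becomes a feature of its own,
# which a classifier can weight independently of "good".
bigram_diff = features(negative, max_n=2) - features(positive, max_n=2)

print(sorted(unigram_diff))
print(sorted(bigram_diff))
```

The price of this richer representation is a much larger feature space, which is why some form of feature selection, such as the regularized formulations mentioned above, becomes necessary.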

Defence : 12/20/2013

Jury members :

Tellier Isabelle - Université Paris 3 [Rapportrice]
Paroubek Patrick - Université Paris Sud [Rapporteur]
Gallinari Patrick - Université Paris 6
Guigue Vincent - Université Paris 6
Gouttas Catherine - Thales Communications & Security
Bennani Younes - Université Paris 13
Marsala Christophe - Université Paris 6

Departure date : 12/31/2013

2011-2013 Publications