RAFRAFI Abdelhalim
Supervision : Patrick GALLINARI
Co-supervision : GUIGUE Vincent
Sentiment classification on the Web 2.0
The Internet has become an essential medium in everyday life: we use it to check the news, to do our shopping, to shape our opinions, and to share our feelings and feedback on our experiences. This activity generates a large amount of data about our personalities and lifestyles, and we are quickly overwhelmed by this amount of information. "Looks like the overload of information gives a sense of emptiness." (French quotation by Jean-Pierre April). Automated filtering and analysis tools are therefore required to make the information accessible to everybody. In this general context, our work focuses on sentiment analysis, and on sentiment classification in particular.
Classical text categorization algorithms such as SVM, NB, PLSA or LDA show several limitations for sentiment analysis. These limitations stem from the particularity of the task: sentiment classification requires taking the structure of the text into account (negations, for instance), and modeling the lexical field alone is not sufficient to understand user messages. However, taking the text structure into account requires complex representations and/or algorithms that scale poorly. The second challenge consists in optimizing classifiers in a large functional space (to describe sentiments efficiently) while preserving generality. Indeed, we would like to be able to deal with documents on various topics gathered from different media (Twitter, blogs, reviews...).
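The point about negation can be illustrated with a minimal sketch. It is not taken from the thesis: the function name extract_ngrams and the two example sentences are hypothetical, chosen only to show that single words cannot separate a sentence from its negated form, while word bigrams can.

```python
def extract_ngrams(text, max_n=2):
    """Return the word n-grams (n = 1..max_n) of a lowercased text."""
    tokens = text.lower().split()
    features = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features.append(" ".join(tokens[i:i + n]))
    return features


# Hypothetical toy sentences (not from any corpus used in the thesis).
positive = "this movie is good"
negative = "this movie is not good"

# With unigrams only, both sentences contain the feature "good".
unigrams_pos = set(extract_ngrams(positive, max_n=1))
unigrams_neg = set(extract_ngrams(negative, max_n=1))

# Bigrams add a feature that only the negated sentence produces.
bigrams_pos = set(extract_ngrams(positive, max_n=2)) - unigrams_pos
bigrams_neg = set(extract_ngrams(negative, max_n=2)) - unigrams_neg

print(sorted(bigrams_neg - bigrams_pos))
```

The price of this richer description is a much larger feature space, which is the scaling issue raised above.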
We investigated many solutions to tackle these antagonistic objectives simultaneously. First, we focused on regularized formulations adapted to sentiment classification, in order to perform efficient feature selection in the N-gram space. Then, we explored an orthogonal research direction: given a basic classifier, we simply increased the size of the learning set, using the Web 2.0 as an infinite source of labeled data. Finally, we tried to combine the advantages of both solutions in an original neural network architecture.
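As a rough illustration of how a regularized formulation can select features in an N-gram space, here is a sketch of a generic L1-penalized logistic regression trained by proximal gradient descent. This is a standard stand-in, not the formulation developed in the thesis: the function names, the hyperparameters (lam, lr, epochs) and the four-document corpus are all assumptions made for the example.

```python
import math


def ngram_features(text, max_n=2):
    """Set of word n-grams (n = 1..max_n) present in a lowercased text."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)}


def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrink toward zero, clip small values."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def train_l1_logistic(docs, labels, lam=0.01, lr=0.5, epochs=200):
    """Binary logistic regression with an L1 penalty (bias left unpenalized)."""
    feats = [ngram_features(d) for d in docs]
    vocab = sorted(set().union(*feats))
    weights = {f: 0.0 for f in vocab}
    bias = 0.0
    n_docs = len(docs)
    for _ in range(epochs):
        grad = {f: 0.0 for f in vocab}
        grad_bias = 0.0
        for fs, y in zip(feats, labels):
            score = bias + sum(weights[f] for f in fs)
            err = 1.0 / (1.0 + math.exp(-score)) - y
            grad_bias += err / n_docs
            for f in fs:
                grad[f] += err / n_docs
        bias -= lr * grad_bias
        # Gradient step on the loss, then soft-thresholding for the L1 term.
        for f in vocab:
            weights[f] = soft_threshold(weights[f] - lr * grad[f], lr * lam)
    return weights, bias


# Hypothetical toy corpus: 1 = positive, 0 = negative.
docs = ["good movie", "very good plot", "bad movie", "very bad plot"]
labels = [1, 0 + 1, 0, 0]

weights, bias = train_l1_logistic(docs, labels)
selected = {f: w for f, w in weights.items() if w != 0.0}
print(selected)
```

In this toy setting, N-grams that occur equally often in both classes (such as "movie") receive a weight of exactly zero, which is how the L1 penalty acts as a feature selector. A larger lam discards more features.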
Defence : 12/20/2013
Jury members :
Tellier Isabelle - Université Paris 3 [Rapportrice]
Paroubek Patrick - Université Paris Sud [Rapporteur]
Gallinari Patrick - Université Paris 6
Guigue Vincent - Université Paris 6
Gouttas Catherine - Thales Communications&Security
Bennani Younes - Université Paris 13
Marsala Christophe - Université Paris 6
2011-2013 Publications
2013
- A. Rafrafi : “Classification de sentiments sur le Web 2.0”, PhD thesis, defence 12/20/2013, supervision : Gallinari, Patrick, co-supervision : Guigue, Vincent (2013)
- E. Guardia‑Sebaoun, A. Rafrafi, V. Guigue, P. Gallinari : “Cross-Media Sentiment Classification and Application to Box-Office Forecasting”, Proceedings of the 10th Conference on Open Research Areas in Information Retrieval (OAIR '13), Lisbon, Portugal, pp. 201-208 (2013)
- A. Rafrafi, V. Guigue, P. Gallinari : “Classification de Sentiments Multi-Domaines en Contexte Hétérogène et Passage à l’Echelle”, CORIA 2013 - 10e Conférence en Recherche d'Informations et Applications, Neuchâtel, Switzerland, pp. 117-124 (2013)
2012
- A. Rafrafi, V. Guigue, P. Gallinari : “Coping with the Document Frequency Bias in Sentiment Classification”, Sixth International AAAI Conference on Weblogs and Social Media (ICWSM'12), Dublin, Ireland, pp. 314-321 (2012)
- A. Rafrafi, V. Guigue, P. Gallinari : “Réseau de neurones à double convolution pour la classification de sentiments multi-domaines”, Actes de la Conférence Francophone sur l'Apprentissage Automatique - CAp 2012, Nancy, France, 16 p. (2012)
- A. Rafrafi, V. Guigue, P. Gallinari : “Représentations et régularisations pour la classification de sentiments”, CORIA, Bordeaux, France, pp. 285-300 (2012)
2011
- A. Rafrafi, V. Guigue, P. Gallinari : “Réseau de neurones profond et SVM pour la classification de sentiments”, CORIA: COnférence en Recherche d'Information et Applications, Avignon, France, pp. 121-133, (Éditions Universitaires d'Avignon) (2011)
- A. Rafrafi, V. Guigue, P. Gallinari : “Pénalisation des mots fréquents pour la classification de sentiments”, Les Cahiers du numérique, vol. 7 (2), pp. 63-84, (Lavoisier) (2011)