BOUKHALED Mohamed Amine
Supervision : Jean-Gabriel GANASCIA
On computational stylistics: mining literary texts for the extraction of characterizing stylistic patterns
This thesis belongs to the interdisciplinary field of computational stylistics, that is, the application of statistical and computational methods to the study of literary style. Historically, most work in computational stylistics has focused on lexical aspects, especially in the early decades of the discipline. In this thesis, we focus instead on the syntactic aspect of style, which is much harder to capture and analyze given its abstract nature. As our main contribution, we develop an approach to the computational stylistic study of classic French literary texts based on a hermeneutic point of view, in which interesting linguistic patterns are discovered without any prior knowledge. More concretely, we focus on the development and extraction of complex yet computationally feasible stylistic features that are linguistically motivated, namely morpho-syntactic patterns. Following the hermeneutic line of thought, we propose a knowledge discovery process for stylistic characterization, with an emphasis on the syntactic dimension of style, that extracts relevant patterns from a given text. This process consists of two main steps: a sequential pattern mining step followed by the application of interestingness measures. In particular, extracting all possible syntactic patterns of a given length is proposed as a particularly useful way to obtain interesting features in an exploratory scenario. We propose three interestingness measures, each based on a different theoretical linguistic and statistical background, evaluate them experimentally, and report the results.
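The two-step process described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the thesis: the text is already part-of-speech tagged, patterns are restricted to contiguous tag sequences (a simplification of sequential pattern mining, which may allow gaps), and the interestingness measure is a smoothed relative-frequency ratio against a reference text, used here only as a stand-in for the three measures the thesis proposes. The function names `extract_patterns` and `rank_by_ratio` are hypothetical.

```python
from collections import Counter


def extract_patterns(tags, n):
    """Step 1: count every contiguous tag sequence of length n."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def rank_by_ratio(target_tags, reference_tags, n):
    """Step 2: score each target pattern by how over-represented it is
    in the target relative to the reference (illustrative measure only)."""
    target = extract_patterns(target_tags, n)
    reference = extract_patterns(reference_tags, n)
    t_total = sum(target.values())
    r_total = sum(reference.values())
    # Add-one smoothing over the union of observed patterns, so that a
    # pattern absent from the reference does not cause a division by zero.
    vocab = len(set(target) | set(reference))
    scores = {}
    for pattern, count in target.items():
        p_target = (count + 1) / (t_total + vocab)
        p_reference = (reference[pattern] + 1) / (r_total + vocab)
        scores[pattern] = p_target / p_reference
    # Most over-represented patterns first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy tag sequences, invented for illustration.
    target = ["DET", "NOUN", "VERB", "DET", "NOUN", "ADJ", "DET", "NOUN", "ADJ"]
    reference = ["DET", "NOUN", "VERB", "DET", "NOUN", "VERB", "PRON", "VERB", "ADV"]
    for pattern, score in rank_by_ratio(target, reference, 3)[:3]:
        print(" ".join(pattern), round(score, 2))
```

In an exploratory setting, the ranked list is what a literary scholar would inspect: patterns at the top are candidates for characterizing the target text, and the choice of measure determines which patterns surface.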
Defence : 09/13/2016
Jury members :
Jean-Luc MINEL, Professeur [Rapporteur]
Thierry POIBEAU, Directeur de Recherche [Rapporteur]
Valérie BEAUDOUIN, Directrice d’Études
Jean-Gabriel GANASCIA, Professeur
Christophe MARSALA, Professeur
Henry SOLDANO, Maître de Conférences
2014-2018 Publications
2018
- F. Frontini, M. Boukhaled, J.‑G. Ganascia, Th. Charnois, M. Larjavaara : “Approaching French theatrical characters by syntactical analysis: a study with motifs and correspondence analysis”, chapter in The Grammar of Genres and Styles. From Discrete to Non-Discrete Units, vol. 320, Trends in Linguistics. Studies and Monographs [TiLSM], pp. 118-139, (De Gruyter Mouton), (ISBN: 978-3-11-058968-9) (2018)
2017
- F. Frontini, M. Boukhaled, J.‑G. Ganascia : “Mining for characterising patterns in literature using correspondence analysis: an experiment on French novels”, Digital Humanities Quarterly, vol. 11 (2), Göttingen Dialog in Digital Humanities 2015, (Alliance of Digital Humanities) (2017)
2016
- M. Boukhaled : “De la stylistique computationnelle: fouille de textes littéraires pour l’extraction de motifs stylistiques caractérisants”, thesis, PhD defence 09/13/2016, supervision Ganascia, Jean-Gabriel (2016)
2015
- F. Frontini, M. Boukhaled, J.‑G. Ganascia : “Moliere’s Raisonneurs: a quantitative study of distinctive linguistic patterns”, Corpus Linguistics 2015, Lancaster, United Kingdom (2015)
- M. Boukhaled, F. Frontini, G. Bourgne, J.‑G. Ganascia : “Computational Study of Stylistics: A Clustering-based Interestingness Measure for Extracting Relevant Syntactic Patterns”, International Journal of Computational Linguistics and Applications, vol. 6 (1), (Alexander Gelbukh) (2015)
- F. Frontini, M. Boukhaled, J.‑G. Ganascia : “Linguistic Pattern Extraction and Analysis for Classic French Plays”, Journée ConSciLa (Confrontations en Sciences du Langage), Paris, France (2015)
- M.‑A. Boukhaled, F. Frontini, J.‑G. Ganascia : “A Peculiarity-based Exploration of Syntactical Patterns: a Computational Study of Stylistics”, Workshop on Interactions between Data Mining and Natural Language Processing DMNLP'15 ECML/PKDD 2015 Workshop, Porto, Portugal, pp. 31-40 (2015)
- M.‑A. Boukhaled, F. Frontini, J.‑G. Ganascia : “Une mesure d’intérêt à base de surreprésentation pour l’extraction des motifs syntaxiques stylistiques”, Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles, Caen, France (2015)
- M. Boukhaled : “Une méthode non supervisée pour la vérification d’auteur à base d’un modèle gaussien multivarié”, CORIA 2015 - Conférence en Recherche d'Informations et Applications, Paris, France, pp. 525-533, (ARIA) (2015)
- M.‑A. Boukhaled, Z. Sellami, J.‑G. Ganascia : “Phoebus : un Logiciel d’Extraction de Réutilisations dans des Textes Littéraires”, 22e Conférence sur le Traitement Automatique des Langues Naturelles, Caen, France (2015)
- M. Boukhaled, J.‑G. Ganascia : “Using Function Words for Authorship Attribution: Bag-Of-Words vs. Sequential Rules”, Natural Language Processing and Cognitive Science Proceedings 2014, Venice, Italy, pp. 115-122, (DE GRUYTER) (2015)
2014
- M. Boukhaled, J.‑G. Ganascia : “Probabilistic Anomaly Detection Method for Authorship Verification”, Statistical Language and Speech Processing, vol. 8791, Lecture Notes in Computer Science, Grenoble, France, pp. 211-219 (2014)