PEHLIVAN Zeynep
Supervision : Anne DOUCET
Co-supervision : GANÇARSKI Stéphane, PIWOWARSKI Benjamin
Access to web archives: Querying, Navigating and Optimizing
A large share of the world’s cultural and intellectual knowledge is created on the web every day. However, the web is ephemeral by nature: new information constantly replaces older information without any notification, leaving a significant gap in our knowledge. Archiving the web has therefore become a cultural necessity, to preserve this knowledge for future generations. However, the success of any web archive will be measured by the means of access it provides, as is the case today on the real web. Our research is placed in the context of access to web archives and studies several research problems related to this issue. These problems are grouped into two main topics: access methods and optimization of access. For access methods, we first propose a conceptual model, together with operators to manipulate it, as the basis of a query language for web archives that better satisfies user information needs. Next, we introduce a new navigation method for web archives that takes the coherence of pages into account. For access optimization, we propose a change detection algorithm to understand and quantify what changed between two versions of a web page. Then, we study the behavior of different static index pruning methods with temporal queries, before proposing a new diversification-based static index pruning method and showing its application to temporal collections, with a substantial gain in performance.
Defence : 10/11/2013
Jury members :
Sihem AMER-YAHIA CNRS / LIG [Rapporteur]
Arjen P. DE VRIES Université Delft [Rapporteur]
François BANCILHON DataPublica
Matthieu CORD UPMC Paris 6
David GROSS-AMBLARD Université de Rennes 1
Pierre SENELLART Télécom ParisTech
Anne DOUCET UPMC Paris 6
Stéphane GANÇARSKI UPMC Paris 6
Benjamin PIWOWARSKI CNRS / LIP6
2010-2013 Publications
2013
- Z. Pehlivan : “Access to web archives: Querying, Navigating and Optimizing”, thesis, PhD defence 10/11/2013, supervision Doucet, Anne, co-supervision : Gançarski, Stéphane, Piwowarski, Benjamin (2013)
- Z. Pehlivan, B. Piwowarski, S. Gançarski : “Diversification Based Static Index Pruning - Application to Temporal Collections”, (2013)
- Z. Pehlivan, B. Piwowarski, S. Gançarski : “A comparison of static index pruning methods with temporal queries”, SIGIR 2013 Workshop on Time-aware Information Access, TAIA2013, Dublin, Ireland, pp. 26-29 (2013)
2011
- M. Ben Saad, Z. Pehlivan, S. Gançarski : “Coherence-oriented Crawling and Navigation for Web Archives using Patterns”, 27es journées Bases de Données Avancées, BDA'11, Rabat, Morocco (2011)
- M. Ben Saad, Z. Pehlivan, S. Gançarski : “Coherence-oriented Crawling and Navigation for Web Archives using Patterns”, International Conference on Theory and Practice of Digital Libraries, TPDL 2011, vol. 6966, Lecture Notes in Computer Science, Berlin, Germany, pp. 421-433, (Springer) (2011)
- Z. Pehlivan, S. Gançarski, A. Doucet : “Changing Vision for Access to Web Archives”, Temporal Web Analytics Workshop (in conjunction with WWW 2011), vol. 707, CEUR-WS, Hyderabad, India, pp. 41-48, (CEUR) (2011)
2010
- Z. Pehlivan, M. Ben Saad, S. Gançarski : “Vi-DIFF: Understanding Web Pages Changes”, DEXA 2010, 21st International Conference on Database and Expert Systems Applications, vol. 6261, Lecture Notes in Computer Science, Bilbao, Spain, pp. 1-15, (Springer) (2010)