HORINCAR Roxana
Supervision : Bernd AMANN
Co-supervision : ARTIÈRES Thierry
Refresh Strategies and Online Change Estimation for Highly Dynamic Web Content
With the rapidly increasing number of sources and devices connected to the Internet and the growing success of Web 2.0 services, web content available online is becoming more and more diverse and dynamic. To facilitate the efficient dissemination of evolving and often temporary information streams (news, messages, announcements), many web applications publish their most recent information items as RSS and Atom documents, which are then collected and transformed by RSS aggregators such as Google Reader or Yahoo! News. Our research is set in the context of content-based feed aggregation systems and focuses on the design of optimal refresh strategies for highly dynamic RSS feed sources. First, we introduce two quality measures specific to aggregation feeds, which reflect the information completeness and the average freshness of the result feeds. We propose a best-effort feed refresh strategy that achieves maximum aggregation quality compared with all other existing policies using the same average number of refreshes. We then analyse the characteristics of a representative collection of real-world RSS feeds, focusing on their temporal dimension. We also study different online change estimation models and techniques and their integration with our refresh strategy. The presented methods have been implemented and tested against synthetic and real-world RSS feed data sets.
Defence : 09/20/2012
Jury members :
M. LAMARRE Philippe (INSA Lyon) [Rapporteur]
M. GROSS-AMBLARD David (Université de Rennes 1) [Rapporteur]
Mme. BERTI-EQUILLE Laure (IRD, Aix-Marseille Université)
M. CORD Matthieu (UPMC Paris 6)
M. AMANN Bernd (UPMC Paris 6)
M. ARTIÈRES Thierry (UPMC Paris 6)
2009-2015 Publications
2015
- R. Horincar, B. Amann, Th. Artières : “Online refresh strategies for content based feed aggregation”, World Wide Web, vol. 18 (4), pp. 913-947, (Springer Verlag) (2015)
2012
- R. Horincar : “Stratégies de Rafraîchissement et Estimation en Ligne de Changements pour le Contenu Web Dynamique”, thesis, PhD defence 09/20/2012, supervision : Amann, Bernd, co-supervision : Artières, Thierry (2012)
- R. Horincar, B. Amann, Th. Artières : “Online Change Estimation Models for Dynamic Web Resources”, 12th International Conference on Web Engineering (ICWE), vol. 7387, Lecture Notes in Computer Science, Berlin, Germany, pp. 395-410 (2012)
2011
- R. Horincar, B. Amann, Th. Artières : “Online Refresh Strategies for RSS Feed Crawlers”, BDA, Rabat, Morocco (2011)
2010
- R. Horincar, B. Amann, Th. Artières : “Best-Effort Refresh Strategies for Content-Based RSS Feed Aggregation”, Proceedings of the 11th International Conference on Web Information Systems Engineering (WISE 2010), Hong Kong, China, pp. 262-270, (Springer) (2010)
2009
- C. Constantin, J. Creus, C. Du Mouza, R. Horincar, N. Travers : “D2.1 State-of-the art of XML data stream models, Livrable 2.1 ANR RoSeS”, (2009)