Projects BD
Team : BD
-
experimaestro - Computer science experiment scheduler and manager
Experimaestro is an experiment manager based on a server that contains a job scheduler (job dependencies, locking mechanisms) and a framework to describe the experiments with JavaScript or in Java.
Project leader : Benajmin PIWOWARSKI
01/01/2016
More details here …
-
SPARQL on Spark - SPARQL query processing with Apache Spark
A common way to achieve scalability for processing SPARQL queries over large RDF data sets is to choose map-reduce frameworks like Hadoop or Spark. Processing complex SPARQL queries generating large join plans over distributed data partitions is a major challenge in these shared nothing architectures. In this article we are particularly interested in two representative distributed join algorithms, partitioned join and broadcast join, which are deployed in map-reduce frameworks for the evaluation of complex distributed graph pattern join plans. We compare five SPARQL graph pattern evaluation implementations on top of Apache Spark to illustrate the importance of cautiously choosing the physical data storage layer and of the possibility to use both join algorithms to take account of the existing predefined data partitionings. Our experimentations with different SPARQL benchmarks over real-world and synthetic workloads emphasize that hybrid join plans introduce more flexibility and often can achieve better performance than join plans using a single kind of join implementation.
Project leader : Hubert NAACKE
01/01/2015
More details here …
-
BOM - Block-o-Matic!
Block-o-Matic is a web page segmentation algorithm based on an hybrid approach for scanned document segmentation and visual-based content segmentation. A web page is associated with three structures: the DOM tree, the content structure and the logical structure. The DOM tree represents the HTML elements of a page, the geometric structure organizes the content based on a category and its geometry and finally the logical structure is the result of mapping content structure on the basis of the human-perceptible meaning that conforms the blocks. The segmentation process is divided in three phases: analysis, understanding and reconstruction of a web page. An evaluation method is proposed in order to perform the evaluation of web page segmentations based on a ground truth of 400 pages classified into 16 categories. A set of metrics are presented based on geometric properties of blocks. Satisfactory results are reached when comparing to other algorithms following the same approach.
Project leader : Andrès SANOJA
01/01/2012
More details here …