Colloquium d’Informatique de Sorbonne Université
Willy Zwaenepoel, École Polytechnique Fédérale de Lausanne
Tuesday, March 22, 2016 18:00
Amphi 25 Sorbonne University - Faculté des Sciences
Really Big Data: Analytics on Graphs with Trillions of Edges
Willy Zwaenepoel received his BS/MS from the University of Gent, Belgium, and his PhD from Stanford University. He is currently a Professor of Computer Science at EPFL. He previously held appointments as Professor of Computer Science and Electrical Engineering at Rice University, and as Dean of the School of Computer and Communication Sciences at EPFL. His interests are in operating systems and distributed systems. He is a Fellow of the ACM and the IEEE, has received the IEEE Kanai Award and several best paper awards, and is a member of the Belgian and European Academies. He has also been involved in a number of startups, including BugBuster (acquired by AppDynamics), iMimic (acquired by Ironport/Cisco), Midokura and Nutanix.
Abstract
Big graphs occur naturally in many applications, most obviously in social networks, but also in many other areas such as biology and forensics. Current approaches to processing large graphs use either supercomputers or very large clusters. In both cases the entire graph must reside in memory before it can be processed. We are pursuing an alternative approach, processing graphs from secondary storage. While this comes with a performance penalty, it makes analytics on very large graphs feasible on a small number of commodity machines.
We have developed two systems, one for a single machine and one for a cluster of machines. X-Stream, the single machine solution, aims to make all secondary storage access sequential. It uses two techniques to achieve this goal, edge-centric processing and streaming partitions. Chaos, the cluster solution, starts from the observation that there is little benefit to locality when accessing data from secondary storage over a high-speed network. As a result, Chaos spreads graph data uniformly randomly over storage devices, and uses randomized access to achieve I/O balance. Chaos furthermore uses work stealing to achieve computational load balance. By using these techniques, it avoids the need for expensive partitioning during pre-processing, while still achieving good scaling behavior. With Chaos we have been able to process an 8-trillion-edge graph on 32 machines, a new milestone for graph size on a small cluster. I will describe both systems and their performance on a number of benchmarks and in comparison to state-of-the-art alternatives.
This is joint work with Laurent Bindschaedler (EPFL), Jasmina Malicevic (EPFL) and Amitabha Roy (Intel Labs).
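To make the edge-centric idea from the abstract concrete, here is a minimal in-memory sketch. It is not the X-Stream implementation: the function name `edge_centric_components`, the choice of connected components as the example computation, and the use of a Python list in place of an edge file on secondary storage are all assumptions made for illustration. It only shows the access pattern in which each iteration scans the edge list once, in order, instead of looking up the neighbors of individual vertices.

```python
from typing import List, Tuple

Edge = Tuple[int, int]  # (source vertex id, destination vertex id)


def edge_centric_components(num_vertices: int, edges: List[Edge]) -> List[int]:
    """Label each vertex with the smallest vertex id in its connected component.

    Hypothetical illustration: `edges` stands in for an edge file that would
    be read sequentially from disk; vertex state is a small in-memory array.
    """
    labels = list(range(num_vertices))  # every vertex starts in its own component
    changed = True
    while changed:
        # Scatter phase: one sequential pass over the edges, in stored order.
        # An update is emitted for the endpoint that holds the larger label.
        updates = []
        for src, dst in edges:
            if labels[src] < labels[dst]:
                updates.append((dst, labels[src]))
            elif labels[dst] < labels[src]:
                updates.append((src, labels[dst]))

        # Gather phase: one sequential pass over the updates.
        changed = False
        for vertex, label in updates:
            if label < labels[vertex]:
                labels[vertex] = label
                changed = True
    return labels


if __name__ == "__main__":
    # Two components: {0, 1, 2} and {3, 4}.
    print(edge_centric_components(5, [(0, 1), (1, 2), (3, 4)]))
```

The sketch trades extra passes over the edge list for purely sequential access, which is the trade-off the abstract describes for processing from secondary storage.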
Master Class
One particularly popular part of the colloquium is the “Master Class”, where students have the opportunity to give a short (but well-prepared) presentation of their work. Each presentation (10 minutes) is followed by an open discussion (15 minutes) with the guest speaker, who gives detailed feedback. The complete program is provided here.