This Sunday I finally decided to tackle a question that had been on my mind for a long time: what are the most central stations of the Paris Métro? But to answer that question, we first need to define centrality.
In graph theory and network analysis, there are several measures of centrality. The simplest one is called degree centrality. In the case of the Métro, the degree centrality of a station would be the number of lines coming in and out of the station. It is easy to look at a Métro map and figure out where many lines intersect.
Montparnasse - Bienvenüe
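To make the idea of degree concrete, here is a minimal Python sketch. The station labels and links are a made-up toy network, not the real Métro data, and it counts direct neighbours in the graph, which is the usual graph-theory reading of degree rather than an exact count of lines.

```python
from collections import defaultdict

# Hypothetical toy network: each pair is a direct link between two stations.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "E"), ("E", "F"), ("B", "F")]


def build_adjacency(edges):
    """Return an undirected adjacency map: station -> set of neighbours."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


def degree_centrality(adjacency):
    """Degree of each station: the number of direct neighbours."""
    return {station: len(neighbours) for station, neighbours in adjacency.items()}


degrees = degree_centrality(build_adjacency(EDGES))
busiest = max(degrees, key=degrees.get)  # station with the most neighbours
print(busiest, degrees[busiest])
```

On a real map you would feed in the actual station links instead of the toy list.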
This is a good start, but the degree only tells us whether a station gives access to many lines. A station could be on the outskirts of the city and still have many lines running through it. So we need a better measure of centrality. Ideally, the best measure would indicate how close a station is to all the others. In network theory, a popular metric for this is called closeness. It is based on the average shortest-path distance between a node and all the others; degree still matters indirectly, because a station with many direct neighbours tends to be fewer hops away from everything else. So we know what we want to compute. Now all we need is some input data (a graph of the Métro) and a piece of software that can compute closeness.
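For readers who want to see what such a computation looks like, here is a rough Python sketch of one common definition of closeness: the number of other stations divided by the sum of shortest-path distances to them, with distances counted in hops. The toy graph is invented, and whether FNA uses exactly this normalisation is an assumption on my part.

```python
from collections import deque

# Hypothetical toy network as an adjacency map (undirected, unweighted).
GRAPH = {
    "A": {"B"},
    "B": {"A", "C", "E"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": {"B"},
}


def hop_distances(graph, source):
    """Breadth-first search: number of hops from source to each reachable station."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    return distances


def closeness(graph, source):
    """(reachable stations - 1) / sum of hop distances; assumes a connected graph."""
    distances = hop_distances(graph, source)
    total = sum(distances.values())
    return (len(distances) - 1) / total if total else 0.0


scores = {station: closeness(GRAPH, station) for station in GRAPH}
ranking = sorted(scores, key=scores.get, reverse=True)  # most central first
print(ranking)
```

In this toy network the hub "B" comes out on top, and the dead-end "D" comes last, which matches the intuition that closeness rewards stations that are few hops away from everything.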
After some googling, I found just what I needed on Wikipedia: http://en.wikipedia.org/wiki/Social_network_analysis_software. The article lists a bunch of network analysis software. I tried a few of them before I found Financial Network Analyzer (FNA). It is a new project, so it still lacks a nice GUI, but hey, who needs a GUI anyway? Also, it is written in Java and open source, which means I can run it on my Mac and tweak the code if needed.
Now we need some data to put into FNA. So I wrote some Perl to scrape the RATP web site and create a graph file of the Métro network. The file is available to download in GML, GraphML and FNA command formats. I did a few sanity checks and the files look correct. Feel free to download them and reuse them in any way. If you do anything cool with them or if you find any mistakes, please let me know.
So now that we have everything we need, let's jump to the results. The result file is available in CSV format. But here are the top stations:
Gare de Lyon
Hôtel de Ville
Chaussée d'Antin - Lafayette
A few remarks. First of all, good news: the results seem logical. Châtelet is clearly the biggest, best connected and most central station of Paris. But Les Halles, which is very close to Châtelet, has fewer lines running through it (i.e. a lower degree centrality), so it has a lower closeness score. The fact that Madeleine and Pyramides are in second and third position is more of a surprise. They are not really big stations, but they are very centrally located and well connected. So it makes sense. Another interesting fact is that all the top stations are on the Right Bank.
This was an interesting exercise and I got the answer I was looking for. However, this is not exactly a scientific study, and there is a lot of room for improvement. For example, I did not take into account the actual distance between the stations (for that we would need a weighted graph). Also, I did not take into account the time required for connections. Connections should have a weight too. I will leave that as an exercise for the reader.
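As a starting point for that exercise, here is a hedged Python sketch of how weights could come in. Every number is invented: the weights are hypothetical travel times in minutes, and each node is a (station, line) pair so that changing lines inside a station becomes an edge with its own cost. Shortest paths are found with Dijkstra's algorithm, and closeness is then computed on travel time instead of hops. This is just one possible modelling choice, not what FNA does.

```python
import heapq

# Hypothetical weights in minutes. Nodes are (station, line) pairs, so a
# connection inside a station is an ordinary edge with its own cost.
WEIGHTED_EDGES = [
    (("A", 1), ("B", 1), 2.0),
    (("B", 1), ("C", 1), 3.0),
    (("B", 1), ("B", 2), 4.0),  # changing from line 1 to line 2 at station B
    (("B", 2), ("D", 2), 2.0),
]


def build_graph(edges):
    """Undirected weighted adjacency map: node -> list of (neighbour, weight)."""
    graph = {}
    for u, v, weight in edges:
        graph.setdefault(u, []).append((v, weight))
        graph.setdefault(v, []).append((u, weight))
    return graph


def dijkstra(graph, source):
    """Shortest travel time from source to every reachable node."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best[node]:
            continue  # stale queue entry, a shorter path was already found
        for neighbour, weight in graph[node]:
            candidate = dist + weight
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return best


def weighted_closeness(graph, source):
    """(reachable nodes - 1) / sum of travel times; assumes a connected graph."""
    distances = dijkstra(graph, source)
    total = sum(distances.values())
    return (len(distances) - 1) / total if total else 0.0


graph = build_graph(WEIGHTED_EDGES)
travel_times = dijkstra(graph, ("A", 1))
print(travel_times[("D", 2)])
```

With this layout, a trip from A to D pays for the ride to B, the connection at B, and the ride to D. To rank whole stations rather than platforms, the per-line scores would still need to be combined somehow, for example by keeping the best score per station.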