Salonanarchist | Leunstoelactivist

Mountains and cycling culture: on winning jerseys in the Giro, Tour and Vuelta

Are there any characteristics that explain why some countries are more successful in pro cycling than others? An article at the Inner Ring blog discusses why Germany is «Europe’s Pro Cycling Black Hole», despite having some serious mountains and a vibrant cycling culture (as illustrated by the membership of the Bund Deutscher Radfahrer) - but note that the same author has also warned against simple explanations of why countries are successful. And in the UK, there has been some disappointment that successes in professional cycling haven’t led to more cycling in general.

So are mountains and cycling culture somehow related to success in professional cycling? Of course, there are different ways to answer that question. Here’s a look at some indicators, which suggest mountains - no, and cycling culture - maybe.

The graph below shows maximum elevation (to be more precise, the difference in elevation between the lowest and highest location on the country’s mainland) and the number of jerseys won in the Giro d’Italia, the Tour de France and the Vuelta a España over the past years.

There is only a weak and not statistically significant correlation between elevation span and the number of jerseys won. If you adjust for the size of the population, the relation is even negative, and still weak. Perhaps a different indicator for mountainousness would yield other results, but for now it appears that having mountains has little to do with success in the grand tours.
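For readers who want to run a similar check, here’s a minimal sketch of the correlation calculation in Python, both for the raw number of jerseys and for jerseys per million inhabitants. The country rows are made-up placeholders, not the data behind the graph, and the function name `pearson` is just an illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Placeholder rows, NOT the real data:
# (elevation span in metres, jerseys won, population in millions)
countries = {
    "A": (4800, 120, 60.0),
    "B": (700, 90, 11.0),
    "C": (320, 40, 17.0),
    "D": (2900, 15, 80.0),
}

spans = [row[0] for row in countries.values()]
jerseys = [row[1] for row in countries.values()]
# Adjust for population size: jerseys per million inhabitants
jerseys_per_million = [row[1] / row[2] for row in countries.values()]

r_raw = pearson(spans, jerseys)
r_adjusted = pearson(spans, jerseys_per_million)
print(round(r_raw, 2), round(r_adjusted, 2))
```

With only a handful of countries, a coefficient like this says little without a significance test, which is left out of the sketch.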

Then how about cycling culture? The graph below shows two indicators on the x-axis: the share of trips made by bicycle in the country’s capital (modal share), and the relative number of bikes sold. The y-axis shows the relative number of jerseys won over the past years. According to these variables, cycling culture is not related to success in professional cycling (in fact, there’s a weak, not significant, negative correlation).

Another possible indicator of a cycling culture is the membership of cyclists’ organisations. The graph below is a bit geekier than the previous ones: the scales are logarithmic (for example, the y-axis goes from 0.1 to 1 to 10 to 100).

It turns out that there is in fact a correlation between the membership of cycling organisations and the number of jerseys won. Perhaps bicycle sales and modal share are indicators of everyday bicycle use whereas membership of cycling organisations also says something about recreational use, which in turn might be related to success in professional cycling – but that’s just guessing. Whether there’s a causal relation between the two is yet another question.

See also: Giro, Tour and Vuelta: Which countries won jerseys over the past 111 yrs.


The analysis is limited to the jerseys for the leaders of the general classification (the maglia rosa for the Giro d’Italia, the maillot jaune for the Tour de France and whatever colour the leader’s jersey had in the Vuelta a España that particular year). For each year and for each tour, for each rider who has won a jersey in that tour (regardless of how many days) a point was added to the country total of that rider’s country.
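The counting rule can be sketched in a few lines of Python. This assumes the input is a list of (year, tour, rider, country) records, one per day in the leader’s jersey; that layout, the function name `country_points` and the sample riders are illustrative assumptions, not the format of the actual dataset.

```python
from collections import Counter

def country_points(jersey_days):
    """One point per (year, tour, rider), credited to the rider's country.

    jersey_days: iterable of (year, tour, rider, country) tuples,
    one tuple per day a rider wore the leader's jersey.
    """
    seen = set()
    points = Counter()
    for year, tour, rider, country in jersey_days:
        key = (year, tour, rider)
        if key not in seen:  # extra days in the same tour add nothing
            seen.add(key)
            points[country] += 1
    return points

# Hypothetical records for illustration only
records = [
    (2012, "Tour", "Rider A", "GB"),
    (2012, "Tour", "Rider A", "GB"),    # second day, same tour: no extra point
    (2012, "Tour", "Rider B", "CH"),
    (2012, "Vuelta", "Rider A", "GB"),  # different tour: new point
    (2013, "Tour", "Rider A", "GB"),    # different year: new point
]
print(country_points(records))
```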
The D3 tooltip code is largely borrowed from D3 Tips and Tricks.

Note that Wikipedia explains that the modal share (the share of trips made by bicycle) is not measured in a consistent way and something similar may well apply to data for membership of cyclists’ organisations.


Giro, Tour and Vuelta: which countries won jerseys over the past 111 yrs

The graph below shows which countries have been successful at winning jerseys in the Giro d’Italia, the Tour de France and the Vuelta a España.

The graph shows among other things how France has been struggling since the 1990s, how Belgium (Eddy Merckx) and the Netherlands (Joop Zoetemelk, Gerrie Knetemann) did well in the 1970s and the success of the UK in the 2010s (Bradley Wiggins, Chris Froome, Mark Cavendish). If you adjust for population size (not shown), Luxembourg and Belgium are the most successful countries.

See also: Mountains and cycling culture: On winning jerseys in the Giro, Tour and Vuelta.


The analysis is limited to the jerseys for the leaders of the general classification (the maglia rosa for the Giro d’Italia, the maillot jaune for the Tour de France and whatever colour the leader’s jersey had in the Vuelta a España that particular year). For each year and for each tour, for each rider who has won a jersey in that tour (regardless of how many days) a point was added to the country total of that rider’s country.
The D3 tooltip code is largely borrowed from D3 Tips and Tricks.


Spamming after all? Revisiting the repost ratios of Vox, Upshot and 538

Recently I wrote about people who share their URLs on Twitter, and then post them again, hoping to draw even more people to their site. I said that FiveThirtyEight reposts its URLs on average 0.3 times. I was wrong: it reposts its URLs far more often. And so do voxdotcom and UpshotNYT, who didn’t even make the top 5 in my original analysis. The Upshot reposts its URLs on average as many as 0.8 times.

The reason I underestimated the repost ratios in my original analysis has to do with the fact that tweets tend to contain shortened URLs. Two shortened URLs can look like different URLs. However, they may point to the same article, in which case one should be treated as a repost of the other (or perhaps both are a repost of yet another one, who knows). If you don’t take this into account and treat them as different URLs, you’ll underestimate the number of reposts (red bar in the graph).

It’s not that I wasn’t aware of this problem when I did the first analysis. I first tried to account for this by looking up the non-shortened URLs, using the Python urllib2 module. It turned out this was very time-consuming, which was a problem since I wanted to look up quite a few URLs. Pragmatically, I decided instead to use the ‘expanded URL’ provided by the Twitter API. This method does yield higher repost ratios for 538 and the Upshot (grey bars in the graph). Still, it doesn’t really solve the problem, because the expanded URL provided by the Twitter API will sometimes be yet another shortened URL. That’s the reason I still underestimated how often people recycle their content on Twitter.

When I realised the ratios I had originally calculated were still rather low given how many reposts there appeared to be in my timeline, I decided to recalculate repost ratios using urllib2 after all. Because this method is so time-consuming, I did this for just three accounts: Vox, 538 and Upshot NYT. This resulted in repost ratios that are substantially higher (light blue bars in the graph). The new Python script is here.
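For illustration, here’s a minimal sketch of the lookup step, not the script linked above. It uses `urllib.request`, the Python 3 successor of urllib2, and the helper names `resolve` and `resolve_all` are made up for this example. A small cache avoids looking up the same shortened URL twice, which takes some of the sting out of the slowness.

```python
import urllib.request

def resolve(url, opener=urllib.request.urlopen, timeout=10):
    """Follow redirects and return the final URL; fall back to the input on failure."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.geturl()
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return url

def resolve_all(urls, resolver=resolve):
    """Resolve a list of URLs, looking up each distinct URL only once."""
    cache = {}
    for url in urls:
        if url not in cache:
            cache[url] = resolver(url)
    return [cache[url] for url in urls]

# Usage (makes network requests, so it is left commented out):
# final_urls = resolve_all(urls_from_tweets)
```

The `opener` and `resolver` parameters are there so the functions can be tried out without network access.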

Note that the ratios are snapshots calculated on a sample of the 200 most recent tweets (that is, about one to two weeks of tweets).


Rise in Dutch cycling accidents, but Strava probably not to blame

The number of wielrenners (cyclists on racing bikes) treated at Dutch emergency departments has doubled since 2010, according to a study published today. Among a range of possible explanations the authors mention the popularity of apps like Strava:

The increasing popularity of smartphone apps like Strava, which let you keep track of cycling records for certain tracks and compare them with others, can lead to dangerous situations.

Like I said, this is just one of many possible explanations discussed in the report and the authors are by no means suggesting that Strava is a key factor causing cycling accidents. That said, the idea that Strava may have played a role doesn’t seem to be a priori absurd.

Strava was launched in 2009, but when did it become popular in the Netherlands? I couldn’t find any direct data on this, but Google trends is a plausible indicator.

The Google data are pretty clear: interest in Strava didn’t take off until February 2012 in the Netherlands (interestingly, the search volume index is highest in Limburg and Gelderland, which are also the main regions with hills in the Netherlands). As an extra check, I looked at messages at the forum pages (you need to log in to be able to search the forum) containing the search term ‘strava’. There were 10 messages prior to 1 February 2012 and 1,843 after that date, which seems to confirm the Google pattern.

By contrast, the number of wielrenners at emergency departments saw its biggest increase between 2010 and 2011. The number was stable at about 2,000 prior to 2011, but rose to 3,700 in 2011 and 4,200 in 2012. So it seems Strava was largely unknown in the Netherlands at the time when the largest increase in cycling accidents happened.

The reason for the study was a media storm last year about supposed irresponsible behaviour of wielrenners towards ‘normal’ cyclists. Car lobby club ANWB even suggested wielrenners should stay at home on sunny days.

In a survey among wielrenners, 45% said wielrenners do not sufficiently adjust their speed and 51% said wielrenners often ride in (too) wide groups. An analysis of 2,849 injury-causing accidents involving two cyclists revealed that in 24 cases a ‘normal’ cyclist got injured as a result of a collision with a wielrenner. So while many wielrenners agree that (some) wielrenners behave irresponsibly, this doesn’t seem to be a major cause of injuries among other cyclists.

Wielrenners themselves have about 2.2 injuries per 100,000 hours of activity. This is much lower than the number for all sports combined (7.1). However, 23% of wielrenners who go to the emergency department have to be treated in hospital, compared to 6% for all sports. So in terms of serious injuries, wielrennen doesn’t seem to be much safer or less safe than other sports.
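The arithmetic behind that comparison can be sketched as follows. It assumes the hospital share can simply be multiplied by the injury rate, that is, that the injuries counted per 100,000 hours are the ones seen at emergency departments; the report may define things differently.

```python
# Figures quoted above
injuries_per_100k_hours = {"wielrennen": 2.2, "all sports": 7.1}
hospital_share = {"wielrennen": 0.23, "all sports": 0.06}

# Assumption: serious injuries = injury rate x share treated in hospital
serious_per_100k_hours = {
    sport: injuries_per_100k_hours[sport] * hospital_share[sport]
    for sport in injuries_per_100k_hours
}

for sport, rate in serious_per_100k_hours.items():
    print(sport, round(rate, 2))
```

Under that assumption, both rates come out at roughly 0.4 to 0.5 hospital treatments per 100,000 hours.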

While it’s difficult to pinpoint the exact cause of the rise in accidents involving wielrenners, the authors of the report suggest the capacity of cycle paths is no longer sufficient given the rising number of cyclists, including a rise in cycling among people above 55. One of their recommendations is to create more ‘cycling highways’ for fast cyclists.


Identify potential spammers in your timeline, using Python

(Also see follow-up article here) - Twitter has become an important tool to let people know you’ve published a new article on your website. It has been suggested that you can get more visitors if you tweet the article’s URL not once, but multiple times. Unfortunately, some people are following that advice and are systematically reposting URLs.

So who are those people? Identifying the biggest reposters in your timeline is quite straightforward (whether these people are spamming is up to you to decide). Here’s a script that calculates the repost ratio, that is the average number of times people repost URLs, for each person you follow. For people who post URLs only once - in other words, who never repost them - the ratio will be zero. Here are the biggest reposters among the accounts followed by Data and Data Viz:

DataDrivenJournalism reposts URLs on average 0.36 times
FiveThirtyEight, 0.30
HelpMeViz, 0.30
Jon Schwabish, 0.23
Archie Tse, 0.20
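As an illustration of the definition (a sketch, not the script linked above), the repost ratio for one account can be computed from the list of URLs in its recent tweets. The function name `repost_ratio` and the example URLs are made up.

```python
from collections import Counter

def repost_ratio(urls):
    """Average number of reposts per distinct URL (0.0 if nothing was reposted)."""
    counts = Counter(urls)
    if not counts:
        return 0.0
    # A URL posted n times has been reposted n - 1 times
    reposts = sum(n - 1 for n in counts.values())
    return reposts / len(counts)

# Hypothetical account: four distinct URLs, one of them posted twice
urls = [
    "http://example.com/a",
    "http://example.com/a",
    "http://example.com/b",
    "http://example.com/c",
    "http://example.com/d",
]
print(repost_ratio(urls))  # 1 repost over 4 distinct URLs: 0.25
```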

To be fair, the numbers show that these people only repost some URLs. Further, people who do not normally repost URLs may still end up with a relatively high repost ratio if there are one or two URLs that they have reposted very often: these outliers would drive up their average number of reposts. Here are some potential outliers:

Cole Nussbaumer linked to this page with workshop dates 21 times in her 200 most recent tweets
WTFViz: WTFViz submit page, 9 times
Zack Beatty: tool, 8 times
HelpMeViz: Help Me Viz homepage, 8 times

These examples illustrate that there may be legitimate reasons to repost URLs. For example, Cole Nussbaumer’s page with workshop dates probably changes frequently, so reposting that URL would seem to make sense.

If you don’t want these often-posted URLs to drive up the repost ratio, you can calculate the repost ratio as the share of URLs that got reposted at least once. That way, you’ll disregard how often they got reposted. Here are the top 5 results by that method:

FiveThirtyEight now has a repost ratio of 0.27, which means it reposts about 1 in 4 URLs
DataDrivenJournalism, 0.23
HelpMeViz, 0.17
NPR visuals team, 0.16
Jon Schwabish, 0.14
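A minimal sketch of this second method, with a made-up account that posts one URL 21 times and nine other URLs once each, shows how the outlier stops dominating. The function name `repost_share` and the URLs are hypothetical.

```python
from collections import Counter

def repost_share(urls):
    """Share of distinct URLs that were posted more than once."""
    counts = Counter(urls)
    if not counts:
        return 0.0
    reposted = sum(1 for n in counts.values() if n > 1)
    return reposted / len(counts)

# Hypothetical account: one URL posted 21 times, nine URLs posted once
urls = ["http://example.com/workshops"] * 21
urls += ["http://example.com/post-{}".format(i) for i in range(9)]

distinct = len(set(urls))
mean_reposts = (len(urls) - distinct) / distinct  # average-based ratio, for contrast
print(mean_reposts)        # 20 reposts over 10 distinct URLs: 2.0
print(repost_share(urls))  # 1 of 10 distinct URLs was reposted: 0.1
```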

In case you’re wondering: my own repost ratio is 0.06 / 0.05.

Not ditching R for Python just yet

As a result of the whole controversy over using Python vs R for statistical analysis and graphs, I thought I’d switch to Python. Mostly because I think it’s more practical to use the same language for different tasks, but also because it seems easier to make decent-looking graphs with Python (I’m sure some people will thoroughly disagree). And, of course, because googling for solutions using «Python» as a search term simply works better than searching for «R».

But now Brian Caffo, Roger Peng and Jeff Leek’s Data Science Specialization Course has started on Coursera and they use R. I guess I’ll have to postpone my decision.


Big Brother: state or capitalist

George Orwell’s Nineteen Eighty-Four describes a future characterized by total surveillance (with telescreens observing people in their own homes, even monitoring their heartbeat and recognizing their facial expression). This surveillance is carried out by the state and its helpers. Corporations play no role in it.

In fact, corporations and capitalism are a thing of the past in Nineteen Eighty-Four, for private property has been abolished. A children’s book explains that capitalists were rich, ugly men wearing top hats. The Party constantly emphasizes how terrible conditions were before the Revolution and how much better they are today. But the main character, Winston Smith, can’t help but wonder if things had been really that bad in the past and if capitalists had really been such terrible creatures.

The suggestion is clear: the state is using capitalists as a scapegoat to mask its own failings (in fact, if I were a member of today’s whining one percent, I’d claim that Orwell had predicted the current «rising tide of hatred of the successful one percent»).

Today, thirty years after 1984, private property hasn’t been abolished, but we are approaching a level of surveillance pretty close to what Orwell described. When we try to explain what’s going on, we frequently use the term Big Brother. But when we do, are we referring to the state, as Orwell did, or do we have capitalists in mind?

To explore this matter, I looked up how often newspaper articles mention Big Brother in combination with either the names of government agencies, or the names Google and Facebook (of course I should have included Apple, notwithstanding their smart privacy patent, but I left them out for practical reasons explained below). The results are shown in the graph below. For the non-Dutch: NRC is a Dutch newspaper and AIVD is the Dutch intelligence service.

It appears that Google and Facebook turn up in combination with Big Brother far more often than government agencies like the CIA, MI5 or AIVD. However, as the red bars show, this has changed since the revelations of Edward Snowden. Since May last year, the NSA has been mentioned in combination with Big Brother more often than Google or Facebook (in the Guardian, the same applies to the GCHQ).

So Orwell didn’t foresee the role of corporations in mass surveillance, and we used to have a blind spot for the role of the state - but Snowden seems to have fixed that.


I used the Guardian and New York Times APIs to look up how often names of selected state agencies and corporations have appeared in combination with Big Brother in articles over the past ten years. I removed the results from the Guardian media section to get rid of most references to the Big Brother TV show. I wanted to include Apple, but unfortunately, the newspaper APIs don’t distinguish between apple and Apple. I thought searching for iPhone might be a practical solution, but the Guardian results included articles containing ‘I phone’. The NRC doesn’t have an API so I looked up the terms manually; the timeline to the right of the search results makes it quite easy to count the number of post-Snowden occurrences. In all cases, the method to search the newspaper archives is imperfect in that it yields some unwanted results (e.g. articles mentioning somebody’s big brother which have nothing to do with Big Brother).
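For those who want to try something similar, here’s a rough sketch of how a query URL for the Guardian content API could be put together. It is not the code used for the graph. The endpoint and the `q`, `from-date`, `to-date` and `api-key` parameters are as I understand the Guardian’s documentation; the function name, the dates and the way the media section is excluded (`-media`) are assumptions that should be checked against that documentation.

```python
from urllib.parse import urlencode

GUARDIAN_SEARCH = "https://content.guardianapis.com/search"

def guardian_query_url(term, api_key, from_date, to_date, section=None):
    """Build a search URL for articles mentioning "Big Brother" together with term."""
    params = {
        "q": '"Big Brother" AND {}'.format(term),
        "from-date": from_date,  # dates as YYYY-MM-DD
        "to-date": to_date,
        "api-key": api_key,
    }
    if section:
        # Assumption: a leading minus excludes a section, e.g. "-media"
        params["section"] = section
    return GUARDIAN_SEARCH + "?" + urlencode(params)

# Illustrative call; "test" stands in for a real API key and the dates are placeholders
url = guardian_query_url("NSA", "test", "2004-01-01", "2014-01-01", section="-media")
print(url)
# Fetching the URL returns JSON; the hit count is expected under response -> total.
```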


Problematic cycling charts

You might think the graph above is about the effort required for climbing, with those little bicycles going up the slope, but it’s not (in fact, it shows for each bicycle type how much more power is required to cycle as speed increases). Apparently, somebody added the bicycles for «fun», without giving much thought to what the graph is supposed to communicate.

The graph is from the book Cycling Science (not to be confused with the intriguing Bicycling Science), a book full of charts that explain how cycling works. Unfortunately, it contains quite a bit of chart junk and some of the graphs raise more questions than they answer.

For example, the chapter on cycling safety has a map that suggests the Netherlands is the most unsafe country for cycling. The problem is that it shows the percentage of road deaths who are cyclists, which says more about how many people cycle than about cycling safety. Another graph says Chris Boardman managed to cycle more than 56 km in an hour when he assumed a super-aerodynamic position, but that he would only manage 15 km when sitting upright. Really?

Despite car sharing, still lots of cars in Amsterdam

Does car sharing mean the end of the car as we know it? A study by consultancy Alix Partners in American metropolitan areas claims that each vehicle in a car-sharing fleet leads to 32 fewer cars being bought.

I haven’t seen the original report, but apparently respondents were asked whether they have avoided buying a car due to their participation in a car-sharing scheme; 51% said yes. The average car-sharing service would have about 66 members per car, which would sort of result in 32 canceled car sales per shared-use car. Of course, this is not the most rigorous way to measure the impact of car sharing. All the same, the study suggests that the impact may be huge.

In Amsterdam, the number of cars in car-sharing schemes has grown (xls) from 378 to 1476 over the past ten years. If the Alix number held true here, that would mean some 35,000 fewer cars sold. In reality, the number of cars for private use has risen from 184,000 in 2003 to 201,000 in 2013. The number of cars per 1,000 remained pretty stable at about 250. In the inner city, the total number of cars has risen from 19,190 in 2004 to 19,840 in 2012.

Of course, it’s unrealistic to assume that the ratio of 32 cars not bought per car-sharing vehicle applies in Amsterdam. A study from 2006 on car sharing in the inner city found (pdf) that half the users hadn’t owned a car in the first place. In this study, each shared-use car replaced 3 privately owned cars (this would still imply that 2 parking spaces can be removed for each car-sharing vehicle introduced). Perhaps the ratio has gone up a bit since, that is if the number of members relative to the fleet has gone up.

Anyway, it seems that the current number of car-sharing vehicles may have reduced car ownership by a few thousand at most. For a more substantial impact, we’d need more shared-use cars.
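The back-of-the-envelope numbers in this post can be reproduced as follows. All inputs are the figures quoted above; applying the ratios to the growth of the fleet (rather than to the whole fleet) is an assumption of this sketch.

```python
# Alix Partners: members per shared car x share who avoided buying a car
members_per_car = 66
share_avoided_purchase = 0.51
avoided_per_shared_car = members_per_car * share_avoided_purchase  # about 33.7, near the 32 quoted

# Amsterdam: growth of the car-sharing fleet over ten years
fleet_growth = 1476 - 378  # 1098 extra shared cars

alix_estimate = fleet_growth * 32     # if the Alix ratio applied: about 35,000
study_2006_estimate = fleet_growth * 3  # with the 2006 Amsterdam ratio: a few thousand

# What actually happened to the number of cars for private use, 2003-2013
private_car_change = 201_000 - 184_000

print(round(avoided_per_shared_car, 1), alix_estimate, study_2006_estimate, private_car_change)
```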

TNS NIPO is about to launch a monitor on car sharing in the Netherlands.


Efforts to raise turnout in elections may increase turnout inequality

Just the other day I posted something about unequal voter turnout in Amsterdam (higher turnout in neoliberal-voting neighbourhoods; lower turnout in left-voting neighbourhoods). The conclusion would seem obvious: raise turnout, and election outcomes will likely become more representative of the preferences of Amsterdammers.

Now it turns out things may not be that simple. Based on a smart analysis (via), Ryan Enos, Anthony Fowler and Lynn Vavreck find that «get out the vote» efforts may raise turnout disproportionally among people who are more likely to vote in the first place, thus exacerbating turnout inequality.

This is not inconsequential, for these «high-propensity» citizens are far from representative of the general population. They are:

wealthier, more educated, more likely to attend church, more likely to be employed, more likely to approve of Bush, more conservative, and more Republican. They are more supportive of abortion rights and less supportive of withdrawing troops from Iraq, domestic spending, affirmative action, minimum wage, gay marriage, federal housing assistance, and taxes on wealthy families.

All in all, it seems that in many respects, people who are likely to vote lean to the right compared to the general population; and that this right-wing bias may be exacerbated by efforts to raise turnout.

This is pretty sobering, but it doesn’t mean that the whole idea of raising turnout should be thrown out of the window. First of all, Enos et al. point out that their method can be used to gain a better understanding of the impact of interventions. Hopefully this will help develop interventions that reduce inequality instead of increasing it.

Second, it appears that the experiments analysed by Enos et al. randomly assigned people to treatment or control groups (I checked this for the largest experiments - the ones done by Gerber, Green & Larimer and Nickerson & Rogers). Of course, this is good practice from a research point of view.

However, it might still make sense to do voter mobilisations that specifically target a group of unlikely voters (instead of a randomly selected treatment group). For example, one might target a neighbourhood that normally has very low turnout. If I understand the findings of Enos et al. correctly, it’s conceivable that this would increase turnout inequality within the targeted neighbourhood, while at the same time reducing turnout inequality across the entire city.

Then again, perhaps we should consider compulsory voting after all (I’ll admit I used to be pretty sceptical of that idea). In a previous study, one of the authors (Anthony Fowler) analysed the impact of the introduction of compulsory voting in Australia in the first half of the 20th century. «When near-universal turnout was achieved, elections and policy shifted in favor of the working-class citizens who had previously failed to participate.» (pdf)