Salonanarchist | Leunstoelactivist

Belkin quits. How loyal are sponsors of cycling teams?

Last year, Belkin became the title sponsor of the former Rabobank cycling team, but today it announced that it will end its sponsorship by the end of the year. Various commentators have expressed concern over the lack of continuity in sponsoring. Which raises the question: is it normal for a sponsor to quit after such a short period? And is this becoming worse?

Some sponsors leave after one or two years, while others remain loyal for ten years or more (Française des Jeux, Lampre, Lotto, Quick Step).

The graph above shows the sponsor turnover of UCI Pro Tour teams (the share of sponsors that would quit the subsequent year). Turnover is about 25%, which suggests that a normal sponsorship duration should be about four years. So Belkin’s loyalty is not impressive by those standards.

While the sponsorship duration fluctuates, there doesn’t appear to be a trend of sponsors becoming more or less loyal.


I retrieved sponsor names from team names of UCI Pro Tour teams listed by Cycling News. Due to variations in spelling (for example Française des Jeux and FDJ), the data needed some cleaning up. If you want to check the data: here’s a list of sponsors and the years in which I think they were active.
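A minimal sketch of how the turnover figure could be computed from a cleaned list of sponsors and the years they were active. The alias table, the function names and the input shape are assumptions for illustration, not the script behind the graph. The last function spells out the step from 25% turnover to four years: if a constant share p of sponsors quits each year, the mean duration is 1 / p.

```python
from collections import defaultdict

# Hypothetical alias table: maps spelling variants to one canonical name.
ALIASES = {"fdj": "Française des Jeux"}


def canonical(name):
    """Return the canonical sponsor name for a raw spelling."""
    cleaned = name.strip()
    return ALIASES.get(cleaned.lower(), cleaned)


def turnover_by_year(records):
    """records: iterable of (sponsor, year) pairs.

    Returns {year: share of that year's sponsors absent the following year}.
    The last year in the data is skipped because its successor is unknown.
    """
    sponsors = defaultdict(set)
    for name, year in records:
        sponsors[year].add(canonical(name))
    last = max(sponsors)
    result = {}
    for year in sorted(sponsors):
        if year == last:
            continue
        current = sponsors[year]
        quit_count = len(current - sponsors.get(year + 1, set()))
        result[year] = quit_count / len(current)
    return result


def expected_duration(turnover):
    """With a constant yearly quit rate p, the mean duration is 1 / p years."""
    return 1 / turnover
```

With a turnover of 0.25, expected_duration returns 4.0, which is the four years mentioned above.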


Cyclists should have priority here


Some crossings make you wonder: isn’t it weird that cyclists don’t have priority here? This occurs in Amsterdam, but more often in the countryside. There are different variants, but often there’s a bend in the cycle path just before a crossing. The cycle path is no longer part of the main road and cyclists are confronted with give way road markings. You have to give way to everybody: motorists coming from behind who turn right, oncoming traffic turning left and traffic from the right.

Often, you have to give way to rather secondary roads. For example, the exit to a tiny car park along the Oostvaardersdijk in Almere (photo above). Or the entrance of a government building at the Amsterdamseweg in Velsen-Zuid, where motorists who get priority subsequently have to stop at a gate anyway.

As a cyclist, you end up with a tricky crossing. You have to pay attention to traffic from behind, oncoming traffic and traffic from the right. The sense of insecurity mixes with indignation at the fact that apparently, people have specifically diverted the cycle path just to rob cyclists of their priority. Why are they doing this?

I put this question – in somewhat more neutral terms – to a number of road maintenance authorities, with illustrations from Velsen-Zuid, Watergang, Monnickendam, Weesp, Almere and Muiden. Their answers reveal that there are two reasons for bending cycle paths. First, this creates a space for motorists coming from the right where they can wait before entering or crossing the main road (this is a reason for bending the cycle path, but in itself not a reason to rob cyclists of their priority). Second, it’s about bicycle safety. In the words of the spokesperson of the Province of Noord-Holland:

For reasons of bicycle safety, we at the province often choose not to let cyclists have priority, especially outside the built-up area. It’s the same thing as with roundabouts: you may have priority as a cyclist, but whether you’ll be given priority is a different matter. And with roundabouts, it’s been shown that cyclists who have priority are more often involved in accidents, simply because they’re not given priority.

It’s good to know that the safety of cyclists is high on the agenda. But bending the cycle path and robbing cyclists of their priority – I’m not convinced that’s the right solution. In fact, it’s a bit twisted to reward motorists for not paying attention to cyclists who have priority. There have to be better ways to make them pay attention to cyclists and to slow them down.

As I said, such situations occur mainly in the countryside. You can point to situations in Amsterdam where cyclists should have priority, but mostly these don’t concern cycle paths along main roads that have been bent. However, there is a slightly similar situation opposite the entrance of the Westerpark.

The original Dutch version of this article appeared in the OEK (pdf). More examples here.

Map: How the fastfood workers’ fight just went global

In November 2012, fastfood workers in New York went on strike for decent wages. Since then, the fight has spread rapidly in the US and on 15 May, it went global. There were actions in cities like Dublin, Mumbai, São Paulo, Bandung, Kagoshima and many others. Security workers at Amsterdam Airport, who had just had their own action for real jobs, also showed their support.

The map above shows cities mentioned in tweets with the hashtag #FastFoodGlobal.


The map above doesn’t even do justice to the scope of the action. For one thing, many other hashtags were used besides #FastFoodGlobal (e.g., #fastfoodstrike, #fightfor15, #raisethewage, #lowpayisnotok, and, quite often actually, #ronaldmacdonald). Further, it only captures references in the Latin alphabet, and only the transcription used by Wikipedia.

I used the Twitter API to collect some 50,000 tweets with the hashtag #FastFoodGlobal. I checked the text of these tweets against a list of cities with a population of 100,000 and over. Of course, it’s impossible to identify cities with 100% accuracy. I removed cities like Van (Turkish city but also a word in Spanish and Dutch) and Hamburg (cf. hamburger) as well as cities mentioned fewer than 25 times. The map is based on a tutorial by D3 Tips and Tricks.
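The matching step could look roughly like the sketch below. It assumes the tweets are already available as plain strings and the city list as a list of names; the function name, the whole-word matching and the case-insensitive comparison are my assumptions here, not necessarily what the original script did.

```python
import re
from collections import Counter

# Names that collide with common words; the two named in the text.
AMBIGUOUS = {"Van", "Hamburg"}


def count_city_mentions(tweets, cities, min_mentions=25):
    """Count the tweets that mention each city as a whole word or phrase.

    Ambiguous names are skipped, and cities mentioned in fewer than
    min_mentions tweets are dropped from the result.
    """
    patterns = {
        city: re.compile(r"(?<!\w)" + re.escape(city) + r"(?!\w)", re.IGNORECASE)
        for city in cities
        if city not in AMBIGUOUS
    }
    counts = Counter()
    for text in tweets:
        for city, pattern in patterns.items():
            if pattern.search(text):
                counts[city] += 1
    return {city: n for city, n in counts.items() if n >= min_mentions}
```

Matching whole words keeps "Dubliners" from counting as Dublin, but it cannot tell a city from an identical ordinary word, which is why a manual exclusion list is still needed.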


Mountains and cycling culture: on winning jerseys in the Giro, Tour and Vuelta

Are there any characteristics that explain why some countries are more successful in pro cycling than others? An article at the Inner Ring blog discusses why Germany is «Europe’s Pro Cycling Black Hole», despite having some serious mountains and a vibrant cycling culture (as illustrated by the membership of the Bund Deutscher Radfahrer) - but note that the same author has also warned against simple explanations of why countries are successful. And in the UK, there has been some disappointment that successes in professional cycling haven’t led to more cycling in general.

So are mountains and cycling culture somehow related to success in professional cycling? Of course, there are different ways to answer that question. Here’s a look at some indicators, which suggest mountains - no, and cycling culture - maybe.

The graph below shows maximum elevation (to be more precise, the difference in elevation between the lowest and highest location on the country’s mainland) and the number of jerseys won in the Giro d’Italia, the Tour de France and the Vuelta a España over the past years.

There is only a weak and not statistically significant correlation between elevation span and the number of jerseys won. If you adjust for the size of the population, the relation is even negative, and still weak. Perhaps a different indicator for mountainousness would yield other results, but for now it appears that having mountains has little to do with success in the grand tours.
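For readers who want to try this on their own numbers, here is a minimal sketch of the two calculations involved: a correlation coefficient and an adjustment for population size. Using Pearson's r is an assumption on my part, the function names are made up for illustration, and the significance test is not included.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def per_million(counts, populations):
    """Jerseys per million inhabitants, to adjust for population size."""
    return [c / (p / 1_000_000) for c, p in zip(counts, populations)]
```

Calling pearson_r on the elevation spans and on the output of per_million gives the population-adjusted version of the comparison.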

Then how about cycling culture? The graph below shows two indicators on the x-axis: the share of trips made by bicycle in the country’s capital (modal share), and the relative number of bikes sold. The y-axis shows the relative number of jerseys won over the past years. According to these variables, cycling culture is not related to success in professional cycling (in fact, there’s a weak, not significant, negative correlation).

Another possible indicator of a cycling culture is the membership of cyclists’ organisations. The graph below is a bit geekier than the previous ones: the scales are logarithmic (for example, the y-axis goes from 0.1 to 1 to 10 to 100).

It turns out that there is in fact a correlation between the membership of cycling organisations and the number of jerseys won. Perhaps bicycle sales and modal share are indicators of everyday bicycle use whereas membership of cycling organisations also says something about recreational use, which in turn might be related to success in professional cycling – but that’s just guessing. Whether there’s a causal relation between the two is yet another question.

See also: Giro, Tour and Vuelta: Which countries won jerseys over the past 111 yrs.


The analysis is limited to the jerseys for the leaders of the general classification (the maglia rosa for the Giro d’Italia, the maillot jaune for the Tour de France and whatever colour the leader’s jersey had in the Vuelta a España that particular year). For each year and for each tour, for each rider who has won a jersey in that tour (regardless of how many days) a point was added to the country total of that rider’s country.
The D3 tooltip code is largely borrowed from D3 Tips and Tricks.
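The counting rule can be sketched as follows. The input format (one tuple per day in the leader's jersey) and the function name are assumptions for illustration.

```python
from collections import Counter


def jersey_points(leader_days):
    """leader_days: iterable of (year, tour, rider, country) tuples,
    one per day a rider wore the leader's jersey.

    Each distinct (year, tour, rider) combination adds one point to the
    rider's country, regardless of how many days the jersey was held.
    """
    seen = set()
    points = Counter()
    for year, tour, rider, country in leader_days:
        key = (year, tour, rider)
        if key not in seen:
            seen.add(key)
            points[country] += 1
    return points
```

A rider who leads the same tour for ten days therefore counts once, while a rider who leads two different tours in one year counts twice.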

Note that Wikipedia explains that the modal share (the share of trips made by bicycle) is not measured in a consistent way and something similar may well apply to data for membership of cyclists’ organisations.


Giro, Tour and Vuelta: which countries won jerseys over the past 111 yrs

The graph below shows which countries have been successful at winning jerseys in the Giro d’Italia, the Tour de France and the Vuelta a España.

The graph shows among other things how France has been struggling since the 1990s, how Belgium (Eddy Merckx) and the Netherlands (Joop Zoetemelk, Gerrie Knetemann) did well in the 1970s and the success of the UK in the 2010s (Bradley Wiggins, Chris Froome, Mark Cavendish). If you adjust for population size (not shown), Luxembourg and Belgium are the most successful countries.

See also: Mountains and cycling culture: On winning jerseys in the Giro, Tour and Vuelta.


The analysis is limited to the jerseys for the leaders of the general classification (the maglia rosa for the Giro d’Italia, the maillot jaune for the Tour de France and whatever colour the leader’s jersey had in the Vuelta a España that particular year). For each year and for each tour, for each rider who has won a jersey in that tour (regardless of how many days) a point was added to the country total of that rider’s country.
The D3 tooltip code is largely borrowed from D3 Tips and Tricks.


Spamming after all? Revisiting the repost ratios of Vox, Upshot and 538

Recently I wrote about people who share their URLs on Twitter, and then post them again, hoping to draw even more people to their site. I said that FiveThirtyEight reposts its URLs on average 0.3 times. I was wrong: it reposts its URLs far more often. And so do voxdotcom and UpshotNYT, who didn’t even make the top 5 in my original analysis. The Upshot reposts its URLs on average as many as 0.8 times.

The reason I underestimated the repost ratios in my original analysis has to do with the fact that tweets tend to contain shortened URLs. Two shortened URLs can look like different URLs even though they point to the same article, so one should be treated as a repost of the other (or perhaps both are a repost of yet another one, who knows). If you don’t take this into account and treat them as different URLs, you’ll underestimate the number of reposts (red bar in the graph).

It’s not that I wasn’t aware of this problem when I did the first analysis. I first tried to account for this by looking up the non-shortened URLs, using the Python urllib2 module. It turned out this was very time-consuming, which was a problem since I wanted to look up quite a few URLs. Pragmatically, I decided instead to use the ‘expanded URL’ provided by the Twitter API. This method does yield higher repost ratios for 538 and the Upshot (grey bars in the graph). Still, it doesn’t really solve the problem, because the expanded URL provided by the Twitter API will sometimes be yet another shortened URL. That’s the reason I still underestimated how often people recycle their content on Twitter.

When I realised the ratios I had originally calculated were still rather low given how many reposts there appeared to be in my timeline, I decided to recalculate repost ratios using urllib2 after all. Because this method is so time-consuming, I did this for just three accounts: Vox, 538 and Upshot NYT. This resulted in repost ratios that are substantially higher (light blue bars in the graph). The new Python script is here.
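For illustration, here is a rough Python 3 sketch of the lookup step, using urllib.request (the successor of urllib2). It is not the script linked above; the function names are made up, and the use of HEAD requests is an assumption that some servers may not honour.

```python
import urllib.request
from collections import Counter


def resolve(url, timeout=10):
    """Follow redirects and return the final URL.

    Falls back to the input URL if the lookup fails.
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.geturl()
    except (OSError, ValueError):
        return url


def count_by_article(urls, resolver=resolve):
    """Count how often each article was tweeted, after resolving every URL.

    Each distinct URL is looked up only once, since lookups are slow.
    """
    cache = {}
    for url in urls:
        if url not in cache:
            cache[url] = resolver(url)
    return Counter(cache[url] for url in urls)
```

Passing the resolver as a parameter makes it possible to compare the results with and without lookups, for example by calling count_by_article(urls, resolver=lambda u: u).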

Note that the ratios are snapshots calculated on a sample of the 200 most recent tweets (that is, about one to two weeks of tweets).


Rise in Dutch cycling accidents, but Strava probably not to blame

The number of wielrenners (cyclists on racing bikes) treated at Dutch emergency departments has doubled since 2010, according to a study published today. Among a range of possible explanations the authors mention the popularity of apps like Strava:

The increasing popularity of smartphone apps like Strava, which let you keep track of cycling records for certain tracks and compare them with others, can lead to dangerous situations.

Like I said, this is just one of many possible explanations discussed in the report and the authors are by no means suggesting that Strava is a key factor causing cycling accidents. That said, the idea that Strava may have played a role doesn’t seem to be a priori absurd.

Strava was launched in 2009, but when did it become popular in the Netherlands? I couldn’t find any direct data on this, but Google Trends is a plausible indicator.

The Google data are pretty clear: interest in Strava didn’t take off until February 2012 in the Netherlands (interestingly, the search volume index is highest in Limburg and Gelderland, which are also the main regions with hills in the Netherlands). As an extra check, I looked at messages on the forum pages (you need to log in to be able to search the forum) containing the search term ‘strava’. There were 10 messages prior to 1 February 2012 and 1,843 after that date, which seems to confirm the Google pattern.

By contrast, the number of wielrenners at emergency departments saw its biggest increase between 2010 and 2011. The number was stable at about 2,000 prior to 2011, but rose to 3,700 in 2011 and 4,200 in 2012. So it seems Strava was largely unknown in the Netherlands at the time when the largest increase in cycling accidents happened.

The reason for the study was a media storm last year about supposedly irresponsible behaviour of wielrenners towards ‘normal’ cyclists. Car lobby club ANWB even suggested wielrenners should stay at home on sunny days.

In a survey among wielrenners, 45% said wielrenners do not sufficiently adjust their speed and 51% said wielrenners often ride in (too) wide groups. An analysis of 2,849 injury-causing accidents involving two cyclists revealed that in 24 cases (under 1%) a ‘normal’ cyclist got injured as a result of a collision with a wielrenner. So while many wielrenners agree that (some) wielrenners behave irresponsibly, this doesn’t seem to be a major cause of injuries among other cyclists.

Wielrenners themselves have about 2.2 injuries per 100,000 hours of activity. This is much lower than the number for all sports combined (7.1). However, 23% of wielrenners who go to the emergency department have to be treated in hospital, compared to 6% for all sports. So in terms of serious injuries, wielrennen doesn’t seem to be much safer or less safe than other sports.
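The arithmetic behind that conclusion, as a small sketch. It assumes that the injury rates refer to injuries treated at emergency departments, so that the hospitalisation shares can be applied to them directly; the report may define things differently.

```python
def hospital_rate(injuries_per_100k_hours, share_hospitalised):
    """Injuries needing hospital treatment per 100,000 hours of activity."""
    return injuries_per_100k_hours * share_hospitalised


# Figures quoted in the text above.
wielrennen = hospital_rate(2.2, 0.23)  # roughly 0.5 per 100,000 hours
all_sports = hospital_rate(7.1, 0.06)  # roughly 0.4 per 100,000 hours
```

Under that assumption the two rates end up close to each other, even though the overall injury rates differ by a factor of three.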

While it’s difficult to pinpoint the exact cause of the rise in accidents involving wielrenners, the authors of the report suggest the capacity of cycle paths is no longer sufficient given the rising number of cyclists, including a rise in cycling among people above 55. One of their recommendations is to create more ‘cycling highways’ for fast cyclists.


Identify potential spammers in your timeline, using Python

(Also see follow-up article here) - Twitter has become an important tool to let people know you’ve published a new article on your website. It has been suggested that you can get more visitors if you tweet the article’s URL not once, but multiple times. Unfortunately, some people are following that advice and are systematically reposting URLs.

So who are those people? Identifying the biggest reposters in your timeline is quite straightforward (whether these people are spamming is up to you to decide). Here’s a script that calculates the repost ratio, that is the average number of times people repost URLs, for each person you follow. For people who post URLs only once - in other words, who never repost them - the ratio will be zero. Here are the biggest reposters among the accounts followed by Data and Data Viz:

DataDrivenJournalism reposts URLs on average 0.36 times
FiveThirtyEight, 0.30
HelpMeViz, 0.30
Jon Schwabish, 0.23
Archie Tse, 0.20
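To make the definition concrete, here is a stripped-down sketch of the calculation (not the linked script itself). It assumes the tweets have already been reduced to pairs of account name and URL; the function name is made up.

```python
from collections import Counter, defaultdict


def repost_ratios(tweets):
    """tweets: iterable of (account, url) pairs, one per tweeted URL.

    Returns {account: average number of reposts per distinct URL}.
    A URL posted n times counts as n - 1 reposts.
    """
    urls_by_account = defaultdict(Counter)
    for account, url in tweets:
        urls_by_account[account][url] += 1
    return {
        account: sum(n - 1 for n in counts.values()) / len(counts)
        for account, counts in urls_by_account.items()
    }
```

An account that tweets two URLs once each and a third URL twice has one repost over three distinct URLs, so a ratio of about 0.33.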

To be fair, the numbers show that these people only repost some URLs. Further, people who do not normally repost URLs may still end up with a relatively high repost ratio if there are one or two URLs that they have reposted very often: these outliers would drive up their average number of reposts. Here are some potential outliers:

Cole Nussbaumer linked to this page with workshop dates 21 times in her 200 most recent tweets
WTFViz: WTFViz submit page, 9 times
Zack Beatty: tool, 8 times
HelpMeViz: Help Me Viz homepage, 8 times

These examples illustrate that there may be legitimate reasons to repost URLs. For example, Cole Nussbaumer’s page with workshop dates probably changes frequently, so reposting that URL would seem to make sense.

If you don’t want these often-posted URLs to drive up the repost ratio, you can calculate the repost ratio as the share of URLs that got reposted at least once. That way, you’ll disregard how often they got reposted. Here are the top 5 results by that method:

FiveThirtyEight now has a repost ratio of 0.27, which means it reposts about 1 in 4 URLs
DataDrivenJournalism, 0.23
HelpMeViz, 0.17
NPR visuals team, 0.16
Jon Schwabish, 0.14
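The alternative measure is simple to compute. A minimal sketch, assuming the input is the list of URLs tweeted by one account (the function name is made up):

```python
from collections import Counter


def share_reposted(urls):
    """Share of distinct URLs that were posted more than once."""
    counts = Counter(urls)
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n > 1) / len(counts)
```

For an account that posts one URL 21 times and three other URLs once each, this returns 0.25, whereas the average number of reposts would be 20 / 4 = 5. That is the outlier effect described above.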

In case you’re wondering: my own repost ratio is 0.06 / 0.05.

Not ditching R for Python just yet

As a result of the whole controversy over using Python vs R for statistical analysis and graphs, I thought I’d switch to Python. Mostly because I think it’s more practical to use the same language for different tasks, but also because it seems easier to make decent-looking graphs with Python (I’m sure some people will thoroughly disagree). And, of course, because googling for solutions using «Python» as a search term simply works better than searching for «R».

But now Brian Caffo, Roger Peng and Jeff Leek’s Data Science Specialization Course has started on Coursera and they use R. I guess I’ll have to postpone my decision.


Big Brother: state or capitalist

George Orwell’s Nineteen Eighty-Four describes a future characterized by total surveillance (with telescreens observing people in their own homes, even monitoring their heartbeat and recognizing their facial expression). This surveillance is carried out by the state and its helpers. Corporations play no role in it.

In fact, corporations and capitalism are a thing of the past in Nineteen Eighty-Four, for private property has been abolished. A children’s book explains that capitalists were rich, ugly men wearing top hats. The Party constantly emphasizes how terrible conditions were before the Revolution and how much better they are today. But the main character, Winston Smith, can’t help but wonder if things had been really that bad in the past and if capitalists had really been such terrible creatures.

The suggestion is clear: the state is using capitalists as a scapegoat to mask its own failings (in fact, if I were a member of today’s whining one percent, I’d claim that Orwell had predicted the current «rising tide of hatred of the successful one percent»).

Today, thirty years after 1984, private property hasn’t been abolished, but we are approaching a level of surveillance pretty close to what Orwell described. When we try to explain what’s going on, we frequently use the term Big Brother. But when we do, are we referring to the state, as Orwell did, or do we have capitalists in mind?

To explore this matter, I looked up how often newspaper articles mention Big Brother in combination with either the names of government agencies, or the names Google and Facebook (of course I should have included Apple, notwithstanding their smart privacy patent, but I left them out for practical reasons explained below). The results are shown in the graph below. For the non-Dutch: NRC is a Dutch newspaper and AIVD is the Dutch intelligence service.

It appears that Google and Facebook turn up in combination with Big Brother far more often than government agencies like the CIA, MI5 or AIVD. However, as the red bars show, this has changed since the revelations of Edward Snowden. Since May last year, the NSA has been mentioned in combination with Big Brother more often than Google or Facebook (in the Guardian, the same applies to GCHQ).

So Orwell didn’t foresee the role of corporations in mass surveillance, and we used to have a blind spot for the role of the state - but Snowden seems to have fixed that.


I used the Guardian and New York Times APIs to look up how often names of selected state agencies and corporations have appeared in combination with Big Brother in articles over the past ten years. I removed the results from the Guardian media section to get rid of most references to the Big Brother TV show. I wanted to include Apple, but unfortunately, the newspaper APIs don’t distinguish between apple and Apple. I thought searching for iPhone might be a practical solution, but the Guardian results included articles containing ‘I phone’. The NRC doesn’t have an API so I looked up the terms manually; the timeline to the right of the search results makes it quite easy to count the number of post-Snowden occurrences. In all cases, the method to search the newspaper archives is imperfect in that it yields some unwanted results (e.g. articles mentioning somebody’s big brother which have nothing to do with Big Brother).
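As an illustration of the Guardian part, here is a rough sketch of how such a query could be built and read. It is not the script used for the graph. The function names are made up, and the exact query syntax is an assumption to verify against the Guardian Open Platform documentation, in particular the AND operator in the search text and the leading minus sign to exclude a section. In this sketch the media section is excluded in the query itself rather than filtered out afterwards.

```python
import json
import urllib.parse
import urllib.request

GUARDIAN_SEARCH = "https://content.guardianapis.com/search"


def build_query_url(name, api_key, from_date=None, exclude_section="media"):
    """Build a search URL for articles mentioning "Big Brother" and name."""
    params = {
        "q": '"Big Brother" AND "{}"'.format(name),
        "api-key": api_key,
        "page-size": 1,  # only the total is needed, not the articles
    }
    if from_date:
        params["from-date"] = from_date  # ISO date, YYYY-MM-DD
    if exclude_section:
        # Assumed syntax: a leading minus excludes the section.
        params["section"] = "-" + exclude_section
    return GUARDIAN_SEARCH + "?" + urllib.parse.urlencode(params)


def total_hits(payload):
    """Extract the number of matching articles from a decoded response."""
    return payload["response"]["total"]


def count_mentions(name, api_key, **kwargs):
    """Run the query; needs network access and a valid API key."""
    url = build_query_url(name, api_key, **kwargs)
    with urllib.request.urlopen(url) as response:
        return total_hits(json.load(response))
```

Running count_mentions once without from_date and once with a date after the Snowden revelations would give the two numbers needed for each bar.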