Salonanarchist | Leunstoelactivist

Chart junk

Twitter has fallen in love with a new study on data visualization. Not surprisingly, for Michelle Borkin and her co-authors promise to throw some light on the great controversy of this field: for or against chart junk.

So what’s the controversy about? On the one hand, there are those who think it’s OK to add non-functional embellishments to graphs, because this may make them more engaging and memorable. Usually, Nigel Holmes is quoted as a proponent of this view and this graph is often offered as an illustration.

On the other hand, there are those who dismiss such embellishments as chart junk, which distracts from the content of the graph. The main name here is Edward Tufte, who argues for a high «data-to-ink ratio». The foremost example is his minimalistic but effective slope graph.

Both sides may have a point, but my sympathy lies with Tufte. I’ll admit that some of Holmes’ infographics are actually quite funny, but many embellished graphs you’ll find in the media (Dutch examples here and here) are just silly.

Of course, that’s a matter of taste, but what’s the scientific verdict? Borkin et al. had subjects look at visualizations for a second and tested whether they would recognize them when the same image was shown again. They found that:

visualizations with low data-to-ink ratios and high visual densities (i.e., more chart junk and ‘clutter’) were more memorable than minimal, ‘clean’ visualizations.

So does this settle the matter? Not quite. Borkin and her co-authors say that their findings are just a «first step to understanding how to create effective data presentations». Stephen Few, a well-known critic of chart junk, goes one step further and calls their study «useless» for that purpose. His main point is that the subjects got to look at the examples for just one second:

Visualizations cannot be read and understood in a second. Flashing a graph in front of someone’s eyes for a second tells us nothing useful about the graphical communication, with one possible exception: the ability to grab attention.

I’ll have to agree with Few: Borkin et al. may have demonstrated that chart junk is effective at grabbing someone’s attention, but not that it’s effective at helping people understand data. Apart from that, I maintain that embellished visualizations may sometimes be fun, but will often be silly and/or pretentious.

The paper OEK

OEK
The more than 4,500 Amsterdam members of the Fietsersbond (the Dutch Cyclists’ Union) receive the members’ magazine OEK in their letterbox three times a year (delivered by volunteers, for which many thanks). The union wonders:

whether there are now more people who would actually be perfectly happy to read the OEK only digitally from now on and no longer need to receive the paper version […] You can then be removed from the distribution list. If you are a fervent supporter of the paper OEK, you may let us know as well.

Well, here goes. I read as much as possible digitally - books, newspapers, reports. Much more practical. But for the OEK I gladly make an exception. You can tell the magazine has been put together with enthusiasm. Real paper, not the glossy kind. Quite a lot of text per page, but without becoming unreadable. A pleasant magazine to leaf through and read.

Of course, that’s not just down to its looks, but also to its content. The latest issue, for example, has a good analysis of the nonsense of awareness campaigns, such as the smiley signs («Wacht op groen!», meaning «Wait for green!») that hung at traffic lights for a while. An inventory of bicycle bell caps driven into the asphalt at Leidseplein. And much more (pdf).


Script to look up the gender of Dutch first names


This script determines the gender of Dutch persons by looking up their first name in a database of the Meertens Institute. The database indicates how often the name occurred as a first name for men and women in 2010. If the name is used for women substantially more often than for men, the name will be interpreted as female – and vice versa.
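The decision rule described above can be sketched roughly as follows. This is not the original script: the function name, the threshold of 5 (my stand-in for «substantially more often») and the example counts are all assumptions, and the real counts would come from the Meertens database.

```javascript
// Assumed threshold: a name must occur at least this many times more often
// for one gender than for the other. The text only says "substantially".
const MIN_RATIO = 5;

/**
 * Classify a first name from how often it occurs for men and for women.
 * Returns "male", "female" or "unknown".
 */
function classifyGender(maleCount, femaleCount, minRatio = MIN_RATIO) {
  if (maleCount === 0 && femaleCount === 0) return "unknown"; // name not found
  if (femaleCount >= minRatio * maleCount) return "female";
  if (maleCount >= minRatio * femaleCount) return "male";
  return "unknown"; // used too evenly for both genders to decide
}

// Made-up counts, for illustration only (not Meertens data).
console.log(classifyGender(12, 480)); // "female"
console.log(classifyGender(900, 3));  // "male"
console.log(classifyGender(40, 55));  // "unknown"
```

Returning "unknown" for ambiguous names is one way to end up with a group of participants «whose gender could not be determined», rather than forcing a guess.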

The reason I wrote the script has to do with this article on how the performance of women professional road cyclists is improving. I wanted to check whether a similar trend is going on among amateur riders, more specifically, participants in the Gerrie Knetemann Classic (incidentally, the script would take Knetemann for a woman – it’s not foolproof). The results of the ride are available online, but pre-2012 editions lack information on the gender of participants. So that’s what the script was for.

Speed of participants in Knetemann Classic

The results of the analysis aren’t exactly clear-cut. The number of women participants in the 150km ride varied from 36 to 46, or 5 to 8% of the participants whose gender could be determined (the percentage for 2013 was 6%). The (median) speed of women participants rose in 2013, and more so than for men, but this is rather thin evidence to speak of a trend.

Slovenians seem to buy more bicycles than the Dutch or even Danes

Mona Chalabi of the Guardian has collected data on car and bicycle sales and concludes that bicycle sales not only outnumber car sales, but that the gap has widened. The title of the article suggests the recession might play a role, but this article by Fabian Küster of the European Cyclists’ Federation - who uses the same sources - suggests it’s «an idle hope to believe that as soon as Europe’s economy recovers, car sales will go up again to pre-crisis levels».

If you look at bicycle sales per 1,000 population, it turns out the Slovenians are Europe’s most enthusiastic bicycle buyers (that’s assuming the bicycle sale data for Slovenia are correct - this article quotes a lower number but gives no source). If you look at the bicycle sales to car sales ratio the picture changes considerably - likely because fewer cars are sold in poorer countries.
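The two measures compared here, sales per 1,000 population and the bicycle-to-car sales ratio, are simple to compute. A minimal sketch, with hypothetical function names and made-up numbers (not the Guardian or ECF data):

```javascript
// Units sold per 1,000 inhabitants.
function salesPer1000(unitsSold, population) {
  return unitsSold * 1000 / population;
}

// How many bicycles are sold for every car sold.
function bikeToCarRatio(bikesSold, carsSold) {
  return bikesSold / carsSold;
}

// Made-up country, for illustration only.
const country = { population: 2000000, bikesSold: 200000, carsSold: 50000 };

console.log(salesPer1000(country.bikesSold, country.population)); // 100
console.log(bikeToCarRatio(country.bikesSold, country.carsSold)); // 4
```

The ratio rises when bicycle sales go up, but also when car sales are low, which is why a ranking by ratio can look quite different from a ranking by sales per 1,000 population.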

Embed code for the graph (the graph probably doesn’t work in older versions of IE):

<iframe src = "http://www.dirkmjk.nl//2013/bikeSales/bikeSales2.html" frameborder=0 width = 510 height=610 scrolling='no'></iframe>


«Tweet this» link (using jQuery)

Creating a «tweet this» link with jQuery turns out to be quite simple - that is, once you know how it works...

<span class='tweetThis'></span>

<script> jQuery(".tweetThis").append("<a href=\'https://twitter.com/intent/tweet?text="+jQuery('h1')[0].innerHTML+"&url=http%3A%2F%2Fdirkmjk.nl"+location.pathname+"\'>Tweet this<\/a>") </script>

Update - Apparently the code messes up the URL if you have punctuation in it (the original URL of this article - automatically generated from the title - had the quotation marks in it)
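One likely way around the punctuation problem is to percent-encode both the title and the page address with the standard encodeURIComponent function before putting them in the link. A minimal sketch; the helper name buildTweetUrl and the example path are my own, and I haven’t tested this against the original page:

```javascript
/**
 * Build a Twitter intent link. Both values are percent-encoded so that
 * quotation marks, spaces and other punctuation cannot break the link.
 */
function buildTweetUrl(title, pageUrl) {
  return "https://twitter.com/intent/tweet" +
    "?text=" + encodeURIComponent(title) +
    "&url=" + encodeURIComponent(pageUrl);
}

// Hypothetical example path, for illustration only.
console.log(buildTweetUrl("«Tweet this» link", "http://dirkmjk.nl/some/page"));

// In the page itself this could be combined with jQuery, for instance:
// jQuery(".tweetThis").append(
//   jQuery("<a>").attr("href", buildTweetUrl(jQuery("h1").first().text(), location.href))
//                .text("Tweet this"));
```

Using .text() rather than innerHTML for the title avoids passing HTML entities or markup from the heading into the tweet text.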


Cycling: Garmin altimeter compared to elevation databases

During a very rainy ride in Scotland, my Garmin altimeter appeared to be off: on some of the steepest climbs it failed to register any gradient. Afterwards, I tried the «elevation correction» feature on the Garmin website, which generously added over 750m to the total ascent the device had measured. This was certainly more satisfying, but it left me wondering. Can the weather affect the Garmin altimeter? And how accurate is the recalculated ascent?

Garmin’s recalculation service basically works by looking up the GPS locations of your ride in an elevation database. Strava offers a similar service. Below, I analyse the Garmin and Strava recalculations for a number of rides. Note that this is only an exploratory analysis and that no firm conclusions can be drawn on the basis of this rather small set of observations. That said, here are some preliminary conclusions:

  • If you want to boost your ego, let Garmin recalculate your ascent: chances are it will add (quite) a few metres. Strava’s recalculations tend to stay closer to the original measurement. When it does make changes, it frequently lowers the number of metres you’re supposed to have climbed, especially on relatively flat rides.
  • In theory, you’d expect weather changes to affect the ascent measured by the device, because the altimeter is basically a barometer. In practice, weather changes don’t seem to have much effect on the altimeter.
  • It appears plausible that heavy rain does in fact mess with the altimeter.

In the graphs below, the colour of the dots represents the region of the ride. Red dots represent the Ronde Hoep, a flat ride to the south of Amsterdam. Blue ones represent the Kopje van Bloemendaal (north, south), the closest thing to a climb near Amsterdam (it’s not high but quite steep). Green dots represent the central area of the country and include the Utrechtse Heuvelrug, Veluwezoom, Rijk van Nijmegen and Kreis Kleve (the latter in Germany).

General

By default, the graph above shows how much the Garmin recalculation differs from the ascent measured by the device (graphs may not show in older versions of Internet Explorer). The closer a dot is to the dashed line, the closer the recalculated ascent is to the original measurement.

For rides shown on the left part of the graph, where the device measured less than 500m ascent, Garmin’s recalculation often adds about 50 to 100% or more. With higher ascents, the recalculated ascent is closer to the original measurement, although it still tends to add about 30 to 50%. The highest dot to the far right of the graph is the rainy ride in Scotland; here Garmin’s recalculation added over 35%.

With the selector above the graph, you can select the Strava recalculation. You’ll notice the scale on the y axis changes (and the dashed line moves up). Also, a few red dots enter the graph. These are rides along the Ronde Hoep, which is a flat ride. For these rides, Garmin’s recalculation added up to 750% to the ascent measured by the device; therefore these dots were initially outside the graph area.

The Strava recalculations are similar to the Garmin ones in that the correction is larger for relatively flat rides. Unlike Garmin, Strava lowers the ascent in these cases, often by 15 to 50%. For rides where the device measured a total ascent of over 500m, the Strava recalculation tends to be pretty close to the original measurement.
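The percentages in this section express the recalculated ascent relative to the ascent the device measured. A minimal sketch of that calculation; the function name and the example rides are made up for illustration:

```javascript
/**
 * Percentage by which a recalculated ascent differs from the measured one.
 * Positive: the recalculation added metres; negative: it removed metres.
 */
function percentChange(measuredMetres, recalculatedMetres) {
  return (recalculatedMetres - measuredMetres) * 100 / measuredMetres;
}

// Made-up rides, for illustration only.
console.log(percentChange(1000, 1350)); // 35   (a hilly ride, metres added)
console.log(percentChange(200, 150));   // -25  (a flattish ride, metres removed)
console.log(percentChange(100, 850));   // 750  (a flat ride, large relative change)
```

Because the measured ascent is in the denominator, a correction of a few dozen metres on a flat ride shows up as a very large percentage.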

Weather changes

It has been suggested that changes in the weather may affect elevation measurements. This makes sense, since the Garmin altimeter is in fact a barometer. Wikipedia says that pressure decreases by about 1.2 kPa for every 100 metres in ascent. In other words, if atmospheric pressure rose by a net 6 mbar during a ride, this would cause the device to underestimate total ascent by about 50 metres, so the theoretical effect wouldn’t seem to be huge.
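The arithmetic behind that estimate, as a small sketch. It assumes the linear rule of thumb quoted above (1.2 kPa per 100 metres, which equals 12 mbar) holds over the whole range; the function name is my own:

```javascript
const KPA_PER_100_M = 1.2; // rule of thumb quoted from Wikipedia in the text
const MBAR_PER_KPA = 10;   // unit conversion: 1 kPa = 10 mbar

/**
 * Metres of ascent a barometric altimeter would miss if air pressure
 * rose by the given net amount during the ride.
 */
function ascentUnderestimateMetres(pressureRiseMbar) {
  const mbarPer100m = KPA_PER_100_M * MBAR_PER_KPA; // 12 mbar per 100 m
  return pressureRiseMbar / mbarPer100m * 100;
}

console.log(Math.round(ascentUnderestimateMetres(6))); // 50
```

A pressure drop would work the other way round: a negative input gives a negative result, meaning the device would overestimate the ascent.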

The graph above shows how much recalculations differed from the original measurement, with change in pressure on the x axis. Note that the effect of recalculations is shown here in metres, not percent. I tried different combinations of pressure measures and recalculations and in only one case - the Garmin recalculation shown above - was the correlation statistically significant (and the regression line much steeper than the Wikipedia data would suggest), so this is not exactly firm evidence for an effect of weather change on elevation measurement.

Heavy rain

It has been suggested that heavy rain may block the sensor hole and thus affect elevation measurement. This may sound a bit weird, but I have seen the device stop registering any ascent during very heavy rain. Among the rides considered here, there are two that saw really heavy rainfall (the Scottish ride and a ride in Utrechtse Heuvelrug on 27 July). These do show some of the largest corrections, especially in the Strava recalculation. So it does seem plausible that rain does in fact affect elevation measurement.

In the spirit of true pseudoscientific enquiry, I tried to replicate the effect of heavy rain by squirting water from my bidon onto the device during a ride in Utrechtse Heuvelrug. This didn’t yield straightforward results. At first, the device registered implausibly steep gradients and it turned out it had interpreted the hump between Maarn and Doorn as 115m high, more than twice its real height. About halfway, unpredicted rain started to fall, mocking my experiment. The Strava recalculation didn’t change the total ascent much, but it did correct the height of the bit between Maarn and Doorn, so it must have added some 50+ metres elsewhere. Be that as it may, the «experiment» does seem to confirm that water can do things to the altimeter.

Method

I took total ascent data measured by my Garmin Edge 800 and obtained a recalculation from the Garmin Connect and Strava websites. Subsequently, I looked up weather data from Weather Underground (as an armchair activist I do appreciate their slightly subversive name). Weather Underground offers historical weather data by location, with numerous observations per day. I wrote a Python script that looks up the data for the day and location of the ride and then selects the observations that roughly overlap with the duration of the ride. There turned out to be two limitations to the data. First, it appears that only data at the national level are available (the Scottish ride yielded data for London and all Dutch ones data for Amsterdam). Second, for the day / location combinations I tried there was no time-specific data for precipitation available, only for the entire day.

Because of these limitations, I also took an alternative approach, looking up data from the Royal Netherlands Meteorological Institute KNMI. This did yield more fine-grained data, although obviously limited to the Netherlands. In the end it turned out that it didn’t make much difference for the analysis whether KNMI or Weather Underground data is used. Code from the scripts I used for looking up weather data is here.
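The step of selecting the weather observations that roughly overlap with a ride could look something like the sketch below. This is not the original Python script: the field names, the 30-minute margin and the idea of taking the pressure change as last minus first observation are all assumptions, and the observations are made up.

```javascript
/**
 * Keep the observations taken during the ride, allowing some margin
 * before the start and after the finish.
 */
function observationsDuringRide(observations, rideStart, rideEnd, marginMinutes = 30) {
  const marginMs = marginMinutes * 60 * 1000;
  const from = rideStart.getTime() - marginMs;
  const to = rideEnd.getTime() + marginMs;
  return observations.filter((o) => {
    const t = o.time.getTime();
    return t >= from && t <= to;
  });
}

/** Net pressure change (mbar) between the first and last selected observation. */
function netPressureChange(selected) {
  if (selected.length < 2) return null; // not enough data to compute a change
  const sorted = [...selected].sort((a, b) => a.time - b.time);
  return sorted[sorted.length - 1].pressureMbar - sorted[0].pressureMbar;
}

// Made-up observations, for illustration only.
const observations = [
  { time: new Date("2013-07-27T08:00:00Z"), pressureMbar: 1012 },
  { time: new Date("2013-07-27T10:00:00Z"), pressureMbar: 1013 },
  { time: new Date("2013-07-27T12:00:00Z"), pressureMbar: 1015 },
  { time: new Date("2013-07-27T16:00:00Z"), pressureMbar: 1018 },
];
const during = observationsDuringRide(
  observations,
  new Date("2013-07-27T09:45:00Z"),
  new Date("2013-07-27T12:15:00Z")
);
console.log(during.length);             // 2
console.log(netPressureChange(during)); // 2
```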

I tested quite a few correlations so a couple of ‘false positives’ may be expected. I didn’t statistically correct for this. Instead, I took a rather pragmatic approach: I’m cautious when there’s simply a significant correlation between two phenomena but I’m more confident when there’s a pattern to the correlations (e.g., Garmin and Strava recalculations are correlated in a similar way to another variable).

The spread of the fast food strikes in the USA

[Updated 6 December 2013] - On 29 November last year, 200 workers in fast food restaurants in New York went on strike to demand decent wages. What seemed exceptional at the time has only grown since, culminating in a national day of fast food strikes in over 100 cities last week.

Their demands are justified, the NYT noted: “we’re talking about big, profitable companies, which are big and profitable in part because they rely on underpaid labour”. You can support these workers by telling fast food chains like McDonald’s and Burger King that low pay is not ok.

Embed code for the map (may not display in older versions of Internet Explorer):

<iframe src="http://www.dirkmjk.nl/2013/fastfood/fastFoodStrikeMap.html" frameborder=0 width=510 height=380 scrolling='no'></iframe>

Method

Data on strikes was collected from various sources and may be incomplete. I used d3.js to draw the map and setTimeout to time the transitions. For some reason I couldn’t get this to work with a for-loop: either the latest transition terminated the previous ones, or all transitions used the last value of i. So I hard-coded each step of the iteration.
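The «all transitions use the last value of i» symptom is typical of callbacks created in a for-loop that share a single var i: by the time the timers fire, the loop has finished. One common way around it is to give every callback its own binding and its own delay, for instance with forEach. A minimal sketch; the function name, the step labels and the timings are made up, and it only deals with the timing, not with how d3 transitions interact:

```javascript
/**
 * Schedule one callback per step, each with its own delay.
 * `timer` defaults to setTimeout; it is a parameter only to simplify testing.
 */
function scheduleSteps(steps, stepMs, onStep, timer = setTimeout) {
  steps.forEach((step, i) => {
    // `step` and `i` are fresh for every iteration, so each callback keeps
    // its own values instead of seeing the final value of a shared counter.
    timer(() => onStep(step, i), i * stepMs);
  });
}

// Hypothetical usage: run three steps, 10 ms apart.
scheduleSteps(["first", "second", "third"], 10, (step, i) => {
  console.log("step " + i + ": " + step);
});
```

In the map, onStep would be the place to start the d3 transition for that step, with stepMs at least as long as a transition so that consecutive steps don’t overlap.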


American norms require more space for car than for bedroom. How about Amsterdam?

«Odds are, your bedroom is smaller than your car’s: your city nearly requires it to be», this infographic explains (via Herbert Tiemens). Perhaps so in the US or Canada - but how about Amsterdam?

  • According to national regulations (see p.152), a residence area - which may coincide with or comprise a bedroom - must be at least 5 m2. This is not to say bedrooms are usually that size; it’s a minimum. For example, many bedrooms in the new Oostpoort project in Amsterdam Oost are 9.5 m2, 11 m2 or larger (these examples regard houses that are not in the ‘affordable’ category).
  • For new houses, the Amsterdam West district requires 0.6 parking spaces per house if it’s affordable housing and 0.8 per house in other cases (where, as the Zuid district puts it, «it may be expected that [households] have a median or high income and own a car»). Parking norms in some other districts are higher, e.g. 0.6/1.1 for the aforementioned Oostpoort project; 0.7/1.0 in Zuid and 1.0/1.3 in Nieuw-West.
  • For guidelines regarding the size of parking spaces, the Amsterdam municipality refers to expertise platform CROW. CROW advises 2.5 x 5 metres or 2.0 x 6 to 7 metres depending on the type of parking space; that amounts to a surface area of between 12 and 14 m2.

This implies that the required space for car parking may vary from 7.2 to 18.2 m2 per new house. Or 1.4 to 3.6 times the minimum size of a bedroom. Of course, most bedrooms will be larger than the minimum 5 m2. That said, your bedroom may very well be smaller than the required car space.
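The range follows from multiplying the lowest parking norm by the smallest parking space and the highest norm by the largest one. A quick check of the arithmetic, using the figures from the list above (the function and constant names are my own):

```javascript
const MIN_BEDROOM_M2 = 5; // national minimum for a residence area

// Required parking area per new house, in square metres.
function requiredParkingM2(spacesPerHouse, spaceM2) {
  return spacesPerHouse * spaceM2;
}

const lowest = requiredParkingM2(0.6, 12);  // lowest norm, smallest space
const highest = requiredParkingM2(1.3, 14); // Nieuw-West norm, largest space

console.log(lowest.toFixed(1), highest.toFixed(1)); // 7.2 18.2
console.log((lowest / MIN_BEDROOM_M2).toFixed(1),
            (highest / MIN_BEDROOM_M2).toFixed(1)); // 1.4 3.6
```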


Cycling: are women catching up with men?


Photo by DAVID ILIFF. License: CC-BY-SA 3.0 / via Wikimedia

In a petition already signed by 88,000 people, riders including Emma Pooley and Marianne Vos ask for women to be allowed to participate in the Tour de France and other cycling events:

We seek not to race against the men, but to have our own professional field running in conjunction with the men’s event, at the same time, over the same distances, on the same days, with modifications in start/finish times so neither gender’s race interferes with the other.

Among other things, they want to «debunk the myths of physical ‘limitations’ placed upon female athletes». So how about those limitations? In an opinion article in NRC Handelsblad, Sanne van Oosten of WOMEN Inc. argues that the world hour record for men (49.7 km) is only slightly higher than for women (46.1 km). And Guardian cycling columnist William Fotheringham observes:

Over the years there has been a convergence between the distances men and women race, as men’s professional races are becoming progressively shorter, and women’s gradually longer.

He doesn’t specify which races this applies to. The distances of the UCI world championships haven’t changed much, at least not since 2004. Below are the distances of the Olympic individual road race since 1984, the first year women were included. I collected the data from different sources - surprisingly there doesn’t appear to be a single source that has consistent records of distances and times over that period (not even Wikipedia!). Of course, to better understand the data one should also consider how much climbing was involved.

Distances Olympic individual road race, 1984-2012

The absolute difference hasn’t changed much: men ride about 110 km more than women. The relative difference has decreased substantially: until 1992, the distance for men was 2.4 times the distance for women; by 2012 that factor had shrunk to 1.8. So this confirms that distances are converging. Of course, the distance women race is still shorter than most Tour stages.
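A constant absolute gap and a shrinking relative gap go together: if men always ride about 110 km more, the factor is 1 + 110 / (women’s distance), which falls as the women’s race gets longer. The sketch below works this out from the rounded figures in the text, so the distances it prints are rough implied values, not the actual race distances; the function names are my own.

```javascript
// Factor by which the men's distance exceeds the women's, given a fixed gap.
function relativeFactor(womenKm, gapKm) {
  return (womenKm + gapKm) / womenKm;
}

// Women's distance implied by a fixed gap and a given factor.
function impliedWomenKm(gapKm, factor) {
  return gapKm / (factor - 1);
}

const GAP_KM = 110; // approximate gap mentioned in the text

console.log(impliedWomenKm(GAP_KM, 2.4).toFixed(1)); // 78.6  (factor until 1992)
console.log(impliedWomenKm(GAP_KM, 1.8).toFixed(1)); // 137.5 (factor in 2012)
console.log(relativeFactor(137.5, GAP_KM).toFixed(1)); // 1.8
```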

In a slightly cryptic article, it has been argued that the distances for women must be shorter than those for men: otherwise women’s speed would drop and they wouldn’t be able to display their technical skills («corner, change direction, or maintain their trajectory while looking at their opponents») optimally. So is women’s speed dropping as Olympic races become longer?

Average speed of winner Olympic individual road race, 1984-2012

The graph above shows the average speed of the winners of the Olympic road races since 1984. While the difference between men and women has somewhat increased, it’s not the case that women’s speed has dropped. On the contrary, the winner of 2012 was 8% faster than the winner of 1984 (over a distance that was two-thirds longer).

In short, I can see no particular reason why women riders shouldn’t get the chance to prove themselves in the 2014 Tour de France.

Graphs may not display in older versions of Internet Explorer.


OV-fiets to offer live information on availability of rental bikes

All data and chart geeks are currently retweeting a map that shows at which locations cheap rental bikes are still available in popular schemes such as Vélib’ in Paris, Bicing in Barcelona and Citi Bike in New York. The maker of the map, Ramnath Vaidyanathan, has mapped the availability of rental bikes for a hundred bike sharing schemes. That means, first, that there are already a hundred such schemes and, second, that all of them offer up-to-date data on the availability of rental bikes (through an API).

The Netherlands is not on the list. Even though Amsterdam’s White Bicycle plan is often cited as an inspiration for such schemes, we don’t have a real bike sharing scheme ourselves (read why here). We do have the OV-fiets, but it doesn’t offer up-to-date information on the availability of rental bikes. At least, not yet. Asked about this, a spokesperson for NS (Dutch Railways) says that the IT systems are currently being adapted so that up-to-date information on the availability of OV-fiets bikes can be offered «in the near future». That is good news.

Incidentally, it has rarely happened to me that the OV-fiets bikes had run out. The Fietsersbond has surveyed OV-fiets users a number of times. In 2011, availability was still the most important problem according to respondents; in 2013 that was no longer the case. NS says it constantly monitors whether enough bikes are available and adjusts where necessary.

Summary: 

Ramnath Vaidyanathan has mapped the availability of bikes in bike sharing programmes across the world. Although the Dutch ‘White Bicycle’ plan is often cited as inspiration for such initiatives, there’s no ‘real’ bike sharing programme in the Netherlands (read why). Dutch Railways does offer OV-fiets rental bikes (note that the OV-fiets may not be easily available to tourists), but doesn’t have an API that provides realtime data on the availability of bikes. That is, not yet: when I asked Dutch Railways about their plans, a spokesperson indicated that this data will be made available ‘in the near future’.
