Salonanarchist | Leunstoelactivist

Turnout and population size

The Dutch minister of the interior, Ronald Plasterk, has asked the Bureau for Economic Policy Analysis (CPB) to evaluate the declining turnout in local elections. This is an important issue, given how inequality and low turnout are related.

More specifically, Plasterk would like to know, first, whether turnout is correlated with population size and, second, what effect municipal mergers have on turnout (one suspects a lobby of local governments opposed to mergers behind these questions).

As for the first question, that’s an easy one: yes. Smaller cities tend to have higher turnout. I looked it up, and the correlation’s actually pretty strong, if declining: 0.62 in 2002; 0.59 in 2006 and 0.50 in 2010 (somehow I couldn’t download data from the Kiesraad website, so I used the data I had downloaded some time ago, not including 2014 yet). I think political scientists will not be shocked by these outcomes.

More interesting is what kind of recommendations the CPB will come up with. Somehow I don’t think they’ll recommend cutting up large municipalities. Perhaps they should consider recommending a reintroduction of compulsory voting.

Cycling and income in the Netherlands

In Nickel and Dimed, her book on going undercover in low-wage America, Barbara Ehrenreich describes how not owning a car is one of the many factors making it difficult for low-paid workers to find better jobs. «Some of my co-workers, in Minneapolis as well as Key West, rode bikes to work, and this clearly limited their geographical range», she adds.

I was reminded of Ehrenreich’s book when I read a blog post by Michael Andersen. He argues that Denmark’s good quality bicycle infrastructure has contributed to the country’s egalitarian nature by making it easier to escape poverty. Danes with low incomes make a high share of their trips by bicycle. Rich Danes cycle too, but make far more trips by car.

In the comments to the blog post there’s a suggestion that in Amsterdam, it’s mainly the wealthy who ride bicycles. I couldn’t find recent data for Amsterdam, but geographical patterns may play a role. In the central area, where density is high and where the high-income districts Zuid and Centrum are located, people cycle more. In the peripheral districts, where distances to shops and other facilities tend to be longer, fewer trips are made by bicycle. Some of the poorest neighbourhoods are located there.

Statistics Netherlands (CBS) has data for the entire country, as well as for the cities with the highest address density (addresses per unit of surface area). These include Amsterdam, Rotterdam, the Hague, Utrecht and a number of smaller cities. The main conclusions:

  • As in Denmark, cycling infrastructure benefits all kinds of people, but low-income people even more so;
  • In high-density cities, not just the lowest income groups, but also the richest are more likely to take advantage of cycling infrastructure.

Incidentally, this doesn’t mean that cyclists get the space they should get. In a recent opinion article in NRC Handelsblad, writer Fred Feddes says that bicycle lanes make up 11% of public space in Amsterdam’s inner city, but parked cars probably far more.

Detailed data can be found here (I took this as an opportunity to practice the knitr skills taught in the Reproducible research course).

Update 20 August - Someone at the Fietsersbond dug up this (pdf) publication of the Amsterdam Municipality from 2010 which compares the mobility of Amsterdammers over the period 1986-1991 to 2005-2008. It suggests that cycling patterns in Amsterdam may in fact differ from the general pattern in high-density cities, with more cycling among high-income residents (as suggested by the commenter quoted above):

As for the development per income class, it turns out there are substantial differences. Among high-income residents the share of cycling in the total number of trips has more than doubled (from 15% to 33%), whereas the growth is only modest among low-income residents (from 26% to 33%). This means that relatively speaking, wealthy Amsterdammers today cycle more than low-income residents.

Are parked cars really dominating Amsterdam’s public space

In an intriguing opinion article in Thursday’s NRC Handelsblad, an author named Fred Feddes suggests banning parked cars from Amsterdam’s city centre. He argues that the current 15,000 parking spaces in the inner city take up 18ha, amounting to as much as 40% of the 45ha public space.

Sure, parked cars use lots of space, but 40%? Apparently, I wasn’t the only one to find that figure incredible. Council member Zeeger Ernsting tweeted:

As much as I endorse the viewpoint, the figure of 40% parking can’t possibly be right.. But indeed, cars [are] still far too dominant

I couldn’t immediately trace Feddes’ source and I’m sure there will be more debate on the issue. For now, here’s a quick and dirty calculation:

  • According to this (pdf) document of the Centrum district, «traffic areas» and green areas amount to 86ha. That’s more than Feddes’ 45ha, although I think the green areas may include some non-public space.
  • The district’s open data site has data on parking spaces (dating from 2010). All types combined, there were some 16,000 of them, slightly more than Feddes’ estimate.
  • Assuming that one parking space takes up 12 to 14m2, this would amount to 19 to 22ha; again slightly more than Feddes’ 18ha.
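The arithmetic behind this quick and dirty calculation can be sketched in a few lines of Python. The figures are the ones quoted above; the variable names are made up for this illustration, and using the full 86ha as the denominator assumes that all of it counts as public space.

```python
# Rough check of the parking-space estimate, using the figures quoted above.
M2_PER_HA = 10_000

parking_spaces = 16_000      # district open data, 2010
m2_per_space = (12, 14)      # assumed size of one parking space, low and high

# Total area taken up by parking spaces, in hectares (low and high estimate)
parking_ha = [parking_spaces * m2 / M2_PER_HA for m2 in m2_per_space]
print("parking area in ha:", [round(ha, 1) for ha in parking_ha])

# Share of public space, for Feddes' 45ha and for the district's 86ha
for public_ha in (45, 86):
    shares = [round(100 * ha / public_ha) for ha in parking_ha]
    print(f"{public_ha}ha public space: {shares[0]}% to {shares[1]}%")
```

With the district's 86ha as the denominator, parking comes out at roughly a quarter of public space; with Feddes' 45ha it would be over 40%.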

Perhaps Ernsting could ask the local government to shed some more light on this issue. Meanwhile, my provisional conclusion is that Feddes’ estimate doesn’t seem as incredible as I initially thought. And even if parked cars use only about 25% of public space, that’s still an enormous amount of space if you think about it.

Identifying «communists» at the New York Times, by 1955 US Army criteria

A while ago, Open Culture wrote about a 1955 US Army manual entitled How to spot a communist. According to the manual, communists have a preference for long sentences and tend to use expressions like:

integrative thinking, vanguard, comrade, hootenanny, chauvinism, book-burning, syncretistic faith, bourgeois-nationalism, jingoism, colonialism, hooliganism, ruling class, progressive, demagogy, dialectical, witch-hunt, reactionary, exploitation, oppressive, materialist.

What happened in the 1950s is pretty terrible, but that doesn’t mean we can’t have a bit of fun with the manual. I used the New York Times Article Search API to look up which of its writers actually use terms like hootenanny, book-burning and jingoism. The results are summarised below.

Interestingly, many of the users of «communist» terms are either foreign correspondents or art, music and film critics. While it’s possible that people who have an affinity with the arts tend to sympathise with communism, an alternative explanation would be that critics have more freedom than «regular» journalists to use somewhat exotic and expressive terms like the ones the US Army associated with communism.

Also of interest is that one of the current writers on the list is Ross Douthat, the main conservative columnist of the New York Times. In his articles, he uses terms like materialist, oppressive, reactionary, exploitation, vanguard, ruling class, progressive and chauvinism. Surely he wouldn’t be a reformed communist - would he?

Method

The New York Times Article Search API is a great tool, but you have to keep in mind that digitising the archive isn’t an entirely error-free process. For example, sometimes bits of information end up in the lastname field that don’t belong there (e.g. "lastname": "DURANTYMOSCOW"). While it’s possible to correct some of these issues, it’s likely that search results will in some way be incomplete.

To get a manageable dataset, I looked up all articles containing any combination of two terms from the manual. I then calculated a score for each author by simply counting the number of unique terms they have used.
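As a minimal sketch of that scoring step (not the code on Github, and with made-up authors and a hypothetical data layout of one author plus a set of terms per article), counting unique terms per author could look like this:

```python
from collections import defaultdict

# Hypothetical sample: one (author, terms found in the article) pair per article
articles = [
    ("Author A", {"vanguard", "progressive"}),
    ("Author A", {"progressive", "jingoism"}),
    ("Author B", {"hootenanny", "book-burning"}),
]

def unique_term_scores(articles):
    """Return the number of distinct terms each author has used."""
    terms_by_author = defaultdict(set)
    for author, terms in articles:
        # A term counts once per author, however often it is used
        terms_by_author[author] |= set(terms)
    return {author: len(terms) for author, terms in terms_by_author.items()}

scores = unique_term_scores(articles)
print(scores)
```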

An alternative would have been to correct for the total number of articles per author in the NYT archive. It took me a while to figure out how to search by author using the NYT API. It turns out you can search for terms appearing in the byline using ?fq=byline:("firstname middlename lastname") - even though this option isn’t mentioned in the documentation. I’m not entirely sure such a search will return articles where the byline/original field is empty.
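For illustration, here is a sketch of how such a request URL could be put together in Python. The endpoint and the page and api-key parameters reflect my understanding of version 2 of the API and should be checked against the documentation; the function name and the placeholder key are hypothetical. The sketch only builds the URL and doesn't send a request.

```python
from urllib.parse import urlencode

# Endpoint of version 2 of the Article Search API (assumption: check the docs)
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"

def byline_query_url(full_name, api_key, page=0):
    """Build a request URL that filters on terms appearing in the byline."""
    params = {
        "fq": f'byline:("{full_name}")',   # the filter described above
        "page": page,
        "api-key": api_key,
    }
    # urlencode takes care of escaping the quotes, brackets and spaces
    return f"{BASE_URL}?{urlencode(params)}"

print(byline_query_url("firstname middlename lastname", "YOUR_API_KEY"))
```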

As you might expect, there’s a correlation between the number of articles per author and the number of unique terms this author has used.

All in all, it would be possible to calculate a relative score, for example number of terms used per 1,000 articles, but this may have unintended consequences. To take an extreme example: an author who has written one article which happened to contain three terms would get a score of 3,000 using this method, whereas an author who has thousands of articles and consistently uses a broad range of terms but not at a rate of three per article would get a (considerably) lower score.
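The extreme example is easy to reproduce. In the sketch below, the function name and the second author's numbers are made up purely for illustration:

```python
def terms_per_1000_articles(unique_terms, n_articles):
    """Relative score: unique terms used per 1,000 articles."""
    return 1000 * unique_terms / n_articles

# One article which happened to contain three terms
print(terms_per_1000_articles(3, 1))       # 3000.0

# Hypothetical prolific author: 15 unique terms spread over 5,000 articles
print(terms_per_1000_articles(15, 5000))   # 3.0
```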

I decided to stick with the absolute number of unique terms per author. This has the disadvantage that authors who have written few articles are unlikely to show up in the analysis, but I’m not sure that this problem can be adequately solved by calculating a relative score.

The Python and R code used to collect and analyse the data is available on Github.

Connections between businesses and politics: banks and Shell dominate

Website Follow the Money has analysed the «revolving door» between politics and businesses in the Netherlands, adding that the examples discussed are far from exhaustive. I’ve expanded the list of connections between businesses and politics by checking the resumes of close to 700 politicians – government members and members of parliament – who have been active in Dutch politics after 2001.

The list is headed by the Rabobank: 32 politicians have (had) a position there. This score can perhaps partly be explained by the fact that Rabobank is a cooperative of local banks, each with their own advisory board; so many people have positions there. Number two is Royal Dutch Shell, the largest Dutch company (of course, it’s partly British).

From the list, it can be concluded that financial institutions play a central role in the connections between businesses and politics. The phenomenon is not politically neutral: almost three-quarters of the politicians who have (had) positions with the three largest banks are (or have been) affiliated to the conservative parties CDA and VVD.

One of them is former finance minister Gerrit Zalm (VVD). After his political career, he first moved to DSB Bank and then became chairman of the board of ABN Amro (for controversies, see the FTM article as well as this analysis by de Correspondent). Another example is Joop Wijn (CDA) who started at ABN Amro and subsequently served as minister and state secretary at the finance and economic affairs departments. After that, he had a management position at Rabobank and currently he’s on the executive board of ABN Amro.

Financial institutions aside, an interesting case is airline KLM, now part of Air France-KLM, which appears to have played a bit of an emancipatory role. Over the past years, as many as four former KLM stewardesses have obtained a position in national politics: Fransje Roscam Abbing-Bos (VVD, Senate); Gonny van Oudenallen (various parties, Lower House); Ing Yoe Tan (PvdA, Senate) and Kathleen Ferrier (CDA, Lower House).

Method

I’ve created a list of Dutch companies using information from Wikipedia and Elsevier / Bureau van Dijk. I’ve checked these companies against resumes from the (very useful) website Parlement.com. Here’s the Python script I used to download the resumes and to analyse them. The results had to be cleaned up manually. For example, former MP Wijnand Duyvendak, who’s been in charge of the Friends of the Earth Schiphol campaign, should not be counted as having had a position with Schiphol. To be on the safe side, I also didn’t count positions on the pension board or the board of a foundation of a company.

Scraping websites with Outwit Hub: Step by step tutorial

Some websites offer data that you can download as an Excel or CSV file (e.g., Eurostat), or they may offer structured data in the form of an API. Other websites contain useful information, but they don’t provide that information in a convenient form. In such cases, a webscraper may be the tool to extract that information and store it in a format that you can use for further analysis.

If you really want control over your scraper the best option is probably to write it yourself, for example in Python. If you’re not into programming, there are some apps that may help you out. One is Outwit Hub. Below I will provide some step by step examples of how you can use Outwit Hub to scrape websites and export the results as an Excel file.

But first a few remarks:

  • Outwit Hub comes in a free and a paid version; the paid version has more options. As far as I can tell, the most important limitation of the free version is that it will only let you extract 100 records per query. In the examples below, I’ll try to stick to functionality available in the free version.
  • Information on websites may be copyrighted. Using that information for other purposes than personal use (e.g. publishing it) may be a violation of copyright.
  • Webscraping is a messy process. The data you extract may need some cleaning up. More importantly, always do some checks to make sure the scraper is functioning properly. For example, is the number of results you got consistent with what you expected? Check some examples to see if the numbers you get are correct and if they have ended up in the right row and column.

The Outwit Hub app can be downloaded here (it’s also available as a Firefox plugin, but last time I checked it wasn’t compatible with the newest version of Firefox).

Scraping a single webpage

Sometimes, all the information you’re looking for will be available from one single webpage.

Strategy

Out of the box, Outwit Hub comes with a number of preset scrapers. These include scrapers for extracting links, tables and lists. In many cases, it makes sense to simply try Outwit Hub’s tables and lists scrapers to see if that will get you the results you want. It will save you some time, and often the results will be cleaner than when you create your own scraper.

Sometimes, however, you will have to create your own scraper. You do so by telling Outwit Hub which chunks of information it should look for. The output will be presented in the form of a table, so think of the information as cases (units of information that should go into one row) and within those cases, the different types of information you want to retrieve about those cases (the information that should go into the different cells within a row).

You tell Outwit Hub what information to look for by defining the «Marker Before» and the «Marker After». For example, you may want to extract the text of a title that is represented as <h1>Chapter One</h1> in the html code. In this case the Marker Before could be <h1> and the Marker After could be </h1>. This would tell Outwit Hub to extract any text between those two markers.
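If you’re curious what this idea looks like in code, here’s a minimal Python sketch. It mimics the marker principle only; it’s not how Outwit Hub is implemented, and the function name and sample html are made up.

```python
import re

def extract_between(html, marker_before, marker_after):
    """Return every piece of text found between the two markers."""
    # re.escape makes sure the markers are matched literally
    pattern = re.escape(marker_before) + r"(.*?)" + re.escape(marker_after)
    return re.findall(pattern, html, flags=re.DOTALL)

html = "<h1>Chapter One</h1><p>Some text</p><h1>Chapter Two</h1>"
print(extract_between(html, "<h1>", "</h1>"))
# ['Chapter One', 'Chapter Two']
```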

It may take some trial and error to get the markers right. Ideally, they should meet two criteria:

  • They should capture all the instances you want included. For example, if some of the titles you want to extract aren’t h1 titles but h2 titles, the <h1> and </h1> markers will give you incomplete results. Perhaps you could use <h and </h as markers.
  • They should capture as few irrelevant pieces of information as possible. For example, you may find that an interesting piece of information is located between <p> and </p> tags. However, p-tags (used to define paragraphs in a text) may occur a lot on a webpage and you may end up with a lot of irrelevant results. So you may want to try to find markers that more precisely define what you’re looking for.

Example: Bossnappings

Some French workers have resorted to «bossnapping» as a response to mass layoffs during the crisis. If you’re interested in the phenomenon, you can find some information from a paper on the topic summarized here. From a webscraping perspective, this is pretty straightforward: all the information can be found in one table on a single webpage.

The easiest way to extract the information is to use Outwit Hub’s preset «tables» scraper:

Of course, rather than using the preset table scraper, you may want to try to create your own scraper:

Example: Wikipedia Yellow Jerseys table

If you’re interested in riders who won Yellow Jerseys in the Tour de France, you can find statistics on this Wikipedia page. Again, the information is presented in a single table on a single website.

Again, the easy way is to use Outwit Hub’s «tables» scraper:

And here’s how you create your own scraper:

Example: the Fall band members

Mark E. Smith of the Fall is a brilliant musician, but he does have a reputation for discarding band members. If you want to analyse the Fall band member turnover, you can find the data here. This time, the data is not in a table structure. The webpage does have a list structure, but the list elements are the descriptions of band members, not their names and the years in which they were band members. So Outwit Hub’s «tables» and «lists» scrapers won’t be much help in this case – you’ll have to create your own scraper.

To extract the information:

Navigating through links on a webpage

In the previous examples, all the information could be found on a single webpage. Often, the information will be spread out over a series of webpages. Hopefully, there will also be a page with links to all the pages that contain the relevant information. Let’s call the page with links the index page and the webpages it links to (where the actual information is to be found) the linked pages.

Strategy

You’ll need a strategy to follow the links on the index page and collect the information from all the linked pages. Here’s how you do it:

  • First visit one of the linked pages and create a scraper to retrieve the information you need from that page.
  • Return to the index page and tell Outwit Hub to extract all the links from that page.
  • Try to filter these links as well as you can to exclude irrelevant links (most webpages contain large numbers of links and most of them are probably irrelevant for your purposes).
  • Tell Outwit Hub to apply the scraper (the one you created for one of the linked pages) to all the linked pages.
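For those who would rather write the scraper themselves, the same four steps can be sketched in Python using only the standard library. Everything here is hypothetical: the page names, the filter term and the in-memory «website» that stands in for real downloads so that the sketch runs without a network connection.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def scrape_linked_pages(index_html, fetch, scraper, must_contain):
    """Apply the scraper to every relevant page linked from the index page."""
    collector = LinkCollector()
    collector.feed(index_html)                                           # extract all links
    relevant = [url for url in collector.links if must_contain in url]   # filter them
    return [scraper(fetch(url)) for url in relevant]                     # apply the scraper

# Hypothetical linked pages and index page
pages = {
    "/stage-1.html": "<h1>Stage 1</h1>",
    "/stage-2.html": "<h1>Stage 2</h1>",
}
index_html = (
    '<a href="/stage-1.html">1</a> '
    '<a href="/about.html">About</a> '
    '<a href="/stage-2.html">2</a>'
)

def get_title(html):
    # The scraper created for one linked page: text between <h1> and </h1>
    return html.split("<h1>")[1].split("</h1>")[0]

results = scrape_linked_pages(index_html, pages.get, get_title, "stage-")
print(results)
```

In a real scraper, `fetch` would download the page, and you would still need to check whether the scraper works for every linked page.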

Two remarks:

  • Hopefully, all the linked pages have the same structure, but don’t count on it. You’ll need to check if your scraper works properly for all the linked pages.
  • In the output window, make sure to set the catch / empty settings correctly because otherwise Outwit Hub will discard the output collected so far before moving to the next linked page.

Example: Tour de France 2013 stages

We’ll return to the Tour de France Yellow Jersey, but this time we’ll look in more detail into the stages of the 2013 edition. Information can be found on the official webpage of le Tour.

Navigating through multiple pages with links

Same as above, but now the links to the linked pages are not to be found on a single index page, but a series of index pages.

Strategy

First create a web scraper for one of the linked pages, then collect the links from the index page so you can tell Outwit Hub to apply your scraper to all the linked pages. However, you’ll need one more step before you can tell Outwit Hub to apply the scraper: you’ll need to collect the links from all the index pages, not just the first one. In many cases, Outwit Hub will be able to find out by itself how to move through all the index pages.

Example: Proceedings of Parliament

Suppose you want to analyse how critically Dutch Members of Parliament have been following the Dutch intelligence service AIVD over the past 15 years or so. You can search the questions they have asked with a search query like this, which gives you 206 results, and their urls can be found on a series of 21 index pages (perhaps new questions have been asked since, in which case you’ll get a higher number of results). So the challenge is to create a scraper for one of the linked pages and then get Outwit Hub to apply this scraper to all the links from all 21 index pages.

King’s Day associations lose tax exempt status

Don’t ask me why, but Oranjeverenigingen (Orange Associations - most focus on organising festivities on King’s Day) seem to be struggling with the new transparency rules of the tax authority.

Recently, new rules have been introduced for organisations that want to receive tax-exempt donations. Among other things, they must have a website and publish the compensation their board members receive. As a consequence of these new rules, over two thousand organisations have had their «anbi status» withdrawn, broadcaster NOS reported.

The tax authority has published a dataset on organisations that have or used to have the anbi status. It appears that especially Oranjeverenigingen have been affected. Six percent of all organisations had their anbi status withdrawn, but this happened to 75% of organisations with «oranje» in their name. Obviously, it’s a bit risky to draw conclusions from this as long as the explanation of the phenomenon is unclear.

Method

Data from the tax authority are here, and here’s the R script I analysed the data with. I also checked this for other terms that occur frequently (organisations with the Dutch word for «first aid», «christian», «jehova», «education», «amsterdam», «third world aid shop» or «museum» in their names), but they don’t show the same pattern.
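The comparison itself is simple. The original analysis was done in R; below is a rough Python sketch of the same idea, with invented organisation names and statuses rather than the tax authority’s data, so the percentages it prints say nothing about the real figures.

```python
# Hypothetical rows: (organisation name, anbi status withdrawn?)
organisations = [
    ("Oranjevereniging Voorbeelddorp", True),
    ("Stichting Oranje Comité Voorbeeldstad", True),
    ("Oranjevereniging Elders", False),
    ("Museum Voorbeeld", False),
    ("Stichting Onderwijs Voorbeeld", False),
    ("EHBO Vereniging Voorbeeld", True),
]

def share_withdrawn(organisations, term=None):
    """Share of organisations that lost their status, optionally by name term."""
    selected = [
        withdrawn for name, withdrawn in organisations
        if term is None or term in name.lower()
    ]
    # Note: raises ZeroDivisionError if no organisation matches the term
    return sum(selected) / len(selected)

print(f"all organisations: {share_withdrawn(organisations):.0%}")
print(f"«oranje» in name: {share_withdrawn(organisations, 'oranje'):.0%}")
```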

Decline in cycling in the Netherlands?

Using new data from Statistics Netherlands (CBS), cycling expertise centre Fietsberaad reports that cycling has declined in the Netherlands over the past three years, both in terms of the distance traveled and the number of trips per person per day. The chart to the left is from their website.

Fietsberaad does warn against reading too much into this: there have been changes in how the data are collected and analysed, and the weather may have caused short-term fluctuations in cycling (meteorological institute KNMI reports that there were 46 days with minimum temperatures below 0°C in 2011; 50 in 2012 and 64 in 2013). Keeping all this in mind, it’s still interesting to note that the same period saw an increase in cycling in the four largest cities.

Be that as it may, the chart created by Fietsberaad does look worrisome. But what does it actually show? There are no values on the y-axis. Does the y-axis even start at zero? Apparently it doesn't, for otherwise the chart would have looked more like the one below. Which looks slightly less dramatic.

Belkin quits. How loyal are sponsors of cycling teams?

Last year, Belkin became the title sponsor of the former Rabobank cycling team, but today it announced that it will end its sponsorship by the end of the year. Various commentators have expressed concern over the lack of continuity in sponsoring. Which raises the question: is it normal for a sponsor to quit after such a short period? And is this becoming worse?

Some sponsors leave after one or two years, while others remain loyal for ten years or more (Française des Jeux, Lampre, Lotto, Quick Step).

The graph above shows the sponsor turnover of UCI Pro Tour teams (the share of sponsors that would quit the subsequent year). Turnover is about 25%, which suggests that a normal sponsorship duration should be about four years. So Belkin’s loyalty is not impressive by those standards.
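The step from 25% turnover to four years rests on a simplifying assumption: each year, every sponsor has the same chance of quitting, regardless of how long it has been around. Under that assumption the average duration is one divided by the turnover rate, which the sketch below checks by adding up the probabilities year by year (function names are made up for this illustration).

```python
def expected_duration(annual_turnover):
    """Mean number of years a sponsor stays if a fixed share quits each year."""
    return 1 / annual_turnover

def expected_duration_by_summation(annual_turnover, max_years=200):
    """Same quantity, summing years * probability of quitting after that year."""
    p = annual_turnover
    return sum(k * p * (1 - p) ** (k - 1) for k in range(1, max_years + 1))

print(expected_duration(0.25))   # 4.0
print(round(expected_duration_by_summation(0.25), 6))
```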

While the sponsorship duration fluctuates, there doesn’t appear to be a trend of sponsors becoming more or less loyal.

Method

I retrieved sponsor names from team names of UCI Pro Tour teams listed by Cycling News. Due to variations in spelling (Française des Jeux, FDJ, FDJ.fr), the data needed some cleaning up. If you want to check them: here’s a list of sponsors and the years in which I think they were active.
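To illustrate why the cleaning matters, here’s a small Python sketch. The spelling variants are the ones mentioned above; the other sponsor names, the two years and the function names are hypothetical and not taken from the actual list.

```python
# Spelling variants mentioned above, mapped to one canonical name
CANONICAL = {
    "fdj": "Française des Jeux",
    "fdj.fr": "Française des Jeux",
    "française des jeux": "Française des Jeux",
}

def clean_sponsor(name):
    """Return the canonical spelling of a sponsor name."""
    key = name.strip().lower()
    return CANONICAL.get(key, name.strip())

def turnover(sponsors_by_year, year):
    """Share of sponsors active in `year` that are gone the subsequent year."""
    current = {clean_sponsor(s) for s in sponsors_by_year[year]}
    following = {clean_sponsor(s) for s in sponsors_by_year[year + 1]}
    return len(current - following) / len(current)

# Hypothetical data for two consecutive years
sponsors_by_year = {
    1: ["Française des Jeux", "Sponsor A", "Sponsor B", "Sponsor C"],
    2: ["FDJ.fr", "Sponsor A", "Sponsor B", "Sponsor D"],
}

print(turnover(sponsors_by_year, 1))
```

Without the cleaning step, the change in spelling would be counted as a sponsor quitting and the turnover in this toy example would be twice as high.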

Cyclists should have priority here

[Photo: cycle path crossing the exit of a car park along the Oostvaardersdijk in Almere]

Some crossings make you wonder: isn’t it weird that cyclists don’t have priority here? This occurs in Amsterdam, but more often in the country. There are different variants, but often there’s a bend in the cycle path just before a crossing. The cycle path is no longer part of the main road and cyclists are confronted with give way road markings. You have to give way to everybody: motorists coming from behind who turn right, oncoming traffic turning left and traffic from the right.

Often, you have to give way to rather secondary roads. For example, the exit to a tiny car park along the Oostvaardersdijk in Almere (photo above). Or the entrance of a government building at the Amsterdamseweg in Velsen-Zuid, where motorists who get priority subsequently have to stop at a gate anyway.

As a cyclist, you end up with a tricky crossing. You have to pay attention to traffic from behind, oncoming traffic and traffic from the right. The sense of insecurity mixes with indignation at the fact that apparently, people have specifically diverted the cycle path just to rob cyclists of their priority. Why are they doing this?

I put this question – in somewhat more neutral terms – to a number of road maintenance authorities, with illustrations from Velsen-Zuid, Watergang, Monnickendam, Weesp, Almere and Muiden. Their answers reveal that there are two reasons for bending cycle paths. First, this creates a space for motorists coming from the right where they can wait before entering or crossing the main road (this is a reason for bending the cycle path, but in itself not a reason to rob cyclists of their priority). Second, it’s about bicycle safety. In the words of the spokesperson of the Province of Noord-Holland:

For reasons of bicycle safety, we at the province often choose not to let cyclists have priority, especially outside the built-up area. It’s the same thing as with roundabouts: you may have priority as a cyclist, but whether you’ll be given priority is a different matter. And with roundabouts, it’s been shown that cyclists who have priority are more often involved in accidents, simply because they’re not given priority.

It’s good to know that the safety of cyclists is high on the agenda. But bending the cycle path and robbing cyclists of their priority – I’m not convinced that’s the right solution. In fact, it’s a bit twisted to reward motorists for not paying attention to cyclists who have priority. There have to be better ways to make them pay attention to cyclists and to slow them down.

As I said, such situations occur mainly in the country. You can point to situations in Amsterdam where cyclists should have priority, but mostly these don’t concern cycle paths along main roads that have been bent. However, there is a slightly similar situation opposite the entrance of the Westerpark.

The original Dutch version of this article appeared in the OEK (pdf). More examples here.
