Drawing Blanks

Premature Optimization is a Prerequisite for Success

Elections and statistics

Everybody and their brother is demonstrating their skills in statistics by doing analyses of the 2011 Russian parliamentary elections. Some of them are interesting. And some are unfortunately “lazy research”.

I’d like to briefly comment on the main points made in those analyses. Those points mostly concern “anomalies” in the distribution of the vote percentage for the United Russia party (the ruling and most popular party in Russia, also known as “the party of swindlers and thieves”).

Note that I’m neither trying to refute any conclusion, nor playing devil’s advocate. I’m playing “diligent researcher”. At the end of the post I include some graphs too.

  1. “The distribution of votes for United Russia is very different from the normal distribution.” We cannot expect it to be normal because of the massive territorial inhomogeneity in Russia. What we see is a mix of a number of very different distributions. Sort the precincts by district and plot the percent of vote vs. precinct number. You’ll see what I mean by territorial clustering and inhomogeneity.
  2. “The distribution of votes for United Russia is very different from normal, while the distributions of votes for the other parties are very close to normal.” The vote percentages add up to 100%, so the UR distribution is simply (1 – AllOthers). Saying “all the distributions but one are OK” is just ridiculous. The other parties’ distributions are very different from normal too. Moreover, plot the UR votes versus the Communist Party votes and you’ll see that they are roughly a linear transform of each other, so their distributions are in fact very similar. When people say “the distribution of votes for the Communist Party follows the model, while the distribution for United Russia does not”, that means they have some model that explains (1-x)/3 but does not explain x. Some weird model.
  3. “The percent of votes for United Russia correlates with the turnout percent.” There are many natural reasons for that. First, both percentages correlate with the precinct size, because the precinct sizes are clustered by territory and also for purely arithmetical reasons (see the Dumb Model below). Second, a person’s political activity is obviously correlated with their political preference. Moreover, this correlation (even its sign) may be very different among different social and geographical clusters. Although it is difficult to model the correlation between turnout and party support, it is obvious that some correlation is naturally present. We must not assume zero correlation as the null hypothesis. However, speaking of turnout, I can’t explain the very high turnout at many large precincts.
  4. “The distribution of votes for United Russia has peaks at round numbers: 50%, 60%, 65%, 80%, 85%, etc.” This is a phenomenon that I think can be considered a red flag, although some natural explanations are possible (see the Dumb Model below) and it has been observed in previous elections too.
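
To make point 2 concrete, here is a minimal sketch in Python. The vote shares are made-up numbers, not election data, and the exact linear relation CP = 0.28 * (1 - UR) is an assumption borrowed from the factor used in the illustrations further down. It shows that a linear transform mirrors the skewness and leaves the kurtosis unchanged, so if one distribution is far from normal, the other is exactly as far.

```python
def moments(xs):
    """Return (skewness, excess kurtosis) of a sample, using population moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

# Hypothetical UR vote shares for a handful of precincts (made-up numbers).
ur = [0.31, 0.35, 0.38, 0.42, 0.45, 0.52, 0.64, 0.88, 0.95]

# Assumed exact linear relation between the two parties' shares.
cp = [0.28 * (1.0 - x) for x in ur]

ur_skew, ur_kurt = moments(ur)
cp_skew, cp_kurt = moments(cp)

# The skewness flips sign and the kurtosis is identical.
print(f"UR: skewness={ur_skew:+.3f}, excess kurtosis={ur_kurt:+.3f}")
print(f"CP: skewness={cp_skew:+.3f}, excess kurtosis={cp_kurt:+.3f}")
```

Real data would only follow the linear relation approximately, so the two shapes would match approximately rather than exactly.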

The Dumb Model and diligence

Consider a country with two regions A and B. In A 90% of people support party P, and in B only 30% of people support P. And let’s assume each voting precinct has only 1 (one) registered voter. Let’s look at the distribution of votes. It will look like two peaks at 0% and at 100%. So it’s not normal. And in addition it perfectly correlates with the turnout ratio! And there are peaks at round numbers!! OMG, fraud!!!
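
The two peaks can be checked with a few lines of Python. This is only a sketch under assumptions that the description above does not state: both regions have the same number of precincts, every registered voter turns out, and voters decide independently. The function names are mine.

```python
from math import comb

def share_distribution(n_voters, support):
    """Exact distribution of the vote share for P in one precinct, where each
    of n_voters votes for P independently with probability `support`."""
    return {
        k / n_voters: comb(n_voters, k) * support ** k * (1 - support) ** (n_voters - k)
        for k in range(n_voters + 1)
    }

def country_distribution(n_voters, supports=(0.9, 0.3)):
    """Mix the regions with equal weight (assumption: equal precinct counts)."""
    mixed = {}
    for support in supports:
        for share, prob in share_distribution(n_voters, support).items():
            mixed[share] = mixed.get(share, 0.0) + prob / len(supports)
    return mixed

# One registered voter per precinct: only the shares 0.0 and 1.0 can occur.
one_voter = country_distribution(1)
for share, prob in sorted(one_voter.items()):
    print(f"vote share {share:.0%}: {prob:.2f} of precincts")
```

Calling `country_distribution` with a larger `n_voters` gives the exact mix of two binomial distributions for bigger precincts.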

This model is dumb. As we increase the size of the precincts, the distribution will look more like a mix of two normals and the peaks at round numbers will become less significant. But what if we keep the average size of precincts in A smaller than the average size of precincts in B? (Say, A is the countryside and B is a city.) And what if we have more than two regions? What if the signs and magnitudes of the correlations between the precinct size, territory, and party support vary? The model is no longer dumb. And it may fit the observations.
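
Here is a rough simulation of the slightly less dumb version, as a sketch rather than a real model. The support levels (90% and 30%) come from the example above; the precinct size ranges, the turnout probabilities per region, and the number of precincts are numbers I made up for illustration. Nobody cheats in this simulation, yet the vote share ends up strongly correlated with both turnout and precinct size.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate(rng, n_precincts=200):
    """Simulate honest voting in two regions with different precinct sizes."""
    # (size range, turnout probability, support for P); sizes and turnout
    # are hypothetical, support is taken from the two-region example.
    regions = {
        "A (countryside)": ((50, 300), 0.8, 0.9),
        "B (city)": ((1000, 2500), 0.5, 0.3),
    }
    sizes, turnouts, shares = [], [], []
    for (lo, hi), p_turnout, p_support in regions.values():
        for _ in range(n_precincts):
            size = rng.randint(lo, hi)
            voted = sum(rng.random() < p_turnout for _ in range(size))
            if voted == 0:
                continue  # no ballots cast, vote share undefined
            for_p = sum(rng.random() < p_support for _ in range(voted))
            sizes.append(size)
            turnouts.append(voted / size)
            shares.append(for_p / voted)
    return sizes, turnouts, shares

rng = random.Random(2011)  # fixed seed so the run is repeatable
sizes, turnouts, shares = simulate(rng)
corr_turnout_share = pearson(turnouts, shares)
corr_size_share = pearson(sizes, shares)
print(f"correlation(turnout, share for P) = {corr_turnout_share:+.2f}")
print(f"correlation(precinct size, share for P) = {corr_size_share:+.2f}")
```

The correlations here come entirely from mixing two regions that differ in size, turnout, and support. Adding more regions with different signs and magnitudes would be the next step toward something that could be compared with the real data.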

I do not have a model like that, and I don’t know if anyone does. But it looks like no one is even trying to come up with one. Diligent researchers would definitely try, and even if they didn’t succeed, they would still publish those attempts rather than resort to a “simple hypothesis which explains everything”.

Illustrations

1. Territorial inhomogeneity/clustering

Percent of votes for United Russia ordered by district. Each point is one precinct:

URPlot

I colored some of the densest clusters, and here is their approximate geographical identification: 1, 2, 3 are Buriat, Dagestan, Komi, Mari, North Ossetia, Tatarstan; 4 is Volgograd and Vologda; 5 is Leningrad region and Moscow region (“region” here means the suburbs surrounding a big city; don’t confuse Moscow Region with Moscow City); and 6 is Tyumen. Please judge for yourself whether such massive inhomogeneity is natural or a result of some fraud. But after looking at this picture, please don’t say that the expected distribution of the vote ratio is normal.

Same plot for the Communist Party vote ratio, for comparison:

CPPlot

2. The United Russia and the Communist Party vote ratio distributions are linearly related

Scatter plot of the CP vs. UR vote percent (each point is one precinct):

CP-UR-scatter

Communist Party vote ratio by precinct (red) and a linear transform of the United Russia vote ratio (1 – VoteRatioForUnitedRussia) * 0.28 (black). Horizontal – percent of votes, vertical – number of precincts.

CPdistModel

3. Correlations between the precinct size, the turnout ratio and the vote ratio

The graphics in this section only include precincts with fewer than 3000 registered voters. This is done only to make the pictures easier to read. There are 236 precincts (out of 95,000) with more than 3000 registered voters, and they don’t show any “anomalies” anyway.

The goal of this section is to demonstrate that a) vote ratios are correlated with the precinct sizes, b) turnout ratios are correlated with the precinct sizes, c) the precinct sizes are not normally distributed, and d) the precinct sizes are somewhat correlated with geography. In addition, as we saw, the vote ratios are also correlated with geography. Correlations are everywhere. So how can one expect the vote ratios to be uncorrelated with turnout and normally distributed?

Distribution of precincts by size. Horizontal – number of registered voters at precinct, vertical – number of precincts:

sizedist

Precinct sizes ordered by district. We can see some geographical inhomogeneity/clustering:

sizeplot

Turnout vs. precinct size:

turnout-size

Votes for UR vs. precinct size:

UR-size

Votes for CP vs. precinct size:

CP-size

4. The vote ratio distribution is mixed

Votes for United Russia in districts with code < 22:

UR-badregions

Votes for United Russia in districts with code > 22 and precinct size < 1000:

UR-goodsmall

Votes for United Russia in districts with code > 22 and precinct size > 1000:

UR-goodlarge

Written by bbzippo

01/02/2012 at 3:15 am
