Category Archives: Other

Blogs that are probably about maths, science or economics.

Be scared of the myth of big data

Last night I attended a lecture by Yuval Noah Harari – historian and author of the popular book ‘Sapiens’. Harari’s thesis is that human society is built on shared myths, and that without these we wouldn’t be able to organise ourselves into groups of more than a couple of hundred people. These myths are things like religion, social caste, political ideologies, and money.

During questions a member of the audience asked Harari what he predicted the next great myth would be. He answered, “Data.”

Harari’s contention is that with the growth of big data we are moving towards deifying quantitative information. Just as money has become something in which we unanimously place our trust (and therefore grant great power to otherwise valueless slips of paper) so we will begin to place our faith in data.

I can see signs of this myth emerging already, and I think it goes something like this: “if we get enough data we will be able to predict the future.”

The problem is that we won’t. There are some things data cannot tell us; there are limits to its power. Bigger sample sizes can take us so far, but there are certain frontiers that no sample size can help us cross. My fear is that if the data myth grows we will increasingly find ourselves basing decisions on statistical fallacies, and in a false sense of security end up with all of our eggs in a very unstable basket.

There are four reasons this myth is wrong:

The way we use statistical significance is logically flawed – so we cannot trust our results
A statistical significance test answers the question “Given the hypothesis is true, what is the probability of observing these results?”. The question researchers actually want answered, however, is “Given these results have been observed, what is the probability that the hypothesis is true?” – and many social scientists treat the answer to the first as though it were the answer to the second. Though they sound similar, these questions are fundamentally different.

Ziliak & McCloskey (2008) liken this to the difference between asking “Given a person has been hanged, what is the probability they are dead?” (~100%) and “Given a person is dead, what is the probability they have been hanged?” (<1%). The questions sound alike, but the answers are worlds apart – and our statistical significance tests could be leading us into mistakes just as large.
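
To see how wide the gap between those two questions can be, here is a rough sketch of the hanged/dead example in Python, using Bayes’ theorem. The base rates are entirely made up for illustration; nothing here comes from Ziliak & McCloskey.

```python
# A toy illustration of the hanged/dead asymmetry via Bayes' theorem.
# All of the base rates below are invented, chosen only to show how the
# two conditional probabilities can differ by orders of magnitude.

p_hanged = 1e-5             # hypothetical share of people who are hanged
p_dead_given_hanged = 0.99  # hanging almost always results in death
p_dead = 0.01               # hypothetical share of people who die in the period

# Bayes' theorem: P(hanged | dead) = P(dead | hanged) * P(hanged) / P(dead)
p_hanged_given_dead = p_dead_given_hanged * p_hanged / p_dead

print(f"P(dead | hanged) = {p_dead_given_hanged:.2f}")   # ~1 (close to 100%)
print(f"P(hanged | dead) = {p_hanged_given_dead:.4f}")   # ~0.001 (well under 1%)
```

The same arithmetic is why a small p-value, which tells us about the probability of the results given the hypothesis, cannot by itself tell us the probability of the hypothesis given the results: that also depends on how plausible the hypothesis was to begin with.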

The laws of societies are not fixed – so we cannot predict the impact of our actions
We use data to estimate parameters about society and the economy, such as the relationship between inflation and unemployment, or between income inequality and crime. Although we can measure the parameters of these relationships at the moment, these parameters are not fixed. In fact they are highly prone to change whenever we alter something like technology or government policy.

So, for example, we cannot predict the impact of a new invention on society, because our prediction would be using parameters from the pre-invention world and not accounting for the invention’s impact on the deeper structures of society. This means that the times we most want to use data to predict the future – those times of significant change – are precisely the times when doing so would be utterly invalid.
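
As a rough sketch of the problem, here is a toy Python example with an entirely invented ‘economy’: a parameter is estimated from pre-change data, the underlying relationship then shifts, and the model carries on forecasting with the old parameter.

```python
import random

# A toy illustration of parameter instability. The "economy" is invented:
# before the change the outcome is 2.0 * x plus noise; after an "invention"
# the true relationship shifts to 0.5 * x plus noise. The model keeps
# forecasting with the parameter estimated from pre-change data.

random.seed(0)

x_before = [random.uniform(0, 10) for _ in range(200)]
y_before = [2.0 * x + random.gauss(0, 1) for x in x_before]

# Least-squares estimate of the slope (no intercept) from pre-change data.
beta_hat = (sum(x * y for x, y in zip(x_before, y_before))
            / sum(x * x for x in x_before))
print(f"Estimated parameter: {beta_hat:.2f}")   # close to 2.0

# The structural change: the relationship is now 0.5 * x.
x_after = [random.uniform(0, 10) for _ in range(200)]
y_after = [0.5 * x + random.gauss(0, 1) for x in x_after]

errors = [beta_hat * x - y for x, y in zip(x_after, y_after)]
print(f"Mean forecast error after the change: {sum(errors) / len(errors):.2f}")
# The forecasts are systematically wrong, and more pre-change data would
# not have helped.
```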

No amount of data can capture the complexity of human systems – so we cannot make predictions beyond very short time horizons
Non-linear systems suffer from what mathematicians call “sensitive dependence on initial conditions”, popularly known as the butterfly effect. In a linear system measurement error is not a big problem. As long as a measurement falls within reasonable bounds of error we can make predictions within similarly reasonable boundaries because we know how much the error can be magnified. In a non-linear system, however, measurement error, even if utterly minuscule, can completely dominate a prediction. This is because the feedback loops in such a system continually transform and magnify the error until the resulting behaviour of the model is totally divorced from that of reality.

Human systems are so complex that we cannot measure them accurately: there will always be measurement error, no matter how much data we obtain. They are also extremely non-linear, which means our predictions will quickly deviate from reality.
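
As a rough sketch of this effect, here is a toy Python example using the logistic map as a stand-in for a non-linear system; the map and the size of the ‘measurement error’ are chosen purely for illustration.

```python
# A toy demonstration of sensitive dependence on initial conditions, using
# the logistic map x -> r * x * (1 - x) in its chaotic regime (r = 4).
# The "measurement error" is a deliberately tiny perturbation of the start.

r = 4.0
x_true = 0.2               # the "real" initial state of the system
x_measured = 0.2 + 1e-9    # our measurement, wrong by one part in a billion

for step in range(1, 51):
    x_true = r * x_true * (1 - x_true)
    x_measured = r * x_measured * (1 - x_measured)
    if step % 10 == 0:
        print(f"step {step:2d}: true={x_true:.6f}  "
              f"model={x_measured:.6f}  error={abs(x_true - x_measured):.6f}")

# Within a few dozen steps the error is as large as the quantities being
# predicted: the model's trajectory bears no relation to the real one.
```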

We don’t know how to handle uncertainty – so we cannot forecast probabilities
Our forecasting models are built on probabilities. We manage risk by assigning probabilities to all possible outcomes, based on historical data. What we can’t do is manage uncertainty. Uncertainty is different to risk because it describes a situation where the possible range of outcomes and/or their probabilities are not known. If we don’t know the probability of an outcome, or we don’t even know what the outcome is, then we can’t build it into a model. And if our models account for only a subset of possible outcomes, and assume that the probabilities of the past are unchanged in the future, then the probabilities they forecast will be wrong.
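
As a rough sketch of the distinction, here is a toy Python example with an invented historical record: the model can attach probabilities to the outcomes it has already seen (risk), but it has nothing at all to say about an outcome it has never seen (uncertainty).

```python
from collections import Counter

# A toy illustration of the risk/uncertainty distinction. The historical
# record is invented: a model built on it can attach probabilities to the
# outcomes it has seen, but it is silent about anything outside that set.

history = ["no_loss", "no_loss", "small_loss", "no_loss", "no_loss",
           "small_loss", "no_loss", "no_loss", "no_loss", "small_loss"]

counts = Counter(history)
model = {outcome: n / len(history) for outcome, n in counts.items()}
print("Forecast probabilities:", model)   # e.g. {'no_loss': 0.7, 'small_loss': 0.3}

# Tomorrow brings an outcome the historical record never contained.
tomorrow = "systemic_collapse"
print("Probability the model assigned to it:", model.get(tomorrow, 0.0))   # 0.0
```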

The Thinking Cycle: why education is a battleground, and what to do about it

Education is a battleground. Public statements on schooling frequently insult dissenters, whilst civil disagreements on Twitter spontaneously combust into name-calling and bullying that puts our profession to shame. As on many battlegrounds, the soldiers on this one are often guilty of forgetting why the battle is being fought. Quick to pounce on any indicator of hostility – an innocent deployment of a loaded word, or a well-meaning opinion on a contentious topic – we have created caricatures of ourselves, and use these shorthands to distinguish friend from foe.

The dominant fields of thought in education are popularly considered to be traditionalism and progressivism, and are generally defined in terms of the issues they disagree over. My contention is this:

Traditionalism and progressivism are manifestations of two competing approaches to scientific reasoning, and will become more pronounced as the scientific aspects of education develop further. To be able to navigate the disputes that will ensue, and know when to leave our natural positions in favour of compromise, we need to understand what these approaches are and how they shape our thinking. Both approaches have merits and flaws – to dismiss either outright is foolish.

Mechanisms vs systems

There are two approaches to scientific thought. The mechanistic approach seeks to break processes down into smaller chunks, understanding each step of a causal chain to learn the precise mechanism that leads from cause to effect. The systems approach believes that certain properties only emerge at the system level, so some knowledge cannot be gathered by looking at the smaller parts – no matter how closely you look.

Neither of these approaches is universally ‘correct’. Through history their respective powers have oscillated depending on which was most able to generate the next breakthrough. For example, physics, though dominated for much of history by the mechanistic drive to look at the next smallest thing, had a resurgence of systems thinking after the discovery of quantum theory. Without mechanistic thinking we would not know about the existence or behaviour of fundamental particles, but without systems thinking we would not be able to link their behaviour to the phenomena we see in the observable world. Systems biology is also undergoing a resurgence at the moment, and is proving an incredibly popular option on many university courses.

There are times when the dominant theory endorsed by one approach is simply wrong, and is eventually abandoned in favour of another. However this does not mean that the approach itself is wrong. Science progresses by resolving individual disputes and selecting the best theories, whilst preserving the approaches to thought themselves.

The dichotomy in education

The battleground in education is too often defined by the micro-level disagreements, which mask the underlying approaches to thought from which they spring. I prefer to follow these definitions:

Traditionalism: a preference for mechanistic thinking, or solving problems by looking at component parts to explore observable chains of cause and effect

Progressivism: a preference for systems thinking, or solving problems by looking at properties of entire systems rather than smaller causal chains

Mechanistic thinking: striving to understand the components of learning

Mechanistic thinking digs deeper into the processes of learning. Its natural instinct is towards experiments with falsifiable hypotheses, ideally working with quantifiable data. It believes that by learning more about the intricate parts of learning, we will be able to adapt our policies and practice to benefit children. Without mechanistic thinking we would lack these insights and be unable to intervene effectively in the processes of learning – just as early medicine was fixated on the whole system at the expense of understanding the causal chains.

However mechanistic thinking has its flaws. A whole is often more than the sum of its parts, with certain properties only emerging at the system level that are not observable in the mechanisms themselves. Mechanistic thinking risks missing these, maximising the effectiveness of individual processes without actually improving the end result for the child.

Systems thinking: striving to understand the child as a whole

Systems thinking looks at the overarching behaviour of the child as a whole. Its natural instinct is towards more qualitative research over longer periods of time, and it will happily look for effects that cannot be quantified. This does not mean that these effects cannot be understood scientifically, but that they need more complex techniques, because they concern more complex systems than the individual processes of mechanistic thinking. Without systems thinking we would lack insight into the emergent properties of systems (those that only appear at the system level) – which would leave our knowledge of mechanisms divorced from our observations of reality.

However systems thinking has its flaws. We can only learn so much about a system without understanding its components, and knowledge of the details does allow us to develop a richer understanding at the system level. By casting aside mechanistic inquiry as reductionist, systems thinking risks missing out on these details and so halting the growth of our understanding.

The thinking cycle

Every scientific field is subject to a natural “thinking cycle”, in which the influence of these two approaches oscillates and they alternate in dominance. Each takes its turn as the revolutionary that steps in and makes a much-needed change, overthrowing the complacent orthodoxy of the day. We need eras of mechanistic dominance to dig deeper and learn more about the processes of learning. Between these, however, we need eras of systems dominance to link our discoveries and make coherent theories of children’s whole development.

Learn to understand each other, but not necessarily to compromise

The message of this post is not to blandly compromise. There are correct theories and there are incorrect theories – the answer is rarely in the middle. However we do need to learn the discipline of adopting both approaches in our thinking. If mechanistic thinkers could step back and try to think of systems, and if systems thinkers could look deeper and try to think of mechanisms, we would take a great step forward in understanding each other and growing our knowledge about education.