# Statistics in the New HSC

One notable addition to the new HSC syllabus is the topic of Statistical Analysis, which has been missing from the Stage 6 Calculus courses since before I was born. This post is just a ramble through my thoughts on the topic: my initial reactions, a keen kindling of interest and some comments on the teaching of this topic across NSW.

### Initial Reactions

In my formative years of high school, the topic of Statistics often involved drawing up tables, tallying up personally irrelevant data, doing mundane calculations on a calculator, and getting out a protractor and ruler to draw summary statistic diagrams (which spells TREK!). I could not help but notice an adverse reaction towards the topic from everyone, teachers and students alike. People treated Statistics as a topic that you just had to do because it was related to Mathematics, even though it really wasn’t Mathematics. This attitude of course bled into my own attitude towards the topic. Even today, in the staffroom that I work in, there are many teachers who still hold a major dislike of Statistics.

It didn’t help that during my first year of University, I was completely turned off Statistics as my lecturer mumbled and fumbled his way through lectures with an accent and stutter. I was turned off learning anything new in what I perceived to be a completely useless and uninteresting subject. I also didn’t do so well, but that was probably due to my lack of interest in it. I didn’t do any more Statistics study in my degree after first year.

So when I heard that Statistics was being added into the HSC, I initially reacted with shock and disappointment. I could not believe it. I believe this reaction was due to the culmination of all the negative experiences I had with Statistics compounded with the fact that I had not done my own research and learning in the subject – I was a poor student of Statistics and now I have to teach it? Perhaps this was an experience that other teachers can relate to.

### A Keen Kindling of Interest

Such a negative and preconceived view that I had would require some eye-opening insight and mathematical maturity to overcome – and so begins my journey to gradually appreciate and nurture my interest in it.

In teaching the IB courses with Statistics, I started interacting more with senior level Statistics just out of necessity. Unfortunately, the classes I taught early in my career had to suffer from the bleeding effect of a negative attitude towards the topic, emanating from their teacher. Through teaching, I found that I really didn’t know much about the topic – I thought that what I learned in high school was sufficient, but there was a lot more that I didn’t really know as I hadn’t learned it for myself. I couldn’t even explain what a p-value meant!

From 2017, my head of department modelled to his staff what using data to make informed decisions in teaching looked like. This started with collecting and tallying up multiple choice responses in a half-yearly or yearly assessment task. The insights drawn from this exercise were surprising. Anecdotal evidence and judgement about what students need are often not enough – the cold hard facts from the data they produce are more important: you may have taught something, but have your students learned it?

We also had staff meetings where we looked at RAP data from HSC performance. Some of the trends we saw were surprising, such as how neither the state nor our school really understood significant figures. We definitely taught it, we thought it was easy, but obviously the students didn’t get it. And so we did something about it in our teaching of the next cohort.

The power of Statistical Analysis was so obvious.

I had merely been kept from doing anything with it by my stubborn preconceptions.

In doing more exercises like these, later building upon the analysis of cohort multiple choice, I also started adding more granularity to the data by examining class clusters and per-question responses. The performance insights as I looked from class to class were interesting to say the least. For example, my class did not do as well in the topic of Bits and Bytes (when I thought we nailed it). This informed my own teaching, and I went back to revisit that topic. Such insight wouldn’t have been possible if we had just looked at whole cohort data.

Further conversations with my head of department revealed to me that he loves Statistics. In the school, he is also the Director of Statistics and helps heads of department in other faculties to understand their data. I wanted to be like him because what he was doing was so beneficial and it was kinda cool too – helping others find insight in something that they would otherwise be unable to comprehend fully so they can do something about it. Before I knew it, data analysis had become something I am also passionate about. In wanting more, I started googling for courses and found that at the University of Sydney, the Graduate Certificate of Data Science offered what looked to be the perfect package for up-skilling my data analysis abilities. And so I applied and I am now studying part time there!

In my lectures I have learned what I should have learned in first year Statistics. It was probably due to a combination of my increased awareness of the relevance of Statistics and my own mathematical maturity that allowed me to learn this stuff properly now. Some of the material I learned throughout my undergraduate degree helped also (like Measure Theory). It also helped that my lecturer was quite audible!

So Statistics is being added into the HSC courses? Bring it on!

### Comments on the New Syllabus

When teaching the new syllabus, some of the material can be cross referenced with textbooks, worksheets and resources from the old syllabus. These dot points don’t really present a difficulty in the teaching.

However, there aren’t many resources readily available in the usual channels of published textbooks or worksheets for Statistics. These resources will need time to build up. Teachers will need time to up-skill their own understanding of the topic before they teach it – if they don’t, it’ll become a classic example of the blind leading the blind.

It also doesn’t help that the syllabus doesn’t define a random variable very clearly. It offers two statements:

> A random variable is a variable whose possible values are outcomes of a statistical experiment or a random phenomenon. (p. 73)

> Know that a random variable describes some aspect in a population from which samples can be drawn. (p. 47)

There are a few problems with this.

The first definition presented in the glossary is quite a self-referential definition. A random variable is a variable. Great.

The second definition in the content outcomes is merely a qualitative description. A vague understanding of what a random variable does rather than what it is.

#### So what is a random variable?

A random variable, usually denoted with a capital X, is actually a function that maps each element of a sample space to a real number (there’s a further generalisation in University mathematics, but for the sake of a high school understanding this is sufficient).

Basically, the domain of a random variable is the sample space and the range is a subset of the real numbers. If the set of values the random variable can take is countable, then it’s a discrete random variable, and if it takes values across an uncountable continuum (such as an interval of real numbers), then it’s continuous. (The definition of countable and uncountable is for another day.)

For example, suppose we roll a four-sided die with faces coloured blue, green, red and yellow. An example random variable that allows us to model the scenario is as follows:

$$X(\text{blue}) = 1, \quad X(\text{green}) = 2, \quad X(\text{red}) = 3, \quad X(\text{yellow}) = 4$$

There are of course infinitely many ways you can define this random variable. There’s nothing stopping you from defining it as $$X(\text{blue}) = 342, X(\text{green}) = 111, X(\text{red}) = -2313, X(\text{yellow}) = 99999$$ but that just makes things hard on yourself and most likely will have no benefit in modelling the situation.
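To make the "random variable is a function" idea concrete, here is a minimal Python sketch of the coloured die. Storing the mapping as a dictionary and the helper name `is_random_variable` are my own illustrative choices, not anything from the syllabus.

```python
# Sample space for the four-sided die with coloured faces.
sample_space = ["blue", "green", "red", "yellow"]

# A random variable is a function from outcomes to real numbers.
# Here it is stored as a dictionary, i.e. a lookup table.
X = {"blue": 1, "green": 2, "red": 3, "yellow": 4}

# A different, equally valid (but far less convenient) random variable.
Y = {"blue": 342, "green": 111, "red": -2313, "yellow": 99999}


def is_random_variable(mapping, outcomes):
    """Hypothetical helper: check that every outcome in the sample space
    is assigned a real number and nothing else is in the domain."""
    same_domain = set(mapping) == set(outcomes)
    real_valued = all(isinstance(v, (int, float)) for v in mapping.values())
    return same_domain and real_valued


print(is_random_variable(X, sample_space))
print(is_random_variable(Y, sample_space))
```

Both `X` and `Y` pass the check, which is the point: the numbers assigned are a modelling choice, and we simply pick the most convenient one.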

Now we can ask questions like “what is the probability of getting a green or a blue?” which translates to finding $$P(X < 3)$$.

So $$P(X<3) = P(X=2) + P(X=1) = P(\{\text{green}\}) + P(\{\text{blue}\})$$

Now if the die is fair, the random variable follows a uniform distribution, which means each outcome is equally likely. By mapping the sample space of blue, green, red and yellow to numbers (1, 2, 3, 4 in this example), we don’t have to concern ourselves with those words but can rather focus on the behaviour of the numbers under a certain distribution. Most of the time, when it’s numbers we are studying, we just let $$X(x) = x$$ for all $$x \in \Omega$$ where $$\Omega$$ is the sample space. This is why capital X and lower case x are sometimes used interchangeably in statistics questions and we still get the correct answers.
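As a rough sketch of the fair-die case, the probability of "green or blue" can be computed by adding up the probabilities of the outcomes that $$X$$ maps below 3. The names `P_outcome` and `prob` are hypothetical, and the code assumes the uniform distribution described above.

```python
from fractions import Fraction

# The random variable from the coloured-die example.
X = {"blue": 1, "green": 2, "red": 3, "yellow": 4}

# Assumption: the die is fair, so each of the 4 outcomes has probability 1/4.
P_outcome = {outcome: Fraction(1, len(X)) for outcome in X}


def prob(event):
    """P(event), where `event` is a condition on the value of X.

    Adds the probabilities of all outcomes whose X-value satisfies it.
    """
    return sum(
        (p for outcome, p in P_outcome.items() if event(X[outcome])),
        Fraction(0),
    )


p_less_than_3 = prob(lambda x: x < 3)
print(p_less_than_3)  # 1/2, i.e. P({blue}) + P({green}) = 1/4 + 1/4
```

Notice that once the colours have been mapped to numbers, the condition `x < 3` never mentions a colour at all, which is exactly the convenience the random variable buys us.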

I guess I should write more (clearly) on these kinds of things in the future.

All in all, I welcome the addition of Statistics to the new syllabus and I am excited to teach it to my students!