Basic Things I’ve Learnt About Statistics

Ufuoma Ejite
3 min read · Jun 3, 2021


Source: theconversation.com

It’s been quite a long time since I gave an update on my study path 😐 I’ve been doing a lot more studying than writing about my experiences. It’s been a wholesome journey altogether.

Anyway, in this blog post I’ll do a brief review of statistics (basically what you need to know as a data scientist).

Statistics generally entails the collection, organization, analysis, and interpretation of data. The data in question could be either population data or sample data; a sample is a subset of the population. When dealing with a population (whose size is denoted by N), the values you compute are called “parameters”; for a sample (whose size is denoted by n), they are called “statistics”. Two vital characteristics of a good sample are randomness (members are chosen by chance) and representativeness (the sample accurately reflects the members of the entire population, which avoids bias).
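To make the parameter/statistic distinction concrete, here is a minimal sketch using only Python's standard library. The population of ages is made up for illustration, and the sizes (N = 10,000, n = 200) are arbitrary assumptions.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: the ages of N = 10,000 people
population = [random.randint(18, 80) for _ in range(10_000)]
N = len(population)

# Simple random sample of n = 200 members, drawn by chance without replacement
sample = random.sample(population, k=200)
n = len(sample)

population_mean = statistics.mean(population)  # computed on the population: a parameter
sample_mean = statistics.mean(sample)          # computed on the sample: a statistic

print(f"N = {N}, population mean (parameter) = {population_mean:.2f}")
print(f"n = {n}, sample mean (statistic) = {sample_mean:.2f}")
```

Because the sample is drawn at random, its mean should land close to the population mean, though rarely exactly on it.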

Data cannot be analyzed properly if its type isn’t known. Data is classified by two criteria: type and level of measurement.

Based on type:

  1. Categorical data: this describes data in categories/groups. An example is a yes/no survey (“yes” is one group of data, and “no” is another).
  2. Numerical data: this describes numbers, and has two divisions: discrete data (countable values, e.g. number of children) and continuous data (any value within a range, e.g. height).

Levels of measurement:

  1. Qualitative data: of two types, nominal (cannot be ordered, e.g. seasons [autumn, spring, summer, winter], country of origin) and ordinal (can be ordered or ranked, e.g. stages of cancer)
  2. Quantitative data: of two types, interval (does not have a true zero, e.g. calendar year, time of day) and ratio (has a true zero, e.g. distance)
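As a rough illustration of why the level of measurement matters, the sketch below shows which operations make sense at each level. All values are invented for the example, and `stage_rank` is a hypothetical helper mapping, not part of any library.

```python
# Nominal: labels only, so the only sensible operation is counting or comparing for equality
countries = ["Nigeria", "Ghana", "Kenya", "Ghana"]
ghana_count = countries.count("Ghana")

# Ordinal: categories have an order, which we supply explicitly (hypothetical mapping)
stage_rank = {"Stage I": 1, "Stage II": 2, "Stage III": 3, "Stage IV": 4}
patients = ["Stage III", "Stage I", "Stage IV", "Stage II"]
ordered_patients = sorted(patients, key=stage_rank.get)

# Interval: differences are meaningful, ratios are not (no true zero)
year_gap = 2021 - 2011  # "10 years apart" makes sense; "2021 / 2011" does not

# Ratio: a true zero exists, so ratios are meaningful
distance_ratio = 10.0 / 5.0  # 10 km really is twice as far as 5 km

print(ghana_count, ordered_patients, year_gap, distance_ratio)
```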

TYPES OF STATISTICS

This post looks at three (3) areas of statistics: descriptive statistics, inferential statistics, and hypothesis testing (which is usually treated as a part of inferential statistics).

DESCRIPTIVE STATISTICS

This entails organizing, visualizing, and summarizing data. Visualizing data is much easier once we know the type of data and its level of measurement.

When working with categorical variables, you visualize using tools such as:

  1. Frequency distribution tables
  2. Bar charts (or column charts)
  3. Pie charts
  4. Pareto diagrams
  5. Cross tables
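A frequency distribution table and a cross table can be sketched with `collections.Counter` from the standard library. The survey responses below are hypothetical, invented only to show the mechanics.

```python
from collections import Counter

# Hypothetical survey: (gender, answer) pairs
responses = [
    ("F", "yes"), ("M", "no"), ("F", "yes"), ("M", "yes"),
    ("F", "no"), ("M", "no"), ("F", "yes"), ("M", "yes"),
]

# Frequency distribution table: count and relative frequency per answer
answer_counts = Counter(answer for _, answer in responses)
total = sum(answer_counts.values())
for answer, count in answer_counts.most_common():
    print(f"{answer:<4}{count:>3}{count / total:>8.1%}")

# Cross table: counts for every (gender, answer) combination
cross = Counter(responses)
genders = sorted({g for g, _ in responses})
answers = sorted({a for _, a in responses})
print("    " + "".join(f"{a:>5}" for a in answers))
for g in genders:
    print(f"{g:<4}" + "".join(f"{cross[(g, a)]:>5}" for a in answers))
```

Sorting the frequency table from most to least common, as `most_common()` does, is also the ordering a Pareto diagram uses for its bars.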

With numerical variables, we could use:

  1. Histograms
  2. Scatter plots
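A histogram is just a count of values per interval (bin). Here is a small text-only sketch, assuming made-up exam scores and an arbitrary bin width of 10; a plotting library would normally draw the bars for you.

```python
from collections import Counter

# Hypothetical exam scores
scores = [52, 67, 71, 48, 85, 90, 63, 77, 58, 69, 74, 81, 66, 95, 73]
bin_width = 10  # assumed bin width

# Map each score to the lower edge of its bin, then count scores per bin
bins = Counter((s // bin_width) * bin_width for s in scores)

# Print one row per bin, with one '#' per score in that bin
for lower in sorted(bins):
    print(f"{lower:>3}-{lower + bin_width - 1:<3} {'#' * bins[lower]}")
```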

Some Terms Used in Statistics

  1. Measures of central tendency: the mean (average), mode (the most frequently occurring value), and median (the middle number in an ordered dataset).
  2. Measure of asymmetry (skewness): indicates on which side of the distribution the data is concentrated, i.e. whether the longer tail lies to the left or to the right.
  3. Measures of variability: the variance, standard deviation, and coefficient of variation.
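These measures can be computed with Python's built-in `statistics` module. The dataset below is hypothetical, and the skewness formula used is the simple moment-based one (third central moment divided by the cubed population standard deviation), which is only one of several conventions.

```python
import statistics

data = [2, 3, 3, 4, 5, 6, 7, 8, 9, 13]  # hypothetical sample

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of variability (sample versions, dividing by n - 1)
variance = statistics.variance(data)
std_dev = statistics.stdev(data)
coeff_var = std_dev / mean  # standard deviation relative to the mean

# Measure of asymmetry: moment-based skewness (one common convention)
n = len(data)
third_moment = sum((x - mean) ** 3 for x in data) / n
skewness = third_moment / statistics.pstdev(data) ** 3

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.2f}, std dev={std_dev:.2f}, CV={coeff_var:.2f}")
print(f"skewness={skewness:.2f}")
```

Here the mean is larger than the median and the skewness is positive, which matches a dataset with a longer tail on the right (the value 13).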

INFERENTIAL STATISTICS

These are methods that rely on probability theory and distributions to estimate population values (parameters) based on sample data.
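One common inferential technique is a confidence interval for the population mean. The sketch below is an illustration under assumptions: the population of heights is simulated, the sample size of 100 is arbitrary, and the interval uses the normal approximation (a z-value of about 1.96 for 95% confidence).

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical population of heights (cm); in practice we could not see all of it
population = [random.gauss(170, 10) for _ in range(50_000)]
sample = random.sample(population, k=100)

sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# z-value leaving 2.5% in each tail, for a 95% confidence level
z = NormalDist().inv_cdf(0.975)

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

print(f"sample mean = {sample_mean:.2f}")
print(f"95% confidence interval for the population mean: ({lower:.2f}, {upper:.2f})")
```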

HYPOTHESIS TESTING

A hypothesis is an idea that can be tested. There are two types of hypothesis: the null hypothesis (H0) and the alternative hypothesis (H1 or Ha). The null hypothesis is the statement being tested, while the alternative hypothesis is the competing claim that covers every other possibility. In other words, the null hypothesis could be likened to the “status quo”, while the alternative would be the change/innovation challenging the status quo. Or, we could say simply that the null hypothesis is the “statement we are trying to reject” and the alternative hypothesis is “our personal opinion”. We fail to reject the null hypothesis when the sample result is close to what the null hypothesis predicts (the p-value is above the chosen significance level), and we reject it when the result falls far enough away (the p-value is below that level).
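As a rough sketch of that decision rule, here is a two-sided one-sample z-test built on summary statistics. Every number is an assumption made for illustration: a null hypothesis that the mean delivery time is 30 minutes, a sample of 100 deliveries with mean 31.2 and standard deviation 5.0, and a significance level of 0.05. The normal approximation is only reasonable for fairly large samples.

```python
import math
from statistics import NormalDist

# H0: the population mean is 30 minutes; H1: it is not (hypothetical scenario)
mu_0 = 30.0
alpha = 0.05  # chosen significance level

# Hypothetical sample summary
n = 100
sample_mean = 31.2
sample_std = 5.0

# Test statistic: how many standard errors the sample mean is from mu_0
standard_error = sample_std / math.sqrt(n)
z = (sample_mean - mu_0) / standard_error

# Two-sided p-value under the normal approximation
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_null = p_value < alpha
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_null}")
```

With these made-up numbers the sample mean sits 2.4 standard errors from the hypothesized value, so the null hypothesis is rejected at the 0.05 level.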

In hypothesis testing, two (2) types of error can occur:

  1. Type 1 error (or a false positive): when you reject a TRUE null hypothesis
  2. Type 2 error (or a false negative): when you fail to reject a FALSE null hypothesis
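Both error types can be seen in a small simulation. This is a sketch under assumptions: the data is simulated from a normal distribution, the test is a two-sided z-test using the normal approximation, and the helper names `rejects_null` and `rejection_rate` are made up for this example.

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(0)  # fixed seed so the sketch is reproducible


def rejects_null(sample, mu_0, alpha=0.05):
    """Two-sided z-test (normal approximation) of H0: population mean == mu_0."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    z = (statistics.mean(sample) - mu_0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def rejection_rate(true_mean, mu_0, trials=2000, n=50):
    """Fraction of simulated samples for which H0 is rejected."""
    rejections = sum(
        rejects_null([random.gauss(true_mean, 1) for _ in range(n)], mu_0)
        for _ in range(trials)
    )
    return rejections / trials


# H0 is TRUE here, so every rejection is a Type 1 error (false positive)
type_1_rate = rejection_rate(true_mean=0.0, mu_0=0.0)

# H0 is FALSE here, so every failure to reject is a Type 2 error (false negative)
type_2_rate = 1 - rejection_rate(true_mean=0.3, mu_0=0.0)

print(f"Type 1 error rate: {type_1_rate:.3f}")
print(f"Type 2 error rate: {type_2_rate:.3f}")
```

The Type 1 error rate should come out near the chosen significance level of 0.05, while the Type 2 error rate depends on how far the true mean is from the hypothesized one and on the sample size.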
