Classification of Data | StudyTution

A data set is arranged with columns representing variables and rows representing the cases or observations.

Categorical (qualitative) and numerical (quantitative) data

  • Data is broadly classified into two categories: categorical data and numerical data.
  • Categorical data are also called qualitative variables. A categorical variable identifies the group to which each observation belongs, that is, its group membership.

What do we mean by group membership?

  • Going back to our student data, let us look at gender.
  • Gender is a categorical variable.
  • It has two categories, and any observation can be classified into one of them.
  • So, gender records a group membership.
  • Similarly, board has three categories: State Board, ICSE, and CBSE. Any observation can be placed into one of these three groups.
  • In each case we are giving an observation membership of a particular group within that variable.
  • So, every categorical variable is made up of groups.

  • Now let us go to the hospital data.
  • Look at blood group: every patient is O positive, O negative, B positive, A positive, A negative, and so on.
  • There are many blood groups, so blood group is again a categorical variable, just as gender is.
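
Group membership can be sketched in a few lines of Python. The records below are hypothetical (the names and values are made up for illustration), and `group_membership` is an illustrative helper, not part of any library. Each dictionary is one row (an observation) and each key is one column (a variable).

```python
from collections import Counter

# Hypothetical student records: one dict per observation (row),
# one key per variable (column). All values are made up.
students = [
    {"name": "A", "gender": "F", "board": "CBSE"},
    {"name": "B", "gender": "M", "board": "ICSE"},
    {"name": "C", "gender": "F", "board": "State Board"},
    {"name": "D", "gender": "M", "board": "CBSE"},
]


def group_membership(rows, variable):
    """Map each category of `variable` to the names of the rows in that group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[variable], []).append(row["name"])
    return groups


# Every student falls into exactly one group of the "board" variable.
board_groups = group_membership(students, "board")

# Counting how many observations belong to each category.
board_counts = Counter(row["board"] for row in students)
```

Every observation lands in exactly one group, which is what makes board a categorical variable with three categories and gender one with two.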

Numerical data

  • Numerical data are also called quantitative variables.
  • Here we can talk about the numerical properties of the data.
  • We need to understand the scale that defines the numerical data.
  • As already emphasized, numerical data take units.
  • Within numerical data, we can have discrete data and continuous data.
  • We need to ensure that the variable is measured across all observations and shares a common unit.
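
The two checks above, a common unit and discrete versus continuous, can be sketched as follows. The measurements are hypothetical, and both helper functions are illustrative names. The whole-number test is only a rough heuristic, on the assumption that counts are stored as integers and measurements as floats.

```python
# Hypothetical measurements stored as (value, unit) pairs. All values are made up.
heights = [(152.5, "cm"), (160.0, "cm"), (171.2, "cm")]
siblings = [(0, "count"), (2, "count"), (1, "count")]
mixed = [(152.5, "cm"), (5.2, "ft")]


def shares_common_unit(measurements):
    """Return True when every observation is recorded in the same unit."""
    units = {unit for _, unit in measurements}
    return len(units) == 1


def numerical_kind(measurements):
    """Rough heuristic: all-integer values suggest discrete data,
    anything else is treated as continuous."""
    if all(isinstance(value, int) for value, _ in measurements):
        return "discrete"
    return "continuous"
```

Here `mixed` fails the common-unit check because it combines centimetres and feet, so its values would need converting before any numerical summary makes sense.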

Time Series Data and Cross-Sectional Data

  • Apart from categorical data and numerical data, we also have what are referred to as time series data.
  • In time series data the variable stays the same (for example, the quantity of potato procured) and its values are recorded over time.
  • Cross-sectional data, in contrast, are data observed at the same point in time.
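
The contrast can be illustrated with a small sketch. The procurement records are hypothetical (months, items, and quantities are made up), and `time_series` and `cross_section` are illustrative helper names.

```python
# Hypothetical procurement records: (month, vegetable, quantity in kg).
# All values are made up for illustration.
records = [
    ("2023-01", "potato", 120),
    ("2023-02", "potato", 135),
    ("2023-03", "potato", 110),
    ("2023-01", "onion", 80),
    ("2023-01", "tomato", 60),
]


def time_series(rows, item):
    """Same variable followed over time: quantities of `item`, ordered by month."""
    return sorted((month, qty) for month, name, qty in rows if name == item)


def cross_section(rows, month):
    """Different cases observed at the same time: each item's quantity in `month`."""
    return {name: qty for m, name, qty in rows if m == month}


potato_over_time = time_series(records, "potato")
january_snapshot = cross_section(records, "2023-01")
```

`potato_over_time` fixes the variable and lets time vary. `january_snapshot` fixes the time and lets the cases vary.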

Summary

  • To summarize, given data, we classify it broadly as categorical or numerical.
  • So, whenever we are presented with a data set, we should be able to classify every variable in it as either a categorical variable or a numerical variable.
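
As a first pass, this classification can be sketched in code. The patient records are hypothetical, and `classify_variables` is an illustrative helper that assumes a column is numerical when all its values are stored as numbers.

```python
from numbers import Number

# Hypothetical patient records. All values are made up for illustration.
patients = [
    {"gender": "F", "blood_group": "O+", "age": 34, "weight_kg": 61.5},
    {"gender": "M", "blood_group": "B+", "age": 52, "weight_kg": 78.0},
    {"gender": "F", "blood_group": "A-", "age": 27, "weight_kg": 55.2},
]


def classify_variables(rows):
    """Label each column 'numerical' if all of its values are numbers,
    otherwise 'categorical'. Booleans are not counted as numbers."""
    labels = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        is_numeric = all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        labels[column] = "numerical" if is_numeric else "categorical"
    return labels


variable_types = classify_variables(patients)
```

A check on the stored type is only a starting point. A variable coded with numbers, such as a roll number or a postal code, is still categorical, so the final call needs judgment about what the variable means.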