Every day we come across a lot of information in the form of facts, numerical figures, tables, graphs, etc. These are provided by newspapers, television, magazines and other means of communication. They may relate to batting or bowling averages in cricket, profits of a company, temperatures of cities, expenditure in various sectors of a five-year plan, polling results, and so on. These facts or figures, numerical or otherwise, collected with a definite purpose are called data. Data is the plural form of the Latin word datum. Of course, the word ‘data’ is not new for you. You have studied data and data handling in earlier classes.
Our world is becoming more and more information oriented. Every part of our lives utilises data in one form or the other. So, it becomes essential for us to know how to extract meaningful information from such data. This extraction of meaningful information is studied in a branch of mathematics called Statistics.
The word ‘statistics’ appears to have been derived from the Latin word ‘status’, meaning ‘a (political) state’. In its origin, statistics was simply the collection of data on different aspects of the life of people, useful to the State. Over time, however, its scope broadened, and statistics began to concern itself not only with the collection and presentation of data but also with the interpretation of data and the drawing of inferences from the data. Statistics deals with the collection, organisation, analysis and interpretation of data. The word ‘statistics’ has different meanings in different contexts. Let us observe the following sentences:
In the first sentence, statistics is used in a plural sense, meaning numerical data. These may include the number of educational institutions in India, literacy rates of various states, etc. In the second sentence, the word ‘statistics’ is used as a singular noun, meaning the subject which deals with the collection, presentation and analysis of data as well as the drawing of meaningful conclusions from the data.
In this chapter, we shall briefly discuss all these aspects of data.
Let us begin with an exercise on gathering data by performing the following activity.
Activity : Divide the students of your class into four groups. Allot each group the work of collecting one of the following kinds of data:
Let us now look at the data the groups have gathered. How did each group collect its data?
In the first case, when the information was collected by the investigator herself or himself with a definite objective in mind, the data obtained is called primary data.
In the second case, when the information was gathered from a source which already had the information stored, the data obtained is called secondary data. Such data, which has been collected by someone else in another context, needs to be used with great care, ensuring that the source is reliable.
By now, you must have understood how to collect data and distinguish between primary and secondary data.
As soon as the work related to the collection of data is over, the investigator has to find ways to present it in a form which is meaningful, easily understood and shows its main features at a glance. Let us now recall the various ways of presenting data through some examples.
Example : Consider the marks obtained by 10 students in a mathematics test as given below:
55 36 95 73 60 42 25 78 75 62
The data in this form is called raw data. By looking at it in this form, can you find the highest and the lowest marks? Did it take you some time to search for the maximum and minimum scores? Wouldn’t it be less time-consuming if these scores were arranged in ascending or descending order? So, let us arrange the marks in ascending order:
25 36 42 55 60 62 73 75 78 95
Now, we can clearly see that the lowest marks are 25 and the highest marks are 95. The difference between the highest and the lowest values in the data is called the range of the data. So, the range in this case is 95 – 25 = 70.
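The two steps just described, arranging the raw marks in ascending order and subtracting the lowest value from the highest, can be sketched in Python as follows. The function name `data_range` is our own, chosen only for illustration.

```python
def data_range(values):
    """Return the range of the data: the highest value minus the lowest value."""
    ordered = sorted(values)          # arrange the data in ascending order
    return ordered[-1] - ordered[0]   # last (highest) minus first (lowest)

# Marks obtained by 10 students in the mathematics test
marks = [55, 36, 95, 73, 60, 42, 25, 78, 75, 62]

print(sorted(marks))      # [25, 36, 42, 55, 60, 62, 73, 75, 78, 95]
print(data_range(marks))  # 70
```

Sorting is not strictly needed for the range alone (`max(marks) - min(marks)` gives the same 70), but the sorted list also makes the data easier to read, which is the point being made here.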
Presenting data in ascending or descending order can be quite time-consuming, particularly when the number of observations in an experiment is large, as in the case of the next example.
Example : Consider the marks obtained (out of 100 marks) by 30 students of Class IX of a school:
10 20 36 92 95 40 50 56 60 70
92 88 80 70 72 70 36 40 36 40
92 40 50 50 56 60 70 60 60 88
Recall that the number of students who have obtained a certain number of marks is called the frequency of those marks. For instance, 4 students got 70 marks. So the frequency of 70 marks is 4. To make the data more easily understandable, we write it in a table, as given below:
Marks    Number of students (frequency)
10       1
20       1
36       3
40       4
50       3
56       2
60       4
70       4
72       1
80       1
88       2
92       3
95       1
Total    30
The table above is called an ungrouped frequency distribution table, or simply a frequency distribution table. Note that you can also use tally marks when preparing these tables, as in the next example.
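Building an ungrouped frequency distribution is simply counting how often each value occurs. As a sketch, Python's standard `collections.Counter` can do this tally for the 30 marks listed earlier; the printed layout is our own choice and not part of the original text.

```python
from collections import Counter

# Marks (out of 100) obtained by the 30 students of Class IX
marks = [
    10, 20, 36, 92, 95, 40, 50, 56, 60, 70,
    92, 88, 80, 70, 72, 70, 36, 40, 36, 40,
    92, 40, 50, 50, 56, 60, 70, 60, 60, 88,
]

# frequency[m] is the number of students who obtained m marks
frequency = Counter(marks)

# Print the frequency distribution with the marks in ascending order
print("Marks  Number of students")
for mark in sorted(frequency):
    print(f"{mark:<7}{frequency[mark]}")
print(f"{'Total':<7}{sum(frequency.values())}")
```

For example, `frequency[70]` is 4, matching the frequency of 70 marks stated in the text, and the frequencies add up to the 30 students.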
Example: 100 plants each were planted in 100 schools during Van Mahotsava. After one month, the number of plants that survived in each school was recorded as follows:
95 67 28 32 65 65 69 33 98 96
76 42 32 38 42 40 40 69 95 92
75 83 76 83 85 62 37 65 63 42
89 65 73 81 49 52 64 76 83 92
93 68 52 79 81 83 59 82 75 82
86 90 44 62 31 36 38 42 39 83
87 56 58 23 35 76 83 85 30 68
69 83 86 43 45 39 83 75 66 83
92 75 89 66 91 27 88 89 93 42
53 69 90 55 66 49 52 83 34 36
To present such a large amount of data so that a reader can make sense of it easily, we condense it into groups like 20-29, 30-39, …, 90-99 (since our data ranges from 23 to 98). These groupings are called ‘classes’ or ‘class-intervals’, and their size is called the class-size or class width, which is 10 in this case. In each of these classes, the least number is called the lower class limit and the greatest number is called the upper class limit, e.g., in 20-29, 20 is the ‘lower class limit’ and 29 is the ‘upper class limit’.
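For whole-number data, finding the class that an observation belongs to is a small piece of arithmetic: count how many full class widths the value lies above the first lower class limit. The function `class_interval` below is a hypothetical helper written for illustration; it assumes whole-number observations that are not smaller than the first lower limit.

```python
def class_interval(value, start=20, width=10):
    """Return (lower class limit, upper class limit) of the class containing value.

    Classes run start to start+width-1, then start+width to start+2*width-1, ...,
    e.g. 20-29, 30-39, ... for start=20 and width=10 (whole-number data only).
    """
    lower = start + ((value - start) // width) * width  # whole widths above start
    upper = lower + width - 1                           # last whole number in the class
    return lower, upper

print(class_interval(23))  # (20, 29)
print(class_interval(98))  # (90, 99)
print(class_interval(30))  # (30, 39)
```

The smallest observation, 23, lands in 20-29 and the largest, 98, in 90-99, so the classes 20-29 to 90-99 cover all the data.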
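Also, recall that using tally marks, the data above can be condensed in tabular form as follows:
Number of plants survived    Number of schools
(class-interval)             (frequency)
20-29                        3
30-39                        14
40-49                        12
50-59                        8
60-69                        18
70-79                        10
80-89                        23
90-99                        12
Total                        100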
Presenting data in this form simplifies and condenses the data and enables us to observe certain important features at a glance. This is called a grouped frequency distribution table. Since each school planted 100 plants, 50% or more of the plants survived in every school that falls in the classes 50-59 to 90-99. Here we can easily observe that this happened in 8 + 18 + 10 + 23 + 12 = 71 schools.
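The grouping and the count of 71 schools can be checked mechanically. The sketch below, with variable names of our own choosing, places each of the 100 observations in its class-interval (first lower limit 20, class-size 10), counts the schools in each class, and then adds the frequencies of the classes whose lower limit is 50 or more. It assumes whole-number observations of at least 20, as in this data.

```python
from collections import Counter

# Number of plants (out of 100) that survived in each of the 100 schools
survived = [
    95, 67, 28, 32, 65, 65, 69, 33, 98, 96,
    76, 42, 32, 38, 42, 40, 40, 69, 95, 92,
    75, 83, 76, 83, 85, 62, 37, 65, 63, 42,
    89, 65, 73, 81, 49, 52, 64, 76, 83, 92,
    93, 68, 52, 79, 81, 83, 59, 82, 75, 82,
    86, 90, 44, 62, 31, 36, 38, 42, 39, 83,
    87, 56, 58, 23, 35, 76, 83, 85, 30, 68,
    69, 83, 86, 43, 45, 39, 83, 75, 66, 83,
    92, 75, 89, 66, 91, 27, 88, 89, 93, 42,
    53, 69, 90, 55, 66, 49, 52, 83, 34, 36,
]

START, WIDTH = 20, 10  # first lower class limit and class-size

# Key each observation by the lower limit of its class, then count per class
grouped = Counter(START + (n - START) // WIDTH * WIDTH for n in survived)

for lower in sorted(grouped):
    print(f"{lower}-{lower + WIDTH - 1}: {grouped[lower]}")

# 50 or more surviving plants out of 100 means a survival rate of at least 50%
at_least_half = sum(count for lower, count in grouped.items() if lower >= 50)
print(at_least_half)  # 71
```

Running it prints the eight classes from 20-29 to 90-99 with frequencies 3, 14, 12, 8, 18, 10, 23 and 12, followed by 71.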
We observe that the classes in the table above are non-overlapping. Note that we could also have made more classes of a smaller size, or fewer classes of a larger size. For instance, the intervals could have been 22-26, 27-31, and so on. So, there is no hard and fast rule about this, except that the classes should not overlap.
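The alternative grouping mentioned above (22-26, 27-31, and so on) can be generated the same way for any starting point and class-size. The helper `make_classes` is hypothetical and assumes whole-number data; it starts each class one above the previous upper limit, which is exactly what keeps the classes from overlapping.

```python
def make_classes(start, width, highest):
    """List whole-number classes (lower, upper) of the given width, beginning
    at start and continuing until the class that contains highest."""
    classes = []
    lower = start
    while lower <= highest:
        classes.append((lower, lower + width - 1))
        lower += width  # next class begins one after the previous upper limit
    return classes

# Class-size 5 starting at 22, up to the largest observation, 98
narrow = make_classes(start=22, width=5, highest=98)
print(narrow[:3])   # [(22, 26), (27, 31), (32, 36)]
print(len(narrow))  # 16

# Non-overlapping: each lower limit is exactly one more than the previous upper limit
print(all(b[0] == a[1] + 1 for a, b in zip(narrow, narrow[1:])))  # True
```

With `make_classes(20, 10, 98)` the same helper returns the eight classes 20-29 to 90-99 used in the grouped table, so the choice of class-size only changes how many classes there are, not whether they overlap.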