Geography: Data Collection, Processing and Analysis: Processing of Data, Processing of Primary Data
Processing of Data
The processing of data is an essential step in streamlining the facts and writing a field report. A separate account of processing is given here.
Processing of Primary Data
The primary data collected from the field remains in the raw form of statements, digits, and qualitative terms. Raw data contains errors, omissions, and inconsistencies, and it requires correction after careful scrutiny of the completed questionnaires. The following steps are involved in the processing of primary data.
Editing of Data: Data can be edited at two stages: field editing and post-field editing. Field editing is a review of the reporting by the investigator to complete what was written in abbreviated form while interviewing the respondent. Post-field editing is carried out when the field survey is completed and all the schedules have been collected together. This type of editing requires a thorough review of all the forms.
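Part of post-field editing can be mechanized once the schedules have been entered into a computer. The following is a minimal sketch in Python, assuming each schedule is stored as a dictionary; the field names (`household_id`, `persons`, `males`, `females`, `education`) and the sample answers are hypothetical. It lists, for each household, the questions left blank so that they can be checked against the original form.

```python
# Hypothetical list of questions every schedule must answer.
REQUIRED_FIELDS = ["household_id", "persons", "males", "females", "education"]

def find_omissions(schedules):
    """Return {household_id: [missing fields]} for incomplete schedules."""
    problems = {}
    for schedule in schedules:
        # A field counts as omitted if it is absent or left as an empty string.
        missing = [
            field for field in REQUIRED_FIELDS
            if schedule.get(field) in (None, "")
        ]
        if missing:
            problems[schedule.get("household_id", "?")] = missing
    return problems

# Hypothetical completed schedules; 02 has a blank answer, 03 skips a question.
schedules = [
    {"household_id": "01", "persons": 20, "males": 12, "females": 8, "education": "D"},
    {"household_id": "02", "persons": 17, "males": 9, "females": "", "education": "B"},
    {"household_id": "03", "persons": 9, "males": 4, "females": 5},
]

print(find_omissions(schedules))  # {'02': ['females'], '03': ['education']}
```

A value of `0` is treated as a genuine answer, not an omission, because only absent fields and empty strings are flagged.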
The Coding of Data: To keep the responses within a limited number of alternatives, we assign alphabetical or numerical symbols, or both, to the answers. The alternatives must be mutually exclusive, i.e. each response should fit into one category only. This form of processing is known as coding. For example, for a question on educational qualification, the alternative choices given are: Uneducated; Below Matriculation; Matriculation and above but below Graduate; Graduate and above; Technical Diploma; Technical Degree.
The alphabetical codes assigned to these alternatives could be A, B, C, D, E, and F. Similarly, the numerical codes could be 1, 2, 3, 4, 5, and 6 respectively. Coding is necessary for efficient analysis. Although coding is part of the formulation of the questionnaire, the responses still need to be coded and finalized at the processing stage. This simplifies the transfer of data from the questionnaires to the master chart, a two-dimensional chart in which observations are entered on one axis (X) and details of the responses on the other axis (Y). Calculations become easier and quicker if the details are coded and entered in the master chart or fed into computers.
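The two coding schemes described above can be sketched in Python as simple lookup tables. This is an illustrative sketch only: the category list comes from the example in the text, while the sample answers and the helper name `code_responses` are hypothetical. Raising an error for an unlisted answer is a design choice that exposes responses falling outside the coded alternatives, which then need editing.

```python
# Education categories from the example in the text, in questionnaire order.
EDUCATION_CATEGORIES = [
    "Uneducated",
    "Below Matriculation",
    "Matriculation and above but below Graduate",
    "Graduate and above",
    "Technical Diploma",
    "Technical Degree",
]

# Alphabetical codes A-F and numerical codes 1-6, assigned in the same order.
ALPHA_CODES = {cat: chr(ord("A") + i) for i, cat in enumerate(EDUCATION_CATEGORIES)}
NUMERIC_CODES = {cat: i + 1 for i, cat in enumerate(EDUCATION_CATEGORIES)}

def code_responses(responses, scheme):
    """Replace each verbal answer with its code; an unlisted answer raises KeyError."""
    return [scheme[answer] for answer in responses]

# Hypothetical answers from three respondents.
answers = ["Graduate and above", "Uneducated", "Technical Diploma"]
print(code_responses(answers, ALPHA_CODES))    # ['D', 'A', 'E']
print(code_responses(answers, NUMERIC_CODES))  # [4, 1, 5]
```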
Organization of Data: The data collected through different sources should be organized. The first task in this regard is to develop a master chart. For example, in a local area survey, individual households are recorded in rows and the details of population, functions, facilities, amenities, etc. in columns. Thus, a large chart is prepared that contains practically all the relevant data. Finally, the totals of the rows and columns are cross-checked. Information arranged in ascending order is known as an array of data. The set of information related to a specific entity is called a field.
The following illustration demonstrates the way data is organized.
Household | P | M | F | Agri | Ind | Trade | Service | T.V. | Phone | Vehicle
01 | 20 | 12 | 08 | 5 | - | 1 | 12 | 1 | 1 | 1 Scooter
02 | 17 | 09 | 08 | 6 | - | 1 | 1 | 1 | 1 | 1 Scooter
03 | 9 | 04 | 05 | - | - | 2 | 1 | 1 | 2 | 1 Car and 1 Scooter
04 | 12 | 06 | 06 | 1 | 2 | 1 | 1 | | | 1 Scooter
05 | 13 | 07 | 06 | 2 | - | - | 2 | 1 | - | 1 Scooter
Population details: P = total persons, M = males, F = females. Functions: Agri, Ind, Trade, Service. Facilities: T.V., Phone, Vehicle.
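The cross-checking of row and column totals can be sketched in Python using the population columns of the illustrative master chart. This is a minimal sketch: representing the chart as a dictionary of rows and the helper names `row_errors` and `column_totals` are assumptions made for illustration. Each row is checked so that males plus females equals total persons, the column totals are checked in the same way, and one column is sorted to form an array.

```python
# Population columns (P, M, F) of the illustrative master chart.
master_chart = {
    "01": {"P": 20, "M": 12, "F": 8},
    "02": {"P": 17, "M": 9, "F": 8},
    "03": {"P": 9, "M": 4, "F": 5},
    "04": {"P": 12, "M": 6, "F": 6},
    "05": {"P": 13, "M": 7, "F": 6},
}

def row_errors(chart):
    """Households whose males + females do not add up to total persons."""
    return [hh for hh, row in chart.items() if row["M"] + row["F"] != row["P"]]

def column_totals(chart):
    """Sum each column over all households."""
    totals = {}
    for row in chart.values():
        for column, value in row.items():
            totals[column] = totals.get(column, 0) + value
    return totals

totals = column_totals(master_chart)
print(row_errors(master_chart))                  # []
print(totals)                                    # {'P': 71, 'M': 38, 'F': 33}
print(totals["M"] + totals["F"] == totals["P"])  # True

# Arranging one column in ascending order gives an array of the data.
array_p = sorted(row["P"] for row in master_chart.values())
print(array_p)  # [9, 12, 13, 17, 20]
```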
Classification of Data: A huge volume of raw data collected through a field survey needs to be grouped according to similar details of individual responses. The process of organizing data into groups and classes on the basis of certain characteristics is known as the classification of data. Classification helps in making comparisons among the categories of observations. It can be done either according to numerical characteristics or according to attributes. Numerical characteristics are classified on the basis of class intervals. For example, monthly income up to Rs. 2000 may form one group, and the number of respondents reporting income in that range forms its frequency.
Similarly, further groups can be formed, such as the income group Rs. 2000 to Rs. 3000 and so on. The number of items entered against each class is known as the frequency of the class. Every class has a lower and an upper limit. The difference between the upper and lower limits is known as the range, or class interval, of the class. Class intervals are mostly kept equal. Sometimes, when the range of the data is too large, class intervals are not kept equal; instead, they are based on perceptible gaps in the array of the data. For example, settlements having less than 2000 population can be grouped as below 200, 200-500, 500-1000, and so on. In this grouping, the class intervals are unequal.
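Building a frequency distribution from class limits can be sketched in Python with the standard `bisect` module. This is a minimal sketch, not a prescribed method: the incomes and settlement populations are hypothetical, and the convention that a value equal to a class limit falls into the higher class (lower limit included, upper limit excluded) is an assumption, since the text does not say how boundary values are treated. The same function handles equal and unequal class intervals.

```python
from bisect import bisect_right

def frequency_distribution(values, limits):
    """Count values per class.

    `limits` are ascending class limits, e.g. [0, 2000, 3000] gives the
    classes 0-2000 and 2000-3000. Each class includes its lower limit and
    excludes its upper limit; values outside all classes are ignored.
    """
    counts = [0] * (len(limits) - 1)
    for v in values:
        i = bisect_right(limits, v) - 1  # class whose lower limit <= v
        if 0 <= i < len(counts):
            counts[i] += 1
    labels = [f"{lo}-{hi}" for lo, hi in zip(limits, limits[1:])]
    return dict(zip(labels, counts))

# Equal class intervals: hypothetical monthly incomes in rupees.
incomes = [1200, 1500, 1800, 2000, 2500, 3100, 3500, 3999, 2750, 1900]
print(frequency_distribution(incomes, [0, 2000, 3000, 4000]))
# {'0-2000': 4, '2000-3000': 3, '3000-4000': 3}

# Unequal class intervals: hypothetical settlement populations.
populations = [150, 180, 320, 450, 480, 760, 1250, 90]
print(frequency_distribution(populations, [0, 200, 500, 1000, 2000]))
# {'0-200': 3, '200-500': 3, '500-1000': 1, '1000-2000': 1}
```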
The data is also classified on the following bases.
Descriptive characteristics, for example land holding, sex, caste, and so on.
Time, situation, and area specific characteristics.
Nature of data as continuous or discrete.
Presentation of Data
The presentation of data can be tabular, statistical, or cartographic. In the tabular form of presentation, data related to different variables is classified and compared. Various statistical techniques are available to derive accurate and precise results. Since there is a wide range of techniques, each with its own limitations, an appropriate technique must be selected for the purpose at hand.
Graphs, charts, diagrams, and maps are the various forms of cartographic presentation, in which the data is transformed into a cartographic form for visual presentation. A brief account of tabular, statistical, and cartographic presentation of data is given below.
Tabular Presentation: It is used to summarize data in a condensed form. It helps in the analysis of trends, relationships, and other characteristics of the given data. Simple tabulation is used to answer questions related to one characteristic of the data, whereas complex tabulation is used to present several interrelated characteristics. Complex tabulation results in two-way or three-way tables, which give information about two or three interrelated characteristics of the data.
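A two-way table can be sketched in Python by counting how often each pair of characteristics occurs. This is an illustrative sketch: the respondents, their sex and their coded education levels are hypothetical, and `two_way_table` is a helper written for this example rather than a library function. It uses `collections.Counter`, which returns 0 for combinations that never occur, so empty cells appear as zeros.

```python
from collections import Counter

def two_way_table(records, row_key, col_key):
    """Cross-tabulate records: count how often each (row, column) pair occurs."""
    counts = Counter((r[row_key], r[col_key]) for r in records)
    rows = sorted({r[row_key] for r in records})
    cols = sorted({r[col_key] for r in records})
    return rows, cols, {(a, b): counts[(a, b)] for a in rows for b in cols}

# Hypothetical respondents: sex and coded education level (1-6).
respondents = [
    {"sex": "M", "education": 2}, {"sex": "F", "education": 1},
    {"sex": "M", "education": 4}, {"sex": "F", "education": 2},
    {"sex": "M", "education": 2}, {"sex": "F", "education": 4},
]

rows, cols, table = two_way_table(respondents, "sex", "education")
print("Sex | " + " | ".join(str(c) for c in cols) + " | Total")
for r in rows:
    cells = [table[(r, c)] for c in cols]
    print(f"{r} | " + " | ".join(str(n) for n in cells) + f" | {sum(cells)}")
# Sex | 1 | 2 | 4 | Total
# F | 1 | 1 | 1 | 3
# M | 0 | 2 | 1 | 3
```

A three-way table follows the same idea with a third key in each counted tuple.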
The following points may be kept in mind while constructing a table.
To make a table easily understandable without the text, a clear and concise title should be given just above the frame of the table.
Each table should be numbered to facilitate easy reference.
Both columns and rows of the table should have a short and clear caption. They may also be numbered to facilitate the reference.
The units of measurement (production units such as kg, quintals, or tonnes; areal units such as hectares or square kilometres) should be indicated. If the table relates to a specific time, it must be mentioned. Tables should be logical, clear, and as simple as possible.
The source of data must be indicated just below the body of the table.
Abbreviations and explanatory footnotes, if any, should be placed beneath the table, but they should be kept to the minimum.
The sequence of data categories in a table may follow alphabetical, chronological, or geographical order, or be arranged according to the magnitude of the items presented.
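The checklist above can be illustrated with a small plain-text layout routine. This is a hypothetical helper (`render_table` is not part of any library) and the district names are placeholders; the rice figures are the ones used in the arithmetic mean example in this section. It places a numbered title above the table, states the unit, gives short column captions, orders the rows by magnitude, and puts the source just below the body.

```python
def render_table(number, title, headers, rows, unit=None, source=None, notes=()):
    """Lay out a plain-text table following the table-construction checklist."""
    lines = [f"Table {number}: {title}"]         # numbered title above the table
    if unit:
        lines.append(f"(Unit: {unit})")          # unit of measurement
    lines.append(" | ".join(headers))            # short column captions
    for row in rows:
        lines.append(" | ".join(str(cell) for cell in row))
    if source:
        lines.append(f"Source: {source}")        # source just below the body
    lines.extend(f"Note: {note}" for note in notes)  # footnotes, kept minimal
    return "\n".join(lines)

# Placeholder district names with the rice figures from the text.
districts = [
    ("District A", 10), ("District B", 8), ("District C", 12),
    ("District D", 9), ("District E", 6),
]
# Arrange the rows by magnitude, largest first.
ordered = sorted(districts, key=lambda row: row[1], reverse=True)

text = render_table(
    1,
    "Production of rice per acre in five districts",
    ["District", "Production"],
    ordered,
    unit="quintals per acre",
    source="Field survey (hypothetical)",
)
print(text)
```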
Statistical Presentation of Data: The data collected through various sources needs to be processed statistically for precise explanations. Very often it becomes necessary to obtain a single representative value for the whole data set. The statistical measures that enable us to work out a single figure representative of the entire distribution are known as measures of central tendency. Besides representing each distribution, these measures help us compare different distributions. They normally denote the central point of a distribution in terms of value, distance, and occurrence. The commonly used measures of central tendency are:
Arithmetic Mean: It is the most frequently used measure and is calculated by adding all the individual values in a distribution and dividing the sum by the total number of observations. For example, the production of rice per acre in five districts is 10, 8, 12, 9, and 6 quintals. The average production of rice for these districts is:
$$\bar{X} = \frac{10 + 8 + 12 + 9 + 6}{5} = \frac{45}{5} = 9 \text{ quintals per acre}$$
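The arithmetic mean is expressed by the following equation:
$$\bar{X} = \frac{\sum X}{N}$$
where $\bar{X}$ is the mean value, $\sum X$ is the sum of all the individual values, and $N$ is the number of observations.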
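The equation can be checked in Python against the rice example. This minimal sketch uses a hypothetical helper, `arithmetic_mean`, that divides the sum of the values by their number; raising an error on an empty data set is a design choice, since the mean of no observations is undefined.

```python
def arithmetic_mean(values):
    """Sum of all observations divided by their number (X-bar = sum X / N)."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)

rice = [10, 8, 12, 9, 6]  # quintals per acre in five districts (from the text)
print(arithmetic_mean(rice))  # 9.0
```

Python's standard library also offers `statistics.mean` for the same calculation.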
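The arithmetic mean can easily be worked out for small ungrouped data. However, when the number of observations is large and the data is in the form of a frequency distribution of groups, the arithmetic mean is worked out with the following equation:
$$\bar{X} = \frac{\sum fm}{N}$$
where $f$ is the frequency of a class, $m$ is the mid-value of that class, and $N = \sum f$ is the total number of observations.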
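The grouped-data equation can be sketched in Python as follows. This is a minimal sketch under the usual mid-value assumption, namely that every observation in a class is represented by the class mid-value. The classes and frequencies below are hypothetical and are not taken from the temperature table; `grouped_mean` is a helper written for this example.

```python
def grouped_mean(classes):
    """Arithmetic mean of grouped data using class mid-values.

    `classes` is a list of (lower_limit, upper_limit, frequency) tuples.
    Each class is represented by its mid-value m = (lower + upper) / 2,
    so the result is sum(f * m) / sum(f).
    """
    total_f = sum(f for _, _, f in classes)
    if total_f == 0:
        raise ValueError("total frequency must be positive")
    total_fm = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total_fm / total_f

# Hypothetical classes: mid-values 5, 15, 25 with frequencies 2, 5, 3.
example = [(0, 10, 2), (10, 20, 5), (20, 30, 3)]
print(grouped_mean(example))  # 16.0
```

Here the sum of $fm$ is 10 + 75 + 75 = 160 and $N$ is 10, which gives 16.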
Example: Calculate the arithmetic mean from the temperature data given in the following table.
Classes (Temperatures in ) | No. of Days | Mid Values | |