Data processing

Transcription of interviews or focus groups

The feedback that you receive from interviews or focus groups can be extensive, which makes transcription time-consuming and error-prone. Many researchers choose to use a digital recorder so that they can concentrate on the interview without having to scribble down – and possibly miss – responses. CIPFA Research uses a fast and reliable transcription agency to ensure that the data can be analysed promptly.

Scanning and data extraction

Data from a paper-based questionnaire is captured most accurately and efficiently through a combination of scanning and data extraction. This is especially the case where the questionnaire is standardised, ie all respondents complete the same document. Once the questionnaire has been designed, a specification document is created that describes in technical terms the relative location of each question, in terms of its sequence, and the rules pertaining to each question. For example, is the question single choice only, as in the case of male / female, or is it multiple choice, eg "Which of the following activities have you undertaken in the past 6 months?" This technical document (the data definition file, or ddf) also describes the manner in which the data is to be extracted, ie whether it is numerical or alphanumerical.

From the ddf a template is created that effectively fits on top of the scanned image. Once a document has been scanned, this template is used to extract the data automatically. For the vast majority of the information on a typical questionnaire this is very quick, especially where what is being read is a tick or a cross.
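As a rough illustration of the kind of rules a ddf holds, the sketch below models three questions and applies their rules to the marks read from a scanned page. It is not the format used by any particular scanning package: the QuestionSpec class, the extract function, the activity options and the sample marks are all hypothetical and chosen only to mirror the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionSpec:
    """One entry of a hypothetical data definition file (ddf)."""
    sequence: int                     # position of the question in the questionnaire
    name: str                         # field name used in the extracted record
    kind: str                         # "single", "multiple" or "text"
    options: tuple = ()               # tick-box labels, in page order
    data_type: str = "alphanumeric"   # or "numeric" (text questions only)


def extract(spec, marks=(), text=""):
    """Apply one question's rules to what the scanner read for it.

    marks: one boolean per tick box (True if ticked or crossed).
    text:  characters recognised in a free-text box.
    """
    if spec.kind == "text":
        value = text.strip()
        if spec.data_type == "numeric":
            if not value.isdigit():
                raise ValueError(f"{spec.name}: expected a number, got {value!r}")
            return int(value)
        return value

    ticked = [opt for opt, mark in zip(spec.options, marks) if mark]
    if spec.kind == "single":
        if len(ticked) != 1:
            raise ValueError(f"{spec.name}: exactly one box must be marked")
        return ticked[0]
    return ticked  # multiple choice: any number of boxes may be marked


# Assumed example questions; the activity options are invented for illustration.
DDF = [
    QuestionSpec(1, "gender", "single", ("male", "female")),
    QuestionSpec(2, "activities", "multiple", ("swimming", "library", "gym")),
    QuestionSpec(3, "age", "text", data_type="numeric"),
]

record = {
    "gender": extract(DDF[0], marks=[False, True]),
    "activities": extract(DDF[1], marks=[True, False, True]),
    "age": extract(DDF[2], text=" 34 "),
}
```

A response that breaks a rule, such as two ticks on a single-choice question, raises an error here. In practice such a response would more likely be flagged for a human operator than discarded.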


The next stage, known as indexing, is where a copy of the scanned image, along with the extracted data, is presented to a human operator for validation, ie a check is made to ensure that what has been extracted matches what can be seen on the image. This process may lead to a record being approved, edited or rejected. Some information is likely to need verifying twice, for example a respondent's actual age or their postcode. In these cases the record is effectively indexed twice.
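The three indexing outcomes and the double check can be sketched as follows. This is a minimal model under stated assumptions: the function name index_record, the DOUBLE_KEYED set and the sample values are hypothetical, and rejecting a record when the two operator readings disagree is an assumption, since the handling of disagreements is not described here.

```python
# Fields assumed to need a second, independent check (hypothetical choice).
DOUBLE_KEYED = {"age", "postcode"}


def index_record(extracted, first_pass, second_pass):
    """Compare extracted values with what operators read from the image.

    extracted:   field -> value produced by automatic extraction
    first_pass:  field -> value the first operator sees on the image
                 (a missing field means the operator could not read it)
    second_pass: field -> value from the second check, for DOUBLE_KEYED fields
    Returns (status, record), with status "approved", "edited" or "rejected".
    """
    final = dict(extracted)
    status = "approved"
    for field, value in extracted.items():
        seen = first_pass.get(field)
        if seen is None:
            return "rejected", None          # unreadable on the image
        if field in DOUBLE_KEYED and second_pass.get(field) != seen:
            return "rejected", None          # assumption: disagreement rejects
        if seen != value:
            final[field] = seen              # operator corrects the extraction
            status = "edited"
    return status, final


# Invented sample data: the extraction misread the age as 84.
extracted = {"gender": "female", "age": 84, "postcode": "AB1 2CD"}
first = {"gender": "female", "age": 34, "postcode": "AB1 2CD"}
second = {"age": 34, "postcode": "AB1 2CD"}

status, checked = index_record(extracted, first, second)
```

In this example both operators read the age as 34, so the record comes back as edited with the corrected age.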