ICIQ Validation Methodology
Several steps are required to establish the validity and reliability of the ICIQ. All the following steps (apart from translation) will be undertaken with the English-language version of the questionnaire. In countries where other language versions are being developed, the primary tests must be undertaken, and it is advisable to carry out most or all of the optional tests as well.
Translation into other languages must be undertaken in the following stages:
Initial translation of the questionnaire – preferably undertaken by a bilingual native speaker of the language in question.
Back translation into English – preferably by a bilingual native English speaker, who was not involved in the translation stage.
Review of back translations by the ICIQ group and adjustment as necessary.
Pre-testing for equivalence using bilingual or monolingual respondents.
External companies/collaborators wishing to produce translations/adaptations of the ICIQ or its modules must gain prior permission from the ICIQ Development Group (contact Nikki Cotterill at firstname.lastname@example.org). A recognised service that follows a standard translation/adaptation policy, such as those provided by MEDTAP International and the Centre of Outcomes, Research, and Education (CORE), or by the MAPI Research Institute, can be used. The final version must be approved by the ICIQ Development Group. Only one translation per language is allowed, although more than one translation within a language/culture may be considered where regional or local differences in language or meaning occur.
Primary tests (essential)
Content/face validity is the assessment of whether the questionnaire makes sense to those being measured and to experts in the clinical area. These aspects are best assessed through interviews with patients and observations of patients completing draft versions of questionnaires. At some stage, the researcher will also need to obtain the opinions of clinicians and other involved parties to check that clinically meaningful aspects are included in the questionnaire. Once the questionnaire has been developed and administered, levels of missing data can be used as an indicator of inappropriate or badly worded questions.
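As a minimal illustration of using missing data as a warning sign, the sketch below computes the proportion of respondents who left each item blank. It assumes responses are held as one row per respondent with None marking a missing answer; the 5% flagging threshold and the function names are hypothetical choices for illustration, not ICIQ rules.

```python
from typing import Optional, Sequence


def missing_rates(responses: Sequence[Sequence[Optional[int]]]) -> list:
    """Proportion of respondents who left each item blank (None = missing)."""
    n_respondents = len(responses)
    n_items = len(responses[0])
    return [
        sum(1 for row in responses if row[j] is None) / n_respondents
        for j in range(n_items)
    ]


def flag_items(rates: Sequence[float], threshold: float = 0.05) -> list:
    """Indices of items whose missing-data rate exceeds a (hypothetical) threshold."""
    return [j for j, rate in enumerate(rates) if rate > threshold]


# Hypothetical pilot data: 4 respondents (rows) x 3 items (columns)
responses = [
    [3, 1, None],
    [2, None, None],
    [4, 2, 1],
    [1, 0, None],
]
rates = missing_rates(responses)
print(rates)              # [0.0, 0.25, 0.75]
print(flag_items(rates))  # [1, 2]
```

A high rate on a single item is a prompt to revisit its wording with patients, not proof in itself that the item is faulty.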
Internal consistency (reliability)
Internal consistency refers to the extent to which items within the questionnaire are related to each other. It can be assessed by statistical techniques such as item-total correlations or Cronbach’s alpha coefficient. Cronbach’s alpha should be recalculated for the total score with each item removed in turn, and any item whose removal markedly increases or decreases alpha should be re-evaluated. A Cronbach’s alpha above 0.70 is usually considered to indicate adequate internal consistency.
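The calculations described above can be sketched in Python. The sketch uses the standard formula alpha = k/(k - 1) × (1 - sum of item variances / variance of the total score), where k is the number of items, together with corrected item-total correlations (each item against the sum of the remaining items). It assumes complete cases only and at least three items, so that alpha can still be computed with one item removed; the data and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(items):
    """items: list of item columns, each a list of scores (complete cases only)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # one total per respondent
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def alpha_if_deleted(items):
    """Alpha recalculated with each item removed in turn (needs >= 3 items)."""
    return [cronbach_alpha(items[:j] + items[j + 1:]) for j in range(len(items))]


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(items):
    """Correlation of each item with the total of the *other* items."""
    totals = [sum(scores) for scores in zip(*items)]
    return [pearson(col, [t - v for t, v in zip(totals, col)]) for col in items]


# Hypothetical scores: 4 items (columns) answered by 6 respondents
items = [
    [2, 3, 3, 4, 1, 2],
    [2, 4, 3, 4, 1, 1],
    [1, 3, 2, 4, 2, 2],
    [3, 1, 4, 2, 3, 1],  # deliberately inconsistent item
]
print(round(cronbach_alpha(items), 3))
print([round(a, 3) for a in alpha_if_deleted(items)])
print([round(r, 3) for r in corrected_item_total(items)])
```

In this made-up example the fourth item runs against the others, so alpha rises sharply when it is dropped and its corrected item-total correlation is negative; that is the pattern which would prompt re-evaluation of the item (or a check that it should have been reverse-scored).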
Stability (test-retest reliability)
Particularly important for questionnaires used to examine outcome is the concept of stability: whether the questionnaire produces the same responses from the same person over a period in which their condition has not changed. A questionnaire whose responses cannot be shown to be stable over a short period in a pre-treatment sample will not be able to measure change following treatment accurately. Stability is commonly assessed by a test-retest analysis, in which the questionnaire is given to the same set of respondents twice, usually with an interval of two to six weeks. The interval should be chosen so that symptoms are unlikely to have changed and respondents will not be able to remember their first responses. For incontinence, an interval of two weeks is probably sufficient. Graphical presentation and analysis of the paired differences for individual items can help in interpreting stability.
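One way to summarise the paired differences is sketched below. For each respondent it takes the retest score minus the first score, then reports the mean and standard deviation of those differences, the proportion of identical answers, and 95% limits of agreement (mean difference ± 1.96 SD). The limits of agreement and the (mean, difference) points for plotting follow the Bland and Altman approach, which is our choice of graphical method rather than one named in this document; the function names and data are hypothetical.

```python
from statistics import mean, stdev


def paired_differences(test, retest):
    """Retest minus test for each respondent (same order in both lists)."""
    if len(test) != len(retest):
        raise ValueError("test and retest must cover the same respondents")
    return [second - first for first, second in zip(test, retest)]


def agreement_summary(test, retest):
    """Mean/SD of paired differences, exact agreement and 95% limits of agreement."""
    diffs = paired_differences(test, retest)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)
    return {
        "mean_diff": mean_diff,
        "sd_diff": sd_diff,
        "exact_agreement": sum(1 for d in diffs if d == 0) / len(diffs),
        "loa_lower": mean_diff - 1.96 * sd_diff,
        "loa_upper": mean_diff + 1.96 * sd_diff,
    }


def bland_altman_points(test, retest):
    """(mean of the two occasions, difference) pairs, ready for plotting."""
    return [((a + b) / 2, b - a) for a, b in zip(test, retest)]


# Hypothetical item scores (0-4) from 8 respondents, two weeks apart
first = [2, 3, 1, 4, 0, 2, 3, 1]
second = [2, 3, 2, 4, 0, 1, 3, 1]
summary = agreement_summary(first, second)
print({key: round(value, 3) for key, value in summary.items()})
print(bland_altman_points(first, second))
```

A mean difference close to zero with narrow limits of agreement suggests stable responses; a systematic shift in one direction would suggest that something (symptoms, or respondents' recall) changed between occasions.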
Construct validity relates to the relationships between the questionnaire and underlying theories. This is very much an ongoing procedure that requires a number of studies of the performance of a questionnaire in a range of settings and patient groups. Each of these studies will examine some aspect of the validity of particular constructs or ‘mini-theories.’ For example, responses to the questionnaire may be compared with clinical tests confirming a diagnosis, or compared across age groups if an age relationship is postulated. A common way of obtaining some indication of the construct validity of a questionnaire is to examine its ability to differentiate between different patient groups, for example clinic attendees compared with individuals in the community, or clinic attendees with a particular diagnosis compared with those with another. Construct validity also includes the concepts of ‘convergent’ and ‘discriminant’ validity. Convergent validity concerns how closely a new questionnaire is related to other measures of the same construct. Discriminant validity relates to the absence of relationships between constructs that are postulated to be independent.
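Two of the checks described above can be sketched with rank-based statistics: a Spearman correlation between ICIQ scores and another measure (expected to be high for convergent validity and near zero for discriminant validity), and a Mann-Whitney U comparison for known groups such as clinic attendees versus community respondents. The choice of these two statistics, the function names and the scores are assumptions for illustration; the sketch computes the statistics only, and p-values would come from a statistics package.

```python
from statistics import mean


def average_ranks(values):
    """Ranks from 1 upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den


def mann_whitney_u(group_a, group_b):
    """U for group_a: each pair with a > b counts 1, each tie counts 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def prob_superiority(group_a, group_b):
    """U / (n_a * n_b): chance a random group_a score exceeds a group_b score (ties half)."""
    return mann_whitney_u(group_a, group_b) / (len(group_a) * len(group_b))


# Hypothetical total scores (higher = worse symptoms)
clinic = [12, 15, 9, 18, 14, 11]
community = [3, 5, 9, 2, 6, 4]
print(mann_whitney_u(clinic, community), round(prob_superiority(clinic, community), 3))

iciq_scores = [12, 15, 9, 18, 14, 11]
other_measure = [30, 41, 22, 45, 33, 35]  # hypothetical established questionnaire
print(round(spearman(iciq_scores, other_measure), 3))
```

Rank-based statistics are used here because questionnaire scores are often skewed and ordinal; a t-test or Pearson correlation would be equally valid choices where their assumptions hold.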
Criterion validity describes how well the questionnaire correlates with an existing ‘gold standard’ measure. Such ‘gold standards’ may be clinical tests or other validated measures. For incontinence, there is no clear gold standard against which to measure the criterion validity of questionnaires. Although urodynamic studies and pad tests are acknowledged to give the most accurate measure of leakage, and thus of a clinical diagnosis of incontinence, these are not the only factors that a questionnaire should reflect. Some relationship would be expected between a questionnaire measuring incontinence and a clinical finding of the condition, but questionnaires are primarily designed to capture the patient’s perspective of their condition, so the diagnosis of incontinence may matter less than the way in which patients perceive urinary leakage and the impact it has on their quality of life. Existing published questionnaires should be compared with the ICIQ, but as there is no ‘gold standard’, these results should be presented under construct validity, above.
There has been considerable controversy concerning the most appropriate methods of measuring change with questionnaires. There are three main aspects to the measurement of change: differentiating between those who change a lot and those who change little, identifying factors associated with a good outcome, and inferring treatment effects from group differences, commonly in clinical trials. Where a questionnaire produces a simple score, treatment effects can be assessed by comparing the change from pre- to post-treatment between the intervention and control groups, using unpaired t-tests or repeated-measures analysis of variance. As additional evidence, patients’ perceptions of change can also be measured, and relationships between reported change and change in quality of life scores can be examined. Effect sizes are also commonly used. Outside randomised controlled trials, where treatment groups may differ at baseline, analysis of covariance may be more appropriate for assessing responsiveness.
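The sketch below illustrates, under stated assumptions, the responsiveness analyses mentioned above for a two-arm study: a pooled-variance (Student's) unpaired t statistic on change scores (post minus pre), two commonly reported effect sizes (mean change divided by the baseline SD, and the standardised response mean, mean change divided by the SD of change), and a baseline-adjusted group difference equivalent to the group coefficient in an analysis of covariance with a common slope. Which effect size to report, the function names and the data are assumptions; p-values and confidence intervals are left to a statistics package.

```python
from math import sqrt
from statistics import mean, stdev, variance


def change_scores(pre, post):
    """Post-treatment minus pre-treatment score for each respondent."""
    return [b - a for a, b in zip(pre, post)]


def pooled_t(x, y):
    """Student's unpaired t statistic (pooled variance) and its degrees of freedom."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def effect_size(pre, post):
    """Mean change divided by the SD of baseline scores."""
    return mean(change_scores(pre, post)) / stdev(pre)


def standardised_response_mean(pre, post):
    """Mean change divided by the SD of the change scores."""
    diffs = change_scores(pre, post)
    return mean(diffs) / stdev(diffs)


def ancova_group_effect(pre_t, post_t, pre_c, post_c):
    """Baseline-adjusted difference in post-treatment means (treated minus control),
    assuming the same pre-to-post slope in both groups (classic ANCOVA)."""
    sxy = sxx = 0.0
    for xs, ys in ((pre_t, post_t), (pre_c, post_c)):
        mx, my = mean(xs), mean(ys)
        sxy += sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        sxx += sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx  # pooled within-group slope of post on pre
    return (mean(post_t) - mean(post_c)) - slope * (mean(pre_t) - mean(pre_c))


# Hypothetical scores (higher = worse) for a small two-arm trial
pre_treated, post_treated = [14, 16, 12, 18, 15], [8, 11, 7, 12, 10]
pre_control, post_control = [15, 13, 17, 14, 16], [14, 13, 15, 13, 16]

t, df = pooled_t(change_scores(pre_treated, post_treated),
                 change_scores(pre_control, post_control))
print(round(t, 2), df)
print(round(effect_size(pre_treated, post_treated), 2),
      round(standardised_response_mean(pre_treated, post_treated), 2))
print(round(ancova_group_effect(pre_treated, post_treated, pre_control, post_control), 2))
```

The ANCOVA estimate differs from the simple difference in change scores only through the pooled slope, which is why it is preferred when the groups start from different baselines.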
The ICIQ is copyright protected and should not be altered in any way. Researchers who wish to be involved in the development and psychometric testing of alternative language versions of the ICIQ should contact Nikki Cotterill, Paul Abrams or Jenny Donovan.
We would ask that no data from studies to validate the ICIQ be published without our prior consent – we will strongly support the publication of studies that use the questionnaire appropriately.