ICIQ Validation Methodology

The ICIQ modules are copyright protected and should not be altered in any way. If any researchers wish to be involved in the further development and psychometric testing of the ICIQ modules, including the development of alternative language versions, please complete a request form on the ICIQ website www.iciq.net or contact iciq@nbt.nhs.uk. The ICIQ group encourages collaborations with other groups, and studies that contribute to the further validation of the ICIQ modules. We would ask that no data from studies to validate the ICIQ be published without our prior consent – we will strongly support the publication of studies that use the questionnaire appropriately.

The ICIQ modules have historically been developed using traditional classical psychometric methods. The following tests of validity and reliability are undertaken during initial development of the English-language versions.

Primary tests (essential)

Content validity
Content validity is the degree to which the tool measures what is intended and its relevance to the target population. These aspects are assessed through interviews and observations of patients during the item generation phases of questionnaire development. The input of clinicians and other stakeholders is also important to ensure that the instrument is comprehensive and clinically meaningful. Once the questionnaire has been developed and administered, levels of missing data and other descriptive statistics may be used as an indicator of inappropriately worded items.
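
As an illustration of the last point, the proportion of missing answers per item can be tabulated in a few lines of Python. This is a minimal sketch rather than the ICIQ group's own procedure: the function name, the use of None to mark a missing answer, and the sample data are all assumptions made for the example.

```python
def missing_rate_per_item(responses):
    """Return the proportion of missing answers for each item.

    `responses` is a list of respondents, each a list of item answers of
    equal length. A missing answer is represented by None (an assumption of
    this sketch; adapt it to however missing data are coded in practice).
    """
    if not responses:
        raise ValueError("at least one respondent is required")
    n_respondents = len(responses)
    # zip(*responses) regroups the data item by item (column-wise).
    return [
        sum(1 for answer in item if answer is None) / n_respondents
        for item in zip(*responses)
    ]


# Hypothetical data: 4 respondents, 3 items.
sample = [
    [1, 2, None],
    [0, None, None],
    [3, 1, 2],
    [2, 2, None],
]
rates = missing_rate_per_item(sample)
```

An item with a much higher missing rate than its neighbours, such as the third item in this made-up data set, would be a candidate for review of its wording.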

Construct validity
Construct validity is the aspect of validity that provides confirmation that the instrument is measuring the underlying concept that it intends to measure, by comparison with known theory. Hypotheses of how the instrument should ‘behave’ when compared with expected relationships according to known theory are explored, and construct validity is supported when the instrument is shown to measure constructs that are consistent with these. Convergent validity is the extent to which the instrument correlates with other ways of measuring the same construct. Conversely, discriminant validity is evidenced by the absence of relationships between constructs that are hypothesised to be independent.
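
Convergent and discriminant validity are typically examined through correlations between scores. The sketch below computes a Pearson correlation coefficient from first principles; the choice of Pearson's r (rather than, say, a rank-based coefficient), the function name and the sample scores are assumptions for illustration and are not prescribed by the ICIQ methodology.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical total scores on a new module and on an existing measure
# that is expected to capture the same construct (convergent validity).
new_module = [4, 9, 12, 15, 20]
existing_measure = [10, 18, 25, 27, 40]
r_convergent = pearson_r(new_module, existing_measure)
```

A high coefficient would support convergent validity, whereas a coefficient near zero against a measure hypothesised to be independent would support discriminant validity.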

Criterion validity
Criterion validity refers to the correlation of the developmental instrument with a ‘criterion’ scale. This is a measure that may be considered a ‘gold standard’ against which the instrument under evaluation is compared. However, when a new module is under development, there is often no existing gold standard, so other existing measures of similar constructs may be used for comparison.

Internal consistency (reliability)
Internal consistency refers to the extent to which items within the questionnaire are related to each other. This can be assessed by statistical techniques such as item-total correlations or Cronbach’s alpha coefficient. Any items whose inclusion significantly increases or decreases Cronbach’s alpha may be considered for removal. A Cronbach’s alpha of between 0.70 and 0.90 is usually considered to show adequate internal consistency without conceptual redundancy within the item pool.
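
For readers who want to see the calculation, Cronbach's alpha for k items is k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the total score). The following is a minimal Python sketch of that formula, together with an "alpha if item deleted" helper; the function names and the sample responses are hypothetical and are not taken from any ICIQ analysis.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) and of each respondent's total.
    item_variance_sum = sum(variance(item) for item in zip(*responses))
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (n_items / (n_items - 1)) * (1 - item_variance_sum / total_variance)


def alpha_if_item_deleted(responses):
    """Alpha recomputed with each item left out in turn (needs >= 3 items)."""
    n_items = len(responses[0])
    return [
        cronbach_alpha(
            [[score for j, score in enumerate(row) if j != dropped] for row in responses]
        )
        for dropped in range(n_items)
    ]


# Hypothetical data: 5 respondents answering 3 items.
sample = [
    [2, 3, 3],
    [1, 1, 2],
    [4, 4, 5],
    [0, 1, 1],
    [3, 3, 4],
]
alpha = cronbach_alpha(sample)
alphas_without_each_item = alpha_if_item_deleted(sample)
```

Comparing each entry of `alphas_without_each_item` with `alpha` shows how much a single item contributes to, or detracts from, the overall coefficient.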

Test-retest reliability (stability)
The test-retest reliability gives an indication of the reproducibility or stability of the instrument. It is evaluated by individual response consistency between repeat administrations of the instrument over a time-frame in which the responses are not expected to change.
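
One simple way to summarise response consistency for a single item is the proportion of respondents who give the same answer, or an answer within one response category, on both occasions. The sketch below assumes this percentage-agreement approach; chance-corrected statistics such as weighted kappa are also commonly used, and the source does not state which statistic is applied. The function name and data are hypothetical.

```python
def agreement_rate(first, second, tolerance=0):
    """Proportion of respondents whose two answers differ by <= tolerance.

    `first` and `second` hold one item's ordinal answers from the same
    respondents, in the same order, at the two administrations.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(1 for a, b in zip(first, second) if abs(a - b) <= tolerance)
    return matches / len(first)


# Hypothetical answers to one item from 8 respondents, two weeks apart.
administration_1 = [0, 1, 2, 3, 1, 0, 4, 2]
administration_2 = [0, 1, 3, 3, 1, 1, 4, 0]
exact = agreement_rate(administration_1, administration_2)
within_one = agreement_rate(administration_1, administration_2, tolerance=1)
```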

Additional tests

The following tests are not routinely carried out as part of the initial development of the English-language versions of the ICIQ modules, although they have been completed for many of the modules. Further development by collaborative researchers is encouraged where required.

Responsiveness and sensitivity to change
If an instrument is intended to be an effective outcome measure for clinical assessment or research purposes, then it must be shown to be sensitive or responsive to the change in condition of a patient. In order to be sure that any change detected is not due to chance, when testing responsiveness to change, the instrument’s scores should be correlated with those of another validated measure or treatment administered at the same time. The sensitivity to change can be evaluated using the change in total score before and after an intervention of known efficacy.
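
The change in total score before and after an intervention can be summarised in several ways. The sketch below uses the standardised response mean (mean change divided by the standard deviation of the change), which is one commonly used index; the source does not say which statistic the ICIQ group uses, so this choice, the function name and the scores are assumptions for illustration.

```python
from statistics import mean, stdev


def standardised_response_mean(before, after):
    """Mean of (after - before) divided by the standard deviation of the changes."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need paired scores from at least two respondents")
    changes = [post - pre for pre, post in zip(before, after)]
    spread = stdev(changes)
    if spread == 0:
        raise ValueError("undefined when every respondent changes by the same amount")
    return mean(changes) / spread


# Hypothetical total scores for 4 patients before and after treatment
# (lower scores taken to mean fewer symptoms).
baseline = [10, 12, 14, 16]
follow_up = [6, 9, 10, 11]
srm = standardised_response_mean(baseline, follow_up)
```

A negative value here indicates that scores fell after the intervention; the larger its absolute size, the more responsive the instrument appears in this sample.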

Score interpretation
A minimally important difference (MID) represents the smallest change in scores between administrations detected by the questionnaire that the patient perceives as important, and that might lead the patient or clinician to consider a change in management. For example, a change in score of 5-12 points on the ICIQ-LUTSqol is deemed to be clinically significant. For some questionnaires, studies have been carried out to establish bands of severity; for example, the ICIQ-UI SF (range 0-21) is given four scoring categories: slight (1-5), moderate (6-12), severe (13-18) and very severe (19-21). During questionnaire development, the ICIQ group does not routinely assess MIDs or scoring categories for its questionnaires, but encourages further research to be carried out to empirically derive MIDs for its questionnaires.
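
The ICIQ-UI SF severity bands quoted above translate directly into a lookup. The following Python sketch is illustrative only: the function name is hypothetical, and because the bands start at 1, the treatment of a score of 0 (returned here as None, meaning no band applies) is an assumption rather than something stated in the text.

```python
def iciq_ui_sf_band(score):
    """Map an ICIQ-UI SF total score (0-21) to the severity band quoted above."""
    if not isinstance(score, int) or not 0 <= score <= 21:
        raise ValueError("ICIQ-UI SF total score must be an integer from 0 to 21")
    if score == 0:
        # Assumption: the quoted bands begin at 1, so 0 is left unbanded.
        return None
    if score <= 5:
        return "slight"
    if score <= 12:
        return "moderate"
    if score <= 18:
        return "severe"
    return "very severe"


# Hypothetical total scores for three patients.
bands = [iciq_ui_sf_band(s) for s in (4, 12, 19)]
```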

Item response theory or Rasch methods
Item response theory (IRT) or Rasch methods can complement the traditional classical psychometric methods and provide additional information to support decisions on item removal, or to assess how well response options are working. These methods allow items to be calibrated onto a common scale, or underlying construct. Rasch analysis was used alongside classical psychometric methods for the initial development of the ICIQ-Cog, and it is encouraged as another tool in the protocol for the development of existing and future ICIQ questionnaires.
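
To give a sense of what calibration onto a common scale means, the simplest (dichotomous) Rasch model expresses the probability of endorsing an item as a logistic function of the difference between a person's level on the construct and the item's location on the same scale. The sketch below shows only that basic form, with hypothetical names and values; questionnaire items with several ordered response options would need a polytomous extension, and nothing here reflects the specific models used for the ICIQ-Cog.

```python
from math import exp


def rasch_probability(person_level, item_location):
    """Probability of endorsing a yes/no item under the dichotomous Rasch model.

    Both arguments are on the same logit scale: the probability is 0.5 when
    they are equal and rises as the person's level exceeds the item's location.
    """
    return 1.0 / (1.0 + exp(-(person_level - item_location)))


# Hypothetical item locations (in logits) for three items, and one person.
item_locations = [-1.0, 0.0, 1.5]
person = 0.0
probabilities = [rasch_probability(person, b) for b in item_locations]
```

Because persons and items share one scale, items located far above or below the levels found in the target population contribute little information, which is one way such analyses can inform decisions on item removal.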


External companies/collaborators wishing to produce translations/adaptations of the ICIQ or its modules must gain prior permission from the ICIQ group. A recognised service that employs a standard translation/adaptation policy, such as that provided by the MEDTAP International and the Centre of Outcomes, Research, and Education (CORE) or the MAPI Research Institute, can be used. The ICIQ Development Group retains copyright and distribution rights of any translations produced. Only one translation per language is allowed, although more than one translation within a language/culture may be considered where regional/local differences in language or meaning occur.

To ensure a high level of linguistic validity, we recommend that, at a minimum, the following steps are undertaken:

  1. Initial translation(s) and harmonisation of the questionnaire – preferably undertaken by bilingual native speaker(s) of the language in question.
  2. Back translation into English – preferably by bilingual native English speaker(s), who were not involved in the translation stage.
  3. Review and harmonisation of back translation(s) including review by the ICIQ group and adjustment as necessary.
  4. Cognitive interviews with the target population by bilingual interviewer(s), including review of any unresolved conceptual equivalence issues by the ICIQ group and adjustment as necessary.

Full psychometric validation of the translations may also be warranted, especially if translations are to be used in multi-national studies where pooling of data is required.