Translation, cross-cultural adaptation and validation of the
Work Role Functioning Questionnaire (WRFQ) to Spanish
spoken in Spain
Traducción, adaptación cultural y validación del Work Role
Functioning Questionnaire (WRFQ) al castellano hablado en España.
José María Ramada Rodilla
DOCTORAL THESIS UPF / 2014
THESIS DIRECTORS
Dr. Consol Serra Pujadas CiSAL – Centro de Investigación en Salud Laboral, Universidad Pompeu Fabra. PRBB Building. Doctor Aiguader, 88. 08003 Barcelona, Spain.
Dr. George L Delclós Clanchet Southwest Center for Occupational and Environmental Health, the University of Texas School of Public Health. 1200 Pressler Street. Houston, Texas 77030, USA.
DEPARTMENT OF EXPERIMENTAL AND HEALTH SCIENCES
To my three children, who are the passion of my life, for their capacity to understand, accept and love. To the memory of my father, to whom I owe so much.
ACKNOWLEDGEMENTS (Agradecimientos – Agraïments)
This thesis arrives at a time when my glasses already need some added dioptres
so that I can read up close. So I want to begin by thanking all those who make
decisions about people's academic future knowing that the drive of youth lies
not in age but in spirit.
The legend of Saint Augustine of Hippo and the child on the beach comes to mind,
and I do not know how, with my limited capacity, I am going to fit all the water
of the sea into such a small hole. Here too, in this part of my thesis, it
crossed my mind to seek my Directors' backing to control for bias, but I did not
want to run the risk of being advised to carry out another systematic review.
So, with my own words alone, I take on the challenge by myself.
Thank you so much, Consol. Thank you, teacher. The list of reasons why I feel so
grateful to you could be endless. You have all my admiration and respect for
your professional integrity as an occupational physician and researcher, for
your distinctive and innovative vision of occupational health, and for your
professionalism and ability as a thesis director. Thank you for your constant
support of this project and for being, at the hospital where we work, a
tireless, unflagging head and colleague. A thousand thanks for all the
opportunities you have offered me over these last years, which I have always
tried to make the most of.
Thank you, Jordi, for guiding my first steps in the world of research in 2010 as
my tutor on the Master's in Occupational Health; that was when the possibility
of going further, beyond the Master's, began to take shape. Thank you, teacher,
for teaching me so much at the Unidad de Patología Laboral, for the magnificent
experience at the University of Texas School of Public Health, and for placing
at my disposal your enormous worth as a clinician and as a teacher. Thank you
for your humanity, consideration, accessibility and exemplary conduct as a
researcher, and for your patience as thesis director. Thanks also to Conchita
for her hospitality and kindness during my stay in Houston (Texas) in 2013.
Thank you, Fernando, for opening the doors of CiSAL to me, for your constant
proposals to involve me in projects, and for your support from the very first
minute in making my stay at the University of Groningen (the Netherlands)
possible in 2013.
Thank you Ute, Femke and Iris (University Medical Center Groningen, The
Netherlands). Your warm welcome, support and help in this thesis at any time in
Groningen were invaluable. Thank you, Roy, for making statistical analysis
understandable and also for your kind and disinterested help and availability at all
times.
Thank you to the Management of PSMAR, where I practice as an occupational
physician. Thank you for making this organization one of the most dynamic hubs
of clinical, teaching and research knowledge in the city of Barcelona. Thank you
for the support given towards obtaining the European mention on the doctoral
degree. Thank you to my fellow physicians and nurses at PSMAR, who helped me so
generously with the fieldwork.
Thank you to my colleagues at the PSMAR Occupational Health Service for the
support given to this project whenever I needed it: Aida, Carmen, Chelo, Julià
and Nuria. A thousand thanks to Fina Pi-Sunyer and Joan Mirabent for being such
excellent occupational health nurses and for your great generosity in
collaborating during the fieldwork. Thank you so much, Dr. Villar (Rocio);
without your support and true collegial companionship, the stay in Groningen
might not have been possible. With all my heart, thank you very much, colleagues.
Thank you to my friends at CiSAL-UPF for always counting on me, even though I
was not physically present at the PRBB. Thank you to Montse Fernández and Sandra
Garrido for their ever-diligent help with any administrative matter involving
the University and CIBERSP. Thank you to María López for the encouragement she
offered at all times, and to Sergio Vargas for all the favors done over these
years.
And finally, thank you to my sisters and my mother, always available; I feel
true devotion for the three of you. To Ram Dulthummon, for his support and for
his words (always in English) that set some limit on my sometimes overflowing
imagination and helped me keep my feet on the ground. To my three children,
José, Borja and María Ángeles, for their generous help with the database, for
putting up a thousand times with the rehearsals of my presentations, and for
their constant encouragement.
And of course, thank you to the mother of my three children, Ángeles Calaforra,
for her immense generosity, for always being there when needed, for her
inexhaustible patience, and for her always sensible advice when doubts arose
about the point of this project at this stage of my life.
To one and all, a million thanks.
SUMMARY
Background
Health and work continuously influence each other in the working population.
Health-related work functioning is the worker’s ability to meet work demands for
a given health status. High-quality, validated measurement tools are needed to
assess how workers function at work throughout their working lives and to
evaluate interventions that adapt job conditions to the worker’s skills and
health status.
The use of directly translated measurement tools may lead to unreliable or
misleading results in research and practice, and could limit the exchange of
information in the scientific community. Due to possible cultural differences in
perception of work, health and disease, instruments developed in other languages
or cultures should be systematically translated, adapted and validated for use in
different target languages or cultures.
The Work Role Functioning Questionnaire (WRFQ) is an instrument designed to
measure self-perceived difficulties in performing work, in active workers, given
a certain health condition. Its results can be interpreted in terms of work
functioning, work performance, work productivity, work disability and
presenteeism, and they can be transformed into meaningful social and economic
outcomes.
Objective
The aim of this thesis was to provide a high-quality, validated instrument in
Spanish, able to assess the impact of health on “work functioning” and describe
the extent to which workers improve or deteriorate in their ability to meet the
demands of the job in Spanish-speaking populations.
This overall objective was pursued through three specific objectives: 1) to
review the literature on the methodology for cross-cultural adaptation and
validation (CCAV) of health questionnaires; 2) to estimate the degree of
compliance with literature recommendations for CCAV in Spanish and Latin
American scientific journals; and 3) to translate and cross-culturally adapt the
WRFQ and validate it in a sample of a general working Spanish-speaking
population.
Methods
An evidence-based decision was taken to select a generic measurement
instrument that evaluates health-related work functioning. A comprehensive
literature review was performed to identify and synthesize recommendations on
the methodology of CCAV of health questionnaires. Five high-impact journals in
epidemiology and/or public health from Spain and Latin America were analyzed to
estimate the degree of compliance with the methodological recommendations.
A systematic 5-step procedure (direct translation, synthesis, back-translation,
consolidation by an expert committee and pre-test) described in the literature was
followed to translate, cross-culturally adapt and validate the WRFQ. The
applicability, readability and integrity of the Spanish version of the Work Role
Functioning Questionnaire (WRFQ-SpV), together with its preliminary internal
consistency, test-retest reliability and validity, were assessed in a pre-test
with 40 participants.
Next, a cross-sectional study was conducted among 455 active workers of a
general working population to evaluate the reliability and validity of the
WRFQ-SpV. A longitudinal survey was carried out to examine responsiveness in a
sample of 102 workers of this general working population. The consensus-based
standards on measurement properties of health status measurement instruments
(COSMIN) guided the design of the different studies.
Results
To identify and synthesize the literature recommendations on the methodology of
CCAV of health questionnaires, 21 articles (out of 214 citations) and seven
relevant books were selected for full-text analysis. A high degree of consensus
was found on the steps to follow to guarantee conceptual, semantic, idiomatic and
experiential equivalence. Two steps were widely recommended to carry out the
CCAV process: first, the cross-cultural adaptation process (following a systematic
and rigorous procedure); and second, validation in the target language
(evaluating reliability, validity and responsiveness). Only 6% of the retrieved
articles followed all recommended steps.
The CCAV of the WRFQ was carried out without major difficulty. Idiomatic
challenges were found and an expert committee provided a solution. The
questionnaire showed adequate applicability and good face and content validity.
Internal consistency was satisfactory (Cronbach's alpha = 0.98). The original
five-factor structure of the WRFQ reflected fair dimensionality of the construct
(chi-square, 1445.8; 314 degrees of freedom; root mean square error of
approximation [RMSEA] = 0.08; comparative fit index [CFI] > 0.95 and weighted
root mean square residual [WRMR] > 0.90). The test-retest reliability showed
good reproducibility of the questionnaire outcomes (0.77 ≤ intraclass
correlation coefficient [ICC] ≤ 0.93 and standard error of measurement [SEM]
= 7.10). For construct validity assessment, all formulated hypotheses were
confirmed, differentiating groups with different jobs, health conditions and
ages. Moreover, we verified that the WRFQ-SpV was able to detect (true) changes
over time.
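As an illustration of how the internal consistency and measurement error statistics reported above are commonly obtained, the following minimal sketch assumes the textbook definitions: Cronbach's alpha computed from item and total-score variances, and SEM = SD × sqrt(1 − ICC). The summary does not state which formulas or software were actually used, so the function names and the toy data below are hypothetical.

```python
from math import sqrt
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items table.

    `scores` is a list of rows, one per respondent, each row holding
    that respondent's score on every item (all rows the same length).
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


def standard_error_of_measurement(sd, icc):
    """SEM under the common definition SD * sqrt(1 - ICC) (an assumption here)."""
    return sd * sqrt(1 - icc)


# Toy data (hypothetical): three respondents, three perfectly consistent items.
toy_scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(toy_scores)
sem = standard_error_of_measurement(sd=10.0, icc=0.75)
```

With real questionnaire data, each row would hold one worker's item responses for a single subscale; the SD passed to the second function would be the standard deviation of the scale scores in the test-retest sample.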
Conclusions
The CCAV process should follow several well-established steps. However, the
degree of compliance of the scientific literature with the methodological
recommendations for CCAV can be improved. The WRFQ-SpV is a reliable and
valid instrument to measure health-related work functioning in day-to-day practice
and research in occupational health. Suggestive evidence about the possible use
of the WRFQ-SpV in evaluative studies was found. More research is needed to
examine the instrument responsiveness for groups who do not experience health
improvement or deteriorate.
Key words:
Work functioning instrument; questionnaires; scales; health survey; measurement
instrument; cross-cultural comparison; validation studies; psychometric properties;
reliability; validity; responsiveness.
RESUMEN (SUMMARY)
Background
Health and work form a pairing with a permanent mutual influence. Health-related
work functioning is defined as a worker's ability to meet the demands of the job
given a particular health status. High-quality, validated measurement tools are
needed to assess levels of work functioning throughout working life and to
evaluate interventions aimed at adapting working conditions to the skills and
health status of the working population.
The use of literally translated instruments can lead to unreliable or misleading
results in practice and in research, and may limit the exchange of information
within the scientific community. Because of possible cultural differences in the
perception of work, health and disease, instruments developed in other languages
or cultures should be systematically translated, adapted and validated for use
in different languages or cultures.
The Work Role Functioning Questionnaire (WRFQ; in Spanish, Cuestionario de
Desempeño del Trabajo) is an instrument for measuring self-perceived
difficulties in performing work, in active workers, given a particular health
status. Its results can be interpreted in terms of work functioning, performance
or productivity, work disability and presenteeism, and can be transformed into
outcomes with social and economic significance.
Objective
The aim of this thesis was to make available a high-quality instrument validated
in Spanish, able to assess the impact of health on work functioning and to
describe the extent to which workers improve or worsen in their ability to meet
the demands of the job.
This overall objective was pursued through three specific objectives: 1) to
review the literature on the methodology for the translation, cross-cultural
adaptation and validation (CCAV) of health questionnaires; 2) to estimate the
degree of compliance with the methodological recommendations in Spanish and
Latin American scientific journals; and 3) to translate and adapt the WRFQ and
validate it in a sample of the general Spanish-speaking working population.
Methods
A generic instrument for assessing health-related work functioning was selected
on the basis of the evidence. A comprehensive literature review was carried out
to identify and systematize the literature recommendations on the CCAV of health
questionnaires. In addition, the five epidemiology and/or public health journals
from Spain and Latin America with the highest impact factors were analyzed to
estimate the degree of compliance with the methodological recommendations.
A 5-step procedure (direct translation, synthesis, back-translation,
consolidation by an expert committee and pre-test) described in the literature
was followed to translate, adapt and validate the WRFQ. A pre-test with 40
participants was carried out to assess the applicability, readability and
integrity of the Spanish version of the WRFQ (WRFQ-SpV), together with its
internal consistency, test-retest reliability and validity.
Subsequently, a cross-sectional study was carried out in a sample of 455 active
workers to evaluate the reliability and validity of the WRFQ-SpV. A longitudinal
study was carried out in a sample of 102 active workers from a general
population to examine its responsiveness. The consensus-based standards for
evaluating the measurement properties of health questionnaires (COSMIN) were
used in designing the different studies.
Results
To identify and systematize the methodological recommendations in the
literature, 21 articles (out of a total of 214 citations) and seven relevant
books were selected for analysis. A high degree of consensus was found on
carrying out the CCAV in two steps to guarantee conceptual, semantic, idiomatic
and experiential equivalence: first, the cross-cultural adaptation process
(following a systematic and rigorous procedure); and second, validation in the
target language (evaluating reliability, validity and responsiveness).
The degree of compliance with the methodological recommendations for carrying
out the CCAV can be improved. Six percent of the retrieved articles followed all
of the steps recommended in the literature that were applicable to them.
The TACV of the WRFQ was carried out without relevant difficulties. Idiomatic
challenges were encountered, and an expert committee provided a solution. The
questionnaire showed adequate applicability, face validity and content
validity. Internal consistency was satisfactory (Cronbach's alpha = 0.98). The
original five-factor structure of the WRFQ reflected adequate dimensionality of
the construct (chi-square, 1445.8; 314 degrees of freedom; root mean square
error of approximation [RMSEA] = 0.08; comparative fit index [CFI] > 0.95; and
weighted root mean square residual [WRMR] > 0.90). Test-retest reliability
showed good reproducibility of the questionnaire scores (0.77 ≤ intraclass
correlation coefficient [ICC] ≤ 0.93; standard error of measurement
[SEM] = 7.10). In assessing construct validity, all the formulated hypotheses
were confirmed, differentiating groups with different jobs, health problems and
age groups. The WRFQ-SpV was shown to be able to detect (true) changes over
time.
Conclusions
The TACV process should follow several well-established steps. However,
compliance with the methodological recommendations proposed in the scientific
literature for TACV can be improved. The WRFQ-SpV is a reliable and valid
instrument for measuring health-related work functioning, both in daily
practice and in occupational health research. Suggestive evidence was found for
the possible use of the WRFQ-SpV for evaluative purposes. Further research is
needed to examine the responsiveness of the instrument in groups that do not
improve or whose health deteriorates.
Keywords:
Work functioning; questionnaires; scales; health survey; measurement
instrument; cross-cultural adaptation; validation studies; psychometric
properties; reliability; validity; responsiveness.
PREFACE
The analysis of measurement instruments for use in occupational health research
and practice is currently an area of research interest within the Center for
Research in Occupational Health (CiSAL), and it is in this context that this
doctoral thesis was undertaken. Its content is part of a CiSAL research project
entitled “Evaluation of health-related work functioning and identification of
preventive interventions with the Spanish version of the Work Role Functioning
Questionnaire”. This project is funded by the Instituto de Salud Carlos III,
ISCIII (Ministry of Economy and Competitiveness, Spanish Government), FIS:
PI12/02556 (Principal Investigator, Consol Serra Pujadas; co-investigators,
José María Ramada and George Delclos).
This project arises from the need for validated instruments to assess the
impact of health on “work functioning” in Spanish-speaking populations. There
are a number of instruments to evaluate “health-related work functioning” in
English, but these have not always been adapted and/or validated for the
Spanish context. Thus, identifying and selecting an instrument to properly
measure health-related work functioning, and then translating and adapting it
and validating its measurement properties for future use in research, was
consistent with the goals of this project.
According to the policy of the Doctoral Program Committee in the Department of
Experimental and Health Sciences at Pompeu Fabra University, this thesis is
presented as a compendium of four scientific publications, derived from the
literature review and field work conducted in the Parc de Salut Mar de
Barcelona health system. The first publication was written in Spanish and the
other three in English. All have been published recently in international
peer-reviewed occupational health journals, indexed in PubMed, with the PhD
candidate as first author.
The results have been presented in part at several scientific meetings,
specifically: the First CiSAL Annual Scientific Meeting (1), the Second
Scientific Conference on Work Disability Prevention and Integration (WDPI) (2),
the XXII Diada of the Catalan Society of Safety and Occupational Medicine (3),
the Third CiSAL Annual Scientific Meeting (4) and the First BiblioPRO
Scientific Meeting (5).
In addition to the funding from the Instituto de Salud Carlos III (PI12/02556),
this thesis received partial financial support from The University of Texas
School of Public Health at Houston (USA) and from the Network of Biomedical
Research Centers in Epidemiology and Public Health (CIBERESP), for completion
of short-term stays at international universities, in order to fulfill the
requirements for a doctorate with European mention.
(1) Ramada JM, Serra C, Delclós J. Adaptación cultural y validación de cuestionarios de salud: revisión y recomendaciones metodológicas. 1ª Jornada Científica CISAL. Barcelona, 2011.
(2) Ramada JM, Serra C, Delclós GL. Cross-cultural adaptation and health questionnaires validation: revision and methodological recommendations. Second Scientific Conference on Work Disability Prevention and Integration ‘Healthy ageing in a working society’. WDPI; Groningen, 2012.
(3) Ramada JM. Qüestionaris de salut de qualitat: requisits bàsics. XXII Diada de la Societat Catalana de Seguretat i Medicina del Treball. Barcelona, 2012.
(4) Ramada JM, Serra C, Delclós J. Traducción, adaptación cultural y validación del “Work role functioning questionnaire (WRFQ-27)”. 3ª Jornada Científica CISAL. Barcelona, 2013.
(5) Ramada JM, Serra C, Amick BC, Castaño JR, Delclós GL. Adaptación cultural del "Work Role Functioning Questionnaire (WRFQ)" al castellano hablado en España. I Jornada Científica BiblioPRO. IMIM-CIBERESP. Barcelona, 2013.
PRÓLOGO
The analysis of measurement instruments for use in occupational health research
and daily practice is currently an area of research interest at the Centro de
Investigación en Salud Laboral (CiSAL), and it is in this context that this
doctoral thesis was carried out. Its content is part of the CiSAL research
project entitled “Evaluación de la capacidad para trabajar y posibilidades de
intervención mediante el Work Role Functioning Questionnaire adaptado al
castellano”. This project has been funded by the Instituto de Salud Carlos III,
ISCIII (Ministry of Economy and Competitiveness, Spanish Government), FIS:
PI12/02556 (Principal Investigator, Consol Serra Pujadas; co-investigators,
José María Ramada and George Delclós).
This project arises from the need for validated Spanish-language instruments to
assess the impact of health on “work functioning” in Spanish-speaking
populations. A number of instruments exist in English to evaluate
health-related “work functioning”, but they have not always been adapted and/or
validated for the Spanish context. Thus, identifying and selecting an
instrument to properly measure health-related “work functioning”, and then
translating and adapting it and validating its measurement properties for use
in future research, is consistent with the goals of this project.
In accordance with the regulations of the Doctoral Program Steering Committee
of the Department of Experimental and Health Sciences at Pompeu Fabra
University, this doctoral thesis is presented as a compendium of four
scientific publications with the PhD candidate as first author, derived from
the literature review and the field work carried out in the Parc de Salut Mar
hospital system in Barcelona. The first publication was written in Spanish and
the other three in English. Three of them have been published recently in
international peer-reviewed occupational health journals indexed in PubMed. At
the time of printing of this thesis, the fourth is under peer review at an
international occupational health journal that is also indexed in PubMed.
The results have been presented in part at the First CiSAL Annual Scientific
Meeting (6); the Second Scientific Conference on Work Disability Prevention and
Integration (WDPI) (7); the XXII Diada de la Societat Catalana de Seguretat i
Medicina del Treball (8); the Third CiSAL Annual Scientific Meeting (9); and
the First BiblioPRO Scientific Meeting (10).
In addition to the funding from the Instituto de Salud Carlos III (PI12/02556),
this thesis received partial financial support from The University of Texas
School of Public Health (United States of America) and from the Centro de
Investigación Biomédica en Red de Epidemiología y Salud Pública (CIBERESP).
(6) Ramada JM, Serra C, Delclós J. Adaptación cultural y validación de cuestionarios de salud: revisión y recomendaciones metodológicas. 1ª Jornada Científica CISAL. Barcelona, 2011.
(7) Ramada JM, Serra C, Delclós GL. Cross-cultural adaptation and health questionnaires validation: revision and methodological recommendations. Second Scientific Conference on Work Disability Prevention and Integration ‘Healthy ageing in a working society’. WDPI; Groningen, 2012.
(8) Ramada JM. Qüestionaris de salut de qualitat: requisits bàsics. XXII Diada de la Societat Catalana de Seguretat i Medicina del Treball. Barcelona, 2012.
(9) Ramada JM, Serra C, Delclós J. Traducción, adaptación cultural y validación del “Work role functioning questionnaire (WRFQ-27)”. 3ª Jornada Científica CISAL. Barcelona, 2013.
(10) Ramada JM, Serra C, Amick BC, Castaño JR, Delclós GL. Adaptación cultural del "Work Role Functioning Questionnaire (WRFQ)" al castellano hablado en España. I Jornada Científica BiblioPRO. IMIM-CIBERESP. Barcelona, 2013.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS (Agradecimientos – Agraïments)
SUMMARY
RESUMEN
PREFACE
PRÓLOGO
1. INTRODUCTION
1.1. Statement of the problem
1.2. From work disability to health-related work functioning
1.3. General overview of work outcome measurement tools
1.4. Methodological quality in health-questionnaire validation
2. OBJECTIVES
2.1. Study I Objectives
2.2. Study II Objectives
2.3. Study III Objectives
2.4. Study IV Objectives
3. PAPER # 1
Ramada JM, Serra C, Delclós GL. Adaptación cultural y validación de
cuestionarios de salud: revisión y recomendaciones metodológicas.
Salud Publica Mex. 2013;55:57-66.
4. PAPER # 2
Ramada JM, Serra C, Amick III BC, Castaño JR, Delclos GL. Cross-cultural
adaptation of the work role functioning questionnaire to Spanish
spoken in Spain. J Occup Rehabil. 2013;23:566-75.
5. PAPER # 3
Ramada JM, Serra C, Amick III BC, Abma FI, Castaño JR, Pidemunt G,
Bültmann U, Delclos GL. Reliability and validity of the Work Role
Functioning Questionnaire (Spanish version). [Submitted for peer-review].
PAPER # 4
Ramada JM, Delclos GL, Amick III BC, Abma FI, Castaño JR, Pidemunt G,
Bültmann U, Serra C. Responsiveness of the Work Role Functioning
Questionnaire (Spanish version). J Occup Environ Med. [In Press 2013].
6. GENERAL DISCUSSION
6.1. The concept of health-related work functioning.
6.2. Selection of an instrument to measure health-related work functioning.
6.3. Cross-cultural adaptation and validation process (reliability, validity and responsiveness).
6.4. Standards to be used for methodological quality in health-questionnaire validation.
6.5. Implications for research and practice.
6.6. Future research.
7. GENERAL CONCLUSIONS
8. APPENDICES
Appendix I: WRFQ (English version)
Appendix II: WRFQ (Spanish version)
Appendix III: Single items of the WAI
Appendix IV: Global perceived effect question (GPE-Q)
Appendix V: Clinical Research Ethical Committee approval
Appendix VI: Informed consent
Appendix VII: Poster Primera Jornada Científica CiSAL (2011)
Appendix VIII: Poster WDPI, Groningen, The Netherlands (2012)
Appendix IX: Poster Tercera Jornada Científica CiSAL (2013)
1. INTRODUCTION
1.1. Statement of the problem
Health and work form an indivisible duality in which mutual influence is
permanent. The World Health Organization (WHO) defines health as "a state of
complete physical, mental and social well-being" and not merely the absence of
disease. This definition has been part of the WHO's Declaration of Principles
since its founding in 1948 (1).
Work is a health determinant, and there is an increasing body of evidence
showing that work has positive health effects when working conditions are
reasonably acceptable (2,3). Decent work sums up the aspirations of people in
their working lives. It involves opportunities for work that is productive and
delivers a fair income, security in the workplace and social protection for
families, opportunities for personal development and social integration,
freedom for people to express their concerns, organize and participate in the
decisions that affect their lives, and equality of opportunity and treatment
for all women and men. A community or a country improves its population's
health status when everyone who is able to work can get a decent job (4).
Increased life expectancy and later retirement ages are raising the overall age
of the workforce, and might result in an increasing number of employees working
with chronic diseases (5-7). Interventions to keep these workers in the labor
market and promote work participation are being increasingly developed to
support a sustainable, active, and productive work life (7,8). Furthermore,
rehabilitation programs and interventions to adapt or accommodate working
conditions to the workers' health and skills are becoming more frequent, with
the goal of achieving a safe return to work after a period of sick leave.
The effectiveness of these rehabilitation programs and interventions has
usually been assessed using outcome measures such as work status (active,
temporary or permanent disability), time to return to work, duration of
functional disability and costs of incapacity to work (8-11). These outcomes
have been useful but are limited, as they mainly assess whether workers are
present or absent from their jobs. They do not offer information about the
worker's participation in the job or the degree to which the worker is able to
respond to the job's demands (12,13). High-quality, validated measurement tools
are needed to assess how workers function at work over their professional life
course, along the continuum between working successfully at one extreme and
work absence at the other (14).
Outcome measures able to describe the extent to which workers increase or
decrease their ability to meet job demands, and to fully assess the
effectiveness of rehabilitation programs and interventions, are needed in
Spanish-speaking occupational health settings, yet high-quality, validated
instruments in Spanish are lacking for this purpose. Thus, the rationale for
this thesis is to provide an evidence base for an instrument to evaluate
health-related work functioning, and make it available to Spanish-speaking
occupational health professionals and researchers for use in daily practice and
research.
1.2. From work disability to health-related work functioning
Disability can be described as the environmentally determined effect of an
impairment that, in interaction with other factors and within a specific social
context, is likely to cause an individual to experience an undue disadvantage
in his or her personal, social or professional life (15). Work disability could
be defined as the effect of an illness or an accident on the ability of a
person to perform a particular work activity.
Disability is not an absolute attribute of an individual; rather, it is a
social construct. A person who is blind, or deaf, or needs a wheelchair to move
can be completely dependent in one setting, but fully autonomous and functional
in a different one. Thus, the effect of an impairment must always be considered
relative to a given environment, and if we restrict disability to the
functional effects of this impairment, regardless of the environment, we put
the burden of the problem and the responsibility to find a solution on the
individual.
From a social perspective, work disability should be understood as a manageable
situation, where different stakeholders (workers, employers, human resource
managers, supervisors, unions and occupational health professionals) should be
involved to respond to an individual’s needs so that he or she can function
successfully at work. Disability is, therefore, a social rather than a medical issue,
and from this perspective it is easier to understand that positive action towards
integration and job participation is required, rather than merely passive measures
to provide income support (15).
Once this paradigm is adopted, it is possible to analyze from a broader
perspective the economic and social impact of removing barriers to the integration of
individuals with disabilities. Imaginative and economically viable solutions
covering a wider range of interventions may arise from this, ranging from
improving workers’ skills (through training and rehabilitation programs) to
facilitating accommodation in suitable workplaces or adapting the
workplace and/or working conditions to the specific needs of these individuals.
A growing number of research teams and occupational health services are
designing and implementing rehabilitation and/or accommodation
programs to adapt working conditions to worker skills and health to support an
active working life (7,8,16,17). Fully assessing intervention effectiveness requires
outcome measures that describe the extent to which people increase their ability
to meet the demands of the job.
Health-related work functioning is a comprehensive concept that incorporates the
previously described paradigm shift, and can be defined as the ability of a worker
to meet work demands for a given physical and emotional health status (18).
Theoretically, working conditions and demands are modifiable and health is a
dynamic concept that can change over a lifetime. Hence, health-related work
functioning constitutes a continuum rather than a dichotomy, with “working
successfully” at one end and “work absence” at the other. Measuring the
impact of health on work in terms of “present” versus “absent” is not enough to
understand what happens along this continuum (19). Based on the individual’s
work performance and on-the-job productivity (Figure 1), health-related work
functioning constitutes, especially in the current European socio-economic context,
a phenomenon of great interest in occupational health care settings and research.
The rationale for this thesis arises from the need for quality validated
measurement instruments to assess health-related work functioning in
Spanish-speaking settings. This will serve to enhance the evaluation of rehabilitation,
accommodation or adaptation programs. The emphasis is on the ability of the
instrument to measure the worker's participation, and not only whether workers are
present or absent from their jobs.
[Figure 1: diagram not reproduced. Labels: Health Status; Work Demands; Work Functioning (Work Absence, Exhausting Oneself, Productive & Healthy, Working Successfully); Worker; Occupational Health Care; Human Resources Management; Organizational Context (Workplace System); Labour Market Context; Societal Context; Working Healthy; Participation; Business Productivity; Nondiscrimination; Confidentiality; Respect; Voluntariness.]
Figure 1. Conceptual frame of Health-Related Work Functioning based on Amick, Gimeno (18) and Abma (19) and ethical use of the questionnaire.
1.3. General overview of work outcome measurement tools
A review of the literature on work outcome measures reveals
different approaches to work outcome measurement; in general, four groups of
work outcome measures can be distinguished (12). One group assesses labor
force status (mainly time to return to work and duration of functional disability).
Another group assesses the economic impact of work outcomes (especially lost
time from work and self-reported effectiveness in performing the job). A third set of
measures assesses the impact of health on role functioning (mixing work-role with
other roles). Finally, there is a group of work-role specific functioning
measurement instruments that measure health-related functioning at work.
Several studies and reviews have analyzed both strengths and weaknesses of
each group of measurement tools (12,18,20-26).
Focusing on the instruments that measure our phenomenon of interest
(health-related work functioning), a number of health- and/or job-specific work functioning
measurement instruments, together with other generic instruments, have been
developed. The most relevant are shown in Table 1.
When measuring health-related work functioning in research and practice,
evidence-based decisions should be made about which instrument to use. Evans
recommends considering three areas when choosing a questionnaire: the
psychometric properties of the instrument, administration complexity, and the
setting of the evaluation (27). Firstly, it is essential to know the intended purpose of
the instrument (in medicine, for example, it could be for diagnosis, evaluation or
prediction) (28). Then, depending on this purpose, it is necessary to find out whether the
measurement properties of the instrument have been assessed with quality
methodology.
If the instrument is going to be applied for diagnostic or prognostic purposes, such
as to estimate work functioning status or to distinguish between different courses
(or outcomes) of work functioning, evidence of its discriminative ability should be
provided; in this case, parameters of reliability are very important (including those
of measurement error). But if the aim is to apply the instrument to evaluate
interventions or to monitor work functioning in individuals, the instrument needs to
provide evidence of its ability to detect (true) changes over time; in this case,
parameters of responsiveness (in addition to measurement error) are crucial (28).
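As a rough illustration of the measurement-error parameters mentioned above, the sketch below computes the standard error of measurement (SEM) and the smallest detectable change (SDC) from a set of scores and a reliability coefficient. The formulas SEM = SD * sqrt(1 - reliability) and SDC = 1.96 * sqrt(2) * SEM are common conventions in the measurement literature, not something prescribed in this thesis; the scores, the reliability value of 0.90 and the function names are hypothetical.

```python
import math
import statistics


def standard_error_of_measurement(scores, reliability):
    """SEM = SD * sqrt(1 - reliability), using the sample standard deviation."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    sd = statistics.stdev(scores)
    return sd * math.sqrt(1.0 - reliability)


def smallest_detectable_change(sem, z=1.96):
    """SDC for an individual: z * sqrt(2) * SEM (two measurements, each with error)."""
    return z * math.sqrt(2.0) * sem


# Hypothetical 0-100 questionnaire scores for ten workers (illustrative only).
baseline_scores = [62.0, 75.0, 81.0, 58.0, 90.0, 70.0, 66.0, 85.0, 77.0, 73.0]
assumed_reliability = 0.90  # assumed coefficient, not a reported WRFQ result

sem = standard_error_of_measurement(baseline_scores, assumed_reliability)
sdc = smallest_detectable_change(sem)
print(f"SEM = {sem:.2f}, SDC = {sdc:.2f}")
```

Under these assumptions, an individual change smaller than the SDC cannot be distinguished from measurement error, which is why measurement error matters when an instrument is used to monitor individuals or evaluate interventions.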
It is also necessary to know in which language or culture the questionnaire was
originally developed. If the intention is to use it in a different language, then it is
necessary to determine whether the process of cross-cultural adaptation and
validation in the target language employed quality evidence-based methods.
In the 2000s a series of specific work-role functioning questionnaires were
developed; among them, the Work Limitations Questionnaire (WLQ) and the Work
Role Functioning Questionnaire (WRFQ) (12,29) were developed as generic
instruments to measure work functioning. These instruments provide an overall
work functioning score, but also allow an estimation of work functioning in relation
to each domain of work demands (work scheduling, output, physical, mental and
social demands).
The WRFQ measures perceived difficulties in performing the job due to health
problems. As mentioned above, it is a generic instrument conceptually developed
to represent a wide range of health conditions and work demands and is freely
available in the literature for professionals and researchers. The questionnaire has
undergone various levels of validity and reliability testing and has displayed
relevant levels of reliability and content, construct and criterion validity. Numerous
studies have demonstrated the usefulness of this tool in English-speaking health
care environments (30-32) and it has been successfully translated, adapted and
validated in Canadian French (33), Brazilian Portuguese (34) and Dutch
(14,19,35). No such version exists in Spanish.
Table 1. Specific and generic work functioning measurement instruments. WF: work functioning.

Name of the instrument                                  | Acronym | Type                                                 | Reference
Health and Work Performance Questionnaire               | HPQ     | Generic WF instrument. Single global rating.         | (36)
Work Productivity and Activity Impairment Questionnaire | WPAI    | Generic WF instrument. Single global rating.         | (37)
Work Productivity Short Inventory                       | WPSI    | Generic WF instrument. Single global rating.         | (38)
Work and Health Interview                               | WHI     | Specific WF instrument for lost productive time.     | (39)
Health and Labor Questionnaire                          | HLQ     | Generic WF instrument. Overall and subscales rating. | (40)
Health Related Productivity Questionnaire Diary         | HRPQ-D  | Specific WF instrument for daily follow-up.          | (41)
Quantity and Quality Instrument                         | QQ      | Specific WF for quantity and quality of work.        | (42)
Health Assessment Questionnaire                         | HAQ     | Specific WF for rheumatic conditions.                | (43)
Angina-related Limitations at Work Questionnaire        | −       | Specific WF for angina pectoris.                     | (44)
Workplace Activity Limitations Scale                    | WALS    | Specific WF for arthritic population.                | (45)
Lam Employment Absence and Productivity Scale           | LEAPS   | Specific WF for clinically depressed population.     | (46)
Nurses Work Functioning Questionnaire                   | NWFQ    | Specific WF for nurses with common mental disorders. | (47)
Endicott Work Productivity Scale                        | EWPS    | Generic WF instrument. Overall rating.               | (48)
Stanford Presenteeism Scale                             | SPS     | Generic WF instrument. Overall rating.               | (49)
Work Limitations Questionnaire                          | WLQ     | Generic WF instrument. Overall and subscales rating. | (29)
Work Role Functioning Questionnaire                     | WRFQ    | Generic WF instrument. Overall and subscales rating. | (12)
1.4. Methodological quality in health-questionnaire validation
Since measurement is at the core of occupational health research and practice,
access to quality measurement instruments is essential. The importance of ensuring
that an instrument is well designed and that its content is appropriate for measuring
what it claims to measure should not be underestimated. In absolute terms, valid
instruments do not exist.
Validating a measuring instrument is a process, sometimes complex, in which a
base of evidence has to be constructed to support that the instrument meets a
number of measurement properties. When quality evidence is provided about the
presence or absence of these properties, it is possible to assign a degree of
quality to the instrument for a specific purpose. Hence, the methodology used to
carry out a validation process becomes the most important determinant for accepting
or rejecting the quality of a measurement instrument.
This process becomes more challenging when a measurement instrument
developed in a particular language or culture is to be used in a different
one. In these cases, a simple (direct) translation of the questionnaire could be
unreliable, because misinterpretations can arise from language and cultural
differences in the perception of work, health and/or disease. In these
circumstances, it is necessary to perform a cross-cultural validation, following a
systematic procedure. For several authors, cross-cultural validation is part of
construct validation and should be assessed to guarantee the validity of the
instrument (28,50-52).
There are several approaches in the literature to address the validation process of
a measuring instrument. Some approaches come from internationally renowned
experts in the design and validation methodology of questionnaires (28,53-59).
Others come from different research groups that have achieved international
standards. Among the latter, the following stand out: the consensus-based
standards on terminology and recommendations to assess the methodological
quality of studies on measurement properties of health status measurement
instruments (COSMIN) (50-52); the standardized methodology for evaluating the
measurement of patient-reported outcomes (EMPRO) to assist the choice of
instruments (60); the methodology of the Health Technology Assessment
Programme (HTA Programme) to evaluate patient-based outcome measures for
use in clinical trials (61); and the criteria proposed by the Scientific Advisory
Committee of the Medical Outcomes Trust (62).
To state that a questionnaire has been validated, it is necessary to provide
evidence about certain features: 1) whether the instrument measures what it
purports to measure, 2) how it reflects the theory underlying the phenomenon
being measured, 3) the degree to which the scores are an adequate reflection of
a gold standard, 4) the extent to which the scores of the instrument are consistent
with stated hypotheses, 5) the degree of simplicity, feasibility and acceptability to
patients, users and researchers, 6) the ability to measure free from error and,
therefore, the ability to provide reproducible results when applied to individuals who
have not changed over time, and 7) the sensitivity to detect true changes over
time. All these features are related to three properties of the questionnaires:
validity, reliability and responsiveness.
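To make the idea of reproducible results in individuals who have not changed over time more concrete, the following sketch computes an intraclass correlation coefficient for test-retest data. It uses the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1); this is one common choice among several and is an assumption here, not the specific coefficient adopted in this thesis. The scores and the function name are hypothetical.

```python
def icc_agreement(test, retest):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement."""
    test, retest = list(test), list(retest)
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need two equally long lists with at least two subjects")
    n, k = len(test), 2  # n subjects, k = 2 occasions (test and retest)
    grand = (sum(test) + sum(retest)) / (n * k)
    row_means = [(a + b) / k for a, b in zip(test, retest)]
    col_means = [sum(test) / n, sum(retest) / n]

    # Sums of squares from the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((x - grand) ** 2 for x in test + retest)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when the scores show no variance")
    return (ms_rows - ms_error) / denominator


# Hypothetical scores for eight stable workers measured twice (illustrative only).
first_administration = [62, 75, 81, 58, 90, 70, 66, 85]
second_administration = [65, 73, 84, 60, 88, 72, 63, 86]

icc = icc_agreement(first_administration, second_administration)
print(f"ICC(2,1) = {icc:.3f}")
```

Because this form measures absolute agreement, a systematic shift between the two administrations lowers the coefficient even when the ranking of individuals is preserved.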
However, the terminology found in the literature can be confusing for several
reasons. First, there are differences in terms used as synonyms for measurement
properties (e.g. reliability, repeatability, stability, reproducibility and precision are
used interchangeably). Second, there are different definitions given to the same
concept (e.g. different authors give different definitions for responsiveness). Third,
different research groups evaluate different properties and characteristics of the
instruments when assessing their quality (e.g. evaluation of appropriateness,
interpretability, acceptability or feasibility are recommended in some guides but not
others). Fourth, there is a wide variety of classifications of measurement properties
depending on authors and research groups (e.g. some authors, but not all,
consider evaluating the cross-cultural adaptation as a part of construct validity;
some consider responsiveness to be an aspect of validity, and also that face
validity is an aspect of content validity).
In this thesis a comprehensive review of the literature was conducted to
systematize the steps involved in validating a health questionnaire, the Work Role
Functioning Questionnaire (WRFQ), following the methodological
recommendations that have achieved the greatest consensus. Next, the requirements for
conducting a quality cross-cultural adaptation of health questionnaires were
defined in detail and the properties evaluated were based on those most
frequently recommended by experts and consensus groups, and then applied to
this questionnaire.
REFERENCES
1. WHO: Constitution of the World Health Organization [Internet]. Geneva: World
Health Organization; 1948-2013. International Health Conference, 1948; [cited
2013 November 19]; Available from:
http://apps.who.int/gb/bd/PDF/bd47/SP/constitucion-sp.pdf
2. Wadell G, Burton T, Aylward M. Work and common health problems. J Insur
Med. 2007;39:109-20.
3. Butterworth P, Leach LS, Strazdins L, Olesen SC, Rodgers B, Broom DH. The
psychosocial quality of work determines whether employment has benefits for
mental health: results from a longitudinal national household panel survey. J
Occup Environ Med. 2011;68:806-12.
4. ILO: Promoting jobs, protecting people [Internet]. Geneva: International Labor
Organization; 1996-2013. Decent Work; [cited 2013 July 19]; [about 2 screens];
Available from: http://www.ilo.org/global/topics/decent-work/lang--es/index.htm
5. Ross D. Ageing and work: an overview. Occup Med (Lond). 2010;60:169-71.
6. Hairault JO, Langot F, Sopraseuth T. Distance to retirement and older workers'
employment: the case for delaying the retirement age. J Eur Economic Assoc.
2010;8:1034-76.
7. Macdonald EB, Sanati KA. Occupational health services now and in the future:
the need for a paradigm shift. J Occup Environ Med. 2010;52:1273-7.
8. Sampere M, Gimeno D, Serra C, Plana M, Martínez JM, Delclos GL, Benavides
FG. Organizational return to work support and sick leave duration: a cohort of
Spanish workers with a long-term non-work-related sick leave episode. J Occup
Environ Med. 2011;53:674-9.
14
14
14
14
9. Squires H, Rick J, Carroll C, Hillage J. Cost-effectiveness of interventions to
9. Squires H, Rick J, Carroll C, Hillage J. Cost-effectiveness of interventions to
return employees to work following long-term sickness absence due to
return employees to work following long-term sickness absence due to
musculoskeletal disorders. J Public Health (Oxf).2012;34:115-24.
musculoskeletal disorders. J Public Health (Oxf).2012;34:115-24.
10. Noben CY, Nijhuis FJ, de Rijk AE, Evers SM. Design of a trial-based economic
10. Noben CY, Nijhuis FJ, de Rijk AE, Evers SM. Design of a trial-based economic
evaluation on the cost-effectiveness of employability interventions among work
evaluation on the cost-effectiveness of employability interventions among work
disabled employees or employees at risk of work disability: the CASE-study.
disabled employees or employees at risk of work disability: the CASE-study.
BMC Public Health. 2012;18:12:43.
BMC Public Health. 2012;18:12:43.
11. Arends I, Bültmann U, van Rhenen W, Groen H, van der Klink JJ. Economic
11. Arends I, Bültmann U, van Rhenen W, Groen H, van der Klink JJ. Economic
evaluation of a problem solving intervention to prevent recurrent sickness
evaluation of a problem solving intervention to prevent recurrent sickness
absence in workers with common mental disorders. PLoS One. 2013;8:e71937.
absence in workers with common mental disorders. PLoS One. 2013;8:e71937.
12. Amick BC III, Lerner D, Rogers WH, Rooney T, Katz JN. A review of health-
12. Amick BC III, Lerner D, Rogers WH, Rooney T, Katz JN. A review of health-
related work outcome measures and their uses and recommended measures.
related work outcome measures and their uses and recommended measures.
Spine. 2000;25:3152-60.
Spine. 2000;25:3152-60.
13. Baldwin ML, Johnson WG, Butler RJ. The error of using returns-to-work to measure the outcomes of health care. Am J Ind Med. 1996;29:632-41.
13. Baldwin ML, Johnson WG, Butler RJ. The error of using returns-to-work to measure the outcomes of health care. Am J Ind Med. 1996;29:632-41.
14. Abma FI, van der Klink JJ, Bültmann U. The work role functioning questionnaire
14. Abma FI, van der Klink JJ, Bültmann U. The work role functioning questionnaire
2.0 (Dutch version): examination of its reliability, validity and responsiveness in
2.0 (Dutch version): examination of its reliability, validity and responsiveness in
the general working population. J Occup Rehabil. 2013;23:135-47.
the general working population. J Occup Rehabil. 2013;23:135-47.
15. ILO Encyclopaedia of Occupational health and Safety [Internet]. Part III.
15. ILO Encyclopaedia of Occupational health and Safety [Internet]. Part III.
Management & Policy. Chapter 17: Disability and work [cited October 2013];
Management & Policy. Chapter 17: Disability and work [cited October 2013];
Available from: http://www.ilo.org/oshenc/part-iii/disability-and-work/item/170-
Available from: http://www.ilo.org/oshenc/part-iii/disability-and-work/item/170-
disability-concepts-and-definitions
disability-concepts-and-definitions
16. Arends I, Bruinvels DJ, Rebergen DS, Nieuwenhuijsen K, Madan I, Neumeyer-
16. Arends I, Bruinvels DJ, Rebergen DS, Nieuwenhuijsen K, Madan I, Neumeyer-
Gromen A et al. Interventions to facilitate return to work in adults with
Gromen A et al. Interventions to facilitate return to work in adults with
adjustment disorders. Cochrane Database Syst Rev. 2012, Dec 12;12.
adjustment disorders. Cochrane Database Syst Rev. 2012, Dec 12;12.
15
15
17. Arends I, van der Klink JJ, Bültmann U. Prevention of recurrent sickness
17. Arends I, van der Klink JJ, Bültmann U. Prevention of recurrent sickness
absence among employees with common mental disorders: design of a cluster-
absence among employees with common mental disorders: design of a cluster-
randomized controlled trial with cost-benefit and effectiveness evaluation. BMC
randomized controlled trial with cost-benefit and effectiveness evaluation. BMC
Public Health. 2010;10:132.
Public Health. 2010;10:132.
18. Amick BC III, Gimeno D. Measuring work outcomes with a focus on health-
18. Amick BC III, Gimeno D. Measuring work outcomes with a focus on health-
related work productivity loss. In: Wittink H & Carr D Editors. Evidence,
related work productivity loss. In: Wittink H & Carr D Editors. Evidence,
Outcomes & Quality of Life in Pain Treatment: A Handbook for Pain Treatment
Outcomes & Quality of Life in Pain Treatment: A Handbook for Pain Treatment
Professionals. London, UK: Elsevier, 2007. pp. 329-343.
Professionals. London, UK: Elsevier, 2007. pp. 329-343.
19. Abma FI. Work functioning: development and evaluation of a measurement tool
19. Abma FI. Work functioning: development and evaluation of a measurement tool
[PhD thesis]. Groningen, NL: University of Groningen; 2012. [Internet]. Available
[PhD thesis]. Groningen, NL: University of Groningen; 2012. [Internet]. Available
from: http://irs.ub.rug.nl/ppn/351176438
from: http://irs.ub.rug.nl/ppn/351176438
20. Lofland, J. H., Pizzi, L., & Frick, K. D. A review of health-related workplace
20. Lofland, J. H., Pizzi, L., & Frick, K. D. A review of health-related workplace
productivity loss instruments. Pharmacoeconomics. 2004;22:165-84.
productivity loss instruments. Pharmacoeconomics. 2004;22:165-84.
21. Prasad, M., Wahlqvist, P., Shikiar, R., & Shih, Y. T. A review of self-report
21. Prasad, M., Wahlqvist, P., Shikiar, R., & Shih, Y. T. A review of self-report
instruments measuring health-related work productivity. Pharmacoeconomics.
instruments measuring health-related work productivity. Pharmacoeconomics.
2004;22:225-44.
2004;22:225-44.
22. Ozminkowski, R. J., Goetzel, R. Z., Chang, S., & Long, S. The application of
22. Ozminkowski, R. J., Goetzel, R. Z., Chang, S., & Long, S. The application of
two health and productivity instruments at a large employer. J Occup Environ
two health and productivity instruments at a large employer. J Occup Environ
Med. 2004;46:635-48.
Med. 2004;46:635-48.
23. Williams RM, Schmuck G, Allwood S, Sanchez M, Shea R, Wark G.
23. Williams RM, Schmuck G, Allwood S, Sanchez M, Shea R, Wark G.
Psychometric evaluation of health-related work outcome measures for
Psychometric evaluation of health-related work outcome measures for
musculoskeletal disorders: a systematic review. J Occup Rehabil. 2007;17:504-
musculoskeletal disorders: a systematic review. J Occup Rehabil. 2007;17:504-
21.
21.
24. Beaton DE, Tang K, Gignac MA, Lacaille D, Badley EM, Anis AH et al.
24. Beaton DE, Tang K, Gignac MA, Lacaille D, Badley EM, Anis AH et al.
Reliability, validity, and responsiveness of five at-work productivity measures in
Reliability, validity, and responsiveness of five at-work productivity measures in
16
16
16
16
patients with rheumatoid arthritis or osteoarthritis. Arthritis Care Res (Hoboken).
patients with rheumatoid arthritis or osteoarthritis. Arthritis Care Res (Hoboken).
2010;62:28-37.
2010;62:28-37.
25. Nieuwenhuijsen K, Franche RL, van Dijk FJ. Work functioning measurement:
25. Nieuwenhuijsen K, Franche RL, van Dijk FJ. Work functioning measurement:
tools for occupational mental health research. J Occup Environ Med.
tools for occupational mental health research. J Occup Environ Med.
2010;52:778-90.
2010;52:778-90.
26. Abma FI, van der Klink JJ, Terwee CB, Amick BC 3rd, Bültmann U.Evaluation
26. Abma FI, van der Klink JJ, Terwee CB, Amick BC 3rd, Bültmann U.Evaluation
of the measurement properties of self-reported health-related work-functioning
of the measurement properties of self-reported health-related work-functioning
instruments among workers with common mental disorders.Scand J Work
instruments among workers with common mental disorders.Scand J Work
Environ Health. 2012;38:5-18.
Environ Health. 2012;38:5-18.
27. Evans CJ. Health and work productivity assessment: state of the art or state of flux? J Occup Environ Med. 2004;46(6 Suppl):S3-11.
27. Evans CJ. Health and work productivity assessment: state of the art or state of flux? J Occup Environ Med. 2004;46(6 Suppl):S3-11.
28. de Vet HCW, Terwee CB, Mokkink LB, Knol DL. Measurement in medicine: A practical guide. 1st ed. Cambridge, UK: The University Press Cambridge, 2011. 29. Lerner D, Amick BC 3rd, Rogers WH, Malspeis S, Bungay K, Cynn D. The Work Limitations Questionnaire. Med Care. 2001;39:72-85.
28. de Vet HCW, Terwee CB, Mokkink LB, Knol DL. Measurement in medicine: A practical guide. 1st ed. Cambridge, UK: The University Press Cambridge, 2011. 29. Lerner D, Amick BC 3rd, Rogers WH, Malspeis S, Bungay K, Cynn D. The Work Limitations Questionnaire. Med Care. 2001;39:72-85.
30. Lerner DJ, Amick BC 3rd, Malspeis S, Rogers WH. A national survey of health-
30. Lerner DJ, Amick BC 3rd, Malspeis S, Rogers WH. A national survey of health-
related work limitations among employed persons in the United States. Disabil
related work limitations among employed persons in the United States. Disabil
Rehabil. 2000;22:225-32.
Rehabil. 2000;22:225-32.
31. Schmidt LL, Amick BC 3rd, Katz JN, Ellis BB. Evaluation of an upper extremity
31. Schmidt LL, Amick BC 3rd, Katz JN, Ellis BB. Evaluation of an upper extremity
student-role functioning scale using item response theory. Work. 2002;19:105-
student-role functioning scale using item response theory. Work. 2002;19:105-
16.
16.
32. Roy JS, MacDermid JC, Amick BC 3rd, Shannon HS, McMurtry R, Roth JH, et
32. Roy JS, MacDermid JC, Amick BC 3rd, Shannon HS, McMurtry R, Roth JH, et
al. Validity and responsiveness of presenteeism scales in chronic work-related
al. Validity and responsiveness of presenteeism scales in chronic work-related
upper-extremity disorders. Phys Ther. 2011;91:254-66.
upper-extremity disorders. Phys Ther. 2011;91:254-66.
17
17
33. Durand MJ, Vachon B, Hong QN, Imbeau D, Amick BC III, Loisel P. The cross-
33. Durand MJ, Vachon B, Hong QN, Imbeau D, Amick BC III, Loisel P. The cross-
cultural adaptation of the work role functioning questionnaire in Canadian
cultural adaptation of the work role functioning questionnaire in Canadian
French.Int J Rehabil 2004;27:261-8.
French.Int J Rehabil 2004;27:261-8.
34. Gallasch CH, Alexandre NM, Amick B 3rd. Cross-cultural adaptation, reliability,
34. Gallasch CH, Alexandre NM, Amick B 3rd. Cross-cultural adaptation, reliability,
and validity of the work role functioning questionnaire to Brazilian Portuguese. J
and validity of the work role functioning questionnaire to Brazilian Portuguese. J
Occup Rehabil 2007;17:701-11.
Occup Rehabil 2007;17:701-11.
35. Abma FI, Amick III BC, Brouwer S, van der Klink JJ, Bültmann U. The cross-
35. Abma FI, Amick III BC, Brouwer S, van der Klink JJ, Bültmann U. The cross-
cultural adaptation of the work role functioning questionnaire to Dutch. Work.
cultural adaptation of the work role functioning questionnaire to Dutch. Work.
2012;43:203-10.
2012;43:203-10.
36. Kessler R, Barber C, Beck A, Berglund P, Cleary PD, McKenas D et al. The
36. Kessler R, Barber C, Beck A, Berglund P, Cleary PD, McKenas D et al. The
World Health Organization health and work performance questionnaire (HPQ). J
World Health Organization health and work performance questionnaire (HPQ). J
Occup Environ Med. 2003;45;156-74.
Occup Environ Med. 2003;45;156-74.
37. Reilly MC, Zbrozek AS, Dukes EM. The validity and reproducibility of a work productivity
and
activity
impairment
instrument.
37. Reilly MC, Zbrozek AS, Dukes EM. The validity and reproducibility of a work
Pharmacoeconomics.
productivity
1993;4:353-65.
and
activity
impairment
instrument.
Pharmacoeconomics.
1993;4:353-65.
38. Goetzel RZ, Ozminkowski RJ, Long, SR. Development and reliability analysis of
38. Goetzel RZ, Ozminkowski RJ, Long, SR. Development and reliability analysis of
the Work Productivity Short Inventory (WPSI) instrument measuring employee
the Work Productivity Short Inventory (WPSI) instrument measuring employee
health and productivity. J Occup Environ Med. 2003;45:743-62.
health and productivity. J Occup Environ Med. 2003;45:743-62.
39. Stewart WF, Ricci JA, Leotta C, Chee E. Validation of the work and health
39. Stewart WF, Ricci JA, Leotta C, Chee E. Validation of the work and health
interview. Pharmacoeconomics. 2004;22:1127-40.
interview. Pharmacoeconomics. 2004;22:1127-40.
40. van Roijen L, Essink-Bot ML, Koopmanschap MA, Bonsel G, Rutten FF. Labor
40. van Roijen L, Essink-Bot ML, Koopmanschap MA, Bonsel G, Rutten FF. Labor
and health status in economic evaluation of health care. The Health and Labor
and health status in economic evaluation of health care. The Health and Labor
Questionnaire. Int J Technol Assess Health Care. 1996;12:405-15.
Questionnaire. Int J Technol Assess Health Care. 1996;12:405-15.
41. Kumar RN, Hass SL, Li JZ, Nickens DJ, Daenzer CL, Wathen LK. Validation of
41. Kumar RN, Hass SL, Li JZ, Nickens DJ, Daenzer CL, Wathen LK. Validation of
the Health-Related Productivity Questionnaire Diary (HRPQ-D) on a sample of
the Health-Related Productivity Questionnaire Diary (HRPQ-D) on a sample of
18
18
18
18
patients with infectious mononucleosis: results from a phase 1 multicenter
patients with infectious mononucleosis: results from a phase 1 multicenter
clinical trial. J Occup Environ Med. 2003;45:899-907.
clinical trial. J Occup Environ Med. 2003;45:899-907.
42. Meerding WJ, IJzelenberg W, Koopmanschap MA, Severens JL, Burdorf A.
42. Meerding WJ, IJzelenberg W, Koopmanschap MA, Severens JL, Burdorf A.
Health problems lead to considerable productivity loss at work among workers
Health problems lead to considerable productivity loss at work among workers
with high physical load jobs. J Clin Epidemiol. 2005;58:517-23.
with high physical load jobs. J Clin Epidemiol. 2005;58:517-23.
43. Wolfe F, Michaud K, Pincus T. Development and validation of the health
43. Wolfe F, Michaud K, Pincus T. Development and validation of the health
assessment questionnaire II: a revised version of the health assessment
assessment questionnaire II: a revised version of the health assessment
questionnaire. Arthritis Rheum. 2004;50:3296-305.
questionnaire. Arthritis Rheum. 2004;50:3296-305.
44. Lerner DJ, Amick BC 3rd, Malspeis S, Rogers WH, Gomes DR, Salem DN. The Angina-related Limitations at Work Questionnaire. Qual Life Res. 1998;7:23-32.
44. Lerner DJ, Amick BC 3rd, Malspeis S, Rogers WH, Gomes DR, Salem DN. The Angina-related Limitations at Work Questionnaire. Qual Life Res. 1998;7:23-32.
45. Gignac MA, Badley EM, Lacaille D, Cott CC, Adam P, Anis AH. Managing
45. Gignac MA, Badley EM, Lacaille D, Cott CC, Adam P, Anis AH. Managing
arthritis and employment: making arthritis-related work changes as a means of
arthritis and employment: making arthritis-related work changes as a means of
adaptation. Arthritis Rheum. 2004;51:909-16.
adaptation. Arthritis Rheum. 2004;51:909-16.
46. Lam RW, Michalak EE, Yatham LN. A new clinical rating scale for work
46. Lam RW, Michalak EE, Yatham LN. A new clinical rating scale for work
absence and productivity: validation in patients with major depressive disorder.
absence and productivity: validation in patients with major depressive disorder.
BMC Psychiatry. 2009;9:78.
BMC Psychiatry. 2009;9:78.
47. Gärtner FR, Nieuwenhuijsen K, van Dijk FJ, Sluiter JK. Psychometric properties
47. Gärtner FR, Nieuwenhuijsen K, van Dijk FJ, Sluiter JK. Psychometric properties
of the Nurses Work Functioning Questionnaire (NWFQ). PLoS One.
of the Nurses Work Functioning Questionnaire (NWFQ). PLoS One.
2011;6:e26565.
2011;6:e26565.
48. Endicott J, Nee J. Endicott Work Productivity Scale (EWPS): a new measure to
48. Endicott J, Nee J. Endicott Work Productivity Scale (EWPS): a new measure to
assess treatment effects. Endicott Work Productivity Scale (EWPS): a new
assess treatment effects. Endicott Work Productivity Scale (EWPS): a new
measure to assess treatment effects. Psychopharmacol Bull. 1997;33:13-6.
measure to assess treatment effects. Psychopharmacol Bull. 1997;33:13-6.
49. Koopman C, Pelletier KR, Murray JF, Sharda CE, Berger ML, Turpin RS, et al.
49. Koopman C, Pelletier KR, Murray JF, Sharda CE, Berger ML, Turpin RS, et al.
Stanford presenteeism scale: health status and employee productivity. J Occup
Stanford presenteeism scale: health status and employee productivity. J Occup
Environ Med. 2002;44:14-20.
Environ Med. 2002;44:14-20.
19
19
50. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The
50. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The
COSMIN study reached international consensus on taxonomy, terminology, and
COSMIN study reached international consensus on taxonomy, terminology, and
definitions of measurement properties for health-related patient-reported
definitions of measurement properties for health-related patient-reported
outcomes. J Clin Epidemiol. 2010;63:737-45.
outcomes. J Clin Epidemiol. 2010;63:737-45.
51. Mokkink LB, Terwee CB, Knol DL, Stratford PW, Alonso J, Patrick DL, et al. The
51. Mokkink LB, Terwee CB, Knol DL, Stratford PW, Alonso J, Patrick DL, et al. The
COSMIN checklist for evaluating the methodological quality of studies on
COSMIN checklist for evaluating the methodological quality of studies on
measurement properties: A clarification of its content. BMC Med Res Methodol.
measurement properties: A clarification of its content. BMC Med Res Methodol.
2010;10:22.
2010;10:22.
52. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The
52. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The
COSMIN checklist for assessing the methodological quality of studies on
COSMIN checklist for assessing the methodological quality of studies on
measurement properties of health status measurement instruments: an
measurement properties of health status measurement instruments: an
international Delphi study. Qual Life Res. 2010;19:539-49.
international Delphi study. Qual Life Res. 2010;19:539-49.
53. Guillemin F. Cross-cultural adaptation and validation of health status measures.
53. Guillemin F. Cross-cultural adaptation and validation of health status measures.
Scand J Rheumatol.1995;24:61-63.
Scand J Rheumatol.1995;24:61-63.
54. Beaton DE, Bombardier C, Guillemin F, Bosi Ferraz M. Guidelines for the
54. Beaton DE, Bombardier C, Guillemin F, Bosi Ferraz M. Guidelines for the
process of cross-cultural adaptation of self-reports measures. Spine. 2000;
process of cross-cultural adaptation of self-reports measures. Spine. 2000;
25:3186-3191.
25:3186-3191.
55. Aday LA, Cornelius LJ. Designing and conducting health surveys: a
55. Aday LA, Cornelius LJ. Designing and conducting health surveys: a
comprehensive guide. 3rd ed. San Francisco, CA: Jossey-Bass publisher; 2006.
comprehensive guide. 3rd ed. San Francisco, CA: Jossey-Bass publisher; 2006.
56. Streiner DL, Norman GR. Health measurement scales: a practical guide to their
56. Streiner DL, Norman GR. Health measurement scales: a practical guide to their
development and use. 4thed. New York: Oxford University Press Inc.; 2008.
development and use. 4thed. New York: Oxford University Press Inc.; 2008.
57. García de Yébenes MJ, Rodriguez-Salvanés F, Carmona-Ortells L. Validación
57. García de Yébenes MJ, Rodriguez-Salvanés F, Carmona-Ortells L. Validación
de cuestionarios. Reumatol Clin. 2009;5:171-77.
de cuestionarios. Reumatol Clin. 2009;5:171-77.
58. Keszei AP, Novak M, Streiner DL. Introduction to health measurement scales. J
58. Keszei AP, Novak M, Streiner DL. Introduction to health measurement scales. J
Psychosom Res. 2010;68:319-23.
20
Psychosom Res. 2010;68:319-23.
20
20
20
59. Terwee CB, Bot SD, de Boer MR, van der Windt DA, Knol DL, Dekker J, et al.
59. Terwee CB, Bot SD, de Boer MR, van der Windt DA, Knol DL, Dekker J, et al.
Quality criteria were proposed for measurement properties of health status
Quality criteria were proposed for measurement properties of health status
questionnaires. J Clin Epidemiol. 2007;60:34-42.
questionnaires. J Clin Epidemiol. 2007;60:34-42.
60. Valderas JM, Ferrer M, Mendívil J, Garin O, Rajmil L, Herdman M, et al.
60. Valderas JM, Ferrer M, Mendívil J, Garin O, Rajmil L, Herdman M, et al.
Development of EMPRO: a tool for the standardized assessment of patient-
Development of EMPRO: a tool for the standardized assessment of patient-
reported outcome measures. Value Health. 2008;11:700-8.
reported outcome measures. Value Health. 2008;11:700-8.
61. Fitzpatrick R, Davey C, Buxton MJ, Jones DR. Evaluating patient-based
61. Fitzpatrick R, Davey C, Buxton MJ, Jones DR. Evaluating patient-based
outcome measures for use in clinical trials. Health Technol Assessment.
outcome measures for use in clinical trials. Health Technol Assessment.
1998;2:1-74.
1998;2:1-74.
62. Aaronson N, Alonso J, Burnam A, Lohr KN, Patrick DL, Perrin E, et al.
62. Aaronson N, Alonso J, Burnam A, Lohr KN, Patrick DL, Perrin E, et al.
Assessing health status and quality-of-life instruments: attributes and review
Assessing health status and quality-of-life instruments: attributes and review
criteria. Qual Life Res. 2002;11:193-205.
criteria. Qual Life Res. 2002;11:193-205.
21
21
184
184
2. OBJECTIVES
2.1. Study I Objectives
To review the literature on the methodology for cross-cultural adaptation and validation (CCAV) of health questionnaires and to synthesize recommendations based on the scientific literature to facilitate this process.
To evaluate the degree of compliance with the methodological recommendations for the CCAV of health questionnaires in a selection of Spanish-language scientific journals.
2.2. Study II Objectives
To translate and adapt the Work Role Functioning Questionnaire to Spanish spoken in Spain.
To perform a preliminary evaluation of the psychometric properties of the Spanish version of the Work Role Functioning Questionnaire by means of a pre-test.
2.3. Study III Objective
To examine the reliability and validity of the Spanish version of the Work Role Functioning Questionnaire in a Spanish-speaking general working population.
2.4. Study IV Objective
To examine the responsiveness of the Spanish version of the Work Role Functioning Questionnaire in a Spanish-speaking general working population.
3. PAPER # 1
Adaptación cultural y validación de cuestionarios de salud: revisión y recomendaciones metodológicas. Salud Pública de México. 2013;55:57-66.
Review article
Adaptación cultural y validación de cuestionarios de salud: revisión y recomendaciones metodológicas
José María Ramada-Rodilla, MD, MOH,(1,2) Consol Serra-Pujadas, MD, PhD,(1,2,3) George L Delclós-Clanchet, MD, MPH, PhD.(2,3,4)
Ramada-Rodilla JM, Serra-Pujadas C, Delclós-Clanchet GL. Adaptación cultural y validación de cuestionarios de salud: revisión y recomendaciones metodológicas. Salud Publica Mex 2013;55:57-66.
Ramada-Rodilla JM, Serra-Pujadas C, Delclós-Clanchet GL. Cross-cultural adaptation and health questionnaires validation: revision and methodological recommendations. Salud Publica Mex 2013;55:57-66.
Abstract The simple translation of a questionnaire may lead to misinterpretation due to language and cultural differences. When questionnaires developed in other countries and languages are used in scientific studies, translation alone is not enough: cross-cultural adaptation and validation are also necessary. Our objective was to review the literature on the translation, cross-cultural adaptation and validation (CCAV) of health questionnaires, and to synthesize and propose recommendations based on the scientific literature to facilitate this process. The CCAV should follow a systematic process, for which two stages are recommended: 1) cross-cultural adaptation: direct translation, synthesis, back translation, expert committee consolidation and pre-testing, and 2) validation (with up to seven steps): assessment of internal consistency, intra- and interobserver reliability, and face, content, criterion and construct validity. Lack of equivalence between questionnaires limits the comparability of results among populations with different cultures and languages and the exchange of information in the scientific community.
Key words: questionnaires; scales; health surveys; cross-cultural comparison; validation studies; reliability and validity
(1) Servicio de Salud Laboral, Parc de Salut MAR. Barcelona, Spain.
(2) Centro de Investigación en Salud Laboral (CiSAL), Universidad Pompeu Fabra. Barcelona, Spain.
(3) CIBER de Epidemiología y Salud Pública (CIBERESP). Barcelona, Spain.
(4) Epidemiology, Human Genetics and Environmental Sciences Division, The University of Texas School of Public Health. Houston, Texas, USA.
Received: 2 January 2012 • Accepted: 21 September 2012
Corresponding author: José Ma. Ramada Rodilla. Centro de Investigación en Salud Laboral, Universidad Pompeu Fabra. Dr. Aiguader, 88, 08003-Barcelona, Spain. E-mail: [email protected]
Salud Pública de México / vol. 55, no. 1, January-February 2013
It is worth imagining a researcher who is administering a British questionnaire to a sample of German pedestrians. The questionnaire asks about the habit of "looking right" before crossing a two-way road. The results would probably point to a gap in the road-safety education of German pedestrians, since they do not look right when they cross. This finding, however, would really reflect an inadequate cultural adaptation of the questionnaire: in Germany traffic drives on the right and, therefore, pedestrians "look left" before crossing. The simple translation of a questionnaire may lead to misinterpretation due to cultural and language differences. If the process of translation, cross-cultural adaptation and validation (CCAV) is not carried out correctly, errors of various kinds may arise, depending on the purpose of the questionnaire. An inadequate CCAV of questionnaires such as the Goldberg General Health Questionnaire (GHQ),1 the Nordic Occupational Skin Questionnaire (NOSQ),2 the Asthma Control Test (ACT)3 or the Michigan Alcohol Screening Test (MAST)4 would cause misclassification when screening patients for anxiety and depressive disorders, occupational skin diseases, asthma or alcoholism. Deficiencies in the CCAV of questionnaires such as the Work Ability Index (WAI)5 or the Work Role Functioning Questionnaire (WRFQ)6 could lead to errors in assessing the degree of work ability, which would affect the direction of preventive measures.
Una TACV poco sistemática de cuestionarios para la vigilancia epidemiológica de enfermedades y exposiciones, como el Cuestionario de Detección Epidemiológica para Artritis Reumatoide,7,8 el Cuestionario Nórdico Estandarizado para la Detección de Síntomas Músculoesqueléticos en Salud Ocupacional,9 o el Cuestionario para la Detección Integrada de Obesidad, Diabetes e Hipertensión Arterial de la Secretaría de Salud de México,10podría llegar a inducir el diseño y puesta en marcha de políticas públicas inadecuadas. La TACV es necesaria incluso cuando se desea aplicar un cuestionario en países distintos que hablan un mismo idioma. En ocasiones se asume que la adaptación cultural a un idioma diferente garantiza las propiedades psicométricas del cuestionario. Esto no siempre es así. Por ejemplo, las diferencias en cómo se realiza la actividad laboral en los países pueden modificar la validez de un cuestionario de aplicación en salud laboral.11-13 La necesidad de intercambiar experiencias y llevar a cabo comparaciones entre poblaciones y países distintos precisa de versiones lingüísticas adecuadamente adaptadas y validadas de los instrumentos de medida.14,15
Ramada-Rodilla JM y col.
Compliance with the methodological steps recommended in the international literature for carrying out a TACV is low. To support this statement with data, we retrieved all the articles, with no time or language limits, published in five of the epidemiology and public health journals with the highest impact factor in Latin America and Spain (Revista Panamericana de Salud Pública, Revista de Saúde Pública, Salud Pública de México, Gaceta Sanitaria and Revista Española de Salud Pública), using the MeSH terms: questionnaires, scales, health surveys, cross-cultural comparison, validation studies, reliability and validity. Articles were included if their objective was the TACV of a questionnaire into a language different from the original. Articles were excluded if they aimed to design and validate a questionnaire, or only to validate one, starting from a questionnaire whose cultural adaptation had been published in an earlier study. A total of 32 articles were obtained and analyzed in full text. Of these, 25% followed fewer than half of the recommended steps; 72% followed fewer than 80% of those steps, and only 6% of the articles followed all of them (table I). No review that integrates and systematizes the whole TACV process has been identified in the literature, so the objective of this work was to review and synthesize the literature and to propose recommendations that facilitate the TACV process for health questionnaires.
Material and methods

An exhaustive literature review was carried out to locate the available information on the methodology for the TACV of health questionnaires. The search began with a review of several books and monographs specializing in the methodology for designing, adapting and validating questionnaires, published between 1996 and 2007 [11,16-21]. From the reference lists of these publications, several articles on the TACV of health questionnaires and its methodological aspects were retrieved, provided they were published in English, French, Italian, Spanish or Portuguese. The keywords that grouped the largest number of terms were selected and checked against the Medline thesaurus, identifying the following terms (MeSH terms): 1) “health survey”; 2) “health questionnaire”; 3) “scale”; 4) “cross cultural adaptation”; 5) “validation”; 6) “validity”, and 7) “reliability”. The Medline search was run with the combination of these terms and yielded 214 citations.
Table I
Compliance with the methodological steps for the TACV of questionnaires published in the journals GS, RESP, SPM, RDSP and RPSP, with no time or language limits, up to November 1, 2011. Barcelona, Spain, November 2011

Column groups: forward translation to pre-test belong to cultural adaptation; internal consistency to inter-observer reliability belong to validation (reliability); face validity to construct validity belong to validation (validity).

| Article | Journal | Forward translation | Synthesis of translations | Back translation | Expert committee | Pre-test | Internal consistency | Test-retest | Inter-observer reliability | Face validity | Content validity | Criterion validity | Construct validity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mas Pons 1998 | RESP | Yes | Yes | Yes | Yes | Yes | No | No | NA | No | No | No | No |
| López-Alvarenga 2001 | SPM | Yes | No | No | No | No | No | Yes | NA | No | No | Yes | Yes |
| Amaral-Pinheiro 2002 | RDSP | Yes | No | No | No | Yes | No | No | NA | No | Yes | Yes | No |
| Serra-Sutton 2002 | RESP | Yes | Yes | Yes | Yes | Yes | No | No | NA | No | No | NA | No |
| López-Vázquez 2004 | SPM | Yes | No | No | No | No | Yes | No | No | No | No | NA | Yes |
| Guimaraes de Mello 2004 | RDSP | Yes | Yes | Yes | Yes | Yes | Yes | Yes | NA | No | No | No | No |
| Melgar-Quiñonez 2005 | SPM | NA | NA | NA | Yes | Yes | No | No | NA | Yes | Yes | Yes | No |
| Avanci 2005 | RDSP | Yes | No | Yes | Yes | No | Yes | Yes | NA | Yes | Yes | No | Yes |
| Aymerich 2005 | GS | Yes | Yes | Yes | Yes | Yes | Yes | No | NA | Yes | Yes | No | Yes |
| Torres 2005 | RDSP | Yes | Yes | Yes | Yes | Yes | Yes | Yes | NA | Yes | Yes | No | No |
| Majdalani 2005 | RPSP | NA | NA | NA | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No |
| Rodriguez da Silva 2005 | RDSP | Yes | No | Yes | Yes | Yes | Yes | Yes | No | Yes | No | Yes | Yes |
| López-Carmona 2006 | SPM | NA | NA | NA | Yes | No | Yes | Yes | NA | No | No | Yes | No |
| Carpio 2006 | RPSP | Yes | No | No | No | No | No | No | Yes | Yes | Yes | Yes | Yes |
| Álvarez 2006 | SPM | NA | NA | NA | No | Yes | Yes | No | NA | No | No | Yes | Yes |
| Reichenheim 2007 | RDSP | Yes | Yes | Yes | Yes | Yes | Yes | Yes | NA | Yes | Yes | Yes | Yes |
| Esteva 2007 | GS | Yes | No | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | No | Yes |
| Pinto Guedes 2007 | RDSP | Yes | Yes | Yes | Yes | Yes | No | Yes | NA | Yes | Yes | NA | Yes |
| Peña de León 2007 | RPSP | Yes | Yes | Yes | Yes | Yes | Yes | No | NA | No | No | Yes | Yes |
| Remor 2007 | RDSP | Yes | Yes | Yes | No | No | Yes | No | NA | No | No | Yes | No |
| Aguirre Jaime 2008 | RESP | NA | NA | NA | Yes | No | Yes | No | No | Yes | Yes | Yes | No |
| González-Block 2008 | SPM | Yes | No | No | No | No | No | No | No | Yes | Yes | Yes | No |
| Pedro Gómez 2009 | RESP | Yes | No | Yes | Yes | No | Yes | No | NA | No | No | NA | Yes |
| Zurbarán 2009 | RPSP | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No | No | Yes | No |
| Martínez-Gómez 2009 | RESP | Yes | No | Yes | Yes | Yes | Yes | Yes | NA | Yes | Yes | Yes | No |
| Shinohara 2010 | RDSP | Yes | Yes | Yes | Yes | Yes | Yes | Yes | NA | No | No | Yes | Yes |
| Silva 2010 | RPSP | Yes | Yes | Yes | Yes | Yes | Yes | Yes | NA | Yes | Yes | No | No |
| de Souza-Machado 2010 | RDSP | Yes | No | No | No | Yes | NA | Yes | NA | No | No | No | No |
| Garrido-Urrutia 2010 | RESP | Yes | Yes | No | No | No | Yes | Yes | NA | Yes | Yes | NA | No |
| Gutiérrez Sánchez 2011 | RESP | Yes | No | Yes | No | No | Yes | No | NA | No | No | NA | Yes |
| Amaral Saliba 2011 | RPSP | Yes | Yes | Yes | Yes | Yes | Yes | Yes | NA | Yes | Yes | Yes | Yes |
| de Barrios Leite 2011 | RDSP | Yes | Yes | Yes | Yes | Yes | No | Yes | NA | No | Yes | No | Yes |

TACV: translation, cultural adaptation and validation. GS: Gaceta Sanitaria. RESP: Revista Española de Salud Pública. SPM: Salud Pública de México. RDSP: Revista de Saúde Pública. RPSP: Revista Panamericana de Salud Pública. Yes: step performed. No: step not performed. NA: not applicable.
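The compliance figures quoted in the text can be illustrated with a short sketch. It assumes that each article is scored as the share of applicable steps it performed, with steps marked not applicable (NP in the original table, written NA here) left out of the denominator. The article does not state how such cells were counted, so this rule is an assumption, and the function name `compliance` is hypothetical. The three rows are copied from table I, with Sí/No/NP coded as Yes/No/NA.

```python
from fractions import Fraction

# Three rows copied from table I, in column order:
# forward translation, synthesis, back translation, expert committee, pre-test,
# internal consistency, test-retest, inter-observer reliability,
# face validity, content validity, criterion validity, construct validity.
rows = {
    "Mas Pons 1998": ["Yes", "Yes", "Yes", "Yes", "Yes", "No", "No", "NA",
                      "No", "No", "No", "No"],
    "Reichenheim 2007": ["Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "NA",
                         "Yes", "Yes", "Yes", "Yes"],
    "González-Block 2008": ["Yes", "No", "No", "No", "No", "No", "No", "No",
                            "Yes", "Yes", "Yes", "No"],
}


def compliance(values):
    """Share of applicable steps that were performed.

    Assumption (not stated in the article): "NA" cells are excluded
    from the denominator.
    """
    applicable = [v for v in values if v != "NA"]
    if not applicable:
        raise ValueError("no applicable steps for this article")
    performed = sum(1 for v in applicable if v == "Yes")
    return Fraction(performed, len(applicable))


for article, values in rows.items():
    share = compliance(values)
    print(f"{article}: {share} ({float(share):.0%})")
```

Under this rule, an article that performed every applicable step scores 1, which is how the two fully compliant articles in the table would be identified.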
The inclusion criteria were that the article dealt with methodological aspects of the TACV of health questionnaires and that it was published in one of the languages mentioned. Based on these criteria and on a reading of the abstracts, 20 articles were selected and analyzed in full text [12,13,15,22-38]. In addition, the grey literature was searched on the Internet, using as search criteria the keywords obtained and the authors identified in the previous process. In the end, seven books [11,16-21] and 21 articles [12-15,22-38] were included. From this review, a proposal was drawn up containing the methodological recommendations on which there was the greatest consensus among authors, together with a glossary of the terms most commonly used in the TACV of questionnaires (table II).

Synthesis and recommendations

There is broad consensus in recommending two stages for the TACV process: a) cultural adaptation, which must take into account idiomatic expressions, the cultural context and differences in how populations perceive health and illness, and b) validation in the target language, to assess how well the psychometric properties have been preserved.

First stage: translation and cultural adaptation

In this stage the instrument is translated from its original version, keeping the structure of the questionnaire as far as possible. The aim is for the resulting instrument to maintain semantic, idiomatic, conceptual and experiential equivalence with the original questionnaire [22,23]. There is consensus in the literature on how to approach this first stage [12,13,22-27], and a sequence of five steps is recommended (figure 1):

Forward translation: a conceptual translation of the instrument is produced. At least two independent bilingual translators whose mother tongue is the target language should take part. One of the translators should know the objectives and the concepts covered by the questionnaire and should have previous experience in the technical translation of texts. The other translator or translators should have no prior knowledge of the questionnaire and should be unaware of the objectives of the study. These translators will provide a translation closer to everyday language and will detect difficulties in comprehension and translation arising from technical or uncommon words.
Table II
Glossary of terms commonly used in the translation and cultural adaptation of questionnaires. Barcelona, Spain, November 2011

Cross-cultural adaptation (adaptación cultural): taking into account the cultural context, idiomatic expressions and differences in the perception of health and illness of the populations in which the instrument is to be applied.
Internal consistency reliability (consistencia interna): the degree of interrelation and coherence of the components (items or variables) of the measurement instrument.
Construct (constructo): the theory underlying the phenomenon or concept to be measured. It is a quality that cannot be observed directly in a population of subjects.
Criterion or reference test, gold standard (criterio o prueba de referencia): an equivalent alternative measurement method, independent of the results of a questionnaire, that is reliable, accurate, objective and widely accepted as a valid measure.
Scale (escala): the graduation used in various measurement instruments to make it possible to measure a magnitude.
Specificity (especificidad): the ability to detect individuals who do not present the phenomenon under study.
Reliability (fiabilidad): the degree to which an instrument is able to measure without error. It is the proportion of the total variance attributable to true differences between subjects.
Inter-rater reliability (fiabilidad inter-observador): measures the degree of agreement between two or more raters who assess the same subjects with the same instrument.
Intra-rater or test-retest reliability (fiabilidad intra-observador o fiabilidad test-retest): measures the stability of the scores given by the same rater, to the same subjects and with the same method, at different times.
Item (ítem): each of the components or variables of a measurement instrument; each of the parts or units that make up a test or a questionnaire.
Responsiveness (sensibilidad): the ability to detect and measure changes, both between different individuals and in the response of the same individual over time.
Translation (traducción): expressing in one language something that has previously been expressed or written in a different one.
Forward translation (traducción directa): a translation from a foreign language into the translator's own language.
Back translation (traducción inversa): the translation of a text back into its original language, starting from a translation of that text previously made into another language.
Literal translation (traducción literal): a translation in which the sense of the original text is respected.
Validation (validación): assessment of how well the psychometric properties of the questionnaire have been preserved.
Validity (validez): the ability of the instrument to measure the construct for which it was designed.
Face validity (validez aparente): the degree to which the items of a questionnaire, in the judgment of experts and users, logically measure or adequately reflect the construct to be measured.
Content validity (validez de contenido): the degree to which the content of an instrument is able to measure most of the dimensions of the construct under study.
Construct validity (validez de constructo): the degree to which the measurements resulting from the questionnaire responses can be considered a measurement of the phenomenon studied.
Criterion validity (validez de criterio): the degree to which the result of the questionnaire predicts or agrees with some criterion of “true value” or gold standard.
Positive predictive value (valor predictivo positivo): the probability that the phenomenon under study is present in an individual when the questionnaire result is positive.
Negative predictive value (valor predictivo negativo): the probability that the phenomenon under study is not present in an individual when the questionnaire result is negative.
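The glossary definitions of specificity and of the positive and negative predictive values can be made concrete with a small worked example. The sketch below assumes that the questionnaire result has been cross-tabulated against a gold standard in a 2x2 table. The function name `screening_measures` and the counts are hypothetical and are not taken from the article.

```python
def screening_measures(tp, fp, fn, tn):
    """Compute specificity and predictive values from 2x2 counts.

    tp: questionnaire positive, phenomenon present (gold standard)
    fp: questionnaire positive, phenomenon absent
    fn: questionnaire negative, phenomenon present
    tn: questionnaire negative, phenomenon absent
    """
    if (tn + fp) == 0 or (tp + fp) == 0 or (tn + fn) == 0:
        raise ValueError("a required row or column total is zero")
    return {
        # share of individuals without the phenomenon who test negative
        "specificity": tn / (tn + fp),
        # probability the phenomenon is present given a positive result
        "positive_predictive_value": tp / (tp + fp),
        # probability the phenomenon is absent given a negative result
        "negative_predictive_value": tn / (tn + fn),
    }


# Hypothetical counts, for illustration only.
result = screening_measures(tp=40, fp=10, fn=5, tn=45)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Note that the glossary uses “sensibilidad” in the sense of responsiveness (the ability to detect change), so diagnostic sensitivity is deliberately left out of this sketch.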
Adaptación y validación de cuestionarios
The entire questionnaire, including the instructions, the items and the response options, is translated using this method, and everything is compiled in a report.

Synthesis of translations: the translators compare the translations. Discrepancies between the translated versions are identified and discussed until consensus is reached. If no consensus is reached, the research team is asked to take part. At the end, a report of the process is written containing a single translation of the questionnaire, which is the synthesis version in the target language.

Back translation: the synthesis version is translated back into the original language by at least two professional bilingual translators whose mother tongue is that of the original questionnaire. The translators work independently, are blind to the original version of the questionnaire, have no prior knowledge of the subject and are unaware of the objectives of the study [12,13]. The translators should underline difficult wordings and any uncertainties encountered during the translation process. It is then determined whether the translation has led to important semantic or conceptual differences between the original questionnaire and the synthesis version obtained in the previous step. All of this is compiled in a report.

Figure 1. Process of translation, cultural adaptation and validation (adapted from reference 22). Barcelona, November 2011. Flowchart content: first phase, translation and cultural adaptation: step 1, forward translation; step 2, synthesis of translations; step 3, back translation; step 4, consolidation by an expert committee; step 5, pre-test (feasibility); outputs: process reports and the translated, culturally adapted version. Second phase, validation: reliability (1. internal consistency, 2. intra-observer reliability, 3. inter-observer reliability) and validity (1. face or logical validity, 2. content validity, 3. criterion validity, 4. construct validity); output: validated version.

Consolidation by an expert committee: it is recommended to set up a multidisciplinary committee, if possible made up of bilingual experts in the subject covered by the questionnaire: an expert in methodology, a linguist and a health professional, in addition to the translators who took part in the process. The aim of this committee is to arrive at a single, consolidated pre-final questionnaire adapted to the target language [16,17]. In this step the committee has available the forward translations (step 1), the synthesis version (step 2) and the back translations (step 3). The discrepancies found are identified and discussed. The committee makes sure that the pre-final version is fully comprehensible and equivalent to the original questionnaire. It also ensures that the pre-final questionnaire can be understood by a person with schooling equivalent to that of a 12-year-old.
If uncertainties arise, one of the authors of the questionnaire will be contacted, where possible, and asked to participate. A report summarizing the committee's decisions, including the consolidated version, will be prepared.

Pre-test (applicability / feasibility): the pre-test makes it possible to evaluate the quality of the translation, the cultural adaptation and the applicability or feasibility of the questionnaire. It also shows whether the completion time is within reasonable limits. Durand et al.25 and Gallasch et al.26 carried out the pre-test during the translation and cultural adaptation of the Work Role Functioning Questionnaire (WRFQ-27) with a sample of 30-40 workers and obtained satisfactory results. De Soárez et al. did the same for the Work Limitations Questionnaire (WLQ), with 20 volunteers.27 Beaton proposed a sample of between 30 and 40 participants, based on a literature review of cultural adaptations.22

It is recommended that the pre-test include participants with different educational levels and, for self-administered questionnaires, that participants be able to read and understand what they read. To select the sample, it is important to define the inclusion and exclusion criteria and how participants will be recruited. For questionnaires used in occupational health, it is recommended that the pre-test include active workers of both sexes, working at least 10 hours per week, aged between 18 and 65 years, with different educational levels, who speak the target language as their first language and, for self-administered questionnaires, can read and understand it.

For each participant, data will be collected on at least sociodemographic characteristics, educational level and occupation.25,26 Participants will be asked to complete the consolidated version and, in a structured interview, will be invited to comment on anything that was difficult to understand. It is recommended that these interviews be recorded, with the participants' prior authorization, so that they can be reviewed as many times as necessary. Finally, a report will be written identifying possible difficulties in understanding the questionnaire's instructions, questions and response options. Any question should be revised if at least 15% of participants have difficulty with it.27
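The 15% revision rule above can be applied mechanically once the interview reports are tallied. The following is a minimal sketch, not part of the original method description; the function name, item labels and counts are hypothetical and chosen only for illustration.

```python
def items_needing_revision(difficulty_counts, n_participants, threshold_percent=15):
    """Return the items that at least `threshold_percent` % of participants found difficult.

    difficulty_counts maps an item label to the number of participants who
    reported trouble with it (labels and counts here are hypothetical).
    """
    if n_participants <= 0:
        raise ValueError("n_participants must be positive")
    flagged = []
    for item, count in difficulty_counts.items():
        # Integer comparison avoids floating-point rounding at exactly 15 %.
        if count * 100 >= threshold_percent * n_participants:
            flagged.append(item)
    return sorted(flagged)


# Hypothetical pre-test with 35 workers.
counts = {"item_01": 2, "item_07": 6, "item_12": 5}
flagged = items_needing_revision(counts, n_participants=35)
print(flagged)
```

With 35 participants, an item is flagged once six or more of them report difficulty, since five reports amount to about 14%.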
Second stage: validation of the questionnaire in the target language

Correct translation and cultural adaptation of a questionnaire do not always guarantee that its psychometric properties are preserved, so it must be validated in the target language.22 For a questionnaire to be considered valid, it must: a) be reliable and able to measure without error; b) be able to detect and measure changes, both between individuals and in the response of the same individual over time; c) be simple, feasible and accepted by patients, users and researchers; d) be suitable for measuring the phenomenon it is intended to measure, and e) reflect the theory underlying the phenomenon or concept to be measured. All these characteristics relate to two properties of questionnaires: reliability and validity.14 The International Quality of Life Assessment (IQOLA) project8,18,19 and other researchers such as Aday,19 Lam,30 Mokkink,31-33 Ren,34 Scott-Lennox35 and Wiesinger36 have proposed or used different methods for assessing the reliability and validity of questionnaires. Based on that experience, the following sequence is proposed for validating questionnaires (figure 1):

1. Reliability: the degree to which an instrument is able to measure without error. It measures the proportion of variation in the measurements that is due to the diversity of values taken by the variable rather than to systematic or random error.14,33 Reliability determines the proportion of total variance attributable to true differences between subjects.20,33,37 Depending on the characteristics of the questionnaire, its reliability can be assessed for all or some of its three dimensions: 1) internal consistency; 2) intra-observer or test-retest reliability, and 3) inter-observer reliability.

1.1 Internal consistency: the degree of interrelation and coherence of the items. It assesses whether the items that measure the same construct are homogeneous.33,39 When the scale of an instrument is consistent, this ensures that all the items measure a single construct and, in general, that there is a linear relationship between the sum of the item scores and the construct measured.
A construct is a latent or intangible quality of a subject or population that cannot be observed and measured directly with a measuring instrument, because it exists only within a theory. Examples are job stress, motivation, disability and leadership. Assessing the reliability of an instrument poses no major problems when objective qualities such as weight or height are being quantified. For constructs, however, it must be shown empirically that the instrument measures what it is intended to measure.

Constructs are frequently measured with questionnaires in which each item is assumed to be related to the unobservable quality of interest. Each item usually asks for a response to which a score is assigned, and the sum of the scores gives the questionnaire's scale. A scale may sometimes be made up of a group of subscales. For example, psychosocial occupational risk is a construct that may in turn comprise several dimensions, such as job demands, rewards, level of control and social support.

Cronbach's alpha coefficient quantifies the reliability of a scale provided two requirements are met: a) the scale must consist of a set of items whose scores are summed to give an overall score, and b) all item scores must measure in the same direction; for example, the higher the score, the greater the functional capacity or emotional well-being. Cronbach's alpha is the weighted mean of the correlations between the items that make up a scale.39 When the instrument is composed of a group of subscales, Cronbach's alpha should be calculated for the items with respect to the overall score (item-total correlation) and for the items of each subscale with respect to that subscale's score (item-subscale correlation).

Cronbach's alpha is not accompanied by any p value that would allow the hypothesis of scale reliability to be rejected or not. It usually takes values between 0 and 1. Alpha values above 0.70 are considered sufficient to guarantee the internal consistency of the scale.
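As an illustration of the coefficient just described, the sketch below computes Cronbach's alpha with the standard variance-based formula, alpha = k/(k-1) × (1 − Σ item variances / variance of the total score), which the text does not spell out. The response matrix is hypothetical (six respondents, four items scored in the same direction), and the function name is an assumption made for this example.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses the variance-based formula with sample variances; all items are
    assumed to be scored in the same direction.
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != n_items for row in scores):
        raise ValueError("every respondent must answer every item")
    # Variance of each item across respondents.
    item_variances = [variance([row[j] for row in scores]) for j in range(n_items)]
    # Variance of the summed (total) score across respondents.
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 6 respondents x 4 items.
responses = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [4, 4, 4, 3],
    [3, 3, 2, 3],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}, acceptable (> 0.70): {alpha > 0.70}")
```

For an instrument with subscales, the same function would be called once on all items and once per subscale, passing only that subscale's columns.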
1.2 Intra-observer or test-retest reliability: this refers to the repeatability of the instrument when it is administered by the same method to the same population at two different times.14,33 When the scale is quantitative, it is analyzed by calculating the intraclass correlation coefficient (ICC); when it is qualitative, by calculating Cohen's kappa index.21,37 The time between the first administration (test) and the second (retest) depends on what is being measured. It should not be so long that the observed phenomenon changes, which would alter the repeatability value, nor so short that respondents remember their answers (learning effect).

1.3 Inter-observer reliability: the degree of agreement between two or more raters who assess the same subjects with the same instrument.33 This property cannot be assessed for self-administered questionnaires, because the individual provides the answers without any interference from the researchers. If it needs to be assessed, the intraclass correlation coefficient (ICC) is calculated when the scale is quantitative and Cohen's kappa index when it is qualitative. The main limitations are the possibility of chance agreement between observers and the possibility of systematic error (information bias) on the part of one of the raters.

2. Validity: the ability of the questionnaire to measure the construct for which it was designed.19,33 It can be assessed for all or only some of its four dimensions: face or logical validity, content validity, criterion validity and construct validity.
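For the qualitative case named in sections 1.2 and 1.3, Cohen's kappa corrects the observed agreement for the agreement expected by chance: kappa = (p_o − p_e) / (1 − p_e). The sketch below is one straightforward way to compute it for two sets of categorical ratings (test versus retest, or rater A versus rater B); the data are hypothetical, and the ICC used for quantitative scales is not shown.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equally long sequences of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of agreement.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


# Hypothetical yes/no answers from ten participants at test and retest.
test = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
retest = ["yes", "yes", "no", "yes", "yes", "no", "yes", "no", "no", "yes"]
kappa = cohen_kappa(test, retest)
print(f"kappa = {kappa:.2f}")
```

Here the raw agreement is 8 out of 10, but kappa is lower because part of that agreement would be expected by chance alone, which is the limitation noted in section 1.3.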
2.1 Face or logical validity: the degree to which a questionnaire, in the judgment of experts and users, logically measures what it is intended to measure.14,19 When face validity is lacking, study subjects may not see the relationship between the questions put to them and the purpose for which they agreed to answer, which can lead participants to reject the questionnaire. This dimension of validity should be assessed when the questionnaire is designed; however, if mismatches arising from translation or cultural adaptation are detected during the translation, cultural adaptation and validation (TACV) process, they will need to be corrected.

2.2 Content validity: constructs are usually composed of several dimensions. Content validity is the degree to which the tool is able to measure most of the dimensions of the construct.14,19,33 A questionnaire with high content validity is one that measures all the dimensions related to the construct under study. Its assessment is a formal process that should always be carried out in a TACV process, and consists of judging whether the questionnaire items are a representative sample of what is to be measured. It is an empirical assessment based on judgments from different sources: the opinions of the tool's authors, the results of pilot studies, the reasoning of the expert committee during the TACV process, and the qualitative analysis of participants' comments during the pre-test.

2.3 Criterion validity: establishes the validity of an instrument by comparing it with an external criterion or reference test ("gold standard", GS). It has two dimensions: 1) concurrent validity, the degree to which the questionnaire result agrees with a GS, and 2) predictive validity, the degree to which it can predict a given outcome.14,19,33 The GS must be an equivalent alternative method, independent of the questionnaire results, reliable, accurate, objective and widely accepted as a valid measure.14,19 When it meets these requirements, it always gives a positive result in the presence of the phenomenon under study and a negative result in its absence. For example, electromyography performed under adequate conditions could be the GS for a questionnaire assessing the presence of carpal tunnel syndrome. Whenever a GS exists, concurrent criterion validity should be evaluated in five steps: 1) selection of the GS; 2) selection of a sample of subjects representative of the population; 3) administration of the questionnaire and recording of the result for each individual; 4) evaluation of each individual with the GS, and 5) comparison of the results obtained with the questionnaire and the GS.

The analysis of concurrent criterion validity consists of examining the strength of the correlation between the questionnaire result and the GS result, and it can be quantified by calculating Pearson's correlation coefficient (r).

Another approach to quantifying concurrent criterion validity is to analyze sensitivity and specificity.19,21 Sensitivity is the ability of the questionnaire to detect individuals who have the phenomenon under study. It can be defined as the probability that an individual who really has the phenomenon obtains a positive result when the questionnaire is applied. It is calculated as the ratio of true positives (TP) to the sum of TP and false negatives (FN), which is why it is also known as the true positive fraction (TPF): sensitivity = TP/(TP+FN). Specificity is the ability to detect those who do not have the phenomenon under study, and is the probability that an individual who does not have it obtains a negative result when the questionnaire is applied. It is calculated as the ratio of true negatives (TN) to the sum of TN and false positives (FP), and is known as the true negative fraction (TNF): specificity = TN/(TN+FP) (Table III). The higher the sensitivity and specificity, and the lower the percentage of FP and FN, the greater the concurrent validity.
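When both the questionnaire and the gold standard yield a continuous score, the correlation-based approach of section 2.3 reduces to computing Pearson's r over paired results. The following is a minimal sketch using the textbook formula; the paired scores are invented for illustration and do not come from any study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical paired results: questionnaire score vs gold-standard score.
questionnaire = [62, 75, 48, 90, 55, 81, 70]
gold_standard = [60, 78, 50, 88, 59, 85, 66]
r = pearson_r(questionnaire, gold_standard)
print(f"r = {r:.2f}")
```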
Table III
Calculation of sensitivity, specificity, positive predictive value and negative predictive value.* Barcelona, Spain, November 2011.

| Questionnaire result | Phenomenon under study (gold standard): present | Phenomenon under study (gold standard): absent | Total |
| Positive | TP | FP | TP+FP |
| Negative | FN | TN | FN+TN |
| Total | TP+FN | FP+TN | |

Source: * Adapted from reference 21
TP: true positives; FP: false positives; FN: false negatives; TN: true negatives
Sensitivity: TP/(TP+FN)
Specificity: TN/(FP+TN)
Positive predictive value (PPV): TP/(TP+FP)
Negative predictive value (NPV): TN/(FN+TN)
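The four quantities in Table III follow directly from the cells of the 2×2 table. The sketch below computes them from the cell counts; the function name and the counts are hypothetical and serve only to show the arithmetic.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from the cells of a 2x2 table.

    tp, fp, fn, tn are the counts of true positives, false positives,
    false negatives and true negatives against the gold standard.
    """
    if min(tp + fn, fp + tn, tp + fp, fn + tn) == 0:
        raise ValueError("every row and column total must be greater than zero")
    return {
        "sensitivity": tp / (tp + fn),  # true positive fraction
        "specificity": tn / (fp + tn),  # true negative fraction
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (fn + tn),          # negative predictive value
    }


# Hypothetical counts: 50 individuals with the phenomenon, 50 without.
metrics = diagnostic_metrics(tp=45, fp=8, fn=5, tn=42)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Note that sensitivity and specificity are computed within the gold-standard columns, whereas the predictive values are computed within the questionnaire-result rows and therefore depend on how common the phenomenon is in the sample.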
A questionnaire is considered to have acceptable sensitivity and specificity when both are above 0.80 (reference 20). From there, it may be of interest to know the predictive validity.21 The positive predictive value (PPV) is the probability that an individual has the phenomenon the questionnaire seeks to measure, given a positive questionnaire result. It is calculated as the proportion of participants with a positive questionnaire result who actually had the phenomenon: PPV = TP/(TP+FP). The negative predictive value (NPV) is the probability that the phenomenon is absent when the questionnaire result is negative: NPV = TN/(FN+TN).

2.4 Construct validity: the degree to which the measurements resulting from the questionnaire responses can be considered a measurement of the phenomenon under study.14,19,33 Its assessment consists of testing hypotheses formulated about how the instrument's scores behave in different situations. Several methods exist, and they should be used when the phenomenon to be measured is abstract or when comparison with a GS is not possible. Known-groups validity analysis is well suited to occupational health questionnaires that measure the degree of physical or cognitive capacity for work. It compares the results obtained by applying the questionnaire to groups with a known clinical diagnosis of physical or mental health.19,20
Conclusions

The TACV of questionnaires for use in other languages is a resource-intensive process; however, when carried out systematically it yields a measurement tool equivalent to the original version. The way the TACV of health questionnaires is carried out can be improved, so it is important to follow methodological recommendations. If the TACV process is not rigorous, errors may occur with implications for diagnosis, for decisions on individual therapy, for epidemiological records and even for the design and implementation of public policy. In addition, the use of tools that are not equivalent to the original questionnaire may produce unreliable or confusing results that could limit the exchange of information within the scientific community.13,14,22-24

This proposal for the TACV of health questionnaires is consistent with the recommendations of experts such as Alexandre,13 Beaton,22 Carvajal,23 Guillemin12 and Herdman24 for carrying out translations and cultural adaptations. The translation and adaptation process must be followed by validation in the target language, which minimizes the information bias that could be associated with administering questionnaires in countries with different languages and cultures. The process is therefore complemented with a proposed series of steps for the validation stage, consistent with the recommendations of experts such as Aday,19 Mokkink,31-33 Müller37 and Keszey.38
Declaration of conflict of interest: The authors declared no conflict of interest.
References
1. Goldberg D, Bridges K, Duncan-Jones P, Grayson D. Detecting anxiety and depression in general medical settings. BMJ 1988;297:897-899.
2. Susitaival P, Flyvholm MA, Meding B, Kanerva L, Lindberg M, Svensson A, et al. Nordic Occupational Skin Questionnaire (NOSQ-2002): a new tool for surveying occupational skin diseases and exposure. Contact Dermatitis 2003;49:70-76.
3. Melosini L, Dente FL, Bacci E, Bartoli ML, Cianchetti S, Costa F, et al. Asthma control test (ACT): comparison with clinical, functional, and biological markers of asthma control. J Asthma 2012;49:317-323.
4. Connor JP, Grier M, Feeney GF, Young RM. The validity of the Brief Michigan Alcohol Screening Test (bMAST) as a problem drinking severity measure. J Stud Alcohol Drugs 2007;68:771-779.
5. Tuomi K, Ilmarinen J, Eskelinen L, Järvinen E, Toikkanen J, Klockars M. Prevalence and incidence rates of diseases and work ability in different work categories of municipal occupations. Scand J Work Environ Health 1991;17(Suppl 1):67-74.
6. Amick BC III, Lerner D, Rogers WH, Rooney T, Katz JN. A review of health-related work outcome measures and their uses, and recommended measures. Spine 2000;25:3152-3160.
7. Scublinsky D, González C, Iannantuono R, Somma LF, Rillo O, Casado G, et al. Adaptación al español y validación del cuestionario de detección epidemiológica para artritis reumatoidea. Rev Argent Reumatol 2008;19:33-35.
8. Simonsson M, Bergman S, Jacobsson L, Petersson I, Svensson B. The prevalence of rheumatoid arthritis in Sweden. Scand J Rheumatol 1999;28:340-343.
9. Kuorinka I, Jonsson B, Kilbom A, Vinterberg H, Biering-Sørensen F, Andersson G, et al. Standardised Nordic questionnaires for the analysis of musculoskeletal symptoms. Appl Ergon 1987;18:233-237.
10. Tapia-Conyer R, Velázquez-Monroy O, Lara-Esqueda A, Tapia-Olarte F, Aurora-Jiménez R, Sánchez-Montes J, et al. Guía de detección integrada de obesidad, diabetes e hipertensión arterial [Internet monograph]. Ciudad de México, DF: Secretaría de Salud de México; [accessed 18 September 2012]. Available at: www.salud.gob.mx/unidades/cdi/documentos/DOCSAL7482.pdf
11. Hutchinson A, Bentzen N, Konig-Zahn C. Cross cultural health outcome assessment: a user’s guide. The Netherlands: ERGHO, 1996.
12. Guillemin F. Cross-cultural adaptation and validation of health status measures. Scand J Rheumatol 1995;24:61-63.
13. Alexandre NMC, Guirardello Ede B. Cultural adaptation of instruments utilized in occupational health. Rev Panam Salud Publica 2002;11:109-111.
14. García de Yébenes MJ, Rodriguez-Salvanés F, Carmona-Ortells L. Validación de cuestionarios. Reumatol Clin 2009;5:171-177.
15. Kulis D, Arnott M, Greimel ER, Bottomley A, Koller M. Trends in translation requests and arising issues regarding cultural adaptation. Expert Rev Pharmacoecon Outcomes Res 2011;11:307-314.
16. Lobiondo-Wood G, Haber J. Reliability and validity. Nursing research: methods, critical appraisal, and utilization. 4th ed. St. Louis: Mosby, 1998.
17. Burns N, Grove SK. The practice of nursing research: conduct, critique and utilization. 3rd ed. Philadelphia: Saunders, 1997.
18. Ware JE Jr, Gandek B, Keller S, IQOLA Group. Evaluating instruments used cross-nationally: Methods from the IQOLA Project. In: Spilker B, ed. Quality of life and pharmacoeconomics in clinical trials. 2nd ed. Philadelphia: Lippincott-Raven Publishers, 1996:681-692.
19. Aday LA, Cornelius LJ. Designing and conducting health surveys: a comprehensive guide. 3rd ed. San Francisco, CA: Jossey-Bass publisher, 2006.
20. Argimon-Pallas JM, Jimenez-Villa J. Métodos de investigación clínica y epidemiológica. 3rd ed. Madrid: Elsevier España, 2004.
21. Serra C, Company A. Vigilancia de la salud. In: Ruiz-Frutos C, García AM, Delclòs J, Benavides FG. Salud laboral, conceptos y técnicas para la prevención de riesgos laborales. 3rd ed. Barcelona: Masson, 2007:255-264.
22. Beaton DE, Bombardier C, Guillemin F, Bosi-Ferraz M. Guidelines for the process of cross-cultural adaptation of self-reports measures. Spine 2000;25:3186-3191.
23. Carvajal A, Centeno C, Watson R, Martínez M, Rubiales AS. How is an instrument for measuring health to be validated? An Sist Sanit Navar 2011;34:63-72.
24. Herdman M, Fox-Rushby J, Badia X. A model of equivalence in the cultural adaptation of HRQoL instruments: the universalist approach. Qual Life Res 1998;7:323-335.
25. Durand MJ, Vachon B, Hong QN, Imbeau D, Amick BC III, Loisel P. The cross-cultural adaptation of the work role functioning questionnaire in Canadian French. Int J Rehabil Res 2004;27:261-268.
26. Gallasch CH, Alexandre NM, Amick B 3rd. Cross-cultural adaptation, reliability, and validity of the work role functioning questionnaire to Brazilian Portuguese. J Occup Rehabil 2007;17:701-711.
27. de Soárez PC, Kowalski CC, Ferraz MB, Ciconelli RM. Translation into Brazilian Portuguese and validation of the Work Limitations Questionnaire. Rev Panam Salud Publica 2007;22:21-28.
28. Bullinger M, Alonso J, Apolone G, et al. Translating health status questionnaires and evaluating their quality: the IQOLA Project approach. International Quality of Life Assessment. J Clin Epidemiol 1998;51:913-923.
29. Gandek B, Ware JE Jr, IQOLA Group. Methods for validation and norming translations of health status questionnaires: the IQOLA project approach. International quality of life assessment. J Clin Epidemiol 1998;51:953-959.
30. Lam CL, Gandek B, Ren XS, Chan MS. Tests of scaling assumptions and construct validity of the Chinese (HK) version of the SF-36 Health Survey. J Clin Epidemiol 1998;51:1139-1147.
31. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res 2010;19:539-549.
32. Mokkink LB, Terwee CB, Knol DL, Stratford PW, Alonso J, Patrick DL, et al. The COSMIN checklist for evaluating the methodological quality of studies on measurement properties: A clarification of its content. BMC Med Res Methodol 2010;10:22.
33. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol 2010;63:737-745.
34. Ren XS, Amick B III, Zhou L, Gandek B. Translation and psychometric evaluation of a Chinese version of the SF-36 Health Survey in the United States. J Clin Epidemiol 1998;51:1129-1138.
35. Scott-Lennox JA, Wu AW, Boyer JG, Ware JE Jr. Reliability and validity of French, German, Italian, Dutch, and UK English translations of the medical outcomes study HIV Health Survey. Med Care 1999;37:908-925.
36. Wiesinger GF, Nuhr M, Quittan M, Ebenbichler G, Wölfl G, Fialka-Moser V. Cross-cultural adaptation of the Roland-Morris questionnaire for German-speaking patients with low back pain. Spine 1999;24:1099-1103.
37. Müller R, Büttner P. A critical discussion of intraclass correlation coefficients. Stat Med 1994;13:2465-2476.
38. Keszei AP, Novak M, Streiner DL. Introduction to health measurement scales. J Psychosom Res 2010;68:319-323.
39. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika 1951;16:297-334.
3. PAPER # 2
Cross-cultural adaptation of the work role functioning questionnaire to Spanish spoken in Spain. Journal of Occupational Rehabilitation. 2013;23:566-75.
J Occup Rehabil (2013) 23:566–575 DOI 10.1007/s10926-013-9420-6
Cross-Cultural Adaptation of the Work Role Functioning Questionnaire to Spanish Spoken in Spain
José M. Ramada • Consol Serra • Benjamin C. Amick III • Juan R. Castaño • George L. Delclos
Published online: 29 January 2013 © Springer Science+Business Media New York 2013
Abstract
Purpose The Work Role Functioning Questionnaire (WRFQ) is a tool developed in the United States to measure work disability and assess the perceived impact of health problems on worker ability to perform jobs. We translated and adapted the WRFQ to Spanish spoken in Spain and assessed preservation of its psychometric properties. Methods Cross-cultural adaptation of the WRFQ was performed following a systematic 5-step procedure: (1) direct translation, (2) synthesis, (3) back-translation, (4) consolidation by an expert committee and (5) pre-test. Psychometric properties were evaluated by administering the questionnaire to 40 patients with different cultural levels and health problems. Applicability, usability, readability and integrity of the WRFQ were assessed, together with its validity and reliability. Results Questionnaire translation, back translation and consolidation were carried out without relevant difficulties. Idiomatic issues requiring reformulation were found in the instructions, response options and in 2 items. Participants appreciated the applicability, usability, readability and integrity of the questionnaire. The results indicated good face and content validity. Internal consistency was satisfactory for all subscales (Cronbach’s alpha between 0.88 and 0.96), except for social demands (Cronbach’s alpha = 0.56). Test–retest reliability showed good stability, with intraclass correlation coefficients between 0.77 and 0.93 for all subscales. Construct validity was considered preserved based on the comparison of median scores for each patient group and subscale. Conclusions Our results indicate the cross-cultural adaptation of the WRFQ to Spanish was satisfactory and preserved its psychometric properties, except for the subscale of social demands, whose internal consistency should be interpreted with caution.

Keywords Work outcome measure · Work disability measurement · Questionnaires · Scales · Health survey · Cross-cultural comparison · Validation studies

J. M. Ramada · C. Serra (✉) · G. L. Delclos
Center for Research in Occupational Health (CiSAL), University Pompeu Fabra, PRBB Building, Dr. Aiguader, 88, 08003 Barcelona, Spain
e-mail: [email protected]

J. M. Ramada · C. Serra
Occupational Health Service, Parc de Salut MAR, Hospital del Mar, Passeig Marítim, 25-29, 08003 Barcelona, Spain

J. M. Ramada · C. Serra · G. L. Delclos
CIBER of Epidemiology and Public Health (CIBERESP), Barcelona, Spain

B. C. Amick III · G. L. Delclos
Southwest Center for Occupational and Environmental Health, School of Public Health, University of Texas, 6901 Bertner, Houston, TX 77030, USA

J. R. Castaño
Psychiatry Service, Parc de Salut MAR, Hospital del Mar, Passeig Marítim, 25-29, 08003 Barcelona, Spain

J. R. Castaño
Neuropsychiatry and Addictions Institute (INAD), Hospital del Mar, Passeig Marítim, 25-29, 08003 Barcelona, Spain

Introduction
Work disability is a health problem with high prevalence and economic costs in industrialized societies [1, 2]. In Europe, the proportion of workers with a long term health problem or disability varies between 5.8 % in Romania and 32.2 % in Finland [3]. Increased life expectancy and prolongation of the retirement age are increasing the overall age of the workforce. With an older workforce, more workers are working with health problems [4–6].
In occupational health, rehabilitation and/or accommodation programs that adapt work conditions to worker skills and health are increasingly used to support an active work life and better quality of life [6, 7]. The effectiveness of rehabilitation and work accommodation programs needs to be assessed using outcomes such as work status (active, temporary disability, permanent disability), time to return to work, duration of functional disability and costs of inability to work [7–9]. Although useful, these outcomes are limited, as they mainly assess whether workers are present at or absent from their jobs [10]. They do not offer information about the worker’s participation in the job or the degree to which he or she is able to respond to the job’s demands [10, 11]. To fully assess the effectiveness of interventions, outcome measures are required that describe the extent to which people increase their ability to meet the demands of the job.

In the 1990s a series of work-role specific functioning questionnaires were developed, among them the Work Limitations Questionnaire (WLQ), the Work Limitations-26 (WL-26) and the Work Role Functioning Questionnaire (WRFQ) [10, 12]. The WRFQ measures perceived disability in terms of limitations in performing the job due to health problems. Work limitation is defined as the level of difficulty the worker encounters in carrying out the demands of his/her job. Numerous studies have demonstrated the usefulness of these tools in English-speaking health care environments [13–15], but no versions have been adapted for Spanish-speaking health care environments. Because of possible cultural differences in the perception of work, health and disease, these instruments should be systematically translated, adapted and validated for use in other cultures. Since its creation and validation, the WRFQ has been adapted to Canadian French [16], Brazilian Portuguese [17] and Dutch [18].
The objectives of this study were to translate and adapt the WRFQ to Spanish spoken in Spain and evaluate its psychometric properties.
Methods
The WRFQ is a self-administered questionnaire containing 27 items grouped into 5 subscales: work scheduling demands, output demands, physical demands, mental demands and social demands. The first two columns of Table 1 show all items and subscales of the original English version. The recall period is 4 weeks and each subscale is measured by the percentage of time in a working day the employee has difficulty performing those demands. Responses are given on a five-point scale: 0 = all of the time (100 %), 1 = most of the time, 2 = half of the time (50 %), 3 = some of the time and 4 = none of the time (0 %), with an additional option, 5 = does not apply to my job. Option 5 enables employees to answer even though a particular demand is not part of their work. For each subscale, item scores were summed, divided by the number of items included in the subscale, and then multiplied by 25 to obtain percentages for each subscale, ranging from 0 % (difficulty all the time) to 100 % (no difficulty at any time). The same process was repeated for the global scale. The answers "does not apply to my job" were transformed to missing values. Scales containing subscales with more than 20 % missing values or "does not apply to my job" answers were excluded from the analysis [19].
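The scoring rule just described can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' scoring program: the function name `subscale_score` is hypothetical, the mean is taken over the answered items (one reading of "divided by the number of items"), and the 20 % missing-value rule is applied here at the level of a single subscale, which is a simplifying assumption.

```python
from typing import Optional, Sequence

NOT_APPLICABLE = 5  # response code for "does not apply to my job"
VALID_CODES = (0, 1, 2, 3, 4)


def subscale_score(
    responses: Sequence[Optional[int]],
    max_missing: float = 0.20,
) -> Optional[float]:
    """Return a 0-100 score for one subscale, or None if it must be excluded.

    `responses` holds one code per item; None marks an unanswered item.
    Assumption: "does not apply" and unanswered items both count as missing,
    and the score is the mean of the remaining items multiplied by 25.
    """
    if not responses:
        return None
    valid = [r for r in responses if r is not None and r != NOT_APPLICABLE]
    for r in valid:
        if r not in VALID_CODES:
            raise ValueError(f"unexpected response code: {r!r}")
    missing = len(responses) - len(valid)
    # Exclude the subscale when more than 20 % of its items are missing.
    if not valid or missing > max_missing * len(responses):
        return None
    return sum(valid) / len(valid) * 25


if __name__ == "__main__":
    # Hypothetical answers to a five-item subscale; one item is not applicable.
    print(subscale_score([4, 3, 2, 1, NOT_APPLICABLE]))
```

With this coding, a respondent who answers "none of the time" (4) to every item scores 100 %, and one who answers "all of the time" (0) to every item scores 0 %, matching the range given in the text.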
Translation and Cross-Cultural Adaptation of the WRFQ
Translation was carried out following a systematic and standardized procedure consisting of five steps: (1) direct translation, (2) synthesis of translations, (3) back-translation, (4) consolidation of translations by a committee of experts and (5) pre-test [20–24].

To complete the direct translation, three bilingual translators whose native language was Spanish spoken in Spain were selected. The first one was aware of the objectives and concepts of the WRFQ. The second one did not know them but had previous experience in technical translation of medical texts. The last translator had no previous knowledge of medicine or rehabilitation and did not know the study objectives. They worked independently and were provided with common instructions to ensure a uniform translation of the entire questionnaire. This was followed by a synthesis of translations, comparing versions and identifying discrepancies that were discussed to reach consensus between translators and researchers.

The back-translation into English was done by two bilingual translators whose native language was English spoken in the USA. They had no knowledge of medicine or rehabilitation and were unaware of the study objectives. They worked independently and were blind to the original version of the questionnaire to minimize information bias.

A multidisciplinary expert committee of bilingual professionals, consisting of an occupational health technician, an occupational physician, an occupational nurse, two linguists and a methodology expert, evaluated the process. Discrepancies between the two back-translations were identified, and, following methodological guidelines [20, 21], a consensus was reached on a pre-final version of the WRFQ adapted to Spanish spoken in Spain.

Finally, a pre-test study was carried out to assess the equivalence of the questionnaire, its understandability and applicability in the Spanish context. Possible mistakes were identified and it was verified that the instructions, items and answer choices were understandable.
J Occup Rehabil (2013) 23:566–575
In occupational health, rehabilitation and/or accommodation programs that adapt work conditions to workers' skills and health are increasingly used to support an active working life and a better quality of life [6, 7]. The effectiveness of these programs needs to be assessed using outcomes such as work status (active, temporary disability, permanent disability), time to return to work, duration of functional disability and costs of inability to work [7–9]. These outcomes are useful but limited, as they mainly assess whether workers are present at or absent from their jobs [10]. They offer no information about the worker's participation in the job or the degree to which he or she is able to respond to the job's demands [10, 11]. To fully assess the effectiveness of an intervention, outcome measures are required that describe the extent to which people increase their ability to meet the demands of the job. In the 1990s a series of work-role-specific functioning questionnaires were developed, among them the Work Limitations Questionnaire (WLQ), the Work Limitations-26 (WL-26) and the Work Role Functioning Questionnaire (WRFQ) [10, 12]. The WRFQ measures perceived disability in terms of limitations in performing the job due to health problems. Work limitation is defined as the level of difficulty the worker encounters in carrying out the demands of his/her job. Numerous studies have demonstrated the usefulness of these tools in English-speaking health care environments [13–15], but no versions have been adapted for Spanish-speaking health care environments. Because of possible cultural differences in the perception of work, health and disease, these instruments should be systematically translated, adapted and validated for use in other cultures. Since its creation and validation, the WRFQ has been adapted to Canadian French [16], Brazilian Portuguese [17] and Dutch [18].
The objectives of this study were to translate and adapt the WRFQ to Spanish spoken in Spain and evaluate its psychometric properties.
Methods
The WRFQ is a self-administered questionnaire containing 27 items grouped into 5 subscales: work scheduling demands, output demands, physical demands, mental demands and social demands. The first two columns of Table 1 show all items and subscales of the original English version. The recall period is 4 weeks, and each subscale measures the percentage of time in a working day during which the employee has difficulty meeting those demands. Items are answered on a five-point scale, 0 = all of the time (100 %), 1 = most of the time, 2 = half of the time (50 %), 3 = some of the time and 4 = none of the time (0 %), with an additional option 5 = does not apply to my job. Option 5 enables employees to answer even though a particular demand is not part of their work. For each subscale, item scores were summed, divided by the number of items included in the subscale, and then multiplied by 25 to obtain a percentage ranging from 0 % (difficulty all the time) to 100 % (no difficulty at any time). The same process was repeated for the global scale. Answers of “does not apply to my job” were treated as missing values. Subscales with more than 20 % of items missing or answered “does not apply to my job” were excluded from the analysis [19].
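The scoring rule described above can be sketched in Python. This is a minimal illustration rather than the authors' code: the function name `subscale_score`, the use of `None` for unanswered items, and the choice to divide by the number of answered items when some are missing are assumptions, since the text only states that the sum is divided by the number of items in the subscale.

```python
from typing import Optional, Sequence

NOT_APPLICABLE = 5  # response option "does not apply to my job"


def subscale_score(responses: Sequence[Optional[int]],
                   max_missing_fraction: float = 0.20) -> Optional[float]:
    """Return a 0-100 score for one subscale, or None if it is excluded."""
    # "Does not apply" (5) and unanswered items (None) both count as missing.
    valid = [r for r in responses if r is not None and r != NOT_APPLICABLE]
    n_missing = len(responses) - len(valid)
    # Exclude the subscale when more than 20 % of its items are missing.
    if not valid or n_missing / len(responses) > max_missing_fraction:
        return None
    if any(r not in (0, 1, 2, 3, 4) for r in valid):
        raise ValueError("item scores must be integers from 0 to 4")
    # Assumption: the divisor is the number of answered items.
    return sum(valid) / len(valid) * 25


# Hypothetical five-item subscale with one "does not apply" answer.
print(subscale_score([3, 4, 2, 5, 4]))  # 81.25
```

With this convention a respondent answering 4 (none of the time) to every item scores 100, and one answering 0 (all of the time) to every item scores 0, which matches the range given in the text.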
Translation and Cross-Cultural Adaptation of the WRFQ
Translation was carried out following a systematic and standardized procedure consisting of five steps: (1) direct translation, (2) synthesis of translations, (3) back-translation, (4) consolidation of translations by a committee of experts and (5) pre-test [20–24]. To complete the direct translation, three bilingual translators whose native language was Spanish spoken in Spain were selected. The first one was aware of the objectives and concepts of the WRFQ. The second one did not know them but had previous experience in technical translation of medical texts. The last translator had no previous knowledge of medicine or rehabilitation and did not know the study objectives. They worked independently and were provided with common instructions to ensure a uniform translation of the entire questionnaire. This was followed by a synthesis of translations, comparing versions and identifying discrepancies that were discussed to reach consensus between translators and researchers. The back-translation into English was done by two bilingual translators whose native language was English spoken in the USA. They had no knowledge of medicine or rehabilitation and were unaware of the study objectives. They worked independently and were blind to the original version of the questionnaire to minimize information bias. A multidisciplinary expert committee of bilingual professionals, consisting of an occupational health technician, an occupational physician, an occupational nurse, two linguists and a methodology expert, evaluated the process. Discrepancies between the two back-translations were identified, and, following methodological guidelines [20, 21], a consensus was reached on a pre-final version of the WRFQ adapted to Spanish spoken in Spain. Finally, a pre-test study was carried out to assess the equivalence of the questionnaire, its understandability and applicability in the Spanish context.
Possible mistakes were identified and it was verified that the instructions, items and answer choices were understandable.
Table 1 Responses for item-level of the Spanish version of the Work Role Functioning Questionnaire (WRFQ)

| Items (original version) | Sub-scale | Responses n (%): 0 (100 %) | Responses n (%): 1 | Responses n (%): 2 (50 %) | Responses n (%): 3 | Responses n (%): 4 (0 %) | n missing/‘does not apply to my job’ | Mean scale 0–4 | Correlations item-subscale | Correlations item-total |
|---|---|---|---|---|---|---|---|---|---|---|
| 1. Work the required number of hours | WSD | 3 (7.5) | 9 (22.5) | 1 (2.5) | 19 (47.5) | 8 (20.0) | 0/0 | 2.5 | 0.90 | 0.88 |
| 2. Get going easily at the beginning of the work day^a | WSD | 4 (10.0) | 4 (10.0) | 4 (10.0) | 12 (30.0) | 16 (40.0) | 0/0 | 2.8 | 0.88 | 0.83 |
| 3. Start on your job as soon as you arrive at work^a | WSD | 5 (12.5) | 4 (10.0) | 3 (7.5) | 10 (25.0) | 17 (42.5) | 0/1 | 2.7 | 0.89 | 0.87 |
| 4. Do your work without stopping to take extra breaks or rests^a | WSD | 5 (12.5) | 8 (20.0) | 2 (5.0) | 13 (32.5) | 2 (5.0) | 6/4 | 2.3 | 0.65 | 0.62 |
| 5. Stick to a routine or schedule^a | WSD | 4 (10.0) | 5 (12.5) | 1 (2.5) | 6 (15.0) | 23 (57.5) | 0/1 | 2.9 | 0.76 | 0.69 |
| 6. Handle the work load^a | OD | 3 (7.5) | 9 (22.5) | 5 (12.5) | 13 (32.5) | 10 (25.0) | 0/0 | 2.5 | 0.80 | 0.82 |
| 7. Work fast enough | OD | 10 (25.0) | 5 (12.5) | 5 (12.5) | 14 (35.0) | 5 (12.5) | 0/1 | 1.9 | 0.84 | 0.88 |
| 8. Finish work on time | OD | 3 (7.5) | 9 (22.5) | 5 (12.5) | 9 (22.5) | 13 (32.5) | 0/1 | 2.5 | 0.90 | 0.81 |
| 9. Do your work without making mistakes | OD | 1 (2.5) | 5 (12.5) | 3 (7.5) | 17 (42.5) | 13 (32.5) | 0/1 | 2.9 | 0.66 | 0.65 |
| 10. Satisfy the people who judge your work^a | OD | 1 (2.5) | 4 (10.0) | 4 (10.0) | 10 (25.0) | 14 (35.0) | 2/5 | 2.5 | 0.78 | 0.77 |
| 11. Feel a sense of accomplishment in your work^a | OD | 2 (5.0) | 5 (12.5) | 8 (20.0) | 10 (25.0) | 15 (37.5) | 0/0 | 2.8 | 0.79 | 0.67 |
| 12. Feel you have done what you are capable of doing | OD | 5 (12.5) | 4 (10.0) | 6 (15.0) | 11 (27.5) | 14 (35.0) | 0/0 | 2.6 | 0.73 | 0.62 |
| 13. Walk or move around different work locations (for example, going to meetings)^a,b | FD | 1 (2.5) | 9 (22.5) | 2 (5.0) | 7 (17.5) | 16 (40.0) | 0/5 | 2.5 | 0.82 | 0.74 |
| 14. Lift, carry, or move objects at work weighing more than 10 pounds | FD | 10 (25.0) | 5 (12.5) | 1 (2.5) | 6 (15.0) | 9 (22.5) | 2/7 | 1.5 | 0.89 | 0.71 |
| 15. Sit, stand, or stay in one position for longer than 15 min while working | FD | 5 (12.5) | 6 (15.0) | 5 (12.5) | 12 (30.0) | 11 (27.5) | 0/1 | 2.4 | 0.93 | 0.83 |
| 16. Repeat the same motions over and over again while working | FD | 8 (20.0) | 6 (15.0) | 6 (15.0) | 7 (17.5) | 9 (22.5) | 0/4 | 1.9 | 0.95 | 0.82 |
| 17. Bend, twist, or reach while working^a | FD | 8 (20.0) | 5 (12.5) | 7 (17.5) | 7 (17.5) | 12 (30.0) | 0/1 | 2.2 | 0.93 | 0.82 |
| 18. Use hand-held tools or equipment (for example, a phone, pen, keyboard, computer mouse, drill, hairdryer or sander)^b | FD | 6 (15.0) | 5 (12.5) | 3 (7.5) | 8 (20.0) | 16 (40.0) | 0/2 | 2.5 | 0.81 | 0.74 |
| 19. Keep your mind on your work | MD | 2 (5.0) | 5 (12.5) | 5 (12.5) | 15 (37.5) | 13 (32.5) | 0/0 | 2.8 | 0.92 | 0.79 |
| 20. Think clearly when working | MD | 1 (2.5) | 4 (10.0) | 7 (17.5) | 11 (27.5) | 17 (42.5) | 0/0 | 3.0 | 0.93 | 0.70 |
| 21. Do work carefully | MD | 2 (5.0) | 5 (12.5) | 4 (10.0) | 11 (27.5) | 18 (45.0) | 0/0 | 3.0 | 0.89 | 0.87 |
| 22. Concentrate on your work | MD | 1 (2.5) | 4 (10.0) | 7 (17.5) | 13 (32.5) | 15 (37.5) | 0/0 | 2.9 | 0.96 | 0.69 |
| 23. Work without losing your train of thought^a | MD | 4 (10.0) | 2 (5.0) | 3 (7.5) | 18 (45.0) | 12 (30.0) | 0/1 | 2.8 | 0.89 | 0.46 |
Table 1 continued

| Items (original version) | Sub-scale | Responses n (%): 0 (100 %) | Responses n (%): 1 | Responses n (%): 2 (50 %) | Responses n (%): 3 | Responses n (%): 4 (0 %) | n missing/‘does not apply to my job’ | Mean scale 0–4 | Correlations item-subscale | Correlations item-total |
|---|---|---|---|---|---|---|---|---|---|---|
| 24. Easily read or use your eyes when working | MD | 2 (5.0) | 4 (10.0) | 0 (0.0) | 9 (22.5) | 24 (60.0) | 0/1 | 3.2 | 0.87 | |
| 25. Speak with people in person, in meetings or on the phone | SD | 1 (2.5) | 4 (10.0) | 2 (5.0) | 8 (20.0) | 22 (55.0) | 0/3 | 3.0 | 0.84 | |
| 26. Control your temper around people when working^a | SD | 1 (2.5) | 3 (7.5) | 6 (15.0) | 14 (35.0) | 16 (40.0) | 0/0 | 3.0 | 0.78 | |
| 27. Help other people to get work done | SD | 2 (5.0) | 6 (15.0) | 2 (5.0) | 9 (22.5) | 19 (47.5) | 0/2 | 2.8 | | |

For items 24–27 the source additionally reports the correlation values 0.57, 0.61, 0.70, 0.70 and 0.75 (item-subscale for item 27 and item-total for items 24–27); their assignment to individual cells is not recoverable from the source layout, so those cells are left empty.
Assessment of internal consistency (Cronbach’s alpha) for each item-subscale and item-total scale (n = 40). April–May 2012
WSD work scheduling demands; OD output demands; FD physical demands; MD mental demands; SD social demands
^a Items with several alternatives or with difficulties in the translation process
^b Items modified after pre-test

Sample
Forty volunteer patients of both sexes, with a physical (musculoskeletal) and/or a mental (anxiety-depression) health problem of at least 1 month's duration, were recruited among outpatients at the orthopedics, rehabilitation and psychiatry clinics of a large public hospital in Barcelona. Patients were between 18 and 65 years old and had different cultural levels. All spoke Spanish as their first language, were able to read and understand what they were reading, and had worked at least 10 h per week in the last 4 weeks.

Materials
Participants were asked to fill out the Spanish version of the WRFQ on paper and to underline or mark any difficulty on the questionnaire. In addition, they described questions that were difficult to understand during a recorded 15 min structured interview.

Procedure
During the interview each participant was systematically asked about the understandability of the instructions, of each response option and of the 27 items. All comments related to difficulties with any of these were recorded and later reviewed by the expert committee. Possible mistakes were identified and it was verified that the instructions, items and answer choices were understandable. Revisions were made to a specific questionnaire item when 15 % or more of participants described difficulties with that item [19].

Evaluation of the Pre-Final Questionnaire Psychometric Properties
The internal consistency of the total scale and each subscale was evaluated using Cronbach’s alpha, with values ≥0.70 considered appropriate [25, 26]. Correlations between the subscales, subscale-total, item-subscale and item-total were evaluated, with values ≥0.46 considered appropriate [27].

The repeatability or stability of the instrument was assessed through test–retest reliability. The WRFQ was administered to the same group of 40 workers at two different time points, test and retest. The retest was conducted after a period ranging from 7 to 15 days. This period was considered sufficient to avoid recall of earlier responses and to prevent variations in the observed phenomenon that could affect repeatability. The intraclass correlation coefficient (ICC) was calculated to assess the test–retest reliability. The stability or repeatability of a subscale or total scale was considered good when the ICC was above 0.70 and very good when it was above 0.90 [26–28].

Face validity is the extent to which a questionnaire, in the opinion of experts and users, is a logical measure of what it intends to measure. It is usually evaluated empirically through comments from participating experts and users. In our study, it was assessed by the expert committee, which analyzed the comments made by participants during the structured interviews.

Content validity reflects whether the tool is able to measure most of the construct dimensions. It was also evaluated using an empirical approach, based on judgments from the tool’s original authors (BA), arguments made by the expert committee and a qualitative analysis of the comments made by the participants during the pre-test. We also explored floor and ceiling effects, which occur when a percentage of responses to certain questions cluster at the bottom or the top of the scale. Their presence indicates that the question lacks discriminative ability and that the questionnaire cannot differentiate between high and low scores. Content validity is considered good when floor and ceiling effects do not exceed 15 % [28]. Averages, ranges and medians of the scores were determined to further describe the distribution of the responses.

Finally, construct validity was assessed using known-groups validity analysis, comparing the subscale results between patients with physical and with mental illnesses. It was hypothesized that patients with only mental illness would score lower (meaning more disability) on the mental and social demands subscales, and that patients with only physical illness would score lower on the work scheduling, output and physical demands subscales. Patients with both types of illness (n = 6) were excluded from this comparative analysis. Since the subscale scores did not follow a normal distribution in either group, the hypothesis was evaluated by comparing the medians of each subscale between the two groups. Statistical significance was assessed using the non-parametric Mann–Whitney U test.
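The known-groups comparison described above relies on the Mann–Whitney U test. The sketch below shows how the U statistic and an approximate two-sided p value can be obtained with the Python standard library. It is an illustration under stated assumptions, not the analysis code used in the study: the function name `mann_whitney_u` is hypothetical, the p value uses a tie-corrected normal approximation without continuity correction (statistical packages may instead report exact p values for small samples), and the example scores are invented.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U, two-sided p) using average ranks and a normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one score")
    # Pool the scores, remembering the group of origin (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                      # size of this group of tied values
        avg_rank = (i + 1 + j) / 2     # average of ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u_x, n1 * n2 - u_x)
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:                  # all scores identical
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical physical-demands subscale scores for two patient groups.
physical_only = [45.8, 62.5, 70.8, 33.3, 54.2]
mental_only = [79.2, 91.7, 66.7, 100.0, 87.5]
print(mann_whitney_u(physical_only, mental_only))
```

A rank-based test is used here because, as the text notes, the subscale scores were not normally distributed.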
The protocol of this study was approved by the Ethics Committee of Parc de Salut Mar and it respects all the principles of the Declaration of Helsinki and the Spanish legal regulations on protection of personal data.
Results
The direct translation was carried out without difficulty. However, several challenges arose from the idiomatic usage of words in items 2 (get going easily), 11 (sense of accomplishment), 23 (train of thought) and 26 (control your temper), which were discussed and agreed with the translators. In addition, items 3–6 (start on your job, extra breaks or rests, stick to a routine, workload), 10 (people who judge), 13 (move around different locations) and 17 (bend) had several translation alternatives and required consideration by the committee of experts to reach a consensus ensuring semantic and idiomatic equivalence of both versions. In item 14 the units of measure were converted from pounds to kilograms.

When the back-translation was compared with the original version, some discrepancies were found in the language equivalence of certain words contained in the instructions and various items. Items 2 (get going easily), 5 (stick to a routine), 11 (sense of accomplishment), 16 (repeat the same motions), 17 (bend, twist or reach while working), 23 (train of thought), 25 (speak with people in person), 26 (control your temper) and 27 (to get work done) had several translation alternatives and required reconsideration by the committee of experts (Table 1). Lastly, a pre-final questionnaire was consolidated in Spanish spoken in Spain, which guaranteed the semantic, idiomatic, conceptual and experiential equivalence with the original questionnaire; consensus was reached to partially reformulate the last paragraph of the instructions and the wording of items 2, 11, 23, 25, 26 and 27. It was not necessary to modify the rest of the instructions, the response options or the other items.

The pre-final questionnaire was administered to 40 patients. Table 2 describes their socio-demographic characteristics. Comments were analyzed by the committee of experts. Most participants found no difficulty understanding the items. Nine participants (22.5 %) reported that the last paragraph of the instructions was ambiguous, so it was amended to emphasize that the questions related to “working time”.

Table 2 Participants’ socio-demographic characteristics

| | Total n = 40 | Men n = 15 (37.5 %) | Women n = 25 (62.5 %) |
|---|---|---|---|
| Age in years, mean (SD) | 49.1 (10.0) | 47.9 (8.9) | 49.8 (10.7) |
| Education level, n (%) | | | |
| Low | 13 (32.5) | 7 (46.7) | 6 (24.0) |
| Middle | 15 (37.5) | 6 (40.0) | 9 (36.0) |
| High | 12 (30.0) | 2 (13.3) | 10 (40.0) |
| Job type, n (%) | | | |
| Manual | 17 (42.5) | 6 (40.0) | 11 (44.0) |
| Nonmanual | 11 (27.5) | 5 (33.3) | 6 (24.0) |
| Mixed | 12 (30.0) | 4 (26.7) | 8 (32.0) |
| Working hours/week, mean (SD) | 40.2 (10.7) | 46.1 (9.6) | 36.7 (9.8) |
| Disease type, n (%) | | | |
| Physical | 17 (42.5) | 6 (40.0) | 11 (44.0) |
| Mental | 17 (42.5) | 8 (53.3) | 9 (36.0) |
| Both | 6 (15.0) | 1 (6.7) | 5 (20.0) |
| Disease duration in months, mean (SD) | 34.7 (51.1) | 23.1 (22.4) | 41.6 (61.8) |

Pre-test with the adapted version of the Work Role Functioning Questionnaire (WRFQ) to Spanish spoken in Spain (n = 40). April–May, 2012
Table 3 Pre-test results with the Spanish version of the Work Role Functioning Questionnaire (WRFQ) (n = 40). April–May, 2012

| Subscale | Valid n (missing/not applicable)* | Mean^a (SD) | Range | Median | n at floor (0 %), n (%) | n at ceiling (100 %), n (%) | Cronbach’s alpha | Subscale-total correlations |
|---|---|---|---|---|---|---|---|---|
| Work scheduling demands | 39 (1) | 67.7 (27.8) | 5–100 | 75.0 | 0 (0.0) | 3 (7.5) | 0.88 | 0.95 |
| Output demands | 39 (1) | 64.4 (25.8) | 14.3–100 | 67.9 | 0 (0.0) | 1 (2.5) | 0.90 | 0.94 |
| Physical demands | 36 (4) | 59.0 (32.3) | 4.17–100 | 62.5 | 0 (0.0) | 5 (12.5) | 0.95 | 0.88 |
| Mental demands | 40 (0) | 73.9 (26.1) | 0–100 | 79.2 | 1 (2.5) | 9 (22.5) | 0.96 | 0.81 |
| Social demands | 35 (5) | 76.9 (21.1) | 25–100 | 83.3 | 0 (0.0) | 5 (12.5) | 0.56 | 0.83 |
| Total score | 40 (0) | 67.6 (22.7) | 21.3–98.1 | 74.5 | 0 (0.0) | 0 (0.0) | 0.97 | – |

* Subscales with more than 20 % of items scoring “does not apply to my job” or missing values were excluded
^a Each subscale is scored from 0 to 100. Higher scores indicate better work functioning: difficulties all of the time 0/100; difficulties none of the time 100/100
Eight participants (20 %) found the expression “difficult”, located at the top of the column containing the items, hard to interpret. After weighing various alternatives, a decision was made to incorporate this expression into each of the possible answers as follows: 0 = was difficult all the time (100 %), 1 = was difficult most of the time, 2 = was difficult half the time (50 %), 3 = was difficult part of the time, 4 = never was difficult (0 %). No participant expressed difficulty with the response option “does not apply to my job”. Ten participants (25 %) had difficulties with item 13 and eight participants (20 %) with item 18. All answered “does not apply to my job” since the examples did not fit their job. The committee of experts decided to delete the examples from these items.

Table 3 shows the average scores for each subscale; higher values indicate less disability at work. The social demands subscale scored the highest (76.9, SD = 21.1) and the physical demands subscale the lowest (59.0, SD = 32.3). The items that most frequently obtained the answer “does not apply to my job” were item 14 (lift, carry, or move objects at work weighing more than 10 pounds), item 13 (walk or move around different work locations, for example, going to meetings) and item 10 (satisfy the people who judge your work). After reviewing the comments made by participants during the pre-test and resolving them by consensus, the committee of experts drafted the final version of the WRFQ translated and adapted to Spanish spoken in Spain (Appendix 1).

Regarding internal consistency, Cronbach’s alpha was 0.97 for the total scale. All subscales obtained Cronbach’s alpha coefficients above 0.85, except for social demands, for which it was 0.56. Correlations between the subscales, subscale-total, item-subscale and item-total were all ≥0.46 and considered appropriate [27].
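For readers who want to reproduce this kind of internal-consistency figure on their own data, Cronbach's alpha can be computed from an item-response matrix as sketched below. This is an illustrative implementation of the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), and not the software used in the study; the function name `cronbach_alpha` is hypothetical and the sketch assumes complete data (no missing or “does not apply” answers).

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for complete data: one row per respondent, one column per item."""
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need at least two items and equal-length rows")
    # Variance of each item across respondents.
    item_variances = [pvariance([r[j] for r in rows]) for j in range(k)]
    # Variance of each respondent's summed score.
    total_variance = pvariance([sum(r) for r in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers (0-4) of four respondents to a three-item subscale.
example = [[4, 3, 4], [2, 2, 3], [1, 0, 1], [3, 3, 2]]
print(round(cronbach_alpha(example), 2))
```

Because alpha depends on the number of items as well as on their intercorrelations, a short subscale such as social demands (three items) can show a low alpha even when its items correlate reasonably with the subscale score.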
Ceiling effects were lowest for output demands (2.5 %) and highest for mental demands (22.5 %); only the latter exceeded the 15 % criterion [28] (Table 3).
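The ceiling and floor percentages can be sketched as the share of respondents who obtain the highest or lowest possible subscale score, compared against the 15 % criterion. The data below are hypothetical and were chosen only so that 9 of 40 respondents sit at the maximum; `floor_ceiling_percent` is an illustrative name, not a function used in the study.

```python
from typing import Sequence, Tuple


def floor_ceiling_percent(
    scores: Sequence[float], minimum: float = 0.0, maximum: float = 100.0
) -> Tuple[float, float]:
    """Return (floor %, ceiling %): share of respondents at the scale limits."""
    n = len(scores)
    if n == 0:
        raise ValueError("no scores supplied")
    floor_pct = 100 * sum(1 for s in scores if s == minimum) / n
    ceiling_pct = 100 * sum(1 for s in scores if s == maximum) / n
    return floor_pct, ceiling_pct


# Hypothetical subscale scores: 40 respondents, 9 at the maximum of 100.
scores = [100.0] * 9 + [75.0] * 31
floor_pct, ceiling_pct = floor_ceiling_percent(scores)
print(floor_pct, ceiling_pct, ceiling_pct > 15)  # 15 % criterion
```

With a sample of 40, each respondent contributes 2.5 percentage points, so the criterion is exceeded from the seventh respondent at a scale limit onwards.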
Table 4 Test–retest reliability: intraclass correlation coefficients (ICC). Pre-test of the Spanish version of the Work Role Functioning Questionnaire (WRFQ), April–May 2012
Subscale                  Test–retest ICC   95 % CI*
Work scheduling demands   0.92              (0.85–0.96)
Output demands            0.89              (0.78–0.94)
Physical demands          0.93              (0.84–0.97)
Mental demands            0.85              (0.72–0.92)
Social demands            0.77              (0.58–0.88)
Total scale               0.94              (0.83–0.98)
* 95 % CI: 95 % confidence interval
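As an illustration of how a test–retest ICC of this kind can be obtained, the sketch below computes a two-way random-effects, absolute-agreement, single-measurement coefficient, often written ICC(2,1), from a subjects-by-occasions matrix. The ICC model used for Table 4 is not specified in this passage, so the choice of ICC(2,1) is an assumption, as are the function name `icc_2_1` and the toy data.

```python
from typing import Sequence


def icc_2_1(data: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    data holds one row per subject and one column per occasion
    (for test-retest: column 0 = test, column 1 = retest).
    """
    n, k = len(data), len(data[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two subjects and two occasions")
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    # Mean squares from the two-way ANOVA decomposition
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_error = sum(
        (data[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical test and retest scores for five respondents.
test_retest = [[85.0, 80.0], [60.0, 65.0], [95.0, 95.0], [40.0, 50.0], [70.0, 70.0]]
print(round(icc_2_1(test_retest), 3))
```

The absolute-agreement form penalises a systematic shift between the two administrations, which a consistency-type ICC would ignore. This is one reason the model should be stated alongside the coefficient.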
Table 4 shows the results of the test–retest reliability; ICCs for the subscales ranged between 0.77 and 0.93, and the ICC for the total scale was 0.94. The expert committee judged the face validity of the questionnaire to be adequate, and the participants appreciated its applicability, usability and understandability. These aspects were collected from the comments made during the interviews, which led to the conclusion that the questionnaire measures work disability in a logical way. Content validity was considered adequate according to the criteria and judgment of the authors of the original version of the WRFQ [16–18], the arguments made by the committee of experts during the process of cross-cultural adaptation and the qualitative analysis of participant comments.
Construct validity was likewise reasonable. The median score for the physical demands subscale was significantly lower (30 points) in participants with a physical (musculoskeletal) health problem (p = 0.007), and the median score for the mental demands subscale was significantly lower (21 points) in patients with a mental (anxiety-depression) health problem (p = 0.018) (Table 5). Differences in the remaining subscales were not statistically significant.
J Occup Rehabil (2013) 23:566–575
Table 5 Subscale description by type of health problem (mental or physical). Pre-test with the adapted version of the Work Role Functioning Questionnaire (WRFQ) to Spanish spoken in Spain (n = 40), April–May 2012
Subscale                  Median(a), mental health problem   Median(a), physical health problem   Mann–Whitney U test, asymptotic significance (two-sided)
Work scheduling demands   85.0                               65.0                                 0.478
Output demands            78.6                               82.1                                 0.850
Physical demands          85.0                               55.0                                 0.007
Mental demands            75.0                               95.8                                 0.018
Social demands            83.3                               87.5                                 0.917
(a) Each subscale is scored from 0 to 100. Higher scores indicate better work functioning: difficulties all of the time, 0/100; difficulties none of the time, 100/100.
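The group comparison behind Table 5 can be sketched as a Mann–Whitney U test with a two-sided asymptotic (normal approximation) p-value. The version below corrects the variance for ties and applies no continuity correction. Whether the statistical package used in the study made the same choices is not stated, so both are assumptions, and the scores are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U for sample x, two-sided asymptotic p-value).

    Normal approximation with tie-corrected variance and no continuity
    correction (an assumption of this sketch).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    counts = Counter(list(x) + list(y))
    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    next_rank = 1
    for value in sorted(counts):
        ties = counts[value]
        rank_of[value] = next_rank + (ties - 1) / 2
        next_rank += ties
    rank_sum_x = sum(rank_of[v] for v in x)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u == 0:
        raise ValueError("all observations are identical")
    z = (u_x - n1 * n2 / 2) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided


# Hypothetical physical demands scores for two small groups of patients.
mental_group = [85.0, 90.0, 80.0, 95.0, 75.0]
physical_group = [55.0, 60.0, 50.0, 70.0, 45.0]
print(mann_whitney_u(mental_group, physical_group))
```

A rank-based test suits these data because subscale scores are bounded between 0 and 100 and, with ceiling effects present, are unlikely to be normally distributed.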
Discussion
This rigorous, stepwise procedure for translation and cross-cultural adaptation of the WRFQ led to the development of a version in Spanish as spoken in Spain that is equivalent to the original English version. Minor changes were made to maximize questionnaire understandability. It was necessary to adjust the wording of the instructions, as happened when the questionnaire was adapted into Canadian French [16], Brazilian Portuguese [17] and Dutch [18]. During the adaptation to Portuguese, a decision was made to incorporate the term “difficult” within each item. In the adaptation to Spanish, it was instead incorporated in each of the response options to facilitate understandability. Several items needed to be changed after the pre-test. The difficulties with items 2, 6 and 26 are similar to those encountered by Durand et al. [16], Gallasch et al. [17] and Abma et al. [18]. As in those studies, examples were removed from items 13 and 18 because their interpretation could be misleading.
The absence of ceiling and floor effects above 15 % (with the exception of 22.5 % for the ceiling effect of the mental demands subscale) indicates that the questionnaire items have acceptable discriminative ability to distinguish high and low scores, providing evidence of questionnaire content validity [28]. The highest frequency of the response option “does not apply to my job” was obtained for the items in the physical demands subscale, as in other cultural adaptations of the WRFQ [16–18]. A likely cause is that these items describe movements specific to manual work and do not apply to non-manual work, which accounted for 28 % of the sample. The highest ceiling effect, observed for mental demands in our study, is consistent with the results of Durand et al. [16], probably because musculoskeletal health problems have less impact on the ability of workers to handle the mental demands of work.
The internal consistency of the Spanish version of the WRFQ was very good for all subscales except for social demands. This result is consistent with those obtained by Durand et al. [16] and Gallasch et al. [17]. All items, except item 4, had higher correlations with their own subscale than with the total scale, confirming that the translation and cross-cultural adaptation did not alter the internal consistency of the questionnaire. However, we observed some variability in subject responses to the items of the social demands subscale (Cronbach’s alpha of 0.56) and thus, in agreement with Durand et al. [16], we believe that the internal consistency of this subscale should be interpreted with caution.
The results of the test–retest reliability are very similar to those obtained by Gallasch et al. [17]. The stability or repeatability of the questionnaire can be considered good for the output, mental and social demands subscales and very good for the physical and work scheduling demands subscales [26–28].
The results show adequate construct validity of the WRFQ. On the one hand, the median scores obtained by participants, all of whom were patients with active health problems, ranged between 62.5 and 83.3 % across subscales, indicating important difficulties in carrying out the demands of their jobs, which is not surprising. On the other hand, as expected, the comparison of scores between the two groups of patients indicates lower scores on the scheduling and physical demands subscales for those with only physical health problems and, conversely, lower scores on the mental and social demands subscales for patients with only a mental health problem.
One limitation of this study could be the sample size in the pre-test; however, it is consistent with the previous literature. In conclusion, our results confirm that the process used for translation and cross-cultural adaptation of the WRFQ to Spanish as spoken in Spain was carried out successfully and indicate that its psychometric properties were well preserved.
Acknowledgments We want to thank Concepción Gómez-Morán, Carlos Enric Delclós, Mª José Román and Cliff Grossman for their professional involvement in the direct and back translation of the WRFQ. Thanks to Julià del Prado, Josefina Pi-Sunyer and Rocío Villar for their kind participation with the translators in the Expert Committee, and to Nuria González, Chelo Sancho and Carmen Sánchez for their collaboration in the distribution and collection of questionnaires, all of them staff of the Occupational Health Service, Parc de Salut MAR (OHS PSMAR), Barcelona. Thanks also to Joan Mirabent (OHS PSMAR), Marta Tejero and Gemma Pidemont for their patient and generous collaboration in the recruitment of patients. Finally, thanks so much to Ram Dulthummon, José Ramada, Borja Ramada and Ángeles Ramada for their generous collaboration in creating and assessing the quality of the database. This project has been partially supported by a grant from the Fondo de Investigaciones Sanitarias (FIS: PI12/02556), Instituto de Salud Carlos III, Subdirección General de Evaluación y Fomento de la Investigación, Ministerio de Ciencia e Innovación.
Conflict of interest The authors declare that they have no conflict of interest.
Appendix 1: Work Role Functioning Questionnaire adapted to Spanish spoken in Spain
References
1. Brault MW, Hootman J, Helmick CG, Theis KA, Armour BS. Prevalence and most common causes of disability among adults - United States, 2005. MMWR. 2009;58:421–6.
2. Rice DP, LaPlante MP. Medical expenditures for disability and disabling comorbidity. Am J Public Health. 1992;82:739–41.
3. Dupré D, Karjalainen A (2003) Eurostat, statistics in focus: Employment of disabled people in Europe in 2002, Eurostat theme 3: population and social conditions. Available from: http://epp.eurostat.ec.europa.eu/cache/ITY_OFFPUB/KS-NK-03-026/EN/KS-NK-03-026-EN.PDF.
4. Ross D. Ageing and work: an overview. Occup Med (Lond). 2010;60:169–71.
5. Hairault JO, Langot F, Sopraseuth T. Distance to retirement and older workers employment: the case for delaying the retirement age. J Eur Economic Assoc. 2010;8:1034–76.
6. Macdonald EB, Sanati KA. Occupational health services now and in the future: the need for a paradigm shift. J Occup Environ Med. 2010;52:1273–7.
7. Sampere M, Gimeno D, Serra C, Plana M, Martínez JM, Delclos GL, Benavides FG. Organizational return to work support and sick leave duration: a cohort of Spanish workers with a long-term non-work-related sick leave episode. J Occup Environ Med. 2011;53:674–9.
8. Squires H, Rick J, Carroll C, Hillage J. Cost-effectiveness of interventions to return employees to work following long-term sickness absence due to musculoskeletal disorders. J Public Health (Oxf). 2012;34:115–24.
9. Noben CY, Nijhuis FJ, de Rijk AE, Evers SM. Design of a trial-based economic evaluation on the cost-effectiveness of employability interventions among work disabled employees or employees at risk of work disability: the CASE-study. BMC Public Health. 2012;18(12):43.
10. Amick BC III, Lerner D, Rogers WH, Rooney T, Katz JN. A review of health-related work outcome measures and their uses and recommended measures. Spine. 2000;25:3152–60.
11. Baldwin ML, Johnson WG, Butler RJ. The error of using returns-to-work to measure the outcomes of health care. Am J Ind Med. 1996;29:632–41.
12. Lerner D, Amick BC 3rd, Rogers WH, Malspeis S, Bungay K, Cynn D. The Work Limitations Questionnaire. Med Care. 2001;39:72–85.
13. Lerner DJ, Amick BC 3rd, Malspeis S, Rogers WH. A national survey of health-related work limitations among employed persons in the United States. Disabil Rehabil. 2000;22:225–32.
14. Schmidt LL, Amick BC 3rd, Katz JN, Ellis BB. Evaluation of an upper extremity student-role functioning scale using item response theory. Work. 2002;19:105–16.
15. Roy JS, MacDermid JC, Amick BC 3rd, Shannon HS, McMurtry R, Roth JH, et al. Validity and responsiveness of presenteeism scales in chronic work-related upper-extremity disorders. Phys Ther. 2011;91:254–66.
16. Durand MJ, Vachon B, Hong QN, Imbeau D, Amick BC III, Loisel P. The cross-cultural adaptation of the work role functioning questionnaire in Canadian French. Int J Rehabil. 2004;27:261–8.
17. Gallasch CH, Alexandre NM, Amick B 3rd. Cross-cultural adaptation, reliability, and validity of the work role functioning questionnaire to Brazilian Portuguese. J Occup Rehabil. 2007;17:701–11.
18. Abma FI, Amick BC III, Brouwer S, van der Klink JJ, Bültmann U. The cross-cultural adaptation of the work role functioning questionnaire to Dutch. Work. 2012;43:203–10.
19. Amick BC III, Habeck RV, Ossmann J, Fossel AH, Keller R, Katz JN. Predictors of successful work role functioning after carpal tunnel release surgery. J Occup Environ Med. 2004;46:490–500.
20. Beaton DE, Bombardier C, Guillemin F, Bosi Ferraz M. Guidelines for the process of cross-cultural adaptation of self-report measures. Spine. 2000;25:3186–91.
21. Guillemin F. Cross-cultural adaptation and validation of health status measures. Scand J Rheumatol. 1995;24:61–3.
22. Hutchinson A, Bentzen N, Konig-Zanhn C. Cross cultural health outcome assessment: a user’s guide. The Netherlands: ERGHO; 1996.
23. Alexandre NMC, Guirardello EB. Adaptación cultural de instrumentos utilizados en salud ocupacional. Rev Panam Salud Publica. 2002;11:109–11.
24. Nunnally JC, Bernstein IH. Psychometric theory. 3rd ed. New York: McGraw-Hill; 1994.
25. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika. 1951;16:297–334.
26. Sanchez-Fernandez P, Aguilar de Armas I, Fentelsaz G, Moreno-Casbas MT, Hidalgo-García R. Fiabilidad de los instrumentos de medición en ciencias de la salud. Enferm Clin. 2005;15:227–36.
27. Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. 4th ed. New York: Oxford University Press Inc.; 2008.
28. Terwee CB, Bot SD, de Boer MR, van der Windt DA, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60:34–42.
4. PAPER # 3
Reliability and validity of the Work Role Functioning Questionnaire (Spanish version). [Submitted for peer-review].
TITLE:
Reliability and validity of the Work Role Functioning Questionnaire (Spanish version).
AUTHORS:
José M Ramada Rodilla, MD, MSc (1,2,3)
Consol Serra Pujadas, MD, PhD (1,2,3)
Benjamin C Amick, PhD (4,5)
Femke I Abma, PhD (6)
Juan R Castaño Asins, MD (7)
Gemma Pidemunt Moli, MD, PhD (8)
Ute Bültmann, PhD (6)
George L Delclós Clanchet, MD, MPH, PhD (1,3,4)
AFFILIATIONS:
1 Centro de Investigación en Salud Laboral (CiSAL), Universidad Pompeu Fabra, Barcelona, Spain.
2 Servicio de Salud Laboral, Parc de Salut MAR, Barcelona, Spain.
3 CIBER de Epidemiología y Salud Pública (CIBERESP).
4 Southwest Center for Occupational and Environmental Health, The University of Texas School of Public Health. Houston, Texas, USA.
5 Institute for Work & Health. 80 University Avenue, Toronto, Ontario, Canada.
6 Department of Health Sciences, Work & Health, University Medical Center Groningen, University of Groningen. Groningen, The Netherlands.
7 Psychiatry Service. Parc de Salut MAR. Hospital del Mar. Barcelona, Spain.
8 Orthopedic Surgery and Traumatology Service. Parc de Salut MAR. Hospital del Mar. Barcelona, Spain.
CORRESPONDING AUTHOR:
José Mª Ramada Rodilla
CiSAL - Universidad Pompeu Fabra
Dr. Aiguader, 88
08003-Barcelona
E-mail: [email protected]
Tel. 932483066
ABSTRACT
ABSTRACT
Purpose: Recently, the cross-cultural adaptation of the Work Role Functioning
Purpose: Recently, the cross-cultural adaptation of the Work Role Functioning
Questionnaire to Spanish was carried out, achieving satisfactory psychometric properties.
Questionnaire to Spanish was carried out, achieving satisfactory psychometric properties.
Now we examined the reliability and validity of the adapted Spanish version (WRFQ-SpV)
Now we examined the reliability and validity of the adapted Spanish version (WRFQ-SpV)
in a general working population with and without (physical and mental) health issues to
in a general working population with and without (physical and mental) health issues to
evaluate its measurement properties.
evaluate its measurement properties.
Methods: A cross-sectional study was conducted among active workers. For reliability,
Methods: A cross-sectional study was conducted among active workers. For reliability,
we calculated Cronbach alpha to assess ‘internal consistency’, and the standard error of
we calculated Cronbach alpha to assess ‘internal consistency’, and the standard error of
measurement (SEM) to evaluate ‘measurement error’. We assessed the 'structural
measurement (SEM) to evaluate ‘measurement error’. We assessed the 'structural
validity' through confirmatory factor analyses and 'construct validity' by means of
validity' through confirmatory factor analyses and 'construct validity' by means of
hypotheses testing. The consensus-based standard for the selection of health status
hypotheses testing. The consensus-based standard for the selection of health status
measurement instruments (COSMIN) taxonomy were used in the design of the study.
measurement instruments (COSMIN) taxonomy were used in the design of the study.
Results: A total of 455 workers completed the questionnaire. It showed excellent internal
consistency (α=0.98). The SEM for the overall scale was 7.10. The original five-factor
structure reflected fair dimensionality of the construct (Chi-square, 1445.8; 314 degrees of
freedom; RMSEA=0.08; CFI > 0.95 and WRMR > 0.90). For construct validity, all
hypotheses were confirmed, differentiating groups with different jobs, health conditions and
ages. Moderate to strong correlations were found between the WRFQ-SpV and a related
construct (work ability).
Conclusions: Our study provides evidence of the reliability and validity of the WRFQ-SpV
to measure health-related work functioning in day-to-day practice and research in
occupational health care and the rehabilitation of disabled workers. It should be useful to
monitor improvements in work functioning after implementing rehabilitation and/or
accommodation programs. Longitudinal studies are needed to assess the responsiveness
of the questionnaire.
Key terms: validity; reliability; work-functioning instrument; measurement instrument;
psychometric properties; self-report.
INTRODUCTION
Increasing life expectancy in developed countries and delayed retirement age are
raising the overall age of the workforce. Aging workers are more likely to have
chronic health issues and a certain degree of disability, but most are able to
maintain job competence with some workplace adjustments and/or rehabilitation
programs [1-4]. Also, there is evidence showing that work has positive health
effects when conditions are reasonably acceptable; therefore, promoting an active
working life is advisable [5,6].
High-quality work functioning tools are required to obtain valid measurements, to
evaluate the impact of health on work functioning and to monitor the extent to
which workers improve their ability to meet job demands after a rehabilitation or
accommodation program. This will enable healthcare professionals, human
resources managers, employers and other stakeholders to support an active and
healthy labor force. Moreover, valid outcome measures are needed to assess how
workers function at work over the course of their careers and along the
continuum between working successfully at one extreme and disability and
work absence at the other [7].
There are a number of tools to measure constructs related to self-perceived work
functioning, including the Functional Status Index [8], the Work Productivity and
Activity Impairment Questionnaire [9], the Health and Labor Questionnaire [10],
the Endicott Work Productivity Scale [11], the Work Ability Index [12], the
Role-based Performance Scale [13], the Stanford Presenteeism Scale [14], the Work
Instability Scale [15], and the Work Activity Limitations Scale [16].
Since 'being present at work without being able to meet job demands'
(presenteeism) [17] is not the same as 'performing work demands successfully', a
series of work-role specific functioning questionnaires were developed in the
2000s. Among these are different versions of the Work Limitations
Questionnaire [18] and the Work Role Functioning Questionnaire (WRFQ) [19].
The WRFQ measures perceived difficulties in performing the job due to health
problems. This questionnaire is a generic instrument conceptually developed to
represent a wide range of health conditions and work demands. Furthermore, it is
freely available in the literature for professionals and researchers. Recently, it has
been successfully translated, adapted and validated for use in different
contexts (e.g. Canadian French [20], Brazilian Portuguese [21], Dutch [7,22] and
Spanish spoken in Spain [23]). These versions have shown good psychometric
properties in different populations.
Before using an adapted instrument, it is important to assess its measurement
properties [24]. Recent reviews have shown that health-related work outcome
measures and health-related work functioning instruments need better validation
studies to make them more meaningful for researchers, practitioners and patients
[25,26]. The cross-cultural adaptation of the WRFQ to Spanish was recently
carried out, and the questionnaire showed good test-retest reliability (intraclass
correlation coefficients, ICCs, between 0.77 and 0.93 for all subscales) [23], but
further assessment of the validity and reliability of the questionnaire in a larger
sample was recommended.
Therefore, the objective of this study was to examine the reliability and validity of
the Spanish version of the WRFQ (WRFQ-SpV) in a general working population of
Barcelona (Spain), with and without (physical and mental) health issues.
METHODS
Procedures and sample characteristics
After carrying out the cross-cultural adaptation of the WRFQ to Spanish spoken in
Spain [23], it was necessary to assess its reliability and validity in a larger sample
so that it could be used in both occupational health and rehabilitation settings;
hence, a cross-sectional study was conducted among active workers of a general
working population of Barcelona (Spain). The consensus-based standards for the
selection of health status measurement instruments (COSMIN) taxonomy was
used in the study design [27-29].
Participants were recruited at a large public hospital in Barcelona, among patients,
persons accompanying patients, hospital workers and other workers who were
carrying out different duties at the hospital (ambulance drivers, bartenders,
kitchen and cleaning staff). Patients were recruited through the outpatient services
of psychiatry, physical medicine and rehabilitation, orthopedic surgery and
traumatology. The inclusion criteria were: 1) active workers of both sexes, working
at least 10 hours per week in the past four weeks, 2) age 18 years and older, and
3) able to read and understand Spanish (the language of the questionnaire).
Participants were excluded if they had plans to stop working within the following
six months.
The study protocol and the informed consent process were reviewed and approved
by the Clinical Research Ethical Committee of the Parc de Salut Mar (Barcelona).
All participants received information about the study purpose and signed the
informed consent form to participate in it.
Measures
The WRFQ-SpV is a self-administered questionnaire containing 27 items grouped
into 5 subscales reflecting different work demands: work scheduling, output,
physical, mental and social demands [23]. The recall period is four weeks and
each subscale is measured by the percentage of time in a working day the
employee has difficulty performing those demands. Responses are given on a
five-point scale: 0=all of the time (100%), 1=most of the time, 2=half of the time
(50%), 3=some of the time, 4=none of the time (0%), with an additional option
5=does not apply to my job. For each subscale and for the overall scale, item
scores were summed, divided by the number of items included in the subscale
(or the overall scale), and then multiplied by 25 to obtain the scores, ranging
from 0% (difficulty all the time) to 100% (no difficulty at any time). The scores
for "does not apply to my job" were transformed to missing values. Scales and/or
subscales containing more than 20% missing values were set to missing.
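The scoring rule described above can be sketched in Python as follows. This is a minimal illustration, not the authors' scoring program: the function name `wrfq_score` is hypothetical, and the sketch assumes that, when some items are missing, the sum is divided by the number of answered items rather than by the full item count, which the text does not state explicitly.

```python
from typing import Optional, Sequence

NOT_APPLICABLE = 5           # response code for "does not apply to my job"
MAX_MISSING_FRACTION = 0.20  # scales with more than 20% missing are set to missing


def wrfq_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Return a 0-100 score for one subscale (or the overall scale), or None."""
    # "Does not apply" (5) and unanswered items (None) both count as missing.
    valid = [r for r in responses if r is not None and r != NOT_APPLICABLE]
    for r in valid:
        if r not in (0, 1, 2, 3, 4):
            raise ValueError(f"unexpected response code: {r}")
    if not responses:
        return None
    n_missing = len(responses) - len(valid)
    if n_missing / len(responses) > MAX_MISSING_FRACTION:
        return None
    # Assumption: the mean is taken over the answered items only.
    return sum(valid) / len(valid) * 25
```

Under these assumptions, a subscale answered entirely with "none of the time" scores 100, and one answered entirely with "all of the time" scores 0.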
All participants were invited to complete the WRFQ-SpV on paper, providing
self-reported information on age, gender, level of education (primary, secondary,
higher), job type (manual, non-manual, mixed), working hours and primary health
condition (none, musculoskeletal, mental, others).
Three single items of the Work Ability Index (WAI) [12] were included in the survey
for a convenience subsample of participants, who voluntarily agreed to answer
these items. The first was the overall item 'current work ability compared with
the life-time best', with a possible score of 0=completely unable to work to
10=work ability at its best. Recent studies showed that this overall single item
correlates highly with the overall WAI score [30] and also showed the convergent
validity and the similarity in results between the overall WAI scores and the scores
of the overall single item of the WAI in large samples of participants [31]. Also,
there is an increasing number of studies using the overall single item of the WAI to
assess 'work ability' in different populations [7,30,32,33]. The other two items
measure work ability in relation to physical and mental job demands, with a
possible score of 1=very poor to 5=very good, and are questions already validated
in the original version of the questionnaire [12].
Reliability assessment:
Reliability is defined as the degree to which the measurement is free from
measurement error [27], and can also be defined as the extent to which scores for
participants who have not changed are the same for repeated measurements under
several conditions [35]: 1) using different sets of items from the same multi-item
measurement instrument (internal consistency); 2) over time (test-retest reliability);
3) by different raters on the same occasion (inter-rater reliability) or 4) by the same
raters on different occasions (intra-rater reliability). The COSMIN taxonomy [27,35]
also considers measurement error as an aspect of reliability.
Validity assessment:
Validity of a questionnaire is defined in the literature as the degree to which an
instrument truly measures the construct it purports to measure. In general, three
different types of validity can be distinguished: content validity, criterion validity
and construct validity, and within these three main types of validity there are some
subtypes [35].
Content validity focuses on whether the content of the instrument corresponds with
the construct that the instrument measures, with regard to relevance and
comprehensiveness. This type of validity is frequently assessed by means of a
systematic empirical procedure in which the authors of the questionnaire, a panel of
experts and a sample of the target population participate. It was already assessed
in our previous manuscript about the cross-cultural adaptation of the Work Role
Functioning Questionnaire (WRFQ), rigorously following the recommendations of
the literature [23].
Criterion validity can be assessed only in situations in which there is a gold
standard for the construct to be measured, and refers to how well the scores of the
measurement instrument agree with the scores obtained with the gold standard.
Since 'Work Functioning' is a construct that has no gold standard, this type of
validity cannot be assessed for the Work Role Functioning Questionnaire (WRFQ).
Construct validity should be evaluated in those situations in which there is no gold
standard, and refers to whether the instrument provides the expected scores,
based on existing knowledge about the construct [35]. There is an international
consensus of experts [27-29] recommending that construct validity be assessed by
evaluating 'cross-cultural validity', which we already did in our previous
manuscript [23]; 'structural validity', which we carried out by means of a
Confirmatory Factor Analysis (CFA); and 'hypotheses testing', which we carried
out by testing seven hypotheses.
Statistical analysis
WRFQ-SpV mean scores, standard deviations (SD), median scores and ranges
were calculated. Floor and ceiling effects were also explored. These effects occur
when more than 15% of the participants' responses to a certain question cluster at
the top or the bottom of the scale [34]. Since the original version of the WRFQ was
developed for a working population with health problems [19], and our population
contains a percentage of participants declaring no health issues, we carried out a
sensitivity analysis of floor and ceiling effects, restricting the sample to only those
participants reporting health problems, to explore whether there were differences in the
presence of these effects due to the characteristics of the sample.
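As an illustration of the 15% criterion for floor and ceiling effects, the following Python sketch computes the share of scores at each extreme. The function name `floor_ceiling` and the default bounds of 0 and 100 are assumptions made for this example; it is not the analysis code used in the study.

```python
from typing import Dict, Optional, Sequence, Union


def floor_ceiling(
    scores: Sequence[Optional[float]],
    minimum: float = 0.0,
    maximum: float = 100.0,
    threshold: float = 0.15,
) -> Dict[str, Union[float, bool]]:
    """Share of non-missing scores at the bottom and top of the scale."""
    valid = [s for s in scores if s is not None]
    if not valid:
        raise ValueError("no non-missing scores")
    floor_share = sum(1 for s in valid if s == minimum) / len(valid)
    ceiling_share = sum(1 for s in valid if s == maximum) / len(valid)
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        # An effect is flagged when MORE than 15% of scores sit at an extreme.
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }
```

A sensitivity analysis like the one described would simply call the same function on the subset of scores from participants reporting health problems.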
Participant scores were presented by job type (manual, non-manual, mixed),
reported health issues (none, physical, mental) and age group (18-35 years,
36-45 years, 46-55 years, 56-65 years), assessing the statistical significance of
the differences by means of the Kruskal-Wallis H test (to compare median scores)
and analysis of variance (ANOVA) (to compare mean scores). Post-hoc pairwise
analyses (comparing median or mean scores for each pair of groups) were
performed to determine which group or groups were responsible for significant
differences. When comparing median scores between two groups, the Mann-Whitney
test for two independent samples was used, and when comparing mean scores
between two groups, t-tests were used.
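For readers unfamiliar with the Kruskal-Wallis H test, the statistic can be computed from pooled ranks as sketched below. This is a standard-library illustration of the textbook formula with the usual tie correction, written for this text; the function name `kruskal_wallis_h` is hypothetical and the statistical software used in the study is not stated here. The p-value, which is obtained by comparing H with a chi-squared distribution with (number of groups - 1) degrees of freedom, is not computed.

```python
from collections import Counter
from typing import Dict, Sequence


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H statistic with correction for tied values."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)

    # Assign each distinct value the average of the ranks it occupies.
    rank_of: Dict[float, float] = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)

    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    ties = Counter(pooled)
    correction = 1 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction
```

The same ranking step underlies the Mann-Whitney test used for the post-hoc comparisons between two groups.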
Internal consistency was assessed using Cronbach's alpha coefficients, with values
≥ 0.70 considered appropriate [34]. The standard error of measurement (SEM) was
calculated for a stable subgroup of participants (n=40) that completed the
questionnaire twice in similar conditions, within an interval that varied from 7 to 15
days [35]. This subgroup of participants was composed of the first 40 participants
of the study who completed the first round and agreed to complete the
questionnaire a second time within this interval.
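The two reliability statistics named above can be sketched in Python as follows. Cronbach's alpha follows its standard definition. For the SEM, the text does not give the formula that was applied, so the sketch assumes one common test-retest variant, the standard deviation of the paired differences divided by the square root of 2; both function names are hypothetical.

```python
import math
from statistics import stdev, variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; each row holds one participant's item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two participants")
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def sem_test_retest(first: Sequence[float], second: Sequence[float]) -> float:
    """SEM from two administrations (assumed formula: SD of differences / sqrt(2))."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series of at least two scores")
    differences = [a - b for a, b in zip(first, second)]
    return stdev(differences) / math.sqrt(2)
```

Other SEM definitions (for example, those derived from the variance components of an intraclass correlation model) can give different values, so the formula should be checked against the cited methodological source before reuse.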
A CFA was conducted to analyze the structural validity of the WRFQ-SpV, testing
whether data collected in this general working population (N=455) had an
adequate fit to the predetermined five-factor model structure defined by the
authors of the original questionnaire [19]. A four-factor model structure was also
tested because the Work Limitations Questionnaire [18], designed to measure the
on-the-job impact of chronic health problems, has a structure with four factors (one of
them named mental-interpersonal) and earlier studies [20,21,23] recommended
caution when interpreting the internal consistency of the social demands subscale.
Thus, we hypothesized it might be necessary to collapse the subscales of mental
and social demands into a single factor of psychosocial demands with seven
items.
Following recommendations in the literature regarding CFA, we did not use the
standard maximum likelihood theory (applicable to continuous variables). Instead,
we used robust categorical least squares (applicable to categorical variables),
based on the fact that the observed variables are measured on a Likert scale and
the variables are approximately symmetrical [36-38].
Rhemtulla [36] suggests that when the response options have at least five
categories, which is the case of the WRFQ, the CFA could also be
assessed applying "the method of the standard theory of maximum likelihood",
treating these variables as if they were continuous (but we would be at the limit of
acceptance of this method). To verify the possible existence of differences
depending on the method, calculations were performed applying both methods.
Chi-squared tests for goodness of fit, the root mean square error of approximation
(RMSEA), the comparative fit index (CFI) and the weighted root mean square residual
(WRMR) were used to evaluate the models. The reference values for RMSEA were:
≤ 0.05, close fit; between 0.06 and 0.08, fair fit; and between 0.09 and 0.1,
mediocre fit. The reference values for acceptance were CFI ≥ 0.95 and WRMR > 0.90
[39].
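The RMSEA and CFI reference values can be expressed as a small lookup, sketched here in Python. The function names are hypothetical, the label "poor" for values above 0.1 is an assumption (the text names no category beyond "mediocre"), and values falling in the unstated gaps (for example 0.055) are assigned to the next band up.

```python
def classify_rmsea(rmsea: float) -> str:
    """Map an RMSEA value to the fit categories listed in the text."""
    if rmsea < 0:
        raise ValueError("RMSEA cannot be negative")
    if rmsea <= 0.05:
        return "close"
    if rmsea <= 0.08:  # assumption: values between 0.05 and 0.06 count as fair
        return "fair"
    if rmsea <= 0.10:  # assumption: values between 0.08 and 0.09 count as mediocre
        return "mediocre"
    return "poor"      # assumption: label not given in the text


def cfi_acceptable(cfi: float) -> bool:
    """CFI acceptance criterion from the text: CFI >= 0.95."""
    return cfi >= 0.95
```

With these bands, the RMSEA of 0.08 reported for the five-factor model falls in the "fair" category, consistent with the wording of the abstract.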
Correlations were evaluated for item-subscale, item-total, among subscales and
subscale-total, using Pearson's correlation coefficient (r), considering r ≥ 0.40 as
evidence of moderate or strong correlations [40,41].
Construct validity was assessed by means of hypotheses testing. Significance of
the differences among groups was tested using the non-parametric Kruskal-Wallis
H test when comparing differences among median scores and analysis of
variance (ANOVA) when differences among mean scores were compared.
Correlations between constructs were assessed using Pearson's correlation
coefficient (r), interpreting: r < 0.4 = 'weak'; 0.4 ≤ r ≤ 0.7 = 'moderate'; r > 0.7 =
'strong' [41].
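The correlation coefficient and its interpretation bands can be sketched in Python as below. The function names are hypothetical, and applying the bands to the absolute value of r is an assumption made for the example; the text states the cut-offs for r itself, and the hypotheses concern positive correlations.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long series of at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a series with zero variance has no correlation")
    return sxy / math.sqrt(sxx * syy)


def interpret_r(r: float) -> str:
    """Bands from the text: < 0.4 weak, 0.4 to 0.7 moderate, > 0.7 strong."""
    magnitude = abs(r)  # assumption: bands applied to the size of r
    if magnitude < 0.4:
        return "weak"
    if magnitude <= 0.7:
        return "moderate"
    return "strong"
```

For example, a correlation of exactly 0.4 between a WRFQ score and a work ability item would be read as "moderate" under these cut-offs.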
The basic principle of construct validation by means of hypotheses testing is that
hypotheses are formulated about differences in the instrument scores between
subgroups of participants or about the relationships of the scores of the instrument
under study with scores on other similar or dissimilar measuring tools [35].
Therefore, seven hypotheses were formulated to assess construct validity:
Hypothesis 1, addressing health issues: 1a) Participants without health issues
report higher scores on the overall scale of the WRFQ than those with health
issues; 1b) Participants with physical health issues report the lowest score on the
subscale of physical demands; 1c) Participants with mental health issues report
the lowest score on the subscale of mental demands.
Hypothesis 2, addressing job types: Participants with physical health issues and
a manual job report a lower score on the WRFQ subscale of physical demands than
those with physical health issues and non-manual or mixed jobs.
Hypothesis 3, addressing the correlation between WRFQ scores and scores of a
related construct (work ability): 3a) There are moderate to strong correlations
between the score of the overall work ability item of the WAI (which measures a
related construct) and the overall score of the WRFQ; 3b) There are moderate to
strong correlations between the scores of the physical and mental demands items
of the WAI and those of the corresponding physical and mental demands subscales
of the WRFQ.
Hypothesis 4, addressing age: Consistent with other studies finding that both
chronological and functional age are associated with a decrease in work ability
and/or work outcomes [42-46], the overall scores of the WRFQ show a trend
towards worse work functioning with increasing age.
All analyses were performed with SPSS (Version 15.0. Chicago, IL; 2006) and
Mplus (Version 7. Los Angeles, CA; 2012).
RESULTS
Sample characteristics. Four hundred fifty-five participants completed the
WRFQ-SpV and were included in the analyses. All were active employees who worked
an average of 39 hours per week (SD=8.5), had a mean age of 42 years (SD=11) and
varied in education level, job type and health issues (table 1). Compared with
the general Spanish working population, women and participants with a higher
educational level were overrepresented [47]. A subgroup of 181 participants also
completed the WAI items [Supplementary materials (1)].
Table 1. Participants' characteristics.

                                         Total           With health       Without health
                                         (n=455)         issues (n=299)    issues (n=156)
Age in years, mean (SD)                  42.1 (11.1)     43.7 (10.8)       39.0 (11.0)
Education level, n (%)
   Low                                   73 (16.0)       61 (20.4)         12 (7.7)
   Middle                                157 (34.5)      121 (40.5)        36 (23.1)
   High                                  225 (49.5)      117 (39.1)        108 (69.2)
Job type, n (%)
   Manual                                111 (24.4)      81 (27.1)         30 (19.2)
   Non-manual                            125 (27.5)      82 (27.4)         43 (27.6)
   Mixed                                 218 (47.9)      136 (45.5)        83 (53.2)
Working hours/week, mean (SD)            38.7 (8.5)      38.8 (7.8)        38.7 (9.7)
Health issue type, n (%)
   None                                  156 (34.3)      0 (0.0)           156 (100.0)
   Physical                              139 (30.5)      139 (46.5)        0 (0.0)
   Mental health                         125 (27.5)      125 (41.8)        0 (0.0)
   Others                                35 (7.7)        35 (11.7)         0 (0.0)
Disease duration in months, mean (SD)    13.0 (27.7)     19.9 (32.2)       0 (0.0)

Supplementary materials (1). Work Ability Index (WAI) scores obtained in a convenience subsample of participants (n=181).

Extended survey with WAI                 Total           Men               Women
                                         (n=181)         n=71 (39.2%)      n=110 (60.8%)
WAI overall-item (a), mean (SD)          7.6 (2.1)       7.6 (2.1)         7.7 (2.0)
WAI physical demands (b), mean (SD)      3.8 (1.0)       3.7 (1.0)         3.8 (1.0)
WAI mental demands (b), mean (SD)        3.9 (1.2)       3.9 (1.2)         3.8 (1.2)

(a) Single item question of the work ability index (scale 0-10).
(b) Single item question of the work ability index (scale 0-5).
Table 2 shows the mean, SD and median scores for each WRFQ-SpV subscale
and the overall scale. Higher values indicate better work functioning (less disability
at work). The mental and social demands subscales had the highest mean and
median scores, and the output demands subscale had the lowest.
Floor effects were not found for any subscale, but ceiling effects were found for the
subscales of work scheduling (20%), mental (29%) and social demands (31%),
exceeding the 15% criterion [34]. A sensitivity analysis restricted to participants
reporting health problems (n=299; 66% of the sample) also showed ceiling effects
for the same subscales.
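The 15% criterion can be made concrete with a short Python sketch. It assumes that a floor or ceiling effect is flagged when more than 15% of respondents obtain the lowest or highest possible score, and that scores range from 0 to 100; the function name `floor_ceiling_effects`, its parameters and the example scores are hypothetical and are not study data.

```python
def floor_ceiling_effects(scores, min_score=0.0, max_score=100.0, criterion=0.15):
    """Share of respondents at the lowest and highest possible score.

    Assumptions: scores range from min_score to max_score, and an effect
    is flagged when the share strictly exceeds the criterion (15%).
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_share = sum(1 for s in scores if s == min_score) / n
    ceiling_share = sum(1 for s in scores if s == max_score) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > criterion,
        "ceiling_effect": ceiling_share > criterion,
    }


# Hypothetical subscale scores for 20 respondents (illustration only).
subscale_scores = [100.0] * 6 + [95.0, 90.0, 87.5, 85.0, 80.0, 75.0, 70.0,
                                 65.0, 60.0, 55.0, 50.0, 45.0, 40.0, 25.0]

result = floor_ceiling_effects(subscale_scores)
print(result)
```

In this invented example 6 of 20 respondents (30%) reach the maximum score, so a ceiling effect is flagged, while no respondent is at the minimum.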
Reliability assessment: The SEMs were 7.1 for the overall score, 8.5 for work
scheduling, 8.9 for output, 8.6 for physical, 10.6 for mental and 13.3 for social
demands [Supplementary materials (2)]. Cronbach’s alpha coefficients were 0.98 for
the overall scale and above 0.81 for all subscales (table 2).
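For readers unfamiliar with the standard error of measurement (SEM), the following Python sketch uses one common definition, SEM = SD × √(1 − reliability). This formula is an assumption here, because this section does not restate how the SEMs were computed; the function name `standard_error_of_measurement` and the input values are hypothetical and are not taken from the study.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM under the common definition SD * sqrt(1 - reliability).

    Assumption: reliability is a coefficient between 0 and 1
    (for example a test-retest or internal consistency coefficient).
    """
    if sd < 0:
        raise ValueError("sd must be non-negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * sqrt(1.0 - reliability)


# Hypothetical values, for illustration only (not study data).
sem = standard_error_of_measurement(sd=15.0, reliability=0.84)
print(f"SEM = {sem:.1f}")
```

Under this definition, a more reliable scale yields a smaller SEM for the same score SD.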
Structural validity assessment: Fit was fair for the five factor model applying
method of the robust categorical least squares for categorical variables (Chi-
square, 1285.8; 314 degrees of freedom, p