Translation, cross-cultural adaptation and validation of the Work Role Functioning Questionnaire (WRFQ) to Spanish spoken in Spain

Traducción, adaptación cultural y validación del Work Role Functioning Questionnaire (WRFQ) al castellano hablado en España.

José María Ramada Rodilla

DOCTORAL THESIS UPF / 2014

THESIS DIRECTORS

Dr. Consol Serra Pujadas, CiSAL – Centro de Investigación en Salud Laboral, Universidad Pompeu Fabra. PRBB Building. Doctor Aiguader, 88. 08003 Barcelona, Spain.

Dr. George L Delclós Clanchet, Southwest Center for Occupational and Environmental Health, the University of Texas School of Public Health. 1200 Pressler Street. Houston, Texas 77030, USA.

DEPARTMENT OF EXPERIMENTAL AND HEALTH SCIENCES


To my three children, who are the passion of my life, for their capacity to understand, accept and love. To the memory of my father, to whom I owe so much.


ACKNOWLEDGEMENTS (Agradecimientos – Agraïments)

This thesis arrives at a time when my glasses already need a certain addition of dioptres for me to read up close. So I want to begin by thanking all those who make decisions about people's academic futures knowing that the drive of youth lies not in age but in spirit.

The legend of Saint Augustine of Hippo and the child on the beach comes to mind, and I do not know how, with my limited capacity, I am going to fit all the water of the sea into such a small hole. Here too, in this part of my thesis, it has crossed my mind to seek the backing of my Directors to control for bias, but I did not want to run the risk of their recommending yet another systematic review. So, with only my own words, I take on the challenge alone.

Thank you so very much, Consol. Thank you, teacher. The list of reasons why I feel so grateful to you could be endless. You have all my admiration and respect for your professional integrity as an occupational physician and researcher, for your distinctive and innovative vision of occupational health, and for your professionalism and ability as a thesis director. Thank you for your constant support in this project and for being, in the hospital where we work, a tireless, inexhaustible boss and colleague. A thousand thanks for all the opportunities you have offered me over these last years, which I have always tried to make the most of.

Thank you, Jordi, for guiding my first steps in the world of research in 2010 as my tutor in the Master's in Occupational Health; it was then that the possibility of going further, beyond the Master's, began to take shape. Thank you, teacher, for teaching me so much in the Occupational Pathology Unit, for the magnificent experience at the University of Texas School of Public Health, and for placing at my disposal your enormous worth as a clinician and as a teacher. Thank you for your humanity, consideration, accessibility, exemplary conduct as a researcher and patience as a thesis director. Thanks also to Conchita for her hospitality and kindness during my stay in Houston (Texas) in 2013.

Thank you, Fernando, for opening the doors of CiSAL to me, for your constant proposals to involve me in projects, and for your support from the very first minute in making my stay at the University of Groningen (the Netherlands) in 2013 possible.

Thank you Ute, Femke and Iris (University Medical Center Groningen, The Netherlands). Your warm welcome, support and help with this thesis at any time in Groningen were invaluable. Thank you, Roy, for making statistical analysis understandable and also for your kind and disinterested help and availability at all times.

Thank you to the Management of PSMAR, where I practise as an occupational physician. Thank you for making this organization one of the most dynamic hubs of clinical, teaching and research knowledge in the city of Barcelona. Thank you for the support given towards obtaining the European mention for the doctoral degree. Thank you to my fellow physicians and nurses at PSMAR, who helped me so generously with the fieldwork.

Thank you to my colleagues in the Occupational Health Service of PSMAR for the support given to this project whenever I needed it: Aida, Carmen, Chelo, Julià and Nuria. A thousand thanks to Fina Pi-Sunyer and Joan Mirabent for being such excellent occupational health nurses and for your great generosity in collaborating during the fieldwork. Thank you so very much, Dr. Villar (Rocio); without your support and the companionship of a true colleague, the stay in Groningen might not have been possible. From the bottom of my heart, many thanks, colleagues.

Thank you to my friends at CiSAL-UPF for always counting on me, despite my not being physically present at the PRBB. Thank you to Montse Fernández and Sandra Garrido for their ever-diligent help with any paperwork involving the University and CIBERSP. Thank you to María López for her encouragement at all times and to Sergio Vargas for all the favours done over these years.

And to finish, thank you to my sisters and my mother, always available. I feel true devotion for the three of you; to Ram Dulthummon for his support and for his words (always in English) that set some limit on my sometimes overflowing imagination and helped me keep my feet on the ground; to my three children José, Borja and María Ángeles, for their generous help with the database, for enduring the rehearsals of my presentations a thousand times, and for their constant encouragement.

And of course, thank you to the mother of my three children, Ángeles Calaforra, for her immense generosity, for always being there when needed, for her inexhaustible patience, and for her always sensible advice when doubts arose about the point of this project at this moment in my life.

To each and every one of you, a million thanks.


SUMMARY

Background

Health and work influence each other in the working population. Health-related work functioning is the worker’s ability to meet work demands for a given health status. High-quality validated measurement tools are needed to assess how workers function at work throughout their working lives and to evaluate interventions that adapt job conditions to the worker’s skills and health status.

The use of directly translated measurement tools may lead to unreliable or misleading results in research and practice, and could limit the exchange of information in the scientific community. Due to possible cultural differences in the perception of work, health and disease, instruments developed in other languages or cultures should be systematically translated, adapted and validated for use in different target languages or cultures.

The Work Role Functioning Questionnaire (WRFQ) is an instrument designed to measure self-perceived difficulties in performing work among active workers, given a certain health condition. Its results can be interpreted in terms of work functioning, work performance, work productivity, work disability and presenteeism, and they can be transformed into meaningful social and economic outcomes.

Objective

The aim of this thesis was to provide a high-quality validated instrument in Spanish, able to assess the impact of health on “work functioning” and to describe the extent to which workers’ ability to meet the demands of the job improves or deteriorates in Spanish-speaking populations.

This overall objective was pursued through three specific objectives: 1) to review the literature on the methodology for cross-cultural adaptation and validation (CCAV) of health questionnaires; 2) to estimate the degree of compliance with literature recommendations for CCAV in Spanish and Latin American scientific journals; and 3) to translate and cross-culturally adapt the WRFQ and validate it in a sample of a general Spanish-speaking working population.

Methods

An evidence-based decision was taken to select a generic measurement instrument that evaluates health-related work functioning. A comprehensive literature review was performed to identify and synthesize recommendations on the methodology of CCAV of health questionnaires. Five high-impact journals in epidemiology and/or public health from Spain and Latin America were analyzed to estimate the degree of compliance with the methodological recommendations.

A systematic 5-step procedure (direct translation, synthesis, back-translation, consolidation by an expert committee and pre-test) described in the literature was followed to translate, cross-culturally adapt and validate the WRFQ. The applicability, readability and integrity of the Spanish version of the Work Role Functioning Questionnaire (WRFQ-SpV), together with its preliminary internal consistency, test-retest reliability and validity, were assessed in a pre-test with 40 participants.

Next, a cross-sectional study was conducted among 455 active workers from a general working population to evaluate the reliability and validity of the WRFQ-SpV. A longitudinal survey was carried out to examine responsiveness in a sample of 102 workers from this general working population. The consensus-based standards on measurement properties of health status measurement instruments (COSMIN) guided the design of the different studies.

Results

To identify and synthesize the literature recommendations on the methodology of CCAV of health questionnaires, 21 articles (out of 214 citations) and seven relevant books were selected for full-text analysis. A high degree of consensus was found on the steps to follow to guarantee conceptual, semantic, idiomatic and experiential equivalence. Two steps were widely recommended to carry out the CCAV process: first, the cross-cultural adaptation process (following a systematic and rigorous procedure); and second, validation in the target language (evaluating reliability, validity and responsiveness). Only 6% of the retrieved articles followed all recommended steps.

The CCAV of the WRFQ was carried out without major difficulty. Idiomatic challenges were found and an expert committee provided a solution. The questionnaire showed adequate applicability and good face and content validity. Internal consistency was satisfactory (Cronbach’s alpha = 0.98). The original five-factor structure of the WRFQ reflected fair dimensionality of the construct (Chi-square, 1445.8; 314 degrees of freedom; root mean square error of approximation [RMSEA] = 0.08; comparative fit index [CFI] > 0.95 and weighted root mean square residual [WRMR] > 0.90). The test–retest reliability showed good reproducibility of the questionnaire outcomes (0.77 ≤ intraclass correlation coefficient [ICC] ≤ 0.93 and standard error of measurement [SEM] = 7.10). For construct validity assessment, all formulated hypotheses were confirmed, differentiating groups with different jobs, health conditions and ages. Moreover, we verified that the WRFQ-SpV was able to detect (true) changes over time.
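As an illustration only, two of the reliability statistics named in these results can be sketched in a few lines of Python. The sketch assumes the textbook definitions: Cronbach's alpha as k/(k-1) × (1 - sum of item variances / variance of the total score), and the SEM as SD × √(1 - ICC). The summary does not state which SEM formula or software the thesis used, and the function names and toy data below are hypothetical.

```python
from math import sqrt
from statistics import stdev, variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


def sem_from_icc(total_scores, icc):
    """Standard error of measurement, assuming SEM = SD * sqrt(1 - ICC)."""
    return stdev(total_scores) * sqrt(1 - icc)


# Hypothetical toy data (not from the thesis): four respondents, two items.
toy_responses = [[1, 1], [2, 3], [3, 2], [4, 4]]
alpha = cronbach_alpha(toy_responses)
toy_totals = [sum(row) for row in toy_responses]
sem = sem_from_icc(toy_totals, icc=0.75)
print(f"alpha = {alpha:.3f}, SEM = {sem:.3f}")
```

With real questionnaire data, each inner list would hold one worker's item scores; the ICC itself would have to be estimated separately from test-retest data.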

Conclusions

The CCAV process should follow several well-established steps. However, the degree of compliance of the scientific literature with the methodological recommendations for CCAV can be improved. The WRFQ-SpV is a reliable and valid instrument to measure health-related work functioning in day-to-day practice and research in occupational health. Suggestive evidence for the possible use of the WRFQ-SpV in evaluative studies was found. More research is needed to examine the instrument’s responsiveness in groups who do not experience health improvement or who deteriorate.

Key words:

Work functioning instrument; questionnaires; scales; health survey; measurement instrument; cross-cultural comparison; validation studies; psychometric properties; reliability; validity; responsiveness.


SUMMARY (Resumen)

Background (Antecedentes)

Health and work form a pair with a permanent mutual influence. Health-related work functioning is defined as a worker's ability to respond to the demands of work given a particular health status. High-quality validated measurement tools are needed to assess levels of work functioning throughout working life and to evaluate interventions aimed at adapting working conditions to the skills and health status of the working population.

The use of literally translated instruments may lead to unreliable or misleading results in practice and in research, and may limit the exchange of information in the scientific community. Because of possible cultural differences in the perception of work, health and disease, instruments developed in other languages or cultures should be systematically translated, adapted and validated for use in different languages or cultures.

The Work Role Functioning Questionnaire (WRFQ; in Spanish, Cuestionario de Desempeño del Trabajo) is an instrument for measuring self-perceived difficulties in performing work, among active workers, given a particular health status. Its results can be interpreted in terms of work functioning, performance or productivity at work, work disability and presenteeism, and can be transformed into results of social and economic significance.

Objective (Objetivo)

The aim of this thesis was to make available a high-quality instrument validated in Spanish, able to assess the impact of health on work functioning and to describe the degree to which workers' ability to respond to the demands of work improves or worsens.

This general objective was pursued through three specific objectives: 1) to review the literature on the methodology for the translation, cultural adaptation and validation (TACV, from the Spanish traducción, adaptación cultural y validación) of health questionnaires; 2) to estimate the degree of compliance with the methodological recommendations in Spanish and Latin American scientific journals; and 3) to translate and adapt the WRFQ and validate it in a sample of the general Spanish-speaking working population.

Methods (Métodos)

A generic instrument for evaluating health-related work functioning was selected on the basis of the evidence. An exhaustive literature review was carried out to identify and systematize the literature's recommendations on the TACV of health questionnaires; in addition, the five epidemiology and/or public health journals from Spain and Latin America with the highest impact factors were analyzed to estimate the degree of compliance with the methodological recommendations.

A 5-step procedure described in the literature (direct translation, synthesis, back-translation, consolidation by an expert committee and pre-test) was followed to translate, adapt and validate the WRFQ. A pre-test with 40 participants was carried out to evaluate the applicability, readability and integrity of the Spanish version of the WRFQ (WRFQ-SpV), together with its internal consistency, test-retest reliability and validity.


Subsequently, a cross-sectional study was carried out with a sample of 455 active workers to evaluate the reliability and validity of the WRFQ-SpV. A longitudinal study was carried out in a sample of 102 active workers from a general population to examine its responsiveness to change. The consensus-based standards for the evaluation of the measurement properties of health questionnaires (COSMIN) were used in the design of the different studies.

Results (Resultados)

To identify and systematize the methodological recommendations existing in the literature, 21 articles (out of a total of 214 citations) and seven relevant books were selected for analysis. A high degree of consensus was found on carrying out the TACV in two steps to guarantee conceptual, semantic, idiomatic and experiential equivalence: first, the cultural adaptation process (following a systematic and rigorous procedure), and second, validation in the target language (evaluating reliability, validity and responsiveness to change).

Compliance with the methodological recommendations for carrying out TACV can be improved. Only 6% of the retrieved articles followed all the steps recommended in the literature that were applicable to them.

The TACV of the WRFQ was carried out without relevant difficulties. Idiomatic challenges were encountered and were resolved by an expert committee. The questionnaire showed adequate applicability, face validity and content validity. Internal consistency was satisfactory (Cronbach's alpha = 0.98). The original five-factor structure of the WRFQ reflected adequate dimensionality of the construct (chi-square, 1445.8; 314 degrees of freedom; root mean square error of approximation [RMSEA] = 0.08; comparative fit index [CFI] > 0.95; weighted root mean square residual [WRMR] > 0.90). Test-retest reliability showed good reproducibility of the questionnaire scores (0.77 ≤ intraclass correlation coefficient [ICC] ≤ 0.93; standard error of measurement [SEM] = 7.10). In the evaluation of construct validity, all the formulated hypotheses were confirmed, differentiating between groups with different jobs, health problems and age groups. The WRFQ-SpV was shown to be able to detect (true) changes over time.
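As an illustration of two of the statistics reported above, the following Python sketch computes Cronbach's alpha from item-level scores and derives a standard error of measurement from an ICC. The function names, the toy data and the formula SEM = SD × √(1 − ICC) are assumptions made for illustration only; this summary does not state how the reported values were calculated.

```python
import math
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items table of scores.

    `scores` is a list of rows, one per respondent, each holding the
    same number of item scores (hypothetical input layout).
    """
    n_items = len(scores[0])
    # Sample variance of each item (column) across respondents.
    item_variances = [variance([row[j] for row in scores]) for j in range(n_items)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


def sem_from_icc(sd, icc):
    """Standard error of measurement, assuming SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1 - icc)


# Toy data (not from the thesis): four respondents, three items.
toy_scores = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
alpha = cronbach_alpha(toy_scores)

# Hypothetical score standard deviation and ICC, chosen for easy arithmetic.
sem = sem_from_icc(10.0, 0.75)
```

A smaller SEM relative to the score range indicates that repeated measurements on a stable respondent are expected to fall closer together.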

Conclusions

The TACV process should follow several well-established steps. However, compliance with the methodological recommendations proposed in the scientific literature for TACV can be improved. The WRFQ-SpV is a reliable and valid instrument for measuring health-related work functioning, both in daily practice and in occupational health research. Suggestive evidence was found for the possible use of the WRFQ-SpV for evaluative purposes. Further research is needed to examine the responsiveness of the instrument in groups whose health does not improve or deteriorates.

Keywords:

Work functioning; questionnaires, scales; health survey; measurement instrument; cultural adaptation; validation studies; psychometric properties; reliability; validity; responsiveness.

PREFACE


The analysis of measurement instruments for use in occupational health research and practice is currently an area of research interest within the Center for Research in Occupational Health (CiSAL), and it is in this context that this doctoral thesis was undertaken. Its content is part of a CiSAL research project entitled “Evaluation of health-related work functioning and identification of preventive interventions with the Spanish version of the Work Role Functioning Questionnaire”. This project is funded by the Instituto de Salud Carlos III, ISCIII (Ministry of Economy and Competitiveness, Spanish Government), FIS: PI12/02556 (Principal Investigator, Consol Serra Pujadas; co-investigators, José María Ramada and George Delclos).

This project arises from the need for validated instruments to assess the impact of health on “work functioning” in Spanish-speaking populations. There are a number of instruments to evaluate “health-related work functioning” in English, but these have not always been adapted and/or validated for the Spanish context. Thus, identifying and selecting an instrument to properly measure health-related work functioning, and then translating and adapting it and validating its measurement properties for future use in research, was consistent with the goals of this project.

According to the policy of the Doctoral Program Committee in the Department of Experimental and Health Sciences at Pompeu Fabra University, this thesis is presented as a compendium of four scientific publications, derived from the literature review and field work conducted in the Parc de Salut Mar de Barcelona health system. The first publication was written in Spanish and the other three in English. All have been published recently in international peer-reviewed occupational health journals indexed in PubMed, with the PhD candidate as first author.

The results have been presented in part at several scientific meetings, specifically: the First CiSAL Annual Scientific Meeting (1), the Second Scientific Conference on Work Disability Prevention and Integration (WDPI) (2), the XXII Diada of the Catalan Society of Safety and Occupational Medicine (3), the Third CiSAL Annual Scientific Meeting (4) and the First BiblioPRO Scientific Meeting (5).

In addition to the funding from the Instituto de Salud Carlos III (PI12/02556), this thesis received partial financial support from The University of Texas School of Public Health at Houston (USA) and from the Network of Biomedical Research Centers in Epidemiology and Public Health (CIBERESP), for completion of short-term stays at international universities, in order to fulfill the requirements for a doctorate with European mention.

(1) Ramada JM, Serra C, Delclós J. Adaptación cultural y validación de cuestionarios de salud: revisión y recomendaciones metodológicas. 1ª Jornada Científica CISAL. Barcelona, 2011.


(2) Ramada JM, Serra C, Delclós GL. Cross-cultural adaptation and health questionnaires validation: revision and methodological recommendations. Second Scientific Conference on Work Disability Prevention and Integration ‘Healthy ageing in a working society’. WDPI; Groningen, 2012.


(3) Ramada JM. Qüestionaris de salut de qualitat: requisits bàsics. XXII Diada de la Societat Catalana de Seguretat i Medicina del Treball. Barcelona, 2012.


(4) Ramada JM, Serra C, Delclós J. Traducción, adaptación cultural y validación del “Work role functioning questionnaire (WRFQ-27)”. 3ª Jornada Científica CISAL. Barcelona, 2013.


(5) Ramada JM, Serra C, Amick BC, Castaño JR, Delclós GL. Adaptación cultural del "Work Role Functioning Questionnaire (WRFQ)" al castellano hablado en España. I Jornada Científica BiblioPRO. IMIM-CIBERESP. Barcelona, 2013.



PRÓLOGO


The analysis of measurement instruments for use in occupational health research and daily practice is currently an area of research interest at the Centro de Investigación en Salud Laboral (CiSAL), and it is in this context that this doctoral thesis was developed. Its content is part of the CiSAL research project entitled “Evaluación de la capacidad para trabajar y posibilidades de intervención mediante el Work Role Functioning Questionnaire adaptado al castellano”. This project has been funded by the Instituto de Salud Carlos III, ISCIII (Ministry of Economy and Competitiveness, Government of Spain), FIS: PI12/02556 (Principal Investigator, Consol Serra Pujadas; co-investigators, José María Ramada and George Delclós).

This project arises from the need for validated Spanish-language instruments to assess the impact of health on “work functioning” in Spanish-speaking populations. A number of instruments exist in English to evaluate health-related “work functioning”, but they have not always been adapted and/or validated for the Spanish context. Identifying and selecting an instrument that adequately measures health-related “work functioning”, and then translating and adapting it and validating its measurement properties for use in future research, is therefore consistent with the goals of this project.

In accordance with the regulations of the Steering Committee of the Doctoral Program of the Department of Experimental and Health Sciences at Pompeu Fabra University, this doctoral thesis is presented as a compendium of four scientific publications with the doctoral candidate as first author, resulting from the literature review and the field work carried out in the Parc de Salut Mar hospital system in Barcelona. The first publication was written in Spanish and the other three in English. Three of them have been published recently in international peer-reviewed occupational health journals indexed in PubMed. At the time of printing of this thesis, the fourth is under peer review at an international occupational health journal that is also indexed in PubMed.

The results have been presented in part at the First CiSAL Annual Scientific Meeting (6); the Second Scientific Conference on Work Disability Prevention and Integration (WDPI) (7); the XXII Diada de la Societat Catalana de Seguretat i Medicina del Treball (8); the Third CiSAL Annual Scientific Meeting (9); and the First BiblioPRO Scientific Meeting (10).

In addition to the funding from the Instituto de Salud Carlos III (PI12/02556), this thesis received partial financial support from The University of Texas School of Public Health (United States of America) and from the Network of Biomedical Research Centers in Epidemiology and Public Health (CIBERESP).












TABLE OF CONTENTS

ACKNOWLEDGEMENTS (Agradecimientos – Agraïments)

SUMMARY

RESUMEN

PREFACE

PRÓLOGO

1. INTRODUCTION

1.1. Statement of the problem

1.2. From work disability to health-related work functioning

1.3. General overview of work outcome measurement tools

1.4. Methodological quality in health-questionnaire validation

2. OBJECTIVES

2.1. Study I Objectives

2.2. Study II Objectives

2.3. Study III Objectives

2.4. Study IV Objectives

3. PAPER # 1

Ramada JM, Serra C, Delclós GL. Adaptación cultural y validación de cuestionarios de salud: revisión y recomendaciones metodológicas. Salud Publica Mex. 2013;55:57-66.

4. PAPER # 2

Ramada JM, Serra C, Amick III BC, Castaño JR, Delclos GL. Cross-cultural adaptation of the work role functioning questionnaire to Spanish spoken in Spain. J Occup Rehabil. 2013;23:566-75.

5. PAPER # 3

Ramada JM, Serra C, Amick III BC, Abma FI, Castaño JR, Pidemunt G, Bültmann U, Delclos GL. Reliability and validity of the Work Role Functioning Questionnaire (Spanish version). [Submitted for peer-review].

PAPER # 4

Ramada JM, Delclos GL, Amick III BC, Abma FI, Castaño JR, Pidemunt G, Bültmann U, Serra C. Responsiveness of the Work Role Functioning Questionnaire (Spanish version). J Occup Environ Med. [In Press 2013].

6. GENERAL DISCUSSION

6.1. The concept of health-related work functioning.

6.2. Selection of an instrument to measure health-related work functioning.

6.3. Cross-cultural adaptation and validation process (reliability, validity and responsiveness).

6.4. Standards to be used for methodological quality in health-questionnaire validation.

6.5. Implications for research and practice.

6.6. Future research.

7. GENERAL CONCLUSIONS

8. APPENDICES

Appendix I: WRFQ (English version)

Appendix II: WRFQ (Spanish version)

Appendix III: Single items of the WAI

Appendix IV: Global perceived effect question (GPE-Q)

Appendix V: Clinical Research Ethical Committee approval

Appendix VI: Informed consent

Appendix VII: Poster Primera Jornada Científica CiSAL (2011)

Appendix VIII: Poster WDPI, Groningen, The Netherlands (2012)

Appendix IX: Poster Tercera Jornada Científica CiSAL (2013)

1. INTRODUCTION

1.1. Statement of the problem



Health and work form an indivisible duality in which mutual influence is permanent. The World Health Organization (WHO) defines health as "a state of complete physical, mental and social well-being" and not merely the absence of disease. This definition has been part of the Declaration of Principles of the WHO since its founding in 1948 (1).

Work is a health determinant and there is an increasing body of evidence showing that work has positive health effects when working conditions are reasonably acceptable (2,3). Decent work sums up the aspirations of people in their working lives. It involves opportunities for work that is productive and delivers a fair income; security in the workplace and social protection for families; opportunities for personal development and social integration; freedom for people to express their concerns, organize and participate in the decisions that affect their lives; and equality of opportunity and treatment for all women and men. A community or a country improves the health status of its population when everyone who is able to work can get a decent job (4).

Increased life expectancy and a later retirement age are raising the overall age of the workforce, and might result in an increasing number of employees working with chronic diseases (5-7). Interventions to keep these workers in the labor market and promote work participation are being increasingly developed to support a sustainable, active, and productive work life (7,8). Furthermore, rehabilitation programs and interventions to adapt or accommodate working conditions to the workers' health and skills are becoming more frequent, with the goal of achieving a safe return to work after a period of sick leave.

The effectiveness of these rehabilitation programs and interventions has usually been assessed using outcome measures such as work status (active, temporary or permanent disability), time to return to work, duration of functional disability and costs of incapacity to work (8-11). These outcomes have been useful but are limited, as they mainly assess whether workers are present or absent from their jobs. They do not offer information about the worker's participation in the job or the degree to which the worker is able to respond to the job's demands (12,13).



at work along their professional life course, and the existing continuum between working successfully at one extreme and work absence at the other (14).

Outcome measures that can describe the extent to which workers increase or decrease their ability to meet job demands, and that allow the effectiveness of rehabilitation programs and interventions to be fully assessed, are needed in Spanish-speaking occupational health settings, yet there is a lack of high-quality validated instruments in Spanish for this purpose. Thus, the rationale for this thesis is to provide an evidence base for an instrument to evaluate health-related work functioning, and make it available to Spanish-speaking occupational health professionals and researchers for use in daily practice and research.

1.2. From work disability to health-related work functioning

Disability can be described as the environmentally determined effect of an impairment that, in interaction with other factors and within a specific social context, is likely to cause an individual to experience an undue disadvantage in his or her personal, social or professional life (15). Work disability could be defined as the effect of an illness or an accident on the ability of a person to perform a particular work activity.

Disability is not an absolute attribute of an individual; rather, it is a social construct. A person who is blind, or deaf, or needs a wheelchair to move can be completely dependent in one setting, but fully autonomous and functional in a different one. Thus, the effect of an impairment must always be considered in relation to a given environment, and if we restrict disability to the functional effects of this impairment, regardless of the environment, we put the burden of the problem and the responsibility to find a solution on the individual.

From a social perspective, work disability should be understood as a manageable situation, where different stakeholders (workers, employers, human resource managers, supervisors, unions and occupational health professionals) should be involved to respond to an individual’s needs so that he/she can function successfully at work. Disability is, therefore, a social rather than a medical issue, and from this perspective it is easier to understand that positive action towards integration and job participation is required, rather than merely passive measures to provide income support (15).

Within this paradigm, it is possible to analyze from a broader perspective the economic and social impact of removing barriers to the integration of individuals with disabilities. Imaginative and economically viable solutions covering a wider range of interventions may arise from this, varying from improving the workers’ skills (through training and rehabilitation programs), to facilitating accommodation in suitable workplaces or intervening to adapt the workplace and/or working conditions to the specific needs of these individuals.

A significant number of research teams and occupational health services are increasingly designing and implementing rehabilitation and/or accommodation programs to adapt working conditions to worker skills and health to support an active working life (7,8,16,17). Fully assessing intervention effectiveness requires outcome measures that describe the extent to which people increase their ability to meet the demands of the job.


Health-related work functioning is a comprehensive concept that incorporates the previously described paradigm shift, and can be defined as the ability of a worker to meet work demands for a given physical and emotional health status (18). Theoretically, working conditions and demands are modifiable and health is a dynamic concept that can change over a lifetime. Hence, health-related work functioning constitutes a continuum rather than a dichotomy, with “working successfully” at one end and “work absence” at the other. Measuring the impact of health on work in terms of “present” versus “absent” is not enough to understand what happens along this continuum (19). Given its bearing on the individual’s work performance and on-the-job productivity (Figure 1), and especially in the current European socio-economic context, health-related work functioning constitutes a phenomenon of great interest in occupational health care settings and research.

The rationale for this thesis arises from the need for high-quality, validated measurement instruments to assess health-related work functioning in Spanish-speaking settings. This will serve to enhance the evaluation of rehabilitation, accommodation or adaptation programs. The emphasis is on the ability of the instrument to measure the worker’s participation, and not only whether workers are present at or absent from their jobs.


[Figure 1. Labels recoverable from the diagram: Health Status; Work Demands; Work Functioning; Work Absence; Exhausting Oneself; Productive & Healthy; Working Successfully; Working Healthy; Participation; Business Productivity; Worker; Occupational Health Care; Human Resources Management; Organizational Context (Workplace System); Labour Market Context; Societal Context; Nondiscrimination; Confidentiality; Respect; Voluntariness.]

Figure 1. Conceptual frame of Health-Related Work Functioning based on Amick, Gimeno (18) and Abma (19) and ethical use of the questionnaire.




A review of the literature on work outcome measures reveals different approaches to work outcome measurement; in general, four groups of measures can be identified (12). The first group assesses labor force status (mainly time to return to work and duration of functional disability). A second group assesses the economic impact of work outcomes (especially lost time from work and self-reported effectiveness in performing the job). A third set of measures assesses the impact of health on role functioning (mixing work-role with other roles). Finally, there is a group of work-role specific functioning measurement instruments that measure health-related functioning at work. Several studies and reviews have analyzed the strengths and weaknesses of each group of measurement tools (12,18,20-26).

Focusing on the instruments that measure our phenomenon of interest (health-related work functioning), a number of health and/or job specific work functioning measurement instruments together with other generic instruments have been developed. The most relevant are shown in Table 1.

When measuring health-related work functioning in research and practice, evidence-based decisions should be made about which instrument to use. Evans recommends considering three areas when choosing a questionnaire: the psychometric properties of the instrument, administration complexity, and the setting of the evaluation (27). Firstly, it is essential to know the purpose for which the instrument will be used (in medicine, for example, it could be for diagnosis, evaluation or prediction) (28). Then, depending on this, it is necessary to find out whether the measurement properties of the instrument have been assessed using high-quality methodology.

If the instrument is going to be applied for diagnostic or prognostic purposes, such as to estimate work functioning status or to distinguish between different courses (or outcomes) of work functioning, evidence of its discriminative ability should be provided; in this case, parameters of reliability are very important (including those of measurement error). But if the aim is to apply the instrument to evaluate interventions or to monitor work functioning in individuals, the instrument needs to provide evidence of its ability to detect (true) changes over time; in this case, parameters of responsiveness (on top of measurement error) are crucial (28).
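
The link between measurement error and the ability to detect true change can be illustrated with a minimal sketch. It assumes two formulas that are widely used in the measurement literature but are not stated in this text: the standard error of measurement, SEM = SD × sqrt(1 − reliability), and the smallest detectable change for an individual, SDC = 1.96 × sqrt(2) × SEM. The scores, the reliability coefficient and the function names below are hypothetical and serve only as an illustration.

```python
import math
import statistics


def standard_error_of_measurement(scores, reliability):
    """SEM = SD * sqrt(1 - reliability), with SD the sample standard deviation."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return statistics.stdev(scores) * math.sqrt(1.0 - reliability)


def smallest_detectable_change(sem, z=1.96):
    """Smallest change in one individual that exceeds measurement error (z = 1.96 for about 95%)."""
    return z * math.sqrt(2.0) * sem


# Hypothetical baseline scores on a 0-100 scale and an assumed reliability of 0.91.
baseline_scores = [55.0, 60.0, 70.0, 75.0, 80.0, 85.0, 90.0, 95.0]
sem = standard_error_of_measurement(baseline_scores, reliability=0.91)
sdc = smallest_detectable_change(sem)

# Hypothetical change in one worker's score after an intervention.
observed_change = 15.0
exceeds_error = observed_change > sdc
print(f"SEM={sem:.2f}, SDC={sdc:.2f}, change exceeds measurement error: {exceeds_error}")
```

Under these assumptions, a change smaller than the SDC cannot be distinguished from measurement error in an individual worker, which is why measurement error matters for both discriminative and evaluative uses of an instrument.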

It is also necessary to know in which language or culture the questionnaire was originally developed. If the intention is to use it in a different language, then it is necessary to determine whether the process of cross-cultural adaptation and validation in the target language employed quality evidence-based methods.

In the 2000s a series of specific work-role functioning questionnaires were developed; among them, the Work Limitations Questionnaire (WLQ) and the Work Role Functioning Questionnaire (WRFQ) (12,29) were developed as generic instruments to measure work functioning. These instruments provide an overall work functioning score, but also allow an estimation of work functioning in relation to each domain of work demands (work scheduling, output, physical, mental and social demands).

The WRFQ measures perceived difficulties in performing the job due to health problems. As mentioned above, it is a generic instrument conceptually developed to represent a wide range of health conditions and work demands and is freely available in the literature for professionals and researchers. The questionnaire has undergone various levels of validity and reliability testing and has displayed relevant levels of reliability and content, construct and criterion validity. Numerous studies have demonstrated the usefulness of this tool in English-speaking health care environments (30-32) and it has been successfully translated, adapted and validated in Canadian French (33), Brazilian Portuguese (34) and Dutch (14,19,35). No such version exists in Spanish.


Table 1. Specific and generic work functioning (WF) measurement instruments.

Acronym  Name of the Instrument                                   Type                                                   Reference
HPQ      Health and Work Performance Questionnaire                Generic WF instrument. Single global rating.           (36)
WPAI     Work Productivity and Activity Impairment Questionnaire  Generic WF instrument. Single global rating.           (37)
WPSI     Work Productivity Short Inventory                        Generic WF instrument. Single global rating.           (38)
WHI      Work and Health Interview                                Specific WF instrument for lost productive time.       (39)
HLQ      Health and Labor Questionnaire                           Generic WF instrument. Overall and subscales rating.   (40)
HRPQ-D   Health Related Productivity Questionnaire Diary          Specific WF instrument for daily follow-up.            (41)
QQ       Quantity and Quality Instrument                          Specific WF for quantity and quality of work.          (42)
HAQ      Health Assessment Questionnaire                          Specific WF for rheumatic conditions.                  (43)
−        Angina-related Limitations at Work Questionnaire         Specific WF for angina pectoris.                       (44)
WALS     Workplace Activity Limitations Scale                     Specific WF for arthritic population.                  (45)
LEAPS    Lam Employment Absence and Productivity Scale            Specific WF for clinically depressed population.       (46)
NWFQ     Nurses Work Functioning Questionnaire                    Specific WF for nurses with common mental disorders.   (47)
EWPS     Endicott Work Productivity Scale                         Generic WF instrument. Overall rating.                 (48)
SPS      Stanford Presenteeism Scale                              Generic WF instrument. Overall rating.                 (49)
WLQ      Work Limitations Questionnaire                           Generic WF instrument. Overall and subscales rating.   (29)
WRFQ     Work Role Functioning Questionnaire                      Generic WF instrument. Overall and subscales rating.   (12)




Since measurement is at the core of occupational health research and practice, access to quality measurement instruments is essential. The importance of ensuring that an instrument is well designed and that its content is appropriate for measuring what it claims to measure should not be underestimated. In absolute terms, valid instruments do not exist. Validating a measuring instrument is a process, sometimes complex, in which a base of evidence has to be constructed to support that the instrument meets a number of measurement properties. When quality evidence is provided about the presence or absence of these properties, it is possible to assign a degree of quality to the instrument for a specific purpose. Hence, the methodology used to carry out a validation process becomes the most important determinant to accept or reject the quality of a measurement instrument.

This process becomes more challenging when a measurement instrument developed in a particular language or culture is to be used in a different one. In these cases, a simple (direct) translation of the questionnaire could be unreliable, because misinterpretation could arise from language and cultural differences in the perception of work, health and/or disease. In these circumstances, it is necessary to perform a cross-cultural validation, following a systematic procedure. For several authors the cross-cultural validation is part of the construct validation and should be assessed to guarantee the validity of the instrument (28,50-52).

There are several approaches in the literature to address the validation process of a measuring instrument. Some approaches come from internationally renowned experts in the design and validation methodology of questionnaires (28,53-59). Others come from different research groups that have achieved international standards. Among the latter, the following stand out: the consensus-based standards on terminology and recommendations to assess the methodological quality of studies on measurement properties of health status measurement instruments (COSMIN) (50-52); the standardized methodology for evaluating the measurement of patient-reported outcomes (EMPRO) to assist the choice of instruments (60); the methodology of the Health Technology Assessment Programme (HTA Programme) to evaluate patient-based outcome measures for use in clinical trials (61); and the criteria proposed by the Scientific Advisory Committee of the Medical Outcomes Trust (62).

To state that a questionnaire has been validated, it is necessary to provide evidence about certain features: 1) whether an instrument measures what it purports to measure, 2) how it reflects the theory underlying the phenomenon being measured, 3) the degree to which the scores are an adequate reflection of a gold standard, 4) the extent to which the scores of the instrument are consistent with stated hypotheses, 5) the degree of simplicity, feasibility and acceptability to patients, users and researchers, 6) the ability to measure free from error and, therefore, the ability to provide reproducible results when applied to individuals who have not changed over time, and 7) the sensitivity to detect true changes over time. All these features are related to three properties of the questionnaires: validity, reliability and responsiveness.
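
As an illustration of how one reliability parameter can be quantified, the following minimal sketch computes Cronbach's alpha, a commonly reported index of internal consistency. The choice of this statistic is an assumption made for illustration, since the text does not name a specific coefficient. The response matrix and the function name are hypothetical.

```python
import statistics


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(item_scores[0])
    if n_items < 2:
        raise ValueError("at least two items are needed")
    # Sample variance of each item across respondents.
    item_variances = [
        statistics.variance([row[i] for row in item_scores]) for i in range(n_items)
    ]
    # Sample variance of the respondents' total scores.
    total_variance = statistics.variance([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 respondents x 4 items, each item scored 0-4.
responses = [
    [4, 4, 3, 4],
    [3, 3, 3, 2],
    [2, 1, 2, 2],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Values close to 1 indicate that the items vary together across respondents. Internal consistency is only one facet of reliability; test-retest agreement and measurement error would need separate analyses.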

However, the terminology found in the literature can be confusing for several reasons. First, there are differences in terms used as synonyms for measurement properties (e.g. reliability, repeatability, stability, reproducibility and precision are used interchangeably). Second, there are different definitions given to the same concept (e.g. different authors give different definitions for responsiveness). Third, different research groups evaluate different properties and characteristics of the instruments when assessing their quality (e.g. evaluation of appropriateness, interpretability, acceptability or feasibility are recommended in some guides but not others). Fourth, there is a wide variety of classifications of measurement properties depending on authors and research groups (e.g. some authors, but not all, consider evaluating the cross-cultural adaptation as a part of construct validity; some consider responsiveness to be an aspect of validity, and also that face validity is an aspect of content validity).

In this thesis a comprehensive review of the literature was conducted to systematize the steps involved in validating a health questionnaire, the Work Role Functioning Questionnaire (WRFQ), following the methodological recommendations with the broadest consensus. Next, the requirements for conducting a quality cross-cultural adaptation of health questionnaires were defined in detail and the properties evaluated were based on those most frequently recommended by experts and consensus groups, and then applied to this questionnaire.


REFERENCES


1. WHO: Constitution of the World Health Organization [Internet]. Geneva: World Health Organization; 1948-2013. International Health Conference, 1948; [cited 2013 November 19]; Available from: http://apps.who.int/gb/bd/PDF/bd47/SP/constitucion-sp.pdf

2. Wadell G, Burton T, Aylward M. Work and common health problems. J Insur Med. 2007;39:109-20.

3. Butterworth P, Leach LS, Strazdins L, Olesen SC, Rodgers B, Broom DH. The psychosocial quality of work determines whether employment has benefits for mental health: results from a longitudinal national household panel survey. J Occup Environ Med. 2011;68:806-12.

4. ILO: Promoting jobs, protecting people [Internet]. Geneva: International Labor Organization; 1996-2013. Decent Work; [cited 2013 July 19]; [about 2 screens]; Available from: http://www.ilo.org/global/topics/decent-work/lang--es/index.htm

5. Ross D. Ageing and work: an overview. Occup Med (Lond). 2010;60:169-71.

6. Hairault JO, Langot F, Sopraseuth T. Distance to retirement and older workers employment: the case for delaying the retirement age. J Eur Economic Assoc. 2010;8:1034-76.

7. Macdonald EB, Sanati KA. Occupational health services now and in the future: the need for a paradigm shift. J Occup Environ Med. 2010;52:1273-7.

8. Sampere M, Gimeno D, Serra C, Plana M, Martínez JM, Delclos GL, Benavides FG. Organizational return to work support and sick leave duration: a cohort of Spanish workers with a long-term non-work-related sick leave episode. J Occup Environ Med. 2011;53:674-9.


9. Squires H, Rick J, Carroll C, Hillage J. Cost-effectiveness of interventions to return employees to work following long-term sickness absence due to musculoskeletal disorders. J Public Health (Oxf). 2012;34:115-24.

10. Noben CY, Nijhuis FJ, de Rijk AE, Evers SM. Design of a trial-based economic evaluation on the cost-effectiveness of employability interventions among work disabled employees or employees at risk of work disability: the CASE-study. BMC Public Health. 2012;18:12:43.

11. Arends I, Bültmann U, van Rhenen W, Groen H, van der Klink JJ. Economic evaluation of a problem solving intervention to prevent recurrent sickness absence in workers with common mental disorders. PLoS One. 2013;8:e71937.

12. Amick BC III, Lerner D, Rogers WH, Rooney T, Katz JN. A review of health-related work outcome measures and their uses and recommended measures. Spine. 2000;25:3152-60.

13. Baldwin ML, Johnson WG, Butler RJ. The error of using returns-to-work to measure the outcomes of health care. Am J Ind Med. 1996;29:632-41.

14. Abma FI, van der Klink JJ, Bültmann U. The work role functioning questionnaire 2.0 (Dutch version): examination of its reliability, validity and responsiveness in the general working population. J Occup Rehabil. 2013;23:135-47.

15. ILO Encyclopaedia of Occupational health and Safety [Internet]. Part III.

15. ILO Encyclopaedia of Occupational health and Safety [Internet]. Part III.

Management & Policy. Chapter 17: Disability and work [cited October 2013];

Management & Policy. Chapter 17: Disability and work [cited October 2013];

Available from: http://www.ilo.org/oshenc/part-iii/disability-and-work/item/170-

Available from: http://www.ilo.org/oshenc/part-iii/disability-and-work/item/170-

disability-concepts-and-definitions

disability-concepts-and-definitions

16. Arends I, Bruinvels DJ, Rebergen DS, Nieuwenhuijsen K, Madan I, Neumeyer-

16. Arends I, Bruinvels DJ, Rebergen DS, Nieuwenhuijsen K, Madan I, Neumeyer-

Gromen A et al. Interventions to facilitate return to work in adults with

Gromen A et al. Interventions to facilitate return to work in adults with

adjustment disorders. Cochrane Database Syst Rev. 2012, Dec 12;12.

adjustment disorders. Cochrane Database Syst Rev. 2012, Dec 12;12.

15

15

17. Arends I, van der Klink JJ, Bültmann U. Prevention of recurrent sickness absence among employees with common mental disorders: design of a cluster-randomized controlled trial with cost-benefit and effectiveness evaluation. BMC Public Health. 2010;10:132.

18. Amick BC 3rd, Gimeno D. Measuring work outcomes with a focus on health-related work productivity loss. In: Wittink H, Carr D, editors. Evidence, Outcomes & Quality of Life in Pain Treatment: A Handbook for Pain Treatment Professionals. London, UK: Elsevier; 2007. pp. 329-43.

19. Abma FI. Work functioning: development and evaluation of a measurement tool [PhD thesis]. Groningen, NL: University of Groningen; 2012. [Internet]. Available from: http://irs.ub.rug.nl/ppn/351176438

20. Lofland JH, Pizzi L, Frick KD. A review of health-related workplace productivity loss instruments. Pharmacoeconomics. 2004;22:165-84.

21. Prasad M, Wahlqvist P, Shikiar R, Shih YT. A review of self-report instruments measuring health-related work productivity. Pharmacoeconomics. 2004;22:225-44.

22. Ozminkowski RJ, Goetzel RZ, Chang S, Long S. The application of two health and productivity instruments at a large employer. J Occup Environ Med. 2004;46:635-48.

23. Williams RM, Schmuck G, Allwood S, Sanchez M, Shea R, Wark G. Psychometric evaluation of health-related work outcome measures for musculoskeletal disorders: a systematic review. J Occup Rehabil. 2007;17:504-21.

24. Beaton DE, Tang K, Gignac MA, Lacaille D, Badley EM, Anis AH, et al. Reliability, validity, and responsiveness of five at-work productivity measures in patients with rheumatoid arthritis or osteoarthritis. Arthritis Care Res (Hoboken). 2010;62:28-37.

25. Nieuwenhuijsen K, Franche RL, van Dijk FJ. Work functioning measurement: tools for occupational mental health research. J Occup Environ Med. 2010;52:778-90.

26. Abma FI, van der Klink JJ, Terwee CB, Amick BC 3rd, Bültmann U. Evaluation of the measurement properties of self-reported health-related work-functioning instruments among workers with common mental disorders. Scand J Work Environ Health. 2012;38:5-18.

27. Evans CJ. Health and work productivity assessment: state of the art or state of flux? J Occup Environ Med. 2004;46(6 Suppl):S3-11.

28. de Vet HCW, Terwee CB, Mokkink LB, Knol DL. Measurement in medicine: A practical guide. 1st ed. Cambridge, UK: The University Press Cambridge, 2011.

29. Lerner D, Amick BC 3rd, Rogers WH, Malspeis S, Bungay K, Cynn D. The Work Limitations Questionnaire. Med Care. 2001;39:72-85.

30. Lerner DJ, Amick BC 3rd, Malspeis S, Rogers WH. A national survey of health-related work limitations among employed persons in the United States. Disabil Rehabil. 2000;22:225-32.

31. Schmidt LL, Amick BC 3rd, Katz JN, Ellis BB. Evaluation of an upper extremity student-role functioning scale using item response theory. Work. 2002;19:105-16.

32. Roy JS, MacDermid JC, Amick BC 3rd, Shannon HS, McMurtry R, Roth JH, et al. Validity and responsiveness of presenteeism scales in chronic work-related upper-extremity disorders. Phys Ther. 2011;91:254-66.

33. Durand MJ, Vachon B, Hong QN, Imbeau D, Amick BC 3rd, Loisel P. The cross-cultural adaptation of the work role functioning questionnaire in Canadian French. Int J Rehabil Res. 2004;27:261-8.

34. Gallasch CH, Alexandre NM, Amick BC 3rd. Cross-cultural adaptation, reliability, and validity of the work role functioning questionnaire to Brazilian Portuguese. J Occup Rehabil. 2007;17:701-11.

35. Abma FI, Amick BC 3rd, Brouwer S, van der Klink JJ, Bültmann U. The cross-cultural adaptation of the work role functioning questionnaire to Dutch. Work. 2012;43:203-10.

36. Kessler R, Barber C, Beck A, Berglund P, Cleary PD, McKenas D, et al. The World Health Organization health and work performance questionnaire (HPQ). J Occup Environ Med. 2003;45:156-74.

37. Reilly MC, Zbrozek AS, Dukes EM. The validity and reproducibility of a work productivity and activity impairment instrument. Pharmacoeconomics. 1993;4:353-65.

38. Goetzel RZ, Ozminkowski RJ, Long SR. Development and reliability analysis of the Work Productivity Short Inventory (WPSI) instrument measuring employee health and productivity. J Occup Environ Med. 2003;45:743-62.

39. Stewart WF, Ricci JA, Leotta C, Chee E. Validation of the work and health interview. Pharmacoeconomics. 2004;22:1127-40.

40. van Roijen L, Essink-Bot ML, Koopmanschap MA, Bonsel G, Rutten FF. Labor and health status in economic evaluation of health care. The Health and Labor Questionnaire. Int J Technol Assess Health Care. 1996;12:405-15.

41. Kumar RN, Hass SL, Li JZ, Nickens DJ, Daenzer CL, Wathen LK. Validation of the Health-Related Productivity Questionnaire Diary (HRPQ-D) on a sample of patients with infectious mononucleosis: results from a phase 1 multicenter clinical trial. J Occup Environ Med. 2003;45:899-907.

42. Meerding WJ, IJzelenberg W, Koopmanschap MA, Severens JL, Burdorf A. Health problems lead to considerable productivity loss at work among workers with high physical load jobs. J Clin Epidemiol. 2005;58:517-23.

43. Wolfe F, Michaud K, Pincus T. Development and validation of the health assessment questionnaire II: a revised version of the health assessment questionnaire. Arthritis Rheum. 2004;50:3296-305.

44. Lerner DJ, Amick BC 3rd, Malspeis S, Rogers WH, Gomes DR, Salem DN. The Angina-related Limitations at Work Questionnaire. Qual Life Res. 1998;7:23-32.

45. Gignac MA, Badley EM, Lacaille D, Cott CC, Adam P, Anis AH. Managing arthritis and employment: making arthritis-related work changes as a means of adaptation. Arthritis Rheum. 2004;51:909-16.

46. Lam RW, Michalak EE, Yatham LN. A new clinical rating scale for work absence and productivity: validation in patients with major depressive disorder. BMC Psychiatry. 2009;9:78.

47. Gärtner FR, Nieuwenhuijsen K, van Dijk FJ, Sluiter JK. Psychometric properties of the Nurses Work Functioning Questionnaire (NWFQ). PLoS One. 2011;6:e26565.

48. Endicott J, Nee J. Endicott Work Productivity Scale (EWPS): a new measure to assess treatment effects. Psychopharmacol Bull. 1997;33:13-6.

49. Koopman C, Pelletier KR, Murray JF, Sharda CE, Berger ML, Turpin RS, et al. Stanford presenteeism scale: health status and employee productivity. J Occup Environ Med. 2002;44:14-20.

50. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63:737-45.

51. Mokkink LB, Terwee CB, Knol DL, Stratford PW, Alonso J, Patrick DL, et al. The COSMIN checklist for evaluating the methodological quality of studies on measurement properties: A clarification of its content. BMC Med Res Methodol. 2010;10:22.

52. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res. 2010;19:539-49.

53. Guillemin F. Cross-cultural adaptation and validation of health status measures. Scand J Rheumatol. 1995;24:61-3.

54. Beaton DE, Bombardier C, Guillemin F, Bosi Ferraz M. Guidelines for the process of cross-cultural adaptation of self-report measures. Spine. 2000;25:3186-91.

55. Aday LA, Cornelius LJ. Designing and conducting health surveys: a comprehensive guide. 3rd ed. San Francisco, CA: Jossey-Bass; 2006.

56. Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. 4th ed. New York: Oxford University Press Inc.; 2008.

57. García de Yébenes MJ, Rodriguez-Salvanés F, Carmona-Ortells L. Validación de cuestionarios. Reumatol Clin. 2009;5:171-7.

58. Keszei AP, Novak M, Streiner DL. Introduction to health measurement scales. J Psychosom Res. 2010;68:319-23.

59. Terwee CB, Bot SD, de Boer MR, van der Windt DA, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60:34-42.

60. Valderas JM, Ferrer M, Mendívil J, Garin O, Rajmil L, Herdman M, et al. Development of EMPRO: a tool for the standardized assessment of patient-reported outcome measures. Value Health. 2008;11:700-8.

61. Fitzpatrick R, Davey C, Buxton MJ, Jones DR. Evaluating patient-based outcome measures for use in clinical trials. Health Technol Assessment. 1998;2:1-74.

62. Aaronson N, Alonso J, Burnam A, Lohr KN, Patrick DL, Perrin E, et al. Assessing health status and quality-of-life instruments: attributes and review criteria. Qual Life Res. 2002;11:193-205.

2. OBJECTIVES

2.1. Study I Objectives

To review the literature on the methodology for cross-cultural adaptation and validation (CCAV) of health questionnaires and to synthesize recommendations based on the scientific literature to facilitate this process.

To evaluate the degree of compliance with the methodological recommendations for the CCAV of health questionnaires in a selection of Spanish-language scientific journals.

2.2. Study II Objectives

To translate and adapt the Work Role Functioning Questionnaire to Spanish spoken in Spain.

To perform a preliminary evaluation of the psychometric properties of the Spanish version of the Work Role Functioning Questionnaire by means of a pre-test.

2.3. Study III Objective

To examine the reliability and validity of the Spanish version of the Work Role Functioning Questionnaire in a Spanish-speaking general working population.

2.4. Study IV Objective

To examine the responsiveness of the Spanish version of the Work Role Functioning Questionnaire.

3. PAPER # 1

Adaptación cultural y validación de cuestionarios de salud: revisión y recomendaciones metodológicas. Salud Pública de México. 2013;55:57-66.

Adaptación y validación de cuestionarios


Artículo de revisión


Adaptación cultural y validación de cuestionarios de salud: revisión y recomendaciones metodológicas


José María Ramada-Rodilla, MD, MOH,(1,2) Consol Serra-Pujadas, MD, PhD,(1,2,3) George L Delclós-Clanchet, MD, MPH, PhD.(2,3,4)


Ramada-Rodilla JM, Serra-Pujadas C, Delclós-Clanchet GL. Adaptación cultural y validación de cuestionarios de salud: revisión y recomendaciones metodológicas. Salud Publica Mex 2013;55:57-66.


Ramada-Rodilla JM, Serra-Pujadas C, Delclós-Clanchet GL. Cross-cultural adaptation and health questionnaires validation: revision and methodological recommendations. Salud Publica Mex 2013;55:57-66.


Resumen La traducción simple de un cuestionario puede dar lugar a interpretaciones erróneas debido a diferencias culturales y de lenguaje. Cuando se utilicen cuestionarios desarrollados en otros países e idiomas en estudios científicos, además de traducirlos, es necesaria su adaptación cultural y validación. El objetivo de este trabajo es revisar la literatura sobre la traducción, adaptación cultural y validación (TACV) de cuestionarios de salud, y sintetizar y proponer recomendaciones basadas en la literatura científica que faciliten este proceso. La TACV debe seguir un proceso sistematizado, por lo que se recomiendan dos etapas: a) adaptación cultural: traducción directa, síntesis, traducción inversa, consolidación por comité de expertos y pre-test, y b) validación (con hasta siete pasos): evaluación de la consistencia interna, fiabilidad intra e interobservador, validez lógica, de contenido, criterio y constructo. La falta de equivalencia de los cuestionarios limita las posibilidades de comparación entre poblaciones con idiomas o culturas diferentes y el intercambio de información en la comunidad científica.

Abstract The simple translation of a questionnaire may lead to misinterpretation due to language and cultural differences. When questionnaires developed in other countries and languages are used in scientific studies, translation alone is not enough: cross-cultural adaptation and validation are also necessary. Our objective was to review the literature on cross-cultural adaptation and validation (CCAV) of health questionnaires, and to synthesize and propose recommendations based on the scientific literature to facilitate this process. The CCAV should follow a systematic process. Two stages are recommended: 1) cross-cultural adaptation: direct translation, synthesis, back translation, expert committee consolidation and pre-testing, and 2) validation (with up to seven steps): assessment of internal consistency, intra- and interobserver reliability, and face, content, criterion and construct validity. Lack of equivalence between questionnaires limits the comparability of results among populations with different cultures and languages and the exchange of information in the scientific community.


Palabras clave: cuestionarios; escalas; encuestas de salud; comparación transcultural; estudios de validación; confiabilidad y validez

Key words: questionnaires; scales; health survey; cross-cultural comparison; validation studies; reliability and validity


(1) Servicio de Salud Laboral, Parc de Salut MAR. Barcelona, Spain.
(2) Centro de Investigación en Salud Laboral (CiSAL), Universidad Pompeu Fabra. Barcelona, Spain.
(3) CIBER de Epidemiología y Salud Pública (CIBERESP). Barcelona, Spain.
(4) Epidemiology, Human Genetics and Environmental Sciences Division, The University of Texas School of Public Health. Houston, Texas, USA.

Received: 2 January 2012 • Accepted: 21 September 2012
Corresponding author: José Ma. Ramada Rodilla. Centro de Investigación en Salud Laboral, Universidad Pompeu Fabra. Dr. Aiguader, 88, 08003-Barcelona, Spain. E-mail: [email protected]

Salud Pública de México / vol. 55, no. 1, January-February 2013


Imagine a researcher administering a British questionnaire to a sample of German pedestrians. The questionnaire asks about the habit of “looking right” before crossing a two-way road. The results would probably suggest a gap in the road-safety education of German pedestrians, since they do not look right when they cross. This finding, however, would really reflect an inadequate cultural adaptation of the questionnaire: in Germany traffic drives on the right and, therefore, pedestrians “look left” before crossing. The simple translation of a questionnaire may lead to misinterpretation due to cultural and language differences. If the process of translation, cross-cultural adaptation and validation (CCAV) is not carried out correctly, errors of various kinds may arise, depending on the purpose of the questionnaire. An inadequate CCAV of questionnaires such as the Goldberg General Health Questionnaire (GHQ),1 the Nordic Occupational Skin Questionnaire (NOSQ),2 the Asthma Control Test (ACT)3 or the Michigan Alcohol Screening Test (MAST)4 would cause misclassification when screening patients for anxiety-depressive disorders, occupational dermatoses, asthma or alcoholism. Deficiencies in the CCAV of questionnaires such as the Work Ability Index (WAI)5 or the Work Role Functioning Questionnaire (WRFQ)6 could lead to errors in assessing the degree of work ability, affecting the choice of preventive measures.
An unsystematic CCAV of questionnaires for the epidemiological surveillance of diseases and exposures, such as the Epidemiological Screening Questionnaire for Rheumatoid Arthritis,7,8 the Standardized Nordic Questionnaire for the detection of musculoskeletal symptoms in occupational health,9 or the Questionnaire for the Integrated Detection of Obesity, Diabetes and Arterial Hypertension of the Mexican Ministry of Health (Secretaría de Salud),10 could lead to the design and implementation of inadequate public policies. CCAV is necessary even when a questionnaire is to be applied in different countries that share the same language. It is sometimes assumed that cultural adaptation to a different language guarantees the psychometric properties of the questionnaire. This is not always so. For example, differences in how work is carried out from one country to another can alter the validity of a questionnaire used in occupational health.11-13 The need to exchange experiences and to compare different populations and countries requires properly adapted and validated language versions of measurement instruments.14,15


Compliance with the methodological steps recommended in the international literature for CCAV is low. To substantiate this claim, all articles published in five of the epidemiology and public health journals with the highest impact factor in Latin America and Spain (Revista Panamericana de Salud Pública, Revista de Saúde Pública, Salud Pública de México, Gaceta Sanitaria and Revista Española de Salud Pública) were retrieved, with no time or language limits, using the MeSH terms: questionnaires, scales, health surveys, cross-cultural comparison, validation studies, reliability and validity. Articles were included if their objective was the CCAV of a questionnaire into a language other than the original. Articles were excluded if they aimed to design and validate a questionnaire, or only to validate a questionnaire whose cultural adaptation had been published in an earlier study. A total of 32 articles were obtained and analyzed in full text. Of these, 25% followed fewer than half of the recommended steps; 72% followed fewer than 80% of them, and only 6% followed all of them (table I). No review was identified in the literature that integrates and systematizes the whole CCAV process. The objective of this work was therefore to review and synthesize the literature and to propose recommendations that facilitate the CCAV process for its application to health questionnaires.
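The compliance figures in this paragraph can be read as the share of recommended steps that each article reports. The following is a minimal Python sketch of that calculation, not the authors' scoring procedure. It assumes the checklist consists of the five adaptation steps and seven validation steps named in the abstract, which may differ from the items actually scored in table I; the function names and the two example articles are hypothetical.

```python
# Recommended steps as named in the abstract. This is an ASSUMED checklist,
# not necessarily the exact items scored in table I of the paper.
ADAPTATION_STEPS = [
    "direct translation",
    "synthesis",
    "back translation",
    "expert committee consolidation",
    "pre-test",
]
VALIDATION_STEPS = [
    "internal consistency",
    "intra-observer reliability",
    "inter-observer reliability",
    "face validity",
    "content validity",
    "criterion validity",
    "construct validity",
]
RECOMMENDED_STEPS = ADAPTATION_STEPS + VALIDATION_STEPS


def compliance(steps_reported):
    """Return the fraction of recommended steps that an article reports."""
    reported = set(steps_reported) & set(RECOMMENDED_STEPS)
    return len(reported) / len(RECOMMENDED_STEPS)


def summarize(articles):
    """Share of articles below 50%, below 80%, and at 100% compliance."""
    scores = [compliance(steps) for steps in articles.values()]
    n = len(scores)
    return {
        "below_half": sum(s < 0.5 for s in scores) / n,
        "below_80_percent": sum(s < 0.8 for s in scores) / n,
        "all_steps": sum(s == 1.0 for s in scores) / n,
    }


# Hypothetical articles, for illustration only (not data from the review).
example_articles = {
    "article_a": ["direct translation", "back translation", "pre-test"],
    "article_b": RECOMMENDED_STEPS,
}
summary = summarize(example_articles)
```

Note that the three categories overlap by construction: an article that follows fewer than half of the steps is also counted among those following fewer than 80%, which is why the percentages reported in the text do not sum to 100%.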

Material y métodos Se realizó una revisión bibliográfica exhaustiva para localizar la información disponible sobre la metodología de la TACV de cuestionarios de salud. La búsqueda bibliográfica se inició con la revisión de varios libros y monografías especializadas en metodología para el diseño, adaptación y validación de cuestionarios publicados entre 1996 y 2007.11,16-21 A partir de las citas bibliográficas de dichas publicaciones, se recuperaron diversos artículos sobre la TACV de cuestionarios de salud y sus aspectos metodológicos, que estuvieran publicados en inglés, francés, italiano, español y portugués. Se seleccionaron las palabras clave que agrupaban un mayor número de términos y se contrastaron con el tesauro de Medline, identificando los términos (MeSH terms): 1) “health survey”; 2) “health questionnaire”; 3) “scale”; 4) “cross cultural adaptation”; 5) “validation”; 6) “validity”, y 7) reliability”. Con la combinación de estos términos se realizó la búsqueda en Medline, de tal manera que se obtuvieron 214 citas.



Imagine a researcher administering a British questionnaire to a sample of German pedestrians. The questionnaire asks about the habit of "looking right" before crossing a two-way road. The results would probably suggest a gap in the road-safety training of German pedestrians, since they do not look right when they cross. That finding, however, would really reflect inadequate cultural adaptation of the questionnaire: in Germany traffic drives on the right and people therefore "look left" before crossing. Simply translating a questionnaire can lead to misinterpretation because of cultural and linguistic differences. If the process of translation, cultural adaptation and validation (TACV, the Spanish acronym) is not carried out correctly, errors of various kinds can arise, depending on the purpose of the questionnaire. Inadequate TACV of questionnaires such as the Goldberg General Health Questionnaire (GHQ),1 the Nordic Occupational Skin Questionnaire (NOSQ),2 the Asthma Control Test (ACT)3 or the Michigan Alcohol Screening Test (MAST)4 would cause misclassification when screening patients for anxiety and depressive disorders, occupational dermatoses, asthma or alcoholism. Deficiencies in the TACV of questionnaires such as the Work Ability Index (WAI)5 or the Work Role Functioning Questionnaire (WRFQ)6 could lead to errors in assessing work ability and so misdirect preventive measures.
Unsystematic TACV of questionnaires used for the epidemiological surveillance of diseases and exposures, such as the Cuestionario de Detección Epidemiológica para Artritis Reumatoide,7,8 the Cuestionario Nórdico Estandarizado para la Detección de Síntomas Músculoesqueléticos en Salud Ocupacional,9 or the Cuestionario para la Detección Integrada de Obesidad, Diabetes e Hipertensión Arterial of the Mexican Secretaría de Salud,10 could even lead to the design and implementation of inadequate public policies. TACV is necessary even when a questionnaire is to be used in different countries that share the same language. It is sometimes assumed that cultural adaptation to a different language guarantees the psychometric properties of the questionnaire. This is not always so. For example, differences in how work is carried out from one country to another can alter the validity of a questionnaire used in occupational health.11-13 The need to share experience and to compare different populations and countries calls for properly adapted and validated language versions of measurement instruments.14,15



Compliance with the methodological steps that the international literature recommends for TACV is low. To substantiate this claim, all articles published in five of the epidemiology and public health journals with the highest impact factor in Latin America and Spain (Revista Panamericana de Salud Pública, Revista de Saúde Pública, Salud Pública de México, Gaceta Sanitaria and Revista Española de Salud Pública) were retrieved, with no limit on date or language, using the MeSH terms questionnaires, scales, health surveys, cross-cultural comparison, validation studies, reliability and validity. Articles were included if their aim was the TACV of a questionnaire into a language other than the original. Articles were excluded if they aimed to design and validate a questionnaire, or to validate one whose cultural adaptation had been published in an earlier study. In total, 32 articles were obtained and analysed in full text. Of these, 25% followed fewer than half of the recommended steps, 72% followed fewer than 80% of them, and only 6% followed all of them (Table I). No review was identified in the literature that integrates and systematises the whole TACV process, so the aim of this work was to review and synthesise the literature and to propose recommendations that facilitate the TACV of health questionnaires.

Material and methods

A comprehensive literature review was carried out to locate the available information on the methodology of TACV of health questionnaires. The search began with a review of several books and specialised monographs on methodology for the design, adaptation and validation of questionnaires, published between 1996 and 2007.11,16-21 From the reference lists of those publications, various articles on the TACV of health questionnaires and its methodological aspects were retrieved, provided they were published in English, French, Italian, Spanish or Portuguese. The keywords that grouped the largest number of terms were selected and checked against the Medline thesaurus, which identified the following MeSH terms: 1) "health survey"; 2) "health questionnaire"; 3) "scale"; 4) "cross cultural adaptation"; 5) "validation"; 6) "validity", and 7) "reliability". A Medline search combining these terms yielded 214 citations.


salud pública de méxico / vol. 55, no. 1, enero-febrero de 2013

Table I. Compliance with the methodological steps for the TACV of questionnaires published in the journals GS, RESP, SPM, RDSP and RPSP, with no limit on date or language, up to 1 November 2011. Barcelona, Spain, November 2011

Column groups: cultural adaptation (forward translation to pre-test); validation, reliability (internal consistency to inter-observer reliability); validation, validity (face validity to construct validity).

| Article | Journal | Forward translation | Synthesis of translations | Back translation | Expert committee | Pre-test | Internal consistency | Test-retest | Inter-observer reliability | Face validity | Content validity | Criterion validity | Construct validity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mas Pons 1998 | RESP | Yes | Yes | Yes | Yes | Yes | No | No | NA | No | No | No | No |
| López-Alvarenga 2001 | SPM | Yes | No | No | No | No | No | Yes | NA | No | No | Yes | Yes |
| Amaral-Pinheiro 2002 | RDSP | Yes | No | No | No | Yes | No | No | NA | No | Yes | Yes | No |
| Serra-Sutton 2002 | RESP | Yes | Yes | Yes | Yes | Yes | No | No | NA | No | No | NA | No |
| López-Vázquez 2004 | SPM | Yes | No | No | No | No | Yes | No | No | No | No | NA | Yes |
| Guimaraes de Mello 2004 | RDSP | Yes | Yes | Yes | Yes | Yes | Yes | Yes | NA | No | No | No | No |
| Melgar-Quiñonez 2005 | SPM | NA | NA | NA | Yes | Yes | No | No | NA | Yes | Yes | Yes | No |
| Avanci 2005 | RDSP | Yes | No | Yes | Yes | No | Yes | Yes | NA | Yes | Yes | No | Yes |
| Aymerich 2005 | GS | Yes | Yes | Yes | Yes | Yes | Yes | No | NA | Yes | Yes | No | Yes |
| Torres 2005 | RDSP | Yes | Yes | Yes | Yes | Yes | Yes | Yes | NA | Yes | Yes | No | No |
| Majdalani 2005 | RPSP | NA | NA | NA | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No |
| Rodriguez da Silva 2005 | RDSP | Yes | No | Yes | Yes | Yes | Yes | Yes | No | Yes | No | Yes | Yes |
| López-Carmona 2006 | SPM | NA | NA | NA | Yes | No | Yes | Yes | NA | No | No | Yes | No |
| Carpio 2006 | RPSP | Yes | No | No | No | No | No | No | Yes | Yes | Yes | Yes | Yes |
| Álvarez 2006 | SPM | NA | NA | NA | No | Yes | Yes | No | NA | No | No | Yes | Yes |
| Reichenheim 2007 | RDSP | Yes | Yes | Yes | Yes | Yes | Yes | Yes | NA | Yes | Yes | Yes | Yes |
| Esteva 2007 | GS | Yes | No | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | No | Yes |
| Pinto Guedes 2007 | RDSP | Yes | Yes | Yes | Yes | Yes | No | Yes | NA | Yes | Yes | NA | Yes |
| Peña de León 2007 | RPSP | Yes | Yes | Yes | Yes | Yes | Yes | No | NA | No | No | Yes | Yes |
| Remor 2007 | RDSP | Yes | Yes | Yes | No | No | Yes | No | NA | No | No | Yes | No |
| Aguirre Jaime 2008 | RESP | NA | NA | NA | Yes | No | Yes | No | No | Yes | Yes | Yes | No |
| González-Block 2008 | SPM | Yes | No | No | No | No | No | No | No | Yes | Yes | Yes | No |
| Pedro Gómez 2009 | RESP | Yes | No | Yes | Yes | No | Yes | No | NA | No | No | NA | Yes |
| Zurbarán 2009 | RPSP | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No | No | Yes | No |
| Martínez-Gómez 2009 | RESP | Yes | No | Yes | Yes | Yes | Yes | Yes | NA | Yes | Yes | Yes | No |
| Shinohara 2010 | RDSP | Yes | Yes | Yes | Yes | Yes | Yes | Yes | NA | No | No | Yes | Yes |
| Silva 2010 | RPSP | Yes | Yes | Yes | Yes | Yes | Yes | Yes | NA | Yes | Yes | No | No |
| de Souza-Machado 2010 | RDSP | Yes | No | No | No | Yes | NA | Yes | NA | No | No | No | No |
| Garrido-Urrutia 2010 | RESP | Yes | Yes | No | No | No | Yes | Yes | NA | Yes | Yes | NA | No |
| Gutiérrez Sánchez 2011 | RESP | Yes | No | Yes | No | No | Yes | No | NA | No | No | NA | Yes |
| Amaral Saliba 2011 | RPSP | Yes | Yes | Yes | Yes | Yes | Yes | Yes | NA | Yes | Yes | Yes | Yes |
| de Barrios Leite 2011 | RDSP | Yes | Yes | Yes | Yes | Yes | No | Yes | NA | No | Yes | No | Yes |

TACV: translation, cultural adaptation and validation. GS: Gaceta Sanitaria. RESP: Revista Española de Salud Pública. SPM: Salud Pública de México. RDSP: Revista de Saúde Pública. RPSP: Revista Panamericana de Salud Pública. Yes: step performed. No: step not performed. NA: not applicable.
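The summary figures quoted for Table I (25%, 72% and 6% of 32 articles) can be rechecked with a short script. The sketch below is not the authors' own calculation. It assumes that steps marked NP (not applicable) are left out of each article's denominator, and the one-letter coding and the names `TABLE_I` and `compliance` are introduced here purely for illustration. Under that assumption the counts come out at 8, 23 and 2 of 32 articles, which round to the reported percentages.

```python
# Each row: article label, then 12 one-letter codes in Table I column order
# (S = step performed, N = step not performed, P = NP / not applicable).
# The coding and all names are illustrative, not part of the source.
TABLE_I = """
Mas Pons 1998|SSSSSNNPNNNN
López-Alvarenga 2001|SNNNNNSPNNSS
Amaral-Pinheiro 2002|SNNNSNNPNSSN
Serra-Sutton 2002|SSSSSNNPNNPN
López-Vázquez 2004|SNNNNSNNNNPS
Guimaraes de Mello 2004|SSSSSSSPNNNN
Melgar-Quiñonez 2005|PPPSSNNPSSSN
Avanci 2005|SNSSNSSPSSNS
Aymerich 2005|SSSSSSNPSSNS
Torres 2005|SSSSSSSPSSNN
Majdalani 2005|PPPSSSSSSSSN
Rodriguez da Silva 2005|SNSSSSSNSNSS
López-Carmona 2006|PPPSNSSPNNSN
Carpio 2006|SNNNNNNSSSSS
Álvarez 2006|PPPNSSNPNNSS
Reichenheim 2007|SSSSSSSPSSSS
Esteva 2007|SNSSSSSNSSNS
Pinto Guedes 2007|SSSSSNSPSSPS
Peña de León 2007|SSSSSSNPNNSS
Remor 2007|SSSNNSNPNNSN
Aguirre Jaime 2008|PPPSNSNNSSSN
González-Block 2008|SNNNNNNNSSSN
Pedro Gómez 2009|SNSSNSNPNNPS
Zurbarán 2009|SSSSSSNNNNSN
Martínez-Gómez 2009|SNSSSSSPSSSN
Shinohara 2010|SSSSSSSPNNSS
Silva 2010|SSSSSSSPSSNN
de Souza-Machado 2010|SNNNSPSPNNNN
Garrido-Urrutia 2010|SSNNNSSPSSPN
Gutiérrez Sánchez 2011|SNSNNSNPNNPS
Amaral Saliba 2011|SSSSSSSPSSSS
de Barrios Leite 2011|SSSSSNSPNSNS
"""


def compliance(codes: str) -> float:
    """Share of applicable steps performed (assumption: NP is excluded)."""
    applicable = [c for c in codes if c != "P"]
    return sum(c == "S" for c in applicable) / len(applicable)


rows = [line.split("|") for line in TABLE_I.strip().splitlines()]
shares = {name: compliance(codes) for name, codes in rows}

n_articles = len(shares)
below_half = sum(s < 0.5 for s in shares.values())
below_80 = sum(s < 0.8 for s in shares.values())
all_steps = sum(s == 1.0 for s in shares.values())

for label, count in [("< 50% of steps", below_half),
                     ("< 80% of steps", below_80),
                     ("all steps", all_steps)]:
    print(f"{label}: {count}/{n_articles} = {round(100 * count / n_articles)}%")
```

Counting NP steps in the denominator instead would change the first two counts, so the choice of denominator matters when comparing compliance across reviews.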


Adaptation and validation of questionnaires. Review article




The inclusion criteria were that the article dealt with methodological aspects of the TACV of health questionnaires and that it was published in one of the languages mentioned. On the basis of these criteria and a reading of the abstracts, 20 articles were selected and analysed in full text.12,13,15,22-38 A search of the grey literature was also carried out on the Internet, using as search criteria the keywords obtained and the authors identified in the previous step. In the end, seven books11,16-21 and 21 articles12-15,22-38 were included. From this review, a proposal was drawn up containing the methodological recommendations on which there was greatest consensus among authors, together with a glossary of the terms most commonly used in the TACV of questionnaires (Table II).

Synthesis and recommendations

There is broad consensus in recommending two stages for the TACV process: a) cultural adaptation, which must take into account idiomatic expressions, the cultural context and differences in how populations perceive health and illness, and b) validation in the target language, to assess how far the psychometric properties have been preserved.

First stage: translation and cultural adaptation

In this stage the instrument is translated from its original version, keeping the structure of the questionnaire as far as possible. The aim is for the resulting instrument to maintain semantic, idiomatic, conceptual and experiential equivalence with the original questionnaire.22,23 There is consensus in the literature on how to approach this first stage,12,13,22-27 and a sequence of five steps is recommended (Figure 1):

Forward translation: a conceptual translation of the instrument is produced. At least two independent bilingual translators whose mother tongue is the target language should take part. One of the translators should know the objectives and the concepts covered by the questionnaire and should have previous experience in technical translation. The other translator or translators should have no prior knowledge of the questionnaire and should be unaware of the objectives of the study. These translators will provide a translation closer to everyday language and will detect difficulties of comprehension and translation arising from technical or uncommon terms.


Table II. Glossary of terms commonly used in the translation and cultural adaptation of questionnaires. Barcelona, Spain, November 2011

Cross-cultural adaptation (adaptación cultural): taking into account the cultural context, idiomatic expressions and differences in the perception of health and illness in the populations in which the questionnaire is to be applied.
Internal consistency reliability (consistencia interna): the degree of interrelation and coherence among the components (items or variables) of the measurement instrument.
Construct (constructo): the theory underlying the phenomenon or concept to be measured. It is a quality that is not observable in a population of subjects.
Gold standard (criterio o prueba de referencia): an equivalent alternative measurement method that is independent of the questionnaire results, reliable, accurate, objective and widely accepted as a valid measure.
Scale (escala): the graduation used in various measurement instruments to make it possible to measure a magnitude.
Specificity (especificidad): the ability to detect individuals who do not present the phenomenon under study.
Reliability (fiabilidad): the degree to which an instrument can measure without error. It is the proportion of total variance attributable to true differences between subjects.
Inter-rater reliability (fiabilidad inter-observador): the degree of agreement between two or more raters who assess the same subjects with the same instrument.
Test-retest reliability (fiabilidad intra-observador o fiabilidad test-retest): the stability of the scores given by the same rater, to the same subjects and with the same method, at different times.
Item (ítem): each of the components or variables of a measurement instrument; each of the parts or units that make up a test or questionnaire.
Responsiveness (sensibilidad): the ability to detect and measure change, both between individuals and in the response of the same individual over time.
Translation (traducción): expressing in one language something previously expressed or written in another.
Forward translation (traducción directa): translation from a foreign language into the translator's own language.
Back translation (traducción inversa): translation of a text back into its original language, starting from a translation of that text previously made into another language.
Literal translation (traducción literal): translation that respects the meaning of the original text.
Validation (validación): assessment of how far the psychometric properties of the questionnaire have been preserved.
Validity (validez): the ability of the instrument to measure the construct for which it was designed.
Face validity (validez aparente): the degree to which the items of a questionnaire, in the judgement of experts and users, logically measure or adequately reflect the construct to be measured.
Content validity (validez de contenido): the degree to which the content of an instrument can measure most of the dimensions of the construct under study.
Construct validity (validez de constructo): the degree to which the measurements derived from the questionnaire responses can be regarded as a measurement of the phenomenon studied.
Criterion validity (validez de criterio): the degree to which the questionnaire result predicts or agrees with a "true value" criterion or gold standard.
Positive predictive value (valor predictivo positivo): the probability that the phenomenon under study is present in an individual when the questionnaire result is positive.
Negative predictive value (valor predictivo negativo): the probability that the phenomenon under study is absent in an individual when the questionnaire result is negative.
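Three of the glossary entries (specificity and the positive and negative predictive values) are simple ratios once questionnaire results have been cross-tabulated against a gold standard. The sketch below shows the arithmetic only. The function name `screening_metrics` and the counts are hypothetical and do not come from any study cited here.

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Ratios from a 2x2 table of questionnaire result vs gold standard.

    tp: questionnaire positive, phenomenon present
    fp: questionnaire positive, phenomenon absent
    fn: questionnaire negative, phenomenon present
    tn: questionnaire negative, phenomenon absent
    """
    return {
        # Share of individuals without the phenomenon who test negative.
        "specificity": tn / (tn + fp),
        # Probability the phenomenon is present given a positive result.
        "positive_predictive_value": tp / (tp + fp),
        # Probability the phenomenon is absent given a negative result.
        "negative_predictive_value": tn / (tn + fn),
    }


# Hypothetical counts, for illustration only.
metrics = screening_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike specificity, both predictive values depend on how common the phenomenon is in the sample, so they should not be carried over unchanged from the original population to the target population.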







The whole questionnaire, including the instructions, the items and the response options, will be translated using this method, and everything will be compiled in a report.

Synthesis of translations: the translators will compare the translations. Discrepancies between the translated versions will be identified and discussed until consensus is reached. If no consensus is reached, the research team will be asked to take part. Finally, a report on the process will be written, containing a single translation of the questionnaire, which will be the synthesis version in the target language.

Back translation: the synthesis version will be translated back into the original language by at least two professional bilingual translators whose mother tongue is that of the original questionnaire. The translators will work independently, will be blind to the original version of the questionnaire, will have no prior knowledge of the subject and will be unaware of the objectives of the study.12,13 They should mark difficult wordings and any uncertainties encountered during the translation process. It will then be determined whether the translation has produced important semantic or conceptual differences between the original questionnaire and the synthesis version obtained in the previous step. All of this will be compiled in a report.

Consolidation by an expert committee: it is recommended to set up a multidisciplinary committee, if possible of bilingual experts in the subject of the questionnaire: a methodologist, a linguist and a health professional, in addition to the translators who took part in the process. The aim of this committee is to arrive at a single consolidated pre-final questionnaire adapted to the target language.16,17 At this step the forward translations (step 1), the synthesis version (step 2) and the back translations (step 3) will be available. The discrepancies found will be identified and discussed. The committee will make sure that the pre-final version is fully comprehensible and equivalent to the original questionnaire, and that it can be understood by a person with schooling equivalent to that of a 12-year-old.

Figure 1. Process of translation, cultural adaptation and validation (adapted from reference 22). Barcelona, November 2011

First phase, translation and cultural adaptation: Step 1, forward translation; Step 2, synthesis of translations; Step 3, back translation; Step 4, consolidation by expert committee; Step 5, pre-test (feasibility). Outputs: process reports and the translated and culturally adapted version.
Second phase, validation: Reliability (1. internal consistency; 2. intra-observer reliability; 3. inter-observer reliability) and Validity (1. face or logical validity; 2. content validity; 3. criterion validity; 4. construct validity). Output: validated version.


If uncertainties arise, one of the authors of the questionnaire will be contacted, where possible, and asked to take part. A report summarizing the committee's decisions, including the consolidated version, will be prepared.

Pre-test (applicability / feasibility): the pre-test makes it possible to evaluate the quality of the translation, the cultural adaptation and the applicability or feasibility of the questionnaire. It also shows whether the completion time falls within reasonable limits. Researchers such as Durand et al.25 and Gallasch et al.26 carried out the pre-test during the translation and cultural adaptation of the Work Role Functioning Questionnaire (WRFQ-27) with a sample of 30-40 workers and obtained satisfactory results. De Soárez et al. did the same for the Work Limitations Questionnaire (WLQ), with 20 volunteers.27 Beaton proposed a sample of 30 to 40 participants, based on a literature review of cultural adaptations.22 The pre-test should include participants with different educational levels and, for self-administered questionnaires, participants must be able to read and understand what they read. To select the sample, it is important to define the inclusion and exclusion criteria and how participants will be recruited. For questionnaires used in occupational health, it is recommended that the pre-test include active workers of both sexes, working at least 10 hours per week, aged 18 to 65 years, with different educational levels, who speak the target language as their first language and, for self-administered questionnaires, can read and understand it.

At a minimum, data on sociodemographic characteristics, educational level and occupation will be collected from each participant.25,26 Participants will be asked to complete the consolidated version and, in a structured interview, invited to comment on anything they found difficult to understand. It is recommended that these interviews be recorded, with the participants' prior authorization, so that they can be reviewed as often as necessary. Finally, a report will be drawn up identifying any difficulties in understanding the questionnaire's instructions, questions and response options. Any question should be reviewed if at least 15% of participants have difficulty with it.27
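The 15% review rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the cited method: the function name `items_needing_review`, the item labels and the counts are hypothetical, and it assumes each interview has already been coded as "reported difficulty with this item" or not.

```python
def items_needing_review(difficulty_counts, n_participants, threshold_percent=15):
    """Return the items that at least `threshold_percent` of participants found difficult.

    difficulty_counts maps an item label to the number of pre-test participants
    who reported difficulty with that item (hypothetical coding of the interviews).
    """
    if n_participants <= 0:
        raise ValueError("n_participants must be positive")
    flagged = []
    for item, count in difficulty_counts.items():
        # Integer comparison avoids floating-point rounding at exactly 15%.
        if count * 100 >= threshold_percent * n_participants:
            flagged.append(item)
    return sorted(flagged)


# Invented example: 35 pre-test participants.
counts = {"item_01": 2, "item_07": 6, "item_19": 5}
print(items_needing_review(counts, n_participants=35))  # ['item_07']
```

With 35 participants, 6 reports (about 17%) reach the threshold, while 5 reports (about 14%) do not.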



Second stage: validation of the questionnaire in the target language

Correct translation and cultural adaptation of a questionnaire do not always guarantee that its psychometric properties are preserved, so it must be validated in the target language.22 For a questionnaire to be considered valid, it must meet the following characteristics: a) be reliable and able to measure without error; b) be able to detect and measure changes, both between individuals and in the response of the same individual over time; c) be simple, feasible and accepted by patients, users and researchers; d) be suitable for measuring the phenomenon it is intended to measure, and e) reflect the theory underlying the phenomenon or concept to be measured. All these characteristics relate to two properties of questionnaires: reliability and validity.14 The International Quality of Life Assessment (IQOLA) project8,18,19 and other researchers such as Aday,19 Lam,30 Mokkink,31-33 Ren,34 Scott-Lennox35 and Wiesinger36 have proposed or used different methods for evaluating the reliability and validity of questionnaires. Based on that experience, the following sequence is proposed for validating questionnaires (figure 1):

1. Reliability: the degree to which an instrument can measure without error. It measures the proportion of variation in the measurements that is due to the range of values the variable takes and not to systematic or random error.14,33 Reliability determines the proportion of total variance attributable to true differences between subjects.20,33,37 Depending on the characteristics of the questionnaire, reliability can be evaluated for all or some of its three dimensions: 1) internal consistency; 2) intra-observer or test-retest reliability, and 3) inter-observer reliability.

1.1. Internal consistency: the degree of interrelation and coherence among the items. It assesses whether the items that measure the same construct are homogeneous.33,39 When the scale of an instrument is consistent, all its items can be taken to measure a single construct and, in general, a linear relationship exists between the sum of the item scores and the construct measured.



A construct is a latent or intangible quality of a subject or population that cannot be observed and measured directly with a measuring instrument, because the quality exists within a theory. Examples are job stress, motivation, disability and leadership. Evaluating the reliability of an instrument poses no major problems when it quantifies objective qualities such as weight or height. For constructs, however, it must be shown empirically that the instrument measures what it is intended to measure. Constructs are frequently measured with questionnaires in which each item is assumed to be related to the unobservable quality of interest. Each item usually asks for a response, which is assigned a score. The sum of the scores gives the questionnaire's scale. A scale is sometimes made up of a group of subscales. For example, psychosocial occupational risk is a construct that may in turn comprise several dimensions, such as job demands, rewards, level of control and social support.

Cronbach's alpha coefficient quantifies the reliability of a scale provided two requirements are met: a) the scale is made up of a set of items whose scores are summed to give an overall score, and b) all item scores measure in the same direction; for example, the higher the score, the greater the functional capacity or emotional well-being. Cronbach's alpha is the weighted mean of the correlations between the items that make up a scale.39 When the instrument is composed of a group of subscales, Cronbach's alpha should be calculated for the items against the overall score (item-total correlation) and for the items of each subscale against the subscale score (item-subscale correlation). Cronbach's alpha is not accompanied by a p value that would allow the hypothesis of scale reliability to be rejected or not. It typically takes values between 0 and 1. Alpha values above 0.70 are considered sufficient to guarantee the internal consistency of the scale.

Salud Pública de México / vol. 55, no. 1, January-February 2013
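As a rough illustration of the internal-consistency calculation described above, the sketch below computes Cronbach's alpha with the commonly used variance-based formula, alpha = k/(k-1) x (1 - sum of item variances / variance of the total score), where k is the number of items. The function name `cronbach_alpha` and the scores are invented for the example, and it assumes all items are already scored in the same direction, as the text requires.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    n_respondents = len(scores)
    k = len(scores[0]) if scores else 0
    if n_respondents < 2 or k < 2:
        raise ValueError("need at least two respondents and two items")
    if any(len(row) != k for row in scores):
        raise ValueError("every respondent must answer the same number of items")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the summed (total) score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("alpha is undefined when all total scores are identical")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Invented data: five respondents, three items on the same 1-5 scale.
responses = [[3, 4, 3], [2, 2, 3], [5, 4, 5], [1, 2, 1], [4, 5, 4]]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
print("above the 0.70 threshold" if alpha > 0.70 else "at or below the 0.70 threshold")
```

For an instrument with subscales, the same function could be applied once to all items and once to the items of each subscale.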




1.2 Intra-observer or test-retest reliability: this refers to the repeatability of the instrument when it is administered with the same method to the same population at two different times.14,33 For a quantitative scale it is analyzed with the intraclass correlation coefficient (ICC); for a qualitative scale, with Cohen's kappa index.21,37 The time between the first administration (test) and the second (retest) depends on what is being measured. It should not be too long, so that the observed phenomenon does not change and alter the repeatability value, nor too short, so that respondents do not recall their earlier answers (learning effect).

1.3 Inter-observer reliability: the degree of agreement between two or more raters who assess the same subjects with the same instrument.33 This property cannot be evaluated for self-administered questionnaires, because the individuals themselves provide the answers without any interference from the researchers. When it has to be evaluated, the intraclass correlation coefficient (ICC) is used for quantitative scales and Cohen's kappa index for qualitative scales. Its main limitations are the possibility of chance agreement between observers and the possibility of systematic error (information bias) by one of the raters.

2. Validity: the ability of the questionnaire to measure the construct for which it was designed.19,33 It can be evaluated for all or only some of its four dimensions: face or logical validity, content validity, criterion validity and construct validity.
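For the two reliability coefficients named in sections 1.2 and 1.3, a minimal Python sketch follows. It is illustrative only: the text does not say which form of the ICC to use, so the one-way random-effects, single-measurement form is an assumption made here, and the function names `cohen_kappa` and `icc_oneway` and all data are invented.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two sets of categorical ratings of the same subjects."""
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError("both rating lists must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # Agreement expected by chance from each rater's marginal frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def icc_oneway(scores):
    """One-way random-effects ICC for n subjects, each with k repeated scores."""
    n = len(scores)
    k = len(scores[0]) if scores else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 subjects with the same k >= 2 scores each")
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
    ) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_between - ms_within) / denominator


# Invented test-retest data: qualitative answers and quantitative scores.
print(cohen_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"]))  # 0.5
print(round(icc_oneway([[1, 2], [3, 4], [5, 6]]), 3))
```

Kappa subtracts the agreement expected by chance, which addresses the first limitation mentioned in section 1.3; other ICC forms (for example two-way models) may be more appropriate depending on the study design.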
2.1 Face or logical validity: the degree to which a questionnaire, in the judgment of experts and users, measures in a logical way what it is intended to measure.14,19 When face validity is lacking, study subjects may not see the connection between the questions they are asked and the purpose for which they agreed to answer. This can lead participants to reject the questionnaire.




This dimension of validity should be evaluated when the questionnaire is designed; nevertheless, if mismatches caused by translation or cultural adaptation are detected during the translation, cross-cultural adaptation and validation (TACV) process, they must be corrected.

2.2 Content validity: constructs are usually composed of several dimensions. Content validity is the degree to which the tool can measure most of the dimensions of the construct.14,19,33 A questionnaire with high content validity measures all the dimensions related to the construct under study. Its evaluation is a formal process that should always be carried out in a TACV process and consists of assessing whether the questionnaire items are a representative sample of what is to be measured. It is an empirical evaluation based on judgments from different sources, such as the opinions of the tool's authors, the results of pilot studies, the reasoning of the expert committee in a TACV process and the qualitative analysis of the comments made by participants during the pre-test.

2.3 Criterion validity: establishes the validity of an instrument by comparing it with an external criterion or reference test (gold standard, GS). It has two dimensions: 1) concurrent validity, the degree to which the questionnaire result agrees with a GS, and 2) predictive validity, the degree to which it can predict a given outcome.14,19,33 The GS must be an equivalent alternative method that is independent of the questionnaire results, reliable, accurate, objective and widely accepted as a valid measure.14,19 When it meets these requirements, it always gives a positive result in the presence of the phenomenon under study and always a negative result in its absence. For example, electromyography performed under appropriate conditions could be the GS for a questionnaire that evaluates the presence of carpal tunnel syndrome. Whenever a GS exists, concurrent criterion validity should be evaluated in five steps: 1) selection of the GS; 2) selection of a sample of subjects representative of the population; 3) administration of the questionnaire and recording of the result for each individual; 4) evaluation of each individual with the GS, and 5) comparison of the results obtained with the questionnaire and the GS.

The analysis of concurrent criterion validity examines the strength of the correlation between the questionnaire result and the GS result, and can be quantified with Pearson's correlation coefficient (r). Another approach to quantifying concurrent criterion validity is to analyze sensitivity and specificity.19,21 Sensitivity is the questionnaire's ability to detect individuals who have the phenomenon under study. It can be defined as the probability that an individual who truly has the phenomenon obtains a positive result when the questionnaire is applied. It is calculated as the ratio of true positives (TP) to the sum of TP and false negatives (FN), which is why it is also known as the true positive fraction (TPF): sensitivity = TP/(TP+FN). Specificity is the ability to detect those who do not have the phenomenon, that is, the probability that an individual without the phenomenon obtains a negative result when the questionnaire is applied. It is calculated as the ratio of true negatives (TN) to the sum of TN and false positives (FP), and is also known as the true negative fraction (TNF): specificity = TN/(TN+FP) (table III). The higher the sensitivity and specificity, and the lower the percentages of FP and FN, the greater the concurrent validity.
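When both the questionnaire and the gold standard yield quantitative results, the Pearson correlation mentioned above can be computed as in the following sketch. The function name `pearson_r` and the paired values are hypothetical; the code only illustrates the coefficient itself and says nothing about how strong a correlation should be considered adequate.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between paired questionnaire and gold-standard results."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equally long lists with at least 2 pairs")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Invented paired results for six individuals.
questionnaire_scores = [12, 18, 25, 31, 40, 44]
gold_standard_values = [10, 20, 24, 33, 38, 47]
print(round(pearson_r(questionnaire_scores, gold_standard_values), 3))
```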


Table III
Calculation of sensitivity, specificity, positive predictive value and negative predictive value.* Barcelona, Spain, November 2011

                           Phenomenon under study (gold standard)
Questionnaire result       Present       Absent        Total
Positive                   TP            FP            TP+FP
Negative                   FN            TN            FN+TN
Total                      TP+FN         FP+TN

* Source: adapted from reference 21
TP: true positives; FP: false positives; FN: false negatives; TN: true negatives
Sensitivity: TP/(TP+FN)
Specificity: TN/(FP+TN)
Positive predictive value (PPV): TP/(TP+FP)
Negative predictive value (NPV): TN/(FN+TN)
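The four ratios in the table can be worked through with a small Python sketch. The function name `diagnostic_metrics` and the counts are invented for illustration; TP, FP, FN and TN are the cells of the 2x2 table comparing the questionnaire result with the gold standard.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from the four cells of a 2x2 table."""
    if min(tp, fp, fn, tn) < 0:
        raise ValueError("counts must be non-negative")
    if 0 in (tp + fn, fp + tn, tp + fp, fn + tn):
        raise ValueError("every row and column total must be greater than zero")
    return {
        "sensitivity": tp / (tp + fn),  # true positive fraction
        "specificity": tn / (fp + tn),  # true negative fraction
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (fn + tn),          # negative predictive value
    }


# Invented counts: 100 individuals assessed with both questionnaire and gold standard.
metrics = diagnostic_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In this invented example the row and column totals match the layout of the table: 45 individuals have the phenomenon (TP+FN) and 50 have a positive questionnaire result (TP+FP).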







A questionnaire is considered to have acceptable sensitivity and specificity when both are above 0.80.20 From here, it may be of interest to determine predictive validity.21 The positive predictive value (PPV) is the probability that an individual has the phenomenon the questionnaire seeks to measure, given a positive result on the questionnaire. It is calculated as the proportion of participants with a positive questionnaire result who actually had the phenomenon: PPV = TP/(TP+FP), where TP are the true positives and FP the false positives. The negative predictive value (NPV) is the probability that the phenomenon is absent when the questionnaire result is negative: NPV = TN/(FN+TN), where TN are the true negatives and FN the false negatives.

2.4 Construct validity: the degree to which the measurements derived from the questionnaire responses can be considered a measurement of the phenomenon under study.14,19,33 It is evaluated by testing the hypotheses formulated about how the instrument's scores behave in different situations. Several methods exist, and they should be used when the phenomenon to be measured is abstract or cannot be compared with a gold standard (GS). Known-groups validity analysis is a very suitable procedure for occupational health questionnaires that measure the degree of physical or cognitive capacity for work. It compares the results obtained by applying the questionnaire to groups with a known clinical diagnosis of physical or mental health.19,20

Conclusions

The translation, cross-cultural adaptation and validation (TACV) of questionnaires for use in other languages is a resource-intensive process; however, when carried out systematically it yields a measurement tool equivalent to the original version. The way in which the TACV of health questionnaires is carried out can be improved, so it is important to follow methodological recommendations. If the TACV process is not carried out rigorously, errors may arise with implications for diagnosis, for decisions about individual therapy, for epidemiological records and even for the design and implementation of public policies. In addition, the use of tools that are not equivalent to the original questionnaire can produce unreliable or confusing results that could limit the exchange of information within the scientific community.13,14,22-24

This proposal for the TACV of health questionnaires is consistent with the recommendations of experts such as Alexandre,13 Beaton,22 Carvajal,23 Guillemin12 and Herdman24 on translation and cultural adaptation. Translation and adaptation must be followed by validation in the target language, which minimizes the information bias that could be associated with administering questionnaires in countries with different languages and cultures. The process is therefore complemented with a series of steps for the validation stage, consistent with the recommendations of experts such as Aday,19 Mokkink,31-33 Müller37 and Keszey.38


Declaration of conflict of interest: The authors declared no conflict of interest.

References
1. Goldberg D, Bridges K, Duncan-Jones P, Grayson D. Detecting anxiety and depression in general medical settings. BMJ 1988;297:897-899.
2. Susitaival P, Flyvholm MA, Meding B, Kanerva L, Lindberg M, Svensson A, et al. Nordic Occupational Skin Questionnaire (NOSQ-2002): a new tool for surveying occupational skin diseases and exposure. Contact Dermatitis 2003;49:70-76.
3. Melosini L, Dente FL, Bacci E, Bartoli ML, Cianchetti S, Costa F, et al. Asthma control test (ACT): comparison with clinical, functional, and biological markers of asthma control. J Asthma 2012;49:317-323.
4. Connor JP, Grier M, Feeney GF, Young RM. The validity of the Brief Michigan Alcohol Screening Test (bMAST) as a problem drinking severity measure. J Stud Alcohol Drugs 2007;68:771-779.
5. Tuomi K, Ilmarinen J, Eskelinen L, Järvinen E, Toikkanen J, Klockars M. Prevalence and incidence rates of diseases and work ability in different work categories of municipal occupations. Scand J Work Environ Health 1991;17(Suppl 1):67-74.
6. Amick BC III, Lerner D, Rogers WH, Rooney T, Katz JN. A review of health-related work outcome measures and their uses, and recommended measures. Spine 2000;25:3152-3160.
7. Scublinsky D, González C, Iannantuono R, Somma LF, Rillo O, Casado G, et al. Adaptación al español y validación del cuestionario de detección epidemiológica para artritis reumatoidea. Rev Argent Reumatol 2008;19:33-35.
8. Simonsson M, Bergman S, Jacobsson L, Petersson I, Svensson B. The prevalence of rheumatoid arthritis in Sweden. Scand J Rheumatol 1999;28:340-343.
9. Kuorinka I, Jonsson B, Kilbom A, Vinterberg H, Biering-Sørensen F, Andersson G, et al. Standardised Nordic questionnaires for the analysis of musculoskeletal symptoms. Appl Ergon 1987;18:233-237.
10. Tapia-Conyer R, Velázquez-Monroy O, Lara-Esqueda A, Tapia-Olarte F, Aurora-Jiménez R, Sánchez-Montes J, et al. Guía de detección integrada de obesidad, diabetes e hipertensión arterial [Internet monograph]. Mexico City: Secretaría de Salud de México; [accessed 18 September 2012]. Available at: www.salud.gob.mx/unidades/cdi/documentos/DOCSAL7482.pdf

Conclusions

The translation, cultural adaptation and validation (TACV) of questionnaires for use in other languages is a resource-intensive process; however, when carried out systematically, it yields a measurement tool equivalent to the original version. The way in which the TACV of health questionnaires is performed can still be improved, so it is important to follow methodological recommendations. If the TACV process is not carried out rigorously, errors may arise with implications for diagnosis, for decisions about individual therapy, for epidemiological records and even for the design and implementation of public policies. In addition, the use of tools that are not equivalent to the original questionnaire may produce unreliable or confusing results that could limit the exchange of information within the scientific community [13, 14, 22-24].

This proposal for the TACV of health questionnaires is consistent with the recommendations of experts such as Alexandre [13], Beaton [22], Carvajal [23], Guillemin [12] and Herdman [24] for carrying out translations and cultural adaptations. The translation and adaptation process must be followed by a validation process in the target language, which helps minimize the information bias that could be associated with administering questionnaires in countries with different languages and cultures. For this reason, the process is complemented with a proposed series of steps to follow during the validation stage, consistent with the recommendations of experts such as Aday [19], Mokkink [31-33], Müller [37] and Keszei [38].

Conflict of interest statement: The authors declared no conflict of interest.

References

1. Goldberg D, Bridges K, Duncan-Jones P, Grayson D. Detecting anxiety and depression in general medical settings. BMJ 1988;297:897-899.
2. Susitaival P, Flyvholm MA, Meding B, Kanerva L, Lindberg M, Svensson A, et al. Nordic Occupational Skin Questionnaire (NOSQ-2002): a new tool for surveying occupational skin diseases and exposure. Contact Dermatitis 2003;49:70-76.
3. Melosini L, Dente FL, Bacci E, Bartoli ML, Cianchetti S, Costa F, et al. Asthma control test (ACT): comparison with clinical, functional, and biological markers of asthma control. J Asthma 2012;49:317-323.
4. Connor JP, Grier M, Feeney GF, Young RM. The validity of the Brief Michigan Alcohol Screening Test (bMAST) as a problem drinking severity measure. J Stud Alcohol Drugs 2007;68:771-779.
5. Tuomi K, Ilmarinen J, Eskelinen L, Järvinen E, Toikkanen J, Klockars M. Prevalence and incidence rates of diseases and work ability in different work categories of municipal occupations. Scand J Work Environ Health 1991;17(Suppl 1):67-74.
6. Amick BC III, Lerner D, Rogers WH, Rooney T, Katz JN. A review of health-related work outcome measures and their uses, and recommended measures. Spine 2000;25:3152-3160.
7. Scublinsky D, González C, Iannantuono R, Somma LF, Rillo O, Casado G, et al. Adaptación al español y validación del cuestionario de detección epidemiológica para artritis reumatoidea. Rev Argent Reumatol 2008;19:33-35.
8. Simonsson M, Bergman S, Jacobsson L, Petersson I, Svensson B. The prevalence of rheumatoid arthritis in Sweden. Scand J Rheumatol 1999;28:340-343.
9. Kuorinka I, Jonsson B, Kilbom A, Vinterberg H, Biering-Sørensen F, Andersson G, et al. Standardised Nordic questionnaires for the analysis of musculoskeletal symptoms. Appl Ergon 1987;18:233-237.
10. Tapia-Conyer R, Velázquez-Monroy O, Lara-Esqueda A, Tapia-Olarte F, Aurora-Jiménez R, Sánchez-Montes J, et al. Guía de detección integrada de obesidad, diabetes e hipertensión arterial [Internet monograph]. Ciudad de México, DF: Secretaría de Salud de México; [accessed 18 September 2012]. Available at: www.salud.gob.mx/unidades/cdi/documentos/DOCSAL7482.pdf


11. Hutchinson A, Bentzen N, Konig-Zahn C. Cross cultural health outcome assessment: a user’s guide. The Netherlands: ERGHO, 1996.
12. Guillemin F. Cross-cultural adaptation and validation of health status measures. Scand J Rheumatol 1995;24:61-63.
13. Alexandre NMC, Guirardello Ede B. Cultural adaptation of instruments utilized in occupational health. Rev Panam Salud Publica 2002;11:109-111.
14. García de Yébenes MJ, Rodriguez-Salvanés F, Carmona-Ortells L. Validación de cuestionarios. Reumatol Clin 2009;5:171-177.
15. Kulis D, Arnott M, Greimel ER, Bottomley A, Koller M. Trends in translation requests and arising issues regarding cultural adaptation. Expert Rev Pharmacoecon Outcomes Res 2011;11:307-314.
16. Lobiondo-Wood G, Haber J. Reliability and validity. Nursing research: methods, critical appraisal, and utilization. 4th ed. St. Louis: Mosby, 1998.
17. Burns N, Grove SK. The practice of nursing research: conduct, critique and utilization. 3rd ed. Philadelphia: Saunders, 1997.
18. Ware JE Jr, Gandek B, Keller S, IQOLA Group. Evaluating instruments used cross-nationally: Methods from the IQOLA Project. In: Spilker B, ed. Quality of life and pharmacoeconomics in clinical trials. 2nd ed. Philadelphia: Lippincott-Raven Publishers, 1996:681-692.
19. Aday LA, Cornelius LJ. Designing and conducting health surveys: a comprehensive guide. 3rd ed. San Francisco, CA: Jossey-Bass, 2006.
20. Argimon-Pallas JM, Jimenez-Villa J. Métodos de investigación clínica y epidemiológica. 3rd ed. Madrid: Elsevier España, 2004.
21. Serra C, Company A. Vigilancia de la salud. In: Ruiz-Frutos C, García AM, Delclòs J, Benavides FG. Salud laboral, conceptos y técnicas para la prevención de riesgos laborales. 3rd ed. Barcelona: Masson, 2007:255-264.
22. Beaton DE, Bombardier C, Guillemin F, Bosi-Ferraz M. Guidelines for the process of cross-cultural adaptation of self-report measures. Spine 2000;25:3186-3191.
23. Carvajal A, Centeno C, Watson R, Martínez M, Rubiales AS. How is an instrument for measuring health to be validated? An Sist Sanit Navar 2011;34:63-72.
24. Herdman M, Fox-Rushby J, Badia X. A model of equivalence in the cultural adaptation of HRQoL instruments: the universalist approach. Qual Life Res 1998;7:323-335.
25. Durand MJ, Vachon B, Hong QN, Imbeau D, Amick BC III, Loisel P. The cross-cultural adaptation of the work role functioning questionnaire in Canadian French. Int J Rehabil Res 2004;27:261-268.
26. Gallasch CH, Alexandre NM, Amick B 3rd. Cross-cultural adaptation, reliability, and validity of the work role functioning questionnaire to Brazilian Portuguese. J Occup Rehabil 2007;17:701-711.


27. de Soárez PC, Kowalski CC, Ferraz MB, Ciconelli RM. Translation into Brazilian Portuguese and validation of the Work Limitations Questionnaire. Rev Panam Salud Publica 2007;22:21-28.
28. Bullinger M, Alonso J, Apolone G, et al. Translating health status questionnaires and evaluating their quality: the IQOLA Project approach. International Quality of Life Assessment. J Clin Epidemiol 1998;51:913-923.
29. Gandek B, Ware JE Jr, IQOLA Group. Methods for validation and norming translations of health status questionnaires: the IQOLA project approach. International quality of life assessment. J Clin Epidemiol 1998;51:953-959.
30. Lam CL, Gandek B, Ren XS, Chan MS. Tests of scaling assumptions and construct validity of the Chinese (HK) version of the SF-36 Health Survey. J Clin Epidemiol 1998;51:1139-1147.
31. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res 2010;19:539-549.
32. Mokkink LB, Terwee CB, Knol DL, Stratford PW, Alonso J, Patrick DL, et al. The COSMIN checklist for evaluating the methodological quality of studies on measurement properties: A clarification of its content. BMC Med Res Methodol 2010;10:22.
33. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol 2010;63:737-745.
34. Ren XS, Amick B III, Zhou L, Gandek B. Translation and psychometric evaluation of a Chinese version of the SF-36 Health Survey in the United States. J Clin Epidemiol 1998;51:1129-1138.
35. Scott-Lennox JA, Wu AW, Boyer JG, Ware JE Jr. Reliability and validity of French, German, Italian, Dutch, and UK English translations of the medical outcomes study HIV Health Survey. Med Care 1999;37:908-925.
36. Wiesinger GF, Nuhr M, Quittan M, Ebenbichler G, Wölfl G, Fialka-Moser V. Cross-cultural adaptation of the Roland-Morris questionnaire for German-speaking patients with low back pain. Spine 1999;24:1099-1103.
37. Müller R, Büttner P. A critical discussion of intraclass correlation coefficients. Stat Med 1994;13:2465-2476.
38. Keszei AP, Novak M, Streiner DL. Introduction to health measurement scales. J Psychosom Res 2010;68:319-323.
39. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika 1951;16:297-334.





3. PAPER # 2

Cross-cultural adaptation of the work role functioning questionnaire to Spanish spoken in Spain. Journal of Occupational Rehabilitation. 2013;23:566-75.

J Occup Rehabil (2013) 23:566–575 DOI 10.1007/s10926-013-9420-6

Cross-Cultural Adaptation of the Work Role Functioning Questionnaire to Spanish Spoken in Spain

José M. Ramada • Consol Serra • Benjamin C. Amick III • Juan R. Castaño • George L. Delclos

Published online: 29 January 2013 © Springer Science+Business Media New York 2013

Abstract

Purpose The Work Role Functioning Questionnaire (WRFQ) is a tool developed in the United States to measure work disability and assess the perceived impact of health problems on worker ability to perform jobs. We translated and adapted the WRFQ to Spanish spoken in Spain and assessed preservation of its psychometric properties. Methods Cross-cultural adaptation of the WRFQ was performed following a systematic 5-step procedure: (1) direct translation, (2) synthesis, (3) back-translation, (4) consolidation by an expert committee and (5) pre-test. Psychometric properties were evaluated by administering the questionnaire to 40 patients with different cultural levels and health problems. Applicability, usability, readability and integrity of the WRFQ were assessed, together with its validity and reliability. Results Questionnaire translation, back translation and consolidation were carried out without relevant difficulties. Idiomatic issues requiring reformulation were found in the instructions, response options and in 2 items. Participants appreciated the applicability, usability, readability and integrity of the questionnaire. The results indicated good face and content validity. Internal consistency was satisfactory for all subscales (Cronbach’s alpha between 0.88 and 0.96), except for social demands (Cronbach’s alpha = 0.56). Test–retest reliability showed good stability, with intraclass correlation coefficients between 0.77 and 0.93 for all subscales. Construct validity was considered preserved based on the comparison of median scores for each patient group and subscale. Conclusions Our results indicate the cross-cultural adaptation of the WRFQ to Spanish was satisfactory and preserved its psychometric properties, except for the subscale of social demands, whose internal consistency should be interpreted with caution.

Keywords Work outcome measure • Work disability measurement • Questionnaires • Scales • Health survey • Cross-cultural comparison • Validation studies

J. M. Ramada • C. Serra (✉) • G. L. Delclos
Center for Research in Occupational Health (CiSAL), University Pompeu Fabra, PRBB Building, Dr. Aiguader, 88, 08003 Barcelona, Spain
e-mail: [email protected]

J. M. Ramada • C. Serra
Occupational Health Service, Parc de Salut MAR, Hospital del Mar, Passeig Marítim, 25-29, 08003 Barcelona, Spain

J. M. Ramada • C. Serra • G. L. Delclos
CIBER of Epidemiology and Public Health (CIBERESP), Barcelona, Spain

B. C. Amick III • G. L. Delclos
Southwest Center for Occupational and Environmental Health, School of Public Health, University of Texas, 6901 Bertner, Houston, TX 77030, USA

J. R. Castaño
Psychiatry Service, Parc de Salut MAR, Hospital del Mar, Passeig Marítim, 25-29, 08003 Barcelona, Spain

J. R. Castaño
Neuropsychiatry and Addictions Institute (INAD), Hospital del Mar, Passeig Marítim, 25-29, 08003 Barcelona, Spain

Introduction

Work disability is a health problem with high prevalence and economic costs in industrialized societies [1, 2]. In Europe, the proportion of workers with a long term health problem or disability varies between 5.8 % in Romania and 32.2 % in Finland [3]. Increased life expectancy and prolongation of the retirement age are increasing the overall age of the workforce. With an older workforce, more workers are working with health problems [4–6].


In occupational health, rehabilitation and/or accommodation programs to adapt work conditions to worker skills and health are being increasingly used to support an active work life and better quality of life [6, 7]. The effectiveness of rehabilitation and work accommodation programs needs to be assessed using outcomes such as work status (active, temporary disability, permanent disability), time to return to work, duration of functional disability and costs of inability to work [7–9]. These outcomes are useful but limited, as they mainly assess whether workers are present or absent from their jobs [10]. They do not offer information about the worker’s participation in the job or the degree to which he or she is able to respond to the job’s demands [10, 11]. To fully assess the effectiveness of an intervention, outcome measures are required that describe the extent to which people increase their ability to meet the demands of the job.

In the 1990s a series of work-role specific functioning questionnaires were developed, among them the Work Limitations Questionnaire (WLQ), the Work Limitations-26 (WL-26) and the Work Role Functioning Questionnaire (WRFQ) [10, 12]. The WRFQ measures perceived disability in terms of limitations in performing the job due to health problems. Work limitation is defined as the level of difficulty the worker encounters in carrying out the demands of his/her job. Numerous studies have demonstrated the usefulness of these tools in English-speaking health care environments [13–15], but no versions have been adapted for Spanish-speaking health care environments. Because of possible cultural differences in the perception of work, health and disease, these instruments should be systematically translated, adapted and validated for use in other cultures. Since its creation and validation, the WRFQ has been adapted to Canadian French [16], Brazilian Portuguese [17] and Dutch [18].

The objectives of this study were to translate and adapt the WRFQ to Spanish spoken in Spain and evaluate its psychometric properties.

Methods

The WRFQ is a self-administered questionnaire containing 27 items grouped into 5 subscales: work scheduling demands, output demands, physical demands, mental demands and social demands. The first two columns of Table 1 show all items and subscales of the original English version. The recall period is 4 weeks and each subscale is measured by the percentage of time in a working day the employee has difficulty performing those demands. Response options are on a five-point scale: 0 = all of the time (100 %), 1 = most of the time, 2 = half of the time (50 %), 3 = some of the time and 4 = none of the time (0 %), plus a sixth option, 5 = does not apply to my job. Option 5 enables employees to answer even though a particular demand is not part of their work.

For each subscale, item scores were summed, divided by the number of items included in the subscale, and then multiplied by 25 to obtain percentages for each subscale, ranging from 0 % (difficulty all the time) to 100 % (no difficulty at any time). The same process was repeated for the global scale. The answers “does not apply to my job” were transformed to missing values. Scales containing subscales with more than 20 % of items missing or answered “does not apply to my job” were excluded from the analysis [19].
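The scoring rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `wrfq_subscale_score`, the use of `None` for unanswered items, and the choice to divide by the number of answered items (rather than by all items in the subscale) are assumptions made for the example.

```python
from typing import Optional, Sequence

NOT_APPLICABLE = 5  # response option "does not apply to my job"


def wrfq_subscale_score(
    responses: Sequence[Optional[int]],
    max_missing_fraction: float = 0.20,
) -> Optional[float]:
    """Return a 0-100 score for one subscale, or None if it must be excluded.

    Hypothetical helper: item responses are coded 0-4, with 5 meaning
    "does not apply to my job" and None meaning the item was left blank.
    """
    if not responses:
        raise ValueError("a subscale needs at least one item")

    # "Does not apply" answers are treated as missing values.
    valid = [r for r in responses if r is not None and r != NOT_APPLICABLE]
    for r in valid:
        if r not in (0, 1, 2, 3, 4):
            raise ValueError(f"unexpected response code: {r}")

    # Exclude the subscale when more than 20 % of its items are missing.
    n_missing = len(responses) - len(valid)
    if n_missing > max_missing_fraction * len(responses):
        return None

    # Assumption: average over answered items, then rescale 0-4 to 0-100.
    return sum(valid) / len(valid) * 25
```

With this coding, a respondent who answers "none of the time" (4) to every item scores 100 (no difficulty at any time), and one who answers "all of the time" (0) to every item scores 0.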

Translation and Cross-Cultural Adaptation of the WRFQ

Translation was carried out following a systematic and standardized procedure consisting of five steps: (1) direct translation, (2) synthesis of translations, (3) back-translation, (4) consolidation of translations by a committee of experts and (5) pre-test [20–24].

To complete the direct translation, three bilingual translators whose native language was Spanish spoken in Spain were selected. The first one was aware of the objectives and concepts of the WRFQ. The second one did not know them but had previous experience in technical translation of medical texts. The last translator had no previous knowledge of medicine or rehabilitation and did not know the study objectives. They worked independently and were provided with common instructions to ensure a uniform translation of the entire questionnaire. This was followed by a synthesis of translations, comparing versions and identifying discrepancies that were discussed to reach consensus between translators and researchers.

The back-translation into English was done by two bilingual translators whose native language was English spoken in the USA. They had no knowledge of medicine or rehabilitation and were unaware of the study objectives. They worked independently and were blind to the original version of the questionnaire to minimize information bias.

A multidisciplinary expert committee of bilingual professionals, consisting of an occupational health technician, an occupational physician, an occupational nurse, two linguists and a methodology expert, evaluated the process. Discrepancies between the two back-translations were identified, and, following methodological guidelines [20, 21], a consensus was reached on a pre-final version of the WRFQ adapted to Spanish spoken in Spain.

Finally, a pre-test study was carried out to assess the equivalence of the questionnaire, its understandability and applicability in the Spanish context. Possible mistakes were identified and it was verified that the instructions, items and answer choices were understandable.

J Occup Rehabil (2013) 23:566–575

In occupational health, rehabilitation and/or accommodation programs to adapt work conditions to worker skills and health are being increasingly used to support an active work life and better quality of life [6, 7]. The effectiveness of rehabilitation and work accommodation programs needs to be assessed using outcomes such as work status (active, temporary disability, permanent disability), time to return to work, duration of functional disability and costs of inability to work [7–9]. However, these outcomes can be useful but are limited, as they mainly assess whether workers are present or absent from their jobs [10]. They do not offer information about the worker’s participation in the job or the degree to which he or she is able to respond to the job’s demands [10, 11]. To fully assess effectiveness of intervention, outcome measures are required that describe the extent to which people increase their ability to meet the demands of the job. In the 1990s a series of work-role specific functioning questionnaires were developed; among these, the Work Limitations Questionnaire (WLQ), the Work Limitations-26 (WL-26) and the Work Role Functioning Questionnaire (WRFQ) [10, 12]. The WRFQ measures perceived disability in terms of work limitation to perform the job due to health problems. Work limitation is defined as the level of difficulty encountered by the worker to carry out the demands of his/her job. Numerous studies have demonstrated the usefulness of these tools in English language-speaking health care environments [13–15], but no versions have been adapted for Spanish-speaking health care environments. Due to possible cultural differences in perception of work, health and disease, these instruments should be systematically translated, adapted and validated for use in other cultures. Since its creation and validation, the WRFQ has been adapted to Canadian French [16], Brazilian Portuguese [17] and Dutch [18]. 
The objectives of this study were to translate and adapt the WRFQ to Spanish spoken in Spain and evaluate its psychometric properties.

Methods The WRFQ is a self-administered questionnaire containing 27 items grouped into 5 subscales: work scheduling demands, output demands, physical demands, mental demands and social demands. The first two columns of Table 1 show all items and subscales of the original English version. The recall period is 4 weeks and each subscale is measured by the percentage of time in a working day the employee has difficulty performing those demands. Response options vary on a five-point scale: 0 = all of the time (100 %), 1 = most of the time, 2 = half of the time (50 %), 3 = some of the time, 4 = none of the time (0 %) and 5 = does not apply to my job. Option 5 enables

123 40

567

employees to answer even though a particular demand is not part of their work. For each subscale, item scores were summed up, divided by the number of items included in the subscale, and then multiplied by 25 to obtain percentages for each subscale, ranging from 0 % (difficulty all the time) to 100 % (no difficulty at any time). The same process was repeated for the global scale. The answers ‘‘does not apply to my job’’ were transformed to missing values. Scales containing subscales with more than 20 % missing values or ‘‘does not apply to my job’’ were excluded from the analysis [19].

Translation and Cross-Cultural Adaptation of the WRFQ

Translation was carried out following a systematic and standardized procedure consisting of five steps: (1) direct translation, (2) synthesis of translations, (3) back-translation, (4) consolidation of translations by a committee of experts and (5) pre-test [20–24].

To complete the direct translation, three bilingual translators whose native language was Spanish spoken in Spain were selected. The first one was aware of the objectives and concepts of the WRFQ. The second one did not know them but had previous experience in technical translation of medical texts. The last translator had no previous knowledge of medicine or rehabilitation and did not know the study objectives. They worked independently and were provided with common instructions to ensure a uniform translation of the entire questionnaire. This was followed by a synthesis of translations, comparing versions and identifying discrepancies that were discussed to reach consensus between translators and researchers.

The back-translation into English was done by two bilingual translators whose native language was English spoken in the USA. They had no knowledge of medicine or rehabilitation and were unaware of the study objectives. They worked independently and were blind to the original version of the questionnaire to minimize information bias.

A multidisciplinary expert committee of bilingual professionals, consisting of an occupational health technician, an occupational physician, an occupational nurse, two linguists and a methodology expert, evaluated the process. Discrepancies between the two back-translations were identified, and, following methodological guidelines [20, 21], a consensus was reached on a pre-final version of the WRFQ adapted to Spanish spoken in Spain.

Finally, a pre-test study was carried out to assess the equivalence of the questionnaire, its understandability and applicability in the Spanish context. Possible mistakes were identified and it was verified that the instructions, items and answer choices were understandable.

J Occup Rehabil (2013) 23:566–575

Table 1 Responses for item-level of the Spanish version of the Work Role Functioning Questionnaire (WRFQ)

Responses are given as n (%).

| No. | Items (original version) | Sub-scale | 0 (100 %) | 1 | 2 (50 %) | 3 | 4 (0 %) | n missing / "does not apply to my job" | Mean (scale 0–4) | Correlations item-subscale | Correlations item-total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Work the required number of hours | WSD | 3 (7.5) | 9 (22.5) | 1 (2.5) | 19 (47.5) | 8 (20.0) | 0/0 | 2.5 | 0.90 | 0.88 |
| 2 | Get going easily at the beginning of the work day (a) | WSD | 4 (10.0) | 4 (10.0) | 4 (10.0) | 12 (30.0) | 16 (40.0) | 0/0 | 2.8 | 0.88 | 0.83 |
| 3 | Start on your job as soon as you arrive at work (a) | WSD | 5 (12.5) | 4 (10.0) | 3 (7.5) | 10 (25.0) | 17 (42.5) | 0/1 | 2.7 | 0.89 | 0.87 |
| 4 | Do your work without stopping to take extra breaks or rests (a) | WSD | 5 (12.5) | 8 (20.0) | 2 (5.0) | 13 (32.5) | 2 (5.0) | 6/4 | 2.3 | 0.65 | 0.62 |
| 5 | Stick to a routine or schedule (a) | WSD | 4 (10.0) | 5 (12.5) | 1 (2.5) | 6 (15.0) | 23 (57.5) | 0/1 | 2.9 | 0.76 | 0.69 |
| 6 | Handle the work load (a) | OD | 3 (7.5) | 9 (22.5) | 5 (12.5) | 13 (32.5) | 10 (25.0) | 0/0 | 2.5 | 0.80 | 0.82 |
| 7 | Work fast enough | OD | 10 (25.0) | 5 (12.5) | 5 (12.5) | 14 (35.0) | 5 (12.5) | 0/1 | 1.9 | 0.84 | 0.88 |
| 8 | Finish work on time | OD | 3 (7.5) | 9 (22.5) | 5 (12.5) | 9 (22.5) | 13 (32.5) | 0/1 | 2.5 | 0.90 | 0.81 |
| 9 | Do your work without making mistakes | OD | 1 (2.5) | 5 (12.5) | 3 (7.5) | 17 (42.5) | 13 (32.5) | 0/1 | 2.9 | 0.66 | 0.65 |
| 10 | Satisfy the people who judge your work (a) | OD | 1 (2.5) | 4 (10.0) | 4 (10.0) | 10 (25.0) | 14 (35.0) | 2/5 | 2.5 | 0.78 | 0.77 |
| 11 | Feel a sense of accomplishment in your work (a) | OD | 2 (5.0) | 5 (12.5) | 8 (20.0) | 10 (25.0) | 15 (37.5) | 0/0 | 2.8 | 0.79 | 0.67 |
| 12 | Feel you have done what you are capable of doing | OD | 5 (12.5) | 4 (10.0) | 6 (15.0) | 11 (27.5) | 14 (35.0) | 0/0 | 2.6 | 0.73 | 0.62 |
| 13 | Walk or move around different work locations (for example, going to meetings) (a, b) | FD | 1 (2.5) | 9 (22.5) | 2 (5.0) | 7 (17.5) | 16 (40.0) | 0/5 | 2.5 | 0.82 | 0.74 |
| 14 | Lift, carry, or move objects at work weighing more than 10 pounds | FD | 10 (25.0) | 5 (12.5) | 1 (2.5) | 6 (15.0) | 9 (22.5) | 2/7 | 1.5 | 0.89 | 0.71 |
| 15 | Sit, stand, or stay in one position for longer than 15 min while working | FD | 5 (12.5) | 6 (15.0) | 5 (12.5) | 12 (30.0) | 11 (27.5) | 0/1 | 2.4 | 0.93 | 0.83 |
| 16 | Repeat the same motions over and over again while working | FD | 8 (20.0) | 6 (15.0) | 6 (15.0) | 7 (17.5) | 9 (22.5) | 0/4 | 1.9 | 0.95 | 0.82 |
| 17 | Bend, twist, or reach while working (a) | FD | 8 (20.0) | 5 (12.5) | 7 (17.5) | 7 (17.5) | 12 (30.0) | 0/1 | 2.2 | 0.93 | 0.82 |
| 18 | Use hand-held tools or equipment (for example, a phone, pen, keyboard, computer mouse, drill, hairdryer or sander) (b) | FD | 6 (15.0) | 5 (12.5) | 3 (7.5) | 8 (20.0) | 16 (40.0) | 0/2 | 2.5 | 0.81 | 0.74 |
| 19 | Keep your mind on your work | MD | 2 (5.0) | 5 (12.5) | 5 (12.5) | 15 (37.5) | 13 (32.5) | 0/0 | 2.8 | 0.92 | 0.79 |
| 20 | Think clearly when working | MD | 1 (2.5) | 4 (10.0) | 7 (17.5) | 11 (27.5) | 17 (42.5) | 0/0 | 3.0 | 0.93 | 0.70 |
| 21 | Do work carefully | MD | 2 (5.0) | 5 (12.5) | 4 (10.0) | 11 (27.5) | 18 (45.0) | 0/0 | 3.0 | 0.89 | 0.87 |
| 22 | Concentrate on your work | MD | 1 (2.5) | 4 (10.0) | 7 (17.5) | 13 (32.5) | 15 (37.5) | 0/0 | 2.9 | 0.96 | 0.69 |
| 23 | Work without losing your train of thought (a) | MD | 4 (10.0) | 2 (5.0) | 3 (7.5) | 18 (45.0) | 12 (30.0) | 0/1 | 2.8 | 0.89 | 0.46 |

Table 1 continued

| No. | Items (original version) | Sub-scale | 0 (100 %) | 1 | 2 (50 %) | 3 | 4 (0 %) | n missing / "does not apply to my job" | Mean (scale 0–4) | Correlations item-subscale | Correlations item-total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 24 | Easily read or use your eyes when working | MD | 2 (5.0) | 4 (10.0) | 0 (0.0) | 9 (22.5) | 24 (60.0) | 0/1 | 3.2 | 0.87 | 0.75 |
| 25 | Speak with people in person, in meetings or on the phone | SD | 1 (2.5) | 4 (10.0) | 2 (5.0) | 8 (20.0) | 22 (55.0) | 0/3 | 3.0 | 0.84 | 0.61 |
| 26 | Control your temper around people when working (a) | SD | 1 (2.5) | 3 (7.5) | 6 (15.0) | 14 (35.0) | 16 (40.0) | 0/0 | 3.0 | 0.78 | 0.70 |
| 27 | Help other people to get work done | SD | 2 (5.0) | 6 (15.0) | 2 (5.0) | 9 (22.5) | 19 (47.5) | 0/2 | 2.8 | 0.70 | 0.57 |

WSD work scheduling demands; OD output demands; FD physical demands; MD mental demands; SD social demands
Assessment of internal consistency (Cronbach’s alpha) for each item-subscale and item-total scale (n = 40). April–May 2012
(a) Items with several alternatives or with difficulties in the translation process
(b) Items modified after pre-test

Sample

Forty volunteer patients of both sexes, with a physical (musculoskeletal) and/or a mental (anxiety-depression) health problem with a minimum duration of 1 month, were recruited among outpatients at the orthopedics, rehabilitation and psychiatry clinics of a large public hospital in Barcelona. Patients were between 18 and 65 years old and had different cultural levels. All spoke Spanish as their first language, were able to read and understand what they read, and had worked at least 10 h per week in the last 4 weeks.

Materials

Participants were asked to fill out the Spanish version of the WRFQ on paper and to underline or mark any difficulty on the questionnaire. In addition, they described the questions that were difficult to understand during a 15-min structured interview that was recorded.

Procedure

During the interview each participant was systematically asked about the understandability of the instructions, of each response option and of the 27 items. All comments related to difficulties with any of these were recorded and later reviewed by the expert committee. Possible mistakes were identified and it was verified that the instructions, items and answer choices were understandable. Revisions were made to a specific questionnaire item when 15 % or more of participants described difficulties with that item [19].

Evaluation of the Pre-Final Questionnaire Psychometric Properties

The internal consistency of the total scale and each subscale was evaluated using Cronbach’s alpha, with appropriate values ≥0.70 [25, 26]. Correlations between the subscales, subscale-total, item-subscale and item-total were evaluated, with appropriate values ≥0.46 [27].

The repeatability or stability of the instrument was assessed through test–retest reliability. The WRFQ was administered to the same group of 40 workers at two different time points, test and retest. The retest was conducted after a period ranging from 7 to 15 days. This period was considered long enough to prevent participants from remembering their earlier responses and short enough to prevent changes in the observed phenomenon that could affect repeatability. The intraclass correlation coefficient (ICC) was calculated to assess the test–retest reliability. The stability or repeatability of a subscale or total scale was considered good when the ICC was above 0.70 and very good when it was above 0.90 [26–28].

Face validity is the extent to which a questionnaire, in the opinion of the experts and users, is a logical measure of what it intends to measure. It is usually evaluated empirically through comments from participating experts and users. In our study, this was assessed by the expert committee, analyzing the comments made by participants during the structured interviews.

Content validity measures whether the tool is able to measure most of the construct dimensions. It was also evaluated using an empirical approach, based on judgments from the tool’s original authors (BA), as well as arguments made by the expert committee and a qualitative analysis of the comments made by the participants during the pre-test. We also explored floor and ceiling effects, which occur when a percentage of responses to certain questions cluster at the bottom or the top of the scale. Their presence indicates that the question lacks discriminative ability and that the questionnaire cannot differentiate between high and low scores. Content validity is good when floor and ceiling effects do not exceed 15 % [28]. Averages, ranges and medians of the scores were determined to further describe the distribution of the responses.

Finally, construct validity was assessed using known-groups validity analysis, comparing the results of the subscales in the patient groups with physical and mental illnesses. It was hypothesized that patients with only mental illness would score lower (meaning more disability) on the mental and social demands subscales, and that patients with only physical illness would obtain lower scores on the work scheduling, output and physical demands subscales. Patients with both types of illness (n = 6) were excluded from this comparative analysis. Since the subscale scores did not follow a normal distribution in either group of patients, the hypothesis was evaluated by comparing the medians of each subscale in both groups. Statistical significance was assessed using the non-parametric Mann–Whitney U test.

The protocol of this study was approved by the Ethics Committee of Parc de Salut Mar and it respects all the principles of the Declaration of Helsinki and the Spanish legal regulations on protection of personal data.
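To make the internal consistency measure concrete, the following Python sketch computes Cronbach's alpha from a respondents-by-items matrix using the standard formula, alpha = k/(k − 1) · (1 − Σ item variances / variance of total scores). The function name and the toy data are hypothetical, and the sketch assumes complete cases (no missing answers); it is not the authors' analysis code.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for complete data: one row per respondent, one column per item."""
    k = len(rows[0])  # number of items in the (sub)scale
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical answers of four respondents to a three-item subscale (0-4 coding).
    answers = [[4, 4, 3], [2, 3, 2], [1, 1, 0], [3, 4, 4]]
    alpha = cronbach_alpha(answers)
    print(f"alpha = {alpha:.2f}, acceptable: {alpha >= 0.70}")
```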

Results

The direct translation was carried out without difficulty. However, several challenges were found related to the idiomatic usage of words in items 2 (get going easily), 11 (sense of accomplishment), 23 (train of thought) and 26 (control your temper), which were discussed and agreed with the translators. In addition, items 3–6 (start on your job, extra breaks or rests, stick to a routine, workload), 10 (people who judge), 13 (move around different locations) and 17 (bend) had several translation alternatives and required consideration by the committee of experts to reach a consensus that ensured semantic and idiomatic equivalence of both versions. In item 14 the units of measure were converted from pounds to kilograms.

When the back-translation was compared with the original version, some discrepancies were found in the language equivalence of certain words contained in the instructions and various items. Items 2 (get going easily), 5 (stick to a routine), 11 (sense of accomplishment), 16 (repeat the same motions), 17 (bend, twist or reach while working), 23 (train of thought), 25 (speak with people in person), 26 (control your temper), and 27 (to get work done) had several translation alternatives and required reconsideration by the committee of experts (Table 1).

Lastly, a pre-final questionnaire was consolidated in Spanish spoken in Spain, which guaranteed semantic, idiomatic, conceptual and experiential equivalence with the original questionnaire. Consensus was reached to partially reformulate the last paragraph of the instructions and the wording of items 2, 11, 23, 25, 26 and 27. It was not necessary to modify the rest of the instructions, the response options or the other items.

The pre-final questionnaire was administered to 40 patients. Table 2 describes their socio-demographic characteristics. Comments were analyzed by the committee of experts. Most participants found no difficulty understanding the items. Nine participants (22.5 %) reported that the last paragraph of the instructions was ambiguous, so it was amended, emphasizing that the questions related to “working time”.

Table 2 Participants’ socio-demographic characteristics

| | Total n = 40 | Men n = 15 (37.5 %) | Women n = 25 (62.5 %) |
|---|---|---|---|
| Age in years, mean (SD) | 49.1 (10.0) | 47.9 (8.9) | 49.8 (10.7) |
| Education level, n (%) | | | |
| Low | 13 (32.5) | 7 (46.7) | 6 (24.0) |
| Middle | 15 (37.5) | 6 (40.0) | 9 (36.0) |
| High | 12 (30.0) | 2 (13.3) | 10 (40.0) |
| Job type, n (%) | | | |
| Manual | 17 (42.5) | 6 (40.0) | 11 (44.0) |
| Nonmanual | 11 (27.5) | 5 (33.3) | 6 (24.0) |
| Mixed | 12 (30.0) | 4 (26.7) | 8 (32.0) |
| Working hours/week, mean (SD) | 40.2 (10.7) | 46.1 (9.6) | 36.7 (9.8) |
| Disease type, n (%) | | | |
| Physical | 17 (42.5) | 6 (40.0) | 11 (44.0) |
| Mental | 17 (42.5) | 8 (53.3) | 9 (36.0) |
| Both | 6 (15.0) | 1 (6.7) | 5 (20.0) |
| Disease duration in months, mean (SD) | 34.7 (51.1) | 23.1 (22.4) | 41.6 (61.8) |

Pre-test with the adapted version of the Work Role Functioning Questionnaire (WRFQ) to Spanish spoken in Spain (n = 40). April–May, 2012

Table 3 Pre-test results with the Spanish version of the Work Role Functioning Questionnaire (WRFQ) (n = 40)

| | Valid n (missing/not applicable)* | Mean (a) (SD) | Range | Median | n at floor (0 %), n (%) | n at ceiling (100 %), n (%) | Cronbach’s alpha | Subscale-total correlations |
|---|---|---|---|---|---|---|---|---|
| Work scheduling demands | 39 (1) | 67.7 (27.8) | 5–100 | 75.0 | 0 (0.0) | 3 (7.5) | 0.88 | 0.95 |
| Output demands | 39 (1) | 64.4 (25.8) | 14.3–100 | 67.9 | 0 (0.0) | 1 (2.5) | 0.90 | 0.94 |
| Physical demands | 36 (4) | 59.0 (32.3) | 4.17–100 | 62.5 | 0 (0.0) | 5 (12.5) | 0.95 | 0.88 |
| Mental demands | 40 (0) | 73.9 (26.1) | 0–100 | 79.2 | 1 (2.5) | 9 (22.5) | 0.96 | 0.81 |
| Social demands | 35 (5) | 76.9 (21.1) | 25–100 | 83.3 | 0 (0.0) | 5 (12.5) | 0.56 | 0.83 |
| Total score | 40 (0) | 67.6 (22.7) | 21.3–98.1 | 74.5 | 0 (0.0) | 0 (0.0) | 0.97 | – |

April–May, 2012
* Subscales with more than 20 % of items scoring “does not apply to my job” or missing values were excluded
(a) Each subscale is scored from 0 to 100. Higher scores indicate better work functioning: difficulties all the time 0/100; difficulties none of the time 100/100


Eight participants (20 %) found the expression “difficult”, located at the top of the column containing the items, hard to interpret. After weighing various alternatives, a decision was made to incorporate this expression in each of the possible answers as follows: 0 = was difficult all the time (100 %), 1 = was difficult most of the time, 2 = was difficult half the time (50 %), 3 = was difficult part of the time, 4 = never was difficult (0 %). No participant expressed difficulty with the response option “does not apply to my job”. Ten participants (25 %) had difficulties with item 13 and eight participants (20 %) with item 18. All answered “does not apply to my job” since the examples did not fit their job. The committee of experts decided to delete the examples from these items.

Table 3 shows the average scores for each subscale; higher values indicate less disability at work. The social demands subscale scored the highest (76.9, SD = 21.1) and the physical demands subscale the lowest (59.0, SD = 32.3). The items that most frequently obtained the answer “does not apply to my job” were item 14 (lift, carry, or move objects at work weighing more than 10 pounds), item 13 (walk or move around different work locations, for example, going to meetings) and item 10 (satisfy the people who judge your work).

After reviewing the comments made by participants during the pre-test and resolving them by consensus, the committee of experts drafted the final version of the WRFQ translated and adapted to Spanish spoken in Spain (Appendix 1).

Regarding internal consistency, Cronbach’s alpha was 0.97 for the total scale. All subscales obtained Cronbach’s alpha coefficients above 0.85, except for social demands, which was 0.56. Correlations between the subscales, subscale-total, item-subscale and item-total were all ≥0.46 and considered appropriate [27].

Scale ceiling effects were lowest for output demands (2.5 %) and highest for mental demands (22.5 %), the latter exceeding the 15 % criterion [28] (Table 3).
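The floor and ceiling percentages can be illustrated with a short Python sketch. It assumes subscale scores on the 0 to 100 range, counts respondents scoring exactly 0 or exactly 100, and expresses them as a percentage of all participants (which appears to be how the percentages in Table 3 were derived, although the text does not say so explicitly). The function name and the example data are hypothetical.

```python
from typing import Dict, Optional, Sequence


def floor_ceiling(scores: Sequence[Optional[float]],
                  threshold: float = 15.0) -> Dict[str, object]:
    """Percentage of participants at the floor (0) and ceiling (100) of a 0-100 scale.

    None marks a participant whose subscale was excluded; such participants
    still count in the denominator (assumption of this sketch).
    """
    n = len(scores)
    floor_pct = 100 * sum(1 for s in scores if s == 0) / n
    ceiling_pct = 100 * sum(1 for s in scores if s == 100) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_exceeds": floor_pct > threshold,
        "ceiling_exceeds": ceiling_pct > threshold,
    }


if __name__ == "__main__":
    # Hypothetical sample of 40 scores: 9 at the ceiling, 1 at the floor.
    example = [100.0] * 9 + [0.0] + [50.0] * 30
    print(floor_ceiling(example))
```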


Table 4 Test–retest reliability

| Subscales | Test–retest ICC | 95 % CI* |
|---|---|---|
| Work scheduling demands | 0.92 | (0.85–0.96) |
| Output demands | 0.89 | (0.78–0.94) |
| Physical demands | 0.93 | (0.84–0.97) |
| Mental demands | 0.85 | (0.72–0.92) |
| Social demands | 0.77 | (0.58–0.88) |
| Total scale | 0.94 | (0.83–0.98) |

Intraclass correlation coefficients (ICC). Pre-test of the Spanish version of the Work Role Functioning Questionnaire (WRFQ), April–May 2012
* 95 % CI: 95 % confidence interval

Table 4 shows the results of the test–retest reliability; subscale ICCs ranged between 0.77 and 0.93, and the ICC for the total scale was 0.94.

The expert committee judged the face validity of the questionnaire to be adequate, and the participants appreciated the applicability, usability and understandability of the questionnaire. These aspects were collected in the comments made during the interviews, which led to the conclusion that the questionnaire measures work disability in a logical way.

Content validity was considered adequate according to the criteria and judgment of the authors of the original version of the WRFQ [16–18], the arguments made by the committee of experts during the process of cross-cultural adaptation and the qualitative analysis of participant comments.

Construct validity was likewise reasonable. The median score for the physical demands subscale was significantly lower (30 points) in participants with a physical (musculoskeletal) health problem, and the median score for the mental demands subscale was significantly lower (21 points) in patients with a mental (anxiety-depression) health problem (Table 5). The differences observed in the work scheduling, output and social demands subscales were not statistically significant.
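As an illustration of how a test–retest ICC can be obtained, the sketch below computes a two-way random-effects, absolute-agreement, single-measure coefficient, often written ICC(2,1). The text does not state which ICC form was used, so this choice, the function name and the toy scores are all assumptions of the example.

```python
from typing import Sequence


def icc_2_1(test: Sequence[float], retest: Sequence[float]) -> float:
    """ICC(2,1) for two measurement occasions on the same participants."""
    n, k = len(test), 2
    rows = list(zip(test, retest))
    grand = (sum(test) + sum(retest)) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(test) / n, sum(retest) / n]
    # Two-way ANOVA sums of squares: participants, occasions, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


if __name__ == "__main__":
    # Hypothetical subscale scores (0-100) of five workers at test and retest.
    first = [70.0, 55.0, 90.0, 40.0, 80.0]
    second = [75.0, 50.0, 85.0, 45.0, 80.0]
    print(round(icc_2_1(first, second), 2))
```

An absolute-agreement form penalizes a systematic shift between test and retest, which a consistency form would ignore; which of the two suits a given study depends on its design.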


Table 5 Subscale description by type of health problem (mental or physical)

Subscales                  Median,                  Median,                    Mann–Whitney U test,
                           mental health problem    physical health problem    asymptotic significance (two-sided)
Work scheduling demands    85.0                     65.0                       0.478
Output demands             78.6                     82.1                       0.850
Physical demands           85.0                     55.0                       0.007
Mental demands             75.0                     95.8                       0.018
Social demands             83.3                     87.5                       0.917

Pre-test with the adapted version of the Work Role Functioning Questionnaire (WRFQ) to Spanish spoken in Spain (n = 40). April–May 2012
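The group comparisons in Table 5 rely on the Mann–Whitney U test with asymptotic two-sided significance. The sketch below shows one way such a p-value can be computed, under the assumption of a normal approximation with tie correction and no continuity correction; the function name and the example scores are hypothetical and are not the study data.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided asymptotic p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both groups, remembering group membership (0 = x, 1 = y).
    pooled = sorted((v, g) for g, grp in enumerate((x, y)) for v in grp)

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                       # size of this group of tied values
        mid_rank = (i + 1 + j) / 2      # average of ranks i+1 .. j
        rank_sum_x += mid_rank * sum(1 for _, g in pooled[i:j] if g == 0)
        tie_term += t ** 3 - t
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0                 # all values identical
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return u_x, p


# Hypothetical physical-demands scores for two small groups of patients.
physical_problem = [45.0, 50.0, 55.0, 55.0, 60.0, 70.0]
mental_problem = [75.0, 80.0, 85.0, 85.0, 90.0, 95.0]
u, p = mann_whitney_u(physical_problem, mental_problem)
```

With samples as small as those in a pre-test, an exact p-value may differ noticeably from the asymptotic one, so the approximation should be read with some caution.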


Discussion

This rigorous, stepwise procedure for translation and cross-cultural adaptation of the WRFQ led to the development of a version in Spanish as spoken in Spain that is equivalent to the original English version. Minor changes were made to maximize questionnaire understandability. It was necessary to adjust the wording of the instructions, as happened when the questionnaire was adapted into Canadian French [16], Brazilian Portuguese [17] and Dutch [18]. During the adaptation to Portuguese, a decision was made to incorporate the term "difficult" within each item. In the adaptation to Spanish, this term has been incorporated into each of the response options to facilitate understandability. Several items needed to be changed after the pre-test. The difficulties found with items 2, 6 and 26 are similar to those encountered by Durand et al. [16], Gallasch et al. [17] and Abma et al. [18]. As in those studies, examples were removed from items 13 and 18 because their interpretation could be misleading.

The absence of ceiling and floor effects above 15 % (with the exception of 22.5 % for the ceiling effect of the mental demands subscale) indicates that the questionnaire items have acceptable discriminative ability to distinguish high and low scores, providing evidence of questionnaire content validity [28]. The highest frequency of the response option "does not apply to my job" was obtained for the items in the physical demands subscale, as in other cultural adaptations of the WRFQ [16–18]. A likely cause is that these items describe movements specific to manual work and do not apply to non-manual work, which accounted for 28 % of the sample. The highest ceiling effect, observed for mental demands in our study, is consistent with the results of Durand et al. [16], probably because musculoskeletal health problems have less impact on the ability of workers to handle the mental demands of work.
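The 15 % criterion for floor and ceiling effects mentioned above can be checked with a few lines of code. This is a minimal sketch that assumes subscale scores run from 0 to 100; the function name and the data are hypothetical, chosen only so that 9 of 40 respondents reach the maximum score, which corresponds to 22.5 %.

```python
def floor_ceiling_effects(scores, minimum=0.0, maximum=100.0):
    """Return the percentage of respondents at the lowest and highest score."""
    n = len(scores)
    floor_pct = 100.0 * sum(1 for s in scores if s == minimum) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == maximum) / n
    return floor_pct, ceiling_pct


# Hypothetical subscale scores for 40 respondents, 9 of whom score 100.
scores = [100.0] * 9 + [75.0] * 20 + [50.0] * 11
floor_pct, ceiling_pct = floor_ceiling_effects(scores)

# Flag the subscale when either percentage exceeds the 15 % criterion.
exceeds_criterion = floor_pct > 15.0 or ceiling_pct > 15.0
```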
The internal consistency of the Spanish version of the WRFQ was very good for all subscales except social demands. This result is consistent with those obtained by Durand et al. [16] and Gallasch et al. [17]. All items, except 4, had higher correlations with their own subscale than with the total scale, confirming that the translation and cross-cultural adaptation did not alter the internal consistency of the questionnaire. However, we observed some variability in subject responses to the items of the social demands subscale (Cronbach's alpha of 0.56) and thus, in agreement with Durand et al. [16], we believe that the internal consistency of this subscale should be interpreted with caution. The results of the test–retest reliability are very similar to those obtained by Gallasch et al. [17]. The stability or repeatability of the questionnaire can be considered good for the output, mental and social demands subscales and very good for the physical and work scheduling demands subscales [26–28].

The results show adequate construct validity of the WRFQ. On the one hand, the median scores obtained by participants, all of whom were patients with active health problems, ranged between 62.5 and 83.3 % across subscales, indicating important difficulties in carrying out the demands of their jobs, which is not surprising. On the other hand, as expected, the comparison of scores between the two groups of patients indicates lower scores on the work scheduling and physical demands subscales for those with only physical health problems and, conversely, lower scores on the mental and social demands subscales for patients with only a mental health problem.

One limitation of this study could be the sample size in the pre-test; however, it is consistent with the previous literature. In conclusion, our results confirm that the process used for translation and cross-cultural adaptation of the WRFQ to Spanish spoken in Spain was carried out successfully and indicate good preservation of its psychometric properties.
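As an illustration of the internal-consistency statistic discussed above, Cronbach's alpha for a subscale can be computed from the item variances and the variance of the summed score [25]. The following is a minimal sketch; the function name and the item responses are hypothetical and do not reproduce the study data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one subscale.

    `items` is a list with one inner list per item; each inner list holds
    the responses of all respondents to that item, in the same order.
    """
    k = len(items)
    totals = [sum(responses) for responses in zip(*items)]  # summed score per respondent
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses of five respondents to a three-item subscale.
subscale_items = [
    [0, 1, 2, 3, 4],
    [1, 1, 2, 4, 4],
    [0, 2, 2, 3, 3],
]
alpha = cronbach_alpha(subscale_items)
```

Alpha rises when items vary together across respondents, which is why a subscale with heterogeneous responses, such as the social demands subscale here, can yield a low value.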
Acknowledgments We want to thank Concepción Gómez-Morán, Carlos Enric Delclós, Mª José Román and Cliff Grossman for their professional involvement in the direct and back translation of the WRFQ. Thanks to Julià del Prado, Josefina Pi-Sunyer and Rocío Villar for their kind participation with the translators in the Expert Committee, and to Nuria González, Chelo Sancho and Carmen Sánchez for their collaboration in the distribution and collection of questionnaires, all of them staff of the Occupational Health Service, Parc de Salut MAR (OHS PSMAR), Barcelona. Thanks also to Joan Mirabent (OHS PSMAR), Marta Tejero and Gemma Pidemont for their patient and generous collaboration in the recruitment of patients. Finally, thanks so much to Ram Dulthummon, José Ramada, Borja Ramada and Ángeles Ramada for their generous collaboration in creating the database and assessing its quality. This project has been partially supported by a grant from the Fondo de Investigaciones Sanitarias (FIS: PI12/02556), Instituto de Salud Carlos III, Subdirección General de Evaluación y Fomento de la Investigación, Ministerio de Ciencia e Innovación.

Conflict of interest The authors declare that they have no conflict of interest.

Appendix 1: Work Role Functioning Questionnaire adapted to Spanish spoken in Spain


References

1. Brault MW, Hootman J, Helmick CG, Theis KA, Armour BS. Prevalence and most common causes of disability among adults - United States, 2005. MMWR. 2009;58:421–6.
2. Rice DP, LaPlante MP. Medical expenditures for disability and disabling comorbidity. Am J Public Health. 1992;82:739–41.
3. Dupré D, Karjalainen A (2003) Eurostat, statistics in focus: Employment of disabled people in Europe in 2002, Eurostat theme 3: population and social conditions. Available from: http://epp.eurostat.ec.europa.eu/cache/ITY_OFFPUB/KS-NK-03-026/EN/KS-NK-03-026-EN.PDF.
4. Ross D. Ageing and work: an overview. Occup Med (Lond). 2010;60:169–71.
5. Hairault JO, Langot F, Sopraseuth T. Distance to retirement and older workers employment: the case for delaying the retirement age. J Eur Economic Assoc. 2010;8:1034–76.
6. Macdonald EB, Sanati KA. Occupational health services now and in the future: the need for a paradigm shift. J Occup Environ Med. 2010;52:1273–7.
7. Sampere M, Gimeno D, Serra C, Plana M, Martínez JM, Delclos GL, Benavides FG. Organizational return to work support and sick leave duration: a cohort of Spanish workers with a long-term non-work-related sick leave episode. J Occup Environ Med. 2011;53:674–9.
8. Squires H, Rick J, Carroll C, Hillage J. Cost-effectiveness of interventions to return employees to work following long-term sickness absence due to musculoskeletal disorders. J Public Health (Oxf). 2012;34:115–24.
9. Noben CY, Nijhuis FJ, de Rijk AE, Evers SM. Design of a trial-based economic evaluation on the cost-effectiveness of employability interventions among work disabled employees or employees at risk of work disability: the CASE-study. BMC Public Health. 2012;18(12):43.
10. Amick BC III, Lerner D, Rogers WH, Rooney T, Katz JN. A review of health-related work outcome measures and their uses and recommended measures. Spine. 2000;25:3152–60.
11. Baldwin ML, Johnson WG, Butler RJ. The error of using returns-to-work to measure the outcomes of health care. Am J Ind Med. 1996;29:632–41.
12. Lerner D, Amick BC 3rd, Rogers WH, Malspeis S, Bungay K, Cynn D. The Work Limitations Questionnaire. Med Care. 2001;39:72–85.
13. Lerner DJ, Amick BC 3rd, Malspeis S, Rogers WH. A national survey of health-related work limitations among employed persons in the United States. Disabil Rehabil. 2000;22:225–32.
14. Schmidt LL, Amick BC 3rd, Katz JN, Ellis BB. Evaluation of an upper extremity student-role functioning scale using item response theory. Work. 2002;19:105–16.
15. Roy JS, MacDermid JC, Amick BC 3rd, Shannon HS, McMurtry R, Roth JH, et al. Validity and responsiveness of presenteeism scales in chronic work-related upper-extremity disorders. Phys Ther. 2011;91:254–66.
16. Durand MJ, Vachon B, Hong QN, Imbeau D, Amick BC III, Loisel P. The cross-cultural adaptation of the work role functioning questionnaire in Canadian French. Int J Rehabil. 2004;27:261–8.
17. Gallasch CH, Alexandre NM, Amick B 3rd. Cross-cultural adaptation, reliability, and validity of the work role functioning questionnaire to Brazilian Portuguese. J Occup Rehabil. 2007;17:701–11.
18. Abma FI, Amick BC III, Brouwer S, van der Klink JJ, Bültmann U. The cross-cultural adaptation of the work role functioning questionnaire to Dutch. Work. 2012;43:203–10.


19. Amick BC III, Habeck RV, Ossmann J, Fossel AH, Keller R, Katz JN. Predictors of successful work role functioning after carpal tunnel release surgery. J Occup Environ Med. 2004;46:490–500.
20. Beaton DE, Bombardier C, Guillemin F, Bosi Ferraz M. Guidelines for the process of cross-cultural adaptation of self-reports measures. Spine. 2000;25:3186–91.
21. Guillemin F. Cross-cultural adaptation and validation of health status measures. Scand J Rheumatol. 1995;24:61–3.
22. Hutchinson A, Bentzen N, Konig-Zanhn C. Cross cultural health outcome assessment: a user's guide. The Netherlands: ERGHO; 1996.
23. Alexandre NMC, Guirardello EB. Adaptación cultural de instrumentos utilizados en salud ocupacional. Rev Panam Salud Publica. 2002;11:109–11.
24. Nunnally JC, Bernstein IH. Psychometric theory. 3rd ed. New York: McGraw-Hill; 1994.
25. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika. 1951;16:297–334.
26. Sanchez-Fernandez P, Aguilar de Armas I, Fentelsaz G, Moreno-Casbas MT, Hidalgo-García R. Fiabilidad de los instrumentos de medición en ciencias de la salud. Enferm Clin. 2005;15:227–36.
27. Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. 4th ed. New York: Oxford University Press Inc.; 2008.
28. Terwee CB, Bot SD, de Boer MR, van der Windt DA, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60:34–42.


4. PAPER # 3

Reliability and validity of the Work Role Functioning Questionnaire (Spanish version). [Submitted for peer-review].

TITLE:

Reliability and validity of the Work Role Functioning Questionnaire (Spanish version).

AUTHORS:

José M Ramada Rodilla, MD, MSc 1,2,3
Consol Serra Pujadas, MD, PhD 1,2,3
Benjamin C Amick, PhD 4,5
Femke I Abma, PhD 6
Juan R Castaño Asins, MD 7
Gemma Pidemunt Moli, MD, PhD 8
Ute Bültmann, PhD 6
George L Delclós Clanchet, MD, MPH, PhD 1,3,4

AFFILIATIONS:

1 Centro de Investigación en Salud Laboral (CiSAL), Universidad Pompeu Fabra, Barcelona, Spain.
2 Servicio de Salud Laboral, Parc de Salut MAR, Barcelona, Spain.
3 CIBER de Epidemiología y Salud Pública (CIBERESP).
4 Southwest Center for Occupational and Environmental Health, The University of Texas School of Public Health. Houston, Texas, USA.
5 Institute for Work & Health. 80 University Avenue, Toronto, Ontario, Canada.
6 Department of Health Sciences, Work & Health, University Medical Center Groningen, University of Groningen. Groningen, The Netherlands.
7 Psychiatry Service. Parc de Salut MAR. Hospital del Mar. Barcelona, Spain.
8 Orthopedic Surgery and Traumatology Service. Parc de Salut MAR. Hospital del Mar. Barcelona, Spain.

CORRESPONDING AUTHOR:

José Mª Ramada Rodilla
CiSAL - Universidad Pompeu Fabra
Dr. Aiguader, 88
08003-Barcelona
E-mail: [email protected]
Tel. 932483066

ABSTRACT

Purpose: The Work Role Functioning Questionnaire was recently cross-culturally adapted to Spanish, achieving satisfactory psychometric properties. We examined the reliability and validity of the adapted Spanish version (WRFQ-SpV) in a general working population with and without (physical and mental) health issues to evaluate its measurement properties.

Methods: A cross-sectional study was conducted among active workers. For reliability, we calculated Cronbach's alpha to assess 'internal consistency', and the standard error of measurement (SEM) to evaluate 'measurement error'. We assessed 'structural validity' through confirmatory factor analyses and 'construct validity' by means of hypothesis testing. The consensus-based standards for the selection of health status measurement instruments (COSMIN) taxonomy was used in the design of the study.

Results: A total of 455 workers completed the questionnaire. It showed excellent internal consistency (α=0.98). The SEM for the overall scale was 7.10. The original five-factor structure reflected fair dimensionality of the construct (Chi-square, 1445.8; 314 degrees of freedom; RMSEA=0.08; CFI > 0.95 and WRMR > 0.90). For construct validity, all hypotheses were confirmed, differentiating groups with different jobs, health conditions and ages. Moderate to strong correlations were found between the WRFQ-SpV and a related construct (work ability).

Conclusions: Our study provides evidence of the reliability and validity of the WRFQ-SpV to measure health-related work functioning in day-to-day practice and research in occupational health care and the rehabilitation of disabled workers. It should be useful to monitor improvements in work functioning after implementing rehabilitation and/or accommodation programs. Longitudinal studies are needed to assess the responsiveness of the questionnaire.

Key terms: validity; reliability; work-functioning instrument; measurement instrument; psychometric properties; self-report.

INTRODUCTION

Increasing life expectancy in developed countries and delayed retirement age are increasing the overall age of the workforce. Aging workers are more likely to have chronic health issues and a certain degree of disability, but most are able to maintain job competence with some workplace adjustments and/or rehabilitation programs [1-4]. Also, there is evidence that work has positive health effects when conditions are reasonably acceptable; therefore, promoting an active working life is advisable [5,6].

Quality work functioning tools are required to obtain valid measurements to evaluate the impact of health on work functioning and to monitor the extent to which workers improve their ability to meet job demands after a rehabilitation or accommodation program. This will enable healthcare professionals, human resources managers, employers and other stakeholders to support an active and healthy labor force. Moreover, valid outcome measures are needed to assess how workers function at work over the course of their job careers and the existing continuum between working successfully at one extreme and disability and work absence at the other [7].

There are a number of tools to measure constructs related to self-perceived work
functioning, including the Functional Status Index [8], the Work Productivity and
Activity Impairment Questionnaire [9], the Health and Labor Questionnaire [10],
the Endicott Work Productivity Scale [11], the Work Ability Index [12], the
Role-based Performance Scale [13], the Stanford Presenteeism Scale [14], the Work
Instability Scale [15], and the Work Activity Limitations Scale [16].

Since 'being present at work without being able to meet job demands'
(presenteeism) [17] is not the same as 'performing work demands successfully', a
series of work-role specific functioning questionnaires were developed in the
2000s. Among those, there are different versions of the Work Limitations
Questionnaire [18] and the Work Role Functioning Questionnaire (WRFQ) [19].



problems. This questionnaire is a generic instrument conceptually developed to
represent a wide range of health conditions and work demands. Furthermore, it is
freely available in the literature for professionals and researchers. Recently, it has
been successfully translated, adapted and validated for use in different
contexts (e.g. Canadian French [20], Brazilian Portuguese [21], Dutch [7,22] and
Spanish spoken in Spain [23]). These versions have shown good psychometric
properties in different populations.

Before using an adapted instrument it is important to assess its measurement
properties [24]. Recent reviews have shown that health-related work outcome
measures and health-related work functioning instruments need better validation
studies to make them more meaningful for researchers, practitioners and patients
[25,26]. The cross-cultural adaptation of the WRFQ to Spanish was recently
carried out, and the questionnaire showed good test-retest reliability (intraclass
correlation coefficients, ICCs, between 0.77 and 0.93 for all subscales) [23], but
further assessment of the validity and reliability of the questionnaire in a larger
sample was recommended.

Therefore, the objective of this study was to examine the reliability and validity of
the Spanish version of the WRFQ (WRFQ-SpV) in a general working population of
Barcelona (Spain), with and without (physical and mental) health issues.

METHODS

Procedures and sample characteristics

After carrying out the cross-cultural adaptation of the WRFQ to Spanish spoken in
Spain [23], it was necessary to assess its reliability and validity in a larger sample
so that it could be used in both occupational health and rehabilitation settings.
Hence, a cross-sectional study was conducted among active workers of a general
working population of Barcelona (Spain). The consensus-based standards for the
selection of health status measurement instruments (COSMIN) taxonomy was
used in the study design [27-29].

Participants were recruited at a large public hospital in Barcelona, among patients,
persons accompanying patients, hospital workers and other workers who were
carrying out different duties at the hospital (ambulance drivers, bartenders,
kitchen and cleaning staff). Patients were recruited through the outpatient services
of psychiatry, physical medicine and rehabilitation, orthopedic surgery and
traumatology. The inclusion criteria were: 1) active workers of both sexes, working
at least 10 hours per week in the past four weeks, 2) aged 18 years or older, and
3) able to read and understand Spanish (the language of the questionnaire).
Participants were excluded if they had plans to stop working within the following
six months.

The study protocol and the informed consent process were reviewed and approved
by the Clinical Research Ethical Committee of the Parc de Salut Mar (Barcelona).
All participants received information about the study purpose and signed the
informed consent to participate in it.

Measures

The WRFQ-SpV is a self-administered questionnaire containing 27 items grouped
into 5 subscales reflecting different work demands: work scheduling, output,
physical, mental and social demands [23]. The recall period is four weeks, and
each subscale measures the percentage of time in a working day that the
employee has difficulty performing those demands. Response options are on a
five-point scale: 0=all of the time (100%), 1=most of the time, 2=half of the time
(50%), 3=some of the time and 4=none of the time (0%), plus a sixth option,
5=does not apply to my job. For each subscale and for the overall scale, item
scores were summed, divided by the number of items included in the subscale (or
the overall scale), and then multiplied by 25 to obtain scores ranging from 0%
(difficulty all the time) to 100% (no difficulty at any time). Responses of "does not
apply to my job" were transformed to missing values. Scales and/or subscales
containing more than 20% missing values were set to missing.
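The scoring rule described above can be illustrated with a minimal Python sketch. The function name `wrfq_score` and its constants are hypothetical and are not part of the WRFQ materials. The sketch also assumes that, once "does not apply" answers are recoded as missing, the sum is divided by the number of items actually answered; the text does not state how the divisor is handled when items are missing.

```python
from typing import Optional, Sequence

NOT_APPLICABLE = 5          # "does not apply to my job", recoded as missing
MAX_MISSING_PERCENT = 20    # more than 20% missing items -> score set to missing


def wrfq_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Return a 0-100 score for one subscale (or the overall scale).

    `responses` holds one code per item: 0-4 for valid answers,
    5 for "does not apply to my job", None for an unanswered item.
    Returns None when the score has to be set to missing.
    """
    if not responses:
        return None
    # Recode "does not apply" (5) and blanks (None) as missing.
    valid = [r for r in responses if r is not None and r != NOT_APPLICABLE]
    for r in valid:
        if r not in (0, 1, 2, 3, 4):
            raise ValueError(f"unexpected response code: {r}")
    n_missing = len(responses) - len(valid)
    # Integer comparison avoids floating-point trouble at exactly 20%.
    if n_missing * 100 > MAX_MISSING_PERCENT * len(responses):
        return None
    # Assumption: average over answered items, then rescale 0-4 to 0-100.
    return sum(valid) / len(valid) * 25


# One "does not apply" out of five items is exactly 20%, so a score is returned.
print(wrfq_score([4, 3, 2, 5, 4]))
# Two missing out of five items exceeds 20%, so the score is missing.
print(wrfq_score([4, 5, 5, 2, 3]))
```

With this rule, a respondent reporting no difficulty on any item scores 100 and one reporting difficulty all of the time scores 0.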

All participants were invited to complete the WRFQ-SpV on paper, providing
self-reported information on age, gender, level of education (primary, secondary,
higher), job type (manual, non-manual, mixed), working hours and primary health
condition (none, musculoskeletal, mental, others).

Three single items of the Work Ability Index (WAI) [12] were included in the survey
for a convenience subsample of participants who voluntarily agreed to answer
these items. The first was the overall item 'current work ability compared with
the life-time best', with a possible score of 0=completely unable to work to
10=work ability at its best. Recent studies showed that this overall single item
correlates highly with the overall WAI score [30] and also showed the convergent
validity and the similarity in results between the overall WAI scores and the scores
of the overall single item of the WAI in large samples of participants [31]. Also,
an increasing number of studies use the overall single item of the WAI to
assess 'work ability' in different populations [7,30,32,33]. The other two items
measure work ability in relation to physical and mental job demands, with a
possible score of 1=very poor to 5=very good, and are questions already validated
in the original version of the questionnaire [12].

Reliability assessment:

Reliability is defined as the degree to which the measurement is free from
measurement error [27], and can also be defined as the extent to which scores for
participants who have not changed are the same for repeated measurements under
several conditions [35]: 1) using different sets of items from the same multi-item
measurement instrument (internal consistency); 2) over time (test-retest reliability);
3) by different raters on the same occasion (inter-rater reliability) or 4) by the same
raters on different occasions (intra-rater reliability). The COSMIN taxonomy [27,35]
also considers measurement error as an aspect of reliability.

Validity assessment:

Validity of a questionnaire is defined in the literature as the degree to which an
instrument truly measures the construct it purports to measure. In general, three
different types of validity can be distinguished: content validity, criterion validity
and construct validity, and within these three main types of validity there are some
subtypes [35].

Content validity focuses on whether the content of the instrument corresponds with
the construct that the instrument measures, with regard to relevance and
comprehensiveness. This type of validity is frequently assessed by means of a
systematic empirical procedure in which the authors of the questionnaire, a panel of
experts and a sample of the target population participate. It was already assessed
in our previous manuscript about the cross-cultural adaptation of the Work Role
Functioning Questionnaire (WRFQ), rigorously following the recommendations in
the literature [23].

Criterion validity can be assessed only in situations in which there is a gold
standard for the construct to be measured, and refers to how well the scores of the
measurement instrument agree with the scores obtained with the gold standard.
Since 'Work Functioning' is a construct that has no gold standard, this type of
validity cannot be assessed for the Work Role Functioning Questionnaire (WRFQ).

Construct validity should be evaluated in those situations in which there is no gold
standard, and refers to whether the instrument provides the expected scores,
based on existing knowledge about the construct [35]. An international
consensus of experts [27-29] recommends assessing construct validity by
evaluating 'cross-cultural validity', which we already did in our previous
manuscript [23]; 'structural validity', which we assessed by means of a
Confirmatory Factor Analysis (CFA); and 'hypotheses testing', which we carried
out by testing seven hypotheses.

Statistical analysis

WRFQ-SpV mean scores, standard deviations (SD), median scores and ranges
were calculated. Floor and ceiling effects were also explored. These effects occur
when more than 15% of the participants' responses to a certain question cluster at
the top or the bottom of the scale [34]. Since the original version of the WRFQ was
developed for a working population with health problems [19], and our population
contains a percentage of participants declaring no health issues, we carried out a
sensitivity analysis of floor and ceiling effects, restricting the sample to only those
participants reporting health problems, to explore whether there were differences in
the presence of these effects due to the characteristics of the sample.
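As an illustration of the 15% criterion, the following Python sketch computes the share of responses at the lowest and highest possible value and flags an effect when that share exceeds the threshold. The function name, the dictionary keys and the example data are hypothetical; the default bounds of 0 and 100 assume the check is run on transformed scores, and other bounds (for example 0 and 4 for a single item) can be passed instead.

```python
from typing import Dict, Optional, Sequence


def floor_ceiling_effects(
    scores: Sequence[Optional[float]],
    minimum: float = 0.0,
    maximum: float = 100.0,
    threshold: float = 0.15,
) -> Dict[str, object]:
    """Share of non-missing scores at the scale minimum and maximum.

    An effect is flagged when MORE than `threshold` (15% by default)
    of the responses sit at the bottom (floor) or top (ceiling).
    """
    valid = [s for s in scores if s is not None]
    if not valid:
        raise ValueError("no non-missing scores to evaluate")
    floor_share = sum(1 for s in valid if s == minimum) / len(valid)
    ceiling_share = sum(1 for s in valid if s == maximum) / len(valid)
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# Hypothetical data: 4 of 20 respondents (20%) report the maximum score.
example = [100.0] * 4 + [50.0] * 16
print(floor_ceiling_effects(example))
```

Running the same function on the full sample and on the subsample reporting health problems reproduces the kind of sensitivity comparison described above.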

Participant scores were presented by job type (manual, non-manual, mixed),
reported health issues (none, physical, mental) and age groups (18-35 years,
36-45 years, 46-55 years, 56-65 years), assessing the statistical significance of
the differences by means of the Kruskal-Wallis H test (to compare median scores)
and analysis of variance (ANOVA) (to compare mean scores). Post-hoc pairwise
analyses (comparing median or mean scores for each pair of groups) were
performed to determine which group or groups were responsible for significant
differences. When comparing median scores between two groups, the Mann-Whitney
test for two independent samples was used, and when comparing mean scores
between two groups, t-tests were used.
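For readers unfamiliar with the Kruskal-Wallis H test, the statistic can be sketched in a few lines of Python. This is only an illustration of the textbook formula with the usual correction for ties, not the SPSS procedure used in the study. The function name and the example groups are hypothetical, and the p-value (obtained by comparing H with a chi-squared distribution with k - 1 degrees of freedom) is deliberately left out.

```python
from collections import Counter
from typing import Sequence


def kruskal_wallis_h(*groups: Sequence[float]) -> float:
    """Kruskal-Wallis H statistic for two or more independent groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = [value for group in groups for value in group]
    n_total = len(pooled)
    counts = Counter(pooled)

    # Assign every distinct value the average of the ranks it occupies.
    rank_of = {}
    position = 0
    for value in sorted(counts):
        tied = counts[value]
        rank_of[value] = position + (tied + 1) / 2
        position += tied

    # H = 12 / (N (N + 1)) * sum(R_j^2 / n_j) - 3 (N + 1)
    rank_term = sum(
        sum(rank_of[v] for v in group) ** 2 / len(group) for group in groups
    )
    h = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    tie_sum = sum(t ** 3 - t for t in counts.values())
    correction = 1 - tie_sum / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Hypothetical scores for three groups, e.g. three job types.
print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

Larger values of H indicate that the rank sums of the groups differ more than would be expected if all groups came from the same distribution.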

Internal consistency was assessed using Cronbach's alpha coefficients, considering
values ≥ 0.70 appropriate [34]. The standard error of measurement (SEM) was
calculated for a stable subgroup of participants (n=40) who completed the
questionnaire twice in similar conditions, within an interval that varied from 7 to 15
days [35]. This subgroup was composed of the first 40 participants
of the study who completed the first round and agreed to complete the
questionnaire a second time within this interval.
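Cronbach's alpha can be computed directly from the item responses. The sketch below uses the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score), with sample variances; the function name and the small data set are hypothetical, and the study itself obtained these coefficients with SPSS. Rows are respondents and columns are the items of one subscale, with no missing values.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for complete data (rows = respondents, columns = items)."""
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in rows):
        raise ValueError("all respondents must answer the same items")

    # Variance of each item across respondents.
    item_variances = [variance(column) for column in zip(*rows)]
    # Variance of the summed score across respondents.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total score has zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of four respondents to a two-item subscale.
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 4], [4, 3]])
print(round(alpha, 2), alpha >= 0.70)
```

The comparison with 0.70 in the last line mirrors the acceptability criterion stated above.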

A CFA was conducted to analyze the structural validity of the WRFQ-SpV, testing
whether data collected in this general working population (N=455) had an
adequate fit to the predetermined five-factor model structure defined by the
authors of the original questionnaire [19]. A four-factor model structure was also
tested because the Work Limitations Questionnaire [18], designed to measure the
on-the-job impact of chronic health problems, has a structure with four factors (one of
them named mental-interpersonal) and earlier studies [20,21,23] recommended
caution when interpreting the internal consistency of the social demands subscale.
Thus, we hypothesized it might be necessary to collapse the subscales of mental
and social demands into a single factor of psychosocial demands with seven
items.

Following recommendations in the literature regarding CFA, we did not use
standard maximum likelihood theory (applicable to continuous variables). Instead,
we used robust categorical least squares (applicable to categorical variables),
because the observed variables are measured on a Likert scale and
are approximately symmetrical [36-38].

Rhemtulla [36] suggests that when the response options have at least five
categories, which is the case for the WRFQ, the CFA could also be
performed by applying “the method of the standard theory of maximum likelihood”,
treating these variables as if they were continuous (although we would be at the limit of
acceptance of this method). To verify the possible existence of differences
depending on the method, calculations were performed applying both methods.

Chi-squared tests for goodness of fit, the root mean square error of approximation
(RMSEA), the comparative fit index (CFI) and the weighted root mean square residual
(WRMR) were used to evaluate the models. Reference values for RMSEA were ≤ 0.05,
indicating close fit; between 0.06 and 0.08, fair fit; and between 0.09 and 0.1,
mediocre fit. Reference values for acceptance were CFI ≥ 0.95 and WRMR > 0.90
[39].
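To make the RMSEA reference values concrete, the sketch below computes RMSEA from a chi-squared statistic and labels the result using the bands listed above. It rests on several assumptions that are not taken from the study: the formula is the common single-group expression sqrt(max(chi2 - df, 0) / (df * (N - 1))), whereas some programs use N instead of N - 1 and robust estimators may apply further adjustments; values falling between the stated bands are assigned to the next band up; and the label "poor" for values above 0.10 is added here for completeness. Function names and input values are hypothetical.

```python
from math import sqrt


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA from a model chi-squared, its degrees of freedom and sample size."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n greater than 1")
    # Assumed formula: sqrt(max(chi2 - df, 0) / (df * (n - 1))).
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def rmsea_fit_label(value: float) -> str:
    """Label an RMSEA value using the reference bands quoted in the text."""
    if value <= 0.05:
        return "close"
    if value <= 0.08:
        return "fair"
    if value <= 0.10:
        return "mediocre"
    return "poor"  # assumption: label for values above the quoted bands


# Hypothetical model: chi-squared of 100 on 50 degrees of freedom, N = 101.
value = rmsea(chi2=100.0, df=50, n=101)
print(round(value, 3), rmsea_fit_label(value))
```

A chi-squared value at or below its degrees of freedom gives an RMSEA of zero, which is why the numerator is truncated at zero.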

Item-subscale, item-total, inter-subscale and subscale-total correlations were
evaluated using Pearson’s correlation coefficient (r), considering r ≥ 0.40 as
evidence of moderate or strong correlations [40,41].

Construct validity was assessed by means of hypotheses testing. The significance of
the differences among groups was tested using the non-parametric Kruskal-Wallis
H test when comparing differences among median scores and analysis of
variance (ANOVA) when differences among mean scores were compared.
Correlations between constructs were assessed using Pearson’s correlation
coefficient (r), interpreting r < 0.4 as ’weak’, 0.4 ≤ r ≤ 0.7 as ’moderate’ and r > 0.7 as
’strong’ [41].
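The interpretation rule for correlations can be expressed as a short Python sketch. Pearson's r is computed from its definition, and the cut-offs are those given above. Applying the cut-offs to the absolute value of r, so that negative correlations are graded by magnitude, is an assumption made here. The function names and the example data are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cross / sqrt(ss_x * ss_y)


def correlation_strength(r: float) -> str:
    """Grade a correlation as weak, moderate or strong."""
    magnitude = abs(r)  # assumption: grade negative correlations by size
    if magnitude < 0.4:
        return "weak"
    if magnitude <= 0.7:
        return "moderate"
    return "strong"


# Hypothetical paired scores, e.g. a WAI item and a WRFQ subscale score.
r = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(r, 2), correlation_strength(r))
```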

The basic principle of construct validation by means of hypotheses testing is that
hypotheses are formulated about differences in the instrument scores between
subgroups of participants or about the relationships of the scores of the instrument
under study with scores on other similar or dissimilar measuring tools [35].
Therefore, seven hypotheses were formulated to assess construct validity:

Hypothesis 1, addressing health issues: 1a) Participants without health issues
report higher scores on the overall scale of the WRFQ than those with health
issues; 1b) Participants with physical health issues report the lowest score on the
subscale of physical demands; 1c) Participants with mental health issues report
the lowest score on the subscale of mental demands.

Hypothesis 2, addressing job types: Participants with physical health issues and
manual jobs report a lower score on the WRFQ subscale of physical demands than
those with physical health issues and non-manual or mixed jobs.

Hypothesis 3, addressing correlation between WRFQ scores and scores of a
related construct (work ability): 3a) There are moderate to strong correlations
between the score of the overall work ability item of the WAI (that measures a
related construct) and the overall score of the WRFQ; 3b) There are moderate to
strong correlations between the scores of the mental and physical demands items
of the WAI and those of the subscales of physical and mental demands of the
WRFQ.

Hypothesis 4, addressing age: Consistent with other studies finding that both
chronological and functional age are associated with a decrease in work ability
and/or work outcomes [42-46], there is a trend in the overall scores of the WRFQ
showing worse work functioning with increasing age.

All analyses were performed with SPSS (Version 15.0. Chicago, IL; 2006) and
Mplus (Version 7. Los Angeles, CA; 2012).

RESULTS

Sample characteristics. Four hundred fifty-five participants completed the
WRFQ-SpV and were included in the analyses. All were active employees working an
average of 39 hours per week (SD=8.5), with a mean age of 42 years (SD=11) and with
different levels of education, job types and health issues (Table 1). Compared with
the general Spanish working population, women and participants with higher
educational level were overrepresented [47]. A subgroup of 181 participants also
completed the WAI items [Supplementary materials (1)].


Table 1. Participants' characteristics.

                                  Total          Participants with       Participants without
                                  (n=455)        health issues (n=299)   health issues (n=156)
Age (years), mean (SD)            42.1 (11.1)    43.7 (10.8)             39.0 (11.0)
Educational level, n (%)
  Low                             73 (16.0)      61 (20.4)               12 (7.7)
  Middle                          157 (34.5)     121 (40.5)              36 (23.1)
  High                            225 (49.5)     117 (39.1)              108 (69.2)
Job type, n (%)
  Manual                          111 (24.4)     81 (27.1)               30 (19.2)
  Non-manual                      125 (27.5)     82 (27.4)               43 (27.6)
  Mixed                           218 (47.9)     136 (45.5)              83 (53.2)
Working hours/week, mean (SD)     38.7 (8.5)     38.8 (7.8)              38.7 (9.7)
Health issue type, n (%)
  None                            156 (34.3)     0 (0.0)                 156 (100.0)
  Physical                        139 (30.5)     139 (46.5)              0 (0.0)
  Mental health                   125 (27.5)     125 (41.8)              0 (0.0)
  Others                          35 (7.7)       35 (11.7)               0 (0.0)
[row label missing]               13.0 (27.7)    19.9 (32.2)             0 (0.0)

Supplementary materials (1). Work Ability Index (WAI) scores obtained in a convenience subsample of participants (n=181).

Extended survey with WAI              Total          Men              Women
                                      (n=181)        n=71 (39.2%)     n=110 (60.8%)
WAI overall-item (a), mean (SD)       7.6 (2.1)      7.6 (2.1)        7.7 (2.0)
WAI physical demands (b), mean (SD)   3.8 (1.0)      3.7 (1.0)        3.8 (1.0)
WAI mental demands (b), mean (SD)     3.9 (1.2)      3.9 (1.2)        3.8 (1.2)

(a) Single item question of the work ability index (scale 0-10).
(b) Single item question of the work ability index (scale 0-5).

Table 2 shows the mean, SD and median scores for each WRFQ-SpV subscale and the overall scale. Higher values indicate better work functioning (less disability at work). The mental and social demands subscales had the highest mean and median scores, and the output demands subscale had the lowest.

Floor effects were not found for any subscale, but ceiling effects were found for the subscales of work scheduling (20%), mental (29%) and social demands (31%), exceeding the 15% criterion [34]. A sensitivity analysis restricted to participants reporting health problems (n=299; 66% of the sample) also showed ceiling effects for the same subscales.
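The floor and ceiling check described above can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in the study: the function names, the 0-100 score range and the example scores are assumptions made for illustration only. The sketch computes the share of respondents at the lowest and highest possible score and compares each share with the 15% criterion.

```python
from typing import Sequence, Tuple


def floor_ceiling_share(
    scores: Sequence[float],
    minimum: float = 0.0,    # assumed lowest possible subscale score
    maximum: float = 100.0,  # assumed highest possible subscale score
) -> Tuple[float, float]:
    """Return the fractions of respondents at the minimum and maximum score."""
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    floor_share = sum(1 for s in scores if s == minimum) / n
    ceiling_share = sum(1 for s in scores if s == maximum) / n
    return floor_share, ceiling_share


def exceeds_criterion(share: float, criterion: float = 0.15) -> bool:
    """True when the share of respondents is above the 15% criterion."""
    return share > criterion


# Hypothetical subscale scores for ten respondents (not study data).
example_scores = [100, 100, 87.5, 75, 100, 62.5, 50, 93.75, 100, 81.25]
floor_share, ceiling_share = floor_ceiling_share(example_scores)
print(f"floor: {floor_share:.0%}, ceiling: {ceiling_share:.0%}")
print("ceiling effect:", exceeds_criterion(ceiling_share))
```

In this hypothetical example four of ten respondents score the maximum, so the ceiling share is above the criterion while the floor share is not.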

Reliability assessment: The SEMs were 7.1 for the overall score, 8.5 for work scheduling, 8.9 for output, 8.6 for physical, 10.6 for mental and 13.3 for social demands [Supplementary materials (2)]. Cronbach's alpha coefficients were 0.98 for the overall scale and above 0.81 for all subscales (table 2).
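As a rough illustration of the two reliability statistics reported here, the following Python sketch computes Cronbach's alpha from an item-by-respondent matrix and a standard error of measurement (SEM) from a score SD and a reliability coefficient. The relation SEM = SD × sqrt(1 − reliability) is one common definition and is an assumption here; this section does not state which SEM estimator the study used, and other estimators (for example, based on variance components) give different values. All names and data below are hypothetical.

```python
from math import sqrt
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for rows of respondents and columns of items."""
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("at least two items are required")
    # Sample variance of each item across respondents.
    item_variances: List[float] = [
        variance([row[j] for row in responses]) for j in range(k)
    ]
    # Sample variance of the summed score across respondents.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def sem_from_reliability(sd: float, reliability: float) -> float:
    """SEM under the assumed definition SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


# Hypothetical responses: three respondents, three perfectly consistent items.
example_responses = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(example_responses)
sem = sem_from_reliability(sd=10.0, reliability=0.75)
print(f"alpha = {alpha:.2f}, SEM = {sem:.1f}")
```

With perfectly consistent items alpha equals 1, and an SD of 10 with a reliability of 0.75 gives an SEM of 5 under the assumed formula.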

Structural validity assessment: Fit was fair for the five-factor model using the robust categorical least squares method for categorical variables (Chi-square, 1285.8; 314 degrees of freedom, p