Inertial Measurement Unit Systems for Classifying Compound Lower Limb Exercises: Development and Evaluation

Martin O’Reilly
13206487

This thesis is submitted to University College Dublin in fulfilment of the requirement for the degree of Doctor of Philosophy

School of Public Health, Physiotherapy and Sports Science
Head of School: Prof. Giuseppe De Vito

Principal Supervisor: Prof. Brian Caulfield
Secondary Supervisor: Prof. Tomás Ward
Doctoral Studies Panel Members: Dr. Eamonn Delahunt & Dr. Matthew Patterson

Submitted: August 2017

Table of Contents

Thesis Abstract
Statement of Original Authorship
Role of Candidate and Collaborators
List of Publications
Abbreviations
Acknowledgements

SECTION A: BACKGROUND TO THESIS
Chapter 1 - Tracking Compound Lower Limb Exercises: An Introduction to Relevant Theoretical Constructs and Thesis Objectives
1.0. Introduction
1.1. The Benefits of Compound Lower Limb Exercises and Difficulties in Achieving Them
1.2. Assessing Compound Lower Limb Exercises: Current Practice
1.3. Exercise Assessment with IMUs
1.4. IMU Based Exercise Biofeedback Systems
1.5. Aims and Objectives
1.6. Data Collection and Participants
1.7. References

Chapter 2 - Wearable Inertial Sensor Systems for Lower Limb Exercise Detection, Evaluation and Feedback: A Systematic Review
2.0. Abstract
2.1. Introduction
2.2. Methods
2.2.0. Literature Search Strategy and Study Selection Process
2.2.1. Data Extraction Process
2.2.2. Assessment of Study Quality
2.3. Results
2.3.0. Database Search and Paper Lists
2.3.1. Sensor Set-ups
2.3.2. Exercises Investigated Versus Study Design
2.3.3. Qualitative Reviews
2.4. Discussion
2.4.0. Sensor Set-ups
2.4.1. Measurement Validation Studies
2.4.2. Exercise Detection Systems
2.4.3. Movement Classification Systems
2.4.4. Studies which Evaluate Feedback
2.4.5. User Evaluation Studies
2.4.6. Review Limitations
2.5. Conclusion
2.6. References

SECTION B: ESTABLISHING THE EFFECTIVENESS OF VARIOUS IMU SET-UPS AND CLASSIFICATION APPROACHES IN ASSESSING LOWER LIMB EXERCISES
SECTION B.1: EXERCISE DETECTION
Chapter 3 - Technology in Strength and Conditioning Tracking Lower Limb Exercises with Wearable Sensors
3.0. Abstract
3.1. Introduction
3.2. Methods
3.2.0. Experimental Approach to the Problem
3.2.1. Participants
3.2.2. Procedures
3.2.3. Signal Processing and Statistical Analysis
3.3. Results
3.4. Discussion
3.5. Practical Applications
3.6. References

SECTION B.2: CLASSIFYING EXERCISE TECHNIQUE WITH LARGE, BALANCED DATA SETS OF DELIBERATELY INDUCED TECHNIQUE DEVIATIONS
Chapter 4 - Classification of Lunge Biomechanics with Multiple and Individual Inertial Measurement Units
4.0. Abstract
4.1. Introduction
4.2. Methods
4.2.0. Participants
4.2.1. Exercise Technique and Deviations
4.2.2. Experimental Protocol
4.2.4. Data Analysis
4.3. Results
4.3.0. Classification with Full Feature Set
4.3.1. Feature Importance
4.3.2. Classification with Top-Ranked Features
4.3.3. Number of Trees Used Versus Out of Bag Error
4.4. Discussion and Implications
4.5. Conclusion
4.6. References

Chapter 5 - Classification of Bodyweight Squat Biomechanics with Multiple and Individual Inertial Measurement Units
5.0. Abstract
5.1. Introduction
5.2. Methods
5.2.0. Experimental Approach to Problem
5.2.1. Participants
5.2.2. Procedures
5.2.3. Signal Processing and Statistical Analysis
5.3. Results
5.4. Discussion
5.5. Practical Implications
5.6. References

SECTION B.3: CLASSIFYING EXERCISE TECHNIQUE WITH SMALL, IMBALANCED DATA SETS OF NATURALLY OCCURRING TECHNIQUE DEVIATIONS
Chapter 6 - Classification of Single Leg Squat Biomechanics with Multiple and Individual Inertial Measurement Units
6.0. Abstract
6.1. Introduction
6.2. Objectives
6.3. Methods
6.3.0. Participants
6.3.1. Experimental Protocol
6.3.2. Data Labelling
6.3.3. Data Analysis
6.4. Results
6.5. Discussion
6.6. Practical Implications
6.7. Conclusions
6.8. References

Chapter 7 - Classification of Deadlift Biomechanics with Multiple and Individual Inertial Measurement Units
7.0. Abstract
7.1. Introduction
7.2. Methods
7.2.0. Experimental Approach to Problem
7.2.1. Participants
7.2.2. Procedures
7.2.3. Data Labelling
7.2.4. Signal Processing
7.2.5. Classification
7.3. Results
7.3.1. Data Set
7.3.2. Experiment 1: Induced Technique Deviations
7.3.3. Experiment 2: Naturally Occurring Technique Deviation
7.4. Discussion
7.6. References

Chapter 8 - Classification of Barbell Squat Biomechanics with Multiple and Individual Inertial Measurement Units
8.0. Abstract
8.1. Introduction
8.2. Objectives
8.2. Methods
8.2.0. Experimental Approach to Problem
8.2.1. Participants
8.2.2. Procedures
8.2.3. Data Labelling
8.2.4. Signal Processing and Statistical Analysis
8.3. Results
8.4. Discussion
8.5. Conclusion
8.6. References

SECTION C: DEVELOPMENT AND EVALUATION OF ‘FORMULIFT’: A PERSONALISED EXERCISE CLASSIFICATION SYSTEM FOR LOWER LIMB EXERCISES
Chapter 9 - A Mixed-methods Evaluation of ‘Formulift’: a Wearable Sensor Based Exercise Biofeedback System
9.0. Abstract
9.1. Introduction
9.2. Objectives
9.3. Methods
9.3.0. Participants
9.3.1. Data Collection
9.3.2. Data Analysis
9.4. Results
9.4.0. User Mobile Application Rating Scale
9.4.1. Usability and Functionality
9.4.2. Perceived Impact
9.4.3. Subjective Quality
9.4.4. Future Changes
9.5. Discussion
9.5.0. Principal Results
9.5.1. Limitations
9.5.2. Future Work
9.5.3. Comparison with Prior Work
9.6. Conclusions
9.7. References

SECTION D: NOVEL METHODS FOR THE CREATION OF IMU-BASED EXERCISE CLASSIFICATION SYSTEMS
Chapter 10 - A Mobile Application to Streamline the Development of Wearable Sensor Based Exercise Biofeedback Systems: System Development and Evaluation
10.0. Abstract
10.1. Introduction
10.2. Objectives
10.3. Methods
10.3.0. System Overview
10.3.1. System Evaluation
10.4. Results
10.4.0. Participant Demographics
10.4.1. System Evaluation Results
10.5. Discussion
10.5.0. System Development
10.5.1. System Evaluation
10.5.2. Limitations
10.5.3. Conclusions
10.6. References

Chapter 11 - Feature-free Activity Classification of Inertial Sensor Data with Machine Vision Techniques: Method, Development and Evaluation
11.0. Abstract
11.1. Introduction
11.1.0. Background
11.1.1. Related Work
11.2. Methods
11.2.0. Data Collection
11.2.1. Preparation for Transfer Learning
11.3. Results
11.4. Discussion
11.4.0. Principal Results
11.4.1. Comparison with Prior Work
11.4.2. Limitations
11.4.3. Conclusions
11.4.4. Future Work
11.5. References

SECTION E: THESIS CONCLUSIONS
Chapter 12 – Concluding Remarks
12.0. Introduction
12.1. Conclusions
12.2. Future Directions
12.3. Closing Statement

APPENDICES
Appendix A – Supplementary Documents
A.1. Chapter 9 – List of Tasks
A.2. Chapter 9 – Interview Guide
A.3. Chapter 9 – System Usability Scale (SUS)
A.4. Chapter 9 – User Mobile Application Rating Scale (uMARS)

Appendix B – Research Ethics Documents
B.1. Ethical Approval Letter – Study 1
B.2. Participant Information Leaflet – Study 1
B.3. Informed Consent Form – Study 1
B.4. Ethical Approval Letter – Study 2
B.5. Participant Information Leaflet – Study 2
B.6. Informed Consent Form – Study 2
B.7. Par-Q Form – Studies 1 and 2

Appendix C – Journal Publications
Technology in S&C: Tracking Lower-Limb Exercises with Wearable Sensors
Classification of lunge biomechanics with multiple and individual inertial measurement units
Technology in Strength and Conditioning: Assessing Bodyweight Squat Technique with Wearable Sensors
Technology in Rehabilitation: Evaluating the Single Leg Squat Exercise with Wearable Inertial Measurement Units
Classification of deadlift biomechanics with wearable inertial measurement units
Technology in Rehabilitation: Comparing Personalised and Global Classification Methodologies in Evaluating the Squat Exercise with Wearable IMUs
Mobile App to Streamline the Development of Wearable Sensor-Based Exercise Biofeedback Systems: System Development and Evaluation
Feature-free Activity Classification of Inertial Sensor Data with Machine Vision Techniques: Method, Development and Evaluation

Appendix D – Conference Publications
Evaluating Squat Performance with a Single Inertial Measurement Unit
Evaluating Performance of the Single Leg Squat Exercise with a Single Inertial Measurement Unit
Evaluating Performance of the Lunge Exercise with Multiple and Individual Inertial Measurement Units
Leveraging IMU Data for Accurate Exercise Performance Classification and Musculoskeletal Injury Risk Screening
Objective Classification of Dynamic Balance Using a Single Wearable Sensor
The Influence of Feature Selection Methods on Exercise Classification with Inertial Measurement Units
Binary Classification of Running Fatigue using a Single Inertial Measurement Unit

Thesis Abstract

The benefits of resistance training are diverse and well documented. Compound lower limb exercises are utilised in rehabilitation and strength and conditioning programmes to improve exercisers’ functional movement, strength, muscular hypertrophy and a variety of other factors. However, adhering to exercise programmes and completing exercises with safe and effective technique can be very difficult for exercisers. Traditionally, a clinician or coach can be employed to address this, providing motivation and guidance to exercisers. In many cases, however, due to cost and availability issues, one is left to complete exercise without expert supervision. In such circumstances the exerciser’s motivation may dwindle and their exercise technique may be aberrant, which can reduce progress towards training goals and increase their likelihood of injury.

Wearable inertial measurement units (IMUs), consisting of accelerometers, gyroscopes and magnetometers, are a relatively new technology which may be employed to address the aforementioned issues which arise when completing compound lower limb exercises. They hold a number of key advantages over traditional biomechanical measurement tools (e.g. force plates, electromyography, optical motion capture) which make them more applicable for use in real world biomechanical biofeedback systems. Specifically, they are low cost and highly portable. When used in combination with appropriate data analysis pathways and other ubiquitous technologies (e.g. smartphones and tablets) for data processing and feedback, they may offer the potential to automatically track and assess lower limb exercises.

This thesis will investigate the development and evaluation of biomechanical biofeedback systems for compound lower limb exercises, made with IMUs and smartphones. Specifically, signal processing and machine learning will be applied to IMU data to quantify, detect and classify the technique quality of compound lower limb exercises. Of key concern in the development of the systems will be their practicality, usability, functionality and cost for end users. As such, system efficacy will be compared between systems created using multiple IMUs and systems created using individual IMUs positioned at a variety of anatomical locations. A prototype system will be developed and evaluated with different types of real end-users. Finally, steps which can improve the development of future IMU based exercise classification systems will be investigated.

Statement of Original Authorship

I hereby certify that the submitted work is my own work, was completed while registered as a candidate for the degree stated on the Title Page, and I have not obtained a degree elsewhere on the basis of the research presented in this submitted work.

Signed: ____________________________

Date:_____________________________

Role of Candidate and Collaborators

Martin O’Reilly: Whole thesis: Study design, participant recruitment, data acquisition, data management, data analysis, application development, manuscript preparation, manuscript proofing, and manuscript submission.
Darragh Whelan: Section B: Study design, participant recruitment, data acquisition, data labelling, manuscript preparation, and manuscript proofing.
Brian Caulfield: Whole thesis: Study design, manuscript preparation, and manuscript proofing.
Tomás Ward: Whole thesis: Study design, manuscript preparation, and manuscript proofing.
Eamonn Delahunt: Section B: Study design, manuscript preparation, and manuscript proofing.
Cailbhe Doherty: Chapter 2: Data analysis, manuscript preparation, manuscript proofing, and manuscript submission.
William Johnston: Chapter 2: Data analysis and manuscript preparation.
Patrick Slevin: Chapter 9: Study design, data analysis, and manuscript preparation.
Ersi Ni: Chapter 9: Application development.
Joe Duffin: Chapter 10: Application development.
José Dominguez Veiga: Chapter 11: Data analysis, manuscript preparation, manuscript proofing, and manuscript submission.

List of Journal Publications

Whelan DF, O'Reilly MA, Ward TE, Delahunt E, Caulfield B. Technology in Rehabilitation: Evaluating the Single Leg Squat Exercise with Wearable Inertial Measurement Units. Methods of information in medicine. 2016 Oct 26;55(6).

O'Reilly MA, Whelan DF, Ward TE, Delahunt E, Caulfield BM. Technology in S&C: Assessing Bodyweight Squat Technique with Wearable Sensors. Journal of strength and conditioning research. 2017 Apr 15. [Epub ahead of print].

O'Reilly MA, Whelan DF, Ward TE, Delahunt E, Caulfield BM. Classification of deadlift biomechanics with wearable inertial measurement units. Journal of biomechanics. 2017 May 16. [Epub ahead of print].

O'Reilly MA, Whelan DF, Ward TE, Delahunt E, Caulfield B. Classification of lunge biomechanics with multiple and individual inertial measurement units. Sports biomechanics. 2017 May 19. [Epub ahead of print].

O'Reilly MA, Whelan DF, Ward TE, Delahunt E, Caulfield B. Technology in Strength and Conditioning Tracking Lower-Limb Exercises with Wearable Sensors. Journal of strength and conditioning research. 2017 Jun;31(6):1726.

O'Reilly MA, Whelan DF, Ward TE, Delahunt E, Caulfield B. Technology in Rehabilitation: Comparing Personalised and Global Classification Methodologies in Evaluating the Squat Exercise with Wearable IMUs. Methods of information in medicine. 2017 Jun 14;56(4).

O'Reilly MA, Duffin J, Ward TE, Caulfield B. A Mobile Application to Streamline the Development of Wearable Sensor Based Exercise Biofeedback Systems: System Development and Evaluation. JMIR Rehabilitation and Assistive Technologies. 2017 Aug 15; [online].

Dominguez Veiga J, O'Reilly MA, Whelan DF, Caulfield B, Ward TE. Feature-free Activity Classification of Inertial Sensor Data with Machine Vision Techniques: Method, Development and Evaluation. JMIR mHealth & uHealth. 2017 Jul 15; [online].

O'Reilly MA, Slevin P, Ward TE, Caulfield B. A Mixed-methods Evaluation of ‘Formulift’: a Wearable Sensor Based Exercise Biofeedback System. JMIR mHealth & uHealth. 2017. Awaiting Proofs.

O'Reilly MA, Caulfield B, Ward TE, Johnston W, Doherty C. Wearable Inertial Sensor Systems for Lower Limb Exercise Detection, Evaluation and Feedback: A Systematic Review. Sports Medicine. 2018. Under Review.

List of Conference Publications

Work Pertaining to Thesis

O’Reilly M, Whelan D, Chanialidis C, Friel N, Delahunt E, Ward T, et al. Evaluating squat performance with a single inertial measurement unit. In: Proceedings of the 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN); 2015. Jun 9; Boston, M.A., U.S.A. N.Y., U.S.A.: IEEE; 2015. p. 1-6.

Whelan D, O’Reilly M, Ward T, Delahunt E, Caulfield B. Evaluating Performance of the Single Leg Squat Exercise with a Single Inertial Measurement Unit. In: Proceedings of the 3rd Workshop on ICTs for improving Patients Rehabilitation Research Techniques (REHAB); 2015. Oct 1; Lisbon, Portugal. N.Y., U.S.A.: ACM; 2015. p. 144–7.

Whelan D, O'Reilly M, Ward T, Delahunt E, Caulfield B. Evaluating Performance of the Lunge Exercise with Multiple and Individual Inertial Measurement Units. Pervasive Health 10th EAI International Conference on Pervasive Computing Technologies for Healthcare 2016. p. 101-8.

Whelan D, O'Reilly M, Huang B, Giggins O, Kechadi T, Caulfield B. Leveraging IMU data for accurate exercise performance classification and musculoskeletal injury risk screening. In: Proceedings of the IEEE 38th Annual International Conference of the Engineering in Medicine and Biology Society (EMBC); 2016. Aug 26; Orlando, F.L., U.S.A. N.Y., U.S.A.: IEEE; 2016. p. 659-62.

Associated Collaborative Work Completed During Ph.D.

Johnston W, O'Reilly M, Dolan K, Reid N, Coughlan G, Caulfield B. Objective classification of dynamic balance using a single wearable sensor. In: Proceedings of the 4th International Congress on Sport Sciences Research and Technology Support; 2016. Nov 7; Porto, Portugal. Setubal, Portugal: SCITEPRESS–Science and Technology Publications; 2016. p. 15-24.

O'Reilly MA, Johnston W, Buckley C, Whelan D, Caulfield B. The influence of feature selection methods on exercise classification with inertial measurement units. In: Proceedings of the 14th International Conference on Wearable and Implantable Body Sensor Networks (BSN); 2017. May 9; Eindhoven, Netherlands. N.Y., U.S.A.: IEEE; 2017. p. 193-96.

Buckley C, O'Reilly MA, Whelan D, Farrell AV, Clark L, Longo V, Gilchrist MD, Caulfield B. Binary classification of running fatigue using a single inertial measurement unit. In: Proceedings of the 14th International Conference on Wearable and Implantable Body Sensor Networks (BSN); 2017. May 9; Eindhoven, Netherlands. N.Y., U.S.A.: IEEE; 2017. p. 197-201.

Abbreviations

3-D: three-dimensional
CNN: convolutional neural network
CPU: central processing unit
DSP: digital signal processing
ED: exercise detection
FE: feedback evaluation
GPU: graphics processing unit
HAR: human activity recognition
IMU: inertial measurement unit
IQR: interquartile range
KNN: k-nearest neighbours
LOOCV: leave-one-out cross-validation
LOSOCV: leave-one-subject-out cross-validation
MC: movement classification
MV: measurement validation
NSCA: National Strength and Conditioning Association
RMS: root mean square
ROM: range of motion
RT: resistance training
S&C: strength and conditioning
SD: standard deviation
SLS: single leg squat
SUS: system usability scale
SVM: support vector machine
UE: user experience
UKSCA: United Kingdom Strength and Conditioning Association
uMARS: user mobile application rating scale

Acknowledgements

I would like to say a huge thank you to...

Brian and Tomás, for being superb mentors, continuously inspiring and for completely breaking the mould of the PhD supervisor stereotype. I feel incredibly lucky to have had the chance to work with you both!

Eamonn and Matt, for your expert input and unique perspectives throughout the Ph.D. process.

All those that volunteered in my studies, for your selflessness, your time and your efforts.

Kevin, Matt, Ken and Cailbhe, for being superhero postdocs!

All the Insight personal sensing team, for all the thought provoking conversations, fresh perspectives, laughs and implicit learning you have given me over the years.

Karol, Kieran, Niamh, Matt, Paddy and everyone at Shimmer, past and present, for helping to make this work possible.

Darragh, for being the Garfunkel to my Simon.

To all my friends, for the comic relief and unforgettable good times which made the last four years such a pleasure.

Soracha, for your tremendous love and support and making Dublin a second home for the past few years.

Bernie and Anne-Marie, for being simply the best older sisters and rocks, or as Mum would say ‘bricks’, through thick and thin.

Emily, for the unexpected day trip to Nairobi and for going to extreme lengths to ensure writing the supplementary material at the end of this thesis was bottom of my ‘list of appendix related problems’ in 2017. I look forward to the day in 2018 when Dad and I get to congratulate our family’s first proper Doctor.

Mum, for being the most loving person I know.

And most of all to Dad, for being my research hero since before I even knew what research was and, more importantly, for the positive attitude with which you tackle everything, stride by stride! You are a truly incredible inspiration to me both in academia and far, far beyond.

Section A: Background to Thesis

Chapter 1
Tracking Compound Lower Limb Exercises: An Introduction to Relevant Theoretical Constructs and Thesis Objectives

1.0. Introduction

Resistance training (RT) in a healthy population is a popular and beneficial modality of exercise completed by millions of people worldwide. The benefits of RT are well documented and diverse. In general, one may participate in RT in order to increase muscular strength (1) and hypertrophy (2). Further benefits of regular RT include improved athletic performance (3), mental health (4), quality of life in elderly populations (5) and controlled weight loss and weight gain (6). To safely and efficiently achieve such benefits, the dosage (volume and intensity of the exercise), frequency of participation (training sessions per week), type of exercise (high intensity interval training, plyometric training, strength training), rest and recovery, and movement quality (adherence to exercise technique guidelines) must all be considered (7).

This thesis will focus specifically on compound lower limb exercises. A compound movement can be defined as any exercise that engages two or more different joints to fully stimulate entire muscle groups and multiple muscles (7). Three bodyweight (BW) RT exercises will be investigated: BW lunges, BW squats and BW single leg squats. Two RT exercises commonly used for strength and conditioning (S&C) will also be studied: barbell back squats and barbell deadlifts. This work will focus on evaluating the completion of these compound lower limb exercises and providing users with associated feedback. It is hoped that learnings arising from this research will be generalisable to the field of exercise classification, and will inform the creation of inertial measurement unit (IMU) based classification systems for many more exercises and applications.

There will be three primary research concerns:
1) Developing methods of evaluating the chosen compound lower limb exercises.
2) Implementing such methods into a prototype system and evaluating it with end users.
3) Establishing methods to improve the accuracy of, and streamline the development of, exercise classification systems for new exercises and applications.

This introductory chapter will provide background regarding the above research concerns. Firstly, the literature discussing compound lower limb exercises is presented, and the associated benefits and challenges are outlined (Section 1.1). This is followed by a presentation of current practice in the assessment of compound lower limb exercises in an S&C setting (Section 1.2). Section 1.3 will outline the potential role of IMUs in assessing compound lower limb exercises and will present a review of exercise analysis systems utilising IMUs. Section 1.4 will discuss biomechanical biofeedback systems which utilise IMUs, with a specific focus on users’ experiences and their perceived benefits of using such systems. Section 1.5 will present the aims and objectives of this thesis.

1.1. The Benefits of Compound Lower Limb Exercises and Difficulties in Achieving Them Many international bodies recommend participation in RT due to the many proven benefits of such training. For instance, the American College of Sports Medicine recommend a minimum of 2 strength training sessions per week for both healthy adults and older or frail adults. These guidelines also include recommendations that one completes barbell back squats, deadlifts

and/or

BW

lunges

(8).

The

World

Health

Organisation

replicate

this

recommendation for adults aged 18-64 (9). Such recommendations stem from a plethora of research demonstrating the diverse benefits of RT ranging from improved athletic performance in sporting populations (3) to a greater quality of life in elderly populations (5). The specific RT exercises employed in this thesis have a number of proven benefits. BW lunges and squats are highly practical to complete as they require no additional equipment. They maintain and reinforce the body’s fundamental functional movement patterns and heighten one’s strength and relative strength (to body mass) (10). Single leg squats replicate these benefits and also aid one’s balance, mobility, coordination and proprioception (11). These exercises are particularly useful for lesser trained individuals beginning a RT programme or younger and older people (7). In addition to being used in an S&C context, compound lower limb BW exercises are regularly used in musculoskeletal injury screening (12,13) and as late stage rehabilitation exercises for the hip, knee and ankle (14). In all circumstances strict guidelines for movement patterns should be adhered to. Barbell squats and deadlifts are two of the most researched and practiced S&C exercises (15,16). They have been shown to have great ability in improving an individual’s lower limb force production and strengthening the body’s posterior chain in comparison to other exercises (17,18). They are regularly used in S&C settings as part of programmes which develop muscle mass, lower limb force production, vertical jump performance and/or sprint speed due to their proven correlations with and impact on such performance measures (18– 20). Therefore, there is a vast array of benefits to completing all exercises investigated in this thesis. It should be noted that each of the exercises also have corresponding guidelines on how to complete them with acceptable technique (21,22). 
They also each possess a number of common deviations from acceptable technique (21,22). Completing the exercises with aberrant movement patterns has been shown to increase stress on the joints of the lower extremity and spine, potentiating the risk of injury (23–25). Overuse injuries are also commonly reported in powerlifting (26). It is therefore of great importance to ensure acceptable technique is performed when completing lower limb exercises. This is in addition to monitoring training volume and load.

1.2. Assessing Compound Lower Limb Exercises: Current Practice

There are currently 3 distinct methods which are undertaken to assess the technique of the exercises used in this thesis: (i) 3-D motion capture; (ii) visual analysis from a qualified exercise professional; (iii) self-assessment. All of these have a number of limitations. 3-D motion capture systems are expensive (> €100,000) and the application of skin-mounted markers may hinder normal movement (27,28). Furthermore, data processing can be time intensive and specific expertise is often required to interpret the processed data and to make recommendations on the observed results. Therefore, these systems are not frequently used to assess exercise technique beyond the research laboratory (29). In clinical and gym-based settings, subjective visual assessment is typically used to assess lower limb, multi-joint exercises. This subjective visual assessment of human biomechanics is not always reliable, even amongst experts, as the need to visually assess numerous constituent components simultaneously is challenging (30). In many instances, people exercising may not be able to afford supervision. Therefore, people completing the exercises studied in this thesis most commonly rely on self-assessment of their exercise technique. This is a flawed method of exercise technique assessment, as people may lack the knowledge required to assess the movement patterns. Simultaneously completing an exertive movement and assessing it can be difficult, and one may be biased in the assessment of oneself (31). The author therefore believes that there is a need for a technique analysis system for lower limb exercises. This system should draw on the strengths of the 3 methods (objective, valid, and practical) and ameliorate their limitations (expensive, subjective, and biased).

1.3. Exercise Assessment with IMUs

A comprehensive review of IMUs and their role in monitoring lower limb exercises is presented in the next chapter. This section provides an introductory outline to IMUs and their use to date in monitoring repetition-based exercises. Wearable IMUs may offer the potential to bridge the gap between laboratory and day-to-day “real-world” acquisition and assessment of human movement. These IMUs are small, inexpensive sensing units that consist of accelerometers, gyroscopes and magnetometers. They are able to acquire data pertaining to the inertial motion and 3-D orientation of individual limb segments (32,33). Self-contained, wireless IMU devices are easy to set up and allow for the acquisition of human movement data in unconstrained environments (34). In this thesis the term ‘IMU system’ will be used to describe the IMU sensors, the sensor signals, the associated signal processing applied to them and the output of the exercise classification algorithms. IMU systems can robustly track the variety of postures and environmental complexities associated with training. This gives them an advantage over camera-based systems, which are hampered by location, occlusion and lighting issues (35). IMU systems have also been shown to have high agreement with optoelectronic motion capture systems when measuring joint angles (29,36,37). There are many commercially available examples of IMU systems that monitor physical activity (e.g. Pebble™ and Fitbit™). However, using IMU systems to assess gym-based exercises is less common. Researchers have demonstrated the ability of IMU-based systems to distinguish different gym-based exercises and count repetitions of these exercises with moderate to good levels of accuracy (38–42). While these aforementioned systems detail information on the number of repetitions performed and which exercise was completed, they do not provide instruction on exercise technique and quality of performance. A holistic exercise tracking system should not only recognise the exercise completed, but should also provide technique feedback. A growing body of scientific literature has investigated the utility of IMU systems to assess exercise technique. Taylor et al (43) used a five IMU system to categorize five technique deviations during performance of the standing hamstring curl exercise and four technique deviations during performance of the straight-leg raise exercise. They were able to identify these deviations with 80% accuracy, 75% sensitivity and 90% specificity. Pernek and colleagues (40) also utilized a five IMU system to monitor exercise intensity in six dumbbell upper limb exercises (biceps curl, single arm triceps extension, front vertical lift, lateral vertical lift, bent over row, military press), demonstrating an average error of just 6% for intensity prediction. Melzi et al (44) used a wireless body area network of accelerometers to assess performance of the biceps curl. However, their proposed approach was very exercise specific and difficult to transfer to other regularly performed gym-based exercises (40). Research undertaken by Velloso et al (45) examined the ability of a four IMU system to identify four deviations from acceptable technique during a unilateral dumbbell biceps curl. Their reported overall accuracy ranged from 74-86%. While these results are encouraging, the multiple IMU systems detailed above are expensive and their use may prove impractical due to the increased risk of placement error and comfort issues (29). In addition, the use of multiple IMUs leads to increased overall power requirements and connectivity complexity with respect to the hosting device (e.g. smartphone, tablet).
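
The accuracy, sensitivity and specificity figures quoted above can be made concrete with a short sketch. The Python snippet below is illustrative only and is not taken from any of the cited studies: the function name `classification_metrics` and the example labels are hypothetical, and it is assumed that 1 denotes a repetition containing a technique deviation and 0 a repetition performed with acceptable technique.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for binary labels.

    y_true: reference labels (e.g. from an expert rater), assumed 0/1.
    y_pred: labels predicted by a hypothetical IMU-based classifier.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Sensitivity: proportion of true deviations that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: proportion of acceptable repetitions correctly passed.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical labels for ten repetitions (not real study data).
    expert = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    system = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    acc, sens, spec = classification_metrics(expert, system)
    print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Reporting all three values matters because accuracy alone can look high when acceptable repetitions greatly outnumber deviations, even if few deviations are actually detected.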


A reduced IMU set-up is more desirable for daily environment applications (29). Bonnet et al (29) reported that a single IMU mounted on the lower back could predict ankle, knee and hip joint angles in the sagittal plane during performance of the BW squat exercise, with a maximal error of 3.5° compared to a motion capture system. This indicates that relatively accurate quantification of sagittal plane lower limb joint movement can be achieved using a single sensor unit system. Giggins et al (46) demonstrated an overall accuracy of 79-83% using a single IMU on either the foot, shin or thigh to identify deviations in the performance of seven rehabilitation exercises (heel slide, hip abduction, hip flexion, hip extension, knee extension, inner range quads and straight-leg raise). Pernek et al (39) analysed the ability of an accelerometer contained within a smartphone to assess exercise performance during nine free-weight and machine resistance exercises. They assessed movement quality based on the speed of exercise performance and reported a temporal error of 11% for individual repetition duration. In summary, over the past decade IMU systems have been shown to have the ability to detect a variety of exercise types and assess movement quality during exercise with varying degrees of accuracy. In order to create an exercise tracking tool for lower limb exercises, IMUs positioned at a variety of anatomical locations may be employed. An IMU system for exercise analysis should aim to be as practical and affordable as possible (Table 1.1). This can be achieved by minimising the number of IMUs a user needs to wear. Nevertheless, the system must remain functional, and therefore a high level of accuracy should be achieved by the chosen IMU set-up. Finally, the information gained from the IMU system should be relayed to the user in a simple, engaging and understandable fashion. The method of presenting the information gained from the IMU system to a user may be just as important as computing accurate and meaningful information in the first instance.
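
The two kinds of error quoted in this section, a maximal joint angle error in degrees and a percentage temporal error for repetition duration, can be illustrated with a minimal sketch. The code below is an assumption-laden example rather than the method of any cited study: the function names and all sample values are hypothetical, and the cited authors may have defined their error measures differently.

```python
def max_abs_error(estimated, reference):
    """Largest absolute difference between two equally sampled signals."""
    if len(estimated) != len(reference) or len(estimated) == 0:
        raise ValueError("signals must be non-empty and of equal length")
    return max(abs(e - r) for e, r in zip(estimated, reference))


def mean_percentage_error(estimated, reference):
    """Mean absolute error expressed as a percentage of the reference."""
    if len(estimated) != len(reference) or len(estimated) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs(e - r) / abs(r) for e, r in zip(estimated, reference)]
    return 100.0 * sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Hypothetical knee flexion angles (degrees) during one squat:
    # an IMU-derived estimate versus a motion capture reference.
    imu_knee = [10.0, 45.5, 92.0, 47.0, 11.0]
    mocap_knee = [9.0, 44.0, 90.0, 45.0, 10.5]
    print("max angle error (deg):", max_abs_error(imu_knee, mocap_knee))

    # Hypothetical repetition durations (seconds): estimated vs. reference.
    est_durations = [2.2, 1.8, 2.0]
    ref_durations = [2.0, 2.0, 2.0]
    print("temporal error (%):",
          round(mean_percentage_error(est_durations, ref_durations), 1))
```

A maximal error is a conservative summary because a single poorly tracked sample determines it, whereas a mean percentage error describes typical performance across repetitions.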

Table 1.1: Desirable features for IMU-based exercise biofeedback systems.

• Affordable – inexpensive for the user to purchase.
• Portable – can be used in any gym and other workout environments.
• Computationally efficient – to prolong battery life and provide instant feedback.
• Easy to set up – easy to position correctly with minimal set up time required.
• Wearable – easy for the user to don/doff system.
• Accurate – accurately classify movement during exercise.
• Deliver effective feedback – informing the user on what they are doing, how they are doing it, and what they need to do to improve their performance.


1.4. IMU Based Exercise Biofeedback Systems

While all the aforementioned work demonstrates the technological proficiency of IMU based exercise biofeedback systems in classifying exercise technique, little is currently known about the user-experience and users’ perceptions of such biofeedback systems. A relatively large number of studies examine this in similar fields such as upper body stroke rehabilitation (47), but there is currently a scarcity of such studies for IMU feedback systems for lower limb exercises.

1.5. Aims and Objectives

This thesis is concerned with the development and evaluation of exercise classification systems for compound lower limb exercises. Systems using IMUs for motion tracking are proposed as they are relatively low cost, portable and practical for system users. Therefore, the aim of this thesis was to develop and compare methodologies and approaches to classifying lower limb compound exercises. Of key interest were the accuracy, efficiency, practicality and usability of the developed systems. To investigate these system qualities the following objectives were identified for this thesis:

• To conduct a systematic review of the literature regarding IMU systems for the analysis of targeted lower limb exercises used in the following spaces: strength and conditioning, musculoskeletal and orthopaedic rehabilitation, injury screening and athlete performance evaluation.
• To develop, compare and contrast wearable IMU systems which automatically detect compound lower limb exercises when using IMUs, positioned at various anatomical locations, in combination and isolation.
• To develop, compare and contrast wearable IMU systems which classify exercise technique quality when using IMUs, positioned at various anatomical locations, in combination and isolation.
• To develop a prototype exercise classification feedback system and complete a formal, mixed-methods evaluation of its usability, functionality and perceived impact with different types of real end users.
• To investigate novel methods in creating exercise classification systems which may further improve system accuracy, efficiency, practicality and usability.


1.6. Data Collection and Participants

While the work pertaining to Sections B, C and D of this thesis is presented via publications, the associated data collection corresponds to three phases. A total of 86 participants were recruited, via campus posters and social media, during phases 1 and 2 of data collection (Section B of thesis). Of these, 22 participants completed phase 1, which involved collection of the lunge (Chapter 4), BW squat (Chapter 5) and deadlift (Chapter 7) data whereby the participants deliberately induced common technique deviations. Phase 1 also included unconditioned SLS data (Chapter 6). In phase 2, the remaining 64 participants completed the unconditioned barbell squat (Chapter 8) and unconditioned deadlift (Chapter 7) (i.e. 3 repetition maximum tests) in addition to the phase 1 data collection protocol. Sporadic issues with sensor Bluetooth connectivity throughout data collection meant full data sets for each exercise could not be obtained for some participants in both phase 1 and phase 2 of data collection. In addition to this, it was required by University College Dublin’s Human Research Ethics Board that all study participants had at least one year’s experience with each exercise for which they completed data collection. The combination of these factors means that the participant numbers vary slightly for Chapters 3 through to 8. A total of 15 new participants were recruited via campus posters for phase 3 of data collection (Sections C and D of thesis). The same participants were used for evaluating the ‘Formulift’ system (Chapter 9) and validating the tablet application for the streamlined creation of personalised exercise classifiers (Chapter 10).

1.7. References

1. Folland JP, Williams AG. Morphological and neurological contributions to increased strength. Sports medicine. 2007 Feb 1;37(2):145-68.
2. Latham NK, Bennett DA, Stretton CM, Anderson CS. Systematic review of progressive resistance strength training in older adults. The Journals of Gerontology Series A: Biological Sciences and Medical Sciences. 2004 Jan 1;59(1):48-61.
3. Cormie P, Mcguigan MR, Newton RU. Adaptations in athletic performance after ballistic power versus strength training. Medicine & Science in Sports & Exercise. 2010 Aug 1;42(8):1582-98.
4. O'Connor PJ, Herring MP, Caravalho A. Mental health benefits of strength training in adults. American Journal of Lifestyle Medicine. 2010 Sep;4(5):377-96.
5. Rejeski WJ, Mihalko SL. Physical activity and quality of life in older adults. The journals of gerontology. Series A, Biological sciences and medical sciences. 2001 Oct;56:23.


6. American College of Sports Medicine. Appropriate intervention strategies for weight loss and prevention of weight regain for adults. Med Sci Sports Exerc. 2001 Oct 11;33(12):2145-56.
7. Kraemer WJ, Ratamess NA. Fundamentals of resistance training: progression and exercise prescription. Medicine and science in sports and exercise. 2004 Apr 1;36(4):674-88.
8. American College of Sports Medicine. ACSM's guidelines for exercise testing and prescription. Philadelphia, P.A., U.S.A. Lippincott Williams & Wilkins; 2013 Mar 4.
9. World Health Organisation. Global Strategy on Diet, Physical Activity and Health [Internet]. 2017 [cited 2017 Mar 31]. Available from: http://www.who.int/dietphysicalactivity/factsheet_adults/en/
10. Harrison JS. Bodyweight training: A return to basics. Strength & Conditioning Journal. 2010 Apr 1;32(2):52-5.
11. Zwerver J, Bredeweg SW, Hof AL. Biomechanical analysis of the single-leg decline squat. British journal of sports medicine. 2007 Apr 1;41(4):264-8.
12. Teyhen DS, Shaffer SW, Lorenson CL, Halfpap JP, Donofry DF, Walker MJ, Dugan JL, Childs JD. The functional movement screen: a reliability study. Journal of orthopaedic & sports physical therapy. 2012 Jun;42(6):530-40.
13. O'Connor FG, Deuster PA, Davis J, Pappas CG, Knapik JJ. Functional movement screening: predicting injuries in officer candidates. Medicine and science in sports and exercise. 2011 Dec;43(12):2224-30.
14. Boling MC, Bolgla LA, Mattacola CG, Uhl TL, Hosey RG. Outcomes of a weight-bearing rehabilitation program for patients diagnosed with patellofemoral pain syndrome. Archives of physical medicine and rehabilitation. 2006 Nov 30;87(11):1428-35.
15. Clark DR, Lambert MI, Hunter AM. Muscle activation in the loaded free barbell squat: a brief review. The Journal of Strength & Conditioning Research. 2012 Apr 1;26(4):1169-78.
16. McGuigan MR, Wilson BD. Biomechanical Analysis of the Deadlift. The Journal of Strength & Conditioning Research. 1996 Nov 1;10(4):250-5.
17. Augustsson J, Esko A, Thomeé R, Svantesson U. Weight training of the thigh muscles using closed versus open kinetic chain exercises: a comparison of performance enhancement. Journal of Orthopaedic & Sports Physical Therapy. 1998 Jan;27(1):3-8.
18. Thompson BJ, Stock MS, Shields JE, Luera MJ, Munayer IK, Mota JA, Carrillo EC, Olinghouse KD. Barbell deadlift training increases the rate of torque development and vertical jump performance in novices. Journal of strength and conditioning research. 2015 Jan;29(1):1.
19. Wisløff U, Castagna C, Helgerud J, Jones R, Hoff J. Strong correlation of maximal squat strength with sprint performance and vertical jump height in elite soccer players. British journal of sports medicine. 2004 Jun 1;38(3):285-8.
20. Comfort P, Stewart A, Bloom L, Clarkson B. Relationships between strength, sprint, and jump performance in well-trained youth soccer players. The Journal of Strength & Conditioning Research. 2014 Jan 1;28(1):173-7.
21. Baechle TR, Earle RW. Resistance Training Exercise Techniques. NSCA's Essentials of Personal Training. Champaign, I.L., U.S.A.: Human Kinetics; 2004.
22. Whatman C, Hing W, Hume P. Physiotherapist agreement when visually rating movement quality during lower extremity functional screening tests. Physical Therapy in Sport. 2012;13(2):87-96.
23. Hall M, Nielsen JH, Holsgaard-Larsen A, Nielsen DB, Creaby MW, Thorlund JB. Forward lunge knee biomechanics before and after partial meniscectomy. The Knee. 2015 Dec 31;22(6):506-9.
24. Farrokhi S, Pollard CD, Souza RB, Chen YJ, Reischl S, Powers CM. Trunk position influences the kinematics, kinetics, and muscle activity of the lead lower extremity during the forward lunge exercise. Journal of orthopaedic & sports physical therapy. 2008 Jul;38(7):403-9.
25. Cholewicki J, McGill SM, Norman RW. Lumbar spine loads during the lifting of extremely heavy weights. Medicine and science in sports and exercise. 1991 Oct;23(10):1179-86.
26. Siewe J, Rudat J, Röllinghoff M, Schlegel UJ, Eysel P, Michael JP. Injuries and overuse syndromes in powerlifting. International journal of sports medicine. 2011 Sep;32(09):703-11.
27. Ahmadi A, Mitchell E, Destelle F, Gowing M, O’Connor NE, Richter C, et al. Automatic activity classification and movement assessment during a sports training session using wearable inertial sensors. In: Proceedings of the 11th International Conference on Wearable and Implantable Body Sensor Networks, (BSN); 2014. Jun 16; Zurich, Switzerland. N.Y., U.S.A.: IEEE; 2014. p. 98–103.
28. Bonnechere B, Jansen B, Salvia P, Bouzahouene H, Omelina L, Moiseev F, Sholukha V, Cornelis J, Rooze M, Jan SV. Validity and reliability of the Kinect within functional assessment activities: comparison with standard stereophotogrammetry. Gait & posture. 2014 Jan 31;39(1):593-8.
29. Bonnet V, Mazza C, Fraisse P, Cappozzo A. Real-time estimate of body kinematics during a planar squat task using a single inertial measurement unit. IEEE Transactions on Biomedical Engineering. 2013 Jul;60(7):1920-6.
30. Whiteside D, Deneweth JM, Pohorence MA, Sandoval B, Russell JR, McLean SG, Zernicke RF, Goulet GC. Grading the functional movement screen: A comparison of manual (real-time) and objective methods. The Journal of Strength & Conditioning Research. 2016 Apr 1;30(4):924-33.
31. John OP, Robins RW. Accuracy and bias in self-perception: individual differences in self-enhancement and the role of narcissism. Journal of personality and social psychology. 1994 Jan;66(1):206.
32. Madgwick SO, Harrison AJ, Vaidyanathan R. Estimation of IMU and MARG orientation using a gradient descent algorithm. In: Proceedings of the 12th International Conference on Rehabilitation Robotics (ICORR); 2011. Jun 29; Zurich, Switzerland. N.Y., U.S.A.: IEEE; 2011. p. 1-7.
33. Burns A, Greene BR, McGrath MJ, O'Shea TJ, Kuris B, Ayer SM, et al. SHIMMER™–A wireless sensor platform for noninvasive biomedical research. IEEE Sensors Journal. 2010;10(9):1527-34.
34. McGrath D, Greene BR, O’Donovan KJ, Caulfield B. Gyroscope-based assessment of temporal gait parameters during treadmill walking and running. Sports Engineering. 2012 Dec 1;15(4):207-13.
35. Morris D, Saponas TS, Guillory A, Kelner I. RecoFit: using a wearable sensor to find, recognize, and count repetitive exercises. In: Proceedings of the 32nd annual ACM conference on Human factors in computing systems; 2014 Apr 26; Toronto, Canada. N.Y., U.S.A.: ACM; 2014. p. 3225-3234.
36. Leardini A, Lullini G, Giannini S, Berti L, Ortolani M, Caravaggi P. Validation of the angular measurements of a new inertial-measurement-unit based rehabilitation system: comparison with state-of-the-art gait analysis. Journal of neuroengineering and rehabilitation. 2014 Sep 11;11(1):136.
37. Tang Z, Sekine M, Tamura T, Tanaka N, Yoshida M, Chen W. Measurement and estimation of 3d orientation using magnetic and inertial sensors. Advanced Biomedical Engineering. 2015;4:135-43.


38. Chang KH, Chen MY, Canny J. Tracking free-weight exercises. In: Proceedings of the 9th International Conference on Ubiquitous Computing (UbiComp); 2007. Sep 16; Innsbruck, Austria. Berlin, Germany: Springer; 2007. p. 19-37.
39. Pernek I, Hummel KA, Kokol P. Exercise repetition detection for resistance training based on smartphones. Personal and ubiquitous computing. 2013;17(4):771-82.
40. Pernek I, Kurillo G, Stiglic G, Bajcsy R. Recognizing the intensity of strength training exercises with wearable sensors. Journal of Biomedical Informatics. 2015;58:145-55.
41. Seeger C, Buchmann A, Van Laerhoven K. myHealthAssistant: a phone-based body sensor network that captures the wearer's exercises throughout the day. In: Proceedings of the 6th International Conference on Body Area Networks; 2011. Nov 7; Beijing, China. Brussels, Belgium: ICST; 2011. p. 1-7.
42. Muehlbauer M, Bahle G, Lukowicz P. What can an arm holster worn smart phone do for activity recognition?. In: Proceedings of the 15th Annual International Symposium on Wearable Computers (ISWC); 2011. Jun 12; San Francisco, C.A., U.S.A. N.Y., U.S.A.: IEEE; 2011. p. 79-82.
43. Taylor PE, Almeida GJ, Hodgins JK, Kanade T. Multi-label classification for the analysis of human motion quality. In: Proceedings of the 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); 2012. Aug 28; San Diego, C.A., U.S.A. N.Y., U.S.A.: IEEE; 2014. p. 2214-2218.
44. Melzi S, Borsani L, Cesana M. The virtual trainer: supervising movements through a wearable wireless sensor network. In: Proceedings of the 2009 6th IEEE Annual Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks Workshops, (SECON Workshops); 2009. 22 Jun; Rome, Italy. N.Y., U.S.A.: IEEE; 2009. p. 1-3.
45. Velloso E, Bulling A, Gellersen H, Ugulino W, Fuks H. Qualitative activity recognition of weight lifting exercises. In: Proceedings of the 4th Augmented Human International Conference; 2013. Mar 7; Stuttgart, Germany. N.Y., U.S.A.: ACM; 2014. p. 116-123.
46. Giggins OM, Sweeney KT, Caulfield B. Rehabilitation exercise assessment using inertial sensors: a cross-sectional analytical study. Journal of neuroengineering and rehabilitation. 2014 Nov 27;11(1):158.
47. Wang Q, Markopoulos P, Yu B, Chen W, Timmermans A. Interactive wearable systems for upper body rehabilitation: a systematic review. Journal of neuroengineering and rehabilitation. 2017 Mar 11;14(1):20.


Chapter 2

Wearable Inertial Sensor Systems for Lower Limb Exercise Detection, Evaluation and Feedback: A Systematic Review

This chapter is based on the following paper which is currently under review in Sports Medicine: OʼReilly MA, Caulfield B, Ward TE, Johnston W, Doherty C. Wearable Inertial Sensor Systems for Lower Limb Exercise Detection, Evaluation and Feedback: A Systematic Review. Sports Medicine. 2018. Under Review.

2.0. Abstract

Background: Analysis of lower limb exercises is traditionally completed with three distinct methods: (i) 3D motion capture; (ii) visual analysis from a qualified exercise professional; (iii) self-assessment. Each method is associated with a number of limitations.
Objective: The aim of this systematic review is to synthesize and evaluate studies which have investigated the capacity for Inertial Measurement Unit (IMU) technologies to assess movement quality in lower limb exercises.
Data Sources: A systematic review of PubMed, ScienceDirect and Scopus was conducted.
Study Eligibility Criteria: Articles written in English and published in the last 10 years which contained an IMU system for the analysis of repetition-based targeted lower limb exercises were included.
Study Appraisal and Synthesis Methods: The quality of included studies was measured using an adapted version of the STROBE assessment criteria for cross-sectional studies. The studies were categorised into five groupings: exercise detection, movement classification, measurement validation, exercise-performance feedback or user-experience. Each study was then qualitatively summarised.
Results: From the 2452 articles that were identified with the search strategies, 55 papers are included in this review.
Conclusions: Wearable inertial sensor systems for analysing lower limb exercises are a rapidly growing technology. Research over the past ten years has predominantly focused on validating measurements that the systems produce and classifying users’ exercise quality. There are a limited number of studies which provide feedback to end-users and assess its effects. There have been very few user evaluation studies and no clinical trials in this field to date.


2.1. Introduction

Lower limb exercises are used in rehabilitation, performance assessment, injury screening and strength and conditioning (S&C) contexts (1–3). Movement is deemed ‘optimal’ during these exercises when injury-risk is minimised and performance is maximised (4). There are currently 3 distinct methods of assessing movement during lower limb exercise: (i) 3D motion capture; (ii) visual analysis from a qualified exercise professional; (iii) self-assessment. Each method is associated with a number of limitations. For instance, 3-D motion capture systems are expensive (> €100,000) and the application of skin-mounted markers may hinder normal movement (5,6). Furthermore, data processing can be time intensive and specific expertise is often required to interpret the processed data and to make recommendations on the observed results. Therefore, these systems are not frequently used to assess exercise technique beyond the research laboratory (7). In clinical and gym-based settings, visual assessment is typically used to assess lower limb exercises. Visual assessment of human biomechanics is subjective and unreliable amongst novices and experts alike, as the need to visually assess numerous constituent components simultaneously is challenging (8). This issue is compounded by the fact that athletes/clients may not be able to afford the supervision of a qualified professional (such as a physiotherapist, athletic therapist or personal trainer) in many instances. For this reason, individuals largely rely on self-assessment of their exercise technique in gym-based settings. The obvious limitation with this approach is that the individual may lack the knowledge required to assess their movement patterns, while simultaneously completing an exertive movement and assessing it without bias can be difficult (9). Due to these limitations, in the past 15 years there has been an increase in interest in the automated assessment of lower limb exercises with wearable IMUs.
Wearable IMUs are small, inexpensive sensing units that consist of accelerometers, gyroscopes and/or magnetometers. They are able to acquire data pertaining to the inertial motion and 3D orientation of individual limb segments (10,11). Self-contained, wireless IMU devices are easy to set up, and allow for the acquisition of human movement data in unconstrained environments (12). IMU systems can robustly track a variety of postures in the complex environment associated with training in the ‘real-world’, unlike camera-based systems, which are hampered by location, occlusion and lighting issues (13). IMUs have also been shown to be as effective as marker-based systems at measuring joint angles (7,14,15). Therefore, IMUs have been recently employed for analysing a range of components of lower limb exercises. This includes detecting and quantifying the number of repetitions that are completed of a given exercise (16,17), computing the range of motion (ROM) at key joints during these repetitions (18,19), temporal analysis of exercises (20,21), classifying one’s
performance of an exercise as acceptable or as a specific deviation from acceptable (3,22), or extracting exercise performance measures such as jump height and reactive strength index (23). In the past decade a number of reviews have assessed the literature pertaining to exercise analysis with wearable sensors. Fong and Chan reviewed the use of wearable IMUs in lower limb biomechanics studies; however, the focus of this work was broad, and it predominantly reviewed gait based papers (24). Another early review covered the broad scope of health and wellness, rehabilitation, and injury prevention with both wearable and ambient sensor systems (25). The field has expanded considerably since then. Recently, a systematic review was published by Wang et al. which classified studies involving upper limb wearable systems for rehabilitation (26). The ‘wearability’ of such systems and evidence supporting the systems’ effectiveness were also reviewed. Prior to this, the same group published a review of studies on upper limb rehabilitation systems from 2008 to 2013 (27). A variety of works have given an in-depth summary of movement measurement and analysis technologies; however, these do not focus on exercise analysis or the lower limb (28–30). Cuesta-Vargas et al reviewed the use of inertial sensors in human motion analysis and showed their capability for task-specific analysis (31). Other studies have investigated how feedback affects therapy outcomes; however, these systems did not necessarily involve wearable IMUs and focused predominantly on the upper extremity (32–34). To date, a contemporary systematic review investigating the capacity for IMU technologies to quantify movement quality during lower limb exercises is not available. Therefore, the aim of this systematic review is to synthesize and evaluate studies which have investigated the capacity for IMU technologies to assess movement quality in lower limb exercises. In particular, we aim to describe the sensing setups used, inclusive of type (accelerometer and/or gyroscope and/or magnetometer), number and position of the sensing units. We also aim to describe the measurements each system extracted from the sensing units (e.g. ROM, power) and if these were extracted as part of a validation study or an evaluation study with real users. We will also establish which exercises were analysed by such systems. This review serves to summarise a rapidly growing field which has not been specifically reviewed in over 7 years. It will identify clear gaps in the literature which require further research and can be used as a resource for sports-medicine practitioners to build an understanding of the capabilities of IMU systems in assessing lower limb exercises.


2.2. Methods
2.2.0. Literature Search Strategy and Study Selection Process
The protocol for this review was performed in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement (35). A literature search was completed within the following 3 databases: PubMed, Scopus and ScienceDirect. Papers regarding the following were selected: exercise, lower body, movement monitoring and IMUs. MeSH (Medical Subject Heading) terms or title/abstract keywords and their synonyms and spelling variations were used in several combinations and modified for every database. Articles published from January 2007 to May 2017 were reviewed. The general search strategy, including the search terms used, is listed in Table 2.1. This search includes refereed journal papers and peer-reviewed articles published in conference proceedings. Only articles written in the English language were included. The article selection process consisted of the following steps using the PRISMA (35) guidelines (Figure 2.1): 1) A computerized search strategy was performed for the period January 2007 until September 2017; 2) After removal of duplicates, titles and abstracts of the remaining articles were screened; 3) The reviewer read the full texts and selected articles based on the inclusion/exclusion criteria. In cases where a journal paper covered the contents reported in the earlier conference publications, the journal paper was preferred over the conference paper. In cases where the overlap was only partial, multiple publications were used as sources. Due to the relative novelty of IMU technologies, the grey literature was not searched; only peer-reviewed scientific articles were eligible for inclusion. We deemed this appropriate due to the non-interventional nature of studies in this field.

Table 2.1: Literature search strategy.
Exercise: “exercise” OR “rehab*” OR “weight training” OR “motor activity” OR “personal” OR “strength” OR “conditioning” OR “hypertrophy” OR “gym” OR “weight lifting” OR “resistance” OR “training”
AND
Lower Body: “lower body” OR “lower extremity” OR “leg” OR “thigh” OR “shank” OR “ankle” OR “foot” OR “joint”
AND
Movement Monitoring: “monitor” OR “motion” OR “classif*” OR “recogn*” OR “evaluat*” OR “posture” OR “sensing” OR “assess*” OR “quantification” OR “biomech*” OR “tracking” OR “quality” OR “kinematics” OR “biofeedback”
AND
Inertial Measurement Units: “inertial sensor” OR “gyroscop*” OR “IMU” OR “inertial measurement units” OR “wearable” OR “acceleromet*” OR “sensor system” OR “sensor network” OR “magnetometer” OR “MEMS” OR “smartphone” OR “mobile” OR “wireless”
AND NOT
“robot” OR “exoskeleton”
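The way the term groups in Table 2.1 combine into a single Boolean query can be illustrated with a short sketch. This is only an illustration: the function names (`or_block`, `build_query`) are hypothetical, the term lists are abbreviated, and the exact query syntax accepted by PubMed, Scopus and ScienceDirect differs from database to database, which is why the strategy was modified for each one.

```python
# Illustrative sketch only: term lists are abbreviated from Table 2.1 and the
# helper names are hypothetical, not part of the review's actual tooling.
TERM_GROUPS = {
    "Exercise": ["exercise", "rehab*", "weight training", "strength"],
    "Lower Body": ["lower body", "lower extremity", "leg", "ankle"],
    "Movement Monitoring": ["monitor", "motion", "classif*", "biofeedback"],
    "Inertial Measurement Units": ["inertial sensor", "IMU", "acceleromet*"],
}
EXCLUDED_TERMS = ["robot", "exoskeleton"]


def or_block(terms):
    """Join the synonyms of one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(groups, excluded):
    """AND the concept blocks together, then exclude the unwanted terms."""
    included = " AND ".join(or_block(terms) for terms in groups.values())
    return f"{included} AND NOT {or_block(excluded)}"


query = build_query(TERM_GROUPS, EXCLUDED_TERMS)
print(query)
```

Keeping each concept as its own list makes it straightforward to add a synonym or spelling variation to one group without touching the others.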


Inclusion criteria were:
a) The articles contain a system for exercise analysis using IMUs.
b) The system is intended for monitoring repetition-based targeted exercises for the lower limb (e.g. squats, deadlifts, single leg squats, lunges, straight leg raises, and jumps), or analysing rehabilitation, workplace or strength and conditioning exercises.
c) The system included detection of exercises and/or quantification of exercise volume and/or analysis of exercise technique or performance measures.
d) Articles were published in the last 10 years.
e) Articles were written in the English language.

Exclusion criteria were:
a) Systematic reviews and literature reviews.
b) Books and other non-peer reviewed literature.
c) Studies evaluating robotic systems or exoskeletons.
d) Studies investigating human activity recognition in non-rehabilitative or strength and conditioning settings (i.e. in the ‘real-world’).
e) Studies evaluating pathological groups only.
f) Sensing modality used was not a wearable accelerometer, gyroscope, magnetometer or combination of those (IMU).
g) Study only concerns non-repetition based targeted exercises e.g. running, walking, gait, balance.
h) Study concerns non-human, animal subjects.

2.2.1. Data Extraction Process Data extraction was completed by two authors (MOR and CD). Where discrepancies occurred, these were discussed and the associated papers were reassessed. A standardised data extraction form was utilised. Details about the study design, the exercises investigated, the sensor systems (e.g. accelerometer-only vs accelerometer + gyroscope) and the set-ups (e.g. multi-site vs single-site) used were ascertained. The studies were divided into five categories based on the aims/objectives of this review: Exercise detection (ED); movement classification (MC); measurement validation (MV); exercise-performance feedback (FE); user-experience (UE). Each study was then qualitatively summarised (aims, findings and conclusions based on these findings).

2.2.2. Assessment of Study Quality
Two authors (MOR and CD) evaluated the quality of the studies deemed eligible for inclusion using an adapted version of the STROBE assessment criteria for cross-sectional studies (36), which was devised by author consensus. Specifically, each study was rated on 10 specific criteria which were derived from items 1, 3, 6, 8, 11, 14, 18, 19, 20 and 22 of the original checklist. Where a rater (MOR or CD) was an author of a paper included in this review, the paper was instead rated by a different author of this review (WJ) to minimise the risk of bias. Final study ratings for each reviewer were collated and examined for discrepancies. Any inter-rater disagreement was resolved by consensus decision. Once consensus was reached for all study ratings, overall quality scores were collated by summing those criteria, providing a score out of 10. Studies were considered to be of high quality when >7 domains were scored as high (1). If >3 domains were scored as low (0), the study was considered of low quality.
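The scoring rule can be expressed as a small sketch. The function names are hypothetical. Note one assumption: a study scoring exactly 7 meets neither threshold as written (it has fewer than 8 high domains and only 3 low domains); Table 2.2 appears to report such studies as Low, so the sketch does the same.

```python
# Minimal sketch of the modified STROBE quality rating; names are hypothetical.
N_CRITERIA = 10


def quality_score(ratings):
    """Sum ten binary criterion ratings (1 = high, 0 = low)."""
    if len(ratings) != N_CRITERIA or any(r not in (0, 1) for r in ratings):
        raise ValueError("expected ten ratings, each 0 or 1")
    return sum(ratings)


def quality_label(ratings):
    """High when more than 7 criteria are met, otherwise Low.

    Assumption: a score of exactly 7 is treated as Low, which matches how
    Table 2.2 appears to label such studies.
    """
    return "High" if quality_score(ratings) > 7 else "Low"


# Ratings copied from two rows of Table 2.2.
bolink_2016 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
arai_2012 = [1, 0, 1, 1, 1, 1, 0, 0, 1, 1]
print(quality_label(bolink_2016), quality_label(arai_2012))
```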

2.3. Results
2.3.0. Database Search and Paper Lists
An overview of the results in the different stages of the article selection process is shown in Figure 2.1. From the 2452 articles that were identified with the search strategies, 55 papers are included in this review following the selection process. The quality of the included studies is displayed in Table 2.2.

Table 2.2. Risk of bias assessment of the included studies based on the modified STROBE criteria.
Ref | Modified STROBE criteria 1 2 3 4 5 6 7 8 9 10 | Quality
Ahmadi et al, 2014 (37) | 1 1 0 1 1 0 0 0 0 1 | Low
Ai et al, 2014 (38) | 0 0 0 1 1 1 0 0 0 1 | Low
Alahakone et al, 2009 (39) | 1 1 0 1 1 0 1 0 0 0 | Low
Arai et al, 2012 (40) | 1 0 1 1 1 1 0 0 1 1 | Low
Bo et al, 2011 (41) | 0 0 0 1 1 0 1 0 0 0 | Low
Bolink et al, 2016 (42) | 1 1 1 1 1 1 1 1 1 1 | High
Bonnet et al, 2011 (7) | 1 1 0 1 1 1 0 0 0 1 | Low
Bonnet et al, 2013 (43) | 1 1 1 1 1 1 1 1 1 1 | High
Chakraborty et al, 2013 (44) | 0 0 0 0 0 0 0 0 0 0 | Low
Chang et al, 2007 (16) | 1 0 0 0 1 0 1 0 0 0 | Low
Charlton et al, 2015 (45) | 1 1 1 1 1 1 1 1 1 1 | High
Chen et al, 2013 (46) | 1 0 0 1 1 0 1 0 0 0 | Low
Chen et al, 2015 (47) | 0 0 0 0 0 0 0 0 0 0 | Low
Conger et al, 2016 (48) | 1 1 1 1 1 1 1 1 1 1 | High
Dominiguez-Veiga et al, 2017 (49) | 1 1 0 1 1 1 1 1 1 1 | High
Dowling et al, 2012 (50) | 1 1 1 1 1 1 1 1 1 1 | High
Doyle et al, 2011 (51) | 1 1 0 1 1 1 1 1 1 1 | High
Epelde et al, 2014 (52) | 0 1 1 0 0 1 1 0 0 1 | Low
Faber et al, 2015 (53) | 1 1 0 1 1 1 1 1 1 1 | High
Fitzgerald et al, 2007 (54) | 0 0 0 0 0 0 0 0 0 1 | Low
Giggins et al, 2013 (17) | 0 1 1 1 1 1 1 0 0 1 | Low
Giggins et al, 2014a (3) | 1 1 1 1 1 1 1 1 1 1 | High
Giggins et al, 2014b (55) | 1 1 1 1 1 1 1 1 1 0 | High
Gleadhill et al, 2016 (20) | 1 1 1 1 1 1 1 0 1 1 | High
González-Sánchez et al, 2016 (56) | 1 1 1 1 1 1 1 1 1 1 | High
Gordon et al, 2012 (57) | 0 1 0 1 1 0 1 0 0 0 | Low
Haladjian et al, 2015 (58) | 0 0 0 0 0 1 0 0 0 1 | Low
Houmanfar et al, 2016 (59) | 1 1 0 1 1 1 1 0 1 0 | Low
Kianifir et al, 2016 (60) | 1 1 1 1 1 1 1 0 0 0 | Low
Lam et al, 2014 (61) | 1 1 0 0 0 1 1 0 0 0 | Low
Lin et al, 2012 (19) | 1 1 1 1 1 1 1 0 1 0 | High
Mehta et al, 2016 (62) | 1 1 1 1 1 1 1 1 1 0 | High
Morales et al, 2017 (63) | 1 1 1 1 1 1 1 1 1 1 | High
Morris et al, 2014 (13) | 1 1 0 1 1 1 1 1 0 0 | Low
O’Reilly et al, 2017a (64) | 1 1 0 1 1 1 1 1 1 1 | High
O’Reilly et al, 2017b (65) | 1 1 0 1 1 1 1 1 1 1 | High
O’Reilly et al, 2017c (66) | 1 1 0 1 1 1 1 1 1 1 | High
O’Reilly et al, 2017d (67) | 1 1 1 1 1 1 1 1 1 0 | High
O’Reilly et al, 2017e (68) | 0 1 0 1 1 1 1 1 1 1 | High
O’Reilly et al, 2017f (69) | 1 1 1 1 1 1 1 1 1 1 | High
Omkar et al, 2011 (70) | 1 1 0 1 1 1 1 0 0 1 | Low
Papi et al, 2015 (71) | 1 0 1 0 0 1 1 1 1 1 | Low
Papi et al, 2015 (72) | 1 1 0 1 1 1 1 1 1 1 | High
Patterson et al, 2010 (23) | 1 1 0 1 1 1 1 1 0 1 | High
Pernek et al, 2012 (73) | 1 1 0 1 1 1 1 0 0 1 | Low
Quaglierella et al, 2010 (74) | 1 1 1 1 1 1 1 1 0 1 | High
Rawson et al, 2010 (75) | 1 1 1 1 1 1 1 0 0 1 | High
Setuain et al, 2015 (76) | 1 1 1 1 1 1 1 0 1 0 | High
Setuain et al, 2015b (21) | 1 1 0 1 1 1 1 1 1 1 | High
Shepherd et al, 2016 (77) | 1 0 1 0 0 0 0 1 0 0 | Low
Taylor et al, 2010 (22) | 0 1 0 1 1 0 1 1 0 0 | Low
Tunçel et al, 2009 (78) | 0 1 0 1 1 0 1 0 1 1 | Low
Whelan et al, 2016 (79) | 1 1 0 1 1 1 1 1 1 1 | High
Yurtman et al, 2014 (18) | 1 1 0 1 1 1 0 1 1 0 | Low
Zijlstra et al, 2010 (80) | 1 1 1 1 1 1 1 0 1 0 | High

Items legend:
1. Provide in the abstract an informative and balanced summary of what was done and what was found.
2. State specific objectives, including any prespecified hypotheses.
3. Give the eligibility criteria, and the sources and methods of selection of participants.
4. For each variable of interest, give sources of data and details of methods of assessment (measurement). Describe comparability of assessment methods if there is more than one group.
5. Explain how quantitative variables were handled in the analyses. If applicable, describe which groupings were chosen and why.
6. Give characteristics of study participants (e.g. demographic, clinical, social) and information on exposures and potential confounders.
7. Summarise key results with reference to study objectives.
8. Discuss limitations of the study, considering sources of potential bias or imprecision. Discuss both direction and magnitude of any potential bias.
9. Give a cautious overall interpretation of results considering objectives, limitations, multiplicity of analyses, results from similar studies, and other relevant evidence.
10. Give the source of funding and the role of the funders for the present study and, if applicable, for the original study on which the present article is based.


2.3.1. Sensor Set-ups Table 2.3 categorizes the included articles based on whether the systems they adopted used multiple/single sensor units, compared sensor units at a variety of anatomical locations, and/or compared multiple sensor set-ups to single sensor set-ups for each application. There was a large degree of heterogeneity in the included studies’ sensor set-ups. In particular, the types of sensors on board each sensing unit (accelerometer and/or gyroscope and/or magnetometer) and the number of sensing units required to be worn by system users varied. Table 2.4 demonstrates the distribution of sensors used in the included studies.

Table 2.3: Sensing set-ups evaluated.
Sensor set-up | Studies
Multiple sensor units (n=24) | (16),(18),(78),(37),(74),(19),(57),(41),(44),(46),(47),(22),(53),(72),(58),(60),(59),(39),(50),(52),(61),(56),(71)
Single sensor units (n=19) | (13),(73),(55),(23),(70),(7),(43),(38),(42),(62),(76),(40),(63),(45),(21),(77),(51),(69),(49)
Comparison of multiple and single sensor units (n=12) | (17),(75),(3),(80),(20),(53),(64),(79),(65),(66),(67),(68)


Table 2.4: Sensors used in each study included in this review.
Columns: Author(s), year | Ref | Accel | Gyro | Mag | Other
Rows: the 55 included studies, from Ahmadi et al, 2014 (37) to Zijlstra et al, 2010 (80), listed in the same order as Table 2.2.


Figure 2.1: PRISMA flowchart of the results from the literature search.

2.3.2. Exercises Investigated Versus Study design In the included studies, a total of fifty-three exercises were evaluated using a wearable inertial sensor system (Table 2.5). The most commonly investigated single-joint, uni-planar exercise was the lying straight leg-raise. There were three single-joint multi-planar exercises investigated. There were also two multi-joint, uni-planar exercises and twenty-six multi-joint, multi-planar exercises investigated. The most investigated of these were the sit-to-stand and squat exercises.


MJ-UP

SJ-MP

SJ-UP

Table 2.5: Exercises which have been investigated during the studies included in this review and the study type which they were included in.
Columns: Exercise | Measurement Validation | Exercise Detection | Movement Classification | Feedback Evaluation | End user evaluation
Lying hip abduction (55), (45) (17), (18) (3), (18) (77)
Lying hip extension (55), (45) (17), (18) (3), (18)
Lying knee flexion (supine) (62)
Inner range quads (55) (17), (81) (3), (81)
Seated knee extension (55), (73), (40) (17), (78), (3) (61)
Seated knee flexion (19)
Lying straight leg raise (55), (19) (17), (18), (81) (3), (18),(81), (22) (77)
Standing calf raise (73) (16), (48)
Seated straight leg raise (59) (18), (18) (52) (52)
Standing straight leg raise (19) (78) (61)
Standing knee flexion/extension (58) (78) (22) (77) (61)
Standing hip extension (78) (61)
Standing hip abduction (78) (22)
Standing leg curl (73)
Seated calf raises (73)
Lying leg curl (73), (75)
Seated resisted knee extension (73), (75)
Ankle dorsi/plantarflexion (38) (56)
Ankle internal/external rotation (38)
Ankle inversion/eversion (38)
Seated hip internal/external rotation (45)
Supine hip internal/external rotation (45)
Lying straight diagonal leg raise (19)
Standing circle trace (hip) (19)
Lying circle trace (hip) (19)
Heel slides (55), (47) (17) (3)

Lying hip & knee flexion

(55), (19), (59)

Sit to stand

(80), (19), (41), (72), (42) (73) (54), (63)

BMJ-MP

Leg press Lunge Kicking Deadlift Mini-squats Squats

(20) (73), (75), (7), (43), (19), (41), (58), (59)

(17)

(3),

(48), (64), (49) (37) (16) (81) (13), (78), (48), (64), (49)

(65), (69) (69) (81) (66), (67), (69)

(51)

(64), (49) (68), (69) Barbell deadlifts Overhead squats (44) Kettlebell swing (13) Sun salutation (70) Hang clean (57) Block step up (42) (64), (49) Single leg squats (60), (79) Box lift (53) Stoop box lift (53) Squat box lift (53) One leg hops (58) Side hops (58) Box jump (37) Bilateral squat jumps (74) Bilateral Countermovement (57), (74) Jumps Drop jumps (23), (76), (21), (39), (50) Unilateral drop jump (74), (21) Unilateral countermovement (74), (21) jumps (64),(49) Tuck jumps *Key: SJ-UP = Single-joint, uni-planar, SJ-MP = Single-joint, multi-plane, MJ-UP = Multi-joint, uni-planar, MJ-MP = Multi-joint, multi-plane


2.3.3. Qualitative Reviews
2.3.3.0. Measurement Validation
Twenty-eight studies identified for inclusion in this review attempted to validate wearable motion sensor systems (7,19–21,23,38,40–45,47,53–55,57–59,62,63,70,72–76,80,81). These twenty-eight studies were categorised as evaluating either concurrent validity (Table 2.6) or construct validity (Table 2.7). For the purposes of this review, concurrent validity was defined as when a newly developed tool such as a wearable sensor system is compared to another test which is considered to be the “gold standard” to measure the construct in question (82). Construct validity compares a new wearable system’s output to another test that measures a similar construct but that is not a “gold standard” (convergent validity), or evaluates the system’s capacity to discriminate between known-groups in a cross sectional (discriminative validity; known groups) or longitudinal (discriminative validity; responsiveness) manner (82).

Concurrent Validity
Seventeen of the studies included in this review sought to compare a wearable sensor system’s output to a tool used in current clinical practice (e.g. goniometer for joint angle measurement) or gold standard biomechanical measurement tools (e.g. optoelectronic motion capture systems and force plates) (7,19,20,23,42,43,45,53,57,58,62,63,72,74–76,80). These studies are summarised in Table 2.6.

Construct Validity
Eleven studies investigated the construct validity of wearable motion systems for specific applications in tracking lower limb exercises. Of these, four assessed convergent validity (40,41,70,73). Five studies pertained to known-groups validity (21,38,44,54,55). Two studies evaluated the longitudinal validity of a lower limb wearable sensor system in assessing joint ROM throughout a rehabilitation programme (47,59). All eleven studies which predominantly regarded construct validity are summarised in Table 2.7.
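Concurrent validity comparisons of this kind are commonly reported as a root mean square error (RMSE) or a correlation coefficient between the wearable system's output and the gold standard. The following is a minimal sketch of both calculations, assuming the two signals have already been time-synchronised and resampled to the same length. The function names and the sample knee-angle values are purely illustrative and are not taken from any included study.

```python
import math


def rmse(estimate, reference):
    """Root mean square error between paired samples of equal length."""
    if len(estimate) != len(reference) or not estimate:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(e - r) ** 2 for e, r in zip(estimate, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must be paired and contain at least 2 samples")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(var_x * var_y)


# Made-up knee flexion angles (degrees) over one repetition.
imu_angle = [10.0, 35.0, 62.0, 88.0, 60.0, 33.0]
mocap_angle = [12.0, 38.0, 65.0, 90.0, 63.0, 30.0]
print(rmse(imu_angle, mocap_angle), pearson_r(imu_angle, mocap_angle))
```

The two statistics answer different questions: a high correlation shows that the signals rise and fall together, whereas RMSE captures the size of the disagreement in the original units, so a system can correlate well and still carry a constant offset.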


Table 2.6: Summary of studies assessing concurrent validity of wearable sensor based system to standard clinical measure or biomechanical gold standard.

Lin and Kulic (19)
Sensor set-up and placement: 3 x tri-axial accel + gyro (trunk, thigh, shank)
Sample: 20 (8 females, 12 males), injury-free
Outcomes: Joint angles during 9 lower limb rehabilitation exercises
Gold standard/Comparator: Optoelectronic motion capture
Findings: The average root mean square error (RMSE) between a motion capture system and their wearable system was 6.5° across all exercises.

Haladjian et al (58)
Sensor set-up and placement: 2 x tri-axial accel + gyro (thigh & shank)
Sample: 4 (2 females, 2 males), following ACL surgery
Outcomes: Sagittal plane knee joint angle
Gold standard/Comparator: Goniometer
Findings: There was a 5° difference in agreement between the ‘KneeHapp’ system and the goniometer.

Bolink et al (42)
Sensor set-up and placement: 1 x tri-axial accel + gyro + mag (lumbar)
Sample: 17 (8 females, 9 males), injury-free
Outcomes: Pelvic orientation angles during the sit to stand and block step up exercises
Gold standard/Comparator: Optoelectronic motion capture
Findings: Frontal plane pelvic angle estimations achieved a RMSE in the range of 2.7° to 4.5° and sagittal plane measurements achieved a RMSE in the range of 2.7° to 8.9° when compared with optoelectronic motion capture.

Faber et al (53)
Sensor set-up and placement: 1 x accel + gyro (multiple locations from C7-MPSIS)
Sample: 20 (10 females, 10 males), injury-free
Outcomes: Which location optimally agreed with an optoelectronic motion capture system’s calculation for trunk inclination? The data used was from a variety of box lifting exercises.
Gold standard/Comparator: Optoelectronic motion capture
Findings: They concluded that regardless of participant’s sex or lifting style, the optimal sensing unit location for the measurement of trunk inclination is at about 25% of the distance from the sacrum to C7.

Mehta et al (62)
Sensor set-up and placement: 1 x iPhone (tri-axial accel + gyro)
Sample: 60 (sex not reported), following total knee replacement or with knee osteoarthritis
Outcomes: Knee flexion and extension ROM
Gold standard/Comparator: Standard goniometry
Findings: They showed the mobile application allowed for a smaller minimal detectable change than goniometry.

Morales et al (63)
Sensor set-up and placement: 1 x smartphone (accel + gyro + mag)
Sample: 33 (sex not reported), injury-free
Outcomes: Inclination of the tibia during the weight bearing lunge exercise
Gold standard/Comparator: Tape measure test, goniometry and the leg motion system
Findings: They found no significant differences between any of the measurement techniques.

Bonnet et al (7,43)
Sensor set-up and placement: 1 x tri-axial accel + gyro (lumbar)
Sample: 10 (4 females, 6 males), injury free
Outcomes: Sagittal hip, knee and ankle joint angles during a bodyweight squat exercise
Gold standard/Comparator: Optoelectronic motion capture
Findings: Their most recent predictive algorithm had a root mean square difference of 3.2°, 2° and 3.1° for ankle, knee and hip angles respectively when compared with optoelectronic motion capture, both in a robot model and with 8 healthy human participants (43).

Patterson and Caulfield (23)
Sensor set-up and placement: 1 x tri-axial accel (ankle)
Sample: 20 (14 females, 6 males), injury free
Outcomes: Reactive strength index during the drop jump exercise
Gold standard/Comparator: Force plate data
Findings: Pearson’s product correlation of 0.9816 in computing reactive strength index during the drop jump exercise.

Quaglieralla et al (74)
Sensor set-up and placement: 2 x tri-axial accel (left and right ankle)
Sample: 51 (26 injury-free, 25 following surgery for Achilles tendon rupture; 51 males)
Outcomes: Flight time during countermovement jumps and squat jumps
Gold standard/Comparator: Force plate
Findings: Spearman’s coefficient was found to be greater than 0.95 in this case.

Gordon et al (57)
Sensor set-up and placement: 2 x tri-axial accel + gyro (trunk and barbell)
Sample: 1 (male), injury-free
Outcomes: Mean percentage error for the following measurements: peak velocity, time to peak velocity, peak power, time to peak power and force at peak power.
Gold standard/Comparator: Optoelectronic motion capture system and force plates
Findings: The temporal measures had the lowest mean percentage error with time to peak velocity and time to peak power having an error of just 0.034% and 1.01% respectively. The kinetic measures had a larger error with peak power and peak force both resulting in a 12.5% error versus the force plates and motion capture system.

Setuain et al (76)
Sensor set-up and placement: 1 x tri-axial accel + gyro + mag (lumbar)
Sample: 17 (8 females, 9 males), injury-free
Outcomes: Vertical force derived from IMU
Gold standard/Comparator: Force plates
Findings: Several biomechanical variables such as the resultant force–time curve patterns in drop jumps, unilateral drop jumps and unilateral countermovement jumps can be reliably measured with a lumbar worn IMU.

Rawson et al (75)
Sensor set-up and placement: 3 x uni-axial accel (wrist, waist and ankle)
Sample: 30 (15 females, 15 males), injury free
Outcomes: Activity counts during the squat, leg extension and leg curl exercises
Gold standard/Comparator: Cosmed™ system (COSMED, Rome, Italy)
Findings: Activity counts were correlated with energy expenditure as computed by a Cosmed™ system (COSMED, Rome, Italy). Thirty healthy participants were recruited and a primary finding of the study was that a regression equation whose inputs included sex, fat-free mass, and counts of activity from the waist accelerometer explained 90% (R² = 0.90) of the variance in energy expenditure as measured by the Cosmed™ system.

Papi et al (72)
Sensor set-up and placement: 1 x tri-axial accel + gyro (waist), 1 x tri-axial accel (waist) + bend sensor also used (knee)
Sample: 14 (7 females, 7 males), injury-free
Outcomes: Total time taken to complete a five time sit to stand test
Gold standard/Comparator: Optoelectronic motion capture
Findings: The waist worn sensing unit was found to have a 0.86 RMSE versus the measure from a motion capture system.

Zijlstra et al (80)
Sensor set-up and placement: 3 x tri-axial accel + gyro + mag (sternum, pelvis, SIPS)
Sample: 17 (10 females, 7 males), injury free
Outcomes: Vertical power during the sit to stand test
Gold standard/Comparator: Optoelectronic motion capture + force plates
Findings: They used Pearson’s correlation to compare each sensor position’s power output to force plate data and found an R² of 0.984 at the body’s estimated centre of mass.

Gleadhill et al (20)
Sensor set-up and placement: 3 x tri-axial accel + gyro on spine (C7, T12 and S1)
Sample: 11 (1 female, 10 males), injury free
Outcomes: Temporal features from accelerometers during fifteen variations of the deadlift exercise.
Gold standard/Comparator: Optoelectronic motion capture
Findings: The average Pearson’s correlation with a motion capture system was R² = 0.9997 for sagittal plane accelerometer peaks.

Charlton et al (45)
Sensor set-up and placement: 1 x smartphone (accel + gyro + mag)
Sample: 20 (males), injury free
Outcomes: Hip ROM (flexion, abduction, adduction, supine internal and external rotation and sitting internal and external rotation)
Gold standard/Comparator: Optoelectronic motion capture
Findings: The smartphone demonstrated good to excellent reliability (ICCs > 0.75) for four out of the seven movements, and moderate to good reliability for the remaining three movements (ICC = 0.63–0.68).


Table 2.7: Summary of studies assessing construct validity of wearable sensor based systems.

Bo et al (41)
Sensor set-up and placement: 2 x tri-axial accel, 2 x dual-axial gyro + Microsoft Kinect (thigh and shank)
Sample: Not described
Outcomes: Knee angles during sit to stand and squat.
Construct validity type: Convergent
Comparator: Microsoft Kinect
Findings: High potential for fusion of Kinect and inertial sensors for more accurate joint angle measurement.

Pernek et al (73)
Sensor set-up and placement: 1 x smartphone w/ tri-axial accel (on weights stack or ankle)
Sample: 10 (4 females, 6 males), injury-free
Outcomes: Detection of exercise repetitions and the start/end of repetitions.
Construct validity type: Convergent
Comparator: Manual extraction of reps by authors.
Findings: 99% accuracy in repetition detection, 89% accuracy in detecting start and end points of reps.

Omkar et al (70)
Sensor set-up and placement: 1 x tri-axial accel + gyro (lumbar)
Sample: 11 (4 females, 7 males), injury-free
Outcomes: Grace and consistency during rhythmic exercise
Construct validity type: Convergent
Comparator: Visual analysis of participants’ sequence by yoga expert
Findings: Found performance of 2 participants to be significantly worse than the others (more jerks and halts).

Arai et al (40)
Sensor set-up and placement: 1 x tri-axial gyro (shank)
Sample: 105 (55 females, 50 males), elderly, injury-free
Outcomes: Physical function and self-efficacy
Construct validity type: Convergent
Comparator: Functional performance measurements.
Findings: Gyroscope peaks correlated with some physical functions such as muscle strength (r = 0.304, p < 0.01), and walking velocity (r = 0.543, p < 0.001).

Chakraborty et al (44)
Sensor set-up and placement: Xsens™ moCap suit
Sample: 6 (sex not reported), undergoing rehabilitation of lower limb
Outcomes: Body posture during overhead squat task
Construct validity type: Known-groups
Comparator: Individual’s measures pre & post injury
Findings: Not described.

Fitzgerald et al (54)
Sensor set-up and placement: Xsens™ moCap suit
Sample: 2 (sex not reported), one injury-free, one 15 weeks post MCL tear
Outcomes: Body posture during straight line lunge
Construct validity type: Known-groups
Comparator: Comparison of injured and uninjured individual
Findings: Greater range of trunk flexion/extension, thigh internal/external rotation and trunk flexion/extension for injured athlete.

Ai et al (38)
Sensor set-up and placement: 1 x tri-axial accel + gyro + mag (instep of foot OR shank)
Sample: 3 (males; 1 healthy, 1 polymyositis, 1 chronic lower back pain)
Outcomes: ROM, movement smoothness, trajectory error
Construct validity type: Known-groups
Comparator: Comparison of each group’s results
Findings: Proof of concept for tracking ankle exercises with IMUs shown via each participant’s differing trajectories.

Giggins et al (55)
Sensor set-up and placement: 1 x tri-axial accel + gyro (shin)
Sample: 9 (5 females, 4 males), injury-free
Outcomes: Signal features when exercises completed with correct technique and aberrant technique.
Construct validity type: Known-groups
Comparator: Comparison of features when exercises completed with acceptable and aberrant technique.
Findings: A number of significantly different features found across all exercises and all deviations/known groups.

Setuain et al (21)
Sensor set-up and placement: 1 x accel + gyro + mag (lumbar)
Sample: 22 (sex not reported; 6 ACL reconstructed and 16 injury-free)
Outcomes: Signal features during a battery of vertical jumping tests.
Construct validity type: Known-groups
Comparator: Comparison of features when exercises completed by ACL reconstructed and injury-free group.
Findings: The ACL-reconstructed male athletes did not show any significant (P […]

Chen et al (81)
Sensor set-up and placement: 2 x tri-axial accel + gyro (thigh and shank)
Sample: 10 (5 females, 5 males; 5 injury-free + 5, rehab after total knee arthroplasty)
Outcomes: Knee ROM during heel slides tested 1 day pre, 1 day post, 2 weeks post and 6 weeks post total knee arthroplasty.
Construct validity type: Longitudinal

Houmanfar, Karg and Kulic (59)
Sensor set-up and placement: 2 x tri-axial accel + gyro + mag (thigh and shank)
Sample: 28 (sex not reported; 18, rehab following knee/hip replacement, 10, injury-free)
Outcomes: Distance of patient's data (joint angle, velocity, acceleration) from healthy norms throughout rehabilitation.
Construct validity type: Longitudinal

Individual’s known […]

[…] 95% regardless of sensor position). The ability of a single IMU system to

identify which deviation has occurred (multi-label classification as shown in Table 5.4) is moderate, with the right shank showing the highest overall accuracy (73%). A possible reason for this may be that the deviations investigated may involve a high degree of movement in the shanks in order to complete the aberrant movement. The left and right shank show similar overall classification results. Overall classification scores are around 12% higher using the right shank sensor compared to the left. This discrepancy could be attributed to the fact that the majority of the participants were right foot dominant, leading to overcompensation on this side. The confusion matrix in Figure 5.3 shows that the right shank is able to classify normal reps with an excellent true positive rate, but deviations such as KTF and BO are less clearly classified, possibly due to the similar movement profile needed to complete these deviations. Pernek et al. (17) used a single IMU to capture resistance training information and were able to recognise repetition duration with a temporal detection error of about 11%. However, different exercise goals may require varying movement speeds. As such, assessment based on speed alone is not a holistic way of evaluating exercise technique. Bonnet et al. (6) demonstrated that a single IMU system on the lower back could measure ankle, knee and hip joint angles in the sagittal plane during the BW squat with a maximal error of 3.5 degrees compared to a Vicon™ motion capture system in human participants. However, displaying joint angles to the end user may not provide actionable information to an individual not trained in biomechanics, as they may not be able to distinguish what angle range represents an aberrant movement pattern. Furthermore, the authors were only able to identify sagittal plane angles. 
Many common BW squat deviations can occur in different or multiple planes simultaneously, such as knee valgus and varus deviations, which occur predominantly in the frontal plane. Giggins et al. (23) assessed the ability of a single IMU to identify deviations in seven exercises (heel slide, hip abduction, hip flexion, hip extension, knee extension, inner range quads and straight-leg raise) with an overall accuracy of 79-83% depending on where the IMU was positioned. These results are slightly higher than those presented in this work; however, the authors looked at a maximum of three deviations for each exercise, while some of the exercises only had one deviation (i.e. binary level classification). Our work sought to identify five deviations from normal and this may go some way to explaining the lower overall accuracy scores compared to Giggins et al. Furthermore, the BW squat exercise is more complex than the exercises classified by Giggins et al., with deviations occurring in multiple joints simultaneously. Comparing results presented in this chapter with the above work is challenging due to differences in exercises investigated, sensor positions and feedback given to end users. However, these results build on this previous work. The majority of research to date has

investigated the ability of IMU systems to monitor technique in simple exercises such as heel slides (23), dumbbell curls (21) or straight leg raises (19). This chapter describes an evaluation of an IMU system’s ability to quantify BW squatting performance, a complex exercise that involves multiple joints. This system has also demonstrated the ability to identify five deviations from normal technique (Table 5.4). The reduced number of deviations in some of the studies (19, 21, 23) may make it easier for classifiers to identify specific deviations and subsequently produce higher accuracy, sensitivity and specificity scores. Finally, the main focus of the IMU system analysed in this chapter is to identify specific technique deviations and not just angles of movement (6), tempo (17) or exercise intensity (18). This information may be more actionable to gym users, particularly those without biomechanics training. It is difficult to ascertain whether the moderate levels of multi-level classification presented in this chapter are sufficient for real-life S&C settings. Further research is being undertaken to determine usability, functionality and user perceptions of employing wearable technology to assess exercise biomechanics. It is hoped this will give a greater indication as to the levels of accuracy S&C coaches, experienced and novice gym users would define as acceptable. It is possible that these levels may vary depending on the clients S&C coaches work with and the training goals of gym users. A number of contextual factors must be taken into account when interpreting the results. All deviations were deliberately induced and completed by healthy individuals. When deviations occur naturally, the exact way in which they present may differ from the induced deviations investigated in this study. 
Moreover, there were no controls as to the severity with which each participant performed a deviation, and therefore it is possible that naturally occurring BW squat deviations in a ‘real life’ application may be more acute, or occur in a more idiosyncratic fashion, than those used in the presented classification systems. No gold standard 3-dimensional motion capture system was used to confirm that each deviation occurred. However, a chartered physiotherapist and an individual trained in S&C were present for all data collection and ensured the deviations occurred through visual observation. The participant was asked to repeat the movement if the investigators felt it had not been completed satisfactorily. A motion capture system was not used as researchers have already shown the reliability of IMU set-ups compared to such systems (12, 13). Additionally, the five deviations identified by the IMU systems are a non-exhaustive list of those that can occur during the BW squat exercise. These deviations were chosen in consultation with sports medicine practitioners, S&C coaches and the NSCA guidelines (2).

In conclusion, it is shown that a system based on data derived from body worn IMUs can classify acceptable and aberrant BW squat biomechanics with excellent overall accuracy, sensitivity and specificity. These excellent classification levels are maintained even with a single IMU. The ability to identify specific simulated deviations is more difficult but can be achieved with a good level of overall accuracy with a five IMU system. A single IMU system can identify specific deviations with a moderate level of accuracy. These results are comparable with current research in the area despite the BW squat being a more complex exercise than many of those previously investigated. However, it must be stressed that this is not a fully operational system at present. Such a system should be able to recognise and evaluate multiple exercises. Furthermore, the deviations investigated in this study were induced, which means they may appear differently in a natural setting. Nevertheless, the results presented in this chapter are promising and further research is warranted in order to investigate an IMU system’s capability of monitoring technique in various movements. This work is currently ongoing, with future research focusing on the analysis of natural deviations and exercises such as the deadlift and tuck jump.

5.5. Practical Implications

The BW squat is an important movement in S&C, musculoskeletal injury risk screening and rehabilitation. The ability to objectively quantify BW squatting technique using low cost IMU technology would have practical advantages in all of these settings. In an S&C setting the ability to remotely monitor form may help exercise goals be achieved and reduce the risk of injury. In musculoskeletal screening, an IMU system that can identify aberrant movement patterns would allow for quicker and more objective risk identification and stratification. In a rehabilitation setting, monitoring technique would be important to prevent further injury and possibly allow exercises to be completed at home without the need for constant supervision, reducing overall healthcare costs.

5.6. References

1. Cook G, Burton L, Hoogenboom B. Pre-participation screening: the use of fundamental movements as an assessment of function–part 1. North American journal of sports physical therapy: NAJSPT. 2006 May;1(2):62.
2. Baechle TR, Earle RW. Resistance Training Exercise Techniques. NSCA's Essentials of Personal Training. Champaign, I.L., U.S.A.: Human Kinetics; 2004.
3. Hall M, Nielsen JH, Holsgaard-Larsen A, Nielsen DB, Creaby MW, Thorlund JB. Forward lunge knee biomechanics before and after partial meniscectomy. The Knee. 2015 Dec 31;22(6):506-9.
4. Ahmadi A, Mitchell E, Destelle F, Gowing M, O’Connor NE, Richter C, et al. Automatic activity classification and movement assessment during a sports training session using wearable inertial sensors. In: Proceedings of the 11th International Conference on Wearable and Implantable Body Sensor Networks, (BSN); 2014. Jun 16; Zurich, Switzerland. N.Y., U.S.A.: IEEE; 2014. p. 98–103.
5. Bonnechere B, Jansen B, Salvia P, Bouzahouene H, Omelina L, Moiseev F, Sholukha V, Cornelis J, Rooze M, Jan SV. Validity and reliability of the Kinect within functional assessment activities: comparison with standard stereophotogrammetry. Gait & posture. 2014 Jan 31;39(1):593-8.
6. Bonnet V, Mazza C, Fraisse P, Cappozzo A. Real-time estimate of body kinematics during a planar squat task using a single inertial measurement unit. IEEE Transactions on Biomedical Engineering. 2013 Jul;60(7):1920-6.
7. Whiteside D, Deneweth JM, Pohorence MA, Sandoval B, Russell JR, McLean SG, et al. Grading the functional movement screen: A comparison of manual (real-time) and objective methods. The Journal of Strength & Conditioning Research. 2016 Apr;30(4):924-33.
8. Burns A, Greene BR, McGrath MJ, O'Shea TJ, Kuris B, Ayer SM, Stroiescu F, Cionca V. SHIMMER™–A wireless sensor platform for noninvasive biomedical research. IEEE Sensors Journal. 2010 Sep;10(9):1527-34.
9. Madgwick SO, Harrison AJ, Vaidyanathan R. Estimation of IMU and MARG orientation using a gradient descent algorithm. In: Proceedings of the 12th International Conference on Rehabilitation Robotics (ICORR); 2011. Jun 29; Zurich, Switzerland. N.Y., U.S.A.: IEEE; 2011. p. 1-7.
10. McGrath D, Greene BR, O’Donovan KJ, Caulfield B. Gyroscope-based assessment of temporal gait parameters during treadmill walking and running. Sports Engineering. 2012;15(4):207-13.
11. Morris D, Saponas TS, Guillory A, Kelner I. RecoFit: using a wearable sensor to find, recognize, and count repetitive exercises. In: Proceedings of the 32nd annual ACM conference on Human factors in computing systems; 2014 Apr 26; Toronto, Canada. N.Y., U.S.A.: ACM; 2014. p. 3225-3234.
12. Leardini A, Lullini G, Giannini S, Berti L, Ortolani M, Caravaggi P. Validation of the angular measurements of a new inertial-measurement-unit based rehabilitation system: comparison with state-of-the-art gait analysis. Journal of neuroengineering and rehabilitation. 2014 Sep 11;11(1):136.
13. Tang Z, Sekine M, Tamura T, Tanaka N, Yoshida M, Chen W. Measurement and Estimation of 3D Orientation using Magnetic and Inertial Sensors. Advanced Biomedical Engineering. 2015;4:135-43.
14. Muehlbauer M, Bahle G, Lukowicz P. What can an arm holster worn smart phone do for activity recognition? In: Proceedings of the 15th Annual International Symposium on Wearable Computers (ISWC); 2011. Jun 12; San Francisco, C.A., U.S.A. N.Y., U.S.A.: IEEE; 2011. p. 79-82.
15. Chang KH, Chen MY, Canny J. Tracking free-weight exercises. In: Proceedings of the 9th International Conference on Ubiquitous Computing (UbiComp); 2007. Sep 16; Innsbruck, Austria. Berlin, Germany: Springer; 2007. p. 19-37.
16. Seeger C, Buchmann A, Van Laerhoven K. myHealthAssistant: a phone-based body sensor network that captures the wearer's exercises throughout the day. In: Proceedings of the 6th International Conference on Body Area Networks; 2011. Nov 7; Beijing, China. Brussels, Belgium: ICST; 2011. p. 1-7.
17. Pernek I, Hummel KA, Kokol P. Exercise repetition detection for resistance training based on smartphones. Personal and ubiquitous computing. 2013;17(4):771-82.
18. Pernek I, Kurillo G, Stiglic G, Bajcsy R. Recognizing the intensity of strength training exercises with wearable sensors. Journal of Biomedical Informatics. 2015;58:145-55.
19. Taylor PE, Almeida GJ, Hodgins JK, Kanade T. Multi-label classification for the analysis of human motion quality. In: Proceedings of the 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); 2012. Aug 28; San Diego, C.A., U.S.A. N.Y., U.S.A.: IEEE; 2012. p. 2214-2218.
20. Melzi S, Borsani L, Cesana M. The virtual trainer: supervising movements through a wearable wireless sensor network. In: Proceedings of the 2009 6th IEEE Annual Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks Workshops, (SECON Workshops); 2009. 22 Jun; Rome, Italy. N.Y., U.S.A.: IEEE; 2009. p. 1-3.
21. Velloso E, Bulling A, Gellersen H, Ugulino W, Fuks H. Qualitative activity recognition of weight lifting exercises. In: Proceedings of the 4th Augmented Human International Conference; 2013. Mar 7; Stuttgart, Germany. N.Y., U.S.A.: ACM; 2014. p. 116-123.
22. Bonnet V, Mazza C, Fraisse P, Cappozzo A. A least-squares identification algorithm for estimating squat exercise mechanics using a single inertial measurement unit. Journal of biomechanics. 2012;45(8):1472-7.
23. Giggins OM, Sweeney KT, Caulfield B. Rehabilitation exercise assessment using inertial sensors: a cross-sectional analytical study. Journal of Neuroengineering and Rehabilitation. 2014;11(1):158-68.
24. Jerri AJ. The Shannon sampling theorem—Its various extensions and applications: A tutorial review. Proceedings of the IEEE. 1977;65(11):1565-96.
25. Shimmer 9DOF calibration [Available from: http://www.shimmersensing.com/shop/shimmer-9dof-calibration (last accessed: 26 June 2017)].
26. Katz MJ, George EB. Fractals and the analysis of growth paths. Bulletin of Mathematical Biology. 1985;47(2):273-86.
27. Single-level discrete 1-D wavelet transform [Available from: http://uk.mathworks.com/help/wavelet/ref/dwt.html (last accessed: 26 June 2017)].
28. Breiman L. Random forests. Machine Learning. 2001;45(1):5-32.
29. Mitchell E, Ahmadi A, O'Connor NE, Richter C, Farrell E, Kavanagh J, et al. Automatically detecting asymmetric running using time and frequency domain features. In: Proceedings of the 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN); 2015. Jun 9; Boston, M.A., U.S.A. N.Y., U.S.A.: IEEE; 2015. p. 1-6.
30. Fushiki T. Estimation of prediction error by using K-fold cross-validation. Statistics and Computing. 2011;21(2):137-46.
31. Taylor PE, Almeida GJ, Kanade T, Hodgins JK. Classifying human motion quality for knee osteoarthritis using accelerometers. In: Proceedings of the 32nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); 2010. Aug 31; Buenos Aires, Argentina. N.Y., U.S.A.: IEEE; 2010. p. 339-343.

Section B.3: Classifying Exercise Technique with Small, Imbalanced Data Sets of Naturally Occurring Technique Deviations

Foreword – Section B.3:

In section B.2, data analysis pathways were developed and evaluated for classifying technique quality in two compound lower limb exercises: lunges and bodyweight squats. It was found that a 5 IMU set-up could classify exact deviations from correct technique with moderate to good accuracy and could distinguish if technique was acceptable or aberrant (binary classification) with good to excellent accuracy. A system developed with data from a single body worn IMU provided poor to good accuracy in multi-label classification and good to excellent accuracy in binary classification. Although these results demonstrate significant promise for the efficacy of wearable IMU systems which classify compound lower limb exercise technique, it is important to note that they were developed and evaluated with deliberately induced technique deviations. These induced deviations may not ideally represent real world technique deviations, which may be more acute and occur in a more idiosyncratic fashion. Additionally, the classification algorithms employed for analysis are known to be more effective for large, balanced data sets than they are for smaller data sets and/or data sets where the classes have vastly different numbers of training instances. In section B.3, systems will be developed and evaluated with IMU data from naturally occurring technique deviations. This will be completed for three exercises: single leg squats (Chapter 6), barbell deadlifts (Chapter 7) and barbell squats (Chapter 8). Data analysis pathways which replicate those used in Chapters 4 and 5 will be assessed. It will also be investigated whether other techniques may be employed to improve system efficacy when developing classification systems for compound lower limb exercises with naturally occurring deviations.

Chapter 6 Classification of Single Leg Squat Biomechanics with Multiple and Individual Inertial Measurement Units

This chapter is based on the following paper which is published in Methods of information in medicine: Whelan DF, O'Reilly MA, Ward TE, Delahunt E, Caulfield B. Technology in Rehabilitation: Evaluating the Single Leg Squat Exercise with Wearable Inertial Measurement Units. Methods of information in medicine. 2016 Oct 26;55(6).

6.0. Abstract

Background: The single leg squat (SLS) is a common lower limb rehabilitation exercise. It is also frequently used as an evaluative exercise to screen for an increased risk of lower limb injury. To date, athlete/patient SLS technique has been assessed using expensive laboratory equipment or subjective clinical judgement, neither of which is without shortcomings. IMUs may offer a low cost solution for the objective evaluation of athlete/patient SLS technique.

Objectives: The aims of this study were to determine if, in combination or in isolation, IMUs positioned on the lumbar spine, thigh and shank are capable of: (A) distinguishing between acceptable and aberrant SLS technique; (B) identifying specific deviations from acceptable SLS technique.

Methods: Eighty-three healthy volunteers participated (60 males, 23 females, age: 24.68 +/- 4.91 years, height: 1.75 +/- 0.09 m, body mass: 76.01 +/- 13.29 kg). All participants performed 10 SLSs on their left leg. IMUs were positioned on participants’ lumbar spine, left shank and left thigh. These were utilized to record tri-axial accelerometer, gyroscope and magnetometer data during all repetitions of the SLS. SLS technique was labelled by a chartered physiotherapist using an evaluation framework. Features were extracted from the labelled sensor data. These features were used to train and evaluate a variety of random-forests classifiers that assessed SLS technique.

Results: A three IMU system was moderately successful in detecting the overall quality of SLS performance (77% accuracy, 77% sensitivity and 78% specificity). A single IMU worn on the shank can complete the same analysis with 76% accuracy, 75% sensitivity and 76% specificity. Single sensors also produce competitive classification scores relative to multi-sensor systems in identifying specific deviations from acceptable SLS technique.

Conclusions: A single IMU positioned on the shank can differentiate between acceptable and aberrant SLS technique with moderate levels of accuracy. It can also capably identify specific deviations from optimal SLS performance. IMUs may offer a low cost solution for the objective evaluation of SLS performance. Additionally, the classifiers described may provide useful input to an exercise biofeedback application.

6.1. Introduction

The single leg squat (SLS) is a commonly used rehabilitative exercise following lower limb musculoskeletal injury (1). Additionally, it is frequently utilized as an evaluative exercise to assess athletes’ risk of incurring lower limb musculoskeletal injury (2). From a clinical perspective, the SLS allows clinicians/practitioners to simultaneously assess trunk, pelvis, hip, and knee kinematics during a weight-bearing activity (3). Therefore, it is necessary that patient/athlete performance of the SLS can be evaluated effectively and reliably. To date, objective quantification of patient/athlete performance of the SLS has been determined using marker-based motion analysis systems (1). This approach is time intensive and expensive (over €100,000 for a complete system), and the application of skin-mounted markers may hinder normal movement (4). As such, beyond the research laboratory, these systems are not frequently used for the objective quantification of patient/athlete SLS technique. As an alternative, real-time visual evaluation of patient/athlete performance of the SLS is more commonly used. In this instance, kinematics of the trunk, pelvis, hip and knee are simultaneously evaluated to provide an overall assessment of the patient’s/athlete’s performance of the exercise (3). It is difficult to standardise SLS performance evaluation due to the experience of the rater (3), the method used to rate performance of the exercise (ordinal vs. dichotomous scales) or the instructions given to the raters (5). Inaccurate evaluation of patient/athlete performance of the SLS could have implications for clinical and exercise progression decisions. Recent technological advances have allowed for the possibility of using IMUs as part of a method for capturing human movement during the performance of exercises such as the SLS. IMUs are able to acquire data pertaining to the linear and angular motion of individual limb segments as well as the centre of mass of the body.
They are small, inexpensive, easy to set up and facilitate the acquisition of human movement data in unconstrained environments (6). Thus, they offer the potential to bridge the gap between laboratory-based and day-to-day “real-world” acquisition of human movement. Body worn systems incorporating multiple IMUs have been shown to be effective at differentiating exercises and evaluating exercise performance. Chang et al. (7) incorporated accelerometers into a workout glove and belt clip with the aim of differentiating between, and counting the number of repetitions of, nine different upper and lower limb exercises. Their system achieved 95% exercise classification and repetition counting accuracy. A case study completed by Fitzgerald et al. (8) used 10 IMUs to provide feedback to a healthy non-injured athlete and an athlete five weeks post-knee injury during the performance of a lunge exercise. Analysis of the gyroscope signals from the IMUs identified lower limb movement deviations in the injured athlete when compared to the non-injured athlete. Seeger et al. (9) used three IMUs to differentiate between a total of 16 cardiovascular and weight-lifting exercises. Classification accuracy ranged from 71-100% for the different exercises. However, multiple sensor systems are expensive for end-users. They may also prove impractical due to the increased risk of placement error and comfort issues. Furthermore, a multiple sensor set-up would put a bigger strain on the power usage and Bluetooth™ capabilities of the sensors and hosting smartphone. Consequently, the transferability of a multiple sensor set-up to day-to-day “real-world” situations is not practical (10). For increased end-user cost-effectiveness and practicality, a single sensor set-up is far more desirable. Giggins et al. (11) demonstrated the ability of a single IMU to successfully differentiate seven commonly prescribed orthopaedic rehabilitation exercises (heel slide, hip abduction, hip extension, hip flexion, inner range quads, knee extension, straight leg raise). Accuracies of 93-95% were observed. Muehlbauer et al. (12) reported that a single sensor placed on the upper arm could distinguish between 10 different upper body exercises with an overall recognition rate of 94%. Pernek et al. (13) used a single sensor in a smartphone to count repetitions of resistance exercises, observing an overall repetition count accuracy of 99%. Evaluation of exercise performance is also vital to ensure that not only is the exercise completed but that it is completed with acceptable technique. Taylor et al. (14) used five IMUs to identify five technique deviations in the standing hamstring curl and four deviations in the straight-leg raise. They were able to classify the different deviations with 80% accuracy, 75% sensitivity and 90% specificity. Velloso et al. (15) used four IMUs to classify deviations from normal form during a unilateral dumbbell bicep curl.
They achieved an overall accuracy of 74-86% in identifying specific deviations. Giggins et al. (16) demonstrated an overall accuracy of 79-81% using different combinations of one, two or three sensors placed on the lower limb to analyse exercise technique in seven exercises (heel slide, hip abduction, hip flexion, hip extension, knee extension, inner range quads and straight-leg raise). These studies demonstrate that it is possible to evaluate exercise performance of simple exercises using multiple IMUs. However, the ability of an IMU based system to evaluate more complex exercises such as the SLS is less understood.

6.2. Objectives

The research question this study seeks to address is: “How well can an IMU-based system quantify performance of the SLS?” The aims of this study were to determine if, in combination or in isolation, IMUs positioned on the lumbar spine, thigh and shank are capable of: (A) distinguishing between acceptable and aberrant SLS technique; (B) identifying specific deviations from acceptable SLS performance.

6.3. Methods

Data were acquired from participants as they completed 10 SLS repetitions on their left leg with their best possible form. All repetitions were recorded using a high-definition video camera. Each participant’s performance of each SLS repetition was rated by a chartered physiotherapist using a scale developed by Whatman et al. (Table 6.1) (3). Data derived from the IMUs during each repetition were compared to this rating to determine if a single IMU on the lumbar spine could discriminate between different levels of SLS performance and identify the specific deviations from acceptable SLS form.

6.3.0. Participants

Eighty-three healthy volunteers participated. No participant had a current or recent musculoskeletal injury that would impair their SLS performance. All participants had prior experience with the exercise. Each participant signed a consent form prior to completing the study. The University Human Research Ethics Committee approved the study protocol.

6.3.1. Experimental Protocol

The testing protocol was explained to participants upon their arrival to the research laboratory. All participants completed a 10-minute warm-up on an exercise bike, during which they were required to maintain a power output of 100 W and a cadence of 75-85 revolutions per minute. Following the warm-up, an investigator (the same investigator for all participants) secured three IMUs (SHIMMER, Shimmer research, Dublin, Ireland) on the participant at the following anatomical locations: the level of the 5th lumbar vertebra, the mid-point of the left femur (determined as half way between the greater trochanter and lateral femoral condyle), and on the left shank 2 cm above the lateral malleolus (Figure 6.1). The orientation and location of each IMU were consistent across all study participants.

Figure 6.1: Image showing IMU positions and SLS exercise (1 = left shank; 2 = left thigh; 3 = lumbar spine).

A pilot study was used to determine an appropriate sampling rate and the ranges for the accelerometer and gyroscope within the IMU. In the pilot study, data were collected at 512 samples/s during performance of the SLS. A Fourier transform was then used to detect the characteristic frequencies of the signal, which were all found to be less than 20 Hz. Therefore, a sampling rate of 51.2 Hz was deemed appropriate for this study based upon the Shannon sampling theorem and the Nyquist criterion (17). The Shimmer IMU was configured to stream tri-axial accelerometer (+/- 2 g), gyroscope (+/- 500 °/s) and magnetometer (+/- 1.9 Ga) data, with the sensor ranges also chosen based upon data from the pilot study. The IMU was calibrated for these specific sensor ranges using the Shimmer 9DoF Calibration application (18). Participants completed 10 repetitions of a left leg SLS with their best form. A chartered physiotherapist demonstrated and instructed all participants on how to complete the SLS with acceptable technique. This involved maintaining their trunk and pelvis in a neutral position, keeping their patella in line with the second toe, preventing their foot from moving into excessive pronation and keeping the movement throughout the available range of motion as smooth as possible. Their right leg was kept as extended as possible in front of them while the left knee was flexed between 60 and 90 degrees. All participants were allowed trial repetitions to ensure they were comfortable with the exercise before commencing their set of 10 repetitions.
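The sampling-rate reasoning in the pilot study can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function and constant names are hypothetical, and the numeric values are the ones reported in the text.

```python
# Illustrative check of the Nyquist reasoning; names are hypothetical.

def nyquist_frequency(sampling_rate_hz: float) -> float:
    """Highest frequency that can be represented without aliasing."""
    return sampling_rate_hz / 2.0


def rate_is_sufficient(sampling_rate_hz: float, max_signal_hz: float) -> bool:
    """True when the Nyquist frequency exceeds the highest signal frequency."""
    return nyquist_frequency(sampling_rate_hz) > max_signal_hz


PILOT_RATE_HZ = 512.0   # pilot-study sampling rate (from the text)
STUDY_RATE_HZ = 51.2    # sampling rate chosen for the study (from the text)
MAX_SIGNAL_HZ = 20.0    # upper bound on the characteristic frequencies (from the text)

if __name__ == "__main__":
    print("Nyquist frequency at study rate:", nyquist_frequency(STUDY_RATE_HZ), "Hz")
    print("Study rate sufficient:", rate_is_sufficient(STUDY_RATE_HZ, MAX_SIGNAL_HZ))
```

At 51.2 Hz the Nyquist frequency is 25.6 Hz, which sits above the 20 Hz bound observed in the pilot data.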

6.3.2. Data Labelling

Participants’ performance of the SLS was recorded using a high-definition video camera. A chartered physiotherapist with more than six years’ post-graduation experience and an MSc in Sports and Exercise Medicine reviewed all recorded SLS repetitions. Each repetition was separated and reviewed on multiple occasions in a systematic format. For every repetition a score of 0 or 1 was given to each section as outlined in the scoring system shown in Table 6.1. This was adapted from the scoring system described by Whatman et al. (3). To establish the overall score of each repetition a ‘1’ (movement dysfunction) was given to repetitions that scored a ‘1’ in two or more of the six categories. All other repetitions were rated as ‘0’ (acceptable movement pattern). The chartered physiotherapist involved in the study developed the method of assigning an overall score following consultation with colleagues who work in musculoskeletal physiotherapy and sports medicine.

Table 6.1: SLS data labelling system, adapted from Whatman et al.

Visual rating sheet
Trunk             Moves out of neutral in frontal or transverse plane                                N:0  Y:1
Pelvis            Moves out of neutral in frontal or transverse plane or moves away from midline     N:0  Y:1
Knee              Patella moves out of line with 2nd toe                                             N:0  Y:1
Foot              Moves in to excessive pronation                                                    N:0  Y:1
Oscillation       Observable oscillation                                                             N:0  Y:1
Loss of Balance   Visible loss of balance                                                            N:0  Y:1
Overall Score     Movement dysfunction                                                               N:0  Y:1
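The overall-score rule described in Section 6.3.2 (a repetition is labelled as movement dysfunction when two or more of the six categories score 1) can be expressed as a short function. This is a minimal sketch rather than the code used in the study; the category keys and the function name are hypothetical.

```python
# Hypothetical category keys mirroring the six rows of Table 6.1.
CATEGORIES = ("trunk", "pelvis", "knee", "foot", "oscillation", "loss_of_balance")


def overall_score(ratings: dict) -> int:
    """Return 1 (movement dysfunction) if two or more categories score 1, else 0."""
    missing = [c for c in CATEGORIES if c not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    if any(ratings[c] not in (0, 1) for c in CATEGORIES):
        raise ValueError("each category must be rated 0 or 1")
    deviations = sum(ratings[c] for c in CATEGORIES)
    return 1 if deviations >= 2 else 0


if __name__ == "__main__":
    repetition = {"trunk": 1, "pelvis": 0, "knee": 1,
                  "foot": 0, "oscillation": 0, "loss_of_balance": 0}
    print(overall_score(repetition))
```

A repetition with a single flagged category is still rated acceptable under this rule, which is why the per-deviation labels and the overall label are classified separately later in the chapter.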

6.3.3. Data Analysis

Nine signals were collected from each IMU: accelerometer x, y, z; gyroscope x, y, z; and magnetometer x, y, z. To ensure the data analysed applied to each participant’s movement and to eliminate unwanted high-frequency noise, the nine signals were low pass filtered at fc = 20 Hz using a Butterworth filter of order n = 8. Nine additional signals were then calculated. The 3-D orientation of the IMU was computed using the gradient descent algorithm developed by Madgwick et al. (19). The resulting quaternion values (W, X, Y and Z) were then converted to pitch, roll and yaw signals. The pitch, roll and yaw signals describe the inclination, measured in radians, of the lumbar spine, left thigh and left shank in the sagittal, frontal and transverse planes respectively. The magnitude of acceleration was also computed using the vector magnitude of accelerometer x, y and z. The magnitude of acceleration describes the total acceleration of the IMU in any direction. This is the sum of the magnitude of inertial acceleration of the lumbar spine and acceleration due to gravity. Additionally, the magnitude of rotational velocity was computed using the vector magnitude of gyroscope x, y and z. All ten repetitions from each participant’s SLS data set were programmatically extracted using the IMU data and resampled to a length of 250 samples; this was undertaken to minimise the influence of the speed of repetition performance on signal feature calculations. It also ensured the computed features related to differences in movement patterns and not the participant’s exercise tempo. Time-domain and frequency-domain descriptive features were computed to describe the pattern of each of the eighteen signals during each SLS repetition. These features were: signal mean, RMS, standard deviation, kurtosis, median, skewness, range, maximum, minimum, variance, energy, 25th percentile, 75th percentile, level crossing rate, fractal dimension and the variance of both the approximate and detailed wavelet coefficients using the Daubechies 5 mother wavelet to level 6. This resulted in seventeen features for each of the eighteen available signals, producing a total of 306 features per sensor unit. The random-forests method was employed to perform classification (20). A random forest is an ensemble of decision trees that will output a prediction value, in this case SLS quality. Each decision tree is constructed by using a random subset of the training data. Following training of the random forest, one can then pass each test row through it in order to output a predicted class. This technique was chosen as it has been shown to be effective in analysing exercise technique with IMUs when compared to the Naïve-Bayes and Radial-basis function network techniques (21). Four hundred decision trees were used in each random-forest classifier.
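As a rough illustration of the signal preparation described above, the sketch below computes a vector magnitude, resamples a repetition to 250 samples and derives a handful of the listed features. It is not the pipeline used in the study: linear interpolation, the population standard deviation and all function names are assumptions, and the filtering, orientation estimation, wavelet and fractal-dimension steps are omitted.

```python
import math
import statistics


def vector_magnitude(x, y, z):
    """Sample-wise Euclidean norm of three axis signals."""
    return [math.sqrt(a * a + b * b + c * c) for a, b, c in zip(x, y, z)]


def resample_linear(signal, n_out=250):
    """Resample to n_out evenly spaced points (linear interpolation is an assumption)."""
    if len(signal) < 2 or n_out < 2:
        raise ValueError("need at least two input and two output samples")
    step = (len(signal) - 1) / (n_out - 1)
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), len(signal) - 2)  # left neighbour, clamped at the end
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[lo + 1] * frac)
    return out


def basic_features(signal):
    """A subset of the seventeen features named in the text."""
    n = len(signal)
    energy = sum(v * v for v in signal)
    return {
        "mean": statistics.fmean(signal),
        "rms": math.sqrt(energy / n),
        "std": statistics.pstdev(signal),  # population form; an assumption
        "median": statistics.median(signal),
        "range": max(signal) - min(signal),
        "maximum": max(signal),
        "minimum": min(signal),
        "energy": energy,
    }


# Feature-count arithmetic from the text: 17 features x 18 signals per IMU.
FEATURES_PER_SIGNAL = 17
SIGNALS_PER_IMU = 18
FEATURES_PER_IMU = FEATURES_PER_SIGNAL * SIGNALS_PER_IMU  # 306

if __name__ == "__main__":
    # Synthetic accelerometer axes standing in for one repetition.
    ax = [math.sin(0.05 * i) for i in range(120)]
    ay = [0.0] * 120
    az = [1.0] * 120
    magnitude = resample_linear(vector_magnitude(ax, ay, az), 250)
    print(len(magnitude), basic_features(magnitude)["rms"])
```

Resampling every repetition to a fixed length before feature extraction is what removes tempo from the features, since a slow and a fast repetition then contribute the same number of samples.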
Classification quality was compared with and without performing principal component analysis (PCA) on the training data. Using PCA produced lower accuracy, sensitivity and specificity scores and therefore PCA was not included in the final exercise classification system. Six separate random forests were used to analyse if a specific deviation had occurred as described in Table 6.1. A seventh random forest predicted the overall score of each SLS repetition. For each of the above classifiers a variety of training features were used in order to establish classification quality when using three, two and individual IMUs. Classifiers were developed and evaluated using the following seven combinations of variables: the 918 (3 x 306) variables computed from every IMU, the 612 (2 x 306) variables from the left shank and lumbar IMUs, the 612 (2 x 306) variables from the left thigh and lumbar IMUs, the 612 (2 x 306) variables from the left shank and left thigh IMUs, and the 306 variables from each of the three individual IMUs. To establish the quality of each classifier in discriminating between acceptable and aberrant SLS technique or identifying a specific deviation from acceptable SLS performance, repeated random sub-sample validation (RRSSV) was used. This method of classifier evaluation was chosen due to the relatively small data set used to train the classifier. Leave-one-subject-out cross-validation (LOSOCV) was not deemed necessary for this study due to the high inter-repetition variability of SLS performance in each participant's set of the exercise. Data were shuffled programmatically. The first 80% of data were used as the training set for the random forests classifier, initially resulting in 672 repetitions per training set. However, the training data were then balanced to avoid biasing the classifier. This was completed by counting the number of instances of each class (0 and 1) and removing a random selection of repetitions from the class with more instances until there was an equal amount of training data to represent both classes. The remaining 20% of observations were used as the test set for the classifier, resulting in 168 test repetitions per evaluation. Accuracy, sensitivity and specificity metrics were calculated. Accuracy measures the overall effectiveness of a classifier and is computed by taking the ratio of correctly classified examples to the total number of examples available. Sensitivity measures the effectiveness of a classifier at identifying a desired label, while specificity measures the classifier’s ability to detect negative labels. This process was repeated ten times.
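The evaluation loop described above (shuffle, 80/20 split, undersample the majority class in the training set, score on the held-out 20%, repeat ten times) can be sketched as follows. This is a hedged illustration, not the study's code: the function names are hypothetical, the classifier is passed in as generic `fit` and `predict` callables rather than a random forest, and treating label 1 as the positive class for sensitivity is an assumption.

```python
import random


def shuffle_split(samples, train_fraction=0.8, rng=None):
    """Shuffle (features, label) pairs and split them into train and test sets."""
    rng = random.Random() if rng is None else rng
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def balance_binary(train, rng=None):
    """Randomly drop majority-class samples until both classes are equal in size."""
    rng = random.Random() if rng is None else rng
    zeros = [s for s in train if s[1] == 0]
    ones = [s for s in train if s[1] == 1]
    n = min(len(zeros), len(ones))
    balanced = rng.sample(zeros, n) + rng.sample(ones, n)
    rng.shuffle(balanced)
    return balanced


def binary_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity); label 1 is treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


def rrssv(samples, fit, predict, repeats=10, seed=0):
    """Repeated random sub-sample validation; returns the mean of each metric."""
    rng = random.Random(seed)
    results = []
    for _ in range(repeats):
        train, test = shuffle_split(samples, 0.8, rng)
        model = fit(balance_binary(train, rng))
        y_true = [label for _, label in test]
        y_pred = [predict(model, features) for features, _ in test]
        results.append(binary_metrics(y_true, y_pred))
    return tuple(sum(r[i] for r in results) / repeats for i in range(3))


if __name__ == "__main__":
    # Toy data: the single feature equals the label, so a pass-through rule is perfect.
    data = [([i % 2], i % 2) for i in range(100)]
    print(rrssv(data, fit=lambda train: None, predict=lambda model, x: x[0]))
```

Balancing only the training portion, as the text describes, leaves the test set at its natural class proportions, so the reported sensitivity and specificity reflect the class mix that was actually observed.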

6.4. Results

The demographics of the participants were as follows: 60 males, 23 females, age: 24.68 +/- 4.91 years, height: 1.75 +/- 0.094 m, body mass: 76.01 +/- 13.29 kg. Table 6.2 demonstrates the mean sensitivity, specificity and accuracy for the overall score following the ten cycles of RRSSV for systems using each individual IMU and each combination of IMUs. The best single sensor for classifying overall score was the left shank with an accuracy of 76%. The highest quality classification came from the two-sensor combination of the shank and thigh, which achieved 78% accuracy.


Table 6.2: Classification results for ‘Overall SLS Score’ for each IMU combination.

IMU combination    Accuracy    Sensitivity    Specificity
Left Shank         76%         75%            76%
Left Thigh         75%         71%            77%
Lumbar             73%         74%            72%
All 3 IMUs         77%         77%            78%
Lumbar + Shank     75%         78%            72%
Shank + Thigh      78%         78%            78%
Lumbar + Thigh     77%         75%            79%

Table 6.3 demonstrates the classification scores for the detection of each specific deviation as described in Table 6.1. Deviation of the pelvis from the neutral position was the most poorly detected deviation. The three-sensor combination detected this deviation with 70% accuracy and a single sensor located on the lumbar spine detected this deviation with 69% accuracy. In some cases (e.g. the foot moving into excessive pronation), the single sensor system outperformed the multi-sensor set-ups. The IMU positioned on the left shank produced an accuracy of 75% for this deviation, superior to the accuracy of 73% achieved when using all three IMUs. Single sensor set-ups appear comparable to multi-sensor set-ups for the detection of all six deviations.


Table 6.3: Classification results for specific deviations for each IMU combination.

Trunk: Moves out of neutral in frontal or transverse plane
IMU combination    Accuracy    Sensitivity    Specificity
Left Shank         70%         73%            69%
Left Thigh         69%         62%            70%
Lumbar             73%         72%            73%
All 3 IMUs         74%         61%            76%
Lumbar + Shank     76%         75%            76%
Shank + Thigh      69%         67%            69%
Lumbar + Thigh     75%         73%            75%

Pelvis: Moves out of neutral in frontal or transverse plane or moves away from midline
IMU combination    Accuracy    Sensitivity    Specificity
Left Shank         66%         67%            66%
Left Thigh         65%         64%            65%
Lumbar             69%         77%            68%
All 3 IMUs         70%         80%            69%
Lumbar + Shank     70%         73%            69%
Shank + Thigh      66%         63%            67%
Lumbar + Thigh     69%         70%            69%

Foot: Moves in to excessive pronation
IMU combination    Accuracy    Sensitivity    Specificity
Left Shank         75%         63%            77%
Left Thigh         72%         67%            73%
Lumbar             69%         60%            71%
All 3 IMUs         73%         64%            75%
Lumbar + Shank     74%         71%            74%
Shank + Thigh      72%         69%            73%
Lumbar + Thigh     72%         64%            73%

Oscillation: Observable Oscillation
IMU combination    Accuracy    Sensitivity    Specificity
Left Shank         75%         63%            77%
Left Thigh         72%         68%            74%
Lumbar             70%         69%            71%
All 3 IMUs         74%         76%            71%
Lumbar + Shank     76%         77%            65%
Shank + Thigh      75%         75%            72%
Lumbar + Thigh     70%         71%            69%

Knee: Patella moves out of line with 2nd toe
IMU combination    Accuracy    Sensitivity    Specificity
Left Shank         71%         73%            71%
Left Thigh         71%         68%            71%
Lumbar             72%         78%            70%
All 3 IMUs         75%         76%            75%
Lumbar + Shank     71%         62%            73%
Shank + Thigh      74%         74%            74%
Lumbar + Thigh     74%         69%            75%

Loss of Balance: Visible loss of balance
IMU combination    Accuracy    Sensitivity    Specificity
Left Shank         73%         74%            61%
Left Thigh         68%         68%            72%
Lumbar             67%         67%            63%
All 3 IMUs         71%         71%            70%
Lumbar + Shank     76%         77%            65%
Shank + Thigh      75%         75%            72%
Lumbar + Thigh     72%         73%            62%


6.5. Discussion
Our results indicate that an IMU sensor-based system is capable of evaluating SLS performance with moderate accuracy, sensitivity and specificity. A three-sensor set-up can distinguish between acceptable and aberrant SLS technique with 77% accuracy, 77% sensitivity and 78% specificity. Two sensors (shank and thigh) discriminate between acceptable and aberrant performance with 78% accuracy, sensitivity and specificity. A single sensor (left shank) can identify acceptable SLS performance with 76% accuracy, 75% sensitivity and 76% specificity. Specific deviations can also be classified with a moderate level of accuracy, sensitivity and specificity, as shown in Table 6.3. Overall accuracy for specific deviations ranged from 65%-76%, sensitivity ranged from 60%-80% and specificity from 61%-77%. These results indicate that an IMU sensor set offers the possibility of monitoring SLS exercise form objectively outside of a laboratory setting. Importantly, a single sensor set-up has comparable accuracy, sensitivity and specificity to a multi-sensor set-up if positioned appropriately. A single sensor set-up is a less cumbersome, more energy efficient and more cost-effective solution for end users, which may increase the likelihood of adoption, particularly within a clinical setting. Authors have previously utilized multiple (7-9) and single (11-13) IMU set-ups to differentiate exercise performance and to count the number of exercise repetitions. However, patients are likely to move beyond the exercises evaluated in these aforementioned studies relatively early in their rehabilitation programmes. The increased complexity of the SLS means it is predominantly used in later stages of rehabilitation. Furthermore, these studies focused on the recognition of specific exercises and counting of exercise repetitions and not on the quality of movement during exercise performance.
A number of researchers have used multiple sensor systems to classify exercise performance with varying results. Taylor et al. (14) used five IMUs to identify five technique deviations in the standing hamstring curl and four deviations in the straight-leg raise. It is worth noting that only 7% of the exercise repetitions were classified by a biomechanics expert, which may have resulted in an increased possibility of incorrect data labelling. Velloso et al. (15) used a total of four sensors to classify four deviations from normal form during a unilateral dumb-bell bicep curl. They demonstrated an overall accuracy of 74-86% in identifying specific deviations. However, these deviations were induced and were possibly not representative of movement deviations that occur in a natural exercise environment. The scores presented in this chapter are comparable with Taylor et al. (14) and Velloso et al. (15), while having the added benefit of being reproducible with a single sensor set-up. Furthermore, the deviations observed in our data set occurred without simulation (i.e. they were not prescriptively induced), thus providing a more realistic representation of SLS exercise performance. Giggins et al. (16) investigated the potential of a single sensor system to analyse natural deviations from acceptable form in a total of seven exercises (heel slide, hip abduction, hip flexion, hip extension, knee extension, inner range quads and straight-leg raise). They demonstrated an overall accuracy of 79-81% using different combinations of one, two or three sensors placed on specific lower limb body sites. While the accuracy results of our research are slightly lower (73-78%), it should be noted that the SLS is a more complex exercise than those evaluated by Giggins et al. (16). Furthermore, our classification system was able to detect deviations occurring at multiple body locations simultaneously, unlike the majority of previous research in the area.

There are several contextual factors that are appropriate to consider for discussion purposes. A 3-dimensional motion capture system was not used to confirm that each deviation occurred. Instead, a chartered physiotherapist recorded the presence of any deviations noted during the performance of each SLS repetition. The use of video analysis allowed for multiple viewings of each SLS repetition. The ability to view the movement on multiple occasions and slow down the playback speed allowed for a detailed analysis of each repetition. A single chartered physiotherapist performed this analysis. Future work should involve multiple biomechanical experts rating SLS performance to increase the reliability of the rating labels. However, this approach could prove challenging as it may not be possible to obtain an agreed consensus between different experts as to what constitutes acceptable movement biomechanics.
The overall accuracy, sensitivity and specificity scores presented in this work are slightly lower than those of other authors (14-16). This may be due to the small number of acceptable SLS performances seen in the data set (52 acceptable SLS vs 778 SLS with aberrant technique). It is hoped that future work will involve the collection of a greater number of acceptable SLSs. It is envisaged that this future data collection will be combined with improved classification techniques to make it possible not only to identify where the deviation has occurred, but also to grade the severity of the deviation as described in the scale developed by Whatman et al. (3). Along with the addition of a greater number of expert reviewers and acceptable SLS performances, future work will involve analysing a range of different movements, including squats, lunges, deadlifts and tuck jumps. It is hoped that the range of movements commonly used in rehabilitation and screening can be graded using data derived from an IMU based system. This approach could allow for the development of a system that can be used for musculoskeletal injury risk screening and exercise analysis.

6.6. Practical Implications
A single sensor system that can automatically evaluate SLS technique could be very beneficial to clinicians. The SLS is a commonly used exercise to assess lower limb function (22). The assessment of human movement proficiency is predominantly completed subjectively through the use of visual rating scales such as the Functional Movement Screen (23, 24), Tuck Jump Assessment (25) or lower extremity functional screening tests (3). The subjectivity inherent in rating these screening tools leads to the potential for bias and/or measurement error. Furthermore, the process of screening can prove time consuming for clinicians, particularly when there are a large number of participants, e.g. in a sports team setting. An IMU based system can offer clinicians the potential to screen multiple athletes simultaneously in an objective manner. This could lead to a quicker and more reliable method of screening than is currently available. An IMU system also offers clinicians the potential to evaluate their patients’ treatment more effectively by remotely monitoring their patients’ compliance and technique when completing rehabilitation exercises at home. For example, exercise technique feedback could be provided to patients automatically. This means patients could correct their form during the exercise without the need for a clinician to be present (26). This would increase the potential of home-based care, which may be effective at reducing health care costs (27). The ability to remotely monitor the SLS and provide locally generated feedback would also prove very beneficial to S&C coaches, as the SLS is often a component of their conditioning programmes.

6.7. Conclusions
An IMU based system is capable of differentiating between acceptable and aberrant SLS technique with moderate accuracy. The overall accuracy presented in this work is comparable to other research investigating early-stage rehabilitation exercise technique with IMUs. This study has shown that it is possible to classify a more complex exercise with IMUs while maintaining moderate levels of accuracy. Furthermore, it is shown that a single IMU can produce comparable results to a multi-sensor set-up. This suggests that the system can be cost-effective and practical to implement in a clinical setting. Future work should aim to develop a low cost biomechanical analysis system that is capable of measuring technique in a range of exercises. Such a system would offer clinicians the ability to screen for injury risks quickly and objectively while also allowing for the remote monitoring of their patients’ rehabilitation.

6.8. References
1. Zwerver J, Bredeweg SW, Hof AL. Biomechanical analysis of the single-leg decline squat. British journal of sports medicine. 2007 Apr 1;41(4):264-8.
2. Willson JD, Ireland ML, Davis I. Core strength and lower extremity alignment during single leg squats. Medicine & Science in Sports & Exercise. 2006 May 1;38(5):945-52.
3. Whatman C, Hing W, Hume P. Physiotherapist agreement when visually rating movement quality during lower extremity functional screening tests. Physical Therapy in sport. 2012 May 31;13(2):87-96.
4. Ahmadi A, Mitchell E, Destelle F, Gowing M, O’Connor NE, Richter C, et al. Automatic activity classification and movement assessment during a sports training session using wearable inertial sensors. In: Proceedings of the 11th International Conference on Wearable and Implantable Body Sensor Networks (BSN); 2014. Jun 16; Zurich, Switzerland. N.Y., U.S.A.: IEEE; 2014. p. 98-103.
5. Chmielewski TL, Hodges MJ, Horodyski M, Bishop MD, Conrad BP, Tillman SM. Investigation of clinician agreement in evaluating movement quality during unilateral lower extremity functional tasks: a comparison of 2 rating methods. Journal of orthopaedic & sports physical therapy. 2007 Mar;37(3):122-9.
6. McGrath D, Greene BR, O’Donovan KJ, Caulfield B. Gyroscope-based assessment of temporal gait parameters during treadmill walking and running. Sports Engineering. 2012 Dec 1;15(4):207-13.
7. Chang KH, Chen MY, Canny J. Tracking free-weight exercises. In: Proceedings of the 9th International Conference on Ubiquitous Computing (UbiComp); 2007. Sep 16; Innsbruck, Austria. Berlin, Germany: Springer; 2007. p. 19-37.
8. Fitzgerald D, Foody J, Kelly D, Ward T, Markham C, McDonald J, Caulfield B. Development of a wearable motion capture suit and virtual reality biofeedback system for the instruction and analysis of sports rehabilitation exercises. In: Proceedings of the 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society; 2007. Aug 22; Lyon, France. N.Y., U.S.A.: IEEE; 2007. p. 4870-4874.
9. Seeger C, Buchmann A, Van Laerhoven K. myHealthAssistant: a phone-based body sensor network that captures the wearer's exercises throughout the day. In: Proceedings of the 6th International Conference on Body Area Networks; 2011. Nov 7; Beijing, China. Brussels, Belgium: ICST; 2011. p. 1-7.
10. Morris D, Saponas TS, Guillory A, Kelner I. RecoFit: using a wearable sensor to find, recognize, and count repetitive exercises. In: Proceedings of the 32nd annual ACM conference on Human factors in computing systems; 2014 Apr 26; Toronto, Canada. N.Y., U.S.A.: ACM; 2014. p. 3225-3234.
11. Giggins O, Sweeney KT, Caulfield B. The use of inertial sensors for the classification of rehabilitation exercises. In: Proceedings of the 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); 2014. Aug 26; Chicago, I.L., U.S.A. N.Y., U.S.A.: IEEE; 2014. p. 2965-8.
12. Muehlbauer M, Bahle G, Lukowicz P. What can an arm holster worn smart phone do for activity recognition? In: Proceedings of the 15th Annual International Symposium on Wearable Computers (ISWC); 2011. Jun 12; San Francisco, C.A., U.S.A. N.Y., U.S.A.: IEEE; 2011. p. 79-82.
13. Pernek I, Hummel KA, Kokol P. Exercise repetition detection for resistance training based on smartphones. Pers Ubiquitous Comput. 2013;17(4):771-82.
14. Taylor PE, Almeida GJ, Hodgins JK, Kanade T. Multi-label classification for the analysis of human motion quality. In: Proceedings of the 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); 2012. Aug 28; San Diego, C.A., U.S.A. N.Y., U.S.A.: IEEE; 2014. p. 2214-2218.
15. Velloso E, Bulling A, Gellersen H, Ugulino W, Fuks H. Qualitative activity recognition of weight lifting exercises. In: Proceedings of the 4th Augmented Human International Conference; 2013. Mar 7; Stuttgart, Germany. N.Y., U.S.A.: ACM; 2014. p. 116-123.
16. Giggins OM, Sweeney KT, Caulfield B. Rehabilitation exercise assessment using inertial sensors: a cross-sectional analytical study. Journal of neuroengineering and rehabilitation. 2014 Nov 27;11(1):158.
17. Jerri AJ. The Shannon sampling theorem—Its various extensions and applications: A tutorial review. Proceedings of the IEEE. 1977;65(11):1565-96.
18. Shimmer 9DOF calibration. Available from: http://www.shimmersensing.com/shop/shimmer-9dof-calibration (last accessed: 26 June 2017).
19. Madgwick SO, Harrison AJ, Vaidyanathan R. Estimation of IMU and MARG orientation using a gradient descent algorithm. In: Proceedings of the 12th International Conference on Rehabilitation Robotics (ICORR); 2011. Jun 29; Zurich, Switzerland. N.Y., U.S.A.: IEEE; 2011. p. 1-7.
20. Breiman L. Random forests. Machine Learning. 2001;45(1):5-32.
21. Mitchell E, Ahmadi A, O'Connor NE, Richter C, Farrell E, Kavanagh J, et al. Automatically detecting asymmetric running using time and frequency domain features. In: Proceedings of the 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN); 2015. Jun 9; Boston, M.A., U.S.A. N.Y., U.S.A.: IEEE; 2015. p. 1-6.
22. Ugalde V, Brockman C, Bailowitz Z, Pollard CD. Single leg squat test and its relationship to dynamic knee valgus and injury risk screening. PM&R. 2015 Mar 31;7(3):229-35.
23. Cook G, Burton L, Hoogenboom B. Pre-participation screening: the use of fundamental movements as an assessment of function–part 1. North American journal of sports physical therapy: NAJSPT. 2006 May;1(2):62.
24. Cook G, Burton L, Hoogenboom B. Pre-participation screening: The use of fundamental movements as an assessment of function–Part 2. North American journal of sports physical therapy: NAJSPT. 2006 Aug;1(3):132.
25. Myer GD, Ford KR, Hewett TE. Tuck jump assessment for reducing anterior cruciate ligament injury risk. Athletic Therapy Today. 2008 Sep;13(5):39-44.
26. Giggins O, Kelly D, Caulfield B, editors. Evaluating rehabilitation exercise performance using a single inertial measurement unit. In: Proceedings of the 7th International Conference on Pervasive Computing Technologies for Healthcare; 2013. May 5; Venice, Italy. N.Y., U.S.A.: IEEE; 2013. p. 49-56.
27. Avci A, Bosch S, Marin-Perianu M, Marin-Perianu R, Havinga P. Activity recognition using inertial sensing for healthcare, wellbeing and sports applications: A survey. In: Proceedings of the 23rd international conference on Architecture of computing systems (ARCS); 2010. Feb 22; Hannover, Germany. Berlin, Germany: VDE; 2010. p. 1-10.


Chapter 7

Classification of Deadlift Biomechanics with Multiple and Individual Inertial Measurement Units

This chapter is based on the following paper which is published in Journal of biomechanics: O'Reilly MA, Whelan DF, Ward TE, Delahunt E, Caulfield BM. Classification of deadlift biomechanics with wearable inertial measurement units. Journal of biomechanics. 2017. May 16. [Epub Ahead of print].


7.0. Abstract
The deadlift is a compound full-body exercise that is fundamental in resistance training, rehabilitation programs and powerlifting competitions. Accurate quantification of deadlift biomechanics is important to reduce the risk of injury and to ensure training and rehabilitation goals are achieved. This study sought to develop and evaluate deadlift exercise technique classification systems utilising IMUs, recording at 51.2 Hz, worn on the lumbar spine, both thighs and both shanks. It also sought to compare classification quality when these IMUs are worn in combination and in isolation. Two data sets of IMU deadlift data were collected. Eighty participants first completed deadlifts with acceptable technique and 5 distinct, deliberately induced deviations from acceptable form. Fifty-five members of this group also completed a fatiguing protocol (3-Repetition Maximum test) to enable the collection of natural deadlift deviations. For both data sets, universal and personalised random-forests classifiers were developed and evaluated. Personalised classifiers outperformed universal classifiers in accuracy, sensitivity and specificity in the binary classification of acceptable or aberrant technique and in the multi-label classification of specific deadlift deviations. Whilst recent research has favoured universal classifiers due to the reduced overhead in setting them up for new system users, this work demonstrates that such techniques may not be appropriate for classifying deadlift technique due to the poor accuracy achieved. However, personalised classifiers perform very well in assessing deadlift technique, even when using data derived from a single lumbar-worn IMU to detect specific naturally occurring technique mistakes.


7.1. Introduction
The deadlift is a compound full-body exercise that is fundamental in resistance training, rehabilitation and powerlifting (1, 2). It is a complex movement that requires training to ensure correct form (1). Aberrant deadlift biomechanics have been shown to increase load shear forces in the lower back (3), potentiating the risk of injury. Thus, reliable assessment of deadlift biomechanics is necessary to mitigate injury risk. The assessment of deadlift biomechanics is typically undertaken using 3-D motion capture or subjective visual analysis, both of which have limitations. Using 3-D motion capture systems is expensive and data processing can be time intensive (4). Subjective visual assessment can prove unreliable as visually assessing numerous constituent components simultaneously is challenging (5). Wearable IMUs could bridge the gap between laboratory and clinical acquisition and assessment of human biomechanics as they allow for an inexpensive method of acquiring objective human movement data in unconstrained environments (6). In this chapter, the term IMU system will describe IMU sensors, sensor signals, associated signal processing and exercise classification algorithm output. A growing body of literature has investigated how these systems can be used for exercise biomechanics evaluation and feedback (7-16). These studies have demonstrated that IMU systems can monitor exercise biomechanics with moderate to excellent accuracy. Of these, only Gleadhill et al. (16) analysed the deadlift using an IMU system. The authors compared an IMU system to a traditional 3-D motion capture system in identifying temporal features in deadlift technique variations. They found high agreement between the two systems and stated that the work provided the foundations to use IMU systems for activity recognition and technique analysis.
While a promising first step, they only analysed correlations between the two systems and did not attempt to classify technique deviations, meaning application in a real-world environment may be limited. Furthermore, no information is provided regarding the deadlift technique variations investigated, or whether these variations were induced or natural. The majority of the above research classified exercise technique as acceptable or aberrant using universal classifiers. A universal classifier is built using a large data set collected from multiple participants. This type of classifier will function when presented with new data from individuals not included in the training data. These classifiers are often developed using induced deviations (i.e. deviations intentionally performed by participants). However, natural deviations may be nuanced and subsequently more difficult to classify. Therefore, universal classifiers may not always be suitable for exercise analysis. This may be particularly true in the deadlift, as the intricacies associated with optimal biomechanics can vary greatly between individuals (1). Furthermore, in a natural environment a variety of deviations may present in different quantities, some occurring less frequently than others. This makes it challenging to collect the large and balanced data set of natural deviations that is necessary for the development of a robust universal classification system (17-19). For these reasons, a personalised classifier may be more appropriate for deadlift analysis. A personalised classifier is developed using data provided by a single person. IMU signals are collected from participants and each individual repetition is assessed and labelled by a movement expert through live or post-hoc video analysis. IMU signals for each repetition can then be associated with this repetition’s movement pattern. When the data set used for training the IMU system is collected this way, the system can be individualised. While this may prove more labour intensive than using an IMU system based on a universal classifier, it may be appropriate when analysing complex exercises like the deadlift. The objective of this study was to determine whether an IMU system could identify deviations from acceptable deadlift biomechanics. The aims of this study were to: (a) determine if, in combination or in isolation, IMUs positioned on the lumbar spine, thigh and shank are capable of distinguishing between acceptable and aberrant deadlift biomechanics; (b) determine the capabilities of an IMU system at identifying specific deviations from acceptable deadlift biomechanics; (c) compare a personalised to a universal classifier in identifying the above; (d) compare the above on a large data set of deliberately induced technique deviations and a smaller data set of naturally occurring technique deviations.

7.2. Methods
7.2.0. Experimental Approach to Problem
Two experiments were employed to enable the development of a wearable IMU system for assessing deadlift technique. In the first experiment 80 participants completed deadlifts with acceptable form and deliberately induced technique deviations (Table 7.1). In the second experiment 55 participants performed a 3-repetition maximum strength (3 RM) deadlift protocol to elicit natural deadlift biomechanics breakdown. A chartered physiotherapist labelled video data of each deadlift repetition as acceptable or containing one of the technique deviations (Table 7.1). The physiotherapist has extensive training in S&C and has previous experience evaluating deadlift biomechanics. In both experiments data were acquired from 5 IMUs (SHIMMER, Shimmer Research, Dublin, Ireland) (Figure 3.1). A total of 306 variables were extracted from the sensor signals from each IMU for every deadlift repetition. These variables were used to develop and evaluate an automated classification system. This was undertaken using data derived from each individual IMU and combinations of multiple IMUs. A universal and a personalised classification system were evaluated for every participant.
7.2.1. Participants
Eighty healthy volunteers (57 males, 23 females, age: 24.68 +/- 4.91 years, height: 1.75 +/- 0.094 m, body mass: 76.01 +/- 13.29 kg) were recruited for the first experiment in this study. Fifty-five members of this cohort also participated in the second experiment (37 males, 18 females, age: 24.21 +/- 5.25 years, height: 1.75 +/- 0.1 m, body mass: 75.09 +/- 13.56 kg). All participants had prior experience with the exercise and no musculoskeletal injury that would impair deadlift performance. Each participant signed a consent form prior to study commencement. The University Human Research Ethics Committee approved the study protocol.
7.2.2. Procedures
The testing protocol was explained to participants upon their arrival at the laboratory. Prior to testing, a ten-minute warm-up on an exercise bike (Lode B.V., Groningen, The Netherlands) was completed. Next, a chartered physiotherapist secured the IMUs to the following predetermined anatomical locations on the participant using neoprene straps: over clothing at the spinous process of the 5th lumbar vertebra, the mid-point of both the right and left thighs (determined as half way between the greater trochanter and lateral femoral condyle), and on both shanks 2 cm above the lateral malleolus (Figure 3.1). The orientation and location of the IMUs were consistent across participants and local frame x, y and z axes were used for each IMU (Figure 3.1). The straps used were specifically designed for exercise environments and minimised unwanted IMU position deviation due to clothing and movement artefact. The IMU settings chosen (sampling frequency: 51.2 Hz, tri-axial accelerometer (± 2 g), gyroscope (± 500 °/s) and magnetometer (± 1.9 Ga)) replicate those used in previous research and were based on pilot data analysis as described in Whelan, O'Reilly (8).
Each IMU was calibrated for these specific sensor ranges, using the Shimmer 3 default local coordinate system, with the Shimmer 9DoF Calibration application (http://www.shimmersensing.com/shop/shimmer-9dof-calibration). In experiment 1 the participants completed 10 deadlift repetitions with acceptable form and 3 repetitions of each deviation (Table 7.1). In order to ensure standardisation, form was considered acceptable if it was completed as defined by the National Strength and Conditioning Association (NSCA) (20). In experiment 2, participants completed a 3 RM test. This involves increasing load incrementally until an individual cannot maintain acceptable form and is described in detail by Horvat, Franklin (21).

7.2.3. Data Labelling
Each deadlift repetition was separated and viewed on multiple occasions in a systematic format by the chartered physiotherapist. Repetitions were labelled as acceptable or the most dominant deviation from acceptable form was chosen.
7.2.4. Signal Processing
Signal processing and classification analyses were completed using MATLAB (2012, The MathWorks, Natick, USA). Spectral analysis was completed on the IMU data. It was found that all data pertaining to movement was in the 0-20 Hz frequency band. Therefore the accelerometer x, y, z, gyroscope x, y, z and magnetometer x, y, z signals were first low pass filtered at fc = 20 Hz using a Butterworth filter of order n = 8. Nine additional signals were then calculated as follows. IMU 3-D orientation was computed using the gradient descent algorithm developed by Madgwick, Harrison (22). The resulting W, X, Y and Z quaternion values are a mathematical representation of an object’s 3-D orientation in space and are not subject to gimbal lock (23). The rotation quaternions were also converted to pitch, roll and yaw signals. The pitch, roll and yaw signals describe the inclination, measured in radians, of each IMU in the sagittal, frontal and transverse plane respectively. The magnitude of acceleration and rotational velocity were also computed using the vector magnitude of accelerometer x, y, z and gyroscope x, y, z respectively. Following this, each exercise repetition was programmatically extracted from the IMU data and resampled to a length of 250 samples. This was undertaken to time-normalise the data and minimise the influence of repetition tempo on signal feature calculations.
7.2.5. Classification
Time-domain and frequency-domain descriptive features were computed to characterise each exercise repetition. The 17 features computed for each signal were ‘mean’, ‘RMS’, ‘standard deviation’, ‘kurtosis’, ‘median’, ‘skewness’, ‘range’, ‘variance’, ‘maximum’, ‘minimum’, ‘energy’, ‘25th percentile’, ‘75th percentile’, ‘fractal dimension’, ‘level crossing-rate’ and the variance of both the approximate and detailed wavelet coefficients using the Daubechies 5 mother wavelet to level 6 (Figure 7.1). These replicate those used in recent similar work (8, 15). These features, when used in combination, describe the shape of the various signals from each IMU. When a person’s motion is altered due to aberrant deadlift technique, the IMU signals will also change. The features used capture the diverse range of signal changes that can occur due to aberrant deadlift biomechanics. All computed features form a feature-vector for each repetition that is used along with the repetition’s label to train classification algorithms.
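Part of the per-repetition processing described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the MATLAB implementation used in the study: the low-pass filter, orientation estimation and wavelet features are omitted, linear interpolation is assumed for the resampling step, only a subset of the 17 features is shown, and the definitions chosen for ‘standard deviation’ (sample form) and ‘energy’ (sum of squares) are assumptions. All function names are hypothetical.

```python
import math
import statistics

N_OUT = 250  # samples per repetition after time-normalisation

# 9 recorded signals + 9 derived signals, 17 features per signal.
N_SIGNALS = 9 + 9
N_FEATURES_PER_SIGNAL = 17
N_VARIABLES_PER_IMU = N_SIGNALS * N_FEATURES_PER_SIGNAL  # 306


def vector_magnitude(x, y, z):
    """Sample-by-sample magnitude of a tri-axial signal."""
    return [math.sqrt(a * a + b * b + c * c) for a, b, c in zip(x, y, z)]


def resample_linear(signal, n_out=N_OUT):
    """Time-normalise one repetition to n_out samples (linear interpolation assumed)."""
    n_in = len(signal)
    if n_in < 2:
        raise ValueError("need at least two samples")
    out = []
    for i in range(n_out):
        pos = i * (n_in - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append(signal[lo] * (1.0 - frac) + signal[hi] * frac)
    return out


def basic_features(signal):
    """A subset of the time-domain features listed in the text."""
    squares = [v * v for v in signal]
    return {
        "mean": statistics.fmean(signal),
        "RMS": math.sqrt(statistics.fmean(squares)),
        "standard deviation": statistics.stdev(signal),
        "median": statistics.median(signal),
        "range": max(signal) - min(signal),
        "maximum": max(signal),
        "minimum": min(signal),
        "energy": sum(squares),
    }


# Hypothetical 100-sample repetition from a tri-axial gyroscope.
gyro_x = [math.sin(2 * math.pi * i / 99) for i in range(100)]
gyro_y = [math.cos(2 * math.pi * i / 99) for i in range(100)]
gyro_z = [0.0] * 100
rotation_magnitude = resample_linear(vector_magnitude(gyro_x, gyro_y, gyro_z))
features = basic_features(rotation_magnitude)
```

Resampling every repetition to the same length means that feature values reflect the shape of the signal rather than how quickly the repetition was performed.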

Figure 7.1: Diagram linking number of IMUs, number of recorded and derived signals, number of features extracted and the variety of feature combinations used to test classifiers.

The random-forests method was employed to perform classification (24). During analysis, several types of classifiers were tested, including K-Nearest Neighbours, Support Vector Machines and Naïve Bayes classifiers. However, none were shown to provide improved results on the data sets and some increased computational time. A total of 128 trees were used for each random forest. This number was chosen after observing the accuracy rate for an incrementing number of trees from 1-500. While increasing the number of trees generally improves or stabilises classification accuracy, the gain was considered negligible when using more than 128 trees. Additional trees also reduce end user application efficiency.

Initially, binary classification was evaluated using data from experiment 1 to establish how effectively each individual IMU and combination of IMUs could distinguish between acceptable and aberrant deadlift technique in a large, balanced data set of deliberately induced technique deviations. Multi-label classification was then evaluated on this data set to investigate how effectively each individual IMU and each IMU combination could be used to discriminate between acceptable deadlift technique and each of the deliberate deviations from acceptable technique (Table 7.1). Equivalent binary and multi-label classifiers were then applied to the data set from experiment 2.

For each classification task, universal classifiers were evaluated using leave-one-subject-out cross-validation (LOSOCV) (25). Where each class in the training data did not have an equal number of instances (i.e. equal number of acceptable and aberrant repetitions in binary classification), random instances of the overrepresented class(es) were removed in order to balance the training data.

The quality of the personalised exercise classification systems was established using leave-one-out cross-validation (LOOCV) (25). Each deadlift repetition corresponds to one fold of the cross-validation. At each fold, one repetition is held out as test data while the random forests classifier is trained with the same participant’s other completed repetitions. Where each class in the training data did not have an equal number of instances (i.e. equal number of acceptable and aberrant repetitions in binary classification), random instances of the overrepresented class(es) were removed to balance the training data. The held-out data is used to assess the classifier’s ability to correctly categorise new data it is presented with. Participants were not included in this analysis if they did not have at least 2 repetitions belonging to each class being classified, as this would not allow for training and test data for that class. The scores used to measure classification quality were accuracy, sensitivity and specificity, computed according to the formulae below (TP = True Positive; TN = True Negative; FP = False Positive; FN = False Negative).

1. $$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$

2. $$\text{Sensitivity} = \frac{TP}{TP + FN}$$

3. $$\text{Specificity} = \frac{TN}{TN + FP}$$

In reviewing the accuracy, sensitivity and specificity scores produced by each classifier, 90% or higher was considered an ’excellent’ quality result, 80-89% was considered a ’good’ quality result, 60-79% was considered a ’moderate’ result and anything below 60% was deemed a ’poor’ result. This classification accuracy rating system has been used in previously published work (8, 15). For personalised classifiers, each participant’s scores were calculated, and then the mean and standard deviation across all participants were computed.
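As a minimal sketch of Formulae 1-3 and the rating bands described above, the following Python snippet computes the three scores from confusion-matrix counts. The function names and the example counts are illustrative assumptions and are not taken from the study; non-integer scores falling between the stated bands are assigned to the lower band here.

```python
def classification_scores(tp, tn, fp, fn):
    """Return accuracy, sensitivity and specificity as percentages."""
    total = tp + tn + fp + fn
    accuracy = 100.0 * (tp + tn) / total
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    return accuracy, sensitivity, specificity


def quality_rating(score_percent):
    """Map a percentage score to the rating bands used in this chapter."""
    if score_percent >= 90:
        return "excellent"
    if score_percent >= 80:
        return "good"
    if score_percent >= 60:
        return "moderate"
    return "poor"


# Hypothetical confusion-matrix counts, for illustration only.
acc, sens, spec = classification_scores(tp=45, tn=40, fp=10, fn=5)
ratings = [quality_rating(score) for score in (acc, sens, spec)]
```

With these made-up counts, the accuracy is 85%, the sensitivity 90% and the specificity 80%, which correspond to 'good', 'excellent' and 'good' ratings respectively.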

7.3. Results

7.3.1. Data Set

Table 7.1 shows the total number of extracted deadlift repetitions for each class in experiment 1 (induced reps) and experiment 2 (natural reps). The data set from experiment 1 is larger and more balanced than that arising from experiment 2.

Table 7.1: List and description of deadlift exercise deviations used in this study and the number of repetitions (n) extracted for each class when using induced deviations and naturally occurring technique deviations.

| Deviation | Description | Induced reps (n) | Natural reps (n) |
|---|---|---|---|
| ACC | Acceptable deadlift technique | 796 | 854 |
| SBB | Shoulders behind bar at start position | 212 | 0 |
| RB | Rounded back at any point during movement | 211 | 40 |
| HEX | Hyperextended spine at any point during movement | 191 | 85 |
| BT | Bar tilting | 393 | 12 |
| OTH | Other | 0 | 17 |

7.3.2. Experiment 1: Induced Technique Deviations

Binary classification results for the data set collected in experiment 1, where participants completed deadlifts with deliberate technique deviations, are presented in Table 7.2. It shows the classification accuracy, sensitivity and specificity (Formulae 1-3) following cross-validation for both universal and personalised classifiers. Results are also compared for systems developed using data from each individual IMU and a variety of combinations of 5, 3 and 2 IMUs.
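The leave-one-subject-out evaluation of universal classifiers, with random removal of over-represented classes from the training data, can be sketched as follows. This is an illustrative outline only: the data layout (a list of dictionaries with `subject`, `label` and `features` keys), the function names and the toy data are assumptions, and the random forest training and prediction steps are omitted.

```python
import random
from collections import defaultdict


def balance_by_undersampling(reps, rng):
    """Randomly drop repetitions so every class has as many as the smallest class."""
    by_label = defaultdict(list)
    for rep in reps:
        by_label[rep["label"]].append(rep)
    n_min = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n_min))
    return balanced


def leave_one_subject_out(reps, seed=0):
    """Yield (held_out_subject, balanced_training_reps, test_reps) per participant."""
    rng = random.Random(seed)
    subjects = sorted({rep["subject"] for rep in reps})
    for held_out in subjects:
        train = [r for r in reps if r["subject"] != held_out]
        test = [r for r in reps if r["subject"] == held_out]
        # Only the training data is balanced; the test data is left untouched.
        yield held_out, balance_by_undersampling(train, rng), test


# Hypothetical toy data: 3 participants, each with 4 acceptable and 2 aberrant reps.
reps = [
    {"subject": s, "label": label, "features": [0.0]}
    for s in ("P1", "P2", "P3")
    for label in ["acceptable"] * 4 + ["aberrant"] * 2
]
folds = list(leave_one_subject_out(reps))
```

In each fold, a classifier would be trained on the balanced training repetitions and evaluated on the repetitions of the held-out participant, so that no participant contributes to both training and test data.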


Multi-label classification results (i.e. detection of exact technique deviation) are presented in Table 7.3. The results show classification efficacy when using data derived from each individual IMU and various combinations of multiple IMUs.

7.3.3. Experiment 2: Naturally Occurring Technique Deviations

Table 7.4 compares the quality of universal classifiers and personalised classifiers in the binary classification of deadlift technique using the data set of naturally occurring technique deviations from experiment 2. Classification efficacy is shown for systems using multiple IMUs and systems developed using individual IMUs at various anatomical positions. Table 7.5 shows the capacity of IMU-based systems to classify which natural deviation is present using universal and personalised classifiers.
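The personalised evaluation referred to above (participant eligibility, leave-one-out folds and the mean and standard deviation across participants) can be sketched as below. The function names, participant identifiers and scores are hypothetical, and the use of the sample standard deviation is an assumption, as the text does not state which form was used.

```python
from collections import Counter
from statistics import mean, stdev


def eligible_for_loocv(labels, classes, min_per_class=2):
    """True if the participant has at least 2 repetitions of every class."""
    counts = Counter(labels)
    return all(counts[c] >= min_per_class for c in classes)


def loocv_folds(n_reps):
    """Yield (train_indices, test_index): each repetition is held out once."""
    for i in range(n_reps):
        yield [j for j in range(n_reps) if j != i], i


def summarise(per_participant_scores):
    """Mean and (sample) standard deviation across participants."""
    return mean(per_participant_scores), stdev(per_participant_scores)


# Hypothetical repetition labels for three participants.
classes = ["acceptable", "aberrant"]
participants = {
    "P1": ["acceptable"] * 5 + ["aberrant"] * 3,
    "P2": ["acceptable"] * 6 + ["aberrant"] * 1,  # too few aberrant reps
    "P3": ["acceptable"] * 4 + ["aberrant"] * 2,
}
eligible = [p for p, labels in participants.items()
            if eligible_for_loocv(labels, classes)]

folds = list(loocv_folds(4))
# Hypothetical per-participant accuracy scores (%).
avg, sd = summarise([90.0, 80.0, 85.0])
```

Here P2 is excluded because holding out the only aberrant repetition would leave no aberrant training data for that fold.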


Table 7.2: Overall accuracy, sensitivity and specificity in binary classification (acceptable or aberrant technique) for each combination of IMUs following LOSOCV to evaluate global classifiers and LOOCV to evaluate personalised classifiers for induced technique deviations. Personalised classifier scores are reported as mean (SD).

| IMU placement(s) | Global accuracy (%) | Global sensitivity (%) | Global specificity (%) | Personalised accuracy (%) | Personalised sensitivity (%) | Personalised specificity (%) |
|---|---|---|---|---|---|---|
| All 5 Sensors | 75 | 57 | 89 | 93 (6) | 90 (9) | 96 (6) |
| Lumbar & Shanks | 71 | 53 | 85 | 93 (6) | 91 (7) | 96 (7) |
| Lumbar & Thighs | 74 | 56 | 87 | 91 (6) | 89 (9) | 93 (9) |
| Both Shanks | 66 | 47 | 80 | 91 (6) | 88 (9) | 95 (7) |
| Both Thighs | 73 | 58 | 84 | 90 (8) | 86 (10) | 93 (8) |
| Left Shank | 64 | 48 | 76 | 88 (10) | 86 (11) | 89 (11) |
| Left Thigh | 68 | 58 | 75 | 87 (8) | 85 (9) | 89 (10) |
| Lumbar | 70 | 52 | 83 | 88 (8) | 90 (9) | 86 (11) |
| Right Thigh | 72 | 59 | 82 | 89 (9) | 86 (10) | 91 (10) |
| Right Shank | 63 | 42 | 79 | 90 (7) | 87 (8) | 92 (10) |

Table 7.3: Overall accuracy, sensitivity and specificity in multi-class classification (exact deviation) for each combination of IMUs following LOSOCV to evaluate global classifiers and LOOCV to evaluate personalised classifiers for induced technique deviations. Personalised classifier scores are reported as mean (SD).

| IMU placement(s) | Global accuracy (%) | Global sensitivity (%) | Global specificity (%) | Personalised accuracy (%) | Personalised sensitivity (%) | Personalised specificity (%) |
|---|---|---|---|---|---|---|
| All 5 Sensors | 60 | 62 | 92 | 81 (11) | 83 (13) | 96 (2) |
| Lumbar & Shanks | 56 | 59 | 91 | 81 (10) | 83 (11) | 96 (2) |
| Lumbar & Thighs | 55 | 56 | 91 | 79 (12) | 81 (13) | 96 (3) |
| Both Shanks | 38 | 43 | 88 | 75 (14) | 77 (15) | 95 (3) |
| Both Thighs | 48 | 46 | 89 | 74 (13) | 76 (14) | 95 (3) |
| Left Shank | 37 | 41 | 87 | 71 (15) | 73 (17) | 94 (3) |
| Left Thigh | 37 | 38 | 87 | 67 (13) | 69 (16) | 93 (3) |
| Lumbar | 49 | 52 | 90 | 72 (13) | 75 (14) | 94 (3) |
| Right Thigh | 44 | 43 | 89 | 74 (14) | 75 (14) | 94 (3) |
| Right Shank | 34 | 39 | 87 | 73 (13) | 74 (15) | 94 (3) |

Table 7.4: Overall accuracy, sensitivity and specificity in binary classification (acceptable or aberrant technique) for each combination of IMUs following LOSOCV to evaluate universal classifiers and LOOCV to evaluate personalised classifiers for natural technique deviations. Personalised classifier scores are reported as mean (SD).

| IMU placement(s) | Global accuracy (%) | Global sensitivity (%) | Global specificity (%) | Personalised accuracy (%) | Personalised sensitivity (%) | Personalised specificity (%) |
|---|---|---|---|---|---|---|
| All 5 Sensors | 73 | 78 | 49 | 84 (13) | 83 (17) | 83 (17) |
| Lumbar & Shanks | 70 | 76 | 34 | 83 (13) | 81 (17) | 82 (16) |
| Lumbar & Thighs | 70 | 76 | 42 | 82 (12) | 80 (16) | 81 (20) |
| Both Shanks | 65 | 72 | 27 | 82 (15) | 79 (22) | 78 (20) |
| Both Thighs | 70 | 74 | 42 | 82 (14) | 79 (16) | 82 (23) |
| Left Shank | 63 | 68 | 39 | 80 (15) | 80 (17) | 76 (24) |
| Left Thigh | 67 | 70 | 48 | 80 (14) | 79 (16) | 77 (22) |
| Lumbar | 70 | 76 | 34 | 80 (14) | 80 (15) | 78 (22) |
| Right Thigh | 71 | 78 | 36 | 82 (13) | 79 (16) | 81 (21) |
| Right Shank | 69 | 76 | 31 | 80 (15) | 79 (21) | 77 (23) |

Table 7.5: Overall accuracy, sensitivity and specificity in multi-class classification (exact deviation) for each combination of IMUs following LOSOCV to evaluate global classifiers and LOOCV to evaluate personalised classifiers for natural technique deviations. Personalised classifier scores are reported as mean (SD).

| IMU placement(s) | Global accuracy (%) | Global sensitivity (%) | Global specificity (%) | Personalised accuracy (%) | Personalised sensitivity (%) | Personalised specificity (%) |
|---|---|---|---|---|---|---|
| All 5 Sensors | 54 | 18 | 87 | 78 (13) | 74 (21) | 90 (12) |
| Lumbar & Shanks | 54 | 18 | 87 | 75 (13) | 78 (13) | 66 (34) |
| Lumbar & Thighs | 32 | 18 | 82 | 77 (13) | 75 (15) | 81 (15) |
| Both Shanks | 32 | 17 | 84 | 73 (18) | 72 (22) | 78 (21) |
| Both Thighs | 48 | 13 | 83 | 74 (19) | 72 (25) | 77 (22) |
| Left Shank | 47 | 11 | 82 | 71 (18) | 74 (16) | 70 (29) |
| Left Thigh | 56 | 11 | 81 | 67 (20) | 68 (21) | 68 (23) |
| Lumbar | 36 | 11 | 81 | 75 (14) | 74 (15) | 83 (12) |
| Right Thigh | 38 | 13 | 77 | 71 (19) | 71 (24) | 71 (26) |
| Right Shank | 53 | 15 | 84 | 69 (13) | 65 (20) | 78 (20) |

7.4. Discussion

The objective of this study was to determine whether an IMU system could identify deviations from acceptable deadlift biomechanics. The results in Section 7.3 indicate this is possible with good to excellent overall accuracy using a personalised classifier. Personalised classifiers outperform universal classifiers in attempting to identify both induced and natural deadlift deviations regardless of IMU set-up. IMU systems using a personalised classifier produce good to excellent accuracy when identifying induced deviations from acceptable deadlift biomechanics (Tables 7.2 and 7.3). Personalised classifiers can also identify natural deviations with moderate to good accuracy (Tables 7.4 and 7.5).

In reviewing the literature, no data were found on the ability of an IMU system to classify deadlift technique. Gleadhill et al. (16) compared the ability of an IMU system to identify temporal features in the deadlift with that of a motion capture system, finding high agreement. The results presented in this work build upon this research by using temporal and other time and frequency domain features to create a classification framework. Furthermore, the high agreement achieved by Gleadhill et al. (16) was with a 3 IMU system. The results presented in Section 7.3 indicate that a single IMU system is capable of classifying acceptable and aberrant deadlift technique with moderate to excellent accuracy. Single IMU systems are less expensive and more practical for end users due to reduced risk of placement error and power usage, making them more desirable for daily environment applications (26).

It is difficult to directly compare results with similar work due to differences in exercises investigated, data set sizes, sensor positions and end-user feedback. However, these results compare favourably to research in the area (7, 10-15).
The majority of research to date has investigated the ability of IMU systems to monitor technique in simple exercises such as straight leg raises, dumbbell curls or heel slides (12). This chapter evaluates an IMU system’s ability to assess deadlift biomechanics, a complex multi-joint exercise. The presented system can distinguish between six different deadlift classes (acceptable and five deviations) with moderate to good overall accuracy (Table 7.3 and Table 7.5). The lower number of classes in some studies (7, 11, 12) may make it easier for classifiers to identify specific deviations with higher efficacy.

Furthermore, the system presented in this work is capable of identifying natural deviations from acceptable deadlift biomechanics. The majority of previous research identified induced deviations using a universal classifier (7, 12, 13). Universal classification techniques have been shown to classify naturally occurring deviations in the single leg squat with moderate accuracy (8, 14). However, the ability of a universal classifier to identify specific naturally occurring deadlift deviations is poor (Table 7.5). This may be due to a number of factors. The number of acceptable deadlifts far outnumbers any other label (Table 7.1), and this unbalanced data set makes it difficult to create universal classifiers that can be used for all individuals (17, 19). As many deviations were sporadic, the use of a universal classifier to identify specific deadlift deviations may require a larger data set including more deviations. Additionally, the inter-subject variability in acceptable deadlift biomechanics, as described by the IMU sensor signal features, may exceed the intra-subject variability between acceptable technique and aberrant deadlift biomechanics. This would make universal classifier creation difficult.

In addition to producing higher overall classification accuracy, a personalised classifier may offer other benefits. Personalised classifiers are more computationally efficient than universal classifiers as they use less training data and therefore require less memory. Unlike universal classifier development, they negate the need for a large data set to classify exercise biomechanics (17, 19). The use of a personalised classifier may also allow for the development of a universal classifier in the future. All labelled data collected for personalised classifier development could be stored and used to build the large data set necessary to improve universal classifiers.

The main disadvantage associated with a personalised classifier is that data must be collected and labelled from individual patients. This means practitioners must monitor exercise technique in real time or use post hoc video analysis and label appropriately, which may prove time consuming. However, since practitioners often monitor exercise biomechanics prior to independent exercise completion, it may fit into clinical practice without difficulty. In an effort to streamline this process, the authors have recently developed a tablet application that enables clinicians to simultaneously capture video and IMU data from a person exercising.
The application automatically splits video and IMU data into repetitions, allows efficient repetition labelling and can automatically build personalised classifiers.

In conclusion, the deadlift is important in rehabilitation and S&C. Accurate quantification of deadlift biomechanics is important to reduce injury risk and ensure goals are achieved. The work presented in this chapter indicates that an IMU system can classify acceptable and aberrant deadlift biomechanics with good to excellent overall accuracy, sensitivity and specificity using a personalised classifier. Furthermore, a personalised classification system is far better at identifying specific naturally occurring deadlift deviations. The results presented in this work are comparable with current research in the area. However, most of this research has been carried out using universal classifiers and identifying induced deviations. While a universal classifier may allow for less end-user interaction, it is difficult to classify naturally occurring deviations from acceptable deadlift biomechanics using this technique. As a result, the use of a personalised classifier may be more appropriate for identifying aberrant deadlift biomechanics.

7.6. References

1. Hales M. Improving the deadlift: Understanding biomechanical constraints and physiological adaptations to resistance exercise. Strength & Conditioning Journal. 2010;32(4):44-51.
2. Escamilla RF, Francisco AC, Fleisig GS, Barrentine SW, Welch CM, Kayes AV, et al. A three-dimensional biomechanical analysis of sumo and conventional style deadlifts. Medicine and Science in Sports and Exercise. 2000;32(7):1265-75.
3. Cholewicki J, McGill S, Norman R. Lumbar spine loads during the lifting of extremely heavy weights. Medicine and Science in Sports and Exercise. 1991;23(10):1179-86.
4. Bonnet V, Mazza C, Fraisse P, Cappozzo A. Real-time estimate of body kinematics during a planar squat task using a single inertial measurement unit. IEEE Transactions on Biomedical Engineering. 2013;60(7):1920-6.
5. Whiteside D, Deneweth JM, Pohorence MA, Sandoval B, Russell JR, McLean SG, et al. Grading the functional movement screen: A comparison of manual (real-time) and objective methods. The Journal of Strength & Conditioning Research. 2016;30(4):924-33.
6. McGrath D, Greene BR, O’Donovan KJ, Caulfield B. Gyroscope-based assessment of temporal gait parameters during treadmill walking and running. Sports Engineering. 2012;15(4):207-13.
7. Taylor PE, Almeida GJ, Hodgins JK, Kanade T. Multi-label classification for the analysis of human motion quality. In: Proceedings of the 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); 2012. Aug 28; San Diego, C.A., U.S.A. N.Y., U.S.A.: IEEE; 2012. p. 2214-2218.
8. Whelan D, O’Reilly M, Ward T, Delahunt E, Caulfield B. Technology in Rehabilitation: Evaluating the Single Leg Squat Exercise with Wearable Inertial Measurement Units. Methods of Information in Medicine. 2016;56(2):88-94.
9. Pernek I, Kurillo G, Stiglic G, Bajcsy R. Recognizing the intensity of strength training exercises with wearable sensors. Journal of Biomedical Informatics. 2015;58:145-55.
10. Melzi S, Borsani L, Cesana M. The virtual trainer: supervising movements through a wearable wireless sensor network. In: Proceedings of the 2009 6th IEEE Annual Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks Workshops, (SECON Workshops); 2009. Jun 22; Rome, Italy. N.Y., U.S.A.: IEEE; 2009. p. 1-3.
11. Velloso E, Bulling A, Gellersen H, Ugulino W, Fuks H. Qualitative activity recognition of weight lifting exercises. In: Proceedings of the 4th Augmented Human International Conference; 2013. Mar 7; Stuttgart, Germany. N.Y., U.S.A.: ACM; 2014. p. 116-123.


12. Giggins OM, Sweeney KT, Caulfield B. Rehabilitation exercise assessment using inertial sensors: a cross-sectional analytical study. Journal of Neuroengineering and Rehabilitation. 2014;11(1):158-68.
13. O’Reilly M, Whelan D, Chanialidis C, Friel N, Delahunt E, Ward T, et al. Evaluating squat performance with a single inertial measurement unit. In: Proceedings of the 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN); 2015. Jun 9; Boston, M.A., U.S.A. N.Y., U.S.A.: IEEE; 2015. p. 1-6.
14. Whelan D, O’Reilly M, Ward T, Delahunt E, Caulfield B. Evaluating Performance of the Single Leg Squat Exercise with a Single Inertial Measurement Unit. In: Proceedings of the 3rd Workshop on ICTs for improving Patients Rehabilitation Research Techniques, (REHAB); 2015. Oct 1; Lisbon, Portugal. N.Y., U.S.A.: ACM; 2015. p. 144-7.
15. Whelan D, O’Reilly M, Ward T, Delahunt E, Caulfield B. Evaluating Performance of the Lunge Exercise with Multiple and Individual Inertial Measurement Units. In: Proceedings of the 10th EAI International Conference on Pervasive Computing Technologies for Healthcare, Pervasive Health; 2016. May 16; Cancun, Mexico. N.Y., U.S.A.: ACM; 2016. p. 101-8.
16. Gleadhill S, Lee JB, James D. The development and validation of using inertial sensors to monitor postural change in resistance exercise. Journal of Biomechanics. 2016;49(7):1259-63.
17. Chawla NV. Data mining for imbalanced datasets: An overview. Data Mining and Knowledge Discovery Handbook. N.Y.C., N.Y., U.S.A.: Springer; 2005. p. 853-67.
18. Kotsiantis SB, Zaharakis I, Pintelas P. Supervised machine learning: A review of classification techniques. Emerging Artificial Intelligence Applications in Computer Engineering: Real Word AI Systems with Applications in EHealth, HCI, Information Retrieval and Pervasive Technologies: IOS Press; 2007. p. 3-25.
19. He H, Garcia EA. Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering. 2009;21(9):1263-84.
20. Baechle TR, Earle RW. Resistance Training Exercise Techniques. NSCA’s Essentials of Personal Training. Champaign, I.L., U.S.A.: Human Kinetics; 2004.
21. Horvat M, Franklin C, Born D. Predicting strength in high school women athletes. The Journal of Strength & Conditioning Research. 2007;21(4):1018-22.
22. Madgwick SO, Harrison AJ, Vaidyanathan R. Estimation of IMU and MARG orientation using a gradient descent algorithm. In: Proceedings of the 12th International Conference on Rehabilitation Robotics (ICORR); 2011. Jun 29; Zurich, Switzerland. N.Y., U.S.A.: IEEE; 2011. p. 1-7.
23. Kuipers JB. Quaternions and rotation sequences. Princeton: Princeton University Press; 1999.

24. Breiman L. Random forests. Machine Learning. 2001;45(1):5-32.
25. Fushiki T. Estimation of prediction error by using K-fold cross-validation. Statistics and Computing. 2011;21(2):137-46.
26. Bonnet V, Mazza C, Fraisse P, Cappozzo A. A least-squares identification algorithm for estimating squat exercise mechanics using a single inertial measurement unit. Journal of Biomechanics. 2012;45(8):1472-7.


Chapter 8

Classification of Barbell Squat Biomechanics with Multiple and Individual Inertial Measurement Units

This chapter is based on the following paper which is published in Methods of Information in Medicine: O'Reilly MA, Whelan D, Ward TE, Delahunt E, Caulfield B. Technology in Rehabilitation: Comparing Personalised and Global Classification Methodologies in Evaluating the Squat Exercise with Wearable IMUs. Methods of information in medicine. 2017 Jun 14;56(4).


8.0. Abstract

Background: The barbell squat is a popularly used lower limb rehabilitation exercise. It is also an integral exercise in injury risk screening protocols. To date, athlete/patient technique has been assessed using expensive laboratory equipment or subjective clinical judgement, neither of which is without shortcomings. Inertial measurement units (IMUs) may offer a low-cost solution for the objective evaluation of athlete/patient technique. However, it is not yet known if global classification techniques are effective in identifying naturally occurring minor deviations during the barbell squat.

Objectives: The aims of this study were to: (a) determine if, in combination or in isolation, IMUs positioned on the lumbar spine, thigh and shank are capable of distinguishing between acceptable and aberrant barbell squat technique; (b) determine the capabilities of an IMU system at identifying specific natural deviations from acceptable barbell squat technique; and (c) compare a personalised (N=1) classifier to a global classifier in identifying the above.

Methods: Fifty-five healthy volunteers (37 males, 18 females, age = 24.21 +/- 5.25 years, height = 1.75 +/- 0.1 m, body mass = 75.09 +/- 13.56 kg) participated in the study. All participants performed a barbell squat 3-repetition maximum strength test. IMUs were positioned on participants’ lumbar spine, both shanks and both thighs; these were utilized to record tri-axial accelerometer, gyroscope and magnetometer data during all repetitions of the barbell squat exercise. Technique was assessed and labelled by a chartered physiotherapist using an evaluation framework. Features were extracted from the labelled IMU data. These features were used to train and evaluate both global and personalised random forests classifiers.
Results: Global classification techniques produced poor accuracy (AC), sensitivity (SE) and specificity (SP) scores, even with a 5 IMU set-up, in both binary (AC: 64%, SE: 70%, SP: 28%) and multi-class classification (AC: 59%, SE: 24%, SP: 84%). However, utilising personalised classification techniques, even with a single IMU positioned on the left thigh, produced good binary classification scores (AC: 81%, SE: 81%, SP: 84%) and moderate-to-good multi-class scores (AC: 69%, SE: 70%, SP: 89%).

Conclusions: There are several challenges in developing global classification exercise technique evaluation systems for rehabilitation exercises such as the barbell squat. Building large, balanced data sets to train such systems is difficult and time intensive. Minor, naturally occurring deviations may not be detected utilising global classification approaches. Personalised classification approaches allow for higher accuracy and greater system efficiency for end-users in detecting naturally occurring barbell squat technique deviations.


Applying this approach also allows for a single-IMU set-up to achieve similar accuracy to a multi-IMU set-up, which reduces total system cost and maximises system usability.


8.1. Introduction

The squat is a compound full-body exercise, whose constituent movements are integral to activities of daily living. The barbell squat (squat with a weighted barbell placed across the upper shoulders) often features as a fundamental exercise in resistance training and rehabilitation programs. Furthermore, it is incorporated into musculoskeletal injury risk screening/identification protocols (1). Aberrant squat technique has been shown to increase stress on the joints of the lower extremity (2), potentiating the risk of injury. Thus, the reliable assessment of technique is necessary to mitigate injury risk.

The assessment of squat technique is typically undertaken using one of two distinct methods: (a) 3-D motion capture; (b) subjective visual analysis. Both have a number of limitations. 3-D motion capture systems are expensive and the application of skin-mounted markers may hinder normal movement (3, 4). Furthermore, data processing can be time intensive and specific expertise is often required to interpret the processed data and make recommendations on the observed results. Therefore, these systems are not frequently used to assess squat technique beyond the research laboratory (5). In clinical and gym-based settings, subjective visual assessment is typically used to assess technique. This subjective visual assessment of human movement is not always reliable even amongst experts, as the need to visually assess the numerous constituent components of the movement simultaneously is challenging (6).

Wearable inertial measurement units (IMUs) may offer the potential to bridge the gap between laboratory and day-to-day “real-world” acquisition and assessment of human movement. These IMUs are small, inexpensive sensors that consist of accelerometers, gyroscopes and magnetometers. They are able to acquire data pertaining to the linear and angular motion of individual limb segments, and the centre of mass of the body.
Self-contained, wireless IMU devices are easy to set up and allow for the acquisition of human movement data in unconstrained environments (7). In this paper the term IMU system will be used to describe the IMU sensors, the sensor signals, the associated signal processing applied to them and the output of the exercise classification algorithms. IMU systems can robustly track the variety of postures and environmental complexities associated with training, unlike camera-based systems, which are hampered by location, occlusion and lighting issues in such settings (8). IMUs have also been shown to be as effective as marker-based systems at measuring joint angles (5, 9, 10).

There are many commercially available examples of IMU systems that monitor physical activity (e.g. Jawbone™ and Fitbit™). However, using IMU systems to assess gym-based exercises such as the barbell squat is less common. Researchers have demonstrated the ability of IMU-based systems to distinguish different exercises and count exercise repetitions with moderate to good levels of accuracy (11-15). Whilst these systems are capable of counting exercise repetitions, they do not provide instruction on technique and performance quality. A holistic exercise tracking system should not only recognise the exercise completed and count repetitions, but should also provide technique feedback. Furthermore, in order for IMU systems to assess human movement data as part of a musculoskeletal injury risk screening protocol, they need to be able to identify aberrant movement patterns and provide easily interpretable data to the clinicians/coaches who use them.

A growing body of scientific literature has investigated the ability of IMU systems to assess technique to provide this holistic exercise analysis (15-22). The majority of authors have developed these IMU systems by employing the following steps: (a) collection of a labelled data set; (b) pre-processing of data; (c) data segmentation; (d) feature extraction; (e) classifier development and evaluation (23). These studies have demonstrated the ability of IMU systems to identify deviations with moderate to excellent levels of accuracy in exercises such as the biceps curl, military press, squat and lunge. However, the majority of these IMU systems were developed using a data set consisting of induced deviations (i.e. deviations that were intentionally performed by participants). When deviations occur naturally, the exact way in which they present may be more nuanced and subsequently more difficult to classify. This means that these systems may not be suitable for a real-world environment where deviations present in a natural manner. When collecting data in a natural environment a variety of deviations may present in different quantities, with some deviations occurring less frequently than others. This means collecting a large and balanced data set of natural deviations is challenging.
This is necessary to allow for the development of a robust global classification system (24-26). In these situations, a personalised classifier may be more appropriate. A personalised classifier is a classifier developed on data provided by a single person (N=1). The data used to develop this classifier is collected from participants as they complete exercises wearing IMUs. Each individual exercise repetition is assessed and labelled by a movement expert live, or through post-hoc video analysis. This means that IMU signals for each exercise repetition can be associated with this repetition’s movement pattern. When the data set used for training the IMU system is collected in this way, it means the system can be individualised. While this may prove more labour intensive than using an IMU system based on a global classifier, it may be more appropriate in some situations.


8.2. Objectives

The barbell squat is a compound full-body exercise that is typically a constituent component of resistance training, rehabilitation programs and musculoskeletal injury risk screening protocols. Incorrect technique can increase the risk of sustaining a musculoskeletal injury. Traditionally, exercise technique has been evaluated using expensive motion capture systems or via subjective visual inspection from trained professionals. IMU systems offer an opportunity to provide low-cost exercise technique assessment. However, to date, no research has evaluated the capability of IMU systems to identify natural deviations in barbell squat technique. In this setting, the use of an individualised classifier based on an N=1 data set may prove more appropriate than global classifiers.

Therefore, the research question that this study seeks to address is: “how well can an IMU-based system quantify barbell squat technique?” The aims of this study were to: (a) determine if, in combination or in isolation, IMUs positioned on the lumbar spine, thigh and shank are capable of distinguishing acceptable and aberrant barbell squat technique; (b) determine the capabilities of an IMU system at identifying specific natural deviations from acceptable barbell squat technique; (c) compare a personalised to a global classifier in identifying the above.

8.2. Methods

8.2.0. Experimental Approach to Problem

This study employed an opportunistic approach to the development of a wearable sensor system for automatically assessing barbell squat technique. Participants were required to perform a 3-repetition maximum barbell squat test. This test was recorded in HD video. A chartered physiotherapist then assessed each repetition video and labelled it appropriately (i.e. acceptable or containing one of the deviations identified in Table 8.1). In order to ensure standardisation, form was considered acceptable if it was completed as defined by the National Strength and Conditioning Association (NSCA) (27). The deviations from this acceptable form are detailed in Table 8.1.

During performance of the barbell squats, data was acquired from 5 IMUs (SHIMMER, Shimmer Research, Dublin, Ireland) placed on the lumbar spine, right and left thigh and right and left shank. The IMUs were positioned on each participant by the same researcher using a standardised and repeatable protocol. Participants were allowed a rest interval between performances of each set of repetitions.

Following data collection, a total of 306 variables were extracted from the sensor signals for every repetition from each IMU. These variables were used to develop and evaluate the quality of an automated classification system for the analysis of barbell squat technique. This was undertaken using data derived from each individual IMU and combinations of multiple IMUs. A global classification system was evaluated as well as a separate (N=1) personalised classifier for each participant.

Table 8.1: List and description of barbell squat exercise deviations used in this study.

| Label | Description |
|---|---|
| Acceptable | Acceptable technique |
| Knee Valgus | Knees coming together during downward phase |
| Knee Varus | Knees coming apart during downward phase |
| Knees Too Forward | Knees ahead of toes during downward phase |
| Heels Elevated | Heels raising off the ground during exercise |
| Bent Over | Excessive flexion of hip and torso during exercise |
| Other | Other deviation, not highlighted in NSCA guidelines |

NSCA = National Strength and Conditioning Association
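To illustrate how the 306 per-IMU variables described above might be assembled for different IMU set-ups, the sketch below concatenates the per-IMU feature lists of a single repetition. Simple concatenation is an assumption, as the text does not state how features from multiple IMUs were combined; the set-up names, position keys and placeholder feature values are likewise hypothetical.

```python
FEATURES_PER_IMU = 306  # stated in the text: 306 variables per repetition per IMU

IMU_POSITIONS = ["lumbar", "left_thigh", "right_thigh", "left_shank", "right_shank"]

# Illustrative set-ups only; the full list evaluated is not given in this section.
SETUPS = {
    "all_5": IMU_POSITIONS,
    "lumbar_and_thighs": ["lumbar", "left_thigh", "right_thigh"],
    "left_thigh_only": ["left_thigh"],
}


def build_feature_vector(rep_features, setup):
    """Concatenate the per-IMU feature lists of one repetition in a fixed order."""
    vector = []
    for position in setup:
        vector.extend(rep_features[position])
    return vector


# Placeholder features for a single repetition (all zeros, for illustration).
rep_features = {pos: [0.0] * FEATURES_PER_IMU for pos in IMU_POSITIONS}
sizes = {name: len(build_feature_vector(rep_features, setup))
         for name, setup in SETUPS.items()}
```

Under this assumption, a single-IMU classifier would see 306 features per repetition, a three-IMU set-up 918 and the five-IMU set-up 1530.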

8.2.1. Participants

Fifty-five healthy volunteers (37 males, 18 females, age = 24.21 ± 5.25 years, height = 1.75 ± 0.1 m, body mass = 75.09 ± 13.56 kg) participated in the study. No participant reported having a current or recent musculoskeletal injury that would impair his or her performance of the exercise. All participants reported a level of familiarity with the barbell squat exercise. The University College Dublin Human Research Ethics Committee approved the study protocol and written informed consent was obtained from all participants before testing. In cases where participants were under the age of 18, written informed consent was also obtained from a parent or guardian.

8.2.2. Procedures

The testing protocol was explained to participants upon their arrival at the laboratory. Prior to formal testing, all participants performed a ten-minute warm-up on an exercise bike (Lode B.V., Groningen, The Netherlands) maintaining a power output of 100 W and a constant cadence of 75-85 revolutions per minute. Following completion of the warm-up, a chartered physiotherapist secured the IMUs to pre-determined anatomic locations on the participant as follows: the spinous process of the 5th lumbar vertebra, the mid-point of both the right and left femurs (determined as half way between the greater trochanter and lateral femoral condyle), and on both shanks 2 cm above the lateral malleolus (Figure 3.1). The orientation and location of the IMUs were consistent across participants.

A pilot study was undertaken to determine the most appropriate sampling rate and the ranges for the accelerometer and gyroscope on board the IMUs. For the pilot study, data were acquired (512 samples/s) during performance of the squat, lunge, deadlift, single leg squat and tuck jump exercises. A Fourier transform was then used to estimate the spectral extent of the signals, which was found to be less than 20 Hz. Therefore, a sampling rate of 51.2 samples/s was chosen based upon the Shannon sampling theorem and the Nyquist criterion (28). Each IMU was configured to stream triaxial accelerometer (± 2 g), gyroscope (± 500 °/s) and magnetometer (± 1.9 Ga) data, with the sensor ranges chosen based upon data from the pilot study. Each IMU was calibrated for these specific sensor ranges using the Shimmer 9DoF Calibration application (29).

Participants were required to complete a full 3-repetition maximum (3RM) strength test for the barbell squat (29). Following a warm-up on an exercise bike, participants completed a set of barbell squat exercises with a resistance that allowed for 8-12 repetitions comfortably. After resting for 1 minute, the load was increased by 10-20% and they performed a further 4-6 repetitions. This was followed by a 2-minute rest period. Following this they performed 3 repetitions with near maximum load. They then rested for 2-4 minutes. If they passed the previous set, the weight was incremented by 5-10% and another 3-repetition set was completed. This load increment was repeated until the participant could no longer lift the weight in a safe manner for three repetitions.
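The sampling-rate reasoning in the pilot study (estimate the spectral extent, then sample at more than twice that bandwidth) can be illustrated with a minimal Python sketch. This is not the authors' MATLAB analysis: the function names, the 99% power threshold and the naive DFT are assumptions made purely for illustration.

```python
import cmath
import math


def spectral_extent_hz(signal, fs_hz, power_fraction=0.99):
    """Lowest frequency below which `power_fraction` of the signal power lies.

    Uses a naive one-sided DFT (O(N^2)), which is adequate for a short sketch.
    The 99% threshold is an illustrative assumption, not a value from the study.
    """
    n = len(signal)
    mean = sum(signal) / n
    centred = [s - mean for s in signal]  # remove the constant (DC) component
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centred))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0.0:
        return 0.0
    running = 0.0
    for k, p in enumerate(power):
        running += p
        if running >= power_fraction * total:
            return k * fs_hz / n  # convert bin index to Hz
    return fs_hz / 2.0


def satisfies_nyquist(fs_hz, bandwidth_hz):
    """True when the sampling rate exceeds twice the signal bandwidth."""
    return fs_hz > 2.0 * bandwidth_hz
```

With a spectral extent below 20 Hz, the Nyquist criterion requires more than 40 samples/s, so `satisfies_nyquist(51.2, 20.0)` holds for the rate chosen in the study.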

8.2.3. Data Labelling All repetitions were recorded using a HD video camera placed in front of the participants. The video recordings of each exercise repetition were reviewed by a chartered physiotherapist with over seven years’ experience in musculoskeletal and sports physiotherapy. Each exercise repetition was separated and reviewed on multiple occasions systematically. For each repetition, the chartered physiotherapist first deemed if exercise technique was “acceptable”. The criteria for acceptable technique were based upon the recommendations detailed in National Strength and Conditioning Association guidelines (27). For safety reasons participants completed the exercise in a squat rack. The barbell was placed on the rack just above shoulder level and loaded appropriately. The participant then stepped under the bar and placed it on the back of their shoulders, slightly below their neck. The bar was held with both arms and lifted off the rack by pushing with the legs and straightening the torso. The participant then stepped away from the rack and completed the squatting movement. Their chest was held up and out with their head tilted slightly up. As participants moved into the squat position they were instructed to allow hips and knees to flex while keeping their torso to floor angle constant. They were required to keep their heels on the floor and knees aligned over their feet. Participants continued flexing at the hips and knees until their thighs were parallel to the floor. As they moved upward a flat back was maintained


and their chest was held up and out. Hips and knees were to be extended at the same rate with heels on the floor and knees aligned over feet until the starting position was reached. The bar was then placed back on the rack. If a repetition was not completed as above, then the chartered physiotherapist selected the most dominant deviation from a pre-defined list (Table 8.1). This method of data labelling replicates methods from recently published work in the field of IMU based exercise technique classification systems (21).

8.2.4. Signal Processing and Statistical Analysis

Nine signals were collected from each IMU: accelerometer x, y, z, gyroscope x, y, z and magnetometer x, y, z. Data were analysed using MATLAB (2012, The MathWorks, Natick, USA). To eliminate unwanted high-frequency noise during each repetition, the nine signals were low pass filtered at fc = 20 Hz using a Butterworth filter of order n = 8. Whilst classification is possible using only features derived from the accelerometer, gyroscope and magnetometer signals, the use of additionally derived signals improves system accuracy, sensitivity and specificity. As such, nine additional signals were then calculated as follows. The 3-D orientation of the IMU was computed using the gradient descent algorithm developed by Madgwick et al. (30). The resulting W, X, Y and Z quaternion values were also converted to pitch, roll and yaw signals. The pitch, roll and yaw signals describe the inclination, measured in radians, of each IMU in the sagittal, frontal and transverse plane respectively. The magnitude of acceleration was also computed using the vector magnitude of accelerometer x, y, z. The magnitude of acceleration describes the total acceleration of the IMU in any direction. This is the sum of the magnitude of inertial acceleration of the lumbar spine and acceleration due to gravity. Additionally, the magnitude of rotational velocity was computed using the vector magnitude of gyroscope x, y, z.
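The derived signals described above (vector magnitudes and Euler angles from the orientation quaternion) can be sketched in Python as follows. The function names are hypothetical and the ZYX (yaw-pitch-roll) rotation convention is an assumption; the thesis does not state which convention was used, and which anatomical plane each angle corresponds to depends on how the IMU is mounted.

```python
import math


def vector_magnitude(x, y, z):
    """Euclidean norm of a triaxial sample (accelerometer or gyroscope)."""
    return math.sqrt(x * x + y * y + z * z)


def quaternion_to_euler(w, x, y, z):
    """Convert a unit quaternion to (roll, pitch, yaw) in radians.

    Assumes the common ZYX convention; other conventions permute the angles.
    """
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    sin_pitch = 2.0 * (w * y - z * x)
    sin_pitch = max(-1.0, min(1.0, sin_pitch))  # guard against rounding error
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return roll, pitch, yaw
```

Applying `vector_magnitude` sample by sample to the three accelerometer axes and to the three gyroscope axes gives two of the nine derived signals; the four quaternion components and the three Euler angles make up the remaining seven.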
Each exercise repetition was extracted from the IMU data and resampled to a length of 250 samples. This time-normalisation was undertaken to minimise the influence a participant’s repetition tempo had on signal feature calculations. It also ensured consistent computational efficiency in applications for end users and has been used in recently published, similar work (19, 21, 22). Repetitions during which an IMU’s Bluetooth signal dropped were excluded from analysis. The total number of repetitions belonging to each class is shown in Table 8.2. Time-domain and frequency-domain descriptive features were computed to describe the pattern of each of the eighteen signals when the barbell squats were completed. These features were 'Mean', 'RMS', 'Standard Deviation', 'Kurtosis', 'Median', 'Skewness', 'Range', 'Variance', 'Max', 'Min', 'Energy', '25th Percentile', '75th Percentile', 'Level Crossing Rate', 'Fractal Dimension' (31) and the ‘variance of both the approximate and detailed wavelet coefficients using the Daubechies 5 mother wavelet to


level 7’ (32). This resulted in 17 features for each of the 18 available signals, producing a total of 306 features per IMU. Figure 7.1 summarises the above: 5 IMUs recorded 9 signals each, and 9 more signals were derived from these, resulting in a total of 18 signals per IMU. 17 features were computed per repetition for each signal from each IMU, resulting in a total of 1530 features (306 per IMU, 17 per signal). These features were then used to develop and evaluate a variety of classifiers as described below.

The random-forests method was employed to perform classification (33). This technique was chosen as it has been shown to be effective in analysing exercise technique with IMUs when compared to the Naïve-Bayes and Radial-basis function network techniques (34). 128 decision trees were used in each random-forest classifier. Classifiers were developed and evaluated for the ten combinations of IMUs as shown in Figure 7.1.

Initially, binary classification was evaluated to establish how effectively each individual IMU and each combination of IMUs could distinguish between acceptable and aberrant barbell squat technique. All repetitions of acceptable technique were labelled ‘0’ and all repetitions performed with one of the pre-defined deviations as outlined in Table 8.1 were labelled ‘1’. Multi-label classification was then evaluated on the IMU data to investigate how effectively each individual IMU and each IMU combination could be used to discriminate between acceptable barbell squat technique and each of the six pre-defined deviations from acceptable technique as described in Table 8.1. All repetitions of acceptable performance remained labelled as ‘0’ and each of the different deviations was labelled ‘1-6’.

The quality of the global exercise classification system was established using leave-one-subject-out cross-validation (LOSOCV) and the random-forests classifier with 128 trees (35). Each participant’s data corresponds to one fold of the cross validation.
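The time-normalisation to 250 samples and a subset of the descriptive features listed above can be sketched in Python as follows. This is an illustrative sketch only: linear interpolation, population (divide-by-N) moments, sum-of-squares energy and mean-level crossings are assumptions, since the thesis does not give the exact definitions used in MATLAB.

```python
import math


def resample_linear(signal, target_len=250):
    """Time-normalise one repetition to `target_len` samples (target_len >= 2)
    by linear interpolation between neighbouring samples."""
    n = len(signal)
    if n == 1:
        return [float(signal[0])] * target_len
    out = []
    for j in range(target_len):
        pos = j * (n - 1) / (target_len - 1)  # fractional index into the input
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(signal[lo] * (1.0 - frac) + signal[hi] * frac)
    return out


def describe(signal):
    """Compute a subset of the time-domain features for one signal."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((s - mean) ** 2 for s in signal) / n  # population variance (assumed)
    sd = math.sqrt(var)
    ordered = sorted(signal)
    # Skewness and kurtosis are undefined for a constant signal; report 0.0 there.
    skew = sum((s - mean) ** 3 for s in signal) / n / sd ** 3 if sd > 0 else 0.0
    kurt = sum((s - mean) ** 4 for s in signal) / n / var ** 2 if sd > 0 else 0.0
    # Level crossing rate taken here as crossings of the mean per sample interval.
    crossings = sum(1 for a, b in zip(signal, signal[1:])
                    if (a - mean) * (b - mean) < 0)
    return {
        "mean": mean,
        "rms": math.sqrt(sum(s * s for s in signal) / n),
        "standard_deviation": sd,
        "variance": var,
        "median": (ordered[(n - 1) // 2] + ordered[n // 2]) / 2.0,
        "skewness": skew,
        "kurtosis": kurt,
        "range": ordered[-1] - ordered[0],
        "max": ordered[-1],
        "min": ordered[0],
        "energy": sum(s * s for s in signal),
        "level_crossing_rate": crossings / (n - 1) if n > 1 else 0.0,
    }
```

In the study's pipeline, each of the 18 signals from an IMU would be resampled and described in this way, and the per-signal feature values concatenated into one feature vector per repetition.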
At each fold, one participant’s data is held out as test data while the random forests classifier is trained with all other participants’ data. Where each class in the training data did not have an equal number of instances (i.e. equal number of acceptable and aberrant repetitions in binary classification), random instances of the over-represented class(es) were removed to balance the training data. The held-out data is used to assess the classifier’s ability to correctly categorise new data it is presented with. The use of LOSOCV ensures that there is no biasing of the classifiers, because the test subject’s data is completely unseen by the classifier prior to testing.

The quality of the personalised exercise classification systems was established using leave-one-out cross-validation (LOOCV) and a random forests classifier with 128 trees. Each repetition corresponds to one fold of the cross validation. At each fold, one repetition is held out as test data while the random forests classifier is trained with the same participant’s other completed repetitions. Where each class in the training data did not have an equal number of instances (i.e. equal number of acceptable and aberrant repetitions in binary classification), random instances of the over-represented class(es) were removed to balance the training data. The held-out data is used to assess the classifier’s ability to correctly categorise new data it is presented with. Participants were not included in this analysis if they did not have at least 2 repetitions belonging to each class being classified, as this would not allow for training and test data for that class.

The scores used to measure the quality of classification were total accuracy, average sensitivity and average specificity. Accuracy is the number of correctly classified repetitions of all the exercises divided by the total number of repetitions completed; this is calculated as the sum of the true positives (TP) and true negatives (TN) divided by the sum of the true positives, false positives (FP), true negatives and false negatives (FN):

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
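The leave-one-subject-out procedure with random undersampling of the training data, as described above, can be sketched in Python as follows. The data layout (a dictionary mapping a subject identifier to a list of `(features, label)` pairs), the function names and the fixed random seed are assumptions made for illustration; classifier training itself is omitted.

```python
import random


def balance_by_undersampling(samples, rng):
    """Randomly drop instances of over-represented classes so that every
    class has as many instances as the smallest class."""
    by_label = {}
    for sample in samples:
        by_label.setdefault(sample[1], []).append(sample)
    smallest = min(len(group) for group in by_label.values())
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], smallest))
    return balanced


def leave_one_subject_out(dataset, seed=0):
    """Yield (held_out_subject, balanced_training_set, test_set) per fold.

    `dataset` maps a subject identifier to a list of (features, label) pairs.
    Only the training data is balanced; the held-out data is left untouched.
    """
    rng = random.Random(seed)
    for held_out in sorted(dataset):
        train = [sample
                 for subject, reps in dataset.items() if subject != held_out
                 for sample in reps]
        yield held_out, balance_by_undersampling(train, rng), list(dataset[held_out])
```

The personalised evaluation follows the same pattern with one participant's repetitions in place of subjects, so that each fold holds out a single repetition.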

In binary classification, acceptable technique was considered the ‘positive’ class and aberrant technique was considered the ‘negative’ class. As such, single sensitivity and specificity values were computed to establish binary classification quality for each IMU combination. In multi-label classification, the sensitivity and specificity were calculated for each of the six class labels as outlined in Table 8.1. Each label was sequentially treated as the ‘positive’ class, and then the mean and standard deviation across the six values was taken. Sensitivity and specificity were computed using the formulas below. Sensitivity measures the effectiveness of a classifier at identifying a desired label, while specificity measures the classifier’s ability to correctly reject repetitions that do not carry that label.

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{TN + FP}$$
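The accuracy, sensitivity and specificity definitions above, including the one-versus-rest averaging used for multi-label classification, can be written as a short Python sketch. The function names are hypothetical, and returning NaN for an empty denominator is an assumption rather than something stated in the thesis.

```python
def confusion_counts(true_labels, predicted_labels, positive):
    """Return (TP, FP, TN, FN), treating `positive` as the positive class
    and every other label as negative (one-versus-rest)."""
    tp = fp = tn = fn = 0
    for t, p in zip(true_labels, predicted_labels):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t != positive and p != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    # NaN when the class never occurs (an assumption made for this sketch).
    return numerator / denominator if denominator else float("nan")


def accuracy(true_labels, predicted_labels):
    """Fraction of repetitions whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


def sensitivity_specificity(true_labels, predicted_labels, positive):
    """Sensitivity TP/(TP+FN) and specificity TN/(TN+FP) for one class."""
    tp, fp, tn, fn = confusion_counts(true_labels, predicted_labels, positive)
    return _ratio(tp, tp + fn), _ratio(tn, tn + fp)


def mean_sensitivity_specificity(true_labels, predicted_labels, labels):
    """Treat each label in turn as the positive class and average the results."""
    pairs = [sensitivity_specificity(true_labels, predicted_labels, lab)
             for lab in labels]
    mean_sens = sum(s for s, _ in pairs) / len(pairs)
    mean_spec = sum(c for _, c in pairs) / len(pairs)
    return mean_sens, mean_spec
```

For binary classification, a single call to `sensitivity_specificity` with the label for acceptable technique as `positive` gives the two values reported per IMU combination.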

In addition to these measures, receiver operating characteristic (ROC) curves were plotted to compare the quality of global and individualised binary classifiers. A single ROC curve was created for each of the individualised and global approaches by pooling the true labels and predicted scores of all participants. The MATLAB ‘perfcurve’ function was used to generate the X and Y points for both ROC curves [https://uk.mathworks.com/help/stats/perfcurve.html].

In reviewing the accuracy, sensitivity and specificity scores produced by each classifier, 90% or higher was considered an ‘excellent’ quality result, 80-89% was considered a ‘good’ quality result, 60-79% was considered a ‘moderate’ result and anything less than 60% was deemed a ‘poor’ result. The authors chose these values after reviewing the aforementioned literature on identifying deviations from acceptable exercise performance using data derived from IMUs. In reviewing such literature, an existing accepted standard for an excellent, good, moderate or poor classifier could not be found. Therefore, the above system was agreed on by the authors to facilitate interpretation of results.
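The pooled ROC analysis can be illustrated with a minimal Python sketch that computes ROC points and the area under the curve by the trapezoidal rule. It is a stand-in for, not a reproduction of, MATLAB's `perfcurve`; the function names are hypothetical, and both classes are assumed to be present in the pooled data.

```python
def roc_points(true_labels, scores, positive=1):
    """Return ROC points as (false positive rate, true positive rate) pairs.

    `scores` are classifier scores for the positive class; repetitions with
    tied scores are added to the curve together.
    """
    pairs = sorted(zip(scores, true_labels), key=lambda pair: -pair[0])
    n_pos = sum(1 for t in true_labels if t == positive)
    n_neg = len(true_labels) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every repetition sharing this score before adding a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == positive:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

To pool across participants, the true labels and scores from every fold would be concatenated into two lists before calling `roc_points`. With the study's encoding, acceptable technique (label 0) would be passed as `positive`, together with the classifier's score for that class.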

8.3. Results

Table 8.2 shows the total number of repetitions collected for each class, as labelled by the chartered physiotherapist. For binary classification, there were 884 acceptable repetitions and 606 aberrant repetitions recorded.

Table 8.2: List and description of barbell squat exercise labels used in this study and the number of repetitions extracted of each class as labelled by the chartered physiotherapist.

Label | Description | Total reps
Acceptable | Acceptable technique | 884
Knee Valgus | Knees coming together during downward phase | 22
Knee Varus | Knees coming apart during downward phase | 183
Knees Too Forward | Knees ahead of toes during downward phase | 50
Heels Elevated | Heels raising off the ground during exercise | 7
Bent Over | Excessive flexion of hip and torso during exercise | 96
Other | Other deviation, not highlighted in NSCA guidelines | 250

NSCA = National Strength and Conditioning Association

Table 8.3 demonstrates the accuracy, sensitivity and specificity of the global classification methods in binary classification.


Table 8.3: Overall accuracy, sensitivity and specificity in binary classification (acceptable or aberrant technique) for each combination of IMUs following LOSOCV using global classifiers.

Sensor(s) | Accuracy (%) | Sensitivity (%) | Specificity (%)
All 5 Sensors | 64 | 70 | 28
Lumbar & Shanks | 65 | 69 | 34
Lumbar & Thighs | 62 | 68 | 21
Both Shanks | 66 | 70 | 38
Both Thighs | 63 | 75 | 26
Left Shank | 62 | 70 | 31
Left Thigh | 63 | 69 | 24
Lumbar | 61 | 68 | 21
Right Thigh | 63 | 70 | 27
Right Shank | 62 | 69 | 45

Table 8.4 shows the total accuracy, mean sensitivity and mean specificity of the global classification methods in multi-class classification (detection of exact deviation).

Table 8.4: Overall accuracy, average sensitivity and average specificity in multi-label classification (exact deviation) for each combination of IMUs following LOSOCV using global classifiers.

Sensor(s) | Accuracy (%) | Sensitivity (%) | Specificity (%)
All 5 Sensors | 59 | 24 | 84
Lumbar & Shanks | 57 | 25 | 85
Lumbar & Thighs | 57 | 22 | 84
Both Shanks | 53 | 20 | 85
Both Thighs | 52 | 15 | 82
Left Shank | 48 | 19 | 85
Left Thigh | 48 | 15 | 82
Lumbar | 52 | 19 | 83
Right Thigh | 51 | 14 | 82
Right Shank | 55 | 21 | 86

Table 8.5 demonstrates the mean accuracy, sensitivity and specificity scores for each individual participant’s personalised barbell squat technique binary classifier that was evaluated with LOOCV.


Table 8.5: Average accuracy, sensitivity and specificity in binary classification (acceptable or aberrant technique) for each combination of IMUs following LOOCV using personalised, N of 1 classifiers.

Sensor(s) | Accuracy (%) ± SD | Sensitivity (%) ± SD | Specificity (%) ± SD
All 5 Sensors | 82 ± 13 | 83 ± 14 | 84 ± 14
Lumbar & Shanks | 80 ± 14 | 81 ± 16 | 82 ± 14
Lumbar & Thighs | 82 ± 12 | 82 ± 13 | 87 ± 11
Both Shanks | 79 ± 16 | 80 ± 19 | 81 ± 15
Both Thighs | 83 ± 11 | 84 ± 12 | 88 ± 12
Left Shank | 79 ± 6 | 81 ± 17 | 80 ± 20
Left Thigh | 81 ± 13 | 81 ± 13 | 84 ± 16
Lumbar | 80 ± 14 | 81 ± 15 | 83 ± 16
Right Thigh | 80 ± 16 | 84 ± 12 | 82 ± 17
Right Shank | 80 ± 15 | 78 ± 17 | 82 ± 15

Figure 8.1 shows an ROC curve for all participants when both global and individualised classification methodologies were used for a binary classification system based on data from the left thigh IMU. The area under the curve (AUC) for the global method was 0.52 and the AUC for the personalised method was 0.98.

Figure 8.1: ROC curves comparing binary classification systems when using global and personalised classification methodologies using data from the left thigh IMU. 'Acceptable' technique was considered the ‘true’ class.


Table 8.6 demonstrates the mean accuracy, sensitivity and specificity scores for each individual participant’s personalised barbell squat technique multi-class classifier that was evaluated with LOOCV.

Table 8.6: Overall accuracy, average sensitivity and average specificity in multi-label classification (exact deviation) for each combination of IMUs following LOOCV using personalised, N of 1 classifiers.

Sensor(s) | Accuracy (%) ± SD | Sensitivity (%) ± SD | Specificity (%) ± SD
All 5 Sensors | 70 ± 20 | 73 ± 17 | 88 ± 12
Lumbar & Shanks | 69 ± 20 | 71 ± 18 | 90 ± 8
Lumbar & Thighs | 70 ± 17 | 70 ± 15 | 87 ± 9
Both Shanks | 70 ± 18 | 71 ± 17 | 89 ± 7
Both Thighs | 70 ± 16 | 72 ± 13 | 88 ± 11
Left Shank | 67 ± 20 | 71 ± 17 | 86 ± 12
Left Thigh | 69 ± 18 | 70 ± 18 | 89 ± 9
Lumbar | 67 ± 20 | 70 ± 19 | 89 ± 10
Right Thigh | 70 ± 16 | 72 ± 13 | 86 ± 12
Right Shank | 67 ± 20 | 71 ± 15 | 88 ± 8

8.4. Discussion

The aims of this study were to: (a) determine if an IMU system is capable of distinguishing between acceptable and aberrant barbell squat technique; (b) determine the capabilities of an IMU system at identifying specific natural deviations from acceptable barbell squat technique; and (c) compare a personalised (N=1) classifier to a global classifier in identifying the above. The results of this paper indicate that an IMU system is not capable of detecting aberrant barbell squat technique using global classifiers, as demonstrated by the low specificity scores (Table 8.3). However, good levels of accuracy, sensitivity and specificity are achieved using a personalised classifier (Table 8.5). Similarly, the ability of an IMU system to identify specific deviations in technique is poor using a global classifier (Table 8.4); however, these results are improved to moderate levels using a personalised classifier (Table 8.6). To the best of the authors’ knowledge this is the first paper to demonstrate the ability of an IMU system to identify natural deviations during performance of the barbell squat exercise.

To date there has been a lack of research investigating the ability of IMU systems to classify technique in lower limb compound exercises. Whilst global classification techniques replicating those used in this paper have been shown to successfully classify naturally occurring deviations in the single leg squat (21, 36), they were shown to be ineffective in classifying barbell squat technique. Additionally, we have demonstrated that a personalised classifier outperforms a global classifier in assessing barbell squat technique (Figure 8.1, Tables 8.3-8.6). This is likely due to a number of factors. As outlined in Table 8.2, the number of acceptable repetitions far outnumbers any other label. This unbalanced data set makes it difficult to create global classifiers that can be used for all individuals (24, 25). As many deviations were seen sporadically, the use of a global classifier to identify specific deviations in the barbell squat may require the collection of a data set consisting of larger amounts of each deviation. The inter-subject variability in movement patterns that are considered acceptable in barbell squat technique may also exceed the intra-subject variability between acceptable technique and aberrant technique. This would make the creation of global classifiers exceptionally difficult. It is likely that this is not the case for the single leg squat and hence global classification methodologies worked better for classifying deviations in this exercise.

It is difficult to directly compare results with previous work in the area due to differences in exercises investigated, sensor positions and classifier techniques employed. However, the results presented in this paper using a personalised classifier compare favourably to other research in the area (16-19). The majority of research to date has investigated the ability of IMU systems to monitor technique in simple exercises such as straight leg raises (16), dumbbell curls (18), or heel slides (19). This paper describes an evaluation of an IMU system’s ability to quantify barbell squat technique, a more complex exercise that involves multiple joints. This system has also demonstrated the ability to identify a total of seven different classes (Table 8.2). The lower number of classes in some of the studies (16, 18, 19) may make it easier for classifiers to identify specific deviations and subsequently produce higher accuracy, sensitivity and specificity scores.
However, it must be noted that all of these systems used a global classifier to distinguish between exercise techniques, and many of the studies classified deviations that were deliberately induced. As shown in Table 8.4, the ability of a global classifier to identify specific deviations in barbell squat technique is poor. Therefore, a personalised classifier may be more suitable when assessing this exercise in a clinical setting where technique deviations are natural. The results presented in Table 8.5 and Table 8.6 show that a single IMU system is comparable to a multiple IMU system in determining barbell squat technique using a personalised classifier. Multiple IMU systems are more expensive than a single IMU system due to the need to purchase additional sensors. Furthermore, they are less practical for end users as there is an increased risk of placement error in addition to power usage and Bluetooth connectivity issues. For these reasons a reduced IMU set-up is more desirable for daily environment applications (37). Therefore, the single IMU system results presented in this paper increase the likelihood of clinical adoption.


A personalised classifier offers a number of benefits compared to a global classifier when assessing barbell squat technique. Most obviously, the higher levels of accuracy would mean an improved user experience in a clinical setting. A personalised classifier also allows for analysis to be performed on data sets that are unbalanced, like the one shown in Table 8.2. Furthermore, personalised classifiers are more computationally efficient than global classifiers as they are developed using less training data and therefore require less memory. This would improve processing time and increase battery life.

The main disadvantage associated with a personalised classifier is that the user must collect and label data sets from individual patients. This means clinicians must monitor exercise technique in real time, or use post-hoc video analysis, and label the data appropriately. This may prove time consuming. Furthermore, this does not lend itself to a “set up and go” approach that involves minimal interaction with the user interface, which is preferable for end users (8). However, as clinicians often monitor exercise technique prior to allowing patients to complete their exercises, it may fit into clinical practice without issue, with clinicians labelling repetitions as they analyse exercise completion. Furthermore, the labelled data set developed using this method could be used to build global classifiers better equipped at identifying natural deviations in the future. This is because all labelled data that is collected by practitioners could be stored and used to build the large data set necessary to improve global classifier scores.

A challenging aspect of this work is to ascertain whether the results presented in this paper are sufficient for real-life applications. It is likely that the classification accuracy achieved using a global classifier is too low for use in healthcare environments, while that produced by a personalised classifier may be acceptable.
However, it is important to note that what is considered an acceptable level of classification accuracy is likely to be influenced by application domain (injury rehabilitation, strength and conditioning, musculoskeletal injury risk screening, etc.) and end user profile (rehabilitation professionals, sports coaches, strength and conditioning staff, recreational gym users). Our research team is undertaking further projects to determine usability, functionality and user perceptions of wearable technology to assess exercise biomechanics. This information is being gathered from a range of professionals and patients, who incorporate exercises such as the barbell squat in their rehabilitation programme, exercise routine and injury risk screening protocols. It is envisaged that this will provide greater indication as to the levels of accuracy end users would define as acceptable. Furthermore, this work will contribute new information regarding how best to provide actionable feedback to these users that allows for safe and effective exercise completion.


8.5. Conclusion

Our results show that a system based on data derived from body worn IMUs can classify acceptable and aberrant barbell squat biomechanics with good overall accuracy, sensitivity and specificity using a personalised classifier. These classification scores are maintained even with a single IMU. The ability to identify specific deviations is more difficult but can be achieved with a moderate level of overall accuracy using a personalised classifier. Our results are comparable with other research in the area, despite the barbell squat being a more complex exercise than many of those previously investigated. However, most of this research has been carried out using global classifiers. While this may allow for less user interaction, it produces poor levels of accuracy when attempting to identify specific natural deviations during performance of the exercise. As a result, the use of a personalised classifier may be more appropriate for identifying natural deviations in barbell squat technique.

8.6. References

1.

Cook G, Burton L, Hoogenboom B. Pre-participation screening: the use of fundamental movements as an assessment of function - part 1. North American journal of sports physical therapy. 2006;1(2):62-72.

2.

Hall M, Nielsen JH, Holsgaard-Larsen A, Nielsen DB, Creaby MW, Thorlund JB. Forward lunge knee biomechanics before and after partial meniscectomy. The Knee. 2015;22(6):506-9.

3.

Ahmadi A, Mitchell E, Destelle F, Gowing M, O’Connor NE, Richter C, et al. Automatic activity classification and movement assessment during a sports training session using wearable inertial sensors. In: Proceedings of the 11th International Conference on Wearable and Implantable Body Sensor Networks, (BSN); 2014. Jun 16; Zurich, Switzerland. N.Y., U.S.A.: IEEE; 2014. p. 98–103.

4.

Bonnechere B, Jansen B, Salvia P, Bouzahouene H, Omelina L, Moiseev F, Sholukha V, Cornelis J, Rooze M, Jan SV. Validity and reliability of the Kinect within functional assessment activities: comparison with standard stereophotogrammetry. Gait & posture. 2014 Jan 31;39(1):593-8.

5.

Bonnet V, Mazza C, Fraisse P, Cappozzo A. Real-time estimate of body kinematics during a planar squat task using a single inertial measurement unit. IEEE Transactions on Biomedical Engineering. 2013;60(7):1920-6.

6.

Whiteside D, Deneweth JM, Pohorence MA, Sandoval B, Russell JR, McLean SG, et al. Grading the functional movement screen: A comparison of manual (real-time) and objective methods. The Journal of Strength & Conditioning Research. 2016;30(4):924-33.

7.

McGrath D, Greene BR, O’Donovan KJ, Caulfield B. Gyroscope-based assessment of temporal gait parameters during treadmill walking and running. Sports Engineering. 2012;15(4):207-13.

8.

Morris D, Saponas TS, Guillory A, Kelner I. RecoFit: using a wearable sensor to find, recognize, and count repetitive exercises. In: Proceedings of the 32nd annual ACM conference on Human factors in computing systems; 2014 Apr 26; Toronto, Canada. N.Y., U.S.A.: ACM; 2014. p. 3225-3234.

9.

Leardini A, Lullini G, Giannini S, Berti L, Ortolani M, Caravaggi P. Validation of the angular measurements of a new inertial-measurement-unit based rehabilitation system: comparison with state-of-the-art gait analysis. Journal of neuroengineering and rehabilitation. 2014 Sep 11;11(1):136.

10.

Tang Z, Sekine M, Tamura T, Tanaka N, Yoshida M, Chen W. Measurement and Estimation of 3D Orientation using Magnetic and Inertial Sensors. Advanced Biomedical Engineering. 2015;4:135-43.

11.

Muehlbauer M, Bahle G, Lukowicz P. What can an arm holster worn smart phone do for activity recognition?. In: Proceedings of the 15th Annual International Symposium on Wearable Computers (ISWC); 2011. Jun 12; San Francisco, C.A., U.S.A. N.Y., U.S.A.: IEEE; 2011. p. 79-82.

12.

Chang KH, Chen MY, Canny J. Tracking free-weight exercises. In: Proceedings of the 9th International Conference on Ubiquitous Computing (UbiComp); 2007. Sep 16; Innsbruck, Austria. Berlin, Germany: Springer; 2007. p. 19-37.

13.

Seeger C, Buchmann A, Van Laerhoven K. myHealthAssistant: a phone-based body sensor network that captures the wearer's exercises throughout the day. In: Proceedings of the 6th International Conference on Body Area Networks; 2011. Nov 7; Beijing, China. Brussels, Belgium: ICST; 2011. p. 1-7.

14.

Pernek I, Hummel KA, Kokol P. Exercise repetition detection for resistance training based on smartphones. Personal and ubiquitous computing. 2013;17(4):771-82.

15.

Pernek I, Kurillo G, Stiglic G, Bajcsy R. Recognizing the intensity of strength training exercises with wearable sensors. Journal of Biomedical Informatics. 2015;58:145-55.

16.

Taylor PE, Almeida GJ, Hodgins JK, Kanade T. Multi-label classification for the analysis of human motion quality. In: Proceedings of the 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); 2012. Aug 28; San Diego, C.A., U.S.A. N.Y., U.S.A.: IEEE; 2012. p. 2214-2218.

17.

Melzi S, Borsani L, Cesana M. The virtual trainer: supervising movements through a wearable wireless sensor network. In: Proceedings of the 2009 6th IEEE Annual Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks Workshops, (SECON Workshops); 2009. 22 Jun; Rome, Italy. N.Y., U.S.A.: IEEE; 2009. p. 1-3.

18.

Velloso E, Bulling A, Gellersen H, Ugulino W, Fuks H. Qualitative activity recognition of weight lifting exercises. In: Proceedings of the 4th Augmented Human International Conference; 2013. Mar 7; Stuttgart, Germany. N.Y., U.S.A.: ACM; 2014. p. 116-123.

19.

Giggins OM, Sweeney KT, Caulfield B. Rehabilitation exercise assessment using inertial sensors: a cross-sectional analytical study. Journal of Neuroengineering and Rehabilitation. 2014;11(1):158-68.

20.

O’Reilly M, Whelan D, Chanialidis C, Friel N, Delahunt E, Ward T, et al. Evaluating squat performance with a single inertial measurement unit. In: Proceedings of the 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN); 2015. Jun 9; Boston, M.A., U.S.A. N.Y., U.S.A.: IEEE; 2015. p. 1-6.

21.

Whelan D, O’Reilly M, Ward T, Delahunt E, Caulfield B. Evaluating Performance of the Single Leg Squat Exercise with a Single Inertial Measurement Unit. In: Proceedings of the 3rd Workshop on ICTs for improving Patients Rehabilitation Research Techniques, (REHAB); 2015. Oct 1; Lisbon, Portugal. N.Y., U.S.A.: ACM; 2015. p. 144–7.

22.

O'Reilly MA, Whelan DF, Ward TE, Delahunt E, Caulfield B. Classification of lunge biomechanics with multiple and individual inertial measurement units. Sports biomechanics. 2017 May 19. [Epub Ahead of print].

23.

Whelan D, O'Reilly M, Ward T, Delahunt E, Caulfield B. Evaluating Performance of the Lunge Exercise with Multiple and Individual Inertial Measurement Units. In: Proceedings of the 10th EAI International Conference on Pervasive Computing Technologies for Healthcare, Pervasive Health; 2016. May 16; Cancun, Mexico. N.Y., U.S.A.: ACM; 2016. p. 101-8.

24.

Chawla NV. Data mining for imbalanced datasets: An overview. Data mining and knowledge discovery handbook. N.Y.C., N.Y., U.S.A.: Springer; 2005. p. 853-67.

25.

He H, Garcia EA. Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering. 2009;21(9):1263-84.

26.

Kotsiantis SB, Zaharakis I, Pintelas P. Supervised machine learning: A review of classification techniques. Emerging Artificial Intelligence Applications in Computer Engineering: Real Word AI Systems with Applications in EHealth, HCI, Information Retrieval and Pervasive Technologies: IOS Press; 2007. p. 3-25.

27.

Baechle TR, Earle RW. Resistance Training Exercise Techniques. NSCA's Essentials of Personal Training. Champaign, I.L., U.S.A.: Human Kinetics; 2004.

28.

Jerri AJ. The Shannon sampling theorem—Its various extensions and applications: A tutorial review. Proceedings of the IEEE. 1977;65(11):1565-96.

29.

Shimmer 9DOF calibration [Available from: http://www.shimmersensing.com/shop/shimmer-9dof-calibration (last accessed: 30 June 2017)].

30.

Madgwick SO, Harrison AJ, Vaidyanathan R. Estimation of IMU and MARG orientation using a gradient descent algorithm. In: Proceedings of the 12th International Conference on Rehabilitation Robotics (ICORR); 2011. Jun 29; Zurich, Switzerland. N.Y., U.S.A.: IEEE; 2011. p. 1-7.

31.

Katz MJ, George EB. Fractals and the analysis of growth paths. Bulletin of Mathematical Biology. 1985;47(2):273-86.

32.

Single-level discrete 1-D wavelet transform [Available from: http://uk.mathworks.com/help/wavelet/ref/dwt.html (last accessed: 30 June 2017)].

33.

Breiman L. Random forests. Machine Learning. 2001;45(1):5-32.

34.

Mitchell E, Ahmadi A, O'Connor NE, Richter C, Farrell E, Kavanagh J, et al. Automatically detecting asymmetric running using time and frequency domain features. In: Proceedings of the 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN); 2015. Jun 9; Boston, M.A., U.S.A. N.Y., U.S.A.: IEEE; 2015. p. 1-6.

35.

Fushiki T. Estimation of prediction error by using K-fold cross-validation. Statistics and Computing. 2011;21(2):137-46.

36.

Whelan D, O'Reilly M, Ward T, Delahunt E, Caulfield B. Technology in Rehabilitation: Evaluating the Single Leg Squat Exercise with Wearable Inertial Measurement Units. Methods of Information in Medicine. 2016;56(2):88-94.

37.

Bonnet V, Mazza C, Fraisse P, Cappozzo A. A least-squares identification algorithm for estimating squat exercise mechanics using a single inertial measurement unit. Journal of biomechanics. 2012;45(8):1472-7.

Section C: Development and Evaluation of ‘Formulift’: a Personalised Exercise Classification System for Lower Limb Exercises

Section C: Foreword

Based on findings from Section B of this thesis, the ‘Formulift’ system was developed. Formulift consists of a single IMU, worn on the user’s left thigh, and a smartphone. The IMU collects motion data from the user and streams it wirelessly to their smartphone. The smartphone then processes the IMU data and provides feedback and guidance to the user on their exercise technique. Personalised classifiers are employed for each user to optimise system efficiency and accuracy. These classifiers utilise the data analysis pathways created in Chapters 7 and 8. Chapter 9 aims to investigate users’ experiences when completing workouts with ‘Formulift’. Specifically, qualitative and quantitative methods will be employed to establish the usability, functionality, subjective quality and perceived impact of ‘Formulift’ with three different types of system users: beginner gym-goers, experienced gym-goers and qualified S&C coaches. An assessment of the real world accuracy, sensitivity and specificity of Formulift will follow in Chapter 10 of this thesis.

Chapter 9 A Mixed-methods Evaluation of ‘Formulift’: a Wearable Sensor Based Exercise Biofeedback System

This chapter is based on the following paper which, at the time of thesis submission, has been accepted with ‘minor revisions’ for publication in JMIR mHealth and uHealth. The suggested revisions have been implemented in this chapter. O'Reilly MA, Slevin P, Ward TE, Caulfield B. A Mixed-methods Evaluation of ‘Formulift’: a Wearable Sensor Based Exercise Biofeedback System. JMIR mHealth & uHealth. 2018. Awaiting proofs.

9.0. Abstract

Background: ‘Formulift’ is a newly developed mHealth app which connects to a single IMU worn on the left thigh. The IMU captures users’ movements as they exercise and the app analyses the data to count repetitions in real-time and classify users’ exercise technique. The app also offers feedback and guidance to users on exercising safely and effectively.

Objectives: The aim of the study was to assess the ‘Formulift’ system, with 3 different and realistic types of potential users (beginner gym-goers, experienced gym-goers and qualified Strength & Conditioning coaches) under a number of categories: (a) Usability. (b) Functionality. (c) The perceived impact of the system. (d) The subjective quality of the system. A further aim was to gather suggestions for future improvements to the system.

Methods: Fifteen healthy volunteers participated (12 males, 3 females, age: 23.8 +/- 1.8 years, height: 1.79 +/- 0.07 m, body mass: 78.4 +/- 9.6 kg). Five were beginner gym-goers, 5 were experienced gym-goers and 5 were qualified and practising S&C coaches. IMU data were first collected from each participant in order to create individualised exercise classifiers for them. They then completed a number of non-exercise related tasks with the app. Following this, a workout was completed using the system, involving squats, deadlifts, lunges and single leg squats. Participants were then interviewed about their user-experience and completed the System Usability Scale (SUS) and the user-Mobile Application Rating Scale (uMARS). Thematic analysis was completed on all interview transcripts and survey results were analysed.

Results: Qualitative and quantitative analysis found the system has ‘good’ to ‘excellent’ usability. The system achieved a mean ± S.D. SUS usability score of 79.2 ± 8.8. Functionality was also deemed to be good, with many users reporting positively on the system’s rep counting, technique classification and feedback. A number of bugs were found and users suggested other changes to the system. The overall subjective quality of the app was good, with a median star rating of 4/5 (IQR: 3-5). Participants also reported that the system would aid their technique, provide motivation, reassure them and help them avoid injury.

Conclusions: This study demonstrated an overall positive evaluation of ‘Formulift’ in the categories of usability, functionality, perceived impact and subjective quality. Users also suggested a number of changes for future iterations of the system. These findings are the first of their kind and show great promise for wearable sensor based exercise biofeedback systems.

9.1. Introduction

Resistance training is an exercise modality used in rehabilitation and strength and conditioning (S&C) settings. Adhering to a resistance training exercise programme can increase a person’s muscular strength, hypertrophy and power (1). This can improve their sporting performance, mood and quality of life (2), (3). However, many people completing exercise programmes encounter various difficulties when performing their exercises without the supervision of a trained exercise professional such as an S&C coach. One such difficulty is that, in such circumstances, people may execute their exercises incorrectly (4), (5). Incorrect alignment during exercise, incorrect speed of movement and poor quality of movement may have an impact on the efficacy of exercise and may therefore result in a poor outcome (4), (5). Exercising with aberrant biomechanics may also heighten one’s risk of injury (6), which motivates technological solutions that provide accurate assessment of exercise form.

IMUs have been shown recently to be an accurate method for such exercise assessment. Wearable IMUs are able to acquire data pertaining to the linear and angular motion of limb segments and can also be used to measure a body’s 3-D orientation (7), (8). They are small, inexpensive, easy to set up and facilitate the acquisition of human movement data in unconstrained environments (9). Recent research has shown that a diverse range of exercises can be accurately evaluated with multiple and individual IMU set-ups (10)–(13). These range from early stage rehabilitation exercises like heel slides and straight leg raises (12) to more complex S&C exercises such as bodyweight squats (14), lunges (15) and single leg squats (13), (16), (17). More cost effective and practical systems using a single body-worn IMU have also been shown to be effective in the analysis of exercise technique (13), (15), (18), (19). Such single IMU based systems are considered preferable where they can provide equivalent exercise analysis quality to multiple IMU set-ups. Recently it has been shown that, for the detection of acute, naturally occurring deviations in compound lower limb S&C exercises such as the barbell squat and the deadlift, personalised classification systems are superior in accuracy to global ones (20). It has also been shown that such personalised systems enable a single IMU to accurately classify repetitions of such exercises as ‘acceptable’ or ‘aberrant’ (20).

While all the aforementioned work demonstrates the technological proficiency of IMU based exercise biofeedback systems in classifying exercise technique, little is currently known about the user-experience and users’ perceptions of such biofeedback systems. There is currently a surge of usability and system evaluation studies being published in the mHealth and uHealth field (21)–(24); however, there is a scarcity of such studies pertaining to IMU based exercise biofeedback systems. Some past work has assessed the usability of IMU based exercise biofeedback systems (25), (26) but, to the author’s knowledge, there have not yet been any evaluation studies of biofeedback systems which classify exercise quality based on data from a wearable sensor and relay feedback to users via a mobile app.

The study aimed to evaluate a recently developed IMU based exercise biofeedback system called ‘Formulift’. ‘Formulift’ consists of a mobile application and a single Shimmer IMU (Shimmer, Dublin, Ireland). The IMU is worn on a user’s left thigh and tracks their motion as they complete the following four exercises: squats, single leg squats, lunges and deadlifts. The mobile app processes the signals from the IMU, counts repetitions and utilises personalised classification methods to determine if each repetition completed of an exercise is ‘acceptable’ or ‘aberrant’. In this specific study these categories are derived from human expert labelled training data. The user receives feedback on their exercise technique following the completion of each set of an exercise. The app also contains instructional information on how to do the exercises with acceptable technique and the option to review a workout session. A number of screenshots from the app are shown in Figure 9.1. A video demonstrating the app can be found at the following links: [https://www.youtube.com/watch?v=yMqPUVgclPA&feature=youtu.be] and [http://tinyurl.com/yc7eosjn].

Figure 9.1: The 'Formulift' app. User preferences and connecting to the IMU (top left), real-time exercise biofeedback (top right), information and instructions (bottom left) and workout review (bottom right).

9.2. Objectives

The aim was to assess the system under a number of categories:

(a) Usability - “The extent to which the system can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use.”

(b) Functionality - “The ability of an interface or device to perform according to a specifically defined set of parameters.” (27) Here, the key functions of ‘Formulift’ are to accurately detect and count repetitions of the exercises under study, determine if each repetition was completed with ‘acceptable’ or ‘aberrant’ technique and provide the user with interpretable feedback on their completed exercise.

(c) The perceived impact of the system.

(d) The subjective quality of the system.

A further aim was to uncover suggested future improvements to the system. Three different realistic types of system users were employed to complete this evaluation: beginner gym-goers (< 6 months’ experience), experienced gym-goers (> 2 years’ experience) and qualified S&C coaches. Employing these 3 types of end users was hypothesised to enable a more comprehensive user-centred design approach to creating future iterations of the ‘Formulift’ system.

9.3. Methods

9.3.0. Participants

Fifteen healthy volunteers participated (12 males, 3 females, age: 23.8 +/- 1.8 years, height: 1.79 +/- 0.07 m, body mass: 78.4 +/- 9.6 kg). Group 1 were 5 beginner gym-goers with fewer than 6 months’ experience with resistance training and the exercises used in this study. Group 2 were 5 experienced gym-goers with a minimum of two years’ experience with resistance training and the exercises used in this study. The final group of system evaluators were practising S&C coaches with qualifications from the National Strength and Conditioning Association (NSCA) or the United Kingdom Strength and Conditioning Association (UKSCA). Sample sizes were chosen based on standard practice for completing qualitative usability studies (28) and in keeping with recent publications which also utilised the quantitative surveys used in this work (29), (30). No participant had a current or recent musculoskeletal injury that would impair their exercise performance. Each participant signed a consent form prior to completing the study. The University College Dublin Human Research Ethics Committee approved the study protocol.

9.3.1. Data Collection

9.3.1.0. System Use

The testing protocol was explained to participants upon their arrival to the research laboratory. All participants completed a 5 minute warm-up on an exercise bike, during which they were required to maintain a power output of 100 W and a cadence of 75-85 revolutions per minute. Following the warm-up, an investigator positioned a single IMU (SHIMMER, Shimmer research, Dublin, Ireland) on the participant at the mid-point of the left femur (determined as half way between the greater trochanter and lateral femoral condyle). Video and IMU data were then simultaneously collected as the users completed the following four exercises: bodyweight left leg single leg squats, bodyweight lunges, bodyweight or barbell squats and barbell deadlifts. Forty repetitions of each exercise were collected: 20 repetitions were completed with ‘acceptable’ form and 20 repetitions were completed with ‘aberrant’ form. The ‘aberrant’ repetitions from the 5 beginners were naturally occurring, whereas the 10 experienced participants deliberately induced their ‘aberrant’ form. Following this data collection the IMU was removed from the participant’s left thigh. An exercise professional then used the segmented videos to label each repetition of the 4 exercises as being of ‘acceptable’ or ‘aberrant’ technique. Four binary random forest classifiers were then created for each participant, each pertaining to one of the 4 aforementioned exercises. These random forest objects were imported into the biofeedback application to make a personalised exercise classification system for each participant (Figure 9.1).

While their personalised system was created, the participants completed a set of non-exercise based tasks within the app. Appendix A.1 is the sheet given to participants listing the tasks which were to be completed ‘before the exercise analysis session’. They involved app navigation, interpretation of information within the app and following instructions on system use and how to do the exercises. Following the creation of their personalised biofeedback system the participant first secured the IMU to their left thigh and then completed the list of tasks outlined in the ‘during exercise analysis section’ of Appendix A.1. They first connected the wireless Shimmer IMU to the mobile application. They then completed 2 sets of 10 repetitions for each of the 4 exercises. In the first set of each exercise they were instructed to exercise with their best possible technique and in the second they were asked to try to replicate the mistake they had made prior to the exercise professional’s coaching. Throughout the session they were able to navigate to any point within the app, including the ‘review tab’, and to view any instructional content. The whole exercise session was observed by the investigator, who took note of any system crashes and the associated conditions as the participant used the system and completed their exercises. The session was also simultaneously videoed for review following data collection. Upon completing the required exercises, participants were provided with the opportunity to test any other tasks within the app they desired. Participants then moved on to evaluating the system, where they were administered surveys and took part in an interview.

9.3.1.1. Interviews

Immediately after completion of their exercise session with the system, a semi-structured interview was conducted with each participant. A Dell Inspiron 5100 laptop (DELL, Texas, U.S.A.) was used to video record the interview. The webcam also captured the screen of the Android smartphone, allowing users to demonstrate any specific aspects of the app they wished to discuss. Each interview followed a topic guide to ensure consistent questioning across every interview (31). This guide can be seen in Appendix A.2. Open-ended questions were used to garner participants’ views and experiences of the system in relation to usability, functionality and perceived impact. Furthermore, participants’ reflections regarding their general evaluation of the system and suggested future changes were also captured.

9.3.1.2. Surveys

In addition to the interviews the system was also assessed quantitatively utilising 2 surveys. By mixing both quantitative and qualitative research and data, gains in breadth and depth of understanding and corroboration can be achieved, while offsetting the weaknesses inherent to using each approach separately. The system usability scale (SUS) (Appendix A.3) was first used in studies described in (28) and (31). This is a short, 10-item questionnaire which has been widely adopted in many domains as a fast and reliable measure of a system’s usability. The scale produces a usability score out of 100 (not a percentage) for every user who completes it. These scores can then be compared to the large body of published data on systems assessed with the SUS to find adjective and percentile rankings of a system’s usability (29), (33).
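The way the SUS turns ten item responses into a score out of 100 can be illustrated with a short sketch. The Python below is illustrative only and is not the analysis code used in this study; the function name `sus_score` and the example responses are hypothetical. It assumes the standard SUS convention of ten items answered on a 1 to 5 scale, where odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is multiplied by 2.5.

```python
def sus_score(responses):
    """Return the SUS score (0-100) for one participant's ten 1-5 responses."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1   # odd items are positively worded
        else:
            total += 5 - response   # even items are negatively worded
    return total * 2.5


# Hypothetical responses for illustration only (not study data).
example = [5, 2, 4, 1, 4, 2, 5, 1, 4, 2]
print(sus_score(example))
```

Because each item contributes between 0 and 4 points, the scaled total always lies between 0 and 100 in steps of 2.5, which is why the result is a score rather than a percentage.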
The ‘user-Mobile Application Rating Scale’ (uMARS) was also completed by all participants (22) (Appendix A.4). This is an adapted version of the popular ‘Mobile Application Rating Scale’ (MARS), and is more appropriate for end users of mobile apps (30). It assesses the app under the areas of engagement, functionality, aesthetics and information to produce an overall app quality score out of 5. The app subjective quality and perceived impact are also assessed separately. The perceived impact section of the survey was tailored to this study to investigate the app’s perceived impact on a person ‘exercising with their best technique’.

9.3.2. Data Analysis

9.3.2.0. Qualitative

Interview recordings for all participants were transcribed verbatim and anonymized. A grounded-theory approach was then taken to the thematic analysis of the interview transcripts (34), (35). The interview topic guide was used to create an initial coding frame which was then refined as more data were analysed. Analysis involved scrutinizing the data to identify patterns, assigning codes to the data and building themes and sub-themes from the codes (36). To maximise rigor, themes and sub-themes were crosschecked for consistency across the 3 participant types (beginner gym-goers, experienced gym-goers and S&C coaches), which involved searching for discrepancies and validating the recurrences of themes and sub-themes. The thematic analysis was crosschecked and overseen by an expert in qualitative analysis with over a decade’s experience in the field. Data saturation was deemed to have been reached when no new data, themes or relationships were emerging from the interview data (37).

9.3.2.1. Quantitative

The SUS score was computed for each participant following standard scoring methodology (32). The mean and standard deviation of the SUS scores were calculated for all participants and for each subgroup (beginner gym-goers, experienced gym-goers and S&C coaches). The uMARS was also scored following standard procedures (22). For each participant a score out of 5 was calculated for engagement, functionality, aesthetics and information. The mean of these 4 scores produced an ‘overall app quality score’ for each app user. App subjective quality was quantitatively assessed by taking each user’s star rating of the app. A perceived impact score, out of 5, was also found for each participant. The means and standard deviations of all the above scores were found for all participants and the 3 aforementioned subgroups.
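The uMARS aggregation described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the names `overall_quality` and `summarise`, the group labels and all of the scores are hypothetical. It also assumes the sample standard deviation (n - 1 denominator), which the text does not specify.

```python
from statistics import mean, stdev

SUBSCALES = ("engagement", "functionality", "aesthetics", "information")


def overall_quality(subscale_scores):
    """Mean of the four uMARS subscale scores (each out of 5) for one participant."""
    return mean(subscale_scores[name] for name in SUBSCALES)


def summarise(values):
    """Return (mean, sample standard deviation) of a list of scores."""
    return mean(values), stdev(values)


# Hypothetical subscale scores for illustration only (not study data).
participants = [
    {"group": "beginner", "engagement": 4.0, "functionality": 4.5, "aesthetics": 3.5, "information": 4.0},
    {"group": "beginner", "engagement": 4.0, "functionality": 5.0, "aesthetics": 4.0, "information": 5.0},
    {"group": "coach", "engagement": 3.0, "functionality": 4.0, "aesthetics": 3.0, "information": 4.0},
    {"group": "coach", "engagement": 4.0, "functionality": 4.0, "aesthetics": 4.0, "information": 4.0},
]

# Collect each participant's overall quality score by subgroup and overall.
by_group = {"all": []}
for p in participants:
    score = overall_quality(p)
    by_group.setdefault(p["group"], []).append(score)
    by_group["all"].append(score)

for group, scores in by_group.items():
    m, sd = summarise(scores)
    print(f"{group}: {m:.2f} ± {sd:.2f}")
```

The same `summarise` helper could be applied to SUS scores, star ratings or perceived impact scores, since each is reported as a mean ± S.D. for all participants and for each subgroup.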

9.4. Results

The ‘Formulift’ system was assessed across four distinct domains: usability, functionality, perceived impact, and overall quality. In the upcoming sub-sections, results are presented from both the quantitative surveys and qualitative interviews. Finally, suggested future changes are described.

9.4.0. User Mobile Application Rating Scale

The uMARS provided quantitative results on a number of key aspects of the app. These results are summarised in Table 9.1, which is referred to throughout the results section.

Table 9.1: Results from uMARS survey for beginner gym-goers, experienced gym-goers and S&C coaches. Overall quality is computed as described in (22). All values are mean ± S.D.

                    Beginners (n=5)   Experienced (n=5)   S&Cs (n=5)     All (n=15)
Engagement          3.78 ± 0.48       3.5 ± 0.42          3.67 ± 0.33    3.66 ± 0.42
Functionality       4.27 ± 0.22       4.24 ± 0.45         4.08 ± 0.39    4.2 ± 0.37
Aesthetics          3.78 ± 0.50       3.87 ± 0.81         4.2 ± 0.34     3.9 ± 0.62
Information         4.29 ± 0.58       4.2 ± 0.29          3.9 ± 0.56     4.14 ± 0.53
Overall Quality     4.03 ± 0.25       3.95 ± 0.29         3.96 ± 3.96    3.98 ± 0.21
Perceived Impact    4.57 ± 0.39       4.12 ± 0.45         3.28 ± 0.52    4.03 ± 0.70
Star Rating         3.83 ± 0.68       4.0 ± 0.63          3.6 ± 0.49     3.8 ± 0.63

9.4.1. Usability and Functionality

The system achieved a mean SUS usability score of 79.2 ± 8.8. Beginners deemed the system most usable with a score of 86.25 ± 1.9, whereas the experienced gym-goers and S&C coaches scored the system at 75.5 ± 9.1 and 74.5 ± 8.0 respectively. These usability scores place the system between the 85th and 95th percentiles based on all published research using the SUS (29) and correspond to ‘good’ to ‘excellent’ usability on an adjective rating scale (33). The functionality section of the uMARS also demonstrated an overall positive usability and functionality experience for users, with a mean score of 4.2 ± 0.37 (Table 9.1). Whilst these surveys demonstrated that ‘Formulift’ was deemed to have good usability, they provided limited insight into the reasons for this and into what can be improved. This insight came from analysis of the interview transcripts, as described below.

Three key areas emerged from the interview data in relation to system usability: overall ease of use, the app’s interface and the IMU. In terms of overall ease of use, 14 out of 15 participants remarked on the system being ‘easy to use’, ‘straightforward’ and/or ‘intuitive’. Example statements included:

“I thought it was so easy to use… I just like how accessible it is as well” - Beginner gym-goer

“Very, very easy to use. Really straightforward. You know, easy to get around and realise what you’re doing.” - Experienced gym-goer

“It’s very easy to navigate through. It’s pretty easy to be honest…. It’s monkey see, monkey do really.” - S&C Coach

Many participants commented on the intuitive nature of the app. For instance, the layout of the app was acknowledged to be very easy to follow, with a minimal number of menus, large icons and large buttons being cited as the reasons for this. Example statements included:

“I mean, it’s quite user friendly. The interface; there’s not too much going on, on the screen. It’s very clear where the info tab is, where the exercise tab is etc.” - S&C Coach

“Large buttons made that easy. It might come into play more if you’ve got sweaty hands, but yeah in terms of navigation it was good. The size of the text and the buttons etc. is good. Overall, very good.” - S&C Coach

“Yeah, the UI is really simple. Some other fitness apps are horrific, I hate using them, because they look horrible.” - Beginner gym-goer

“It was really easy to find things and navigate through.” - Experienced gym-goer

No users reported any difficulties interpreting the language used within the app. The colour used within the app was also referred to in a positive manner. It was considered to ‘make things stand out great’ during a session and to make the app ‘attractive’, and the colour of the rep number (red, orange or green) following a set was said to be very useful:

“I think the three colour, ‘green, orange and red’ feedback was a really useful function as it let you know if you’re doing something well, something a little bit off or doing something badly.” - Beginner gym-goer

Participants also offered positive feedback regarding the “How to wear the sensor” section. They found the instructions were very clear and easy to follow.

“One thing that I thought was done well was just showing you how to place the sensor as well. That could be a big obstacle, if it wasn’t shown properly. It would hinder people’s ability to use it. It was done well.” - S&C Coach

“You go in to sensor placement/orientation you can’t go wrong there. If you do, you have an issue.” - S&C Coach

In addition to this, a participant spoke positively about wearing the IMU:

“I’m not conscious about wearing it, nobody can see it, it doesn’t feel weighty or anything like that. I almost forget it’s on my leg while I’m talking to you.” - Beginner gym-goer

Participants reported a number of usability issues. The most reported usability issue related to app navigation, in particular to going back a step within the app. Four participants, who are usually iPhone users, struggled initially to know how to navigate backwards in the app:

“Maybe as I’m coming from iOS to Android but there was no clear back button so you have to switch in and out or use the phone’s button. On an iPhone, there’d always be something on the screen. That was one thing.” - Beginner gym-goer

“Just because I’m not used to using Android, I didn’t know how to go back a step but other than that, no the app itself is very easy to navigate.” - Experienced gym-goer

Some participants also highlighted the need for more status indicators as a usability issue. In particular, they highlighted that within the ‘How to use the App’ instructions there was no on-screen indicator when one reached the last instruction, which meant they did not know that the final instruction had been reached. More importantly, the need for a loading indicator was highlighted when a user pressed the ‘Analyse my Set’ button. As one beginner gym-goer commented:

“…when you don’t do that, I will impatiently tap the same button until something happens which in this case caused crashes.” - Beginner gym-goer

In fact, this crash, caused by multiple taps of the ‘Analyse my Set’ button, was one of the most reported functionality issues. While many users reported no bugs in the app, five users reported experiencing a crash of this kind. Two other critical bugs were found within the app, which caused system crashes. The first was recorded by four users who reported a crash when quickly clicking through the ‘How to use the App’ instructions.

“There is a way of crashing it (the app). If you use the ‘how to use app menu’ and go quickly through the menu, it’s pretty easy to crash. It seems like the second time it happens. You can scan through it the first time but not the second.” - Beginner gym-goer

“In the app instructions, I was tapping through quickly and it just crashed. I wasn’t mashing the button but I was pressing it reasonably quickly.” - Experienced gym-goer

The second was experienced by two users who found that the app crashed when they quickly navigated between the 4 main tabs of the app (connect, exercise, information and review).

“When I was exiting the app, when I had been looking at the exercises, it was just coming out and hitting all the buttons (demonstrates bashing all the menu buttons quickly) and the app crashed.” - S&C Coach

These were both programming bugs which will be amended in future versions of the app. The most recurrent, non-fatal functionality issue mentioned by users regarded the real-time rep counting during sets. Eight participants described thinking there was a lag at the start of the set, and that after two or three reps it was as if the app caught up and started counting them properly.

“When I did the first rep of each set, I wasn’t sure if it was recording it, until I did the second rep. It would then say ‘2’. Sometimes it would take a couple seconds just to vibrate and register that I’d completed the repetition.”-Beginner gym-goer

“The rep counting was also a little bit slow at the start.”-Experienced gym-goer

However, all participants felt the total rep count was always correct. Participants also felt the binary classification of exercise repetitions (as acceptable or aberrant technique) was accurate. The beginners and experienced athletes found the system’s feedback useful. Gym-goers remarked,

“It was really interesting how it could pick up on the bad ones and I know there were definitely some bad ones in there!”-Beginner gym-goer

“I usually am very aware of my form for sets but there was a set of single leg squats where I didn’t do the exercise well enough, and the app told me that I hadn’t, and I wasn’t aware of that but then when I thought about it the app was definitely right.”-Experienced gym-goer

The S&C coaches, whose experience and knowledge allowed them to gain more insight into the accuracy of the system, were predominantly content with the system’s accuracy. Two S&C coaches did, however, feel the system misclassified a small number of specific reps,

“It worked for everything except my single leg squats I’d say and maybe a little on the lunges.”-S&C coach

“Maybe one thing that it wasn’t able to discriminate on that well was the last set I did of shallow bodyweight squats. Maybe the accuracy fell off if I was doing something between a ¼ squat and a proper full squat. That was the only one that was a tiny bit inaccurate.”-S&C coach

Overall, the SUS results, the uMARS and the thematic analysis have demonstrated the system was usable and functional. The thematic analysis of the interview transcriptions has also uncovered a number of specific functionality issues and aspects of usability that can be amended or improved in future iterations of the app.
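For context, SUS responses are conventionally scored using the scheme described by Brooke (32): each of the ten items is rated from 1 to 5, odd-numbered (positively worded) items contribute the rating minus 1, even-numbered (negatively worded) items contribute 5 minus the rating, and the summed contributions are multiplied by 2.5 to give a score between 0 and 100. The following is a minimal sketch of that standard calculation only; the function name `sus_score` and the example ratings are hypothetical and are not data or code from this study.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Return the 0-100 SUS score for one participant's ten item ratings (1-5)."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += rating - 1  # odd items are positively worded
        else:
            total += 5 - rating  # even items are negatively worded
    return total * 2.5


# Hypothetical ratings for illustration only (not data from this study).
example = [4, 2, 5, 1, 4, 2, 4, 2, 5, 1]
print(sus_score(example))
```

A per-participant score computed this way can then be summarised across a cohort (for example with a median and interquartile range) and interpreted against published adjective ratings such as those of Bangor et al. (33).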


9.4.2. Perceived Impact

The quantitative analysis of the perceived impact of the system, through the uMARS, demonstrated the system was very beneficial to gym-goers in heightening their awareness of, advancing their knowledge of, and increasing their motivation and likelihood to seek help with ‘exercising with best possible technique’ (Table 9.1). Thematic analysis of the interview transcriptions verified these quantitative findings and also uncovered a number of other perceived benefits and disadvantages to use of the system.

All users reported that using the system would aid their technique whilst exercising. Beginners often mentioned that the system would enable them to learn proper technique, while experienced gym-goers stated that the system would be useful particularly when they lose focus or increase the weight they are lifting. S&C coaches thought the system would help people correct their technique and avoid injury while exercising. Statements included,

“It’s also nice to have the feedback on how I’m actually doing things. Personally, when I go to the gym, I may even do a whole workout and not know if things have gone correctly. It’s pretty annoying to go home and be thinking, ‘Did I do my squats right today?’, ‘I’m not actually sure’.”-Beginner gym-goer

“For people who are just starting out with workout programmes and need technique and form, it’s helpful. It’s helpful also for advanced weightlifting individuals who are looking to prevent injury and that kind of thing I would say.”-Experienced gym-goer

“A lot of the glaring issues people have when starting weight training are addressed. If people even just think about 1 or 2 of the issues that the app lays out then their technique can improve immensely in a very short amount of time just from these little bits of information.”-S&C coach

“Well advantages would be, obviously you’re avoiding injury as you go to the gym. This gives you a new source that can tell you if you’re doing it right or wrong or not.”-S&C coach

Eight users also suggested that use of the system would have a positive effect on their focus and motivation to exercise with acceptable technique. S&C coaches also suggested the system would be particularly useful in a team setting where athletes sometimes don’t process guidance properly or lose focus. All 3 test groups made statements regarding the system heightening focus, concentration and/or motivation, such as,

“It would be a motivational thing as well as obviously the benefit of getting help to correct yourself when you exercise poorly if needed.”-Beginner gym-goer

“Particularly, with me, when I’m sometimes doing weights I lose focus, so it would help me keep track.”-Experienced gym-goer


“You have definitely got some players where the information goes in one ear and straight out the other. So it would be good for us in the sense that we could connect this up, we analyse what we want to know and they find out straight away if they’ve done a good or bad rep.”-S&C coach

Three of the six beginner gym-goers also spoke about the system as a tool to build their confidence in how they are exercising. They spoke of the app as a method to boost their likelihood of seeking help from a friend or trainer, a way to get over the initial anxiety of going to a gym and a way to reassure them that they are exercising properly. One beginner spoke extensively of this, including saying,

“Also, having something on my leg is really reassuring because I’ve always found that with fitness apps on my phone that direct me to exercise, I almost feel like all the information there can be interpreted wrong and when I go to do the exercises I might be misinterpreting them. But whatever it is, just having this on my leg just makes me feel a little bit more confident in doing them and interpreting the information that is provided by the app.”-Beginner gym-goer

In addition to giving people confidence in their training, 3 S&C coaches felt one of the key benefits of the system is that it would boost people’s likelihood to simply start and commit to an exercise programme. One of the S&C coaches in fact saw this as the biggest benefit of the system, and another spoke very positively of this aspect of system use,

“The biggest benefit is it gets people in to the gym.”-S&C coach

“I think downloading the app could give a lot of people confidence to walk in to the gym in the first place, that’s really, really good.”-S&C coach

The aforementioned subthemes of the perceived impact of system use, (a) improving technique, (b) increasing motivation and focus and (c) promoting participation in exercise, are all well-accepted benefits of having an S&C coach or personal trainer. Interestingly, a number of participants described the system either as a ‘virtual trainer’ or as a middle ground between having a personal trainer and having none,

“The app almost acts as a person telling you you’re doing it wrong. That’s how I felt.”-Beginner gym-goer

“I wouldn’t get a personal trainer but this system could be a good middle ground.”-Beginner gym-goer

“If you don’t want to hire a coach, as coaches are a lot of money then it will give you a pretty good overview of the kind of stuff you have to do.”-S&C coach


No subthemes emerged concerning negative aspects of system use. However, one experienced gym-goer did suggest that use of the system could distract from focusing on their exercise technique. They stated,

“It wasn’t necessarily confusing but I did think I might be paying more attention to the app than my own form.”-Experienced gym-goer

9.4.3. Subjective Quality

The subjective quality portion of the uMARS showed that, when available, 9 participants thought they would use this system 10-50 times over the next year and 5 thought they would use it more than 50 times in the next year. These 5 participants were all beginner gym-goers. All participants would recommend the system to people who might benefit from it. The median star rating from all 15 participants was 4/5 (IQR: 3-4). The interview data reflected these quantitative ratings. All participants said they ‘liked’ the system or thought it was ‘good’. More detailed statements included,

“Overall, I was very impressed with the app. I have to say, very impressed.”-Beginner gym-goer

“I’ve been going to the gym for whatever amount of years and I’d still use something like this if it can tell me which reps are good and which reps are bad.”-Experienced gym-goer

“In terms of something to use during a session, I think it would be great.”-S&C coach

In terms of aspects of quality to improve on, two S&C coaches felt the feedback was perhaps a little basic and could be more detailed,

“Not that the technology it involves is, but in terms of how much information you could actually access it was quite basic.”-S&C coach

One experienced gym-goer also expressed that without more feedback they might stop using the system once they had perfected their technique,

“I think the limit to the app is once you have the motion down, you’re less likely to keep using it.”-Experienced gym-goer

Overall, all users’ subjective ratings of system quality were positive. However, all participants suggested improvements for future iterations of the app, which emerged as a theme during qualitative analysis and will be discussed in the upcoming subsection.

9.4.4. Future Changes

The most popular suggestion for future changes to the system was to add more exercises which can be tracked. Beginners stressed the need for this, saying things such as


“I would like it if there were more exercises within the app as standard gym session would generally involve more exercises”-Beginner gym-goer

“I think just add more exercises. Keep developing it as it’s just a great idea”-Beginner gym-goer

“Maybe add some other type of movement that people do, I don’t know how well it transfers to upper body movements but certainly bench press is something that people always tend to need help with when they first go in to a gym.”-S&C coach

“I guess just add more exercises. So then it would cover more things, because I guess there is a wide range of exercises that people do when they go to the gym and they can all be done with poor form if you don’t know what you are doing.”-S&C coach

Experienced gym-goers and S&C coaches regularly suggested the need for more exercises to be tracked by the system. However, due to their experience and knowledge, they also suggested the types of exercises which would require technique classification and suggested that for many exercises the system would only need to automatically count reps and sets. There was a general consensus among the experienced gym-goers and S&C coaches that upper body compound exercises (e.g. bench press, overhead press and barbell row) were the additional exercises which should be incorporated into the system with technique classification. There was also a shared opinion that users should be able to add any exercise they complete to the system to be logged automatically. However, it was suggested that non-compound, secondary exercises may not require technique classification as they are associated with a lower injury risk. It was also said that Olympic weightlifting moves should not be added to the battery of assessed movements as they would be too dangerous to learn via an app. Two statements which summarise the cohort’s general opinion were,

“Because, it’s in an app; I would say prioritise… you could have compound exercises like a bench press or an OHP (over-head press) but like you can’t teach like a jerk or a clean so just compound or isolation movements as there is less that can go wrong with those kind of things.”-S&C coach

“In terms of other exercises; again I suppose I like the idea that it would mainly be your key lifts. In terms of adding loads of other exercises, I don’t know if it would be necessary. The ones we would mainly cover in terms of injury risk are your squat, your back squat, your deadlift etc. So yeah, in terms of that I’d keep it to key lifts.”-S&C coach

Feedback was another prominent subtheme which emerged in the area of future changes. Two key things were suggested recurrently: (a) providing longitudinal feedback/progress over time and (b) more detailed feedback on the completed exercise repetitions. With regards to providing longitudinal feedback, one beginner suggested,

“I’d be really interested to use it regularly and see if I look back over weeks am I seeing progress? … (Comparison to SleepTracker) So I’d like to see something similar in this where you could link your exercise quality to your habits and progress.”-Beginner gym-goer

This type of longitudinal feedback reflected what most users of the system would also desire. With regards to receiving more specific feedback on their completed exercise, a diverse range of suggestions were made,

“Then after the end of your sets, if it tells you like an estimate of your maximum and could count your rest times.”-Beginner gym-goer

“It would be really interesting to actually see the angles.”-Experienced gym-goer

“A drop down with exactly what reps are good and bad would be useful.”-Experienced gym-goer

“We’d be quite keen on muscle fibre recruitment during an exercise. I’m not sure if the sensors can pick up on it.”-S&C coach

“Tempo - that would be a big one for us.”-S&C coach

The most frequently reported request, however, was to receive feedback on the exact mistake one was making when exercising, as opposed to whether a repetition was simply ‘good’ or ‘bad’,

“Maybe, for example, ‘your back was too arched’ or ‘not arched enough’, or the angle of your legs, how far down you should be going etc.”-Experienced gym-goer

When participants mentioned this, they were informed of other work from the authors which uses multiple IMUs to classify the exact deviation one makes while completing the exercises (14), (15), (38). They were then asked whether they would prefer a multi-IMU system, which may be more expensive than the evaluated single-IMU system, if it could identify specific exercise mistakes. Opinions were mixed on this, with 2 participants stating it ‘would depend on cost’. However, 1 beginner, 1 experienced gym-goer and 1 S&C coach did suggest they would actually prefer a multi-IMU system which had such capability,

“I do actually think more sensors would be cool but I think, I think that because I’m a bit of a nerd with stuff, so I’m like more sensors, that’s cool; more accurate data etc. I think for the people you may actually be selling this app to, one sensor is actually nearly too much”-Beginner gym-goer


“I think I’d like more sensors and feedback. Wearing sensors doesn’t put me off”-Experienced gym-goer

“I suppose, because I’m dealing with high level athletes, I would prefer to have more sensors to get more information.”-S&C coach

No other subthemes regarding future changes were found; however, one S&C coach did suggest a team version of the system, where multiple users could connect to a coach’s tablet app. This would allow coaches to focus their time and attention on the team members who require it the most. They also suggested the straps and IMU should be improved to be more appealing in such a setting.

9.5. Discussion

9.5.0. Principal Results

To the authors’ knowledge, this study is the first to apply a mixed-methods approach to evaluating a wearable sensor based exercise biofeedback system. In particular, this study is a first look at users’ perceptions of such systems and their potential benefits. Therefore, in addition to providing information on the usability, functionality and perceived impact of ‘Formulift’, the presented results also offer a number of end-user insights that can be leveraged to inform the development of future exercise biofeedback systems.

9.5.0.0. Usability and Functionality

The results demonstrated a good to excellent overall level of system usability. Participants highlighted that the ‘Formulift’ system was easy to set up and intuitive in nature, particularly in relation to the ease with which they could complete tasks. This shows great promise for the uptake of wearable sensor based exercise biofeedback apps within beginner and experienced gym-goer populations. However, analysis of the data also uncovered a number of specific usability issues which will be amended in future iterations of the app. For instance, the app should incorporate more status indicators, e.g. the appearance of a loading screen while exercise data is being analysed through the ‘Analyse my set’ function. This addition would signal to the user that an action is taking place, thus reducing the user uncertainty related to the completion of the task. A clearer method for navigating backwards in the app should also be added. Such changes should minimise confusion for system users and enable a more enjoyable and efficient user-experience.

This study has also demonstrated that the ‘Formulift’ system is functional. The key desired functions of ‘Formulift’ are to accurately detect and count repetitions of the exercises under study, determine if each repetition was completed with ‘acceptable’ or ‘aberrant’ technique and provide the user with interpretable feedback on their completed exercise. The combination of both qualitative and quantitative results shows that the system was indeed functional in these 3 areas. This was notwithstanding a number of functionality bugs which were found during the study. The most significant bug was the real-time rep counting algorithm lagging at the start of some users’ sets. This must be rectified for future iterations of the system. It is essential that the real-time rep counting functions correctly because, if it does not, it may distract the user from completing their exercise properly.

9.5.0.1. Perceived Impact

One of the most important findings of this study is that the range of different system users (beginner gym-goers, experienced gym-goers and S&C coaches) reported several benefits to using the ‘Formulift’ system. Most importantly, all users felt that the system would improve their technique as they exercise. This is a central finding as prior work has simply shown the ability of IMU based exercise classification systems to detect ‘acceptable’ and ‘aberrant’ technique but has not determined if users would find feedback of this kind beneficial (12), (15), (17), (38)–(40). Interestingly, users also highlighted the system’s positive psychological benefits with regards to improved levels of focus, motivation and confidence whilst exercising. These perceived benefits are in line with desired aims for such systems as outlined in prior research (12), (15), (17), (38)–(40). While these benefits are well-reported aims for many biofeedback systems (41), the literature has lacked end-user validation. Further study is required to objectively validate these perceived benefits.

9.5.0.2. Subjective Quality

En masse, ‘Formulift’ was well received by system users. The uMARS results showed the app had a median star rating of 4 out of 5. This shows that users thought the system was good but could also be improved. This feeling was backed up during participants’ interviews. While suggested improvements to the system will later be discussed, it is an important finding of this study that system users did like ‘Formulift’. Wearable sensor based exercise biofeedback systems are a very new technology and little is yet known about how users feel about using them. Therefore, it is encouraging to developers of such systems that the participants of this study gave predominantly positive feedback on the system.

9.5.1. Limitations

There are a number of contextual factors which should be considered when reviewing this study. Firstly, all results presented in this study are based on the participants’ first use of the system. While this is likely appropriate for highlighting any usability and functionality problems with the system, it is possible that one’s rating of the system’s quality and impact could vary over time. It should also be stated that results regarding the ‘perceived impact’ of the system, as determined by the uMARS and thematic analysis of interview transcripts, are solely users’ opinions on the benefits of the system. More work is required to determine if the system objectively improves, for example, people’s motivation, exercise adherence, and exercise technique. In order to achieve this, a longitudinal study will be required. It should also be noted that this study took place in an artificial gym environment within a biomechanics laboratory. It would be useful to complete the proposed longitudinal system evaluation with participants exercising in their ‘normal’ gyms. This may help uncover additional usability or functionality issues which should be amended and other future changes which could improve the system.

A further limitation of this study is the sample size used and the homogeneity of the sample. While a sample of 15 participants complies with usability testing standards, it may not guarantee that the sample is representative of the wider population when considering quantitative results. This may be the case in particular for specific populations not captured in our sample, e.g. female, elderly, overweight or underweight individuals. However, there was high consistency when triangulating quantitative and qualitative results, which suggests they are both of merit.

9.5.2. Future Work

The systematic evaluation of the ‘Formulift’ system uncovered a variety of suggested changes for future system iterations. Future work will endeavour to incorporate these changes into the system or allow them to be customisable user preferences within the system, e.g. turning on/off receiving feedback on tempo of movement. Additionally, future work will aim to establish users’ perceptions of the system following multiple uses over a period of time. Such work would also grant the opportunity to (a) objectively measure the impact of the system for users in a more rigorous manner and (b) investigate how these findings compare to the system’s perceived impact found in this study.

9.5.3. Comparison with Prior Work

As stated in the introduction to this chapter, a large proportion of the published work pertaining to exercise analysis with IMUs concerns the efficacy of the various sensor set-ups and data analysis techniques to assess different exercises (10)–(19). However, while in the early development phase of such systems, it is also of key importance to understand their usability, functionality, subjective quality and perceived impact from the end-user perspective. Involving the user early and often in the design process can help identify previously unforeseen user-experience issues, which can then be rectified to help increase levels of user engagement, a central determinant of overall user adoption prospects (42). To the authors’ knowledge, there are no published evaluation studies of wearable sensor based exercise biofeedback systems. Literature is available in associated fields, with a wide variety of mHealth and exergaming systems being evaluated over the past decade (24), (43).

It may be inappropriate to compare the results evaluating ‘Formulift’ to apps and systems in other sub-disciplines in mHealth due to the different demographics of users and purposes of such systems. However, it is important to note that the methodological approach undertaken in this study is in line with current state of the art recommendations in usability studies (24). Involving 3 types of real system users (beginners, experienced gym-goers and S&C coaches) and employing a mixed-methods approach to evaluate the system has maximised our understanding of users’ perceptions of ‘Formulift’ and will inform the design of future iterations. Recent work such as that by Konstantinidis et al., who outlined the development and evaluation of ‘FitForAll’, an exergaming platform improving physical fitness and life quality of senior citizens (44), has shown great benefits of exercise biofeedback systems and demonstrated good SUS scores. However, a lack of qualitative assessment of the system limits the conclusions which can be drawn on the system’s usability and user-experience. It is the authors’ contention that we can maximise the learnings on mHealth system evaluations through combining appropriate surveys and interviews. This approach can more effectively inform the iterative design process to make the systems as beneficial as possible for end-users.

9.6. Conclusions

In this study we sought to evaluate the ‘Formulift’ system, a new exercise biofeedback app which classifies technique and tracks reps completed of exercises. A mixed-methods approach was undertaken to quantitatively and qualitatively assess the system under a variety of distinct categories: usability, functionality, subjective quality and perceived impact. The assessment of the system was completed by 3 types of real system users: beginner gym-goers, experienced gym-goers and qualified and practising S&C coaches. The usability of the system was determined to be very good following both quantitative and qualitative analysis. The system also functioned as desired, with users reporting that the system accurately detected their reps in real-time, classified their exercise quality and gave them appropriate feedback. Users expressed that they liked the system and that it could aid their focus and technique while exercising. Additionally, users felt that the system could increase their motivation and confidence in completing exercise. These findings are the first of their kind and show great promise for wearable sensor based exercise biofeedback systems. However, this study also found a number of potential improvements to the ‘Formulift’ system (see Section 9.5.2, Future Work). By implementing these changes, it is hoped that systems such as ‘Formulift’ may become an affordable, user-friendly and useful tool that will aid gym-goers to enhance their training and support S&C coaches in the monitoring of their athletes.


9.7. References

1.

Ahtiainen JP, Pakarinen A, Alen M, Kraemer WJ, Häkkinen K. Muscle hypertrophy, hormonal adaptations and strength development during strength training in strength-trained and untrained men. European journal of applied physiology. 2003 Aug 1;89(6):555-63.

2.

Kraemer WJ, Mazzetti SA, Nindl BC, Gotshalk LA, Volek JS, Bush JA, Marx JO, Dohi K, Gómez AL, Miles M, Fleck SJ. Effect of resistance training on women’s strength/power and occupational performances. Medicine & Science in Sports & Exercise. 2001 Jun 1;33(6):1011-25.

3.

Frontera WR, Meredith CN, O'Reilly KP, Knuttgen HG, Evans WJ. Strength conditioning in older men: skeletal muscle hypertrophy and improved function. Journal of applied physiology. 1988 Mar 1;64(3):1038-44.

4.

Bassett SF. The assessment of patient adherence to physiotherapy rehabilitation. New Zealand journal of physiotherapy. 2003 Jul;31(2):60-6.

5.

Friedrich M, Cermak T, Maderbacher P. The effect of brochure use versus therapist teaching on patients performing therapeutic exercise and on changes in impairment status. Physical Therapy. 1996 Oct 1;76(10):1082-8.

6.

Hall M, Nielsen JH, Holsgaard-Larsen A, Nielsen DB, Creaby MW, Thorlund JB. Forward lunge knee biomechanics before and after partial meniscectomy. The Knee. 2015 Dec 31;22(6):506-9.

7.

Burns A, Greene BR, McGrath MJ, O'Shea TJ, Kuris B, Ayer SM, Stroiescu F, Cionca V. SHIMMER™–A wireless sensor platform for noninvasive biomedical research. IEEE Sensors Journal. 2010 Sep;10(9):1527-34.

8.

Madgwick SO, Harrison AJ, Vaidyanathan R. Estimation of IMU and MARG orientation using a gradient descent algorithm. In: Proceedings of the 12th International Conference on Rehabilitation Robotics (ICORR); 2011. Jun 29; Zurich, Switzerland. N.Y., U.S.A.: IEEE; 2011. p. 1-7.

9.

McGrath D, Greene BR, O’Donovan KJ, Caulfield B. Gyroscope-based assessment of temporal gait parameters during treadmill walking and running. Sports Engineering. 2012 Dec 1;15(4):207-13.

10.

Seeger C, Buchmann A, Van Laerhoven K. myHealthAssistant: a phone-based body sensor network that captures the wearer's exercises throughout the day. In: Proceedings of the 6th International Conference on Body Area Networks; 2011. Nov 7; Beijing, China. Brussels, Belgium: ICST; 2011. p. 1-7.

11.

Morris D, Saponas TS, Guillory A, Kelner I. RecoFit: using a wearable sensor to find, recognize, and count repetitive exercises. In: Proceedings of the 32nd annual ACM conference on Human factors in computing systems; 2014 Apr 26; Toronto, Canada. N.Y., U.S.A.: ACM; 2014. p. 3225-3234.

12.

Giggins OM, Sweeney KT, Caulfield B. Rehabilitation exercise assessment using inertial sensors: a cross-sectional analytical study. Journal of neuroengineering and rehabilitation. 2014 Nov 27;11(1):158.

13.

Whelan DF, O'Reilly MA, Ward TE, Delahunt E, Caulfield B. Technology in rehabilitation: evaluating the single leg squat exercise with wearable inertial measurement units. Methods of Information in Medicine. 2017;56(2):88-94.

14.

O’Reilly M, Whelan D, Chanialidis C, Friel N, Delahunt E, Ward T, et al. Evaluating squat performance with a single inertial measurement unit. In: Proceedings of the 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN); 2015. Jun 9; Boston, M.A., U.S.A. N.Y., U.S.A.: IEEE; 2015. p. 1-6.

15.

O’Reilly MA, Whelan DF, Ward TE, Delahunt E, Caulfield B. Classification of lunge biomechanics with multiple and individual inertial measurement units. Sports biomechanics. 2017 May 19:1-9.

16.

Whelan D, O’Reilly M, Ward T, Delahunt E, Caulfield B. Evaluating Performance of the Single Leg Squat Exercise with a Single Inertial Measurement Unit. In: Proceedings of the 3rd Workshop on ICTs for improving Patients Rehabilitation Research Techniques (REHAB); 2015. Oct 1; Lisbon, Portugal. N.Y., U.S.A.: ACM; 2015. p. 144–7.

17.

Kianifar R, Lee A, Raina S, Kulić D. Classification of squat quality with inertial measurement units in the single leg squat mobility test. In: Proceedings of the IEEE 38th Annual International Conference of the Engineering in Medicine and Biology Society (EMBC); 2016. Aug 26; Orlando, F.L., U.S.A. N.Y., U.S.A.: IEEE; 2016. p. 6273-76.

18.

Giggins O, Kelly D, Caulfield B. Evaluating Rehabilitation Exercise Performance Using a Single Inertial Measurement Unit. In: Proceedings of the 7th International Conference on Pervasive Computing Technologies for Healthcare; 2013. May 5; Venice, Italy. N.Y., U.S.A.: IEEE; 2013. p. 49–56.

19.

Johnston W, O'Reilly M, Dolan K, Reid N, Coughlan G, Caulfield B. Objective classification of dynamic balance using a single wearable sensor. In: Proceedings of the 4th International Congress on Sport Sciences Research and Technology Support; 2016. Nov 7; Porto, Portugal. Setubal, Portugal: SCITEPRESS–Science and Technology Publications; 2016. p. 15-24.

20.

O'Reilly MA, Whelan DF, Ward TE, Delahunt E, Caulfield BM. Classification of deadlift biomechanics with wearable inertial measurement units. Journal of biomechanics. 2017 Jun 14;58:155-61.

21.

Al Ayubi SU, Parmanto B, Branch R, Ding D. A persuasive and social mHealth application for physical activity: a usability and feasibility study. JMIR mHealth and uHealth. 2014 Apr;2(2).

22.

Stoyanov SR, Hides L, Kavanagh DJ, Wilson H. Development and validation of the user version of the Mobile Application Rating Scale (uMARS). JMIR mHealth and uHealth. 2016 Apr;4(2).

23.

Lodhia V, Karanja S, Lees S, Bastawrous A. Acceptability, usability, and views on deployment of peek, a mobile phone mHealth Intervention for eye care in Kenya: qualitative study. JMIR mHealth and uHealth. 2016 Apr;4(2).

24.

Zapata BC, Fernández-Alemán JL, Idri A, Toval A. Empirical studies on usability of mHealth apps: a systematic literature review. Journal of medical systems. 2015 Feb 1;39(2):1.

25.

Fitzgerald D, Kelly D, Ward T, Markham C, Caulfield B. Usability evaluation of e-motion: A virtual rehabilitation system designed to demonstrate, instruct and monitor a therapeutic exercise programme. In: Proceedings of the International Conference on Virtual Rehabilitation; 2008. Aug 25; Vancouver, Canada. N.Y., U.S.A.: IEEE; 2008. p. 144–149.

26.

Fitzgerald D, Trakarnratanakul N, Dunne L, Smyth B, Caulfield B. Development and user evaluation of a virtual rehabilitation system for wobble board balance training. In: Proceedings of the IEEE 30th Annual International Conference of the Engineering in Medicine and Biology Society (EMBC); 2008. Aug 20; Vancouver, Canada. N.Y., U.S.A.: IEEE; 2008. p. 4194–4198.

27.

Abran A, Khelifi A, Suryn W, Seffah A. Usability meanings and interpretations in ISO standards. Software Quality Journal. 2003 Nov 1;11(4):325-38.

28.

Turner CW, Lewis JR, Nielsen J. Determining usability test sample size. International encyclopedia of ergonomics and human factors. 2006 Nov;3(2):3084-8.

29.

Sauro J. Measuring Usability With The System Usability Scale (SUS). Measuring Usability. 2011. p. 1-5.


30.

Stoyanov SR, Hides L, Kavanagh DJ, Zelenko O, Tjondronegoro D, Mani M. Mobile app rating scale: a new tool for assessing the quality of health mobile apps. JMIR mHealth and uHealth. 2015 Jan;3(1).

31.

Ritchie J, Lewis J, Nicholls CM, Ormston R. Qualitative research practice: A guide for social science students and researchers. Sage; C.A., U.S.A. 2013.

32.

Brooke J. SUS-A quick and dirty usability scale. Usability evaluation in industry. 1996 Jun 11;189(194):4-7.

33.

Bangor A, Kortum P, Miller J. Determining what individual SUS scores mean: Adding an adjective rating scale. Journal of usability studies. 2009 May 1;4(3):114-23.

34.

Heath H, Cowley S. Developing a grounded theory approach: a comparison of Glaser and Strauss. International journal of nursing studies. 2004 Feb 29;41(2):141-50.

35.

Braun V, Clarke V. Using thematic analysis in psychology. Qualitative research in psychology. 2006 Jan 1;3(2):77-101.

36.

Saldaña J. The coding manual for qualitative researchers. Sage; C.A., U.S.A. 2015. p. 223.

37.

Fusch PI, Ness LR. Are we there yet? Data saturation in qualitative research. The Qualitative Report. 2015 Sep 7;20(9):1408

38.

O'Reilly MA, Whelan DF, Ward TE, Delahunt E, Caulfield B. Technology in Strength and Conditioning Tracking Lower-Limb Exercises With Wearable Sensors. The Journal of Strength & Conditioning Research. 2017 Jun 1;31(6):1726-36.

39.

Pernek I, Kurillo G, Stiglic G, Bajcsy R. Recognizing the intensity of strength training exercises with wearable sensors. Journal of biomedical informatics. 2015 Dec 31;58:145-55.

40.

Taylor PE, Almeida GJ, Hodgins JK, Kanade T. Multi-label classification for the analysis of human motion quality. In: Proceedings of the 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); 2012. Aug 28; San Diego, C.A., U.S.A. N.Y., U.S.A.: IEEE; 2014. p. 2214-2218.

41.

Giggins OM, Persson UM, Caulfield B. Biofeedback in rehabilitation. Journal of neuroengineering and rehabilitation. 2013 Jun 18;10(1):60.

42.

Abras C, Maloney-Krichmar D, Preece J. User-centered design. Bainbridge, W. Encyclopedia of Human-Computer Interaction. Sage; C.A., U.S.A. 2004;37(4):445-56.

43.

Sween J, Wallington SF, Sheppard V, Taylor T, Llanos AA, Adams-Campbell LL. The role of exergaming in improving physical activity: a review. Journal of Physical Activity and Health. 2014 May;11(4):864-70.

44.

Konstantinidis EI, Billis AS, Mouzakidis CA, Zilidou VI, Antoniou PE, Bamidis PD. Design, implementation, and wide pilot deployment of FitForAll: an easy to use exergaming platform improving physical fitness and life quality of senior citizens. IEEE journal of biomedical and health informatics. 2016 Jan;20(1):189-200.


Section D: Novel Methods for the Creation of IMU-Based Exercise Classification Systems

Section D: Foreword

Thus far in this thesis, a number of IMU-based exercise classification systems have been developed and compared. A key finding has been that developing personalised classification systems for each user improves system accuracy and computational efficiency. This method allows systems developed with one IMU to detect acceptable and aberrant exercise technique in compound lower limb exercises with excellent accuracy. The mixed-methods evaluation of the ‘Formulift’ application demonstrated positive usability, functionality and perceived impact of personalised IMU-based exercise classification systems. However, there remain a number of ways in which such systems may be improved.

In Chapter 10, the real-world accuracy and technical development of ‘Formulift’ during the experiment previously described in Chapter 9 are presented. In addition, a tablet application which automates the process of developing personalised exercise classification systems is developed and discussed. This application may vastly increase the variety of exercises which can be classified with IMUs and ameliorate the time and effort issues associated with developing personalised classification systems. In Chapter 11, a novel method for global classification is presented and evaluated. This method is developed in an attempt to increase system accuracy and to target ease of implementation for newcomers to the field of exercise classification.


Chapter 10 A Mobile Application to Streamline the Development of Wearable Sensor Based Exercise Biofeedback Systems: System Development and Evaluation

This chapter is based on the following paper which is published in JMIR Rehabilitation and Assistive Technologies: O'Reilly MA, Duffin J, Ward TE, Caulfield B. A Mobile Application to Streamline the Development of Wearable Sensor Based Exercise Biofeedback Systems: System Development and Evaluation. JMIR Rehabilitation and Assistive Technologies. 2017 Aug 15; (online).

10.0. Abstract

Background: Biofeedback systems which utilise IMUs have recently been shown to be able to objectively assess exercise technique. However, there are a number of challenges in developing such systems: vast IMU exercise data sets must be collected and manually labelled for each exercise variation, and naturally occurring technique deviations may not be well detected. One method of combatting these issues is through the development of personalised exercise technique classifiers.

Objectives: We aimed to create a tablet application for physiotherapists and personal trainers which would automate the development of personalised multiple and single IMU based exercise biofeedback systems for their clients. We also sought to complete a preliminary investigation of the accuracy of such individualised systems in a ‘real world’ evaluation.

Methods: A tablet application was developed which automates the key steps in exercise technique classifier creation through synchronising video and IMU data collection, automatic signal processing, data segmentation, data labelling of segmented videos by an exercise professional, automatic feature computation and classifier creation. Fifteen volunteers (12 males, 3 females, age: 23.8 +/- 1.8 years, height: 1.79 +/- 0.07 m, body mass: 78.4 +/- 9.6 kg) then completed 4 lower limb compound exercises using a personalised single IMU based classification system. The real world accuracy of the systems was evaluated.

Results: The tablet application successfully automated the process of creating individualised exercise biofeedback systems. The personalised systems achieved an average of 90% accuracy, with 90% sensitivity and 89% specificity for assessing aberrant and acceptable technique with a single IMU positioned on the left thigh.

Conclusions: A tablet application was developed that automates the process required to create a personalised exercise technique classification system. This tool can be applied to any cyclical, repetitive exercise. The personalised classification model displayed excellent system accuracy even when assessing acute deviations in compound exercises with a single IMU.


10.1. Introduction

Exercise rehabilitation is accepted as an essential tool in the treatment of musculoskeletal conditions such as osteoarthritis, and following injury or orthopaedic surgical procedures (1–3). Resistance training may also be used to improve one’s muscular strength, hypertrophy and power in non-patient populations (4–6). However, many people completing exercise programmes encounter a variety of difficulties when performing their exercises without the supervision of a trained exercise professional such as a physiotherapist or strength and conditioning (S&C) coach. One such difficulty is that in many circumstances, people may execute their exercises incorrectly (7,8). Incorrect alignment during exercise, incorrect speed of movement and poor quality of movement may have an impact on the efficacy of exercise and may therefore result in a poor outcome (7,8). It is therefore essential that accurate assessment of exercise performance is available to ensure people perform their exercises properly. This is particularly necessary in cases where an individual completes their exercise programme in the absence of an exercise professional’s supervision, e.g. during home-based rehabilitation programmes or S&C programmes where the person exercising cannot afford a personal trainer.

Recent research has shown IMU based biomechanical biofeedback systems to be an accurate exercise assessment tool. Biomechanical biofeedback involves (a) the measurement of one’s movement, postural control or force output and (b) the provision of feedback to the user regarding this (9). IMUs are able to acquire data pertaining to the linear and angular motion of individual limb segments and the centre of mass of the body. They are small, inexpensive, easy to set up and facilitate the acquisition of human movement data in unconstrained environments (10). Research in this field has shown the ability of multiple body-worn IMUs to evaluate exercise quality for a variety of exercises (11–14). These range from early stage rehabilitation exercises like heel slides and straight leg raises (15) to more complex, late stage rehabilitation exercises or S&C exercises such as bodyweight squats (16), lunges (17), and single leg squats (18–20). More cost effective and practical systems using a single body-worn IMU have also been shown to be effective in the analysis of exercise technique (17,18,21,22). Systems based on a single IMU are considered preferable where they can provide exercise analysis of equivalent quality to multiple IMU set-ups at a lower cost. However, in a number of cases a single IMU set-up achieves lower quality exercise analysis than multiple IMU set-ups. The ability of a single IMU set-up to detect acute, naturally occurring technique deviations in compound late stage rehabilitation and S&C exercises such as deadlifts, lunges and squats is also largely unknown; whilst this has been shown to be possible for single leg squats (18), the reported findings on lunges and squats pertain to detecting deliberately induced exercise technique deviations (16,17).

There is also a need to iteratively improve the accuracy, sensitivity and specificity of IMU based exercise technique biofeedback systems and to increase the number of exercises that can be analysed with IMUs. IMU based exercise biofeedback systems should be able to assess technique for a comprehensive range of exercises both accurately and in a manner which is practical for people completing exercise. There are a number of considerable challenges in the creation of such biofeedback systems. Firstly, in order for machine learning classification algorithms to produce desirable results they require large volumes of training data. It is therefore difficult to collect sufficient IMU data on a large variety of exercises in a research environment. Consequently, current research has mainly assessed very commonly completed exercises which span the scope of musculoskeletal screening, rehabilitation and S&C. There remain thousands of exercises for which the ability of IMUs to assess technique is unknown. Classification algorithms such as random forests and logistic regression also require balanced training data sets, where each class (e.g. acceptable or aberrant) has the same number of instances in the training data (23–25). This poses a considerable challenge in creating systems which aim to detect natural technique deviations, which occur idiosyncratically and at greatly differing frequencies. This challenge is heightened in circumstances where the inter-subject variation in completing an exercise with acceptable form exceeds the intra-subject variation between one’s acceptable and aberrant form.

One solution to the aforementioned challenges may be to create individualised exercise classification systems. In this circumstance, a classifier is created using training data solely from the person whose exercise is to be assessed.
Preliminary research has shown that such classifiers can produce superior accuracy to global classification systems (26,27). Additionally, some global classification systems have only been developed and evaluated with deliberately induced technique deviations (16,17). Personalised systems may allow many more exercises to be evaluated for a particular exercising person and could allow acute, naturally occurring technique deviations to be detected with a single body-worn IMU where this has not previously been possible. The classifiers would also be less memory intensive and more efficient as they are developed using smaller training data sets. However, to the authors’ knowledge there is a lack of tools currently available to capture and label IMU data during exercise in a way that enables the efficient development of personalised exercise technique classification systems.


10.2. Objectives

Therefore, the purpose of this investigation was to create a tablet application (app) which enabled efficient creation of personalised, single IMU-based exercise biofeedback systems. We also sought to investigate the accuracy of this personalised system in a ‘real world’ evaluation using a sample of four compound lower limb exercises (lunges, squats, single leg squats and deadlifts) in 15 participants. In this chapter, an overview of the developed app is first presented. An experimental evaluation of the system in the real world is then described.

10.3. Methods

10.3.0. System Overview

In exercise classification with IMUs, there exist a number of universal steps which allow for the development of exercise biofeedback systems (28). Firstly, IMU data must be collected from participants as they exercise. Each repetition of each exercise must be labelled by an exercise professional. The signals collected from the IMU must be filtered to eliminate unwanted noise and additional signals may be computed which, for instance, describe the IMU’s 3D orientation. The signals are segmented into epochs, each of which pertains to one repetition of an exercise. Features are computed from these segmented signals as described in the upcoming ‘feature computation and classifier creation’ section. Finally, a classification model is trained using both the labels provided by an exercise professional and the features computed from the sensor signals which pertain to the same repetitions (Figure 10.1). The tablet app presented in this chapter allows for simultaneous IMU and video data capture. It then allows labelling of each IMU data epoch through reviewing its associated video epoch. Features are then automatically computed from the IMU signal epochs and classifiers are built using these features and the labels provided by the exercise professional.

Figure 10.1: Steps involved in the development of an IMU based exercise classification system.


10.3.0.0. Overview of Data Collection Tool

Figure 10.2: Schematic demonstrating the flow and functionality of the tablet application.

The tablet app was developed using Android Studio (Android, Google, California, U.S.A.) and ran on a Samsung Galaxy S2 tablet (Samsung, Suwon, South Korea). It contains a number of tabs which provide the functionality needed for the automated creation of personalised classification systems. Figure 10.2 demonstrates the processes involved and highlights the need for data labelling from an exercise professional. The various tabs within the app are shown in Figure 10.3. The system can connect to a maximum of five Shimmer IMUs (29) and stream synchronised data from them simultaneously. All IMUs are automatically configured to stream tri-axial accelerometer (+/- 2 g), gyroscope (+/- 500 °/s) and magnetometer (+/- 1.9 Ga) data at 51.2 Hz. These values were chosen as they have previously been shown to be appropriate for the analysis of rehabilitation exercise with IMUs (15,18,19). However, the sampling rate and sensor ranges may be insufficient for faster exercises such as jumping or plyometric exercises. Future iterations of the system will address this by allowing the exercise professional to select sampling rate and sensor ranges based on exercise type prior to data collection. For this study, the IMU was calibrated by the lead investigator. This took roughly ten minutes.


Figure 10.3: Home screen of tablet application, demonstrating its variety of functions.

The app then allows for the automation of all the aforementioned steps in the development of an exercise technique classification system, as shown in Figures 10.1 and 10.2.

10.3.0.1. Video and IMU Data Collection

Following sensor set up, navigating to the ‘Record a New Session’ tab allows an exercise professional to video their client as they exercise whilst data from the IMUs is simultaneously collected. Video is captured at the tablet’s native frame rate and IMU data is collected at 51.2 Hz (Figure 10.4). The exercise professional may choose to record their client from the frontal or sagittal plane depending on the exercise being evaluated.

Figure 10.4: Data capture part of app which allows IMU data and video to be captured simultaneously.


10.3.0.2. Signal Processing and Segmentation

Following the recording of a set of a particular exercise, a number of steps are completed by the app in processing the IMU data. To ensure the data analysed applied to each participant’s movement and in order to eliminate unwanted high-frequency noise, the six signals were low pass filtered at fc = 20 Hz using a Butterworth filter of order n=8. Nine additional signals were then calculated. The 3-D orientation of the IMU was computed using the gradient descent algorithm developed by Madgwick et al (30). The resulting quaternion values (W, X, Y and Z) were then converted to pitch, roll and yaw signals. The pitch, roll and yaw signals describe the inclination, measured in radians, of each IMU in the sagittal, frontal and transverse planes respectively. The magnitude of acceleration was also computed using the vector magnitude of accelerometer x, y and z. The magnitude of acceleration describes the total acceleration of the IMU in any direction. This is the sum of the magnitude of inertial acceleration of the IMU and acceleration due to gravity. Additionally, the magnitude of rotational velocity was computed using the vector magnitude of gyroscope x, y and z. While these magnitude signals do not allow for specific body segment planes to be analysed, they can aid detection of aberrant movement when deviations are very pronounced or happen in multiple planes. The signals and video data were then programmatically segmented into epochs which relate to single, full repetitions of the completed exercises. Many algorithms are available to segment human motion for rehabilitation exercises, including the sliding window algorithm (31), top-down and bottom-up algorithms (32), zero-velocity-crossing (ZVC) algorithms, template-based matching methods (33), and combinations of the above (34). Each of these algorithms has advantages and disadvantages.
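The derived orientation and magnitude signals described above can be illustrated with a minimal Python sketch. It assumes a unit quaternion is already available from an orientation filter (the Madgwick algorithm itself is not shown) and uses the common aerospace (ZYX) Euler convention; the actual axis convention and implementation used in the app are not stated in the text, and the function names `quaternion_to_euler` and `vector_magnitude` are hypothetical.

```python
import math


def quaternion_to_euler(w, x, y, z):
    """Convert a unit quaternion to (roll, pitch, yaw) in radians.

    Assumes the ZYX (yaw-pitch-roll) convention; this is an illustrative
    choice, not necessarily the one used in the app.
    """
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    # Clamp to [-1, 1] to guard against small numerical overshoot.
    sin_pitch = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return roll, pitch, yaw


def vector_magnitude(x, y, z):
    """Magnitude of a tri-axial sample (accelerometer or gyroscope)."""
    return math.sqrt(x * x + y * y + z * z)


# Example: one accelerometer sample and one orientation sample.
accel_magnitude = vector_magnitude(3.0, 4.0, 0.0)
roll, pitch, yaw = quaternion_to_euler(1.0, 0.0, 0.0, 0.0)
```

Applied sample by sample to the filtered signals, these two functions would yield five of the nine derived signals (pitch, roll, yaw and the two magnitudes); the remaining four are the quaternion components themselves.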
For the purpose of the creation of a functioning classifier creation tool, a simple peak-detection algorithm was used on the gyroscope signal with the largest amplitude for any particular exercise. The start and end points of each repetition can then be found by looking for the corresponding zero-crossing points of the gyroscope signal leading up to and following the location of a peak in the signal. Figure 3.3 demonstrates example results of the segmentation algorithm used on the gyroscope Z signal, from an IMU positioned on the left thigh during three repetitions of the deadlift exercise. Following the signal processing and segmentation of the IMU data, the video was cut into epochs based on the start and end points of repetitions found in the IMU data. The session name, exercise name, repetition number, IMU data and video data for each individual exercise repetition were stored as objects in a database.
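The peak-and-zero-crossing idea can be sketched as follows. This is a simplified illustration rather than the app's implementation: it assumes a single gyroscope axis stored as a list of samples, a hypothetical `min_height` threshold for accepting a peak, and repetition boundaries taken at the nearest non-positive samples either side of each peak.

```python
def find_peaks(signal, min_height):
    """Return indices of local maxima at or above min_height."""
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
        if is_local_max and signal[i] >= min_height:
            peaks.append(i)
    return peaks


def segment_repetitions(signal, min_height):
    """Return (start, end) index pairs, one per detected repetition.

    For each peak, walk backwards and forwards until the signal reaches
    zero or changes sign; those points bound the repetition.
    """
    repetitions = []
    for peak in find_peaks(signal, min_height):
        start = peak
        while start > 0 and signal[start] > 0:
            start -= 1
        end = peak
        while end < len(signal) - 1 and signal[end] > 0:
            end += 1
        # Skip extra peaks that fall inside an already segmented repetition.
        if not repetitions or start >= repetitions[-1][1]:
            repetitions.append((start, end))
    return repetitions


# Synthetic gyroscope trace containing two repetitions.
gyro_z = [0, 1, 3, 1, 0, -1, 0, 2, 4, 2, 0, -1]
epochs = segment_repetitions(gyro_z, min_height=2.5)
```

The same index pairs, converted to timestamps using the 51.2 Hz sampling rate, could then be used to cut the synchronised video into matching epochs.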


The specific signal processing and segmentation processes selected were chosen based on their demonstrated capability in similar research (16–19). In future iterations of the app, a variety of additional signal processing and segmentation options may be presented to the exercise professionals using the system, or the functions will be updated to match the emerging state of the art.

10.3.0.3. Data Labelling

The app provides a number of data labelling functions. The exercise professional using the tablet app first has the ability to add new exercises and technique deviations as possible labels for the stored and segmented data. These labels also become available to the exercise professional when they record new exercise sessions. The exercise professional then has the option of labelling the videos repetition by repetition, viewing them according to the filter criteria ‘session name’ or ‘exercise type’. The default class for all repetitions is ‘Acceptable’ until they are labelled as ‘Aberrant’ or as a specific deviation from an acceptable technique. An unlimited number of possible labels can be created for each exercise. Once data has been collected for each exercise, there is also an ‘Auto-label’ function. This function uses data already labelled by the exercise professional to build a random forests classifier which estimates the class for currently unlabelled data. As shown in Figure 10.5, the app then presents the classifier’s predicted label with the video of the repetition and allows the exercise professional to either keep or ignore the prediction. If the prediction is ignored, the repetition can then be labelled manually in the ‘review by exercise’ or ‘review by session’ tab. The database can also be manually updated at any time, allowing the exercise professional to remove particular repetitions or to delete or edit their current labels. Figure 10.5 highlights the app’s various data labelling functionalities.


Figure 10.5: Various data labelling functionalities of the app.

10.3.0.4. Feature Computation and Classifier Creation

Once the data has been labelled as desired by the exercise professional, the app can then build the personalised exercise technique classification objects for each client and each exercise they completed. A separate classifier is created for each different exercise. Time-domain and frequency-domain descriptive features are computed in order to describe the pattern of each of the eighteen signals when the four different exercises were completed. These features were 'Mean', 'RMS', 'Standard Deviation', 'Kurtosis', 'Median', 'Skewness', 'Range', 'Variance', 'Max', 'Min', 'Energy', '25th Percentile', '75th Percentile', 'Level Crossing Rate', 'Fractal Dimension' (35) and the variance of both the approximate and detailed wavelet coefficients using the Daubechies 5 mother wavelet to level 7. This resulted in 17 features for each of the 18 available signals, producing a total of 306 features per IMU. Training data are balanced to ensure the developed classifiers are unbiased. This is done by removing random observations of over-represented classes until all classes have an equal number of observations. For instance, if a labelled data set of squat repetitions has 50 ‘acceptable’ repetitions and 40 ‘aberrant’ repetitions, ten ‘acceptable’ repetitions, chosen randomly using a programmatic method, will not be used to train the classifier. Finally, the app builds random forests classifier objects with 400 trees. The choice of features computed, the balancing of training data and the use of a random forests classifier all replicate recently published work in the field (15–18). As with the signal processing and segmentation, these processes can be updated in future iterations of the app to match the emerging state of the art in exercise technique classification with IMUs. The developed classifier objects can then be exported from within the tablet app to individuals’ exercise biofeedback apps on their smartphones for use in monitoring their rehabilitation exercise programmes.
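As a rough illustration of the feature computation and class balancing steps, the sketch below computes a small subset of the listed time-domain features for one signal epoch and balances a labelled data set by random undersampling. It is an assumption-laden sketch rather than the app's code: the function names are hypothetical, 'Energy' is taken here to mean the sum of squared samples, and the remaining features and the random forests training step are not shown.

```python
import math
import random
import statistics


def basic_features(epoch):
    """Compute a subset of the time-domain features for one signal epoch."""
    n = len(epoch)
    sum_of_squares = sum(v * v for v in epoch)
    return {
        "mean": statistics.fmean(epoch),
        "rms": math.sqrt(sum_of_squares / n),
        "std": statistics.pstdev(epoch),
        "median": statistics.median(epoch),
        "range": max(epoch) - min(epoch),
        "max": max(epoch),
        "min": min(epoch),
        "energy": sum_of_squares,  # assumed definition: sum of squares
    }


def balance_by_undersampling(samples, labels, seed=0):
    """Randomly drop observations of over-represented classes.

    Returns a list of (sample, label) pairs in which every class has as
    many observations as the smallest class.
    """
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        for sample in rng.sample(group, target):
            balanced.append((sample, label))
    return balanced


# The example from the text: 50 'acceptable' and 40 'aberrant' squats.
reps = list(range(90))
labels = ["acceptable"] * 50 + ["aberrant"] * 40
balanced = balance_by_undersampling(reps, labels)
```

With the 50/40 split described above, ten randomly chosen ‘acceptable’ repetitions are left out, giving 40 observations per class.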

10.3.1. System Evaluation

10.3.1.0. Participants

Fifteen volunteers, not currently undergoing any rehabilitation, participated. No participant had a current or recent musculoskeletal injury that would impair their exercise performance. Participants were recruited via poster advertisements on notice boards in the local area and were therefore a sample of convenience. Five participants were beginner exercisers who had been screened to have naturally aberrant technique and were untrained in the exercises in the study. Ten participants were experienced with the exercises and were required to deliberately mimic aberrant technique at appropriate times during the experiment. Each participant signed a consent form prior to completing the study. The University College Dublin Human Research Ethics Committee approved the study protocol.

10.3.1.1. Experimental Protocol

The testing protocol was explained to participants upon their arrival at the research laboratory. Their gender was recorded and their weight was measured using a weighing scale. Height was then measured with a stadiometer. All participants completed a 5 minute warm-up on an exercise bike, during which they were required to maintain a power output of 100 W and a cadence of 75-85 revolutions per minute. Following the warm-up, an investigator placed a single IMU on the participant at the mid-point of the lateral aspect of the left femur (determined as half way between the greater trochanter and lateral femoral condyle). The orientation and location of the IMU were consistent across all study participants. The IMU sampling rate and sensor range settings used were identical to those described in the ‘overview of tool’ section. Video and IMU data were then simultaneously collected as the participant completed the following four exercises: bodyweight left leg single leg squats, bodyweight lunges, bodyweight or barbell squats and barbell deadlifts. These exercises were chosen pragmatically as they represent compound lower limb exercises (hip, knee and ankle) which span both the late-stage rehabilitation and S&C domains. They also cannot easily be analysed by any existing systems. 40 repetitions of each exercise were collected. 20 repetitions were completed with ‘acceptable’ form and 20 repetitions were completed with ‘aberrant’ form. The ‘aberrant’ repetitions from the five beginners were naturally occurring, whereas the ten experienced participants deliberately induced their ‘aberrant’ form. Following this data collection, the IMU was removed from the participant’s left thigh. As the participant rested, the exercise professional then used the segmented videos to label all exercise repetitions of the four exercises as being ‘acceptable’ or ‘aberrant’ technique (160 repetitions per participant). Four binary random forests classifiers were then created for each participant, each pertaining to one of the four aforementioned exercises. These random forests objects were imported into a biofeedback application. The data labelling and classifier creation took a maximum of 30 minutes per participant. The biofeedback application, titled ‘Formulift’ (Figure 10.6), allows a person exercising to connect to a Shimmer IMU, select each of the above exercises and have their repetitions of each exercise classified as ‘acceptable’ or ‘aberrant’.

Figure 10.6: Screenshot from the 'Formulift' app, which uses the classifiers developed with the tablet app to analyse whether a person's exercise technique is acceptable or aberrant as they complete squats, deadlifts, lunges and single leg squats.

Following the creation of their personalised biofeedback system, the participant first secured the IMU to their left thigh themselves and connected the wireless Shimmer IMU to the mobile application. These steps took roughly one minute. They then completed two sets of ten repetitions for each of the four exercises. Upon completion of each repetition of each exercise the smartphone vibrated to provide simple vibrotactile feedback to the user that the system was functioning. In the first set of each exercise they were instructed to exercise with their best possible technique and in the second they were asked to try to replicate the mistake they had made prior to the exercise professional’s coaching. The whole session was simultaneously videoed and the classifier’s predictions of the participant’s technique were stored in the background storage folders on the tablet.

10.3.1.2. Data Analyses

Following the participants’ use of their personalised biofeedback app, the system’s predicted labels (acceptable or aberrant) for each repetition of each exercise were stored. The videos of each repetition of each exercise were then labelled by an S&C coach with more than five years’ experience in visual analysis of the exercises. They were labelled as acceptable or aberrant in a systematic format. The S&C coach could view the repetitions as many times as necessary to make a clear judgement on the label. Labelling all data for each beginner participant took under 25 minutes and was quicker for the experienced participants as their aberrant form was deliberately induced. Example types of aberrant form which the exercise professional was looking for included knee valgus, knee varus and asymmetry, as used in similar recent research (16-19). The personalised classifiers’ predicted labels were then compared to the exercise professional’s labels, which were considered as ground truth for each repetition of each exercise from each participant. Where the exercise professional had labelled a repetition as ‘acceptable’ and the classifier predicted ‘acceptable’ this was counted as a true positive (TP).
However, if the classifier predicted ‘aberrant’ in this circumstance, a false negative (FN) was counted. If the exercise professional and classifier both deemed a repetition to be ‘aberrant’, it was counted as a true negative (TN). However, if the exercise professional deemed a repetition to be ‘aberrant’ and the classifier predicted it as ‘acceptable’ this was counted as a false positive (FP). The scores used to measure the quality of classification were total accuracy, sensitivity and specificity. Accuracy is the number of correctly classified repetitions of all the exercises divided by the total number of repetitions completed; this is calculated as the sum of the true positives (TP) and true negatives (TN) divided by the sum of the true positives, false positives (FP), true negatives and false negatives (FN):

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{1}$$

Sensitivity and specificity were computed using the formulas below:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{2}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{3}$$

Sensitivity measures the effectiveness of a classifier at identifying the desired (positive) label, here ‘acceptable’ technique, while specificity measures the classifier’s ability to detect the negative label, here ‘aberrant’ technique. These three metrics were used to assess the classification quality for each individual participant for each of the four exercises completed.
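The three scores can be illustrated with a short Python sketch. It follows the convention stated in the text, in which ‘acceptable’ is the positive class and the exercise professional's labels are the ground truth; the function name `classification_scores` and the example data are hypothetical.

```python
def classification_scores(true_labels, predicted_labels, positive="acceptable"):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    tp = fp = tn = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive and predicted == positive:
            tp += 1  # acceptable repetition classified as acceptable
        elif truth == positive:
            fn += 1  # acceptable repetition classified as aberrant
        elif predicted == positive:
            fp += 1  # aberrant repetition classified as acceptable
        else:
            tn += 1  # aberrant repetition classified as aberrant

    def ratio(numerator, denominator):
        # Undefined scores (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical set of 20 repetitions: 10 acceptable, 10 aberrant.
truth = ["acceptable"] * 10 + ["aberrant"] * 10
predicted = (["acceptable"] * 9 + ["aberrant"] * 1
             + ["aberrant"] * 8 + ["acceptable"] * 2)
accuracy, sensitivity, specificity = classification_scores(truth, predicted)
```

In this made-up example 17 of 20 repetitions are classified correctly, so accuracy is 0.85, with sensitivity 0.9 (9 of 10 acceptable repetitions) and specificity 0.8 (8 of 10 aberrant repetitions).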

10.4. Results

10.4.0. Participant Demographics

The demographics of the participants were as follows: 12 males, 3 females; age: 23.8 +/- 1.8 years; height: 1.79 +/- 0.07 m; body mass: 78.4 +/- 9.6 kg.

10.4.1. System Evaluation Results

Table 10.1 shows the mean accuracy, sensitivity and specificity scores for all participants using their personalised classifiers for each of the four exercises under study in the real-world evaluation, as described in the ‘system evaluation’ section. The mean results for the five beginner participants, who had naturally aberrant technique, are shown as well as those from the more experienced participants, who had deliberately induced technique aberrations. The system was more accurate for the experienced exercisers group (98.59%) than the beginners group (88.00%) for the deadlift exercise but was otherwise more accurate for the beginners. This is particularly interesting as the beginners’ technique aberrations were naturally occurring and the experienced group’s aberrations were deliberately induced. The system was least accurate for lunges (84.14%) and most accurate for single leg squats (97.26%) across all participants. Accuracy varied considerably between individuals in the lunge and squat exercises, as can be seen in the presented standard deviations (Table 10.1). The accuracies across all participants were less variable for the single leg squat and deadlift exercises. For the single leg squat exercise the mean sensitivity was 98% and the mean specificity was 93%. This means the system was better at detecting acceptable single leg squat technique than aberrant technique: roughly 7% of aberrant repetitions were misclassified as acceptable. The system had relatively similar sensitivity and specificity in classifying lunges and deadlifts. Therefore, it would not appear biased towards one class to an exerciser using the system. However, for the squat exercise there is roughly a 13% chance that an acceptable repetition may be classified as aberrant and a 17% chance that an aberrant repetition may be classified as acceptable.

Table 10.1: Mean (S.D.) accuracy, sensitivity and specificity of personalised classifiers for the binary evaluation (acceptable or aberrant technique) of each exercise, by participant group.

| Exercise | Participants | Accuracy (%) Mean (S.D.) | Sensitivity (%) Mean (S.D.) | Specificity (%) Mean (S.D.) |
|---|---|---|---|---|
| Single leg squats | Beginners (n=5) | 99.17 (1.86) | 100.00 (0.00) | 98.33 (3.73) |
| Single leg squats | Experienced (n=10) | 95.98 (6.69) | 97.00 (4.83) | 90.41 (15.24) |
| Single leg squats | All (n=15) | 97.26 (5.54) | 98.00 (4.00) | 93.03 (19.09) |
| Lunges | Beginners (n=5) | 92.63 (10.50) | 96.67 (7.45) | 88.70 (16.36) |
| Lunges | Experienced (n=10) | 77.77 (21.26) | 74.07 (3.19) | 83.82 (32.17) |
| Lunges | All (n=15) | 84.14 (18.96) | 83.11 (27.49) | 85.78 (20.85) |
| Squats | Beginners (n=5) | 84.83 (16.58) | 75.00 (35.47) | 95.00 (5.00) |
| Squats | Experienced (n=10) | 82.71 (15.43) | 90.98 (15.25) | 74.44 (32.01) |
| Squats | All (n=15) | 84.53 (16.38) | 87.06 (27.53) | 82.67 (29.00) |
| Deadlifts | Beginners (n=5) | 88.00 (8.16) | 84.00 (16.25) | 90.00 (2.00) |
| Deadlifts | Experienced (n=10) | 98.59 (2.71) | 98.15 (3.55) | 98.99 (2.86) |
| Deadlifts | All (n=15) | 94.81 (7.93) | 93.10 (13.35) | 95.78 (14.35) |

230
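The misclassification percentages quoted in the results follow directly from the sensitivity and specificity columns of Table 10.1. The short sketch below reproduces that arithmetic for the "All (n=15)" rows. It treats the reported means as if they were pooled rates, which is an approximation, and the function name `error_rates` is illustrative.

```python
# Mean (sensitivity %, specificity %) for all participants, from Table 10.1.
table_all = {
    "single leg squat": (98.00, 93.03),
    "lunge": (83.11, 85.78),
    "squat": (87.06, 82.67),
    "deadlift": (93.10, 95.78),
}


def error_rates(sensitivity, specificity):
    """Return (% acceptable reps flagged aberrant, % aberrant reps passed as acceptable)."""
    return round(100 - sensitivity, 2), round(100 - specificity, 2)


for exercise, (sens, spec) in table_all.items():
    acceptable_missed, aberrant_missed = error_rates(sens, spec)
    print(f"{exercise}: {acceptable_missed}% of acceptable reps flagged, "
          f"{aberrant_missed}% of aberrant reps passed")
```

For the squat this gives roughly 13% and 17%, and for the single leg squat roughly 2% and 7%, matching the figures discussed in the text.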

10.5. Discussion

10.5.0. System Development

The tool described in this chapter successfully automates the process of creating personalised IMU based exercise technique classification systems. The previously laborious sequence of data collection, data labelling and data analysis in software such as MATLAB (36) has been streamlined into an Android tablet app which can be used by an exercise professional. The app eliminates the need for a data analysis professional to develop the classification systems by automating the common steps in the development of such systems (Figure 10.1). A key benefit of this tool for exercise professionals is that it allows rapid development of personalised exercise feedback systems tailored to their client's exercise needs and specific movement patterns. There are a number of notable benefits to taking an individualised approach to the development of IMU based exercise technique analysis systems. Recent work has shown such systems to be more accurate and computationally efficient than global classifiers (27). The development of global classifiers is extremely time intensive and requires hundreds of hours of data collection and analysis by researchers. Data must be collected in this fashion for every exercise for which a technique classifier is desired. This means that, currently, only a handful of exercises have been shown to be assessable with IMUs. The system described in this chapter should allow for the creation of a personalised exercise classifier for any rehabilitation or strength and conditioning exercise which is cyclical and repetition based. Therefore, clinicians would not be limited in their exercise choices when designing specific programmes to meet their clients' needs.
The app described in this chapter could conceivably be used by a clinician during a patient's visit to their clinic. The data labelled from this session could then be used to create a functioning analysis tool for the patient's programme, which they may complete in the absence of professional supervision.

10.5.1. System Evaluation

The preliminary evaluation of the system also suggests that the accuracy, sensitivity and specificity of the personalised exercise technique classifiers may exceed those of global exercise technique classification systems. This reflects other similar research which compared sensor set-ups and classification methodologies for the barbell squat and deadlift exercises (27). Whilst it is difficult to make direct comparisons to previous research, it can be noted that a single IMU positioned on the left thigh has been demonstrated as capable of assessing acceptable or aberrant lunge technique with 77% accuracy (17) and single leg squat technique with 75% accuracy (18). These values were computed using leave one subject out cross validation (LOSOCV). The personalised systems, evaluated in the real world, achieved 84% and 97% accuracy for the same analysis of lunges and single leg squats respectively. The binary classification of squat technique has previously been shown to be 80% accurate in a global classification system utilising a single lumbar worn IMU (16). The individualised systems described in this chapter ranged from 50% to 100% accuracy and had a mean value of 85% across the 15 participants. It can also be noted that the deviations collected from the five-participant beginner group and used for analysis in this chapter were naturally occurring, whereas in the aforementioned lunge and squat global classifiers, the deviations from correct technique were deliberately induced by study participants. This may make individualised classifiers more functional and usable in the real world. This chapter's deadlift accuracy result of 95% exceeds recently published work on binary classification of the deadlift with a left thigh IMU, where 84% accuracy was achieved (27). This is likely because there was more training data for each individual in this study. The personalised classification systems used in this preliminary evaluation of the tablet app were developed using four sets of each exercise (a total of 40 repetitions). Increasing the amount of training data used for each individual would likely further improve the accuracy of their personalised exercise technique evaluation system (24,25).
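The difference between the two evaluation schemes compared above can be sketched in terms of how repetitions are split into training and test sets. In LOSOCV, a global classifier is trained on every other subject and tested on the held-out one. A personalised classifier is trained and tested only on one subject's own repetitions. The helper names and the toy subject list below are hypothetical and are not taken from the study's software.

```python
def loso_splits(subject_ids):
    """Yield (held_out, train_idx, test_idx) for leave-one-subject-out CV."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def personalised_split(subject_ids, subject, n_train):
    """Train on one subject's first n_train repetitions, test on the rest."""
    own = [i for i, s in enumerate(subject_ids) if s == subject]
    return own[:n_train], own[n_train:]


# Hypothetical data set: 3 subjects with 4 labelled repetitions each.
subjects = ["A"] * 4 + ["B"] * 4 + ["C"] * 4

for held_out, train, test in loso_splits(subjects):
    print(held_out, len(train), len(test))   # 8 training and 4 test reps per fold

print(personalised_split(subjects, "B", n_train=3))   # ([4, 5, 6], [7])
```

Because the two schemes test on different data under different conditions, accuracies obtained with them are indicative rather than directly comparable, as the text notes.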

10.5.2. Limitations

There are a number of contextual factors to this study which should be considered. Most notably, whilst the tool described allows for the efficient creation of an IMU based exercise technique classifier for any cyclical, repetition-based exercise, it is not as simple as using a global classification system for exercises for which one exists. The tool requires at least one recorded session with an exercise professional and requires the exercise professional's time and expertise to label the video data. However, the tool could conceivably be used to fill the gaps in a client's exercise programme where a global classifier is not yet available. Moreover, the labelled data can all be stored in a database, and the data initially used to create individualised classifiers can be pooled to build a global classifier. The exercise professional could switch to this global classifier when they deem it accurate enough to negate the benefits of creating an individualised classifier for each of their clients.

A key limitation of the evaluation study is that it was small scale and the participants were not balanced in experience or gender. Moreover, the study participants were relatively homogeneous and it is not yet understood whether the results would generalise to other populations such as older, obese or underweight people. In particular, the system evaluation was completed with individuals not currently undergoing rehabilitation. Future work should investigate the system with individuals undergoing rehabilitation. It is foreseen that the personalised classifier should still work provided the exercise professional can label the data appropriately for each individual's needs. The authors also acknowledge that more work is required to assess the capabilities of classifiers created with this new tool, particularly in the detection of exact deviations in exercise technique. The capabilities of a multiple IMU set-up must also be examined. However, the results presented show excellent potential for a single IMU set-up to assess complex, compound lower limb exercises when using personalised classifiers.

It should be noted that this chapter only describes the development of this new tool and its first evaluation. It is not yet fully understood how it will be incorporated into clinical practice. Future work should investigate the influence of the exercise professional's experience on system accuracy and usability, and how the system can be incorporated into a clinician's use of time. Only one exercise professional labelled the data in the evaluation study. The coding was not compared with that of other professionals and this is necessary future work. Finally, the tool described only replicates the current state of the art in the field, and the signal processing, feature computation and classification methods ought to be iterated as the field progresses.
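One way the proposed comparison of labels between exercise professionals could be quantified is with a chance-corrected agreement statistic. The sketch below uses Cohen's kappa, which is an assumption on our part: the chapter does not name a specific agreement measure. The two label lists are hypothetical and the function name `cohens_kappa` is illustrative.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same repetitions."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of repetitions")
    n = len(labels_a)
    # Observed proportion of repetitions on which the raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:  # both raters used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two professionals for the same 10 repetitions.
rater_1 = ["acceptable"] * 6 + ["aberrant"] * 4
rater_2 = ["acceptable"] * 5 + ["aberrant"] * 5
print(round(cohens_kappa(rater_1, rater_2), 2))
```

In this toy case the raters agree on 9 of 10 repetitions and chance agreement is 0.5, so kappa is about 0.8.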

10.5.3. Conclusions

In this chapter, a tablet application that streamlines the creation of IMU based exercise technique analysis systems is presented. The tool replicates the data analysis pathways that have been used in recently published research (16-19). It also allows an exercise professional to record video data simultaneously with IMU data and label it efficiently following a session with a client. The app then creates personalised exercise technique classifiers for the client based on the labelled IMU data. These personalised classifiers are less memory intensive and more accurate than equivalent global classifiers for the exercises used in this study. In addition, data collected with the tool could ultimately be used to train new global classification systems with increased accuracy due to the increased amount of training data available.

10.6. References

1. Health Quality Ontario. Physiotherapy rehabilitation after total knee or hip replacement: an evidence-based analysis. Ontario Health Technology Assessment Series. 2005;5(8):1–91.

2. Hernández-Molina G, Reichenbach S, Zhang B, LaValley M, Felson DT. Effect of therapeutic exercise for hip osteoarthritis pain: Results of a meta-analysis. Arthritis Care & Research. 2008 Sep 15;59(9):1221-8.

3. Zhang W, Doherty M, Bardin T, Pascual E, Barskova V, Conaghan P, Gerster J, Jacobs J, Leeb B, Liote F, McCarthy G. EULAR evidence based recommendations for gout. Part II: Management. Report of a task force of the EULAR Standing Committee for International Clinical Studies Including Therapeutics (ESCISIT). Annals of the rheumatic diseases. 2006 Oct 1;65(10):1312-24.

4. Frontera WR, Meredith CN, O'Reilly KP, Knuttgen HG, Evans WJ. Strength conditioning in older men: skeletal muscle hypertrophy and improved function. Journal of applied physiology. 1988 Mar 1;64(3):1038-44.

5. Ahtiainen JP, Pakarinen A, Alen M, Kraemer WJ, Häkkinen K. Muscle hypertrophy, hormonal adaptations and strength development during strength training in strength-trained and untrained men. European journal of applied physiology. 2003 Aug 7;89(6):555-63.

6. Kraemer WJ, Mazzetti SA, Nindl BC, Gotshalk LA, Volek JS, Bush JA, Marx JO, Dohi K, Gómez AL, Miles M, Fleck SJ. Effect of resistance training on women’s strength/power and occupational performances. Medicine & Science in Sports & Exercise. 2001 Jun 1;33(6):1011-25.

7. Bassett SF. The assessment of patient adherence to physiotherapy rehabilitation. New Zealand journal of physiotherapy. 2003 Jul;31(2):60-6.

8. Friedrich M, Cermak T, Maderbacher P. The effect of brochure use versus therapist teaching on patients performing therapeutic exercise and on changes in impairment status. Physical Therapy. 1996 Oct 1;76(10):1082-8.

9. Giggins OM, Persson UM, Caulfield B. Biofeedback in rehabilitation. Journal of neuroengineering and rehabilitation. 2013 Jun 18;10(1):60.

10. McGrath D, Greene BR, O’Donovan KJ, Caulfield B. Gyroscope-based assessment of temporal gait parameters during treadmill walking and running. Sports Engineering. 2012 Dec 1:1-7.

11. Chang KH, Chen MY, Canny J. Tracking free-weight exercises. In: Proceedings of the 9th International Conference on Ubiquitous Computing (UbiComp); 2007. Sep 16; Innsbruck, Austria. Berlin, Germany: Springer; 2007. p. 19-37.

12. Fitzgerald D, Foody J, Kelly D, Ward T, Markham C, McDonald J, Caulfield B. Development of a wearable motion capture suit and virtual reality biofeedback system for the instruction and analysis of sports rehabilitation exercises. In: Proceedings of the 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society; 2007. Aug 22; Lyon, France. N.Y., U.S.A.: IEEE; 2007. p. 4870-74.

13. Seeger C, Buchmann A, Van Laerhoven K. myHealthAssistant: a phone-based body sensor network that captures the wearer's exercises throughout the day. In: Proceedings of the 6th International Conference on Body Area Networks; 2011. Nov 7; Beijing, China. Brussels, Belgium: ICST; 2011. p. 1-7.

14. Morris D, Saponas TS, Guillory A, Kelner I. RecoFit: using a wearable sensor to find, recognize, and count repetitive exercises. In: Proceedings of the 32nd annual ACM conference on Human factors in computing systems; 2014 Apr 26; Toronto, Canada. N.Y., U.S.A.: ACM; 2014. p. 3225-3234.

15. Giggins OM, Sweeney KT, Caulfield B. Rehabilitation exercise assessment using inertial sensors: a cross-sectional analytical study. Journal of neuroengineering and rehabilitation. 2014 Nov 27;11(1):158.

16. O’Reilly M, Whelan D, Chanialidis C, Friel N, Delahunt E, Ward T, et al. Evaluating squat performance with a single inertial measurement unit. In: Proceedings of the 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN); 2015. Jun 9; Boston, M.A., U.S.A. N.Y., U.S.A.: IEEE; 2015. p. 1-6.

17. Whelan D, O'Reilly M, Ward T, Delahunt E, Caulfield B. Evaluating Performance of the Lunge Exercise with Multiple and Individual Inertial Measurement Units. In: Proceedings of the 10th EAI International Conference on Pervasive Computing Technologies for Healthcare, Pervasive Health; 2016. May 16; Cancun, Mexico. N.Y., U.S.A.: ACM; 2016. p. 101-8.

18. Whelan D, O'Reilly M, Ward T, Delahunt E, Caulfield B. Technology in Rehabilitation: Evaluating the Single Leg Squat Exercise with Wearable Inertial Measurement Units. Methods of Information in Medicine. 2016;56(2):88-94.

19. Whelan D, O’Reilly M, Ward T, Delahunt E, Caulfield B. Evaluating Performance of the Single Leg Squat Exercise with a Single Inertial Measurement Unit. In: Proceedings of the 3rd Workshop on ICTs for improving Patients Rehabilitation Research Techniques, (REHAB); 2015. Oct 1; Lisbon, Portugal. N.Y., U.S.A.: ACM; 2015. p. 144–7.

20. Kianifar R, Lee A, Raina S, Kulić D. Classification of squat quality with inertial measurement units in the single leg squat mobility test. In: Proceedings of the IEEE 38th Annual International Conference of the Engineering in Medicine and Biology Society (EMBC); 2016. Aug 26; Orlando, F.L., U.S.A. N.Y., U.S.A.: IEEE; 2016. p. 6273-76.

21. Giggins O, Kelly D, Caulfield B. Evaluating rehabilitation exercise performance using a single inertial measurement unit. In: Proceedings of the 7th International Conference on Pervasive Computing Technologies for Healthcare; 2013. May 5; Venice, Italy. N.Y., U.S.A.: IEEE; 2013. p. 49-56.

22. Johnston W, O'Reilly M, Dolan K, Reid N, Coughlan G, Caulfield B. Objective classification of dynamic balance using a single wearable sensor. In: Proceedings of the 4th International Congress on Sport Sciences Research and Technology Support; 2016. Nov 7; Porto, Portugal. Setubal, Portugal: SCITEPRESS–Science and Technology Publications; 2016. p. 15-24.

23. He H, Garcia EA. Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering. 2009;21(9):1263-84.

24. Chawla NV. Data mining for imbalanced datasets: An overview. Data mining and knowledge discovery handbook. N.Y.C., N.Y., U.S.A.: Springer; 2005. p. 853-67.

25. Kotsiantis SB, Zaharakis I, Pintelas P. Supervised machine learning: A review of classification techniques. Emerging Artificial Intelligence Applications in Computer Engineering: Real Word AI Systems with Applications in EHealth, HCI, Information Retrieval and Pervasive Technologies: IOS Press; 2007. p. 3-25.

26. Taylor PE, Almeida GJ, Hodgins JK, Kanade T. Multi-label classification for the analysis of human motion quality. In: Proceedings of the 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); 2012. Aug 28; San Diego, C.A., U.S.A. N.Y., U.S.A.: IEEE; 2012. p. 2214-2218.

27. O'Reilly MA, Whelan DF, Ward TE, Delahunt E, Caulfield BM. Classification of deadlift biomechanics with wearable inertial measurement units. Journal of biomechanics. 2017 Jun 14;58:155-61.

28. Whelan D, O'Reilly M, Huang B, Giggins O, Kechadi T, Caulfield B. Leveraging IMU data for accurate exercise performance classification and musculoskeletal injury risk screening. In: Proceedings of the IEEE 38th Annual International Conference of the Engineering in Medicine and Biology Society (EMBC); 2016. Aug 26; Orlando, F.L., U.S.A. N.Y., U.S.A.: IEEE; 2016. p. 659-62.

29. Burns A, Greene BR, McGrath MJ, O'Shea TJ, Kuris B, Ayer SM, Stroiescu F, Cionca V. SHIMMER™–A wireless sensor platform for noninvasive biomedical research. IEEE Sensors Journal. 2010 Sep;10(9):1527-34.

30. Madgwick SO, Harrison AJ, Vaidyanathan R. Estimation of IMU and MARG orientation using a gradient descent algorithm. In: Proceedings of the 12th International Conference on Rehabilitation Robotics (ICORR); 2011. Jun 29; Zurich, Switzerland. N.Y., U.S.A.: IEEE; 2011. p. 1-7.

31. Shatkay H, Zdonik SB. Approximate queries and representations for large data sequences. In: Proceedings of the Twelfth International Conference on Data Engineering; 1996. Feb 26; New Orleans, L.A., U.S.A. N.Y., U.S.A.: IEEE; 1996. p. 536-545.

32. Duda RO, Hart PE, Stork DG. Pattern classification. Wiley, New York; 1973 Nov.

33. Pomplun M, Mataric MJ. Evaluation metrics and results of human arm movement imitation. In: Proceedings of the 1st IEEE-RAS International Conference on Humanoid Robotics (Humanoids); 2000. Sep 7; Boston, M.A., U.S.A. N.Y., U.S.A.: IEEE; 2000. p. 7-8.

34. Lin JF, Kulic D. Online segmentation of human motion for automated rehabilitation exercise analysis. IEEE Transactions on Neural Systems and Rehabilitation Engineering. 2014 Jan;22(1):168-80.

35. Katz MJ, George EB. Fractals and the analysis of growth paths. Bulletin of mathematical biology. 1985 Jan 1;47(2):273-86.

36. Matlab. The Mathworks. Natick, U.S.A. 2012. R2012b.

Chapter 11

Feature-free Activity Classification of Inertial Sensor Data with Machine Vision Techniques: Method, Development and Evaluation

This chapter is based on the following paper which is published in JMIR mHealth and uHealth: Dominguez Veiga J, O'Reilly MA, Whelan DF, Caulfield B, Ward TE. Feature-free Activity Classification of Inertial Sensor Data with Machine Vision Techniques: Method, Development and Evaluation. JMIR mHealth & uHealth. 2017 Jul 15; (online).

*As stated in the preliminaries of this thesis, this work was completed with José Dominguez Veiga who worked collaboratively on the manuscript writing, data analysis, results and discussion presented in this chapter. In particular, José implemented all work pertaining to the presented convolutional neural network.

11.0. Abstract

Background: Inertial sensors are one of the most commonly used sources of data for Human Activity Recognition (HAR) and Exercise Detection (ED) tasks. The time series produced by these sensors are generally analysed through numerical methods. Machine learning techniques such as random forests or support vector machines are popular in this field for classification efforts, but they need to be supported through the isolation of a potentially large number of additionally crafted features derived from the raw data. This feature pre-processing step can involve non-trivial digital signal processing (DSP) techniques. However, in many cases, the researchers interested in this type of activity recognition problem do not possess the necessary technical background for this feature set development.

Objective: This paper presents a novel application of established machine vision methods to provide interested researchers with an easier entry path into the HAR and ED fields. This is achieved by removing the need for deep DSP skills through the use of Transfer Learning, in which a pre-trained Convolutional Neural Network (CNN) developed for machine vision purposes is reused for the exercise classification effort. The new method should simply require researchers to generate plots of the signals that they would like to build classifiers with, store them as images, and then place them in folders according to their training label before retraining the network.

Methods: We apply an established machine vision technique, the CNN, to the task of ED. TensorFlow, a high-level framework for machine learning, is used to facilitate infrastructure needs. Simple time series plots generated directly from accelerometer and gyroscope signals are used to retrain an openly available online neural network (Inception) originally developed for machine vision tasks. Data were collected from 82 healthy volunteers performing 5 different exercises while wearing a lumbar-worn IMU. The ability of the proposed method to automatically classify the exercise being completed was assessed using this data set. For comparative purposes, classification using the same data set was also performed using the more conventional approach of feature extraction and classification using Random Forest classifiers.

Results: With the collected data set and the proposed method, the different exercises could be recognised with 95.9% accuracy, which is competitive with current state of the art techniques in ED.

Conclusions: The high level of accuracy attained with the proposed approach indicates that the waveform morphologies in the time-series plots for each of the exercises are sufficiently distinct among the participants to allow the use of machine vision approaches. The use of high-level machine learning frameworks, coupled with the novel use of machine vision techniques instead of complex manually crafted features, may facilitate access to research in the HAR field for individuals without extensive digital signal processing or machine learning backgrounds.

11.1. Introduction

11.1.0. Background

Inertial sensors are ubiquitous in everyday objects such as smartphones and wristbands, and can provide large amounts of data regarding movement activity. Analysis of such data can be diverse, but in general terms it can be characterised as complex operations using a broad range of machine learning techniques and highly sophisticated signal processing methods. The latter are required in order to extract salient features that can improve recognition performance. These features are not only complex to calculate; it is also difficult to make a priori reasoned arguments about their effectiveness in improving overall results. The temptation to include additional features in an attempt to improve classification accuracy may result in pipelines (infrastructure) with excessive complexity, yielding slower processing and increased resource usage. To counter this proliferation of features, it is common to use dimensionality reduction techniques, including linear approaches such as Principal Component Analysis and, increasingly, nonlinear methods principally based on manifold learning algorithms. In contrast to this complex tooling, we propose a method to classify human activity from inertial sensor data based on images and using deep learning based machine vision techniques. This approach reduces the amount of deep domain knowledge needed in terms of DSP down to some basic steps of pre-processing and segmentation, substituting instead a neural network that can learn the appropriate features independent of a user-driven feature candidature step. Convolutional networks are not trivial to work with, but the recent availability of higher level deep learning frameworks such as TensorFlow (1) and the use of Transfer Learning, a technique to re-use already trained convolutional neural networks, considerably reduce the skills needed to set up and operate such a network.
In this study, we sought to demonstrate a novel application of machine vision techniques as a classification method for IMU data. The main goal of this work was to develop a novel data analysis pathway for researchers who are most interested in this type of work such as medical and exercise professionals. These individuals may not have the technical background to implement existing state of the art data analysis pathways. We also aimed to evaluate the efficacy of our new classification technique by attempting to detect 5 commonly completed lower limb exercises (squats, deadlifts, lunges, single leg squats and tuck jumps) using the new data analysis pathway. The accuracy, sensitivity and specificity of the pathway were compared to recently published work on the same data set.
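The data preparation step of this pathway, as outlined in the abstract, amounts to turning each segmented signal into an image and filing it in a folder named after its training label. The sketch below illustrates that idea using only the Python standard library. It is not the authors' plotting code: it draws a bare trace into a plain-text PGM image rather than producing a full time series plot, and the function names, image size, file name and synthetic sine-wave signal are all assumptions made for illustration.

```python
import math
import tempfile
from pathlib import Path


def rasterise(signal, width=64, height=32):
    """Draw a 1-D signal as white pixels (255) on a black (0) height x width grid."""
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat signal
    grid = [[0] * width for _ in range(height)]
    for col in range(width):
        # Nearest-sample resampling of the signal onto the image columns.
        sample = signal[col * (len(signal) - 1) // (width - 1)]
        row = round((hi - sample) / span * (height - 1))  # top row = maximum value
        grid[row][col] = 255
    return grid


def save_pgm(grid, path):
    """Write the grid as a plain-text PGM (P2) greyscale image."""
    path.parent.mkdir(parents=True, exist_ok=True)
    rows = [" ".join(map(str, r)) for r in grid]
    path.write_text(f"P2\n{len(grid[0])} {len(grid)}\n255\n" + "\n".join(rows) + "\n")


# Hypothetical repetition: one sine period standing in for a gyroscope trace.
signal = [math.sin(2 * math.pi * t / 99) for t in range(100)]
root = Path(tempfile.mkdtemp())
out = root / "squat" / "rep_001.pgm"  # the folder name doubles as the training label
save_pgm(rasterise(signal), out)
print(out.parent.name, out.exists())
```

In practice a plotting library and a common image format would be used instead, but the folder-per-label layout is the part that matters for retraining.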

11.1.1. Related Work

The three main topics in this section are as follows: 1) a brief overview of the current HAR and ED literature; 2) an account of some of the newer advances in the field that are using neural networks for certain parts of the feature discovery and reduction process; 3) an introduction to Transfer Learning, highlighting its benefits in terms of time and resource savings, and working with smaller data sets.

11.1.1.0. Activity Classification for Inertial Sensor Data

Over the past 15 years, inertial sensors have become increasingly ubiquitous due to their presence in smartphones and wearable activity trackers (2). This has enabled countless applications in the monitoring of human activity and performance, spanning general human activity recognition (HAR), gait analysis, the military field, the medical field and exercise recognition and analysis (3)–(6). Across all these application spaces there are common challenges which must be overcome, and common steps which must be implemented, to successfully create functional motion classification systems. Human activity recognition with wearable sensors usually pertains to the detection of gross motor movements such as walking, jogging, cycling, swimming and sleeping (5), (7). In this field of motion tracking with inertial sensors the key challenges are often considered to be: (a) the selection of the attributes to be measured, (b) the construction of a portable, unobtrusive, and inexpensive data acquisition system, (c) the design of feature extraction and inference methods, (d) the collection of data under realistic conditions, (e) the flexibility to support new users without the need for re-training the system, and (f) the implementation in mobile devices meeting energy and processing requirements (3), (7). With the ever-increasing computational power and battery life of mobile devices, many of these challenges are becoming easier to overcome.
While system functionality is dependent on hardware constraints, the accuracy, sensitivity and specificity of HAR systems are most reliant on building large, balanced, labelled data sets, the identification of strong features for classification and the selection of the best machine learning method for each application (3), (8)–(10). Investigating the best features and machine learning methods for each HAR application requires an individual or team appropriately skilled in signal processing and machine learning, and a large amount of time. They must understand how to compute time-domain, frequency-domain and time-frequency domain features from inertial sensor data and train and evaluate multiple machine learning methods (e.g. random forests (11), support vector machines (12), K-Nearest-Neighbours (13), Logistic Regression (14)) with such features (3)–(5). This means that those who may be most interested in the output of inertial sensor based activity recognition systems (e.g. medical professionals, exercise professionals, biomechanists) are unable to design and create the systems without significant engagement with machine learning experts (4). The above challenges in system design and implementation are replicated in activity recognition pertaining to more specific or acute movements. In the past decade there has been a vast amount of work in the detection and quantification of specific rehabilitation and strength and conditioning exercises (15)–(17). Such work has also endeavoured to detect aberrant exercise technique and specific mistakes that system users make while exercising, which can increase their chance of injury or decrease their body's beneficial adaptation to the stimulus of exercise (17), (18). The key steps in the development of such systems have been recently outlined as: (1) inertial sensor data collection, (2) data pre-processing, (3) feature extraction and (4) classification (Figure 11.1) (4). Whilst the first step can generally be completed by exercise professionals (e.g. physiotherapists and strength and conditioning coaches), the remaining steps require skills outside those included in the training of such experts. Similarly, when analysing gait with wearable sensors, feature extraction and classification have been highlighted as essential in the development of each application (19), (20). This again limits the type of professional who can create such systems and the rate at which hypotheses for new systems can be tested.
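To give a sense of what the feature extraction step (3) involves, the sketch below computes a handful of simple time-domain descriptors for each sensor axis of one segmented repetition and flattens them into a single feature vector for a classifier. The particular features (mean, standard deviation, RMS, range), the axis names and the sample values are illustrative assumptions rather than the feature set used in the cited studies, which typically also include frequency-domain and time-frequency features.

```python
import math
import statistics


def time_domain_features(window):
    """Return a few simple time-domain descriptors of one signal window."""
    return {
        "mean": statistics.fmean(window),
        "std": statistics.pstdev(window),  # population standard deviation
        "rms": math.sqrt(sum(x * x for x in window) / len(window)),
        "range": max(window) - min(window),
    }


def extract_features(axes):
    """Flatten per-axis features into one named feature vector."""
    return {
        f"{axis}_{name}": value
        for axis, window in axes.items()
        for name, value in time_domain_features(window).items()
    }


# Hypothetical segmented repetition (illustrative values, not study data).
repetition = {
    "accel_x": [0.0, 1.0, 2.0, 1.0, 0.0],
    "gyro_y": [-1.0, 0.0, 1.0, 0.0, -1.0],
}
features = extract_features(repetition)
print(len(features), features["accel_x_range"])
```

Even this small example yields 8 features from 2 axes, which hints at how quickly hand-crafted feature sets grow once more axes, sensors and feature types are added.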

Figure 11.1: Steps involved in the development of an IMU based exercise classification system (4).

11.1.1.1. Neural Networks and Activity Recognition

In the past few years, convolutional neural networks (CNNs) have been applied in a variety of manners to HAR, in both the fields of ambient and wearable sensing. Mo, Li and Zhu applied a novel approach utilising machine vision methods to recognise 12 daily living tasks with the Microsoft Kinect™. Rather than extract features from the Kinect data streams, they developed 144*48 images using 48 successive frames from skeleton data, with 15*3 joint position coordinates and 11*3*3 joint rotation matrices per frame. These images were then used as input to a multilayer CNN which automatically extracted features from the images, and these features were fed into a multilayer perceptron for classification (21). Stefic and Patras utilised CNNs to extract areas of gaze fixation in raw image training data as participants watched videos of multiple activities (22). This produced strong results in identifying salient regions of images which were then used for action recognition. Ma et al. also combined a variety of CNNs to complete tasks such as segmenting hands and objects from first-person camera images and then using these segmented images and motion images to train an action based and motion based CNN (23). This novel use of CNNs allowed an increase in activity recognition rates of 6.6%, on average. These research efforts demonstrated the power of utilising CNNs in multiple ways for HAR.

Research utilising CNNs for HAR with wearable inertial sensors has also been published recently. Zeng et al. implemented a method based on CNNs which captures the local dependency and scale invariance of an inertial sensor signal (24). This allows features for activity recognition to be identified automatically. The motivation for developing this method was the difficulty of identifying strong features for HAR. Yang et al. also highlighted the challenge and importance of identifying strong features for HAR (25). They also employed CNNs for feature learning from raw inertial sensor signals. The strength of CNNs in HAR was again demonstrated here, as their use outperformed other HAR algorithms, on multiple data sets, which utilised heuristic hand-crafting of features or shallow learning architectures for feature learning. Radu et al. also recently demonstrated that the use of CNNs to identify discriminative features for HAR when using multiple sensor inputs from various smartphones and smartwatches, which have different sampling rates, data generation models and sensitivities, outperforms classic methods of identifying such features (26). The implementation of such feature learning techniques with CNNs is clearly beneficial but is complex and may not be suitable for HAR system developers without strong experience in machine learning and DSP. From a CNN perspective, these results are interesting and suggest significant scope for further exploration by machine learning researchers. However, for the purposes of this paper they are included both to succinctly acknowledge that CNNs have been applied to HAR previously and to distinguish the present approach, which seeks to use well developed CNN platforms tailored for machine vision tasks in a transfer learning context for HAR, using basic time series as the only user created features.
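The 144*48 image size reported for the Kinect-based approach of Mo, Li and Zhu follows from the per-frame data: 15 joints with 3 position coordinates give 45 values, 11 joints with 3*3 rotation matrices give 99 values, and 45 + 99 = 144 values per frame across 48 frames. The sketch below checks that arithmetic with dummy data. The ordering of values within a column is our assumption and is not taken from the cited paper.

```python
N_POSITION_JOINTS, N_ROTATION_JOINTS, N_FRAMES = 15, 11, 48


def frame_to_column(positions, rotations):
    """Flatten one frame: (x, y, z) joint positions, then 3x3 rotation matrices."""
    column = [v for joint in positions for v in joint]
    column += [v for matrix in rotations for row in matrix for v in row]
    return column


def frames_to_image(frames):
    """Stack per-frame columns side by side: rows are values, columns are frames."""
    columns = [frame_to_column(p, r) for p, r in frames]
    return [list(row) for row in zip(*columns)]


# Dummy skeleton data: all joints at the origin with identity rotations.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
frame = ([[0.0, 0.0, 0.0]] * N_POSITION_JOINTS, [identity] * N_ROTATION_JOINTS)
image = frames_to_image([frame] * N_FRAMES)
print(len(image), len(image[0]))  # 144 48
```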


11.1.1.2. Transfer Learning in Machine Vision

Deep learning based machine vision techniques are used in many disciplines, from speech, video, and audio processing (27), through to HAR (21) and cancer research (28). Training deep neural networks is a time-consuming and resource-intensive task, needing not only specialised hardware (GPUs) but also large sets of labelled data. Unlike some other machine learning techniques, once the training work is completed, querying the resulting models in order to predict results on new data is fast. In addition, trained networks can be re-purposed for other specific uses which are not required to be known in advance of the initial training (29). This arises from the generalised vision capabilities that can emerge with suitable training. More precisely, each layer of the network learns a number of features from the input data, and that knowledge is refined through iterations. In fact, the learning that happens at different layers seems to be non-specific to the data set, including the identification of simple edges in the first few layers, the subsequent identification of boundaries and shapes, and growing towards object identification in the last few layers. These learned visual operators are applicable to other sets of data (30). Transfer learning, then, is the generic name given to a classification effort in which a pre-trained network is reused for a task for which it was not specifically trained. Deep learning frameworks such as Caffe (31) and TensorFlow can make use of pre-trained networks, many of which have been made available by researchers in repositories such as the Caffe Model Zoo, available in their GitHub repository. Retraining not only requires a fraction of the time that a full training session would need (minutes or hours instead of weeks) but, more importantly in many cases, allows for the use of much smaller data sets.
An example of this is the Inception model provided by Google, whose engineers reportedly spent several weeks training it on ImageNet (32) (a data set of over 14 million images in over 20 thousand categories), using multiple GPUs and the TensorFlow framework. In their example (33), they use in the order of 3,500 pictures of flowers in 5 different categories to retrain the generic model, producing a model with a fair accuracy rating on new data. In fact, during the re-training stage, the network is left almost intact. The final classifier is the only part that is fully replaced, and bottlenecks (the outputs of the layer before the final one) are calculated to integrate the new training data into the already cognisant network. After that, the last layer is trained to work with the new classification categories. This happens in image batches of a size that can be adapted to the needs of the new data set (alongside other hyperparameters such as learning rate and training steps).

Each step of the training process outputs values for training accuracy, validation accuracy, and cross entropy. A large difference between training and validation accuracy can indicate potential overfitting of the data, which can be a problem especially with small data sets, while the cross entropy is a loss function that provides an indication of how the training is progressing (decreasing values are expected).
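The retraining step described above can be illustrated with a minimal sketch. It assumes that bottleneck vectors have already been produced by a frozen network and trains only a new softmax layer on them, tracking the cross entropy during training. The toy bottleneck generator, the function names and all hyperparameter values are illustrative assumptions; this is not the TensorFlow retraining script, nor the configuration used in this work.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, biases, x):
    """Class probabilities of the new final layer for one bottleneck vector."""
    scores = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)


def accuracy(weights, biases, data):
    """Fraction of (bottleneck, label) pairs whose top class is correct."""
    hits = 0
    for x, label in data:
        probs = predict(weights, biases, x)
        if probs.index(max(probs)) == label:
            hits += 1
    return hits / len(data)


def cross_entropy(weights, biases, data):
    """Mean negative log-probability assigned to the true labels."""
    total = 0.0
    for x, label in data:
        total -= math.log(max(predict(weights, biases, x)[label], 1e-12))
    return total / len(data)


def retrain_last_layer(train, n_classes, steps=200, learning_rate=0.5,
                       batch_size=16, seed=0):
    """Mini-batch gradient descent on the final layer only.

    The pre-trained network is treated as frozen: its only role here is to
    have produced the bottleneck vectors stored in `train`.
    """
    rng = random.Random(seed)
    n_features = len(train[0][0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    biases = [0.0] * n_classes
    history = []  # cross entropy on the training set after each step
    for _ in range(steps):
        batch = rng.sample(train, batch_size)
        grad_w = [[0.0] * n_features for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, label in batch:
            probs = predict(weights, biases, x)
            for c in range(n_classes):
                err = probs[c] - (1.0 if c == label else 0.0)
                grad_b[c] += err
                for j in range(n_features):
                    grad_w[c][j] += err * x[j]
        for c in range(n_classes):
            biases[c] -= learning_rate * grad_b[c] / batch_size
            for j in range(n_features):
                weights[c][j] -= learning_rate * grad_w[c][j] / batch_size
        history.append(cross_entropy(weights, biases, train))
    return weights, biases, history


def make_toy_bottlenecks(n_per_class, n_classes, n_features, seed):
    """Hypothetical stand-in for bottleneck vectors (NOT real Inception output)."""
    rng = random.Random(seed)
    data = []
    for label in range(n_classes):
        for _ in range(n_per_class):
            x = [rng.gauss(1.0 if j % n_classes == label else 0.0, 0.3)
                 for j in range(n_features)]
            data.append((x, label))
    rng.shuffle(data)
    return data
```

Comparing `accuracy` on the training split with `accuracy` on a held-out validation split gives the overfitting check described above, while `history` is expected to show a decreasing cross entropy.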

11.2. Methods

Given the potential advantages of transfer learning in machine vision for the purposes of HAR, we next describe an exemplar study where we apply these ideas to classify exercise data from inertial sensors. This very specific example is sufficiently comprehensive in scale and scope to represent a typical use case for the approach which, to reiterate, uses pre-trained convolutional neural networks (CNNs) with one lightweight additional training step to classify inertial sensor data based on images generated from the raw data (Figure 11.3). The level of DSP skill needed to perform this analysis will be shown to be much lower than for other methods of classifying this type of data with machine learning techniques that rely on engineered features (Figure 11.1). This section contains all the details required to replicate this approach, focusing on how the data were collected, and how our system was set up and used.

11.2.0. Data Collection

11.2.0.0. Participants

82 healthy volunteers aged 16-38 (59 males, 23 females, age: 24.68 ± 4.91 years, height: 1.75 ± 0.09 m, body mass: 76.01 ± 13.29 kg) were recruited for the study. Participants did not have a current or recent musculoskeletal injury that would impair performance of multi-joint lower-limb exercises. All participants had been completing each of the five exercises as part of their training regime for at least one year. The Human Research Ethics Committee at University College Dublin approved the study protocol and written informed consent was obtained from all participants before testing. In cases where participants were under the age of 18, written informed consent was also obtained from a parent or guardian.

11.2.0.1. Procedures

The testing protocol was explained to participants upon their arrival at the laboratory. Following this they completed a ten-minute warm-up on an exercise bike (Lode B.V., Groningen, The Netherlands) maintaining a power output of 100 W at 75-85 revolutions per minute. Next an IMU (SHIMMER, Dublin, Ireland) was secured on the participant by a chartered physiotherapist at the spinous process of the 5th lumbar vertebra (Figure 11.2). The orientation and location of the IMU were consistent across all study participants and exercises.

Figure 11.2: IMU position: The spinous process of the 5th lumbar vertebra.

A pilot study was used to determine an appropriate sampling rate and the ranges for the accelerometer and gyroscope on board the IMU. In the pilot study squat, lunge, deadlift, single leg squat and tuck jump data were collected at 512 samples/s. A Fourier transform was then used to determine the signal and noise characteristics of the recorded data, and the relevant frequency content was all found to lie below 20 Hz. Therefore, a sampling rate of 51.2 samples/s, which exceeds twice this 20 Hz bound, was deemed appropriate for this study based upon the Shannon sampling theorem and the Nyquist criterion (34). The Shimmer IMU was configured to stream tri-axial accelerometer (±16 g) and gyroscope (±500 °/s) data, with the sensor ranges chosen based upon data from the pilot study. Each IMU was calibrated for these specific sensor ranges using the Shimmer 9DoF Calibration application. After completion of their warm-up, participants proceeded to do one set of 10 repetitions of bodyweight squats, barbell deadlifts at a load of 25 kg, bodyweight lunges and bodyweight single leg squats (Figure 3.2). A chartered physiotherapist demonstrated correct technique for each of the exercises. Participants familiarised themselves with each exercise and their technique was assessed to be correct by the physiotherapist. Correct technique for squats, lunges and deadlifts was defined using guidelines from the National Strength and Conditioning Association (35). Single leg squats were completed according to the scoring criteria outlined by Whatman et al. (36). Finally, each participant completed the ten-second tuck jump test while attempting to maintain good form throughout (37).

11.2.1. Preparation for Transfer Learning

In contrast to the previous design for an IMU based exercise classification system (Figure 11.1), the new method does not need a feature extraction step (Figure 11.3), as the CNN both discovers the features and trains the model automatically. The segmentation process is directly followed by the classification task (training and/or inference).

11.2.1.0. CNN Infrastructure

Working with convolutional networks is not a trivial task. Fortunately, since the advent of deep learning in the last few years, a number of frameworks such as TensorFlow and Caffe have appeared and are readily available to researchers. Most of these frameworks are open source, supported by large companies and/or universities, and provide not only helper libraries for numerical computation and machine learning, but also a flexible architecture and the possibility to use multiple CPUs and GPUs, if available, almost trivially. The authors used TensorFlow for the particular results provided in this paper, but any other framework or higher level library would suffice. Installing TensorFlow can be cumbersome, but Google provides a Docker container (38) with all the components to run TensorFlow out of the box. Documentation and scripts are also provided to retrain (39) networks and query (40) the new classifier.
The aforementioned Docker container and scripts were used in this paper with minimal modifications. The pre-processing and segmentation of inertial data to create the images that are fed into the CNN were prepared with MATLAB, as explained in the following section.


Figure 11.3: Depiction of the changes between traditional methods and the one presented in this paper, in particular steps 3 and 4.

11.2.1.1. Data Preparation

Six signals were collected from the IMU: accelerometer x, y, z and gyroscope x, y, z. Data were analysed using MATLAB (2012, The MathWorks, Natick, USA). To ensure the data analysed applied to each participant's movement and in order to eliminate unwanted high-frequency noise, the six signals were low pass filtered at fc = 20 Hz using a Butterworth filter of order n = 8. The filtered signals were then programmatically segmented into epochs that relate to single, full repetitions of the completed exercises. Many algorithms are available to segment human motion during exercise. These include the sliding window algorithm, top-down and bottom-up algorithms, zero-velocity crossing algorithms, template-based matching methods, and combinations of the above (4). These algorithms all have advantages and disadvantages. For the purpose of creating a functioning exercise detection classifier, a simple peak-detection algorithm was used on the gyroscope signal with the largest amplitude for each exercise. The start and end points of each repetition were found by looking for the corresponding zero-crossing points of the gyroscope signal leading up to and following the location of a peak in the signal. Example results of the segmentation algorithm used on the gyroscope x signal, from an IMU positioned on the spine during 3 repetitions of the deadlift exercise, are provided (Figure 3.3). Each extracted repetition of exercise data was resampled to a length of 250 samples. The six signals were then plotted using the MATLAB subplot function. The first sub-plot, gyroscope x (sagittal plane), was plotted with a y-axis range of ±250 °/s. Sub-plots 2 and 3, gyroscope y and z (frontal and transverse plane), were plotted with a y-axis range of ±100 °/s. Accelerometer x (sub-plot 4) was plotted with a y-axis range of ±3 m/s² and accelerometer y and z (sub-plots 5 and 6) were plotted with a range of ±15 m/s². Axis labels and markers were programmatically hidden and the blank space between each sub-plot was minimised. Following this the graphs were saved as 470x470 JPEG files. Examples of the generated JPEG files are provided (Figure 11.4).
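The segmentation and resampling steps can be sketched as follows. This is a simplified Python illustration rather than the MATLAB code used in the study: it isolates the portion of a gyroscope signal between the zero crossings on either side of its largest peak and then resamples that segment to 250 samples by linear interpolation. The function names and the choice of linear interpolation are assumptions made for illustration.

```python
def find_peak(signal):
    """Index of the sample with the largest absolute value."""
    return max(range(len(signal)), key=lambda i: abs(signal[i]))


def segment_repetition(signal):
    """Return (start, end) indices of the lobe surrounding the largest peak.

    Starting at the peak, walk outwards in both directions until the sign of
    the signal changes (a zero crossing) or the signal ends.
    """
    peak = find_peak(signal)
    positive = signal[peak] > 0
    start = peak
    while start > 0 and (signal[start - 1] > 0) == positive:
        start -= 1
    end = peak
    while end < len(signal) - 1 and (signal[end + 1] > 0) == positive:
        end += 1
    return start, end


def resample(segment, length=250):
    """Resample a segment to a fixed length using linear interpolation."""
    if len(segment) == 1:
        return [segment[0]] * length
    scale = (len(segment) - 1) / (length - 1)
    out = []
    for k in range(length):
        pos = k * scale
        lo = int(pos)
        hi = min(lo + 1, len(segment) - 1)
        frac = pos - lo
        out.append(segment[lo] * (1 - frac) + segment[hi] * frac)
    return out


# Hypothetical gyroscope samples in degrees per second, for illustration only.
gyro_x = [-1.0, -0.5, 0.5, 2.0, 3.0, 1.0, -0.5, -1.0]
start, end = segment_repetition(gyro_x)
repetition = resample(gyro_x[start:end + 1])
```

Resampling every repetition to the same length means that slow and fast repetitions occupy the same horizontal extent once plotted.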


Figure 11.4: Samples of the generated plots (JPEG files) which were used as training and test data in this study.

11.2.1.2. Retraining and using the new Model

Transfer learning is the main technique used in this paper. It reuses an already trained CNN for classification purposes. In this case, the TensorFlow framework was used, which provides access to a model called Inception trained on over 14 million images, and also provides example scripts to retrain the network, i.e. to discard the provided classifier and adjust the values of the last layer of the network according to the new data provided. The retraining script expects to find the images in a particular folder (passed as a parameter) and layout (Figure 11.5), that is, a folder for each category that the new classifier will learn to identify, containing training pictures in JPEG format. During training, the network will automatically identify the features to use in order to create the classifier.
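The one-folder-per-category layout can be checked before retraining with a short helper. The sketch below is an assumption-laden illustration: the function name is hypothetical, it is not part of the TensorFlow scripts, and it simply lists the JPEG files found under each class folder so that empty or misnamed folders are easy to spot.

```python
from pathlib import Path

# Class folder names taken from Figure 11.5.
EXERCISE_LABELS = ["SQ", "LUL", "DL", "SLSL", "TJ"]


def build_image_index(image_dir):
    """Map each class folder name to the sorted JPEG file names inside it."""
    index = {}
    for folder in sorted(Path(image_dir).iterdir()):
        if folder.is_dir():
            index[folder.name] = sorted(
                p.name for p in folder.iterdir()
                if p.suffix.lower() in {".jpg", ".jpeg"}
            )
    return index


def missing_labels(index, expected=EXERCISE_LABELS):
    """Expected class folders that are absent or contain no images."""
    return [label for label in expected if not index.get(label)]
```

Calling `missing_labels(build_image_index(path))` on the image folder returns an empty list when every exercise has at least one training image.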

Figure 11.5: Folders containing images for the five exercises (Bodyweight squat: SQ, bodyweight lunge: LUL, barbell deadlift: DL, single leg squat: SLSL, and tuck jump: TJ).

There are a number of hyperparameters that can be changed depending on the new data used to retrain, such as the validation and training split of data to be used, the size of the batches to train on, or the learning rate applied (probably the most important of all for fine tuning and avoiding extra computation). The only parameter changed in this work was the number of training steps, which was increased from the default of 4000 to 96000. This number provides high accuracy without showing signs of overfitting (see results section). The output of the training phase is simply two files, one with the new weights (the retrained network) and a second with the labels for the data trained (the default names are retrained_graph.pb and retrained_labels.txt). These two files are all that is needed to predict results for new data. The classifier can be queried with the classify_image script mentioned above. Retraining and querying are actions that can be performed in a multitude of ways, with different frameworks and in different configurations. This work is about making things accessible and available. The Docker container for TensorFlow, together with the documentation and helper scripts, was the simplest route the authors could find.

11.3. Results

As mentioned in the previous section, each training batch outputs the training accuracy, validation accuracy and cross entropy (loss function), alongside the final validation accuracy. Rolling averages of those four values for training sessions of 96000 steps are shown (Figure 11.6).

Figure 11.6: Training (blue) and validation (green) accuracy during training phase, with final accuracy (orange), and Cross Entropy (red) for 96000 steps.

As observed, the cross entropy keeps falling steadily, and the average difference between training and validation accuracy is not very large, which suggests that overfitting is not an issue. Averaged over 5 runs of training, the final accuracy was 95.9% for 96000 steps. Figure 11.7 shows a confusion matrix for this method.
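For readers less familiar with confusion matrices, the sketch below shows how overall accuracy and per-class sensitivity and specificity can be derived from one. The 3-class matrix is a made-up example, not the data behind Figure 11.7, and it assumes that rows hold the true classes and columns the predicted classes.

```python
def overall_accuracy(cm):
    """Correct predictions (the diagonal) divided by all predictions."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def sensitivity(cm, k):
    """True positives for class k divided by all true members of class k."""
    return cm[k][k] / sum(cm[k])


def specificity(cm, k):
    """True negatives for class k divided by all non-members of class k."""
    total = sum(sum(row) for row in cm)
    false_pos = sum(row[k] for row in cm) - cm[k][k]
    true_neg = total - sum(cm[k]) - false_pos
    return true_neg / (true_neg + false_pos)


# Hypothetical counts: rows are true classes, columns are predicted classes.
example_cm = [
    [9, 1, 0],
    [0, 10, 0],
    [1, 0, 9],
]
```

With these illustrative counts, 28 of the 30 repetitions lie on the diagonal, so the overall accuracy is 28/30.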

Figure 11.7: Confusion matrix for the machine vision based classification method.

Figure 11.8 illustrates a misclassified plot. Part a) of the image shows a typical lunge signal, while part c) shows a typical single leg squat signal. Part b), in the middle, shows an example of a lunge repetition misclassified as a single leg squat. The issue seems to be concentrated in the top part of the image. The most likely reason for the odd lunge signal shape is that the subject may have looked over their shoulder or twisted for some reason during the repetition; the resulting plot confuses the classifier, as it would confuse an expert looking directly at the plot.

Figure 11.8: A lunge signal (a), a lunge signal misclassified as a single leg squat (b), and a single leg squat signal (c) for comparison.

These results are comparable to those of a recently published method on the same data set, whereby the accuracy was found to be 94.1% (17). Figure 11.9 shows the confusion matrix for this feature based classification effort and, as can be seen, the results are similar. However, leave-one-subject-out cross-validation was used in that instance, so the results are not directly comparable.
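Leave-one-subject-out cross-validation, as used in the feature based study, holds out all repetitions from one participant in each fold so that the classifier is always tested on an unseen person. A minimal sketch of the fold generation is given below; the function name and the subject identifiers are hypothetical.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject.

    `subject_ids[i]` is the participant who performed repetition i.
    """
    for subject in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train, test


# Hypothetical example: five repetitions performed by three participants.
folds = list(leave_one_subject_out(["p1", "p1", "p2", "p3", "p3"]))
```

A random split of repetitions, by contrast, can place repetitions from the same participant in both the training and the validation data, which is one reason why accuracies obtained under the two schemes are not directly comparable.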

Figure 11.9: Confusion matrix for the feature based classification method.

The emphasis of this work, though, is on the ease of set-up afforded by transfer learning, and on the need for only basic digital signal processing skills to prepare the data, when compared to other methods in this area.

11.4. Discussion

11.4.0. Principal Results

An analysis of the data collected with the proposed method obtained an average classification accuracy of 95.9%, which is competitive with current state of the art techniques. This high level of accuracy indicates that the distinctive waveforms in the plots for each of the exercises generalise across different participants and that the patterns created are appropriate for classification efforts. These results are coupled with the underlying recurrent theme of this work: to enable a more approachable entry path into the HAR and ED fields. To do so, high-level machine learning frameworks, coupled with a novel use of machine vision techniques, are used in two main ways: first, to avoid the complexity of manually crafted features only available through advanced DSP techniques, and second, to facilitate dimensionality reduction by allowing the CNN to take care of both feature extraction and classification tasks.

11.4.1. Comparison with Prior Work

The methodology employed and the results achieved in this paper can be compared to a recently published ED paper on the exact same data set (17). In this recently published work, identical filtering and segmentation methodologies were employed. However, a large number of additional signals and data processing steps were required to achieve classification with the lumbar worn IMU. As well as the 6 signals from the accelerometer and gyroscope used in this paper, 12 additional signals were used for classification. These were magnetometer x, y and z, magnitude of acceleration, magnitude of rotational velocity and the IMU's 3-D orientation as represented by a rotation quaternion (W, X, Y and Z) and Euler angles (pitch, roll and yaw). 19 features were then computed from the segmented epochs of the 18 signals. These features were namely "Mean", "RMS", "Standard Deviation", "Kurtosis", "Median", "Skewness", "Range", "Variance", "Max", "Index of Max", "Min", "Index of Min", "Energy", "25th Percentile", "75th Percentile", "Level Crossing Rate", "Fractal Dimension" and the "variance of both the approximate and detailed wavelet coefficients using the Daubechies 4 mother wavelet to level 7". This resulted in a total of 342 features (19 features for each of the 18 signals) per exercise repetition. These features and their associated exercise labels were used to train and evaluate a random forests classifier with 400 trees. Following leave-one-subject-out cross-validation, an accuracy of 94.64% was achieved with this method. This recent work also demonstrated the laborious process of identifying the most important features for classification, which can improve the efficiency of the reported technique.
Whilst the accuracy achieved in this recent work (94.64%) is slightly lower than that presented in this paper (95.9%), the results should not be directly compared. This is because the additional signals used in (17) and the different methods of cross-validation used in the two studies to compute accuracy mean that it is not a perfectly like-for-like comparison. However, it can be stated that similar levels of accuracy have been achieved with both methods. Most importantly, the classification method presented here is considerably easier to implement than that presented in (17). Most notably, the need to use additional signals and derive many features from them has been eliminated. This minimises the signal processing and machine learning experience needed by the person investigating the possibility of creating a classifier. This is in line with the core objective of this paper.
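To give a sense of the feature engineering that the image based method avoids, the sketch below computes a handful of the simpler statistical features for each signal of one repetition and concatenates them into a single named vector. It is an illustration only: the function and signal names are hypothetical, only 5 of the 19 features are shown, the use of the population standard deviation is an assumption, and the code is not taken from (17).

```python
import math
import statistics

# Five of the simpler features listed in the text; the remaining fourteen
# (e.g. fractal dimension, wavelet coefficient variances) are omitted.
FEATURES = {
    "mean": statistics.mean,
    "rms": lambda x: math.sqrt(sum(v * v for v in x) / len(x)),
    "std": statistics.pstdev,  # population form chosen as an assumption
    "range": lambda x: max(x) - min(x),
    "energy": lambda x: sum(v * v for v in x),
}


def feature_vector(signals):
    """Compute every feature for every signal and flatten into one dict."""
    return {
        f"{name}_{feature}": fn(values)
        for name, values in signals.items()
        for feature, fn in FEATURES.items()
    }


# Hypothetical samples from one segmented repetition.
example = feature_vector({
    "gyro_x": [1.0, -1.0, 1.0, -1.0],
    "acc_x": [0.0, 2.0, 4.0],
})
```

The length of the vector is the number of signals multiplied by the number of features, which is how 18 signals and 19 features give the 342 values per repetition reported for the feature based method.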

11.4.2. Limitations

Simplicity was of utmost importance when designing this novel classification method for accelerometer and gyroscope data. Consequently, the maximum possible accuracy may not have been achieved. A better understanding of how to parameterise the re-training effort, and the use of other techniques such as fine tuning (a method to reuse certain parts of a pre-trained network instead of simply changing the last layer and classifier), could produce better results. A better understanding of how to deal with the type of data we are using could also be beneficial. In general, machine vision work is plagued with issues such as partial occlusion, deformation, or viewpoint variation, which the data in this work do not suffer from. Due to that, and also to make the baseline of this work as simple as possible, no data augmentation or image processing techniques of any kind have been used. The results reported have been obtained only with resources from readily available frameworks, mostly on default settings. It should also be noted that the presented method of classifying inertial sensor data with machine vision techniques has only been evaluated on exemplar samples of exercises which were conducted in a laboratory setting. Results are of high accuracy and competitive with recent work on the same data set (17) and therefore act as a proof of concept for the method. However, the method has not yet been evaluated in classifying inertial sensor data arising from free-living activities and other HAR classification tasks. Future work should investigate the method's efficacy in such areas. Of key importance will be to simplify each application's pre-processing and segmentation of the inertial sensor data.

11.4.3. Conclusions

This paper has described a novel approach to the classification of inertial sensor data in the context of HAR. There are two stand-out benefits of the machine vision approach described.
The first is the ease of setting up the infrastructure for the CNNs involved through the use of transfer learning. The second is the reduction in the depth of digital signal processing expertise required on the part of the investigator. Due to the many difficulties in creating inertial sensor based activity recognition systems, the authors believe there is a need for a system development path which is easier to use for people who lack a significant background in signal processing and machine learning. In particular, the new development pathway should eliminate the most difficult tasks conventionally identified with this area, i.e. feature development/extraction, dimensionality reduction and the identification of the best machine learning method for each new application (Figure 11.1). The new development pathway, although eliminating these steps, does not compromise the attainment of the high quality classification accuracy, sensitivity and specificity which is currently achieved through their successful implementation by appropriate experts (Figure 11.3). The exemplar study described here illustrates that the method is very competitive in comparison to customised solutions. Either way, the new pathway will, at the very least, allow for easier testing of hypotheses relating to new inertial sensor based activity classification systems, i.e. is the classification possible at all based on the collected data set? Ideally, it should also achieve equivalent or superior system accuracy, sensitivity, and specificity when compared to the existing system development pathway. Whilst the presented method does successfully eliminate the need for feature crafting and identification of optimal classification algorithms, it does not eliminate the process of signal pre-processing and signal segmentation before performing classification. Therefore, there remains some complexity in the process of achieving exercise classification when using the machine vision technique. However, the authors consider the process of filtering, segmenting and plotting inertial sensor signals considerably less complex than identifying and computing strong features and an optimal classification method for the classification of inertial sensor data.

11.4.4. Future Work

Even though the infrastructure used is readily available, certain skills, such as familiarity with Docker or with Python data science stacks, and basic DSP skills, are still needed. The creation of a full package that could be installed on the researcher's machine could be an avenue to explore. The pre-processing and segmentation steps to prepare the data could also be simplified by providing a set of scripts. A number of professional machine vision companies exist in the market, and some provide online services that allow retraining of their custom models. These could also be used for this type of work, avoiding the need to set up the CNN infrastructure locally.
The availability of this technology on Android mobile devices is something that the authors are also pursuing. TensorFlow may provide some initial support in this area. Finally, although this paper emphasises the lack of a necessity to present features other than the basic time series, it is clear that augmentation with derived features presents further opportunities for performance tweaking. For researchers more comfortable with such feature development, this application avenue is worth exploring.


11.5. References 1.

Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467. 2016 Mar 14.

2.

Perez AJ, Labrador MA, Barbeau SJ. G-sense: a scalable architecture for global sensing and monitoring. IEEE Network. 2010 Jul;24(4).

3.

Lara OD, Labrador MA. A survey on human activity recognition using wearable sensors. IEEE Communications Surveys and Tutorials. 2013 Jul;15(3):1192-209.O. D. Lara and M. a. Labrador, “A Survey on Human Activity Recognition using Wearable Sensors,” IEEE Commun.

Surv.

Tutorials,

vol.

15,

no.

3,

pp.

1192–1209,

2013,

DOI:

10.1109/SURV.2012.110112.00192. 4.

Whelan D, O'Reilly M, Huang B, Giggins O, Kechadi T, Caulfield B. Leveraging IMU data for accurate exercise performance classification and musculoskeletal injury risk screening. In: Proceedings of the IEEE 38th Annual International Conference of the Engineering in Medicine and Biology Society (EMBC); 2016. Aug 16; Orlando, F.L., U.S.A. N.Y., U.S.A.: IEEE; 2016. p. 659-62.

5.

Preece SJ, Goulermas JY, Kenney LP, Howard D, Meijer K, Crompton R. Activity identification using body-mounted sensors—a review of classification techniques. Physiological measurement. 2009 Apr 2;30(4):R1.

6.

Stoppa M, Chiolerio A. Wearable electronics and smart textiles: a critical review. Sensors. 2014 Jul 7;14(7):11957-92.

7.

Kim E, Helal S, Cook D. Human activity recognition and pattern discovery. IEEE Pervasive Computing. 2010 Jan;9(1).

8.

Chawla NV. Data mining for imbalanced datasets: An overview.

Data mining and

knowledge discovery handbook. N.Y.C., N.Y., U.S.A.: Springer; 2005. p. 853-67. 9.

He H, Garcia EA. Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering. 2009;21(9):1263-84.

10.

Kotsiantis SB, Zaharakis I, Pintelas P. Supervised machine learning: A review of classification techniques.

Emerging Artificial Intelligence Applications in Computer

Engineering: Real Word AI Systems with Applications in EHealth, HCI, Information Retrieval and Pervasive Technologies: IOS Press; 2007. p. 3-25. 258

11.

Breiman L. Random forests. Machine Learning. 2001;45(1):5-32.

12.

Hearst MA, Dumais ST, Osuna E, Platt J, Scholkopf B. Support vector machines. IEEE Intelligent Systems and their applications. 1998 Jul;13(4):18-28.

13.

Dudani SA. The distance-weighted k-nearest-neighbor rule. IEEE Transactions on Systems, Man, and Cybernetics. 1976 Apr(4):325-7.

14.

Bishop CM. Pattern recognition and machine learning. New York City, N.Y., U.S.A.: Springer; 2006.

15.

Giggins O, Kelly D, Caulfield B. Evaluating rehabilitation exercise performance using a single inertial measurement unit. In: Proceedings of the 7th International Conference on Pervasive Computing Technologies for Healthcare; 2013. May 5; Venice, Italy. N.Y., U.S.A.: IEEE; 2013. p.49-56.

16.

Patel S, Park H, Bonato P, Chan L, Rodgers M. A review of wearable sensors and systems with application in rehabilitation. Journal of neuroengineering and rehabilitation. 2012 Apr 20;9(1):21.

17.

O'Reilly MA, Whelan DF, Ward TE, Delahunt E, Caulfield B. Technology in S&C: Tracking Lower-Limb Exercises With Wearable Sensors. The Journal of Strength & Conditioning Research. 2017 Jun 1;31(6):1726-36.

18.

Bassett SF. The assessment of patient adherence to physiotherapy rehabilitation. New Zealand journal of physiotherapy. 2003 Jul;31(2):60-6.

19.

Nyan MN, Tay FE, Seah KH, Sitoh YY. Classification of gait patterns in the time– frequency domain. Journal of biomechanics. 2006 Dec 31;39(14):2647-56.

20.

Tao W, Liu T, Zheng R, Feng H. Gait analysis using wearable sensors. Sensors. 2012 Feb 16;12(2):2255-83.

21.

Mo L, Li F, Zhu Y, Huang A. Human physical activity recognition based on computer vision with deep learning model. In: Proceedings of Instrumentation and Measurement Technology Conference (I2MTC); 2016. May 23; Taipei, Taiwan. N.Y., U.S.A.: IEEE; 2016. p. 1-6.

22.

Stefic D, Patras I. Action recognition using saliency learned from recorded human gaze. Image and Vision Computing. 2016 Aug 31;52:195-205.

23.

Ma M, Fan H, Kitani KM. Going deeper into first-person activity recognition. In: 259

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016. p. 1894-1903. Jun 27; Seattle, W.A., U.S.A. N.Y., U.S.A.: IEEE; 2016. p1894-1903. 24.

Zeng M, Nguyen LT, Yu B, Mengshoel OJ, Zhu J, Wu P, Zhang J. Convolutional neural networks for human activity recognition using mobile sensors. In: Proceedings of the 6th International Conference on Mobile Computing, Applications and Services (MobiCASE); 2014. Nov 6; Austin, T.X., U.S.A. N.Y., U.S.A.: IEEE; 2014. p. 197-205.

25.

Yang J, Nguyen MN, San PP, Li X, Krishnaswamy S. Deep Convolutional Neural Networks on Multichannel Time Series for Human Activity Recognition. In: Proceedings of the 24th International Joint Conference on Artificial Intelligence; (IJCAI); 2015. Jul 25; Buenos Aires, Argentina. N.Y., U.S.A.: ACM; 2014. p. 3995-4001.

26.

Radu V, Lane ND, Bhattacharya S, Mascolo C, Marina MK, Kawsar F. Towards multimodal deep learning for activity recognition on mobile devices. In: Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp); 2016. Sep 12; Heidelberg, Germany. N.Y., U.S.A.: ACM; 2016. p. 185-188.

27.

LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015 May 28;521(7553):436-44..

28.

Cruz-Roa AA, Ovalle JE, Madabhushi A, Osorio FA. A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection. In: Proceedings of the 15th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI); 2013. Sep 22; Japan, Tokyo. Berlin, Germany: Springer Heidelberg; 2013. (pp. 403-410).

29.

Sharif Razavian A, Azizpour H, Sullivan J, Carlsson S. CNN features off-the-shelf: an astounding baseline for recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2014. Jun 23; Columbus, O.H., U.S.A.. N.Y., U.S.A.: IEEE; 2014. p. 806-813.

30.

Yosinski J, Clune J, Bengio Y, Lipson H. How transferable are features in deep neural networks?. In: Proceedings of Advances in neural information processing systems (NIPS); 2014. Dec 8; Montreal, Canada. Cambridge, M.A., U.S.A.; MIT Press; 2014. p. 33203328.

31.

Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T. Caffe: Convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM international conference on Multimedia; 2014. Nov 3; Orlando, F.L., U.S.A. . N.Y., 260

U.S.A.: ACM; 2014. p. 675-678. 32.

Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. Imagenet: A large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2009. Jun 20; Miami, F.L., U.S.A. .. N.Y., U.S.A.: IEEE; 2009. p. 248-255.

33.

Google. TensofFlow Image retraining example [Internet]. [cited 7 Feb 2017]. Available from: http://www.webcitation.org/6o6I1SMkw.

34.

Jerri AJ. The Shannon sampling theorem—Its various extensions and applications: A tutorial review. Proceedings of the IEEE. 1977;65(11):1565-96.

35.

Baechle TR, Earle RW. Resistance Training Exercise Techniques. NSCA's Essentials of Personal Training. Champaign, I.L., U.S.A.: Human Kinetics; 2004.

36.

Whatman C, Hing W, Hume P. Physiotherapist agreement when visually rating movement quality during lower extremity functional screening tests. Physical Therapy in sport. 2012 May 31;13(2):87-96.

37.

Myer GD, Ford KR, Hewett TE. Tuck jump assessment for reducing anterior cruciate ligament injury risk. Athletic Therapy Today. 2008 Sep;13(5):39-44.

38.

Google. TensorFlow Docker container [Internet]. [cited 7 Feb 2017]. Available from: https://hub.docker.com/r/tensorflow/tensorflow/.

39.

Google. TensorFlow Image retraining script [Internet]. [cited 7 Feb 2017]. Available from: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/image_retraining/retrain.py.

40.

Google. Image classification script [Internet]. [cited 7 Feb 2017]. Available from: http://www.webcitation.org/6o6IE43KZ.


Section E: Thesis Conclusions

Chapter 12 Concluding Remarks


12.0. Introduction

Compound lower limb exercises are generally evaluated with three distinct methods: (i) 3-D motion capture; (ii) visual analysis by a qualified exercise professional; and (iii) self-assessment. Whilst 3-D motion capture provides highly objective and valid biomechanical data, its cost and the impracticality of system set-up prohibit its use for exercise analysis beyond research settings. Visual analysis by an exercise professional may be prohibitively expensive for many exercisers and is also subjective, which raises issues of reliability and validity. Self-assessment is challenging for exercisers as it requires considerable knowledge of the exercises being completed. Moreover, exercisers may be prone to bias when assessing themselves. Therefore, there is a need for exercise assessment systems for compound lower limb exercises which are affordable, practical and validated. In this thesis, wearable IMUs were employed to develop and evaluate such systems.

12.1. Conclusions

The first aim of this thesis was to complete a systematic review of the literature regarding IMU systems for assessing lower limb exercises in the following areas: S&C, injury screening and musculoskeletal/orthopaedic rehabilitation. This review summarised and extracted evidence from multiple study types: measurement validation, exercise detection, movement classification, feedback evaluation and system evaluations with end users. The completed review (Chapter 2) highlighted that the majority of the work completed in this field to date pertains to validating system measurements. There was a clear sparsity of work on movement classification systems for compound lower limb exercises, particularly work comparing comprehensive and minimal sensor set-ups. Moreover, the review found that there has been minimal research evaluating the effectiveness of biofeedback from IMU systems which assess lower limb exercises. It also demonstrated that research evaluating systems with end users is poorly represented. Consequently, the remainder of the work presented in this thesis aimed to address some of these lesser researched topics.

The second aim of this thesis was to develop and compare IMU systems for automatically detecting five different compound lower limb exercises (lunges, BW squats, barbell deadlifts, single leg squats and tuck jumps). Of key interest was to establish the efficacy of the system in detecting exercises when developed with IMUs positioned at a variety of anatomical locations, used in combination and in isolation. In Section B.1 it was found that a system developed with 5 IMUs could detect the exercises with over 99% accuracy. Interestingly, it was also found that a system developed using data from a single IMU positioned on the right shank could detect the exercises with 98% accuracy. No previous work had demonstrated the efficacy of IMU systems in detecting biomechanically similar, compound lower limb exercises or compared the efficacy of systems created with 5 IMUs, subsets of these IMUs and each of the individual IMUs. Moreover, the accuracy achieved was superior to other studies which involved exercise detection with IMUs, due to the data analysis pathways which were used.

Following the strong results in exercise detection, this thesis investigated the efficacy of IMU-based exercise classification systems in classifying technique quality in a variety of lower limb compound exercises. In Section B.2, it was found that multi-IMU systems, using global classifiers, could detect the exact deviations in BW lunges (Chapter 4) and BW squats (Chapter 5) with good accuracy. However, a single IMU set-up was ineffective in detecting exact deviations in the lunge exercise. This was in contrast to the right shank IMU system, which detected deviations in the squat exercise with good accuracy. Binary classification of exercise technique as ‘acceptable’ or ‘aberrant’ was found to be highly accurate with a single IMU set-up for the BW squat and to have good accuracy in the BW lunge. Chapters 4 and 5 presented the first work to create technique classification systems for compound lower limb exercises. They demonstrated that the accuracy levels in technique assessment achieved with different sensor set-ups may differ considerably for lower limb exercises. These systems had also been developed using technique deviations which were deliberately induced by study participants. Therefore, it was necessary to develop technique classification systems for more compound lower limb exercises and to use naturally occurring technique deviations to develop and evaluate these systems.

In Section B.3, IMU-based exercise classification systems, trained and evaluated with naturally occurring technique deviations, were developed and compared for single leg squats (Chapter 6), barbell deadlifts (Chapter 7) and barbell squats (Chapter 8). Overall, it was found that the global classification methodologies which were effective in exercise detection (Chapter 3) and in technique classification systems developed with large, balanced data sets of deliberately induced technique deviations (Section B.2) were predominantly ineffective in this scenario. Therefore, new classification strategies were employed in the data analysis pathways. In Chapter 6, it was found that creating a separate binary classifier for every technique deviation in a single leg squat was superior to the multi-label classification methods used in Section B.2. It was also found that a single-IMU set-up achieved equivalent accuracy to a multi-IMU set-up when data analysis was completed in this manner. In Chapters 7 and 8, personalised classification systems, created for each study participant, were shown to have far superior accuracy, sensitivity and specificity to global classification systems in detecting naturally occurring deviations in both the barbell deadlift and barbell squat exercises. In both these chapters it was also shown that a single IMU set-up could achieve comparable accuracy to a multi-IMU set-up in the binary classification and multi-label classification of exercise technique. This work was the first to demonstrate that personalised data analysis pathways allow for effective technique classification, with a single-IMU set-up, when detecting naturally occurring deviations in compound lower limb exercises.

Consequently, the decision was made to develop and evaluate a biofeedback system, consisting of a smartphone and a single IMU worn on the left thigh, which incorporates such personalised data analysis pathways to classify exercise technique in lower limb exercises. This system was entitled ‘Formulift’.

Section C (Chapter 9) investigated the third objective of this thesis: to establish the usability, functionality and perceived impact of a prototype exercise classification feedback system with different types of real end users. To complete a comprehensive evaluation of the ‘Formulift’ system, three types of end users (beginner gym-goers, experienced gym-goers and S&C coaches) were recruited as study participants. IMU data were collected from each participant in order to develop their personalised classification system.

They then completed a workout involving squats, deadlifts, BW lunges and single leg squats whilst receiving the system’s feedback and guidance. The system was then evaluated both qualitatively (semi-structured interviews) and quantitatively (questionnaires). Qualitative and quantitative analysis found that the system has ‘good’ to ‘excellent’ usability.

The system achieved a mean ± S.D. SUS usability score of 79.2 ± 8.8. Functionality was also deemed to be good, with many users reporting positively on the system’s rep counting, technique classification and feedback. A number of bugs were found, and users also suggested other changes to the system. The overall subjective quality of the app was good, with a median star rating of 4/5 (IQR: 3-5). Participants also reported that the system would aid their technique, provide motivation, reassure them and help them avoid injury. Chapter 9 demonstrated an overall positive evaluation of ‘Formulift’ in the categories of usability, functionality, perceived impact and subjective quality. Users also suggested a number of changes for future iterations of the system. These findings are the first of their kind and show great promise for wearable sensor based exercise biofeedback systems.

Whilst the work presented in Section C shows positive user perceptions of personalised lower limb exercise classification systems, the need to collect and label data before creating a personalised system for each user creates additional overhead in comparison to global classification systems. In Section D, preliminary investigations were undertaken into ways in which this issue can be ameliorated. In Chapter 10, a tablet app was developed and evaluated which allows efficient IMU data capture, data labelling via synchronised video and automated personalised classifier creation. This tablet app was shown to be able to create highly accurate personalised technique classification systems in a more automated and efficient manner than previously possible. If adopted by exercise professionals, it could allow far larger data sets to be collected, which could result in a considerable improvement in the efficacy of global classification systems for assessing compound lower limb exercises. In Chapter 11, a new feature-free global classification method was developed which incorporates deep learning. An initial evaluation of the method showed it has strong potential to provide increased accuracy, sensitivity and specificity in comparison to the feature-based global exercise classification systems. It is also the author’s contention that this machine vision method is easier to implement for non-experts in exercise classification. Section D therefore addresses the final objective of this thesis: to investigate novel methods of creating exercise classification systems which may further improve system accuracy, efficiency, practicality and usability.

12.2. Future Directions

Whilst this thesis has made a significant contribution to the field of exercise classification with IMUs, it has also led to new research questions which could be investigated as future work.

IMU-based exercise biofeedback systems may be improved through additional features. This thesis predominantly focused on the assessment of one’s exercise technique. Systems could also provide additional feedback to exercisers. This could include the addition of performance measures during exercise such as motion tempo, velocity and power. The manner in which feedback is provided to users could also be investigated. The ‘Formulift’ system provided simple visual and vibrotactile feedback to exercisers on their technique during compound lower limb exercises. This feedback could perhaps be made more engaging, understandable or informative to exercisers through the addition of aural feedback or more interactive visual feedback. For specific applications, exergaming could also be considered. The effect of the timing of feedback, i.e. real-time or following completion of an exercise, could also be investigated. Optimising feedback mechanisms could have a positive influence on users’ experiences when exercising with an IMU-based exercise biofeedback system and further improve their exercise quality.

For the majority of systems which were created and evaluated in this thesis, multi-label classification (detection of exact technique deviations) produced inferior accuracy results to binary classification (detection of acceptable or aberrant technique). As suggested by some participants in Chapter 9, more accurate multi-label classification may allow for an improved user experience when using biomechanical exercise biofeedback systems. This is because it would allow the user to know exactly what movement inefficiency they are making whilst completing an exercise and hence allow for more targeted and specific feedback. It is the author’s belief that multi-label classification results can be improved in future research through (a) the collection of vastly larger data sets of IMU exercise data, using tools such as that presented in Chapter 10, and (b) the implementation of improved data analysis pathways, such as those presented in Chapter 11. This could ultimately eliminate the data collection and labelling task which is currently required to develop a personalised system for each system user (Chapters 9 and 10).

Finally, completing long-term studies with real users of the systems described in this thesis could produce considerable new knowledge on their objective effect (i.e. how they influence exercise technique, exercise programme outcome measures and exercise adherence) and on how users’ perceptions of the systems change over time. A randomised controlled trial could, for instance, be designed to measure the influence of system use on metrics such as technique, strength, hypertrophy and injury rates. This would be highly beneficial information for this field of research.

12.3. Closing Statement

This thesis concerned the development and evaluation of exercise classification systems for compound lower limb exercises. Systems using IMUs for motion tracking were proposed as they are relatively low cost, portable and practical for system users. All aims of the thesis were underpinned by the need to create systems which were (a) algorithmically accurate and efficient, and (b) practical and usable for end-users. As highlighted in Section 12.1, all aims of the thesis were effectively investigated, including the development and evaluation of a real-world IMU-based exercise classification system, ‘Formulift’, which was experimentally evaluated to be accurate, efficient, practical and usable. Additionally, preliminary work has been completed to investigate how future exercise classification systems for compound lower limb exercises may be created more efficiently and with improved accuracy. Systems such as ‘Formulift’ may aid exercisers to complete their exercise more safely and effectively and motivate exercisers to better adhere to their exercise programmes.


Appendices

Appendix A

Supplementary Documents


A.1. Chapter 9 – List of Tasks

Before Exercise Analysis Session
• Open the ‘Formulift’ app.
• Find and view information about how to position and wear the sensor.
• Find and view information on how to use the app to analyse your exercise technique.
• Navigate through the app to the various screens detailed in the instructions.
• Find and view the information videos about the four following exercises: Squat, Lunge, Deadlift and Single Leg Squat.

During Exercise Analysis Session
• Open the ‘Formulift’ app.
• Input the messages you would like to receive when you complete an exercise with good/bad form/technique.
• Connect the smartphone to the Bluetooth sensor.
• Navigate to the exercise analysis section of the app.
• Select the ‘squat’ exercise and input the weight you are going to lift.
• Press the ‘Start Workout’ button when you are in position and ready to start a set.
• When you are ready, complete the following sequence of exercises. Feel free to view the ‘review tab’ or ‘info tab’ between sets of exercises.

Exercise                     Reps   Weight (kg)   Rest time (s)   Technique Style
Squats                       10
Squats                       10
Left leg forward lunges      10
Left leg forward lunges      10
Deadlift                     10
Deadlift                     10
Left leg single leg squats   10

Technique Style, where specified: Best possible form.

• View the review tab and the information available there.
• Complete any other tasks within the app and, when happy, close the app.

A.2. Chapter 9 – Interview Guide

Context
• Q 1. How experienced are you in the exercises you completed today?
  o Q 1.1. How many years have you been doing them?
• Q 2. How technologically proficient are you?
  o Q 2.1. Are you usually an iPhone or an Android user?
  o Q 2.3. Do you use other health and fitness apps?

Overall experience
• Q 3. What did you think of the app?
  o Q 3.1. Why?
  o Q 3.2. How did Formulift compare to other health and fitness apps you use?

Usability
• Q 4. How did you find completing tasks while using the app?
• Q 5. What did you think of navigation/scrolling/colour/font size/language?

Functionality
• Q 6. Did you think there were any bugs in the app?
• Q 7. Do you think the system works?

Perceived Impact
• Q 8. What do you think the benefits or disadvantages to using the app are?
  o Q 8.1. Why?

Closing remarks
• Q 9. Is there anything else you would like to say about the app?
• Q 10. What other things would you like to see in future versions of the app?


A.3. Chapter 9 – System Usability Scale (SUS) © Digital Equipment Corporation, 1986.

Each statement is rated on a five-point scale from 1 (strongly disagree) to 5 (strongly agree).

1. I think that I would like to use this system frequently
2. I found the system unnecessarily complex
3. I thought the system was easy to use
4. I think that I would need the support of a technical person to be able to use this system
5. I found the various functions in this system were well integrated
6. I thought there was too much inconsistency in this system
7. I would imagine that most people would learn to use this system very quickly
8. I found the system very cumbersome to use
9. I felt very confident using the system
10. I needed to learn a lot of things before I could get going with this system
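
The ten responses above are conventionally combined into a single 0-100 score of the kind reported in Chapter 12 (79.2 ± 8.8). The appendix does not spell out the scoring, so the sketch below assumes the standard SUS procedure: odd-numbered (positively worded) items contribute the response minus 1, even-numbered (negatively worded) items contribute 5 minus the response, and the sum is multiplied by 2.5. The function name `sus_score` and the example responses are hypothetical and are not taken from the study data.

```python
def sus_score(responses):
    """Convert ten SUS item responses (each 1-5) into a 0-100 score.

    Assumes the standard SUS scoring procedure; `responses` lists the
    answers to items 1..10 in questionnaire order.
    """
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS needs ten responses, each an integer from 1 to 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd items are positively worded: higher agreement is better.
            total += response - 1
        else:
            # Even items are negatively worded: lower agreement is better.
            total += 5 - response
    return total * 2.5


# Hypothetical participant, for illustration only.
example_responses = [4, 2, 4, 1, 4, 2, 5, 2, 4, 2]
example_score = sus_score(example_responses)
```

A group result such as a mean ± S.D. would then be computed over the per-participant scores rather than over individual item responses.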


A.4. Chapter 9 – User Mobile Application Rating Scale (uMARS)


Appendix B

Research Ethics Documents


B.1. Ethical Approval Letter – Study 1


B.2. Participant Information Leaflet – Study 1


B.3. Informed Consent Form – Study 1


B.4. Ethical Approval Letter – Study 2


B.5. Participant Information Leaflet – Study 2


B.6. Informed Consent Form – Study 2


B.7. Par-Q Form – Studies 1 and 2


Appendix C

Journal Publications


TECHNICAL REPORT

TECHNOLOGY IN S&C: TRACKING LOWER-LIMB EXERCISES WITH WEARABLE SENSORS

MARTIN A. O’REILLY,1,2,* DARRAGH F. WHELAN,1,2,* TOMAS E. WARD,3 EAMONN DELAHUNT,2 AND BRIAN CAULFIELD1,2

1Insight Centre for Data Analytics, University College Dublin, Ireland; 2School of Public Health, Physiotherapy and Sports Science, University College Dublin, Ireland; and 3Insight Centre for Data Analytics, Maynooth University, Ireland

*First two authors are joint lead authors. Address correspondence to Martin O’Reilly, [email protected] or Darragh Whelan, [email protected].

00(00)/1–11 Journal of Strength and Conditioning Research. © 2017 National Strength and Conditioning Association

ABSTRACT

O’Reilly, MA, Whelan, DF, Ward, TE, Delahunt, E, and Caulfield, B. Technology in S&C: tracking lower-limb exercises with wearable sensors. J Strength Cond Res XX(X): 000–000, 2017. Strength and conditioning (S&C) coaches offer expert guidance to help those they work with achieve their personal fitness goals. However, because of cost and availability issues, individuals are often left training without expert supervision. Recent developments in inertial measurement units (IMUs) and mobile computing platforms have allowed for the possibility of unobtrusive motion tracking systems and the provision of real-time individualized feedback regarding exercise performance. These systems could enable S&C coaches to remotely monitor sessions and help gym users record workouts. One component of these IMU systems is the ability to identify the exercises completed. In this study, IMUs were positioned on the lumbar spine, thighs, and shanks on 82 healthy participants. Participants completed 10 repetitions of the squat, lunge, single-leg squat, deadlift, and tuck jump with acceptable form. Descriptive features were extracted from the IMU signals for each repetition of each exercise, and these were used to train an exercise classifier. The exercises were detected with 99% accuracy when using signals from all 5 IMUs, 99% when using signals from the thigh and lumbar IMUs and 98% with just a single IMU on the shank. These results indicate that a single IMU can accurately distinguish between 5 common multijoint exercises.

KEY WORDS inertial measurement units, compound exercise, exercise biofeedback, exercise classification

INTRODUCTION

Resistance exercise is an important component of any balanced exercise program (25). It can lower blood pressure, improve glucose metabolism, and reduce cardiovascular disease risk (28). Athletes also partake in resistance exercise to improve sporting performance (2) and reduce the risk of musculoskeletal injuries (12). Strength and conditioning (S&C) coaches offer expert guidance, monitoring, and motivation during resistance training. However, many people train without this support because of financial and availability issues (39). Furthermore, monitoring multiple athletes within a team setting is difficult and time consuming for S&C coaches. It has been shown that exercising without this guidance has a significant impact on exercise adherence and technique (19), which indicates that tracking exercise routines is beneficial for both coaches and gym users.

Recent technological advances have supported the use of inertial measurement units (IMUs) to record exercise sessions. Inertial measurement units are small, inexpensive sensors that consist of accelerometers, gyroscopes, and magnetometers. They are able to acquire data pertaining to the linear and angular motion of individual limb segments and the centre of mass of the body (8). With appropriate signal processing, this allows for the quantification of human performance in a wide variety of fields such as measuring energy expenditure and analyzing gait (18,35). In this article, the term IMU system is used to describe the IMU device and its signals, the associated signal processing applied to them and the output of the exercise classification algorithms. These IMU systems can robustly handle the variety of postures and environmental complexity associated with weight training, unlike camera-based motion analysis systems, which are hampered by location, occlusion, and lighting issues in such settings (25). Therefore, systems developed using IMU data may offer the potential for gym users and coaches to track training progress.

Using IMUs for exercise monitoring is becoming increasingly popular, particularly in cardiovascular training and physical activity monitoring (25). Commercially available products that use IMUs for activity tracking include Fitbits and Jawbones. These devices have the potential to improve cardiovascular exercise adherence and are used in public health interventions (21). To date, the use of IMUs in the S&C arena is less common. Exercise progress is often tracked through the use of paper-based or computerized logbooks such as JeFit. This manual input of data can prove cumbersome, lead to a risk of recall bias, and can decrease training motivation (31,34). IMU systems may aid in real-time exercise session recording, allowing coaches and gym users to adjust training programs during workout routines. Data can also be pushed to a network storage location, e.g., through cloud-based services, meaning immediate access to workout logs on a range of mobile devices. Storing data on a cloud service could also allow coaches to provide feedback from a separate location asynchronously. This is especially pertinent for S&C coaches who are not able to access clients for extended periods. Furthermore, an IMU system could allow for the collection of exercise data over an extended period. The analysis of such data can facilitate coaches in optimizing training intensity by monitoring training frequency, exercise adherence and sets completed. This is vital for ensuring that goals are obtained and motivation is maintained (32).

These advantages have led to a number of researchers investigating the ability of IMUs to identify exercises and track repetitions automatically. Chang et al. (10) used a smartphone and 2 IMUs, located in a weightlifting glove and on the hip, to differentiate between 9 upper- and lower-limb exercises. The authors were able to identify exercises with 90% overall recognition accuracy. Seeger et al. (37) used 2 IMUs placed in a weightlifting glove and on the torso to differentiate between 16 gym exercises consisting of 5 cardiovascular exercises (e.g., running, rowing) and 11 upper- and lower-limb weightlifting exercises using machines (e.g., lat pull-down, cable triceps extensions) and free weights (e.g., dumbbell lateral raise, barbell curl). Classification accuracy ranged from 71 to 100% for the 16 activities (37). However, the cross-validation method used by these authors is not stated, making it difficult to ascertain the quality of their system in a real-world environment. This is because results may have been affected by biasing classifiers with training and test data from the same participants (38). Pernek et al. (31) used a single IMU to assess exercise performance. The IMU was placed on different body positions or an exercise machine depending on the activity completed. This system was able to count repetitions with an overall accuracy of 99%. The users chose which exercise they would be completing, so no activity recognition was required. Further work by Pernek et al. (32) showed the ability of their system to differentiate between 6 upper-limb exercises with 86% accuracy using 5 IMUs and 84% accuracy with a single IMU. Muehlbauer et al. (26) used an IMU built into a smartphone to differentiate between 10 upper-limb exercises with an overall accuracy of 94%. Giggins et al. (14) also used a single IMU at different locations on the lower limb to differentiate between 7 early stage rehabilitation exercises with accuracies ranging between 93 and 95%.

In summary, IMU-based systems are becoming increasingly prevalent in exercise tracking and S&C. An IMU system that can automatically recognize exercises could allow coaches and gym users to log and review their workout sessions in a more automated fashion than current practice allows. In this article, we investigate whether a system developed from IMU-derived data can distinguish between 5 commonly completed lower-limb exercises. This would provide a foundation for more comprehensive exercise classification systems. Furthermore, we aim to achieve this using an unobtrusive IMU setup and by leveraging cheap and ubiquitous sensor technology to ensure that the system is affordable for coaches and gym users.

METHODS Experimental Approach to the Problem

This study used an opportunistic approach to the development of a wearable IMU system for automatically detecting complex lower-limb exercises. Participants were equipped with wearable IMUs (SHIMMER, Shimmer Research, Dublin, Ireland) on the lumbar spine, both thighs, and both shanks and completed one set of 10 repetitions of the following exercises: (a) bodyweight squats; (b) barbell deadlifts; (c) single-leg squats; (d) and

Figure 1. The 5 IMU positions: (1) spinous process of the fifth lumbar vertebra, (2 and 3) mid-point of both femurs on the lateral surface (determined as halfway between the greater trochanter and lateral femoral condyle), (4 and 5) and on both shanks 2 cm above the lateral malleolus.

TM

Journal of Strength and Conditioning Research

Copyright ª 2017 National Strength and Conditioning Association

294

the

TM

Journal of Strength and Conditioning Research

| www.nsca.com

injury that would impair performance of multijoint lowerlimb exercises. All participants had been completing each of the 5 exercises as part of their training regime for at least 1 year. The Human Research Ethics Committee at University College Dublin approved the study protocol and written informed consent was obtained from all participants before testing. In cases in which participants were under the age of 18, written informed consent was also obtained from a parent or guardian. Procedures

bodyweight lunges. They also completed 10 seconds of the tuck jump exercise (27). The same researcher completed IMU placement for all participants using a standardized and repeatable protocol. After data collection, a total of 342 variables were extracted from the IMU signals for every exercise repetition from each IMU. These variables were used to develop and evaluate the quality of an automated exercise detection system for lower-limb exercises. This was done for each individual IMU and for combinations of multiple IMUs.

Subjects

Eighty-two healthy volunteers aged 16–38 (59 males and 23 females, age: 24.68 ± 4.91 years, height: 1.75 ± 0.09 m, body mass: 76.01 ± 13.29 kg) were recruited for the study. Participants did not have a current or recent musculoskeletal

The testing protocol was explained to participants upon their arrival at the laboratory. They then completed a 10-minute warm-up on an exercise bike (Lode B.V., Groningen, the Netherlands), maintaining a power output of 100 W at 75–85 revolutions per minute. Next, IMUs were secured on each participant by a chartered physiotherapist at the following 5 locations: the spinous process of the fifth lumbar vertebra, the mid-point of both femurs (determined as halfway between the greater trochanter and lateral femoral condyle), and both shanks 2 cm above the lateral malleolus (Figure 1). The orientation and location of all the IMUs were consistent for all the study participants across all exercises.

A pilot study was used to determine an appropriate sampling rate and the ranges for the accelerometer and gyroscope on board the IMUs. In the pilot study, squat, lunge, deadlift, single-leg squat, and tuck jump data were collected at 512 samples/s. A Fourier transform was then used to examine the signal and noise characteristics of the recordings, all of which were found to lie below 20 Hz. Therefore, a sampling rate of 51.2 samples/s, which is more than twice this 20 Hz bandwidth, was deemed appropriate for this study based on the Shannon sampling theorem and the Nyquist criterion (16). The Shimmer IMU was configured to stream triaxial accelerometer (±16 g), gyroscope (±500 °/s), and magnetometer (±1 Ga) data, with the sensor ranges chosen based on data from the pilot study. Each IMU was calibrated for these specific sensor ranges using the Shimmer 9DoF Calibration application (8).

After the warm-up, participants completed one set of 10 repetitions of each of the following exercises: bodyweight squats, barbell deadlifts at a load of 25 kg, bodyweight lunges, and single-leg squats (Figure 2). The correct technique for each exercise was demonstrated, and participants were allowed to familiarize themselves by completing practice repetitions of the upcoming movement. The bodyweight squats, barbell deadlifts, and bodyweight lunges were completed in accordance with the guidelines described by the National Strength and Conditioning Association (5). Single-leg squats were completed with each participant’s best possible form according to the scoring criteria outlined by Whatman et al. (40). This involved maintaining the trunk and pelvis in a neutral position, keeping the patella in line with the second toe, preventing the foot from moving into excessive pronation, and keeping the movement as smooth as possible throughout the range. The right leg was extended in front of the participant, and the left knee was flexed to between 60° and 90°. The final exercise completed by all participants was the tuck jump exercise (27). Each participant completed as many tuck jumps as possible in 10 seconds while attempting to maintain good form throughout. Participants were allowed a familiarization set before data were recorded.

Figure 2. Image showing the five exercises completed for this study: bodyweight squat (upper left), bodyweight lunge (upper right), barbell deadlift (middle left), single-leg squat (middle right), and tuck jump (bottom).

Statistical Analyses

Figure 3. Plot showing detection of peak, start, and end points of exercise repetitions through identifying neighboring zero-crossing values to the peak locations. The signal shown is the gyroscope Z signal from the left thigh, during 3 repetitions of a deadlift.
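The segmentation idea illustrated in Figure 3 (find a peak, then take the neighboring zero crossings as the start and end of the repetition) can be sketched in a few lines. This is a minimal Python illustration rather than the authors' MATLAB implementation: the function name `segment_repetitions`, the `min_peak_height` threshold, the restriction to positive peaks, and the toy signal are all assumptions made here for clarity.

```python
def segment_repetitions(signal, min_peak_height):
    """Return one (start, peak, end) index triple per detected repetition.

    Assumes repetitions appear as positive peaks; min_peak_height is a
    hypothetical threshold used to ignore small bumps.
    """
    repetitions = []
    last = len(signal) - 1
    for i in range(1, last):
        is_peak = signal[i - 1] < signal[i] >= signal[i + 1]
        if not is_peak or signal[i] < min_peak_height:
            continue
        # Walk left from the peak to the nearest sample at or below zero.
        start = i
        while start > 0 and signal[start] > 0:
            start -= 1
        # Walk right from the peak to the nearest sample at or below zero.
        end = i
        while end < last and signal[end] > 0:
            end += 1
        if repetitions and repetitions[-1][0] == start:
            # Same lobe as the previous peak: keep only the taller peak.
            if signal[i] > signal[repetitions[-1][1]]:
                repetitions[-1] = (start, i, end)
            continue
        repetitions.append((start, i, end))
    return repetitions


# Toy stand-in for a gyroscope trace containing three repetitions.
one_repetition = [0, 2, 5, 2, 0, -2, -5, -2]
toy_signal = one_repetition * 3 + [0]
print(segment_repetitions(toy_signal, min_peak_height=3))
```

On real gyroscope data a repetition may span more than one lobe of the signal, so which zero crossings bound a full repetition would need to be chosen per exercise; the sketch only shows the nearest-crossing case.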

Figure 4. Diagram linking the number of IMUs, number of recorded and derived signals, number of features extracted, and the variety of feature combinations used to test classifiers.
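The counts linked in Figure 4, and the idea of resampling each repetition before computing descriptive features, can be illustrated with a short sketch. It is written in Python purely for illustration (the study used MATLAB) and covers only a handful of the 19 features. The helper names `resample_linear` and `describe` are hypothetical, and the choice of linear interpolation, the sample standard deviation, and energy as a sum of squares are assumptions, since the source does not specify them.

```python
import math
import statistics

N_IMUS = 5
SIGNALS_PER_IMU = 18       # 9 recorded + 9 derived signals
FEATURES_PER_SIGNAL = 19


def resample_linear(samples, target_len=250):
    """Linearly interpolate one repetition onto a fixed number of samples."""
    if len(samples) < 2 or target_len < 2:
        raise ValueError("need at least two input and two output samples")
    step = (len(samples) - 1) / (target_len - 1)
    resampled = []
    for k in range(target_len):
        position = k * step
        left = min(int(position), len(samples) - 2)
        weight = position - left
        resampled.append(samples[left] * (1 - weight) + samples[left + 1] * weight)
    return resampled


def describe(samples):
    """Compute a few of the time-domain features (definitions assumed)."""
    squares = [x * x for x in samples]
    return {
        "mean": statistics.fmean(samples),
        "rms": math.sqrt(statistics.fmean(squares)),
        "std": statistics.stdev(samples),          # sample SD (assumption)
        "range": max(samples) - min(samples),
        "max": max(samples),
        "index_of_max": samples.index(max(samples)),
        "energy": sum(squares),                    # sum of squares (assumption)
    }


repetition = resample_linear([0.0, 1.0, 4.0, 1.0, 0.0])
features = describe(repetition)
print(len(repetition), sorted(features))
# Features per IMU and features for the 5-IMU setup.
print(FEATURES_PER_SIGNAL * SIGNALS_PER_IMU,
      N_IMUS * SIGNALS_PER_IMU * FEATURES_PER_SIGNAL)
```

Resampling every repetition to the same length means that index-based features such as "Index of Max" express a position within the repetition rather than an absolute time, which is what makes them independent of exercise tempo.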


Nine signals were collected from each IMU: accelerometer x, y, z; gyroscope x, y, z; and magnetometer x, y, z. Data were analyzed using MATLAB (2012, MathWorks, Natick, MA, USA). To ensure that the data analyzed reflected each participant’s movement and to eliminate unwanted high-frequency noise, the 9 signals were low-pass filtered at fc = 20 Hz using a Butterworth filter of order n = 8. Nine additional signals were then calculated. The 3-D orientation of the IMU was computed using the gradient descent algorithm developed by Madgwick et al. (22). The resulting W, X, Y, and Z quaternion values were also converted to pitch, roll, and yaw signals, which describe the inclination, measured in radians, of each IMU in the sagittal, frontal, and transverse planes, respectively. The magnitude of acceleration was computed as the vector magnitude of accelerometer x, y, z. It describes the total acceleration of the IMU in any direction, combining the inertial acceleration of the segment to which the IMU is attached with the acceleration due to gravity. Additionally, the magnitude of rotational velocity was computed as the vector magnitude of gyroscope x, y, z.

The IMU signals were then programmatically segmented into epochs corresponding to single, full repetitions of the completed exercises. Many algorithms are available to segment human motion during exercise. These include the sliding window algorithm, top-down and bottom-up algorithms, zero-velocity crossing algorithms, template-based matching methods, and combinations of the above (41). These algorithms all have advantages and disadvantages. For the purpose of creating a functioning exercise detection classifier, a simple peak-detection algorithm was applied to the gyroscope signal with the largest amplitude for each particular exercise. The start and end points of each repetition were found by locating the neighboring zero-crossing points of the gyroscope signal leading up to and following each peak. Figure 3 shows example results of the segmentation algorithm applied to the gyroscope Z signal from an IMU positioned on the left thigh during 3 repetitions of the deadlift exercise.

Each extracted repetition of exercise data was resampled to a length of 250 samples. This was undertaken to minimize the influence of the speed of repetition performance on signal feature calculations, so that the computed features related to differences in movement patterns and not to the participant’s exercise tempo. Time-domain and frequency-domain descriptive features were computed to describe the pattern of each of the 18 signals when the 5 different exercises were completed. These features were “Mean”, “RMS”, “Standard Deviation”, “Kurtosis”, “Median”, “Skewness”, “Range”, “Variance”, “Max”, “Index of Max”, “Min”, “Index of Min”, “Energy”, “25th Percentile”, “75th Percentile”, “Level Crossing Rate”, “Fractal Dimension” (17), and the “variance of both the approximate and detailed wavelet coefficients using the Daubechies 4 mother wavelet to level 7” (1). This resulted in 19 features for each of the 18 available signals, producing a total of 342 features per IMU. Figure 4 summarizes this process: each of the 5 IMUs recorded 9 signals, and 9 more signals were derived from these, giving a total of 18 signals per IMU. Nineteen features were computed per exercise repetition for each signal from each IMU. This resulted in a total of 1,710 features (342 per IMU, 19 per signal). These features were then used to develop and evaluate a variety of classifiers as described in Figure 4.

The random forests method was used to perform classification (7). This technique was chosen as it has been shown to produce superior accuracy, sensitivity, and specificity scores in analyzing exercise technique with IMUs in comparison with the naïve Bayes and radial-basis function network techniques (23). Four hundred decision trees were used in each random forest classifier. Classifiers were developed and evaluated for the 10 combinations of IMUs shown in Table 1.

TABLE 1. IMU combinations compared and the number of features used for classification.

IMU(s)                 No. features
Multiple IMUs
All 5 IMUs             1,710 (5 × 342)
Lumbar and shanks      1,026 (3 × 342)
Lumbar and thighs      1,026 (3 × 342)
Both shanks            684 (2 × 342)
Both thighs            684 (2 × 342)
Individual IMUs
Left shank             342
Left thigh             342
Lumbar                 342
Right thigh            342
Right shank            342

The quality of the exercise classification system was established using leave-one-subject-out cross-validation (LOSOCV) and the random forest classifier with 400 trees (13). Each participant’s data correspond to one fold of the cross-validation. At each fold, one participant’s data are held out as test data while the random forest classifier is trained with all other participants’ data. The held-out data are used to assess the classifier’s ability to correctly categorize unseen data. The use of LOSOCV ensures that there is no biasing of the classifiers, meaning that the test subject’s data are completely unseen by the classifier before testing. Previous research by Taylor et al. (38) has shown that not using this method of testing can skew results significantly.

TABLE 2. Overall accuracy for each combination of IMUs.

IMU(s)                 Total accuracy (%)
All 5 IMUs             98.68
Lumbar and shanks      98.70
Lumbar and thighs      98.24
Both shanks            98.45
Both thighs            98.28
Left shank             97.70
Left thigh             96.41
Lumbar                 94.64
Right thigh            97.37
Right shank            98.18

TABLE 3. Overall sensitivity and specificity for each combination of IMUs.

IMU(s)                 Sensitivity (%) ± SD    Specificity (%) ± SD
All 5 IMUs             98.66 ± 1.37            99.67 ± 0.35
Lumbar and shanks      98.66 ± 1.09            99.67 ± 0.38
Lumbar and thighs      98.26 ± 2.56            99.56 ± 0.30
Both shanks            98.42 ± 1.32            99.61 ± 0.41
Both thighs            98.31 ± 2.48            99.57 ± 0.29
Left shank             97.74 ± 2.24            99.42 ± 0.41
Left thigh             96.47 ± 3.90            99.09 ± 0.57
Lumbar                 94.58 ± 5.71            98.67 ± 1.30
Right thigh            97.41 ± 2.98            99.34 ± 0.51
Right shank            98.18 ± 1.02            99.54 ± 0.35

Figure 5. Heat map confusion matrix for actual exercise versus predicted exercise for all 5 IMUs.

Figure 6. Heat map confusion matrix for actual exercise versus predicted exercise for right shank IMU.

In our system, each individual repetition was classified. For each set of repetitions, the mode of the predicted labels was then assigned to every repetition in that set. Therefore, in a situation in which a set of 10 squat repetitions was classified as 8 squats, 1 lunge, and 1 single-leg squat, all 10 repetitions were labeled as squat. The scores used to measure the quality of classification were total accuracy, average sensitivity, and average specificity. Accuracy is the number of correctly classified repetitions of all the exercises divided by the total number of repetitions completed; this is calculated as the sum of the true positives (TP) and true negatives (TN) divided by the sum of the true positives, false positives (FP), true negatives, and false negatives (FN):

\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \]
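The evaluation procedure (one fold per participant, a mode vote within each set, and accuracy as the fraction of correctly labeled repetitions) can be sketched as follows. This is a hedged Python illustration, not the authors' code. The record layout (`subject`, `set_id`, `features`, `label`) and the function names are hypothetical, and a nearest-centroid rule stands in for the 400-tree random forest so that the example stays self-contained.

```python
from collections import Counter, defaultdict


def mode_vote(predictions):
    """Most frequent predicted label within one set of repetitions."""
    return Counter(predictions).most_common(1)[0][0]


def train_nearest_centroid(train):
    """Stand-in classifier (NOT a random forest): nearest class centroid."""
    sums, counts = {}, Counter()
    for rep in train:
        total = sums.setdefault(rep["label"], [0.0] * len(rep["features"]))
        for j, value in enumerate(rep["features"]):
            total[j] += value
        counts[rep["label"]] += 1
    centroids = {lab: [s / counts[lab] for s in tot] for lab, tot in sums.items()}

    def predict(features):
        def squared_distance(label):
            return sum((a - b) ** 2 for a, b in zip(features, centroids[label]))
        return min(centroids, key=squared_distance)

    return predict


def loso_accuracy(reps, train_fn):
    """Leave-one-subject-out accuracy with set-level mode voting."""
    correct = 0
    for held_out in sorted({rep["subject"] for rep in reps}):
        train = [rep for rep in reps if rep["subject"] != held_out]
        test = [rep for rep in reps if rep["subject"] == held_out]
        predict = train_fn(train)            # the held-out subject is never seen
        by_set = defaultdict(list)
        for rep in test:
            by_set[rep["set_id"]].append(rep)
        for set_reps in by_set.values():
            voted = mode_vote([predict(rep["features"]) for rep in set_reps])
            correct += sum(1 for rep in set_reps if rep["label"] == voted)
    return correct / len(reps)


def make_set(subject, label, values):
    return [{"subject": subject, "set_id": label, "features": [v], "label": label}
            for v in values]


toy = (make_set("A", "squat", [0.0, 1.0]) + make_set("A", "lunge", [10.0, 11.0])
       + make_set("B", "squat", [0.0, 1.0]) + make_set("B", "lunge", [10.0, 11.0])
       + make_set("C", "squat", [0.0, 1.0, 9.0]) + make_set("C", "lunge", [10.0, 11.0, 12.0]))
print(loso_accuracy(toy, train_nearest_centroid))
```

In the toy data, one of subject C's squat repetitions looks like a lunge to the stand-in classifier, but the mode vote relabels it with the rest of its set, mirroring the 8-squats, 1-lunge, 1-single-leg-squat example in the text.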

For every IMU combination evaluated, the sensitivity and specificity were calculated for each of the 5 exercises, sequentially treating each label as the “positive” class, and then the mean and SD across the 5 values were taken. Sensitivity and specificity were computed using the formulas below:

\[ \text{Sensitivity} = \frac{TP}{TP + FN} \]

\[ \text{Specificity} = \frac{TN}{TN + FP} \]
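As a worked illustration of these formulas, the sketch below derives one-vs-rest sensitivity and specificity from a confusion matrix whose rows are actual exercises and whose columns are predicted exercises, then averages across classes. It is written in Python for illustration only. The function name `one_vs_rest_scores` and the 3-class toy matrix are hypothetical, and the use of the sample standard deviation is an assumption, since the source does not say which SD was reported.

```python
import statistics


def one_vs_rest_scores(confusion):
    """Mean and SD of per-class sensitivity and specificity.

    confusion[i][j] = number of repetitions of actual class i predicted as j.
    """
    total = sum(sum(row) for row in confusion)
    sensitivities, specificities = [], []
    for k in range(len(confusion)):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                    # actual k, predicted other
        fp = sum(row[k] for row in confusion) - tp     # actual other, predicted k
        tn = total - tp - fn - fp
        sensitivities.append(tp / (tp + fn))
        specificities.append(tn / (tn + fp))
    return {
        "sensitivity": (statistics.fmean(sensitivities), statistics.stdev(sensitivities)),
        "specificity": (statistics.fmean(specificities), statistics.stdev(specificities)),
    }


# Hypothetical 3-class confusion matrix (not data from the study).
toy_confusion = [
    [9, 1, 0],
    [0, 10, 0],
    [1, 0, 9],
]
scores = one_vs_rest_scores(toy_confusion)
print(scores["sensitivity"], scores["specificity"])
```

Because every other class counts as "negative" when one class is treated as positive, the true-negative count is large in a 5-class problem, which is one reason specificity values tend to sit above the corresponding sensitivity values.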

Sensitivity measures the effectiveness of a classifier at identifying a desired label, whereas specificity measures the classifier’s ability to detect negative labels. Upon establishing the most effective IMU for exercise classification, feature importance was calculated to reduce model complexity and to develop an understanding of which of the aforementioned features are most valuable for correct classification outcomes. This in turn could lead to a more efficient end-user application, increasing the rate of feedback to users and improving the computational efficiency of the system. We used the method described by Liaw and Wiener (20) to establish variable importance. Benchmark accuracy was established by using all of the computed features to train and test a random forest classifier as previously described. The values of each feature were then permuted, and the resulting decrease in model accuracy was measured. For unimportant features, the permutation should have little to no effect on model accuracy, whereas permuting important features should significantly decrease it. After permutation of each feature, features were ranked based on their importance. Random forest classifiers were then trained and evaluated using just the top-ranked feature, then the top 2 ranked features, and so on until all 342 features were used. Accuracy, sensitivity, and specificity were plotted against the number of features used for classification. This plot allowed the identification of the number of top-ranked features that achieved classification quality comparable to that of using all features.

In reviewing the accuracy, sensitivity, and specificity scores produced by each classifier, 90% or over was considered an “excellent” result, 80–89% was considered a “good” quality result, 60–79% was considered a “moderate” result, and 59% or less was deemed a “poor” result. The authors chose these values after reviewing existing literature on identifying exercises with IMUs. In reviewing such literature, an existing accepted standard for a good, moderate, or poor classifier could not be found (10,14,26,31,32,37). Therefore, the above system was agreed on by the authors to facilitate interpretation of our range of results.

RESULTS

Table 2 shows the total accuracy for each individual IMU and the various combinations of multiple IMUs in identifying the exercise completed. Five IMUs are able to classify the exercise completed with 98.7% accuracy. The best 3-IMU combination is the lumbar and both shanks. This IMU set classifies the exercises with 98.7% accuracy. An IMU placed on each shank (i.e., a 2-IMU setup) classifies each exercise with 98.5% accuracy. The best single IMU position for classification is the right shank with 98.2% accuracy.

Table 3 shows the average sensitivity and specificity for the individual and multiple IMU combinations in correctly identifying the 5 exercises being completed. The 5-IMU combination had an average sensitivity and specificity of 98.7 and 99.7%, respectively. The best single IMU position was the right shank with an average sensitivity of 98.2% and average specificity of 99.5%.

Figures 5 and 6 are classification confusion matrices for differing combinations of IMUs. The horizontal rows represent the actual exercise from which a tested repetition comes, and the vertical columns show the classifier’s predicted label. For instance, the fourth row in Figure 5 highlights that for all the single-leg squat repetitions classified, 96.8% were correctly identified as single-leg squat repetitions, 1.6% were misclassified as bodyweight squats, and 1.6% were misclassified as deadlifts. In Figure 6, it can be seen that there is slightly more misclassification when the system uses features derived from just the right shank IMU.

Figure 7. Graph showing accuracy, sensitivity, and specificity of exercise classification with the right shank IMU using 1–342 of the available features, ranked in the order of importance.

TABLE 4. The 22 most important features for exercise detection with the right shank IMU.

Rank   Signal                    Feature
1      Gyroscope X               Variance
2      Gyroscope Z               75th Percentile
3      Accelerometer X           Max
4      Accelerometer Z           RMS
5      Accelerometer Z           Range
6      Magnetometer X            Kurtosis
7      Magnetometer Y            Min
8      Magnetometer Y            75th Percentile
9      Magnetometer Y            Index of Maximum
10     Magnetometer Z            Kurtosis
11     Magnetometer Z            25th Percentile
12     Pitch                     Range
13     Pitch                     Detailed Wavelet Coefficients
14     Acceleration magnitude    75th Percentile
15     Gyroscope magnitude       Index of Maximum
16     Quaternion W              Variance
17     Quaternion W              Level Crossing Rate
18     Quaternion X              Level Crossing Rate
19     Quaternion Y              Kurtosis
20     Quaternion Y              Min
21     Gyroscope X               Mean
22     Gyroscope X               RMS

Figure 7 shows the classification accuracy, sensitivity, and specificity using incrementing numbers of top-ranked features from the right shank IMU. It is evident that using the 22 top features produces comparable results to using the 342 available features. The random forest trained and evaluated with these 22 features produced an overall accuracy of 98.2%, sensitivity of 98.2%, and specificity of 99.5% when assessed with LOSOCV. The 22 features used are listed in the order of importance in Table 4.
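The permutation idea behind this feature ranking can be sketched as follows: shuffle one feature column at a time and record how far accuracy falls. This is a simplified Python illustration under stated assumptions. It scores a fixed, already-trained `predict` function on a single evaluation set, whereas the randomForest method of Liaw and Wiener (20) works with out-of-bag samples inside the forest; the function name `permutation_importance`, the threshold rule, and the toy data are hypothetical.

```python
import random


def permutation_importance(rows, labels, predict, seed=0):
    """Accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)

    def accuracy(data):
        hits = sum(1 for x, y in zip(data, labels) if predict(x) == y)
        return hits / len(labels)

    baseline = accuracy(rows)
    drops = []
    for j in range(len(rows[0])):
        column = [x[j] for x in rows]
        rng.shuffle(column)                      # break the feature-label link
        permuted = [x[:j] + [v] + x[j + 1:] for x, v in zip(rows, column)]
        drops.append(baseline - accuracy(permuted))
    # Most important feature first.
    ranking = sorted(range(len(drops)), key=lambda j: drops[j], reverse=True)
    return drops, ranking


# Toy data: feature 0 separates the classes, feature 1 is constant.
toy_rows = [[0.0, 3.0], [1.0, 3.0]] * 10
toy_labels = ["squat", "tuck jump"] * 10


def toy_predict(x):
    """Hypothetical trained model: a simple threshold on feature 0."""
    return "tuck jump" if x[0] > 0.5 else "squat"


drops, ranking = permutation_importance(toy_rows, toy_labels, toy_predict)
print(drops, ranking)
```

With a ranking in hand, the top-k evaluation described in the Statistical Analyses section amounts to retraining and re-scoring a classifier on the first 1, 2, 3, and so on ranked features and reading off where the curve in Figure 7 flattens.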

DISCUSSION

The main objective of this study was to investigate whether a system based on data derived from IMUs can distinguish between 5 commonly completed compound and plyometric exercises. Furthermore, it aimed to identify the minimum number of IMUs that could be used to achieve acceptable performance. Overall, the results presented indicate that excellent lower-limb exercise activity recognition can be achieved with data from multiple and single IMUs. A system consisting of 5 or 3 IMUs can detect the exercise type with an overall accuracy of 99%. Inertial measurement units placed on both shanks or on the right shank alone are able to identify exercises with greater than 98% accuracy. These results form the foundation for an IMU system that could identify 5 lower-limb exercises and act as an adjunct to current training practices. The confusion matrices in Figures 5 and 6 show a more detailed breakdown of classification results with all 5 IMUs and the right shank IMU. Excellent classification scores are achieved for all exercises with both IMU setups (>96%).


The SL squat is the poorest classified exercise with a 5-IMU setup (97%). It is most often misclassified as a squat or deadlift, possibly because of the similar amount of knee flexion and ankle dorsiflexion present in these movements. Interestingly, the most commonly confused exercise combination is the lunge misclassified as an SL squat with the right shank IMU (Figure 6). A possible explanation for this might be that the dominant leg for each of these exercises was the left leg (left leg forward lunge, SL squat on the left leg). Therefore, the right shank would have been relatively stationary in either exercise. Despite this, the lunge was still classified correctly with almost 97% accuracy with this single-IMU system. The tuck jump has the highest classification rates in both setups. This is most likely because IMU signals from this exercise differ drastically from those of the other 4 exercises because of its plyometric nature. These results are consistent with previous research in the area, which found that IMU systems could distinguish between exercises with good to excellent overall accuracy (10,14,26,32,37). Chang et al. (10) and Seeger et al. (37) were able to show that a 2-IMU system could differentiate between upper- and lower-limb exercises with 90% accuracy and 92% sensitivity, respectively. However, multiple-IMU systems are more expensive and less practical for end users than single-IMU systems. This is due to the risk of placement error, power usage, and Bluetooth connectivity issues. Pernek et al. (32) demonstrated the ability of a single-IMU system to differentiate between 6 upper-limb exercises with 84% accuracy. Muehlbauer et al. (26) and Giggins et al. (14) were able to achieve overall accuracies of 94% and 93–95%, respectively, when differentiating between upper- and lower-limb exercises using a single-IMU system.
It is difficult to compare our results directly with those of previous research because of differences in the exercises investigated, number of exercises to be classified, number of IMUs, IMU position, and classification methods. However, the classification results presented in this article are higher than those reported in previous work. Furthermore, in contrast to some of the previous work, these excellent classification results can be maintained with a single IMU (98%). A single-IMU system would reduce overall cost and would not prove as cumbersome as multiple-IMU systems. The excellent accuracy, sensitivity, and specificity scores using a single IMU presented in this study are very important for end users, as incorrect exercise classification would have a significant impact on system use and reduce user satisfaction (25). Similar to the setup used by Giggins et al. (14), our single-IMU system is able to differentiate between a range of exercises with a high level of accuracy. However, the system presented by Giggins et al. identified only simple exercises such as the heel slide and straight leg raise. Similarly, Seeger et al. (37) classified exercises such as the barbell curl and cable triceps extension. The system examined in this article is able to classify complex compound lower-limb exercises (squat, deadlift, SL squat, and lunge) and a plyometric activity (tuck jump). Compound and plyometric exercises are essential for improving power production (2) and are therefore included in many training programs. Furthermore, these types of movements form the basis for commonly used musculoskeletal injury risk screening tools such as the Tuck Jump Assessment (27) and the Functional Movement Screen (11). However, they are often more difficult to identify using IMU-based systems because of their complexity.

Figure 7 shows the classifier’s ability to obtain similar accuracy, sensitivity, and specificity scores with around 6% of features compared with using all features with a right shank IMU. Table 4 highlights the 22 most important features for exercise detection with the right shank IMU. The main benefit of this reduced feature selection is the quicker processing time for exercise classification. This means that real-time feedback would be easier to implement. Furthermore, the reduced processing load would lead to increases in battery life, meaning an enhanced user experience. These are important elements within a commercial domain in which users would prefer a “set up and go” approach that involves minimal interaction with the user interface (25).

Despite these encouraging results, the IMU system presented in this article currently has a number of limitations. First, it is not possible for IMUs to detect the load lifted during exercises. This means that end users would need to enter the load manually before or after the movement. In addition to this, the system can only distinguish between 5 commonly completed lower-limb exercises. These were chosen because of their prevalence in S&C, musculoskeletal injury risk screening, and rehabilitation. It is hoped that future work will involve collection of a greater range of exercises. Finally, the system presented in this study does not monitor technique during movement. Technique is important to prevent stress on joints and reduce the risk of injury (15). Work is ongoing to achieve this, and results have already been published to this end (29,42,43,44). However, the results in this article provide an important first step in developing an IMU-based system to monitor lower-limb exercises, and future work aims to build on this foundation. Furthermore, they create an awareness of the capabilities of IMU systems among the S&C community.

PRACTICAL APPLICATIONS

The system presented in this article can automatically distinguish between 5 common lower-limb exercises. A system such as this has the potential to act as an important adjunct for gym users, coaches, and rehabilitation professionals.

Gym Users: Despite the documented benefits of strength training, participation among the general population is poor (9). Access to an S&C coach has a significant impact on both adherence and motivation (36); however, this is not always possible because of financial and availability issues. An IMU system based on the work presented in this article offers the potential to provide individual tracking of exercise and repetitions more cost effectively and ubiquitously than previously possible. This individual feedback can increase motivation (24). Automatic exercise tracking can increase activity and compliance, as shown with pedometry (6). The consumer electronics market has recognized the potential in this but to date has mainly focused on tracking cardiovascular exercise. The results presented in this article offer the first step in the remote tracking of lower-limb strengthening exercises. Questionnaires and diaries regarding activity levels have been shown to be unreliable in certain disease populations (31). This recall error may have an adverse effect on subsequent exercise prescription, as the exercise professional may overestimate or underestimate their client’s current status. The ability of a system to automatically detect which exercise is being completed means that the risk of recall bias is removed. This could allow for greater transparency between the client and exercise professional, thus leading to more beneficial exercise prescription. Furthermore, an automated training diary is a very attractive concept for gym users (24). Automatic exercise classification is an integral feature of this setup, as it could reduce unnecessary interaction with the system.

S&C Coaches: An IMU system like that presented in this article could act as a useful adjunct to current S&C practice. A system that identifies exercises could allow coaches to monitor clients and athletes remotely. Capturing which exercise is completed and transferring these data to a cloud server would allow coaches to access the data at any location with Internet access. By monitoring which exercises their client is completing, coaches could tailor plans to suit individual needs. This would be especially useful for coaches who train clients who travel extensively. In sports teams, S&C coaches often train large numbers of athletes simultaneously, making it difficult for them to monitor and provide feedback to individuals. The ability to track exercises could mean a greater ability to provide individualized training plans, as coaches can constantly monitor the exercises completed by their clients. This greater transparency would allow S&C coaches to track their client’s current status and adherence to the prescribed exercise program.

Rehabilitation Professionals: The use of technology in medicine is growing, and wearable technology is expected to assist with the detection and treatment of various diseases over the coming years (30). The 5 exercises investigated in this study are common rehabilitation exercises. An IMU system that can classify these rehabilitation exercises could allow them to be completed at home under remote supervision from rehabilitation professionals. This may allow patients to leave hospital earlier and/or attend fewer outpatient clinics, reducing health care costs (4). The movements classified in this work are also common in screening tools such as the Tuck Jump Assessment (27) and Functional Movement Screen (11). The ability to classify them is the first step in an automated musculoskeletal injury risk assessment system. These computer-assisted rehabilitation and screening systems may prove less labor intensive than current practice (3). This could improve therapist efficiency and increase the number of patients a therapist can screen for musculoskeletal injury risk.

In conclusion, the purpose of this study was to determine whether an IMU system could differentiate between 5 common lower-limb exercises. This is an important first step in the development of an IMU-based system for lower-limb exercise tracking. The findings of this research indicate that such a system can identify these exercises with an excellent degree of overall accuracy, even with a single IMU. This could be beneficial to gym users, coaches, and rehabilitation professionals. As the sensors contained within IMUs (accelerometers, gyroscopes, and magnetometers) are present within many smartphones, this offers the potential for exercises to be tracked and logged using only a smartphone. Our future work aims to monitor a user’s technique while performing their exercise routine and provide feedback to maintain proper form throughout the exercises. We aim to develop a workout system that combines biomechanical analysis, workout planners, and an automated logbook. This would afford S&C coaches greater training insights for their clients and athletes.

ACKNOWLEDGMENTS

Blank to anonymise submission. This project is partly funded by the Irish Research Council as part of a Postgraduate Enterprise Partnership Scheme with Shimmer (EPSPG/2013/574) and partly funded by Science Foundation Ireland (SFI/12/RC/2289). M. O’Reilly and D. Whelan are joint lead authors.

REFERENCES

1. Uk.mathworks.com. (2017). Single-level discrete 1-D wavelet transform - MATLAB dwt - MathWorks United Kingdom. [online] Available at: https://uk.mathworks.com/help/wavelet/ref/dwt.html. Accessed 8 March, 2017.
2. Adams, K, O’Shea, JP, O’Shea, KL, and Climstein, M. The effect of six weeks of squat, plyometric and squat-plyometric training on power production. J Strength Cond Res 6: 36–41, 1992.
3. Ahamed, NU, Sundaraj, K, Ahmad, RB, Siva, N, Poo, T, and Rahman, S. Biosensors assisted automated rehabilitation systems: A systematic review. Int J Phys Sci 7: 5–17, 2012.
4. Avci, A, Bosch, S, Marin-Perianu, M, Marin-Perianu, R, and Havinga, P. Activity recognition using inertial sensing for healthcare, wellbeing and sports applications: A survey. In: Proceedings of the 23rd International Conference on Architecture of computing systems (ARCS). VDE, 2010, pp. 1–10.
5. Baechle, TR and Earle, RW. NSCA’s Essentials of Personal Training. Human Kinetics: Champaign, IL, 2004.
6. Bravata, DM, Smith-Spangler, C, Sundaram, V, Gienger, AL, Lin, N, Lewis, R, Stave, CD, Olkin, I, and Sirard, JR. Using pedometers to increase physical activity and improve health: A systematic review. J Am Med Assoc 298: 2296–2304, 2007.
7. Breiman, L. Random forests. Machine Learning 45: 5–32, 2001.
8. Burns, A, Greene, BR, McGrath, MJ, O’Shea, TJ, Kuris, B, Ayer, SM, Stroiescu, F, and Cionca, V. SHIMMER–a wireless sensor platform for noninvasive biomedical research. IEEE Sensors J 10: 1527–1534, 2010.


9. Centers for Disease Control and Prevention C. Adult participation in aerobic and muscle-strengthening physical activities United States 2011. Morbidity Mortality Weekly Report 62: 326–330, 2013.
10. Chang, KH, Chen, MY, and Canny, J. Tracking free-weight exercises. In: UbiComp 2007: Ubiquitous Computing. New York, NY: Springer, 2007. pp. 19–37.
11. Cook, G, Burton, L, and Hoogenboom, B. Pre-participation screening: The use of fundamental movements as an assessment of function, part 1. N Am J Sports Phys Ther 1: 62–72, 2006.
12. Fitzgerald, D, Foody, J, Kelly, D, Ward, T, Markham, C, McDonald, J, and Caulfield, B. Development of a wearable motion capture suit and virtual reality biofeedback system for the instruction and analysis of sports rehabilitation exercises. Presented at Proceedings of the Engineering in Medicine and Biology Society, 2007, Lyon, 2007.
13. Fushiki, T. Estimation of prediction error by using K-fold cross-validation. Stat Comput 21: 137–146, 2011.
14. Giggins, O, Sweeney, KT, and Caulfield, B. The use of inertial sensors for the classification of rehabilitation exercises. In: Proceedings of the Engineering in Medicine and Biology Society. IEEE, 2014, pp. 2965–2968.
15. Hall, M, Nielsen, JH, Holsgaard-Larsen, A, Nielsen, DB, Creaby, MW, and Thorlund, JB. Forward lunge knee biomechanics before and after partial meniscectomy. Knee 22: 506–509, 2015.
16. Jerri, AJ. The Shannon sampling theorem, its various extensions and applications: A tutorial review. In: Proceedings of the IEEE 65: 1565–1596, 1977.
17. Katz, MJ and George, EB. Fractals and the analysis of growth paths. Bull Math Biol 47: 273–286, 1985.
18. Kavanagh, JJ and Menz, HB. Accelerometry: A technique for quantifying movement patterns during walking. Gait Posture 28: 1–15, 2008.
19. Kranz, M, Möller, A, Hammerla, N, Diewald, S, Plötz, T, Olivier, P, and Roalter, L. The mobile fitness coach: Towards individualized skill assessment using personalized mobile devices. Pervasive Mobile Comput 9: 203–215, 2013.
20. Liaw, A and Wiener, M. Classification and regression by randomForest. R News 2: 18–22, 2002.
21. Lyons, EJ, Lewis, ZH, Mayrsohn, BG, and Rowland, JL. Behavior change techniques implemented in electronic lifestyle activity monitors: A systematic content analysis. J Medical Internet Research 16: e192, 2014.
22. Madgwick, SOH, Harrison, AJL, and Vaidyanathan, R. Estimation of IMU and MARG orientation using a gradient descent algorithm. Presented at Proceedings of the IEEE International Conference on Rehabilitation Robotics (ICORR), 2011.
23. Mitchell, E, Ahmadi, A, O’Connor, NE, Richter, C, Farrell, E, Kavanagh, J, and Moran, K. Automatically detecting asymmetric running using time and frequency domain features. Presented at Proceedings of the 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN), 2015.
24. Möller, A, Roalter, L, Diewald, S, Scherr, J, Kranz, M, Hammerla, N, Olivier, P, and Plötz, T. Gymskill: A personal trainer for physical exercises. Presented at Pervasive Computing and Communications, 2012.
25. Morris, D, Saponas, TS, Guillory, A, and Kelner, I. RecoFit: Using a wearable sensor to find, recognize, and count repetitive exercises. Presented at Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2014.
26. Muehlbauer, M, Bahle, G, and Lukowicz, P. What can an arm holster worn smart phone do for activity recognition? In: Proceedings of the International Symposium on Wearable Computers (ISWC). IEEE, 2011, pp. 79–82.
27. Myer, GD, Ford, KR, and Hewett, TE. Tuck jump assessment for reducing anterior cruciate ligament injury risk. Athl Ther Today 13: 39–44, 2008.


28. O’Donovan, G, Blazevich, AJ, Boreham, C, Cooper, AR, Crank, H, Ekelund, U, Fox, KR, Gately, P, Giles-Corti, B, and Gill, JM. The ABC of physical activity for health: A consensus statement from the British association of sport and exercise sciences. J Sports Sciences 28: 573–591, 2010.
29. O’Reilly, M, Whelan, D, Chanialidis, C, Friel, N, Delahunt, E, Ward, T, and Caulfield, B. Evaluating squat performance with a single inertial measurement unit. Presented at IEEE 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN), 2015.
30. Pantelopoulos, A and Bourbakis, NG. A survey on wearable sensor-based systems for health monitoring and prognosis. Appl Rev 40: 1–12, 2010.
31. Pernek, I, Hummel, KA, and Kokol, P. Exercise repetition detection for resistance training based on smartphones. Personal Ubiquitous Computing 17: 771–782, 2013.
32. Pernek, I, Kurillo, G, Stiglic, G, and Bajcsy, R. Recognizing the intensity of strength training exercises with wearable sensors. J Biomedical Informatics 58: 145–155, 2015.
33. Pitta, F, Troosters, T, Probst, V, Spruit, M, Decramer, M, and Gosselink, R. Quantifying physical activity in daily life with questionnaires and motion sensors in COPD. Eur Respiratory J 27: 1040–1055, 2006.
34. Pozaic, T, Varga, M, Dzaja, D, and Zulj, S. Closed-loop system for assisted strength exercising. Presented at Information and Communication Technology Electronics and Microelectronics (MIPRO), 2013.
35. Rawson, ES and Walsh, TM. Estimation of resistance exercise energy expenditure using accelerometry. Med Science Sports Exercise 42: 622–628, 2010.
36. Ryan, RM, Frederick, CM, Lepes, D, Rubio, N, and Sheldon, KM. Intrinsic motivation and exercise adherence. Int J Sport Psychol 28: 335–354, 1997.
37. Seeger, C, Buchmann, A, and Van Laerhoven, K. myHealthAssistant: a phone-based body sensor network that captures the wearer’s exercises throughout the day. In: Proceedings of the 6th International Conference on Body Area Networks. ICST, 2011, pp. 1–7.
38. Taylor, PE, Almeida, GJ, Kanade, T, and Hodgins, JK. Classifying human motion quality for knee osteoarthritis using accelerometers. Presented at Engineering in Medicine and Biology Society (EMBC), 2010.
39. Velloso, E, Bulling, A, Gellersen, H, Ugulino, W, and Fuks, H. Qualitative activity recognition of weight lifting exercises. Presented at Proceedings of the 4th Augmented Human International Conference, 2013.
40. Whatman, C, Hing, W, and Hume, P. Physiotherapist agreement when visually rating movement quality during lower extremity functional screening tests. Phys Ther Sport 13: 87–96, 2012.
41. Whelan, D, O’Reilly, M, Huang, B, Giggins, O, Kechadi, T, and Caulfield, B. Leveraging IMU data for accurate exercise performance classification and musculoskeletal injury risk screening. Presented at IEEE 38th Annual International Conference of the Engineering in Medicine and Biology Society (EMBC), 2016.
42. Whelan, D, O’Reilly, M, Ward, T, Delahunt, E, and Caulfield, B. Evaluating performance of the single leg squat exercise with a single inertial measurement unit. Presented at Proceedings of the 3rd Workshop on ICTs for Improving Patients Rehabilitation Research Techniques, 2015.
43. Whelan, D, O’Reilly, M, Ward, T, Delahunt, E, and Caulfield, B. Evaluating performance of the lunge exercise with multiple and individual inertial measurement units. Presented at Pervasive Health 10th EAI International Conference on Pervasive Computing Technologies for Healthcare, 2016.
44. Whelan, D, O’Reilly, M, Ward, T, Delahunt, E, and Caulfield, B. Technology in rehabilitation: evaluating the single leg squat exercise with wearable inertial measurement units. Methods Inf Med 2016. Epub ahead of print.

Copyright © 2017 National Strength and Conditioning Association

Sports Biomechanics

ISSN: 1476-3141 (Print) 1752-6116 (Online) Journal homepage: http://www.tandfonline.com/loi/rspb20

Classification of lunge biomechanics with multiple and individual inertial measurement units

Martin A. O’Reilly, Darragh F. Whelan, Tomas E. Ward, Eamonn Delahunt & Brian Caulfield

To cite this article: Martin A. O’Reilly, Darragh F. Whelan, Tomas E. Ward, Eamonn Delahunt & Brian Caulfield (2017): Classification of lunge biomechanics with multiple and individual inertial measurement units, Sports Biomechanics, DOI: 10.1080/14763141.2017.1314544

To link to this article: http://dx.doi.org/10.1080/14763141.2017.1314544

Published online: 19 May 2017.


Sports Biomechanics, 2017 https://doi.org/10.1080/14763141.2017.1314544

Classification of lunge biomechanics with multiple and individual inertial measurement units

Martin A. O’Reilly (a,b), Darragh F. Whelan (a,b), Tomas E. Ward (c), Eamonn Delahunt (b) and Brian Caulfield (a,b)

(a) Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland; (b) School of Public Health, Physiotherapy and Sports Science, University College Dublin, Dublin, Ireland; (c) Insight Centre for Data Analytics, Maynooth University, Maynooth, Ireland

ABSTRACT

Lunges are a common, compound lower limb resistance exercise. If completed with aberrant technique, the increased stress on the joints used may increase the risk of injury. This study sought first to investigate the ability of inertial measurement units (IMUs), when used in isolation and in combination, to (a) classify acceptable and aberrant lunge technique and (b) classify exact deviations in lunge technique. We then sought to investigate the most important features and establish the minimum number of top-ranked features and decision trees that are needed to maintain maximal system classification efficacy. Eighty volunteers performed the lunge with acceptable form and 11 deviations. Five IMUs positioned on the lumbar spine, thighs, and shanks recorded these movements. Time and frequency domain features were extracted from the IMU data and used to train and test a variety of classifiers. A single-IMU system achieved 83% accuracy, 62% sensitivity, and 90% specificity in binary classification and a five-IMU system achieved 90% accuracy, 80% sensitivity, and 92% specificity. A five-IMU set-up can also detect specific deviations with 70% accuracy. System efficiency was improved and classification quality was maintained when using only 20% of the top-ranked features for training and testing classifiers.

ARTICLE HISTORY
Received 10 October 2016
Accepted 27 March 2017

KEYWORDS
Wearable sensors; biomedical technology; lower extremity; inertial measurement units

CONTACT Martin A. O’Reilly [email protected]
© 2017 Informa UK Limited, trading as Taylor & Francis Group

1. Introduction

The lunge is a compound full-body exercise commonly used in strength and conditioning (S&C), injury risk screening, and rehabilitation. The movement is weight bearing and promotes muscle activation patterns similar to those of gait (Riemann, Lapinski, Smith, & Davies, 2012), while also being critical in certain sporting activities (Cronin, McNair, & Marshall, 2003). The lunge is also an evaluative movement to assess an athlete’s risk of incurring lower limb musculoskeletal injury as it is representative of lower extremity function during activity (Powden, Hoch, & Hoch, 2015). Furthermore, it is an excellent rehabilitative exercise because it can be completed at home and requires no equipment. As such it is used for rehabilitating conditions such as patellofemoral pain (Escamilla, 2001) and following ligament injury (Alkjær, Simonsen, Magnusson, Aagaard, & Dyhre-Poulsen, 2002).

The challenging nature of the lunge means it is difficult to maintain good technique throughout the movement, which is important to reduce stress on knee, hip, and ankle joints (Farrokhi et al., 2008). Clinical staff such as physiotherapists and S&C coaches can provide technique feedback to maintain acceptable lunge technique. This is often completed using one of two distinct methods: sophisticated biomechanical assessment or subjective examination. Both of these have a number of limitations. The use of 3-D motion capture systems is expensive and the application of skin-mounted markers may hinder normal movement (Ahmadi et al., 2014; Bonnechere et al., 2014). Furthermore, data processing can be time-intensive and specific expertise is often required to interpret the processed data and to make recommendations on the observed results. Therefore, these systems are not frequently used to assess lunge technique beyond the research laboratory (Bonnet, Mazza, Fraisse, & Cappozzo, 2013). In a clinical setting, subjective evaluation has the potential for bias and poor to moderate reliability (Whatman, Hing, & Hume, 2012). This may be due to the experience of the rater, the method used to rate performance of the exercise (ordinal vs. dichotomous scales) or the instructions given to the raters (Chmielewski et al., 2007; Whatman et al., 2012). As the lunge has a large number of potential deviations (Table 1), subjective analysis is difficult and often flawed.

Wearable inertial measurement units (IMUs) offer the potential for low-cost, objective biomechanical analysis that can be completed in a clinical environment. Self-contained, wireless IMU devices are easy to set up and allow for the acquisition of human movement data in unconstrained environments (McGrath, Greene, O’Donovan, & Caulfield, 2012).
These IMUs are small, inexpensive sensors that consist of accelerometers, gyroscopes, and magnetometers. They are able to acquire data pertaining to the linear and angular motion of individual limb segments and the centre of mass of the body. In this paper, the term IMU system will be used to describe the IMU sensors, the IMU signals, the associated signal processing applied to them, and the output of the exercise classification algorithms. These IMU systems are ideally suited to quantify performance of these movements in a clinical setting, as they are not hampered by location, occlusion, and lighting issues unlike other biomechanical analysis tools (Morris, Saponas, Guillory, & Kelner, 2014).

A growing body of scientific literature has investigated the ability of IMU systems to assess technique in order to provide this holistic exercise analysis (Giggins, Sweeney, & Caulfield, 2014; Melzi, Borsani, & Cesana, 2009; O’Reilly et al., 2015; Pernek, Kurillo, Stiglic, & Bajcsy, 2015; Taylor, Almeida, Hodgins, & Kanade, 2012; Velloso, Bulling, Gellersen, Ugulino, & Fuks, 2013; Whelan, O’Reilly, Ward, Delahunt, & Caulfield, 2015, 2016). However, there is minimal work published evaluating the ability of an IMU system to accurately quantify lunge biomechanics. Fitzgerald et al. (2007) used a system that involved ten inertial sensors incorporated within a body suit to automatically monitor an individual’s lunge and tested the system using a case study. Visual analysis of the IMU gyroscope signals identified lower limb movement deviations in the injured athlete when compared to the non-injured athlete. Leardini et al. (2014) and Tang et al. (2015) examined the potential of IMUs to track body segments while completing the lunge. Both concluded that they had good accuracy compared to a laboratory-based optical measurement system such as Vicon™. Chen, Jafari, and Kehtarnavaz (2016) found that, by combining IMUs with the Microsoft Kinect™, lunges could be identified with 100% accuracy. Gowing et al. (2014) demonstrated that IMUs could identify a range of movements with greater accuracy than the Microsoft Kinect (91% overall accuracy); however, no specific results were given for the lunge. To our knowledge no study has analysed the ability of an IMU-based system to objectively analyse lunge technique.

Table 1. List and description of lunge exercise technique deviations used in this study.

| Deviation | Explanation |
|---|---|
| N | Normal lunge |
| KVL | Left knee coming towards mid-line during downward phase |
| KVR | Left knee moving away from mid-line during downward phase |
| KTF | Left knee ahead of toes during downward phase |
| HSR | Excessive lean to left hand side during entire lunge exercise |
| HSL | Excessive lean to right hand side during entire lunge exercise |
| BO | Excessive flexion of hip and torso during entire lunge exercise |
| BFO | Right foot externally rotated |
| SS | Loss of balance during upward phase resulting in stuttered steps |
| PB | Pushing backwards during upward phase |
| STS | Starting stance too short |
| STL | Starting stance too long |

In summary, the lunge is a compound full-body exercise that is included in resistance training, rehabilitation programmes and musculoskeletal injury risk screening protocols. To date, biomechanical analysis of the movement has come in the form of expensive lab-based evaluation or subjective clinical evaluation. Both of these have inherent disadvantages and there is a need for biomechanical analysis of the lunge that is easy to use, low-cost, and provides objective data. IMUs may provide a platform to achieve this.
The research question this study seeks to address is: ‘How well can an IMU-based system quantify lunge technique?’ We hypothesise that (a) IMUs, used in combination or isolation, positioned on the lumbar spine, thighs, and shanks, may be capable of distinguishing between acceptable and aberrant lunge technique; (b) these IMU systems may also be capable of detecting specific deviations from acceptable lunge technique; (c) that system efficiency may be improved, whilst maintaining accuracy, by finding a selection of most important features to train the classification algorithms and through optimising other classifier settings.

2. Methods

This study was undertaken to determine the minimal IMU sensor set that can discriminate between different levels of lunge performance and identify aberrant exercise technique. Data were acquired from participants as they completed the lunge with acceptable technique for ten repetitions. IMU data were then acquired while three repetitions of the same exercise were completed with each of 11 commonly observed deviations from acceptable technique.

2.1. Participants

Eighty healthy volunteers (57 males, 23 females, age: 24.7 ± 4.9 years, height: 1.75 ± 0.09 m, body mass: 76.0 ± 13.3 kg) were recruited for the study. No participant had a current or recent musculoskeletal injury that would impair his or her lunge performance. All participants had prior experience with the exercise and completed it regularly as part of their own training regime for at least one year. Each participant signed a consent form prior to completing the study. The University College Dublin Human Research Ethics Committee approved the study protocol.


2.2. Exercise technique and deviations

Participants completed the initial lunge with acceptable technique as described by the National Strength and Conditioning Association (NSCA) guidelines (Baechle & Earle, 2004). This involved participants placing their left foot in front of the torso and right foot behind the torso with toes pointing forward and the torso kept upright. This torso position was maintained throughout the movement. The downward phase then started from this position. The leading left knee was flexed in order to lower the trailing right knee towards the floor. The lead knee was kept directly over the lead foot that remained on the floor. The leading knee continued to flex until it formed an angle of roughly 90° with the lower leg in the sagittal plane and the trailing knee was 3–6 cm above the floor. The upward phase immediately followed, whereby the lead knee extended back to the starting position, whilst an upright torso posture was maintained. Table 1 shows the deviations completed by participants in this study. There was no control over the severity with which each participant completed the deviations.

2.3. Experimental protocol

When participants arrived at the laboratory the testing protocol was explained to them. Following this they completed a ten-minute warm-up on an exercise bike, maintaining a power output of 100 W at 75–85 rpm. Next, IMUs were secured on the participant at the following five locations: the spinous process of the fifth lumbar vertebra, the mid-point of both femurs (determined as half way between the greater trochanter and lateral femoral condyle), and on both shanks 2 cm above the lateral malleolus (Figure 1). The orientation and location of the IMUs were consistent for all study participants. A pilot study was used to determine an appropriate sampling frequency and the ranges for the accelerometer and gyroscope on board the IMUs (Shimmer 3, Shimmer, Dublin, Ireland).

In the pilot study, squat data were collected at 512 Hz. A Fourier transform was then used to detect the characteristic frequencies of the signal, which were all found to be less than 20 Hz. Therefore, a sampling frequency of 51.2 Hz was deemed appropriate for this study based upon the Nyquist criterion and the Shannon sampling theorem (Jerri, 1977). The Shimmer IMU was configured to stream tri-axial accelerometer (±2 g), gyroscope (±500 °/s), and magnetometer (±1.9 Ga) data, with the sensor ranges also chosen based upon data from the pilot study. The IMU was calibrated for these specific sensor ranges using the Shimmer 9DoF Calibration application.

Participants were then instructed on how to complete the lunge with acceptable technique and biomechanical alignment as outlined in the NSCA guidelines and explained in the ‘Exercise Technique and Deviations’ section. They completed ten repetitions with this acceptable technique. Once the lunge had been completed with acceptable technique the participant was instructed to complete the exercise with the deviations specified in Table 1. They completed three repetitions of each deviation as required. Verbal instructions and a demonstration were provided to all participants and they completed trial repetitions to ensure they were comfortable completing the deviations. All lunges were completed using body weight only. A chartered physiotherapist and an S&C trained individual were present throughout all data collection to ensure the lunge had been completed as instructed. If the movement completed was not in accordance with the description in Table 1, the participants were asked to repeat the exercise.
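The sampling-rate reasoning from the pilot study can be illustrated with a minimal sketch. It is not the authors' analysis code: the 3 Hz test tone, the 256-sample window and the helper names `dominant_frequency` and `satisfies_nyquist` are assumptions made purely for illustration. Only the 512 Hz, 51.2 Hz and 20 Hz figures come from the text.

```python
import math


def dominant_frequency(signal, fs):
    """Return the frequency (Hz) of the strongest non-DC component (naive DFT)."""
    n = len(signal)
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    return best_k * fs / n


def satisfies_nyquist(fs, f_max):
    """True when the sampling rate exceeds twice the highest signal frequency."""
    return fs > 2.0 * f_max


FS_PILOT = 512.0  # Hz, pilot-study sampling rate (from the text)
FS_STUDY = 51.2   # Hz, sampling rate used in the study (from the text)
F_MAX = 20.0      # Hz, upper bound on characteristic frequencies (from the text)

# Hypothetical 3 Hz movement component, sampled at the study rate.
N_SAMPLES = 256
tone = [math.sin(2 * math.pi * 3.0 * i / FS_STUDY) for i in range(N_SAMPLES)]
peak_hz = dominant_frequency(tone, FS_STUDY)
```

Under these assumptions, `satisfies_nyquist(FS_STUDY, F_MAX)` holds because 51.2 Hz exceeds 2 × 20 Hz = 40 Hz, and the study rate is a tenfold reduction of the pilot rate.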


Figure 1. Image showing the lunge exercise and the five sensor positions: (1) The spinous process of the 5th lumbar vertebra, (2&3) the mid-point of both femurs on the lateral surface (determined as half way between the greater trochanter and lateral femoral condyle), (4&5) and on both shanks 2 cm above the lateral malleolus.

2.4. Data analysis

Nine signals were obtained from each inertial measurement unit: accelerometer x, y, z, gyroscope x, y, z, and magnetometer x, y, z. The direction of each of the above axes is relative to the IMUs. A precise description of these axes and signals can be found in Shimmer’s online documentation for users (http://www.shimmersensing.com/support/wireless-sensor-networks-documentation). These signals were low-pass filtered at fc = 20 Hz using a Butterworth filter of order n = 8 to remove high-frequency noise and ensure all data analysed related to each participant’s movement. In particular, this filter helped remove high-frequency, high-amplitude spikes in the magnetometer data that occur due to electromagnetic interference.

Six additional signals were then computed. The 3-D orientation of the inertial measurement unit was computed from the accelerometer, gyroscope, and magnetometer signals using the gradient descent algorithm developed by Madgwick, Harrison, and Vaidyanathan (2011), which resulted in the quaternion W, X, Y, and Z signals. The acceleration magnitude was also computed from the accelerometer x, y, and z signals. Finally, the gyroscope magnitude was computed from the gyroscope x, y, and z signals.

Each repetition of each exercise was extracted from the IMU data and resampled to a length of 250 samples in order to account for participants having completed lunges at different tempos. Descriptive features were then computed from the aforementioned 15 signals. Time-domain and frequency-domain descriptive features were computed in order to describe the pattern of each of the 15 signals when the exercise was completed. The features were chosen based on a visual analysis of the data and similar work completed previously in the field (Giggins et al., 2014; O’Reilly et al., 2015; Whelan et al., 2015). These features were, namely: signal peak, valley, range, mean, standard deviation, skewness, kurtosis, signal energy, level crossing rate, variance, 25th percentile, 75th percentile, median, and the variance of both the approximate and detailed wavelet coefficients using the Daubechies 5 mother wavelet to level 6. This resulted in 16 features for each of the 15 available signals, producing a total of 240 features per IMU.

Feature importance was calculated to reduce model complexity and develop an understanding of which of the aforementioned features are most valuable for correct classification outcomes. This in turn could lead to a more efficient end-user application, increasing the rate of feedback to the user and improving the computational efficiency of the system. We used the method described by Liaw and Wiener (2002) to establish variable importance. Initially, benchmark accuracy was established by using all of the computed features to train and test a random forests classifier. The process was then to permute the values of each feature and measure how much the permutation decreased the accuracy of the model. For unimportant features, the permutation should have little to no effect on model accuracy, while permuting important features should significantly decrease it. Following the permutation of each feature, features were ranked based on their importance, with the most important feature being that which caused the largest reduction in accuracy when permuted, and so on. The random forests method was employed to perform classification (Breiman, 2001).
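The per-repetition processing described above (signal magnitude, resampling to 250 samples, descriptive features) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the study's implementation: linear interpolation for resampling, population standard deviation, level crossing measured about the signal mean, and all function names are assumptions. The Butterworth filter, Madgwick orientation estimate and wavelet features are omitted.

```python
import math
import statistics


def magnitude(x, y, z):
    """Euclidean norm of a tri-axial signal, sample by sample."""
    return [math.sqrt(a * a + b * b + c * c) for a, b, c in zip(x, y, z)]


def resample_linear(signal, n_out=250):
    """Stretch or squeeze one repetition to n_out samples (linear interpolation)."""
    n_in = len(signal)
    if n_in < 2:
        raise ValueError("need at least two samples to interpolate")
    out = []
    for j in range(n_out):
        pos = j * (n_in - 1) / (n_out - 1)  # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append(signal[lo] * (1.0 - frac) + signal[hi] * frac)
    return out


def percentile(values, q):
    """q-th percentile (0-100), interpolating linearly between order statistics."""
    ordered = sorted(values)
    pos = q / 100.0 * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def level_crossing_rate(signal, level):
    """Fraction of consecutive sample pairs that straddle `level`."""
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a - level) * (b - level) < 0
    )
    return crossings / (len(signal) - 1)


def describe(signal):
    """Thirteen of the time-domain features named in the text (no wavelets)."""
    mean = statistics.fmean(signal)
    sd = statistics.pstdev(signal)
    z = [(v - mean) / sd for v in signal] if sd > 0 else [0.0] * len(signal)
    return {
        "peak": max(signal),
        "valley": min(signal),
        "range": max(signal) - min(signal),
        "mean": mean,
        "std": sd,
        "skewness": statistics.fmean(v ** 3 for v in z),
        "kurtosis": statistics.fmean(v ** 4 for v in z),
        "energy": sum(v * v for v in signal),
        "level_crossing_rate": level_crossing_rate(signal, mean),
        "variance": sd * sd,
        "p25": percentile(signal, 25),
        "median": statistics.median(signal),
        "p75": percentile(signal, 75),
    }
```

Applying `describe(resample_linear(rep))` to each of the 15 signals of one repetition would yield one feature vector per IMU, to which the wavelet-based features would still need to be added.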
During analysis several types of classifiers were tested, including k-nearest neighbours, support vector machines, and naïve Bayes classifiers; however, none were shown to provide improved results on this data-set and some of these classifiers increased the computational time required. The random forests method was therefore chosen due to its superior classification results and efficiency versus these other classification methods. Random forests classifiers are also easily implemented and reproduced.

The number of trees used in an end-user system will depend on application requirements, where fewer trees will increase system efficiency but reduce classification efficacy. In order to investigate this relationship for the IMU-based lunge technique classification system, the out-of-bag error of a random forest was computed when using 1–500 trees. Out-of-bag error is a method of measuring the prediction error of random forests and other machine learning models that use bootstrap aggregating to sub-sample the data samples used for training. Out-of-bag error is the mean prediction error on each training sample x_i, using only the trees that did not have x_i in their bootstrap sample (James, Witten, Hastie, & Tibshirani, 2013). This process was completed for both binary and multi-class lunge technique classification systems. A lower out-of-bag error score suggests a lower probability of misclassification, but an increased number of trees will decrease computational efficiency.

Classifiers were initially developed using 400 trees and all available features. They were evaluated for the following ten combinations of variables: the 1,200 (5 × 240) variables computed from every IMU, the 720 (3 × 240) variables from both shanks and the lumbar IMU, the 720 variables from both thighs and the lumbar IMU, the 480 (2 × 240) variables from both shanks, the 480 variables from both thighs, and the 240 variables from each of the five individual IMUs.
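The out-of-bag idea described above can be shown with a small bagging sketch. To keep it dependency-free, a nearest-centroid rule stands in for a decision tree, so this illustrates how out-of-bag samples are scored rather than reproducing a random forest. The function names, the toy two-cluster data and the seed are all assumptions.

```python
import random
from collections import Counter


def fit_centroids(X, y):
    """Base learner (stand-in for a tree): per-class mean of each feature."""
    sums, counts = {}, Counter()
    for row, label in zip(X, y):
        counts[label] += 1
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict_centroids(model, row):
    """Predict the class whose centroid is closest to `row`."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(model[lab], row)))


def oob_error(X, y, n_learners, seed=0):
    """Error rate when each sample is voted on only by learners that never saw it."""
    rng = random.Random(seed)
    n = len(X)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        in_bag = set(idx)
        model = fit_centroids([X[i] for i in idx], [y[i] for i in idx])
        for i in range(n):
            if i not in in_bag:  # sample i is out of bag for this learner
                votes[i][predict_centroids(model, X[i])] += 1
    scored = [(v.most_common(1)[0][0], y[i]) for i, v in enumerate(votes) if v]
    if not scored:
        raise ValueError("no sample was ever out of bag; use more learners")
    return sum(pred != true for pred, true in scored) / len(scored)


# Toy data: two well-separated clusters (hypothetical, not IMU features).
X_toy = [[0.1 * i, 0.1 * i] for i in range(20)] + [[10 + 0.1 * i, 10 + 0.1 * i] for i in range(20)]
y_toy = [0] * 20 + [1] * 20
errors = {n: oob_error(X_toy, y_toy, n, seed=1) for n in (5, 25)}
```

Evaluating `oob_error` over a range of learner counts, as in the `errors` dictionary, mirrors the 1–500 tree sweep described in the text on a much smaller scale.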
This was to compare classification scores using five IMUs, three IMUs, two IMUs, and one IMU at different anatomical locations and to establish model accuracy without considering efficiency due to feature selection.

Classifiers were then developed using subsets of features based on their ranked importance, which was computed using the aforementioned random forests variable permutation method (Liaw & Wiener, 2002). For each IMU combination, the features ranked in the top 2% were first used as training and test data. The accuracy of the system was determined using this reduced set of variables. Following this, the accuracy of the system was determined when using the top 4% of features, and so on in increments of 2% until all variables were used again. The accuracy was plotted versus the percentage of variables used in order to build an understanding of the relationship between model efficiency and accuracy.

Initially, binary classification was evaluated to establish how effectively each individual IMU and each combination of IMUs could distinguish between acceptable and aberrant lunge technique. All repetitions of acceptable performance of the lunge were labelled ‘0’ and all repetitions of the lunge performed with one of the deviations outlined in Table 1 were labelled ‘1’. Multi-label classification was then evaluated on the IMU data-set to investigate how effectively each individual IMU and each combination of IMUs could be used to discriminate between correct performance of the lunge exercise and each of the 11 deviations from correct technique described in Table 1. All repetitions of normal performance of the lunge remained labelled as ‘0’ and each of the different deviations was labelled ‘1–11’. Identifying the type of deviation occurring could allow an automated biofeedback system to give more specific feedback to a user than a binary classifier is capable of.
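The permutation-based ranking and the 2% feature-subset sweep described above can be sketched as below. This is a simplified, model-agnostic variant that permutes each feature on a fixed evaluation set; the randomForest method of Liaw and Wiener (2002) works on out-of-bag samples per tree, which is not reproduced here. The function names, the rounding-up rule in `top_percent` and the toy classifier are assumptions.

```python
import math
import random


def accuracy(predict, X, y):
    """Fraction of rows for which predict(row) matches the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, seed=0):
    """Drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    drops = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between feature j and the labels
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        drops.append(baseline - accuracy(predict, X_perm, y))
    return drops


def top_percent(importances, percent):
    """Indices of the top `percent`% of features, most important first."""
    ranked = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
    keep = max(1, math.ceil(len(ranked) * percent / 100))  # rounding up is an assumption
    return ranked[:keep]


# Toy example: feature 0 decides the label, feature 1 is ignored by the classifier.
X_demo = [[float(i % 2), float(i % 5)] for i in range(20)]
toy_predict = lambda row: int(row[0] > 0.5)
y_demo = [toy_predict(row) for row in X_demo]
drops = permutation_importance(toy_predict, X_demo, y_demo)

# Subsets for the sweep: top 2%, 4%, ..., 100% of a 240-feature ranking.
subset_sizes = [len(top_percent([0.0] * 240, p)) for p in range(2, 101, 2)]
```

In a full pipeline, each subset returned by `top_percent` would be used to retrain and re-evaluate the classifier, giving one point on the accuracy-versus-percentage curve.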
Moreover, a classifier that identifies only the type of deviation may require fewer IMUs to perform effective classification than a classifier that identifies the exact deviation in lunge technique that a user is making. In order to investigate this idea, the 12 classes were grouped as described in Table 2, to establish the ability of each IMU set-up to identify the type of deviation that was occurring. The four groups were chosen by a chartered physiotherapist and a person trained in S&C following discussions with clinical professionals. They pertain to the user’s lower limb alignment throughout each repetition of the exercise, the user’s starting position and their lumbar tracking. All of the above classifiers were developed using all computed features and subsets of the top features. System quality for both sets of classifiers was compared using the accuracy, sensitivity, and specificity metrics.

Table 2. Grouped deviations used for classifying the type of deviation a system user may be making.

| Group name | Numeric label | Deviations in group |
|---|---|---|
| Normal | 0 | N |
| Lower Limb Alignment | 1 | KVL, BFO, KVR, KTF, PB, SS |
| Starting Position | 2 | STS, STL |
| Lumbar Tracking | 3 | BO, HSL, HSR |

The quality of the lunge exercise classification was established using leave-one-subject-out cross-validation (LOSOCV) and the random forests classifier with 400 trees. Each participant’s data correspond to one fold of the cross-validation. At each fold, one participant’s data are held out while the random forests classifier is trained, and the held-out data are then used to assess the classifier’s ability to correctly categorise new data it is presented with. The use of LOSOCV ensures that there is no biasing of the classifiers, as the test participant’s data are completely unseen by the classifier prior to testing. Previous research has shown that not employing this method of testing can greatly skew results (Taylor, Almeida, Kanade, & Hodgins, 2010).

In our system each individual repetition was classified. The scores used to measure the quality of classification were total accuracy, average sensitivity and average specificity. Accuracy is the number of correctly classified repetitions of all the exercises divided by the total number of repetitions completed; this is calculated as the sum of the true positives (TP) and true negatives (TN) divided by the sum of the true positives, false positives (FP), true negatives, and false negatives (FN):

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{1}$$
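The leave-one-subject-out scheme used to obtain the counts in Equation (1) can be sketched as a fold generator. This is an illustrative sketch only: the function name and the participant identifiers are hypothetical, and training and scoring of the classifier are left out.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for subject in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train, test


# One entry per repetition, naming the participant who performed it (hypothetical IDs).
rep_subjects = ["p01", "p01", "p02", "p03", "p03", "p03"]
folds = list(leave_one_subject_out(rep_subjects))
```

Because every repetition from the held-out participant goes to the test indices, no repetition from that participant can leak into the training set of the same fold.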

In binary classification the ‘0’ label was considered positive and ‘1’ label as negative. Therefore, a single sensitivity and specificity score is suitable to describe the quality of the various IMU combinations. In multi-class classification, for every IMU combination evaluated the sensitivity and specificity were calculated for each of the 12 deviations and then the mean and standard deviation across the 12 values was taken, using the formulas below:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{2}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{3}$$

Sensitivity measures the effectiveness of a classifier at identifying a desired label, while specificity measures the classifier’s ability to detect negative labels. A ‘positive’ label is the desired label. For example, in binary classification a lunge completed with correct form is considered the positive class. In reviewing the accuracy, sensitivity, and specificity scores produced by each classifier, 80% or higher was considered a ‘good’ quality result, 60–79% was considered a ‘moderate’ result, and anything below 60% was deemed a ‘poor’ result. The authors chose these values after reviewing existing literature on identifying deviations from acceptable exercise performance using data derived from IMUs. In reviewing such literature, an existing accepted standard for a good, moderate, or poor classifier could not be found. Therefore, the above system was agreed on by the authors to facilitate interpretation of our range of results.
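Equations (1)–(3) and the quality bands above can be sketched in a few lines. The counts used in the usage note are invented for illustration, the function names are assumptions, the one-vs-rest treatment of the multi-class case and the use of a population standard deviation are assumptions about how the per-class scores were summarised, and treating every score below 60% as poor is an assumption about the band boundary.

```python
import statistics


def binary_scores(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity as in Equations (1)-(3)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def per_class_scores(confusion):
    """One-vs-rest sensitivity/specificity per class, summarised as mean and SD.

    confusion[i][j] is the number of repetitions of actual class i predicted as j.
    """
    total = sum(sum(row) for row in confusion)
    sens, spec = [], []
    for k in range(len(confusion)):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(row[k] for row in confusion) - tp
        tn = total - tp - fn - fp
        sens.append(tp / (tp + fn))
        spec.append(tn / (tn + fp))
    return {
        "sensitivity_mean": statistics.fmean(sens),
        "sensitivity_sd": statistics.pstdev(sens),
        "specificity_mean": statistics.fmean(spec),
        "specificity_sd": statistics.pstdev(spec),
    }


def rate(score_percent):
    """Map a percentage score to the qualitative bands used in the text."""
    if score_percent >= 80:
        return "good"
    if score_percent >= 60:
        return "moderate"
    return "poor"
```

For example, `binary_scores(tp=8, fp=1, tn=9, fn=2)` gives 85% accuracy, 80% sensitivity and 90% specificity, each of which `rate` would label as good.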

3. Results Results are first presented from the classifiers trained with all available features. The most important features for model development are then presented. The subsequent effect of feature selection on classification is then presented. 3.1.  Classification with full feature set Total accuracy scores for each IMU position are shown in Table 3. A single-IMU set-up is able to detect whether a lunge is completed correctly with 72–82% accuracy but can only

312

 9

SPORTS BIOMECHANICS 

Table 3. Total accuracy, sensitivity, and specificity scores from LOSOCV for binary and both multi-label classifiers. Binary (%) (correct or incorrect) All 5 IMUs Lumbar + Shanks Lumbar + Thighs Shanks Thighs Left Shank Left Thigh Lumbar Right Thigh Right Shank

Acc 90 87 87 84 85 72 77 77 82 81

Sens 80 73 73 67 69 40 51 50 78 58

Spec 92 91 91 89 90 82 83 87 83 88

Multi-label (%) (type of deviation)

Multi-label (%) (specific deviation)

Acc 67 66 65 56 53 46 44 56 52 44

Acc 70 60 61 50 48 35 37 40 39 33

Sens 71 ± 13 69 ± 14 70 ± 13 50 ± 15 52 ± 16 39 ± 13 42 ± 4 59 ± 19 47 ± 18 42 ± 15

Good

66.4

33.6

Poor

12.1

87.9

Good

Poor

Spec 90 ± 3 88 ± 5 88 ± 4 85 ± 4 84 ± 1 81 ± 5 82 ± 4 85 ± 6 84 ± 2 81 ± 5

Sens 70 ± 10 62 ± 13 61 ± 13 49 ± 23 44 ± 2 38 ± 25 39 ± 3 43 ± 24 33 ± 25 29 ± 24

Spec 97 ± 1 96 ± 2 96 ± 2 95 ± 2 95 ± 3 94 ± 3 94 ± 3 95 ± 3 94 ± 3 94 ± 3

Figure 2. Heatmap confusion matrix showing binary classification results with right thigh IMU. Note: Rows show the actual class of a repetition and columns show the classifier’s prediction.

detect the exact mistake a user is making with 35–40% accuracy. A multi-IMU set-up using 5 IMUs worn on the lumbar, both thighs and both shanks is capable of distinguishing between good and bad performances of the lunge with 90% accuracy. Furthermore, this IMU set-up can detect the exact mistake a user is making from the list shown in Table 1 with 70% accuracy. Sensitivity and specificity scores are shown for all IMU combinations in Table 3. Moderate sensitivity and specificity scores are achieved for a single-IMU system in binary classification. The IMU on the right thigh is capable of 78% sensitivity and 83% specificity. The left shank was the poorest position for binary classification using a single IMU, with a sensitivity of 40% and specificity of 82%. The five-IMU set-up is most effective for both binary and multi-class classification. A reduced IMU set using three IMUs positioned on the lumbar and both shanks also produces good classification accuracy.

Figures 2 and 3 are confusion matrices showing the exact percentage of repetitions correctly and incorrectly classified. The rows show the class the repetition actually belongs to and the columns show the class output by the classifier. In both figures, the top left shows the percentage of TPs, top right: FNs, bottom right: TNs, and bottom left: FPs. The true positive rate for the single IMU on the right thigh is 66% and for the five-IMU set is 71%. The true negative rate for the single IMU on the right thigh is 88% and for the five-IMU set is 95%.
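The layout of such confusion matrices can be made concrete with a short sketch. It builds a row-normalised matrix from paired lists of actual and predicted labels, so that each row (actual class) sums to 100%, as in Figures 2 and 3. The function name and the ten example repetitions are hypothetical and serve only to show the mechanics.

```python
from collections import Counter


def row_normalised_confusion(actual, predicted, labels):
    """Return {actual_label: {predicted_label: percentage}}.

    Each row is divided by the number of repetitions that truly belong
    to that class, so the diagonal holds the per-class detection rate.
    """
    counts = Counter(zip(actual, predicted))
    matrix = {}
    for a in labels:
        row_total = sum(counts[(a, p)] for p in labels)
        matrix[a] = {
            p: (100.0 * counts[(a, p)] / row_total if row_total else 0.0)
            for p in labels
        }
    return matrix


# Hypothetical labels for ten repetitions (not data from the study).
actual = ["good"] * 4 + ["poor"] * 6
predicted = ["good", "good", "good", "poor",
             "poor", "poor", "poor", "poor", "poor", "good"]

matrix = row_normalised_confusion(actual, predicted, ["good", "poor"])
for a, row in matrix.items():
    print(a, {p: round(v, 1) for p, v in row.items()})
```

With ‘good’ as the positive class, the top-left cell is the true positive rate and the bottom-right cell is the true negative rate.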


 M. A. O’REILLY ET AL.

| Actual \ Predicted | Good | Poor |
|---|---|---|
| Good | 70.7 | 29.3 |
| Poor | 5.0 | 95.0 |

Figure 3. Heatmap confusion matrix showing binary classification results with all five IMUs. Note: Rows show the actual class of a repetition and columns show the classifier’s prediction.

| Actual \ Predicted | Normal | Lower Limb Alignment | Starting Position | Lumbar Tracking |
|---|---|---|---|---|
| Normal | 67.7 | 7.2 | 15.9 | 9.3 |
| Lower Limb Alignment | 8.9 | 62.3 | 19.2 | 9.6 |
| Starting Position | 13.6 | 19.9 | 60.9 | 5.6 |
| Lumbar Tracking | 1.1 | 8.1 | 1.6 | 89.2 |

Figure 4. Heatmap confusion matrix showing multi-label classification results (deviation type) with lumbar & shank IMUs. Values are percentages. Note: Rows show the actual class of a repetition and columns show the classifier’s prediction.

Figure 4 is a confusion matrix for identifying the type of deviation occurring using the IMUs on the lumbar and both shanks. Mistakes pertaining to lumbar tracking are best detected, with a TP rate of 89%. Mistakes relating to lower limb alignment and starting position are the most mutually confused categories. Figure 5 shows a confusion matrix for multi-class classification (exact deviation) when using the five-IMU set-up. ‘Normal’ (N) performance of the lunge is detected with a 68% TP rate. Six per cent of normal lunges were mistaken by the classifier for ‘step too short’ (STS) and 7% of normal lunges were mistaken for ‘step too long’ (STL). The ‘push back’ (PB) deviation is the worst detected deviation, with a TP rate of just 46%; almost 10% of PB reps were mistaken for ‘normal’ (N) by the system. The best detected deviation was ‘stutter step’ (SS), with a TP rate of 81%.

3.2. Feature importance

Table 4 highlights the 15 most important features for each type of classifier evaluated.


| Actual \ Predicted | N | KVL | KVR | STS | STL | HSR | HSL | KTF | BO | PB | SS | BFO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| N | 67.8 | 0.0 | 1.7 | 6.3 | 6.8 | 0.0 | 1.7 | 6.6 | 6.6 | 1.5 | 0.2 | 1.0 |
| KVL | 3.2 | 74.2 | 0.0 | 8.1 | 1.6 | 0.0 | 3.2 | 8.1 | 0.0 | 0.0 | 0.0 | 1.6 |
| KVR | 1.6 | 0.0 | 77.3 | 1.6 | 4.9 | 0.0 | 4.9 | 1.6 | 0.0 | 1.6 | 1.6 | 4.9 |
| STS | 6.7 | 5.1 | 4.6 | 63.1 | 1.5 | 0.0 | 3.1 | 3.1 | 4.6 | 3.1 | 2.6 | 2.6 |
| STL | 9.2 | 1.5 | 6.1 | 0.0 | 65.3 | 0.0 | 3.1 | 8.7 | 0.0 | 0.0 | 0.0 | 6.1 |
| HSR | 0.0 | 1.6 | 1.6 | 2.1 | 7.3 | 73.4 | 4.7 | 4.7 | 0.0 | 3.1 | 0.0 | 1.6 |
| HSL | 0.0 | 0.0 | 3.1 | 1.6 | 3.1 | 7.8 | 76.6 | 3.1 | 3.1 | 0.0 | 0.0 | 1.6 |
| KTF | 0.0 | 1.5 | 6.0 | 6.0 | 7.5 | 0.0 | 3.0 | 65.7 | 7.5 | 1.5 | 0.0 | 1.5 |
| BO | 0.0 | 1.6 | 1.6 | 3.3 | 0.0 | 1.6 | 3.3 | 3.3 | 76.9 | 8.2 | 0.0 | 0.0 |
| PB | 9.9 | 6.3 | 1.6 | 7.9 | 4.2 | 0.0 | 4.7 | 7.9 | 8.9 | 45.5 | 0.0 | 3.1 |
| SS | 3.2 | 3.2 | 1.6 | 0.0 | 1.6 | 0.0 | 5.3 | 1.6 | 1.1 | 1.6 | 81.1 | 0.0 |
| BFO | 0.0 | 0.0 | 0.0 | 3.2 | 4.9 | 3.2 | 3.2 | 1.6 | 4.9 | 1.6 | 0.0 | 77.3 |

Figure 5. Heatmap confusion matrix showing multi-class classification results with all five IMUs. Values are percentages. Note: Rows show the actual class of a repetition and columns show the classifier’s prediction.
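Table 3 reports multi-label sensitivity and specificity with a ± spread. One common way to obtain such figures from a multi-class confusion matrix is to treat each class in turn as ‘positive’ against all the others (one-vs-rest) and then summarise across classes. The sketch below assumes that approach, which the text does not spell out, and uses a small matrix of hypothetical counts rather than the percentages in Figure 5.

```python
from statistics import mean, stdev


def one_vs_rest_scores(counts):
    """counts[i][j] = repetitions of actual class i predicted as class j.

    Returns per-class sensitivity and specificity lists (percentages).
    """
    n = len(counts)
    total = sum(sum(row) for row in counts)
    sensitivities, specificities = [], []
    for k in range(n):
        tp = counts[k][k]
        fn = sum(counts[k]) - tp                        # class k reps missed
        fp = sum(counts[i][k] for i in range(n)) - tp   # other reps called k
        tn = total - tp - fn - fp
        sensitivities.append(100.0 * tp / (tp + fn))
        specificities.append(100.0 * tn / (tn + fp))
    return sensitivities, specificities


# Hypothetical 3-class counts, 10 repetitions per class (illustration only).
counts = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
sens, spec = one_vs_rest_scores(counts)
print("sensitivity per class:", sens)
print("specificity per class:", spec)
print(f"sensitivity: {mean(sens):.1f} ± {stdev(sens):.1f}")
print(f"specificity: {mean(spec):.1f} ± {stdev(spec):.1f}")
```

Specificity tends to be high in this setting because, for any single class, most repetitions belong to the other classes and are correctly rejected, which is consistent with the pattern of high specificity alongside lower sensitivity in Table 3.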

Table 4. Top 15 features for binary and multi-label classification.

| Rank | Binary (correct or incorrect) | Multi-label (type of deviation) | Multi-label (specific deviation) |
|---|---|---|---|
| 1 | Right Shank—Gyroscope X—Fractal Dimension | Right Thigh—Magnetometer X—75th Percentile | Left Shank—Gyroscope X—Energy |
| 2 | Right Shank—Roll—Approx Wavelet Coefficients | Left Shank—Accelerometer Z—RMS | Left Shank—Gyroscope Z—Level Crossing Rate |
| 3 | Right Thigh—Magnetometer X—Variance | Lumbar—Acceleration magnitude—Detailed Wavelet Coefficients | Right Shank—Accelerometer Y—Standard Deviation |
| 4 | Right Thigh—Quaternion W—Approx Wavelet Coefficients | Right Thigh—Quaternion Y—Level Crossing Rate | Left Shank—Magnetometer X—Kurtosis |
| 5 | Left Shank—Magnetometer Z—Detailed Wavelet Coefficients | Left Thigh—Quaternion X—Max | Right Thigh—Pitch—Standard Deviation |
| 6 | Left Thigh—Magnetometer Y—Range | Right Thigh—Quaternion Y—Skewness | Right Thigh—Gyroscope Magnitude—Max |
| 7 | Left Thigh—Acceleration magnitude—RMS | Right Shank—Accelerometer Z—Skewness | Right Thigh—Magnetometer X—Range |
| 8 | Lumbar—Magnetometer X—Min | Left Thigh—Roll—Variance | Right Thigh—Acceleration magnitude—Max |
| 9 | Lumbar—Quaternion W—Mean | Left Thigh—Gyroscope Magnitude—Standard Deviation | Left Shank—Gyroscope Magnitude—75th Percentile |
| 10 | Lumbar—Quaternion W—RMS | Lumbar—Gyroscope Y—Variance | Left Shank—Quaternion X—Variance |
| 11 | Right Thigh—Accelerometer Y—Variance | Right Thigh—Roll—Approx Wavelet Coefficients | Right Shank—Gyroscope Y—Max |
| 12 | Right Shank—Accelerometer X—Level Crossing Rate | Right Shank—Accelerometer Y—Kurtosis | Right Shank—Acceleration magnitude—Detailed Wavelet Coefficients |
| 13 | Right Shank—Magnetometer X—Energy | Right Shank—Yaw—Energy | Left Shank—Gyroscope X—Fractal Dimension |
| 14 | Left Shank—Accelerometer X—Range | Left Shank—Pitch—Energy | Left Shank—Accelerometer Z—Max |
| 15 | Left Shank—Magnetometer Z—Range | Lumbar—Quaternion Y—Standard Deviation | Left Thigh—Gyroscope Z—Max |


3.3. Classification with top-ranked features

Figure 6 shows the quality of multi-class classification of exact deviation when using 1–100% of features ranked in order of importance. It is evident that using just 20% of the top-ranked features produces similar classification quality to using 100% of the features. Similarly, Figure 7 demonstrates the quality of binary classification when using 1–100% of the features

Figure 6. Graph showing accuracy, sensitivity, and specificity scores in multi-class classification (exact deviation) with all five IMUs when using 1–100% of features ranked in order of importance.

Figure 7. Graph showing accuracy, sensitivity, and specificity scores in binary classification (acceptable or aberrant technique) with the right thigh IMU when using 1–100% of features ranked in order of importance.


Figure 8. The relationship between out of bag error and the number of trees used for the random forests classifiers when using all training data for binary classification using the right thigh IMU and all training data for multi-class classification using all five IMUs.

computed from the IMU located on the right thigh. A similar trend in accuracy, sensitivity and specificity scores was evident across all the classification problems discussed in this paper when using just 20% of the most important features.

3.4. Number of trees used vs. out of bag error

Figure 8 demonstrates the out of bag error versus the number of trees used in the random forests classifiers. It is evident that the benefit (improved classification scores) gained for the cost (reduced computational efficiency) of adding trees starts to diminish beyond 100–150 trees.
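The two efficiency choices described in Sections 3.3 and 3.4 can be expressed as simple selection rules. The sketch below keeps a chosen fraction of features by importance rank and picks the smallest forest whose out of bag error is within a tolerance of the best observed value. The helper names, importance scores, and error curve are hypothetical placeholders, not values from Figures 6 to 8, and the tolerance rule is one possible way to formalise ‘diminishing returns’.

```python
def top_fraction(importances, fraction):
    """Return the names of the top-ranked features.

    importances: {feature_name: importance_score}
    fraction: share of features to keep, e.g. 0.2 for the top 20%.
    """
    ranked = sorted(importances, key=importances.get, reverse=True)
    keep = max(1, round(fraction * len(ranked)))
    return ranked[:keep]


def smallest_adequate_forest(oob_curve, tolerance):
    """Return the smallest tree count whose OOB error is within
    `tolerance` of the lowest error on the curve.

    oob_curve: {number_of_trees: out_of_bag_error}
    """
    best = min(oob_curve.values())
    return min(n for n, err in oob_curve.items() if err <= best + tolerance)


# Hypothetical importance scores for ten features (illustration only).
importances = {f"feature_{i}": score for i, score in enumerate(
    [0.02, 0.15, 0.01, 0.30, 0.05, 0.08, 0.03, 0.20, 0.04, 0.12])}
selected = top_fraction(importances, 0.2)
print(selected)

# Hypothetical out of bag error curve (illustration only).
oob_curve = {10: 0.30, 50: 0.22, 100: 0.19, 150: 0.185, 300: 0.18, 500: 0.18}
trees = smallest_adequate_forest(oob_curve, tolerance=0.015)
print(trees)
```

In practice the tolerance would be chosen with the application and the available processing hardware in mind.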

4. Discussion and implications

In relation to hypothesis (a), the results presented in this paper show that a system consisting of five lower limb IMUs can distinguish between acceptable and aberrant lunge technique with 90% accuracy, 80% sensitivity, and 92% specificity. A system based on data derived from three IMUs achieves 87% accuracy, 73% sensitivity, and 91% specificity. The similar classification scores with a reduced IMU set-up are likely to arise because the features from both thighs and both shanks correlate, so the additional IMUs provide minimal additional information to aid classification quality. The system using data solely from the right thigh IMU achieved 82% accuracy, 78% sensitivity, and 83% specificity. This IMU position outperforms the others when used in isolation, particularly in terms of sensitivity (78% against 40–58% for the other single-IMU positions). This is likely because this position, on the trailing leg for the lunges studied, best captures the differences between acceptable and aberrant lunges. At the other IMU positions, the inter-participant variability in how lunges are performed exceeds the intra-participant variability between acceptable and aberrant lunges to a greater extent.


In examining hypothesis (b) of this study, it has been shown that multi-label classification of specific deviations is possible with 70% accuracy, 70% sensitivity, and 97% specificity using data from the full five-IMU set-up. A three-IMU set-up may provide increased system practicality and efficiency compared to a five-IMU set-up; however, its accuracy and sensitivity drop by 8–10 percentage points. A system consisting of two IMUs did not produce favourable results for multi-label classification. These results are likely to be related to the number of classes the system is attempting to identify (a total of 12 as shown in Table 1). Furthermore, these deviations manifest more severely at specific anatomical points, making them difficult for a reduced IMU set-up to identify. When using reduced IMU set-ups for detecting these specific mistakes, it is likely the features will be less discriminative between classes, particularly when they come from only a single IMU. It should also be noted that the number of instances of each class in multi-label classification is considerably lower than that used for the binary classifiers, where repetitions from 11 classes of deviations were pooled together to make one larger ‘aberrant’ class. This smaller class size is likely to have negatively impacted multi-label classification scores. Interestingly, when pooling types of deviations together to create larger classes (Table 2), an increase of 4–16% in accuracy was observed. This may be due to the reduced likelihood of classifiers ‘confusing’ these deviations or because each class now contains more instances than were available for specific multi-label classification. However, two- and single-IMU set-ups still showed poor sensitivity and accuracy with this form of classification. Consequently, such set-ups would be difficult to implement effectively in a real-world environment. This work also identifies two methods of increasing system efficiency corresponding with hypothesis (c) of this study. 
Figures 6 and 7 show that a reduced feature set can obtain similar accuracy, sensitivity, and specificity scores to those obtained using all features. In these cases, using just 20–30% of top-ranked features to train and evaluate the random forests classifiers produces equivalent accuracy, sensitivity, and specificity results. Figure 8 also demonstrates that using a reduced number of trees in each random forests classifier can maintain the misclassification probability as determined by out of bag error. It is difficult to draw conclusions as to how many trees and what proportion of top-ranked features should be used by people implementing systems such as those described in this paper. These decisions should be made considering the specific application of the systems, the computational processing devices available (e.g. on device or cloud-based analysis) and a domain-specific judgement on the accuracy-efficiency trade-off. It is, however, shown that a broad range of choices will result in near-equivalent classification scores. The top-ranked features shown in Table 4 demonstrate which features are most important for each type of classifier. There is a large degree of variety in the important IMU positions, signals, and feature types (time domain, frequency domain, and time–frequency domain). As such, it would be difficult to predict which signals and features are of most importance in advance of creating IMU-based exercise analysis systems for other exercises. The authors would recommend following a similar process of computing a large number of diverse features from a variety of signals and then identifying the most important features for each specific application. Interestingly, a number of features from the magnetometer and quaternion signals are highly ranked. 
The authors note that these signals were not analysed in similar research (Giggins et al., 2014; Pernek, Hummel, & Kokol, 2013; Taylor et al., 2012; Velloso et al., 2013) and that perhaps they could be beneficial to include in future IMU-based exercise analysis systems. It is difficult to directly compare results with previous work in the area due to differences in exercises investigated, IMU positions and overall size of data-sets analysed. The majority


of research to date has investigated the ability of IMU systems to monitor technique in simple exercises such as straight leg raises (Taylor et al., 2012), dumbbell curls (Velloso et al., 2013) or heel slides (Giggins et al., 2014). This work builds on previous research by analysing lunge form. The lunge is a compound, non-symmetrical exercise, making it more difficult to track than single-joint exercises. While some authors have tracked the lunge using IMUs (Fitzgerald et al., 2007; Leardini et al., 2014; Tang et al., 2015), none have quantitatively assessed technique using the IMUs. The complex nature of the lunge means a number of mistakes can occur at a variety of joint positions, making it more difficult to ascertain exact deviations using an IMU sensor set. This may explain the moderate to low sensitivity and accuracy scores for multi-label classification seen in Table 3, which are especially prevalent in reduced IMU set-ups (fewer than three IMUs). A minimal IMU set-up is nonetheless advantageous as it is less cumbersome, easier to operate and reduces the risk of sensor placement error. Furthermore, it would reduce cost for end-users. A number of authors have evaluated exercise performance with a single IMU, reporting higher overall accuracy scores than this study (Giggins et al., 2014; O’Reilly et al., 2015; Pernek et al., 2013; Whelan et al., 2015). The scores presented here for a single IMU performing multi-class recognition may be lower for a number of reasons. The lunge is a multi-joint lower limb exercise, unlike those presented in Giggins et al. (2014) and Pernek et al. (2013). The use of multiple joints makes it more difficult to detect deviations with a minimal IMU set-up due to the increased number of possible deviations and additional complexity of exercise biomechanics. 
Previous work (O’Reilly et al., 2015; Whelan et al., 2015) evaluated multi-joint exercises (squat and single leg squat) using a single-IMU set-up and found moderate to good overall accuracy scores. However, the number of potential deviations that the authors attempted to classify with the IMUs was smaller than the 11 deviations shown in this paper (Table 1). The larger number of deviations in the present study makes it more difficult for the IMUs to classify each deviation, explaining some of the lower scores shown in Table 3. While this paper does show encouraging results for using an IMU system to track lunge biomechanics, there are a number of contextual factors that must be considered. Firstly, the data-set collected and analysed in this study may be limited in its transferability to real-world applications as all exercise deviations were deliberately induced. When deviations occur naturally, the exact way in which they present may differ from the induced deviations investigated in this study. Moreover, there were no controls as to the severity with which each participant performed a deviation and therefore it is possible that naturally occurring lunge deviations in a ‘real life’ application may be more acute, or occur in a more idiosyncratic fashion, than those used in the presented classification systems. No gold-standard 3-D motion capture system was used to confirm that each deviation occurred, and the 11 deviations studied may be a non-exhaustive list of lunge deviations. Another potential limitation of the current system in terms of its transferability to real-world application is that IMU positioning was strictly consistent and local frame IMU data was used. As such, the IMU systems presented may only produce these results when set up and analysed in identical fashion to this study. Inexperienced users may place the IMUs with incorrect orientation or at different positions, affecting system accuracy. 
One method to combat this could be to convert the local frame IMU data to a global frame that utilises vertical acceleration. However, as many of the signals and features used in this study are derived from local frame IMU data, the authors anticipate this could have a negative consequence on classification


quality. Most notably, this may happen where information is lost in converting a signal from local frame to global frame. In the current system, the accelerometer x signal measures both inertial and gravitational acceleration with respect to the x-axis of the IMU. Converting signals such as this to acceleration in the global frame may cause loss of information, for instance, that which came from the effect of gravitational acceleration with respect to the IMU’s x-axis. This limitation requires further investigation and the authors recommend that IMU positioning and its importance should be made clear in instructions to all users of the current system.

Future work will involve three key areas of focus: (a) addressing the aforementioned limitations of the system, including developing and evaluating lunge classification systems with naturally occurring technique deviations, (b) improving system accuracy through collecting a larger data-set and investigating additional analysis methodologies, and (c) evaluating the usability, functionality, and perceived impact of IMU-based exercise analysis systems such as that described in this paper. A larger data-set should improve system accuracy and may allow for the application of deep learning classification techniques to the data that could further heighten system accuracy. In order to enable the collection of a larger data-set, a tablet-based tool is currently being developed which will allow exercise professionals to simultaneously collect IMU and video data, which is then automatically segmented into exercise repetitions and can be labelled by the exercise professional. The labelled IMU data can be appended to existing classifiers such as those described in this paper. This should take data collection out of the hands of researchers and into the hands of exercise professionals, allowing for the creation of larger data-sets. 
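Returning to the local versus global frame limitation noted above, the following minimal sketch illustrates how gravity-related information can disappear in such a conversion. It assumes an orientation quaternion (w, x, y, z) that rotates sensor-frame vectors into a global frame with the z-axis pointing up, and an accelerometer that reads +9.81 m/s² along whichever axis points up when at rest. These conventions, the function names, and the example orientation are assumptions for illustration, not details of the system described in this paper.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)


def quat_rotate(q, v):
    """Rotate 3-vector v by unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    vx, vy, vz = v
    # t = 2 * (q_vec x v)
    tx = 2.0 * (y * vz - z * vy)
    ty = 2.0 * (z * vx - x * vz)
    tz = 2.0 * (x * vy - y * vx)
    # v' = v + w * t + q_vec x t
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )


def global_linear_acceleration(q_sensor_to_global, acc_local):
    """Express a local accelerometer sample in the global frame and
    remove the assumed gravity component from the vertical axis."""
    gx, gy, gz = quat_rotate(q_sensor_to_global, acc_local)
    return (gx, gy, gz - G)


# Hypothetical case: the IMU is stationary with its x-axis pointing up,
# so the local accelerometer x signal reads the full gravity component.
c = math.sqrt(0.5)
q = (c, 0.0, -c, 0.0)        # rotation of -90 degrees about the y-axis
acc_local = (G, 0.0, 0.0)

acc_global = global_linear_acceleration(q, acc_local)
print([round(a, 6) for a in acc_global])
```

In the local frame the x signal encodes the sensor's tilt through its gravity component; after conversion the stationary sensor reads approximately zero on every axis, so features that relied on that tilt information would no longer be available unless orientation signals are kept alongside the converted acceleration.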
A qualitative and quantitative evaluation of the system described in this paper will also be undertaken in order to establish its usability, functionality, and perceived benefit. This should help inform steps to improve the development of IMU-based exercise analysis systems and subsequently what future work may also involve. The lunge is an important movement in S&C, musculoskeletal injury risk screening, and rehabilitation. The ability to objectively quantify lunge biomechanics using low-cost IMU technology would have practical implications in all of these settings. In an S&C setting, the ability to remotely monitor lunge biomechanics may allow for guidance in how to complete the lunge exercise. This could help in achieving exercise goals and reducing the risk of injury by allowing coaches to monitor lunge biomechanics and progression. In musculoskeletal and sports medicine, the lunge is a common screening test as it is weight bearing and functional (Cook, Burton, & Hoogenboom, 2006; Powden et al., 2015). An IMU system that can identify aberrant biomechanics would allow for quicker and more objective injury risk identification and stratification. In a rehabilitation setting, the lunge is often completed following injury and surgery (Hall et al., 2015). Monitoring technique would be important to prevent further injury and possibly allow exercises to be completed at home without the need for constant supervision, reducing overall health care costs. Furthermore, the IMUs would allow the transfer of exercise data to a cloud-based server. This means therapists can assess patient compliance and technique during the prescribed exercise without the need for constant monitoring.

5. Conclusion

The results of this study show that an IMU system is able to classify lunge technique as acceptable or aberrant with good accuracy using a five-, three-, two-, or single-IMU set-up. A five- and three-IMU set-up can detect specific deviations in a person’s lunge biomechanics


with moderate accuracy. This is diminished with a reduced IMU set, even with a broader grouping of deviations. It is also shown that a more streamlined feature selection technique has similar outcomes to using all features to train the classifier, allowing for the possibility of an enhanced user experience. Taken together, these results suggest that a wearable IMU system has the potential to monitor lunge biomechanics, which would have important implications in injury risk screening, strengthening, and rehabilitation.

Acknowledgements

The authors would also like to thank UCD Sport for providing equipment which was used in this study. The authors Martin O’Reilly and Darragh Whelan contributed equally to this study.

Disclosure statement

No potential conflict of interest was reported by the authors.

Funding

This project was partly funded by the Irish Research Council as part of a Postgraduate Enterprise Partnership Scheme with Shimmer [EPSPG/2013/574] and partly funded by Science Foundation Ireland [SFI/12/RC/2289].

ORCID

Martin A. O’Reilly http://orcid.org/0000-0003-2425-5393
Darragh F. Whelan http://orcid.org/0000-0003-4566-1243
Tomas E. Ward http://orcid.org/0000-0002-6173-6607
Eamonn Delahunt http://orcid.org/0000-0001-5449-5932
Brian Caulfield http://orcid.org/0000-0003-0290-9587

References Ahmadi, A., Mitchell, E., Destelle, F., Gowing, M., O’Connor, N. E., Richter, C., & Moran, K. (2014, June). Automatic activity classification and movement assessment during a sports training session using wearable inertial sensors. In Proceedings of the 2014 11th international conference on wearable and implantable body sensor networks (BSN) (pp. 98–103). Zurich: IEEE. Alkjær, T., Simonsen, E. B., Magnusson, S. P., Aagaard, H., & Dyhre-Poulsen, P. (2002). Differences in the movement pattern of a forward lunge in two types of anterior cruciate ligament deficient patients: Copers and non-copers. Clinical Biomechanics, 17, 586–593. Baechle, T. R., & Earle, R. W. (2004). Resistance training exercise techniques. In Thomas R. Baechle & Roger W. Earle (Eds.), NSCA’s essentials of personal training (p. 324). Champaign, IL: Human Kinetics. Bonnechere, B., Jansen, B., Salvia, P., Bouzahouene, H., Omelina, L., Moiseev, F., & Van Sint Jan, S. (2014). Validity and reliability of the Kinect within functional assessment activities: Comparison with standard stereophotogrammetry. Gait & Posture, 39, 593–598. Bonnet, V., Mazza, C., Fraisse, P., & Cappozzo, A. (2013). Real-time estimate of body kinematics during a planar squat task using a single inertial measurement unit. IEEE Transactions on Biomedical Engineering, 60, 1920–1926. Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32.


Chen, C., Jafari, R., & Kehtarnavaz, N. (2016). A Real-time human action recognition system using depth and inertial sensor fusion. IEEE Sensors Journal, 16, 773–781. Chmielewski, T. L., Hodges, M. J., Horodyski, M., Bishop, M. D., Conrad, B. P., & Tillman, S. M. (2007). Investigation of clinician agreement in evaluating movement quality during unilateral lower extremity functional tasks: A comparison of 2 rating methods. Journal of Orthopaedic & Sports Physical Therapy, 37, 122–129. Cook, G., Burton, L., & Hoogenboom, B. (2006). Pre-participation screening: The use of fundamental movements as an assessment of function—Part 1. North American Journal of Sports Physical Therapy, 1, 62–72. Cronin, J., McNair, P., & Marshall, R. (2003). Lunge performance and its determinants. Journal of Sports Sciences, 21, 49–57. Escamilla, R. F. (2001). Knee biomechanics of the dynamic squat exercise. Medicine & Science in Sports & Exercise, 33, 127–141. Farrokhi, S., Pollard, C. D., Souza, R. B., Chen, Y.-J., Reischl, S., & Powers, C. M. (2008). Trunk position influences the kinematics, kinetics, and muscle activity of the lead lower extremity during the forward lunge exercise. Journal of Orthopaedic & Sports Physical Therapy, 38, 403–409. Fitzgerald, D., Foody, J., Kelly, D., Ward, T., Markham, C., McDonald, J., & Caulfield, B. (2007, August). Development of a wearable motion capture suit and virtual reality biofeedback system for the instruction and analysis of sports rehabilitation exercises. In Proceedings of the 29th annual international conference of the IEEE on engineering in medicine and biology society (EMBC) (pp. 4870–4874). Lyon, France: IEEE. Giggins, O. M., Sweeney, K. T., & Caulfield, B. (2014). Rehabilitation exercise assessment using inertial sensors: A cross-sectional analytical study. Journal of NeuroEngineering and Rehabilitation, 11, 158–168. Gowing, M., Ahmadi, A., Destelle, F., Monaghan, D. S., O’Connor, N. E., & Moran, K. (2014, January). Kinect vs. 
low-cost inertial sensing for gesture recognition. In Proceedings of the 2014 international conference on multimedia modeling (pp. 484–495). Dublin: Springer International Publishing. Hall, M., Nielsen, J. H., Holsgaard-Larsen, A., Nielsen, D. B., Creaby, M. W., & Thorlund, J. B. (2015). Forward lunge knee biomechanics before and after partial meniscectomy. The Knee, 22, 506–509. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). Classification. In G. Casella, S. Fienberg, & I. Olkin (Eds.), An introduction to statistical learning (Vol. 6, pp. 127–174). New York, NY: Springer. Jerri, A. J. (1977). The Shannon sampling theorem—Its various extensions and applications: A tutorial review. Proceedings of the IEEE, 65, 1565–1596. Leardini, A., Lullini, G., Giannini, S., Berti, L., Ortolani, M., & Caravaggi, P. (2014). Validation of the angular measurements of a new inertial-measurement-unit based rehabilitation system: Comparison with state-of-the-art gait analysis. Journal of Neuroengineering and Rehabilitation, 11, 1–7. Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2, 18–22. Madgwick, S. O., Harrison, A. J., & Vaidyanathan, R. (2011, June). Estimation of IMU and MARG orientation using a gradient descent algorithm. In Proceedings of the 2011 IEEE International Conference on Rehabilitation Robotics (ICORR) (pp. 1–7). Zurich: IEEE. McGrath, D., Greene, B. R., O’Donovan, K. J., & Caulfield, B. (2012). Gyroscope-based assessment of temporal gait parameters during treadmill walking and running. Sports Engineering, 15, 207–213. Melzi, S., Borsani, L. P., & Cesana, M. (2009, June). The virtual trainer: Supervising movements through a wearable wireless sensor network. In Proceedings of the 6th Annual Ieee Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks Workshops, 2009 (SECON Workshops) (pp. 1–3). Rome, Italy: IEEE. Morris, D., Saponas, T. S., Guillory, A., & Kelner, I. (2014, April). 
RecoFit: Using a wearable sensor to find, recognize, and count repetitive exercises. In Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems (pp. 3225–3234). Toronto, Canada: ACM. O’Reilly, M., Whelan, D., Chanialidis, C., Friel, N., Delahunt, E., Ward, T., & Caulfield, B. (2015, June). Evaluating squat performance with a single inertial measurement unit. In Proceedings of the 2015 IEEE 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN) (pp. 1–6). Boston, MA: IEEE.


Pernek, I., Hummel, K. A., & Kokol, P. (2013). Exercise repetition detection for resistance training based on smartphones. Personal and Ubiquitous Computing, 17, 771–782. Pernek, I., Kurillo, G., Stiglic, G., & Bajcsy, R. (2015). Recognizing the intensity of strength training exercises with wearable sensors. Journal of Biomedical Informatics, 58, 145–155. Powden, C. J., Hoch, J. M., & Hoch, M. C. (2015). Reliability and minimal detectable change of the weight-bearing lunge test: A systematic review. Manual Therapy, 20, 524–532. Riemann, B. L., Lapinski, S., Smith, L., & Davies, G. (2012). Biomechanical analysis of the anterior lunge during 4 external-load conditions. Journal of Athletic Training, 47, 372–378. Tang, Z., Sekine, M., Tamura, T., Tanaka, N., Yoshida, M., & Chen, W. (2015). Measurement and estimation of 3D orientation using magnetic and inertial sensors. Advanced Biomedical Engineering, 4, 135–143. Taylor, P. E., Almeida, G. J., Hodgins, J. K., & Kanade, T.. (2012, August). Multi-label classification for the analysis of human motion quality. In Proceedings of the 2012 Annual International Conference of the IEEE on Engineering in Medicine and Biology Society (EMBC) (pp. 2214–2218). San Diego, CA: IEEE. Taylor, P. E., Almeida, G. J., Kanade, T., & Hodgins, J. K. (2010, August). Classifying human motion quality for knee osteoarthritis using accelerometers. In Proceedings of the 2010 Annual International Conference of the IEEE on Engineering in Medicine and Biology Society (EMBC) (pp. 339–343). Buenos Aires, Argentina: IEEE. Velloso, E., Bulling, A., Gellersen, H., Ugulino, W., & Fuks, H. (2013, March). Qualitative activity recognition of weight lifting exercises. In Proceedings of the 4th Augmented Human International Conference (pp. 116–123). Stuttgart, Germany: ACM. Whatman, C., Hing, W., & Hume, P. (2012). Physiotherapist agreement when visually rating movement quality during lower extremity functional screening tests. 
Physical Therapy in Sport, 13, 87–96. Whelan, D., O’Reilly, M., Ward, T., Delahunt, E., & Caulfield, B. (2015, October). Evaluating performance of the single leg squat exercise with a single inertial measurement unit. In Proceedings of the 3rd 2015 Workshop on ICTs for Improving Patients Rehabilitation Research Techniques (REHAB) (pp. 144–147). ACM. Whelan, D., O’Reilly, M., Ward, T., Delahunt, E., & Caulfield, B. (2016). Evaluating performance of the lunge exercise with multiple and individual inertial measurement units. In Proceedings of the 10th EAI International Conference on Pervasive Computing Technologies for Healthcare (pp. 101–108). Cancun, Mexico: ACM.


TECHNICAL REPORT

TECHNOLOGY IN STRENGTH AND CONDITIONING: ASSESSING BODYWEIGHT SQUAT TECHNIQUE WITH WEARABLE SENSORS

MARTIN A. O’REILLY,1,2 DARRAGH F. WHELAN,1,2 TOMAS E. WARD,3 EAMONN DELAHUNT,2 AND BRIAN M. CAULFIELD1,2

1 Insight Center for Data Analytics, University College Dublin, Dublin, Republic of Ireland; 2 School of Public Health, Physiotherapy and Sports Science, University College Dublin, Dublin, Republic of Ireland; and 3 Insight Center for Data Analytics, Maynooth University, Maynooth, Republic of Ireland

ABSTRACT

O’Reilly, MA, Whelan, DF, Ward, TE, Delahunt, E, and Caulfield, BM. Technology in strength and conditioning: assessing bodyweight squat technique with wearable sensors. J Strength Cond Res 31(8): 2303–2312, 2017. Strength and conditioning (S&C) coaches offer expert guidance to help those they work with achieve their personal fitness goals. However, it is not always practical to operate under the direct supervision of an S&C coach and consequently individuals are often left training without expert oversight. Recent developments in inertial measurement units (IMUs) and mobile computing platforms have allowed for the possibility of unobtrusive motion tracking systems and the provision of real-time individualized feedback regarding exercise performance. These systems could enable S&C coaches to remotely monitor sessions and help individuals record their workout performance. One aspect of such technologies is the ability to assess exercise technique and detect common deviations from acceptable exercise form. In this study, we investigate this ability in the context of a bodyweight (BW) squat exercise. Inertial measurement units were positioned on the lumbar spine, thighs, and shanks of 77 healthy participants. Participants completed repetitions of BW squats with acceptable form and 5 common deviations from acceptable BW squatting technique. Descriptive features were extracted from the IMU signals for each BW squat repetition, and these were used to train a technique classifier. Acceptable or aberrant BW squat technique can be detected with 98% accuracy, 96% sensitivity, and 99% specificity when using features derived from all 5 IMUs. A single IMU system can also distinguish between acceptable and aberrant BW squat biomechanics with excellent accuracy, sensitivity, and specificity. Detecting exact deviations from acceptable BW squatting technique can be achieved with 80% accuracy using a 5 IMU system and 72% accuracy when using a single IMU positioned on the right shank. These results suggest that IMU-based systems can distinguish between acceptable and aberrant BW squat technique with excellent accuracy with a single IMU system. Identification of exact deviations is also possible but multi-IMU systems outperform single IMU systems.

Address correspondence to Martin A. O’Reilly, [email protected]. 31(8)/2303–2312 Journal of Strength and Conditioning Research © 2017 National Strength and Conditioning Association

KEY WORDS inertial measurement units, compound exercise, exercise biofeedback, exercise classification

INTRODUCTION

In resistance training, the bodyweight (BW) squat is a compound full-body exercise whose constituent movements are integral to activities of daily living. It commonly features as a fundamental exercise in resistance training and rehabilitation programs. Furthermore, it is incorporated into musculoskeletal injury risk identification protocols (8). The National Strength and Conditioning Association (NSCA) has outlined a number of common deviations from acceptable squat technique (2). These aberrant biomechanics have been shown to increase stress on the joints of the lower extremity (11), potentiating the risk of injury. Thus, the reliable assessment of BW squat biomechanics is necessary to mitigate injury risk. The assessment of BW squat technique is typically undertaken using 1 of 2 distinct methods: (a) 3-D motion capture; (b) subjective visual analysis. Both have a number of limitations. 3-D motion capture systems are expensive, and the application of skin-mounted markers may hinder normal movement (1,3). Furthermore, data processing can be time intensive and specific expertise is often required to interpret the processed data and to make recommendations on the observed results. Therefore, these systems are not frequently used to assess BW squat technique beyond the research laboratory (5). In clinical- and gym-based settings, subjective visual assessment is typically used to assess BW squat technique. This subjective visual


Copyright © National Strength and Conditioning Association Unauthorized reproduction of this article is prohibited.

assessment of human biomechanics is not always reliable, even among experts, because visually assessing numerous constituent components simultaneously is challenging (29). Wearable inertial measurement units (IMUs) may offer the potential to bridge the gap between laboratory and day-to-day “real-world” acquisition and assessment of human movement. These IMUs are small, inexpensive sensors that consist of accelerometers, gyroscopes, and magnetometers. They are able to acquire data pertaining to the inertial motion and 3-D orientation of individual limb segments (15,21). Self-contained, wireless IMU devices are easy to set up and allow for the acquisition of human movement data in unconstrained environments (16). In this article, the term IMU system will be used to describe the IMU sensors, the sensor signals, the associated signal processing applied to them, and the output of the exercise classification algorithms. Inertial measurement unit systems can robustly track the variety of postures and environmental complexities associated with training, unlike camera-based systems, which are hampered by location, occlusion, and lighting issues in such settings (19). Inertial measurement units have also been shown to be as effective as marker-based systems at measuring joint angles (5,14,25). There are many commercially available examples of IMU systems that monitor physical activity (e.g., Jawbone (San Francisco, CA, USA) and Fitbit (San Francisco, CA, USA)). However, using IMU systems to assess gym-based exercises is less common. Researchers have demonstrated the ability of IMU-based systems to distinguish different gym-based exercises and count repetitions of these exercises with moderate to good levels of accuracy (7,20,22–24). Although these aforementioned systems detail information on the number of repetitions performed, they do not provide instruction on exercise technique and quality of performance.
A holistic exercise tracking system should not only recognize the exercise completed, but should also provide technique feedback. Furthermore, in order for IMU systems to be used as an objective method to acquire and assess human movement data as part of a musculoskeletal injury risk-screening protocol, they need to be able to identify aberrant movement patterns and provide easily interpretable data to clinicians and coaches who use them. A growing body of scientific literature has investigated the utility of IMU systems to assess exercise technique. Taylor et al. (26) used a 5 IMU system to categorize 5 technique deviations during performance of the standing hamstring curl exercise and 4 technique deviations during performance of the straight-leg raise exercise. They were able to identify these deviations with 80% accuracy, 75% sensitivity, and 90% specificity. Pernek et al. (23) also used a 5 IMU system to monitor exercise intensity in 6 dumbbell upper limb exercises (biceps curl, single arm triceps extension, front vertical lift, lateral vertical lift, bent-over row, and military press), demonstrating an average error of just 6% for intensity prediction. Melzi et al. (17) used a wireless body area network of accelerometers to assess performance of the biceps curl.


However, their proposed approach was very exercise specific and difficult to transfer to other regularly performed gym-based exercises (23). Research undertaken by Velloso et al. (28) examined the ability of a 4 IMU system to identify 4 deviations from acceptable technique during a unilateral dumbbell bicep curl. Their reported overall accuracy ranged from 74 to 86%. Although these results are encouraging, the multiple IMU systems detailed above are expensive and their use may prove impractical due to the increased risk of placement error and comfort issues (5). In addition, the use of multiple IMUs leads to increased overall power requirements and connectivity complexity with respect to the hosting device (e.g., smartphone, tablet). A reduced IMU set-up is more desirable for daily environment applications (4). Bonnet et al. (5) reported that a single IMU mounted on the lower back could measure ankle, knee, and hip joint angles in the sagittal plane during performance of the BW squat exercise, with a maximal error of 3.5° compared with a motion capture system. This indicates that relatively accurate quantification of sagittal plane lower limb joint movement can be achieved using a single sensor unit system. Giggins et al. (10) demonstrated an overall accuracy of 79–83% using a single IMU on either the foot, shin, or thigh to identify deviations in the performance of 7 exercises (heel slide, hip abduction, hip flexion, hip extension, knee extension, inner range quads, and straight-leg raise). Pernek et al. (22) analyzed the ability of an accelerometer contained within a smartphone to assess exercise performance during 9 free-weight and machine resistance exercises. They assessed movement quality based on the speed of exercise performance and reported a temporal error of 11% for individual repetition duration.
In summary, the BW squat is a compound full-body exercise that is typically a constituent component of resistance training, rehabilitation programs, and musculoskeletal injury risk-screening protocols. Incorrect BW squat technique can heighten the risk of injury. Traditionally, exercise technique has been evaluated using expensive motion capture systems or via subjective visual inspection from trained professionals. Inertial measurement unit systems offer an opportunity to provide low-cost exercise technique assessment. However, to date, no research has evaluated the capability of IMU systems to assess BW squat technique. The overall aim of this research was to evaluate the ability of a number of IMU system configurations to distinguish between acceptable and aberrant BW squat biomechanics. Aberrant BW squatting biomechanics were defined as BW squats that contained one of the common lower limb deviations identified by the NSCA (2). These deviations are outlined in Table 1.

METHODS

Experimental Approach to the Problem

This study used an opportunistic approach to the development of a wearable sensor system for automatically assessing BW squat technique. Two distinct types of technique assessment were evaluated. “Binary classification” describes systems which classify one’s performance as “acceptable” or “aberrant” without detecting the specific deviation occurring. “Multi-label classification” describes systems that can detect the specific deviation occurring (Table 1) and discern between these different aberrant movement patterns. Participants were required to perform 10 BW squats (no external resistance) with acceptable technique, followed by 3 BW squats with each of the predefined deviations from acceptable technique shown in Table 1. During performance of the BW squats, data were acquired from 5 IMUs (SHIMMER; Shimmer Research, Dublin, Ireland) placed on the lumbar spine, right and left thigh, and right and left shank. The IMUs were positioned on each participant by the same researcher using a standardized and repeatable protocol. Participants were allowed a rest interval (around 1 minute) between performances of each set of BW squat repetitions. After data collection, a total of 306 variables were extracted from the sensor signals for every BW squat repetition from each IMU. These variables were used to develop and evaluate the quality of an automated classification system for the analysis of BW squat technique. This was undertaken using data derived from each individual IMU and combinations of multiple IMUs.

TABLE 1. List and description of squat exercise deviations used in this study and the number of repetitions (n) extracted of each class.

Deviation   Description                                                Total reps (n)
ACC         Acceptable squat technique                                 722
KVL         Knees coming together during downward phase                221
KVR         Knees coming apart during downward phase                   222
KTF         Knees ahead of toes during downward phase                  225
HE          Heels raising off the ground during squat exercise         228
BO          Excessive flexion of hip and torso during squat exercise   231

Subjects

Seventy-seven healthy volunteers aged 16–40 (55 males, 22 females, age = 22.63 ± 4.87 years, height = 1.75 ± 0.09 m, body mass = 75.03 ± 13.16 kg) participated in the study. No participant reported having a current or recent musculoskeletal injury that would impair his or her performance of the BW squat exercise. All participants reported a level of familiarity with the BW squat exercise. The University College Dublin Human Research Ethics Committee approved the study protocol, and written informed consent was obtained from all participants before testing. In cases where participants were younger than 18 years, written informed consent was also obtained from a parent or guardian.

Procedures

The testing protocol was explained to participants on their arrival at the laboratory. Before formal testing, all participants performed a 10-minute warm-up on a Lode B.V. exercise bike (Groningen, the Netherlands) maintaining a power output of 100 W and constant cadence of 75–85 revolutions per minute. After completion of the warm-up, a Chartered Physiotherapist secured the IMUs to predetermined specific anatomic locations on the participant as follows: over one’s clothing at the spinous process of the fifth lumbar vertebra, and directly strapped to the midpoint of both the right and left femurs (determined as half way between the greater trochanter and lateral femoral condyle), and on both shanks 2 cm above the lateral malleolus (Figure 1). The orientation and location of the IMUs were consistent across participants. The neoprene straps used were specifically designed for application in exercise environments. The thick elastic design of the straps minimized the unwanted deviation of IMU position because of loose clothing and movement artifact. A pilot study was undertaken to determine the most appropriate sampling rate and the ranges for the accelerometer and gyroscope on board the IMUs. For the pilot study, data were acquired (512 samples per second) during performance of the BW squat, lunge, deadlift, single-leg BW squat, and tuck jump exercises. A Fourier transform was then used to estimate the spectral extent of the signals, which was found to be less than 20 Hz. Therefore, a sampling rate of 51.2 samples per second was chosen based on the Shannon sampling theorem and the Nyquist criterion (12). Each IMU was configured to stream triaxial accelerometer (±16 g), gyroscope (±500°·s⁻¹), and magnetometer (±1 Ga) data with the sensor ranges chosen based on data from the pilot study. Each IMU was calibrated for these specific sensor ranges using the Shimmer 9DoF Calibration application (http://www.shimmersensing.com/shop/shimmer-9dofcalibration).
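The sampling-rate reasoning above can be illustrated with a minimal sketch. It assumes Python with NumPy (the study itself used MATLAB), and both the helper name `spectral_extent` and the 99% energy criterion are hypothetical choices made for illustration; the article does not state how the spectral extent was defined. The synthetic signal stands in for pilot data and is not taken from the study.

```python
import numpy as np

def spectral_extent(signal, fs, energy_fraction=0.99):
    """Frequency (Hz) below which `energy_fraction` of the spectral energy lies.

    The 99% energy criterion is an illustrative assumption.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                            # remove the DC offset
    power = np.abs(np.fft.rfft(x)) ** 2         # one-sided power spectrum
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    cumulative = np.cumsum(power) / power.sum()
    return freqs[np.searchsorted(cumulative, energy_fraction)]

# Synthetic stand-in for a pilot recording at 512 samples per second
fs_pilot = 512.0
t = np.arange(0, 10, 1.0 / fs_pilot)
pilot = np.sin(2 * np.pi * 0.5 * t) + 0.3 * np.sin(2 * np.pi * 6.0 * t)

extent_hz = spectral_extent(pilot, fs_pilot)
fs_chosen = 51.2                  # sampling rate reported in the article
nyquist_hz = fs_chosen / 2.0      # highest frequency representable without aliasing
print(extent_hz, nyquist_hz, extent_hz < nyquist_hz)
```

With a spectral extent below 20 Hz, as reported for the pilot data, a rate of 51.2 samples per second places the Nyquist frequency at 25.6 Hz, which leaves a small margin above the signal content.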
Figure 1. Image showing the 5 IMU positions: (1) The spinous process of the fifth lumbar vertebra, (2 and 3) the midpoint of both femurs on the lateral surface (determined as half way between the greater trochanter and lateral femoral condyle), (4 and 5) and on both shanks 2 cm above the lateral malleolus. IMU = inertial measurement unit.

Initially, participants completed 10 repetitions of the BW squat exercise with acceptable technique. The criteria for acceptable technique were based on the recommendations detailed in the NSCA guidelines (2). This involved participants holding their chest up and out with the head tilted slightly up. As participants moved down into the BW squat position, they were instructed to allow their hips and knees to flex while keeping their torso-to-floor angle constant. Furthermore, they were required to keep their heels on the floor and knees aligned over their feet. Participants were required to continue flexing at the hips and knees until their thighs were parallel to the floor. As they moved upward, a flat back was to be maintained and they were instructed to keep their chest up and out. Hips and knees were to be extended at the same rate with heels on floor and knees aligned over feet. Participants then extended their hips and knees to reach the starting position. Once the BW squat had been completed with acceptable technique, the participant was instructed to complete the exercise with the technique deviations specified in Table 1. They completed 3 repetitions of each predefined technique deviation. Verbal instructions and a demonstration were provided to all participants, and they were allowed a practice trial to ensure that they were comfortable completing the BW squat with the predefined technique deviations. No external resistance was added during performance of any of the BW squat repetitions. A Chartered Physiotherapist was present during all data collection sessions and ensured that the BW squat and all predefined technique deviations had been completed as instructed. These technique deviations were chosen based on common lower limb deviations outlined by the NSCA (2).

Statistical Analyses


Nine signals were collected from each IMU: accelerometer x, y, and z, gyroscope x, y, and z, and magnetometer x, y, and z. Data were analyzed using MATLAB (2012, The MathWorks, Natick, MA, USA). To ensure that the data analyzed applied to each participant’s movement and to eliminate unwanted high-frequency noise, the 9 signals were low-pass filtered at fc = 20 Hz using a Butterworth filter of order n = 8. Nine additional signals were then calculated. The 3-D orientation of the IMU was computed using the gradient descent algorithm developed by Madgwick et al. (15). The resulting W, X, Y, and Z quaternion values were also converted to pitch, roll, and yaw signals. The pitch, roll, and yaw signals describe the inclination, measured in radians, of each IMU in the sagittal, frontal, and transverse plane, respectively. The magnitude of acceleration was also computed using the vector magnitude of accelerometer x, y, and z. The magnitude of acceleration describes the total acceleration of the IMU in any direction. This is the sum of the magnitude of inertial acceleration of the lumbar spine and acceleration due to gravity. In addition, the magnitude of rotational velocity was computed using the vector magnitude of gyroscope x, y, and z. Each exercise repetition was extracted from the IMU data and resampled to a length of 250 samples; this was undertaken to minimize the influence of the speed of repetition performance on signal feature calculations. It also ensured the computed features related to differences in movement patterns and not the participant’s exercise tempo. Repetitions completed by the participant where the IMU’s Bluetooth signal dropped were excluded from analysis. The total number of repetitions extracted for each class and used for classification is shown in Table 1. Time-domain and frequency-domain descriptive features were computed to describe the pattern of each of the 18 signals when the 5 different exercises were completed.
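Three of the derived-signal steps described above (vector magnitude, quaternion-to-angle conversion, and resampling each repetition to 250 samples) can be sketched as follows. This is an illustrative Python/NumPy sketch, not the authors' MATLAB code. The function names are hypothetical, the ZYX Euler convention and the use of linear interpolation are assumptions (the article specifies neither), and the Butterworth filtering and Madgwick orientation steps are omitted.

```python
import numpy as np

def vector_magnitude(xyz):
    """Per-sample Euclidean norm of an (n_samples, 3) array."""
    return np.linalg.norm(np.asarray(xyz, dtype=float), axis=1)

def quaternion_to_euler(q):
    """Convert (n_samples, 4) unit quaternions [w, x, y, z] to roll, pitch, yaw.

    Angles are in radians; the ZYX (aerospace) convention is an assumption.
    """
    w, x, y, z = np.asarray(q, dtype=float).T
    roll = np.arctan2(2 * (w * x + y * z), 1 - 2 * (x**2 + y**2))
    pitch = np.arcsin(np.clip(2 * (w * y - z * x), -1.0, 1.0))
    yaw = np.arctan2(2 * (w * z + x * y), 1 - 2 * (y**2 + z**2))
    return roll, pitch, yaw

def resample_repetition(signal, n_out=250):
    """Linearly interpolate one repetition to a fixed length (assumed method)."""
    signal = np.asarray(signal, dtype=float)
    old = np.linspace(0.0, 1.0, signal.size)
    new = np.linspace(0.0, 1.0, n_out)
    return np.interp(new, old, signal)

# Synthetic examples: a stationary IMU reading 1 g on its z axis,
# a 90-degree rotation about z, and a 130-sample repetition.
acc = np.tile([0.0, 0.0, 1.0], (130, 1))
mag = vector_magnitude(acc)
roll, pitch, yaw = quaternion_to_euler(
    np.array([[np.sqrt(0.5), 0.0, 0.0, np.sqrt(0.5)]])
)
rep = resample_repetition(np.linspace(0.0, 1.0, 130) ** 2)
print(mag[0], yaw[0], rep.shape)
```

Resampling every repetition to the same length means a slow and a fast squat with the same movement pattern yield near-identical signals, which is the tempo-independence the paragraph describes.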
These features were as follows: “Mean,” “RMS,” “Standard Deviation,” “Kurtosis,” “Median,” “Skewness,” “Range,” “Variance,” “Max,” “Min,” “Energy,” “25th Percentile,” “75th Percentile,” “Level Crossing Rate,” “Fractal Dimension” (13), and the “variance of both the approximate and detailed wavelet coefficients using the Daubechies 5 mother wavelet to level 7” (http://uk.mathworks.com/help/wavelet/ref/dwt.html). This resulted in 17 features for each of the 18 available signals producing a total of 306 features per IMU. Figure 2 summarizes the above, whereby 5 IMUs recorded 9 signals each; 9 more signals were derived from these, resulting in a total of 18 signals per IMU. Seventeen features were computed per BW squat repetition for each signal from each IMU, resulting in a total of 1,530 features (306 per IMU, 17 per signal). These features were then used to develop and evaluate a variety of classifiers.
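As a hedged illustration of the feature extraction, the sketch below computes 13 of the 17 listed features for one signal and reproduces the feature-count arithmetic. It is written in Python/NumPy rather than the MATLAB used in the study. The function name `describe_signal` is hypothetical, and the exact definitions (population standard deviation, non-excess kurtosis) are assumptions. Level crossing rate, fractal dimension, and the two wavelet-coefficient variances are omitted because their parameters are not fully specified in the text.

```python
import numpy as np

def describe_signal(x):
    """Compute a subset (13 of 17) of the per-signal descriptive features.

    Assumes a non-constant signal (std > 0); definitions are illustrative.
    """
    x = np.asarray(x, dtype=float)
    mean, std = x.mean(), x.std()
    centered = x - mean
    p25, median, p75 = np.percentile(x, [25, 50, 75])
    return {
        "mean": mean,
        "rms": np.sqrt(np.mean(x**2)),
        "std": std,
        "kurtosis": np.mean(centered**4) / std**4,   # non-excess kurtosis
        "median": median,
        "skewness": np.mean(centered**3) / std**3,
        "range": x.max() - x.min(),
        "variance": x.var(),
        "max": x.max(),
        "min": x.min(),
        "energy": np.sum(x**2),
        "p25": p25,
        "p75": p75,
    }

# Feature counts as reported in the article
FEATURES_PER_SIGNAL = 17
SIGNALS_PER_IMU = 18          # 9 recorded + 9 derived
N_IMUS = 5
features_per_imu = FEATURES_PER_SIGNAL * SIGNALS_PER_IMU   # 306
features_total = features_per_imu * N_IMUS                  # 1,530

feats = describe_signal([1.0, 2.0, 3.0, 4.0, 5.0])
print(feats["mean"], feats["range"], features_per_imu, features_total)
```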


The random-forests method was used to perform classification (6). This technique was chosen as it has been shown to be effective in analyzing exercise technique with IMUs when compared with the Naive-Bayes and Radial-basis function network techniques (18). Four hundred decision trees were used in each random-forest classifier. Classifiers were developed and evaluated for the 10 combinations of IMUs as shown in Table 2.

TABLE 2. Inertial measurement unit combinations compared and number of features used for classification.*

Multiple IMUs        N features        Individual IMUs   N features
All 5 IMUs           1,530 (5 × 306)   Left shank        306
Lumbar and shanks    918 (3 × 306)     Left thigh        306
Lumbar and thighs    918 (3 × 306)     Lumbar            306
Both shanks          612 (2 × 306)     Right thigh       306
Both thighs          612 (2 × 306)     Right shank       306

*IMUs = inertial measurement units.

Initially, binary classification was evaluated to establish how effectively each individual IMU and each combination of IMUs could distinguish between acceptable and aberrant BW squat technique. All repetitions of acceptable performance of the BW squat were labeled “0,” and all repetitions of the BW squat performed with one of the predefined deviations as outlined in Table 1 were labeled “1.” Multilabel classification was then evaluated on the IMU data to investigate how effectively each individual IMU and each IMU combination could be used to discriminate between acceptable performance of the BW squat exercise and each of the 5 predefined deviations from acceptable technique as described in Table 1. All repetitions of acceptable performance of the BW squat remained labeled as “0,” and each of the different deviations was labeled as “1–5.”

Figure 2. Diagram linking number of IMUs, number of recorded and derived signals, number of features extracted, and the variety of feature combinations used to test classifiers. IMUs = inertial measurement units; LOSOCV = leave-one-subject-out cross-validation.

TABLE 3. Overall accuracy, sensitivity, and specificity in binary classification (acceptable or aberrant technique) for each combination of IMUs following LOSOCV.*

Sensor(s)            Accuracy (%)   Sensitivity (%)   Specificity (%)
All 5 sensors        97.68          96.24             98.55
Lumbar and shanks    96.30          94.26             97.55
Lumbar and thighs    97.26          95.18             98.56
Both shanks          96.24          93.80             97.80
Both thighs          97.41          96.30             98.10
Left shank           95.71          93.19             97.38
Left thigh           97.70          96.02             98.81
Lumbar               95.17          92.73             96.73
Right thigh          95.76          93.63             97.09
Right shank          95.09          93.69             95.95

*IMUs = inertial measurement units; LOSOCV = leave-one-subject-out cross-validation.

The quality of the exercise classification system was established using leave-one-subject-out cross-validation (LOSOCV) and the random-forests classifier with 400 trees (9). Each participant’s data correspond to one fold of the cross validation. At each fold, one participant’s data are held out as test data, whereas the random-forests classifier is trained with all other participants’ data. Where each class in the training data did not have an equal number of instances (i.e., equal number of acceptable and aberrant repetitions in binary classification), random instances of the overrepresented class(es) were removed to balance the training data. The held out data are used to assess the classifier’s ability to correctly categorize new data it is presented with. The use of LOSOCV ensures that there is no biasing of the classifiers because the test subjects’ data are completely unseen by the classifier before testing. Previous research by Taylor et al. (27) has shown that the use of the test data in training produces results which are not reflective of performance with data not previously seen. The scores used to measure the quality of classification were total accuracy, average sensitivity, and average specificity. Accuracy is the number of correctly classified repetitions of all the exercises divided by the total number of repetitions completed; this is calculated as the sum of the true positives (TPs) and true negatives (TNs) divided by the sum of the TPs, false positives (FPs), TNs, and false negatives (FNs):

\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (1) \]
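The cross-validation procedure described above (one fold per participant, with random undersampling of the training data) can be sketched as below. This is a minimal Python/NumPy illustration under stated assumptions. All function and class names are hypothetical, the data are synthetic, and a simple nearest-centroid classifier is swapped in for the random forest used in the study so that the example has no dependencies beyond NumPy.

```python
import numpy as np

def balance_by_undersampling(X, y, rng):
    """Randomly drop instances of over-represented classes (training data only)."""
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False)
        for c in classes
    ])
    return X[keep], y[keep]

class NearestCentroid:
    """Stand-in classifier; NOT the 400-tree random forest used in the study."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[dists.argmin(axis=1)]

def losocv_predict(X, y, subjects, make_model, seed=0):
    """Leave-one-subject-out CV: each subject's repetitions form one test fold."""
    rng = np.random.default_rng(seed)
    y_pred = np.empty_like(y)
    for s in np.unique(subjects):
        test = subjects == s
        X_train, y_train = balance_by_undersampling(X[~test], y[~test], rng)
        y_pred[test] = make_model().fit(X_train, y_train).predict(X[test])
    return y_pred

# Synthetic data: 6 subjects, 10 acceptable (0) and 15 aberrant (1) reps each
rng = np.random.default_rng(42)
X_parts, y_parts, s_parts = [], [], []
for s in range(6):
    X_parts.append(rng.normal(0.0, 1.0, size=(10, 4)))
    X_parts.append(rng.normal(5.0, 1.0, size=(15, 4)))
    y_parts.append(np.r_[np.zeros(10, int), np.ones(15, int)])
    s_parts.append(np.full(25, s))
X, y, subjects = np.vstack(X_parts), np.concatenate(y_parts), np.concatenate(s_parts)

y_pred = losocv_predict(X, y, subjects, NearestCentroid)
accuracy = np.mean(y_pred == y)
print(accuracy)
```

Any object with `fit` and `predict` methods can be passed as `make_model`; for example, scikit-learn's `RandomForestClassifier(n_estimators=400)` would be closer to the classifier described in the article. Balancing is applied only to the training folds, so the held-out participant's data are evaluated as recorded.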

TABLE 4. Overall accuracy, average sensitivity, and average specificity in multilabel classification (exact deviation) for each combination of IMUs following LOSOCV.*

Sensor(s)            Accuracy (%)   Sensitivity (%) ± SD   Specificity (%) ± SD
All 5 sensors        79.91          75.19 ± 12.73          96.04 ± 1.39
Lumbar and shanks    77.18          71.79 ± 14.94          95.52 ± 1.09
Lumbar and thighs    71.25          63.27 ± 20.38          94.38 ± 1.80
Both shanks          74.48          68.52 ± 16.88          94.94 ± 0.90
Both thighs          66.97          56.39 ± 27.19          93.57 ± 1.87
Left shank           72.12          65.03 ± 17.83          94.51 ± 1.47
Left thigh           66.65          55.60 ± 26.22          93.55 ± 2.18
Lumbar               59.75          47.86 ± 29.62          92.15 ± 2.03
Right thigh          63.35          52.98 ± 27.41          92.85 ± 2.34
Right shank          73.10          66.94 ± 16.66          94.71 ± 1.33

*IMUs = inertial measurement units; LOSOCV = leave-one-subject-out cross-validation.

In binary classification, acceptable BW squat technique was considered the “positive” class, and aberrant BW squat technique was considered the “negative” class. As such, single sensitivity and specificity values were computed to establish binary classification quality for each IMU combination. In multilabel classification, the sensitivity and specificity were calculated for each of the 6 class labels as outlined in Table 1. Each label was sequentially treated as the “positive” class, and then the mean and SD across the 6 values were taken. Sensitivity and specificity were computed using the formulas given below:

\[ \text{Sensitivity} = \frac{TP}{TP + FN} \qquad (2) \]

\[ \text{Specificity} = \frac{TN}{TN + FP} \qquad (3) \]

Sensitivity measures the effectiveness of a classifier at identifying a desired label, whereas specificity measures the classifier’s ability to recognize when that label has not occurred. In reviewing the accuracy, sensitivity, and specificity scores produced by each classifier, 90% or higher was considered an “excellent” quality result; 80–89% was considered a “good” quality result; 60–79% was considered a “moderate” result; and anything below 60% was deemed a poor result. The authors chose these values after reviewing the aforementioned literature on identifying deviations from acceptable exercise performance using data derived from IMUs. In reviewing such literature, an existing accepted standard for an excellent, good, moderate, or poor classifier could not be found. Therefore, the above system was agreed on by the authors to facilitate interpretation of results.
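The scoring scheme defined by equations 1 to 3 and the quality bands above can be sketched as follows. This is an illustrative Python/NumPy sketch with hypothetical function names and made-up labels, not study data. Two details are assumptions: the SD is computed as a sample SD (ddof = 1), and scores below 60% are treated as poor.

```python
import numpy as np

def binary_scores(y_true, y_pred, positive):
    """Accuracy, sensitivity, and specificity with `positive` as the positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))
    tn = np.sum((y_true != positive) & (y_pred != positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),   # equation 1
        "sensitivity": tp / (tp + fn),                 # equation 2
        "specificity": tn / (tn + fp),                 # equation 3
    }

def multilabel_scores(y_true, y_pred, labels):
    """Overall accuracy plus one-vs-rest mean and SD of sensitivity and specificity."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    per_label = [binary_scores(y_true, y_pred, c) for c in labels]
    sens = np.array([s["sensitivity"] for s in per_label])
    spec = np.array([s["specificity"] for s in per_label])
    return {
        "accuracy": np.mean(y_true == y_pred),
        "sensitivity_mean": sens.mean(), "sensitivity_sd": sens.std(ddof=1),
        "specificity_mean": spec.mean(), "specificity_sd": spec.std(ddof=1),
    }

def quality_band(score_pct):
    """Map a percentage score to the article's descriptive bands."""
    if score_pct >= 90:
        return "excellent"
    if score_pct >= 80:
        return "good"
    if score_pct >= 60:
        return "moderate"
    return "poor"

# Made-up labels for three classes (not study data)
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 0, 2]
scores = multilabel_scores(y_true, y_pred, labels=[0, 1, 2])
print(scores["accuracy"], scores["sensitivity_mean"], quality_band(97.68))
```

Treating each label in turn as the positive class gives one sensitivity and one specificity value per label, which is why Table 4 reports a mean and SD while Table 3 reports single values.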

RESULTS

Table 3 highlights the overall binary classification accuracy, sensitivity, and specificity achieved by each IMU combination and each individual IMU. Acceptable or aberrant BW squat technique can be detected with 98% accuracy, 96% sensitivity, and 99% specificity when using features derived from all 5 IMUs for classification. It is evident that systems developed using data derived from fewer IMUs can produce equivalent binary classification quality. For instance, a single IMU positioned on the left thigh also achieves 98% accuracy, 96% sensitivity, and 99% specificity.

Table 4 demonstrates the multilabel classification results determined with LOSOCV. The overall accuracy, average sensitivity, and average specificity are displayed for multiple and single IMU systems positioned on various body segments. A 5 IMU system achieves 80% overall accuracy and an average of 75% sensitivity and 96% specificity. Figure 3 shows that for this IMU set-up, acceptable BW squat technique repetitions are detected with a 95% TP rate. The predefined deviation KTF is the least detected deviation with only a 57% TP rate. A system which uses 3 IMUs positioned on both shanks and the lumbar spine achieved 77% overall accuracy. This set-up averaged 72% and 96% sensitivity and specificity, respectively. Figure 4 shows that acceptable BW squat technique is again detected with the highest TP rate (94%). The predefined deviations KVL and KTF were the most poorly classified. This 3 IMU set-up misclassifies 15% of KTF repetitions as HE and 12% of KVL repetitions as BO. A classifier developed with data from a single IMU positioned on the right shank produced 73, 67, and 95% overall accuracy, average sensitivity, and average specificity, respectively. Perhaps counterintuitively, this IMU positioned on the shank correctly identifies more BO repetitions than KTF repetitions. In this set-up, almost 18% of KTF repetitions are misclassified as HE. KVR and HE repetitions are correctly identified with a 74 and 75% TP rate, respectively.

Figure 3. Heat map confusion matrix for multilabel (exact deviation) classification results using data derived from all 5 IMUs. IMUs = inertial measurement units.

Figure 4. Heat map confusion matrix for multilabel (exact deviation) classification results using data derived from 3 IMUs placed on both shanks and the lumbar spine. IMUs = inertial measurement units.

Figure 5. Heat map confusion matrix for multilabel (exact deviation) classification results using data derived from a single IMU placed on the right shank. IMU = inertial measurement unit.

DISCUSSION

This article describes a study to assess the ability of an IMU-based system to distinguish between acceptable and aberrant BW squat technique and compares results using different IMU set-ups. The results presented in this article indicate that an IMU-based system is capable of distinguishing between acceptable and aberrant BW squat technique (binary classification) with an excellent level of overall accuracy with 5, 3, 2, or single IMU set-ups (Table 3). Overall accuracy, sensitivity, and specificity scores are reduced when attempting to identify specific deviations (multilabel classification) as seen in Table 4. A system using all 5 IMUs is capable of identifying specific deviations with a good level of accuracy (80%). Overall accuracy with a reduced IMU set-up (fewer than 5 sensors) is moderate, with the right shank proving the best location for a single IMU. An IMU system at this location is able to distinguish between 5 deviations with just 7% less overall accuracy compared with a 5 IMU set-up. Multilabel scores are likely to be reduced compared with binary level scores because of the number of classes in the multilabel set. The multilabel classifier needs to be able to distinguish between 6 classes (acceptable, KVL, KVR, KTF, HE, and BO) compared with just 2 in binary classification (acceptable and aberrant). Table 4 demonstrates that multilabel sensitivity scores are far lower than specificity scores, impacting on the overall accuracy. Accuracy is related to both sensitivity and specificity (equation 1). Sensitivity measures the effectiveness of a classifier at identifying a desired label (equation 2), whereas specificity measures the classifier’s ability to identify if a desired label has not occurred (equation 3).
In a real-world application, the ability to identify a desired label is likely to be more important than the ability to detect other labels (i.e., the IMU system would function better if it were able to identify if knee valgus was present rather than rule out knee valgus). Therefore, higher sensitivity scores would likely improve user experience. The confusion matrices for 5 (Figure 3), 3 (Figure 4), and single (Figure 5) IMU systems show that all are able to detect acceptable BW squatting technique with an excellent TP rate. In each IMU set-up, KTF is the label with the lowest TP rate and it is most commonly confused with


HE. The HE deviation is quite similar in nature to KTF and feedback given to prevent this may be comparable. Therefore, an IMU system that looks to provide BW squat technique feedback may be able to group these deviation classes and provide similar feedback once this deviation grouping is detected. This would likely improve the multilabel scores shown in Table 4, as it would reduce the total number of classes to 5 (acceptable, KVL, KVR, KTF+HE, and BO). The presented results compare favorably with other research in the area. We were able to demonstrate the same overall accuracy (80%), sensitivity (75%), and a 16% improvement in specificity (96%) using a 5 IMU set-up compared with Taylor et al. (26) who analyzed 2 exercises (standing hamstring curl and straight-leg raise). Pernek et al. (23) also used a 5 IMU set-up to monitor exercise intensity in 6 dumbbell upper limb exercises. They did not identify specific deviations but rather exercise intensity as scored by Borg’s rating of perceived exertion. As such, their system would not be able to feed back the exact deviations to the user. Melzi et al. (17) presented a demonstration article on detecting errors in the biceps curl but did not report overall accuracy, sensitivity, and specificity scores, making it impossible to compare the results. Velloso et al. (28) demonstrated an overall accuracy of 78% detecting 4 deviations from acceptable form in the unilateral dumbbell curl. The overall accuracy presented here is higher with a 5 IMU system and just 1% lower with the best 3 IMU system. It is worth noting that one of the IMUs used by Velloso et al. must be placed on the dumbbell being used. This may make it harder to implement in a real-world setting, as the IMU would need to be swapped between dumbbells during a training session. As discussed in the introduction, a single IMU system is more desirable than a multiple IMU set-up because of ease of use and power considerations.
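The class grouping suggested at the start of this paragraph (treating KTF and HE as one class) can be illustrated by merging rows and columns of a confusion matrix. The sketch below uses Python/NumPy with a hypothetical helper `merge_classes`, and the counts are invented for illustration; they are not the values behind Figures 3 to 5. It also merges the predictions of an existing 6-class classifier, whereas retraining a classifier on 5 classes could give different results.

```python
import numpy as np

def merge_classes(cm, labels, group, new_label):
    """Merge the classes in `group` into one class (rows = true, columns = predicted)."""
    new_labels = []
    for name in labels:
        merged_name = new_label if name in group else name
        if merged_name not in new_labels:
            new_labels.append(merged_name)
    # Mapping matrix: old class i is assigned to new class j
    mapping = np.zeros((len(labels), len(new_labels)), dtype=int)
    for i, name in enumerate(labels):
        merged_name = new_label if name in group else name
        mapping[i, new_labels.index(merged_name)] = 1
    return mapping.T @ cm @ mapping, new_labels

def overall_accuracy(cm):
    """Fraction of repetitions on the diagonal of the confusion matrix."""
    return np.trace(cm) / cm.sum()

labels = ["ACC", "KVL", "KVR", "KTF", "HE", "BO"]
# Invented counts for illustration only (not study data)
cm = np.array([
    [95,  1,  1,  1,  1,  1],
    [ 2, 20,  1,  1,  1,  5],
    [ 2,  1, 22,  2,  2,  1],
    [ 2,  1,  1, 17,  6,  3],
    [ 2,  1,  1,  4, 21,  1],
    [ 2,  2,  1,  2,  1, 22],
])

merged, merged_labels = merge_classes(cm, labels, group={"KTF", "HE"}, new_label="KTF+HE")
acc_before = overall_accuracy(cm)
acc_after = overall_accuracy(merged)
print(merged_labels, acc_before, acc_after)
```

In this invented example, KTF repetitions predicted as HE (and vice versa) move onto the diagonal after merging, so overall accuracy can only stay the same or rise.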
The results in Table 3 indicate that a single IMU system can classify BW squats as acceptable or aberrant with excellent overall accuracy scores (>95% regardless of sensor position). The ability of a single IMU system to identify which deviation has occurred (multilabel classification as shown in Table 4) is moderate, with the right shank showing the highest overall accuracy (73%). A possible reason for this is that the deviations investigated may involve a high degree of movement in the shanks to complete the aberrant movement. The left and right shanks show similar overall classification results, with scores around 1–2% higher using the right shank sensor compared with the left. This discrepancy could be attributed to the fact that the majority of the participants were right foot dominant, leading to overcompensation on this side. The confusion matrix in Figure 5 shows that the right shank is able to classify normal repetitions with an excellent TP rate, but deviations such as KTF and BO are less clearly classified, possibly due to the similar movement profile needed to complete these deviations.
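The per-class TP rate read from a confusion matrix, and the effect of grouping KTF with HE suggested above, can be illustrated with a minimal Python sketch. The counts below are invented for illustration and are not the values in Figures 3 to 5; the function names are hypothetical and are not part of the study's software.

```python
# Illustrative only: these counts are invented, not taken from the study.
LABELS = ["acceptable", "KVL", "KVR", "KTF", "HE", "BO"]

# Rows are the true class, columns are the predicted class.
CONFUSION = [
    [90, 2, 2, 2, 2, 2],
    [5, 80, 5, 4, 3, 3],
    [5, 5, 80, 3, 4, 3],
    [6, 3, 3, 60, 25, 3],
    [5, 3, 3, 20, 66, 3],
    [6, 4, 4, 8, 8, 70],
]


def sensitivity(matrix, i):
    """True positive rate for class i: TP / (TP + FN)."""
    row_total = sum(matrix[i])
    return matrix[i][i] / row_total if row_total else 0.0


def specificity(matrix, i):
    """True negative rate for class i: TN / (TN + FP)."""
    n = len(matrix)
    fp = sum(matrix[r][i] for r in range(n) if r != i)
    tn = sum(matrix[r][c] for r in range(n) for c in range(n)
             if r != i and c != i)
    return tn / (tn + fp) if (tn + fp) else 0.0


def merge_classes(matrix, labels, a, b, new_label):
    """Collapse classes a and b into one class by summing rows and columns.

    The merged class is placed last in the returned matrix and label list.
    """
    ia, ib = labels.index(a), labels.index(b)
    keep = [k for k in range(len(labels)) if k not in (ia, ib)]
    groups = [[k] for k in keep] + [[ia, ib]]
    merged = [[sum(matrix[r][c] for r in gr for c in gc) for gc in groups]
              for gr in groups]
    return merged, [labels[k] for k in keep] + [new_label]


merged, merged_labels = merge_classes(CONFUSION, LABELS, "KTF", "HE", "KTF+HE")
ktf_alone = sensitivity(CONFUSION, LABELS.index("KTF"))
ktf_he = sensitivity(merged, merged_labels.index("KTF+HE"))
```

With these invented counts, the KTF TP rate is 0.60 on its own and 0.855 for the grouped KTF+HE class, because repetitions previously confused between KTF and HE now count as correct predictions.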


Journal of Strength and Conditioning Research


Copyright © National Strength and Conditioning Association Unauthorized reproduction of this article is prohibited.


Pernek et al. (22) used a single IMU to capture resistance training information and were able to recognize repetition duration with a temporal detection error of about 11%. However, different exercise goals may require varying movement speeds. As such, assessment based on speed alone is not a holistic way of evaluating exercise technique. Bonnet et al. (5) demonstrated that a single IMU system on the lower back could measure ankle, knee, and hip joint angles in the sagittal plane during the BW squat with a maximal error of 3.5 degrees compared with a Vicon motion capture system in human participants. However, joint angles displayed to the end user may not be actionable information for an individual not trained in biomechanics, as they may not be able to distinguish what angle range represents an aberrant movement pattern. Furthermore, the authors were only able to identify sagittal plane angles. Many common BW squat deviations can occur in different or multiple planes simultaneously, such as knee valgus and varus deviations, which occur predominantly in the frontal plane. Giggins et al. (10) assessed the ability of a single IMU to identify deviations in 7 exercises (heel slide, hip abduction, hip flexion, hip extension, knee extension, inner range quads, and straight-leg raise) with an overall accuracy of 79–83% depending on where the IMU was positioned. These results are slightly higher than those presented in this work; however, the authors looked at a maximum of 3 deviations for each exercise, and some of the exercises had only one deviation (i.e., binary level classification). Our work sought to identify 5 deviations from normal, and this may go some way to explaining the lower overall accuracy scores compared with Giggins et al. Furthermore, the BW squat exercise is more complex than the exercises classified by Giggins et al., with deviations occurring in multiple joints simultaneously.
Comparing results presented in this article with the above work is challenging because of differences in exercises investigated, sensor positions, and feedback given to end users. However, these results build on the previous work. The majority of research to date has investigated the ability of IMU systems to monitor technique in simple exercises such as heel slides (10), dumbbell curls (28), or straight-leg raises (26). This article describes an evaluation of an IMU system’s ability to quantify BW squatting performance, a complex exercise that involves multiple joints. This system has also demonstrated the ability to identify 5 deviations from normal technique (Table 4). The reduced number of deviations in some of the studies (10,26,28) may make it easier for classifiers to identify specific deviations and subsequently produce higher accuracy, sensitivity, and specificity scores. Finally, the main focus of the IMU system analyzed in this article is to identify specific technique deviations and not just angles of movement (5), tempo (22), or exercise intensity (23). This information may be more actionable to gym users, particularly those without biomechanics training. It is difficult to ascertain whether the moderate levels of multilabel classification presented in this article are sufficient for real-life strength and conditioning (S&C) settings. Further research is being undertaken to determine usability, functionality, and user perceptions of using wearable technology to assess exercise biomechanics. It is hoped this will give a greater indication as to the levels of accuracy that S&C coaches and experienced and novice gym users would define as acceptable. It is possible that these levels may vary depending on the clients S&C coaches work with and the training goals of gym users.

A number of contextual factors must be taken into account when interpreting the results. All deviations were deliberately induced and completed by healthy individuals. When deviations occur naturally, the exact way in which they present may differ from the induced deviations investigated in this study. Moreover, there were no controls as to the severity with which each participant performed a deviation, and therefore it is possible that naturally occurring BW squat deviations in a “real life” application may be more acute, or occur in a more idiosyncratic fashion than those used in the presented classification systems. No gold standard 3-dimensional motion capture system was used to confirm that each deviation occurred. However, a Chartered Physiotherapist and an individual trained in S&C were present for all data collection and ensured the deviations occurred through visual observation. The participant was asked to repeat the movement if the investigators felt it had not been completed satisfactorily. A motion capture system was not used as researchers have already shown the reliability of IMU set-ups compared with such systems (14,25). In addition, the 5 deviations identified by the IMU systems are a nonexhaustive list of those that can occur during the BW squat exercise. These deviations were chosen in consultation with sports and medicine practitioners, S&C coaches, and the NSCA guidelines (2).

In conclusion, it is shown that a system based on data derived from body-worn IMUs can classify acceptable and aberrant BW squat biomechanics with excellent overall accuracy, sensitivity, and specificity. 
These excellent classification levels are maintained even with a single IMU. The ability to identify specific stimulated deviations is more difficult but can be achieved with a good level of overall accuracy with a 5 IMU system. A single IMU system can identify specific deviations with a moderate level of accuracy. These results are comparable with current research in the area, despite the BW squat being a more complex exercise than many of those previously investigated. However, it must be stressed that this is not a fully operational system at present. Such a system should be able to recognize and evaluate multiple exercises. Furthermore, the deviations investigated in this study are induced, meaning how they appear in a natural setting may be different. However, the results presented in this article are promising and further research is warranted to investigate an IMU system’s capability of monitoring technique in various movements. This work is currently ongoing with future research focusing on the analysis of natural deviations and exercises such as the deadlift and tuck jump.

PRACTICAL APPLICATIONS

The BW squat is an important movement in S&C, musculoskeletal injury risk screening, and rehabilitation. The ability to objectively quantify BW squatting technique using low-cost IMU technology would have practical advantages in all these settings. In an S&C setting, the ability to remotely monitor form may help exercise goals be achieved and reduce the risk of injury. In musculoskeletal screening, an IMU system that can identify aberrant movement patterns would allow for quicker and more objective risk identification and stratification. In a rehabilitation setting, monitoring technique would be important to prevent further injury and possibly allow exercises to be completed at home without the need for constant supervision, reducing overall health care costs.

VOLUME 31 | NUMBER 8 | AUGUST 2017

ACKNOWLEDGMENTS

This project is partly funded by the Irish Research Council as part of a Postgraduate Enterprise Partnership Scheme with Shimmer (EPSPG/2013/574) and partly funded by Science Foundation Ireland. M. A. O’Reilly and D. F. Whelan are joint lead authors.

REFERENCES

1. Ahmadi, A, Mitchell, E, Destelle, F, Gowing, M, O’Connor, NE, Richter, C, and Moran, K. Automatic activity classification and movement assessment during a sports training session using wearable inertial sensors. In: Presented at Proceedings of the 11th International Conference on Wearable and Implantable Body Sensor Networks (BSN), Zurich, Switzerland, June 16–19, 2014.
2. Baechle, TR and Earle, RW. NSCA’s Essentials of Personal Training. Champaign, IL: Human Kinetics, 2004.
3. Bonnechere, B, Jansen, B, Salvia, P, Bouzahouene, H, Omelina, L, Moiseev, F, Sholukha, V, Cornelis, J, Rooze, M, and Van Sint Jan, S. Validity and reliability of the Kinect within functional assessment activities: Comparison with standard stereophotogrammetry. Gait Posture 39: 593–598, 2014.
4. Bonnet, V, Mazza, C, Fraisse, P, and Cappozzo, A. A least-squares identification algorithm for estimating squat exercise mechanics using a single inertial measurement unit. J Biomech 45: 1472–1477, 2012.
5. Bonnet, V, Mazza, C, Fraisse, P, and Cappozzo, A. Real-time estimate of body kinematics during a planar squat task using a single inertial measurement unit. IEEE Trans Biomed Eng 60: 1920–1926, 2013.
6. Breiman, L. Random forests. Machine Learning 45: 5–32, 2001.
7. Chang, KH, Chen, MY, and Canny, J. Tracking free-weight exercises. In: UbiComp 2007: Ubiquitous Computing. Berlin, Germany: Springer, 2007. pp. 19–37.
8. Cook, G, Burton, L, and Hoogenboom, B. Pre-participation screening: The use of fundamental movements as an assessment of function—Part 1. N Am J Sports Phys Ther 1: 62–72, 2006.
9. Fushiki, T. Estimation of prediction error by using K-fold cross-validation. Stat Comput 21: 137–146, 2011.
10. Giggins, OM, Sweeney, KT, and Caulfield, B. Rehabilitation exercise assessment using inertial sensors: A cross-sectional analytical study. J Neuroeng Rehabil 11: 158–168, 2014.
11. Hall, M, Nielsen, JH, Holsgaard-Larsen, A, Nielsen, DB, Creaby, MW, and Thorlund, JB. Forward lunge knee biomechanics before and after partial meniscectomy. Knee 22: 506–509, 2015.
12. Jerri, AJ. The Shannon sampling theorem—Its various extensions and applications: A tutorial review. Proc IEEE 65: 1565–1596, 1977.
13. Katz, MJ and George, EB. Fractals and the analysis of growth paths. Bull Math Biol 47: 273–286, 1985.
14. Leardini, A, Lullini, G, Giannini, S, Berti, L, Ortolani, M, and Caravaggi, P. Validation of the angular measurements of a new inertial-measurement-unit based rehabilitation system: Comparison with state-of-the-art gait analysis. J Neuroeng Rehabil 11: 1–7, 2014.
15. Madgwick, SOH, Harrison, AJL, and Vaidyanathan, R. Estimation of IMU and MARG orientation using a gradient descent algorithm. In: Presented at Proceedings of the IEEE International Conference on Rehabilitation Robotics (ICORR), Zurich, Switzerland, June 29–July 1, 2011.
16. McGrath, D, Greene, BR, O’Donovan, KJ, and Caulfield, B. Gyroscope-based assessment of temporal gait parameters during treadmill walking and running. Sports Eng 15: 207–213, 2012.
17. Melzi, S, Borsani, L, and Cesana, M. The virtual trainer: Supervising movements through a wearable wireless sensor network. In: Presented at Sensor, Mesh and Ad Hoc Communications and Networks Workshops, 2009 SECON Workshops’ 09 6th Annual IEEE Communications Society Conference on, Rome, Italy, June 22–26, 2009.
18. Mitchell, E, Ahmadi, A, O’Connor, NE, Richter, C, Farrell, E, Kavanagh, J, and Moran, K. Automatically detecting asymmetric running using time and frequency domain features. Presented at Proceedings of the 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN), Boston, MA, June 9–12, 2015.
19. Morris, D, Saponas, TS, Guillory, A, and Kelner, I. RecoFit: Using a wearable sensor to find, recognize, and count repetitive exercises. In: Presented at Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Toronto, ON, Canada, April 26–May 1, 2014.
20. Muehlbauer, M, Bahle, G, and Lukowicz, P. What can an arm holster worn smart phone do for activity recognition? In: Proceedings of the International Symposium on Wearable Computers (ISWC). IEEE, San Francisco, CA, June 12–15, 2011, pp 79–82.
21. O’Donovan, KJ, Greene, BR, McGrath, D, O’Neill, R, Burns, A, and Caulfield, B. SHIMMER: A new tool for temporal gait analysis. Presented at Engineering in Medicine and Biology Society, 2009 EMBC 2009 Annual International Conference of the IEEE, Minneapolis, USA, September 2nd–6th, 2009.
22. Pernek, I, Hummel, KA, and Kokol, P. Exercise repetition detection for resistance training based on smartphones. Personal Ubiquitous Computing 17: 771–782, 2013.
23. Pernek, I, Kurillo, G, Stiglic, G, and Bajcsy, R. Recognizing the intensity of strength training exercises with wearable sensors. J Biomed Inform 58: 145–155, 2015.
24. Seeger, C, Buchmann, A, and Van Laerhoven, K. myHealthAssistant: A phone-based body sensor network that captures the wearer’s exercises throughout the day. In: Proceedings of the 6th International Conference on Body Area Networks. ICST, Beijing, China, November 7–10, 2011, pp 1–7.
25. Tang, Z, Sekine, M, Tamura, T, Tanaka, N, Yoshida, M, and Chen, W. Measurement and estimation of 3D orientation using magnetic and inertial sensors. Adv Biomed Eng 4: 135–143, 2015.
26. Taylor, PE, Almeida, GJ, Hodgins, JK, and Kanade, T. Multi-label classification for the analysis of human motion quality. In: Presented at Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), San Diego, CA, August 28–September 1, 2012.
27. Taylor, PE, Almeida, GJ, Kanade, T, and Hodgins, JK. Classifying human motion quality for knee osteoarthritis using accelerometers. Presented at Engineering in Medicine and Biology Society (EMBC), Buenos Aires, Argentina, August 31–September 4, 2010.
28. Velloso, E, Bulling, A, Gellersen, H, Ugulino, W, and Fuks, H. Qualitative activity recognition of weight lifting exercises. In: Presented at Proceedings of the 4th Augmented Human International Conference, Stuttgart, Germany, March 7–8, 2013.
29. Whiteside, D, Deneweth, JM, Pohorence, MA, Sandoval, B, Russell, JR, McLean, SG, Zernicke, RF, and Goulet, GC. Grading the functional movement screen: A comparison of manual (real-time) and objective methods. J Strength Cond Res 30: 924–933, 2016.


Focus Theme – Original Articles

REHAB

Technology in Rehabilitation: Evaluating the Single Leg Squat Exercise with Wearable Inertial Measurement Units

Darragh F. Whelan1,2*; Martin A. O'Reilly1,2*; Tomás E. Ward3; Eamonn Delahunt2; Brian Caulfield1,2

1Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland; 2School of Public Health, Physiotherapy and Sports Science, University College Dublin, Dublin, Ireland; 3Insight Centre for Data Analytics, Maynooth University, Kildare, Ireland

Keywords Exercise therapy, biomedical technology, lower extremity, physical therapy speciality, inertial measurement units

Summary
Background: The single leg squat (SLS) is a common lower limb rehabilitation exercise. It is also frequently used as an evaluative exercise to screen for an increased risk of lower limb injury. To date athlete / patient SLS technique has been assessed using expensive laboratory equipment or subjective clinical judgement; both of which are not without shortcomings. Inertial measurement units (IMUs) may offer a low cost solution for the objective evaluation of athlete / patient SLS technique.
Objectives: The aims of this study were to determine if in combination or in isolation IMUs positioned on the lumbar spine, thigh and shank are capable of: (a) distinguishing between acceptable and aberrant SLS technique; (b) identifying specific deviations from acceptable SLS technique.
Methods: Eighty-three healthy volunteers participated (60 males, 23 females, age: 24.68 ± 4.91 years, height: 1.75 ± 0.09 m, body mass: 76.01 ± 13.29 kg). All participants performed 10 SLSs on their left leg. IMUs were positioned on participants’ lumbar spine, left shank and left thigh. These were utilized to record tri-axial accelerometer, gyroscope and magnetometer data during all repetitions of the SLS. SLS technique was labelled by a Chartered Physiotherapist using an evaluation framework. Features were extracted from the labelled sensor data. These features were used to train and evaluate a variety of random-forests classifiers that assessed SLS technique.
Results: A three IMU system was moderately successful in detecting the overall quality of SLS performance (77 % accuracy, 77 % sensitivity and 78 % specificity). A single IMU worn on the shank can complete the same analysis with 76 % accuracy, 75 % sensitivity and 76 % specificity. Single sensors also produce competitive classification scores relative to multi-sensor systems in identifying specific deviations from acceptable SLS technique.
Conclusions: A single IMU positioned on the shank can differentiate between acceptable and aberrant SLS technique with moderate levels of accuracy. It can also capably identify specific deviations from optimal SLS performance. IMUs may offer a low cost solution for the objective evaluation of SLS performance. Additionally, the classifiers described may provide useful input to an exercise biofeedback application.

Correspondence to: Darragh F. Whelan University College Dublin Insight Centre for Data Analytics Science Centre East Dublin Dublin Ireland E-mail: [email protected]

* These authors contributed equally to this work

Methods Inf Med 2016; 55: ■–■ http://dx.doi.org/10.3414/ME16-02-0002 received: March 4, 2016 accepted in revised form: August 19, 2016 epub ahead of print: ■■■

Funding: This project is partly funded by the Irish Research Council as part of a Postgraduate Enterprise Partnership Scheme with Shimmer (EPSPG/2013/574) and partly funded by Science Foundation Ireland (SFI/12/RC/2289).

1. Introduction
The single leg squat (SLS) is a commonly used rehabilitative exercise following lower limb musculoskeletal injury [1]. Additionally, it is frequently utilized as an evaluative exercise to assess athletes’ risk of incurring lower limb musculoskeletal injury [2]. The SLS is a popular evaluative exercise as it allows clinicians / practitioners to simultaneously assess trunk, pelvis, hip, and knee kinematics during a weight-bearing activity [3]. Therefore, it is necessary that patient / athlete performance of the SLS can be evaluated effectively and reliably.

To date, objective quantification of patient / athlete performance of the SLS has been determined using marker-based motion analysis systems [1]. This approach is time intensive, expensive (over € 100,000 for a complete system) and the application of skin-mounted markers may hinder normal movement [4]. As such, beyond


the research laboratory, these systems are not frequently used for the objective quantification of patient / athlete SLS technique. As an alternative, real-time visual evaluation of patient / athlete performance of the SLS is more commonly used. In this instance, kinematics of the trunk, pelvis, hip and knee are simultaneously evaluated to provide an overall assessment of the patient’s / athlete’s performance of the exercise [3]. It is difficult to standardise SLS performance evaluation due to the experience of the rater [3], the method used to rate performance of the exercise (ordinal vs. dichotomous scales) or the instructions given to the raters [5]. Inaccurate evaluation of patient / athlete performance of the SLS could have implications for clinical and exercise progression decisions. Recent technological advances have allowed for the possibility of using inertial measurement units (IMUs) as part of a method for capturing human movement during the performance of exercises such as the SLS. IMUs are able to acquire data pertaining to the linear and angular motion of individual limb segments and the centre of mass of the body. They are small, inexpensive, easy to set-up and facilitate the acquisition of human movement data in unconstrained environments [6]. Thus, they offer the potential to bridge the gap between laboratory-based and day-to-day “real-world” acquisition of human movement. Body worn systems incorporating multiple IMUs have been shown to be effective at differentiating exercises and evaluating exercise performance. Chang et al. [7] incorporated accelerometers into a workout glove and belt clip with the aim of differentiating between, and counting the number of repetitions of, nine different upper and lower limb exercises. Their system achieved 95 % exercise classification and repetition counting accuracy. A case study completed by Fitzgerald et al. 
[8] used 10 IMUs to provide feedback to a healthy non-injured athlete and an athlete five weeks post-knee injury during the performance of a lunge exercise. Analysis of the gyroscope signals from the IMUs identified lower limb movement deviations in the injured athlete

when compared to the non-injured athlete. Seeger et al. [9] used three IMUs to differentiate between a total of 16 cardiovascular and weight-lifting exercises. Classification accuracy ranged from 71 – 100 % for the different exercises. However, multiple sensor systems are expensive for end-users. They may also prove impractical due to the increased risk of placement error and comfort issues. Furthermore a multiple sensor setup would put a bigger strain on power usage and BluetoothTM capabilities of the sensors and hosting smartphone. Consequently, the transferability of a multiple sensor set-up to day-to-day “real-world” situations is not practical [10]. For increased end user cost effectiveness and practicality a single sensor set-up is far more desirable. Giggins et al. [11] demonstrated the ability of a single IMU to successfully differentiate seven commonly prescribed orthopaedic rehabilitation exercises (heel slide, hip abduction, hip extension, hip flexion, inner range quads, knee extension, straight leg raise). Accuracies of 93 – 95 % were observed. Muehlbauer et al. [12] reported that a single sensor placed on the upper arm could distinguish between 10 different upper body exercises with an overall recognition rate of 94 %. Pernek et al. [13] used a single sensor in a smartphone to count repetitions of resistance exercises; observing an overall repetition count accuracy of 99 %. Evaluation of exercise performance is also vital to ensure that not only is the exercise completed but that it is completed with acceptable technique. Taylor et al. [14] used five IMUs to identify five technique deviations in the standing hamstring curl and four deviations in the straight-leg raise. They were able to classify the different deviations with 80 % accuracy, 75 % sensitivity and 90 % specificity. Velloso et al. [15] used four IMUs to classify deviations from normal form during a unilateral dumbbell bicep curl. 
They achieved an overall accuracy of 74 – 86 % in identifying specific deviations. Giggins et al. [16] demonstrated an overall accuracy of 79 – 81 % using different combinations of one, two or three sensors placed on the lower limb to analyse exercise technique in seven exercises (heel slide, hip abduction, hip flexion, hip extension, knee extension, inner range quads and straight-leg raise). These studies demonstrate that it is possible to evaluate exercise performance of simple exercises using multiple IMUs. However, the ability of an IMU based system to evaluate more complex exercises such as the SLS is less understood.

2. Objectives
The research question this study seeks to address is: “How well can an IMU-based system quantify performance of the SLS?” The aims of this study were to determine if in combination or in isolation, IMUs positioned on the lumbar spine, thigh and shank are capable of: (a) distinguishing between acceptable and aberrant SLS technique; (b) identifying specific deviations from acceptable SLS performance.

3. Methods
Data were acquired from participants as they completed 10 SLS repetitions on their left leg with their best possible form. All repetitions were recorded using a high-definition video camera. Each participant’s performance of each SLS repetition was rated by a Chartered Physiotherapist using a scale developed by Whatman et al. (▶ Table 1) [3]. Data derived from the IMUs during each repetition were compared to this rating to determine if a single IMU on the lumbar spine could discriminate between different levels of SLS performance and identify the specific deviations from acceptable SLS form.

3.1 Participants
Eighty-three healthy volunteers participated. No participant had a current or recent musculoskeletal injury that would impair their SLS performance. All participants had prior experience with the exercise. Each participant signed a consent form prior to completing the study. The University Human Research Ethics Committee approved the study protocol.

Methods Inf Med 6/2016

© Schattauer 2016 Print proof! Publication, duplication or distribution (also online) is prohibited!


3.2 Experimental Protocol
The testing protocol was explained to participants upon their arrival at the research laboratory. All participants completed a 10-minute warm-up on an exercise bike, during which they were required to maintain a power output of 100 W and a cadence of 75 – 85 revolutions per minute. Following the warm-up, an investigator (the same investigator for all participants) secured three IMUs (SHIMMER, Shimmer research, Dublin, Ireland) on the participant at the following anatomical locations: the level of the 5th lumbar vertebra, the midpoint of the left femur (determined as half way between the greater trochanter and lateral femoral condyle), and on the left shank 2 cm above the lateral malleolus (▶ Figure 1). The orientation and location of each IMU were consistent across all study participants.

A pilot study was used to determine an appropriate sampling rate and the ranges for the accelerometer and gyroscope within the IMU. In the pilot study, data were collected at 512 samples / s during performance of the SLS. A Fourier transform was then used to detect the characteristic frequencies of the signal, which were all found to be less than 20 Hz. Therefore, a sampling rate of 51.2 Hz was deemed appropriate for this study based upon the Shannon sampling theorem and the Nyquist criterion [17]. The Shimmer IMU was configured to stream tri-axial accelerometer (± 2 g), gyroscope (± 500 °/s) and magnetometer (± 1 Ga) data, with the sensor ranges also chosen based upon data from the pilot study. The IMU was calibrated for these specific sensor ranges using the Shimmer 9 DoF Calibration application [18].

Participants completed 10 repetitions of a left leg SLS with their best form. A Chartered Physiotherapist demonstrated and instructed all participants on how to complete the SLS with acceptable technique. This involved maintaining their trunk and pelvis in a neutral position, keeping their patella in line with the second toe, preventing their foot from moving into excessive pronation and keeping the movement throughout available range of motion as smooth as possible. Their right leg was kept as extended as possible in front of them while the left knee was flexed between 60 and 90 degrees. All participants were allowed trial repetitions to ensure they were comfortable with the exercise before commencing their set of 10 repetitions.

Table 1 SLS data labelling system, adapted from Whatman et al.

| Category | Visual rating criterion | Score |
|---|---|---|
| Trunk | Moves out of neutral in frontal or transverse plane | N: 0, Y: 1 |
| Pelvis | Moves out of neutral in frontal or transverse plane or moves away from midline | N: 0, Y: 1 |
| Knee | Patella moves out of line with 2nd toe | N: 0, Y: 1 |
| Foot | Moves into excessive pronation | N: 0, Y: 1 |
| Oscillation | Observable oscillation | N: 0, Y: 1 |
| Loss of Balance | Visible loss of balance | N: 0, Y: 1 |
| Overall Score | Movement dysfunction | N: 0, Y: 1 |

3.3 Data Labelling
Participants’ performance of the SLS was recorded using a high-definition video camera. A Chartered Physiotherapist with more than six years post-graduation experience and an MSc in Sports and Exercise Medicine reviewed all recorded SLS repetitions. Each repetition was separated and reviewed on multiple occasions in a systematic format. For every repetition a score of 0 or 1 was given to each section as outlined in the scoring system shown in ▶ Table 1. This was adapted from the scoring system described by Whatman et al. [3]. In order to establish the overall score of each repetition a ‘1’ (movement dysfunction) was given to repetitions that scored a ‘1’ in two or more of the six categories. All other repetitions were rated as ‘0’ (acceptable movement pattern). The Chartered Physiotherapist involved in the study developed the method of assigning an overall score following consultation with colleagues who work in musculoskeletal physiotherapy and sports medicine.
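The overall scoring rule described above (a repetition is rated ‘1’ when two or more of the six categories score ‘1’) can be written as a short Python sketch. The category keys follow Table 1, but the function name and the dictionary layout are hypothetical illustrations and are not taken from the study's software.

```python
# Category names follow Table 1; the data layout is an assumption for illustration.
CATEGORIES = ("trunk", "pelvis", "knee", "foot", "oscillation", "loss_of_balance")


def overall_score(rep_scores):
    """Return 1 (movement dysfunction) if two or more categories scored 1, else 0."""
    missing = [c for c in CATEGORIES if c not in rep_scores]
    if missing:
        raise ValueError(f"missing category scores: {missing}")
    if any(rep_scores[c] not in (0, 1) for c in CATEGORIES):
        raise ValueError("each category score must be 0 or 1")
    return 1 if sum(rep_scores[c] for c in CATEGORIES) >= 2 else 0


# Usage: one deviation alone is still rated acceptable; a second one is not.
rep = dict.fromkeys(CATEGORIES, 0)
rep["knee"] = 1
one_deviation = overall_score(rep)   # rated 0 (acceptable movement pattern)
rep["trunk"] = 1
two_deviations = overall_score(rep)  # rated 1 (movement dysfunction)
```

Each of the six category scores would also serve as its own binary label, which matches the use of one classifier per deviation plus one for the overall score.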

3.4 Data Analysis

Figure 1 Image showing IMU positions and SLS exercise (1 = left shank; 2 = left thigh; 3 = lumbar spine)

Nine signals were collected from each IMU: accelerometer x, y, z; gyroscope x, y, z; and magnetometer x, y, z. To ensure the data analysed applied to each participant’s movement and to eliminate unwanted high-frequency noise, the nine signals were low-pass filtered at fc = 20 Hz using a Butterworth filter of order n = 8. Nine additional signals were then calculated. The 3-D orientation of the IMU was computed using the gradient descent algorithm developed by Madgwick et al. [19]. The resulting quaternion values (W, X, Y and Z) were then converted to pitch, roll and yaw signals. The pitch, roll and yaw signals describe the inclination, measured in radians, of the lumbar spine, left thigh and left shank in the sagittal, frontal and transverse planes respectively. The


magnitude of acceleration was also computed using the vector magnitude of accelerometer x, y and z. The magnitude of acceleration describes the total acceleration of the IMU in any direction. This is the sum of the magnitude of inertial acceleration of the lumbar spine and acceleration due to gravity. Additionally, the magnitude of rotational velocity was computed using the vector magnitude of gyroscope x, y and z.

All ten repetitions from each participant’s SLS dataset were programmatically extracted using the IMU data and resampled to a length of 250 samples; this was undertaken to minimise the influence of the speed of repetition performance on signal feature calculations. It also ensured the computed features related to differences in movement patterns and not the participant’s exercise tempo. Time-domain and frequency-domain descriptive features were computed in order to describe the pattern of each of the eighteen signals when the five different exercises were completed. These features were signal mean, RMS, standard deviation, kurtosis, median, skewness, range, maximum, minimum, variance, energy, 25th percentile, 75th percentile, level crossing rate, fractal dimension and the variance of both the approximate and detailed wavelet coefficients using the Daubechies 5 mother wavelet to level 6. This resulted in seventeen features for each of the eighteen available signals, producing a total of 306 features per sensor unit.

The random-forests method was employed to perform classification [20]. A random forest is an ensemble of decision trees that outputs a prediction value, in this case SLS quality. Each decision tree is constructed using a random subset of the training data. Once the forest has been trained, each test row is passed through it to output a predicted class. This technique was chosen as it has been shown to be effective in analysing exercise technique with IMUs when compared to the Naïve-Bayes and Radial-basis function network techniques [21]. Four hundred decision trees were used in each random-forest classifier. Classification quality was compared with and without performing principal component analysis (PCA) on the training data. Using PCA produced lower accuracy, sensitivity and specificity scores and therefore PCA was not included in the final exercise classification system.

Six separate random forests were used to detect whether each specific deviation described in ▶ Table 1 had occurred. A seventh random forest predicted the overall score of each SLS repetition. For each of the above classifiers a variety of training features were used in order to establish classification quality when using three, two and individual IMUs. Classifiers were developed and evaluated using the following seven combinations of variables: the 918 (3 × 306) variables computed from every IMU, the 612 (2 × 306) variables from the left shank and lumbar IMUs, the 612 (2 × 306) variables from the left thigh and lumbar IMUs, the 612 (2 × 306) variables from the left shank and left thigh IMUs, and the 306 variables from each of the three individual IMUs.

To establish the quality of each classifier in discriminating between acceptable and aberrant SLS technique or identifying a specific deviation from acceptable SLS performance, repeated random sub-sample validation (RRSSV) was used. This method of classifier evaluation was chosen as the data set used to train the classifier was relatively small. Leave-one-subject-out cross-validation (LOSOCV) was not deemed necessary for this study due to the high inter-repetition variability of SLS performance in each participant’s set of the exercise. Data were shuffled programmatically. The first 80 % of data were used as the training set for the random forests classifier, initially resulting in 672 repetitions per training set. However, the training data were then balanced to avoid biasing the classifier. This was completed by counting the number of instances of each class (0 and 1) and removing a random selection of repetitions from the class with more instances until there was an equal amount of training data to represent both classes. The remaining 20 % of observations were used as the test set for the classifier, resulting in 168 test repetitions per evaluation. Accuracy, sensitivity and specificity metrics were calculated. Accuracy measures the overall effectiveness of a classifier and is computed by taking the ratio of correctly classified examples to the total number of examples available. Sensitivity measures the effectiveness of a classifier at identifying a desired label, while specificity measures the classifier’s ability to detect negative labels. This process was repeated ten times.

Table 2 Classification results for ‘Overall SLS Score’ for each IMU combination

IMU combination    Accuracy   Sensitivity   Specificity
Left Shank         76 %       75 %          76 %
Left Thigh         75 %       71 %          77 %
Lumbar             73 %       74 %          72 %
All 3 IMUs         77 %       77 %          78 %
Lumbar + Shank     75 %       78 %          72 %
Shank + Thigh      78 %       78 %          78 %
Lumbar + Thigh     77 %       75 %          79 %
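The validation procedure described in the Methods (shuffle, 80/20 split, balancing of the training set by random undersampling, computation of accuracy, sensitivity and specificity, ten repetitions) can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB code used in the study. The function names, the `train_and_predict` callback and the choice of class 1 as the positive label are assumptions made for the example and are not taken from the paper.

```python
import random


def balance_by_undersampling(rows, labels, rng):
    """Randomly drop repetitions from the larger class until both classes (0 and 1) are equal in size."""
    idx_by_class = {0: [], 1: []}
    for i, y in enumerate(labels):
        idx_by_class[y].append(i)
    n = min(len(idx_by_class[0]), len(idx_by_class[1]))
    keep = sorted(rng.sample(idx_by_class[0], n) + rng.sample(idx_by_class[1], n))
    return [rows[i] for i in keep], [labels[i] for i in keep]


def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity); treating class 1 as positive is an assumption."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


def rrssv(rows, labels, train_and_predict, repeats=10, train_fraction=0.8, seed=0):
    """Repeated random sub-sample validation; returns the mean of each metric over all repeats.

    train_and_predict(train_rows, train_labels, test_rows) is a hypothetical callback
    that fits any classifier and returns one predicted label per test row.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(repeats):
        order = list(range(len(rows)))
        rng.shuffle(order)
        cut = int(train_fraction * len(order))
        train_idx, test_idx = order[:cut], order[cut:]
        # Only the training portion is balanced; the test portion keeps its natural class mix.
        train_rows, train_labels = balance_by_undersampling(
            [rows[i] for i in train_idx], [labels[i] for i in train_idx], rng
        )
        predictions = train_and_predict(train_rows, train_labels, [rows[i] for i in test_idx])
        results.append(binary_metrics([labels[i] for i in test_idx], predictions))
    return tuple(sum(r[k] for r in results) / repeats for k in range(3))
```

Any classifier can be supplied through the callback; a random forest with 400 trees would correspond most closely to the configuration reported above.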

4. Results

The demographics of the participants were as follows: 60 males, 23 females, age: 24.68 ± 4.91 years, height: 1.75 ± 0.094 m, body mass: 76.01 ± 13.29 kg. ▶ Table 2 demonstrates the mean sensitivity, specificity and accuracy for the overall score following the ten cycles of RRSSV for systems using each individual IMU and each combination of IMUs. The best single sensor for classifying overall score was the left shank with an accuracy of 76 %. The highest quality classification came from the two-sensor combination of the shank and thigh, which achieved 78 % accuracy. ▶ Table 3 demonstrates the classification scores for the detection of each specific deviation as described in ▶ Table 1. Deviation of the pelvis from the neutral position was the most poorly detected deviation. The three-sensor combination detected this deviation with 70 % accuracy and a single sensor located on the lumbar spine detected this deviation with 69 % accuracy. In some cases (e.g. the foot moving into excessive pronation), the single sensor system outperformed the multi-sensor set-ups. The IMU positioned on the left shank produced an accuracy of 75 % for this deviation, superior to the accuracy of 73 % achieved when using all three IMUs. Single sensor set-ups appear comparable to multi-sensor set-ups for the detection of all six deviations.

Table 3 Classification results for specific deviations for each IMU combination

Trunk: Moves out of neutral in frontal or transverse plane
IMU combination    Accuracy   Sensitivity   Specificity
Left Shank         70 %       73 %          69 %
Left Thigh         69 %       62 %          70 %
Lumbar             73 %       72 %          73 %
All 3 IMUs         74 %       61 %          76 %
Lumbar + Shank     76 %       75 %          76 %
Shank + Thigh      69 %       67 %          69 %
Lumbar + Thigh     75 %       73 %          75 %

Pelvis: Moves out of neutral in frontal or transverse plane or moves away from midline
IMU combination    Accuracy   Sensitivity   Specificity
Left Shank         66 %       67 %          66 %
Left Thigh         65 %       64 %          65 %
Lumbar             69 %       77 %          68 %
All 3 IMUs         70 %       80 %          69 %
Lumbar + Shank     70 %       73 %          69 %
Shank + Thigh      66 %       63 %          67 %
Lumbar + Thigh     69 %       70 %          69 %

Foot: Moves in to excessive pronation
IMU combination    Accuracy   Sensitivity   Specificity
Left Shank         75 %       63 %          77 %
Left Thigh         72 %       67 %          73 %
Lumbar             69 %       60 %          71 %
All 3 IMUs         73 %       64 %          75 %
Lumbar + Shank     74 %       71 %          74 %
Shank + Thigh      72 %       69 %          73 %
Lumbar + Thigh     72 %       64 %          73 %

Oscillation: Observable Oscillation
IMU combination    Accuracy   Sensitivity   Specificity
Left Shank         75 %       63 %          77 %
Left Thigh         72 %       68 %          74 %
Lumbar             70 %       69 %          71 %
All 3 IMUs         74 %       76 %          71 %
Lumbar + Shank     76 %       77 %          65 %
Shank + Thigh      75 %       75 %          72 %
Lumbar + Thigh     70 %       71 %          69 %

Knee: Patella moves out of line with 2nd toe
IMU combination    Accuracy   Sensitivity   Specificity
Left Shank         71 %       73 %          71 %
Left Thigh         71 %       68 %          71 %
Lumbar             72 %       78 %          70 %
All 3 IMUs         75 %       76 %          75 %
Lumbar + Shank     71 %       62 %          73 %
Shank + Thigh      74 %       74 %          74 %
Lumbar + Thigh     74 %       69 %          75 %

Loss of Balance: Visible loss of balance
IMU combination    Accuracy   Sensitivity   Specificity
Left Shank         73 %       74 %          61 %
Left Thigh         68 %       68 %          72 %
Lumbar             67 %       67 %          63 %
All 3 IMUs         71 %       71 %          70 %
Lumbar + Shank     76 %       77 %          65 %
Shank + Thigh      75 %       75 %          72 %
Lumbar + Thigh     72 %       73 %          62 %

5. Discussion

Our results indicate that an IMU sensor-based system is capable of evaluating SLS performance with moderate accuracy, sensitivity and specificity. A three-sensor set-up can distinguish between acceptable and aberrant SLS technique with 77 % accuracy, 77 % sensitivity and 78 % specificity. Two sensors (shank and thigh) discriminate between acceptable and aberrant performance with 78 % accuracy, sensitivity and specificity. A single sensor (left shank) can identify acceptable SLS performance with 76 % accuracy, 75 % sensitivity and 76 % specificity. Specific deviations can also be classified with a moderate level of accuracy, sensitivity and specificity as shown in ▶ Table 3. Overall accuracy for specific deviations ranged from 65 % – 76 %, sensitivity ranged from 60 % – 80 % and specificity from 61 % – 77 %. These results indicate that an IMU sensor set offers the possibility of monitoring SLS exercise form objectively outside of a laboratory setting. Importantly, a single sensor set-up has comparable accuracy, sensitivity and specificity to a multi-sensor set-up if positioned appropriately. A single sensor set-up is a less cumbersome, more energy efficient and more cost effective solution for end users, which may increase the likelihood of adoption of this technology within a clinical setting. Authors have previously utilized multiple [7–9] and single [11–13] IMU set-ups to differentiate exercise performance and count the number of exercise repetitions. However, patients are likely to move

beyond the exercises evaluated in these aforementioned studies relatively early in their rehabilitation programmes. The increased complexity of the SLS means it is predominantly used in later stages of rehabilitation. Furthermore, these studies focused on the recognition of specific exercises and counting of exercise repetitions and not on the quality of movement during exercise performance.

A number of researchers have used multiple sensor systems to classify exercise performance with varying results. Taylor et al. [14] used five IMUs to identify five technique deviations in the standing hamstring curl and four deviations in the straight-leg raise. It is worth noting that only 7 % of the exercise repetitions were classified by a biomechanics expert, which may have resulted in an increased possibility of incorrect data labelling. Velloso et al. [15] used a total of four sensors to classify four deviations from normal form during a unilateral dumb-bell bicep curl. They demonstrated an overall accuracy of 74 – 86 % in identifying specific deviations. However, these deviations were induced and were possibly not representative of movement deviations that occur in a natural exercise environment. The scores presented in this paper are comparable with Taylor et al. [14] and Velloso et al. [15], while having the added benefit of being reproducible with a single sensor set-up. Furthermore, the deviations observed in our dataset occurred without simulation (i.e. were not prescriptively induced), thus providing a more realistic representation of SLS exercise performance.

Giggins et al. [16] investigated the potential of a single sensor system to analyse natural deviations from acceptable form in a total of seven exercises (heel slide, hip abduction, hip flexion, hip extension, knee extension, inner range quads and straight-leg raise). They demonstrated an overall accuracy of 79 – 81 % using different combinations of one, two or three sensors placed on specific lower limb body sites. While the accuracy results of our research are slightly lower (73 – 78 %), it should be noted that the SLS is a more complex exercise than those evaluated by Giggins et al. [16]. Furthermore, our classification system was able to detect deviations occurring at multiple body locations simultaneously, unlike the majority of previous research in the area.

A number of contextual factors should be considered when interpreting these results. A 3-dimensional motion capture system was not used to confirm that each deviation occurred. Instead, a Chartered Physiotherapist recorded the presence of any deviations noted during the performance of each SLS repetition. The use of video analysis allowed for multiple viewings of each SLS repetition. The ability to view the movement on multiple occasions and slow down the playback speed allowed for a detailed analysis of each repetition. A single Chartered Physiotherapist performed this analysis. Future work should involve multiple biomechanical experts rating SLS performance to increase the reliability of the rating labels. However, this approach could prove challenging as it may not be possible to obtain an agreed consensus between different experts as to what constitutes acceptable movement biomechanics.

The overall accuracy, sensitivity and specificity scores presented in this work are slightly lower than those of other authors [14–16]. This may be due to the small number of acceptable SLS performances seen in the dataset (52 acceptable SLS vs. 778 SLS with aberrant technique). It is hoped that future work will involve the collection of a greater number of acceptable SLSs. It is also envisaged that this future data collection will be combined with improved classification techniques to make it possible to not only identify where the deviation has occurred, but also to grade the severity of the deviation as described in the scale developed by Whatman et al. [3]. Along with the addition of a greater number of expert reviewers and acceptable SLS performances, future work will involve analysing a range of different movements, including squats, lunges, deadlifts and tuck jumps. It is hoped that a range of movements that are used commonly in rehabilitation and screening can be graded using data derived from an IMU based system. This could allow for the development of a system that can be used for musculoskeletal injury risk screening and exercise analysis.

5.1 Practical Implications

A single sensor system that is able to automatically evaluate SLS technique could be very beneficial to clinicians. The SLS is a commonly used exercise to assess lower limb function [22]. The assessment of human movement proficiency is predominantly completed subjectively through the use of visual rating scales such as the Functional Movement Screen [23, 24], Tuck Jump Assessment [25] or lower extremity functional screening tests [3]. The subjectivity inherent in rating these screening tools leads to the potential for bias and / or measurement error. Furthermore, the process of screening can prove time consuming for clinicians, particularly when there is a large number of participants, e.g. in a sports team setting. An IMU based system can offer clinicians the potential to screen multiple athletes simultaneously in an objective manner. This could lead to a quicker and more reliable method of screening than currently available. An IMU system also offers clinicians the potential to remotely monitor their patients’ compliance and technique when completing rehabilitation exercises. This allows clinicians to evaluate their treatment more effectively. Furthermore, exercise technique feedback could be given to patients automatically. This means patients could correct their form during the exercise without the need for a clinician to be present [26]. This would increase the potential of home-centred care, which may be effective at reducing health care costs [27]. The ability to remotely monitor the SLS and provide locally generated feedback would also prove very beneficial to strength and conditioning coaches as the SLS is often a component of their conditioning programmes.

6. Conclusions

An IMU based system is capable of differentiating between acceptable and aberrant SLS technique with moderate accuracy. The overall accuracy presented in this work is comparable to other research investigating early-stage rehabilitation exercise technique with IMUs. This study has shown that it is possible to classify a more complex exercise with IMUs and maintain moderate levels of accuracy. Furthermore, it is shown that a single IMU can produce comparable results to a multi-sensor set-up. This suggests that the system can be cost-effective and practical to implement in a clinical setting. Future work should aim to develop a low cost biomechanical analysis system that is capable of measuring technique in a range of exercises. Such a system would offer clinicians the ability to screen for injury risks quickly and objectively while also allowing for the remote monitoring of their patients’ rehabilitation.

Acknowledgment

The authors would like to thank UCD Sport for providing equipment which was used in this study.

References

1. Zwerver J, Bredeweg SW, Hof AL. Biomechanical analysis of the single-leg decline squat. Br J Sports Med. 2007; 41(4): 264–268.
2. Willson JD, Ireland ML, Davis I. Core strength and lower extremity alignment during single leg squats. Med Sci Sports Exerc. 2006; 38(5): 945–952.
3. Whatman C, Hing W, Hume P. Physiotherapist agreement when visually rating movement quality during lower extremity functional screening tests. Phys Ther Sport. 2012; 13(2): 87–96.
4. Ahmadi A, Mitchell E, Destelle F, Gowing M, O’Connor NE, Richter C, et al., editors. Automatic Activity Classification and Movement Assessment During a Sports Training Session Using Wearable Inertial Sensors. Proceedings of the 11th International Conference on Wearable and Implantable Body Sensor Networks (BSN); 2014: IEEE.
5. Chmielewski TL, Hodges MJ, Horodyski M, Bishop MD, Conrad BP, Tillman SM. Investigation of clinician agreement in evaluating movement quality during unilateral lower extremity functional tasks: a comparison of 2 rating methods. J Orthop Sports Phys Ther. 2007; 37(3): 122–129.
6. McGrath D, Greene BR, O’Donovan KJ, Caulfield B. Gyroscope-based assessment of temporal gait parameters during treadmill walking and running. Sports Eng. 2012; 15(4): 207–213.
7. Chang KH, Chen MY, Canny J. Tracking free-weight exercises. In: Krumm J, Abowd GD, Seneviratne A, Strang T, editors. Ubicomp 2007: Ubiquitous Computing. Austria: Springer; 2007. p. 19–37.
8. Fitzgerald D, Foody J, Kelly D, Ward T, Markham C, McDonald J, et al., editors. Development of a wearable motion capture suit and virtual reality biofeedback system for the instruction and analysis of sports rehabilitation exercises. Conf Proc IEEE Eng Med Biol Soc. 2007; 2007: 4870–4874.
9. Seeger C, Buchmann A, Van Laerhoven K. myHealthAssistant: a phone-based body sensor network that captures the wearer’s exercises throughout the day. Proceedings of the 6th International Conference on Body Area Networks: ICST; 2011. p. 1–7.
10. Morris D, Saponas TS, Guillory A, Kelner I, editors. RecoFit: using a wearable sensor to find, recognize, and count repetitive exercises. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems; 2014: ACM.
11. Giggins O, Sweeney KT, Caulfield B. The use of inertial sensors for the classification of rehabilitation exercises. Conf Proc IEEE Eng Med Biol Soc. 2014; 2014: 2965–2968.
12. Muehlbauer M, Bahle G, Lukowicz P. What can an arm holster worn smart phone do for activity recognition? Proceedings of the 15th Annual International Symposium on Wearable Computers (ISWC), San Francisco, CA, USA, 12–15 June 2011; pp. 79–82.
13. Pernek I, Hummel KA, Kokol P. Exercise repetition detection for resistance training based on smartphones. Pers Ubiquitous Comput. 2013; 17(4): 771–782.
14. Taylor PE, Almeida GJ, Hodgins JK, Kanade T, editors. Multi-label classification for the analysis of human motion quality. Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); 2012: IEEE.
15. Velloso E, Bulling A, Gellersen H, Ugulino W, Fuks H, editors. Qualitative activity recognition of weight lifting exercises. Proceedings of the 4th Augmented Human International Conference; 2013: ACM.
16. Giggins OM, Sweeney KT, Caulfield B. Rehabilitation exercise assessment using inertial sensors: a cross-sectional analytical study. J Neuroeng Rehabil. 2014; 11(1): 158–168.
17. Jerri AJ. The Shannon sampling theorem – Its various extensions and applications: A tutorial review. Proc. IEEE. 1977; 65(11): 1565–1596.
18. Shimmer 9 DOF calibration. Available from: http://www.shimmersensing.com/shop/shimmer9dof-calibration.
19. Madgwick SOH, Harrison AJL, Vaidyanathan R, editors. Estimation of IMU and MARG orientation using a gradient descent algorithm. Proceedings of the IEEE International Conference on Rehabilitation Robotics (ICORR); 2011: IEEE.
20. Breiman L. Random forests. Mach Learn. 2001; 45(1): 5–32.
21. Mitchell E, Ahmadi A, O’Connor NE, Richter C, Farrell E, Kavanagh J, et al., editors. Automatically detecting asymmetric running using time and frequency domain features. Proceedings of the 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN); 2015: IEEE.
22. Ugalde V, Brockman C, Bailowitz Z, Pollard CD. Single Leg Squat Test and Its Relationship to Dynamic Knee Valgus and Injury Risk Screening. PM & R. 2014; 7(3): 229–235.
23. Cook G, Burton L, Hoogenboom B. Pre-participation screening: the use of fundamental movements as an assessment of function – part 1. N Am J Sports Phys Ther. 2006; 1(2): 62–72.
24. Cook G, Burton L, Hoogenboom B. Pre-participation screening: the use of fundamental movements as an assessment of function – part 2. N Am J Sports Phys Ther. 2006; 1(3): 132–139.
25. Myer GD, Ford KR, Hewett TE. Tuck jump assessment for reducing anterior cruciate ligament injury risk. Athl Ther Today. 2008; 13(5): 39–44.
26. Giggins O, Kelly D, Caulfield B, editors. Evaluating rehabilitation exercise performance using a single inertial measurement unit. Proceedings of the 7th International Conference on Pervasive Computing Technologies for Healthcare; 2013: ICST.
27. Avci A, Bosch S, Marin-Perianu M, Marin-Perianu R, Havinga P. Activity recognition using inertial sensing for healthcare, wellbeing and sports applications: a survey. Proceedings of the 23rd International Conference on Architecture of computing systems (ARCS): VDE; 2010. p. 1–10.


Journal of Biomechanics 58 (2017) 155–161


Classification of deadlift biomechanics with wearable inertial measurement units

Martin A. O’Reilly a,b,1,⇑, Darragh F. Whelan a,b,1, Tomas E. Ward c, Eamonn Delahunt b, Brian M. Caulfield a,b

a Insight Centre for Data Analytics, University College Dublin, Ireland
b School of Public Health, Physiotherapy and Sports Science, University College Dublin, Ireland
c Insight Centre for Data Analytics, Maynooth University, Ireland

Article info

Article history: Accepted 30 April 2017

Keywords: Wearable sensors; Biomedical technology; Lower extremity; Inertial measurement units

Abstract

The deadlift is a compound full-body exercise that is fundamental in resistance training, rehabilitation programs and powerlifting competitions. Accurate quantification of deadlift biomechanics is important to reduce the risk of injury and ensure training and rehabilitation goals are achieved. This study sought to develop and evaluate deadlift exercise technique classification systems utilising Inertial Measurement Units (IMUs), recording at 51.2 Hz, worn on the lumbar spine, both thighs and both shanks. It also sought to compare classification quality when these IMUs are worn in combination and in isolation. Two datasets of IMU deadlift data were collected. Eighty participants first completed deadlifts with acceptable technique and 5 distinct, deliberately induced deviations from acceptable form. Fifty-five members of this group also completed a fatiguing protocol (3-Repetition Maximum test) to enable the collection of natural deadlift deviations. For both datasets, universal and personalised random-forests classifiers were developed and evaluated. Personalised classifiers outperformed universal classifiers in accuracy, sensitivity and specificity in the binary classification of acceptable or aberrant technique and in the multi-label classification of specific deadlift deviations. Whilst recent research has favoured universal classifiers due to the reduced overhead in setting them up for new system users, this work demonstrates that such techniques may not be appropriate for classifying deadlift technique due to the poor accuracy achieved. However, personalised classifiers perform very well in assessing deadlift technique, even when using data derived from a single lumbar-worn IMU to detect specific naturally occurring technique mistakes.

© 2017 Elsevier Ltd. All rights reserved.

⇑ Corresponding author at: Insight Centre for Data Analytics, O’Brien Centre for Science, Science Centre East, Belfield, Dublin 4, EIRE, Ireland. E-mail address: [email protected] (M.A. O’Reilly).
1 Joint lead authors.

1. Introduction

The deadlift is a compound full-body exercise that is fundamental in resistance training, rehabilitation and powerlifting (Escamilla et al., 2000; Hales, 2010). It is a complex movement that requires training to ensure correct form (Hales, 2010). Aberrant deadlift biomechanics have been shown to increase load shear forces in the lower back (Cholewicki et al., 1991), potentiating the risk of injury. Thus, reliable assessment of deadlift biomechanics is necessary to mitigate injury risk. The assessment of deadlift biomechanics is typically undertaken using 3-D motion capture or subjective visual analysis, both of which have limitations. Using 3-D motion capture systems is expensive and data processing can be time intensive (Bonnet et al., 2013). Subjective visual assessment can prove unreliable as visually assessing numerous constituent components simultaneously is challenging (Whiteside et al., 2016). Wearable inertial measurement units (IMUs) could bridge the gap between laboratory and clinical acquisition and assessment of human biomechanics as they allow for an inexpensive method of acquiring objective human movement data in unconstrained environments (McGrath et al., 2012). In this paper the term IMU system will describe IMU sensors, sensor signals, associated signal processing and exercise classification algorithm output. A growing body of literature has investigated how these systems can be used for exercise biomechanics evaluation and feedback (Giggins et al., 2014; Gleadhill et al., 2016; Melzi et al., 2009; O’Reilly et al., 2015; Pernek et al., 2015; Taylor et al., 2012; Velloso et al., 2013; Whelan et al., 2015, 2016a, 2016b). These studies have demonstrated that IMU systems can monitor exercise biomechanics with moderate to excellent accuracy. Of these, only Gleadhill et al. (2016) analysed the deadlift using an IMU system. The authors compared an IMU system to a traditional

http://dx.doi.org/10.1016/j.jbiomech.2017.04.028 0021-9290/© 2017 Elsevier Ltd. All rights reserved.


3D motion capture system in identifying temporal features in deadlift technique variations. They found high agreement between the two systems and stated that the work provided the foundations to use IMU systems for activity recognition and technique analysis. While a promising first step, they only analysed correlations between the two systems and did not attempt to classify technique deviations, meaning application in a real world environment may be limited. Furthermore, no information is provided regarding the deadlift technique variations investigated or whether these variations were induced or natural.

The majority of the above research classified exercise technique as acceptable or aberrant using universal classifiers. A universal classifier is built using a large data set collected from multiple participants. This type of classifier will function when presented with new data from individuals not included in the training data. These classifiers are often developed using induced deviations (i.e. deviations intentionally performed by participants). However, natural deviations may be nuanced and subsequently more difficult to classify. Therefore, universal classifiers may not always be suitable for exercise analysis. This may be particularly true in the deadlift, as the intricacies associated with optimal biomechanics can vary greatly between individuals (Hales, 2010). Furthermore, in a natural environment a variety of deviations may present in different quantities, some occurring less frequently than others. This makes collecting a large and balanced data set of natural deviations challenging, which is necessary for the development of a robust universal classification system (Chawla, 2005; He and Garcia, 2009; Kotsiantis et al., 2007).

For these reasons a personalised classifier may be more appropriate for deadlift analysis. A personalised classifier is developed using data provided by a single person. IMU signals are collected from participants and each individual repetition is assessed and labelled by a movement expert through live or post hoc video analysis. IMU signals for each repetition can then be associated with this repetition’s movement pattern. When the data set used for training the IMU system is collected this way, the system can be individualised. While this may prove more labour intensive than using an IMU system based on a universal classifier, it may be appropriate when analysing complex exercises like the deadlift.

The objective of this study was to determine whether an IMU system could identify deviations from acceptable deadlift biomechanics. The aims of this study were to: (a) determine whether, in combination or in isolation, IMUs positioned on the lumbar spine, thigh and shank are capable of distinguishing between acceptable and aberrant deadlift biomechanics; (b) determine the capabilities of an IMU system at identifying specific deviations from acceptable deadlift biomechanics; (c) compare a personalised to a universal classifier in identifying the above; (d) compare the above on a large data set of deliberately induced technique deviations and a smaller data set of naturally occurring technique deviations.
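The distinction between universal and personalised classifiers drawn above comes down to which repetitions are allowed into the training set. The sketch below illustrates one way the two could be evaluated. It assumes leave-one-subject-out splits for the universal case and leave-one-repetition-out splits within a participant for the personalised case; these evaluation schemes and the function names are assumptions for illustration and are not stated in this part of the paper.

```python
from collections import defaultdict


def universal_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices).

    Training data come from every participant except the one being tested,
    so the classifier never sees the test participant's repetitions.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def personalised_splits(subject_ids):
    """Yield (subject, train_indices, test_indices).

    Training and test repetitions come from the same participant; one
    repetition is held out at a time (an assumed scheme).
    """
    reps_by_subject = defaultdict(list)
    for i, s in enumerate(subject_ids):
        reps_by_subject[s].append(i)
    for subject in sorted(reps_by_subject):
        reps = reps_by_subject[subject]
        for held_out in reps:
            yield subject, [i for i in reps if i != held_out], [held_out]
```

In both cases the indices would select rows of the per-repetition feature matrix, so the same classifier and metrics can be reused and only the split changes.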

2. Methods

2.1. Experimental approach to problem

Two experiments were employed to enable the development of a wearable IMU system for assessing deadlift technique. In the first experiment 80 participants completed deadlifts with acceptable form and deliberately induced technique deviations (Table 1). In the second experiment 55 participants performed a 3-repetition maximum strength (3 RM) deadlift protocol to elicit natural deadlift biomechanics breakdown. A Chartered Physiotherapist labelled video data of each deadlift repetition as acceptable or containing one of the technique deviations (Table 1). The physiotherapist has extensive training in strength and conditioning and has previous experience evaluating deadlift biomechanics. In both experiments data were acquired from 5 IMUs (SHIMMER, Shimmer Research, Dublin, Ireland) (Fig. 1). A total of 306 variables were extracted from the sensor signals from each IMU for every deadlift repetition. These variables were used to develop and evaluate an automated classification system. This was undertaken using data derived from each individual IMU and combinations of multiple IMUs. A universal and a personalised classification system were evaluated for every participant.

Table 1 List and description of deadlift exercise deviations used in this study and the number of repetitions (n) extracted for each class when using induced deviations and naturally occurring technique deviations.

Deviation   Description                                         Induced reps (n)   Natural reps (n)
ACC         Acceptable deadlift technique                       796                854
SBB         Shoulders behind bar at start position              212                0
RB          Rounded back at any point during movement           211                40
HEX         Hyperextended spine at any point during movement    191                85
BT          Bar tilting                                         393                12
OTH         Other                                               0                  17

2.2. Participants

Eighty healthy volunteers (57 males, 23 females, age: 24.68 ± 4.91 years, height: 1.75 ± 0.094 m, body mass: 76.01 ± 13.29 kg) were recruited for the first experiment in this study. Fifty-five members of this cohort also participated in the second experiment (37 males, 18 females, age = 24.21 ± 5.25 years, height = 1.75 ± 0.1 m, body mass = 75.09 ± 13.56 kg). All participants had prior experience with the exercise and no musculoskeletal injury that would impair deadlift performance. Each participant signed a consent form prior to study commencement. The University Human Research Ethics Committee approved the study protocol.

2.3. Procedures

The testing protocol was explained to participants upon their arrival at the laboratory. Prior to testing a ten-minute warm-up on an exercise bike (Lode B.V., Groningen, The Netherlands) was completed. Next, a Chartered Physiotherapist secured the IMUs to the following pre-determined specific anatomic locations on the participant using neoprene straps: over clothing at the spinous process of the 5th lumbar vertebra, the mid-point of both the right and left thighs (determined as half way between the greater trochanter and lateral femoral condyle), and on both shanks 2 cm above the lateral malleolus (Fig. 1). The orientation and location of the IMUs were consistent across participants and local frame x, y and z axes were used for each IMU (Fig. 1). The straps used were specifically designed for exercise environments and minimised unwanted IMU position deviation due to clothing and movement artefact. The IMU settings chosen (sampling frequency: 51.2 Hz, tri-axial accelerometer (±2 g), gyroscope (±500 °/s) and magnetometer (±1.9 Ga)) replicate those used in previous research and were based on pilot data analysis as described in Whelan et al. (2016b). Each IMU was calibrated for these specific sensor ranges and the Shimmer 3 default local coordinate system using the Shimmer 9DoF Calibration application (http://www.shimmersensing.com/shop/shimmer-9dof-calibration). In experiment 1 the participants completed 10 deadlift repetitions with acceptable form and 3 repetitions of each deviation (Table 1). In order to ensure standardisation, form was considered acceptable if it was completed as defined by the National Strength and Conditioning Association (NSCA) (Baechle and Earle, 2004). In experiment 2, participants completed a 3 RM test. This involves increasing load incrementally until an individual cannot maintain acceptable form and is described in detail by Horvat et al. (2007).

2.4. Data labelling

Each deadlift repetition was separated and viewed on multiple occasions in a systematic format by the Chartered Physiotherapist. Repetitions were labelled as acceptable or the most dominant deviation from acceptable form was chosen.

2.5. Signal processing

Signal processing and classification analyses were completed using MATLAB (2012, The MathWorks, Natick, USA). Spectral analysis was completed on the IMU data. It was found that all data pertaining to movement was in the 0–20 Hz frequency band. Therefore the accelerometer x, y, z, gyroscope x, y, z and magnetometer x, y, z signals were first low-pass filtered at fc = 20 Hz using a Butterworth filter of order n = 8. Nine additional signals were then calculated as follows: IMU 3-D orientation was computed using the gradient descent algorithm developed by Madgwick et al. (2011). The resulting W, X, Y and Z quaternion values are a mathematical representation of an object’s 3D orientation in space and are not subject to


M.A. O’Reilly et al. / Journal of Biomechanics 58 (2017) 155–161


Fig. 1. Image showing the five IMU positions: (1) the spinous process of the 5th lumbar vertebra, (2&3) the mid-point of both femurs on the lateral surface (determined as half way between the greater trochanter and lateral femoral condyle), (4&5) and on both shanks 2 cms above the lateral malleolus. Local shimmer axes x, y and z are also shown.

gimbal lock (Kuipers, 1999). The rotation quaternions were also converted to pitch, roll and yaw signals. The pitch, roll and yaw signals describe the inclination, measured in radians, of each IMU in the sagittal, frontal and transverse plane respectively. The magnitude of acceleration and rotational velocity were also computed using the vector magnitude of accelerometer x, y, z and gyroscope x, y, z respectively. Following this, each exercise repetition was programmatically extracted from the IMU data and resampled to a length of 250 samples. This was undertaken to time-normalise the data and minimise the influence of repetition tempo on signal feature calculations. 2.6. Classification Time-domain and frequency-domain descriptive features were computed in order to characterise each exercise repetition. The 17 features computed for each signal were ‘mean’, ‘RMS’, ‘standard deviation’, ‘kurtosis’, ‘median’, ‘skewness’, ‘range’, ‘variance’, ‘maximum’, ‘minimum’, ‘energy’, ‘25th percentile’, ‘75th percentile’, ‘fractal dimension’, ‘level crossing-rate’ and the variance of both the approximate and detailed wavelet coefficients using the Daubechies 5 mother wavelet to level 6 (Fig. 2). These replicate those used in recent similar work (Whelan et al., 2016a, 2016b). These features, when used in combination, describe the shape of the various signals from each IMU. When a person’s motion is altered due to aberrant deadlift technique, the IMU signals will also change. The features used capture the diverse range of signal changes that can occur due to aberrant deadlift biomechanics. All computed features form a feature-vector for each repetition that is used along with the repetition’s label to train classification algorithms. The random-forests method was employed to perform classification (Breiman, 2001). 
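The per-repetition processing described above can be sketched in a few lines. This is a minimal pure-Python illustration and not the authors' MATLAB implementation: the function names are hypothetical, linear interpolation is an assumed resampling method (the text only states that each repetition was resampled to 250 samples), and the population standard deviation and sum-of-squares energy are assumed definitions. Only a subset of the 17 features is shown. For reference, 9 recorded plus 9 derived signals with 17 features each gives the 18 × 17 = 306 variables per IMU reported in Section 2.1.

```python
import math
import statistics


def vector_magnitude(x, y, z):
    """Sample-wise Euclidean norm of three equal-length axis signals."""
    return [math.sqrt(a * a + b * b + c * c) for a, b, c in zip(x, y, z)]


def resample_linear(signal, n_out=250):
    """Time-normalise a signal to n_out samples.

    Assumption: linear interpolation between neighbouring samples.
    """
    n_in = len(signal)
    if n_in < 2 or n_out < 2:
        raise ValueError("need at least two input and output samples")
    out = []
    for i in range(n_out):
        pos = i * (n_in - 1) / (n_out - 1)  # fractional index into input
        lo = min(int(pos), n_in - 2)        # left neighbour, kept in range
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[lo + 1] * frac)
    return out


def basic_features(signal):
    """A subset of the time-domain features listed in the text."""
    n = len(signal)
    energy = sum(v * v for v in signal)     # assumed definition of energy
    return {
        "mean": statistics.fmean(signal),
        "rms": math.sqrt(energy / n),
        "std": statistics.pstdev(signal),   # population SD (assumption)
        "median": statistics.median(signal),
        "range": max(signal) - min(signal),
        "maximum": max(signal),
        "minimum": min(signal),
        "energy": energy,
    }
```

In use, each filtered or derived signal of a repetition would be passed through `resample_linear` and then `basic_features`, and the resulting values concatenated into that repetition's feature vector.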
During analysis several types of classifiers were tested including K-Nearest Neighbours, Support Vector Machines and Naïve Bayes classifiers, however none were shown to provide improved results on the datasets and some increased computational time. A total of 128 trees were used for each random forest. This number was chosen after observing the accuracy rate for incrementing number of trees from 1 to 500. While an increased number of trees will always improve classification accuracy, this increase was considered negligible when using more than 128 trees.

Additional trees also reduce end user application efficiency. Initially, binary classification was evaluated using data from experiment 1 to establish how effectively each individual IMU and combination of IMUs could distinguish between acceptable and aberrant deadlift technique in a large, balanced data set of deliberately induced technique deviations. Multi-label classification was then evaluated on this data set to investigate how effectively each individual IMU and each IMU combination could be used to discriminate between acceptable deadlift technique and each of the deliberate deviations from acceptable technique (Table 1). Equivalent binary and multi-label classifiers were then applied to the data set from experiment 2. For each classification task, universal classifiers were evaluated using leave-one-subject-out-cross-validation (LOSOCV) (Fushiki, 2011). Where each class in the training data did not have an equal number of instances (i.e. equal number of acceptable and aberrant repetitions in binary classification), random instances of the overrepresented class(es) were removed in order to balance the training data. The quality of the personalised exercise classification systems was established using leave-one-out-cross-validation (LOOCV) (Fushiki, 2011). Each deadlift repetition corresponds to one fold of the cross validation. At each fold, one repetition is held out as test data while the random forests classifier is trained with the same participant’s other completed repetitions. Where each class in the training data did not have an equal number of instances (i.e. equal number of acceptable and aberrant repetitions in binary classification), random instances of the overrepresented class(es) were removed in order to balance the training data. The held-out data are used to assess the classifier’s ability to correctly categorise new data it is presented with.
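The universal-classifier evaluation described above (hold out one participant, balance the remaining training data by random undersampling) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the tuple layout `(participant_id, label, feature_vector)`, the function names and the fixed random seed are all hypothetical, and the classifier training step itself is omitted.

```python
import random
from collections import defaultdict


def balance_by_undersampling(samples, rng):
    """Randomly drop instances of over-represented classes.

    samples: list of (participant_id, label, feature_vector) tuples.
    Returns a list in which every label has as many instances as the
    rarest label in the input.
    """
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    n_min = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n_min))
    return balanced


def losocv_folds(samples, seed=0):
    """Yield (held_out_id, balanced_train, test) once per participant."""
    rng = random.Random(seed)
    participants = sorted({s[0] for s in samples})
    for held_out in participants:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, balance_by_undersampling(train, rng), test
```

The personalised (LOOCV) case follows the same pattern with the data restricted to one participant and a single repetition, rather than a whole participant, held out at each fold.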
Participants were not included for this analysis if they did not have at least 2 repetitions belonging to each class being classified, as this would not allow for training and test data for that class. The scores used to measure classification quality were accuracy, sensitivity and specificity computed according to the below formulae (TP = True Positive; TN = True Negative; FP = False Positive; FN = False Negative).

\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{1} \]

\[ \text{Sensitivity} = \frac{TP}{TP + FN} \tag{2} \]

\[ \text{Specificity} = \frac{TN}{TN + FP} \tag{3} \]
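The three scores defined above can be computed directly from true and predicted labels. The sketch below is a hedged illustration: the function names are hypothetical, and treating "aberrant" as the positive class is an assumption, since the text does not state which class was taken as positive.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_scores(y_true, y_pred, positive="aberrant"):
    """Accuracy, sensitivity and specificity for a binary labelling.

    Assumption: `positive` names the class counted as a positive result.
    """
    tp = tn = fp = fn = 0
    for truth, predicted in zip(y_true, y_pred):
        if predicted == positive and truth == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
    }
```

Returning NaN for an empty denominator is a design choice of this sketch; it keeps a fold with no positive (or no negative) test repetitions from raising an error.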


Fig. 2. Diagram linking number of IMUs, number of recorded and derived signals, number of features extracted and the variety of feature combinations used to test classifiers. In reviewing the accuracy, sensitivity and specificity scores produced by each classifier, 90% or higher was considered an ’excellent’ quality result, 80–89% was considered a ‘good’ quality result, 60–79% was considered a ’moderate’ result and anything less than 59% was deemed a poor result. This classification accuracy rating system has been used in previously published work (Whelan et al., 2016a, 2016b). For personalised classifiers, each participant’s scores were calculated then the mean and standard deviation across all participants were computed.
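The rating bands and the per-participant summary described above can be expressed as a short sketch. The names are hypothetical; treating every score below 60% as "poor" is an assumption that closes the gap the text leaves between 59% and 60%, and the use of the sample standard deviation is likewise assumed.

```python
import statistics


def rate_score(percent):
    """Map a percentage score to the quality bands used in the text."""
    if percent >= 90:
        return "excellent"
    if percent >= 80:
        return "good"
    if percent >= 60:
        return "moderate"
    return "poor"  # assumption: everything below 60% is rated poor


def summarise(per_participant_scores):
    """Mean and standard deviation of one score across participants."""
    mean = statistics.fmean(per_participant_scores)
    # Sample SD is an assumption; a single participant yields SD = 0.0.
    if len(per_participant_scores) > 1:
        sd = statistics.stdev(per_participant_scores)
    else:
        sd = 0.0
    return mean, sd
```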

3. Results

3.1. Data set

Table 1 shows the total number of extracted deadlift repetitions for each class in experiment 1 (induced reps) and experiment 2 (natural reps). The data set from experiment 1 is larger and more balanced than that arising from experiment 2.

3.2. Experiment 1: Induced technique deviations

Binary classification results for the data set collected in experiment 1, where participants deliberately completed deadlifts with technique deviations, are shown in Table 2. It shows the classification accuracy, sensitivity and specificity (Formulae 1–3) following cross-validation for both universal and personalised classifiers. Results are also compared for systems developed using data from each individual IMU and a variety of combinations of 5, 3 and 2 IMUs. Multi-label classification results (i.e. detection of exact technique deviation) are shown in Table 3. The results show classification efficacy when using data derived from each individual IMU and various combinations of multiple IMUs.

3.3. Experiment 2: Naturally occurring technique deviations

Table 4 compares the quality of universal classifiers and personalised classifiers in the binary classification of deadlift technique using the data set of naturally occurring technique deviations from experiment 2. Classification efficacy is shown for systems using multiple IMUs and systems developed using individual IMUs at various anatomical positions. The results shown in Table 5 show the capacity of IMU based systems to classify which natural deviation presents using universal and personalised classifiers. 4. Discussion The objective of this study was to determine whether an IMU system could identify deviations from acceptable deadlift biome-


Table 2 Overall accuracy, sensitivity and specificity in binary classification (acceptable or aberrant technique) for each combination of IMUs following LOSOCV to evaluate global classifiers and LOOCV to evaluate personalised classifiers for induced technique deviations. All values are percentages. Acc. = accuracy; Sens. = sensitivity; Spec. = specificity; Global = global classifiers; Pers. = personalised classifiers, reported as mean (SD).

IMU placement(s)  Global Acc.  Global Sens. Global Spec. Pers. Acc.   Pers. Sens.  Pers. Spec.
All 5 Sensors     75           57           89           93 (6)       90 (9)       96 (6)
Lumbar & Shanks   71           53           85           93 (6)       91 (7)       96 (7)
Lumbar & Thighs   74           56           87           91 (6)       89 (9)       93 (9)
Both Shanks       66           47           80           91 (6)       88 (9)       95 (7)
Both Thighs       73           58           84           90 (8)       86 (10)      93 (8)
Left Shank        64           48           76           88 (10)      86 (11)      89 (11)
Left Thigh        68           58           75           87 (8)       85 (9)       89 (10)
Lumbar            70           52           83           88 (8)       90 (9)       86 (11)
Right Thigh       72           59           82           89 (9)       86 (10)      91 (10)
Right Shank       63           42           79           90 (7)       87 (8)       92 (10)

Table 3 Overall accuracy, sensitivity and specificity in multi-class classification (exact deviation) for each combination of IMUs following LOSOCV to evaluate global classifiers and LOOCV to evaluate personalised classifiers for induced technique deviations. All values are percentages. Acc. = accuracy; Sens. = sensitivity; Spec. = specificity; Global = global classifiers; Pers. = personalised classifiers, reported as mean (SD).

IMU placement(s)  Global Acc.  Global Sens. Global Spec. Pers. Acc.   Pers. Sens.  Pers. Spec.
All 5 Sensors     60           62           92           81 (11)      83 (13)      96 (2)
Lumbar & Shanks   56           59           91           81 (10)      83 (11)      96 (2)
Lumbar & Thighs   55           56           91           79 (12)      81 (13)      96 (3)
Both Shanks       38           43           88           75 (14)      77 (15)      95 (3)
Both Thighs       48           46           89           74 (13)      76 (14)      95 (3)
Left Shank        37           41           87           71 (15)      73 (17)      94 (3)
Left Thigh        37           38           87           67 (13)      69 (16)      93 (3)
Lumbar            49           52           90           72 (13)      75 (14)      94 (3)
Right Thigh       44           43           89           74 (14)      75 (14)      94 (3)
Right Shank       34           39           87           73 (13)      74 (15)      94 (3)

Table 4 Overall accuracy, sensitivity and specificity in binary classification (acceptable or aberrant technique) for each combination of IMUs following LOSOCV to evaluate universal classifiers and LOOCV to evaluate personalised classifiers for natural technique deviations. All values are percentages. Acc. = accuracy; Sens. = sensitivity; Spec. = specificity; Global = global (universal) classifiers; Pers. = personalised classifiers, reported as mean (SD).

IMU placement(s)  Global Acc.  Global Sens. Global Spec. Pers. Acc.   Pers. Sens.  Pers. Spec.
All 5 Sensors     73           78           49           84 (13)      83 (17)      83 (17)
Lumbar & Shanks   70           76           34           83 (13)      81 (17)      82 (16)
Lumbar & Thighs   70           76           42           82 (12)      80 (16)      81 (20)
Both Shanks       65           72           27           82 (15)      79 (22)      78 (20)
Both Thighs       70           74           42           82 (14)      79 (16)      82 (23)
Left Shank        63           68           39           80 (15)      80 (17)      76 (24)
Left Thigh        67           70           48           80 (14)      79 (16)      77 (22)
Lumbar            70           76           34           80 (14)      80 (15)      78 (22)
Right Thigh       71           78           36           82 (13)      79 (16)      81 (21)
Right Shank       69           76           31           80 (15)      79 (21)      77 (23)

Table 5 Overall accuracy, sensitivity and specificity in multi-class classification (exact deviation) for each combination of IMUs following LOSOCV to evaluate global classifiers and LOOCV to evaluate personalised classifiers for natural technique deviations. All values are percentages. Acc. = accuracy; Sens. = sensitivity; Spec. = specificity; Global = global classifiers; Pers. = personalised classifiers, reported as mean (SD).

IMU placement(s)  Global Acc.  Global Sens. Global Spec. Pers. Acc.   Pers. Sens.  Pers. Spec.
All 5 Sensors     54           18           87           78 (13)      74 (21)      90 (12)
Lumbar & Shanks   54           18           87           75 (13)      78 (13)      66 (34)
Lumbar & Thighs   32           18           82           77 (13)      75 (15)      81 (15)
Both Shanks       32           17           84           73 (18)      72 (22)      78 (21)
Both Thighs       48           13           83           74 (19)      72 (25)      77 (22)
Left Shank        47           11           82           71 (18)      74 (16)      70 (29)
Left Thigh        56           11           81           67 (20)      68 (21)      68 (23)
Lumbar            36           11           81           75 (14)      74 (15)      83 (12)
Right Thigh       38           13           77           71 (19)      71 (24)      71 (26)
Right Shank       53           15           84           69 (13)      65 (20)      78 (20)

chanics. The results in Section 3 indicate this is possible with good to excellent overall accuracy using a personalised classifier. Personalised classifiers outperform universal classifiers in attempting

to identify both induced and natural deadlift deviations regardless of IMU set-up. IMU systems using a personalised classifier produce good to excellent accuracy when identifying induced deviations


from acceptable deadlift biomechanics (Tables 2 and 3). Personalised classifiers can also identify natural deviations with moderate to good accuracy (Tables 4 and 5). In reviewing the literature, no data was found on the ability of an IMU system to classify deadlift technique. Gleadhill et al. (2016) compared the ability of an IMU system in identifying temporal features in the deadlift to a motion capture system, finding high agreement. The results presented in this work build upon this research by using temporal and other time and frequency domain features to create a classification framework. Furthermore, the high agreement achieved by Gleadhill et al. (2016) are achieved with a 3 IMU system. The results presented in Section 3 indicate that a single IMU system is capable of classifying acceptable and aberrant deadlift technique with moderate to excellent accuracy. Single IMU systems are less expensive and more practical for end users due to reduced risk of placement error and power usage, making them more desirable for daily environment applications (Bonnet et al., 2012). It is difficult to directly compare results with similar work due to differences in exercises investigated, dataset sizes, sensor positions and end user feedback. However, these results compare favourably to research in the area (Giggins et al., 2014; Melzi et al., 2009; O’Reilly et al., 2015; Taylor et al., 2012; Velloso et al., 2013; Whelan et al., 2015, 2016a). The majority of research to date has investigated the ability of IMU systems to monitor technique in simple exercises such as straight leg raises, dumbbell curls or heel slides (Giggins et al., 2014). This paper evaluates an IMU system’s ability to assess deadlift biomechanics, a complex multi-joint exercise. The presented system can distinguish between six different deadlift classes (acceptable and five deviations) with moderate to good overall accuracy (Tables 3 and 5). 
The lower number of classes in some studies (Giggins et al., 2014; Taylor et al., 2012; Velloso et al., 2013) may make it easier for classifiers to identify specific deviations with higher efficacy. Furthermore, the system presented in this work is capable of identifying natural deviations from acceptable deadlift biomechanics. The majority of previous research identified induced deviations using a universal classifier (Giggins et al., 2014; O’Reilly et al., 2015; Taylor et al., 2012). Universal classification techniques have been shown to classify naturally occurring deviations in the single leg squat with moderate accuracy (Whelan et al., 2015, 2016b). However, the ability of this classifier to identify specific deadlift deviations is poor (Table 5). This may be due to a number of factors. The number of acceptable deadlifts far outnumbers any other label (Table 1). This unbalanced data set makes it difficult to create universal classifiers that can be used for all individuals (Chawla, 2005; He and Garcia, 2009). As many deviations were sporadic, the use of a universal classifier to identify specific deadlift deviations may require a larger data set including more deviations. Additionally, the intersubject variability in acceptable deadlift biomechanics, as described by the IMU sensor signal features, may exceed the intra-subject variability between acceptable technique and aberrant deadlift biomechanics. This would make universal classifier creation difficult. In addition to producing higher overall classification accuracy, a personalised classifier may offer other benefits. Personalised classifiers are more computationally efficient than universal classifiers as they use less training data and therefore require less memory. Unlike universal classifier development, they negate the need for a large data set to classify exercise biomechanics, (Chawla, 2005; He and Garcia, 2009). 
The use of a personalised classifier may also allow for the development of a universal classifier in the future. All labelled data collected for personalised classifier development could be stored and used to build the large data set necessary to improve universal classifiers. The main disadvantage associated

with a personalised classifier is that data must be collected and labelled from individual patients. This means practitioners must monitor exercise technique in real time or use post hoc video analysis and label appropriately, which may prove time consuming. However, since practitioners often monitor exercise biomechanics prior to independent exercise completion, it may fit into clinical practice smoothly. In an effort to streamline this process, the authors have recently developed a tablet application that enables clinicians to simultaneously capture video and IMU data from a person exercising. The application automatically splits video and IMU data into reps, allows efficient repetition labelling and can automatically build personalised classifiers. In conclusion, the deadlift is important in rehabilitation and strength and conditioning. Accurate deadlift biomechanics quantification is important to reduce injury risk and ensure goals are achieved. The work presented in this paper indicates that an IMU system can classify acceptable and aberrant deadlift biomechanics with good to excellent overall accuracy, sensitivity and specificity using a personalised classifier. Furthermore a personalised classification system is far better at identifying specific naturally occurring deadlift deviations. The results presented in this work are comparable with current research in the area. However, most of this research has been carried out using universal classifiers and identifying induced deviations. While a universal classifier may allow for less end user interaction, it is difficult to classify naturally occurring deviations from acceptable deadlift biomechanics using this technique. As a result, the use of a personalised classifier may be more appropriate for identifying aberrant deadlift biomechanics. 
Conflict of interest statement

All authors of this article would like to state that there are no known conflicts of interest that could have biased or influenced the presented article.

Acknowledgments

This project is partly funded by the Irish Research Council as part of a Postgraduate Enterprise Partnership Scheme with Shimmer (EPSPG/2013/574) and partly funded by Science Foundation Ireland (SFI/12/RC/2289).

References

Baechle, T.R., Earle, R.W., 2004. Resistance Training Exercise Techniques. NSCA’s Essentials of Personal Training, Champaign, IL.
Bonnet, V., Mazza, C., Fraisse, P., Cappozzo, A., 2012. A least-squares identification algorithm for estimating squat exercise mechanics using a single inertial measurement unit. J. Biomech. 45, 1472–1477.
Bonnet, V., Mazza, C., Fraisse, P., Cappozzo, A., 2013. Real-time estimate of body kinematics during a planar squat task using a single inertial measurement unit. IEEE Trans. Biomed. Eng. 60, 1920–1926.
Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32.
Chawla, N.V., 2005. Data mining for imbalanced datasets: An overview. Data Mining and Knowledge Discovery Handbook. Springer, pp. 853–867.
Cholewicki, J., McGill, S., Norman, R., 1991. Lumbar spine loads during the lifting of extremely heavy weights. Med. Sci. Sports Exerc. 23, 1179–1186.
Escamilla, R.F., Francisco, A.C., Fleisig, G.S., Barrentine, S.W., Welch, C.M., Kayes, A.V., Speer, K.P., Andrews, J.R., 2000. A three-dimensional biomechanical analysis of sumo and conventional style deadlifts. Med. Sci. Sports Exerc. 32, 1265–1275.
Fushiki, T., 2011. Estimation of prediction error by using K-fold cross-validation. Stat. Comput. 21, 137–146.
Giggins, O.M., Sweeney, K.T., Caulfield, B., 2014. Rehabilitation exercise assessment using inertial sensors: a cross-sectional analytical study. J. Neuroeng. Rehabil. 11, 158–168.
Gleadhill, S., Lee, J.B., James, D., 2016. The development and validation of using inertial sensors to monitor postural change in resistance exercise. J. Biomech. 49, 1259–1263.
Hales, M., 2010. Improving the deadlift: understanding biomechanical constraints and physiological adaptations to resistance exercise. Strength Cond. J. 32, 44–51.
He, H., Garcia, E.A., 2009. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21, 1263–1284.
Horvat, M., Franklin, C., Born, D., 2007. Predicting strength in high school women athletes. J. Strength Cond. Res. 21, 1018–1022.
Kotsiantis, S.B., Zaharakis, I., Pintelas, P., 2007. Supervised machine learning: a review of classification techniques. In: Emerging Artificial Intelligence Applications in Computer Engineering: Real Word AI Systems with Applications in EHealth, HCI, Information Retrieval and Pervasive Technologies. IOS Press, pp. 3–25.
Kuipers, J.B., 1999. Quaternions and Rotation Sequences. Princeton University Press, Princeton.
Madgwick, S.O.H., Harrison, A.J.L., Vaidyanathan, R., 2011. Estimation of IMU and MARG orientation using a gradient descent algorithm. In: Proceedings of the IEEE International Conference on Rehabilitation Robotics (ICORR). IEEE, pp. 1–7.
McGrath, D., Greene, B.R., O’Donovan, K.J., Caulfield, B., 2012. Gyroscope-based assessment of temporal gait parameters during treadmill walking and running. Sports Eng. 15, 207–213.
Melzi, S., Borsani, L., Cesana, M., 2009. The virtual trainer: supervising movements through a wearable wireless sensor network. In: 6th Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks Workshops, 2009. IEEE, pp. 1–3.
O’Reilly, M., Whelan, D., Chanialidis, C., Friel, N., Delahunt, E., Ward, T., Caulfield, B., 2015. Evaluating squat performance with a single inertial measurement unit. In: IEEE 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN). IEEE, pp. 1–6.
Pernek, I., Kurillo, G., Stiglic, G., Bajcsy, R., 2015. Recognizing the intensity of strength training exercises with wearable sensors. J. Biomed. Inform. 58, 145–155.
Taylor, P.E., Almeida, G.J., Hodgins, J.K., Kanade, T., 2012. Multi-label classification for the analysis of human motion quality. In: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, pp. 2214–2218.
Velloso, E., Bulling, A., Gellersen, H., Ugulino, W., Fuks, H., 2013. Qualitative activity recognition of weight lifting exercises. In: Proceedings of the 4th Augmented Human International Conference. ACM, pp. 116–123.
Whelan, D., O’Reilly, M., Ward, T., Delahunt, E., Caulfield, B., 2015. Evaluating performance of the single leg squat exercise with a single inertial measurement unit. In: Proceedings of the 3rd Workshop on ICTs for Improving Patients Rehabilitation Research Techniques. ACM, pp. 144–147.
Whelan, D., O’Reilly, M., Ward, T., Delahunt, E., Caulfield, B., 2016a. Evaluating performance of the lunge exercise with multiple and individual inertial measurement units. In: Pervasive Health 10th EAI International Conference on Pervasive Computing Technologies for Healthcare, pp. 101–108.
Whelan, D., O’Reilly, M., Ward, T., Delahunt, E., Caulfield, B., 2016b. Technology in rehabilitation: evaluating the single leg squat exercise with wearable inertial measurement units. Meth. Inf. Med. 56, 88–94.
Whiteside, D., Deneweth, J.M., Pohorence, M.A., Sandoval, B., Russell, J.R., McLean, S.G., Zernicke, R.F., Goulet, G.C., 2016. Grading the functional movement screen: a comparison of manual (real-time) and objective methods. J. Strength Cond. Res. 30, 924–933.


Original Articles

Technology in Rehabilitation: Comparing Personalised and Global Classification Methodologies in Evaluating the Squat Exercise with Wearable IMUs

Darragh F. Whelan1,2*; Martin A. O'Reilly1,2*; Tomás E. Ward3,4; Eamonn Delahunt2; Brian Caulfield1,2

1Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland; 2School of Public Health, Physiotherapy and Sports Science, University College Dublin, Dublin, Ireland; 3Insight Centre for Data Analytics, Maynooth University, Maynooth, Ireland; 4Biomedical Engineering Research Group, Department of Electronic Engineering, Maynooth University, Maynooth, Ireland

Keywords Exercise therapy, biomedical technology, lower extremity, physical therapy speciality, inertial measurement units

Summary

Background: The barbell squat is a popularly used lower limb rehabilitation exercise. It is also an integral exercise in injury risk screening protocols. To date athlete/patient technique has been assessed using expensive laboratory equipment or subjective clinical judgement, neither of which is without shortcomings. Inertial measurement units (IMUs) may offer a low cost solution for the objective evaluation of athlete/patient technique. However, it is not yet known if global classification techniques are effective in identifying naturally occurring, minor deviations in barbell squat technique.

Objectives: The aims of this study were to: (a) determine if, in combination or in isolation, IMUs positioned on the lumbar spine, thigh and shank are capable of distinguishing between acceptable and aberrant barbell squat technique; (b) determine the capabilities of an IMU system at identifying specific natural deviations from acceptable barbell squat technique; and (c) compare a personalised (N=1) classifier to a global classifier in identifying the above.

Methods: Fifty-five healthy volunteers (37 males, 18 females, age = 24.21 +/- 5.25 years, height = 1.75 +/- 0.1 m, body mass = 75.09 +/- 13.56 kg) participated in the study. All participants performed a barbell squat 3-repetition maximum strength test. IMUs were positioned on participants’ lumbar spine, both shanks and both thighs; these were utilized to record tri-axial accelerometer, gyroscope and magnetometer data during all repetitions of the barbell squat exercise. Technique was assessed and labelled by a Chartered Physiotherapist using an evaluation framework. Features were extracted from the labelled IMU data. These features were used to train and evaluate both global and personalised random forests classifiers.

Results: Global classification techniques produced poor accuracy (AC), sensitivity (SE) and specificity (SP) scores, even with a 5 IMU set-up, in both binary (AC: 64%, SE: 70%, SP: 28%) and multiclass classification (AC: 59%, SE: 24%, SP: 84%). However, utilising personalised classification techniques even with a single IMU positioned on the left thigh produced good binary classification scores (AC: 81%, SE: 81%, SP: 84%) and moderate-to-good multiclass scores (AC: 69%, SE: 70%, SP: 89%).

Conclusions: There are a number of challenges in developing global classification exercise technique evaluation systems for rehabilitation exercises such as the barbell squat. Building large, balanced data sets to train such systems is difficult and time intensive. Minor, naturally occurring deviations may not be detected utilising global classification approaches. Personalised classification approaches allow for higher accuracy and greater system efficiency for end-users in detecting naturally occurring barbell squat technique deviations. Applying this approach also allows for a single-IMU set-up to achieve similar accuracy to a multi-IMU set-up, which reduces total system cost and maximises system usability.

Correspondence to: Darragh Whelan, Insight, UCD Science Centre East, Belfield, Dublin 4, Ireland. E-mail: [email protected]

Methods Inf Med 2017; 56: ■■–■■. https://doi.org/10.3414/ME16-01-0141. Received: December 14, 2016; accepted: April 11, 2017; epub ahead of print: ■■■

* These authors contributed equally to this work

© Schattauer 2017

Methods Inf Med 4/2017


D. F. Whelan et al.: Personalised vs. Global Classification of Squat

1. Introduction The squat is a compound full-body exercise, whose constituent movements are integral to activities of daily living. The barbell squat (squat with a weighted barbell placed across the upper shoulders) often features as a fundamental exercise in resistance training and rehabilitation programs. Furthermore, it is incorporated into musculoskeletal injury risk screening/identification protocols [1]. Aberrant squat technique has been shown to increase stress on the joints of the lower extremity [2], potentiating the risk of injury. Thus, the reliable assessment of technique is necessary to mitigate injury risk. The assessment of squat technique is typically undertaken using one of two distinct methods: (a) 3-D motion capture; (b) subjective visual analysis. Both of these have a number of limitations. 3-D motion capture systems are expensive and the application of skin-mounted markers may hinder normal movement [3, 4]. Furthermore, data processing can be time intensive and specific expertise is often required to interpret the processed data and make recommendations on the observed results. Therefore, these systems are not frequently used to assess squat technique beyond the research laboratory [5]. In clinical and gym-based settings, subjective visual assessment is typically used to assess technique. This subjective visual assessment of human movement is not always reliable even amongst experts, as the need to visually assess the numerous constituent components of the movement simultaneously is challenging [6]. Wearable inertial measurement units (IMUs) may offer the potential to bridge the gap between laboratory and day-to-day “real-world” acquisition and assessment of human movement. These IMUs are small, inexpensive sensors that consist of accelerometers, gyroscopes and magnetometers. They are able to acquire data pertaining to the linear and angular motion of individual limb segments and the centre of mass of the body. 
Self-contained, wireless IMU devices are easy to set up and allow for the acquisition of human movement data in unconstrained environments [7]. In this paper the term IMU system will be used to describe the IMU sensors, the sensor signals, the associated signal processing applied to them and the output of the exercise classification algorithms. IMU systems can robustly track the variety of postures and environmental complexities associated with training, unlike camera-based systems, which are hampered by location, occlusion and lighting issues in such settings [8]. IMUs have also been shown to be as effective as marker-based systems at measuring joint angles [5, 9, 10]. There are many commercially available examples of IMU systems that monitor physical activity (e.g. Jawbone™ and Fitbit™). However, using IMU systems to assess gym-based exercises such as the barbell squat is less common. Researchers have demonstrated the ability of IMU-based systems to distinguish different exercises and count exercise repetitions with moderate to good levels of accuracy [11–15]. Whilst these systems are capable of counting exercise repetitions, they do not provide instruction on technique and performance quality. A holistic exercise tracking system should not only recognise the exercise completed and count repetitions, but should also provide technique feedback. Furthermore, in order for IMU systems to assess human movement data as part of a musculoskeletal injury risk screening protocol, they need to be able to identify aberrant movement patterns and provide easily interpretable data to clinicians/coaches who use them. A growing body of scientific literature has investigated the ability of IMU systems to assess technique in order to provide this holistic exercise analysis [15–22]. The majority of authors have developed these IMU systems by employing the following steps: (a) collection of labelled dataset; (b) pre-processing of data; (c) data segmentation; (d) feature extraction; (e) classification development and evaluation [23].
These studies have demonstrated the ability of IMU systems to identify deviations with moderate to excellent levels of accuracy in exercises such as the biceps curl, military press, squat and lunge. However, the majority of these IMU systems were developed using a dataset consisting of induced deviations (i.e. deviations that were intentionally performed by participants). When deviations occur naturally, the exact way in which they present may be more nuanced and subsequently more difficult to classify. This means that these systems may not be suitable for a real-world environment where deviations present in a natural manner.

When collecting data in a natural environment, a variety of deviations may present in different quantities, with some deviations occurring less frequently than others. This means that collecting a large and balanced data set of natural deviations is challenging, yet such a data set is necessary to allow for the development of a robust global classification system [24–26]. In these situations a personalised classifier may be more appropriate. A personalised classifier is a classifier developed on data provided by a single person (N of 1). The data used to develop this classifier is collected from participants as they complete exercises wearing IMUs. Each individual exercise repetition is assessed and labelled by a movement expert through live or post-hoc video analysis. This means that the IMU signals for each exercise repetition can be associated with that repetition’s movement pattern. When the data set used for training the IMU system is collected in this way, the system can be individualised. While this may prove more labour intensive than using an IMU system based on a global classifier, it may be more appropriate in some situations.
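To make the distinction concrete, the following minimal sketch shows how the same set of labelled repetitions could be split into training data for a global classifier and for a personalised (N of 1) classifier. It assumes that a global classifier for a given person is trained on pooled data from other people, whereas a personalised classifier is trained only on that person's own repetitions. The `Repetition` record, the participant identifiers, the feature values and the function names are hypothetical and are not taken from the study.

```python
from collections import namedtuple

# Hypothetical record: one labelled exercise repetition from one participant.
Repetition = namedtuple("Repetition", ["participant", "features", "label"])


def global_training_set(reps, target):
    """Assumed global setting: train on every participant except the target."""
    return [r for r in reps if r.participant != target]


def personalised_training_set(reps, target):
    """N of 1 setting: train only on the target participant's own repetitions."""
    return [r for r in reps if r.participant == target]


# Toy data, invented purely for illustration.
reps = [
    Repetition("P1", [0.1], "acceptable"),
    Repetition("P1", [0.4], "knee valgus"),
    Repetition("P2", [0.2], "acceptable"),
    Repetition("P3", [0.9], "bent over"),
]

print(len(global_training_set(reps, "P1")))        # repetitions from P2 and P3
print(len(personalised_training_set(reps, "P1")))  # repetitions from P1 only
```

The personalised set is much smaller, which is why every repetition in it has to be labelled by an expert before such a classifier can be trained.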

2. Objectives

The barbell squat is a compound full-body exercise that is typically a constituent component of resistance training, rehabilitation programs and musculoskeletal injury risk screening protocols. Incorrect technique can increase the risk of sustaining a musculoskeletal injury. Traditionally, exercise technique has been evaluated using expensive motion capture systems or via subjective visual inspection from trained professionals. IMU systems offer an opportunity to provide low-cost exercise technique assessment. However, to date, no research has evaluated the capability of IMU systems to identify natural deviations in barbell squat technique. In this setting the use of an individualised classifier based on an N of 1 data set may prove more appropriate than global classifiers.

Methods Inf Med 4/2017

© Schattauer 2017 Print proof! Publication, duplication or distribution (also online) is prohibited!

349

D. F. Whelan et al.: Personalised vs. Global Classification of Squat

Therefore, the research question that this study seeks to address is: “how well can an IMU-based system quantify barbell squat technique?” The aims of this study were to: (a) determine if in combination or in isolation, IMUs positioned on the lumbar spine, thigh and shank are capable of distinguishing between acceptable and aberrant barbell squat technique; (b) determine the capabilities of an IMU system at identifying specific natural deviations from acceptable barbell squat technique; (c) compare a personalised to a global classifier in identifying the above.

3. Methods

3.1 Experimental Approach to Problem

This study employed an opportunistic approach to the development of a wearable sensor system for automatically assessing barbell squat technique. Participants were required to perform a 3-repetition maximum barbell squat test. This test was recorded in HD video. A Chartered Physiotherapist then assessed the video of each repetition and labelled it appropriately (i.e. acceptable or containing one of the deviations identified in ▶ Table 1). In order to ensure standardisation, form was considered acceptable if it was completed as defined by the National Strength and Conditioning Association (NSCA) [27]. The deviations from this acceptable form are detailed in ▶ Table 1. During performance of the barbell squats, data were acquired from 5 IMUs (SHIMMER, Shimmer Research, Dublin, Ireland) placed on the lumbar spine, right and left thigh and right and left shank. The IMUs were positioned on each participant by the same researcher using a standardised and repeatable protocol. Participants were allowed a rest interval between performances of each set of repetitions. Following data collection, a total of 306 variables were extracted from the sensor signals of each IMU for every repetition. These variables were used to develop and evaluate the quality of an automated classification system for the analysis of barbell squat technique. This was undertaken using data derived from each individual IMU and from combinations of multiple IMUs. A global classification system was evaluated, as well as a separate personalised (N of 1) classifier for each participant.

Table 1 List and description of barbell squat exercise deviations used in this study.

| Label | Description |
|---|---|
| Acceptable | Acceptable technique |
| Knee Valgus | Knees coming together during downward phase |
| Knee Varus | Knees coming apart during downward phase |
| Knees Too Forward | Knees ahead of toes during downward phase |
| Heels Elevated | Heels raising off the ground during exercise |
| Bent Over | Excessive flexion of hip and torso during exercise |
| Other | Other deviation, not highlighted in NSCA guidelines |

NSCA = National Strength and Conditioning Association
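The labels in Table 1 feed two classification tasks in this study: a binary task (acceptable versus aberrant technique) and a multi-label task (acceptable technique versus each specific deviation). A minimal sketch of one possible encoding follows. The paper states that acceptable repetitions are coded 0 and the deviations 1 to 6, but it does not say which deviation receives which number, so the ordering used here (the row order of Table 1) is an assumption, as are the function names.

```python
# Label names as they appear in Table 1; the row order is reused as the code order (assumption).
TABLE_1_LABELS = [
    "Acceptable",
    "Knee Valgus",
    "Knee Varus",
    "Knees Too Forward",
    "Heels Elevated",
    "Bent Over",
    "Other",
]

# Multi-label codes: Acceptable -> 0, the six deviations -> 1..6.
MULTI_LABEL_CODE = {name: code for code, name in enumerate(TABLE_1_LABELS)}


def encode_multi(label):
    """Return the multi-label class code (0 to 6) for a Table 1 label."""
    return MULTI_LABEL_CODE[label]


def encode_binary(label):
    """Return 0 for acceptable technique and 1 for any deviation."""
    return 0 if encode_multi(label) == 0 else 1


print(encode_multi("Bent Over"), encode_binary("Bent Over"))
print(encode_multi("Acceptable"), encode_binary("Acceptable"))
```

Deriving the binary code from the multi-label code keeps the two tasks consistent: a repetition can never be coded as a deviation in one task and as acceptable in the other.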

3.2 Participants

Fifty-five healthy volunteers (37 males, 18 females, age = 24.21 ± 5.25 years, height = 1.75 ± 0.1 m, body mass = 75.09 ± 13.56 kg) participated in the study. No participant reported having a current or recent musculoskeletal injury that would impair his or her performance of the exercise. All participants reported a level of familiarity with the barbell squat exercise. The University College Dublin Human Research Ethics Committee approved the study protocol and written informed consent was obtained from all participants before testing. In cases where participants were under the age of 18, written informed consent was also obtained from a parent or guardian.

3.3 Procedures

The testing protocol was explained to participants upon their arrival at the laboratory. Prior to formal testing all participants performed a ten-minute warm-up on an exercise bike (Lode B.V., Groningen, The Netherlands), maintaining a power output of 100 W and a constant cadence of 75–85 revolutions per minute. Following completion of the warm-up, a Chartered Physiotherapist secured the IMUs to pre-determined specific anatomic locations on the participant as follows: the spinous process of the 5th lumbar vertebra, the mid-point of both the right and left femurs (determined as half way between the greater trochanter and lateral femoral condyle), and on both shanks 2 cm above the lateral malleolus (▶ Figure 1). The orientation and location of the IMUs were consistent across participants.

Figure 1 Image showing the five IMU positions: (1) the spinous process of the 5th lumbar vertebra, (2&3) the midpoint of both femurs on the lateral surface (determined as half way between the greater trochanter and lateral femoral condyle), (4&5) both shanks 2cm above the lateral malleolus.


Table 2 List and description of barbell squat exercise labels used in this study and the number of repetitions extracted of each class as labelled by the Chartered Physiotherapist. Label

Description

Total reps

Acceptable

Acceptable technique

884

Knee Valgus

Knees coming together during downward phase

Knee Varus

Knees coming apart during downward phase

183

Knees Too Forward

Knees ahead of toes during downward phase

50

Heels Elevated

Heels raising off the ground during exercise

Bent Over

Excessive flexion of hip and torso during exercise

Other

Other deviation, not highlighted in NSCA guidelines

22

7 96 250

NSCA = National Strength and Conditioning Association

A pilot study was undertaken to determine the most appropriate sampling rate and the ranges for the accelerometer and gyroscope on board the IMUs. For the pilot study, data were acquired (512 samples/s) during performance of the squat, lunge, deadlift, single-leg squat and tuck jump exercises. A Fourier transform was then used to estimate the spectral extent of the signals, which was found to be less than 20 Hz. Therefore, a sampling rate of 51.2 samples/s, which exceeds the minimum of 40 samples/s needed for a 20 Hz band limit, was chosen based upon the Shannon sampling theorem and the Nyquist criterion [28]. Each IMU was configured to stream triaxial accelerometer (± 2 g), gyroscope (± 500 °/s) and magnetometer (± 1.9 Ga) data, with the sensor ranges chosen based upon data from the pilot study. Each IMU was calibrated for these specific sensor ranges using the Shimmer 9DoF Calibration application [29].

Participants were required to complete a full 3-repetition maximum (3RM) strength test for the barbell squat [29]. Following a warm-up on an exercise bike, participants completed a set of barbell squat exercises with a resistance that allowed for 8–12 repetitions comfortably. After resting for 1 minute, the load was increased by 10–20% and they performed a further 4–6 repetitions. This was followed by a 2-minute rest period. Following this they performed 3 repetitions with near maximum load. They then rested for 2–4 minutes. If they passed the previous set, the weight was incremented by 5–10% and another 3-repetition set was completed. This load increment was repeated until the participant could no longer lift the weight in a safe manner for three repetitions.

3.4 Data Labelling

All repetitions were recorded using a HD video camera placed in front of the participants. The video recordings of each exercise repetition were reviewed by a Chartered Physiotherapist with over seven years’ experience in musculoskeletal and sports physiotherapy. Each exercise repetition was separated and reviewed on multiple occasions systematically. For each repetition, the Chartered Physiotherapist first judged whether exercise technique was ‘acceptable’. The criteria for acceptable technique were based upon the recommendations detailed in National Strength and Conditioning Association guidelines [27]. For safety reasons participants completed the exercise in a squat rack. The barbell was placed on the rack just above shoulder level and loaded appropriately. The participant then stepped under the bar and placed it on the back of their shoulders, slightly below their neck. The bar was held with both arms and lifted off the rack by pushing with the legs and straightening the torso. The participant then stepped away from the rack and completed the squatting movement. Their chest was held up and out with their head tilted slightly up. As participants moved into the squat position they were instructed to allow hips and knees to flex while keeping their torso to floor angle constant. They were required to keep their heels on the floor and knees aligned over their feet. Participants continued flexing at the hips and knees until their thighs were parallel to the floor. As they moved upward a flat back was maintained and their chest was held up and out. Hips and knees were to be extended at the same rate with heels on the floor and knees aligned over feet until the starting position was reached. The bar was then placed back on the rack. If a repetition was not completed as above, then the Chartered Physiotherapist selected the most dominant deviation from a pre-defined list (▶ Table 1). This method of data labelling replicates methods from recently published work in the field of IMU-based exercise technique classification systems [21].

3.5 Signal Processing and Statistical Analysis

Nine signals were collected from each IMU: accelerometer x, y, z, gyroscope x, y, z and magnetometer x, y, z. Data were analysed using MATLAB (2012, The MathWorks, Natick, USA). To eliminate unwanted high-frequency noise during each repetition, the nine signals were low pass filtered at fc = 20 Hz using a Butterworth filter of order n = 8. Whilst classification is possible using features derived solely from the accelerometer, gyroscope and magnetometer signals, the use of additionally derived signals improves system accuracy, sensitivity and specificity. As such, nine additional signals were then calculated as follows. The 3-D orientation of the IMU was computed using the gradient descent algorithm developed by Madgwick et al. [30]. The resulting W, X, Y and Z quaternion values were also converted to pitch, roll and yaw signals. The pitch, roll and yaw signals describe the inclination, measured in radians, of each IMU in the sagittal, frontal and transverse plane respectively. The magnitude of acceleration was also computed using the vector magnitude of accelerometer x, y, z. The magnitude of acceleration describes the total acceleration of the IMU in any direction. This is the sum of the magnitude of inertial acceleration of the lumbar spine and acceleration due to gravity. Additionally, the magnitude of rotational velocity was computed using the vector magnitude of gyroscope x, y, z. Each exercise repetition was extracted from the IMU data and resampled to a


length of 250 samples. This time-normalisation was undertaken in an attempt to minimise the influence a participant’s repetition tempo had on signal feature calculations. It also ensured consistent computational efficiency in applications for end users and has been used in recently published, similar work [19, 21, 22]. Repetitions completed by the participant where the IMU’s Bluetooth signal dropped were excluded from analysis. The total number of repetitions belonging to each class is shown in ▶ Table 2.

Time-domain and frequency-domain descriptive features were computed in order to describe the pattern of each of the eighteen signals when the barbell squats were completed. These features were ‘Mean’, ‘RMS’, ‘Standard Deviation’, ‘Kurtosis’, ‘Median’, ‘Skewness’, ‘Range’, ‘Variance’, ‘Max’, ‘Min’, ‘Energy’, ‘25th Percentile’, ‘75th Percentile’, ‘Level Crossing Rate’, ‘Fractal Dimension’ [31] and the ‘variance of both the approximate and detailed wavelet coefficients using the Daubechies 5 mother wavelet to level 7’ [32]. This resulted in 17 features for each of the 18 available signals, producing a total of 306 features per IMU. ▶ Figure 2 summarises the above: 5 IMUs recorded 9 signals each and 9 more signals were derived from these, resulting in a total of 18 signals per IMU. 17 features were computed per repetition for each signal from each IMU, resulting in a total of 1530 features (306 per IMU, 17 per signal). These features were then used to develop and evaluate a variety of classifiers as described below.

Figure 2 Diagram linking number of IMUs, number of recorded and derived signals, number of features extracted and the variety of feature combinations used to test classifiers. Diagram content: 5 IMUs (lumbar, both shanks, both thighs); 9 recorded + 9 derived signals per IMU (accelerometer x, y and z; gyroscope x, y and z; magnetometer x, y and z; pitch, roll and yaw; accelerometer and gyroscope magnitude; rotation quaternion W, X, Y and Z); 17 features computed from each repetition of each signal from each IMU; 10 IMU combinations used to develop and evaluate random forests classifiers; global classifiers evaluated with LOSOCV; personalised classifiers evaluated with LOOCV.

The random-forests method was employed to perform classification [33]. This technique was chosen as it has been shown to be effective in analysing exercise technique with IMUs when compared to the Naïve-Bayes and Radial-basis function network techniques [34]. 128 decision trees were used in each random-forest classifier. Classifiers were developed and evaluated for the ten combinations of IMUs as shown in ▶ Figure 2. Initially, binary classification was evaluated to establish how effectively each individual IMU and each combination of IMUs could distinguish between acceptable and aberrant barbell squat technique. All repetitions of acceptable technique were labelled ‘0’ and all repetitions performed with one of the pre-defined deviations as outlined in ▶ Table 1 were labelled ‘1’. Multi-label classification was then evaluated on the IMU data to investigate how effectively each individual IMU and each IMU combination could be used to discriminate between acceptable barbell squat technique and each of the six pre-defined deviations from acceptable technique as described in ▶ Table 1. All repetitions of acceptable performance remained labelled as ‘0’ and each of the different deviations were labelled ‘1–6’.

The quality of the global exercise classification system was established using leave-one-subject-out-cross-validation (LOSOCV) and the random-forests classifier with 128 trees [35]. Each participant’s data corresponds to one fold of the cross validation. At each fold, one participant’s data is held out as test data while the random forests classifier is trained with all other participants’ data. Where each class in the training data did not have an equal


number of instances (i.e. equal number of acceptable and aberrant repetitions in binary classification), random instances of the over-represented class(es) were removed in order to balance the training data. The held out data is used to assess the classifier’s ability to correctly categorise new data it is presented with. The use of LOSOCV ensures that there is no biasing of the classifiers, because the test subject’s data is completely unseen by the classifier prior to testing.

The quality of the personalised exercise classification systems was established using leave-one-out-cross-validation (LOOCV) and a random forests classifier with 128 trees. Each repetition corresponds to one fold of the cross validation. At each fold, one repetition is held out as test data while the random forests classifier is trained with the same participant’s other completed repetitions. Where each class in the training data did not have an equal number of instances (i.e. equal number of acceptable and aberrant repetitions in binary classification), random instances of the over-represented class(es) were removed in order to balance the training data. The held out data is used to assess the classifier’s ability to correctly categorise new data it is presented with. Participants were not included for this analysis if they did not have at least 2 repetitions belonging to each class being classified, as this would not allow for training and test data for that class.

The scores used to measure the quality of classification were total accuracy, average sensitivity and average specificity. Accuracy is the number of correctly classified repetitions of all the exercises divided by the total number of repetitions completed; this is calculated as the sum of the true positives (TP) and true negatives (TN) divided by the sum of the true positives, false positives (FP), true negatives and false negatives (FN):

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$

In binary classification acceptable technique was considered the ‘positive’ class and aberrant technique was considered the ‘negative’ class. As such, single sensitivity and specificity values were computed to establish binary classification quality for each IMU combination. In multi-label classification, the sensitivity and specificity were calculated for each of the six class labels as outlined in ▶ Table 1. Each label was sequentially treated as the ‘positive’ class, and then the mean and standard deviation across the six values were taken. Sensitivity and specificity were computed using the formulas below. Sensitivity measures the effectiveness of a classifier at identifying a desired label, while specificity measures the classifier’s ability to detect other labels.

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$

In addition to these measures, receiver operating characteristic (ROC) curves were plotted to compare the quality of global and individualised binary classifiers. A single ROC curve was created for each of the individualised and global approaches by pooling the true labels and predicted label scores of all participants. The MATLAB ‘perfcurve’ function was used to generate the X and Y points for both ROC curves (https://uk.mathworks.com/help/stats/perfcurve.html).

In reviewing the accuracy, sensitivity and specificity scores produced by each classifier, 90% or higher was considered an ‘excellent’ quality result, 80–89% was considered a ‘good’ quality result, 60–79% was considered a ‘moderate’ result and anything less than 59% was deemed a poor result. The authors chose these values after reviewing the aforementioned literature on identifying deviations from acceptable exercise performance using data derived from IMUs. In reviewing such literature, an existing accepted standard for an excellent, good, moderate or poor classifier could not be found. Therefore, the above system was agreed on by the authors to facilitate interpretation of results.

Table 3 Overall accuracy, sensitivity and specificity in binary classification (acceptable or aberrant technique) for each combination of IMUs following LOSOCV using global classifiers.

| Sensor(s) | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| All 5 Sensors | 64 | 70 | 28 |
| Lumbar & Shanks | 65 | 69 | 34 |
| Lumbar & Thighs | 62 | 68 | 21 |
| Both Shanks | 66 | 70 | 38 |
| Both Thighs | 63 | 75 | 26 |
| Left Shank | 62 | 70 | 31 |
| Left Thigh | 63 | 69 | 24 |
| Lumbar | 61 | 68 | 21 |
| Right Thigh | 63 | 70 | 27 |
| Right Shank | 62 | 69 | 45 |
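The accuracy, sensitivity and specificity scores used in this section can be computed directly from true and predicted labels. The sketch below is a minimal Python illustration, not the authors' MATLAB implementation. The function names and the toy labels are assumptions made for the example. Treating every score below 60% as 'poor' is also an assumption, because the text leaves the band between 59% and 60% unassigned.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, TN and FN, treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


def quality_band(score_percent):
    """Map a percentage score to the qualitative bands used in the text."""
    if score_percent >= 90:
        return "excellent"
    if score_percent >= 80:
        return "good"
    if score_percent >= 60:
        return "moderate"
    return "poor"  # assumption: everything below 60% is treated as poor


# Toy binary example (invented): 'acceptable' is the positive class.
y_true = ["acceptable"] * 6 + ["aberrant"] * 4
y_pred = ["acceptable"] * 5 + ["aberrant"] * 4 + ["acceptable"]

tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive="acceptable")
print(tp, fp, tn, fn)  # 5 1 3 1
print(accuracy(tp, fp, tn, fn), sensitivity(tp, fn), specificity(tn, fp))
```

For the multi-label case, the same `confusion_counts` function could be called once per class label, with that label as `positive`, and the resulting sensitivity and specificity values averaged.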

4. Results

▶ Table 2 shows the total number of repetitions collected for each class, as labelled by the Chartered Physiotherapist. For binary classification, there were 884 acceptable repetitions and 606 aberrant repetitions recorded. ▶ Table 3 demonstrates the accuracy, sensitivity and specificity of the global classification methods in binary classification. ▶ Table 4 shows the total accuracy, mean sensitivity and mean specificity of the global classification methods in multi-class classification (detection of exact deviation). ▶ Table 5 demonstrates the mean accuracy, sensitivity and specificity scores for each individual participant’s personalised barbell squat technique binary classifier that was evaluated with LOOCV. ▶ Figure 3 shows an ROC curve for all participants when both global and individualised classification methodologies were used for a binary classification system based on data from the left thigh IMU. The area under the curve (AUC) for the global method was 0.52 and the AUC for the personalised method was 0.98. ▶ Table 6 demonstrates the mean accuracy, sensitivity and specificity scores for each individual participant’s personalised barbell squat technique multi-class classifier that was evaluated with LOOCV.
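The AUC values quoted above were obtained in MATLAB with `perfcurve`. As an illustrative stand-in only, the sketch below pools true labels and classifier scores from several participants and computes the AUC in plain Python using the rank-based (Mann-Whitney) formulation. The participant identifiers, scores and function names are invented for the example and are not data from the study. The label 1 stands for 'acceptable' technique, which the Figure 3 caption describes as the 'true' class.

```python
def pool(per_participant):
    """Concatenate (labels, scores) pairs from all participants into two flat lists."""
    labels, scores = [], []
    for part_labels, part_scores in per_participant.values():
        labels.extend(part_labels)
        scores.extend(part_scores)
    return labels, scores


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented scores: each entry maps a participant to (true labels, predicted scores).
per_participant = {
    "P1": ([1, 1, 0], [0.9, 0.6, 0.4]),
    "P2": ([1, 0, 0], [0.7, 0.8, 0.2]),
}

labels, scores = pool(per_participant)
print(auc(labels, scores))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a sense of scale for the reported global value of 0.52 and personalised value of 0.98.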

5. Discussion

The aims of this study were to: (a) determine if an IMU system is capable of distinguishing between acceptable and aberrant barbell squat technique; (b) determine the capabilities of an IMU system at identifying specific natural deviations from acceptable barbell squat technique; and (c) compare a personalised (N of 1) classifier to a global classifier in identifying the above. The results of this paper indicate that an IMU system is not capable of detecting aberrant barbell squat technique using global classifiers, as demonstrated by the low specificity scores (▶ Table 3). However, good levels of accuracy, sensitivity and specificity are achieved using a personalised classifier (▶ Table 5). Similarly, the ability of an IMU system to identify specific deviations in technique is poor using a global classifier (▶ Table 4); however, these results are improved to moderate levels using a personalised classifier (▶ Table 6).

Table 4 Overall accuracy, average sensitivity and average specificity in multi-label classification (exact deviation) for each combination of IMUs following LOSOCV using global classifiers.

| Sensor(s) | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| All 5 Sensors | 59 | 24 | 84 |
| Lumbar & Shanks | 57 | 25 | 85 |
| Lumbar & Thighs | 57 | 22 | 84 |
| Both Shanks | 53 | 20 | 85 |
| Both Thighs | 52 | 15 | 82 |
| Left Shank | 48 | 19 | 85 |
| Left Thigh | 48 | 15 | 82 |
| Lumbar | 52 | 19 | 83 |
| Right Thigh | 51 | 14 | 82 |
| Right Shank | 55 | 21 | 86 |

Table 5 Average accuracy, sensitivity and specificity in binary classification (acceptable or aberrant technique) for each combination of IMUs following LOOCV using personalised, N of 1 classifiers.

| Sensor(s) | Accuracy (%) ± SD | Sensitivity (%) ± SD | Specificity (%) ± SD |
|---|---|---|---|
| All 5 Sensors | 82 ± 13 | 83 ± 14 | 84 ± 14 |
| Lumbar & Shanks | 80 ± 14 | 81 ± 16 | 82 ± 14 |
| Lumbar & Thighs | 82 ± 12 | 82 ± 13 | 87 ± 11 |
| Both Shanks | 79 ± 16 | 80 ± 19 | 81 ± 15 |
| Both Thighs | 83 ± 11 | 84 ± 12 | 88 ± 12 |
| Left Shank | 79 ± 6 | 81 ± 17 | 80 ± 20 |
| Left Thigh | 81 ± 13 | 81 ± 13 | 84 ± 16 |
| Lumbar | 80 ± 14 | 81 ± 15 | 83 ± 16 |
| Right Thigh | 80 ± 16 | 84 ± 12 | 82 ± 17 |
| Right Shank | 80 ± 15 | 78 ± 17 | 82 ± 15 |

To the best of the authors’ knowledge this is the first paper to demonstrate the ability of an IMU system to identify natural deviations during performance of the barbell squat exercise. To date there has been a lack of research investigating the ability of IMU systems to classify technique in lower limb compound exercises. Whilst global classification techniques replicating those used in this paper have been shown to successfully classify naturally occurring deviations in the single leg squat [21, 36], they were shown to be ineffective in classifying barbell squat technique.

Additionally, we have demonstrated that a personalised classifier outperforms a global classifier in assessing barbell squat technique (▶ Figure 3, ▶ Tables 3–6). This is likely due to a number of factors. As outlined in ▶ Table 2, the number of acceptable repetitions far outnumbers any other label. This unbalanced data set makes it difficult to create global classifiers that can be used for all individuals [24, 25]. As many deviations were seen sporadically, the use of a global classifier to identify specific deviations in the barbell squat may require the collection of a data set consisting of larger amounts of each deviation. The inter-subject variability in movement patterns that are considered acceptable in barbell squat technique may also exceed the intra-subject variability between acceptable technique and aberrant technique. This would make the creation of global classifiers exceptionally difficult. It is likely that this is not the case for the single leg squat and hence global classification methodologies worked better for classifying deviations in this exercise.

It is difficult to directly compare results with previous work in the area due to differences in exercises investigated, sensor positions and classifier techniques employed. However, the results presented in


this paper using a personalised classifier compare favourably to other research in the area [16–19]. The majority of research to date has investigated the ability of IMU systems to monitor technique in simple exercises such as straight leg raises [16], dumbbell curls [18], or heel slides [19]. This paper describes an evaluation of an IMU system’s ability to quantify barbell squat technique, a more complex exercise that involves multiple joints. This system has also demonstrated the ability to identify a total of seven different classes (▶ Table 2). The lower number of classes in some of the studies [16, 18, 19] may make it easier for classifiers to identify specific deviations and subsequently produce higher accuracy, sensitivity and specificity scores. However, it must be noted that all of these systems used a global classifier in distinguishing between exercise techniques and many of the studies classified deviations that were deliberately induced. As shown in ▶ Table 4, the ability of a global classifier to identify specific deviations in barbell squat technique is poor. Therefore, a personalised classifier may be more suitable when assessing this exercise in a clinical setting where technique deviations are natural.

Figure 3 ROC curves comparing binary classification systems when using global and personalised classification methodologies using data from the left thigh IMU. ‘Acceptable’ technique was considered the ‘true’ class.

The results presented in ▶ Table 5 and ▶ Table 6 show that a single IMU system is comparable to a multiple IMU system in determining barbell squat technique using a personalised classifier. Multiple IMU systems are more expensive than a single IMU system due to the need to purchase additional sensors. Furthermore, they are less practical for end users as there is an increased risk of placement error in addition to power usage and Bluetooth™ connectivity issues. For these reasons a reduced IMU

set-up is more desirable for daily environment applications [37]. Therefore, the single IMU system results presented in this paper increase the likelihood of clinical adoption. A personalised classifier offers a number of benefits compared to a global classifier when assessing barbell squat technique. Most obviously, the higher levels of accuracy would mean an improved user experience in a clinical setting. A personalised classifier also allows for analysis to be performed on data sets that are unbalanced, like the one shown in ▶ Table 2. Furthermore, personalised classifiers are also more computationally efficient than global classifiers as they are developed using less training data and therefore

Sensor(s)

Accuracy (%) ± SD

Sensitivity (%) Specificity (%) ± SD ± SD

All 5 Sensors

70 ± 20

73 ± 17

88 ± 12

Lumbar & Shanks

69 ± 20

71 ± 18

90 ± 8

Lumbar & Thighs

70 ± 17

70 ± 15

87 ± 9

Both Shanks

70 ± 18

71 ± 17

89 ± 7

Both Thighs

70 ± 16

72 ± 13

88 ± 11

Left Shank

67 ± 20

71 ± 17

86 ± 12

Left Thigh

69 ± 18

70 ± 18

89 ± 9

Lumbar

67 ± 20

70 ± 19

89 ± 10

Right Thigh

70 ± 16

72 ± 13

86 ± 12

Right Shank

67 ± 20

71 ± 15

88 ± 8

Table 6 Overall accuracy, average sensitivity and average specificity in multi-label classification (exact deviation) for each combination of IMUs following LOOCV using personalised, N of 1 classifiers.

require less memory. This would improve processing time and increase battery life. The main disadvantage associated with a personalised classifier is that the user must collect and label data sets from individual patients. This means clinicians must monitor exercise technique in real time or use post-hoc video analysis and label this appropriately. This may prove time consuming. Furthermore, this does not lend itself to a ‘set-up and go’ approach that involves minimal interaction with the user interface, which is more preferable for endusers [8]. However, as clinicians often monitor exercise technique prior to allowing patients complete their exercises it may fit into clinical practice without issue, with clinicians labelling repetitions as they analyse exercise completion. Furthermore, the labelled data set developed using this method could be used to build global classifiers better equipped at identifying natural deviations in the future. This is because all labelled data that is collected by practitioners could be stored and used to build the large data set necessary to improve global classifier scores. A challenging aspect of this work is to ascertain whether the results presented in this paper are sufficient for real-life applications. It is likely that the classification accuracy achieved using a global classifier is too low for use in healthcare environments, while those produced by a personalised classifier may be acceptable. However, it is important to note that what is considered an acceptable level of classification accuracy is likely to be influenced by application domain (injury rehabilitation, strength and conditioning, musculoskeletal injury risk screening, etc.) and end user profile (rehabilitation professionals, sports coaches, strength and conditioning staff, recreational gym users). Our research team is undertaking further projects to determine usability, functionality and user perceptions of wearable technology to assess exercise biomechanics. 
This information is being gathered from a range of professionals and patients, who incorporate exercises such as the barbell squat in their rehabilitation programme, exercise routine and injury risk screening protocols. It is envisaged that this will provide greater indication as to the levels of accuracy end users

Methods Inf Med 4/2017

© Schattauer 2017 Print proof! Publication, duplication or distribution (also online) is prohibited!

355

D. F. Whelan et al.: Personalised vs. Global Classification of Squat

would define as acceptable. Furthermore, this work will contribute new information regarding how best to provide actionable feedback to these users that allows for safe and effective exercise completion.
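As an illustration of the scores reported in Table 6, the following minimal sketch shows one common way to obtain overall accuracy together with class-averaged sensitivity and specificity for a multi-class problem, by treating each class in turn as the positive class. The averaging scheme, the function name `multiclass_scores` and the example labels are assumptions made for illustration; the text does not state exactly how the averages were formed.

```python
def multiclass_scores(y_true, y_pred):
    """Return (accuracy, mean sensitivity, mean specificity).

    Sensitivity and specificity are computed one-vs-rest for every class
    present in y_true and then averaged (an assumed averaging scheme).
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    classes = sorted(set(y_true))
    if len(classes) < 2:
        raise ValueError("at least two true classes are needed for specificity")

    pairs = list(zip(y_true, y_pred))
    total = len(pairs)
    accuracy = sum(t == p for t, p in pairs) / total

    sensitivities, specificities = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        tn = total - tp - fn - fp                      # true negatives
        sensitivities.append(tp / (tp + fn))
        specificities.append(tn / (tn + fp))

    n = len(classes)
    return accuracy, sum(sensitivities) / n, sum(specificities) / n


# Hypothetical labels for eight squat repetitions from one participant.
true_labels = ["acceptable"] * 4 + ["knee_valgus"] * 2 + ["forward_lean"] * 2
predicted = ["acceptable", "acceptable", "acceptable", "knee_valgus",
             "knee_valgus", "acceptable", "forward_lean", "forward_lean"]
acc, sens, spec = multiclass_scores(true_labels, predicted)
```

With several deviation classes, specificity tends to be higher than sensitivity because most repetitions are true negatives for any single class, which may partly explain the pattern seen in Table 6.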

6. Conclusion

Our results show that a system based on data derived from body worn IMUs can classify acceptable and aberrant barbell squat biomechanics with good overall accuracy, sensitivity and specificity using a personalised classifier. These classification scores are maintained even with a single IMU. Identifying specific deviations is more difficult but can be achieved with a moderate level of overall accuracy using a personalised classifier. Our results are comparable with other research in the area, despite the barbell squat being a more complex exercise than many of those previously investigated. However, most of this research has been carried out using global classifiers. While this may allow for less user interaction, it produces poor levels of accuracy when attempting to identify specific natural deviations during performance of the exercise. As a result, the use of a personalised classifier may be more appropriate for identifying natural deviations in barbell squat technique.

References

1. Cook G, Burton L, Hoogenboom B. Pre-participation screening: the use of fundamental movements as an assessment of function – part 1. North American Journal of Sports Physical Therapy 2006; 1(2): 62–72.
2. Hall M, Nielsen JH, Holsgaard-Larsen A, Nielsen DB, Creaby MW, Thorlund JB. Forward lunge knee biomechanics before and after partial meniscectomy. The Knee 2015; 22(6): 506–509.
3. Ahmadi A, Mitchell E, Destelle F, Gowing M, O’Connor NE, Richter C, et al. Automatic Activity Classification and Movement Assessment During a Sports Training Session Using Wearable Inertial Sensors. Proceedings of the 11th International Conference on Wearable and Implantable Body Sensor Networks (BSN): IEEE; 2014. p. 98–103.
4. Bonnechere B, Jansen B, Salvia P, Bouzahouene H, Omelina L, Moiseev F, et al. Validity and reliability of the Kinect within functional assessment activities: Comparison with standard stereophotogrammetry. Gait & Posture 2014; 39(1): 593–598.
5. Bonnet V, Mazza C, Fraisse P, Cappozzo A. Real-time estimate of body kinematics during a planar squat task using a single inertial measurement unit. IEEE Transactions on Biomedical Engineering 2013; 60(7): 1920–1926.
6. Whiteside D, Deneweth JM, Pohorence MA, Sandoval B, Russell JR, McLean SG, et al. Grading the functional movement screen: A comparison of manual (real-time) and objective methods. The Journal of Strength & Conditioning Research 2016; 30(4): 924–933.
7. McGrath D, Greene BR, O’Donovan KJ, Caulfield B. Gyroscope-based assessment of temporal gait parameters during treadmill walking and running. Sports Engineering 2012; 15(4): 207–213.
8. Morris D, Saponas TS, Guillory A, Kelner I. RecoFit: using a wearable sensor to find, recognize, and count repetitive exercises. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: ACM; 2014. p. 3225–3234.
9. Leardini A, Lullini G, Giannini S, Berti L, Ortolani M, Caravaggi P. Validation of the angular measurements of a new inertial-measurement-unit based rehabilitation system: comparison with state-of-the-art gait analysis. Journal of Neuroengineering and Rehabilitation 2014; 11(1): 1–7.
10. Tang Z, Sekine M, Tamura T, Tanaka N, Yoshida M, Chen W. Measurement and Estimation of 3D Orientation using Magnetic and Inertial Sensors. Advanced Biomedical Engineering 2015; 4: 135–143.
11. Muehlbauer M, Bahle G, Lukowicz P. What can an arm holster worn smart phone do for activity recognition? Proceedings of the International Symposium on Wearable Computers (ISWC): IEEE; 2011. p. 79–82.
12. Chang K-H, Chen MY, Canny J. Tracking free-weight exercises. In: Krumm J, Abowd GD, Seneviratne A, Strang T, editors. UbiComp 2007: Ubiquitous Computing. Berlin, Heidelberg: Springer; 2007. p. 19–37.
13. Seeger C, Buchmann A, Van Laerhoven K. myHealthAssistant: a phone-based body sensor network that captures the wearer’s exercises throughout the day. Proceedings of the 6th International Conference on Body Area Networks: ICST; 2011. p. 1–7.
14. Pernek I, Hummel KA, Kokol P. Exercise repetition detection for resistance training based on smartphones. Personal and Ubiquitous Computing 2013; 17(4): 771–782.
15. Pernek I, Kurillo G, Stiglic G, Bajcsy R. Recognizing the intensity of strength training exercises with wearable sensors. Journal of Biomedical Informatics 2015; 58: 145–155.
16. Taylor PE, Almeida GJ, Hodgins JK, Kanade T. Multi-label classification for the analysis of human motion quality. Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC): IEEE; 2012. p. 2214–2218.
17. Melzi S, Borsani L, Cesana M. The virtual trainer: supervising movements through a wearable wireless sensor network. 6th Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks Workshops, 2009: IEEE; 2009. p. 1–3.
18. Velloso E, Bulling A, Gellersen H, Ugulino W, Fuks H. Qualitative activity recognition of weight lifting exercises. Proceedings of the 4th Augmented Human International Conference: ACM; 2013. p. 116–123.
19. Giggins OM, Sweeney KT, Caulfield B. Rehabilitation exercise assessment using inertial sensors: a cross-sectional analytical study. Journal of Neuroengineering and Rehabilitation 2014; 11(1): 158–168.
20. O’Reilly M, Whelan D, Chanialidis C, Friel N, Delahunt E, Ward T, et al. Evaluating squat performance with a single inertial measurement unit. IEEE 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN): IEEE; 2015. p. 1–6.
21. Whelan D, O’Reilly M, Ward T, Delahunt E, Caulfield B. Evaluating Performance of the Single Leg Squat Exercise with a Single Inertial Measurement Unit. Proceedings of the 3rd Workshop on ICTs for Improving Patients Rehabilitation Research Techniques: ACM; 2015. p. 144–147.
22. Whelan D, O’Reilly M, Ward T, Delahunt E, Caulfield B. Evaluating Performance of the Lunge Exercise with Multiple and Individual Inertial Measurement Units. Pervasive Health 10th EAI International Conference on Pervasive Computing Technologies for Healthcare 2016. p. 101–108.
23. Whelan D, O’Reilly M, Huang B, Giggins O, Kechadi T, Caulfield B. Leveraging IMU data for accurate exercise performance classification and musculoskeletal injury risk screening. IEEE 38th Annual International Conference of the Engineering in Medicine and Biology Society (EMBC); 2016. p. 659–662.
24. Chawla NV. Data mining for imbalanced datasets: An overview. In: Maimon O, Rokach L, editors. Data mining and knowledge discovery handbook. New York: Springer; 2005. p. 853–867.
25. He H, Garcia EA. Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering 2009; 21(9): 1263–1284.
26. Kotsiantis SB, Zaharakis I, Pintelas P. Supervised machine learning: A review of classification techniques. In: Maglogiannis I, Karpouzis K, Wallace M, Soldatos J, editors. Emerging Artificial Intelligence Applications in Computer Engineering: Real Word AI Systems with Applications in EHealth, HCI, Information Retrieval and Pervasive Technologies. Amsterdam: IOS Press; 2007. p. 3–25.
27. Baechle TR, Earle RW. Resistance Training Exercise Techniques. In: Baechle TR, Earle RW, editors. NSCA’s Essentials of Personal Training. 1st ed. Champaign, IL: Human Kinetics; 2004. p. 295–332.
28. Jerri AJ. The Shannon sampling theorem – Its various extensions and applications: A tutorial review. Proceedings of the IEEE 1977; 65(11): 1565–1596.
29. Shimmer 9DOF calibration [cited 2017 March 13]. Available from: http://www.shimmersensing.com/shop/shimmer-9dof-calibration.
30. Madgwick SOH, Harrison AJL, Vaidyanathan R. Estimation of IMU and MARG orientation using a gradient descent algorithm. Proceedings of the IEEE International Conference on Rehabilitation Robotics (ICORR): IEEE; 2011. p. 1–7.
31. Katz MJ, George EB. Fractals and the analysis of growth paths. Bulletin of Mathematical Biology 1985; 47(2): 273–286.
32. Single-level discrete 1-D wavelet transform [cited 2017 March 13]. Available from: http://uk.mathworks.com/help/wavelet/ref/dwt.html.
33. Breiman L. Random forests. Machine Learning 2001; 45(1): 5–32.
34. Mitchell E, Ahmadi A, O’Connor NE, Richter C, Farrell E, Kavanagh J, et al. Automatically detecting asymmetric running using time and frequency domain features. Proceedings of the 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN): IEEE; 2015. p. 1–6.
35. Fushiki T. Estimation of prediction error by using K-fold cross-validation. Statistics and Computing 2011; 21(2): 137–146.
36. Whelan D, O’Reilly M, Ward T, Delahunt E, Caulfield B. Technology in Rehabilitation: Evaluating the Single Leg Squat Exercise with Wearable Inertial Measurement Units. Methods of Information in Medicine 2016; 55(6): 88–94.
37. Bonnet V, Mazza C, Fraisse P, Cappozzo A. A least-squares identification algorithm for estimating squat exercise mechanics using a single inertial measurement unit. Journal of Biomechanics 2012; 45(8): 1472–1477.

Methods Inf Med 4/2017

© Schattauer 2017 Print proof! Publication, duplication or distribution (also online) is prohibited!


JMIR REHABILITATION AND ASSISTIVE TECHNOLOGIES

O'Reilly et al

Original Paper

Mobile App to Streamline the Development of Wearable Sensor-Based Exercise Biofeedback Systems: System Development and Evaluation

Martin O'Reilly1,2, BEng (Hons), HDip; Joe Duffin1, BSc; Tomas Ward3,4, BE, MEngSc, PhD; Brian Caulfield1,2, BSc, MSc, PhD

1 Insight Centre for Data Analytics, University College Dublin, Belfield, Ireland
2 School of Public Health, Physiotherapy and Sports Science, University College Dublin, Dublin, Ireland
3 Biomedical Engineering Research Group, Department of Electronic Engineering, Maynooth University, Maynooth, Ireland
4 Insight Centre for Data Analytics, Maynooth University, Maynooth, Ireland

Corresponding Author:
Martin O'Reilly, BEng (Hons), HDip
Insight Centre for Data Analytics
University College Dublin
3rd Floor, O'Brien Centre for Science
Science Centre EAST
Belfield, D4
Ireland
Phone: 353 871245972
Fax: 353 17162667
Email: [email protected]

Abstract

Background: Biofeedback systems that use inertial measurement units (IMUs) have been shown recently to have the ability to objectively assess exercise technique. However, there are a number of challenges in developing such systems; vast amounts of IMU exercise datasets must be collected and manually labeled for each exercise variation, and naturally occurring technique deviations may not be well detected. One method of combatting these issues is through the development of personalized exercise technique classifiers.

Objective: We aimed to create a tablet app for physiotherapists and personal trainers that would automate the development of personalized multiple and single IMU-based exercise biofeedback systems for their clients. We also sought to complete a preliminary investigation of the accuracy of such individualized systems in a real-world evaluation.

Methods: A tablet app was developed that automates the key steps in exercise technique classifier creation through synchronizing video and IMU data collection, automatic signal processing, data segmentation, data labeling of segmented videos by an exercise professional, automatic feature computation, and classifier creation. Using a personalized single IMU-based classification system, 15 volunteers (12 males, 3 females, age: 23.8 [standard deviation, SD 1.8] years, height: 1.79 [SD 0.07] m, body mass: 78.4 [SD 9.6] kg) then completed 4 lower limb compound exercises. The real-world accuracy of the systems was evaluated.

Results: The tablet app successfully automated the process of creating individualized exercise biofeedback systems. The personalized systems achieved 90.00% (1080/1200) accuracy, with 90.00% (1080/1200) sensitivity and 89.00% (1068/1200) specificity for assessing aberrant and acceptable technique with a single IMU positioned on the left thigh.

Conclusions: A tablet app was developed that automates the process required to create a personalized exercise technique classification system. This tool can be applied to any cyclical, repetitive exercise. The personalized classification model displayed excellent system accuracy even when assessing acute deviations in compound exercises with a single IMU.

(JMIR Rehabil Assist Technol 2017;4(2):e9) doi:10.2196/rehab.7259

KEYWORDS
exercise therapy; biomedical technology; lower extremity; physical therapy specialty

http://rehab.jmir.org/2017/2/e9/


JMIR Rehabil Assist Technol 2017 | vol. 4 | iss. 2 | e9 | p.1 (page number not for citation purposes)


Introduction

Background

Exercise rehabilitation is accepted as an essential tool in treating musculoskeletal conditions such as osteoarthritis and in recovery following injury or orthopedic surgical procedures [1-3]. Resistance training may also be used to improve one’s muscular strength, hypertrophy, and power in nonpatient populations [4-6]. However, many people completing exercise programs encounter a variety of difficulties when performing their exercises without the supervision of a trained exercise professional such as a physiotherapist or strength and conditioning (S&C) coach. One such difficulty is that in many circumstances, people may execute their exercises incorrectly [7,8]. Incorrect alignment during exercise, incorrect speed of movement, and poor quality of movement may have an impact on the efficacy of exercise and may therefore result in a poor outcome [7,8]. It is therefore essential that accurate assessment of exercise performance is available to ensure that people perform their exercises properly. This is particularly necessary in cases where an individual completes their exercise program in the absence of an exercise professional’s supervision, for example, during home-based rehabilitation programs or S&C programs where the person performing the exercises cannot afford a personal trainer.

Recent research has shown inertial measurement unit (IMU)-based biomechanical biofeedback systems to be an accurate exercise assessment tool. Biomechanical biofeedback involves (1) the measurement of one’s movement, postural control, or force output and (2) the provision of feedback to the user regarding these measurements [9]. IMUs are able to acquire data pertaining to the linear and angular motion of individual limb segments and the center of mass of the body. They are small, inexpensive, and easy to set up, and facilitate the acquisition of human movement data in unconstrained environments [10].

Research in this field has shown the ability of multiple body-worn IMUs to evaluate exercise quality for a variety of exercises [11-14]. These range from early-stage rehabilitation exercises such as heel slides and straight leg raises [15] to more complex late-stage rehabilitation exercises or S&C exercises such as bodyweight squats [16], lunges [17], and single-leg squats [18-20]. More cost-effective and practical systems using a single body-worn IMU have also been shown to be effective in the analysis of exercise technique [17,18,21,22]. Systems that are based on a single IMU are considered preferable, as they can provide equivalent exercise analysis quality to multiple IMU setups at a lower cost. However, in a number of cases, a single IMU setup achieves lower-quality exercise analysis than multiple IMU setups. The ability of a single IMU setup to detect acute naturally occurring technique deviations in compound late-stage rehabilitation and S&C exercises such as deadlifts, lunges, and squats is also largely unknown; although this has been shown as possible for single-leg squats [18], the reported findings on lunges and squats pertain to detecting deliberately induced exercise technique deviations [16,17]. There is also a need to iteratively improve the accuracy, sensitivity, and specificity of IMU-based exercise technique biofeedback systems and increase the number of exercises that can be analyzed with IMUs.

IMU-based exercise biofeedback systems should be able to assess technique for a comprehensive range of exercises, both accurately and in a manner that is practical for people completing the exercises. There are a number of considerable challenges in the creation of such biofeedback systems. First, for machine learning classification algorithms to produce desirable results, they require large volumes of training data. As such, it is difficult to collect IMU data on a large variety of exercises in a research environment. Consequently, current research has mainly assessed very commonly completed exercises that span the scope of musculoskeletal screening, rehabilitation, and S&C. There remain thousands of exercises for which the ability of IMUs to assess their technique is unknown. Classification algorithms such as random forests and logistic regression also require balanced training datasets, where each class (eg, acceptable or aberrant) has the same number of instances in the training data [23-25]. This poses a considerable challenge in creating systems that aim to detect natural technique deviations that occur idiosyncratically and at greatly differing frequencies. This challenge is heightened in circumstances where the intersubject variation of completing an exercise with acceptable form exceeds the intrasubject variation between one’s acceptable and aberrant form.

One way to address the aforementioned challenges may be to create individualized exercise classification systems. In this circumstance, a classifier is created using training data solely from the person whose exercise is to be assessed. Preliminary research has shown that such classifiers can produce superior accuracy as compared with global classification systems [26,27]. Additionally, some global classification systems have only been developed and evaluated with deliberately induced technique deviations [16,17]. Personalized systems may allow for many more exercises to be evaluated for a particular person performing the exercises and could allow for acute naturally occurring technique deviations to be detected with a single body-worn IMU where this has not been previously possible. The classifiers would also be less memory intensive and more efficient, as they are developed using smaller training datasets. However, to the best of the authors’ knowledge, there is a lack of tools currently available to efficiently capture and label IMU data during exercise to enable the efficient development of personalized exercise technique classification systems.

Objectives

Therefore, the purpose of this investigation was to create a tablet app that enabled efficient creation of personalized single IMU-based exercise biofeedback systems. We also sought to investigate the accuracy of this personalized system in a real-world evaluation using a sample of 4 compound lower limb exercises (lunges, single-leg squats, squats, and deadlifts) in 15 participants. In this paper, an overview of the developed app is first presented. An experimental evaluation of the system in the real world is then described.


Methods

System Overview

In exercise classification with IMUs, there exist a number of universal steps that allow for the development of exercise biofeedback systems [28]. First, IMU data must be collected from participants as they exercise. Each repetition of each exercise must be labeled by an exercise professional. The signals collected from the IMU must be filtered to eliminate unwanted noise, and additional signals may be computed that, for instance, describe the IMU’s three-dimensional (3D) orientation. The signals are segmented into epochs, each of which pertains to one repetition of an exercise. Features are computed from these segmented signals as described in the upcoming “Feature Computation and Classifier Creation” section. Finally, a classification model is trained using both the labels provided by an exercise professional and the features computed from the sensor signals that pertain to the same repetitions (Figure 1).

The tablet app, presented in this paper, allows for simultaneous IMU and video data capture. It then allows labeling of each IMU data epoch through reviewing its associated video epoch. Features are then automatically computed from the IMU signal epochs, and classifiers are built using these features and the labels provided by the exercise professional.

Figure 1. Steps involved in the development of an inertial measurement unit (IMU)–based exercise classification system.

Overview of Data Collection Tool

The tablet app was developed using Android Studio (Android, Google) and ran on a Samsung Galaxy S2 tablet. It contains a number of tabs that together automate the creation of personalized classification systems. Figure 2 demonstrates the processes involved and highlights the need for data labeling from an exercise professional. The various tabs within the app are demonstrated in Figure 3. The system can connect to a maximum of 5 Shimmer (Shimmer sensing) IMUs [29] and stream synchronized data from them simultaneously. All IMUs were automatically configured to stream triaxial accelerometer (±2 g), gyroscope (±500°/s), and magnetometer (±1.9 Ga) data at 51.2 Hz. These values were chosen, as they have previously been shown to be appropriate for the analysis of rehabilitation exercise with IMUs [15,18,19]. However, the sampling rate and sensor ranges may be insufficient for faster exercises such as jumping or plyometric exercises. Future iterations of the system will address this by allowing the exercise professional to select sampling rate and sensor ranges based on exercise type before data collection. For this study, the lead investigator calibrated the IMU, which took roughly 10 min. The app then allows for the automation of all the aforementioned steps in the development of an exercise technique classification system as shown in Figures 1 and 2.

Figure 2. Schematic demonstrating the flow and functionality of the tablet app.


Figure 3. Home screen of tablet app, demonstrating its variety of functions.

Video and IMU Data Collection

Following sensor setup, navigating to the “Record a New Session” tab allows an exercise professional to take a video of their client as they exercise, while data from the IMUs are simultaneously collected. The video is captured at the tablet’s native frame rate, and IMU data are collected at 51.2 Hz (Figure 4). The exercise professional may choose to record their client from the frontal or sagittal plane depending on the exercise being evaluated.

Figure 4. Data capture part of the app that allows IMU (inertial measurement units) data and video to be captured simultaneously.

Signal Processing and Segmentation

Following the recording of a set of a particular exercise, the app completed a number of steps to process the IMU data. To ensure that the data analyzed applied to each participant’s movement and to eliminate unwanted high-frequency noise, 6 signals were low-pass filtered at fc=20 Hz using a Butterworth filter of order n=8. Nine additional signals were then calculated. The 3D orientation of the IMU was computed using the gradient descent algorithm developed by Madgwick et al [30]. The resulting quaternion values (W, X, Y, and Z) were then converted to pitch, roll, and yaw signals. The pitch, roll, and yaw signals describe the inclination, measured in radians, of each IMU in the sagittal, frontal, and transverse planes, respectively. The magnitude of acceleration was also computed using the vector magnitude of accelerometer x, y, and z. The magnitude of acceleration describes the total acceleration of the IMU in any direction; it reflects the combined inertial acceleration of the IMU and the acceleration due to gravity. Additionally, the magnitude of rotational velocity was computed using the vector magnitude of gyroscope x, y, and z. Although these magnitude signals do not allow for specific body segment planes to be analyzed, they can aid in detecting aberrant movement when deviations are very pronounced or occur in multiple planes.

The signals and video data were then programmatically segmented into epochs that relate to single full repetitions of the completed exercises. Many algorithms are available to segment human motion for rehabilitation exercises, including the sliding window algorithm [31]; top-down, bottom-up algorithms [32]; zero velocity-crossing algorithms; template-based matching methods [33]; and combinations of the above [34]. Each of these algorithms has advantages and disadvantages. For the purpose of creating a functioning classifier creation tool, a simple peak-detection algorithm was used on the gyroscope signal with the largest amplitude for any particular exercise. The start and end points of each repetition can then be found by looking for the corresponding zero-crossing points of the gyroscope signal leading up to and following the location of a peak in the signal. Figure 5 demonstrates example results of the segmentation algorithm used on the gyroscope Z signal, from an IMU positioned on the left thigh during 3 repetitions of the deadlift exercise.

Following the signal processing and segmentation of the IMU data, the video was cut into epochs based on the start and end points of repetitions found in the IMU data. The session name, exercise name, repetition number, IMU data, and video data for each individual exercise repetition were stored as objects in a database. The specific signal processing and segmentation processes were chosen based on their demonstrated capability in similar research [16-19]. In future iterations of the app, a variety of additional signal processing and segmentation options may be presented to the exercise professionals using the system, or the functions will be updated to match the emerging state of the art.
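The derived signals described in this section can be illustrated with a minimal sketch. It converts a unit quaternion (W, X, Y, Z) to pitch, roll, and yaw in radians and computes the vector magnitude of a triaxial sample. The aerospace (Z-Y-X) Euler convention used here is an assumption, since the paper does not state which convention the app applies, and the function names are hypothetical. How each angle maps onto the sagittal, frontal, and transverse planes also depends on how the sensor is oriented on the limb.

```python
import math


def quaternion_to_euler(w, x, y, z):
    """Convert a unit quaternion to (pitch, roll, yaw) in radians.

    Assumes the common Z-Y-X (yaw, pitch, roll) rotation sequence.
    """
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    # Clamp to [-1, 1] so rounding error cannot push asin out of its domain.
    sin_pitch = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return pitch, roll, yaw


def vector_magnitude(x, y, z):
    """Magnitude of a triaxial sample, e.g. accelerometer or gyroscope x, y, z."""
    return math.sqrt(x * x + y * y + z * z)


# A quarter turn about the sensor's z axis should appear as yaw only.
half_angle = math.pi / 4
pitch, roll, yaw = quaternion_to_euler(math.cos(half_angle), 0.0, 0.0, math.sin(half_angle))

# A stationary sensor measures only gravity, whichever axis it falls on.
resting_magnitude = vector_magnitude(0.0, 0.0, 9.81)
```

The resting example shows why the magnitude signal is orientation independent: gravity contributes the same value however the IMU is rotated.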

Figure 5. Plot showing detection of peak, start, and end points of repetitions through identifying neighboring zero-crossing values to the peak locations. The signal shown is the gyroscope Z signal from the left thigh during 3 repetitions of a deadlift.
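The peak and zero-crossing approach illustrated in Figure 5 can be sketched as follows. This is a simplified illustration under stated assumptions, not the app's implementation: the helper names, the `min_height` threshold, and the rule for merging several peaks within one positive lobe are hypothetical, and which zero crossings bound a full repetition will depend on the exercise and sensor axis.

```python
def find_peaks(signal, min_height):
    """Indices of local maxima whose value is at least min_height."""
    peaks = []
    for i in range(1, len(signal) - 1):
        if (signal[i] >= min_height
                and signal[i] > signal[i - 1]
                and signal[i] >= signal[i + 1]):
            peaks.append(i)
    return peaks


def segment_repetitions(signal, min_height):
    """Return (start, peak, end) sample indices, one tuple per repetition.

    start and end are the nearest samples at or below zero on either side
    of each peak (or the ends of the recording if none is found).
    """
    segments = []
    for peak in find_peaks(signal, min_height):
        start = peak
        while start > 0 and signal[start] > 0:
            start -= 1
        end = peak
        while end < len(signal) - 1 and signal[end] > 0:
            end += 1
        if segments and segments[-1][0] == start:
            # Second peak inside the same lobe: keep only the larger one.
            if signal[peak] > signal[segments[-1][1]]:
                segments[-1] = (start, peak, end)
            continue
        segments.append((start, peak, end))
    return segments


# Synthetic gyroscope trace containing two repetitions.
gyro_z = [0, 1, 3, 1, 0, -1, 0, 2, 5, 2, 0]
repetitions = segment_repetitions(gyro_z, min_height=2)
```

At the 51.2 Hz sampling rate used in this study, a sample index can be converted to seconds by dividing by 51.2, which is one way the video could be cut at matching times.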

Data Labeling The app enables a number of different functionalities regarding data labeling. The exercise professional using the tablet app first has the ability to add new exercises and technique deviations as possible labels for the stored and segmented data. These labels also become available to the exercise professional when they record new exercise sessions. The exercise professional then has the option of labeling the videos, repetition-by-repetition, through viewing them according to the filter criteria “session name” or by “exercise type.” The default class for all repetitions is “Acceptable” until they are labeled as “Aberrant” or as a specific deviation from an acceptable technique. An unlimited number of possible labels can be created for each exercise.

http://rehab.jmir.org/2017/2/e9/


Once data have been collected for each exercise, there is also an “Auto-label” function. This function uses data already labeled by the exercise professional to build a random forests classifier, which estimates the class for currently unlabeled data. As shown in Figure 6, the app then presents the classifier’s predicted label with the video of the repetition and allows the exercise professional to either keep or ignore the prediction. If the prediction is ignored, the repetition can then be labeled manually in the “review by exercise” or “review by session” tab. The database can also be manually updated at any time, allowing the exercise professional to remove particular repetitions or edit their current labels. Figure 6 highlights the app’s various data-labeling functionalities.
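The keep-or-ignore flow of the “Auto-label” function can be sketched as follows. This is an assumption-laden illustration: the trained classifier is represented by an arbitrary `predict` callable, the function names are hypothetical, and the threshold rule in the usage lines is only a stand-in for a trained random forests model.

```python
from typing import Callable, Dict, Optional, Sequence

Features = Sequence[float]


def propose_labels(
    unlabeled: Dict[int, Features],
    predict: Callable[[Features], str],
) -> Dict[int, str]:
    """Ask the classifier for a label for every unlabeled repetition."""
    return {rep_id: predict(features) for rep_id, features in unlabeled.items()}


def review_proposals(
    proposals: Dict[int, str],
    keep: Dict[int, bool],
) -> Dict[int, Optional[str]]:
    """Keep a proposed label only where the professional accepted it.

    Ignored proposals map to None, meaning the repetition still needs a
    manual label in one of the review tabs.
    """
    return {
        rep_id: (label if keep.get(rep_id, False) else None)
        for rep_id, label in proposals.items()
    }


# Stand-in classifier for the example only (not a random forest).
def toy_predict(features: Features) -> str:
    return "Aberrant" if features[0] > 1.0 else "Acceptable"


proposals = propose_labels({7: [0.4], 8: [1.6]}, toy_predict)
print(review_proposals(proposals, keep={7: True, 8: False}))
```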

JMIR Rehabil Assist Technol 2017 | vol. 4 | iss. 2 | e9 | p.5 (page number not for citation purposes)


JMIR REHABILITATION AND ASSISTIVE TECHNOLOGIES

O'Reilly et al

Figure 6. Various data labeling functionalities of the app.

Feature Computation and Classifier Creation

Once the data have been labeled as desired by the exercise professional, the app can then build the personalized exercise technique classification objects for each client and each exercise they completed. A separate classifier is created for each different exercise. Time-domain and frequency-domain descriptive features are computed to describe the pattern of each of the 18 signals when the 5 different exercises were completed. These features were, namely, “Mean,” “RMS,” “Standard Deviation,” “Kurtosis,” “Median,” “Skewness,” “Range,” “Variance,” “Max,” “Min,” “Energy,” “25th Percentile,” “75th Percentile,” “Level Crossing Rate,” “Fractal Dimension” [35], and the “variance of both the approximate and detailed wavelet coefficients using the Daubechies 5 mother wavelet to level 7.” This resulted in 17 features for each of the 18 available signals, producing a total of 306 features per IMU. Training data are balanced to ensure the developed classifiers are unbiased. This is done by removing random observations of overrepresented classes until all classes have an equal number of observations. For instance, if a labeled dataset of squat repetitions has 50 “acceptable” repetitions and 40 “aberrant” repetitions, 10 “acceptable” repetitions, which are chosen randomly using a programmatic method, will not be used to train the classifier. Finally, the app builds random forests classifier objects with 400 trees.
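A minimal sketch of two of these steps, computing a subset of the time-domain features for one signal and balancing the training classes by random undersampling, is given below. It assumes particular definitions that the text does not specify: population (not sample) standard deviation and variance, energy as the sum of squared samples, and level crossing rate as the fraction of consecutive sample pairs that cross the signal mean. Kurtosis, skewness, fractal dimension, and the wavelet features are omitted, and all function names are hypothetical.

```python
import math
import random
import statistics
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def describe_signal(x: Sequence[float]) -> Dict[str, float]:
    """Compute a subset of the listed time-domain features for one signal."""
    mean = statistics.fmean(x)
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    # Assumed definition: count sign changes of (sample - mean).
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a - mean) * (b - mean) < 0)
    energy = sum(v * v for v in x)  # assumed definition: sum of squares
    return {
        "mean": mean,
        "rms": math.sqrt(energy / len(x)),
        "std": statistics.pstdev(x),
        "variance": statistics.pvariance(x),
        "median": statistics.median(x),
        "range": max(x) - min(x),
        "max": max(x),
        "min": min(x),
        "energy": energy,
        "p25": q1,
        "p75": q3,
        "level_crossing_rate": crossings / (len(x) - 1),
    }


def balance_classes(
    samples: Sequence, labels: Sequence[Hashable], seed: int = 0
) -> List[Tuple[object, Hashable]]:
    """Randomly drop observations of overrepresented classes.

    Every class is reduced to the size of the smallest class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = min(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        balanced.extend((item, label) for item in rng.sample(items, target))
    return balanced
```

With 50 “acceptable” and 40 “aberrant” repetitions, `balance_classes` returns 80 observations, 40 per class, which mirrors the squat example in the text. The balanced feature vectors would then be passed to a random forests implementation with 400 trees.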

The choice of features computed, balancing of training data, and use of a random forests classifier all replicate recently published work in the field [15-18]. Similar to signal processing and segmentation, these processes can be updated in future iterations of the app to match the emerging state of the art in exercise technique classification with IMUs.

The developed classifier objects can then be exported from within the tablet app to individuals’ exercise biofeedback apps on their mobile phones for use in monitoring their rehabilitation exercise programs.

System Evaluation

Participants

Fifteen volunteers who were not currently undergoing any rehabilitation participated. No participant had a current or recent musculoskeletal injury that would impair their exercise performance. Participants were recruited via poster advertisements on notice boards in the local area and were, therefore, a sample of convenience. Of these, 5 participants were beginner exercisers who had been screened to have naturally aberrant technique and were untrained in the exercises in the study, whereas 10 participants were experienced with the exercises and were required to deliberately mimic aberrant technique at appropriate times during the experiment. Each participant signed a consent form before completing the study. The University College Dublin Human Research Ethics Committee approved the study protocol.

Experimental Protocol

The testing protocol was explained to the participants upon their arrival at the research laboratory. Their gender was recorded and their weight was measured using a weighing scale. Height was then measured with a stadiometer. All participants completed a 5-min warm-up on an exercise bike, during which they were required to maintain a power output of 100 W and a cadence of 75 to 85 revolutions per minute. Following the warm-up, an investigator placed a single IMU on the participant at the midpoint of the left femur (determined as halfway between the greater trochanter and lateral femoral condyle). The orientation and location of the IMU were consistent across all study participants. The IMU sampling rate and sensor range settings used were identical to those described in the “Overview of tool” section.

Video and IMU data were then simultaneously collected as the participant completed the following 4 exercises: bodyweight left-leg single-leg squats; bodyweight lunges; bodyweight or barbell squats; and barbell deadlifts. These exercises were chosen pragmatically, as they represent compound lower limb exercises that span both the late-stage rehabilitation (knee, hip, and ankle) and S&C domains. They also cannot be easily analyzed by any existing systems. Forty repetitions of each exercise were collected; 20 repetitions were completed with “acceptable” form, whereas 20 repetitions were completed with “aberrant” form. The “aberrant” repetitions from the 5 beginners were naturally occurring, whereas the 10 experienced participants deliberately induced their “aberrant” form. Following this data collection, the IMU was removed from the participant’s left thigh.

As the participant rested, the exercise professional then used the segmented videos to label all exercise repetitions of the 4 exercises as being “acceptable” or “aberrant” technique (160 repetitions per participant). For each participant, 4 binary random forests classifiers were then created, each pertaining to 1 of the 4 aforementioned exercises. These random forests objects were imported into a biofeedback app. The data labeling and classifier creation took a maximum of 30 min per participant. The biofeedback app entitled “Formulift” (Figure 7) allows a person performing the exercises to connect to a Shimmer IMU, select each of the above exercises, and have their repetitions of each exercise be classified as “acceptable” or “aberrant.”

Following the creation of their personalized biofeedback system, the participants first secured the IMU to their left thigh by themselves and connected the wireless Shimmer IMU to the mobile app. These steps took roughly 1 min. They then completed 2 sets of 10 repetitions for each of the 4 exercises. In the first set of each exercise, they were instructed to exercise with their best possible technique, and in the second, they were asked to try to replicate the mistake they had made before being coached by the exercise professional. A video of the whole session was recorded simultaneously, and the classifier’s predictions of the participants’ technique were stored in the background storage folders on the tablet.

Following the participants’ use of their personalized biofeedback app, the system’s predicted labels (acceptable or aberrant) for each repetition of each exercise were stored. The videos of each repetition of each exercise were then labeled by an S&C coach with more than 5 years’ experience in visual analysis of the exercises. They were labeled as acceptable or aberrant in a systematic format. The S&C coach could view the repetitions as many times as necessary to make a clear judgment on the label. Labeling all data for each beginner participant took under 25 min and was quicker for the experienced participants as their aberrant form was deliberately induced. Example types of aberrant form that the exercise professional was looking for included knee valgus, knee varus, and asymmetry as used in similar recent research [16-19].

Figure 7. Screenshot from the “Formulift” app, which uses the classifiers developed from the tablet app to analyze whether a person’s exercise technique is acceptable or aberrant as they complete squats, deadlifts, lunges, and single-leg squats.

Data Analyses

The labels predicted by the personalized classifiers were then compared with the exercise professional’s labels, which were considered to be ground truth for each repetition of each exercise from each participant. Where the exercise professional had labeled a repetition as “acceptable” and the classifier predicted “acceptable,” this was counted as a true positive (TP). However, if the classifier predicted “aberrant” in this circumstance, a false negative (FN) was counted. If the exercise professional and classifier both deemed a repetition to be “aberrant,” it was counted as a true negative (TN). However, if the exercise professional deemed a repetition to be “aberrant” and the classifier predicted it as “acceptable,” this was counted as a false positive (FP). The scores used to measure the quality of classification were total accuracy, sensitivity, and specificity. Accuracy is the number of correctly classified repetitions of all the exercises divided by the total number of repetitions completed. This is calculated as the sum of the TPs and TNs divided by the sum of the TPs, FPs, TNs, and FNs. Sensitivity measures the effectiveness of a classifier at identifying a desired label, whereas specificity measures the classifier’s ability to detect negative labels. These three metrics were used to assess the classification quality of each individual participant for each of the 4 exercises completed. The formulae for accuracy, sensitivity, and specificity are shown in Figure 8.

Figure 8. Formulae for: a) accuracy, b) sensitivity, and c) specificity.
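The three scores can be computed directly from the counts defined above. The sketch below uses the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)) with “acceptable” as the positive class, as in the text; the function names and the example labels are illustrative assumptions, not data from the study.

```python
from typing import Dict, Sequence


def confusion_counts(
    truth: Sequence[str], predicted: Sequence[str], positive: str = "acceptable"
) -> Dict[str, int]:
    """Count TP, FN, TN, and FP with 'acceptable' as the positive class."""
    counts = {"TP": 0, "FN": 0, "TN": 0, "FP": 0}
    for t, p in zip(truth, predicted):
        if t == positive:
            counts["TP" if p == positive else "FN"] += 1
        else:
            counts["FP" if p == positive else "TN"] += 1
    return counts


def classification_scores(counts: Dict[str, int]) -> Dict[str, float]:
    """Return accuracy, sensitivity, and specificity as fractions in [0, 1]."""
    tp, fn, tn, fp = counts["TP"], counts["FN"], counts["TN"], counts["FP"]
    nan = float("nan")  # returned when a class has no repetitions
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Made-up example: 10 acceptable and 10 aberrant repetitions for one exercise.
truth = ["acceptable"] * 10 + ["aberrant"] * 10
predicted = ["acceptable"] * 9 + ["aberrant"] + ["aberrant"] * 8 + ["acceptable"] * 2
print(classification_scores(confusion_counts(truth, predicted)))
```

In this made-up case, 17 of 20 repetitions are classified correctly, giving an accuracy of 0.85, a sensitivity of 0.9, and a specificity of 0.8.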

Results

Participant Demographics

The demographics of the participants were as follows: 12 males, 3 females, age: 23.8 [standard deviation, SD 1.8] years, height: 1.79 [SD 0.07] m, body mass: 78.4 [SD 9.6] kg. Each participant’s characteristics are shown in Table 1.

Table 1. Participant characteristics.

| Type | Gender | Age, in years | Height, in meters | Weight, in kilograms |
|---|---|---|---|---|
| Beginner | Male | 20 | 1.68 | 66.5 |
| Beginner | Male | 25 | 1.75 | 68 |
| Beginner | Male | 22 | 1.76 | 76 |
| Beginner | Female | 26 | 1.74 | 86 |
| Beginner | Female | 26 | 1.7 | 65 |
| Experienced | Male | 23 | 1.85 | 85 |
| Experienced | Female | 21 | 1.77 | 72.5 |
| Experienced | Male | 24 | 1.88 | 86 |
| Experienced | Male | 25 | 1.83 | 74 |
| Experienced | Male | 26 | 1.7 | 63 |
| Experienced | Male | 23 | 1.75 | 83 |
| Experienced | Male | 25 | 1.805 | 84 |
| Experienced | Male | 22 | 1.93 | 86 |
| Experienced | Male | 24 | 1.775 | 84 |
| Experienced | Male | 25 | 1.88 | 97 |
| Mean (SD^a) | | 23.8 (1.8) | 1.79 (0.07) | 78.4 (9.6) |

^a SD: standard deviation.


System Evaluation Results

Table 2 demonstrates the mean accuracy, sensitivity, and specificity scores for all participants using their 4 personalized classifiers for each exercise under study, in the real-world evaluation as described in the “System Evaluation” section. The mean results for the 5 beginner participants who had naturally aberrant technique and for the more experienced participants who had deliberately induced technique mistakes are shown. The system was more accurate for the experienced exercisers’ group (98.59%) than the beginners’ group (88.00%) for the deadlift exercise but was otherwise more accurate for the beginners. This is particularly interesting as the beginners’ technique aberrations were naturally occurring, and the experienced group’s aberrations were deliberately induced. The system was least accurate for lunges (84.14%) and most accurate for single-leg squats (97.26%) across all participants. Accuracy varied considerably for each individual in the lunge and squat exercises, as can be seen in the presented standard deviations (Table 2). Accuracy was less variable across participants for the single-leg squat and deadlift exercises. For the single-leg squat exercise, the mean sensitivity was 98% and the mean specificity was 93%. This means the system was better at detecting acceptable single-leg squat technique than aberrant technique: 7% of aberrant repetitions were misclassified as acceptable, compared with 2% of acceptable repetitions misclassified as aberrant. The system had relatively similar sensitivity and specificity in classifying lunges and deadlifts. Therefore, it would not appear biased to either the “acceptable” or “aberrant” class to an exerciser using the system. However, for the squat exercise there was a 13% chance that an acceptable repetition may be classified as aberrant and a 17% chance that an aberrant repetition may be classified as acceptable.

Table 2. Mean accuracy, sensitivity, and specificity of personalized classifiers for the binary evaluation (acceptable or aberrant technique) of each exercise and each participant.

| Exercise | Participants | Accuracy, mean (SD^a), % | Sensitivity, mean (SD), % | Specificity, mean (SD), % |
|---|---|---|---|---|
| Single-leg squats | Beginners (N=5) | 99.17 (1.86) | 100.00 (0.00) | 98.33 (3.73) |
| Single-leg squats | Experienced (N=10) | 95.98 (6.69) | 97.00 (4.83) | 90.41 (15.24) |
| Single-leg squats | All (N=15) | 97.26 (5.54) | 98.00 (4.00) | 93.03 (19.09) |
| Lunges | Beginners (N=5) | 92.63 (10.5) | 96.67 (7.45) | 88.70 (16.36) |
| Lunges | Experienced (N=10) | 77.77 (21.26) | 74.07 (3.19) | 83.82 (32.17) |
| Lunges | All (N=15) | 84.14 (18.96) | 83.11 (27.49) | 85.78 (20.85) |
| Squats | Beginners (N=5) | 84.83 (16.58) | 75.00 (35.47) | 95.00 (5.00) |
| Squats | Experienced (N=10) | 82.71 (15.43) | 90.98 (15.25) | 74.44 (32.01) |
| Squats | All (N=15) | 84.53 (16.38) | 87.06 (27.53) | 82.67 (29.00) |
| Deadlifts | Beginners (N=5) | 88.00 (8.16) | 84.00 (16.25) | 90.00 (2.00) |
| Deadlifts | Experienced (N=10) | 98.59 (2.71) | 98.15 (3.55) | 98.99 (2.86) |
| Deadlifts | All (N=15) | 94.81 (7.93) | 93.10 (13.35) | 95.78 (14.35) |

^a SD: standard deviation.

Discussion

System Development

The tool described in this paper successfully automates the process of creating personalized IMU-based exercise technique classification systems. The previously laborious sequence of data collection, data labeling, and data analyses in software such as MATLAB (MathWorks, Natick) has been streamlined as an Android tablet app that can be used by an exercise professional. The app eliminates the need for a data analysis professional to develop the classification systems by automating the common steps in the development of such systems (Figure 1). A key benefit of this tool for exercise professionals is that it allows rapid development of personalized exercise feedback systems tailored to their clients’ exercise needs and specific movement patterns.

There are a number of notable benefits to taking an individualized analysis approach to the development of IMU-based exercise technique analysis systems. Recent work has shown such systems to be more accurate and computationally efficient than global classifiers [27]. The development of global classifiers is extremely time-intensive and requires hundreds of hours of data collection and analysis by researchers. Data must be collected in this fashion for any exercise for which a technique classifier is desired. This means that, currently, only a handful of exercises have been shown to be assessable with IMUs. The system described in this paper should allow for the creation of a personalized exercise classifier for any rehabilitation or S&C exercises that are cyclical and repetition based. Therefore, clinicians would not be limited in their exercise choices when designing specific programs to meet their clients’ needs. The app described in this paper could conceivably be used by a clinician during a patient’s visit to their clinic, and then the data labeled from this session could be used to create a functioning analysis tool for the patient’s program, which they may complete in the absence of professional supervision.

System Evaluation

The preliminary evaluation of the system also suggests that the accuracy, sensitivity, and specificity of the personalized exercise technique classifiers may exceed that of global exercise technique classification systems. This reflects other similar research that compared sensor setups and classification methodologies for the barbell squat and deadlift exercises [27]. Although it is difficult to make direct comparisons with the previous research, it can be noted that a single IMU positioned on the left thigh has been demonstrated as capable of assessing acceptable or aberrant lunge technique with 77% accuracy [17] and single-leg squat technique with 75% accuracy [18]. These values were computed using leave-one-subject-out cross-validation. The personalized systems, evaluated in the real world, achieved 84% and 97% accuracy for the same analysis of lunges and single-leg squats, respectively. The binary classification of squat technique has previously been shown to be 80% accurate in a global classification system using a single lumbar-worn IMU [16]. The individualized systems described in this paper ranged from 50% to 100% accuracy and had a mean value of 85% across the 15 participants. It can also be noted that the deviations collected from the 5-participant beginner group used for analysis in this paper were naturally occurring, whereas in the aforementioned lunge and squat global classifiers, the deviations from correct technique were deliberately induced by study participants. This may make individualized classifiers more functional and usable in the real world. This paper’s deadlift accuracy result of 95% exceeds recently published work on binary classification of the deadlift with a left thigh IMU where 84% accuracy was achieved [27]. This is likely because there was more training data for each individual in this study. The personalized classification systems used in this preliminary evaluation of the tablet app were developed using 4 sets of each exercise (a total of 40 repetitions). Increasing the amount of training data used for each individual would likely further improve the accuracy of their personalized exercise technique evaluation system [24,25].
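The difference between the two evaluation schemes compared above can be illustrated with index bookkeeping alone. In the sketch below, which uses hypothetical function names and made-up participant IDs, leave-one-subject-out cross-validation trains a global classifier on every participant except one and tests on the held-out participant, whereas a personalized classifier draws its training data only from the individual who will later use it.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_subject_out(
    subject_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def personalized_indices(subject_ids: Sequence[str], subject: str) -> List[int]:
    """Return the repetitions available to train one subject's own classifier."""
    return [i for i, s in enumerate(subject_ids) if s == subject]


# One entry per labeled repetition; IDs are made up for the example.
ids = ["p1", "p1", "p2", "p3", "p3"]
for subject, train, test in leave_one_subject_out(ids):
    print(subject, train, test)
print(personalized_indices(ids, "p3"))
```

Because the personalized systems in this paper were tested on new repetitions recorded in a later session, their scores and the leave-one-subject-out scores are not directly comparable, which is the caveat the text raises.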

Limitations

There are a number of contextual factors to this study that should be considered. Most notably, although the tool described allows for the efficient creation of an IMU-based exercise technique classifier for any cyclical, repetition-based exercise, it is not as simple as using a global classification system for exercises for which one exists. The tool described requires at least one recorded session with an exercise professional and requires the exercise professional’s time and expertise to label the video data. However, the tool described could conceivably be used to fill in the gaps in a client’s exercise program where a global classifier is not yet available. Moreover, the labeled data can all be stored in a database, and the data that were initially used to create individualized classifiers can be pooled together to make a global classifier. The exercise professional could switch to this global classifier when they deem it accurate enough to negate the benefits of creating an individualized classifier for each of their clients.

A key factor that limits the findings of the evaluation study is that it was small scale, and the participants were not balanced in experience or gender. Moreover, the study participants were relatively homogeneous, and it is not yet understood whether the results found would be generalizable to other populations such as older, obese, or underweight people. In particular, the system evaluation was completed with individuals not currently undergoing rehabilitation. Future work should investigate the system with individuals undergoing rehabilitation. It is foreseen that it should still work, provided the exercise professional can label the data appropriately for each individual’s needs. The authors also acknowledge that more work is required to assess the capabilities of classifiers created with this new tool, particularly in the detection of exact deviations in exercise technique. The capabilities of a multiple IMU setup must be examined. However, the results presented show excellent potential for a single IMU setup to assess complex compound lower limb exercises when using personalized classifiers.

Future Work

It should be noted that this paper only describes the development of this new tool and its first evaluation. It is not yet fully understood how it will be incorporated into clinical practice. Future work should investigate the influence of the exercise professional’s experience on system accuracy and usability and how the system can be incorporated into a clinician’s use of time. Only 1 exercise professional labeled the data in the evaluation study. The coding was not compared with other professionals; this should be investigated in future studies. Finally, the tool described only replicates current state of the art in the field, and the signal processing, feature computation, and classification methods ought to be iterated as the field progresses.

Conclusions

In this paper, a tablet app that streamlines the creation of IMU-based exercise technique analysis systems is presented. The tool replicates the data analysis pathways that have been used in recently published research [16-19]. It also allows an exercise professional to record video data simultaneously to IMU data and label it efficiently, following a session with a client. The app then creates personalized exercise technique classifiers for the client based on the labeled IMU data. These personalized classifiers are less memory-intensive and more accurate than equivalent global classifiers for the exercises used in this study. In addition to this, data collected with the tool could ultimately be used to train new global classification systems with increased accuracy because of the increased amount of training data available.


Acknowledgments

This project is partly funded by the Irish Research Council as part of a Postgraduate Enterprise Partnership Scheme with Shimmer (EPSPG/2013/574) as well as the Science Foundation Ireland (SFI/12/RC/2289). The authors would like to thank UCD Sport (University College Dublin) for providing equipment that was used in this study.

Conflicts of Interest

While this work was cofunded by Shimmer, the developer of the IMU used in this study, we believe the results are generalizable to any IMU with the same sampling rate and on-board sensor ranges as described in this study. We can clarify that there was no outside business interest or other biasing factors from Shimmer during this study’s design and execution.

References

1. Health Quality Ontario. Physiotherapy rehabilitation after total knee or hip replacement: an evidence-based analysis. Ont Health Technol Assess Ser 2005;5(8):1-91. [Medline: 23074477]
2. Hernández-Molina G, Reichenbach S, Zhang B, Lavalley M, Felson DT. Effect of therapeutic exercise for hip osteoarthritis pain: results of a meta-analysis. Arthritis Rheum 2008 Sep 15;59(9):1221-1228. [doi: 10.1002/art.24010] [Medline: 18759315]
3. Zhang W, Doherty M, Bardin T, Pascual E, Barskova V, Conaghan P, EULAR Standing Committee for International Clinical Studies Including Therapeutics. EULAR evidence based recommendations for gout. Part II: Management. Report of a task force of the EULAR Standing Committee for International Clinical Studies Including Therapeutics (ESCISIT). Ann Rheum Dis 2006 Oct;65(10):1312-1324. [doi: 10.1136/ard.2006.055269] [Medline: 16707532]
4. Frontera WR, Meredith CN, O'Reilly KP, Knuttgen HG, Evans WJ. Strength conditioning in older men: skeletal muscle hypertrophy and improved function. J Appl Physiol (1985) 1988 Mar;64(3):1038-1044. [Medline: 3366726]
5. Ahtiainen JP, Pakarinen A, Alen M, Kraemer WJ, Häkkinen K. Muscle hypertrophy, hormonal adaptations and strength development during strength training in strength-trained and untrained men. Eur J Appl Physiol 2003 Aug;89(6):555-563. [doi: 10.1007/s00421-003-0833-3] [Medline: 12734759]
6. Kraemer WJ, Mazzetti SA, Nindl BC, Gotshalk LA, Volek JS, Bush JA, et al. Effect of resistance training on women's strength/power and occupational performances. Med Sci Sports Exerc 2001 Jun;33(6):1011-1025. [Medline: 11404668]
7. Bassett S. Measuring patient adherence to physiotherapy. J Nov Physiother 2012;02(07):60-66. [doi: 10.4172/2165-7025.1000e124]
8. Friedrich M, Cermak T, Maderbacher P. The effect of brochure use versus therapist teaching on patients performing therapeutic exercise and on changes in impairment status. Phys Ther 1996 Oct;76(10):1082-1088. [Medline: 8863761]
9. Giggins OM, Persson UM, Caulfield B. Biofeedback in rehabilitation. J Neuroeng Rehabil 2013 Jun 18;10:60. [doi: 10.1186/1743-0003-10-60] [Medline: 23777436]
10. McGrath D, Greene BR, O’Donovan KJ, Caulfield B. Gyroscope-based assessment of temporal gait parameters during treadmill walking and running. Sports Eng 2012 Jul 3;15(4):207-213. [doi: 10.1007/s12283-012-0093-8]
11. Chang K, Chen M, Canny J. Tracking free-weight exercises. 2007 Presented at: Ubiquitous Computing; September 16, 2007; Innsbruck, Austria. [doi: 10.1007/978-3-540-74853-3_2]
12. Fitzgerald D, Foody J, Kelly D, Ward T, Markham C, McDonald J, et al. Development of a wearable motion capture suit and virtual reality biofeedback system for the instruction and analysis of sports rehabilitation exercises. Conf Proc IEEE Eng Med Biol Soc 2007;2007:4870-4874. [doi: 10.1109/IEMBS.2007.4353431] [Medline: 18003097]
13. Seeger C, Buchmann A, Van LK. myHealthAssistant: a phone-based body sensor network that captures the wearer's exercises throughout the day. 2011 Presented at: 6th International Conference on Body Area Networks; November 07, 2011; Beijing, China p. 1-7. [doi: 10.1145/2320000/2318778]
14. Morris D, Saponas T, Guillory A, Kelner I. RecoFit: using a wearable sensor to find, recognize, and count repetitive exercises. 2014 Presented at: 32nd annual ACM conference on Human factors in computing systems (CHI); April 26, 2014; Toronto, Canada. [doi: 10.1145/2556288.2557116]
15. Giggins OM, Sweeney KT, Caulfield B. Rehabilitation exercise assessment using inertial sensors: a cross-sectional analytical study. J Neuroeng Rehabil 2014 Nov 27;11:158. [doi: 10.1186/1743-0003-11-158] [Medline: 25431092]
16. O'Reilly M, Whelan D, Chanialidis C, Friel N, Delahunt E, Ward T. Evaluating squat performance with a single inertial measurement unit. 2015 Presented at: IEEE 12th International Conference on Wearable and Implantable Body Sensor Networks; June 9, 2015; Boston, MA. [doi: 10.1109/BSN.2015.7299380]
17. Whelan D, O'Reilly M, Ward T, Delahunt E, Caulfield B. Evaluating performance of the lunge exercise with multiple and individual inertial measurement units. 2016 Presented at: 10th EAI International Conference on Pervasive Computing Technologies for Healthcare; May 16, 2016; Cancun, Mexico p. 16-19. [doi: 10.4108/eai.16-5-2016.2263319]
18. Whelan DF, O'Reilly MA, Ward TE, Delahunt E, Caulfield B. Technology in rehabilitation: evaluating the single leg squat exercise with wearable inertial measurement units. Methods Inf Med 2017 Mar 23;56(2):88-94. [doi: 10.3414/ME16-02-0002] [Medline: 27782290]
19. Whelan D, O'Reilly M, Ward T, Delahunt E, Caulfield B. Evaluating performance of the single leg squat exercise with a single inertial measurement unit. 2015 Presented at: Proceedings of the 3rd 2015 Workshop on ICTs for improving Patients Rehabilitation Research Techniques; October 01, 2015; Lisbon, Portugal. [doi: 10.1145/2838944.2838979]
20. Kianifar R, Lee A, Raina S, Kulic D. Classification of squat quality with inertial measurement units in the single leg squat mobility test. Conf Proc IEEE Eng Med Biol Soc 2016 Aug;2016:6273-6276. [doi: 10.1109/EMBC.2016.7592162] [Medline: 28269683]
21. Giggins O, Kelly D, Caulfield B. Evaluating rehabilitation exercise performance using a single inertial measurement unit. 2013 Presented at: 7th International Conference on Pervasive Computing Technologies for Healthcare; May 5, 2013; Venice, Italy. [doi: 10.4108/icst.pervasivehealth.2013.252061]
22. Johnston W, O'Reilly M, Dolan K, Reid N, Coughlan GF, Caulfield B. Objective classification of dynamic balance using a single wearable sensor. 2016 Presented at: 4th International Congress on Sport Sciences Research and Technology Support (icSports); November 1, 2016; Porto, Portugal. [doi: 10.5220/0006079400150024]
23. Moreno-Torres JG, Saez JA, Herrera F. Study on the impact of partition-induced dataset shift on k-fold cross-validation. IEEE Trans Neural Netw Learn Syst 2012 Aug;23(8):1304-1312. [doi: 10.1109/TNNLS.2012.2199516] [Medline: 24807526]
24. Sugawara E, Nikaido H. Properties of AdeABC and AdeIJK efflux systems of Acinetobacter baumannii compared with those of the AcrAB-TolC system of Escherichia coli. Antimicrob Agents Chemother 2014 Dec;58(12):7250-7257. [doi: 10.1128/AAC.03728-14] [Medline: 25246403]
25. Kotsiantis S. Supervised machine learning: a review of classification techniques. Informatica 2007;31:268. [doi: 10.1023/A:1014046307775]
26. Taylor PE, Almeida GJ, Hodgins JK, Kanade T. Multi-label classification for the analysis of human motion quality. 2012 Presented at: Conf Proc IEEE Engineering in Medicine and Biology Society; August 28, 2012; San Diego, CA p. 2214-2218. [doi: 10.1109/EMBC.2012.6346402]
27. O'Reilly MA, Whelan DF, Ward TE, Delahunt E, Caulfield BM. Classification of deadlift biomechanics with wearable inertial measurement units. J Biomech 2017 Jun 14;58:155-161. [doi: 10.1016/j.jbiomech.2017.04.028] [Medline: 28545824]
28. Whelan D, O'Reilly M, Huang B, Giggins O, Kechadi T, Caulfield B. Leveraging IMU data for accurate exercise performance classification and musculoskeletal injury risk screening. Conf Proc IEEE Eng Med Biol Soc 2016 Aug;2016:659-662. [doi: 10.1109/EMBC.2016.7590788] [Medline: 28268414]
29. Burns A, Greene B, McGrath M, O'Shea T, Kuris B, Ayer S. SHIMMER™ - a wireless sensor platform for noninvasive biomedical research. IEEE Sens J 2010;10(9):1527-1534. [doi: 10.1109/JSEN.2010.2045498]
30. Madgwick SO, Harrison AJ, Vaidyanathan A. Estimation of IMU and MARG orientation using a gradient descent algorithm. 2011 Presented at: IEEE International Conference on Rehabilitative Robotics; June 29, 2011; Zurich, Switzerland. [doi: 10.1109/ICORR.2011.5975346]
31. Shatkay H, Zdonik S. Approximate queries and representations for large data sequences. 1996 Presented at: Proceedings of the Twelfth International Conference on Data Engineering; February 26, 1996; Washington, DC. [doi: 10.1109/ICDE.1996.492204]
32. Duda RO, Hart PW, Strok DG. Pattern classification. New York, NY: John Wiley & sons cop; 1973.
33. Pomplun M, Mataric M. Evaluation metrics and results of human arm movement imitation. 2000 Presented at: First IEEE-RAS International Conference on Humanoid Robots; September 7, 2000; Boston, MA.
34. Lin JF, Kulić D. Online segmentation of human motion for automated rehabilitation exercise analysis. IEEE Trans Neural Syst Rehabil Eng 2014 Jan;22(1):168-180. [doi: 10.1109/TNSRE.2013.2259640] [Medline: 23661321]
35. Katz MJ, George EB. Fractals and the analysis of growth paths. Bull Math Biol 1985;47(2):273-286. [Medline: 4027437]
36. Matlab. The Mathworks. Natick, U.S.A. R2012b 2012.

Abbreviations
FN: false negative
FP: false positive
IMU: inertial measurement unit
S&C: strength and conditioning
SD: standard deviation
3D: three-dimensional
TN: true negative
TP: true positive

Edited by G Eysenbach; submitted 07.02.17; peer-reviewed by J Richards, V Gay, K Ng, N Ridgers; comments to author 06.05.17; revised version received 27.05.17; accepted 12.07.17; published 23.08.17 Please cite as: O'Reilly M, Duffin J, Ward T, Caulfield B Mobile App to Streamline the Development of Wearable Sensor-Based Exercise Biofeedback Systems: System Development and Evaluation JMIR Rehabil Assist Technol 2017;4(2):e9 URL: http://rehab.jmir.org/2017/2/e9/ doi:10.2196/rehab.7259 PMID:

©Martin O'Reilly, Joe Duffin, Tomas Ward, Brian Caulfield. Originally published in JMIR Rehabilitation and Assistive Technology (http://rehab.jmir.org), 23.08.2017. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Rehabilitation and Assistive Technology, is properly cited. The complete bibliographic information, a link to the original publication on http://rehab.jmir.org/, as well as this copyright and license information must be included.

JMIR MHEALTH AND UHEALTH

Dominguez Veiga et al

Original Paper

Feature-Free Activity Classification of Inertial Sensor Data With Machine Vision Techniques: Method, Development, and Evaluation

Jose Juan Dominguez Veiga1, BSc (Hons), MSc; Martin O'Reilly2, BEng (Hons), HDip; Darragh Whelan2, BSc (Hons), MSc; Brian Caulfield2, BSc (Hons), MSc, PhD; Tomas E Ward1, BE, MEngSc, PhD

1 Insight Centre for Data Analytics, Department of Electronic Engineering, Maynooth University, Maynooth, Ireland

2 Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland

Corresponding Author: Jose Juan Dominguez Veiga, BSc (Hons), MSc Insight Centre for Data Analytics Department of Electronic Engineering Maynooth University Bioscience and Engineering Building, North Campus Co. Kildare Maynooth, Ireland Phone: 353 17086000 Fax: 353 16289063 Email: [email protected]

Abstract

Background: Inertial sensors are one of the most commonly used sources of data for human activity recognition (HAR) and exercise detection (ED) tasks. The time series produced by these sensors are generally analyzed through numerical methods. Machine learning techniques such as random forests or support vector machines are popular in this field for classification efforts, but they need to be supported through the isolation of a potentially large number of additionally crafted features derived from the raw data. This feature preprocessing step can involve nontrivial digital signal processing (DSP) techniques. However, in many cases, the researchers interested in this type of activity recognition problem do not possess the necessary technical background for this feature-set development.

Objective: The study aimed to present a novel application of established machine vision methods to provide interested researchers with an easier entry path into the HAR and ED fields. This can be achieved by removing the need for deep DSP skills through the use of transfer learning, that is, by reusing a pretrained convolutional neural network (CNN) developed for machine vision purposes for the exercise classification effort. The new method should simply require researchers to generate plots of the signals that they would like to build classifiers with, store them as images, and then place them in folders according to their training label before retraining the network.

Methods: We applied a CNN, an established machine vision technique, to the task of ED. TensorFlow, a high-level framework for machine learning, was used to facilitate infrastructure needs. Simple time series plots generated directly from accelerometer and gyroscope signals were used to retrain an openly available neural network (Inception), originally developed for machine vision tasks. Data were collected from 82 healthy volunteers performing 5 different exercises while wearing a lumbar-worn inertial measurement unit (IMU). The ability of the proposed method to automatically classify the exercise being completed was assessed using this dataset. For comparative purposes, classification using the same dataset was also performed using the more conventional approach of feature extraction and classification using random forest classifiers.

Results: With the collected dataset and the proposed method, the different exercises could be recognized with 95.89% (3827/3991) accuracy, which is competitive with current state-of-the-art techniques in ED.

Conclusions: The high level of accuracy attained with the proposed approach indicates that the waveform morphologies in the time-series plots for each of the exercises are sufficiently distinct among the participants to allow the use of machine vision approaches. The use of high-level machine learning frameworks, coupled with the novel use of machine vision techniques instead of complex manually crafted features, may facilitate access to research in the HAR field for individuals without extensive digital signal processing or machine learning backgrounds.

(JMIR Mhealth Uhealth 2017;5(8):e115) doi:10.2196/mhealth.7521 KEYWORDS machine learning; exercise; biofeedback

Introduction

Background

Inertial sensors are ubiquitous in everyday objects such as mobile phones and wristbands and can provide large amounts of data regarding movement activity. Analysis of such data can be diverse, but in general terms it can be characterized as complex operations using a broad range of machine learning techniques and highly sophisticated signal processing methods. The latter are required to extract salient features that can improve recognition performance. These features are not only complex to calculate; it is also difficult to make a priori reasoned arguments for their effectiveness in improving overall results. The temptation to include additional features in an attempt to improve classification accuracy may result in pipelines (infrastructure) with excessive complexity, yielding slower processing and increased resource usage. To counter this proliferation of features, it is common to use dimensionality reduction techniques, including linear approaches such as principal component analysis and, increasingly, nonlinear methods principally based on manifold learning algorithms.

In contrast to this complex tooling, we propose a method to classify human activity from inertial sensor data based on images and using deep learning-based machine vision techniques. This approach reduces the amount of deep domain knowledge needed in terms of digital signal processing (DSP) to some basic steps of preprocessing and segmentation, substituting instead a neural network that can learn the appropriate features independently of a user-driven feature candidature step. Convolutional networks are not trivial to work with, but the recent availability of higher level deep learning frameworks such as TensorFlow [1] and the use of transfer learning, a technique to reuse already trained convolutional neural networks (CNNs), considerably reduce the skills needed to set up and operate such a network.
In this study, we sought to demonstrate a novel application of machine vision techniques as a classification method for inertial measurement unit (IMU) data. The main goal of this work was to develop a novel data analysis pathway for researchers who are most interested in this type of work, such as medical and exercise professionals. These individuals may not have the technical background to implement existing state-of-the-art data analysis pathways. We also aimed to evaluate the efficacy of our new classification technique by attempting to detect five commonly completed lower-limb exercises (squats, deadlifts, lunges, single-leg squats, and tuck jumps) using the new data analysis pathway. The accuracy, sensitivity, and specificity of the pathway were compared with recently published work on the same dataset.

Related Work

The three main topics in this section are as follows: (1) a brief overview of the current human activity recognition (HAR) and exercise detection (ED) literature, (2) an account of some of the newer advances in the field that are using neural networks for certain parts of the feature discovery and reduction process, and (3) an introduction to transfer learning, highlighting its benefits in terms of time and resource savings, and working with smaller datasets.

Activity Classification for Inertial Sensor Data

Over the past 15 years, inertial sensors have become increasingly ubiquitous due to their presence in mobile phones and wearable activity trackers [2]. This has enabled countless applications in the monitoring of human activity and performance, spanning general HAR, gait analysis, the military field, the medical field, and exercise recognition and analysis [3-6]. Across all these application spaces, there are common challenges that must be overcome and common steps that must be implemented to successfully create functional motion classification systems.

Human activity recognition with wearable sensors usually pertains to the detection of gross motor movements such as walking, jogging, cycling, swimming, and sleeping [5,7]. In this field of motion tracking with inertial sensors, the key challenges are often considered to be (1) the selection of the attributes to be measured; (2) the construction of a portable, unobtrusive, and inexpensive data acquisition system; (3) the design of feature extraction and inference methods; (4) the collection of data under realistic conditions; (5) the flexibility to support new users without the need for retraining the system; and (6) the implementation in mobile devices meeting energy and processing requirements [3,7]. With the ever-increasing computational power and battery life of mobile devices, many of these challenges are becoming easier to overcome. Whereas system functionality is dependent on hardware constraints, the accuracy, sensitivity, and specificity of HAR systems are most reliant on building large, balanced, labeled datasets; the identification of strong features for classification; and the selection of the best machine learning method for each application [3,8-10].
Investigating the best features and machine learning methods for each HAR application requires a large amount of time and an individual or team appropriately skilled in signal processing and machine learning. They must understand how to compute time-domain, frequency-domain, and time-frequency domain features from inertial sensor data and train and evaluate multiple machine learning methods (eg, random forests [11], support vector machines [12], k-nearest neighbors [13], and logistic regression [14]) with such features [3-5]. This means that those who may be most interested in the output of inertial sensor based activity recognition systems (eg, medical professionals, exercise professionals, and biomechanists) are unable to design and create the systems without significant engagement with machine learning experts [4].

The above challenges in system design and implementation are replicated in activity recognition pertaining to more specific or acute movements. In the past decade, there has been a vast amount of work in the detection and quantification of specific rehabilitation and strength and conditioning exercises [15-17]. Such work has also endeavored to detect aberrant exercise technique and specific mistakes that system users make while exercising, which can increase their chance of injury or decrease their body's beneficial adaptation due to the stimulus of exercise [17,18]. The key steps in the development of such systems have been recently outlined as (1) inertial sensor data collection, (2) data preprocessing, (3) feature extraction, and (4) classification (Figure 1) [4]. Whereas the first step can generally be completed by exercise professionals (eg, physiotherapists and strength and conditioning coaches), the remaining steps require skills outside those included in the training of such experts. Similarly, when analyzing gait with wearable sensors, feature extraction and classification have been highlighted as essential in the development of each application [19,20]. This again limits the type of professional who can create such systems and the rate at which hypotheses for new systems can be tested.
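To make the feature extraction step (step 3) more concrete, the following is a minimal sketch of the kind of hand-crafted time-domain features that a conventional pipeline might compute for each repetition. The function name, the chosen features, and the sample values are illustrative assumptions and are not taken from the cited studies.

```python
import math
import statistics


def time_domain_features(signal):
    """Return a few common time-domain descriptors of one signal epoch."""
    return {
        "mean": statistics.fmean(signal),
        "sd": statistics.stdev(signal),  # sample standard deviation
        "rms": math.sqrt(sum(x * x for x in signal) / len(signal)),
        "range": max(signal) - min(signal),
    }


# Hypothetical gyroscope samples (deg/s) for a single repetition.
epoch = [0.0, 10.0, 20.0, 10.0, 0.0, -10.0, -20.0, -10.0]
features = time_domain_features(epoch)
```

A real system would typically compute many such values for every axis of every sensor, together with frequency-domain and time-frequency features, which is where most of the required signal processing expertise lies.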

Figure 1. Steps involved in the development of an inertial measurement unit (IMU)-based exercise classification system.

Neural Networks and Activity Recognition

In the past few years, CNNs have been applied in a variety of manners to HAR, in the fields of both ambient and wearable sensing. Mo et al applied a novel approach utilizing machine vision methods to recognize twelve daily living tasks with the Microsoft Kinect. Rather than extract features from the Kinect data streams, they developed 144×48 images using 48 successive frames from skeleton data and, for each frame, 15×3 joint position coordinates and 11×3×3 joint rotation matrices (45+99=144 values). These images were then used as input to a multilayer CNN, which automatically extracted features from the images that were fed into a multilayer perceptron for classification [21]. Stefic and Patras utilized CNNs to extract areas of gaze fixation in raw image training data as participants watched videos of multiple activities [22]. This produced strong results in identifying salient regions of images that were then used for action recognition. Ma et al also combined a variety of CNNs to complete tasks, such as segmenting hands and objects from first-person camera images and then using these segmented images and motion images to train an action-based and motion-based CNN [23]. This novel use of CNNs allowed an increase in activity recognition rates of 6.6%, on average. These research efforts demonstrated the power of utilizing CNNs in multiple ways for HAR.

Research utilizing CNNs for HAR with wearable inertial sensors has also been published recently. Zeng et al implemented a method based on CNNs which captures the local dependency and scale invariance of an inertial sensor signal [24]. This allows features for activity recognition to be identified automatically. The motivation for developing this method was the difficulty of identifying strong features for HAR. Yang et al also highlighted the challenge and importance of identifying strong features for HAR [25]. They also employed CNNs for feature learning from raw inertial sensor signals. The strength of CNNs in HAR was again demonstrated here, as their use in this circumstance outperformed, on multiple datasets, other HAR algorithms that utilized heuristic hand-crafting of features or shallow learning architectures for feature learning. Radu et al also recently demonstrated that the use of CNNs to identify discriminative features for HAR when using multiple sensor inputs from various mobile phones and smartwatches, which have different sampling rates, data generation models, and sensitivities, outperforms classic methods of identifying such features [26]. The implementation of such feature learning techniques with CNNs is clearly beneficial but is complex and may not be suitable for HAR system developers without strong experience in machine learning and DSP. From a CNN perspective, these results are interesting and suggest significant scope for further exploration for machine learning researchers. However, for the purposes of this paper, they are included both to succinctly acknowledge that CNNs have been applied to HAR previously and to distinguish the present approach, which seeks to use well-developed CNN platforms tailored for machine vision tasks in a transfer learning context for HAR, using basic time series plots as the only user-created features.

Transfer Learning in Machine Vision

Deep learning-based machine vision techniques are used in many disciplines, from speech, video, and audio processing [27], through to HAR [21] and cancer research [28]. Training deep neural networks is a time-consuming and resource-intensive task, needing not only specialized hardware (graphics processing unit [GPU]) but also large datasets of labeled data. Unlike other machine learning techniques, once the training work is completed, querying the resulting models to predict results on new data is fast. In addition, trained networks can be repurposed for other specific uses which are not required to be known in advance of the initial training [29]. This arises from the generalized vision capabilities that can emerge with suitable training. More precisely, each layer of the network learns a number of features from the input data, and that knowledge is refined through iterations. In fact, the learning that happens at different layers seems to be nonspecific to the dataset, including the identification of simple edges in the first few layers, the subsequent identification of boundaries and shapes, and growing toward object identification in the last few layers. These learned visual operators are applicable to other sets of data [30]. Transfer learning, then, is the generic name given to a classification effort in which a pretrained network is reused for a task for which it was not specifically trained. Deep learning frameworks such as Caffe [31] and TensorFlow can make use of pretrained networks, many of which have been made available by researchers in repositories such as the Caffe Model Zoo, available in their GitHub repository.

Retraining not only requires a fraction of the time that a full training session would need (minutes or hours instead of weeks) but, more importantly in many cases, allows for the use of much smaller datasets. An example of this is the Inception model provided by Google, whose engineers reportedly spent several weeks training on ImageNet [32] (a dataset of over 14 million images in over 20 thousand categories), using multiple GPUs and the TensorFlow framework. In their example [33], they use in the order of 3500 pictures of flowers in 5 different categories to retrain the generic model, producing a model with a fair accuracy rating on new data. In fact, during the retraining stage, the network is left almost intact. The final classifier is the only part that is fully replaced, and “bottlenecks” (the layer before the final one) are calculated to integrate the new training data into the already “cognizant” network. After that, the last layer is trained to work with the new classification categories. This happens in image batches of a size that can be adapted to the needs of the new dataset (alongside other hyperparameters such as learning rate and training steps).

Each step of the training process outputs values for training accuracy, validation accuracy, and cross entropy. A large difference between training and validation accuracy can indicate potential “overfitting” of the data, which can be a problem especially with small datasets, whereas the cross entropy is a loss function that provides an indication of how the training is progressing (decreasing values are expected).

Methods

Study Design

Given the potential advantages of transfer learning in machine vision for the purposes of HAR, we next describe an exemplar study where we apply these ideas to classify exercise data from inertial sensors. This very specific example is sufficiently comprehensive in scale and scope to represent a typical use case for the approach, which, to reiterate, uses pretrained CNNs with one lightweight additional training step to classify inertial sensor data based on images generated from the raw data (Figure 2). The level of DSP skill needed to perform this analysis will be shown to be much lower than that required by other machine learning techniques that rely on engineered features (Figure 1). This section contains all the details required to replicate this approach, focusing on how the data were collected and how our system was set up and used.

Figure 2. Depiction of the changes between traditional methods and the one presented in this paper, in particular steps 3 and 4.

Data Collection

Participants

A total of 82 healthy volunteers aged 16-38 years (59 males, 23 females, age: 24.68 years [SD (standard deviation) 4.91], height: 1.75 m [SD 0.09], body mass: 76.01 kg [SD 13.29]) were recruited for the study. Participants did not have a current or recent musculoskeletal injury that would impair performance of multi-joint, lower-limb exercises. All participants had been completing each of the five exercises as part of their training regime for at least one year. The human research ethics committee at University College Dublin approved the study protocol, and written informed consent was obtained from all participants before testing. In cases where participants were under the age of 18 years, written informed consent was also obtained from a parent or guardian.

Procedures

The testing protocol was explained to participants upon their arrival at the laboratory. Following this, they completed a 10-min warm-up on an exercise bike (Lode BV, Groningen, The Netherlands), maintaining a power output of 100 W at 75-85 revolutions per min. Next, an IMU (SHIMMER, Dublin, Ireland) was secured on the participant by a chartered physiotherapist at the spinous process of the 5th lumbar vertebra (Figure 3). The orientation and location of the IMU were consistent for all the study participants across all exercises.

A pilot study was used to determine an appropriate sampling rate and the ranges for the accelerometer and gyroscope on board the IMU. In the pilot study, squat, lunge, deadlift, single-leg squat, and tuck jump data were collected at 512 samples/s. A Fourier transform was then used to determine the signal and noise characteristics of the data, which were all found to lie below 20 Hz. Therefore, a sampling rate of 51.2 samples/s was deemed appropriate for this study based upon the Shannon sampling theorem and the Nyquist criterion [34]. The Shimmer IMU was configured to stream tri-axial accelerometer (±16 g) and gyroscope (±500 °/s) data, with the sensor ranges chosen based upon data from the pilot study. Each IMU was calibrated for these specific sensor ranges using the Shimmer 9DoF Calibration application.

After completion of their warm-up, participants proceeded to do one set of 10 repetitions of bodyweight squats, barbell deadlifts at a load of 25 kg, bodyweight lunges, and bodyweight single-leg squats (Figure 4). A chartered physiotherapist demonstrated the correct technique for each of the exercises. Participants familiarized themselves with each exercise, and their technique was assessed to be correct by the physiotherapist. Correct technique for squats, lunges, and deadlifts was defined using guidelines from the National Strength and Conditioning Association [35]. Single-leg squats were completed according to the scoring criteria outlined by Whatman et al [36]. Finally, each participant completed the 10-second tuck jump test while attempting to maintain good form throughout [37].
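The sampling rate reasoning above can be checked with a few lines of arithmetic. This is only a sketch of the Nyquist criterion using the figures quoted in the text; the function and constant names are illustrative.

```python
def nyquist_ok(sampling_rate_hz, max_signal_hz):
    """True if the sampling rate exceeds twice the highest signal frequency."""
    return sampling_rate_hz > 2.0 * max_signal_hz


PILOT_RATE_HZ = 512.0  # pilot-study sampling rate (from the text)
STUDY_RATE_HZ = 51.2   # sampling rate used in the main study (from the text)
MAX_SIGNAL_HZ = 20.0   # upper bound on frequency content found in the pilot

nyquist_hz = STUDY_RATE_HZ / 2.0                  # highest representable frequency
reduction_factor = PILOT_RATE_HZ / STUDY_RATE_HZ  # pilot rate relative to study rate
```

With these values, the Nyquist frequency is 25.6 Hz, which leaves some margin above the 20 Hz content observed in the pilot data, and the study rate is one tenth of the pilot rate.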

Figure 3. Inertial measurement unit (IMU) position: the spinous process of the 5th lumbar vertebra.

Figure 4. The five exercises completed for this study: bodyweight squat (upper left), bodyweight lunge (upper middle), barbell deadlift (lower left), single leg squat (lower middle), and tuck jump (right).

Preparation for Transfer Learning

In contrast to the previous design for an IMU-based exercise classification system (Figure 1), the feature extraction step is not needed with this new method (Figure 2), as the CNN discovers the features automatically while the model is trained. The segmentation process is directly followed by the classification task (training and inference).

Convolutional Neural Network (CNN) Infrastructure

Working with convolutional networks is not a trivial task. Fortunately, since the advent of deep learning in the last few years, a number of frameworks such as TensorFlow and Caffe have appeared and are readily available to researchers. Most of these frameworks are open source, supported by large companies or universities, and provide not only helper libraries for numerical computation and machine learning but also a flexible architecture and the possibility to almost trivially use multiple central processing units (CPUs) and GPUs if available. The authors used TensorFlow for the particular results provided in this paper, but any other framework or higher level library would suffice. Installing TensorFlow can be cumbersome, but Google provides a Docker container [38] with all the components to run TensorFlow out of the box. Documentation and scripts are also provided to retrain [39] networks and query [40] the new classifier. The aforementioned Docker container and scripts were used in this paper with minimal modifications.

The preprocessing and segmentation of inertial data to create the images that are fed into the CNN were performed with MATLAB (2012, The MathWorks), as explained in the following section.

Data Preparation

Six signals were collected from the IMU: accelerometer x, y, and z and gyroscope x, y, and z. Data were analyzed using MATLAB. To ensure the data analyzed applied to each participant's movement and to eliminate unwanted high-frequency noise, the six signals were low-pass filtered at fc=20 Hz using a Butterworth filter of order n=8.

The filtered signals were then programmatically segmented into epochs that relate to single, full repetitions of the completed exercises. Many algorithms are available to segment human motion during exercise. These include sliding window algorithms, top-down and bottom-up algorithms, zero-velocity crossing algorithms, template-based matching methods, and combinations of the above [4]. These algorithms all have advantages and disadvantages. For the purpose of creating a functioning exercise detection classifier, a simple peak-detection algorithm was used on the gyroscope signal with the largest amplitude for each exercise. The start and end points of each repetition were found by looking for the corresponding zero-crossing points of the gyroscope signal leading up to and following the location of a peak in the signal. Example results of the segmentation algorithm used on the gyroscope x signal, from an IMU positioned on the spine during 3 repetitions of the deadlift exercise, are provided (Figure 5).
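The peak-and-zero-crossing idea described above can be sketched as follows. This is a simplified illustration, not the MATLAB routine used in the study: the function names, the height threshold, and the sample signal are assumptions, and a real implementation would also need to handle noise and repetitions whose angular velocity has both positive and negative phases.

```python
def find_peaks(signal, min_height):
    """Indices of local maxima whose value is at least min_height."""
    return [
        i
        for i in range(1, len(signal) - 1)
        if signal[i] >= min_height
        and signal[i] > signal[i - 1]
        and signal[i] >= signal[i + 1]
    ]


def segment_repetitions(signal, min_height):
    """Return one (start, end) index pair per detected peak.

    start: last sample at or below zero before the peak
    end:   first sample at or below zero after the peak
    """
    segments = []
    for peak in find_peaks(signal, min_height):
        start = peak
        while start > 0 and signal[start] > 0:
            start -= 1  # walk back to the preceding zero crossing
        end = peak
        while end < len(signal) - 1 and signal[end] > 0:
            end += 1  # walk forward to the following zero crossing
        segments.append((start, end))
    return segments


# Hypothetical gyroscope x samples (deg/s) containing two repetitions.
gyro_x = [0, 5, 40, 90, 40, 5, 0, -30, -5, 0, 6, 50, 95, 45, 4, 0, -20]
reps = segment_repetitions(gyro_x, min_height=60)
```

For the sample signal, the two detected repetitions span indices 0 to 6 and 9 to 15. The threshold keeps small bumps between repetitions from being treated as peaks.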

Each extracted repetition of exercise data was resampled to a length of 250 samples. The six signals were then plotted using the MATLAB subplot function. The first subplot, gyroscope x (sagittal plane), was plotted with a y-axis range of ±250 °/s. Subplots 2 and 3, gyroscope y and z (frontal and transverse plane), were plotted with a y-axis range of ±100 °/s. Accelerometer x (subplot 4) was plotted with a y-axis range of ±3 m/s², and accelerometer y and z (subplots 5 and 6) were plotted with a range of ±15 m/s². Axes labels and markers were programmatically hidden, and the blank space between each subplot was minimized. Following this, the graphs were saved as 470×470 JPEG files. Examples of the generated JPEG files are provided (Figure 6).
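As a rough illustration of the resampling step, the sketch below stretches or compresses a repetition to 250 samples using plain linear interpolation. This is an assumption made for simplicity: the text does not state which resampling method was used in MATLAB, and the function and dictionary names are hypothetical. The y-axis limits are the ones listed above.

```python
def resample_linear(signal, target_len=250):
    """Resample a sequence to target_len samples by linear interpolation."""
    if len(signal) < 2 or target_len < 2:
        raise ValueError("need at least two input and two output samples")
    step = (len(signal) - 1) / (target_len - 1)
    out = []
    for k in range(target_len):
        pos = k * step
        lo = min(int(pos), len(signal) - 2)  # index of the left neighbor
        frac = pos - lo                      # distance from the left neighbor
        out.append(signal[lo] * (1.0 - frac) + signal[lo + 1] * frac)
    return out


# Symmetric y-axis limits per subplot, as listed in the text.
Y_LIMITS = {
    "gyro_x": 250.0, "gyro_y": 100.0, "gyro_z": 100.0,  # deg/s
    "accel_x": 3.0, "accel_y": 15.0, "accel_z": 15.0,   # m/s^2
}

# Hypothetical short repetition, stretched to the fixed image width.
repetition = [0.0, 10.0, 20.0, 10.0, 0.0]
resampled = resample_linear(repetition)
```

Fixing both the number of samples and the axis limits means that every image shares the same scale, so differences between images reflect differences in the movement rather than in the plotting.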

Retraining and Using the New Model

Transfer learning is the main technique used in this paper: an already trained CNN is reused for classification purposes. In this case, the TensorFlow framework was used, which provides access to a model called Inception, trained on over 14 million images, and also provides example scripts to retrain the network, that is, to discard the provided classifier and adjust the values of the last layer of the network according to the new data provided. The retraining scripts expect to find the images in a particular folder (passed as a parameter) and layout (Figure 7), that is, one folder for each category that the new classifier will learn to identify, each containing training pictures in JPG format.
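A minimal sketch of preparing such a folder layout is given below. The label names are the folder names shown in Figure 7; the file-naming scheme and the helper function are hypothetical, and a temporary directory stands in for the real image folder.

```python
import tempfile
from pathlib import Path

# Folder names from Figure 7: squat, lunge, deadlift, single leg squat, tuck jump.
LABELS = ["SQ", "LUL", "DL", "SLSL", "TJ"]


def image_path(root, label, participant, repetition):
    """Build the destination path for one repetition plot, creating its folder."""
    folder = Path(root) / label
    folder.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: participant and repetition numbers.
    return folder / f"p{participant:02d}_rep{repetition:02d}.jpg"


with tempfile.TemporaryDirectory() as tmp:
    created = [image_path(tmp, label, participant=1, repetition=1) for label in LABELS]
    folders = sorted(p.name for p in Path(tmp).iterdir() if p.is_dir())
```

Because the folder name acts as the training label, no separate annotation file is needed in this arrangement.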

During training, the network will automatically identify the features to use to create the classifier. There are a number of hyperparameters that can be changed depending on the new data used to retrain, such as the validation and training split of data to be used, the size of the batches to train on, or the learning rate applied (probably the most important of all for fine tuning and avoiding extra computation). The only parameter changed in this work was the number of training steps, from the default of 4000 to 96,000. This number provides high accuracy without showing signs of overfitting (see Results section).

The output of the training phase is simply two files, one with new weights (the retrained network) and a second file with labels for the data trained (the default names are retrained_graph.pb and retrained_labels.txt). These two files are all that is needed to predict results for new data. The classifier can be queried with the classify_image script mentioned previously.

Retraining and querying are actions that can be performed in a multitude of ways, with different frameworks and in different configurations. This work is about making things accessible and available. The Docker container for TensorFlow, with the documentation and helper scripts, was the simplest route the authors could find.
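To illustrate the quantities that are monitored during retraining, the following sketch computes a softmax cross entropy for a single image and the gap between training and validation accuracy. It is a generic textbook formulation, not code from the TensorFlow retraining script, and all names and scores are illustrative.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, true_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probabilities[true_index])


def accuracy_gap(train_accuracy, validation_accuracy):
    """Positive values mean the model does better on data it has already seen."""
    return train_accuracy - validation_accuracy


uniform = softmax([0.0] * 5)                    # untrained guess over 5 exercises
confident = softmax([6.0, 0.0, 0.0, 0.0, 0.0])  # strong score for class 0
loss_uniform = cross_entropy(uniform, 0)        # equals ln(5)
loss_confident = cross_entropy(confident, 0)
```

A falling cross entropy therefore means that the network is assigning more probability to the correct exercise, whereas a large and persistent positive accuracy gap would be a warning sign of overfitting.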

Figure 5. Detection of peak, start, and end points of exercise repetitions (neighboring zero crossing values to the peak locations).

Figure 6. Samples of the generated plots (JPEG files) which were used as training and test data in this study.

Figure 7. Folders containing images for the five exercises (Bodyweight squat: SQ, bodyweight lunge: LUL, barbell deadlift: DL, single leg squat: SLSL, and tuck jump: TJ).

Results

As mentioned in the previous section, each training batch outputs training and validation accuracy and a cross entropy (loss function) value, along with the final validation accuracy. Rolling averages of those four values for training sessions of 96,000 steps are shown in Figure 8. As observed, the cross entropy keeps falling steadily, and the average difference between training and testing accuracy is not very large, which suggests that overfitting is not an issue. Averaged over 5 runs of training, the final accuracy was 95.89% (3827/3991) for 96,000 steps. Figure 9 shows a confusion matrix for this method. Figure 10 is an illustration of a misclassified plot. Part (a) of the image shows a typical lunge signal, whereas part (c) shows a typical single leg squat signal. Part (b) in the middle shows an example of a lunge repetition misclassified as a single leg squat. The issue seems to be concentrated in the top part of the image. The most likely reason for the odd lunge signal shape is that the subject may have looked over their shoulder or twisted for some reason during the repetition, and the resulting plot confuses the classifier, as it would confuse an expert looking directly at the plot. These results are comparable with those of a recently published method on the same dataset, for which the accuracy was found to be 94.1% [17]. Figure 11 shows the confusion matrix for this feature-based classification effort, and as can be seen, the results are similar. However, leave-one-subject-out cross-validation was used in that instance, so the results are not directly comparable. The emphasis of this work, though, is on the ease of setup provided by transfer learning and the need for only basic digital signal processing skills to prepare the data, when compared with other methods in this area.
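A confusion matrix of the kind shown in Figures 9 and 11 can be tallied from paired true and predicted labels. The sketch below uses a handful of invented labels (not the study's data) in which one lunge is mistaken for a single leg squat, mirroring the kind of error described above; the function names are hypothetical.

```python
from collections import Counter

CLASSES = ["SQ", "LUL", "DL", "SLSL", "TJ"]


def confusion_matrix(true_labels, predicted_labels, classes=CLASSES):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Invented example: one lunge (LUL) is mistaken for a single leg squat (SLSL).
y_true = ["SQ", "LUL", "LUL", "DL", "SLSL", "TJ"]
y_pred = ["SQ", "LUL", "SLSL", "DL", "SLSL", "TJ"]
matrix = confusion_matrix(y_true, y_pred)
```

Off-diagonal entries show which pairs of exercises are confused with each other, which an overall accuracy figure alone does not reveal.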

Figure 8. Training (blue) and validation (green) accuracy during training phase, with final accuracy (orange) and cross entropy (red) for 96,000 steps.


Figure 9. Confusion matrix for the machine vision-based classification method.

Figure 10. A lunge signal (a), a lunge signal misclassified as a single leg squat (b), and a single leg squat signal (c) for comparison.

Figure 11. Confusion matrix for the feature based classification method.


Discussion

Principal Findings

An analysis of the data collected with the proposed method obtained an average classification accuracy of 95.89% (3827/3991), which is competitive with current state-of-the-art techniques. This high level of accuracy indicates that the distinctive waveforms in the plots for each of the exercises can be generalized among different participants, and that the patterns created are appropriate for classification efforts. These results are coupled with the underlying recurrent theme of this work: to enable a more approachable entry path into the HAR and ED fields. To do so, high-level machine learning frameworks, coupled with a novel use of machine vision techniques, are used in two main ways: first, to avoid the complexity of manually crafted features only available through advanced DSP techniques, and second, to facilitate dimensionality reduction by allowing the CNN to take care of both feature extraction and classification tasks.
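The reported percentage follows directly from the two counts given above. A quick check, with variable names chosen here purely for illustration:

```python
correct = 3827  # correctly classified repetitions, as reported above
total = 3991    # repetitions evaluated, as reported above

# Accuracy as a percentage, rounded to two decimal places.
accuracy_pct = round(100 * correct / total, 2)
```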

Comparison With Prior Work

The methodology employed and the results achieved in this paper can be compared with a recently published ED paper on the exact same dataset [17]. In this recently published work, identical filtering and segmentation methodologies were employed. However, a considerable number of additional signals and a large amount of additional data processing were required to achieve classification with the lumbar worn IMU. As well as the 6 signals from the accelerometer and gyroscope used in this paper, 12 additional signals were used for classification. These were magnetometer x, y, and z, magnitude of acceleration, magnitude of rotational velocity, and the IMU’s three-dimensional (3-D) orientation as represented by a rotation quaternion (W, X, Y, and Z) and Euler angles (pitch, roll, and yaw). Furthermore, 19 features were then computed from the segmented epochs of the 18 signals. These features were “mean,” “RMS,” “standard deviation,” “kurtosis,” “median,” “skewness,” “range,” “variance,” “max,” “index of max,” “min,” “index of min,” “energy,” “25th percentile,” “75th percentile,” “level crossing rate,” “fractal dimension,” and the “variance of both the approximate and detailed wavelet coefficients using the Daubechies 4 mother wavelet to level 7.” This resulted in a total of 342 features per exercise repetition. These features and their associated exercise labels were used to train and evaluate a random forests classifier with 400 trees. Following leave-one-subject-out cross-validation, an accuracy of 94.64% was achieved with this method. This recent work also demonstrated the laborious process of identifying the most important features for classification, which can improve the efficiency of the reported technique. Although the accuracy achieved in this recent work (94.64%) is slightly lower than that presented in this paper (95.89%), the results should not be directly compared.
This is because the additional signals used by O’Reilly et al [17] and the different methods of cross-validation used in the two studies to compute accuracy mean that it is not a perfectly like-for-like comparison. However, it can be stated that similar levels of accuracy have been achieved with both methods. Most importantly, the ease of implementation of the classification method presented here greatly exceeds that presented by O’Reilly et al [17]. Most notably, the need to use additional signals and derive many features from them has been eliminated. This minimizes the signal processing and machine learning experience needed by the person investigating the possibility of creating a classifier. This is in line with the core objective of this paper.
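To give a sense of the feature-crafting step that the feature-based approach requires, the sketch below computes a few of the time-domain features named above for one signal epoch and confirms the feature count. It covers only a subset of the 19 features; the use of the population standard deviation, the definition of energy as the sum of squared samples, the function name, and the sample values are all assumptions made for illustration.

```python
import math
import statistics


def basic_features(epoch):
    """Compute a subset of the listed features for one segmented signal."""
    return {
        "mean": statistics.fmean(epoch),
        "rms": math.sqrt(sum(x * x for x in epoch) / len(epoch)),
        "standard deviation": statistics.pstdev(epoch),  # assumed population SD
        "median": statistics.median(epoch),
        "range": max(epoch) - min(epoch),
        "max": max(epoch),
        "index of max": epoch.index(max(epoch)),
        "min": min(epoch),
        "index of min": epoch.index(min(epoch)),
        "energy": sum(x * x for x in epoch),  # assumed sum of squares
    }


N_SIGNALS = 18   # 6 accelerometer/gyroscope signals + 12 additional signals
N_FEATURES = 19  # features computed per signal
features_per_repetition = N_SIGNALS * N_FEATURES

example = basic_features([0.0, 1.0, 3.0, 1.0, 0.0])
```

Even this reduced set has to be repeated for every signal and every repetition, which is the processing burden the image-based method avoids.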

Limitations

Simplicity was of utmost importance when designing this novel classification method for accelerometer and gyroscope data. Consequently, the maximal possible accuracy may not have been achieved. A better understanding of how to parameterize the retraining effort, together with other techniques such as fine tuning (a method to reuse certain parts of a pretrained network instead of simply changing the last layer and classifier), could produce better results. A better understanding of how to deal with the type of data used here could also be beneficial. In general, machine vision work is plagued with issues such as partial occlusion, deformation, or viewpoint variation, which the data in this work do not suffer from. For that reason, and also to keep the baseline of this work as simple as possible, no data augmentation or image processing techniques of any kind have been used. The results reported have been obtained only with resources from readily available frameworks, mostly on default settings. It should also be noted that the presented method of classifying inertial sensor data with machine vision techniques has only been evaluated on exemplar samples of exercises that were conducted in a laboratory setting. The results are of high accuracy and competitive with recent work on the same dataset [17], and therefore act as a proof of concept for the method. However, the method has not yet been evaluated in classifying inertial sensor data arising from free-living activities and other HAR classification tasks. Future work should investigate the method’s efficacy in such areas. Of key importance will be to simplify each application’s preprocessing and segmentation of the inertial sensor data.

Conclusions

This paper has described a novel approach to the classification of inertial sensor data in the context of HAR. There are two stand-out benefits of the machine vision approach described. The first is the ease of setting up the infrastructure for the CNNs involved through the use of transfer learning. The second is the reduction in the depth of digital signal processing expertise required on the part of the investigator. Due to the many difficulties in creating inertial sensor based activity recognition systems, the authors believe there is a need for a system development path which is easier to use for people who lack a significant background in signal processing and machine learning. In particular, the new development pathway should eliminate the most difficult tasks conventionally identified with this area, that is, feature development or extraction, dimensionality reduction, and the search for the best machine learning method for each new application (Figure 1). The new development pathway, although eliminating these steps, does not compromise the attainment of the high quality classification accuracy, sensitivity, and specificity which is currently achieved through their successful implementation by appropriate experts (Figure 4). The exemplar study described here illustrates that the method is very competitive in comparison with customized solutions. Either way, the new pathway, at the very least, will allow for the easier testing of hypotheses relating to new inertial sensor-based activity classification systems, that is, is the classification possible at all based on the collected dataset? Ideally, it should also achieve equivalent. Whereas the presented method does successfully eliminate the need for feature crafting and identification of optimal classification algorithms, it does not eliminate the process of signal preprocessing and signal segmentation before performing classification. Therefore, there remains some complexity in the process of achieving exercise classification when using the machine vision technique. However, the authors consider the process of filtering, segmenting, and plotting inertial sensor signals considerably less complex than identifying and computing strong features and an optimal classification method for the classification of inertial sensor data.

Future Work

Even though the current infrastructure used is readily available, certain skills such as familiarity with Docker or with Python data science stacks and basic DSP skills are still needed. The creation of a full package that could be installed on the researcher’s machine could be an avenue to explore. Also, the preprocessing and segmentation steps to prepare the data could be simplified by providing a set of scripts. A number of professional machine vision companies exist in the market, and some provide online services that allow retraining of their custom models. These could also be used for this type of work, avoiding the need to set up the CNN infrastructure locally. The availability of this technology on Android mobile devices is something that the authors are also pursuing. TensorFlow may provide some initial support in this area. Finally, although this paper emphasizes that there is no necessity to present features other than the basic time series, it is clear that augmentation with derived features presents further opportunities for performance tweaking. For researchers more comfortable with such feature development, this application avenue is worth exploring.

Acknowledgments

The Insight Centre for Data Analytics is supported by Science Foundation Ireland under grant number SFI/12/RC/2289. This project was also partly funded by the Irish Research Council as part of a Postgraduate Enterprise Partnership Scheme with Shimmer (EPSPG/2013/574).

Conflicts of Interest

None declared.

References

1. Arxiv. 2016 Mar 16. TensorFlow: large-scale machine learning on heterogeneous distributed systems URL: https://arxiv.org/abs/1603.04467 [accessed 2017-07-25] [WebCite Cache ID 6sDHItHpX]
2. Perez A, Labrador M, Barbeau S. G-Sense: a scalable architecture for global sensing and monitoring. IEEE Network 2010 Jul;24(4):57-64. [doi: 10.1109/MNET.2010.5510920]
3. Lara OD, Labrador MA. A survey on human activity recognition using wearable sensors. IEEE Commun Surv Tutorials 2013;15(3):1192-1209. [doi: 10.1109/SURV.2012.110112.00192]
4. Whelan DF, O'Reilly MA, Ward TE, Delahunt E, Caulfield B. Technology in rehabilitation: evaluating the single leg squat exercise with wearable inertial measurement units. Methods Inf Med 2017 Mar 23;56(2):88-94. [doi: 10.3414/ME16-02-0002] [Medline: 27782290]
5. Preece SJ, Goulermas JY, Kenney LP, Howard D, Meijer K, Crompton R. Activity identification using body-mounted sensors--a review of classification techniques. Physiol Meas 2009 Apr;30(4):R1-33. [doi: 10.1088/0967-3334/30/4/R01] [Medline: 19342767]
6. Stoppa M, Chiolerio A. Wearable electronics and smart textiles: a critical review. Sensors 2014;14(7):11957-11992. [doi: 10.3390/s140711957]
7. Kim E, Helal S, Cook D. Human activity recognition and pattern discovery. IEEE Pervasive Comput 2010;9(1):48 [FREE Full text] [doi: 10.1109/MPRV.2010.7] [Medline: 21258659]
8. Sugawara E, Nikaido H. Properties of AdeABC and AdeIJK efflux systems of Acinetobacter baumannii compared with those of the AcrAB-TolC system of Escherichia coli. Antimicrob Agents Chemother 2014 Dec;58(12):7250-7257 [FREE Full text] [doi: 10.1128/AAC.03728-14] [Medline: 25246403]
9. Moreno-Torres JG, Saez JA, Herrera F. Study on the impact of partition-induced dataset shift on k-fold cross-validation. IEEE Trans Neural Netw Learn Syst 2012 Aug;23(8):1304-1312. [doi: 10.1109/TNNLS.2012.2199516] [Medline: 24807526]
10. Razavi HA, Kurfess TR. Detection of wheel and workpiece contact/release in reciprocating surface grinding. J Manuf Sci Eng 2003;125(2):394. [doi: 10.1115/1.1559160]
11. Flaxman AD, Vahdatpour A, Green S, James SL, Murray CJ, Population Health Metrics Research Consortium (PHMRC). Random forests for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011 Aug 04;9:29 [FREE Full text] [doi: 10.1186/1478-7954-9-29] [Medline: 21816105]
12. Singh KP, Basant N, Gupta S. Support vector machines in water quality management. Anal Chim Acta 2011 Oct 10;703(2):152-162. [doi: 10.1016/j.aca.2011.07.027] [Medline: 21889629]
13. Dudani SA. The distance-weighted k-nearest-neighbor rule. IEEE Trans Syst, Man, Cybern 1976 Apr;SMC-6(4):325-327. [doi: 10.1109/TSMC.1976.5408784]
14. Bishop CM. Pattern Recognition and Machine Learning. New York, NY: Springer; 2006.
15. Giggins O, Kelly D, Caulfield B. Evaluating rehabilitation exercise performance using a single inertial measurement unit. 2013 Presented at: Pervasive Computing Technologies for Healthcare; May 5-8, 2013; Venice, Italy p. 49-56. [doi: 10.4108/icst.pervasivehealth.2013.252061]
16. Patel S, Park H, Bonato P, Chan L, Rodgers M. A review of wearable sensors and systems with application in rehabilitation. J Neuroeng Rehabil 2012;9:21 [FREE Full text] [doi: 10.1186/1743-0003-9-21] [Medline: 22520559]
17. O'Reilly M, Whelan D, Ward T, Delahunt E, Caulfield B. Technology in S&C: tracking lower limb exercises with wearable sensors. J Strength Cond Res 2017 Feb 15:- Epub ahead of print. [doi: 10.1519/JSC.0000000000001852] [Medline: 28234711]
18. Bassett SF. The assessment of patient adherence to physiotherapy. NZ J Physiother 2003;31(2):60-66.
19. Nyan MN, Tay FE, Seah KH, Sitoh YY. Classification of gait patterns in the time-frequency domain. J Biomech 2006;39(14):2647-2656. [doi: 10.1016/j.jbiomech.2005.08.014] [Medline: 16212968]
20. Tao W, Liu T, Zheng R, Feng H. Gait analysis using wearable sensors. Sensors 2012;12(2):2255-2283. [doi: 10.3390/s120202255]
21. Mo L, Li F, Zhu Y, Huang A. Human physical activity recognition based on computer vision with deep learning model. 2016 Presented at: Instrumentation and Measurement Technology Conference; May 23-26, 2016; Taipei, Taiwan p. 1-6. [doi: 10.1109/I2MTC.2016.7520541]
22. Stefic D, Patras I. Action recognition using saliency learned from recorded human gaze. Image Vis Comput 2016 Aug;52:195-205. [doi: 10.1016/j.imavis.2016.06.006]
23. Ma M, Fan H, Kitani KM. Going deeper into first-person activity recognition. 2016 Presented at: IEEE Conference on Computer Vision and Pattern Recognition; June 27-30, 2016; Seattle, WA p. 1894-1903. [doi: 10.1109/CVPR.2016.209]
24. Zeng M, Nguyen LT, Yu B, Mengshoel OJ, Zhu J, Wu P, et al. Convolutional neural networks for human activity recognition using mobile sensors. 2014 Presented at: 6th International Conference on Mobile Computing, Applications and Services; November 6-7, 2014; Austin, Texas p. 197-205. [doi: 10.4108/icst.mobicase.2014.257786]
25. Yang JB, Nguyen MN, San PP, Li XL, Krishnaswamy S. Deep convolutional neural networks on multichannel time series for human activity recognition. 2015 Presented at: Proceedings of the 24th International Conference on Artificial Intelligence; July 25-31, 2015; Buenos Aires, Argentina p. 3995-4001.
26. Radu V, Lane ND, Bhattacharya S, Mascolo C, Marina M, Kawsar F. Towards multimodal deep learning for activity recognition on mobile devices. 2016 Presented at: Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct; September 12-16, 2016; Heidelberg, Germany p. 185-188. [doi: 10.1145/2968219.2971461]
27. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015 May 28;521(7553):436-444. [doi: 10.1038/nature14539] [Medline: 26017442]
28. Cruz-Roa AA, Arevalo OJ, Madabhushi A, González OF. A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection. Med Image Comput Comput Assist Interv 2013;16(Pt 2):403-410. [Medline: 24579166]
29. Sharif Razavian A, Azizpour H, Sullivan J, Carlsson S. Arxiv. 2014. CNN features off-the-shelf: an astounding baseline for recognition URL: https://arxiv.org/abs/1403.6382 [accessed 2017-07-25] [WebCite Cache ID 6sDKO67Go]
30. Yosinski J, Clune J, Bengio Y, Lipson H. How transferable are features in deep neural networks? 2014 Presented at: Proceedings of the 27th International Conference on Neural Information Processing Systems; December 08-13, 2014; Montreal, Canada p. 3320-3328.
31. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, et al. Caffe: convolutional architecture for fast feature embedding. 2014 Presented at: Proceedings of the 22nd ACM international conference on Multimedia; November 03-07, 2014; Orlando, FL p. 675-678. [doi: 10.1109/72.279181]
32. Deng J, Dong W, Socher R, Li LJ, Li K, Li FF. ImageNet: a large-scale hierarchical image database. 2009 Presented at: 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009); June 20-25, 2009; Miami, FL. [doi: 10.1109/CVPR.2009.5206848]
33. Tensorflow. TensorFlow image retraining example URL: https://www.tensorflow.org/how_tos/image_retraining/ [accessed 2017-02-07] [WebCite Cache ID 6o6I1SMkw]
34. Jerri A. The Shannon sampling theorem - its various extensions and applications: a tutorial review. Proc IEEE 1977 Nov;65(11):1565-1596. [doi: 10.1109/PROC.1977.10771]
35. Earle RW, Baechle TR. NSCA's Essentials of Personal Training. Champaign, IL: Human Kinetics; 2004.
36. Whatman C, Hing W, Hume P. Physiotherapist agreement when visually rating movement quality during lower extremity functional screening tests. Phys Ther Sport 2012 May;13(2):87-96. [doi: 10.1016/j.ptsp.2011.07.001] [Medline: 22498149]
37. Myer GD, Ford KR, Hewett TE. Tuck jump assessment for reducing anterior cruciate ligament injury risk. Athl Ther Today 2008 Sep 01;13(5):39-44 [FREE Full text] [Medline: 19936042]
38. Docker. Tensorflow docker container URL: https://hub.docker.com/r/tensorflow/tensorflow/ [accessed 2017-02-07] [WebCite Cache ID 6o6I63rVr]
39. Github. Image retraining script URL: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/image_retraining/retrain.py [accessed 2017-02-07] [WebCite Cache ID 6o6IAPV4K]
40. Github. Image classification script URL: https://github.com/tensorflow/models/blob/master/tutorials/image/imagenet/classify_image.py [accessed 2017-02-07] [WebCite Cache ID 6o6IE43KZ]

Abbreviations

3-D: three-dimensional
CNN: convolutional neural network
CPU: central processing unit
DSP: digital signal processing
ED: exercise detection
GPU: graphics processing unit
HAR: human activity recognition
IMU: inertial measurement unit
SD: standard deviation

Edited by G Eysenbach; submitted 16.02.17; peer-reviewed by L Mo, G Norman; comments to author 30.03.17; revised version received 10.05.17; accepted 31.05.17; published 04.08.17

Please cite as:
Dominguez Veiga JJ, O'Reilly M, Whelan D, Caulfield B, Ward TE
Feature-Free Activity Classification of Inertial Sensor Data With Machine Vision Techniques: Method, Development, and Evaluation
JMIR Mhealth Uhealth 2017;5(8):e115
URL: http://mhealth.jmir.org/2017/8/e115/
doi:10.2196/mhealth.7521
PMID:28778851

©Jose Juan Dominguez Veiga, Martin O'Reilly, Darragh Whelan, Brian Caulfield, Tomas E Ward. Originally published in JMIR Mhealth and Uhealth (http://mhealth.jmir.org), 04.08.2017. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR mhealth and uhealth, is properly cited. The complete bibliographic information, a link to the original publication on http://mhealth.jmir.org/, as well as this copyright and license information must be included.


Appendix D

Conference Publications


Evaluating Squat Performance with a Single Inertial Measurement Unit

Martin O’Reilly*†‡, Darragh Whelan*†‡, Charalampos Chanialidis†, Nial Friel†, Eamonn Delahunt‡, Tomás Ward† and Brian Caulfield†‡
† Insight Centre for Data Analytics
‡ School of Public Health, Physiotherapy and Population Science
University College Dublin
Email: [email protected], [email protected]
*Joint lead authors

Abstract—Inertial measurement units (IMUs) may be used during exercise performance to assess form and technique. To maximise practicality and minimise cost a single-sensor system is most desirable. This study sought to investigate whether a single lumbar-worn IMU is capable of identifying seven commonly observed squatting deviations. Twenty-two volunteers (18 males, 4 females, age: 26.09±3.98 years, height: 1.75±0.14m, body mass: 75.2±14.2 kg) performed the squat exercise correctly and with 7 induced deviations. IMU signal features were extracted for each condition. Statistical analysis and leave one subject out classifier evaluation were used to assess the ability of a single sensor to evaluate performance. Binary level classification was able to distinguish between correct and incorrect squatting performance with a sensitivity of 64.41%, specificity of 88.01% and accuracy of 80.45%. Multi-label classification was able to distinguish between specific squat deviations with a sensitivity of 59.65%, specificity of 94.84% and accuracy of 56.55%. These results indicate that a single IMU can successfully discriminate between squatting deviations. A larger data set must be collected and more complex classification techniques developed in order to create a more robust exercise analysis IMU-based system.

I. INTRODUCTION

Incorrect exercise performance (i.e. faulty exercise form and technique) may result in ineffective training and inadequate rehabilitation, and may increase the likelihood of training induced injuries. This is especially pertinent for athletes who train with free-weights [1]. Training induced injuries are frequently caused by excessive tissue loading as a result of aberrant exercise form and technique [2]. Therefore, feedback on exercise performance is an important consideration to ensure that athletes perform prescribed exercises correctly. Traditionally this feedback has been provided onsite by professional strength and conditioning (S&C) coaches or rehabilitation staff. However, such direct supervision and individualized feedback on exercise performance is not always possible, for example when a large number of athletes are training simultaneously [3]. Furthermore, it has also been challenging to provide objective exercise performance data to athletes in this environment, with most assessments being subjective in nature.

To date, marker-based motion analysis systems have been used to provide objective data relative to exercise performance [4]. However, there are a number of limitations with such an approach: set-up is time intensive, the equipment is expensive and the application of markers may hinder normal athletic movement [2], [5]. Furthermore, this type of analysis is typically performed in specialised research or commercial motion analysis laboratories. These environments may artificially constrict, simplify or influence the movement patterns of those being tested [6]. Therefore, these marker-based systems have not tended to be accepted into routine practice.

Recent technological advances support the use of inertial measurement units (IMUs) as a viable option for the assessment and quantification of exercise performance beyond the motion analysis laboratory [2]. These IMUs offer a number of potential advantages over traditional marker-based systems: they are small, inexpensive, easy to set up and enable the assessment of human movement in an unconstrained environment [7]. Accelerometers and gyroscopes are becoming an increasingly popular means of assessing and quantifying human movement as they are present in many smartphones. This means that these ubiquitous technologies may have the potential to measure human movement and provide feedback relative to the quality of the movement performed [8]. IMUs have been used in a number of different ways, from measuring energy expenditure [9] to gait analysis [10] to medical monitoring [11]. These sensors have also been used in the athletic arena in sports such as skiing [12] and golf [13].

Recently the utilization of IMUs as a method of tracking gym and rehabilitation exercises has been investigated. Lin and colleagues [14] evaluated data obtained from IMUs at the hip, knee and ankle during a number of lower limb exercises. Data from the IMUs were used to estimate joint angles, and the authors compared the IMU derived joint angles to those quantified via a marker-based motion capture system. The authors concluded that these joint angles were accurate when compared to those obtained via the more traditional methodology. However, the quality of the exercise performance was not classified.
Pernek and colleagues [8] used accelerometers to assess exercise performance during gym-based resistance type exercises. They assessed movement quality based on the speed of exercise performance. However, different exercise goals may require varying movement speeds and as such, the assessment of movement quality based on speed alone does not offer a holistic way of evaluating exercise technique.

978-1-4673-7201-5/15/$31.00 ©2015 IEEE

Taylor and colleagues [15] attempted to more accurately evaluate exercise performance using IMUs. Five body worn accelerometers were used to evaluate three lower limb single joint exercises (standing hamstring curl, straight leg raise and reverse hip abduction) in healthy college students. The authors were able to discriminate correct from incorrect exercise performance, with their subsequently developed exercise classifier exhibiting an overall average accuracy of 80% for standing hamstring curl, 65% for reverse hip abduction and 62% for straight leg raise. These results were based on leave-one-subject-out cross-validation (LOSOCV) testing. However, they only recorded data from nine participants and the use of a non-expert in labelling correct or incorrect exercise performance was a methodological limitation.

The same authors built on this work in 2012 [16] and evaluated the use of multi-label classifiers to assess exercise performance in patients with knee osteoarthritis using five IMUs. On this occasion each IMU contained a tri-axial gyroscope as well as an accelerometer. Again their classifiers displayed high accuracy (86%), sensitivity (84%) and specificity (99%) in detecting errors that can occur during the performance of the exercises investigated. However, the overall results and their wider extrapolation are limited by the small participant sample size (n = 8). Furthermore, the exercises utilized were all single joint exercises (standing hamstring curl and straight leg raise) and the number of sensors used may not always be practical. While these exercises may be used in a clinical population during the early stage of rehabilitation, they are likely to be inadequate as the rehabilitation progresses or for higher-level conditioning.

Velloso and colleagues [3] have also attempted to evaluate the quality of exercises using IMUs. They defined exercise quality as “the adherence to the execution of an activity to its specification”. They evaluated two upper limb single joint exercises (biceps curl and lateral raise). Using a leave-one-subject-out testing protocol they obtained an overall recognition performance of 78.2%. The authors also reported that participants responded favourably to feedback that aided with the correct completion of the exercises.

A recent study by Giggins et al [7] suggested that a single IMU may be used to identify poor technique in five of seven single joint exercises investigated (heel slide, straight leg raise, knee extension, hip abduction and hip extension). However, these results were based solely on statistical analysis, without classifier evaluation. A follow up study by the same authors [17] showed that a single IMU worn on the thigh could achieve on average 82% sensitivity, 72% specificity and 83% accuracy in binary classification across the seven exercises and 49% sensitivity, 77% specificity and 61% accuracy in multi-class classification across a subset of four of the exercises. These results were based on LOSOCV testing.

A number of studies have demonstrated the viability of multiple IMUs to assess and quantify exercise performance [8], [14]. More recent research has also shown that it may be possible to evaluate these exercises more comprehensively [3], [15], [16], and possibly with a single IMU [7], [17]. This study differs from previous work in the field as it aims to evaluate whether a single body-worn IMU is capable of distinguishing between seven levels of performance in a compound exercise (i.e. body weight squat). This may have the potential for applications in the areas of injury screening, S&C and rehabilitation.
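Several of the results cited above rest on leave-one-subject-out cross-validation and on sensitivity, specificity and accuracy. As an illustrative sketch (the subject IDs, labels and helper names are hypothetical, and treating a poor-technique repetition as the positive class is an assumption), the following shows how each subject's repetitions are held out in turn and how the three metrics are computed from binary predictions.

```python
def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per subject."""
    for subject in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        yield subject, train, test


def binary_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical data: three subjects, two repetitions each.
subjects = ["s1", "s1", "s2", "s2", "s3", "s3"]
splits = list(loso_splits(subjects))

# Hypothetical pooled predictions (1 = poor technique, 0 = acceptable).
metrics = binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
```

Because no repetition from the held-out subject appears in the training indices, the resulting figures estimate how a classifier would perform on a person it has never seen, which is why they tend to be lower than results from random splits.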

II. METHODS

This study was undertaken to determine if a single IMU can discriminate between different levels of squat performance and identify poor exercise technique. Data were acquired from participants as they completed the squat with normal technique for 10 repetitions. IMU data were then acquired while the same exercise was completed for three repetitions with commonly observed deviations from correct technique.

A. Participants

Twenty-two healthy volunteers (18 males, 4 females, age: 26.09±3.98 years, height: 1.75±0.14 m, body mass: 75.2±14.2 kg) were recruited for the study. No participant had a current or recent musculoskeletal injury that would impair their squat performance. All participants had prior experience with the squat exercise and had regularly used it as part of their own training regime for at least one year. Each participant signed a consent form prior to completing the study. The University Human Research Ethics Committee approved the study protocol.

B. Exercise Technique and Deviations

Participants completed the initial squat with good form as described by the National Strength and Conditioning Association (NSCA) guidelines [18, pp. 320-322]. This involved participants holding their chest up and out with the head tilted slightly up. As participants moved down into the squat position they were instructed to allow their hips and knees to flex while keeping their torso-to-floor angle relatively constant. Furthermore, they were required to keep their heels on the floor and knees aligned over their feet. Participants were required to continue flexing at the hips and knees until their thighs were parallel to the floor. As they moved upward a flat back was to be maintained and they were instructed to keep their chest up and out. Hips and knees were to be extended at the same rate with heels on the floor and knees aligned over the feet. Participants then extended their hips and knees to reach the starting position.
The deviations from the aforementioned correct technique that were completed were knee valgus (KVL), knee varus (KVR), weight shift right (WSR), weight shift left (WSL), knees too far forward (KTF), heels elevated (HE) and bent over (BO). These are outlined in Table I.

TABLE I: List and description of squat exercise performance.

Deviation   Explanation
N           Normal squat
KVL         Knees coming together during downward phase
KVR         Knees coming apart during downward phase
WSR         Excessive lean to right hand side during entire squat exercise
WSL         Excessive lean to left hand side during entire squat exercise
KTF         Knees ahead of toes during downward phase
HE          Heels off ground during entire squat exercise
BO          Excessive flexion of hip and torso during entire squat exercise

978-1-4673-7201-5/15/$31.00 ©2015 IEEE


C. Experimental Protocol

A pilot study was used to determine an appropriate sampling rate and the ranges for the accelerometer and gyroscope on board the IMU (SHIMMER, Shimmer Research, Dublin, Ireland). In the pilot study squat data were collected at 512 Hz. A Fourier transform was then used to detect the characteristic frequencies of the signal, which were all found to be less than 20 Hz. Therefore, a sampling rate of 51.2 Hz was deemed appropriate for this study based upon the Nyquist criterion, as it exceeds twice the highest frequency of interest. The Shimmer IMU was configured to stream tri-axial accelerometer (±16 g), gyroscope (±500°/s) and magnetometer (±1 Ga) data, with the sensor ranges also chosen based upon data from the pilot study. The IMU was calibrated for these specific sensor ranges using the Shimmer 9DoF Calibration application.

When participants arrived at the laboratory the testing protocol was explained to them. Following this they completed a ten-minute warm-up on an exercise bike, maintaining a power output of 100 W at 75-85 revolutions per minute. Next the IMU was secured on the participant at the level of the 5th lumbar vertebra using an elasticated strap. This sensor placement was selected based on clinical judgement as to the location that would most likely identify deviations and is shown in Figure 1. The orientation and location of the IMU were consistent for all study participants. Participants were then instructed on how to complete the squat with good form and biomechanical alignment as outlined in the NSCA guidelines and explained in section B. They completed ten repetitions with this good form. Once the squat had been completed with normal technique the participant was instructed to complete the exercise with the deviations specified in Table I. They completed three repetitions of each deviation. Verbal instructions and a demonstration were provided to all participants and they were allowed a trial to ensure they were comfortable completing the deviations.
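The sampling-rate reasoning in the pilot study (find the highest frequency present in the signal, then sample at more than twice that frequency) can be illustrated with a short Python sketch. This is not the authors' pilot analysis: the function names, the naive discrete Fourier transform and the 5% relative amplitude threshold used to decide which frequencies count as significant are all assumptions made for this example, and the input is expected to be a synthetic or previously recorded signal supplied by the reader.

```python
import cmath
import math


def amplitude_spectrum(samples, fs):
    """Return (frequency_hz, magnitude) pairs for bins 0..n/2.

    Uses a naive O(n^2) discrete Fourier transform; magnitudes are
    scaled by 1/n, which is sufficient for relative comparisons.
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        spectrum.append((k * fs / n, abs(coeff) / n))
    return spectrum


def highest_significant_frequency(samples, fs, rel_threshold=0.05):
    """Highest frequency whose magnitude reaches rel_threshold of the peak.

    The DC bin is ignored. rel_threshold is an illustrative assumption.
    """
    spectrum = amplitude_spectrum(samples, fs)[1:]
    peak = max(magnitude for _, magnitude in spectrum)
    return max(f for f, magnitude in spectrum if magnitude >= rel_threshold * peak)


def satisfies_nyquist(fs, f_max):
    """True if sampling rate fs exceeds twice the highest frequency f_max."""
    return fs > 2 * f_max
```

With the figures reported in the text, `satisfies_nyquist(51.2, 20.0)` holds because 51.2 Hz is greater than 2 × 20 Hz = 40 Hz.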
All squats were completed using body weight only. A Chartered Physiotherapist was present throughout all data collection to ensure the squat had been completed as instructed.

D. Data Analysis

Data were low-pass filtered at fc = 20 Hz using a Butterworth filter of order n = 8 to remove high-frequency noise and to ensure that all data analysed related to each participant's movement, as confirmed using the Fourier transform during the pilot study. For each repetition of the exercise a total of fifteen features were extracted from the IMU to allow for statistical analysis. These were the maximum, minimum and range of the acceleration (accel) signals in the X, Y and Z planes, and the maximum and minimum angular velocity (gyro) in the X, Y and Z planes. Initially a repeated-measures t-test was considered as an appropriate comparison between the eight squat conditions. However, a normal quantiles plot showed that the difference between the means of any two conditions did not follow a Gaussian distribution (Figure 2), and thus the data were not normally distributed. Therefore, the non-parametric pairwise Wilcoxon signed-rank test was used to analyse whether there was a difference in the IMU parameters between the various squat techniques. A P value
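The fifteen per-repetition features and the paired comparison described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' analysis code: the function names, the dictionary layout of the input signals and the handling of ties (average ranks) and zero differences (dropped) are choices made for this example. The Butterworth filtering step is omitted, and only the signed-rank statistic is computed; in practice a library routine such as SciPy's `scipy.stats.wilcoxon` would normally be used to obtain the P value.

```python
def extract_features(accel, gyro):
    """Return the 15 features for one repetition.

    accel and gyro are assumed to be dicts mapping "x", "y", "z" to
    lists of (already filtered) samples for a single repetition.
    """
    features = {}
    for axis in ("x", "y", "z"):
        a = accel[axis]
        features[f"accel_{axis}_max"] = max(a)
        features[f"accel_{axis}_min"] = min(a)
        features[f"accel_{axis}_range"] = max(a) - min(a)
    for axis in ("x", "y", "z"):
        g = gyro[axis]
        features[f"gyro_{axis}_max"] = max(g)
        features[f"gyro_{axis}_min"] = min(g)
    return features


def wilcoxon_signed_rank_statistic(first, second):
    """Return min(W+, W-) for paired samples.

    Zero differences are dropped and tied absolute differences
    receive the average of the ranks they span.
    """
    diffs = sorted((a - b for a, b in zip(first, second) if a != b), key=abs)
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(diffs):
        j = i
        while j + 1 < len(diffs) and abs(diffs[j + 1]) == abs(diffs[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(w_plus, w_minus)
```

In this setting, `first` and `second` would hold one feature (for example a hypothetical `accel_x_range`) for the same participants under two squat conditions, such as N and KVL.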
