
An Upper Extremity Rehabilitation System Using Efficient Vision-Based Action Identification Techniques

Yen-Lin Chen 1, Chin-Hsuan Liu 1, Chao-Wei Yu 1, Posen Lee 2,* and Yao-Wen Kuo 1

1 Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei 10608, Taiwan; [email protected] (Y.-L.C.); [email protected] (C.-H.L.); [email protected] (C.-W.Y.); [email protected] (Y.-W.K.)
2 Department of Occupational Therapy, I-Shou University, Kaohsiung 82445, Taiwan
* Correspondence: [email protected]; Tel.: +886-7-657-7711 (ext. 7516)

Received: 30 May 2018; Accepted: 10 July 2018; Published: 17 July 2018


Featured Application: This study proposes an upper extremity rehabilitation system for home use, built on an efficient action identification method that uses color and depth sensor information and performs well in complex ambient environments.

Abstract: This study proposes an action identification system for home upper extremity rehabilitation. In the proposed system, we apply an RGB-depth (color-depth) sensor to capture image sequences of the patient's upper extremity actions to identify its movements. We apply a skin color detection technique to assist with extremity identification and to build up the upper extremity skeleton points. We use the dynamic time warping algorithm to determine the rehabilitation actions. The system presented herein builds up upper extremity skeleton points rapidly. Through the upper extremity of the human skeleton and human skin color information, the upper extremity skeleton points are effectively established by the proposed system, and the rehabilitation actions of patients are identified by a dynamic time warping algorithm. Thus, the proposed system can achieve a high recognition rate of 98% for the defined rehabilitation actions for the various muscles. Moreover, the computational speed of the proposed system can reach 125 frames per second; the processing time per frame is less than 8 ms on a personal computer platform. This computational efficiency allows efficient extensibility for future developments to deal with complex ambient environments and for implementation in embedded and pervasive systems. The major contributions of the study are: (1) the proposed system is not only a physical exercise game, but also a movement training program for specific muscle groups; (2) the hardware of the upper extremity rehabilitation system includes a personal computer and a depth camera, which are economical, so patients who need this system can set up one at home; (3) patients can perform rehabilitation actions in a sitting position to prevent them from falling during training; (4) the accuracy rate of identifying rehabilitation actions is as high as 98%, which is sufficient for distinguishing between correct and incorrect actions when performing specific action training; (5) the proposed upper extremity rehabilitation system provides real-time, efficient vision-based action identification with low-cost hardware and software, which is affordable for most families.

Keywords: upper extremity identification; color and depth sensors; skeleton points; rehabilitation actions; home rehabilitation; computer vision

Appl. Sci. 2018, 8, 1161; doi:10.3390/app8071161



1. Introduction

Telemedicine and home-care systems have become a trend because of the integration of technology into the practice of medicine [1–5]. Some medical care can be provided at home using simple systems. If technology products are easy to operate, then patients or caregivers can conveniently perform daily self-care activities at home by themselves. This can improve the time spent on patient care, the quality of care, and the therapeutic benefits of care. Telemedicine and home care systems can also reduce the costs and time associated with transportation between the hospital and home for follow-up care or treatment [6–9]. In rehabilitation, the person receiving treatment exhibits impaired mobility. Treatment typically takes a long time to achieve a positive effect, and if the treatment is stopped or interrupted, then a functional decline or reversal of progress may occur [10,11]. Therefore, rehabilitation is a very lengthy process that takes a heavy psychological, physical, and economic toll on patients and their families [12]. If an efficient rehabilitation system can be designed for use at home that helps patients to perform the movements that they must perform repeatedly every day to maintain their mobility and physical function, then transportation challenges and costs are eliminated. Moreover, infection risks for patients with weakened immune systems can be avoided because frequent hospital stays are eliminated too. A home-based rehabilitation program can also make rehabilitation more flexible and enable more frequent exercise. In sum, transportation burdens are reduced, families have more time, and exercises can be performed more frequently.

Rehabilitation involves the use of repetitive movements for maintaining or improving physical or motor functions [10]. After receiving a professional evaluation and a recommended exercise regimen to perform at home, a patient may only need to be observed and recorded by a rehabilitation supporting instrument during the basic training sessions performed at home. A professional demonstrates an action or the use of an instrument, and the patient can then operate a quality monitoring system or review visual feedback provided by a home rehabilitation system for quality self-monitoring of the motions performed during daily rehabilitation training at home [13].

Some somatosensory games claim that they can achieve the effects of sports and entertainment, and some of these have been used for physical training and physical rehabilitation [14–16]. However, these games are usually designed to involve whole body activities, and most require a standing position [17–22], often in front of a camera at a distance of at least 100 cm. Besides, most of these somatosensory games and human pose estimation methods focus on whole-body pose discrimination [22]; they neither pay attention to changes in the range of motion of individual joints nor consider how the muscles work in those poses. Nowadays, there are many human pose estimation systems that extract skeletons or skeleton points from depth sensors [18–23]. However, determining the joint movements (including directions and angles) is necessary for rehabilitation applications. In [18–20], the authors derive a unified Gaussian kernel correlation (GKC) representation and develop an articulated GKC and articulated pose estimation for both the full body and the hands. That work can achieve effective human pose estimation and tracking, but may have limitations in determining single joint movements, such as shoulder rotation or wrist pronation. In [21], the input depth map is matched with a set of pre-captured motion exemplars to generate a body configuration estimation, as well as a semantic labeling of the input point cloud; however, although a body figure can be shown or defined as a point cloud, the real body skeleton is segmental, and each movement comes from the angle change of a key joint rather than from each point of the body's point cloud. In [23], the system takes a color image as input, extracts the 2D locations of anatomical keypoints, and then applies an architecture for jointly learning part detection and part association. The strength of [23] is 2D pose estimation of multiple people in images, which can be applied in social context identification systems assisting people with autism, or in security monitoring systems for homes and public spaces, but it may not be sufficiently accurate for medical assessment. In our study, we extract both color and depth information from RGB-D sensors, and rebuild a new upper extremity skeleton and skeleton points. To provide an efficient rehabilitation system that can assist patients in training specific joints and muscles, we need to obtain not only the pose changes, but also to determine from which joints in the real human body skeleton these changes come.


Although many pose estimation systems and somatosensory games have been developed and presented, most of the existing systems extract skeletons or skeleton points from a depth sensor and mainly focus on providing human pose estimation, rather than determining the joint movements (including directions and angles). Most of those systems and game sets do not focus on training specific actions for the purpose of medical rehabilitation, have an insufficient evidence base for medical applications, and might be unsuitable for people who are unable to stand or to stand for long, lack standing balance, or only need to rehabilitate their upper extremity, such as patients with spinal cord injuries, hemiplegia, or advanced age. To overcome the aforementioned challenges, this study proposes an action identification system for home upper extremity rehabilitation, which is a movement training program for specific joint movements and muscle groups, and which is low in cost, suitable for rehabilitation in a sitting position, and easy to operate at home. The major contributions of the study are: (1) the proposed system is not only a physical exercise game, but also a movement training program for specific muscle groups; (2) the hardware of the upper extremity rehabilitation system includes a personal computer and a depth camera, which are economical, so patients who need this system can set up one at home; (3) patients can perform rehabilitation actions in a sitting position to prevent them from falling during training; (4) the accuracy rate of identifying rehabilitation actions is as high as 98%, which is sufficient for distinguishing between correct and incorrect actions when performing specific action training; (5) the proposed upper extremity rehabilitation system provides real-time, efficient vision-based action identification with low-cost hardware and software, which is affordable for most families.

2. The Proposed Action Identification System for Home Upper Extremity Rehabilitation

In the proposed system, we apply an RGB-depth sensor to capture the image sequences of the patient's upper extremity actions to identify its movements, and a skin color detection technique is used to assist with extremity identification and to build up the upper extremity skeleton points. A dynamic time warping algorithm is used to determine the rehabilitation actions (Figure 1).

Figure 1. System block diagram of the proposed upper extremity rehabilitation system.



2.1. Experimental Environment

We suppose the user is sitting in front of the table to use this system. If the patient performs rehabilitation actions in a sitting position, the patient saves energy compared with standing, and the patient can focus his or her mind on the motion control training of the upper extremities. Moreover, if patients have limited balance abilities, they can sit at a table or desk, and the table or desk can provide support and prevent falls. The proposed system is set up similarly to a study desk or computer table at home, which provides a familiar and convenient environment. Caregivers are not required to prepare an additional space to set up the rehabilitation system, and the patient can engage in rehabilitation in a small space (Figure 2).

Figure 2. Conceptual lab setup.

The hardware of the upper extremity rehabilitation system includes a personal computer and a Kinect camera. The first step of the vision-based action identification technique is to identify the bones of the patient. Many methods exist to accomplish this task. In this study, we suppose that the user's sitting posture is similar to that of sitting at a table. In this situation, the bones of their upper extremities can be identified accordingly. Second, through the human skin detection approach, skeleton joint points can be determined as well. After the human skeletal structure is established, motion determination is conducted. Through the motion determination process, we can determine what type of motion the user just performed, and then inform the patient or caregiver whether the rehabilitation action was correct or not. The main purpose of these processes is to create a home rehabilitation system that can provide efficient upper extremity training programs for patients and their caregivers.

2.2. The Proposed Methods

2.2.1. Depth/RGB Image Sensor

A home rehabilitation system should be easy to use and low cost; thus, we adopted the Microsoft Kinect RGB-depth (RGB-D) sensor to capture salient features. The Kinect depth camera and color camera have some differences, and some distortions occur between RGB color features and depth features extracted from the RGB-D sensors. Therefore, a calibration process to adjust the color and depth features of images is necessary to achieve a coherent image.

Figure 3 indicates the manner in which the angle of the depth camera and the angle of the color camera should be adjusted to calibrate the depth information and color information to achieve a coherent image.

(a)

(b)

Figure 3. Calibration process to adjust the color and depth features to be coherent: (a) before adjustment and (b) after adjustment.
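As a rough illustration of such a calibration, depth-to-color registration can be sketched by back-projecting each depth pixel with the depth camera intrinsics, transforming it with the depth-to-color extrinsics, and reprojecting it with the color camera intrinsics. The function name and the assumption that the intrinsic matrices K_d, K_c and the extrinsics (R, t) are already known are ours; this is a sketch, not the authors' implementation.

```python
import numpy as np

def register_depth_to_color(depth, K_d, K_c, R, t):
    """Illustrative depth-to-color registration: back-project each depth pixel
    with the depth intrinsics K_d, transform with the depth-to-color
    extrinsics (R, t), and reproject with the color intrinsics K_c."""
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel().astype(np.float64)
    valid = z > 0
    us, vs, z = us.ravel()[valid], vs.ravel()[valid], z[valid]

    # Back-project valid depth pixels to 3D points in the depth camera frame.
    x = (us - K_d[0, 2]) * z / K_d[0, 0]
    y = (vs - K_d[1, 2]) * z / K_d[1, 1]
    pts = np.stack([x, y, z], axis=0)

    # Transform into the color camera frame and project with its intrinsics.
    pts_c = R @ pts + t.reshape(3, 1)
    u_c = np.round(pts_c[0] * K_c[0, 0] / pts_c[2] + K_c[0, 2]).astype(int)
    v_c = np.round(pts_c[1] * K_c[1, 1] / pts_c[2] + K_c[1, 2]).astype(int)

    aligned = np.zeros((h, w), dtype=np.float64)
    inside = (u_c >= 0) & (u_c < w) & (v_c >= 0) & (v_c < h)
    aligned[v_c[inside], u_c[inside]] = pts_c[2][inside]
    return aligned
```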

2.2.2. Skeletonizing

In this study, we detected and skeletonized the patient's upper extremities using OpenNI [24,25] techniques. OpenNI cannot identify the human skeleton immediately at a short distance, and human skeleton detection is difficult in a sitting position. Thus, we extracted the contours of the image, and then determined the human upper body bones from these contours, thereby identifying the joints of the upper body. This study did not directly adopt the skeletal determination of OpenNI; rather, we applied the distance transform process to the body contour image in preprocessing [26,27].

• RGB to Gray color transform

If the background and the color of the characters are too similar, mistakes can be made in the identification of the human body, and errors can be made in the establishment of skeletal joint points. Therefore, the image is converted to gray-scale to remove the background.

• Distance Transform

A distance transform, also known as a distance map or distance field, is a derived representation of a digital image. The map labels each pixel of the image with the distance to the nearest obstacle pixel. The most common type of obstacle pixel is a boundary pixel in a binary image. One technique that may be used in a wide variety of applications is the distance transform or Euclidean distance map [28,29]. The distance transform method is an approximate Euclidean distance map. The distance transform of the image labels each object pixel of the binary image with the distance between that pixel and the nearest background pixel. For example, a pixel of the binary image is 1 if an object is present and 0 otherwise (Figure 4a). After the distance transform, the farther a pixel is from a pixel with the value 0, the greater the resulting Euclidean distance. The pixel value of the center will change from 1 to 3, as depicted in Figure 4b. Thus, the distance transform features can highlight the outline of a skeleton frame (Figure 5).


(a)

(b)

Figure 4. Distance transform process using Euclidean distance: (a) binary image and (b) distance transform.

(a)

(b)

Figure 5. The distance transform process to highlight the outline of a skeleton frame: (a) before distance transform and (b) after distance transform.
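A minimal sketch of this step with OpenCV (our illustration; the Otsu thresholding used to obtain the binary contour image is an assumption, since the paper does not state how that image is binarized):

```python
import cv2
import numpy as np

def contour_distance_map(gray):
    """Binarize the grayscale body image and compute a Euclidean distance map.

    Each foreground pixel is labeled with its distance to the nearest
    background pixel, so pixels near the limb centerlines get large values."""
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)
    # Normalize to 0-255 so the ridge (skeleton-like) structure is visible.
    return cv2.normalize(dist, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
```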

• Gaussian Blur 7 × 7

The Gaussian smoothing method is applied to reduce the noisy features. After the Gaussian smoothing process is performed, the edges of the image become blurred, which reduces salt-and-pepper noise simultaneously. Gaussian blur convolution kernel sizes mostly range from 3 × 3 to 9 × 9. This study adopted the convolution kernel size 7 × 7, which could obtain the best results for the overall skeleton frame building process in our experiments (Figure 6).

(a)

(b)

Figure 6. The process to highlight the outline of a skeleton frame: (a) before Gaussian 7 × 7 smoothing and (b) after Gaussian 7 × 7 smoothing.

• Convolution Filtering

Next, to strengthen the skeleton features, we used directional convolution filters to enhance the features. This study used the 5 × 5 convolution kernel because it is faster than the 7 × 7 kernel and highlights the features of an image of the human body more clearly than the 3 × 3 kernel (Figure 7).


(a)

(b)

(c)

Figure 7. The effects of various convolution kernel processes to highlight the features of an image of the human body: (a) 3 × 3 convolution kernel, (b) 5 × 5 convolution kernel, and (c) 7 × 7 convolution kernel.

Through the convolution filtering process, the areas that required handling were highlighted. We adopted four directions of convolution filtering. Through the four-directional convolution filtering, the contour pixels in each of the four directions are strengthened. The result highlights pixels in each single direction; take the 0 degree direction as an example, as depicted in Figure 8. We compute the maximal values from the four-directional convolution filtering results as the feature values of the corresponding feature points in the following process.

Figure 8. Four directions of convolution filtering.
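The following sketch illustrates the idea of four-directional filtering and taking the per-pixel maximum response. The 5 × 5 kernels below are simple line-shaped examples assumed for illustration; they are not necessarily the exact kernels used by the authors.

```python
import cv2
import numpy as np

def directional_response(dist_map):
    """Filter the distance-transform image with four 5x5 directional kernels
    (0, 45, 90, and 135 degrees) and keep the per-pixel maximum response."""
    base = np.zeros((5, 5), np.float32)
    base[2, :] = 1.0 / 5.0                                  # 0 degree (horizontal) line
    kernels = [
        base,                                               # 0 degrees
        np.eye(5, dtype=np.float32) / 5.0,                  # 45 degrees
        base.T,                                             # 90 degrees
        np.fliplr(np.eye(5, dtype=np.float32)) / 5.0,       # 135 degrees
    ]
    responses = [cv2.filter2D(dist_map.astype(np.float32), -1, k) for k in kernels]
    return np.max(np.stack(responses, axis=0), axis=0)
```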

• Binarization

We used a binarization process as presented in Equation (1) to exclude nonskeletal features. First, we examined the image from the bottom left corner of the origin (0, 0) to the top right corner of the screen (255, 255) to determine if pixels were present. To determine whether a pixel is skeleton or noise, the cutoff threshold is set at 6.

outputImg(x, y) = 0,   if outputImg(x, y) < 6
outputImg(x, y) = 255, if outputImg(x, y) ≥ 6        (1)

Following the steps of distance transform, Gaussian blur, and convolution filtering, we could exclude non-skeleton noise and obtain the image of a human skeleton, as depicted in Figure 9.

(a)

(b)

Figure 9. Results before and after skeletonizing: (a) before skeletonizing and (b) after skeletonizing.
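Putting the steps of this subsection together, a hedged end-to-end sketch of the skeleton-frame extraction could look as follows. The function name skeleton_frame and the reuse of the helper sketches above (contour_distance_map, directional_response) are our own; the threshold of 6 is the one stated in Equation (1).

```python
import cv2
import numpy as np

def skeleton_frame(bgr_image):
    """Illustrative pipeline: gray conversion, distance transform, 7x7 Gaussian
    blur, four-directional convolution filtering, and binarization."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    dist = contour_distance_map(gray)            # distance transform step (sketch above)
    blurred = cv2.GaussianBlur(dist, (7, 7), 0)  # 7x7 Gaussian smoothing
    response = directional_response(blurred)     # four-directional filtering (sketch above)
    # Keep only responses at or above the cutoff threshold of 6 as skeleton pixels.
    return np.where(response >= 6, 255, 0).astype(np.uint8)
```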

2.2.3. Skin Detection

We integrated skin color detection into the skeletonizing process to enable the system to more accurately establish the skeleton.

• RGB to YCbCr (luminance and chroma) Transform

In this study, we applied the YCbCr elliptical skin color model [30] to detect the skin regions. The YCbCr color model decouples the color and intensity features to reduce the lighting effects in color features. Although the YCbCr color model consumes more computational time than the RGB model, it performs computations faster than the HSV (Hue-Saturation-Value) color model [28]. Therefore, using YCbCr color features from human bodies, we can extract skin regions with computational efficiency and more accurately establish the corresponding human skeletons.

• Elliptical Skin Model

We used an elliptical skin model to determine whether skin was present. We created an elliptical skin model in which pixels falling outside the ellipse do not match the skin color, and pixels falling inside the ellipse represent the skin color.

• Morphological Close Operation

The RGB image must be converted into YCbCr, and the elliptical skin model is used to separate areas that may be skin color from those that are not. When an area in which the RGB image is likely to be the skin color is separated and reserved, the images may nevertheless include too much noise. In that case, we conduct morphological close processing on the resultant feature maps.

• Largest Connected Object

The morphological close operation makes the object itself more connected and also filters out noise. The operation uses connected components to find the largest connected object in the picture, and then records the location of its center point. The center of this position is the head of the skeleton, as depicted in Figure 10.


(a)

(b)

(c)

(d)

Figure 10. Skin detection process: (a) RGB image, (b) elliptical skin model, (c) morphological close, and (d) largest connected object.
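A hedged sketch of the skin-segmentation chain (YCbCr conversion, an elliptical chroma test, morphological closing, and largest-connected-component selection). The ellipse center and axes below are placeholder values, not the parameters of the cited elliptical model [30].

```python
import cv2
import numpy as np

def skin_mask(bgr_image, center=(120.0, 155.0), axes=(25.0, 15.0)):
    """Return a binary skin mask (largest blob only) and the blob centroid.

    center/axes are assumed placeholder (Cb, Cr) ellipse parameters."""
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    cr = ycrcb[:, :, 1].astype(np.float32)   # chroma red
    cb = ycrcb[:, :, 2].astype(np.float32)   # chroma blue

    # Elliptical skin test in the Cb-Cr plane: inside the ellipse -> skin.
    inside = ((cb - center[0]) / axes[0]) ** 2 + ((cr - center[1]) / axes[1]) ** 2 <= 1.0
    mask = (inside * 255).astype(np.uint8)

    # Morphological close to fill holes, then keep the largest connected blob.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
    if n <= 1:
        return mask, None
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
    return (labels == largest).astype(np.uint8) * 255, tuple(centroids[largest])
```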

2.2.4. Skeleton Point Establishment

After obtaining the curve-shaped human skeleton from the body contour image and skin color detection, the skeleton is not yet quite consistent with the actual upper extremity structure of the linear human skeleton with joints. Therefore, to accurately detect body movements, it is necessary to establish the accurate position of each joint where movements are produced. In this study, we present a rule-based process for establishing the skeleton points, which is as follows.

Head Skeleton Point Determination

To rapidly determine the skeleton points of the human head, we assume that when the rehabilitation user is sufficiently near the screen center, the head skeleton can be detected in the central region of the screen, denoted by SC. We adopted the skeleton feature map and the detected skin region, denoted by SK and SR, respectively, to locate the head skeleton point. Because the user's face should be sufficiently close to the camera, the area of the face's skin region should be sufficiently large. The search process for the head skeleton point is performed through the following steps.
Step 1: Scan the skeleton feature map within the circular region whose center is located at the central coordinate of the input image, with a radius of max(W/4, H/4), where W and H denote the width and height of the image, respectively.

Step 2: If the skeleton feature map can be obtained in Step 1, then we validate that the corresponding connected-component area of the skin color region covering the skeleton feature map is sufficiently large to be a human face: its area should be larger than a given threshold Ts, where the threshold is set at 800 pixels in our experiments. As a result, we can obtain the location of the head skeleton point, denoted by SH and depicted in Figure 11.

(a)

(b)

Figure 11. Establishing the skeleton point of the head: (a) The user approaches the preset point, which triggers detection. (b) The skeleton feature map combined with YCbCr elliptical skin color detection establishes the head skeleton point.
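An illustrative sketch of Steps 1 and 2, assuming the skeleton map and skin mask produced by the earlier sketches (with the skin mask already reduced to its largest blob); the 800-pixel area threshold comes from the text, while the function structure is ours.

```python
import numpy as np

def find_head_point(skeleton_map, skin_mask, area_threshold=800):
    """Search the central circular region for a skeleton pixel covered by a
    sufficiently large skin blob; return the head point (x, y) or None.
    skin_mask is assumed to contain only the largest skin blob."""
    h, w = skeleton_map.shape
    cy, cx = h / 2.0, w / 2.0
    radius = max(w / 4.0, h / 4.0)

    # Step 2 precondition: the face skin blob must exceed Ts = 800 pixels.
    if np.count_nonzero(skin_mask) <= area_threshold:
        return None

    # Step 1: consider skeleton pixels inside the central circle only.
    ys, xs = np.nonzero(skeleton_map)
    in_circle = (xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2
    for x, y in zip(xs[in_circle], ys[in_circle]):
        if skin_mask[y, x] > 0:
            return int(x), int(y)
    return None
```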


Shoulder Skeleton Point Determination

After the skeleton point of the head (SH) is established, we can determine the pair of shoulder skeleton points according to the position of the head, because the shoulders should appear under the two sides of the head:

Step 3: To find the initial shoulder skeleton points, we first search down the rectangular boundaries of the box formed by the head's skin color region to locate the pair of skeleton feature points that are first encountered under the head. Then, we set these two skeleton points as the initial left and right shoulder skeleton points, denoted by SSL and SSR, respectively.

Step 4: As depicted in Figure 12a,b, the initial shoulder points are possibly determined on the basis of the clavicle positions under the head's boundaries, but the actual shoulder points should be closer to the lateral ends of the clavicles. Therefore, we set the initial shoulder points SSL and SSR as the centers of the corresponding semicircular regions formed by the movement regions of the two arms.

Step 5: Because the real shoulder points should be at the rotational centers of the shoulder joints, we set the initial shoulder points SSL and SSR as the centers, and set a radius r with the angles θ ranging from 90° to −90° over the semicircular regions of the left and right shoulders. Then, we determined whether skeleton feature points were present over the two semicircular regions through Equation (2).

SX = r × cos θ + Sx
SY = −r × sin θ + Sy        (2)

where Sx and Sy denote the x and y coordinates of the initial shoulder points (i.e., SSL and SSR), respectively, and SX and SY represent the x and y coordinates of the possible shoulder points evaluated along the semicircular regions of the left and right shoulders, respectively. Because the real shoulder point will be the rotational center of the shoulder joint, the candidate shoulder point should accord with this characteristic. Thus, if a pair of skeleton points is located in the search regions of the left and right shoulders, denoted by SL and SR respectively, then we set these two skeleton points as the actual left and right shoulder points. The search process for the shoulder skeleton points is depicted in Figure 12.

(a)

(b)

(c)

Figure 12. The shoulder point search process: (a) the initial shoulder point is sought under the square region, (b) the initial shoulder points, and (c) the actual shoulder points.
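A sketch of the semicircular search of Step 5 based on Equation (2) (our illustration; the 5 degree angular step is an assumed value):

```python
import numpy as np

def search_shoulder(skeleton_map, initial_point, r, step_deg=5):
    """Scan a semicircle of radius r around an initial shoulder point
    (Equation (2)) and return the first skeleton pixel found, if any."""
    sx, sy = initial_point
    h, w = skeleton_map.shape
    for theta in np.deg2rad(np.arange(90, -91, -step_deg)):
        x = int(round(r * np.cos(theta) + sx))
        y = int(round(-r * np.sin(theta) + sy))
        if 0 <= x < w and 0 <= y < h and skeleton_map[y, x] > 0:
            return x, y
    return None
```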

Elbow and Wrist Skeleton Point Determination

After establishing the actual shoulder skeleton points (SL, SR), we can determine the elbow and wrist skeleton points.

Step 6: We set the obtained shoulder skeleton points (i.e., SL, SR) from Step 5 as the centers, then set a radius r with the angles θ ranging from 45° to 315° to search a 3/4-circular region to find the arm skeleton points, as depicted in Equation (3). The search is completed when the maximum value is found, and all the found points constitute the arm skeleton points, as depicted in Figure 13a. We set the end points as the pair of wrist skeleton points, denoted by SWL and SWR, respectively, as depicted in Figure 13b.

SX = r × cos θ + Sx(SL), SX = r × cos θ + Sx(SR)
SY = −r × sin θ + Sy(SL), SY = −r × sin θ + Sy(SR)        (3)

where the radius r of the search region is determined to be twice the width of the shoulder, because the length of the human arm is typically within twice the shoulder width. It can be determined using the following equation:

r = Sx(SL) − Sx(SR)        (4)

Step 7: Next, we can place the left and right elbow skeleton points halfway between the whole arm skeleton points. They are denoted as SEL and SER, respectively, and are depicted in Figure 13c.

(a)

(b)

(c)

Figure 13. The established skeleton points of the wrist and elbow: (a) all the points we found constitute the arm skeleton points, (b) we set the end points as the wrist skeleton points, and (c) we set the skeleton point of the elbow halfway between the whole arm skeleton points.
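A sketch of Steps 6 and 7 following Equations (3) and (4). Collecting all skeleton hits in the 3/4-circular region, taking the farthest hit as the wrist, and placing the elbow at the midpoint between shoulder and wrist is our simplified reading of the procedure, not the authors' exact implementation.

```python
import numpy as np

def trace_arm(skeleton_map, shoulder, shoulder_other, step_deg=5):
    """Trace one arm around its shoulder point (Equation (3)), with the search
    radius derived from the shoulder separation (Equation (4)); return the
    wrist (farthest hit) and elbow (halfway point, Step 7)."""
    sx, sy = shoulder
    r_max = abs(shoulder[0] - shoulder_other[0])            # Equation (4)
    h, w = skeleton_map.shape
    arm_points = []
    for theta in np.deg2rad(np.arange(45, 316, step_deg)):  # 45 to 315 degrees
        for r in range(1, int(r_max) + 1):
            x = int(round(r * np.cos(theta) + sx))          # Equation (3)
            y = int(round(-r * np.sin(theta) + sy))
            if 0 <= x < w and 0 <= y < h and skeleton_map[y, x] > 0:
                arm_points.append((x, y))
    if not arm_points:
        return None, None
    wrist = max(arm_points, key=lambda p: (p[0] - sx) ** 2 + (p[1] - sy) ** 2)
    elbow = ((sx + wrist[0]) // 2, (sy + wrist[1]) // 2)    # halfway along the arm
    return wrist, elbow
```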

The Overall Skeleton Points Correction Process

If the user does not sit in an upright position or if the user's body rotates, these situations may cause some errors in the skeleton point setting, as indicated in Figure 14a. This study used depth information to correct the positions of the skeletons. When the depth information varies between the right side and the left side, we can shrink the radius r for searching on the far side and extend the radius r for searching on the proximal side, as depicted in Figure 14b.

(a)

(b)

Figure 14. Results before and after arm point correction: (a) before arm point correction and (b) after arm skeleton point correction.

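As a rough sketch of this depth-based correction (the scaling rule is our assumption; the paper only states that the radius shrinks on the far side and extends on the proximal side):

```python
def correct_radius(r, depth_left, depth_right):
    """Scale the left/right search radii by the ratio of side depths so the
    nearer (proximal) arm gets a longer radius and the farther arm a shorter one."""
    mean_depth = (depth_left + depth_right) / 2.0
    r_left = r * mean_depth / depth_left     # smaller depth (nearer) -> larger radius
    r_right = r * mean_depth / depth_right
    return r_left, r_right
```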


2.2.5. Action Classifier


When the seven major human skeletal points {SH, SL, SR, SEL, SER, SWL, SWR} are established, then the actions can be determined. To determine whether a continuous action is a predefined rehabilitation action, it is necessary to compare the actions of the rehabilitation patient with the rehabilitation actions listed and defined in Table 1. We have preset seven types of rehabilitation movements for specific muscle groups in the system.

Table 1. Seven types of rehabilitation movement.

Position No. | Position of Skeleton Points | Demo of Movement


position 0 (initial position)

position 1

position 2

position 3

position 4


position 5

position 6

After the user is trained, the screen will display the movement starting point and end point to guide the user to conduct the appropriate action. However, the time to complete each action may vary by user. The Euclidean distance can compare two sequences, but it might not be able to compare two sequential movements of different durations. The traditional Euclidean distance calculation easily leads to an inflated distance between two similar but different vectors, because the vectors contain both action and time information and have different lengths. The action classification process is equivalent to comparing two action sequences of different lengths. Therefore, we must adopt a suitable methodology that handles two action vectors of different lengths on the time axis and still achieves the best correspondence. The nonlinear dynamic time warped alignment allows a more intuitive measure to be calculated. The dynamic time warping (DTW) algorithm [31] can solve the problem of various lengths of time without performing additional computations during the training process. The DTW algorithm can achieve the best nonlinear point-to-point correspondence between two motion trajectories and can still compare trajectories of various lengths. The DTW algorithm is widely used in audio matching, voice recognition, gesture recognition, and limb motion recognition [32]. Many studies use the DTW algorithm to compare motion trajectories.

In this study, we adopted the DTW algorithm to compare actions and determine the similarity between the defined and the user-performed action, and to avoid the influence of the various times spent on similar rehabilitation actions on the action determination. The DTW algorithm identified the features of continuous action sequences and tolerated the effect of the deviation of two vectors of different lengths on the time series in the movement comparison. On the basis of the dynamic programming optimization results, the correlations between the two nonlinear trajectories were determined to be the corresponding point-to-point matching relations. According to the optimization results of the dynamic programming, the best point-to-point correspondence between the two nonlinear trajectories was found, and different movement trajectories with different time lengths for the same action could be accurately identified.
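To make the comparison step concrete, the following is a minimal C++ sketch of the classic DTW recurrence applied to two trajectories of different lengths. The feature type (a single 2-D skeleton point per frame) and all identifiers are illustrative assumptions, not the exact data structures used in the proposed system.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// One tracked skeleton point per frame (illustrative feature; the actual
// system tracks seven upper-extremity skeleton points).
struct Point2D { float x, y; };

// Local cost between two frames: Euclidean distance between skeleton points.
static float frameCost(const Point2D& a, const Point2D& b)
{
    return std::hypot(a.x - b.x, a.y - b.y);
}

// Classic O(n*m) DTW recurrence: accumulated alignment cost between a
// reference trajectory and a user-performed trajectory of different lengths.
float dtwDistance(const std::vector<Point2D>& ref,
                  const std::vector<Point2D>& usr)
{
    const std::size_t n = ref.size(), m = usr.size();
    const float INF = std::numeric_limits<float>::infinity();
    std::vector<std::vector<float>> D(n + 1, std::vector<float>(m + 1, INF));
    D[0][0] = 0.0f;

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const float cost = frameCost(ref[i - 1], usr[j - 1]);
            // Best of the three admissible predecessors: stretch, shrink, or match.
            D[i][j] = cost + std::min({ D[i - 1][j], D[i][j - 1], D[i - 1][j - 1] });
        }
    }
    return D[n][m];   // smaller cost = more similar actions
}
```

A smaller accumulated cost indicates that the two actions follow the same trajectory even when they are performed at different speeds; normalising the cost by the warping-path length is a common refinement.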

3. Results

The hardware of the upper extremity rehabilitation system comprised a personal computer with an Intel Core 2 CPU and a Kinect camera. The software comprised the WIN7 operating system, the QT 2.4.1 platform, and OpenNI, as presented in Table 2.


Table 2. Hardware and software of the proposed system.

Hardware:
  CPU: Intel Core(TM)2 Quad 2.33 GHz; RAM: 4 GB
  Depth camera: Kinect; RGB camera: Logitech C920

Software:
  OS: WIN7
  Platform: QT 2.4.1
  Libraries: QT 4.7.4; OpenCV 2.4.3; OpenNI 1.5.2.23 (only for RGB image and depth information)

Using Kinect depth information, we performed experiments at the resolutions of 640 × 480 and 320 × 240, as displayed in Table 3. To improve execution speed, we adopted the resolution of 320 × 240 for further examination.

Table 3. Comparison of system execution speed at various resolutions (release model).

Resolution    ms    fps
640 × 480     27    37
320 × 240     8     125
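As a point of reference for this setting, the following is a minimal sketch, assuming the OpenNI 1.x C++ wrapper listed in Table 2, of how the depth stream can be requested at the faster 320 × 240 resolution; it does not reproduce the exact initialisation code of the proposed system.

```cpp
// Minimal sketch (OpenNI 1.x C++ wrapper): request a 320 x 240 depth map,
// the resolution that gave the fastest execution speed in Table 3.
// Error handling is omitted; this is not the authors' exact initialisation code.
#include <XnCppWrapper.h>

int main()
{
    xn::Context context;
    context.Init();

    xn::DepthGenerator depth;
    depth.Create(context);

    XnMapOutputMode mode;
    mode.nXRes = 320;          // 320 x 240 instead of 640 x 480
    mode.nYRes = 240;
    mode.nFPS  = 30;
    depth.SetMapOutputMode(mode);

    context.StartGeneratingAll();
    // ...grab frames with context.WaitOneUpdateAll(depth) and process them...
    context.Release();
    return 0;
}
```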

We examined 20 samples, and each of them had at least 500 frames. The total number of frames was 11,134. We used skin color detection to determine the presence of the human body and to orient the head. The skin color detection process yielded an average execution speed of 0.26 ms, equivalent to 3889 frames per second (FPS) in the release model. The results from testing all of the system processing functions, including skin color detection and skeletonizing, indicated that each frame only required 8.02 ms, equivalent to 125 FPS in the release model; results are presented in Table 4. Results of test samples are depicted in Figures 15–17.

Table 4. Experimental data of skin color detection speed and system execution speed (release model).

Test Sequence     No. of Frames   Skin Color Detection Speed (ms / fps)   System Execution Speed (ms / fps)
Test Sample 1     618             0.27 / 3704                             8.46 / 118
Test Sample 2     555             0.25 / 4000                             8.29 / 121
Test Sample 3     554             0.26 / 3846                             8.04 / 124
Test Sample 4     509             0.26 / 3846                             8.14 / 123
Test Sample 5     541             0.25 / 4000                             8.09 / 124
Test Sample 6     547             0.27 / 3704                             8.20 / 122
Test Sample 7     539             0.25 / 4000                             8.02 / 125
Test Sample 8     640             0.26 / 3846                             7.97 / 125
Test Sample 9     612             0.26 / 3846                             8.08 / 124
Test Sample 10    540             0.26 / 3846                             7.93 / 126
Test Sample 11    629             0.25 / 4000                             7.86 / 127
Test Sample 12    534             0.27 / 3704                             7.87 / 127
Test Sample 13    517             0.28 / 3571                             8.58 / 117
Test Sample 14    517             0.25 / 4000                             7.65 / 131
Test Sample 15    604             0.25 / 4000                             7.94 / 126
Test Sample 16    528             0.25 / 4000                             8.07 / 124
Test Sample 17    535             0.25 / 4000                             7.67 / 130
Test Sample 18    553             0.27 / 3704                             7.90 / 127
Test Sample 19    524             0.24 / 4167                             7.75 / 129
Test Sample 20    538             0.25 / 4000                             7.90 / 127
Average           557             0.26 / 3889                             8.02 / 125


Figure 15. Results of test samples 1–7.



Figure 16. Results of test samples 8–14.


Figure 17. Results of test samples 15–20.

Because the action identification system proposed in this study is aimed at assisting patients who must complete rehabilitation actions, the calculation of its accuracy is based on whether the designated user has performed an action correctly or not; a correct performance receives an OK system response, whereas an improperly performed action receives a NO system response.

In addition to quantitatively evaluating the performance of the proposed action identification system, this study evaluated the identification accuracy of rehabilitation actions. Table 5 displays the quantitative identification accuracy data for rehabilitation actions performed using the proposed system. The table data indicate that the proposed system can achieve high accuracy in identifying rehabilitation actions. The average action identification accuracy was 98.1%. Results of rehabilitation action identification are depicted in Figure 18. Accordingly, the high accuracy of the system's identification of rehabilitation actions enables it to effectively assist patients who perform rehabilitation actions of the upper extremity at home. The computational speed of the proposed system can reach 125 FPS, which equates to a processing time per frame of less than 8 ms. This low computation cost ensures that the proposed system can effectively satisfy the demands of real-time processing. Such computational efficiency allows for extensibility with respect to future system developments that account for complex ambient environments or implementation in embedded and pervasive systems.
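As an illustration, the following C++ sketch shows one possible way to map the DTW alignment cost to the OK/NO response; the exact decision rule is not specified in the text, so the function name and per-position thresholds below are hypothetical placeholders rather than the system's actual parameters.

```cpp
#include <map>
#include <string>

// Hedged sketch: one plausible way to turn the DTW alignment cost into the
// OK/NO response described above.  The per-position thresholds are
// illustrative placeholders, not values taken from the paper.
std::string evaluateAction(int position, double dtwCost)
{
    static const std::map<int, double> kThreshold = {
        { 1, 12.0 }, { 2, 12.0 }, { 3, 10.0 },
        { 4, 15.0 }, { 5, 10.0 }, { 6, 14.0 }
    };
    const auto it = kThreshold.find(position);
    if (it == kThreshold.end())
        return "NO";                          // unknown rehabilitation position
    return (dtwCost <= it->second) ? "OK" : "NO";
}
```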

Figure 18. The results of rehabilitation action identification: position 0 to 6.



Table 5. Accuracy of identification of rehabilitation actions.

Test Sequence   No. of Determined Actions   No. of Correctly Determined Actions   Accuracy Rate
position 1      151                         149                                   98.7%
position 2      150                         146                                   97.3%
position 3      156                         155                                   99.4%
position 4      190                         184                                   96.8%
position 5      120                         119                                   99.2%
position 6      186                         182                                   97.8%
Total           953                         935                                   98.1%

4. Discussion

In this study, we replaced dual cameras with a Kinect depth camera. Compared with the study using dual cameras [33], we spent more time (approximately 12 ms) on skin color detection. However, this disadvantage can be overcome, and the palm position can still be determined, if the user wears long sleeves that reveal only the palm. In addition, the rapid processing speed of the upper extremity tracking ensures the accuracy of the proposed system because we can make a direct judgment of each position. We also check the key points of the skeleton to ensure movements are correctly enacted. Compared with the study that used HSV for skin color detection [27], we chose the YCbCr color space and an elliptical skin model to hasten the skin color detection process (a minimal sketch of this classification is given after Table 6). The skeletonizing process of the proposed system is optimized and achieves a high degree of accuracy. It performs well at a high resolution (640 × 480) as well as at a low resolution (320 × 240), and achieves a high processing speed.

The hand-pair gesture method presented by Patlolla et al. [33] used two skeleton points (the palms), achieved an accuracy of 93%, and operated nearly in real time. The body gesture method proposed by Gonzalez-Sanchez et al. [27] used three skeleton points and achieved 32 FPS with an accuracy of 98%. The method proposed in this study used seven skeleton points and achieved 37 FPS at 640 × 480 with an accuracy of 98%; at the lower resolution of 320 × 240, the processing time was 8 ms per frame, equivalent to 125 FPS. The results of the comparison with these two studies are presented in Table 6.

Table 6. Comparison with two studies.

Title                                                                 Resolution   No. of Skeleton Points   ms   fps   Accuracy Rate
Real-time hand-pair gesture recognition using a stereo webcam [33]   640 × 480    2                        40   25    93%
Real-time body gesture recognition using depth camera [27]           640 × 480    3                        32   32    98%
Proposed                                                              640 × 480    7                        27   37    98%
Proposed                                                              320 × 240    7                        8    125   98%
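As an illustration of the skin-colour classification mentioned in the discussion above, the following OpenCV sketch tests each pixel's chrominance (Cb, Cr) against an ellipse in the YCbCr colour space. The ellipse centre and semi-axes are placeholder values to be fitted to skin samples; they are not the parameters used in the proposed system.

```cpp
#include <opencv2/opencv.hpp>

// Hedged sketch: classify skin pixels by testing their chrominance (Cb, Cr)
// against an ellipse in the YCbCr colour space.  The ellipse centre and
// semi-axes below are assumed placeholder values, not the system's parameters.
cv::Mat detectSkin(const cv::Mat& bgr)
{
    cv::Mat ycrcb;
    cv::cvtColor(bgr, ycrcb, cv::COLOR_BGR2YCrCb);    // channel order: Y, Cr, Cb

    cv::Mat mask = cv::Mat::zeros(bgr.size(), CV_8UC1);
    const double centerCb = 110.0, centerCr = 155.0;  // assumed ellipse centre
    const double axisCb   = 20.0,  axisCr   = 14.0;   // assumed semi-axes

    for (int r = 0; r < ycrcb.rows; ++r) {
        for (int c = 0; c < ycrcb.cols; ++c) {
            const cv::Vec3b& p = ycrcb.at<cv::Vec3b>(r, c);
            const double cr = p[1], cb = p[2];
            const double u = (cb - centerCb) / axisCb;
            const double v = (cr - centerCr) / axisCr;
            if (u * u + v * v <= 1.0)                  // inside the skin ellipse
                mask.at<uchar>(r, c) = 255;
        }
    }
    return mask;
}
```

In a pipeline such as the one described here, a mask of this kind would be used to confirm the presence of the body and to locate the head and palms before the skeleton points are established.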

5. Conclusions

This study proposed an action identification system for daily home upper extremity rehabilitation. In this section, we discuss the key contributions of this study.

First, in the skeleton point establishment phase, we set up the skeleton points not only according to changes in image features but also in consideration of the real movements of joints and muscles. Hence, we set up and correct the skeleton points following the principles of the anatomy of actions and of muscle testing, and these principles were also applied to define the rehabilitation action programs. We have preset seven kinds of rehabilitation movements for specific muscle groups in the system to guide users in performing the actions. These rehabilitation actions were selected for their importance in rehabilitating the ability to perform daily living activities that require the use of the upper extremity, such as dressing oneself or reaching for objects. Each rehabilitation action program corresponds to the training of specific muscle groups, such as the biceps, triceps, or deltoid muscles.

Second, the hardware used in the upper extremity rehabilitation system comprises a personal computer with a Kinect depth camera that can be set up similarly to a study desk or computer table at home; thus, the system provides a familiar and convenient environment for rehabilitation users, and the patient can perform the rehabilitation routine even in a limited space without extraneous equipment.

Third, patients who cannot stand for long periods of time, such as those with stroke, hemiplegia, muscular dystrophy, or advanced age, can perform the rehabilitation actions in a sitting position, which reduces energy expenditure and enables the patient to focus on the motion control training of the upper extremities without worrying about falling.

Fourth, the execution speed of the proposed system can reach 8 ms per frame at a resolution of 320 × 240, which is equivalent to 125 FPS, and the system achieves 98% accuracy when identifying rehabilitation actions.

Fifth, the proposed upper extremity rehabilitation system operates in real time, achieves efficient vision-based action identification, and consists of low-cost hardware and software. In light of these benefits, we contend that this system can be effectively used by rehabilitation patients to perform daily exercises, and can reduce the burden of transportation and the overall cost of rehabilitation.

The current limitations of the proposed system are as follows: (1) We combined skin color detection with the skeletonizing process to allow the system to establish the skeleton more accurately; this may cause errors in the skeletonizing process if the user wears long-sleeved clothes. (2) The proposed system establishes six rehabilitation actions in the training programs for specific joints, such as the shoulders and elbows, and for specific muscles, such as the biceps, triceps, and deltoid, which might be insufficient for a complete training program. We need to consider how the motion of each plane and axis of the human body translates into 3D or 2D pose estimation in order to develop a more comprehensive and more accurate action rehabilitation system.

In further studies, the proposed rehabilitation system can be improved and extended, on the basis of machine learning techniques, to fit the real human body skeleton and to measure the range of motion of human actions from images instead of through manual professional assessment. If an accurate and realistic human body skeleton map can be established in the pose estimation or action identification system, then we can determine not only the pose changes but also the amount of movement and the maximal ranges of active motion. On this basis, a vision-based movement analysis system can be built in our future studies. Such a system can analyze the image sequences of a patient performing given activities, and the analytical results will determine whether the patient uses a compensatory action or an erroneous movement pattern that violates biomechanical principles.

Author Contributions: Y.-L.C. and P.L. investigated the ideas, designed the system architecture, algorithm, and methodology of the proposed upper extremity rehabilitation system, and wrote the manuscript; C.-H.L. conceived of the presented ideas, implemented the proposed system, and wrote the manuscript with support from Y.-L.C.; C.-W.Y. and Y.-W.K. conducted the experiments, analyzed the experimental data, and provided the analytical results. All authors discussed the results and contributed to the final manuscript.
Funding: This research was funded by the Ministry of Science and Technology of Taiwan under grant numbers MOST-106-2628-E-027-001-MY3 and MOST-106-2218-E-027-002.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Kvedar, J.; Coye, M.J.; Everett, W. Connected health: A review of technologies and strategies to improve patient care with telemedicine and telehealth. Health Aff. 2014, 33, 194–199. [CrossRef] [PubMed]
2. Lindberg, B.; Nilsson, C.; Zotterman, D.; Soderberg, S.; Skar, L. Using Information and Communication Technology in Home Care for Communication between Patients, Family Members, and Healthcare Professionals: A Systematic Review. Int. J. Telemed. Appl. 2013, 2013, 461829. [CrossRef] [PubMed]
3. Bianciardi Valassina, M.F.; Bella, S.; Murgia, F.; Carestia, A.; Prosseda, E. Telemedicine in pediatric wound care. Clin. Ther. 2016, 167, e21–e23.
4. Gattu, R.; Teshome, G.; Lichenstein, R. Telemedicine Applications for the Pediatric Emergency Medicine: A Review of the Current Literature. Pediatr. Emerg. Care 2016, 32, 123–130. [CrossRef] [PubMed]
5. Burke, B.L., Jr.; Hall, R.W. Telemedicine: Pediatric Applications. Pediatrics 2015, 136, e293–e308. [CrossRef] [PubMed]
6. Grabowski, D.C.; O'Malley, A.J. Use of telemedicine can reduce hospitalizations of nursing home residents and generate savings for medicare. Health Aff. 2014, 33, 244–250. [CrossRef] [PubMed]
7. Isetta, V.; Lopez-Agustina, C.; Lopez-Bernal, E.; Amat, M.; Vila, M.; Valls, C.; Navajas, D.; Farre, R. Cost-effectiveness of a new internet-based monitoring tool for neonatal post-discharge home care. J. Med. Internet Res. 2013, 15, e38. [CrossRef] [PubMed]
8. Henderson, C.; Knapp, M.; Fernandez, J.L.; Beecham, J.; Hirani, S.P.; Cartwright, M.; Rixon, L.; Beynon, M.; Rogers, A.; Bower, P.; et al. Cost effectiveness of telehealth for patients with long term conditions (Whole Systems Demonstrator telehealth questionnaire study): Nested economic evaluation in a pragmatic, cluster randomised controlled trial. BMJ 2013, 346, f1035. [CrossRef] [PubMed]
9. Patel, S.; Park, H.; Bonato, P.; Chan, L.; Rodgers, M. A review of wearable sensors and systems with application in rehabilitation. J. Neuroeng. Rehabil. 2012, 9, 21. [CrossRef] [PubMed]
10. DeLisa, J.A.; Gans, B.M.; Walsh, N.E. Physical Medicine and Rehabilitation: Principles and Practice; Lippincott Williams & Wilkins: Philadelphia, PA, USA, 2005; Volume 1.
11. Cameron, M.H.; Monroe, L. Physical Rehabilitation for the Physical Therapist Assistant; Elsevier: Amsterdam, The Netherlands, 2014.
12. Taylor, R.S.; Watt, A.; Dalal, H.M.; Evans, P.H.; Campbell, J.L.; Read, K.L.; Mourant, A.J.; Wingham, J.; Thompson, D.R.; Pereira Gray, D.J. Home-based cardiac rehabilitation versus hospital-based rehabilitation: A cost effectiveness analysis. Int. J. Cardiol. 2007, 119, 196–201. [CrossRef] [PubMed]
13. Lange, B.; Chang, C.Y.; Suma, E.; Newman, B.; Rizzo, A.S.; Bolas, M. Development and evaluation of low cost game-based balance rehabilitation tool using the Microsoft Kinect sensor. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Boston, MA, USA, 30 August–3 September 2011; pp. 1831–1834.
14. Jorgensen, M.G. Assessment of postural balance in community-dwelling older adults—Methodological aspects and effects of biofeedback-based Nintendo Wii training. Dan. Med. J. 2014, 61, B4775. [PubMed]
15. Bartlett, H.L.; Ting, L.H.; Bingham, J.T. Accuracy of force and center of pressure measures of the Wii Balance Board. Gait Posture 2014, 39, 224–228. [CrossRef] [PubMed]
16. Clark, R.A.; Bryant, A.L.; Pua, Y.; McCrory, P.; Bennell, K.; Hunt, M. Validity and reliability of the Nintendo Wii Balance Board for assessment of standing balance. Gait Posture 2010, 31, 307–310. [CrossRef] [PubMed]
17. Seamon, B.; DeFranco, M.; Thigpen, M. Use of the Xbox Kinect virtual gaming system to improve gait, postural control and cognitive awareness in an individual with Progressive Supranuclear Palsy. Disabil. Rehabil. 2016, 39, 721–726. [CrossRef] [PubMed]
18. Ding, M.; Fan, G. Articulated and generalized gaussian kernel correlation for human pose estimation. IEEE Trans. Image Process. 2016, 25, 776–789. [CrossRef] [PubMed]
19. Ding, M.; Fan, G. Articulated gaussian kernel correlation for human pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Boston, MA, USA, 7–12 June 2015; pp. 57–64.
20. Ding, M.; Fan, G. Generalized Sum of Gaussians for Real-Time Human Pose Tracking from a Single Depth Sensor. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 5–9 January 2015; pp. 47–54.
21. Ye, M.; Wang, X.; Yang, R.; Ren, L.; Pollefeys, M. Accurate 3d pose estimation from a single depth image. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13 November 2011; pp. 731–738.
22. Baak, A.; Müller, M.; Bharaj, G.; Seidel, H.P.; Theobalt, C. A data-driven approach for real-time full body pose reconstruction from a depth camera. In Consumer Depth Cameras for Computer Vision; Springer: London, UK, 2013; pp. 71–98.
23. Cao, Z.; Simon, T.; Wei, S.E.; Sheikh, Y. Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 7291–7299.
24. PCL/OpenNI Tutorial 1: Installing and Testing. Available online: http://robotica.unileon.es/index.php/PCL/OpenNI_tutorial_1:_Installing_and_testing (accessed on 17 July 2018).
25. Falahati, S. OpenNI Cookbook; Packt Publishing Ltd.: Birmingham, UK, 2013.
26. Hackenberg, G.; McCall, R.; Broll, W. Lightweight palm and finger tracking for real-time 3D gesture control. In Proceedings of the IEEE Virtual Reality Conference, Singapore, 19–23 March 2011; pp. 19–26.
27. Gonzalez-Sanchez, T.; Puig, D. Real-time body gesture recognition using depth camera. Electron. Lett. 2011, 47, 697–698. [CrossRef]
28. Rosenfeld, A.; Pfaltz, J.L. Sequential operations in digital picture processing. J. ACM 1966, 13, 471–494. [CrossRef]
29. John, C.R. The Image Processing Handbook, 6th ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 654–659. ISBN 9781439840634.
30. Kakumanu, P.; Makrogiannis, S.; Bourbakis, N. A survey of skin-color modeling and detection methods. Pattern Recognit. 2007, 40, 1106–1122. [CrossRef]
31. Sempena, S.; Maulidevi, N.U.; Aryan, P.R. Human action recognition using dynamic time warping. In Proceedings of the IEEE International Conference on Electrical Engineering and Informatics (ICEEI), Bandung, Indonesia, 17–19 July 2011; pp. 1–5.
32. Muscillo, R.; Schmid, M.; Conforto, S.; D'Alessio, T. Early recognition of upper limb motor tasks through accelerometer: Real-time implementation of a DTW-based algorithm. Comput. Biol. Med. 2011, 41, 164–172. [CrossRef] [PubMed]
33. Patlolla, C.; Sidharth, M.; Nasser, K. Real-time hand-pair gesture recognition using a stereo webcam. In Proceedings of the IEEE International Conference on Emerging Signal Processing Applications (ESPA), Las Vegas, NV, USA, 12–14 January 2012; pp. 135–138.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).