Portable Computer Vision-Based Cardiac Estimation as a Teaching Aid

Dustin Terence van der Haar
Academy of Computer Science and Software Engineering
Auckland Park, Johannesburg, Gauteng, South Africa
[email protected]
ABSTRACT
The prevalence of pervasive computing has made computing platforms more portable and has introduced an array of sensors that are useful for many applications. The potential of these sensors is finally being realized in many fields. One area that especially benefits from these sensors is the field of biometrics. However, we are still grappling with user acceptance issues, such as intrusiveness and hygiene problems, which stifle the uptake of the technology. This study investigates the potential of using one of the more common sensors, a portable device camera, to assist educators. By implementing a system that captures heart rate in a novel manner, a basic affective biometric system is formed that requires no contact and is portable. The system segments relevant areas of the face that highlight blood flow and extrapolates heart rate variability from color space changes. By analyzing the extent of color change, a cardiac waveform can be formed with a QRS complex derivative, which can be used for the task of sentiment classification. The derived sentiment can then inform an educator of any potential uncertainty during their teaching in real time. Abrupt changes in sentiment can then be addressed during the class, thereby improving the potential uptake of concepts taught in a classroom. The study validates that it is possible to derive heart rate variability using a camera; it also shows that using this heart rate for basic sentiment classification is feasible on a portable device, even with limited resources, and warrants more attention.
CCS CONCEPTS
• Computing methodologies → Biometrics; • Applied computing → Computer-assisted instruction; • Human-centered computing → Ubiquitous and mobile computing systems and tools;
KEYWORDS
Biometrics, Affective Computing, Education
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
ICCIP'17, November 2017, Tokyo, Japan
© 2017 Association for Computing Machinery.
ACM ISBN 978-x-xxxx-xxxx-x/YY/MM. . . $15.00
https://doi.org/10.1145/nnnnnnn.nnnnnnn
ACM Reference Format: Dustin Terence van der Haar. 2017. Portable Computer Vision-Based Cardiac Estimation as a Teaching Aid. In Proceedings of International Conference on Communication and Information Processing, Tokyo, Japan, November 2017 (ICCIP'17), 6 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
1 INTRODUCTION
Recent advances in technology have resulted in an increase in sensor diversity among consumer end users. Smart mobile phones and wearable devices have dramatically increased the amount of data that can be captured from a user. Using this data, biometric attributes can be derived and applied for various purposes, such as vital sign monitoring [9] or lifestyle evaluation [1]. By leveraging the biometric data gained from these sensors, biometric systems can be created that are highly portable and able to provide additional insights within a specific context. Although the prevalence of these sensors is increasing, user acceptance is not necessarily growing at the same rate. Wearable devices that can capture biometric attributes such as heart rate are being introduced to the market, which makes them an attractive alternative for biometric system designers. However, wearable device-based systems have a few limitations, such as user acceptance, a lack of processing resources and auto-calibration [25]. Users do not want to wear some of these devices, and as a result wearable systems become plagued with hygiene issues that can cause more harm than the good they provide [3]. This limits our options with regard to using biometric attributes for various purposes. A suitable alternative needs to be found that minimizes the limitations and hygiene issues present in wearable systems, but that can still capture useful biometric attributes. One potential alternative is the use of a camera to derive the biometric attribute instead of having the user wear a contact sensor. Camera quality and the field of computer vision have progressed to the point where more image processing techniques are available, along with more robust methods. Facial detection and analysis can be achieved in real time using mature methods [26]. These methods, together with the high frame rates of modern cameras, make it possible to track color differences more accurately, as seen in
Hosseinpour's approach [11] for tracking food coloring during drying to infer temperature change. A similar solution can be formed that attributes facial coloring to blood flow in the facial region. Some approaches [20, 24] take a basic approach that hints at deriving human heart rate, but they have not been explored further or applied to various classification tasks. This study aims to track this change in facial coloring in order to derive a cardiac trace, akin to one captured with an electrocardiograph (ECG), and to use the trace as a biometric attribute for sentiment classification in a classroom setting. The paper begins by providing adequate problem background, discussing affective computing, current work on computer vision-based heart rate estimation and the experiment setup. A model is then proposed that outlines the vision-based approach to QRS complex-based sentiment classification, followed by a discussion of the preliminary model results. The paper ends with considerations for future work and a conclusion.
2 PROBLEM BACKGROUND
Technology-assisted education has come a long way in solving historical challenges such as unfamiliarity with technology and a lack of skills [8]. There is evidence that technology can serve as a way to achieve personalized education, and that personalized education improves learning [16]. However, the different ways of achieving this, along with how best to implement it, are still being investigated.
2.1 Portable Affective Computing
One promising approach to achieving a personalized education experience is to gauge the sentiment of students and react accordingly during a lecture. Constructs and methods from the field of affective computing are applied to derive sentiment, which is then acted upon. In order to accommodate a classroom setting that maximizes engagement and interaction, portability is also required, which limits potential solutions. Firstly, scaling the hardware down to a smaller size for additional portability comes at the cost of fewer resources and less capability, limiting on-device processing to less resource-intensive algorithms. Although many sensors are integrated into these small devices, they lack processing power and interoperability with other devices (especially across vendors). Secondly, the cost of these devices is still high for mainstream consumer use, and cheaper alternatives (such as custom-made Google glasses [7]) come with more limitations. Lastly, because many of these devices make contact with the user, hygiene concerns may arise. These hygiene issues can be mitigated by implementing methods such as Boscart's recommendations for hospitals [3], but the effectiveness of these protocols deteriorates over time and is subject to human error.
Figure 1: A sinus rhythm depicting the P, Q, R, S and T fiducial markers [5].

One biometric attribute captured by some sensors in a portable and affective context is heart rate variability. Traditionally, the sinus rhythm, with its various phases of polarization and depolarization occurring across the walls of the human heart, as depicted in figure 1, is captured using an electrocardiogram (or ECG). However, as opposed to the multiple leads required by an ECG, wearable devices scale down the number of leads required (to no leads) and capture the electric fluctuation at a lower resolution. Once the sinus rhythm has been captured, certain methods are typically applied to process and analyze the signal. In our approach, however, a computer vision-based cardiac trace, as highlighted in the next subsection, is used instead.
2.2 Heart-Based Biometric Systems
In a heart-based biometric system, once the sinus rhythm has been captured, the signal goes through certain steps to achieve authentication. First, the captured heart signal undergoes preprocessing to eliminate noise. The noise can typically be attributed to power line interference or baseline drift and can be filtered using a bandpass Butterworth filter between 5 Hz and 15 Hz, in a manner similar to [23], or a wavelet filter [17]. The preprocessed signal, now free of most of the noise, can then be used to derive features relevant for biometric authentication. Common features include the QRS complex and T wave amplitudes (or other variations [22]) or coefficients of a wavelet decomposition of fragments containing complete heart beats [4]. After preprocessing, a feature selection stage or a dimensionality reduction technique, where fiducial markers are extracted or spatial methods such as principal component analysis (PCA) are applied, can improve subsequent classification.
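As an illustration of this preprocessing step, the following is a minimal sketch of the 5 to 15 Hz Butterworth bandpass described above, assuming SciPy; the filter order and the use of zero-phase filtering are assumptions, as they are not specified here.

```python
from scipy.signal import butter, filtfilt

def bandpass(signal, fs, low=5.0, high=15.0, order=3):
    """Butterworth bandpass between low and high Hz (sketch)."""
    nyquist = fs / 2.0
    b, a = butter(order, [low / nyquist, high / nyquist], btype="band")
    # filtfilt runs the filter forwards and backwards for zero phase shift,
    # which avoids distorting the timing of the QRS complex.
    return filtfilt(b, a, signal)

# Example: filter an ECG recording sampled at 360 Hz.
# clean = bandpass(raw_ecg, fs=360.0)
```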
Once features have been derived, classification takes place to facilitate the authentication process. Classification of the derived features can be achieved using decision-based neural networks [22] or linear discriminant analysis (LDA) [6]. The classifier is trained with persisted features and tested in subsequent authentication events. The outcome of the classifier determines which identity in the enrollment database the sample maps to. However, as seen in the next subsection, an ECG is not the only way to attain a sinus rhythm.
2.3 Computer Vision-Based Heart Rate
The advent of higher-quality video recordings has introduced much potential in the field of computer vision. The increase in capture quality has ushered in many new applications that leverage the additional detail for various tasks, such as detecting subtle color changes [11]. By analyzing these subtle changes, we can better monitor regions of interest and track useful changes within them. One such useful change is the color change that occurs in the epidermis in various parts of the human body. As briefly shown by Takano and Ohta [24], there is a high correlation between the skin's change in brightness and the subject's heart and respiratory rates. A later study by Poh et al. [20] also shows that it may be possible to detect the pulse of subjects captured with a web camera. This initial evidence indicates that deriving pulse is possible, largely through the presence of R peaks in the extrapolated data. This study explores the potential of extracting these cardiac R peaks, counting them and using them to achieve sentiment classification.
3 EXPERIMENT SETUP
The study took a primarily positivist experimental approach, determining through empirical study whether sentiment classification using a computer vision-derived heart trace is viable. This approach includes the formation of a model based on the literature and the author's findings, followed by its implementation as a prototype and a benchmark to determine its practical feasibility.
3.1 Data Collection
The study was performed in a controlled environment with 10 subjects (without any prior knowledge of their medical history), with adequate lighting and capturing one subject at a time. Subjects were asked to face the portable device's camera in a profile position at a distance of less than two meters and to remain stationary for a period of 30 seconds (two 15-second periods). The camera captured frames at a resolution of 1280 by 720 and a frame rate of 30 frames per second. Following the recording of these subjects, each video was used as input for the model and the subsequent analysis was performed.
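For reproducibility, a minimal capture sketch matching the stated settings might look as follows; OpenCV and device index 0 are assumptions, as the capture library is not named here.

```python
import cv2

cap = cv2.VideoCapture(0)                  # assumed device index
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
cap.set(cv2.CAP_PROP_FPS, 30)

frames = []
for _ in range(30 * 30):                   # 30 s at 30 fps (two 15 s periods)
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(frame)
cap.release()
```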
3.2 Implementation
The model was implemented according to the model specifications in the Python programming language, using relevant development packages for the various methods (or manual implementations where these were not available), and was benchmarked on a portable device with a quad-core 1.73 GHz CPU and 4 GB of RAM. The configuration parameters for many of the methods in the various components were optimized based on the environmental constraints. The upper-bound face detection size is based on the frame ratio proportional to a subject at least two meters away from the camera. Cardiac trace points that deviated significantly from the mean were discarded, and the cardiac trace rendering for visualization was disabled during benchmarks. Lastly, the number of captured regions of interest within a defined period is also factored in during the cardiac derivation process. The implementation is then assessed based on certain metrics, such as the failure to acquire rate (FTAR).
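As a simple illustration, the FTAR can be computed as the fraction of capture attempts that fail anywhere in the pipeline; the boolean per-attempt representation below is an assumption chosen for illustration.

```python
def failure_to_acquire_rate(attempts):
    """attempts: iterable of booleans, True if a usable sample was acquired."""
    attempts = list(attempts)
    failures = sum(1 for ok in attempts if not ok)
    return failures / len(attempts)

# e.g. failure_to_acquire_rate([True, True, False, True]) -> 0.25
```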
4 MODEL
A non-contact approach that captures physiological signals such as heart rate is a sought-after method that reduces the intrusiveness of capturing attributes and monitoring the user. Non-contact biometric systems are also sought after for the same reason: to minimize user intrusion. This paper proposes such an approach, using a camera-mounted portable device and processing frames according to the subsections below.
4.1 Video Capture and ROI Segmentation
Firstly, video frames are captured with a camera mounted on a wearable device, at a resolution of 1280 by 720 and 30 frames per second. Once each frame is captured, it is sent to the region of interest (ROI) segmentation component, which finds the face of the subject and isolates his or her forehead. The face is detected using the Viola and Jones face detection method [26]. Unlike previous similar studies [24] and [20], the cheek or whole facial region is not used; instead, the subject's forehead, which constitutes approximately 20% of the overall facial region, is segmented. The forehead is isolated using a geometric offset within the facial ROI. In order to mitigate alignment jitter across subsequent frames, the component keeps track of ROI coordinates and uses them to stabilize the ROI over time. Once the ROI has been established, the segmented regions within the frame are passed on for preprocessing.
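A minimal sketch of this stage, assuming the opencv-python distribution (which ships the Haar cascade used below); the exact geometric offsets are assumptions, chosen so the band covers roughly 20% of the detected facial region.

```python
import cv2

# Viola-Jones face detection with a geometric forehead offset (sketch).
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def forehead_roi(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                                     # counts towards the FTAR
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # track the largest face
    fx, fy = x + int(0.25 * w), y + int(0.05 * h)       # central band near the top
    fw, fh = int(0.50 * w), int(0.20 * h)               # ~20% of the facial region
    return frame_bgr[fy:fy + fh, fx:fx + fw]
```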
4.2 Preprocessing and Signal Formation
During preprocessing, the ROI image is refined further and its contribution to the overall cardiac waveform is established. A gray-scale filter is first applied to the ROI image, followed by contrast equalization and a quality parse that checks for extreme lighting conditions. The zero mean is then derived for each ROI image and cascaded with subsequent derived zero means to form a signal trace, as seen in the black trace in figure 4. In order to minimize significant mean drift, an adaptive threshold based on the previous mean is applied to reject potential artefacts not relevant to the cardiac trace. Once the cardiac waveform has been extracted, a bandpass Butterworth filter between 0.5 Hz and 5 Hz (wider than Poh et al.'s 0.75 to 4 Hz range [20]) minimizes further noise. The filtered cardiac signal can then be passed on for feature extraction.

Figure 3: The derived cascaded mean trace for a subject.
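To make the signal formation concrete, here is a minimal sketch of the per-frame reduction and trace filtering, assuming OpenCV, NumPy and SciPy; the lighting thresholds and filter order are assumptions not stated in the text.

```python
import cv2
import numpy as np
from scipy.signal import butter, filtfilt

def frame_sample(roi_bgr, low=20.0, high=235.0):
    """Reduce a forehead ROI to one trace sample (thresholds are assumptions)."""
    gray = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2GRAY)  # gray-scale filter
    gray = cv2.equalizeHist(gray)                     # contrast equalization
    mean = float(gray.mean())
    return mean if low < mean < high else None        # quality parse for lighting

def form_trace(samples, fs=30.0, low=0.5, high=5.0):
    """Cascade the zero-mean samples and bandpass filter them (0.5-5 Hz)."""
    x = np.asarray([s for s in samples if s is not None])
    x = x - x.mean()                                  # zero mean limits drift
    b, a = butter(3, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, x)
```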
Figure 2: A model that achieves vision heart-based sentiment classification (the pipeline comprises wearable device capture, ROI segmentation with face detection, preprocessing, signal formation, feature extraction and classification).
4.3 Feature Extraction
In order to obtain a succinct representation of the cardiac waveform, the cardiac signal must undergo feature extraction. According to [2], [12] and [23], the P, Q, R, S and T fiducial points are good features for cardiac applications. Their amplitudes are directly proportional to the cardiac tissue density that affects polarization and depolarization events, making R peaks the ideal indicator for determining heart rate. The QRS complex is detected using a variation of the Pan-Tompkins method [19], and the amplitudes serve as features in the classification that follows.
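The paper uses a variation of the Pan-Tompkins method; the following is a simplified sketch of its core stages (differentiation, squaring, moving-window integration and thresholding) applied to the already-filtered trace. The threshold rule and refractory period are assumptions.

```python
import numpy as np

def detect_r_peaks(trace, fs, win_s=0.15):
    """Simplified Pan-Tompkins-style R-peak detection (sketch)."""
    deriv = np.diff(trace)                 # emphasise steep QRS slopes
    squared = deriv ** 2                   # make all slope energy positive
    win = max(1, int(win_s * fs))
    integrated = np.convolve(squared, np.ones(win) / win, mode="same")

    threshold = integrated.mean() + integrated.std()  # simple adaptive threshold
    refractory = int(0.25 * fs)                       # 250 ms between beats
    peaks, last = [], -refractory
    for i in range(1, len(integrated) - 1):
        if (integrated[i] > threshold
                and integrated[i] >= integrated[i - 1]
                and integrated[i] >= integrated[i + 1]
                and i - last >= refractory):
            peaks.append(i)
            last = i
    return np.asarray(peaks)

# Heart rate estimate from the mean R-R interval:
# bpm = 60.0 * fs / np.diff(peaks).mean()
```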
4.4 Classification
Finally, once the features have been derived, classification can take place to facilitate sentiment classification. By comparing previously persisted feature sets (from the first video recordings) with the currently captured feature set (from subsequent recordings), an outcome or label can be calculated with a classifier. The classifier is trained to fit three basic classes: bored, idle and excited. The derived features of the newly captured sample are then tested against it. Based on the performance in [6] and [27] with a traditional ECG signal, the classifier chosen for the model is a k-nearest neighbor (KNN) classifier applied to components selected through principal component analysis (PCA) of the cardiac trace. Once the classification outcome is known, further action within the context may be applied, such as covering a concept again in the classroom or providing additional examples.
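A minimal sketch of this PCA-plus-KNN stage, assuming scikit-learn; the number of components, number of neighbors and the placeholder feature matrices are all assumptions, as they are not stated here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

LABELS = ["bored", "idle", "excited"]        # the three basic classes

# PCA selects components from the cardiac trace features; KNN assigns the label.
clf = make_pipeline(PCA(n_components=5), KNeighborsClassifier(n_neighbors=3))

rng = np.random.default_rng(0)
X_train = rng.normal(size=(30, 12))          # placeholder persisted feature sets
y_train = rng.choice(LABELS, size=30)        # placeholder sentiment labels
clf.fit(X_train, y_train)

X_new = rng.normal(size=(1, 12))             # features from a new capture
print(clf.predict(X_new))                    # e.g. ['idle']
```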
Figure 4: The red, blue, combined and green heart traces for a subject, labeled in red, blue, black and green (listed in order from the top of the graph).
5 PRELIMINARY RESULTS AND DISCUSSION
The study aims to provide a biometric system that uses a camera-mounted portable device to capture video, from which heart rate variability is derived to serve as a biometric attribute for sentiment classification. The biometric attribute can then be used to train a classifier that can identify potential subjects within the camera's line of sight. The system can effectively classify individuals in real time when facing a single individual. Although it can process multiple subjects at a time, doing so increases the computational footprint significantly and is bound by the hardware available on hand. As a result of this increase in computational overhead, classification takes longer and battery life decreases dramatically. When analyzing the normalized red, blue, green and combined channel traces derived from the captured sample (figure 4), we can see that the red and green traces contribute more than the blue trace towards the raw cardiac waveform. Upon further analysis, we discovered that the combined trace provides a better waveform for QRS complex detection. However, extreme lighting does affect the overall traces and should be factored into the guidelines of use for the system. We also discovered that taking a larger ROI, such as the overall face, into account introduces movement artefacts in the trace that negatively affect cardiac estimation. Instead, we propose using a smaller, easier-to-track region with minimal local movement to minimize artefacts. Additionally, we discovered that the captured ROI cannot be too small and should be approximately 20% of the facial region for a more accurate reading; otherwise, artefacts from resolution shift, which is inevitable in portable environments, become a risk. Forehead segmentation stabilizes ROI drift and jitter, reduces movement artefacts and improves the color space variability necessary for cardiac estimation. We have found that these potential artefacts do affect the quality of the captured sample and can, as a result, affect the classification outcome.
The subsequent cardiac processing and classification yields adequate results for sentiment classification, using our methods described above. The principal component analysis (PCA) of the trace yields good principal components for the KNN classifier. However, PCA does slow down the classification process, and alternative methods that save computational resources without too much impact on the classification outcome will be explored further. The study (with 10 participants) shows it is possible to authenticate users in this manner. Although portable devices that capture non-contact biometrics have a great deal of potential, there are still user challenges that need to be resolved. Matsushita et al. [18] mention that through the use of wearable devices a lower price point is possible, but the initial cost is shifted to the end user instead (the student) and would require further user acceptance. Although the user gains active personalization, the current cost benefit of the devices and their indirect costs is yet to be ideal for end users (especially in developing countries). Similar to other non-contact biometrics [10, 13], the risk of capturing a subject's biometric without their consent is increased and needs to be regulated to prevent abuse. During preprocessing, any error that prevents a sample from continuing in the pipeline is a relevant metric in the system and contributes towards the failure to acquire rate (FTAR). Overall, in the benchmark the FTAR was low, and a large percentage of this error can be attributed to failure to detect a face and to frames that fail the signal formation quality parse (typically frames that are not skin). In feature extraction, the R peaks present in the QRS complex can be counted within a specific period to determine whether any heart beats were missed or artefacts are present in the cardiac trace. The analysis of the cardiac traces has shown that R peaks are found in the cardiac trace over 92% of the time, which is directly impacted by QR phase frames being discarded in previous steps.
6 CONCLUSION
According to Jain et al. [15], a biometric should be universal, unique and easily measured for it to be effective. The study introduces a non-contact, physiological signal-based biometric attribute that any unmasked subject can provide and that only requires a camera. We also explore the potential to integrate it within a portable device context for sentiment classification and align the approach to a lower computational footprint with minimal cost to performance. We show that color space changes in a forehead region segmented from a captured live video feed can capture R peaks to estimate heart rate, and that a QRS complex can be extracted and used for effective sentiment classification. Although the study shows sentiment classification is feasible in this manner, certain improvements could be made to improve performance further. One of these is factoring in soft biometrics [14] to reduce the color range required and to better extrapolate the cardiac phase during estimation. The study could also benefit from a much clearer measure of classification performance by applying the model to a more substantial data set. Overall, the increase in computational resources in portable devices and other smart mobile phones has allowed us to apply more resource-intensive methods within a pervasive context. Although Satyanarayanan's concerns about privacy and trust in pervasive computing [21] remain a problem, the potential of pervasive and biometric systems is still on the rise. Just as the digital revolution ushered digital computers to the masses, so are we transitioning to the age of mobility.
REFERENCES
[1] Yicheng Bai, Chengliu Li, Yaofeng Yue, Wenyan Jia, Jie Li, Zhi-Hong Mao, and Mingui Sun. 2012. Designing a wearable computer for lifestyle evaluation. In Bioengineering Conference (NEBEC), 2012 38th Annual Northeast. IEEE, 93–94.
[2] Lena Biel, Ola Pettersson, Lennart Philipson, and Peter Wide. 2001. ECG analysis: a new approach in human identification. IEEE Transactions on Instrumentation and Measurement 50, 3 (2001), 808–812.
[3] V. M. Boscart, K. S. McGilton, A. Levchenko, G. Hufton, P. Holliday, and G. R. Fernie. 2008. Acceptability of a wearable hand hygiene device with monitoring capabilities. Journal of Hospital Infection 70, 3 (2008), 216–222.
[4] Chuang-Chien Chiu, Chou-Min Chuang, and Chih-Yu Hsu. 2008. A novel personal identity verification approach using a discrete wavelet transform of the ECG signal. In Multimedia and Ubiquitous Engineering (MUE 2008). 201–206. https://doi.org/10.1109/MUE.2008.67
[5] Cryptex. 2011. Sinus Rhythm Labels. (April 2011). http://commons.wikimedia.org/wiki/File:SinusRhythmLabels-it.svg
[6] P. De Chazel and R. S. Reilly. 2000. A comparison of the ECG classification performance of different feature sets. In Computers in Cardiology 2000. IEEE, 327–330.
[7] Rod Furlan. 2013. Build your own Google Glass [Resources Hands On]. IEEE Spectrum 50, 1 (2013), 20–21.
[8] Deryn Graham, Tony Valsamidis, et al. 2006. A framework for e-learning: a blended solution? Current developments in technology-assisted education. (2006).
[9] Clarence P. Groff and Paul L. Mulvaney. 2000. Wearable vital sign monitoring system. (Aug. 15, 2000). US Patent 6,102,856.
[10] Waleed S. Haddad. 2005. Non-contact optical imaging system for biometric identification. (Feb. 8, 2005). US Patent 6,853,444.
[11] Soleiman Hosseinpour, Shahin Rafiee, Seyed Saeid Mohtasebi, and Mortaza Aghbashlo. 2013. Application of computer vision technique for on-line monitoring of shrimp color changes during drying. Journal of Food Engineering 115, 1 (2013), 99–114. https://doi.org/10.1016/j.jfoodeng.2012.10.003
[12] Steven A. Israel, John M. Irvine, Andrew Cheng, Mark D. Wiederhold, and Brenda K. Wiederhold. 2005. ECG to identify individuals. Pattern Recognition 38, 1 (2005), 133–142.
[13] Anil Jain, Ruud Bolle, and Sharath Pankanti. 1996. Introduction to biometrics. In Biometrics. Springer, 1–41.
[14] Anil K. Jain, Sarat C. Dass, and Karthik Nandakumar. 2004. Can soft biometric traits assist user recognition? In Defense and Security. International Society for Optics and Photonics, 561–572.
[15] Anil K. Jain, Arun Ross, and Salil Prabhakar. 2004. An introduction to biometric recognition. IEEE Transactions on Circuits and Systems for Video Technology 14, 1 (2004), 4–20.
[16] Daphne Koller. 2011. Death knell for the lecture: Technology as a passport to personalized education. New York Times 5 (2011).
[17] Cuiwei Li, Chongxun Zheng, and Changfeng Tai. 1995. Detection of ECG characteristic points using wavelet transforms. IEEE Transactions on Biomedical Engineering 42, 1 (Jan 1995), 21–28. https://doi.org/10.1109/10.362922
[18] Nobuyuki Matsushita, Shigeru Tajima, Yuji Ayatsuka, and Jun Rekimoto. 2000. Wearable key: Device for personalizing nearby environment. In The Fourth International Symposium on Wearable Computers. IEEE, 119–126.
[19] Jiapu Pan and Willis J. Tompkins. 1985. A real-time QRS detection algorithm. IEEE Transactions on Biomedical Engineering 3 (1985), 230–236.
[20] Ming-Zher Poh, Daniel J. McDuff, and Rosalind W. Picard. 2010. Non-contact, automated cardiac pulse measurements using video imaging and blind source separation. Optics Express 18, 10 (2010), 10762–10774.
[21] Mahadev Satyanarayanan. 2001. Pervasive computing: Vision and challenges. IEEE Personal Communications 8, 4 (2001), 10–17.
[22] T. W. Shen, W. J. Tompkins, and Y. H. Hu. 2002. One-lead ECG for identity verification. In Proceedings of the Second Joint EMBS/BMES Conference, Vol. 1. 62–63. https://doi.org/10.1109/IEMBS.2002.1134388
[23] H. Silva, A. Lourenço, F. Canento, A. Fred, and N. Raposo. 2013. ECG Biometrics: Principles and Applications. In 6th Int. Conf. on Bio-inspired Systems and Signal Processing. 228–235.
[24] Chihiro Takano and Yuji Ohta. 2007. Heart rate measurement based on a time-lapse image. Medical Engineering & Physics 29, 8 (2007), 853–857.
[25] D. W. F. Van Krevelen and R. Poelman. 2010. A survey of augmented reality technologies, applications and limitations. International Journal of Virtual Reality 9, 2 (2010), 1.
[26] Paul Viola and Michael J. Jones. 2004. Robust real-time face detection. International Journal of Computer Vision 57, 2 (2004), 137–154.
[27] Yongjin Wang, Konstantinos N. Plataniotis, and Dimitrios Hatzinakos. 2006. Integrating analytic and appearance attributes for human identification from ECG signals. In Biometric Consortium Conference, 2006 Biometrics Symposium. IEEE, 1–6.