# Low-Cost and Unobtrusive Human Motion Analysis

## Justin Albert

Chair for Connected Healthcare
Hasso Plattner Institute

Office: Campus III Building G2, Room G-2.1.20
Tel.: +49 331 5509-4853
Email: Justin.albert(at)hpi.de

Supervisor: Prof. Dr. Bert Arnrich

We investigate how low-cost cameras (color and depth cameras such as the Microsoft Kinect sensor) can be used for health-related applications such as human motion analysis, gait analysis, or exercise recognition, especially in the rehabilitation of patients with stroke or Parkinson's disease (PD).

## Introduction

Motion analysis is an important tool for assessing physiological or neurological diseases. With inexpensive consumer hardware and novel tracking algorithms, kinematic data can be recorded easily, for example by patients on their own at home while performing physical exercises, or in a clinical or rehabilitation environment by relatives or clinical staff. Our goal is to use novel human pose estimation methods to obtain kinematic data, which then serves as input for applications such as gait analysis and exercise recognition.

## Background: Human Pose Estimation

High-quality analysis of human motion is usually performed in specialized laboratories using high-end multi-camera motion capture systems. To obtain accurate results, subjects must be fitted with reflective markers, which is expensive and time-consuming and requires the subject to be present in the laboratory. In the following, we present related work on motion tracking systems and algorithms for cost-effective 2D/3D pose estimation. We also use some of these algorithms to record our own data sets.

### 2D/3D Human Pose Estimation from RGB Images

The task of human pose estimation is to estimate the positions of certain joints of the human body in either 2D or 3D coordinates for one or multiple given input images. Convolutional Neural Networks (CNNs) have been highly successful in this computer vision task. These networks are generally well suited to image processing because their architecture is designed to work with data arranged in grid structures. One of the first papers on pose estimation with CNNs (Toshev et al. [1]) regressed the joint locations directly as $$(x,y)$$ pixel coordinates for a given image, which resulted in noisy data. A better strategy than direct regression of pixel coordinates is to predict one heatmap per joint, where higher values indicate higher confidence in the joint location. Two landmark papers, Stacked Hourglass [2] and Simple Baseline [3], used this heatmap-based method, each with a different arrangement of convolutional layers and different optimization strategies, and both achieved state-of-the-art performance at the time.
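The heatmap idea described above can be illustrated with a minimal sketch. It assumes that each joint has one heatmap, that training targets are rendered as 2D Gaussians, and that the joint location is read off as the position of the maximum; the function names and the value of `sigma` are hypothetical and not taken from any of the cited implementations.

```python
import numpy as np

def gaussian_heatmap(height, width, center_xy, sigma=2.0):
    """Render a 2D Gaussian centred on a joint (a typical training target)."""
    xs = np.arange(width)[None, :]    # shape (1, width)
    ys = np.arange(height)[:, None]   # shape (height, 1)
    cx, cy = center_xy
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def decode_heatmaps(heatmaps):
    """Return the (x, y) peak location and peak value for each joint.

    heatmaps: array of shape (num_joints, height, width).
    """
    num_joints, height, width = heatmaps.shape
    flat = heatmaps.reshape(num_joints, -1)
    peak_index = flat.argmax(axis=1)
    ys, xs = np.unravel_index(peak_index, (height, width))
    coords = np.stack([xs, ys], axis=1)   # one (x, y) row per joint
    confidence = flat.max(axis=1)         # peak value used as confidence
    return coords, confidence

# Two synthetic joints on a 64 x 48 (height x width) heatmap.
heatmaps = np.stack([
    gaussian_heatmap(64, 48, (10, 20)),
    gaussian_heatmap(64, 48, (30, 5)),
])
coords, confidence = decode_heatmaps(heatmaps)
```

In a trained network the heatmaps would come from the model output rather than from `gaussian_heatmap`, and the peak value gives a per-joint confidence that direct coordinate regression does not provide.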

OpenPose [7], another commonly used system, is a real-time multi-person 2D and 3D pose estimation model that follows a two-step approach: it first detects the key points of one or more persons in a given image and then, from the set of all detected joints, groups those that belong to the same person. Figure 1 shows images from our own training run of a 2D human pose estimation model (an implementation of Simple Baseline), in which the training images are rotated and scaled for data augmentation.

Figure 1: Training Process of a Human Pose Estimation Model
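Rotation and scaling augmentation has to be applied to the joint annotations as well as to the image. The following sketch shows only the annotation side, under the assumption that the transform is taken about the image center; the function name and the sampling ranges for angle and scale are hypothetical and do not reflect the settings of the training run shown in Figure 1.

```python
import numpy as np

def rotate_scale_keypoints(keypoints, center, angle_deg, scale):
    """Rotate and scale 2D keypoints about `center`.

    keypoints: array-like of shape (num_joints, 2) holding (x, y) pixels.
    The same transform must also be applied to the image itself.
    """
    theta = np.deg2rad(angle_deg)
    rotation = np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])
    keypoints = np.asarray(keypoints, dtype=float)
    center = np.asarray(center, dtype=float)
    # Shift to the center, apply the scaled rotation, shift back.
    return (keypoints - center) @ (scale * rotation).T + center

# Sample one random augmentation (ranges are illustrative assumptions).
rng = np.random.default_rng(seed=0)
angle = rng.uniform(-30.0, 30.0)
scale = rng.uniform(0.75, 1.25)

joints = np.array([[110.0, 100.0], [100.0, 140.0]])
augmented = rotate_scale_keypoints(joints, center=(100.0, 100.0),
                                   angle_deg=angle, scale=scale)
```

Each augmented joint keeps its relative position to the other joints, while its distance from the center is multiplied by `scale`.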

### RGB-D Camera based Human Pose Estimation

In 2010, Microsoft launched the Kinect camera, a motion tracking device with an integrated depth sensor designed as a gaming controller for the Microsoft Xbox, bringing a cost-effective and unobtrusive motion tracking system to the global market. Since the second generation of the Kinect camera, depth has been measured with the Time-of-Flight (ToF) principle, which estimates depth by emitting light into the scene and measuring the time until it is reflected and returns to the sensor. For 3D motion tracking, Kinect v2 used randomized decision forests to estimate the joint locations, as described in the corresponding paper by Shotton et al. [4]. In addition to each joint position, the joint orientation is provided, represented as a local coordinate system that is aligned relative to the parent joint. In 2019, a new Kinect generation, Azure Kinect, was released, with the focus shifting away from games towards industrial applications. Its skeleton tracking algorithm now uses deep learning (Convolutional Neural Networks) to estimate human poses. Both devices are of interest for our work.
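In its idealized form, the Time-of-Flight principle described above reduces to a single relation. Here $$c$$ is the speed of light, $$\Delta t$$ the measured round-trip time, and $$d$$ the distance to the reflecting surface; the factor of two accounts for the light travelling to the surface and back. This is a simplified sketch, since practical sensors often infer the travel time indirectly, for example from the phase shift of modulated light.

$$d = \frac{c \, \Delta t}{2}$$

As a worked example, a round-trip time of 20 ns corresponds to a distance of about 3 m:

$$d \approx \frac{(3 \times 10^{8}\,\mathrm{m/s}) \cdot (20 \times 10^{-9}\,\mathrm{s})}{2} = 3\,\mathrm{m}$$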

## Current Research

### Validation of Pose Estimation Quality of a novel RGB-D Camera Device

As mentioned above, the Microsoft Kinect cameras (and similar depth cameras) are inexpensive and unobtrusive, but they are less accurate than high-end motion capture systems. The first and second generations of the Microsoft Kinect camera have each been evaluated against a gold-standard system for specific applications to determine whether the Kinect is suitable for a particular physical assessment task. One study (Capecci et al. [5]) found that the Kinect camera could be used to evaluate physical activity in a rehabilitation scenario. Another study (Galna et al. [6]) evaluated the accuracy of the Kinect sensor for measuring movement in people with Parkinson's disease. Such evaluations typically examine the spatial agreement between joints (using certain distance measures), the range of motion (RoM) of certain limbs or, in gait analysis, discrete spatio-temporal gait parameters. The aim of the current project is to investigate the tracking quality of the latest Azure Kinect camera for a specific physical activity use case. The resulting kinematic data will be compared against the high-end motion capture system Vicon (Vicon, UK) with a full-body setup of 32 reflective markers.
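Two of the evaluation measures named above, spatial agreement between joints and range of motion, can be sketched as follows. The sketch assumes that both systems have already been time-synchronized and expressed in a common coordinate frame, which is a separate step not described here; the function names and the toy data are hypothetical.

```python
import numpy as np

def mean_joint_distance(pred, ref):
    """Mean Euclidean distance between corresponding joints.

    pred, ref: arrays of shape (frames, joints, 3) in the same units.
    """
    return float(np.linalg.norm(pred - ref, axis=-1).mean())

def joint_angle_deg(a, b, c):
    """Per-frame angle at joint b (degrees) between segments b->a and b->c."""
    u = a - b
    v = c - b
    cosine = np.sum(u * v, axis=-1) / (
        np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1))
    # Clip to guard against rounding slightly outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0)))

def range_of_motion_deg(angles):
    """Range of motion as the spread between largest and smallest angle."""
    return float(np.max(angles) - np.min(angles))

# Toy knee example over three frames: extended, flexed to 90 degrees, extended.
hip = np.array([[0.0, 1.0, 0.0]] * 3)
knee = np.zeros((3, 3))
ankle = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, -1.0, 0.0]])
knee_angles = joint_angle_deg(hip, knee, ankle)
knee_rom = range_of_motion_deg(knee_angles)

# Toy agreement example: every joint is offset by the vector (3, 4, 0).
reference = np.zeros((2, 3, 3))
estimate = reference + np.array([3.0, 4.0, 0.0])
error = mean_joint_distance(estimate, reference)
```

Comparing `knee_rom` computed from the camera data with the value computed from the marker-based data gives a range-of-motion agreement, while `mean_joint_distance` summarizes positional agreement over all joints and frames.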

### Predicting Clinical Scores based on Kinematic Data

The follow-up project plans to use the Azure Kinect camera, or one of the tracking algorithms presented above, to evaluate physical movements by estimating clinical scores for exercises performed by patients. Deploying the tracking methods and applications on a mobile phone could also be of interest. Supervised machine learning methods such as CNNs, RNNs, or LSTMs, or even unsupervised methods, could be applied to the kinematic data to classify patients into categories or to regress clinical scores. Figure 2 shows an example from a previous experiment in which a human pose estimation model (OpenPose) was used to identify gait cycles during treadmill walking based on the kinematic data (heel strike and toe off events are circled in red). With further processing, a common set of gait parameters can be derived. This method could be used to assess patients in a clinical environment with only a single RGB camera, for example that of a smartphone.

Figure 2: Application of Deep Learning Model to Measure Gait on a Treadmill
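How gait events might be derived from tracked joints can be sketched with a common coordinate-based heuristic: during treadmill walking, a heel strike is taken to occur when the heel is furthest in front of the hip along the walking direction. This is an illustrative assumption and not necessarily the method used in the experiment shown in Figure 2; the function names, the frame rate, and the synthetic signal are hypothetical, and real pose data would usually need smoothing first.

```python
import numpy as np

def local_maxima(signal):
    """Indices i where signal[i-1] < signal[i] >= signal[i+1]."""
    s = np.asarray(signal, dtype=float)
    is_peak = (s[1:-1] > s[:-2]) & (s[1:-1] >= s[2:])
    return np.flatnonzero(is_peak) + 1

def heel_strike_frames(heel_x, hip_x):
    """Frames where the heel is furthest ahead of the hip (walking along +x)."""
    return local_maxima(np.asarray(heel_x) - np.asarray(hip_x))

def stride_times(strike_frames, fps):
    """Time in seconds between consecutive heel strikes of the same foot."""
    return np.diff(strike_frames) / fps

# Synthetic example: one foot swinging with a 1 s period, recorded at 30 fps.
fps = 30
t = np.arange(150) / fps
hip_x = np.zeros_like(t)
heel_x = 0.3 * np.cos(2.0 * np.pi * t)

strikes = heel_strike_frames(heel_x, hip_x)
strides = stride_times(strikes, fps)
```

From the stride times, further spatio-temporal parameters such as cadence can be derived; toe off events could be found analogously from the minima of a toe-to-hip signal.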

## References

1. Toshev, A.; Szegedy, C. DeepPose: Human Pose Estimation via Deep Neural Networks. arXiv:1312.4659. doi:10.1109/CVPR.2014.214.

2. Newell, A.; Yang, K.; Deng, J. Stacked Hourglass Networks for Human Pose Estimation. In: Leibe, B.; Matas, J.; Sebe, N.; Welling, M. (eds) Computer Vision – ECCV 2016. Lecture Notes in Computer Science, vol 9912. Springer, Cham, 2016.

3. Martinez, J.; Hossain, R.; Romero, J.; Little, J.J. A Simple Yet Effective Baseline for 3d Human Pose Estimation. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, 2017, pp. 2659–2668.

4. Shotton, J.; Fitzgibbon, A.; Cook, M.; Sharp, T.; Finocchio, M.; Moore, R.; Kipman, A.; Blake, A. Real-time Human Pose Recognition in Parts from Single Depth Images. 2011, pp. 1297–1304. doi:10.1109/CVPR.2011.5995316.

5. Capecci, M.; Ceravolo, M.G.; Ferracuti, F.; Iarlori, S.; Longhi, S.; Romeo, L.; Russi, S.N.; Verdini, F. Accuracy Evaluation of the Kinect v2 Sensor during Dynamic Movements in a Rehabilitation Scenario. 2016, pp. 5409–5412.

6. Galna, B.; Barry, G.; Jackson, D.; Mhiripiri, D.; Olivier, P.; Rochester, L. Accuracy of the Microsoft Kinect Sensor for Measuring Movement in People with Parkinson’s Disease. Gait and Posture 2014, 39, 1062–1068. doi:10.1016/j.gaitpost.2014.01.008.

7. Cao, Z.; Hidalgo Martinez, G.; Simon, T.; Wei, S.; Sheikh, Y. OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.