Markerless Motion Tracking Using Computer Vision and Wearable Sensing for Physical Exercise Quantification

Justin Albert

Chair for Digital Health - Connected Healthcare
Hasso Plattner Institute

Office: Campus III Building G2, Room G-2.1.20
Tel.: +49 331 5509-4853
Email: Justin.albert(at)hpi.de
Links: Homepage

Starting Date: 01.10.2019

In my research, I focus on human motion analysis, primarily using 3D cameras but also other sensor modalities such as inertial measurement units (IMUs) and electrocardiography (ECG). The projects range from using a low-cost 3D camera for gait analysis to predicting subjective exertion in strength training. In the following sections, I give an overview of my current research and of my projects from the past year.

Current Research

Gait Analysis Using 3D Human Pose Estimation on 2D Images

Introduction

Gait analysis is essential for assessing a patient's physical or mental state. It is valuable for the early detection of neurological diseases and for evaluating fall risk in older people. Usually, gait is assessed in a specialized laboratory with expensive marker-based motion capture systems: medical professionals attach markers to the subject, and an optical system tracks them. Recent advances in deep learning-based human pose estimation enable motion tracking on images. These algorithms work purely on 2D images and can predict the 3D coordinates of specific human keypoints (such as the head, shoulders, and feet) without any markers. This technology holds considerable potential because it removes the need to place markers on the subjects. In this project, we aim to use human pose estimation for gait analysis that can be deployed on consumer-grade devices such as smartphones.

An estimated 3D skeleton based on a monocular 2D video.

Study Setup

We recorded a dataset of 16 subjects (8 female, 8 male) walking on a treadmill at three different velocities. The subjects were recorded with a 12 MP color camera and a marker-based motion capture system (Vicon) using the 39-marker full-body Plug-in Gait model. The Vicon system sampled data at 100 Hz, while the RGB camera recorded at 30 fps. We tested several state-of-the-art models, including GAST-Net [1], MediaPipe [2], and VideoPose3D [3], to estimate 3D human coordinates from 2D images. After extraction, we apply signal processing methods such as filtering and temporal and spatial alignment of the skeleton data. We evaluated several aspects, including the spatial agreement of joint locations and gait parameters (step length, step time, step width, and stride time). The figure below shows an early result for the step length parameter, calculated with a VideoPose3D-based model and with the Vicon system. The x-axis shows reference values from the Vicon system, and the y-axis shows the gait parameters from the pose estimator. The figure indicates that the tracking performance leaves room for improvement.
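The gait parameters listed above can be derived from heel-strike events in the skeleton data. The following is a minimal sketch, assuming heel-strike times and heel positions have already been detected (event detection itself is not shown). The `HeelStrike` input format, the function name, and the treadmill-based step-length definition are assumptions for illustration, not necessarily the definitions used in this study.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class HeelStrike:
    """A heel-strike event (hypothetical input format).

    Heel positions are (anterior-posterior, medio-lateral) coordinates in
    metres, taken from the skeleton at the moment of the heel strike.
    """
    time_s: float
    side: str  # "L" or "R": the foot that touches the ground
    left_heel: tuple[float, float]
    right_heel: tuple[float, float]


def gait_parameters(events: list[HeelStrike]) -> dict[str, float]:
    """Average step length/width/time and stride time from heel strikes."""
    events = sorted(events, key=lambda e: e.time_s)
    step_len, step_wid, step_time, stride_time = [], [], [], []

    # A step runs from one foot's heel strike to the next strike of the other foot.
    for prev, cur in zip(events, events[1:]):
        if prev.side != cur.side:
            step_time.append(cur.time_s - prev.time_s)
            # Assumption: on a treadmill the body stays in place, so step length
            # is the anterior-posterior distance between both heels at heel strike.
            step_len.append(abs(cur.left_heel[0] - cur.right_heel[0]))
            step_wid.append(abs(cur.left_heel[1] - cur.right_heel[1]))

    # A stride runs between two consecutive heel strikes of the same foot.
    last_strike: dict[str, float] = {}
    for e in events:
        if e.side in last_strike:
            stride_time.append(e.time_s - last_strike[e.side])
        last_strike[e.side] = e.time_s

    if not step_time or not stride_time:
        raise ValueError("need at least one step and one stride")
    return {
        "step_length_m": mean(step_len),
        "step_width_m": mean(step_wid),
        "step_time_s": mean(step_time),
        "stride_time_s": mean(stride_time),
    }
```

Computing the same function once on Vicon-derived events and once on pose-estimator events yields the paired values that a step-length scatter plot compares.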

A scatter plot evaluating the step length parameter. Ground-truth values from the Vicon system are on the x-axis; the estimated gait parameters are on the y-axis. For a perfect prediction, all values would lie on the diagonal line.
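The scatter plot compares each estimated value with its Vicon reference. A minimal sketch of how such agreement could be summarized numerically is shown below; the choice of metrics (mean absolute error, mean bias with Bland-Altman 95% limits of agreement, and Pearson's r) and the function name are our assumptions, not necessarily what this project reports.

```python
from statistics import mean, stdev


def agreement(reference: list[float], estimate: list[float]) -> dict[str, float]:
    """Compare pose-estimator gait parameters with Vicon reference values."""
    if len(reference) != len(estimate) or len(reference) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    diffs = [e - r for r, e in zip(reference, estimate)]
    bias = mean(diffs)                  # systematic over- or underestimation
    sd = stdev(diffs)                   # spread of the differences
    mae = mean(abs(d) for d in diffs)   # mean absolute error
    # Pearson correlation: how closely the points follow *some* straight line.
    mr, me = mean(reference), mean(estimate)
    cov = sum((r - mr) * (e - me) for r, e in zip(reference, estimate))
    var_r = sum((r - mr) ** 2 for r in reference)
    var_e = sum((e - me) ** 2 for e in estimate)
    pearson_r = cov / (var_r * var_e) ** 0.5
    return {
        "mae": mae,
        "bias": bias,
        "loa_low": bias - 1.96 * sd,    # Bland-Altman 95% limits of agreement
        "loa_high": bias + 1.96 * sd,
        "pearson_r": pearson_r,
    }
```

For example, `agreement([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])` gives r = 1 even though every estimate is twice its reference; the bias of 2.5 exposes the discrepancy. This is why correlation alone does not show that the points lie on the diagonal.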

Data Analysis

The early results of this project show that the pre-trained models need further improvement. Most models were pre-trained solely on publicly available human activity recognition datasets. These general-purpose datasets contain many activities, but gait is usually underrepresented. Therefore, the next step is to fine-tune the pose estimation models on our gait dataset. To prepare the dataset for training, we temporally synchronize the Vicon and video data. Subsequently, the 3D Vicon markers are projected onto the image plane of the RGB camera. We then train the models to predict the Vicon marker locations using the generated ground-truth data. We hypothesize that 3D human pose estimation performance will improve when the models are trained on the gait-specific dataset. The evaluation will quantify the improvement over the pre-trained model.
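Both preparation steps can be sketched as follows, assuming the time offset between the two systems has already been determined (so that both share one time axis) and that the camera's intrinsic parameters (fx, fy, cx, cy) and its pose relative to the Vicon frame (R, t) are known from calibration. The function names and the ideal, distortion-free pinhole model are illustrative assumptions.

```python
import bisect


def resample_linear(src_t, src_v, dst_t):
    """Linearly interpolate samples (src_t, src_v) at the times dst_t.

    Used here to bring 100 Hz Vicon samples onto the 30 fps video timestamps.
    Times outside the source range are clamped to the first/last sample.
    """
    out = []
    for t in dst_t:
        i = bisect.bisect_left(src_t, t)
        if i == 0:
            out.append(src_v[0])
        elif i == len(src_t):
            out.append(src_v[-1])
        else:
            t0, t1 = src_t[i - 1], src_t[i]
            w = (t - t0) / (t1 - t0)
            out.append((1 - w) * src_v[i - 1] + w * src_v[i])
    return out


def project_point(X, R, t, fx, fy, cx, cy):
    """Project a 3D marker X (Vicon frame) to pixel coordinates (u, v).

    Pinhole model: X_cam = R @ X + t, u = fx * x / z + cx, v = fy * y / z + cy.
    Lens distortion is ignored in this sketch.
    """
    x, y, z = (sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3))
    if z <= 0:
        raise ValueError("point lies behind the camera")
    return fx * x / z + cx, fy * y / z + cy
```

Each marker coordinate would be resampled separately and then projected frame by frame, yielding 2D training targets aligned with the video.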

Prediction of Subjective Exertion in Resistance Training

Introduction

Quantifying load during physical activity is of high interest to the research community. Athletes want to align the load applied in their exercises as closely as possible with the load prescribed by their training plan. Too much load reduces force production ability and increases the risk of injury. For the general population, e.g., during rehabilitation or recreational sports, appropriate exercise load is also important to avoid injuries. Training load can be quantified using internal and external measures. External measures include, e.g., the distance traveled, the travel speed, or the lifted weight. Internal load is often measured as a rating of perceived exertion (RPE), a single value on a scale that specifies how exhausting an exercise was for a specific person. A standard RPE scale is the so-called Borg scale, which ranges from 6 (not exhausting) to 20 (extremely exhausting) [4]. Such a rating is quick to obtain: subjects mark their exertion on the scale shortly after the load concludes. Building on this, our aim is a system that automatically predicts RPE values from sensor measurements. We hypothesize that such a system could warn users when they experience significant training overload, helping them avoid fatigue-related injuries. In this initial project, we use multiple 3D cameras for motion tracking and machine learning methods to predict subjective RPE values.

Study Setup

For this project, we aimed to maximize the effect on exertion. Therefore, we chose the squat exercise, as it involves large muscle groups. The exercises were performed on a so-called flywheel machine. A flywheel training machine does not use a weight that is accelerated downwards by gravity. Instead, the energy the subject generates while standing up is transmitted by a belt and stored in a flywheel. The belt is connected to the participant via a hip harness and wrapped around a transmission shaft fixed to the flywheel. When the participant stands up, they unwrap the belt from the shaft, spinning up the flywheel. Standing up is the concentric movement of the squat. Because the flywheel continues to spin, the belt wraps back around the transmission shaft at the topmost position. Thus, during the downward (eccentric) movement, the participant has to decelerate the flywheel. Finally, the subject is again in a squatting position, as shown in the following figure. In total, N=21 subjects participated in our study, performing a specific protocol consisting of several sets with 12 repetitions each.
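The mechanics described above follow from the rotational kinetic energy of the flywheel, E = ½·I·ω², where I is its moment of inertia and ω its angular velocity. A minimal sketch, assuming angular velocity samples are available and friction is negligible; the function names and the inertia value in the example are hypothetical and do not describe the machine used in the study.

```python
def flywheel_energy(inertia_kg_m2: float, omega_rad_s: float) -> float:
    """Rotational kinetic energy E = 0.5 * I * omega^2 in joules."""
    return 0.5 * inertia_kg_m2 * omega_rad_s ** 2


def mean_flywheel_power(inertia_kg_m2: float, omegas_rad_s: list[float], dt_s: float) -> float:
    """Mean power (watts) exchanged with the flywheel over one movement phase.

    omegas_rad_s: angular velocity samples spaced dt_s apart.
    Friction is neglected, so power = change in stored energy / duration.
    Positive while spinning up (concentric), negative while braking (eccentric).
    """
    if len(omegas_rad_s) < 2:
        raise ValueError("need at least two samples")
    gained = (flywheel_energy(inertia_kg_m2, omegas_rad_s[-1])
              - flywheel_energy(inertia_kg_m2, omegas_rad_s[0]))
    duration = dt_s * (len(omegas_rad_s) - 1)
    return gained / duration
```

With a hypothetical inertia of 0.05 kg·m², spinning the flywheel from rest to 100 rad/s within one second corresponds to 250 J of stored energy and a mean concentric power of 250 W; the subject must then remove roughly that energy again during the eccentric phase.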

Data Analysis

We used two Microsoft Azure Kinect 3D cameras to capture the participants during the experiment. Both cameras were placed at a 45-degree angle, pointing at the subject. To obtain a single final skeleton, the skeletons from both cameras must be fused. We achieved this skeleton fusion through an extrinsic camera calibration using calibration patterns. An example sequence of a fused skeleton is shown in the video below. Afterward, signal processing methods are applied to filter the kinematic data and remove outliers. In the initial phase of this project, the aim is to explore and analyze the recorded kinematic data and to manually craft feature sets. These include various skeleton features, such as relative joint positions, joint angles, and joint angle velocities. After obtaining an extensive feature set, we remove uninformative features using various feature elimination methods. Subsequently, statistical features such as the mean, standard deviation, and median are calculated on the previously mentioned skeleton features. To predict fatigue during squats, we currently focus on conventional machine learning rather than advanced deep learning methods. We use Random Forests, Gradient Boosting Regression, k-NN regression, and multi-layer perceptrons (MLPs) to predict the subjective Borg scale value.
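A minimal sketch of the fusion and feature steps, assuming the extrinsic calibration has already produced a rotation R and translation t that map camera-2 coordinates into camera-1 coordinates. The function names, the optional per-joint weights (which could, for instance, reflect tracking confidence), and the 30 fps frame rate in the example are assumptions for illustration, not the exact pipeline used in this project.

```python
import math
from statistics import mean, median, pstdev

Vec3 = tuple[float, float, float]


def to_reference_frame(p: Vec3, R, t) -> Vec3:
    """Map a joint from camera-2 coordinates into camera-1 coordinates (R @ p + t)."""
    return tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))


def fuse_joint(p_cam1: Vec3, p_cam2: Vec3, R, t, w1: float = 1.0, w2: float = 1.0) -> Vec3:
    """Weighted average of one joint seen by both cameras, in camera-1 coordinates."""
    q = to_reference_frame(p_cam2, R, t)
    return tuple((w1 * a + w2 * b) / (w1 + w2) for a, b in zip(p_cam1, q))


def joint_angle_deg(a: Vec3, b: Vec3, c: Vec3) -> float:
    """Angle at joint b between segments b->a and b->c (e.g. hip-knee-ankle)."""
    u = [x - y for x, y in zip(a, b)]
    v = [x - y for x, y in zip(c, b)]
    cos = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding noise


def angular_velocity(angles_deg: list[float], fps: float) -> list[float]:
    """Finite-difference joint angle velocity in degrees per second."""
    return [(b - a) * fps for a, b in zip(angles_deg, angles_deg[1:])]


def summary_features(signal: list[float]) -> dict[str, float]:
    """Statistical features of one skeleton feature over a set."""
    return {"mean": mean(signal), "std": pstdev(signal), "median": median(signal)}
```

For a set of squats, one would compute, e.g., the knee angle per fused frame, its velocity, and then `summary_features` for each signal; the resulting per-set vectors are the kind of tabular input that the regressors named above can consume.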

Embedded YouTube video: example sequence of a fused skeleton.

Former Project: Data Augmentation of Kinematic Time-Series from Rehabilitation Exercises

Neurological diseases such as Parkinson's disease or stroke are common, severe conditions in modern society. Usually, physicians or other experts assess the progress of these and other neurological diseases in the hospital. Because these assessments rely on expert judgment, they can suffer from subjective bias. Furthermore, many healthcare systems discharge patients from the rehabilitation program early, forcing them to continue training at home without expert supervision. Exercise recognition systems that can evaluate a user's movement are now being developed. Such systems could support physicians with objective decision-making or automatically assess exercises that patients perform alone at home. Training such a machine learning system requires large amounts of representative data to achieve good results, especially for deep learning-based approaches. Large and diverse datasets are publicly available in the field of Human Activity Recognition (HAR). However, collecting medical datasets is challenging because access to patients is restricted. In addition, medical expertise and specialized equipment are needed to collect the data and obtain ground-truth labels. In studies that include a healthy control group, limited access to patients often leads to imbalanced datasets in which most data points belong to healthy subjects. A common strategy to overcome these challenges is to augment the collected dataset or to synthesize entirely new datasets with artificial examples. To tackle this issue, we developed a method that uses a Generative Adversarial Network (GAN) to generate long-term synthetic sequences of human motion data for a given class.

The network developed here produces realistic-looking repetitions of a specific exercise over a long period. Our architecture is inspired by and builds upon the Human-Pose-GAN (HP-GAN) model [5]. It consists of an encoder and a decoder network, takes ten prior poses from an arbitrary sequence as input, and predicts 20 new poses of the sequence. By applying the network recursively, the method creates long data sequences. We demonstrated the usefulness of the approach by balancing the KIMORE (KInematic Assessment of MOvement and Clinical Scores for Remote Monitoring of Physical REhabilitation) dataset [6], in which the patient classes are underrepresented compared to the healthy control group. We trained and focused our approach on squat exercises performed by Parkinson's disease patients, stroke patients, and healthy persons. For evaluation, we trained a classification network to identify stroke and Parkinson's disease patients. Balancing the dataset with our method increased the classification accuracy by 11 percentage points for the three-class classification of stroke patients, Parkinson's disease patients, and healthy subjects. The approach and results were published at the IEEE COINS conference in September 2021. The video below shows skeleton data generated by our algorithm for a hand-raise exercise, including ground-truth data, generated data, and two error cases.
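The recursive inference described above can be sketched as a sliding-window loop: the newest ten poses condition each call, and the 20 predicted poses are appended to the sequence. This is a minimal sketch, not the published implementation; `constant_velocity_predictor` is a hypothetical placeholder for the trained HP-GAN-style generator (which would typically also receive a random noise vector), and each pose is assumed to be a flat tuple of joint coordinates.

```python
from typing import Callable, Sequence

Pose = tuple[float, ...]   # flattened joint coordinates of one frame (assumption)
PRIOR_POSES = 10           # poses the network is conditioned on
PREDICTED_POSES = 20       # poses the network outputs per call


def generate_long_sequence(
    seed: Sequence[Pose],
    predictor: Callable[[list[Pose]], list[Pose]],
    total_length: int,
) -> list[Pose]:
    """Extend a seed sequence by repeatedly feeding the newest poses back in."""
    if len(seed) < PRIOR_POSES:
        raise ValueError(f"need at least {PRIOR_POSES} seed poses")
    sequence = list(seed)
    while len(sequence) < total_length:
        context = sequence[-PRIOR_POSES:]   # sliding window over the newest poses
        new_poses = predictor(context)
        if len(new_poses) != PREDICTED_POSES:
            raise ValueError("predictor returned an unexpected number of poses")
        sequence.extend(new_poses)
    return sequence[:total_length]


def constant_velocity_predictor(context: list[Pose]) -> list[Pose]:
    """Hypothetical stand-in for the trained generator: linear extrapolation."""
    last, prev = context[-1], context[-2]
    velocity = [a - b for a, b in zip(last, prev)]
    return [
        tuple(x + (k + 1) * v for x, v in zip(last, velocity))
        for k in range(PREDICTED_POSES)
    ]
```

Because later windows are conditioned on generated poses, any prediction error feeds back into subsequent calls; this is a general property of autoregressive generation and one reason to inspect long generated sequences for failure cases.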

Embedded YouTube video: generated skeleton data for a hand-raise exercise.

References

[1] J. Liu et al., "A graph attention spatio-temporal convolutional network for 3D human pose estimation in video," in 2021 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2021.
[2] C. Lugaresi et al., "MediaPipe: A framework for perceiving and processing reality," in Third Workshop on Computer Vision for AR/VR at IEEE Computer Vision and Pattern Recognition (CVPR), 2019.
[3] D. Pavllo et al., "3D human pose estimation in video with temporal convolutions and semi-supervised training," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[4] G. Borg, "Perceived exertion as an indicator of somatic stress," Scandinavian Journal of Rehabilitation Medicine, 1970.
[5] E. Barsoum, J. Kender, and Z. Liu, "HP-GAN: Probabilistic 3D human motion prediction via GAN," in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018.
[6] M. Capecci, M. G. Ceravolo, F. Ferracuti, S. Iarlori, A. Monteriù, L. Romeo, and F. Verdini, "The KIMORE dataset: KInematic assessment of MOvement and clinical scores for remote monitoring of physical REhabilitation," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 27, no. 7, pp. 1436–1448, July 2019.

Publications

2024

A computer vision approach to continuously monitor fatigue during resistance training. Albert, Justin Amadeus; Arnrich, Bert in Biomedical Signal Processing and Control (2024). 89 105701.

@article{ALBERT2024105701,
  abstract = {Monitoring fatigue during resistance training is essential to avoid injuries caused by overtraining. Fatigue can be comprehensively quantified by the external and internal load, where the external load is the work done by the athlete, and the internal load is the psychological and physiological response to the external load. This paper proposes a computer vision method to continuously monitor fatigue during resistance training by predicting external and internal parameters, namely the generated power and the rating of perceived exertion. We utilize the human pose estimation from two Microsoft Azure Kinect cameras to capture the movement of athletes while performing stationary exercises. Our method processes the obtained kinematic data, computes skeleton features to train traditional machine learning algorithms, and constructs feature maps to train convolutional neural network-based models to predict the load parameters. For evaluation, we recorded a dataset of 16 subjects who performed squat exercises on a Flywheel and rated their perceived exertion after each set. A measuring unit integrated into the Flywheel provided power readings for each repetition. The results show that our method achieves good results in predicting both parameters. Gradient Boosting Regression Trees best predicted perceived exertion with a mean absolute percentage error of 8.08% and a Spearman’s ρ=0.74. Multi-layer Perceptron performed best in predicting power with a mean absolute error of 23.13 Watts and ρ=0.79. Our findings show that our approach delivers promising external and internal load quantifications for fatigue, with great potential to provide external feedback to coaches or athletes.},
  author = {Albert, Justin Amadeus and Arnrich, Bert},
  journal = {Biomedical Signal Processing and Control},
  keywords = {bertarnrich justinalbert},
  pages = 105701,
  title = {A computer vision approach to continuously monitor fatigue during resistance training},
  volume = 89,
  year = 2024
}

2023

Protocol for a Randomized Crossover Trial to Evaluate the Effect of Soft Brace and Rigid Orthosis on Performance and Readiness to Return to Sport Six Months Post-ACL-Reconstruction. Jahnke, Sonja; Cruysen, Caren; Prill, Robert; Kittmann, Fabian; Pflug, Nicola; Albert, Justin Amadeus; de Camargo, Tibor; Arnrich, Bert; Królikowska, Aleksandra; Kołcz, Anna; Reichert, Paweł; Oleksy, Łukasz; Michel, Sven; Kopf, Sebastian; Wagner, Michael; Scheffler, Sven; Becker, Roland in Healthcare (2023). 11(4)

@article{healthcare11040513,
  abstract = {A randomized crossover trial was designed to investigate the influence of muscle activation and strength on functional stability/control of the knee joint, to determine whether bilateral imbalances still occur six months after successful anterior cruciate ligament reconstruction (ACLR), and to analyze whether the use of orthotic devices changes the activity onset of these muscles. Furthermore, conclusions on the feedforward and feedback mechanisms are highlighted. Therefore, twenty-eight patients will take part in a modified Back in Action (BIA) test battery at an average of six months after a primary unilateral ACLR, which used an autologous ipsilateral semitendinosus tendon graft. This includes double-leg and single-leg stability tests, double-leg and single-leg countermovement jumps, double-leg and single-leg drop jumps, a speedy jump test, and a quick feet test. During the tests, gluteus medius and semitendinosus muscle activity are analyzed using surface electromyography (sEMG). Motion analysis is conducted using Microsoft Azure DK and 3D force plates. The tests are performed while wearing knee rigid orthosis, soft brace, and with no aid, in random order. Additionally, the range of hip and knee motion and hip abductor muscle strength under isometric conditions are measured. Furthermore, patient-rated outcomes will be assessed.},
  author = {Jahnke, Sonja and Cruysen, Caren and Prill, Robert and Kittmann, Fabian and Pflug, Nicola and Albert, Justin Amadeus and de Camargo, Tibor and Arnrich, Bert and Królikowska, Aleksandra and Kołcz, Anna and Reichert, Paweł and Oleksy, Łukasz and Michel, Sven and Kopf, Sebastian and Wagner, Michael and Scheffler, Sven and Becker, Roland},
  journal = {Healthcare},
  keywords = {sys:relevantfor:dhc bertarnrich justinalbert},
  number = 4,
  title = {Protocol for a Randomized Crossover Trial to Evaluate the Effect of Soft Brace and Rigid Orthosis on Performance and Readiness to Return to Sport Six Months Post-ACL-Reconstruction},
  volume = 11,
  year = 2023
}

2022

PERSIST: A Multimodal Dataset for the Prediction of Perceived Exertion during Resistance Training. Albert, Justin Amadeus; Herdick, Arne; Brahms, Clemens Markus; Granacher, Urs; Arnrich, Bert in Data (2022). 8(1)

@article{data8010009,
  abstract = {Measuring and adjusting the training load is essential in resistance training, as training overload can increase the risk of injuries. At the same time, too little load does not deliver the desired training effects. Usually, external load is quantified using objective measurements, such as lifted weight distributed across sets and repetitions per exercise. Internal training load is usually assessed using questionnaires or ratings of perceived exertion (RPE). A standard RPE scale is the Borg scale, which ranges from 6 (no exertion) to 20 (the highest exertion ever experienced). Researchers have investigated predicting RPE for different sports using sensor modalities and machine learning methods, such as Support Vector Regression or Random Forests. This paper presents PERSIST, a novel dataset for predicting PERceived exertion during reSIStance Training. We recorded multiple sensor modalities simultaneously, including inertial measurement units (IMU), electrocardiography (ECG), and motion capture (MoCap). The MoCap data has been synchronized to the IMU and ECG data. We also provide heart rate variability (HRV) parameters obtained from the ECG signal. Our dataset contains data from twelve young and healthy male participants with at least one year of resistance training experience. Subjects performed twelve sets of squats on a Flywheel platform with twelve repetitions per set. After each set, subjects reported their current RPE. We chose the squat exercise as it involves the largest muscle group. This paper demonstrates how to access the dataset. We further present an exploratory data analysis and show how researchers can use IMU and ECG data to predict perceived exertion.},
  author = {Albert, Justin Amadeus and Herdick, Arne and Brahms, Clemens Markus and Granacher, Urs and Arnrich, Bert},
  journal = {Data},
  keywords = {arneherdick bertarnrich justinalbert},
  number = 1,
  title = {PERSIST: A Multimodal Dataset for the Prediction of Perceived Exertion during Resistance Training},
  volume = 8,
  year = 2022
}

Unsupervised Activity Recognition Using Trajectory Heatmaps from Inertial Measurement Unit Data. Konak, Orhan; Wegner, Pit; Albert, Justin; Arnrich, Bert (2022). 304–312.

2021

Using Machine Learning to Predict Perceived Exertion During Resistance Training With Wearable Heart Rate and Movement Sensors. Albert, Justin; Herdick, Arne; Brahms, Clemens Markus; Granacher, Urs; Arnrich, Bert (2021).

Data Augmentation of Kinematic Time-Series From Rehabilitation Exercises Using GANs. Albert, Justin; Glöckner, Pawel; Pfitzner, Bjarne; Arnrich, Bert (2021). 1–6.

2020

Will You Be My Quarantine: A Computer Vision and Inertial Sensor Based Home Exercise System. Albert, Justin; Zhou, Lin; Gloeckner, Pawel; Trautmann, Justin; Ihde, Lisa; Eilers, Justus; Kamal, Mohammed; Arnrich, Bert (2020). (Vol. 14)

Evaluation of the Pose Tracking Performance of the Azure Kinect and Kinect v2 for Gait Analysis in Comparison with a Gold Standard: A Pilot Study. Albert, Justin; Owolabi, Victor; Gebel, Arnd; Brahms, Markus Clemens; Granacher, Urs; Arnrich, Bert in MDPI Sensors (2020). 20(18)

@article{albert2020evaluation,
  abstract = {Gait analysis is an important tool for the early detection of neurological diseases and for the assessment of risk of falling in elderly people. The availability of low-cost camera hardware on the market today and recent advances in Machine Learning enable a wide range of clinical and health-related applications, such as patient monitoring or exercise recognition at home. In this study, we evaluated the motion tracking performance of the latest generation of the Microsoft Kinect camera, Azure Kinect, compared to its predecessor Kinect v2 in terms of treadmill walking using the gold standard Vicon multi-camera motion capturing system and the 39 marker Plug-in Gait model. Five young and healthy subjects walked on a treadmill at three different velocities while data were recorded simultaneously with all three camera systems. An easy-to-administer camera calibration method developed here was used to spatially align the 3D skeleton data from both Kinect cameras and the Vicon system. With this calibration, the spatial agreement of joint positions between the two Kinect cameras and the reference system was evaluated. In addition, we compared the accuracy of certain spatio-temporal gait parameters, i.e., step length, step time, step width, and stride time calculated from the Kinect data, with the gold standard system. Our results showed that the improved hardware and the motion tracking algorithm of the Azure Kinect camera led to a significantly higher accuracy of the spatial gait parameters than the predecessor Kinect v2, while no significant differences were found between the temporal parameters. Furthermore, we explain in detail how this experimental setup could be used to continuously monitor the progress during gait rehabilitation in older people.},
  author = {Albert, Justin and Owolabi, Victor and Gebel, Arnd and Brahms, Markus Clemens and Granacher, Urs and Arnrich, Bert},
  journal = {MDPI Sensors},
  keywords = {bertarnrich justinalbert},
  title = {Evaluation of the Pose Tracking Performance of the Azure Kinect and Kinect v2 for Gait Analysis in Comparison with a Gold Standard: A Pilot Study},
  volume = 20,
  number = 18,
  year = 2020
}

2017

Geometric Algebra Computing for Heterogeneous Systems. Hildenbrand, D.; Albert, Justin; Charrier, P.; Steinmetz, C. in Advances in Applied Clifford Algebras (2017). 27 599–620.
