In this project we will try to learn machine learning by applying it to predict human pose from an RGB image. Human pose estimation is an interesting area of machine learning with a lot of applications, such as:
- Artificial coach for learning new motor skills, sports, etc.
- A gym assistant to guide you with your exercises.
- Artificial physiotherapist for helping you get back from a musculoskeletal problem.
- Gesture interface for a more natural way of interacting with your computer.
- Personal assistant for improving your posture.
- Analysis of nonverbal communication.
- Controlling virtual characters.
These are some applications that excite me, but there are many more.
We will start with a very simple model and gradually add new ideas to make it better. You can find all the code on GitHub.
Model 1 - SimplePose
In this simplest model we will use a simple convolutional neural network of the kind used for image classification, but instead of predicting the class of the object in the image, we will predict the x, y coordinates of the joints.
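To make the idea concrete, here is a minimal PyTorch sketch of a classification-style CNN whose final layer outputs coordinates instead of class scores. It is not the architecture used in this project: the layer sizes, the class name `CoordinateRegressor`, the choice of 14 joints (matching the joint list later in this post), the normalized targets, and the mean squared error loss are all assumptions made for illustration.

```python
import torch
from torch import nn

NUM_JOINTS = 14  # assumption: the 14 joints listed later in this post


class CoordinateRegressor(nn.Module):
    """Small classification-style CNN whose head outputs joint coordinates."""

    def __init__(self, num_joints: int = NUM_JOINTS):
        super().__init__()
        self.num_joints = num_joints
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # A classifier would output one score per class here;
        # instead we output an (x, y) pair per joint.
        self.head = nn.Linear(64, num_joints * 2)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        x = self.features(images).flatten(1)
        return self.head(x).view(-1, self.num_joints, 2)


model = CoordinateRegressor()
images = torch.randn(4, 3, 128, 128)    # dummy batch of RGB images
targets = torch.rand(4, NUM_JOINTS, 2)  # dummy (x, y) targets scaled to [0, 1]
predictions = model(images)
loss = nn.functional.mse_loss(predictions, targets)
```

The only structural difference from an image classifier is the output layer and the loss, which is why this makes a convenient first model.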
Architecture
Code
Results
The model is trained on the extended LSP dataset.
To test the model I used images from the original LSP dataset. Here are some results from the model.
Test 1
Test 2
Test 3
As you can see, the model gets the approximate pose right, but the individual joint locations are not very accurate.
Improvements
One simple improvement is to take the initial points predicted by this model, crop the original image around each of them, and train a second neural network to make a more precise prediction within each crop.
This gives us the approach used by the DeepPose model in this paper
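The cropping step described above could look roughly like the sketch below. The helper names (`crop_box`, `to_crop_coords`, `to_image_coords`) and the fixed square crop size are assumptions for illustration, not code from this project or from the DeepPose paper.

```python
def crop_box(x, y, image_width, image_height, size):
    """Return (left, top, right, bottom) of a size x size box centred near (x, y)."""
    size = min(size, image_width, image_height)
    half = size // 2
    # Shift the box back inside the image instead of shrinking it.
    left = min(max(int(round(x)) - half, 0), image_width - size)
    top = min(max(int(round(y)) - half, 0), image_height - size)
    return left, top, left + size, top + size


def to_crop_coords(x, y, box):
    """Express a full-image point relative to the crop's top-left corner."""
    left, top, _, _ = box
    return x - left, y - top


def to_image_coords(x, y, box):
    """Map a point predicted inside the crop back to full-image coordinates."""
    left, top, _, _ = box
    return x + left, y + top


# A first-stage prediction near the left edge of a 200x150 image.
box = crop_box(10, 100, image_width=200, image_height=150, size=64)
refined = to_image_coords(5.0, 7.0, box)  # a made-up second-stage output
```

The second network then only has to solve a small, local problem, and its output is mapped back to the full image with `to_image_coords`.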
Model 2 - HmapPose
Regressing directly to x, y coordinates is hard. In this model we will try to solve that problem by using a fully convolutional architecture to generate heat maps of where the points are in the image.
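As a rough illustration of the heat map idea, the NumPy sketch below builds a training target for each joint as a 2D Gaussian centred on the joint, and recovers coordinates from heat maps by taking the location of the peak. The Gaussian target, the `sigma` value, and the function names are assumptions for illustration; the post does not say how the targets in this project were generated.

```python
import numpy as np


def gaussian_heatmap(x, y, width, height, sigma=2.0):
    """Target heat map for one joint: a 2D Gaussian peaked at (x, y)."""
    xs = np.arange(width)[None, :]
    ys = np.arange(height)[:, None]
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))


def decode_heatmaps(heatmaps):
    """Turn a (joints, height, width) array into (joints, 2) peak locations as (x, y)."""
    num_joints, height, width = heatmaps.shape
    flat_index = heatmaps.reshape(num_joints, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_index, (height, width))
    return np.stack([xs, ys], axis=1)


joints = [(10, 20), (40, 5)]  # made-up (x, y) joint locations
targets = np.stack(
    [gaussian_heatmap(x, y, width=64, height=48) for x, y in joints]
)
decoded = decode_heatmaps(targets)
```

Because the network now predicts a value per pixel, a small error in the output shifts the peak slightly rather than producing an arbitrary coordinate, which is one reason this tends to be easier to learn than direct regression.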
Architecture
Code
Results
Test 1
Test 2
Heatmap Order:
- Right ankle
- Right knee
- Right hip
- Left hip
- Left knee
- Left ankle
- Right wrist
- Right elbow
- Right shoulder
- Left shoulder
- Left elbow
- Left wrist
- Neck
- Head top
- All Joints
As you can see, this model has difficulty distinguishing symmetrical body parts such as knees, hips, and elbows. It is very hard to tell a left knee from a right knee if you are looking only at a single knee and a small area around it.
Improvements
We can use the same approach as before: use the model's predictions to crop the input image around each point, and use a second neural network to improve upon that prediction.
We can improve further by feeding the second network the lower-level layer activations of the original network around the predicted points. This avoids redundant computation and improves generalization.
We can also provide different rescaled versions of the original image to the network to help it deal with different scales in the dataset.
Making all of these improvements gives us the approach used by this paper
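For the multi-scale input mentioned above, one simple way to produce rescaled copies of an image is sketched below. The scale factors, the nearest-neighbour sampling, and the function names are assumptions chosen to keep the example dependency-light; a real pipeline would more likely use a library resize with proper interpolation.

```python
import numpy as np


def resize_nearest(image, scale):
    """Rescale an (H, W, C) image by `scale` using nearest-neighbour sampling."""
    height, width = image.shape[:2]
    new_h = max(1, int(round(height * scale)))
    new_w = max(1, int(round(width * scale)))
    # For each output pixel, pick the source pixel it maps back to.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), height - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), width - 1)
    return image[rows[:, None], cols[None, :]]


def image_pyramid(image, scales=(1.0, 0.75, 0.5)):
    """Return one rescaled copy of the image per scale factor."""
    return [resize_nearest(image, s) for s in scales]


image = np.zeros((96, 128, 3), dtype=np.uint8)  # dummy RGB image
pyramid = image_pyramid(image)
shapes = [level.shape for level in pyramid]
```

Each level of the pyramid can then be passed through the network, so that a person who appears large in the original image appears at a more typical size in one of the smaller copies.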
Model 3 - Convolutional Pose Machine
This model is designed specifically to solve the problem of pose estimation. It uses a multistage architecture: the first stage predicts an approximate heat map for each joint location, and each subsequent stage looks at a larger context and refines those results.
This model comes from this paper
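The multistage idea can be sketched in PyTorch as follows. This is a heavily simplified illustration, not the network from the paper or from this project: the channel counts, kernel sizes, number of stages, and class names are all assumptions. What it keeps is the structure described above: a first stage that predicts heat maps from image features, and later stages that take the image features together with the previous stage's heat maps and refine them.

```python
import torch
from torch import nn

NUM_JOINTS = 14  # assumption: the 14 joints listed in this post


class Stage(nn.Module):
    """One stage: maps its input channels to one heat map per joint."""

    def __init__(self, in_channels, kernel_size):
        super().__init__()
        padding = kernel_size // 2  # keeps the spatial size unchanged
        self.layers = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size, padding=padding), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size, padding=padding), nn.ReLU(),
            nn.Conv2d(32, NUM_JOINTS, kernel_size=1),
        )

    def forward(self, x):
        return self.layers(x)


class MultiStagePose(nn.Module):
    def __init__(self, num_stages=3, feature_channels=32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feature_channels, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.first = Stage(feature_channels, kernel_size=3)
        # Later stages see image features plus the previous heat maps,
        # and use larger kernels to take in more context.
        self.refiners = nn.ModuleList(
            [Stage(feature_channels + NUM_JOINTS, kernel_size=7)
             for _ in range(num_stages - 1)]
        )

    def forward(self, images):
        features = self.backbone(images)
        heatmaps = [self.first(features)]
        for stage in self.refiners:
            heatmaps.append(stage(torch.cat([features, heatmaps[-1]], dim=1)))
        return heatmaps  # one prediction per stage


model = MultiStagePose()
images = torch.randn(2, 3, 64, 64)            # dummy batch of RGB images
stage_outputs = model(images)
targets = torch.zeros(2, NUM_JOINTS, 32, 32)  # dummy target heat maps
# Apply the loss to every stage's output, not only the last one.
loss = sum(nn.functional.mse_loss(out, targets) for out in stage_outputs)
```

Returning the heat maps from every stage makes it possible to supervise each stage directly and to inspect how the prediction changes from one stage to the next.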
Architecture
Code
Results
Right ankle
Right knee
Right hip
Left hip
Left knee
Left ankle
Right wrist
Right elbow
Right shoulder
Left shoulder
Left elbow
Left wrist
Neck
Head top
If you look closely at the Left Hip example or the Right Wrist example, you will see that the model is able to improve its prediction in the subsequent stages by using a larger context.
Improvements
Many well-known pose estimation models are built on this model. They add the concept of Part Affinity Fields, which are used to predict the relationships between the parts, and with this approach they can also do multi-person pose estimation in real time. The famous OpenPose model is based on this architecture.
Learnings
I learned a lot of new things in this project:
- Increased my understanding of the PyTorch framework.
- Got better at reading machine learning papers and implementing them.
- Learned some tricks to debug and profile neural network models.
- Gained an intuitive understanding of different architectures.
In the future I may revisit this problem, try more complex, current state-of-the-art models, and build a useful application with them.