Jointly Optimize Data Augmentation and Network Training: Adversarial Data Augmentation in Human Pose Estimation
Xi Peng*†, Zhiqiang Tang*†, Fei Yang§, Rogerio Feris‡, Dimitris Metaxas†
† Rutgers, The State University of New Jersey   § Facebook   ‡ IBM T.J. Watson Research Center
* Contributed equally. Project page: https://sites.google.com/site/xipengcshomepage/cvpr2018

Highlights
❖ Enhance the training effect without looking for more labeled data.
❖ Plug-and-play for general target networks, e.g. image classification and segmentation.
❖ Joint optimization without stop-and-retraining.

Motivation
Random data augmentation (scaling, rotating, occluding) applied to the training data of a target network has clear limitations:
❖ The same augmentation strategy is used for all data, ignoring individual differences.
❖ Data augmentation cannot follow network training; the two stay isolated.
❖ Ineffective data augmentation leads to ineffective training.

Our approach
An augmentation network acts as an agent that generates "hard" data augmentations (scaling, rotating, occluding) conditioned on each training sample; the target network (here an hourglass pose network) trains on both random and "hard" augmentations, and its loss is fed back to the agent as a reward or penalty.

Main contributions:
❖ Jointly optimize data augmentation and network training: more effective data augmentation leads to more effective training.
❖ Generate more difficult data augmentations than random augmentation.
❖ Evaluate the generation quality and learn from "hard" data augmentations.

(a) Generation path (augmentation network):

    $\max_{\theta_G}\ \mathbb{E}_{x\sim\Omega,\ \tau_h\sim G(x,\theta_G)}\ L\big[D(\tau_h(x), y)\big]$        ("hard" aug.)

(b) Discrimination path (target network):

    $\min_{\theta_D}\ \mathbb{E}_{x\sim\Omega,\ \tau_r,\ \tau_h\sim G(x,\theta_D)}\ \big\{ L\big[D(\tau_r(x), y)\big] + L\big[D(\tau_h(x), y)\big] \big\}$        ("random" + "hard" aug.)

Here x is a training sample with label y drawn from the distribution Ω, τ_r is a "random" augmentation, τ_h is a "hard" augmentation sampled from the augmentation network G, D is the target network, and L is its training loss. A minimal training-loop sketch follows.

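The two paths can be optimized jointly by alternating updates: the target network D descends on the loss over random plus generated "hard" augmentations, while the augmentation network G is rewarded when its samples raise that loss. Below is a minimal PyTorch-style sketch of one such iteration; pose_net, aug_net, apply_aug, and random_aug are placeholder names, and the reward/penalty update is written as a simple policy-gradient step, which may differ from the authors' exact scheme.

import torch

def joint_step(pose_net, aug_net, opt_D, opt_G, x, y, criterion,
               apply_aug, random_aug):
    # Assumptions (illustrative placeholders, not the released code):
    #   aug_net(x) returns a torch.distributions.Distribution over per-sample
    #   augmentation parameters; apply_aug(x, params) applies them to the
    #   images; random_aug(x) is a conventional random augmentation;
    #   criterion is the pose loss (e.g. heatmap MSE).

    # Discrimination path: min over theta_D of L("random" aug.) + L("hard" aug.)
    with torch.no_grad():                      # augmentation network frozen here
        hard_params = aug_net(x).sample()      # tau_h ~ G(x, .)
    loss_D = (criterion(pose_net(random_aug(x)), y) +
              criterion(pose_net(apply_aug(x, hard_params)), y))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Generation path: max over theta_G of L("hard" aug.).
    # Sampling augmentation parameters is non-differentiable, so the agent is
    # updated from a reward/penalty signal: a higher target-network loss on the
    # generated augmentation means a higher reward for the generator.
    dist = aug_net(x)
    hard_params = dist.sample()
    with torch.no_grad():                      # target network frozen here
        reward = criterion(pose_net(apply_aug(x, hard_params)), y)
    loss_G = -(reward * dist.log_prob(hard_params).sum())
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), reward.item()
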
(c) Adversarial data augmentation:
❖ Scaling and rotating (mixed Gaussians): for each training sample the augmentation network predicts a mixture-of-Gaussians distribution over scale (roughly 0.7-1.3) and rotation (roughly -40° to 40°), from which "hard" scaling and rotating augmentations are sampled.
❖ Hierarchical occluding (scaled-up masks): coarse occluding masks (e.g. 4×4) are predicted and scaled up across the feature hierarchy (8×8, 16×16, 32×32, 64×64).
A sketch of both samplers follows this list.

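To make the two samplers above concrete, here is a small sketch assuming the augmentation network outputs mixture-of-Gaussians parameters for (scale, rotation) and a coarse occlusion-logit map that is applied to intermediate feature maps; all shapes and helper names are illustrative assumptions, not the paper's released implementation.

import torch
import torch.nn.functional as F
from torch.distributions import Categorical, Normal

def sample_scale_rotation(logits, means, log_stds):
    """Draw a "hard" (scale, rotation) pair from a predicted mixture of Gaussians.

    logits:   (B, K)    mixture weights per sample
    means:    (B, K, 2) component means for (scale, rotation)
    log_stds: (B, K, 2) component log-stds for (scale, rotation)
    """
    comp = Categorical(logits=logits).sample()              # (B,) chosen component
    idx = comp.view(-1, 1, 1).expand(-1, 1, 2)              # (B, 1, 2)
    mu = means.gather(1, idx).squeeze(1)                    # (B, 2)
    std = log_stds.gather(1, idx).squeeze(1).exp()          # (B, 2)
    scale, rotation = Normal(mu, std).sample().unbind(dim=1)
    return scale, rotation

def occlude_features(feat, mask_logits):
    """Hierarchical occluding: a coarse mask (e.g. 4x4) is scaled up to the
    feature resolution (8x8 ... 64x64) and used to zero out feature regions.

    feat:        (B, C, H, W) feature map inside the target network
    mask_logits: (B, 1, h, w) coarse occlusion logits with h << H
    """
    keep = (torch.sigmoid(mask_logits) < 0.5).float()       # 1 = keep, 0 = occlude
    keep = F.interpolate(keep, size=feat.shape[-2:], mode="nearest")
    return feat * keep

In training, the sampled scale/rotation would parameterize an affine warp of the input, while the scaled-up mask would occlude regions at successive resolutions of the hierarchy.
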
Results
(a) Network architecture: the target (pose) network is a stacked hourglass network built from residual blocks (BN-ReLU-Conv1x1 → BN-ReLU-Conv3x3 → BN-ReLU-Conv1x1). The augmentation network is a half hourglass network built from dense blocks (BN-ReLU-Conv1x1 → BN-ReLU-Conv3x3), with BN-ReLU-Conv1x1 followed by pooling or upsampling as transitions; feature resolutions range from 64×64 down to 4×4. A minimal sketch of the two building blocks follows.

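For reference, a minimal PyTorch rendering of the two building blocks named above; channel counts and growth rates are illustrative assumptions rather than the exact configuration.

import torch
import torch.nn as nn

def bn_relu_conv(in_ch, out_ch, kernel):
    # The BN-ReLU-Conv unit shared by both blocks.
    return nn.Sequential(
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, kernel, padding=kernel // 2),
    )

class ResidualBlock(nn.Module):
    """Hourglass residual block: BN-ReLU-Conv1x1 -> BN-ReLU-Conv3x3 ->
    BN-ReLU-Conv1x1, with an identity skip connection."""
    def __init__(self, channels, bottleneck):
        super().__init__()
        self.body = nn.Sequential(
            bn_relu_conv(channels, bottleneck, 1),
            bn_relu_conv(bottleneck, bottleneck, 3),
            bn_relu_conv(bottleneck, channels, 1),
        )

    def forward(self, x):
        return x + self.body(x)

class DenseBlock(nn.Module):
    """Dense block of the augmentation network: each layer is
    BN-ReLU-Conv1x1 -> BN-ReLU-Conv3x3 and its output is concatenated to the
    running feature map (DenseNet-style growth)."""
    def __init__(self, in_ch, growth, num_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                bn_relu_conv(ch, 4 * growth, 1),
                bn_relu_conv(4 * growth, growth, 3),
            ))
            ch += growth

    def forward(self, x):
        for layer in self.layers:
            x = torch.cat([x, layer(x)], dim=1)
        return x

A BN-ReLU-Conv1x1 transition followed by pooling or upsampling (as listed above) would connect consecutive blocks across resolutions.
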
(b) Visualization of training status: the distributions of augmentations generated by the augmentation network, shown at 10, 100, and 200 training epochs (rotation range roughly -60° to 60°).

(c) Comparison of [email protected] on LSP:
           Chu et al.   HGs (8)   HGs (8) + Ours
Head       98.1         98.2      98.6
Elbow      89.3         91.2      92.8
Wrist      86.9         87.2      90.0
Hip        93.4         93.5      94.8
Knee       94.0         94.5      95.3
Ankle      92.5         92.6      94.5
Mean       92.6         93.0      94.5
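For completeness, a short sketch of how the PCK metric reported above can be computed; the per-image reference length (the normalization used on LSP) is left as an input rather than assumed.

import numpy as np

def pck(pred, gt, ref_len, thr=0.2):
    """Percentage of Correct Keypoints at threshold thr ([email protected] by default).

    pred, gt: (N, J, 2) predicted / ground-truth joint coordinates
    ref_len:  (N,) per-image reference length used for normalization
    Returns per-joint accuracy in percent.
    """
    dist = np.linalg.norm(pred - gt, axis=-1)      # (N, J) joint-wise distances
    correct = dist <= thr * ref_len[:, None]       # broadcast over joints
    return 100.0 * correct.mean(axis=0)            # (J,)
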
References
[HGs] Newell et al. "Stacked hourglass networks for human pose estimation." In ECCV 2016.
[HPG] Wang et al. "A-Fast-RCNN: Hard positive generation via adversary for object detection." In CVPR 2017.