Learning Machine Learning by Design: an experience sharing using Xilinx SoC
T. Hui, Fellow of the Institution of Engineers Singapore (FIES)
OpenHW2017@107

Who are we?
• Architecture & Sustainable Design
• Engineering Product Development
• Engineering Systems & Design
• Information Systems Technology & Design
• Humanities, Arts, & Social Sciences
• Electrical Engineering | Mechanical Engineering | Materials | Design Science

Design project background
• 2D design in Term-8 courses: Digital Integrated Circuits, Electrical Power Systems
• A one-term product design project, 10 weeks in duration
• A highly independent product design project
• A group product design project
• A working prototype
• Budget of SG$600.00; using the minimum possible is part of the evaluation criteria

Design goal formulation (divergence / convergence)
• Customer: faculties, Year-1/2 students · Business: education, SUTD
• Product / Service? → Product: SoC
• Technology? (AI, smart campus) → Technology: ML

Project deadlines: May 22nd 2017 and August 11th 2017

What are our resources?


Where are we? The Artificial Intelligence (AI) landscape:
• Artificial Intelligence: Computer Vision, Pattern Recognition, Cognitive Robotics, Fuzzy Systems, ...
• Machine Learning: Linear Regression, K-Means Clustering, Decision Trees, Reinforcement Learning, ...
• Deep Learning (DNNs): Multi-Layer Neural Networks, Convolutional Neural Networks, Recurrent Neural Networks, ...

[https://www.cbinsights.com/research/artificial-intelligence-top-startups/]

Why learn Machine Learning?

[https://www.forbes.com/sites/moorinsights/2017/03/03/a-machine-learning-landscape-where-amd-intel-nvidia-qualcomm-and-xilinx-ai-engineslive/#709344e9742f]

What ML hardware is available?

[https://www.forbes.com/sites/moorinsights/2017/03/03/a-machine-learning-landscape-where-amd-intel-nvidia-qualcomm-and-xilinx-ai-engineslive/#709344e9742f]


Google Machine Learning Machine (TPU 2017)


Microsoft Machine Learning Machine


Learning Machine Learning by SoC
• Machine Learning
  • Artificial Intelligence; learning (supervised, unsupervised, reinforcement), ...
  • Artificial Neural Network (ANN)
  • Convolution, activation function, perceptron, backpropagation, ANN, Convolutional Neural Network, ...
• ARM microprocessor
  • Architecture, instruction set, memory, multi-core, ...
• System On Chip
  • Hardware / software co-design
• Training and Inference
  • GPU for training (NVidia GPUs can help here)
  • SoC for inference (using Zynq from Xilinx: smart, distributed, standalone)
• Integrated Circuit Design
  • ASIC for a Machine Learning microprocessor: the ultimate goal and the practical implementation of a dedicated ML𝜇P

Example of a dedicated ML𝜇P: "A 2.9 TOPS/W Deep Convolutional Neural Network SoC in FD-SOI 28nm for Intelligent Embedded Systems", ISSCC 2017, STMicroelectronics.

Example of a dedicated ML𝜇P: "A 288 μW Programmable Deep-Learning Processor with 270 KB On-Chip Weight Storage Using Non-Uniform Memory Hierarchy for Mobile Intelligence", ISSCC 2017, CubeWorks.

Design goal formulation (divergence / convergence): a dedicated ML𝜇P
• Cost: < 28 nm, $$$
• Time: 10 weeks

Why SoC?
• System on Board
• System on Chip

Why SoC? SoB (burger) vs SoC (bun)

Design goal formulation (divergence / convergence): video processing, image processing, deep learning, ML, ANN, CNN

Design with Zynq – Image and Video Processing, and Computer Vision
Image processing demands huge parallelism. Full High Definition (HD): 1920 x 1080 = 2,073,600 pixels; 3 color channels per pixel (R, G, B) at 8 bits per channel = 24 bits per pixel; one HD image = 49,766,400 bits.
• Image representation in textual / numeric description; training set: Zynq-PS (NEON)
• Identification of lines, curves, shapes and regions (Hough transform, color identification, thresholding, morphology): Zynq-PS
• Pre-processing (adjustments of color balance, contrast, edge; Sobel filter): Zynq-PL
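To make the PS/PL split concrete, here is a minimal C++ sketch of the kind of pixel-level pre-processing loop (a 3x3 Sobel edge filter) that would be a candidate for the Zynq-PL; the function name, the fixed frame size, and the plain-array interface are illustrative assumptions, not the project's actual SDSoC code.

```cpp
#include <cstdint>
#include <cstdlib>

// Illustrative frame size only; the deck targets full-HD 1920x1080 input.
constexpr int W = 1920;
constexpr int H = 1080;

// Minimal 3x3 Sobel edge filter on an 8-bit greyscale frame.
// In an SDSoC flow a loop nest like this is the candidate for PL acceleration.
// The one-pixel border of the output is left untouched in this sketch.
void sobel_filter(const uint8_t *in, uint8_t *out)
{
    for (int y = 1; y < H - 1; ++y) {
        for (int x = 1; x < W - 1; ++x) {
            int gx = -in[(y-1)*W + (x-1)] + in[(y-1)*W + (x+1)]
                     - 2*in[y*W + (x-1)]  + 2*in[y*W + (x+1)]
                     - in[(y+1)*W + (x-1)] + in[(y+1)*W + (x+1)];
            int gy = -in[(y-1)*W + (x-1)] - 2*in[(y-1)*W + x] - in[(y-1)*W + (x+1)]
                     + in[(y+1)*W + (x-1)] + 2*in[(y+1)*W + x] + in[(y+1)*W + (x+1)];
            int mag = std::abs(gx) + std::abs(gy);   // L1 approximation of the gradient
            out[y*W + x] = static_cast<uint8_t>(mag > 255 ? 255 : mag);
        }
    }
}
```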

What is the platform? EagleGo


Convolutional Neural Network (CNN)

[http://www.nature.com/nature/journal/v521/n7553/fig_tab/nature14539_F1.html]


Artificial Neural Network (ANN): the perceptron
Inputs $x_1, x_2$ with weights $w_1, w_2$ and a bias $b$ (the weight on a constant input of 1) are summed and passed through a sign function:
$$y = \operatorname{sgn}\Big(\sum_j w_j x_j + b\Big)$$
Classification example: an AND perceptron. The decision boundary $x_2 = -x_1 + 1.5$ separates $(1,1)$, where $y = 1$, from $(0,0)$, $(0,1)$ and $(1,0)$, where $y = 0$.
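A minimal C++ sketch of the AND perceptron above; the specific weights w1 = w2 = 1 and bias b = -1.5 are chosen so that the decision boundary is x2 = -x1 + 1.5 as on the slide, and the rest of the scaffolding is illustrative.

```cpp
#include <cstdio>

// y = sgn(w1*x1 + w2*x2 + b), mapped to {0, 1} for classification.
int perceptron(double x1, double x2, double w1, double w2, double b)
{
    double s = w1 * x1 + w2 * x2 + b;
    return s > 0.0 ? 1 : 0;
}

int main()
{
    // Weights chosen so the boundary is x2 = -x1 + 1.5 (AND gate).
    const double w1 = 1.0, w2 = 1.0, b = -1.5;
    int inputs[4][2] = { {0, 0}, {0, 1}, {1, 0}, {1, 1} };
    for (auto &in : inputs)
        std::printf("AND(%d, %d) = %d\n", in[0], in[1],
                    perceptron(in[0], in[1], w1, w2, b));
    return 0;
}
```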

ANN training by backpropagation
Forward propagation: find the outputs. In the 2-2-1 example network, inputs $x_1, x_2$ feed the hidden layer through weights $w_1 \dots w_4$ and biases $b_1, b_2$, giving weighted sums $y_1, y_2$ and hidden activations $x_{h1}, x_{h2}$; these feed the output neuron through weights $w_{h1}, w_{h2}$ and bias $b_h$, giving $y_h$ and the calculated output $z_{hc}$.
Error: $\xi = \tfrac{1}{2}\,(z_{ht} - z_{hc})^2$.
Back propagation: use $\partial\xi / \partial w_{hj}$ to update the weights and biases. The update of a hidden-layer weight depends on the learning rate, the error, the hidden-layer gradient, and the hidden-layer input.

ANN training by backpropagation: the Delta Rule
• Learning from mistakes.
• "Delta": the difference between the targeted and the calculated output, $\text{Error} \propto (\text{target } z_h - \text{calculated } z_h)$.
Delta rule: $\xi = \tfrac{1}{2}\sum_j (z_{ht_j} - z_{hc_j})^2$; for a single output, $\xi = \tfrac{1}{2}(z_{ht} - z_{hc})^2$ and $\dfrac{\partial\xi}{\partial z_{hc}} = -(z_{ht} - z_{hc})$.
Gradient update (gradient descent): $\Delta w_{hj} = w_{hj}^{(i+1)} - w_{hj}^{(i)} = -\alpha\,\dfrac{\partial\xi}{\partial w_{hj}}$, where by the chain rule $\dfrac{\partial\xi}{\partial w_{hj}} = \dfrac{\partial\xi}{\partial z_{hc}} \cdot \dfrac{\partial z_{hc}}{\partial y_h} \cdot \dfrac{\partial y_h}{\partial w_{hj}}$.
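A minimal C++ sketch of one delta-rule update for an output-layer weight, assuming a sigmoid output activation (so that $\partial z_{hc}/\partial y_h = z_{hc}(1 - z_{hc})$); the function and variable names are illustrative, not taken from the project.

```cpp
#include <cmath>

// One gradient-descent (delta-rule) step for an output weight w_hj:
//   dxi/dw_hj = dxi/dz_hc * dz_hc/dy_h * dy_h/dw_hj
// with xi = 0.5*(z_ht - z_hc)^2 and a sigmoid output activation (assumed).
double sigmoid(double v) { return 1.0 / (1.0 + std::exp(-v)); }

double update_weight(double w_hj,   // current weight
                     double x_hj,   // hidden-layer input feeding this weight
                     double y_h,    // weighted sum at the output neuron
                     double z_ht,   // targeted output
                     double alpha)  // learning rate
{
    double z_hc = sigmoid(y_h);                 // calculated output
    double d_xi_d_zhc = -(z_ht - z_hc);         // dxi/dz_hc
    double d_zhc_d_yh = z_hc * (1.0 - z_hc);    // sigmoid gradient
    double d_yh_d_whj = x_hj;                   // dy_h/dw_hj
    double grad = d_xi_d_zhc * d_zhc_d_yh * d_yh_d_whj;
    return w_hj - alpha * grad;                 // w_hj(new) = w_hj - alpha * dxi/dw_hj
}
```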

ANN – Example
Based on the past exam data in the table below, predict the Final pass/fail for a student with Study Hours = 25 and Mid-term Test = 70.

Study Hours | Mid-term Test | Final
35          | 67            | 1 (pass)
12          | 75            | 0 (fail)
16          | 89            | 1 (pass)
45          | 56            | 1 (pass)
10          | 90            | 0 (fail)

The ANN is trained until the error approaches zero; the trained ANN is then used to predict the desired input (25, 70), which is not in the database.

Trained network values for the query (25, 70):
x1 = 0.25, x2 = 0.70, w1 = 4.40, w2 = 22.65, w3 = 1.86, w4 = 3.74, b1 = -2.93, b2 = -6.44, y1 = -0.53, y2 = 1.84, xh1 = 0.37, xh2 = 0.86, wh1 = -4.31, wh2 = 19.92, bh = -5.46, yh = 10.14, zch = 1.00 (calculated output), zth = 1.00 (predicted output), Err = 0.00.

Inference: using the trained model to predict/estimate outcomes from new observations.
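A minimal C++ sketch of the inference step, plugging the trained weights from the table into a forward pass; it assumes the inputs were scaled by 1/100 and that both layers use sigmoid activations, which is consistent with the tabulated xh1, xh2 and zch values, and it reproduces zch ≈ 1.00 ("pass") for the query (25, 70).

```cpp
#include <cmath>
#include <cstdio>

double sigmoid(double v) { return 1.0 / (1.0 + std::exp(-v)); }

int main()
{
    // Trained weights and biases from the table (2-2-1 network).
    const double w1 = 4.40, w2 = 22.65, w3 = 1.86, w4 = 3.74;
    const double b1 = -2.93, b2 = -6.44;
    const double wh1 = -4.31, wh2 = 19.92, bh = -5.46;

    // Query: Study Hours = 25, Mid-term Test = 70, scaled by 1/100 (assumed).
    const double x1 = 0.25, x2 = 0.70;

    // Forward pass (inference).
    double y1 = w1 * x1 + w3 * x2 + b1;        // ~= -0.53
    double y2 = w2 * x1 + w4 * x2 + b2;        // ~=  1.84
    double xh1 = sigmoid(y1);                  // ~=  0.37
    double xh2 = sigmoid(y2);                  // ~=  0.86
    double yh  = wh1 * xh1 + wh2 * xh2 + bh;   // ~= 10.14
    double zch = sigmoid(yh);                  // ~=  1.00 -> predicted "pass"

    std::printf("Predicted output zch = %.2f (%s)\n", zch, zch >= 0.5 ? "pass" : "fail");
    return 0;
}
```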

Concept question: Image, Sharpen filter, Max Pooling, Average Pooling

8x8 input image:
0 0 0 0 0 0 0 0
0 3 0 1 1 3 0 0
0 1 6 7 7 7 3 0
0 3 6 0 1 7 1 0
0 1 7 1 3 6 1 0
0 3 7 7 7 5 3 0
0 1 1 3 3 3 1 0
0 0 0 0 0 0 0 0

Sharpen filter:
 0 -1  0
-1  5 -1
 0 -1  0

Convolution result (6x6):
 14 -10  -3  -6   7  -6
 -7  16  21  19  15   7
  7  14 -15 -12  20  -6
 -8  20 -12   0  14  -5
  6  17  17  17   6   8
  1  -6   4   2   6  -1

2x2 Max Pooling:
16 21 15
20  0 20
17 17  8

2x2 Average Pooling:
3.25  7.75  5.75
8.25 -9.75  5.75
4.50 10.00  4.75

Tasks:
1. Perform convolution on the image using the Sharpen filter.
2. Change one of the numbers in the filter so as to reduce the value 14 at the top-left corner of the result.
3. Perform Average Pooling.
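A minimal C++ sketch of the three steps in the concept question (3x3 convolution with the sharpen filter, then 2x2 max pooling and 2x2 average pooling); the fixed 8x8 input size and the array layout are illustrative. Applied to the image above it reproduces the values shown, e.g. 14 at the top-left of the convolution result.

```cpp
#include <algorithm>

const int N = 8;        // 8x8 input image (zero border included)
const int M = N - 2;    // 6x6 convolution output

// Sharpen filter from the slide.
const int K[3][3] = { {  0, -1,  0 },
                      { -1,  5, -1 },
                      {  0, -1,  0 } };

// Valid 3x3 convolution (written as cross-correlation; the kernel is symmetric).
void convolve(const int in[N][N], int out[M][M])
{
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < M; ++j) {
            int s = 0;
            for (int u = 0; u < 3; ++u)
                for (int v = 0; v < 3; ++v)
                    s += K[u][v] * in[i + u][j + v];
            out[i][j] = s;   // e.g. out[0][0] = 14 for the slide's image
        }
}

// 2x2 pooling with stride 2: max and average of each non-overlapping block.
void pool(const int in[M][M], int maxp[M / 2][M / 2], double avgp[M / 2][M / 2])
{
    for (int i = 0; i < M / 2; ++i)
        for (int j = 0; j < M / 2; ++j) {
            int a = in[2 * i][2 * j],     b = in[2 * i][2 * j + 1];
            int c = in[2 * i + 1][2 * j], d = in[2 * i + 1][2 * j + 1];
            maxp[i][j] = std::max(std::max(a, b), std::max(c, d));
            avgp[i][j] = (a + b + c + d) / 4.0;
        }
}
```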

Concept question (answer): reducing the top-left value

Changing the bottom-right coefficient of the Sharpen filter from 0 to -1 reduces the top-left value of the convolution result from 14 to 8.

Modified filter:
 0 -1  0
-1  5 -1
 0 -1 -1

Convolution result (6x6) on the same 8x8 image:
  8 -17 -10 -13   4  -6
-13  16  20  12  14   7
  0  13 -18 -18  19  -6
-15  13 -19  -5  11  -5
  5  14  14  14   5   8
  1  -6   4   2   6  -1

2x2 Max Pooling:
16 20 14
13 -5 19
14 14  8

2x2 Average Pooling:
-1.50   2.25  4.75
 2.75 -15.00  4.75
 3.50   8.50  4.50

Summary of the background knowledge study
• It is good to know the theoretical background.
• However, within this timeline there is no point in designing your own network (that would be an advanced project).

Design goal formulation (divergence / convergence): Training (Caffe, Tensorflow, ...) → DNN → Inference (?)

Basic setup: SDSoC, all C/C++ programming
Pipeline: CMOS IN → rgb2gray → sharpen → sobel_filter → HDMI OUT
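As a sketch of what the first pipeline stage might look like in plain C/C++ for SDSoC, here is an rgb2gray conversion; the packed 24-bit RGB input format, the fixed full-HD frame size, and the integer luma weights are illustrative assumptions, not the project's actual code.

```cpp
#include <cstdint>

// Illustrative full-HD frame size.
constexpr int WIDTH  = 1920;
constexpr int HEIGHT = 1080;

// rgb2gray: convert a packed 8-bit-per-channel RGB frame to 8-bit greyscale.
// Integer approximation of Y = 0.299 R + 0.587 G + 0.114 B (77 + 150 + 29 = 256).
void rgb2gray(const uint8_t *rgb, uint8_t *gray)
{
    for (int i = 0; i < WIDTH * HEIGHT; ++i) {
        const uint8_t r = rgb[3 * i + 0];
        const uint8_t g = rgb[3 * i + 1];
        const uint8_t b = rgb[3 * i + 2];
        gray[i] = static_cast<uint8_t>((77 * r + 150 * g + 29 * b) >> 8);
    }
}
```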

Example project-1: attendance


Example project-1

Training set preparation – the set is limited. Increase the training set by:
• Rotation (90°, 180°)
• Distortion (singular value decomposition)
• Filters (grey scale)

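A minimal sketch of two of the augmentations listed above (rotation and greyscale), written against OpenCV as an assumed helper library; the deck does not say which library the team used, and the SVD-based distortion step is omitted.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Grow a small training set by adding rotated and greyscale copies of each image.
// OpenCV is an assumed helper library here; the SVD-based distortion is omitted.
std::vector<cv::Mat> augment(const cv::Mat &img)
{
    std::vector<cv::Mat> out;
    out.push_back(img);

    cv::Mat r90, r180, gray;
    cv::rotate(img, r90, cv::ROTATE_90_CLOCKWISE);   // 90 degree rotation
    cv::rotate(img, r180, cv::ROTATE_180);           // 180 degree rotation
    cv::cvtColor(img, gray, cv::COLOR_BGR2GRAY);     // greyscale copy

    out.push_back(r90);
    out.push_back(r180);
    out.push_back(gray);
    return out;
}
```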

Example project-1

Workflow (SDSoC): Video for camera input → White balance filter for input image → Frame converted to picture → Compare frame to network → Matching

Example project-2: marks detection • Binarized Neural Network (BNN) – in Zynq

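The key idea of a BNN is that, with weights and activations binarized to ±1, the multiply-accumulate in each neuron collapses to an XNOR plus a popcount; below is a minimal C++ sketch of that binary dot product, an illustration of the principle rather than the BNN code actually deployed on the Zynq.

```cpp
#include <cstdint>

// Binary dot product of two 64-element {-1, +1} vectors packed as bits
// (bit = 1 encodes +1, bit = 0 encodes -1): this is the XNOR + popcount
// operation that replaces multiply-accumulate in a BNN.
// dot = (#matching bits) - (#differing bits) = 64 - 2 * popcount(a XOR b).
static inline int popcount64(uint64_t x)
{
    int c = 0;
    while (x) { x &= x - 1; ++c; }   // clear the lowest set bit each iteration
    return c;
}

int binary_dot64(uint64_t a, uint64_t b)
{
    return 64 - 2 * popcount64(a ^ b);
}
```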

Learning Machine Learning by Design
End of presentation. Thanks to:
Project-1: Amos Ho | Andrew Sng | Sabareesh Nair | Stanley Loh | Threvin Anand | Yap Pin Yaw
Project-2: Jiong Le | Jien Yi