ATLAS: automatic temporal segmentation and annotation of lecture based on modeling transition time

Rajiv Ratn Shah*, Yi Yu*, Suhua Tang#, Roger Zimmermann*, Anwar D. Shaikh*

*School of Computing, National University of Singapore
#School of Informatics and Engineering, The University of Electro-Communications
{rajiv,yuy,rogerz}@comp.nus.edu.sg, [email protected]

ACM Multimedia Grand Challenge, November 5, 2014
VideoLectures.NET Challenge (MediaMixer, transLectures): Temporal segmentation and annotation of Lecture videos


VideoLectures.NET Challenge
• The number of online lecture videos available is increasing rapidly
• There is still insufficient accessibility and traceability of lecture video contents
• It is very desirable to enable people to navigate and access specific topics within lecture videos


ATLAS
• ATLAS is our solution to this challenge
• ATLAS works in two phases
  - Transition times prediction
  - Text annotations determination


Demo


Contributions
• ATLAS has two main novelties
  - An SVMhmm model is proposed to learn temporal transition cues
  - A fusion scheme is suggested to combine transition cues extracted from heterogeneous information in lecture videos
• Text annotations corresponding to these temporal segments are determined by assigning the most frequent N-gram token of the subtitle resource tracks (SRT) block under consideration
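
The annotation step can be illustrated with a minimal sketch. It assumes the SRT block has already been reduced to plain text and that an N-gram is a run of N consecutive lowercase word tokens; the function name `most_frequent_ngram`, the tokenisation rule, and the sample text are hypothetical choices made for illustration, not details taken from ATLAS.

```python
import re
from collections import Counter


def most_frequent_ngram(block_text, n=2):
    """Return the most frequent word n-gram in a block of subtitle text.

    Returns None when the block has fewer than n tokens.
    """
    # Assumption: lowercase word tokens, punctuation ignored.
    tokens = re.findall(r"[a-z0-9']+", block_text.lower())
    ngrams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return None
    # Among equally frequent n-grams, the one seen first is returned.
    return Counter(ngrams).most_common(1)[0][0]


# Hypothetical subtitle text for one temporal segment.
block = ("support vector machines find a separating hyperplane. "
         "support vector machines maximise the margin.")
print(most_frequent_ngram(block, n=2))  # support vector
```

A real pipeline would probably also filter stop words so that frequent function-word pairs do not win; that step is left out here because the slides do not describe it.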


System Overview

[Figure: system overview diagram with labels TT1, TT2, and Fusion]

• Temporal transition: SVMhmm + fusion
• Text annotations: N-gram token of SRT
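
The slides do not spell out the fusion rule, so the following is only one simple possibility, sketched to make the idea of combining two lists of transition times (TT1 and TT2) concrete. It assumes times are in seconds and that candidates falling within a hypothetical `window` of each other describe the same transition; the function name and the averaging rule are illustrative assumptions, not the ATLAS fusion scheme.

```python
def fuse_transitions(tt1, tt2, window=10.0):
    """Merge two lists of candidate transition times.

    Assumption: a candidate within `window` of the running mean of the
    previous cluster is treated as the same transition and averaged in.
    """
    clusters = []  # each cluster is [sum_of_times, count]
    for t in sorted(list(tt1) + list(tt2)):
        if clusters and t - clusters[-1][0] / clusters[-1][1] <= window:
            clusters[-1][0] += t
            clusters[-1][1] += 1
        else:
            clusters.append([t, 1])
    return [total / count for total, count in clusters]


# Hypothetical outputs of two transition detectors, in seconds.
tt1 = [60, 300, 610]
tt2 = [64, 450, 606]
print(fuse_transitions(tt1, tt2))  # [62.0, 300.0, 450.0, 608.0]
```

Averaging treats both sources as equally reliable; a weighted variant would be the natural change if one cue were known to be more accurate.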


Evaluation

[Figure: timeline matching predicted transition times (PTT1 … PTTN) to actual transition times (ATT1 … ATTM), showing time differences td1 … tdn, extra PTTs, and a missed ATT]

A = (a1, a2, a3, …, ap)
T = (t1, t2, t3, …, tq)

PTT = Predicted Transition Time, ATT = Actual Transition Time, td = time difference between ATT and nearest PTT

tdi = | PTTi - ATTj |    // tdi is the time difference between PTTi and the nearest ATTj

If tdi < 5         Score(PTTi, ATTj) = 1.0
ElseIf tdi < 10    Score(PTTi, ATTj) = 0.8
ElseIf tdi < 15    Score(PTTi, ATTj) = 0.6
ElseIf tdi < 20    Score(PTTi, ATTj) = 0.4
ElseIf tdi < 25    Score(PTTi, ATTj) = 0.2
Else               Score(PTTi, ATTj) = 0.0

\[ \mathrm{Precision} = \frac{\sum_{k=1}^{r} \mathrm{Score}(PTT_i, ATT_j)}{N}, \qquad \mathrm{Recall} = \frac{\sum_{k=1}^{r} \mathrm{Score}(PTT_i, ATT_j)}{M} \]

where N = # ATTs, M = # PTTs and r = # (PTTi, ATTj) pairs
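
The tiered scoring can be written out as a short sketch. It assumes each PTT is paired with its nearest ATT (the slide does not say whether pairs must be one-to-one) and uses the thresholds as given, in whatever time unit the transition times use. The function names and sample times are hypothetical, and the two normalised values simply divide the summed score by N and M as defined in the slide's legend.

```python
def tier_score(td):
    """Map a time difference to a score using the tiers from the slide."""
    for limit, score in ((5, 1.0), (10, 0.8), (15, 0.6), (20, 0.4), (25, 0.2)):
        if td < limit:
            return score
    return 0.0


def evaluate(ptts, atts):
    """Return (total_score, total / N, total / M).

    N = number of ATTs and M = number of PTTs, following the slide legend.
    """
    if not ptts or not atts:
        return 0.0, 0.0, 0.0
    total = 0.0
    for ptt in ptts:
        # Assumption: pair each PTT with its nearest ATT.
        td = min(abs(ptt - att) for att in atts)
        total += tier_score(td)
    n, m = len(atts), len(ptts)
    return total, total / n, total / m


# Hypothetical predicted and actual transition times.
ptts = [12, 100, 205, 400]
atts = [10, 108, 230, 300, 390]
total, score_over_n, score_over_m = evaluate(ptts, atts)
print(round(total, 2), round(score_over_n, 2), round(score_over_m, 2))
```

With these sample values the four time differences are 2, 8, 25, and 10, so the individual scores are 1.0, 0.8, 0.0, and 0.6. Note that a difference of exactly 25 scores 0.0 because every tier uses a strict inequality.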


Results

Effect of fusion

Conclusion
• ATLAS determines the temporal segmentation by fusing the transition cues computed from the visual content and the text analysis
• ATLAS annotates the text corresponding to the determined temporal transitions
• ATLAS improves accessibility and traceability within lecture video content


Acknowledgment
• The authors are very grateful to Nimisha Drolia and the anonymous reviewers for constructive suggestions to improve the quality of this work.
• This research has been supported by the Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative and administered by the IDM Programme Office through the Centre of Social Media Innovations for Communities (COSMIC).


Thank You


Evaluation (1)

[Figure: timeline matching predicted transition times (PTT1 … PTTN) to actual transition times (ATT1 … ATTM), showing time differences td1 … tdn, extra PTTs, and a missed ATT]

A = (a1, a2, a3, …, ap)
T = (t1, t2, t3, …, tq)

where PTT = Predicted Transition Time, ATT = Actual Transition Time, td = time difference between ATT and nearest PTT


Evaluation (1)

[Figure: timeline matching predicted transition times (PTT1 … PTTN) to actual transition times (ATT1 … ATTM), showing time differences td1 … tdn, extra PTTs, and a missed ATT]

A = (a1, a2, a3, …, ap)
T = (t1, t2, t3, …, tq)

tdi = | PTTi - ATTj |    // tdi is the time diff between PTTi and the nearest ATTj

If tdi < 5         Score(PTTi, ATTj) = 1.0
ElseIf tdi < 10    Score(PTTi, ATTj) = 0.8
ElseIf tdi < 15    Score(PTTi, ATTj) = 0.6
ElseIf tdi < 20    Score(PTTi, ATTj) = 0.4
ElseIf tdi < 25    Score(PTTi, ATTj) = 0.2
Else               Score(PTTi, ATTj) = 0.0

\[ \mathrm{Precision} = \frac{\sum_{k=1}^{r} \mathrm{Score}(PTT_i, ATT_j)}{N}, \qquad \mathrm{Recall} = \frac{\sum_{k=1}^{r} \mathrm{Score}(PTT_i, ATT_j)}{M} \]

where N = # ATTs, M = # PTTs and r = # (PTTi, ATTj) pairs


System Overview (2)

[Figure: system overview diagram highlighting the N-gram language model and the SVMhmm model, with labels TT1, TT2, and Fusion]

• Temporal transition: SVMhmm + fusion
• Text annotations: N-gram token of SRT


System Overview (3)

[Figure: SVMhmm model, with labels TT1 and TT2]
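
As a rough illustration of how a lecture could be presented to a sequence tagger, the sketch below writes labelled sequences in the line layout used by the SVM^hmm tool (`TAG qid:EXNUM FEATNUM:FEATVAL ...`), assuming that is the SVMhmm the slide refers to. The tag meanings (1 = no transition, 2 = transition), the per-window features, and the function name are hypothetical; the slides do not list the actual features.

```python
def to_svmhmm_lines(sequences):
    """Format labelled sequences as SVM^hmm-style input lines.

    `sequences` is a list of sequences; each sequence is a list of
    (tag, features) pairs, where tag is a positive integer and
    features maps 1-based feature numbers to values.
    """
    lines = []
    for qid, seq in enumerate(sequences, start=1):
        for tag, feats in seq:
            # Feature numbers are written in increasing order.
            feat_str = " ".join(f"{k}:{v}" for k, v in sorted(feats.items()))
            lines.append(f"{tag} qid:{qid} {feat_str}")
    return lines


# Hypothetical lecture: two time windows with two made-up cue features.
lecture = [(1, {1: 0.2, 2: 0.0}), (2, {2: 1.0, 1: 0.9})]
for line in to_svmhmm_lines([lecture]):
    print(line)
```

Each lecture becomes one sequence (one `qid`), so the tagger can learn dependencies between the labels of neighbouring windows rather than classifying each window in isolation.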


System Overview (4)

[Figure: N-gram language model, with labels TT1 and TT2]
