ATLAS: automatic temporal segmentation and annotation of lecture videos based on modeling transition time
Rajiv Ratn Shah*, Yi Yu*, Suhua Tang#, Roger Zimmermann*, Anwar D. Shaikh*
*School of Computing, National University of Singapore
#School of Informatics and Engineering, The University of Electro-Communications
{rajiv,yuy,rogerz}@comp.nus.edu.sg
ACM Multimedia Grand Challenge, November 5, 2014
VideoLectures.NET Challenge (MediaMixer, transLectures): Temporal segmentation and annotation of lecture videos
VideoLectures.NET Challenge
The number of online lecture videos is increasing rapidly.
However, the contents of lecture videos are still not sufficiently accessible or traceable.
It is therefore desirable to let people navigate to and access specific topics within lecture videos.
ATLAS
ATLAS is our solution to this challenge. It works in two phases:
Transition times prediction
Text annotations determination
Demo
Contributions
ATLAS has two main novelties:
An SVMhmm model is proposed to learn temporal transition cues.
A fusion scheme is proposed to combine transition cues extracted from heterogeneous information in lecture videos.
Text annotations for the resulting temporal segments are determined by assigning the most frequent N-gram token of the subtitle resource tracks (SRT) block under consideration.
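The annotation rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ATLAS implementation: the function name `most_frequent_ngram`, the lowercase regex tokenization, the default `n=2`, and the optional `stopwords` filter are all hypothetical choices; the slide specifies only that the most frequent N-gram of the SRT block is used.

```python
import re
from collections import Counter


def most_frequent_ngram(text, n=2, stopwords=frozenset()):
    """Return the most frequent n-gram in `text` as a space-joined string.

    Hypothetical helper: the tokenization and the optional stopword filter
    are assumptions for illustration, not details taken from ATLAS.
    """
    # Lowercase and keep alphanumeric word tokens (apostrophes allowed),
    # dropping any token listed in `stopwords`.
    tokens = [t for t in re.findall(r"[a-z0-9']+", text.lower()) if t not in stopwords]
    # Slide a window of length n over the token list.
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return None  # block too short to contain any n-gram
    # most_common(1) returns [(ngram, count)]; ties keep first-seen order.
    best, _count = Counter(ngrams).most_common(1)[0]
    return " ".join(best)
```

For a block such as "Support vector machines are linear classifiers. Support vector machines maximize the margin.", the trigram "support vector machines" occurs twice and is returned for `n=3`. Without a stopword filter, frequent function-word pairs such as "of the" can win on longer blocks, which is why the sketch exposes `stopwords` as a parameter.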
System Overview
[Figure: system overview; transition times TT1 and TT2 are combined by fusion]
Temporal transition: SVMhmm + fusion
Text annotations: N-gram token of SRT
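Because the text annotations are drawn from SRT subtitle blocks, one way to obtain the subtitle text for a temporal segment is to parse the SubRip (SRT) file and collect the cues that fall inside the segment. The sketch below assumes the standard SRT layout (a cue index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, one or more text lines, and a blank line between cues). The helper names `parse_srt` and `segment_text`, and the rule of assigning a cue to the segment in which it starts, are hypothetical and not taken from ATLAS.

```python
import re

# Matches an SRT timing line such as "00:01:02,500 --> 00:01:05,000".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def _seconds(h, m, s, ms):
    """Convert SRT timestamp fields (strings) to seconds as a float."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def parse_srt(srt_text):
    """Parse SRT text into a list of (start_sec, end_sec, text) cues."""
    cues = []
    # Cues are separated by blank lines (this also handles \r\n endings).
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        for idx, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                start, end = _seconds(*g[:4]), _seconds(*g[4:])
                # Everything after the timing line is the cue text.
                text = " ".join(ln.strip() for ln in lines[idx + 1:])
                cues.append((start, end, text))
                break
    return cues


def segment_text(cues, seg_start, seg_end):
    """Join the text of cues whose start time lies in [seg_start, seg_end)."""
    return " ".join(text for start, _end, text in cues if seg_start <= start < seg_end)
```

For example, with cues starting at 1.0 s and 5.5 s, `segment_text(cues, 0, 5)` returns only the first cue's text. Assigning each cue by its start time keeps a cue that straddles a transition from being counted in two segments.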
Evaluation
[Figure: timeline aligning predicted transition times (PTTs) with actual transition times (ATTs), with sequences A = (a1, a2, a3, …, ap) and T = (t1, t2, t3, …, tq); td1 … tdn mark the gaps between paired PTTs and ATTs, unpaired predictions are labeled "Extra PTT", and unpaired actual transitions are labeled "Missed ATT"]
PTT = Predicted Transition Time, ATT = Actual Transition Time, td = time difference between ATT and nearest PTT
$$td_i = |PTT_i - ATT_j|$$
$$\mathrm{Precision} = \frac{\sum_{k=1}^{r} \mathrm{Score}(PTT_i, ATT_j)}{N}, \qquad \mathrm{Recall} = \frac{\sum_{k=1}^{r} \mathrm{Score}(PTT_i, ATT_j)}{M}$$
where N = # ATTs, M = # PTTs, and r = # (PTTi, ATTj) pairs
If tdi < 5: Score(PTTi, ATTj) = 1.0
ElseIf tdi < 10: Score(PTTi, ATTj) = 0.8
ElseIf tdi < 15: Score(PTTi, ATTj) = 0.6
ElseIf tdi < 20: Score(PTTi, ATTj) = 0.4
ElseIf tdi < 25: Score(PTTi, ATTj) = 0.2
Else: Score(PTTi, ATTj) = 0.0
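The scoring table and the precision/recall formulas above can be sketched in Python as follows. This is an illustrative sketch with its assumptions labeled: times are taken to be in seconds (the slide gives no unit), each PTT is paired with its nearest ATT so that r equals the number of PTTs, and the names `score`, `nearest`, and `precision_recall` are hypothetical. The denominators follow the slide's definitions: the summed score is divided by N (the number of ATTs) for precision and by M (the number of PTTs) for recall.

```python
from bisect import bisect_left


def score(td):
    """Step score for a time difference td (assumed seconds), per the table above."""
    for threshold, value in ((5, 1.0), (10, 0.8), (15, 0.6), (20, 0.4), (25, 0.2)):
        if td < threshold:
            return value
    return 0.0


def nearest(sorted_times, t):
    """Return the element of a non-empty sorted list that is closest to t."""
    i = bisect_left(sorted_times, t)
    # The closest value is either just before or at the insertion point.
    candidates = sorted_times[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda a: abs(a - t))


def precision_recall(ptts, atts):
    """Precision and recall as defined on the slide.

    Assumption: every PTT is paired with its nearest ATT (so r = M).
    Precision divides by N = #ATTs and recall by M = #PTTs.
    """
    if not ptts or not atts:
        return 0.0, 0.0
    atts_sorted = sorted(atts)
    total = sum(score(abs(p - nearest(atts_sorted, p))) for p in ptts)
    n, m = len(atts), len(ptts)
    return total / n, total / m
```

For instance, with ATTs at 60 s and 300 s and PTTs at 62 s, 308 s, and 450 s, the three pairs score 1.0, 0.8, and 0.0 (the extra PTT at 450 s is 150 s from its nearest ATT), so the summed score is 1.8, giving 1.8 / 2 = 0.9 for precision and 1.8 / 3 = 0.6 for recall under these definitions.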
Results
[Figure: effect of fusion]
Conclusion
ATLAS determines the temporal segmentation of lecture videos by fusing the transition cues computed from the visual contents and from text analysis.
ATLAS determines text annotations corresponding to the detected temporal transitions.
ATLAS improves the accessibility and traceability of lecture video contents.
Acknowledgment
The authors are very grateful to Nimisha Drolia and the anonymous reviewers for constructive suggestions to improve the quality of this work. This research has been supported by the Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative and administered by the IDM Programme Office through the Centre of Social Media Innovations for Communities (COSMIC).
Thank You
System Overview (2)
[Figure: system overview showing the SVMhmm model and the N-gram language model; transition times TT1 and TT2 are combined by fusion]
Temporal transition: SVMhmm + fusion
Text annotations: N-gram token of SRT
System Overview (3)
[Figure: system overview highlighting the SVMhmm model, with transition times TT1 and TT2]
System Overview (4)
[Figure: system overview highlighting the N-gram language model, with transition times TT1 and TT2]