[Cover figure: structure of the artefact simulation framework. An Application and Visualization layer (framework initialization via a proposed GUI or the provided Excel file, testsequence database, artefact database with artefacts activated over flags) communicates with a Low Level Processing backend through a TXT batch file (header: version information and date; body: one line per sequence listing Sequencename and Artefact(parameters) entries). The backend imports each sequence, validates the input parameters, renders artefact(j) framewise on sequence(i), and exports the impaired sequence. Artefacts are grouped by stage: capturing (rescaling, blurring, barrel distortion, pincushion distortion, vignetting, interlacing, cardboard effect by depth quantization, vertical disparity, keystone distortion, chromatic aberration, motion blurring, temporal mismatch, noise, white and colour disbalance), coding (blocking, edge blocking, colour bleeding, mosaic pattern, staircase effect, asymmetric coding, depth smoothing, depth bleeding by misalignment or by harsh quantization), representation (2D+Z to multiview conversion), transmission (multiview to 2D+Z conversion, DVB-H packet loss), and visualisation (barrel distortion, pincushion distortion, vertical banding, cross distortion, temporal mismatch).]
MOBILE3DTV Project No. 216503
D5.2
Software for simulation of artefacts and database of impaired videos
Atanas Boev, Danilo Hollosi, Atanas Gotchev
Abstract: In this report we present a framework for simulating stereoscopic artefacts. Within the framework, an arbitrary combination of artefacts can be introduced to a video, with a controlled amount of impairment for each artefact. We utilize our taxonomy of artefacts built earlier, which follows the typical flow of a mobile 3D video over a DVB-H channel and summarises the artefacts into the following groups: capture artefacts, coding artefacts, conversion artefacts, transmission artefacts and visualization artefacts. We introduce an artefact-simulation channel, in which the artefacts are introduced in an order natural for the mobile 3D video dataflow – first capture artefacts are added, then coding ones, and so forth. For each artefact included in the framework, we explain the algorithm we use for its simulation. Our framework is modular and could be extended to simulate additional types of artefacts. Any particular modelling block can also be modified or replaced. The framework has been implemented under MATLAB. We provide a quick user guide of the implementation, covering the GUI, installation, basic usage and batch operations over a library of stereoscopic videos.
Keywords: 3DTV, mobile video, stereo-video, artefacts, artefact simulation, quality estimation
Executive Summary

As mobile 3D video is meant for human observers, the most appropriate measure is quality as perceived by them. A necessary step in designing a perceptual quality metric for 3D video is to perform subjective tests where human observers grade the perceptual quality of a variety of content. A collection of videos affected by the gradual change of a certain parameter is needed in order to assess its perceptual impact. In this report, we present a system which allows introducing a set of stereoscopic artefacts to a given 3D video, thus ensuring repeatability of subjective experiments. The system simulates the artefacts which we expect to be the most common for the case of capturing, encoding and transmission of a 3D video, and can be used for generation of a database of impaired 3D videos.

Not all stereoscopic artefacts are likely to affect a mobile 3DTV system. We identify the most common artefacts to be expected during the data-flow of mobile 3DTV content. Based on our previous research, artefacts are organized in groups which follow the natural flow of a mobile 3D video over a DVB-H channel. These are capture artefacts, coding artefacts, transmission artefacts, format conversion artefacts, and visualisation artefacts. An arbitrary combination of impairments can be selected, but they are always introduced in a certain order.

Our framework simulates the following capture artefacts: content resizing (causing aliasing and improper disparity), blur, motion blur, barrel/pincushion distortion, keystone distortion, temporal mismatch, colour mismatch, and cardboard effect. In the coding stage, the following artefacts are simulated: blocking artefacts (as caused by harsh quantization), block-edge discontinuities, colour bleeding, staircase effect, cross distortion, depth bleeding and depth smoothing. We provide a set of algorithms for converting between dense depth video and stereoscopic video formats, thus simulating the disocclusion and temporal inconsistency artefacts which come from format conversion. We simulate transmission errors by obtaining error patterns of the DVB-H channel and using them for simulation of channel losses. Finally, we simulate artefacts caused during reception and visualisation of the content on an autostereoscopic display: temporal mismatch, content resizing, vertical banding and cross-talk.

We introduce a modular framework which is able to introduce these artefacts to video content in various 3D formats. The framework can easily be extended with additional types of artefacts, and existing simulation blocks can be improved or replaced. We present the general concept behind the framework and one specific implementation done in MATLAB. Our implementation includes algorithms for simulation of all artefacts listed above, as well as a graphical user interface for setting the parameters of each simulation run. We also include the option to execute similar simulation runs on a large number of test videos in "batch mode". Finally, we include a quick user guide for the framework, covering installation, the graphical interface, basic operations and batch execution. We also provide a table of all functions created for the framework, along with the list of parameters required by each function.
Table of Contents

Executive Summary
Table of Contents
1 Introduction
2 Artefact selection
   2.1 Sources of stereoscopic artefacts
   2.2 Artefact groups
   2.3 Capture artefacts
   2.4 Coding artefacts
   2.5 Conversion artefacts
   2.6 Transmission artefacts
   2.7 Visualization artefacts
3 Simulation of artefacts
   3.1 Capturing Stage
      3.1.1 Artefacts caused by image resizing
      3.1.2 Blur
      3.1.3 Barrel distortion
      3.1.4 Pincushion Distortion
      3.1.5 Vignetting
      3.1.6 Chromatic aberration
      3.1.7 Keystone Distortion and Vertical Disparity
      3.1.8 Cardboard Effect by Depth Level Quantization
      3.1.9 Interlacing
      3.1.10 Motion Blurring
      3.1.11 Temporal Mismatch
      3.1.12 Noise
      3.1.13 White and Colour Disbalance
   3.2 Coding Stage
      3.2.1 Blocking by harsh quantization
      3.2.2 Block-edge discontinuities (basis image effect)
      3.2.3 Colour bleeding
      3.2.4 Staircase Effect
      3.2.5 Mosaic pattern
      3.2.6 Depth Bleeding by Misalignment
      3.2.7 Depth Bleeding by Harsh Quantization
      3.2.8 Depth Smoothing
      3.2.9 Asymmetric coding
   3.3 Representation Stage
      3.3.1 Depth Estimation and 2D+Z to Stereo Conversion
   3.4 Transmission Stage
      3.4.1 DVB-H channel loss
   3.5 Visualization Stage
      3.5.1 Vertical banding
      3.5.2 Cross Distortion
4 Implementation
   4.1 Structure of the framework
   4.2 List of functions
   4.3 Graphical User Interface
   4.4 Communication protocol and batch operation
   4.5 Extending the framework
5 User's manual
   5.1 Installation
   5.2 Getting started
Appendix I – list of functions for artefact simulation
References
1 Introduction

Mobile 3DTV broadcasting is no longer just a concept. Transmission of 3D video over the air and reception on a mobile device equipped with an auto-stereoscopic display has been demonstrated by teams from Korea and Europe. At the NEM Summit 2008 (October 2008, Saint-Malo, France) and at the ICT Event (November 2008, Lyon, France), our project demonstrated mobile 3DTV broadcasting over a DVB-H channel using a backward-compatible stream. Once the initial end-to-end prototype is working, practical questions arise: "What bit-rate should be used?", "How many packet losses can we afford?" For answering optimization questions like these, the ability to assess the quality of a stereoscopic video is essential.

It has been shown that for 2D video, statistical quality measures do not provide adequate results [1]. Perceptual quality assessment for stereoscopic video is an even more demanding task, as low quality 3D video produces not only visually unpleasant results, but also eye strain and general discomfort [2], [3], [4]. Since mobile 3D video is meant for human observers, it is most appropriate to measure the quality as it is perceived by the users. A good perceptual metric should have the following three properties: a) perceptual – being related to the way the human visual system (HVS) operates, b) objective – providing a numerical representation of the quality as perceived by the user, and c) reliable – being able to predict the perceptual quality of a wide variety of content, as perceived by a large number of users. A necessary step of perceptual quality metric design is to perform subjective tests in which human observers grade the perceptual quality of a variety of content. For assessing the impact of some parameter – e.g. bit-rate, channel losses, etc. – one needs a collection of videos affected by the gradual change of its value.

In a previous deliverable of our project [6], we identified the artefacts which arise in various usage scenarios involving stereoscopic content. In this work, we present a system which allows introducing a set of stereoscopic artefacts to a given 3D video, thus ensuring repeatability of subjective experiments. The system simulates the artefacts which we expect to be the most common for the case of capturing, encoding and transmission of a 3D video, and can be used for generation of a database of impaired 3D videos.

The next chapter describes the list of artefacts selected for the framework, the nature of each artefact and the basis for selecting it for simulation. Chapter 3 explains the way different artefacts are simulated. In chapter 4, we give an overview of the implementation and present the structure and functionality of the framework, as well as the syntax and parameters of each function included in it. Chapter 5 is a "user's manual", which contains installation and usage instructions for our particular MATLAB implementation of the framework.
2 Artefact selection

2.1 Sources of stereoscopic artefacts

In a previous report of our project, we identified the stages in the process of 3D video broadcast which might be a source of artefacts [6]. These stages are presented in Figure 2.1, and the artefacts corresponding to each stage are as follows:

Creation/capture – special care should be taken when positioning cameras or when selecting rendering parameters. Unnatural correspondences between the images in a stereo-pair (e.g. vertical disparity) are a source of many types of artefacts [2]. As a perfectly parallel camera setup is practically impossible, rectification is an unavoidable preprocessing stage.

Representation format – different representations of stereo-video exist, multichannel video and dense depth representations being among the most widely used [2]. If the representation format is different from the one in which the scene was originally captured, converting between the formats is a source of artefacts. Furthermore, some classes of artefacts are common in one format and not possible in another – for example, in dense depth video disocclusion artefacts are common, while vertical parallax does not occur.

Coding – there are various coding schemes which utilize temporal, spatial or inter-channel similarities of a 3D video [8]. In order to minimize transmission cost, "redundant" information is omitted. Algorithms originally meant for single-channel video are often improperly applied to stereo-video, and important binocular depth cues may be lost in the process.

Transmission – in the case of digital wireless transmission, a common problem is burst packet losses [9]. Resilience and error concealment algorithms attempt to mitigate the impact on the video, but if not designed for stereo-video, such algorithms might introduce additional artefacts of their own.

Visualisation – there are various approaches for 3D scene visualization, which offer different degrees of scene approximation [10], [11], [12]. Each family of 3D displays has its own characteristic artefacts, and the artefacts are often scene dependent [13].
Figure 2.1 Stages of the 3D video broadcast process: 3D scene → Capture → Conversion → Coding → Transmission → Display → Observer
The human visual system is a set of separate subsystems which operate together in a single process. It is known that spatial, colour and motion information is transmitted to the brain using largely independent neural paths [14]. Vision in 3D, in turn, also consists of different "layers" which provide separate information about the depth of the observed scene [14], [15]. Experiments with so-called "random dot stereograms" show that binocular and monocular depth cues are independently perceived [23]. This has led to our assumption that "2D" (monoscopic) and "3D" (stereoscopic) artefacts would be independently perceived [17]. However, due to the "layered" structure of the HVS, binocular artefacts might be inherited from other visual "layers" – for example, blockiness is a "purely" monoscopic artefact, which can still destroy or modify an important binocular depth cue. In the preceding report, we discussed which stereoscopic artefacts might be created during various stages of mobile 3DTV content delivery, and how they might affect different "layers" of
human 3D vision [6]. Some of these artefacts are not likely to occur in mobile 3DTV content distribution. In the next chapter we focus on the artefacts which affect a mobile 3DTV system featuring H.264 AVC encoding, a DVB-H transmission channel and a portable autostereoscopic display. Additionally, the occurrence of some artefacts depends on the selected 3D video representation – multi-channel video or dense depth representation.
2.2 Artefact groups

Not all stereoscopic artefacts are likely to affect a mobile 3DTV system. Some fall beyond the scope of our project, for example contrast range and colour representation problems of the display, as these are addressed by the display manufacturer. In this chapter we identify the most common artefacts to be expected during the data-flow of mobile 3DTV content. We introduce an artefact simulation channel, which is able to introduce an arbitrary combination of artefacts to a video, with a controlled amount of impairment for each artefact. Based on the research from our previous report, artefacts are organized in groups which follow the natural flow of a mobile 3D video over a DVB-H channel [6]. Each group of artefacts corresponds to a specific block of our simulation channel, as shown in Figure 2.2. An arbitrary combination of artefacts can be selected, but they are always introduced in a certain order – i.e. capture artefacts will always be added before transmission ones.

The first block simulates artefacts caused by sensor limitations. Then, the degraded scene observation is sent to a block which simulates geometric distortions such as the ones caused by the camera optics. The next two blocks add global spatial and temporal differences between the video channels, simulating artefacts caused by multi-camera topology and temporal misalignment. The next two blocks simulate spatial and temporal artefacts caused by coding. Then, transmission losses are simulated in the encoded stream. For the case of dense depth video representation, format conversion artefacts are added. Finally, visualisation artefacts are added, either independent of the position of the observer or, alternatively, for a given observation position.
Figure 2.2 Artefact simulation channel: capture per camera (sensor, optical calibration), capture inter-channel (inter-camera calibration, temporal calibration), coding (image filter, temporal filter), transmission (channel simulation), format conversion and visualisation (format conversion, visualization (static), visualization (dynamic), position of observer)
The structure of the simulation framework is explained in further detail in chapter 4.1.
2.3 Capture artefacts

The capturing process for mobile 3DTV video is similar to the one for a 3DTV system targeting large displays. One thing which separates a video broadcast system from a video conferencing one is that capture for the former is done off-line and in non-real-time, so significant processing power can be spent on producing the best output possible. We have chosen for simulation the following list of common stereo video capture artefacts:

Size and resolution changes – the problem of choosing the proper resolution for capturing 3D content is not necessarily a simple one. Two problems might arise from content resizing – aliasing and a wrong disparity range. The perceptual impact of aliasing on stereoscopic video is yet to be studied – whether it is going to be masked by binocular suppression, or is going to destroy important texture-based binocular cues. Additionally,
changing the size of a multiview 3D video changes the inter-channel relations as well, which might result in a disparity either too small or too large for a proper 3D effect.

Blur might be caused by low-quality optics or a wrong focal setting. In a 2D movie, a small amount of blur is permissible in most cases. In a binocular setup, predicting the perception of different amounts of blur is a more complex task. Depending on the case, blur in one channel might go unnoticed, or in rare cases even improve the perceived quality.

Motion blur – this is usually caused by capturing in low light conditions. The temporal masking and perception of motion blur in stereo video is yet to be studied.

Barrel/pincushion distortion is a geometrical distortion which affects each camera separately. In a multi-camera setup it can cause serious artefacts in the stereoscopic image and induce eye-strain. Usually this is partially corrected by a process known as rectification.

Keystone distortion affects the geometric relation between the two channels. The result is a trapezoidal shape in opposite directions in the left and right camera inputs. It is mainly caused by the camera optics and the selected multi-camera topology. The presence of keystone distortion can induce eye-strain or fully break the 3D effect of a stereo video. It will also greatly diminish the precision of dense depth estimation algorithms. Image rectification compensates for this effect.

Temporal mismatch occurs when a 3D scene is shot with multiple cameras which are not shutter-synchronized. As a result, the frames in the two channels are not shot simultaneously, but slightly shifted in time. While precise time synchronization is of crucial importance for dense depth estimation algorithms, the human visual system can tolerate some amount of time mismatch without diminished perceptual quality.

Colour mismatch – some factors (e.g. bright objects with large disparity between cameras) can cause a mismatch between the colours in the images of a scene captured by different cameras. It is most commonly caused by white balance done separately in each camera.

Interlace – interlaced video is created by scanning the odd and the even lines of an image sensor separately. Interlaced video exhibits specific "jagged-border" artefacts. In 2D video, interlacing overlaps consecutive frames in time. As one of the methods for encoding stereo-video involves the use of odd and even fields, interlacing might also interleave simultaneous frames from different channels.

Cardboard effect refers to unnatural flattening of objects in stereoscopic images, as if they were cardboard cut-outs [3]. It is believed that the main reason is the field of view of a stereoscopic display being different from the field of view of the scene, thus creating inappropriate depth scaling [18]. In our framework we simulate the cardboard effect only on video streams with dense depth. However, our framework could be extended to simulate the cardboard effect in other video formats.

Additionally, we simulate less common capture artefacts such as noise, vignetting and chromatic aberration. Proper simulation of camera noise is a very demanding task, but usually it is dealt with separately in each camera. We included noise simulation for the ability to prepare subjective test material where an asymmetric amount of noise is present in each channel. The basic noise-introducing algorithms that we use can easily be replaced or extended with more sophisticated ones, as explained in chapter 3.1.12.
2.4 Coding artefacts

While the visibility of coding artefacts is quite well studied for the 2D case, their impact on 3D vision is yet to be determined. Transform-caused artefacts come from the transforms and quantisation used for compressing the video stream. Blocking, mosaic patterns, staircase effect, ringing, and colour bleeding artefacts are in this group. All of them are well visible, and as they overlay structural changes on the image, they might destroy depth cues and even create
misleading ones. Depth bleeding and depth ringing are artefacts specific to coding the depth map of a scene, and as such, they exist only in dense depth-based 3D video representations. Notably, such artefacts can be mitigated by using structural information of the 2D scene. Temporal coding artefacts appear as a result of transform/quantisation over time. Temporal inconsistency such as mosquito noise is the most common artefact in this group. Artefacts caused by imprecise motion prediction are also possible. This group of artefacts can appear both in multi-view and in dense depth 3D video. Our framework simulates the following coding artefacts:

Blocking by harsh quantization is among the most widely studied distortions in video coding. The most common source of the artefact is block-based DCT compression, which involves quantisation and entropy coding of the results. This process creates a number of image impairments, the most noticeable of which are discontinuities at the boundaries of the encoded blocks. In our framework, we simulate blocking by harsh quantisation by utilizing the block-based DCT compression used in JPEG. Additionally, some authors propose that blocking might be considered as several visually separate artefacts – block-edge discontinuities, colour bleeding, blur and staircase artefacts [cite h.wu chapter3]. Our framework also provides means to simulate these artefacts separately, if needed.

Block-edge discontinuities – block-based coding tries to exploit the spatial correlation between pixels in a picture, but does not take into account the possible correlation beyond the block borders. One important property of such distortion is that the mean intensity of the block remains the same as before. In our framework, we provide an option to simulate block-edge discontinuities separately from block-based DCT artefacts.

Colour bleeding is an artefact caused by harsh quantisation of high-frequency chrominance coefficients. Since chrominance is typically sub-sampled, bleeding can occur beyond the range of one block.

Staircase effect affects diagonal edges of a picture. The quantisation of DCT coefficients causes diagonal lines which are almost horizontal or almost vertical to be represented by a series of blocks containing horizontal or vertical lines, respectively.

Cross-distortion is an artefact caused by asymmetrical stereo-video coding. The asymmetry might be either in the spatial (one channel with lower resolution) or the temporal (one channel having a lower frame-rate) domain. The effect of spatial or temporal sub-sampling of one channel is not yet thoroughly studied. Asymmetrical coding is applied for multi-view video only.

Additionally, our framework simulates less common coding artefacts which affect videos in image-plus-depth format – depth bleeding and depth smoothing. Depth bleeding is caused by a process similar to the one which causes colour bleeding, with the difference that it degrades the depth channel instead of the chrominance. Depth smoothing could be caused by asymmetric compression or resolution of the depth channel. In some cases, depth smoothing might improve the quality of an image-plus-depth video, as it will hide some disocclusion artefacts.
2.5 Conversion artefacts

Format conversion artefacts occur during the conversion from the dense-depth representation used for broadcast to the multiview one needed by the display. Most common here are disocclusion artefacts, which are more pronounced when rendering observations at angles much different from the central observation point, and less pronounced when layered depth images are used [47], [53]. Perspective-stereopsis rivalry occurs if the conversion over-exaggerates the depth levels in the depth map. Temporal inconsistency of the depth estimation creates artefacts similar to mosquito noise and depth ringing. It is quite difficult to simulate conversion artefacts separately from the actual
process of conversion. Our framework allows various types of conversion algorithms and quality settings to be used for introducing conversion artefacts.
2.6 Transmission artefacts

The presence of artefacts generated in the transmission stage depends very much on the coding algorithms used and on how the decoder copes with channel errors. In DVB-H transmission the most common are burst errors [19], which result in packet losses distributed in tight groups. In MPEG-4 based encoders, packet losses might result in propagating or non-propagating errors, depending on where the error occurs with respect to the I frames, and on the ratio between I and P frames. We simulate transmission errors by obtaining error patterns of the DVB-H channel and using them for simulation of channel losses [19], [20], [21].
2.7 Visualization artefacts

Artefacts in the visualisation of mobile 3DTV are caused by limitations of the display technology used. The mobile 3DTV system in our project will utilize an autostereoscopic display. As such displays use spatial multiplexing of the channels, the visibility of all artefacts depends on the position of the observer. In any case, knowing the observation angle and the distance between the observer and the display helps in both simulation and mitigation of such artefacts. Some visualization artefacts are perceived while changing position with respect to the display. Such artefacts are angle-dependent colour representation, pseudostereoscopy, the picket fence effect, and the unnatural image parallax causing shear distortion. Others appear only for some observation angles, such as image flipping and angle-dependent colour representation. The artefacts in this group are very difficult to simulate, but much easier to mitigate for a given position of the observer. In our framework, we choose to simulate only artefacts which are visible to a static observer:

Vertical banding can be regarded as the "static" version of the picket fence effect. It is very common for displays with a parallax barrier, and manifests itself as changes of intensity across the display – as if dark vertical bands were superimposed on the image. Even though it depends on the viewing angle, it is visible from most viewing angle/observation distance combinations, except for a few observation "sweet spots".

Temporal mismatch is a temporal misalignment between the video channels. While during capture such misalignment is usually very small, during visualization it can be up to several seconds. Typical causes are reception problems and rudimentary error concealment.

Rescaling – it is possible that a stereo-video stream needs to be rescaled on the receiving device. Rescaling can create the same problems as doing so during capture – aliasing and improper disparity. Additionally, rescaling during visualisation might affect (exaggerate or suppress) other artefacts.

Cross-talk – display imperfections can cause cross-talk and other forms of inter-channel distortion. Stereo- and multiview displays using a parallax barrier are particularly vulnerable to cross-talk.
3 Simulation of artefacts

3.1 Capturing Stage

3.1.1 Artefacts caused by image resizing

Changing the size and resolution of a stereo-video can introduce aliasing and/or wrong disparity. Our framework allows scaling an input frame to a certain output size-format, defined by a scale factor. Another parameter allows selecting the interpolation scheme. The function supports the
following interpolation methods: "nearest", "bilinear" and "bicubic". The function also allows consecutive up- and down-scaling, which results in an output video with the same size as the input. In that form, the simulation allows investigating the subjective effect of the interpolation scheme on a stereoscopic video. The order of processing can be seen in Figure 3.1 and an example is shown in Figure 3.2. The following input arguments should be provided:

View: a string that defines which view (left or right) is going to be downscaled

Scaler: a number that defines the scaling of the frame, applied vertically and horizontally at the same time to stay consistent with the ratio between height and width

Method: a string that defines the interpolation scheme for the resize process
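A minimal MATLAB sketch of this down/up-scaling step (variable names are illustrative; the actual framework function may differ):

    % Sketch: introduce resizing artefacts by scaling down and back up
    [H, W, ~] = size(frame);                  % frame of the selected view
    small = imresize(frame, Scaler, Method);  % e.g. Scaler = 0.33, Method = 'bicubic'
    frame = imresize(small, [H W], Method);   % back to the original resolution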
Figure 3.1 Flowchart for the image resizing function: select the left or right view, get the frame dimensions, downscale the frame according to "Scaler" with interpolation method "Method", upscale it back to the original resolution with the same method, and save the frame.
Figure 3.2 Example for resizing with parameters View = “left”, Scaler = 0.33 and Method = “bicubic”, original (left) and impaired (right)
3.1.2 Blur

Our simulation of this artefact renders scalable blur to an input frame or video. It uses local operators with user-defined sizes in the horizontal and vertical directions. The default blurring kernel is a box filter; other kernel functions are possible. The flow diagram in Figure 3.3 shows the order of processing, while Figure 3.4 shows an example of blurring for a given set of input parameters. The output arguments are blurred versions of the input frames or video sequences. In general, the input arguments are:

dv: size of the box filter in pixels in the vertical direction

dh: size of the box filter in pixels in the horizontal direction
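A minimal sketch of the filtering step (the box kernel is the default named in the text):

    % Sketch: scalable blur with a dv-by-dh box filter
    h = fspecial('average', [dv dh]);        % e.g. dv = dh = 9
    frame = imfilter(frame, h, 'replicate'); % blurred output frame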
Figure 3.3 Flowchart for simulation of blurring: define a box filter with the dimensions given by "dv" and "dh", then apply imfilter(Frames, Filter).
Figure 3.4 Example for Blurring with parameters dv=9 and dh=9, original (left) and impaired (right)
3.1.3 Barrel distortion

This distortion can be modelled as a geometric distortion function arising from extreme camera settings or art-like lenses. The input frame changes its appearance towards the form of a barrel. One possible implementation is recalculating the pixel positions with the help of an adjustable cubic term in polar coordinates. Parameter "d" directly manipulates this cubic term and therefore defines the strength of the geometric transformation. Post-filtering by specifying a desired interpolation scheme is also possible. For simplicity, the interpolation scheme used for barrel distortion is not promoted to a framework parameter, but is fixed to "cubic". If desired, the interpolator can be changed at function level. Figure 3.5 shows the flowchart of a practical implementation, while Figure 3.6 shows an example for a given input parameter set.

d: strength of the geometric distortion. Values in the interval [0, +inf) are valid.
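A minimal sketch of such a radial warp (the scaling of "d" and the sign convention are assumptions; the framework fixes the interpolator to cubic):

    % Sketch: radial distortion via a cubic term in polar coordinates;
    % positive k bends towards barrel, negative k towards pincushion (assumed convention)
    [H, W, C] = size(frame);
    [X, Y] = meshgrid(1:W, 1:H);
    Xc = X - W/2;  Yc = Y - H/2;              % centre the sampling grid
    [theta, r] = cart2pol(Xc, Yc);
    k = d * 1e-7;                             % "d" controls the cubic term (scaling assumed)
    [Xs, Ys] = pol2cart(theta, r + k * r.^3); % distorted sampling positions
    out = zeros(H, W, C);
    for c = 1:C
        out(:,:,c) = interp2(Xc, Yc, double(frame(:,:,c)), Xs, Ys, 'cubic', 0);
    end
    frame = cast(out, 'like', frame);

Flipping the sign of the cubic term yields the pincushion variant described in the next section.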
Figure 3.5 Flowchart for simulation of barrel distortion: get the frame dimensions, define the transformation function, create a meshgrid of the frame size, transform it to polar coordinates, apply the transformation, transform the meshgrid back to Cartesian coordinates, and reshape the frames according to the new meshgrid.
Figure 3.6 Example for Barrel-Distortion with parameter d = 15, original (left) and impaired (right)
3.1.4 Pincushion Distortion

This distortion can be modelled as a geometric distortion function arising from extreme camera settings or art-like lenses. The input frame changes its appearance towards the form of a pincushion. As with barrel distortion, one possible implementation is recalculating the pixel positions with the help of an adjustable cubic term in polar coordinates. Parameter "d" directly manipulates this cubic term and therefore defines the strength of the geometric transformation. Post-filtering by specifying a desired interpolation scheme is also possible. For simplicity, the interpolation scheme used for pincushion distortion is not promoted to a framework parameter, but is fixed to "cubic". If desired, the interpolator can be changed at function level. Figure 3.7 shows the flowchart of a practical implementation, while Figure 3.8 shows an example for a given input parameter set.

d: strength of the geometric distortion. Values in the interval [0, +inf) are valid.
Figure 3.7 Flowchart for simulation of pincushion distortion (the processing chain is the same as for barrel distortion in Figure 3.5).
Figure 3.8 Example for pincushion distortion with parameter d = 15, original (left) and impaired (right)
3.1.5 Vignetting

Vignetting can be modelled as a luminance plane of the size of an input frame, in the form of a lens. The form and the size of the lens can be adjusted via three parameters. Figure 3.9 shows the flowchart of a practical implementation, while Figure 3.10 shows an example for a given input parameter set.

Rounding: the parameter for the size of the lens in the x- and y-direction, constructed via a 2D cosine function. By default, the lens has a diameter equal to the picture size, defined as 1, and is always centred. The rounding parameter acts as an offset to this default value and can take values between 0 and +inf. Values greater than 0 make the lens larger and therefore flatter. The amplitude of this 2D filter is 1.

Threshold: a shift in the y-direction that is added to the amplitude as an offset. Values smaller than 0 are clipped to 0 by the uint8 format.

Weight: the parameter that adjusts the maximum amplitude of the filter to a certain value. As a result, we get a luminance plane of a certain form, amplitude and slope with a dynamic range from 0 to 255. Note that the 2D filter is inverted before weighting and shifting, so that processing the frame reduces to a simple add operation.
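A minimal sketch under these definitions (the exact cosine shape and parameter scaling are assumptions):

    % Sketch: vignetting as an inverted 2D cosine luminance plane added to the luma channel
    [hgt, wdt, ~] = size(frame);
    [X, Y] = meshgrid(linspace(-1, 1, wdt), linspace(-1, 1, hgt));
    lens = cos((pi/2) * X / (1 + rounding)) .* cos((pi/2) * Y / (1 + rounding));
    plane = weight * (lens - 1) + threshold;  % inverted: 0 at the centre, negative at the borders
    ycc = rgb2ycbcr(frame);                   % RGB-to-YUV conversion, as in the flowchart
    luma = double(ycc(:,:,1)) + 255 * plane;  % simple add operation on the luminance
    ycc(:,:,1) = uint8(luma);                 % uint8 clips values below 0
    frame = ycbcr2rgb(ycc);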
Figure 3.9 Flowchart for simulation of vignetting: read the image or frame, convert RGB to YUV, define an inverse 2D cosine filter with amplitude 1 and the size of the input frame adjusted by the rounding parameter, weight the filter according to "Weighting", shift the amplitude according to "Threshold", add the luminance filter to the frame, clip to uint8, and convert YUV back to RGB to output the distorted frame.
Figure 3.10 Example for vignetting with parameters rounding = 2, weighting = 5 and threshold = 0.25, original (left) and impaired (right)
3.1.6 Chromatic aberration

Chromatic aberration is introduced by imperfect, wavelength-dependent focus properties of the lens system. Our framework simulates this artefact by a relative shift of each colour channel, controlled by the pixel shift offset given by the parameter "CAbb". In addition, slight blur is added to account for the defocusing of the shifted red and blue colour channels at the focal point. This can be controlled via the parameters "dv" and "dh", which define the size of a simple box filter in pixels in the vertical and horizontal directions. For simplicity, the amount of blur is not promoted to a framework parameter. If desired, it can be set at function level. As output we receive a frame of the original input size that suffers from chromatic aberration. Figure 3.11 shows the flowchart of a practical implementation, while Figure 3.12 shows an example for a given input parameter set.

CAbb: offset in pixels for the red and blue colour channels

dv: vertical size of the blur kernel

dh: horizontal size of the blur kernel
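A minimal sketch of the shift-and-blur step (the exact shift directions are an assumption):

    % Sketch: chromatic aberration as opposite horizontal shifts of the R and B channels
    h = fspecial('average', [dv dh]);
    blurred = imfilter(frame, h, 'replicate');            % slight defocus blur
    frame = blurred;
    frame(:,:,1) = circshift(blurred(:,:,1), [0,  CAbb]); % red shifted right by CAbb pixels
    frame(:,:,3) = circshift(blurred(:,:,3), [0, -CAbb]); % blue shifted left by CAbb pixels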
16
MOBILE3DTV
D5.2
Figure 3.11 Flowchart for simulation of chromatic aberration: define a box filter according to "dv" and "dh", blur the frames with imfilter, and shift the red and blue colour channels pixelwise by "CAbb" in the left and right directions.
Figure 3.12 Example for chromatic aberration with parameters dv = 3, dh = 3 and CAbb = 5, original (left) and impaired (right)
3.1.7 Keystone Distortion and Vertical Disparity

Keystone distortion simulates the impact of a toed-in camera configuration on video content and the artefacts that come along with the use of this configuration. The simulation takes each frame of a particular view and performs a geometrical transformation according to the following parameters:

dpi: resolution of the frame in dots per inch

t: camera separation in mm
C: convergence distance in mm

f: focal length of the cameras in mm

M: frame magnification (screen size divided by the size of the CCD sensor)

DPC_offset: 2D offset in pixels that makes the depth plane curvature accessible
In the first step, the new positions of the pixels in the frame are calculated using the simplified transformation functions introduced in [14] and based on [5]. These map the original pixel positions to the distorted positions [Xsl Ysl] and [Xsr Ysr] in the left and right frames; the horizontal-disparity and vertical-disparity components are computed with separate transformation functions. To introduce depth plane curvature to the frames, there is an additional possibility to define a 2D offset for the frame position in the x- and y-directions. As output arguments, we get the geometrically transformed frames of each view. Horizontal disparity/depth plane curvature and vertical disparity are calculated and applied separately. Figure 3.13 shows the flowchart of a practical implementation, while Figure 3.14 and Figure 3.15 show examples of keystone distortion and vertical disparity for a given input parameter set ([24], [25], [2], [32]).
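As an illustrative sketch only (the report's actual transformation functions come from [14]; here a generic projective warp with a hypothetical corner shift "k" stands in for the full camera model built from dpi, t, C, f and M):

    % Sketch: keystone distortion as a projective warp via maketform/imtransform,
    % the functions named in the flowchart; "k" is a hypothetical corner shift in pixels
    [H, W, ~] = size(frameL);
    k = 10;                              % assumed vertical corner displacement
    src = [1 1; W 1; W H; 1 H];          % original frame corners
    dst = [1 1+k; W 1-k; W H+k; 1 H-k];  % trapezoid for one view
    T = maketform('projective', src, dst);
    frameL = imtransform(frameL, T, 'bicubic', 'XData', [1 W], 'YData', [1 H]);

The opposite view would use the mirrored trapezoid, producing the opposite-direction keystone described in chapter 2.3.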
Figure 3.13 Flowchart for simulation of keystone distortion: get the frame dimensions, convert "dpi" to "dpcm" and the frame dimensions to mm, define the left and right corners in mm (centred) according to the input dimensions, calculate the geometric transformation function to get the new X and Y coordinates ([Xsl Ysl], [Xsr Ysr]), transform the new coordinates into pixel positions, create the transformation matrix to feed imtransform.m, apply the image transformation, and shift the left and right frames horizontally relative to each other according to the "DPC-Offset".
Figure 3.14 Example for keystone distortion of the left-camera view with parameters dpi = 71, t = 65, C = 150, f = 3.5 and M = 45, original (left) and impaired (right)
Figure 3.15 Example for vertical disparity of the left-camera view with parameters dpi = 71, t = 65, C = 150, f = 3.5 and M = 45, original (left) and impaired (right)
3.1.8 Cardboard Effect by Depth Level Quantization

This function simulates the influence of coarse quantization and heavy undersampling of the depth map to create an adjustable cardboard effect. First, undersampling by factor "N" is applied. A coarse quantization matrix is created with manipulator "d". Undersampling in combination with coarse quantization of the colour channels then causes the cardboard effect. Finally, the input frame is replaced by its distorted equivalent ([2], [31]). Figure 3.16 shows the flowchart of a practical implementation, while Figure 3.17 shows an example for a given input parameter set. The input arguments are:

N: undersampling factor

d: integer manipulator to create a coarse quantization matrix
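A minimal sketch of the downsample/quantize/upsample chain (the flat quantization matrix built from "d" is an assumption):

    % Sketch: cardboard effect via undersampling plus coarse blockwise DCT quantization
    small = imresize(depth, 1/N, 'nearest');            % 2D downsampling by factor N
    Qm = d * ones(8);                                   % assumed coarse 8x8 quantization matrix
    fun = @(b) idct2(round(dct2(b.data) ./ Qm) .* Qm);  % blockwise quantize and dequantize
    q = blockproc(double(small), [8 8], fun, 'PadPartialBlocks', true);
    q = q(1:size(small,1), 1:size(small,2));            % drop the block padding again
    depth = uint8(imresize(q, [size(depth,1) size(depth,2)], 'nearest'));  % upsample back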
Figure 3.16 Flowchart for simulation of the cardboard effect: get the frame dimensions, 2D downsampling of the frames by factor N, 8x8 blockwise DCT, blockwise quantization, blockwise dequantization with offset "d" applied to the quantization matrix, 8x8 blockwise IDCT, 2D upsampling of the frames by factor N.
Figure 3.17 Example of the influence on the depth map with parameters N=2 and d = 15, original (left) and impaired (right)
3.1.9 Interlacing

Interlacing artefacts are simulated in the following way. First, motion estimation between two frames is applied to find out which parts of the reference frame are moving in the horizontal plane. We use a simplified block matching algorithm (BMA), proposed in [43], applied to each block of size "w", where "w" is usually a power of 2. After that, the odd lines in each block are shifted according to the motion vector values (in pixels) calculated by the motion estimation [27]. The output frame then suffers from interlacing artefacts, as can be seen in the example in Figure 3.19. Figure 3.18 shows the flowchart of a practical implementation.

w: block size for the motion estimation
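A minimal sketch of the line-shifting step, assuming mvx(bi, bj) already holds the horizontal motion (in pixels) of each w-by-w block from the block matching stage:

    % Sketch: interlacing by shifting the odd lines of each block along its motion vector
    [H, W] = size(gray);                       % grayscale frame, as in the flowchart
    out = gray;
    for bi = 1:floor(H/w)
        for bj = 1:floor(W/w)
            rows = (bi-1)*w + (1:w);
            cols = (bj-1)*w + (1:w);
            blk = gray(rows, cols);
            blk(1:2:end, :) = circshift(blk(1:2:end, :), [0, round(mvx(bi, bj))]);
            out(rows, cols) = blk;             % put the manipulated block back
        end
    end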
Figure 3.18 Flowchart for simulation of interlacing: get the frame dimensions, transform the frames to grayscale, calculate motion vectors between Frames and Frames+1 with a fast block matching algorithm and block size w, divide the frames into blocks of size [w w] and take the x-component of the motion vectors, shift odd and even lines in each block according to that x-component, and put the manipulated blocks back into the frames.
Figure 3.19 Example for interlacing with parameter w = 8, original fragment (left) and impaired fragment (right). The simulation used frames 1 and 2 of the stereo video sequence.
3.1.10 Motion Blur

Blur of moving objects in a video sequence is simulated in the following way. First, motion estimation between two adjacent frames is applied to get information about the moving parts in the scene. We use a simplified block matching algorithm [43], applied to each block of size "w", where "w" is usually a power of 2. After that, a blurring kernel is constructed. The filter size corresponds to the values of the motion vectors in the horizontal and vertical directions. After filtering with this kernel we get a partially blurred frame as a result; the blurred parts are the moving parts in the examined frames. Figure 3.20 shows the flowchart of a practical implementation, while Figure 3.21 shows an example for a given input parameter set. The input parameters are:

w: window size for the motion estimation
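A minimal sketch of the kernel construction, assuming a single estimated motion vector (mvx, mvy) for the region to be blurred:

    % Sketch: directional blur kernel sized by the motion vector
    len = max(hypot(mvx, mvy), 1);             % kernel length from the motion magnitude
    ang = atan2d(mvy, mvx);                    % motion direction in degrees
    h = fspecial('motion', len, ang);          % linear motion-blur kernel
    next = imfilter(next, h, 'replicate');     % blur the second frame (Frames+1)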
Figure 3.20 Flowchart for simulation of motion blurring: get the frame dimensions, calculate motion vectors from Frames and Frames+1, design a box filter according to the motion vectors, then Frames+1 = imfilter(Frames+1, Filter*Weight).
Figure 3.21 Example for motion blur with parameter w = 8, original (left) and impaired (right). The simulation used frames 1 and 2 of the stereo video sequence.
3.1.11 Temporal Mismatch

Temporal mismatch simulates distortions caused by unsynchronized camera setups. It can be modelled as a shift in time of the frames from one camera view by a certain value t [ms] relative to a reference frame in the other view. In the future, temporal mismatch in multi-camera setups will also be simulated; for that, the time shifts will be calculated from a ladder network to model the dependencies. As output, we get the desynchronized views, suffering from a frame-based shift in time. Figure 3.22 shows the flowchart of a practical implementation.

Delay: shift in time in milliseconds
Figure 3.22 Flowchart for simulation of temporal mismatch: get the frames or sequence, calculate the frame delay from the framerate and the delay in ms, shift the frames by that delay and append the sequence with black frames, then save the frames or sequence while keeping the overall length constant.
3.1.12 Noise

Our framework provides basic means of noise introduction. Since capture noise depends on the camera sensor parameters, for demanding tasks such as de-noising a proper camera model should be used ([38], [39], [40], [41]). Simulating capture noise while taking all necessary parameters into account falls beyond the scope of our framework. Instead, we provide a rudimentary noise introduction function, which can be replaced with a more sophisticated one when required. Figure 3.23 shows the flowchart of a practical implementation, while Figure 3.24 shows an example for a given input parameter set. The input parameters are:

Noise_type: string that defines the type of the noise used

d: parameter that defines the variance of the noise as an integer value between [0 255]
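A minimal sketch of such a rudimentary hook (imnoise expects the variance normalized to the [0 1] intensity range, hence the assumed rescaling of "d"):

    % Sketch: basic noise introduction dispatching on Noise_type
    switch lower(Noise_type)
        case 'gaussian'
            frame = imnoise(frame, 'gaussian', 0, (d/255)^2); % zero mean, variance from d
        case 'salt & pepper'
            frame = imnoise(frame, 'salt & pepper', d/255);   % noise density from d
        otherwise
            error('Unsupported noise type: %s', Noise_type);
    end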
Figure 3.23 Flowchart for simulation of noise introduction: get the frame dimensions, create a noise matrix of the frame size according to the noise type, weight the noise matrix with variance d, and add it to the frames.
Figure 3.24 Example for noise with parameters Noise_type = Gaussian and d = 25, original (left) and impaired (right)
3.1.13 Colour mismatch caused by white balance

In multi-camera setups, it is almost impossible to achieve a perfect match of camera parameters. Calibration of white balance, colour temperature and optics is used to minimize the effect of inter-camera distortions that may influence further processing and stereoscopic perception. We simulate differences in white balance and colour matching between the cameras of a stereoscopic camera setup. The user can define offsets in different colour spaces to create the mismatch. In order to reduce the number of parameters required by the function, we provide a number of presets mimicking standard illumination scenarios [22]:

'a': standard light bulb illumination
'c': NTSC-standard illumination
'd50': standard for wide-gamut RGB, warm illumination
'd55': daylight (T = 5500 K)
'd65': daylight (T = 6500 K)
'icc': 16-bit fractional representation of D50
The manipulation is done in the same colour space for both cameras. The output parameters are the colour- and white-balance-modified frames of a frame or video sequence. Figure 3.25 shows the flowchart of a practical implementation, while Figure 3.26 shows an example for a given input parameter set. The following input parameters should be provided:

FramesL: left frames of a video or frame sequence

FramesR: right frames of a video or frame sequence

Colorspace: string that defines the colour space [RGB, HSV, XYZ]

Val_ProfL: integer triple that defines offsets in the HSV or RGB colour space for the left channel. For colour space XYZ, the input is a string naming one of the CIE standard illumination schemes mentioned above

Val_ProfR: same as for the left channel, but the parameters manipulate the right channel
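A minimal sketch of the XYZ path for one frame of the right channel (the channel scaling toward the chosen illuminant is an assumption; MATLAB's whitepoint accepts the preset names listed above):

    % Sketch: white-balance mismatch in XYZ space using a CIE illuminant preset
    wp  = whitepoint(Val_ProfR);                 % e.g. 'a', 'c', 'd50', 'd55', 'd65', 'icc'
    xyz = rgb2xyz(frameR);                       % colour space conversion
    xyz = xyz .* reshape(wp / max(wp), 1, 1, 3); % offset/scale toward the illuminant
    frameR = xyz2rgb(xyz);                       % colour space reconversion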
Figure 3.25 Flowchart for simulation of white and colour disbalance: convert the frames to the selected colour space, apply the offset, and convert them back.
Figure 3.26 Example for white and colour disbalance with parameters Colorspace = XYZ and Profile = 'a', original (left) and impaired (right)
3.2 Coding Stage

3.2.1 Blocking by harsh quantization

This approach is based on the fact that a certain quality degradation of a picture during coding and compression has a significant influence on the overall perceived quality [34]. To investigate this influence, the implementation reads a picture and saves it at a degraded quality, controlled by the parameter Q. The encoder used to degrade the quality is a JPEG encoder. Figure 3.27 shows the flowchart of a practical implementation, while Figure 3.28 shows an example for a given input parameter set. The input argument is:

Q: quality of the JPG frame as an integer value [0 100]
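A minimal sketch of the JPEG round trip (the temporary file handling is an assumption):

    % Sketch: blocking via a JPEG encode/decode round trip at quality Q
    tmp = [tempname '.jpg'];
    imwrite(frame, tmp, 'Quality', Q);  % e.g. Q = 25
    frame = imread(tmp);                % reload the degraded frame
    delete(tmp);                        % delete the temporary data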
Figure 3.27 Flowchart for simulation of blocking by harsh quantization: save the frames with quality Q in *.jpg format, load them again to replace the original frames, and delete the temporary data.
Figure 3.28 Example for blocking by harsh quantization with parameter Q = 25, original (left) and impaired (right)
3.2.2 Block-edge discontinuities (basis image effect)

The goal of this function is to simulate block-edge discontinuities separately from the other artefacts introduced by blocking. A study of the various perceptual effects caused by block-wise DCT transformation can be found in [37]. The simulation of this artefact is done in the following way. First, the input frame is transformed into the YUV colour space to get access to the luminance channel. After dividing the transformed frame into blocks of 8x8, a random number in the x- and y-direction between 0 and 1 is calculated for each block. We multiply this number by the variable "d", which can be seen as the variance of the luminance offset for each block. Each luminance offset block has zero mean, to make sure that no additional information is added to the input frame. Finally, each luminance block is added to its corresponding block in the input frame. As output, we get a blocky picture that has the same appearance as the blocking distortion coming from block processing in DCT-based algorithms. Figure 3.29 shows the flowchart of a practical implementation, while Figure 3.30 shows an example for a given input parameter set. The input argument is:

d: variance of the luminance offset for each block
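A minimal sketch of the per-block offset (generating offsets that are zero-mean across blocks is an assumed reading of the description):

    % Sketch: block-edge discontinuities as random constant luma offsets per 8x8 block
    ycc  = rgb2ycbcr(frame);
    luma = double(ycc(:,:,1));
    [H, W] = size(luma);
    offs = d * (rand(ceil(H/8), ceil(W/8)) - 0.5); % one offset per block, zero mean overall
    offs = kron(offs, ones(8));                    % expand each offset to its 8x8 block
    ycc(:,:,1) = uint8(luma + offs(1:H, 1:W));
    frame = ycbcr2rgb(ycc);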
Figure 3.29 Flowchart for simulation of block-edge discontinuities: get the frame dimensions and, for each block, check max(Block)-mean(Block) and mean(Block)-min(Block) before applying the luminance offset.