
A QoE Evaluation Methodology for HD Video Streaming using Social Networking

Bruno Gardlo∗, Michal Ries†, Markus Rupp‡ and Roman Jarina∗

∗ Dept. of Telecommunications and Multimedia, University of Zilina, Zilina, Slovakia. Email: {gardlo, jarina}@fel.uniza.sk
† Department of Radio Electronics, Brno University of Technology, Brno, Czech Republic. Email: [email protected]
‡ Institute of Telecommunications, Vienna University of Technology, Vienna, Austria. Email: [email protected]

Abstract—A novel methodology for QoE evaluation in the social network environment is proposed. It is highly applicable to subjective testing of multimedia services under real usage scenarios. The social network environment also provides significant demographic data and the ability to reach a very large number of test subjects, while making it possible to focus on or filter specific social groups. QoE results for HD Internet video services are presented, followed by a discussion of their statistical significance.

Keywords-QoE Methodology; Social Networks; Multimedia streaming

I. INTRODUCTION

Quality of Experience (QoE) and social networks have both attracted strong interest, from a research as well as a commercial perspective. QoE refers to an understanding of the qualitative performance of communication systems and applications that transcends traditional technology-focused Quality-of-Service (QoS) parameters [1], [2]. The concept is tightly coupled to the subjective perception of the end user (see Fig. 1). This user-centricity is also reflected in the most widespread definition, originating from ITU-T SG 12, which describes QoE as the "overall acceptability of an application or service, as perceived subjectively by the end user", which "may be influenced by user expectations and context" [3]. This user-centric perspective on quality is particularly relevant for multimedia streaming services (e.g., IPTV, Internet TV), where large volumes of audiovisual data are delivered to the end user's premises in real time [4].

Social networks provide a unique multidimensional platform, integrating social interaction supported by multimedia services and entertainment activities. Recently, games and other web applications have gained major attention and popularity and, besides entertaining the user, they often provide efficient tools for additional purposes (e.g., psychological studies, marketing, or crowdsourcing) [5]–[7]. Social networks usually allow users a high level of customization of their own accounts, and let them create, share and distribute their own ideas, multimedia content, news, or applications within the social network environment. The Facebook.com environment provides a unique platform for developing one's own applications as well as for investigating technical and user-centric aspects of multimedia services. In particular, the Facebook.com application platform provides a variety of development tools that help an application spread rapidly among the huge Facebook community [8]. Furthermore, Facebook.com applications allow access to demographic and social data, with user permission (e.g., the Facebook Social Graph provides direct information about gender, age, education, user environment and background, etc.) [9]. This particular feature is extremely useful for further investigation of the perceptual aspects of multimedia services.

The current user-based assessment methodologies for multimedia were originally designed for TV broadcasting services and have been modified for new digital multimedia (e.g., Internet TV, mobile streaming or digital television). However, these assessment methodologies are extremely time consuming and poorly reflect "real world" scenarios. In order to cover the most significant QoE aspects, "real world" scenarios should consider context, user expectations and technical system aspects. The proposed assessment methodology introduces a more efficient and more QoE-aware method that exploits social network environments.

This paper is organized as follows. Section II describes the available standardized methods for video and audiovisual quality assessment. In Section III we discuss the factors that influence the overall QoE and describe how these factors are incorporated in our methodology. Section IV describes the technical aspects of the proposed QoE assessment application, and Section V presents the achieved results. Finally, Section VI concludes the paper.

II. USER BASED ASSESSMENT

The standardization bodies have introduced several assessment methods. The current methodologies focus on assessment reproducibility in a fully controlled environment [11], [12], [14], [15].
The viewing conditions in recommendation ITU-T P.910 [14] define viewing distance, peak luminance of the screen, ratio of luminance of inactive screen to peak luminance, etc. On the one hand, such a strictly specified test environment is very important for test reproducibility. On the other hand, it can poorly reflect the "real world" viewing scenario and can lead to a systematic deviation of subjective assessments. This behaviour has been observed: a study of consistency between results from different testing laboratories found that systematic differences can occur between laboratories [11]. Furthermore, the current standards recommend quite small test groups (15-30 participants), which can also lead to insufficient statistics.

The variety of viewing scenarios available today goes hand in hand with a variety of end user devices and viewing conditions. In the real world, end user devices are usually tuned for more than one usage scenario, or viewing conditions do not comply with any standard or recommendation. Even a controlled laboratory environment with standardized and reproducible conditions cannot properly reflect the end user's viewing conditions. Furthermore, there is a critical difference between testing in a laboratory environment and testing QoE in environments where end users feel most comfortable and where they actually use the provided service. In a laboratory environment, a testing session can last up to 60 minutes (with briefing and debriefing phases), but the subjects are not familiar with the conditions and are often rewarded for the time spent in the lab. This naturally has a significant influence on context and expectations. The situation is different when tests are performed in the natural home or office environment. The social network environment takes full advantage of the Absolute Category Rating (ACR) method, resulting in an average session time of approx. 5 minutes, which keeps the "leaving rate" as low as possible on the one hand and puts minimal stress on the test persons on the other.
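As an illustration of how ACR votes turn into a mean opinion score (MOS), a minimal sketch is given below. It assumes the five-level ACR scale with the category labels of ITU-T P.910; the vote list and the function name `mean_opinion_score` are hypothetical and not taken from the assessment application.

```python
from enum import IntEnum
from statistics import mean


class ACR(IntEnum):
    """Five-level ACR quality scale (category labels as in ITU-T P.910)."""
    BAD = 1
    POOR = 2
    FAIR = 3
    GOOD = 4
    EXCELLENT = 5


def mean_opinion_score(votes):
    """Return the MOS, i.e. the arithmetic mean of the ACR votes."""
    votes = [int(v) for v in votes]
    if not votes:
        raise ValueError("at least one vote is required")
    return mean(votes)


# Hypothetical votes of five participants for one sequence.
votes = [ACR.GOOD, ACR.FAIR, ACR.GOOD, ACR.EXCELLENT, ACR.POOR]
mos = mean_opinion_score(votes)  # (4 + 3 + 4 + 5 + 2) / 5
```

Because each participant only selects one of five categories per sequence, a single rating takes a few seconds, which is what keeps the session short.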
The short assessment session does not allow large test sets to be evaluated, but the huge community within the social network makes it possible to ask many subjects to take part in the test. Our experience shows that it is possible to obtain results from dozens of people within 24 hours. This introduces great flexibility into the subjective

Figure 1. Relationship between user, application and network. (Diagram labels: User, Test environment / Facebook application, Streaming server, QoE, QoS.)

Figure 2. Influence factors of quality of experience [10]. (Diagram labels: QoE, Context, Expectations, Technical System, Personal Characteristics, Environment, Content, Social and Cultural, Application Type, Image and Brand, Usage History, Codec, End User Devices, Video Delivery, Service Type.)

testing methodologies, and makes it possible to run assessments as a quick response to system configuration changes. Moreover, ACR was considered the best-suited method for media streaming quality testing, because it can be tailored very well to many viewing scenarios and has minimal influence on context and expectations (see Fig. 2).

III. CONTEXT INFLUENCE AND QOE

In the past few years, social networking has risen to a whole new dimension, because it reflects very well the human need to communicate within one's own community. Modern social networks rely not just on the messaging concept, but extend communication and information exchange to a wide spectrum of multimedia and data applications, such as multimedia content sharing, video broadcasting, or news sharing. This allows extensive online user profiles to be created, with many valuable demographic and social data. At the time of writing this paper (06/2011), Facebook.com has nearly 700 million active users; 46% of them are under 25 years old and another 26% are under 35. An average user creates 90 pieces of content each month and more than 30 billion pieces of content are shared each day [17]. These facts make social networking important for research projects: users are accustomed to using social networks on a daily basis, so it is a non-obtrusive way to address them and ask them to participate in the assessment. According to the official Facebook.com statistics, people spend over 700 minutes per month on the site [17].

Applications, and especially games based on Facebook.com applications, have gained great popularity. According to the Nielsen Co., the top two online activities for Americans are social networks and games, together accounting for about one third of their time on the Net. Those two trends converge in games based on Facebook.com applications [16]. This greatly helps us to target the social network community with our proposed project, since users are accustomed to using the applications. Depending on the project proposal and research needs, the application can often be designed so that the user feels natural during the testing session, or even enjoys the delivered content. Last but not least, with simple forms or by exploiting functionalities of Internet browsers, we can quickly gain information about how and where the QoE assessment application has been used.

A. Audiovisual content

Video and audiovisual perception is content dependent. Therefore, it is very important to select several content types for audiovisual evaluation. Furthermore, it is also crucial to choose the most representative sequences, covering diverse contents, according to the selected scenarios. In our scenario the selection of content types is based on channel popularity according to market research. We chose soccer and action movie since they are two major content classes and significantly outnumber other content classes. Further investigation of the selected video sequences confirms their significantly different temporal and spatial character. Table I shows the spatial and temporal information (SI and TI) values.

Table I
SPATIAL AND TEMPORAL CHARACTERISTICS OF CONTENT CLASSES.

Test sequence    SI        TI
Action movie     48.705    50.183
Soccer           90.665    31.831

B. Social and Cultural Aspects

Social and cultural aspects (solitary vs. group viewing, cultural norms) differ from user to user and strongly affect the overall perceived quality of experience. Collecting data that describe these aspects can be very exhausting, and it is not always possible to obtain them during the test session. The QoE assessment application collects the test subject's personal data (gender, education, social status, language skills, environment, etc.) only with the subject's permission. The user is fully aware of the data offered to the developer of the application, and can also deny access to the profile. During the testing session, we can extract various data that properly describe the user's social environment and cultural background.
The personal data allow specific user groups to be targeted (according to gender, social status, or other attributes), which can improve the significance of the assessment with respect to the group in focus. Finally, the data extracted by the Facebook.com QoE assessment application allow a much deeper understanding of QoE aspects than laboratory assessments.

IV. TECHNICAL SYSTEM

Besides the influence of context on QoE, the technical system has an enormous influence on the overall perceived quality. Our technical solution provides a seamless coupling of Facebook and video streaming technology. For Internet video and audio streaming, several choices are available, such as Flash video, MPEG-4 based codecs or WebM project codecs. For multimedia content delivery, the HTTP streaming method, also known as pseudo download, was chosen. With this method, the content presented to the user is guaranteed to be exactly the same as that produced and saved on the server, because the file is progressively downloaded to the user's end device and playback starts once a sufficient amount has been buffered by the player. In contrast to native streaming methods such as RTMP, we do not have to observe network quality parameters (although it is possible). Finally, the selected streaming technology is independent of audio and video encoding standards and settings.

A. Encoder settings

All video sequences were encoded in HD screen resolution with the widely used H.264/MPEG-4 AVC video codec. Audio content was encoded using the Advanced Audio Coding (AAC) codec with a 48 kHz sampling rate and a 96 kbit/s bit rate. No differences in audio codec settings were made between the test sequences, since evaluating the influence of audio quality settings on the overall audiovisual quality is outside the scope of this paper. For each of the two content classes, three different representative scene cuts were selected, resulting in six reference audiovisual sequences. Each of these sequences was further encoded at five equally spaced video bit rates, from 2 Mbit/s for the best quality down to 800 kbit/s for the lowest quality (300 kbit/s step size). Hence, after the encoding process there were 30 video sequences, which were separated into three different testing sets. In the quality assessment itself, one of these sets was randomly selected and presented to the user.
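The test design described above (2 content classes, 3 scenes, 5 bit rates, 3 test sets) can be sketched as follows. The text does not state how the 30 sequences were assigned to the three sets, so the rotation rule in `split_into_sets` is only an assumption chosen so that every set contains each content class at each bit rate exactly once; all identifiers are hypothetical.

```python
import random
from itertools import product

CONTENT_CLASSES = ("action_movie", "soccer")
SCENES = (1, 2, 3)                            # three scene cuts per class
BITRATES_KBPS = tuple(range(800, 2001, 300))  # 800, 1100, 1400, 1700, 2000


def build_sequences():
    """Enumerate all encoded sequences: 2 classes x 3 scenes x 5 bit rates."""
    return list(product(CONTENT_CLASSES, SCENES, BITRATES_KBPS))


def split_into_sets(sequences, n_sets=3):
    """Assumed split: rotate the scene index across bit rates, so each set
    holds every (class, bit rate) pair once, with varying scenes."""
    sets = [[] for _ in range(n_sets)]
    for content, scene, bitrate in sequences:
        rate_idx = BITRATES_KBPS.index(bitrate)
        sets[(scene - 1 + rate_idx) % n_sets].append((content, scene, bitrate))
    return sets


def pick_test_set(sets, rng=random):
    """Randomly select one of the test sets for a participant."""
    return rng.choice(sets)


sequences = build_sequences()
test_sets = split_into_sets(sequences)
session = pick_test_set(test_sets, random.Random(0))
```

With this assumed split, each participant sees ten sequences that cover both content classes and all five bit rates, while the three scenes of a class are spread over the three sets.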
Hence, each user performed a quality assessment on ten audiovisual sequences with various scene selections and various video bit rates.

B. End User Devices

Another technical factor that influences the overall QoE is the end user device. Although in a laboratory environment we can use the devices most relevant to the testing scenario, no matter how well the laboratory is equipped and how closely it resembles reality, the best QoE test results can still be achieved only in a real environment. In Internet streaming scenarios, end user devices vary greatly in both hardware and software parameters. Since our QoE assessment application targets HD multimedia content, we cannot guarantee that this content will also be watched on native HD-capable screens. The statistics in [19] show that the most common screen resolutions on the Internet have up to about 800 vertical lines. About 50% of the users use resolutions of 1024x768, 1280x800 or similar. The experienced quality differs greatly between screen resolutions. While the most frequent resolutions mentioned above are sufficient for watching 720p HD content, they are not sufficient for 1080p full HD content; the same argument can be turned around. The user is far more sensitive to distractions when watching HD content on a full HD capable screen. Another problem is the aspect ratio of the end user screen and the provided content. With the Internet browser capabilities and the Facebook user's account permissions, we can track the aforementioned screen resolution, operating system, browser type, etc. Knowing these parameters, we can track the impact of various system properties on the quality of the provided service as experienced by the end user. It is also possible to target test sets to the type of end user device.

V. QOE ASSESSMENT

Several studies have addressed video quality or audiovisual quality [20]–[22]. These studies present subjective assessments in a standardized laboratory environment, which were then compared to other subjective or objective assessment methods. In order to provide an extensive and valid comparison of our proposed methodology with standardized methods [14], [15], [20], [21], the statistical properties of the obtained assessments were examined (Fig. 6).

A. Statistical relevance of obtained subjective evaluations

The assessment methodologies [11], [12], [14], [15] recommend performing subjective tests in a controlled laboratory environment on a set of approximately 30 video sequences presented to each test subject in a single test session. ITU [12], [14], [15] recommends a test group size between 15 and 40 participants. However, presenting the whole video set in a single session is time consuming and can result in respondent fatigue, distractions or learning effects.
In a previous study [20], 24 subjects evaluated each set of 16 videos. The ACR method was used as the test methodology. Their results show 95% confidence intervals (CI) ranging from 0.2 to 0.5 mean opinion score (MOS) points, with an average 95% confidence interval of around 0.3 MOS points. The width of the confidence interval depends on the standard deviation of the results for a given visual or audiovisual content and on the number of participants in the testing session. If a standard deviation similar to that in the laboratory environment can be achieved in our proposed scenario, then a high number of participants in the testing session will result in a narrower 95% CI.

However, our proposed methodology is somewhat different from the procedure usually performed in the lab. In our scenario, we want to achieve a very short testing session, to ensure that the participant stays focused on the rating and does not leave the QoE assessment application before the test session ends. The whole set of 30 videos is divided into three subsets of ten videos, and only one of these subsets is presented to the user. Therefore each user votes on only ten audio-video files, and the voting is very reliable, since during the 5-minute test session distraction or fatigue of the user is very unlikely. The disadvantage of this procedure is that more participants are needed to evaluate all 30 videos. However, this disadvantage has a positive impact on the overall results, since the credibility and reliability of the results rise with the number of participants. Also, as mentioned above, if the standard deviation of the results is relatively low, we can achieve even narrower 95% confidence intervals.

No matter how well we exploit the advantages of the social network, the number of participants is still a limited resource. With this in mind, it is very important to establish the minimum number of participants needed to achieve a certain confidence interval. The estimation is based on the Student's distribution. First, the sample variance has to be calculated as defined in:

$$ s_0^2(x) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2, \tag{1} $$

where $n$ is the number of evaluations used for the estimation and $\bar{x}$ is the sample mean. Then the accuracy parameter $d$ has to be established according to:

$$ \mu - d \le \bar{x} \le \mu + d, \tag{2} $$

where $\mu$ is the true population mean. The parameter $d$ represents the allowed estimation error: for the given significance level $\alpha \in \langle 0, 1 \rangle$, it bounds the deviation of the sample mean from the true population mean. The significance level is related to the confidence interval by the probability $P = 1 - \alpha = 0.95$ ($\alpha = 0.05$). Finally, the minimum sample size is calculated from the Student's distribution according to:

$$ n_{\min} = \left( \frac{t_{1-\frac{\alpha}{2}}(n-1)}{d} \right)^{2} s_0^2(x), \tag{3} $$

where $t_{1-\frac{\alpha}{2}}(n-1)$ is the $1-\frac{\alpha}{2}$ quantile of the Student's distribution with $(n-1)$ degrees of freedom. Table II and Table III present the minimum number of participants estimated from the Student's distribution for a given width $\pm d$ of the 95% confidence interval. This estimation is based on sample results from our scenario. It can be seen that, to achieve a confidence similar to that achieved in [20], e.g., 0.25 MOS points, we need on average 32-36 participants, depending on the presented sequences and their variance. Estimating the minimum number of participants for every single quality setting makes it possible to continue the testing session

Table II
MINIMUM SAMPLE SIZE FOR ACTION MOVIE SEQUENCES.

±d      800 kbit/s   1100 kbit/s   1400 kbit/s   1700 kbit/s   2000 kbit/s
0.1     196          226           190           160           128
0.15    87           100           84            71            57
0.2     49           56            47            40            32
0.25    31           36            30            26            21
0.3     22           25            21            18            14
0.35    16           18            15            13            10

Table III
MINIMUM SAMPLE SIZE FOR SOCCER SEQUENCES.

±d      800 kbit/s   1100 kbit/s   1400 kbit/s   1700 kbit/s   2000 kbit/s
0.1     333          149           217           287           219
0.15    148          66            96            128           97
0.2     83           37            54            72            55
0.25    53           24            35            46            35
0.3     37           17            24            32            24
0.35    27           12            18            23            18

Figure 3. 95% Confidence intervals as function of mean opinion scores. (Axes: MOS, confidence interval.)

Figure 4. Action movie 800 kbit/s MOS Histogram. (Axes: MOS, count.)

Figure 5. Action movie 2000 kbit/s MOS Histogram. (Axes: MOS, count.)
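The sample-size estimate of Eq. (1) to Eq. (3) can be sketched in a few lines. This is only an illustration under stated assumptions: the Student's quantile is passed in by the caller, and when it is omitted the standard normal quantile (about 1.96) is used as a large-sample stand-in; the pilot votes and function names are hypothetical and are not the data behind Tables II and III. Python's `statistics.variance` uses the divisor n - 1, which matches Eq. (1).

```python
from math import sqrt
from statistics import NormalDist, variance


def normal_quantile(alpha=0.05):
    """1 - alpha/2 quantile of the standard normal distribution.

    Used here only as a large-sample approximation of the Student's quantile.
    """
    return NormalDist().inv_cdf(1 - alpha / 2)


def min_sample_size(votes, d, t_quantile=None):
    """Eq. (3): n_min = (t / d)^2 * s0^2, with s0^2 taken from Eq. (1)."""
    if d <= 0:
        raise ValueError("d must be positive")
    t = normal_quantile() if t_quantile is None else t_quantile
    s2 = variance(votes)  # unbiased sample variance, divisor n - 1
    return (t / d) ** 2 * s2


def ci_half_width(votes, t_quantile=None):
    """Eq. (3) solved for d: half-width reached with the votes at hand."""
    t = normal_quantile() if t_quantile is None else t_quantile
    return t * sqrt(variance(votes) / len(votes))


# Hypothetical pilot votes on the five-level scale (not data from the paper).
pilot_votes = [2, 3, 2, 4, 3, 2, 3, 1, 3, 2, 4, 3]
estimates = {d: min_sample_size(pilot_votes, d) for d in (0.1, 0.25, 0.35)}
```

Since n_min is proportional to 1/d², halving d quadruples the required number of participants. The same pattern is visible in Table II, for example 196 participants at d = 0.1 versus 49 at d = 0.2 for the action movie sequences at 800 kbit/s.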

until sufficient statistics are achieved. The required number is relatively small compared to the potential of the social network; in the first tests we already obtained evaluations from 50-60 subjects. Fig. 3 depicts the 95% confidence intervals achieved for various MOS values. With a probability of 0.95, the population mean of our results lies within 0.25-0.3 MOS points of the estimated sample mean. This figure supports the results obtained with our proposed methodology, since similar results are usually obtained in lab environments as well.

B. Assessment results

In order to show the performance of the proposed methodology, user-based audiovisual assessments were performed. The testing sets were configured so that the impact of the video bit rate on the overall audiovisual QoE could be examined. To reduce the influence of individual test sequences, as described in Section IV-A, three different scenes of each content class were encoded at the different video bit rates. After all files have been evaluated, the mean MOS values are examined not for every sequence, but per content class and video bit rate. In this way, the three different audiovisual sequences of the action movie content class yield only one average MOS value per bit rate. As an example, in Fig. 4 and Fig. 5 histograms of votes for the action movie

content class sequences are presented, encoded at 800 kbit/s and 2000 kbit/s. The figures show the distribution of votes for the given content class and video bit rate. The action movie content class at a video bit rate of 800 kbit/s results in an average of 2.74 MOS points, with most participants voting 2 MOS points. The best quality action movie sequences achieved on average 4.11 MOS points, with 4 being the most frequent vote. Finally, Fig. 6 presents the average MOS scores for the given bit rates with their corresponding 95% confidence intervals. It clearly shows average MOS scores rising with the bit rate. In general, soccer sequences received lower MOS values than action movie sequences, which could be caused by higher user demands on the quality of this content class; this has to be examined further in future studies. The lowest quality video files, with a video bit rate of 800 kbit/s, were rated with average MOS scores of 2.54 and 2.75 for soccer and action movies, respectively. The best quality video files achieved MOS scores around 4 points, with minimal differences between 1700 kbit/s and 2000 kbit/s for the action movie content class. The difference between these two settings is only 0.08 MOS points, which makes it possible to reduce the video bit rate and save bandwidth with minimal loss in perceived QoE. The maximum difference between the lowest and highest quality files is below 2 MOS points, which offers a sufficient difference for comparing and evaluating the quality

Figure 6. MOS values with their 95% confidence intervals. (Series: Soccer, Action; axes: video bit rate [kbit/s] from 800 to 2000 kbit/s, MOS from 1 to 5.)

of experience of the presented content.

VI. CONCLUSION

The presented methodology explores the newest trends in QoE evaluation and social networks for video streaming technology, and provides a novel method for user-based assessment. The methodology is highly applicable to subjective testing of multimedia services under real usage scenarios. The QoE assessment application provides significant demographic data in combination with the ability to contact a large number of test subjects. It makes it possible to focus on or filter specific social groups. Given the very short test session and the above-mentioned facts, the user is generally more relaxed and evaluates the presented audiovisual sets more confidently. With the social data available through the social network, we can target specific user groups without the need to perform exhausting social studies. All these facts support the high statistical significance of the results. Finally, thanks to the ability to contact many test participants, it is possible to perform many evaluations within a short time, which significantly supports the statistical relevance of the obtained results.

ACKNOWLEDGEMENT

The support of the project CZ.1.07/2.3.00/20.0007 WICOMT, financed from the operational program Education for Competitiveness, is gratefully acknowledged. This work was partially supported by the COST IC1003 European Network on Quality of Experience in Multimedia Systems and Services - QUALINET (http://www.qualinet.eu/).

REFERENCES

[1] P. Reichl, J. Fabini, C. Happenhofer and M. Egger, "From QoS to QoX: A Charging Perspective," in Proceedings of the 18th ITC Specialist Seminar on Quality of Experience. Blekinge: Blekinge Institute of Technology, pp. 35-44, May 2008.
[2] ITU-T Recommendation E.800, "Definitions of terms related to quality of service," International Telecommunication Union, 09/2008.

[3] ITU-T Recommendation P.10/G.100, "Vocabulary and Effects of transmission parameters on customer opinion of transmission quality," International Telecommunication Union, 2006.
[4] H. J. Kim and S. G. Choi, "A study on a QoS/QoE correlation model for QoE evaluation on IPTV service," The 12th International Conference on Advanced Communication Technology (ICACT), vol. 2, pp. 1377-1382, 7-10 Feb. 2010.
[5] W. Fan and K. H. Yeung, "Virus Propagation Modeling in Facebook," 2010 International Conference on Advances in Social Networks Analysis and Mining, pp. 331-335, 9-11 Aug. 2010.
[6] F. Fovet, "Impact of the use of Facebook amongst students of high school age with Social, Emotional and Behavioural Difficulties (SEBD)," 39th IEEE Frontiers in Education Conference (FIE '09), pp. 1-6, 18-21 Oct. 2009.
[7] B. Kirman, S. Lawson and C. Linehan, "Gaming On and Off the Social Graph: The Social Structure of Facebook Games," International Conference on Computational Science and Engineering (CSE '09), vol. 4, pp. 627-632, 29-31 Aug. 2009.
[8] Facebook.com, "Social channels," Retrieved 4.7.2011. [Online]. http://developers.facebook.com/docs/guides/canvas/#channels
[9] Facebook.com, "Graph API," Retrieved 4.7.2011. [Online]. http://developers.facebook.com/docs/reference/api/
[10] M. Ries, P. Froehlich, R. Schatz, "QoE Evaluation of High-Definition IPTV Services," 21st International Conference Radioelektronika 2011, 19-20 April 2011, ISBN 987-1-61284-322-3.
[11] ITU-R Recommendation BT.500-12, "Methodology for the subjective assessment of the quality of television pictures," International Telecommunication Union, 09/2009.
[12] ITU-R Recommendation BT.800, "Methods for subjective determination of transmission quality," International Telecommunication Union, 08/1996.
[13] ITU-R Recommendation BT.709-5, "Parameter values for the HDTV standards for productions and international programme exchange," International Telecommunication Union, 04/2002.
[14] ITU-T Recommendation P.910, "Subjective video quality assessment methods for multimedia applications," International Telecommunication Union, 09/1999.
[15] ITU-T Recommendation P.911, "Subjective audiovisual quality assessment methods for multimedia applications," International Telecommunication Union, 12/1998.
[16] D. Kushner, "Betting the farm on games," IEEE Spectrum, vol. 48, no. 6, pp. 70-88, 2011.
[17] K. Burbary, "Facebook Demographics Revisited - 2011 Statistics," March 2011, Retrieved 4.7.2011. [Online]. http://www.kenburbary.com/2011/03/facebook-demographics-revisited-2011-statistics-2/
[18] J. Braun, "Worldwide TV unaffected by the crisis!" Eurodata TV Worldwide / Relevant partners. [Online]. http://www.international-television.org/archive/2010-03-21 global-tv-euro-data-worldwide 2009.pdf
[19] Netmarketshare.com, "Screen Resolutions," June 2011, Retrieved 4.7.2011. [Online]. http://netmarketshare.com/report.aspx?qprid=17#
[20] Quan Huynh-Thu, M. N. Garcia, F. Speranza, P. Corriveau and A. Raake, "Study of Rating Scales for Subjective Quality Assessment of High-Definition Video," IEEE Transactions on Broadcasting, vol. 57, no. 1, pp. 1-14, March 2011.
[21] N. Staelens, S. Moens, W. Van den Broeck, I. Marien, B. Vermeulen, P. Lambert, R. Van de Walle and P. Demeester, "Assessing Quality of Experience of IPTV and Video on Demand Services in Real-Life Environments," IEEE Transactions on Broadcasting, vol. 56, no. 4, pp. 458-466, Dec. 2010.
[22] A. Peregudov, E. Grinenko, K. Glasman and A. Belozertsev, "An audiovisual quality model of compressed television materials for portable and mobile multimedia applications," IEEE 14th International Symposium on Consumer Electronics (ISCE), pp. 1-6, 7-10 June 2010.
