Relational Databases Watermark Technique Based on Content Characteristic
Yong Zhang1,2, Xiamu Niu1,2, Dongning Zhao1, Juncao Li1, Siming Liu3
1 Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055 P.R. China
2 Shenzhen Innovation International, Shenzhen, Guangdong Province 518057 P.R. China
3 School of Mechanical Engineering and Automation, Beijing University of Aeronautics and Astronautics, Beijing 100001 P.R. China
[email protected],
[email protected],
[email protected]
Abstract
With the development of the Internet and the wide application of databases, database providers are increasingly concerned about protecting the ownership of their databases. Combining digital watermarking with the characteristics of relational databases, this paper proposes a scheme for watermarking relational databases based on content characteristic. The corresponding watermark embedding and detection algorithms are presented and analyzed probabilistically. Watermark embedding and detection experiments are then conducted, and the results are analyzed and discussed. In addition, attack experiments are conducted to demonstrate the feasibility and robustness of the proposed algorithms.
Key words: Copyright Protection, Data Security, Digital Watermark
1. Introduction
With the rapid development of Internet technology, copying and distributing electronic data has become ever easier. At the same time, illegal copying of and tampering with electronic data have increased sharply. How can the problem of copyright protection be solved? Digital watermarking, which emerged in the early 1990s, can address this problem effectively [1, 4]. Current research on digital watermarking focuses mostly on objects such as images, audio, video, software, and text, while research on watermarking relational databases remains rare. However, the increasing use of databases in applications is creating a need for protecting data copyright in databases [2, 3, 4], and the owners of relational databases also worry about their data being pirated or intruded upon by others. Methods for protecting the copyright of relational databases are therefore needed as soon as possible. Watermarking relational databases is one solution to the problem, but there is little related work on it. In this paper, we propose a scheme for watermarking relational databases based on content characteristic.
The rest of the paper is organized as follows. Section 2 briefly describes the research status of this field. Section 3 specifies the watermarking algorithms based on content characteristic. Section 4 gives a formal probabilistic interpretation of the algorithms. Section 5 presents experiments and analyzes robustness. Section 6 draws conclusions.
2. Research Status
Because of the large differences between databases and multimedia data [2, 3, 4], techniques developed for multimedia data such as images are not applicable to databases. Reference [2] is the first paper on watermarking relational databases. Its flaw is that the embedded marks are closely tied to the primary key attribute of the relation: if the primary key values are modified or replaced, the scheme becomes meaningless. In [3], the usability of the data is taken into account after some values of some attributes within some tuples are modified, but the size of the set is hard to determine in the scheme, and it is difficult to find a good sorting method. In [4], blind watermark detection is not implemented. Combining the merits of these schemes, we propose in this paper an effective technique for watermarking relational databases based on content characteristic.
3. Our Algorithms
As in [2, 3, 4], our scheme requires the strict condition that certain errors are tolerated for some numerical attributes of the relational database. Under this condition, we can extract the local characteristic of one attribute within a tuple, regard it as a mark, and embed the mark into another attribute within the same tuple; such a tuple is called a matched tuple. The ratio of the matched tuples in relational databases and the detection threshold value preselected can be

Table 1 Notations
A1    the attribute from which the local characteristic is extracted, called the characteristic attribute.
A2    the attribute into which the mark is embedded, called the watermark attribute; it can be the same as A1.
S     used to select the characteristic bits of A1: the length of the extracted characteristic bits.
K     used to select the characteristic bits of A1: the starting position, which is the K-th position from the back of the binary value of A1.
a     the embedded proportion of relational databases (0

smaller than the embedded proportion a, and the value of the characteristic attribute A1 is not null, we can watermark the tuple. Steps 6 and 7 extract the local characteristic from the characteristic attribute A1 and regard it as the mark. In step 7, the function InterString(Abi, K, S) extracts a substring of Abi as the local characteristic: the substring has length S, its starting position is the K-th position from the back of the string Abi, and the result is denoted marki. If the length of the string Abi is less than S, the empty positions are filled with '0'. For example, InterString('11001010', 6, 3) yields marki = '011'. Step 8 embeds the mark into the watermark attribute A2.

3.2. Watermark Detection Algorithm
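The InterString(Abi, K, S) function described for step 7 of the embedding algorithm is specified only in prose and by a single example, so a small sketch may help pin down one reading of it. The Python below is a minimal illustration, not the authors' implementation. The name inter_string is hypothetical. It assumes that positions are counted from the back of the string and that the S characters are read in that same back-to-front order. It also assumes that '0' fills any position that falls beyond the end of the string. This reading is chosen because it reproduces the paper's example; other readings may also be possible.

```python
def inter_string(bits: str, k: int, s: int) -> str:
    """Sketch of InterString(Abi, K, S) under an assumed reading order.

    Assumption (not stated explicitly in the paper): positions are counted
    from the back of the string, and characters are read back-to-front
    starting at the K-th position from the back.
    """
    if k < 1 or s < 0:
        raise ValueError("k must be >= 1 and s must be >= 0")
    reversed_bits = bits[::-1]                 # index 0 is the last character
    section = reversed_bits[k - 1:k - 1 + s]   # S characters from position K
    # Assumption: positions beyond the end of the string are filled with '0'.
    return section.ljust(s, "0")


if __name__ == "__main__":
    # The example given in the text: InterString('11001010', 6, 3)
    print(inter_string("11001010", 6, 3))  # 011
```

Under this reading, a string shorter than the requested section is padded on the high-order side, which matches the rule that empty positions are filled with '0'.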