Digital libraries and cloud computing

2012 Digital libraries and cloud computing: A proposal for developing the digital libraries of universities

ARC8015-ARCHITECTURE IN THE INFORMATION AGE

Student | Confidential Ayman Al-Hafeth Student ID: 099163187 8/30/2012

Digital Library and Cloud Computing

Digital libraries and cloud computing: A proposal for developing the digital libraries of universities

Ayman Al-Hafeth

“Comes from the early days of the Internet where we drew the network as a cloud… we didn’t care where the messages went… the cloud hid it from us” Kevin Marks, Developer Advocate for OpenSocial, Google.

Abstract

The information age has taken great strides in improving everyday life. One of its manifestations is its impact on digital libraries, including their ability to preserve their collections and present multiple services throughout cyberspace. Libraries may soon become a network of virtual buildings, each managing its own database through individual virtual centres such as cloud computing, a technology that uses the internet and central remote servers to maintain data and applications. This perspective would let libraries and digital libraries keep more control over their metadata, data storage, and applications, which may contain private and sensitive information about the participants who use the database. The maintenance and provision of infrastructure for web-based digital libraries presents several challenges. In this paper, the author discusses the idea of digital libraries and the problems they face, in parallel with the usage of cloud computing. The paper also discusses development efforts and solutions to overcome these problems by presenting a series of web-connected virtual digital libraries built on cloud computing, which can replace the contents of physical libraries with a cloud infrastructure. Cloud computing, as a form of infrastructure virtualisation, is undoubtedly an attractive choice, although it is usually challenged by the growth of indexed collections of documents, their heavy usage, and their new features. This paper describes the current status of conventional service models in university libraries and their connection to public libraries, with the purpose of applying a cloud computing infrastructure. It then proposes a solution to improve the current service model with the cloud computing layer IaaS.

Keywords: Digital library, Cloud Computing, IaaS, PaaS, SaaS


Contents

Introduction: An overview of digital libraries
The confused term of digital libraries
Characteristics and advantages of digital libraries
Problems of Digital Libraries
What is Cloud Computing?
Cloud Computing and Digital libraries
The proposed plan of the Cloud Computing Digital Library
Conclusion
References

List of figures:

Figure 1 Diagram of cloud computing, showing the varied services offered by the provider. Source (UPDATE YOURSELF)
Figure 2 Diagram of the hype cycle for emerging technologies, showing that cloud computing will be one of the more transformational technologies in the next few years (Mark Raskino).
Figure 3 Diagram of the Amazon Elastic Compute Cloud (Amazon EC2), source (cloud computing for digital libraries, Department of Computer Science University of Cape Town)
Figure 4 Plan of the proposed Cloud Library, showing the location of the main cloud at the University and the distribution of follower clouds in the city.


Introduction: An overview of digital libraries

Diane Kresh defines the digital library as "a library in which a significant proportion of the resources are available in machine-readable format (as opposed to print or microform), accessible by means of computers" (Kresh and Resources 2007). The digital content may be held locally or accessed remotely via computer networks. In libraries, the process of digitisation began with the catalogue, moved to periodical indexes and abstracting services, then to periodicals and large reference works, and finally to book publishing. William Y. Arms shares Kresh's view, claiming that a digital library is an informal collection of information, stored in digital formats and accessible over a network, together with associated services (Arms et al. 1997). Research on digital libraries flourished in the mid-1990s with the advent of the internet, driven by the need to make information open and easily accessible while lowering costs and respecting authors' copyright.

William Y. Arms explains that the role of the digital library is essentially to collect, manage, preserve, and make objects accessible (Arms 1995a). To this end, the core services expected of digital library systems include, at a minimum: a repository service for storing and managing digital objects; a search service to facilitate information discovery; and a user interface through which users interact with digital objects.

Historically, the notion of digital libraries can be traced back to the work of visionary scientists such as Vannevar Bush (1945) and J.C.R. Licklider (1965), who identified and pursued innovative technologies for knowledge sharing as fundamental instruments of progress (Bush 1945; Licklider 1965). However, the evolution of the concept of 'digital libraries' has not been linear, which has produced several conceptions of what they are and what roles they play. Each of these conceptions has been influenced by the perspective of the primary discipline of its conceivers or by the concrete needs it was designed to satisfy. As a natural consequence, the 'history' of digital libraries is the history of a variety of different types of information systems that have been called 'digital libraries' (Candela et al. 2011). These systems are very heterogeneous in scope and functionality, and their evolution does not follow a single path. In particular, when changes have happened, they have meant not only that a better-quality system was conceived, superseding the preceding ones, but also that a new conception of digital libraries was born in response to newly raised needs (Arms et al. 1997). Nevertheless, looking back at the individual achievements of the projects and initiatives, it can clearly be seen that there is substantial commonality among many of them; the bottom-up development of the field so far has provided enough 'data points' for patterns to emerge that can encapsulate all efforts.
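The three core services named earlier in this section (a repository, a search service, and a user interface) can be sketched in a few lines of Python. This is only an illustrative sketch: the class names, the `DigitalObject` fields, and the naive keyword matching are all assumptions made here, not part of any system described in the cited literature.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """One item held by the library (hypothetical minimal record)."""
    identifier: str
    title: str
    text: str


@dataclass
class Repository:
    """Repository service: stores and manages digital objects."""
    _objects: dict = field(default_factory=dict)

    def deposit(self, obj: DigitalObject) -> None:
        self._objects[obj.identifier] = obj

    def get(self, identifier: str) -> DigitalObject:
        return self._objects[identifier]

    def all_objects(self) -> list:
        return list(self._objects.values())


class SearchService:
    """Search service: naive case-insensitive match on title and text."""

    def __init__(self, repository: Repository) -> None:
        self.repository = repository

    def search(self, term: str) -> list:
        needle = term.lower()
        return [
            obj.identifier
            for obj in self.repository.all_objects()
            if needle in obj.title.lower() or needle in obj.text.lower()
        ]


def render_results(repository: Repository, identifiers: list) -> list:
    """User-interface stand-in: format search hits as plain text lines."""
    return [f"{i}: {repository.get(i).title}" for i in identifiers]
```

A real digital library would replace the in-memory dictionary with persistent storage and the substring match with a proper index; the point of the sketch is only that the three services are separable components.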

The confused term of digital libraries

Like any contemporary technical term, 'digital library' is surrounded by contradiction and confusion, which stem from three factors. The first concerns the library community, which has used several different phrases over the years to express this concept, including 'electronic libraries', 'libraries without walls', and 'virtual libraries'; it was never quite clear which of these phrases was meant to denote the 'digital library'. Gary Cleveland explains that 'digital library' is simply the most current and most widely accepted term, and that it is now used almost exclusively online, at conferences, and in the literature (Cleveland and Dataflow 1998). This is also supported by the survey that Saracevic and Dalbello conducted to confirm the usability of technical terms (2001). The term itself can therefore change according to need and source, regardless of its nature.

The second factor that adds to the confusion is that the digital library is now the focal point of many different areas of research, and what constitutes a digital library differs depending on the research community describing it (Nürnberg et al. 1995). For instance, from the viewpoint of information retrieval, it is a large database that keeps growing over time. For library science, it is another step in the continuing automation of libraries that began over 25 years ago (Nürnberg et al. 1995). These differences arise from the lack of a unified system providing one application and one database, something that the idea of cloud computing could supply, as will be explained later in this article.

Finally, many people currently consider the WWW (World Wide Web) itself to be a digital library: a gathering of millions and millions of documents, along with media, social networks, and other entities. Those who think of this huge collection of resources and information as a digital library may also use the term 'digital bank' (Trivedi 2010). Clifford Lynch does not accept this idea, as he explains: "One sometimes hears the Internet characterised as the world's library for the digital age. This description does not stand up under even casual examination. The Internet, and particularly its collection of multimedia resources known as the World Wide Web, was not designed to support the organised publication and retrieval of information as libraries are. It has evolved into what might be thought of as a chaotic repository for the collective output of the world's digital 'printing presses'... In short, the Net is not a digital library" (Lynch and Garcia-Molina 1995, p. 52).

We can assume that digital libraries are like traditional libraries, with the same functions, purposes, and goals, such as management, indexing, referencing, and preservation. The American Digital Library Federation arrived at its notion of the 'digital library' on the basis that such a library is mainly constructed to serve particular communities (Chepesiuk 1997). D.J. Waters says: "Digital libraries are organisations that provide the resources, including the specialised staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities" (Waters 1998). Digital libraries therefore cannot be a chaotic mass of accumulated information like traditional websites for individuals or for groups of experts. Examining different examples of what are called 'digital libraries' makes it clear that even librarians have been confused about what digital libraries are. The word 'library' itself has been appropriated by many different groups, either to describe their area of research or to signify a simple collection of digital objects.

Characteristics and advantages of digital libraries

In his study Digital Libraries: Definitions, Issues and Challenges, Cleveland sets out the characteristics and features of digital libraries that can be extracted from different resources and discussions (Graham 1995; C. Lynch & Garcia-Molina 1995; Chepesiuk 1997; Thong et al. 2004):

-They are digital faces of traditional libraries that include both digital and traditional collections.
-They also include digital materials that exist outside the physical and administrative boundaries of any one digital library.
-They include all the services and processes that are the foundation of physical libraries.
-Ideally, they provide a coherent view of all of the information contained within a library, no matter what its form or format.
-They serve particular communities or constituencies.
-They require the skills of both librarians and computer scientists to be viable.

Nowadays, public bodies and commercial interests widely recognise the advantages of digital libraries as a means of rapid and easy access to books, images, video, and other types of material. One of these advantages is storage space. Michael Lesk (1995) says, "Libraries have to face such difficulties as increasing costs for buildings and storage. Through a constantly growing number of books there is not one library not complaining about lacking place for book storage". Although traditional library buildings keep growing over time because of the increasing volume of published and printed material, digital libraries have the potential to store far more, simply because digital information requires very little physical space (Arms 1995, p. 71). Furthermore, the cost of storage itself would be significantly lower than in physical libraries. Consequently, digital libraries could put these savings to other uses, such as tackling their technical problems.

In addition, the most important advantage of digital libraries is increased accessibility for their users, wherever they are and whenever they need it, along with new forms of communication such as blogs and local social networks (Rajashekar 2002). Trivedi (2010) summarises these advantages:

-No physical boundary. Anyone can access the digital library as long as there is an internet connection.
-Round-the-clock availability. The library works twenty-four hours a day, seven days a week.
-Multiple access. The same information can be used many times over and by an unlimited number of users.
-Information retrieval. Any user can search the entire collection by term, phrase, title, name, or subject.
-Preservation and conservation of valuable items that are vulnerable to degradation through heavy usage or over long periods of time.

Problems of Digital Libraries

Despite the usefulness of digital libraries and their ubiquity in most fields of modern science, they still have several problems that could be avoided through a new form of information technology platform called 'cloud computing'. Before defining cloud computing, these problems should be outlined briefly through the opinions of some researchers. Cleveland states that "there is one fact that digital libraries will not be single and complete digital system that could provide instant access to all information, for all sectors of society, from anywhere in the world" (Cleveland & Dataflow 1998). Cleveland reaches this conclusion after considering the lawless state of the Internet. It is true that individuals face some chaos when dealing with a space of such freedom, in which everyone can build their own place and try to gather as many participants as possible. But it is also a place where anyone can organise themselves and build a large website that keeps all of its sub-pages synchronised. Cleveland nevertheless proposes a future solution to this problem after analysing it and considering the possible alternatives:

"The problem, however, is that across multiple digital libraries, there is a wide diversity of different data structures, search engines, interfaces, controlled vocabularies, document formats, and so on. Because of this diversity, federating all digital libraries nationally or internationally would be an impossible effort. Thus, the first task would be to find sound reasons for federating particular digital libraries into one system" (Cleveland & Dataflow 1998).

This could be done by software delivered through cloud computing, which has the potential to unite varied interfaces under one application or one operating system. This software does not have to run on local hardware; it can run in the cloud, where every user has an official account and can run it as they like.

M. Lesk (1995) identifies two major problems with current digital libraries that could also be solved by cloud computing. He describes them as 'the problems of technological obsolescence', which have two aspects:

-Hardware: the technological obsolescence of the devices for reading data is a problem, since it is a real challenge for hardware to keep up with the changing variety of digital objects.
-Software: software obsolescence is a more serious problem than that of hardware. So many software formats exist and are being created that they become obsolete much faster than hardware devices, which take longer to design and manufacture.

Both problems could be eased by cloud computing: a service reached over the internet is upgraded centrally, either automatically or by administrators who take care of upgrades, so individual libraries need not fear this obsolescence.

For university digital libraries, which play the most significant role for students, the problem is more closely tied to the cost of building them. T.B. Rajashekar (2002) mentions the following disadvantage of university digital libraries: "Building a digital library at university is an expensive process. It is not hard to collect digital resources, but it takes much more money and efforts if the original resources exist in a physical form and require transformation into digital objects". However, Rajashekar recognises that the cost will probably decline, as ready-made solutions are now available and it is possible to choose between building a digital library with one's own efforts and buying an existing solution (Rajashekar 2002). Yet there is another problem, related to the cost of content refreshing. Digital preservation will always remain an ongoing operation, requiring considerable expense for replication. Rajashekar claims that this problem will diminish as the cost of technology keeps falling and the cycles of information refreshing become cheaper. Most of these solutions now depend on unifying ready-made software that can be run through cloud computing (Rajashekar 2002).
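Cleveland's point about the diversity of data structures across digital libraries can be made concrete with a small sketch. It assumes two hypothetical catalogues whose records use different field names, and shows one common way a shared service could map them onto a single schema before searching. The field names, the mapping table, and the function names are all invented for illustration; they do not describe any real library system.

```python
# Hypothetical per-library field mappings onto one shared schema.
FIELD_MAPS = {
    "university_library": {"dc_title": "title", "dc_creator": "author", "yr": "year"},
    "public_library": {"Title": "title", "Author": "author", "PublicationYear": "year"},
}


def normalise(record: dict, source: str) -> dict:
    """Translate one source record into the shared schema, dropping unmapped fields."""
    mapping = FIELD_MAPS[source]
    common = {mapping[key]: value for key, value in record.items() if key in mapping}
    # Years may arrive as strings or integers; store them uniformly as integers.
    common["year"] = int(common["year"]) if "year" in common else None
    common["source"] = source
    return common


def federated_search(term: str, catalogues: dict) -> list:
    """Search every catalogue through the shared schema (case-insensitive title match)."""
    needle = term.lower()
    hits = []
    for source, records in catalogues.items():
        for record in records:
            common = normalise(record, source)
            if needle in common.get("title", "").lower():
                hits.append(common)
    return hits
```

The sketch covers only field names. Differences in controlled vocabularies, document formats, and search engines, which Cleveland also lists, would each need their own translation layer.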

What is Cloud Computing?

Figure 1 Diagram of cloud computing, showing the varied services offered by the provider. Source (UPDATE YOURSELF)


Cloud computing is a versatile form of technology that can support a broad spectrum of applications. Its low cost and dynamic scaling make it an innovation driver for small companies, particularly in the developing world. Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g. networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction (Brunette and Mogull 2009) (see Fig. 1). M. Armbrust et al. (2010) describe cloud computing as "the long-held dream of computing as a utility", which "has the potential to transform a large part of the IT industry, making software even more attractive as a service and shaping the way IT hardware is designed and purchased. Developers with innovative ideas for new Internet services no longer require the large capital outlays in hardware to deploy their service or the human expense to operate it". Moreover, regarding cost, companies with large batch-oriented tasks can get results as quickly as their programs can scale, since using 1000 servers for one hour costs no more than using one server for 1000 hours (Lagoze et al. 1996, p. 63).

These features of cloud computing are especially useful today, given the growing number of people who cannot do without handheld devices of some sort (see Fig. 2). These range from cell phones to iPhones to BlackBerrys, and the list goes on. Increasingly, people who rely on information demand that it be readily available no matter where they are. Robert Fox (2009) says about this concern: "One of the problems with accessing information from such a diverse group of devices is that those devices are not standardized on any protocol (usually) except HTTP. And, because HTTP is so widely used as an information distribution mechanism, the internet/web is the primary mode of information access. It is very handy for people to have certain kinds of information aggregated in a common, familiar interface".
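The cost claim above, that 1000 servers for one hour cost no more than one server for 1000 hours, follows directly from pay-per-use pricing and can be checked with simple arithmetic. The sketch below assumes a flat hourly price per server (the figure of 10 cents is hypothetical) and a job that parallelises perfectly; real pricing and real workloads will deviate from both assumptions.

```python
def batch_cost_cents(servers: int, hours_each: int, cents_per_server_hour: int) -> int:
    """Total pay-per-use cost: servers x hours per server x hourly price."""
    return servers * hours_each * cents_per_server_hour


def wall_clock_hours(total_server_hours: int, servers: int) -> float:
    """Elapsed time, assuming the work splits evenly across all servers."""
    return total_server_hours / servers


PRICE_CENTS = 10  # hypothetical flat price per server-hour, for illustration only

one_server = batch_cost_cents(servers=1, hours_each=1000, cents_per_server_hour=PRICE_CENTS)
thousand_servers = batch_cost_cents(servers=1000, hours_each=1, cents_per_server_hour=PRICE_CENTS)

# Same bill either way; only the waiting time differs.
print(one_server == thousand_servers)
print(wall_clock_hours(1000, 1), wall_clock_hours(1000, 1000))
```

Under these assumptions the bill is identical in both cases, while the elapsed time drops from 1000 hours to one.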

Figure 2 Diagram of the hype cycle for emerging technologies, showing that cloud computing will be one of the more transformational technologies in the next few years (Mark Raskino).


In short, cloud computing is a unified way of using computing resources, both hardware and software, that are delivered as a service over a network (mainly the Internet) through accounts offered to users or clients. The name comes from the use of a cloud-shaped symbol in system diagrams as an abstraction for the complex infrastructure it contains. Typically, cloud computing entrusts remote services with a user's data, software, and computation.

Cloud Computing and Digital libraries

Typically, cloud computing offers its benefits through three service or delivery models, namely infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS), and software-as-a-service (SaaS). It also delivers service through four deployment models: public cloud, private cloud, community cloud, and hybrid cloud (Waters 1998; Armbrust et al. 2010). For the digital library, the best choice would be the community cloud, because it shares a single infrastructure between several organisations from a specific community with common concerns (security, compliance, jurisdiction, etc.), such as university libraries and public libraries, whether managed internally or by a third party and hosted internally or externally. Before settling on the preferable service model for digital libraries, each model should be considered individually:

-Infrastructure as a service (IaaS): Cloud providers offer computers, either physical or, more often, virtual machines, along with other resources, to users who run them as guests. The users then install their operating system (as an image) and their application software on these machines. Example of IaaS: Amazon Elastic Compute Cloud (Amazon EC2); see Fig. 3.

Figure 3 Diagram of the Amazon Elastic Compute Cloud (Amazon EC2), source (cloud computing for digital libraries, Department of Computer Science University of Cape Town)


-Platform as a service (PaaS): Cloud providers deliver a computing platform, typically including the operating system, the programming language execution environment, the database, and the web server. Example of PaaS: Amazon Elastic Beanstalk.
-Software as a service (SaaS): Cloud providers install and operate applications in the cloud, and cloud users access the software from cloud clients. The cloud users do not manage the cloud infrastructure and platform on which the application is running. Example of SaaS: Google Apps.

In essence, IaaS supplies virtual machine images together with raw block and file-based storage, which is useful for multiple users, as well as firewalls, which are essential for protecting individual accounts.
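The difference between the three service models comes down to who manages which layer of the stack. The sketch below encodes the descriptions given above as a small lookup table. The layer names and the exact split are a simplification assumed here for illustration (for example, it assumes the PaaS provider also manages the underlying virtual machines); actual providers draw these boundaries in slightly different places.

```python
# Stack layers from bottom to top (simplified, hypothetical naming).
LAYERS = [
    "virtual machines",
    "operating system",
    "runtime",
    "database",
    "web server",
    "application",
]

# Layers the cloud provider manages under each service model.
PROVIDER_MANAGED = {
    "IaaS": {"virtual machines"},
    "PaaS": {"virtual machines", "operating system", "runtime", "database", "web server"},
    "SaaS": set(LAYERS),
}


def user_managed(model: str) -> list:
    """Return the layers left for the cloud user to install and maintain."""
    provider = PROVIDER_MANAGED[model]
    return [layer for layer in LAYERS if layer not in provider]


for model in ("IaaS", "PaaS", "SaaS"):
    print(model, "-> user manages:", user_managed(model) or "nothing")
```

Read this way, IaaS leaves the most to the user, which is what gives library users the freedom to install their own operating system and software.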

The proposed plan of the Cloud Computing Digital Library

The idea of connected cloud computing for local digital libraries (particularly academic digital libraries) rests on building a dedicated software system on the IaaS service model, run through cloud computing. This socially interactive software connects with its users, mostly university students, by providing individual accounts that let them manage their own shelves of digital books, much as they would their own physical shelves, after installing their own software. This software consists of their preferred operating system, their customised virtual desktop, and their file-exchange storage. The ability to virtualise a digital library in this way can be strengthened with Ethernet server platforms that are locked down and secure enough to protect users' shelves through individual accounts and firewalls. The activities that take place in physical libraries (borrowing, organising, lending, tagging or labelling books, searching, making notes, interacting with and commenting on others' reading, sharing experiences, even donating books for free, and so on) can also take place in a cloud computing digital library, with the full advantages of digital libraries.

However, two factors are necessary in order to build a virtual library. First, cloud computing needs its own servers, which must be located in particular spaces in a city, in addition to the primary servers that control the cloud. Here, the role of architects and urban planners (urban designers) is to provide solutions for supplying particular city areas with these servers, according to need and to the expected future development plan (see Fig. 4). The second factor is maintaining a continuous, clean power system that keeps these servers running constantly. This could be achieved through solar energy, as it is clean, sustainable, durable, and cheap in the long run. The idea would also raise awareness of saving power, as individuals come to realise the benefits for their future.
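The library activities listed in this section (borrowing, tagging, note-making, donating, searching) can be sketched as operations on per-account shelves. The following is a minimal in-memory model written for illustration only: the class and method names are hypothetical, and it leaves out authentication, firewalls, and persistent storage, all of which the proposal would need.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    """A user's personal copy of a digital book, with private tags and notes."""
    book_id: str
    title: str
    tags: set = field(default_factory=set)
    notes: list = field(default_factory=list)


@dataclass
class Shelf:
    """One private shelf, tied to one individual account."""
    owner: str
    books: dict = field(default_factory=dict)


class CloudLibrary:
    """Hypothetical shared catalogue plus one private shelf per account."""

    def __init__(self) -> None:
        self.catalogue = {}  # book_id -> title
        self.shelves = {}    # account name -> Shelf

    def open_account(self, owner: str) -> Shelf:
        return self.shelves.setdefault(owner, Shelf(owner))

    def donate(self, book_id: str, title: str) -> None:
        self.catalogue[book_id] = title

    def borrow(self, owner: str, book_id: str) -> Book:
        # Digital items allow multiple access, so borrowing copies the
        # record onto the shelf and leaves the catalogue entry in place.
        title = self.catalogue[book_id]
        shelf = self.shelves[owner]
        return shelf.books.setdefault(book_id, Book(book_id, title))

    def give_back(self, owner: str, book_id: str) -> None:
        self.shelves[owner].books.pop(book_id, None)

    def tag(self, owner: str, book_id: str, label: str) -> None:
        self.shelves[owner].books[book_id].tags.add(label)

    def add_note(self, owner: str, book_id: str, note: str) -> None:
        self.shelves[owner].books[book_id].notes.append(note)

    def search(self, term: str) -> list:
        needle = term.lower()
        return sorted(b for b, t in self.catalogue.items() if needle in t.lower())
```

Because each shelf holds its own copy of a book record, one student's tags and notes stay private to that account while the shared catalogue remains available to everyone.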


One question can be raised here: some websites already offer abilities similar to cloud computing, and provide this service to their clients and customers even for free, so why move to the cloud? Although that is true, these websites are not connected to local libraries and their updates, news, or even their plans for buying books in the near future. Moreover, these websites are not concerned with providing dedicated storage or a virtual drive for each user. Even assuming that they were, it would be difficult for them to cooperate and integrate with the many different operating systems of PCs, laptops, smartphones, tablets (iPad, Galaxy Tab), and so on, not to mention the usual updates and upgrades, which can exhaust much of the memory of these individual devices and risk unexpected crashes. This is the reason for integrating local software and applications into one local operational cloud computing system.

Figure 4 Plan of the proposed Cloud Library, showing the location of the main cloud at the University and the distribution of follower clouds in the city.


Conclusion

Building the infrastructure of a shareable and interactive digital library requires a platform. This platform has to be hosted on local, shareable, and flexible servers that are not constrained by limited space and that can handle multiple users at the same time while protecting their data. The IaaS model of cloud computing is suitable for this function because it gives users the freedom to install their own software with personal storage, and because it can be connected to various university digital libraries. A cloud computing infrastructure connecting a university's digital library with local public libraries and other specific areas would create educational districts within a city that can draw individuals from the enclosed spaces of physical libraries to open places with sustainable power. Consequently, it would activate these areas with more functions, which may change the future of the city and introduce interactive zones between citizens.


References

1. Armbrust, M. et al., 2010. A view of cloud computing. Communications of the ACM, 53(4), pp.50–58.
2. Arms, W.Y., 1995a. Key concepts in the architecture of the digital library. D-Lib Magazine, 1(1).
3. Arms, W.Y., 1995b. Key concepts in the architecture of the digital library. D-Lib Magazine, 1(1).
4. Arms, W.Y., Blanchi, C. & Overly, E.A., 1997. An architecture for information in digital libraries. D-Lib Magazine, 3(2).
5. Brunette, G. & Mogull, R., 2009. Security guidance for critical areas of focus in cloud computing v2.1. Cloud Security Alliance.
6. Bush, V., 1945. As we may think.
7. Candela, L., Castelli, D. & Pagano, P., 2011. History, Evolution, and Impact of Digital Libraries. E-publishing and digital libraries: legal and organizational issues, pp.1–30.
8. Chepesiuk, R., 1997. The Future Is Here: America. American Libraries, 27(1), pp.47–49.
9. Cleveland, G. & Dataflow, I.U., 1998. Digital libraries: definitions, issues and challenges, IFLA, Universal dataflow and telecommunications core programme.
10. Department of Computer Science University of Cape Town, cloud computing for digital libraries. Available at: http://people.cs.uct.ac.za/~lpoulo/index.html [Accessed August 30, 2012].
11. Fox, R., 2009. Library in the clouds. OCLC Systems & Services, 25(3), pp.156–161.
12. Graham, P.S., 1995. The digital research library: Tasks and commitments. In Digital Libraries. pp. 11–13.
13. Kresh, D. & Resources, C. on L. and I., 2007. The Whole Digital Library Handbook, American Library Association. Available at: http://books.google.co.uk/books?id=fTGFBdalTV0C.
14. Lagoze, C., Lynch, C.A. & Daniel Jr, R., 1996. The Warwick Framework: A Container Architecture for Aggregating Sets of Metadata.
15. Lesk, M., 1995. Why digital libraries? London: UKOLN.
16. Licklider, J.C.R., 1965. Libraries of the Future.
17. Lynch, C. & Garcia-Molina, H., 1995. Interoperability, scaling, and the digital libraries research agenda. In Iita digital libraries workshop. pp. 18–19.
18. Mark Raskino, Mastering The Hype Cycle: How to Choose the Right Innovation at the Right Time. Available at: http://blogs.gartner.com/hypecyclebook/ [Accessed August 30, 2012].
19. Nürnberg, P.J. et al., 1995. Digital libraries: Issues and architectures. In Proceedings of ACM Digital Libraries. pp. 11–13.
20. Rajashekar, T.B., 2002. Digital Library and Information Services in Enterprises: Their Development and Management, Year.
21. Thong, J.Y.L., Hong, W. & Tam, K.Y., 2004. What leads to user acceptance of digital libraries? Communications of the ACM, 47(11), pp.78–83.
22. Trivedi, M., 2010. Digital libraries: functionality, usability, and accessibility. Library Philosophy and Practice (e-journal), p.381.
23. Waters, D.J., 1998. What are digital libraries. CLIR issues, 4(1), pp.5–6.
