Architectural Perspectives
DBMS[me]
Craig W. Thompson and Patrick Parkerson • University of Arkansas
The first two installments of the new Architectural Perspectives column are about scaling the Internet and the Web in space and time. In my first column (Jan/Feb 2004), I considered a near-future world in which “everything is alive” and the Internet can access any instrumented real-world object, including nano objects. This issue’s column is about time and the deluge of data that we can expect to collect and view. We examine this from the perspective of a subproblem: a database management system that records a person’s life in its entirety, referred to here as DBMS[me] and E-me (that is, a database of me or an electronic model of me). Challenges for the DBMS[me] are to augment human memory by recording all data about an individual, to organize such data into models, and to develop security and privacy languages to control access to such models.
Your Assignment

To demonstrate DBMS[me]’s utility to augment memory, pretend that it’s the first day of class in Database Management 101. You are given a short quiz:

1. Where were you on 14 April 1997? Some of you whip out your diaries or PDAs; you can see what you were up to that day and the memories return. For those of you without such recording devices, could it be you didn’t really exist on the day in question? What about all those other days you don’t remember? Don’t you, as a human being, live every minute and value the record of your achievements, actions, thoughts, and feelings? Then, why didn’t you record this information?

2. Pick a person in class and tell me all you know about him or her. It’s hard to tell much about people you don’t know. In fact, it’s even hard to
IEEE INTERNET COMPUTING
1089-7801/04/$20.00 © 2004 IEEE
recall things about those you do know (your parents or grandparents, say) or, after they’ve passed away, what they told you about their lives or your childhood. You might summarize their lives to future generations by saying something general, such as, “They were nice.” Their details, stories, mannerisms, personalities, and accomplishments are mostly memories, hard to recall when the person is no longer with you and often lost entirely to future generations.

3. Pick an ancestor 15 generations back and tell me all you can about them. Even after years of genealogy research, you’d be lucky to determine birth and death dates, names, locations, and similar information. You’d be very unlikely to learn of their personality or important events, local color, and place in history.

You’re lucky this quiz won’t be graded. It was simply to drive home the point that, as much as we value ourselves, the people we hold dear, and the sanctity of life, most of us don’t bother to keep very good records of our personal existence. As such, we squander the opportunity to provide ancestral memories to our descendants.

Your homework for tonight is a one-page paper detailing the design for a “database” that will hold a complete record of your life. There are many issues to consider. How often will your database record data (every second, every day, or only certain events, for example)? What kinds of data types will you need? Will you record yourself in your environment, or your environment as you perceive it? How can you query the data? What views (data subsets or aggregations) will you share, and with whom? How will you present the data for others’ consumption?
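One possible starting point for the homework, offered only as a sketch: a single table of timestamped events plus one view that exposes a shareable subset. Every name here (the `event` table, its columns, `family_view`, and the helper functions) is a hypothetical choice made for illustration, and SQLite is assumed simply because it ships with Python. The column does not prescribe a schema.

```python
import sqlite3

# Hypothetical schema: all table, column, and view names are illustrative.
SCHEMA = """
CREATE TABLE IF NOT EXISTS event (
    id          INTEGER PRIMARY KEY,
    recorded_at TEXT NOT NULL,   -- ISO 8601 timestamp, e.g. 1997-04-14T09:30:00
    kind        TEXT NOT NULL,   -- data type: 'diary', 'photo', 'purchase', ...
    perspective TEXT NOT NULL,   -- 'self' (me in my environment) or 'surroundings'
    body        TEXT NOT NULL,   -- text, or a path to a media file
    audience    TEXT NOT NULL    -- 'private', 'family', or 'public'
);
-- A view is the subset of the record shared with one group of viewers.
CREATE VIEW IF NOT EXISTS family_view AS
    SELECT recorded_at, kind, body
    FROM event
    WHERE audience IN ('family', 'public');
"""


def open_life_db(path=":memory:"):
    """Open (or create) the life database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_event(conn, recorded_at, kind, body,
                 perspective="self", audience="private"):
    """Store one event; events are private unless an audience is given."""
    conn.execute(
        "INSERT INTO event (recorded_at, kind, perspective, body, audience) "
        "VALUES (?, ?, ?, ?, ?)",
        (recorded_at, kind, perspective, body, audience),
    )


def events_on(conn, day):
    """Quiz question 1: everything recorded on a given YYYY-MM-DD day."""
    return conn.execute(
        "SELECT recorded_at, kind, body FROM event "
        "WHERE substr(recorded_at, 1, 10) = ? ORDER BY recorded_at",
        (day,),
    ).fetchall()


def family_view(conn):
    """Rows that the owner has chosen to share with family."""
    return conn.execute(
        "SELECT recorded_at, kind, body FROM family_view ORDER BY recorded_at"
    ).fetchall()


if __name__ == "__main__":
    db = open_life_db()
    record_event(db, "1997-04-14T09:30:00", "diary", "First day of class")
    record_event(db, "1997-04-14T19:00:00", "photo", "Dinner with family",
                 audience="family")
    print(events_on(db, "1997-04-14"))
    print(family_view(db))
```

The recording frequency is left to the caller: the same table accepts one row per second or one row per notable event, which is exactly the design trade-off the homework asks about.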
History

We all know that history consists of many timescales and points of view but, until now, the record of what
Published by the IEEE Computer Society
MAY • JUNE 2004
is recorded about the past has been sparse, but this is changing:

• The father of one of the authors, a geologist, used to walk into his backyard in Santa Barbara, California, and exclaim, “I cannot believe how fast those mountains are growing!” His timeline involved the history of the universe (now thought to be 13.7 billion years old), the earth, and the fossil record.
• Humanity’s archaeological record is recorded in the strata just below the earth’s surface or in museums and personal collections. Using increasingly sophisticated forensic technologies, we can sometimes determine details about the past. For instance, Otzi the Iceman ate grain eight hours before he died in an Alpine mountain pass in late summer 5,300 years ago (www.pbs.org/wgbh/nova/icemummies/iceman.html). Aggregating over time and space, we begin to build a record of people, communities, and civilizations that seemingly have vanished until we piece together a mosaic from fragmentary bits of evidence.
• Before the electronic age, the historical record was captured on stone, papyrus, and paper. Written histories recorded an explosion of detail on how events shaped peoples, and biographies recorded how individuals shaped events. Diaries and letters (at least the few that survived) provide personal histories. The mosaic becomes more textured.
• The modern historical record captures information in many mechanical and electronic forms, from photographs to phonograph records, audio- and videotapes, computer memories, CDs, DVDs, databases, and the Web.

We might loosely call this last, digital historical era the E-record. It differs from previous eras by the sheer volume of recorded information, the many forms and formats, and the unprecedented ability we have to retrieve, manipulate, and view subsets of data. For example, at the time of the invention of personal video cameras, anyone could afford to videotape their entire life for less than US$25 a day, and that cost is now much less. Steve Mann at the University of Toronto has spent 30 years developing and sporting the wearcomp wearable computer instrumented with sensors and the wearcam wearable camera, thus laying a convincing claim to be one of the world’s first “cyborgs” (www.eecg.toronto.edu/~mann). Similarly, from the beginning of PCs, a growing number of individuals could afford disk space to save and index all files and email sent or received. Adding family pictures, audio and video recordings, résumés, papers, software, bank and credit-card transactions, taxes, medical and employment records, and a daily diary captures an individual’s high points, which begins to provide the DBMS[me]. There are, of course, some bits of the physical record that we could
www.computer.org/internet/
consider preserving — possessions (such as a stamp collection), achievements (diplomas, works of art, and such), or a person’s DNA. Let’s consider, for the sake of the following argument, that all of these could be transformed into digital form so that the entire record of interest was digital. Yet, the record remains incomplete; it misses many things that make me, well, me — including my every observable action on 14 April 1997 (or more broadly, the past), angles of view not captured by future wearcams, and, especially, thoughts and feelings.
Packaging the E-Me

Now let’s assume we want to package our personal histories into a form that we can pass along in a will, to our descendants 15 generations hence, for example. How could we do this? How do we increase our assurance that time won’t erase the gift and that it will survive for eons? Moreover, can we protect privacy but still share different views with different viewers?

Existing Technologies

One puzzle is what form the gift should take. Should it be in the form of an archive, such as a directory of files, email, videotapes, and so on? Or should it be packaged for easy access using ModelMe, a popular future application software package for modeling individuals? Most likely, the answer is both. It should be possible to augment our record with adjunct technologies. Today, we might use these add-in technologies:

• MPEG-4 Facial Animation Specification. This standard provides a way to encode facial expressions, thus letting us create a personal avatar that looks realistic. We could even include different avatars to represent us as children and as we aged.
• Voice generation. We might provide a voice for our avatar with voice generation augmented with a personal speech profile. In the other direction, speech-to-text technologies could convert our spoken words to text to help index our video archives.
• Chatbot. This technology could provide knowledge-driven models that recount stories or a day’s activities using fragments of past experiences: a next-generation Jabberwacky (www.jabberwacky.com).
• Databases. We might use information-retrieval and database technology (such as XML Query) to respond to text or semistructured record queries.

This list is just a start and could include increasingly sophisticated knowledge representation, ontology tools, vision systems, and much more.

Technologies We Still Need

When considering storage technologies, it is important to keep in mind that many recording formats degrade as they age. At an even faster pace, new recording technologies continually replace older ones; for example, it is becoming difficult to find record players. Because of this, everything recorded with previous generations of technology must be converted to a new format every N years or risk becoming unreadable.

A second puzzle involves a record’s survivability. When an archaeologist digs up a site, its physical record is lost. Professional archaeologists preserve a derived subset of that record in the form of intermediate and final reports (provided they publish their results). Weekend “archaeologists” might preserve valuable objects but lose the find’s provenance and historical context. If a person’s entire record is electronic and we provide a means to upgrade to the latest recording technologies, perhaps this problem will be easier to solve in the future. We can expect that, over time, it will also
become easier to replicate the record for reliability, ensuring that a disaster or a virus doesn’t destroy the only copy of a complete record of a person or civilization. We could predict that, as the volume of data we want to store increases, the cost will decrease more quickly; as a result, it might eventually be cost effective to store many copies of the complete history of the world.

A remaining challenge related to survivability is determining if the record will survive humanity and eons while remaining readable. Solutions to this might involve packaging E-mes in such a way that they won’t degrade or will be self-healing.

Other technologies will likely improve over time so that our E-me knowledge repository can improve. As DBMS technology evolves to make it easier to organize and query heterogeneous data sets, new capabilities will make it easier to create views (for example, the financial view or the health view) that help answer questions such as, “Where did my money go these past three months?” or “Have I had fewer colds since I started drinking orange juice every day?” AI technology will improve to provide better models of an individual than the simple chatbot. Maybe this will evolve into technology that will permit anyone to have a “virtual conversation” with a model of Ben Franklin, Mother Teresa, or Brad Pitt, or anyone else with an E-me model. This might eventually be delivered in a 3D virtual-reality hologram format, something like the “holodeck” from Star Trek: a synthetic environment that interacts with an individual while emulating an experience.

Veracity and Privacy

When considering the archaeological record, even a powerful person from the past, such as mummified Pharaoh Ramses II, does not have lasting personal rights to privacy today. Could an E-me model really represent the person being modeled? What if the E-me model’s subject expunged embarrassing moments or chose to enhance real life as Toad in Kenneth Grahame’s The Wind in the Willows was wont to do:

Indeed, much that [Toad] related belonged more properly to the category of what-might-have-happened-had-I-only-thought-of-it-in-time-instead-of-ten-minutes-afterwards. Those are always the best and the raciest adventures; and why should they not be truly ours, as much as the somewhat inadequate things that really come off? (Chapter 11)

Perhaps of more concern, what if others were to revise or interpret the record? Perhaps digital watermarking would preserve the original, and some sort of smarter Cut-and-Paste operator could copy not only text or other media but also provide a reference to the copy’s origin (as Ted Nelson’s Xanadu hypermedia work suggests; see http://xanadu.com.au/media/insearch.html). It would be a boon if next-generation applications provided this capability!

We could use current and future security technology to encrypt parts of our E-me records; perhaps owners could also grant others the right to view and add to them using a variant of digital rights schemes that today protect digital media (such as music and e-books) or classified government information.

Hand in hand with security is the issue of privacy. Where does an individual’s right to privacy end? Could law enforcement unlock otherwise private parts of DBMS[me] via a search warrant? Would an employer have full access rights to the portion of a DBMS[me] that represented the person at work (probably)? What if, in 100 or 10,000 years, no one with access to any parts of an E-record remained? Would this be viewed as a sad event, the loss of the record of a human’s existence, or will statutes of limitations permit E-archaeologists to open the time capsule and “dig up” the past?

Scaling Up

So far, we have assumed E-me is a model of an individual and that the model is a self-contained DBMS. What if we considered how a collection of such E-me models might interact with each other?

Agents

Consider an agent modeling capability in which an E-me agent represents every person, robot, vehicle, equipment, sensor, data source, application, or resource. Agents send and receive messages among themselves, represented, say, in XML. For present purposes, assume all messages are blind carbon-copied (bcc) to a logging agent that dumps the messages in a master DBMS. The messages DBMS is now a complete, explicit record (or scenario) of everything observable that happens. These messages could be replayed to simulate the agent’s behavior. Furthermore, we could use queries to select subsets of agent interactions to permit simulation views. A query to isolate every message sent to or received by a given agent might provide us a DBMS[that agent].

It might be unrealistic to store all messages in a single logging agent. Nevertheless, the information content of the collection of logging agents represents a history that we could query. That is, maybe not all information about an individual agent would be in a given DBMS[that agent], but it could still be accessible to a query. A query to collect every purchase made by a given credit card during a specific billing period might result in a transaction log of what was purchased and when, just like a monthly statement.

Wait a minute! We have such distributed repositories now: our banks and credit-card agencies keep them, as do our utility companies and those we do business with, including healthcare providers, financial advisors, employers, educational institutions, and churches. It is difficult to query across them all to see a complete record of what is recorded about a given individual, but this is getting easier with money-management software. Perhaps this trend will continue until our finances, family pictures, activity logs, and calendars are integrated to build the E-me model from a large collection of data sources, such as property, insurance, credit, work, health, and other records.
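The logging-agent scenario described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not in the column: a single in-memory SQLite table stands in for the master DBMS, the XML message layout (`from`, `to`, `at`, and `kind` attributes plus free-form child elements) is invented for the example, and the function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_master_log():
    """Create the logging agent's master DBMS (one in-memory table here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE message ("
        "seq INTEGER PRIMARY KEY, sender TEXT, receiver TEXT, "
        "sent_at TEXT, xml TEXT)"
    )
    return conn


def send(conn, sender, receiver, sent_at, kind, **fields):
    """Build an XML message and bcc it to the logging agent."""
    msg = ET.Element(
        "message",
        {"from": sender, "to": receiver, "at": sent_at, "kind": kind},
    )
    for name, value in fields.items():
        ET.SubElement(msg, name).text = str(value)
    xml = ET.tostring(msg, encoding="unicode")
    conn.execute(
        "INSERT INTO message (sender, receiver, sent_at, xml) "
        "VALUES (?, ?, ?, ?)",
        (sender, receiver, sent_at, xml),
    )
    return xml


def dbms_for(conn, agent):
    """DBMS[that agent]: every message sent to or received by one agent."""
    rows = conn.execute(
        "SELECT xml FROM message WHERE sender = ? OR receiver = ? "
        "ORDER BY seq",
        (agent, agent),
    )
    return [ET.fromstring(row[0]) for row in rows]


def statement(conn, card, start, end):
    """Purchases made by one card agent in a billing period (ISO dates)."""
    lines = []
    for msg in dbms_for(conn, card):
        is_purchase = msg.get("kind") == "purchase" and msg.get("from") == card
        if is_purchase and start <= msg.get("at") <= end:
            lines.append(
                (msg.get("at"), msg.findtext("item"),
                 float(msg.findtext("amount")))
            )
    return lines


if __name__ == "__main__":
    log = open_master_log()
    send(log, "card-1234", "bookstore", "2004-05-03", "purchase",
         item="novel", amount=12.5)
    send(log, "card-1234", "grocer", "2004-06-02", "purchase",
         item="oranges", amount=4.0)
    for line in statement(log, "card-1234", "2004-05-01", "2004-05-31"):
        print(line)
```

Replaying the rows of `dbms_for` in sequence order is one way to approximate the simulation views mentioned above. Splitting the log across several logging agents would mean running the same query against each store and merging the results.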
Enclaves and Policy Management

In the Star Trek television series, one enemy, the Borg, formed the Borg Collective, in which, if one Borg learned something, this information was immediately known to all fellow Borgs. By sharing information, the society as a whole learns more quickly. The drawback is a loss of privacy: because every individual knows everything, there are no secrets and no private knowledge.

It is unlikely that we all want all data to be shared always. We need technologies that partition knowledge into datasets and views of datasets. And we need technologies that let us control the sharing of these resources. We could use policy-management languages to control who has what rights, on what data, and under what circumstances. We do this for some kinds of data today (for example, data held in relational DBMSs) but not yet for all the data types that will be needed for DBMS[me], nor for the fragmented record that is distributed across media, field of use, and each collection agency’s organizational boundaries.

LifeLog

In January 2004, DARPA’s LifeLog project (www.darpa.mil/ipto/Programs/lifelog/) was cancelled after the US Congress received the proposals, but before significant progress had been made. LifeLog attempted to create technologies that would provide DBMS[me]-like capabilities. Civil libertarians criticized the program, arguing that LifeLog was too invasive, capable of capturing not only our transactions, but also how we feel.

Interestingly, we can view LifeLog as a step toward Vannevar Bush’s Memex device, “a future device for individual use, which is a sort of mechanized private file and library … a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.” Bush published the article describing it nearly 60 years ago (“As We May Think,” The Atlantic Monthly, July, 1945, www.theatlantic.com/unbound/flashbks/computer/bushf.htm).

With or without LifeLog, the volume of data that humans want and can afford to easily record about themselves is increasing rapidly. Like so many other forms of technology, this capability can bring great good (for example, augmented memory, a legacy for future generations, or possibly even a form of immortality). But the same technology can create great risks (such as identity theft and ethnic cleansing), and the trend toward data collection at the expense of privacy can be viewed as a worldwide puzzle. If you did your Christmas shopping on the Internet, then many different companies have your credit-card and personal-profile data. If people think, or know, that their universal health record (including mental health disorders or family secrets, such as the 10 percent of out-of-wedlock births estimated in Iceland) will be entered in a database, any trust between doctors and patients can disappear, as in the case of Iceland’s population-wide genetic data-collection scheme (“Iceland’s Genetic Jackpot,” Kristen Philipkoski, Wired, Dec. 1999, www.wired.com/news/print/0,1294,32904,00.html). Big Brother’s information systems, if connected, can know more about individuals than they know about themselves; with that knowledge could come control. As a clear warning, the Holocaust Museum in Washington, D.C. contains graphic descriptions of a real government organized to use genetic records for ethnic cleansing.

Still, canceling LifeLog barely makes a dent in how quickly we will develop the range of technologies to provide a near-perfect record of our lives. Indeed, it is no longer a question of whether we will develop such technology: we already have much of it, and there is a strong demand for more
and better capabilities. This is a “manifest destiny.” Privacy technologies and public policy might lag, but they will not be far behind because they will be required to make DBMS[me] a viable market force.

Challenges

We might view recording all data about ourselves, turning this data into interactive models, and simultaneously building in privacy and security protections as the technical challenges in realizing the DBMS[me] vision. There are more challenges ahead as well. Will there be emergent behavior when models of individuals interact with each other? Will it change people’s lives to be able to consult past generations and ancestral memories? Can we engineer the survival of all memories of all individuals for future generations to come, covering horizons of millions of years? Of course, there is a question whether all this data will be interesting to future generations: Will only tidbits of past lives be retrieved? What metrics and data mining techniques will automate discovery of interesting events? Finally, there is the challenge of predicting unexpected uses for all this data.

Craig Thompson is professor and Acxiom Database Chair in Engineering at the University of Arkansas and president of Object Services and Consulting. His research interests include data engineering, software architectures, middleware, and agent technology. He received his PhD in computer science from the University of Texas at Austin. He is a Senior Member of IEEE. Contact him at
[email protected].
Patrick Parkerson is an assistant professor of computer engineering at the University of Arkansas. His interests include integrated circuit design, ASIC/CPLD/FPGA design, design methodologies, and space electronics. He received his BSEE, MSEE, and PhD from the University of Arkansas. Contact him at
[email protected].