There is a fascinating entry by Seb Chan of the Powerhouse Museum in Sydney documenting experiences one month into their participation in the Commons on Flickr. The Powerhouse Museum has been alert to various ways of combining professional and audience metadata in its services. It was an early comer to the Commons, joining the Library of Congress.
Our experiment with the Commons on Flickr continues and barring a few hours delay we have managed to keep to our promise of 50 new images a week. We’re up to 400 images now with the most recent 50 going live this morning. 158 of these have been geotagged. [fresh + new(er) » Blog Archive » Commons on Flickr - one month later]
A couple of things struck me about his note. First, the volume of activity:
And second, he talks about the volume and quality of tagging activity:
Tonnes of tags have been added and they have been of a quality that we’ve not experienced in our other tagging projects. I am firmly of the belief that the quality is a result of the Flickr environment (lets call it ‘culture’) and its userbase. [fresh + new(er) » Blog Archive » Commons on Flickr - one month later]
It will be interesting to see the promised three-month report. It seems to me that this shows the long-tail dynamic I have discussed elsewhere. A large part of the long tail effect is about better matching supply and demand by aggregating each in a network environment. Flickr aggregates supply: it provides a critical mass of pictures and community structure for sharing at the network level. It also aggregates demand by attracting large numbers of users, and creates value for them through its sharing structures. An individual institution has difficulty mobilizing this audience.
I bought [i] a New York Times yesterday in our local Borders and, over coffee, was interested to open up a large two-page spread advertising the newly named Thomson Reuters. A new name and a new logo.
I was interested to read the following in the business pages (under an 'advertising' label).
Thomson’s desire to raise its public profile as it completes the $16.6 billion transaction is partly a reflection of an era when information has never been so accessible and the struggle to maintain profitability at the companies that provide it, particularly among incumbents, has never been more difficult.
“In the simplest terms, we see this as the opportunity to be the new power brand in the global information industry,” said Gustav Carlson, Thomson Reuters’ chief marketing officer. “We don’t simply accumulate data. Thomson’s strategic evolution has been from print to digital and now into a supplier of intelligent information.”
Thomson’s newspaper holdings once included The Times of London, The Globe and Mail in Toronto and an array of less distinguished smaller newspapers. But as it abandoned paper for digital publishing, Thomson became the antithesis of companies like Google that treat information as a no-cost commodity for selling advertising.
Instead, Thomson has focused on building vast databases of material that is dull to most people but of great value to professionals, and the company charges them accordingly. More recently, that data has been integrated into systems that sift through it, organize it and, in some cases, make suggestions to users about actions to take. A litigation lawyer researching a case involving asbestosis through the company’s Westlaw service, for example, will be presented with information from Thomson Scientific about the disease along with legal decisions related to it. [A Name to Herald Its Merger: Thomson Reuters - New York Times]
I was struck by the parallel with the Elsevier note I did a couple of weeks ago, on how information wants to be both free and expensive.
What we see here is a reallocation from the 'information wants to be free' arena, where business magazines (including Library Journal, part of the group being divested) are increasingly supported by advertising revenue and are in competition with a network environment rich in alternative sources, to the 'information wants to be expensive' arena where the value resides in providing business-critical information tightly integrated into workflow solutions. [Lorcan Dempsey's weblog: Free and not free]
Providers are taking steps to increase the value of data - through workflow integration, timeliness, data mining, and so on - to differentiate their offer from generally available information. In this case, the article talks about information never having 'been so accessible', and places the focus on added-value, professionally relevant information. The data is mined to create new relationships.
Thomson-Reuters seem to want to capture the idea of this added value in the expression "intelligent information". I am not sure if that works for me .....
Somebody I was talking to recently mentioned that they liked the way Microsoft implemented book search. In particular they mentioned the visual presentation of where in a book matched search terms occurred.
I had a look. Here is a screen capture of the first result in a search done this afternoon for 'Ireland and globalization'.
It is indeed quite nice. Another example of glanceability: a measure of how quickly and easily a visual design communicates useful information.
I got hold of a Kindle at work the other day, only for an evening as I had to pass it on. I didn't have it for long enough to form any realistic impression and I did not read a book on it. However, even based on this limited exposure, I thought that the reaction of my 9-year-old son, Eoghan, was interesting.
He loved it, and for a few hours it even made it to the top of his Christmas list ...
What really struck me was that his positive reaction seemed to relate to how the device brought together a web experience and a book experience. It made reading more like the experience of the web, and it is the latter that conditions his experiences. But it did it in a way that made the experience portable.
So, he liked the fact that he could see reviews, other books by the author, samples and so on. He liked the ability to search, to browse other titles while reading, to collect materials into his own space.
He thought that the searching was poor, because it required you actually to spell things correctly ;-) His searching expectation includes spelling correction or a 'did you mean' feature.
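For what it is worth, a rough 'did you mean' is not hard to sketch. The following is only an illustration using Python's standard difflib module, not a description of how Amazon or anybody else does it; the title list and the did_you_mean function name are made up for the example.

```python
import difflib

# Hypothetical list of titles; a real store would draw on its own catalogue.
TITLES = [
    "Harry Potter and the Deathly Hallows",
    "The Golden Compass",
    "Eragon",
    "Artemis Fowl",
]

def did_you_mean(query, titles=TITLES, cutoff=0.6):
    """Return the title closest to a possibly misspelled query, or None."""
    # Compare in lowercase so capitalisation does not count as a difference.
    lowered = {t.lower(): t for t in titles}
    matches = difflib.get_close_matches(
        query.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None

print(did_you_mean("Artimis Fowl"))
```

A production service would presumably weigh suggestions by popularity and query logs as well as string similarity; this sketch only does the latter.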
In short he liked the 'in-book web experience' or maybe the 'in-web book experience'. He liked the ways in which reading a book mapped his more general web experience, and that he could carry it around.
This was in addition to a reading experience which seemed to work very well - he read several of the downloaded samples concentratedly. The expressed design goal of making the mechanics of reading disappear into the experience seemed to be achieved.
Sure, the device is not as smooth as an iPod but this didn't seem to be an issue for him: maybe it would be over time, I don't know. What was more important for him though than any clumsiness of navigation or control was what could be done with it.
The main downside to emerge - remember in a very short exposure - was the size of the available collection on Amazon. He was impressed that various titles were available, but we only found between a third and a half of what we looked for.
That said, Guitar Hero III has gone back to the top of the Christmas list ;-)
The whole world seems to have seen that Newsweek is carrying a cover story about Kindle from Amazon ;-) The story is pretty positive.
One notable aspect is the tight coupling of the service for delivering ebooks and other materials and the device for presenting them. This is a model we are familiar with from iTunes, but no intermediate computer is required here. The Kindle can connect wirelessly (using an EVDO-based service).
Specifically, it's an extension of the familiar Amazon store (where, of course, Kindles will be sold). Amazon has designed the Kindle to operate totally independent of a computer: you can use it to go to the store, browse for books, check out your personalized recommendations, and read reader reviews and post new ones, tapping out the words on a thumb-friendly keyboard. Buying a book with a Kindle is a one-touch process. And once you buy, the Kindle does its neatest trick: it downloads the book and installs it in your library, ready to be devoured. "The vision is that you should be able to get any book—not just any book in print, but any book that's ever been in print—on this device in less than a minute," says Bezos. [Amazon: Reinventing the Book | Newsweek.com]
The article, by Steven Levy, also discusses adoption of ebooks in general terms. It spends some time discussing patterns of reading and wonders to what extent the Kindle will support or shape new expectations. I suppose that this type of discussion is inevitable, but this type of yes_it_is/no_it_isn't the shape of things to come exchange is a little tedious. Our expectations and behaviors are continually being reshaped.
See the discussion by David Rothman on Teleread ("Do publishers and readers really want Amazon or Google to be the ultimate controllers of interactivity?") and Richard MacManus on Read/WriteWeb ("And now it looks like Amazon has, finally, taken the always-nascent eBook industry to the next level.").
I look forward to seeing one and trying it out. Levy notes: "Though Bezos is reluctant to make the comparison, Amazon believes it has created the iPod of reading." Of course, one important difference from the iPod model is that folks transfer their CDs to the iPod as well as buying materials through iTunes. We will not be able to transfer the books in our current personal collections to the Kindle in the same way.
It is now conventional to make a distinction between what libraries own (e.g. books, DVDs, ...) and what they license (e.g. e-journals).
However, we can only use 'own' in a circumscribed way. This has been made clearer in the mass digitization projects. Libraries cannot do as they wish with the digitized copies of copyrighted material. And we know that in most library collections, a large part, maybe a majority part, is still covered by copyright.
What the library in fact 'owns' is the cost of managing the physical materials and of making them available to users. They do not 'own' the content, and are limited in what they can do with it.
In fact, they may end up licensing the very content that they thought they owned once it has been digitized.
Alma Swan has an interesting post discussing the value added by the publisher in copy editing and concludes that it is ... variable. She notes a publisher study:
Wates and Campbell looked at copy editing changes carried out on a set of science, humanities and social science articles at Blackwell Publishing (as was) and reported that the biggest category of corrections by the publisher was concerned with the references (42.7% of all copy editing changes), the next biggest category (34.5%) was concerned with minor syntactical or grammatical changes and a small proportion (5.5%) of changes corrected author ‘errors that might otherwise have led to misunderstanding or misinterpretation’. [OptimalScholarship]
I was interested in the attention to references. And I wondered whether the variety of tools introduced in recent years to help with the capture and management of such citation data (RefWorks, Zotero, etc) had reduced the number of errors spotted in a paper's references. It would be interesting to know how the corrections break down, as between errors in bibliographic sources, transcription errors, stylistic or completeness errors, and so on.
In the longer term, it will be interesting to see whether such data flows more easily with the potential introduction of citation microformats (I don't know what the status of this work is), or, say, if it were to happen, the introduction of support in something like Microsoft Word to allow structured data of this sort to be imported or exported. I still believe that we will see greater use made of a new 'bibliographic tissue' which connects the user environment and database resources through resources like citation managers, reading lists, social bookmarking, microformats and RSS feeds.
Incidentally, the discussion of copy-editing is by way of introducing a JISC-funded project looking at differences between versions of articles (different author versions, publisher version):
VALREC will ask stakeholders what levels of validation they would like to see, and what broad categories of differences would be helpful, such as ‘editorial differences’ and ‘content differences’. The project will then develop the technology to measure differences and generate a digital certificate for any article detailing the differences. An example of such a certificate is on the VALREC website. Not only will there then be a means to itemise the exact differences between the author-final and published version, but between other, earlier, versions of an article too, perhaps those first exposed on blogs or wikis. This will permit better formalisation and monitoring of the scholarly record, especially as authors move to early-use of repositories and informal web tools as part of the communications process. [OptimalScholarship]
The project is a joint one between Alma's company, Key Perspectives, which has done a lot of empirical work on open access and researcher behaviors, and the University of Southampton, which has been a major producer of tools, systems and data analysis in support of open access directions (see, for example, the eprints.org site).
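To give a feel for what 'measuring differences' between versions might involve at its very simplest, here is a small sketch. It is not the VALREC approach, which the post does not describe in detail; it assumes plain-text versions compared line by line with Python's standard difflib module, and the compare_versions function and the sample sentences are invented for the illustration.

```python
import difflib

def compare_versions(author_text, published_text):
    """Return (similarity ratio, changed lines) for two plain-text versions."""
    a = author_text.splitlines()
    b = published_text.splitlines()
    # Ratio of matching lines: 1.0 means the two versions are identical.
    ratio = difflib.SequenceMatcher(None, a, b).ratio()
    # ndiff prefixes removed lines with "- " and added lines with "+ ".
    changes = [
        line for line in difflib.ndiff(a, b)
        if line.startswith(("- ", "+ "))
    ]
    return ratio, changes

# Invented sample: one reference is re-punctuated, the body text is untouched.
author_final = "Smith, J. (2001) Title.\nThe data show an effect."
published = "Smith, J. (2001). Title.\nThe data show an effect."

ratio, changes = compare_versions(author_final, published)
print(ratio)
for line in changes:
    print(line)
```

Sorting such changes into 'editorial differences' and 'content differences' is the hard part, and nothing here attempts it.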
Today Google and CIC announce an agreement to digitize ten million volumes across the CIC libraries. Google has been adding new partners since the first announcement was made about the Google 5. Some folks have wondered what rationale has governed selection of partner opportunities. We do not know, but they sure are moving fast! Here are some early thoughts.
The CIC announcement is interesting for several reasons:
It is a shared effort across a major group of libraries with significant collections. There appears to be strong CIC institutional commitment. Of course, CIC has a history of collaboratively sourced activities, and this 'pooling' model makes increasing sense given the policy and service challenges that need to be addressed, both in this case and across a range of other issues that libraries face as they support changing research and learning behaviors in a reconfigured network environment. For some things, scale matters.
The libraries have a shared approach to managing the digital copies based on shared infrastructure at the University of Michigan, and serving them up to their user communities. An example of collaborative sourcing.
Google recently advertised for somebody to work on collection development and we seem to be seeing a stronger focus in this area. Collecting areas of importance within each library [pdf] have been identified for attention. Presumably, these decisions have also been influenced by the 'collective collection' of the full Google partnership.
This initiative in turn prompts some more general thoughts about access:
One of the most valuable features of the Google initiative is that it digitizes book content, allowing fine-grained discovery over topics, people, places and so on. Of course this presents interesting questions about indexing, retrieval, ranking, and presentation but the advantage of having this access seems clear. It drives use and sales, and it supports enquiry. Without it, the book literature is less accessible than the web literature.
However, as we are beginning to see on Google Book Search, we are really going beyond 'retrieval as we have known it' in significant ways. Google is mining its assembled resources - in Scholar, in web pages, in books - to create relationships between items and to identify people and places. So we are seeing related editions pulled together, items associated with reviews, items associated with items to which they refer, and so on. As the mass of material grows and as approaches are refined this service will get better. And it will get better in ways that are very difficult for other parties to emulate.
Currently this material is made available within the Google destination site. Google is an advertising engine and its approach depends on aggregating attention for adverts. This approach may be difficult to deploy within a more 'data services' approach where others - especially the partners - have remixable access to content and services. However, the 'utility' value of this resource will be diminished if it is not made available in this way so that others can mobilize these resources within their own environments. How and if this gets done remains to be seen. (See the related discussion about the search API.)
This type of access seems especially important for the partner libraries. In the early days of this activity there was some discussion of the types of services which would be built on top of the digitized books by the libraries. However, it is difficult, and maybe not very sensible, for the libraries to individually invest in some types of service development. An important factor here is that they cannot benefit from the network effects tha
Carla Montori spoke about the Google digitization initiative at the University of Michigan at the LIBER Think Tank on the future value of the book artefact and the future value of digital documentary heritage [pdf]. She reported an interesting finding: experience of the digital version of an item is creating demand for the physical item. One can imagine various reasons for this. For some folks and for some uses immersive reading may not be congenial in the digital environment. There may also be cases where there is interest in the original artifact. And there may be other reasons.
It does mean that it is unlikely that digital materials will be a simple substitution for print. This has implications for physical collection management. However, this does not necessarily mean that libraries retain all that they digitize on local shelves. It does mean that what is digitized may have to be quickly accessible within some framework of predictable and reliable delivery, whether that is local, or, increasingly, within a shared context.
Update: Check out the comment by John Wilkin. In my own experience, which is nothing more than that ;-), I will sometimes go to a print version having come across something first on NetLibrary or Google Book Search. It will be interesting to see what behaviors do emerge when we begin to have data to work with.
Link resolvers and the serials supply chain [pdf] is the title of an interesting report commissioned by the UK Serials Group and written by James Culling. From the summary:
The current knowledge base data supply chain is characterized by a complex series of roles, relationships and inter-dependencies between publishers, other content hosts, subscription agents, link resolver suppliers, libraries and others. [Link resolvers and the serials supply chain (pdf)]
The report argues that a lack of understanding between these stakeholders results in many inefficiencies.
It could be said that whilst the community's attention has been mostly focused on what it means to be OpenURL compliant, a code of practice and information standards to ensure knowledge base compliance and the efficient transfer of data through the supply chain have been sorely absent and overlooked. [Link resolvers and the serials supply chain (pdf)]
The report discusses issues from different stakeholder perspectives. It recommends the establishment of an organization which brings stakeholders together around a "code of practice for effective participation in the knowledge base supply chain". COUNTER is suggested as a model.
My colleague Phil Norman alerted me to the publication of the report.
The Google digitization of books appears to have caught the public imagination. Recent weeks have seen high profile articles in The Economist [subscription required] and The New Yorker as well as several newspaper pieces (see the links and response on this OUP blog entry for example).
Google Book Search is a major endeavor, and Google have brought an impressive service online with impressive speed. The media stories tend to have different hooks. Inevitably, some pivot on a description of confrontation between publishers and Google; others discuss it in the context of a general digital turn, or the future of the book.
In some of the more reflective discussion I am interested to see a particular strand emerge. And that is the acknowledgement that the book, in its material form, is itself a designed and evolved technology, rather than a permanent or unchanging feature of our experience. Simply, this may involve talking about the 'technology' of the book. Or it may take more elaborate form.
Of course, the material book - its technology, circulation, reception, institutions - is a strong if diffuse field of enquiry. However, now, it is as if the change in perspective brought about by the digital turn has made the technology of the book more popularly visible and discussed just as that, as a particular technology which can be compared to others. And, having become used to talking about the impact on practice and potential of new technologies, we may now use that language to also describe earlier forms and their impact.
Seeing it in this way reinforces an awareness that the book itself, the codex, represents particular technological choices which in turn have influenced how we create and engage with the intellectual and cultural record, and in turn with broader experience and intellectual development.
This for example, comes from a recent discussion of copyright and book digitization by the writer John Lanchester. Incidentally, it is encouraging to see a piece which is so appreciative of a library and library staff. He talks about the technology of the library and of the book.
The buildings of the Bodleian are so old, and in their golden Cotswold stone so beautiful, that it is easy not to see how insistently modern an institution the library has tended to be. The very beginnings of the collection, in Duke Humfrey's Library above the divinity school, showed how Thomas Bodley's own bibliographic vision had to react to a technological shift. The new collection was built to accommodate the transition from the long-established, tried-and-tested technology of unique handwritten texts to the hot new mass-produced technology of the printed codex: in other words, the book. Duke Humfrey's Library has high stacks of shelves, which the reader can't directly access: the world's first closed stacks. These were designed to accommodate the increasing number of books too small to chain securely to open shelves, and were an important repository of copies from the Stationers' Company. Issues of copyright and of access to information were thus built into the institutional DNA from the start. The very layout of the buildings, with teaching "schools" tucked in the corners of the quadrangle, reflected new ideas about the connection between the library as a repository of information and the university as a place of instruction.[John Lanchester: Who owns what in the digital age? | News | Guardian Unlimited Books]
And after a marvellous description of the technology of delivery from the stacks, he concludes that "it is impossible not to miss the point: a library is a machine for storing and retrieving information". Later in the piece he quotes Richard Ovenden, of the Bodleian:
"The codex was a technological leap. It works very well, has done so for 2,000 years, and still does so - people still find it very easy to use. What digitisation does is to highlight that."
Here is another example, following nicely from the last comment. The origins of the codex were discussed recently in The New York Review of Books by Eamon Duffy, in a review of two books on the role of the book in early church history [available to subscribers or for purchase]. Here is his opening paragraph:
These two books are built on a single perception. Early Christianity was more than a new religion: it brought with it a revolutionary shift in the information technology of the ancient world. That shift was to have implications for the cultural history of the world over the next two millennia at least as momentous as the invention of the Internet seems likely to have for the future. Like Judaism before it and Islam after it, Christianity is often described as "a religion of the Book." The phrase asserts both an abstraction - the centrality of authoritative sacred texts and their interpretation within the three Abrahamic religions - and also a simple concrete fact - the importance of a material object, the book, in the history and practice of all three traditions.
Note how he talks about the book as a shift in information technology, and makes the comparison with the impact of the Internet explicit. In a fascinating piece, he goes on to discuss the practical and political reasons why the codex was favored over the scroll in early church writing. In the context of my point here, consider his later references to technology.
Why should the new religion have adopted this down-market and unfashionable book technology? ... However that may be, until recently surprisingly little has been made of this momentous foundational shift to a new book technology.
I think that this terminology is symptomatic of a positive trend, a recognition that the book itself, while central, influential and marvelously adapted to various uses, is not some natural given. It is another sign that we are moving beyond the reductive opposition between the book and the digital turn.
The reality is we are not running scared from Google - frankly it's quite the opposite. Through initiatives such as Google Book Search, Amazon's Search Inside the Book, Microsoft's Windows Live Book Search and many more, publishers have finally been presented with enough options that make financial sense so we are pursuing our digital future.
What we publishers have come to realise is that Google and friends have opened up the world to our content by showing us that discoverability and access leads to interest and opportunity. Publishers investing millions in asset repositories and digitisation hardly means that they are doing so to spite or protect themselves from Google. [FT.com / Comment & analysis / Letters - Google has opened up world to publishers' content]
It is interesting to see the strong link made between discoverability and opportunity, and the acknowledgement that discoverability is aligned with visibility in network level discovery services like Google.
Chapter 6 is about the Managed Learning Environment (MLE), a term used for the ensemble of systems and services which support learning in an institution. Weller identifies portal, library, student record system and content management system as important other components in the MLE. His discussion about libraries and the interaction with the VLE is concentrated in this chapter.
The relationship between VLEs and library systems reflects the changes in practice and internal politics wrought by the advent of e-learning perhaps more than any of the other systems. There is a sense in which the very identity of libraries and their function in the educational process is at stake. Just as e-learning has induced much navel gazing and concern amongst educators regarding their role, and the potential commoditization of education, so it is with librarians. The answer, however, is largely the same - e-learning makes the store of information less significant, but in such an information-rich world it makes the skills of dealing with information more valuable. [p. 67]
He then suggests a continuum of potential for the library, from redundant to central. "At one extreme the need for a library becomes superfluous - at its simplest this might be categorized as 'I've got Google, what do I need a library for?'" [p. 67] In this redundant model, necessary materials are loaded into the VLE, and it points to other resources out on the open web.
In the central approach, the library mediates access to content within the VLE, providing value in selection, purposing to particular tasks, metasearch and so on.
The VLE and library interface then is one fraught not only with problematic technical issues, but also with a political dimension. There have been no shortage of projects examining the interface between the two, indeed there is something of a project overload, without a real consensus reached as to the ideal configuration. The main areas where the two systems interface is with the location of resources, and more specifically the following:
locating and importing resources into a VLE;
storing data about new types of resources, for example learning objects, within library catalogues;
managing rights and clearance for resources;
indexing and describing resources. [p. 68]
He mentions that the VLE may be managed within different organizational contexts, including some in which it is located within the library. The relationship between the library and the learning management system has indeed been a topic of much discussion in recent years. And we are seeing a growing discussion about the role of the library in relation to e-science and data curation. As more activities move onto the network, workflow and information management become pervasive issues which prompt interesting questions about how academic support services are best configured.
The National Library of Wales is celebrating its centenary with a series of events and exhibitions, and with a new website.
I discovered this as I stumbled across the Wales-Ohio project, no less. As a resident of Columbus, Ohio, I need to watch how this develops!
The experiences of the Welsh settlers in Ohio are about to be made available to audiences throughout the world.
The goal of the Wales-Ohio Project is to digitise a selection of Welsh Americana relating to the state of Ohio held at The National Library of Wales and to make them available on an innovative bilingual website.
The website will display digital images of:
archive and manuscript material
printed material
photographs
maps
prints and paintings
giving us a feel of what life was like for the Welsh settlers in the nineteenth century from hardship and tragedy to prosperity and happiness. The site will also document the contribution the Welsh have made to the history and culture of Ohio. [National Library of Wales - Wales-Ohio Project]
There is a specially written poem by Gwyneth Lewis for the centenary.
The hardest place to be is here,
we need to imagine it and require
a library’s wormholes, its infinite doors.
Which reminds me of the poem that Ted Hughes wrote some years ago to accompany New Library: the People's Network.
A funny video about a library website? Yes, it is possible ....
One of the highlights of the CIC 2007 Library Conference was the presentation from Ellysa Cahoy of Penn State (you can see Ellysa's presentation on Slideshare). Her topic was how the library website should help the user make effective information choices. The presentation included a clip showing a user trying to find Time Magazine on the PSU website. Here it is:
Yesterday, Alan Rusbridger, editor-in-chief of the Guardian, told the staff of his newspaper that now “all journalists work for the digital platform” and that they should regard “its demands as preeminent.” [BuzzMachine » Blog Archive » ‘The web is preeminent’]
Folks will notice that I occasionally quote from the Guardian here. This is for a couple of reasons, but first among them is that it has really done more than most papers to move fully into a web environment. This is through a variety of mechanisms.
The 'digital platform' is 'preeminent'. The Guardian will be a "24 hour, web-first newspaper". However there are other platforms, including paper.
A parallel is sometimes made between libraries and newspapers in terms of some of the pressures involved in operating in a network environment. I have not seen many statements like this in a library environment though. In this context I was interested to read a paper by David W Lewis, Dean of Libraries at IUPUI, arguing that we should "complete the migration from print to electronic collections" and "retire legacy print collections".
BuzzMachine reference via John Naughton.
David W Lewis reference via ACRLog.
I mentioned Augustine of Hippo the other day, in the context of the interesting work that Google is doing to develop a contextual page for each book (resources about it, resources related to it, etc. See this Penguin Classic with the nice cover, for example). Searching for the 'City of God' in Google Book Search pulls together a couple of editions, and notice in the picture that a couple of editions of the second result are also pulled together (the indented entries and the un-indented entry they follow are versions of the same work).
I was looking at this because Dan Clancy, of Google, mentioned that they were pulling together members of work sets in this way at the Working Group on the Future of Bibliographic Control meeting the other day. This is interesting and valuable. Presumably this is happening programmatically. They do not pull in The City of God Against the Pagans further down the page, another translation of the same work. They do not pull in De Civitate Dei, of which these are all translations. They do, however, pull together a couple of different selections of De Civitate Dei itself (and it might be reasonable not to pull together selections with the complete versions?). They do not pull in a French version.
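To make the point concrete, here is a minimal sketch of one way editions might be clustered programmatically: build a crude key from normalized author and title strings and group on it. This is an illustration under stated assumptions, not a description of how Google Book Search actually works; the function names (`work_key`, `group_editions`) and the sample records are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents and punctuation, and drop a leading English article."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    words = text.split()
    if words and words[0] in {"the", "a", "an"}:
        words = words[1:]
    return " ".join(words)


def work_key(author, title):
    """Hypothetical matching key: normalized author plus normalized title."""
    return (normalize(author), normalize(title))


def group_editions(editions):
    """Group edition ids that share the same matching key."""
    groups = defaultdict(list)
    for edition in editions:
        groups[work_key(edition["author"], edition["title"])].append(edition["id"])
    return dict(groups)


# Hypothetical sample records, for illustration only.
editions = [
    {"id": "e1", "author": "Augustine", "title": "The City of God"},
    {"id": "e2", "author": "Augustine", "title": "City of God"},
    {"id": "e3", "author": "Augustine", "title": "The City of God Against the Pagans"},
    {"id": "e4", "author": "Augustine", "title": "De Civitate Dei"},
]
groups = group_editions(editions)
```

A key of this kind brings the two 'City of God' records together but leaves the variant translation title and the Latin original in groups of their own. Linking those needs knowledge that string matching alone does not supply, which is the sort of relationship that curated bibliographic data can record.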
Now, the entry for Augustine in Worldcat Identities says the following about The City of God: '369 editions published between 1466 and 2005 in 16 languages and held by 9,731 libraries worldwide'. Here are twenty-five.
I think that it is interesting to place an algorithmically generated (and still very partial) resource like the Google Book Search summary page alongside the 'expert' generated bibliographic data in library resources, and aggregated here in Worldcat and its derivative, Identities (and this is just the application to show up inconsistencies in the data!).
Suggesting that one approach is better than the other seems to me to be a fruitless direction. There is a lot to be said about each in many dimensions. They are complementary and can amplify, correct and refine each other. Over time the balance between them may change, but for the moment think how interesting it would be to have them working together.
Aside: In my previous GBS message, I mentioned how St Augustine was being recognized in A portrait of the artist as a young man and placed on the map in Florida. Looking at the text of The City of God I was amused to see ads for real estate in Jacksonville and St Augustine scroll by at the bottom of the screen. That said, it seems to me that we are becoming increasingly tolerant of such errancies as a reasonable price to pay where the value of a programmatic approach is visible?
Peter Brantley, the new Director of the Digital Library Federation, writes about the library digitization initiatives with Google:
We poisoned our hand before we played it. We were approached singly, charmed in confidence, the stranger was beguiling, and we embraced. For the love of selfish confidence, we spoke neither our fortune nor our misgivings with our neighbors or our friends. We felt special, invited to loud weddings on far away islands of adventure; in the quiet we may wonder if we were given broken jewelry. [shimenawa - Google and the books]
I was listening to the audio for the ALA Top Tech Trends session on the LITA Blog. Cliff Lynch discussed the emergence of personal libraries and used music as an example.
This reminded me that I had spent an interesting while (well, an hour or so) looking at the iTunes ministore over Christmas. I am sure that this is familiar to lots of folks. It comes up as a pane in your iTunes interface. When you select one of the songs in your 'local collection' it brings up other work by the artist and shows music that folks who bought that artist have also bought.
There are times when it does not give you results at the artist level, but falls back to the genre. I was occasionally surprised when this happened. Maybe not surprising for, say, Energy Orchard, but you would expect early Van Morrison - St Dominic's Preview [wikipedia], for example - to be matched.
On first look, the results were sometimes potentially interesting, sometimes unsurprising.
The tight integration between computer, network and device applications is a feature of the iTunes experience. Nevertheless, I was struck by the way in which my local collection was being used to generate responses from the network service.
It is another small example of how much of what we do increasingly participates in network interaction.
Incidentally, the Top Tech Trends talks are well worth a listen. It would be nice, though, to be able to speed them up ;-)
There is a famous passage in A portrait of the artist as a young man where Stephen Dedalus is talking to an English priest and is made uncomfortable about the Irish way in which he speaks English. Stephen uses the word tundish for funnel, a word unfamiliar to the priest. Stephen self-consciously jokes that it is called tundish in Lower Drumcondra - an area of Dublin - 'where they speak the best English' [wikipedia entry for Drumcondra]. The incident causes Stephen to reflect that he is speaking a language - English - that is not his own: 'the language we are speaking is his before it is mine' and 'his language, so familiar and so foreign, will always be for me an acquired speech'.
I think about these words from time to time, and I went to check the passage the other day. The easiest way to do so was to go to Google Book Search - even though I have a copy of the book somewhere on my own shelves. And, indeed, there are several editions of the Portrait available there.
Here is a link to a version digitized from the Stanford collection. Look at the 'key words and phrases': interestingly they have pulled out tundish. The list has names on it, and some phrases.
I was also interested to see the other features being added. They offer related editions (and a query syntax to retrieve them). They suggest 'related books', books which cite this one, and scholarly works which cite this book also. This last feature is provided by Google Scholar. Interestingly, they also pull place names from the text and place them on Google Maps.
Presumably, this is all done programmatically. The results may be patchy. I was interested to see 'St Augustine, Florida' placed on the map. Of course, the references in the book are to St Augustine himself, for whom St Augustine in Florida was named [wikipedia entry for Augustine of Hippo]. It is interesting that different versions give slightly different results. On one of the other editions I noted that Kingstown was put on the map in St Vincent and the Grenadines, whereas the Kingstown in question here is the former name of Dun Laoghaire, south of Dublin [wikipedia entry for Dun Laoghaire]. Notes and prefatory materials, which of course change from edition to edition, are also usefully indexed, and get mixed in with results from the text itself. At first look, it was not clear to me how some of the listed scholarly works related to Joyce's book.
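The Kingstown case can be sketched in a few lines. What follows is a toy illustration only, assuming a tiny hand-made gazetteer and hypothetical function names (`resolve_naive`, `resolve_with_context`); it says nothing about how Google actually resolves place names. A lookup that takes the first candidate lands in the Caribbean, while one that weighs the other places mentioned in the same text has a chance of landing near Dublin.

```python
from collections import Counter

# Hypothetical mini-gazetteer: each name maps to candidate places with a country code.
GAZETTEER = {
    "kingstown": [
        {"place": "Kingstown, St Vincent and the Grenadines", "country": "VC"},
        {"place": "Kingstown (now Dun Laoghaire), Ireland", "country": "IE"},
    ],
}


def resolve_naive(name):
    """Return the first gazetteer candidate, ignoring the surrounding text."""
    candidates = GAZETTEER.get(name.lower(), [])
    return candidates[0]["place"] if candidates else None


def resolve_with_context(name, context_countries):
    """Prefer the candidate whose country appears most often among the
    countries of other places already recognized in the same text."""
    candidates = GAZETTEER.get(name.lower(), [])
    if not candidates:
        return None
    counts = Counter(context_countries)
    # max() keeps the first candidate on ties, so no context means the naive answer.
    return max(candidates, key=lambda c: counts[c["country"]])["place"]


naive = resolve_naive("Kingstown")
contextual = resolve_with_context("Kingstown", ["IE", "IE", "GB"])
```

A name that is missing from the gazetteer altogether simply returns `None` and never reaches the map.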
That said, it seems to me that this adds real value, and so long as they think it is worth attention and resource, it will continue to improve. They will make more knowledge emerge from the data they are working with.
I was disappointed though that Drumcondra - where the best English is spoken, and where my mother grew up - was not on the map .... ;-)
Aside: Dan Cohen points to another map example within Google Book Search. "Look at the bottom of this page for Illustrated New York: The Metropolis of To-day (1888), digitized by Google at the University of Michigan Library." Here the locations go down to street level.
Update: some links added. I see that the maps feature is noted in the Google Book Search blog.
In preparation for my trip to France recently, I read, in English, Jean-Noël Jeanneney's critique of the Google digitization of library books, Google and the myth of universal knowledge : a view from Europe ('available' here, on, er, Google Book Search). Whatever one's response to his argument, it is a reminder of how sparse the public policy discussion in libraries has been about the framework and direction of current digitization efforts. This is whether one welcomes these as pragmatic initiatives to broaden access or whether one is concerned about private control of the intellectual record. A couple of things have come over my horizon in the last few days.
Here is David Bearman discussing the book in the current Dlib Magazine:
In the glare of publicity surrounding Google Book Search and other mass digitization projects focused on print culture, we should not lose sight of the small proportion of culture that publication represents, the problems of ceding its control to a private firm, Google's unfortunately incendiary approach to intellectual property, the poor quality of the digital capture we have seen to date, the limits of search and presentation as performed in this one service and the restriction that Google applies to other potential value-added uses, or the significant problem of cultural bias exacerbated by Google's advertising business model. Ian Wilson calls our attention to five principles enumerated by national librarians of la francophonie meeting in Paris on February 28, 2006: free access to publicly owned resources; non-exclusive agreements with content providers; capture of preservation standard images with assurances for long-term accessibility; protection of the integrity of original source materials; and provision of multi-lingual, multi-cultural access [17]. Jean-Noël Jeanneney has done us all a service by reminding us to look under the hood and hold Google, and those providing content to it, accountable. In the two years since Google first announced its ambitions, I think the D-Lib community has largely given Google the benefit of the doubt; now that some results are visible and the implications are more clear, I think it's time to publicly endorse open access to rights-cleared, high quality, scanned page images and reconsider the appropriate roles for academic and public institutions participating in commercial analogue heritage conversion efforts that don't contribute to this end. [David Bearman. Jean-Noel Jeanneney's Critique of Google: Private Sector Book Digitization and Digital Library Policy]
Here is Tim O'Reilly:
Three things ought to happen to speed up the development of the book search ecosystem:
Book search engines ought to search publishers' content repositories, rather than trying to create their own repository for works that are already in electronic format. Search engines should be switchboards, not repositories.
Book search engines that are scanning out of print works in order to create a search index ought to open their archives to their competitors' crawlers, so readers can enjoy a single integrated book search experience. (Don't fight the internet!)
Libraries struggle with how to present multiple digital materials on their websites. Here is The Oldie magazine on the rewards and frustrations of using public library websites.
First, get to know the catalogue; the entire regional catalogue is available for you to look at from home. You can search for a book, tape, CD or video by author, subject or title. That in itself is not too amazing, but the best bit is that you can then reserve the book and have it delivered to the closest library to you. They will send you an email when it is there, and off you go. Excellent. You can renew your books online, too. Secondly (and this is the fantastic bargain I mentioned), most libraries now offer the most wonderful opportunities for using the finest reference books online. It's an astonishing resource, but none of them, in my experience, promote it for what it is: the best new service to their customers for years and years....
This is a remarkable deal, and it should be headline news. But most libraries seem shy about it, and make the service hard to find on their website, sometimes very hard. It's a disgrace.
For example, Norfolk call the section `Online Subscriptions', Essex call it `Answers direct', Manchester has `24-hour library' and Leeds only mention it in passing on their catalogue page. My own library (Suffolk) has it hidden away under a small link called `Cyber-library', of all things. [The Oldie - SUPERBYWAYS]
Collections of personal papers are important areas of interest for libraries and archives, and for the scholars and students that use them. In the last few weeks, several examples of digital - or digitized - personal papers have come over my horizon: