April 06, 2008 • Categories: Featured, Social networking
More of a linked list of other people's thoughts ... about egos and objects. I quote some pieces below: all of the posts are suggestive and worth reading. The linking theme is that people connect and share themselves through 'social objects', pictures, books, or other shared interests, and that successful social networks are those which form around such social objects.
Here is Fred Stutzman in a post which contrasts ego-centric and object-centric social networks. Flickr or Librarything are object-centric networks, while Facebook is an ego-centric one.
In a post I wrote exploring the network effect multiplier, the value proposition of object-centric social networks is described. Object-centric social networks offer core value, which is multiplied by network value. A great photo-hosting service like Flickr stands alone without the network, making it less susceptible to migration. An ego-centric network, on the other hand, has limited core-value - its value is largely in the network - making it highly susceptible to migration. We see this with Myspace: individuals lose little in terms of affordances when they migrate from Myspace to Facebook, making the main chore of migration network-reestablishment, a chore made ever-simpler as the migration cascade continues. [Unit Structures: Social Network Transitions]
In a much discussed post, Jyri Engestrom of Jaiku talks about the importance of objects in mediating connections between people. He talks about the "'social just means people' fallacy", suggesting that FOAF, for example, will not work because it tries to connect people to people without representing the objects around which they connect.
Russell's disappointment in LinkedIn implies that the term 'social networking' makes little sense if we leave out the objects that mediate the ties between people. Think about the object as the reason why people affiliate with each specific other and not just anyone. For instance, if the object is a job, it will connect me to one set of people whereas a date will link me to a radically different group. This is common sense but unfortunately it's not included in the image of the network diagram that most people imagine when they hear the term 'social network.' The fallacy is to think that social networks are just made up of people. They're not; social networks consist of people who are connected by a shared object. [zengestrom.com: Why some social network services work and others don't Or: the case for object-centered sociality]
Here is a report of a talk by Jyri Engestrom where he talks about five key principles involved in a successful social network built around objects.
- You should be able to define the social object your service is built around
- Define your verbs that your users perform on the objects. For instance, eBay has buy and sell buttons. It's clear what the site is for.
- How can people share the objects?
- Turn invitations into gifts
- Charge the publishers, not the spectators. He learned this from Joi Ito. There will be a day when people don't pay to download or consume music but pay for the opportunity to publish their playlists online.
[NMKForum07: Jyri of Jaiku. Strange Attractor: Picking out patterns in the chaos]
These thoughts are picked up interestingly by Hugh MacLeod (of gapingvoid fame). He suggests that sometimes he will use 'sharing device' rather than 'social object' in conversation. Social networks are built around social objects, he suggests, not the other way around; the objects are nodes which appear before the network, and around which it forms.
5. Yesterday at the Darden talk I explained why geeks have become so important to marketing. My definition of a geek is, "Somebody who socializes via objects." When you think about it, we're all geeks. Because we're all enthusiastic about something outside ourselves. For me, it's marketing and cartooning. For others, it could be cellphones or Scotch Whisky or Apple computers or NASCAR or the Boston Red Sox or Buddhism. All these act as Social Objects within a social network of people who care passionately about the stuff. Whatever industry you are in, there's somebody who is geeked out about your product category. They are using your product [or a competitor's product] as a Social Object. If you don't understand how the geeks are socializing - connecting to other people - via your product, then you don't actually have a marketing plan. Heck, you probably don't have a viable business plan. [gapingvoid: "cartoons drawn on the back of business cards": more thoughts on social objects]
John Breslin picks up the theme in practical terms and has some pictures which try to show this 'decentralized me'.
I’ve extended my previous picture showing a person being linked across communities to this idea of people (via their user profiles) being connected by the content they create together, co-annotate, or for which they use similar annotations. Bob and Carol are connected via bookmarked URLs that they both have annotated and also through events that they are both attending, and Alice and Bob are using similar tags and are subscribed to the same blogs. [T-SIOC, object-centered sociality at Cloudlands]
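The kind of linkage Breslin describes can be sketched in a few lines of Python. This is only an illustration, not his T-SIOC model: the record layout, the `shared_objects` function and the sample URLs and tags are all invented for the example. The point is that people are never linked directly; a link between two people exists only as the set of objects they have both acted on.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical activity records: (person, kind of object, object identifier).
# The names follow the quoted example; the objects themselves are made up.
activity = [
    ("Bob", "bookmark", "http://example.org/a"),
    ("Carol", "bookmark", "http://example.org/a"),
    ("Bob", "event", "conference-2008"),
    ("Carol", "event", "conference-2008"),
    ("Alice", "blog", "example-blog"),
    ("Bob", "blog", "example-blog"),
    ("Alice", "tag", "semweb"),
    ("Bob", "tag", "semweb"),
]


def shared_objects(records):
    """Map each pair of people to the set of objects that connects them."""
    # First group people by the object they acted on.
    people_by_object = defaultdict(set)
    for person, kind, identifier in records:
        people_by_object[(kind, identifier)].add(person)

    # Then every pair of people sharing an object gets that object as a link.
    links = defaultdict(set)
    for obj, people in people_by_object.items():
        for a, b in combinations(sorted(people), 2):
            links[(a, b)].add(obj)
    return dict(links)


links = shared_objects(activity)
for pair, objects in sorted(links.items()):
    print(pair, sorted(objects))
```

In this toy data Alice and Carol share no object, so no link between them appears at all, which is the object-centered claim in miniature.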
And a final quote from Hugh MacLeod.
14. The most important word on the internet is not "Search". The most important word on the internet is "Share". Sharing is the driver. Sharing is the DNA. We use Social Objects to share ourselves with other people. We're primates. We like to groom each other. It's in our nature. [gapingvoid: "cartoons drawn on the back of business cards": more thoughts on social objects]
March 02, 2008 • Categories: Featured, General - distributed environments, Libraries - distributed environments, Libraries - organization and services, Social networking
I find Web 2.0 increasingly confusing as a label; no surprise there. This is not just because of its essential vagueness, but because I think it tends to be used in a couple of very different ways. Where this happens there is bound to be some confusion. Schematically, I will use the labels 'diffusion' and 'concentration' for these two ways.
Diffusion is probably the more dominant of the two. Here it covers a range of tools and techniques which create richer connectivity between people, applications and data; which support writers as well as readers; which provide richer presentation environments. What tends to get discussed here are blogs and wikis; RSS; social networking; crowdsourcing of content; websites made programmable through web services and simple APIs; simple service composition environments; Ajax, Flex, Silverlight; and so on.
Concentration is a major characteristic of our network experience, which often involves major gravitational hubs (Google, Amazon, Flickr, Facebook, propertyfinder.com). These concentrate data, users (as providers and consumers), and communications and computational capacity. They build value by collaboratively sourcing the creation of powerful data assets with their users. The value grows with the reinforcing property of network effects: the more people who participate, the more valuable they become. And opening up these platforms through web services creates more network effects. These sites also mobilize usage data to reflexively adapt their services, to better target particular users or to identify design directions. Of course, these platforms are very closely controlled, and there is an interesting balance of interests between openness and control at various levels in how they manage resources (see for example my discussion of the Amazon and Google APIs).
Interestingly, if you trace Tim O'Reilly's writings on Web 2.0 since the publication of his major defining article you see an emphasis on what I have called 'concentration' come through. (See my note on an interview with Tim O'Reilly by David Weinberger, on which I draw above, and also see O'Reilly blog posts here and here.)
Now, of course 'concentration' and 'diffusion' are often complementary approaches. The major Internet hubs 'diffuse' their benefits through service and data syndication, APIs, participation, etc, but their value often derives from successfully driving network effects through wide participation and consolidation of data. In fact, many of the 'diffusion' techniques work best when associated with concentrating applications. Think of tagging for example. People have incentives to tag their resources in Flickr or Librarything in ways that may not obtain in the library catalog. Scale matters in the context of the social value created in these services (of course, in these examples, folks are also tagging their own resources). You cannot simply add social networking to a site and expect it to work well. Think of all those empty forums.
Much of the library discussion of Web 2.0 is about 'diffusion', about a set of techniques for richer interaction. It is appropriate that libraries should offer an experience that is continuous with how people experience the web.
However, there is a very important way in which the library experience is not continuous with the web. It remains fragmented: it does not have the characteristics of the concentrating, gravitational hubs which characterize so much web use, and are so much a part of O'Reilly's Web 2.0. Fragmented by database boundary, by service boundary (e.g. connecting a discovery experience gracefully to a fulfillment experience through resolution), by library boundary. We are now familiar with the comparison between this fragmented experience and discovery on the web. And we are also familiar with discussion of how the library presence is weakly represented in the major network presences.
However, think also of the library management environment. Think for example of places where data needs to be concentrated to create value: aggregating user data across sites (e.g. counter data), or aggregating user created data (tags, reviews), or aggregating transactions (e.g. circulations, resolver clickthroughs). Motivations here are to drive business intelligence which allows services to be refined (e.g. how does my database usage compare to that of my peer group), to develop targeted services (people who like this, also liked that), to improve local services (e.g. add tags or reviews). These are examples where scale matters, where data may need to be concentrated above the individual library level.
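To make the peer-group question concrete, here is a minimal Python sketch. The figures, library names and the `compare_to_peers` helper are hypothetical, and real usage reports would need normalizing (by enrolment or collection size, for example) before a comparison meant very much. What the sketch does show is that the question cannot be answered from one library's data alone: somebody has to hold the counts for the whole group.

```python
from statistics import median

# Hypothetical monthly full-text request counts for one database,
# gathered from several libraries into a single shared table.
usage = {
    "Library A": 1200,
    "Library B": 450,
    "Library C": 900,
    "Library D": 300,
}


def compare_to_peers(usage_by_library, library):
    """Return a library's own count and the median count of its peers."""
    peers = [count for name, count in usage_by_library.items() if name != library]
    return usage_by_library[library], median(peers)


mine, peer_median = compare_to_peers(usage, "Library B")
print(f"Library B: {mine} requests; peer median: {peer_median}")
```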
And we are seeing for-fee services emerge which address this need. LibraryThing, for example, syndicates its user-generated tagging to libraries. I am not sure that ScholarlyStats provides a service which compares usage across libraries; it would be interesting to know if there were demand for such a thing.
This then touches on larger questions about sourcing decisions (in what combination of local, collaborative, and third party do libraries acquire their service capacities) and about concentration of library presence (in what combination of library or library and third party are services offered).
For example, I discussed Georgia Pines and OhioLink the other day as examples of groups of libraries collaboratively sourcing a concentrated library presence which increases their gravitational pull.
And libraries are beginning to think more seriously about sourcing services with central web presences. Think for example of the decisions made by the National Library of Australia and the Library of Congress when they chose to use Flickr for significant image projects. NLA is seeking to expand the coverage of PictureAustralia; LC is seeking to collect tags from viewers. In each case, the library wants to benefit from the concentration of users and data that Flickr has created on the web. And to suggest another example, Andy Powell has been raising some intriguing questions about how repository services should be sourced in ways that, again, map onto people's experience of the web: would a consolidated network level service be more motivating than a series of institutional presences? (see here and here). Social networking or other services, he suggests, might flourish at this network level in ways that are not feasible at the institutional level.
When we discuss Web 2.0, there is a temptation to think about blogs and wikis, RSS and a Facebook application, and to stop there. There is also some useful thinking about how to expose web services or data in ways that they can be remixed into other applications. However, Web 2.0 is also about concentration, concentration of data, of users and of communications. We need also to think about how libraries reconfigure services in an environment of network level gravitational hubs, driven by network effects. This will involve greater concentration of library resources in various ways, and also - probably? - greater reliance on other web presences to deliver their services.
January 20, 2008 • Categories: Featured, General - systems and technologies, Libraries - systems and technologies, Libraries - organization and services, OCLC
Just as I began to see messages about the publication of Marshall Breeding's report on his survey of library perceptions of their system vendor, I was reading The new economics of the BI market by Jerry Held on The Database Column blog.
He talks about consolidation within the BI (Business Intelligence) market: "After more than a dozen acquisitions made by Business Objects, Cognos, and Hyperion over the past few years, these BI tools/analytics industry leaders were themselves snapped up in a matter of months by SAP, IBM, and Oracle respectively." And he notes the earlier consolidation of the underlying database industry around Oracle, IBM and Microsoft.
Held argues that consolidation has improved the overall BI marketplace. It delivers - he suggests - economies of scale and economies of innovation (and, although he does not mention it by name, economies of scope). These 'mega-vendors' offer a range of products. For some customers, the ability to concentrate interaction with a single vendor, a single helpdesk, and a single contract, and to benefit from discounts, are important benefits. For vendors, it should be possible to remove redundant costs in administration and distribution. Competition between a small number of dominant players is good for the market.
He suggests, however, that the mega-vendors find it difficult to innovate or meet new needs; they have a very full array of products spread over a large customer base. This means that there will always be investment available to new entrants who innovate around technology or business models to meet evolving needs.
He points to open source and SaaS (software as a service) as two important business model innovations. He also provides some technology innovation examples, emphasizing performance and price improvements.
Does this map onto the process automation providers within the library community? Here are some thoughts, focusing on the US environment. (And, full disclosure, OCLC has some offerings in some of the areas I discuss below.)
There has definitely been consolidation within the classic ILS environment. This is good in principle, as the library market - not very big to begin with - has been overpopulated with vendors trying to provide a full range of products. In practice, of course, much depends on how the remaining vendors work through integration issues. We can see some potential economies of scope (as diversifying library needs can be met from a single source) and scale (as development, support and R&D are consolidated).
However, none of these vendors is very large, they operate in a small community, and they have limited organic growth opportunities in their historic core. They have moved to meet diversifying library needs with additional products. Accordingly, we have seen that process automation for the 'bought/physical collection' (the ILS) has been joined by process automation for the 'licensed collection' (metasearch, resolution, knowledge base, ERM), and the 'digital collection' (repositories). Other products have also appeared to meet more specific needs (self-service, e-reserves, ...). Recently, a new category of discovery system has emerged which pulls together institutional data (from the ILS and from repositories), and several products have appeared. Now, each vendor has a significant development challenge in creating this full array of products, and we have seen some licensing of other components (support for metasearch or knowledge base, for example). Interestingly, we have not seen these companies acquire new entrants who are also developing these newer products (more on these below).
And, although we have seen some libraries acquire pieces from different vendors, this is not as widespread as one might expect for some of the reasons suggested above. There are economies in dealing with as few vendors as possible. In addition, the library community has quite a personalized relationship with its ILS vendor community which adds to the incentives to acquire various components from the same vendor.
Marshall suggests that 'dissatisfaction and concern prevail' in this marketplace. I think we can expect further consolidation, as the number of vendors here reduces to two or three, maybe with particular specialties.
What about innovation? There is some concern that there has been little innovation in the classic ILS space, which matches Held's observation. That said, we can point to Ex Libris's collaboration with Herbert Van de Sompel around the deployment of resolution as a service as a notable instance, or experimentation with ERM. It is not surprising that as new areas have been identified we have seen a range of new entrants, sometimes emerging from within the library or academic community. See for example Serials Solutions, which aims to provide a complete approach to licensed collections. The metasearch and resolution arena has seen several companies emerge, some of whom syndicate services to other players. See for example Muse Global, Openly Informatics (now part of OCLC), WebFeat or TDnet. And more recently, as we have seen attention to better discovery environments, Aquabrowser is being deployed by some libraries.
One area where innovation has been slow is in how the library systems apparatus engages with the tools that people are increasingly using to organize their own information spaces, at the browser level, or in social bookmarking, social networking, and other network-level sites.
Business model innovation? Held mentions Open Source and SaaS (Software as a Service). We have seen two major areas of open source development. The first is in the area of repositories, where we see Fedora, DSpace, and EPrints. The effort involved in deployment here may be high. Each initiative has gone through some organizational development, looking for ways to sustain itself, and the role of grant/foundation money has been important. The second is in the ILS arena, where Koha and Evergreen are receiving a lot of attention. Koha is more widely deployed; there have been some recent high-profile commitments to Evergreen. There are also some other areas where open source solutions are in use: metasearch (e.g. Index Data, LibraryFind), text searching (e.g. Lucene, Index Data), and a recent interest in 'next generation catalog' solutions (e.g. Solr, VuFind). Index Data has been active for a while, with a strong niche presence in Z39.50 applications and text searching and metasearch offerings. One interesting development is the emerging support industry here, where Care Affiliates, Index Data, Equinox and LibLime will offer support and consultancy. It will be interesting to see how this range of activity develops in coming years. In part it will probably depend on the ability of this nascent support industry to meet mainstream library requirements for support and reliability; and in part of course on the ability to continue to develop the software.
And what about SaaS? SaaS tends to be used quite loosely. Think simply of three levels. The first is where individual instances of an application are hosted. This may save the library some costs (hardware, sysadmin) but does not really alter the service model in other ways. A second is a 'multi-tenancy' model where multiple customers may be served from the same instance, but each with their own virtual application, potentially with configuration options. This may deliver savings but there may also be service improvements. Enhancements, fixes, etc, are available to all at the same time. Serials Solutions' services might be an example here. The third level becomes more interesting where shared use of a service generates network effects. Take a hypothetical example: a supplier could more easily develop recommender systems across multiple circulation systems. An actual example appears to be provided by Aquabrowser's announcement of its MyDiscoveries feature which aims to share user contributions to the catalog across customer instances. The SaaS model has been rapidly adopted in wider contexts, and while there has been some library adoption, it is interesting that there is not a high level of discussion of the approach.
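The hypothetical recommender mentioned above is easy to sketch once loans from several systems sit in one place. The Python below is an illustration only: the record layout, the `also_borrowed` function and the sample loans are assumptions made for the example, not a description of any vendor's service. It counts co-borrowing across pooled records, treating a patron id as meaningful only within its own library.

```python
from collections import Counter, defaultdict

# Hypothetical circulation records pooled from two library systems:
# (library, patron id, item). Patron ids are only unique within a library.
loans = [
    ("north", "p1", "dune"),
    ("north", "p1", "neuromancer"),
    ("north", "p2", "dune"),
    ("north", "p2", "emma"),
    ("south", "p1", "dune"),
    ("south", "p1", "neuromancer"),
    ("south", "p3", "emma"),
]


def also_borrowed(records, item, top_n=3):
    """Rank the items most often borrowed by people who borrowed `item`."""
    # Collect each borrower's set of items, keyed by library and patron.
    baskets = defaultdict(set)
    for library, patron, borrowed in records:
        baskets[(library, patron)].add(borrowed)

    # Count the other items in every basket that contains the target item.
    counts = Counter()
    for items in baskets.values():
        if item in items:
            counts.update(items - {item})
    return counts.most_common(top_n)


recommendations = also_borrowed(loans, "dune")
print(recommendations)
```

With only the "north" records the two companion titles would tie at one loan each; the ranking emerges only from the pooled data, which is the network effect the third level of SaaS is meant to capture.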
Marshall writes:
The year 2007 saw considerable upheaval in the library automation industry. To get some sense of the aftermath of the recent rounds of mergers, acquisitions, product consolidations, and to gauge interest in open source automation systems, I created and executed a survey that aims to measure the prevailing perceptions in libraries. [Perceptions 2007: an International Survey of Library Automation]
What is interesting to me is the extent to which the ecology of library process automation is richer than it was a few years ago. If we think of managing three materials workflows (bought/print, licensed/electronic, digitized/digital), and the progressive movement of libraries into the latter two, then we see that library needs are now potentially met by a wide number of players. The classic ILS vendors remain central players, but they have been joined by others.
The ILS vendors have products in all three areas, and are developing new discovery products. We have seen new entrants in the repository space (including ContentDM, now owned by OCLC) and in the licensed materials space (resolver, knowledge base, metasearch, ERM) where a variety of products are available from a range of vendors. In this context, the collection of services within the Cambridge Information Group is interesting (Serials Solutions, RefWorks, Illumina, Aquabrowser as well as other bibliographic products). And, of course, OCLC provides services also. Open source offerings have emerged to meet needs across the board.
We will definitely see more convergence alongside further new entrants. It will be interesting to see how the Open Source offerings develop, and I think that we will see some game-changing offerings in the SaaS space.
I hope Marshall repeats the survey. It would be interesting to extend its scope - if that can be done without too much loss of focus - to consider more of the wider process automation landscape.
Pointer to MyDiscoveries via Meredith Farkas.
January 06, 2008 • Categories: Books, movies and reading ..., Featured
The much-discussed, and somewhat contested, NEA report on reading came out at around the same time as The Uncommon Reader, a fictional account by Alan Bennett of the late discovery of reading by the Queen (of England). The conjunction was discussed in the New York Times:
PERHAPS the most fantastical story of the year was not “Harry Potter and the Deathly Hallows,” but “The Uncommon Reader,” a novella by Alan Bennett that imagines the queen of England suddenly becoming a voracious reader late in life. [A Good Mystery: Why We Read - New York Times]
'Fantastical', the author, Motoko Rich, suggests, because: 'At a time when books appear to be waging a Sisyphean battle against the forces of MySpace, YouTube and “American Idol,” the notion that someone could move so quickly from literary indifference to devouring passion seems, sadly, far-fetched.'
“The Uncommon Reader” posits the theory that the right book at the right time can ignite a lifelong habit. (For the fictional queen, it’s Nancy Mitford’s “Pursuit of Love.”) This is a romantic ideal that persists among many a bibliophile. [A Good Mystery: Why We Read - New York Times]
This same tone is evident in the Financial Times review:
His storytelling, though, is rather less magical. By taking us into the workings of minds other than our own, Bennett argues, reading makes better people of us. This is a quaintly old-fashioned view of literature that one might find comforting had history not so comprehensively rubbished it. [FT.com / Books / Fiction - The Uncommon Reader]
I read the book when it came out and was a little puzzled by some of the emphasis of these and other reviewers. While the book does indeed celebrate the power of reading to transform the Queen's life, its main message for me was somewhat different. It is a discussion of how little of this 'literary' reading there actually is. So, I reread it over the holiday. It is a quick read ...
The Queen discovers a City of Westminster mobile library outside the kitchen doors of the palace and borrows a book. This triggers a sustained late-life reading wave. She reads quickly, passionately and in ever-increasing circles (her initial choices are guided by Hutchings, who worked in the kitchen and was in the mobile library when she came across it; he suggests books by gay authors). She soon comes to regret the many wasted years where she did not read; she is mortified when she thinks of all the authors she has met without any insight into what they wrote. And yes, the author connects her progressively more discriminating reading tastes with a general refinement of sensibilities. She becomes concerned, for example, with the bad impression she makes on a maid, something that before she would not have noticed. She wonders why, and the narrative voice suggests that she is yet to connect this "access of consideration" with her reading. She talks of books opening up "other lives" and "igniting the imagination". She rebukes her Private Secretary who wondered had she not been briefed about the authors she met: "Briefing closes down a subject, reading opens it up".
But what comes across more strongly than this personal refinement is that her new interest does not extend the range of her personal connections with others. She does not find the world hospitable to readers. Indeed, her reading becomes a barrier to engagement, not a bridge built on new shared reading interests. Sir Kevin is concerned that while not quite "elitist", reading tends to "exclude" and sends out a bad message. Not many people actually read, he suggests. He further suggests that reading is selfish, a "withdrawal", that it makes "oneself less available", and is "solipsistic". She makes people she meets uncomfortable as she asks about what they are reading; if they cannot come up with any current reading materials, she offers them whatever she has in her bag. Her staff worry about this, as "most people, poor dears, aren't reading anything". Those that receive her books in this way, they reckon, sell them on eBay. Her equerries come up with some suggestions of titles for those that otherwise might be at a loss when the queen asks about their reading. "Though this meant that the Queen came away with a disproportionate notion of the popularity of Andy McNab and the near universal affection for Joanna Trollope, no matter; at least embarrassment had been avoided." Her family approves of books, so long as they don't have to read them; they wish she did not quiz them about their reading habits or check that they had read books she had given them. As her behavior continues to change and she devotes more time to reading, she becomes somewhat perfunctory in the performance of her duties. Her staff fear the worst: "the dawn of sensibility was mistaken for the onset of senility".
This is all conveyed in a gently satirical tone, although there are some broad swipes at the business language of Sir Kevin, and at East Anglia, New Zealand and Canada! The treatment gets sharper when other figures of authority are involved. In the opening pages, she discombobulates the president of France by wanting to talk about Genet. He is unbriefed and so unprepared. She rings the Archbishop of Canterbury wanting to talk about reading in church services; after their conversation he returns to watching Strictly Come Dancing on the TV. There are some very barbed swipes at the Prime Minister (not named, but presumably Tony Blair). He did not "wholly believe in the past or in any lessons that might be learned from it". When the queen begins giving the Prime Minister books, an unequivocal message comes back through 'channels': "yes. Lending him books to read. That's out of order". Towards the end of the book, the Queen begins to think about writing a book herself, something more 'radical' and 'challenging' she tells the Prime Minister. He is not worried, as 'radical' and 'challenging' are both words that trip off his tongue: they are bleached of any meaning for him.
So, she experiences an awakening through reading, and wants to share her discoveries and pleasures with those around her. However, she runs into incomprehension, opposition and distaste.
The book's title plays on The Common Reader, and on the fact that the Queen is not a 'commoner' like her subjects. However, as I read the book, I increasingly heard something else. The Queen is also 'uncommon' because she is a reader, unlike all the others she comes in contact with. Reading turns out not to be common, in the sense in which reading is being used here; there are no common readers. Rather than being a celebration of the redemptive power of literature, this is an elegy on its demise, or at least on the demise of a particular type of reading as a common pursuit. It may be more appropriate to point to what The Uncommon Reader shares with the NEA report, than to offer it as a contrast.
Note: updated for style.
November 20, 2007 • Categories: Featured, Metadata, OCLC
Updated: 11/21/07
I have spoken about library logistics before.
Logistics is about moving information, materials and services through a network cost-effectively. Resource sharing is supported by a library logistics apparatus. The emerging e-resource discovery to delivery chain, tied together with resolution services, is a logistics challenge. Many of the e-resource management issues are like supply-chain management issues. [Lorcan Dempsey's weblog: Library logistics]
It seems to me that recent developments highlight the logistics theme. Think of the systemwide inventory management questions that are beginning to arise in relation to off site storage and mass digitization. Or the issues that arise when we connect multiple discovery environments to backend library - or other - fulfillment options.
I like the UPS slogan about synchronizing commerce. It reminds us of the central role of data in logistics and of the need for integrity of data along supply chains or other processes. I was reminded of this while reading Michael Cairns' interesting post about Booknet Canada and the Global Data Synchronization Network.
Industries other than publishing also battle data reliability and timeliness and, over the years, led by umbrella groups such as UCC and EAN (now combined into one organization named GS1), they have developed programs to embrace supply chain efficiency and its co-relation, data integrity. Data Synchronisation (GDSN) is such a program which I have noted a few times in the past (Post). The objective of the GDSN is to ensure that all trading partners are working with the same set of product details that are simultaneously synchronized at a network level and in transaction details such as purchase orders and shipping details. The benefits of synchronised data can extend from 'simple' efficiency improvements in the ordering and receipt process to higher effectiveness in marketing and promotions programs. [PersonaNonData: Five Questions on Global Data Synchronization]
Michael interviews Michael Tamblyn, President of Booknet Canada, which is offering services based on GDSN. Among the advantages he suggests are:
Then there is the more forward-looking work: collaborative sales data mining for independents, backlist optimization and forecasting research, industry cost analysis on returns, digital publishing trends, our annual Technology Forum. And on it goes. [PersonaNonData: Five Questions on Global Data Synchronization]
There is a temptation in library discussions to focus on discovery and end-user issues when thinking of bibliographic data. However, bibliographic data is increasingly important to efficient library operations more generally. Think of the blurring of circulation and resource sharing in consortial arrangements, the issues of managing and tracking print collections in the context of the mass digitization and off site storage initiatives, connections between external discovery environments and library systems, resolution and the management of knowledge bases, and so on. Systemwide data synchronization and data integrity issues are becoming more central. Increasingly we recognize that efficient management of resources imposes data needs.
Some examples: What books have been digitized by Google, etc? Is an available-for-use digitized copy of this book available more easily than getting it in 3 days on ILL? How would last copies be registered and curated within a systemwide framework (Ohio, for example, or the UK, or ...)? Can I let a user make an optimum request based on price/speed of delivery balance? Can I do recommender systems across aggregate circulation data, or aggregate resolution data? Can I develop core collection recommendations based on aggregate holdings data? Can I make selection decisions based on a view of what my regional partners are selecting? Can I begin to do some modelling of collections based on the aggregate holdings of off site storage facilities? Can I receive collection development recommendations based on my users' use of Google Scholar? Can I be assured that my users will be linked correctly - and as seamlessly as possible - into my collections from Google, or Worldcat.org, or a growing range of other potential discovery venues? Can I make collection development decisions based on aggregate Counter data?
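The price/speed question can be made a little more concrete with a small sketch. Everything here is an assumption for illustration: the option names, prices and delivery times are invented, and the simple weighted sum is just one possible way of expressing a user's balance between cost and speed, not a description of any existing resource sharing system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FulfillmentOption:
    """One hypothetical way of getting an item to a user."""
    name: str
    price: float  # cost in arbitrary currency units
    days: float   # expected delivery time in days


def best_option(options, price_weight=1.0, day_weight=1.0):
    """Return the option with the lowest weighted sum of price and delay."""
    if not options:
        raise ValueError("no fulfillment options supplied")
    return min(
        options,
        key=lambda o: price_weight * o.price + day_weight * o.days,
    )


# Illustrative data only.
options = [
    FulfillmentOption("consortial loan", price=5.0, days=3.0),
    FulfillmentOption("purchase", price=25.0, days=1.0),
    FulfillmentOption("interlibrary loan", price=15.0, days=7.0),
]

cheap_and_patient = best_option(options)                  # equal weights
in_a_hurry = best_option(options, day_weight=20.0)        # each day of delay costs a lot
```

With equal weights the consortial loan is chosen; when each day of delay is weighted heavily, the purchase option is chosen instead. The point is only that such a choice depends on reliable, synchronized data about price and delivery time for each option.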
There is an earlier discussion of some similar data issues by my colleagues and me in a Library Journal article: Making data work harder.
October 28, 2007
•
Categories:
Digital asset management
, Featured
, Research, learning and scholarly communication
, Social networking
, The cultural and scholarly record
Here is Grainne Conole, professor of e-learning at the Open University, writing about academic papers, conference papers, and blogging:
Coming back to the question of which represents academic discourse – to my mind it’s all three – in different ways writing a paper, giving a presentation and blogging all help me to formulate and take forward my thinking on a particular topic, a means of meaning making and transformation of the raw ‘data’ to new understandings – surely that’s one of the cornerstones of what being an academic means? [e4innovation.com]
And here is how she distinguishes between those modes of academic disclosure:
So the function and nature of the three media seems to be:
- Academic paper: reporting of findings against a particular narrative, grounded in the literature and related work; style – formal, academic-speak
- Conference presentation: awareness raising of the work, posing questions and issues about the work, style – entertaining, visual, informal
- Blogging – snippets of the work, reflecting on particular issues, style – short, informal, reflective
[e4innovation.com]
Here is Dani Rodrik, a Professor of International Political Economy at Harvard, commenting on an earlier post of his where he queried whether the high opportunity costs of blogging (think of all those other things that could get done if you did not use the time blogging!) would drive out high quality economics blogs. No, he concludes:
And second, in my trip to Nottingham I was simply stunned by how many people reported reading my blog. Not only that, people actually remembered my posts--some going quite a while back. With this kind of positive feedback, along with others like this, it is hard to imagine closing the operation down. Not so incidentally, one of the unexpected scholarly benefits of having a blog is that it is like keeping an intellectual journal. You get an idea, you jot it down in your blog. Some months later, you vaguely remember having had the idea and you google your own blog to recover it. I am not kidding: I google my own blog all the time... And here is the evidence: the first third of my talk at Nottingham was based on a couple of blog posts from a few weeks back (this and this). So maybe that someone also over-stated the bit about opportunity costs...[Dani Rodrik's weblog]
It is interesting to see them both discuss blogging as an integral part of their academic lives. Their blogging is an important record of their thinking about the academic problems they address, and an indication of their academic networks.
I regularly look at the blogs of several folks from the Open University: Tony Hirst's, John Naughton's, and now Grainne's (with whom I used to interact years ago when she was director of ILRT and I of UKOLN). I will occasionally land on Martin Weller's and am peripherally aware of Marc Eisenstadt's.
Ever since my (economist) colleague Brian Lavoie introduced me to Greg Mankiw's blog, I have intermittently followed it, as well as Rodrik's. They occasionally refer to the blog of their colleague George Borjas, another Harvard economics professor. Of course there are some pretty high profile economics blogs, including blogs from the Freakonomics authors and, recently, Paul Krugman, both hosted by the New York Times. And there is the prolific Gary Becker, Nobel prize winning economist, at the Becker-Posner blog. I have found Mankiw and Rodrik interesting because of the general mix of light material, commentary on their own and their colleagues' work, and their high-level and engaged policy perspectives. The general nature of the blog discourse, to borrow Grainne's word, in that community is absorbing to watch.
Rodrik notes that his blog material appears to have enduring appeal for colleagues. Indeed, the intrinsic interest of the blog output of both the Open University and the Harvard bloggers, and its relation to their academic work, and their broader communities of interest, means that this is probably more generally true.
The blogging platforms used by these people vary. Sometimes they may be institutionally based; more often they will be on one of the main blog hosting sites. While the blogs may be of enduring interest, little thought has probably been given to their longer term persistence.
Which brings me to my question. Universities and university libraries are recognizing that they have some responsibility to the curation of the intellectual outputs of their academics and students. So far, this has not generally extended to thinking about blogs. What, if anything, should the Open University or Harvard be doing to make sure that this valuable discourse is available to future readers as part of the scholarly record?
July 06, 2007
•
Categories:
Featured
, Learning and research - distributed environments
, Libraries - systems and technologies
, Libraries - distributed environments
, Libraries - organization and services
, User experience
One of the main issues facing libraries as they work to create richer user services is the complexity of their systems environment. Consider these pictures which I have been using in presentations for a while now.

Reductively, we can think of three classes of systems - (1) the classic ILS focused on 'bought' materials, (2) the emerging systems framework around licensed collections, and (3) potentially several repository systems for 'digital' resources. Of course, there are other pieces but I will focus on these.
In each case what we see is a backend apparatus for managing collections, each with its own workflow, systems and organizational support. And each with its own - different - front-end presentation and discovery mechanisms. What this means is that the front-end presentation mirrors the organizational development over time of the library backend systems, rather than the expectations or behaviors of the users.
You have the catalog here, maybe several options for licensed resources (a-to-z, metasearch, web pages of databases, and so on) over there, and potentially several repository interfaces (local digitized materials, institutional repository) somewhere else.
This is one reason that people have difficulties with the library website. Effectively, it is a layer stretched over a set of systems and services which were not designed as a unit. Indeed, in some cases, they were not originally designed to work on the web at all. So what do we have?
ILS: a management system for inventory control of the 'bought' collection (books, DVDs, etc). The catalog is bolted onto this and gives a view onto this part of the collection. In effect, in virtue of its integration with inventory management, the catalog provides discovery (what is in the collection), location (where those things are) and request (get me those things) in a tightly integrated way. The ILS and catalog may be part of a wider apparatus of provision, and may have mechanisms for interfacing to resource sharing systems of one sort or another. The management side may have interfaces to a variety of other systems for sharing and communicating data: procurement, finance, student records. And there will be a flow of data into the system, from jobbers, as part of a shared cataloging environment, and so on.
Licensed: This has been an area of rapid recent development as the journal literature moved to electronic form. On the backend we now see a variety of approaches, and the frontend can be very confusing with lists of databases and journals presented in various ways, often in uncertain relation to the catalog (where do I look for something?). We are now seeing the emergence here of an agreed set of systems around knowledge-base, ERM, resolution and metasearch, and there is rapidly developing vendor support. This is the range of approaches for which Serials Solutions has proposed the ERAMS name. These systems require the management of new kinds of data, and mechanisms are being put in place, certainly not yet optimal, for the creation, propagation and sharing of this data. With journals data, discovery, location and request are not so tightly coupled as they were with the catalog. Discovery has happened in one set of tools (A&I databases), but then the appropriate title may have to be located in another tool (the catalog for example) and, if not available locally, requested through yet another system. The importance of the resolver, and the enabling OpenURL, has been to tie some of these things together and remove some of the human labor of making connections between these systems. And metasearch has been seen as a way of reducing human labor by providing a unified discovery experience over disparate databases. However, this whole apparatus is still not as well-seamed as it needs to be, and users and managers still do more work than they should to make it all work.
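To show what the OpenURL contributes, here is a minimal sketch of how a discovery tool might package a citation as an OpenURL 1.0 (Z39.88-2004) key/encoded-value link for a library's resolver. The resolver address and the citation values are invented for illustration, and the helper names are my own; a real deployment would use the library's configured resolver and whatever metadata the source database supplies.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical resolver address; each library would configure its own.
RESOLVER_BASE = "https://resolver.example.edu/openurl"


def journal_article_openurl(issn, volume, start_page, year, article_title=None):
    """Build an OpenURL 1.0 key/encoded-value link for a journal article."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.issn": issn,
        "rft.volume": volume,
        "rft.spage": start_page,
        "rft.date": year,
    }
    if article_title:
        params["rft.atitle"] = article_title
    return f"{RESOLVER_BASE}?{urlencode(params)}"


def openurl_fields(url):
    """Decode the citation keys again, roughly as a resolver would see them."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


# Invented citation data, for illustration only.
link = journal_article_openurl("1234-5679", "12", "45", "2007")
fields = openurl_fields(link)
```

The resolver takes these fields, checks them against the knowledge base, and routes the user to an appropriate copy. That is the human labor the text describes being removed.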
Repository: Libraries are increasingly managing digital materials locally and supporting repository frameworks for those. This includes digitized special collections, research and learning materials in institutional repositories, web archives, and so on. There are a variety of repository solutions available, some open source. Typically, the contents of the repository backend may be available to repository front-ends on a per-repository basis. Here, discovery (what is there), location (where is it) and request and delivery are typically tightly integrated. Repositories may also have interfaces for harvesting or remote query. On the management side, metadata creation and material preparation may still be labor-intensive.
OK, so here are some general observations about this environment:
- There is still a major focus - in terms of attention, organizational structures, and resource allocation - on the systems and processes around the ILS and the bought collection. In academic libraries, we will surely see some of this move towards the systems and processes around the licensed collections given the rising relative importance of this part of the collection. The repository strand of activity, associated with emerging digital library activities, may, in some cases, be supported from grant or other special resources. It will need to become more routine.
- The fragmentation of this systems activity, the multiple vendor sources, the different workflows and data management processes, and the absence of agreed simple links between things mean that the overall cost of management is high. There is also another cost: diminished impact and lost opportunity. The awkward disjointedness described above also means that it is difficult to mobilize the consolidated library resource into other environments, course management or social networking systems for example. It is difficult to flexibly put what is wanted where it is wanted.
- There has been much discussion of library interoperability, but it has tended to be about how to tie together these individual pieces, or about tying pieces to other environments (how do I get my repository harvested for example). There has been less focus on how you might abstract the full library experience for consumption by other applications - a campus portal for example.
This in turn means several things.
- We will see more hosted and shared solutions emerge, which offer to reduce local cost of ownership. And, of course, we are seeing vendors consider more integration between products. In particular it is interesting seeing the concentration on support for the licensed e-resources emerge strongly, as well as discussion about integrated discovery environments.
- Over time, we can expect to see some more reconfiguration in a network environment. Shared cataloging and externalizing the journal literature have been two significant reconfigurations in the past. The pace of current developments suggests that we may be ready for other ways of collaboratively sourcing shared operations. For example, does it make sense for there to be library by library solutions for preservation, social networking, disclosure to search and social networking engines, and so on?
The next picture tries to capture an important direction that has emerged in the last year or so.

For many of the reasons identified above, we are seeing a growing interest in separating the discovery and presentation front end from the management backend across this range of systems. Why? Well, because it is becoming clearer, as I suggested in my opening, that legacy system boundaries do not map effectively onto user preferences. And because fragmentation adds to effort and accordingly diminishes impact.
What about the discovery side? So, we saw metasearch, a partial response to fragmentation of A&I databases. We are now seeing a new generation of products from the 'ILS vendors' which look at unifying access to the library collection: Encore, Primo, Enterprise Portal Solution. However, discovery has also moved to the network level. So, folks discover resources in Amazon, Google, Google Scholar. And OCLC is working to create discovery experiences which connect local and network through Worldcat Local, Worldcat.org and Open Worldcat.
And on the management side? Here the variety of workflows and systems adds cost, as resources are managed on a per-format basis. We can expect to see simplification and rationalization in coming years as libraries cannot sustain expensive diversity of management systems. The National Library of Australia's discussion of a 'single business' systems environment, or Ex Libris's discussion of Uniform Resource Management are relevant here. It is likely that there will be a growing investment in collaboratively sourced solutions, as libraries seek to share the costs of development and deployment.
As discovery peels off, then the issue of connecting discovery environments back to resources themselves becomes very important. It is interesting to look at Google Scholar in this regard, as different approaches are required for the three categories identified above. It has worked with OCLC and other union catalogs to connect users through to catalogs and the ILS; it has worked with resolver data to connect users through to licensed materials; and it has crawled repositories and links directly to digital content.
Given this great divide, several issues become very important:
- Routing, resolution and registries become critical, as one wants to enable users to move easily from a variety of discovery environments to resources they are authorized to use. We need a richer apparatus to support this. (I have discussed the role of registries elsewhere.)
- Libraries have thought about discovery. There is now a switch of emphasis to disclosure: libraries need to think about how their resources are best represented in discovery environments which they don't manage. (I have also discussed disclosure in more detail elsewhere in these pages.)
- And again, how we present library services for consumption by other environments becomes an issue. For example, we are lacking an ILS Service Layer, an agreed way of presenting the functionality of the ILS so that it can be placed, say, in another discovery environment (shelf status, place a hold, etc).
- Better discovery puts more pressure on delivery, whether from a local collection, throughout a consortium, or in broader resource sharing or purchase options. Streamlining the logistics of delivery and providing transparency on status at any stage for the user (as they can do with UPS or Amazon) become more important.
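As a way of picturing the missing ILS Service Layer mentioned above, here is a minimal sketch of what an agreed interface might look like from the point of view of an external discovery environment. The interface, method names and toy backend are all hypothetical; no such agreed layer is described here, and a real one would need to cover authentication, errors and much more.

```python
from dataclasses import dataclass, field
from typing import Protocol


class ILSServiceLayer(Protocol):
    """Hypothetical minimal interface a discovery environment could call."""

    def shelf_status(self, item_id: str) -> str: ...

    def place_hold(self, item_id: str, patron_id: str) -> int: ...


@dataclass
class InMemoryILS:
    """Toy backend standing in for a real ILS; the data is illustrative only."""
    status: dict = field(default_factory=dict)  # item_id -> e.g. "available", "on loan"
    holds: dict = field(default_factory=dict)   # item_id -> list of patron ids

    def shelf_status(self, item_id: str) -> str:
        return self.status.get(item_id, "unknown")

    def place_hold(self, item_id: str, patron_id: str) -> int:
        """Queue a hold and return the patron's position in the queue."""
        if item_id not in self.status:
            raise KeyError(f"unknown item: {item_id}")
        queue = self.holds.setdefault(item_id, [])
        if patron_id not in queue:
            queue.append(patron_id)
        return queue.index(patron_id) + 1


def availability_line(ils: ILSServiceLayer, item_id: str) -> str:
    """What an external discovery page might display next to a result."""
    return f"{item_id}: {ils.shelf_status(item_id)}"


ils = InMemoryILS(status={"b1": "on loan"})
line = availability_line(ils, "b1")
```

The discovery environment depends only on the interface, not on which vendor's system sits behind it, which is the separation of front end and backend discussed above.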

And finally ....
We are used to thinking about better integration of library services. But that is a means, not an end. The end is the enhancement of research, learning and personal development. I discussed above how we want resources to be represented in various discovery environments. Increasingly, we want to represent resources in a variety of other workflows. These might be the personal digital environments that we are creating around RSS aggregators, toolbars and so on. Or the prefabricated institutional environments such as the course management system or the campus portal. Or emerging service composition environments like Facebook or iGoogle. As well as in network level discovery environments like Google or Amazon that are so much a part of people's behaviors.
Libraries need to focus more attention on reconfiguring library services for network environments. This is the main reason for streamlining the backend management systems environment. It does not make sense to spend so much time on non-value creating effort.
June 06, 2007
•
Categories:
Books, movies and reading ...
, Digital asset management
, Featured
, Knowledge organization and representation
, Libraries - organization and services
, Metadata
, OCLC
, Search
, ebooks and other e-resources
Today Google and CIC announce an agreement to digitize ten million volumes across the CIC libraries. Google has been adding new partners since the first announcement was made about the Google 5. Some folks have wondered what rationale has governed selection of partner opportunities. We do not know, but they sure are moving fast! Here are some early thoughts.
The CIC announcement is interesting for several reasons:
- It is a shared effort across a major group of libraries with significant collections. There appears to be strong CIC institutional commitment. Of course, CIC has a history of collaboratively sourced activities and this 'pooling' model makes increasing sense given the policy and service challenges that need to be addressed, in this case but also across a range of other issues that libraries face as they support changing research and learning behaviors in a reconfigured network environment. For some things, scale matters.
- The libraries have a shared approach to managing the digital copies based on shared infrastructure at the University of Michigan, and serving them up to their user communities. An example of collaborative sourcing.
- Google recently advertised for somebody to work on collection development and we seem to be seeing a stronger focus in this area. Collecting areas of importance within each library [pdf] have been identified for attention. Presumably, these decisions have also been influenced by the 'collective collection' of the full Google partnership.
This initiative in turn prompts some more general thoughts about access:
- One of the most valuable features of the Google initiative is that it digitizes book content, allowing fine-grained discovery over topics, people, places and so on. Of course this presents interesting questions about indexing, retrieval, ranking, and presentation but the advantage of having this access seems clear. It drives use and sales, and it supports enquiry. Without it, the book literature is less accessible than the web literature.
- However, as we are beginning to see on Google Book Search, we are really going beyond 'retrieval as we have known it' in significant ways. Google is mining its assembled resources - in Scholar, in web pages, in books - to create relationships between items and to identify people and places. So we are seeing related editions pulled together, items associated with reviews, items associated with items to which they refer, and so on. As the mass of material grows and as approaches are refined this service will get better. And it will get better in ways that are very difficult for other parties to emulate.
- Currently this material is made available within the Google destination site. Google is an advertising engine and its approach depends on aggregating attention for adverts. This approach may be difficult to deploy within a more 'data services' approach where others - especially the partners - have remixable access to content and services. However, the 'utility' value of this resource will be diminished if it is not made available in this way so that others can mobilize these resources within their own environments. How and if this gets done remains to be seen. (See the related discussion about the search API.)
- This type of access seems especially important for the partner libraries. In the early days of this activity there was some discussion of the types of services which would be built on top of the digitized books by the libraries. However, it is difficult, and maybe not very sensible, for the libraries to individually invest in some types of service development. An important factor here is that they cannot benefit from the network effects tha
May 20, 2007
•
Categories:
Featured
, Knowledge organization and representation
, Libraries - organization and services
, Metadata
, Social networking
I think it is useful to think of four sources of descriptive metadata in libraries. These are not mutually exclusive, and one of the interesting questions we have to address is how they will be mobilized effectively together.
I don't have good names for these. How about: professional, contributed, programmatically promoted, and intentional?
Professional
The curatorial professions have made major investments in knowledge organization, through the development and application of cataloging rules, controlled vocabularies, authorities, gazetteers, and so on. One of our major challenges is releasing the value that has been created through those approaches in web environments. There is much to think about here, and many folks are thinking about it. Currently, these approaches do not tend to work well across silos, they are not made available as web resources themselves so that they can be part of the connected fabric of the web, they only work with the other approaches I mention in particular projects or services, their 'relating' power is underused, and higher level services based on data mining or statistical analysis are limited. Now, these types of issues are being addressed, but are some way from routine systemwide application. I believe that these approaches will continue, within a reconfigured system, and we need to make that data work harder. My personal view is that the curatorial professions need to invest more in the shared production of resources which identify and describe authors, subjects, places, time periods, and works.
Contributed
A major phenomenon of recent years has been the emergence of many sites which invite, aggregate and mine data contributed by users, and mobilize that data to rank, recommend and relate resources. These include, for example, Flickr, LibraryThing, and Connotea. These services have a different focus, and create real value in the way that they organize resources. They also have value in that they reveal relations between people. Libraries have begun to experiment with these approaches, but individual libraries may not have the scale to iron out local or personal idiosyncrasy or emphasis. This is another area which lends itself to shared attention. There are real advantages to be gained. So, for example, as we digitize photographic and other community collections, we will want to mobilize knowledge about those collections that does not exist within the library. Or, if you think about a service like Worldcat Identities, at some stage we will want to allow those 'identities' themselves to comment, augment, amend. What this means is that we will have to get rather more sophisticated about managing assertions about resources from different sources.
Programmatically promoted
We are handling more digital materials, where it is possible to programmatically identify and promote metadata from resources themselves or groups of resources. We will also do more to mine collections, including collections of metadata, to discern pattern and relations. We are increasingly familiar with clustering, entity identification, automatic classification and other approaches. Look at the home page for books that Google is creating to see a resource created from mining Scholar, Google Book Search, and big Google to deliver a range of related materials.
Intentional
I have used this term to refer to the data that we are collecting about use and usage. Pagerank is based on aggregate linking choices. Amazon recommendations are based on aggregate purchase choices. We use holdings data in ranking algorithms, which aggregates selection choices of libraries. This type of data has emerged as a central factor in the major web presences as they seek to provide useful paths through massive amounts of data.
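As a rough sketch of how holdings data might feed a ranking, the following combines a naive text match with a damped holdings count. The scoring formula, the titles and the holdings figures are all invented for illustration; this is not a description of how any actual ranking algorithm works.

```python
import math


def score(title, holdings, query):
    """Combine term overlap with a log-damped count of holding libraries."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(title.lower().split())
    overlap = sum(term in words for term in terms) / len(terms)
    # log1p damps very large holdings counts so they do not swamp the match.
    return overlap * math.log1p(holdings)


def rank(records, query):
    """Return titles with a non-zero score, best first.

    `records` is a list of (title, holdings_count) pairs.
    """
    scored = [(score(title, count, query), title) for title, count in records]
    return [title for s, title in sorted(scored, reverse=True) if s > 0]


# Illustrative data only.
records = [
    ("Library logistics", 12),
    ("Logistics handbook", 900),
    ("Cooking basics", 5000),
]
ranked = rank(records, "library logistics")
```

Here the widely held title outranks the closer textual match, because the aggregate selection choices of libraries are being treated as a signal of likely usefulness. How much weight that signal should carry is a design decision.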
To repeat, these approaches are not mutually exclusive and will increasingly be deployed alongside each other. For example, authority lists may support programmatic identification of personal or place names in large text resources. The shared interests revealed in social networking applications may be abstracted into a form of intentional data to drive recommendations or 'related work' services. Patterns of association and interaction will develop between tags and subject headings. And so on.
Much of our discussion pits these approaches against each other. This seems like the wrong approach. Clearly there will always be choices about where one invests effort, especially as the network continues to reconfigure what we do, but the starting point should be how we create better services and what approaches support that, and not a 'techeological' position around one or other approach which confuses ideology and technology.
April 16, 2007
•
Categories:
Books, movies and reading ...
, Featured
, Research, learning and scholarly communication
, User experience
, ebooks and other e-resources
The Google digitization of books appears to have caught the public imagination. Recent weeks have seen high profile articles in The Economist [subscription required] and The New Yorker as well as several newspaper pieces (see the links and response on this OUP blog entry for example).
Google Book Search is a major endeavor, and Google have brought an impressive service online with impressive speed. The media stories tend to have different hooks. Inevitably, some pivot on a description of confrontation between publishers and Google; others discuss it in the context of a general digital turn, or the future of the book.
In some of the more reflective discussion I am interested to see a particular strand emerge. And that is the acknowledgement that the book, in its material form, is itself a designed and evolved technology, rather than a permanent or unchanging feature of our experience. Simply, this may involve talking about the 'technology' of the book. Or it may take more elaborate form.
Of course, the material book - its technology, circulation, reception, institutions - is a strong if diffuse field of enquiry. However, now, it is as if the change in perspective brought about by the digital turn has made the technology of the book more popularly visible and discussed just as that, as a particular technology which can be compared to others. And, having become used to talking about the impact on practice and potential of new technologies, we may now use that language to also describe earlier forms and their impact.
Seeing it in this way reinforces an awareness that the book itself, the codex, represents particular technological choices which in turn have influenced how we create and engage with the intellectual and cultural record, and in turn with broader experience and intellectual development.
This, for example, comes from a recent discussion of copyright and book digitization by the writer John Lanchester. Incidentally, it is encouraging to see a piece which is so appreciative of a library and library staff. He talks about the technology of the library and of the book.
The buildings of the Bodleian are so old, and in their golden Cotswold stone so beautiful, that it is easy not to see how insistently modern an institution the library has tended to be. The very beginnings of the collection, in Duke Humfrey's Library above the divinity school, showed how Thomas Bodley's own bibliographic vision had to react to a technological shift. The new collection was built to accommodate the transition from the long-established, tried-and-tested technology of unique handwritten texts to the hot new mass-produced technology of the printed codex: in other words, the book. Duke Humfrey's Library has high stacks of shelves, which the reader can't directly access: the world's first closed stacks. These were designed to accommodate the increasing number of books too small to chain securely to open shelves, and were an important repository of copies from the Stationers' Company. Issues of copyright and of access to information were thus built into the institutional DNA from the start. The very layout of the buildings, with teaching "schools" tucked in the corners of the quadrangle, reflected new ideas about the connection between the library as a repository of information and the university as a place of instruction. [John Lanchester: Who owns what in the digital age? | News | Guardian Unlimited Books]
And after a marvellous description of the technology of delivery from the stacks, he concludes that "it is impossible not to miss the point: a library is a machine for storing and retrieving information". Later in the piece he quotes Richard Ovenden, of the Bodleian:
"The codex was a technological leap. It works very well, has done so for 2,000 years, and still does so - people still find it very easy to use. What digitisation does is to highlight that."
Here is another example, following nicely from the last comment. The origins of the codex were discussed recently in The New York Review of Books by Eamon Duffy, in a review of two books on the role of the book in early church history [available to subscribers or for purchase]. Here is his opening paragraph:
These two books are built on a single perception. Early Christianity was more than a new religion: it brought with it a revolutionary shift in the information technology of the ancient world. That shift was to have implications for the cultural history of the world over the next two millennia at least as momentous as the invention of the Internet seems likely to have for the future. Like Judaism before it and Islam after it, Christianity is often described as "a religion of the Book." The phrase asserts both an abstraction - the centrality of authoritative sacred texts and their interpretation within the three Abrahamic religions - and also a simple concrete fact - the importance of a material object, the book, in the history and practice of all three traditions.
Note how he talks about the book as a shift in information technology, and makes the comparison with the impact of the Internet explicit. In a fascinating piece, he goes on to discuss the practical and political reasons why the codex was favored over the scroll in early church writing. In the context of my point here, consider his later references to technology.
Why should the new religion have adopted this down-market and unfashionable book technology? ... However that may be, until recently surprisingly little has been made of this momentous foundational shift to a new book technology.
I think that this terminology is symptomatic of a positive trend, a recognition that the book itself, while central, influential and marvelously adapted to various uses, is not some natural given. It is another sign that we are moving beyond the reductive opposition between the book and the digital turn.
August 20, 2006
•
Categories:
Featured
, General - distributed environments
, Libraries - systems and technologies
, Libraries - distributed environments
, Metadata
, OCLC
Interaction between systems and services on the network requires intelligence. Intelligence about what is in the environment (search or resolution targets, for example), about how to interact with found entities (addresses or interface specifications, for example), about who is authorized to do what, and so on.
Think of two parallels. One for human users: the yellow pages directory provides intelligence about businesses, the services they provide, and how to access those services (by telephone or at a physical address). One for machines: the DNS allows a domain name to be resolved into network addresses. In each case, a 'network-level' service reduces the need for extensive redundant collection and management of 'intelligence' by each potential user of services within the network.
Think of what the network world would be without the DNS. The burden of data collection and configuration on each organization would be that much greater, and the overall efficiency of the network would be much reduced.
But this is exactly the situation we are in with higher level network services where we have no such directory services. Increasingly, library applications need to know about a variety of entities. We are used to thinking about information objects (books, journals, maps, etc). What about institutions (suppliers, libraries, etc), policies (e.g. ILL policies), licenses, collections (databases, special collections, summary level descriptions of archival collections, and so on), and services (addresses and interface details for machine users, and descriptions for human users)? The absence of appropriate directory services for each of these reduces the efficiency of the network. We have an extensive infrastructure to allow us to discover and use information objects, and we are currently figuring out how that needs to be re-engineered for more effective use in a network world. However, we are very poorly equipped in the other areas. This means that there is a lot of local configuration and redundant effort in making certain applications work.
This is partly a discussion about metadata. We are very familiar with the notion of 'metadata'. I like to think of metadata as data which removes from a user (human or machine) the need to have full advance knowledge of the existence or characteristics of things of potential interest in the environment.
So, a catalog record notes the existence of an item. Additional metadata may provide a location. Additional metadata may say something about terms under which it is available. We are now in an environment where we are really interested in metadata about many more 'things'. A metasearch engine will need metadata about the targets it can interact with, and that metadata will be of various types. This is a form of metadata about 'services'. We want this for humans and for machines. And we need metadata for all of the entities mentioned above. In fact, as our networked environment becomes richer so does the need to provide metadata about the entities in that environment.
However, we also need to think about how metadata is made available in useful ways. It needs to be acted upon in the appropriate domain. At the moment, we have metadata about all the entities I mention above, and others. It is scattered across many systems and services. It may be hardwired into particular applications and not be more generally available. Think of metasearch, for example. Each metasearch application will need to be configured with the data - the intelligence - needed to locate and to connect to available targets.
This is where discussions about directories or registries come in. In many cases there are potential advantages in lifting this configuration data out and consolidating it in shared registries. In this way, each application does not have to know in advance what is available to it and how it should interact with it. Of course, we are very familiar with this principle from the directory examples given above. Think of the phone. We may keep a local list of numbers, but we can derive some of it from directories, and we can always look to several directories where we do not have the required number to hand. A shared directory or registry removes a lot of confusion and redundant effort.
An example of such a registry in our space is the OCLC OpenURL resolver registry, which is beginning to be used in interesting ways. Here is Dan Chudnov's description of the registry:
The OCLC OpenURL Resolver Registry comprises records for roughly 1000 OpenURL resolvers at various institutions, mostly but not solely in North America. It also provides a simple web service that takes an IP address as a parameter and returns zero-to-many resolver records for every resolver that serves users coming from that IP address. [A Clean, Well-Linked 'Base (or, Solving the "Appropriate Resolver" Problem with the OCLC Resolver Registry) | One Big Library.]
And here is what he wants to use it for:
What does that mean? If you're like me, and you work for a small service like the Canary Database, you used to be essentially unable to provide user-appropriate OpenURL linking without having to configure many many ranges of IP addresses after many many conversations with librarians. "Used to be," that is. [A Clean, Well-Linked 'Base (or, Solving the "Appropriate Resolver" Problem with the OCLC Resolver Registry) | One Big Library.]
Dan goes on to describe the approach. You can see it in action by going to the Canary Database. And this is how the functionality is described to users:
The Canary Database now attempts to create links to library full text link servers (known in libraries as “OpenURL resolvers”) for many hundreds of libraries. If you’re using the Canary Database from an academic campus, there’s a good chance you’ll see links from articles in our database back to your own library’s online journals. Follow these links to get to full text just like you would any other time you see the link buttons from your library! [Canary Database Project News » New features: Full text article links]
The registry is used in association with the OpenURL Gateway to connect to appropriate resolvers. Ross Singer also talks about how he is using the registry.
What the registry gives you in each case is the ability to direct visitors from citations on one site to their institutional resolver on another site, where it exists. This is quite nice. The registry is used to determine which service to point a user at, and its use avoids the need for local configuration. The referring site does not have to have advance knowledge of all the places from which it will be visited. And the target site does not have to notify all potential referrers of its existence. From the referring site point of view, this adds value to the site by connecting a 'discovery' service in one place to the appropriate 'location' service in another. From the target site point of view, it means that it can mobilize other people's discovery environments to bring people back to their services.
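To make the hand-off from the referring site concrete, here is a minimal sketch of how a citation might be turned into a link to a visitor's resolver once the resolver's base URL is known. It is an illustration only, not the Canary Database's code: the function name, the example base URL and the citation values are all hypothetical, and the key names follow the key/encoded-value convention of the OpenURL standard (Z39.88-2004).

```python
from urllib.parse import urlencode


def build_openurl(resolver_base_url, citation):
    """Return a link that passes a journal-article citation to a resolver.

    `resolver_base_url` is assumed to have come from a registry lookup;
    `citation` maps referent field names (e.g. "issn") to values.
    """
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    }
    # Fields describing the cited item (the "referent") carry an "rft." prefix.
    for field, value in citation.items():
        params["rft." + field] = value
    # Some resolver base URLs already carry a query string.
    separator = "&" if "?" in resolver_base_url else "?"
    return resolver_base_url + separator + urlencode(params)


# Hypothetical citation and resolver address, for illustration only.
citation = {
    "jtitle": "Example Journal",
    "issn": "1234-5678",
    "volume": "12",
    "spage": "34",
    "date": "2006",
}
link = build_openurl("https://resolver.example.edu/openurl", citation)
print(link)
```

The referring site needs only the base URL: everything else in the link is derived from the citation it already holds, which is why no per-library configuration is required.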
The registry provides the intelligence which makes this happen. In this case it associates IP ranges with the metadata required to access the relevant resolver. My development colleague Phil Norman tells me that the Registry API will also accept an OCLC symbol, and in the future may accept other inputs (a geographic code for example).
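The association between IP ranges and resolver metadata can be sketched in a few lines. This is a toy, in-memory stand-in for the idea, not the OCLC service or its API: the record layout, institution names, address ranges and the `find_resolvers` function are all assumptions made for illustration.

```python
import ipaddress

# Hypothetical registry records: each associates IP ranges with the
# metadata needed to reach one institution's resolver.
REGISTRY = [
    {
        "institution": "Example University",
        "ip_ranges": ["192.0.2.0/24"],
        "base_url": "https://resolver.example.edu/openurl",
    },
    {
        "institution": "Example Consortium",
        "ip_ranges": ["192.0.2.0/23"],
        "base_url": "https://resolver.example.org/openurl",
    },
    {
        "institution": "Example College",
        "ip_ranges": ["198.51.100.0/25", "203.0.113.0/24"],
        "base_url": "https://resolver.example.net/openurl",
    },
]


def find_resolvers(visitor_ip, registry=REGISTRY):
    """Return zero-to-many records whose IP ranges contain `visitor_ip`."""
    address = ipaddress.ip_address(visitor_ip)
    matches = []
    for record in registry:
        networks = (ipaddress.ip_network(r) for r in record["ip_ranges"])
        if any(address in network for network in networks):
            matches.append(record)
    return matches


# A campus address may match more than one record; an off-campus
# address matches none, and the referring site must cope with that.
on_campus = find_resolvers("192.0.2.17")
off_campus = find_resolvers("8.8.8.8")
print([r["institution"] for r in on_campus], off_campus)
```

Accepting another kind of input, such as an institution symbol, would simply be another key into the same set of records.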
This is a new way of working and it is not without its issues, some of which Dan addresses. Of course, an immediate issue is that not all visitors will be from institutions with OpenURL resolvers. Or, related, not all visitors who are from institutions with resolvers will come in from an IP address associated with their institution. And the referring site does not know in advance if the target site will hold a copy of the item. These and other issues present interesting questions about interaction design in a distributed environment where control is passed between systems.
Incidentally, the Registry is also used by Openly's OpenURL Referrer, which "is a Firefox browser extension that can take certain kinds of citations on the web and convert them to direct links to the cited resource in one of your local library's databases". This works with Google Scholar and with COinS-enabled sites.
These are small examples of how one type of registry can add value. Registry services will need to become more common if we are going to have efficient interactions within networks of library providers and consumers.
Update: edited for clarity. Related entries added.
August 08, 2006
•
Categories:
|