Lorcan Dempsey's weblog On libraries, services and networks.   

Knowledge organization and representation

Tagging at the network level

 •  Categories: Digital asset management , Knowledge organization and representation , ebooks and other e-resources

There is a fascinating entry by Seb Chan of the Powerhouse Museum in Sydney documenting experiences one month into their participation in the Commons on Flickr. The Powerhouse Museum has been alert to various ways of combining professional and audience metadata in its services. It was an early comer to the Commons, joining the Library of Congress.

Our experiment with the Commons on Flickr continues and barring a few hours delay we have managed to keep to our promise of 50 new images a week. We’re up to 400 images now with the most recent 50 going live this morning. 158 of these have been geotagged. [fresh + new(er) » Blog Archive » Commons on Flickr - one month later]

A couple of things struck me about his note. First, the volume of activity:

... our images have been viewed 39,685 times to yesterday. That’s more than an entire year on the old Tyrrell website (which, incidentally, has more images and is better indexed by Google) [fresh + new(er) » Blog Archive » Commons on Flickr - one month later]

And second, he talks about the volume and quality of tagging activity:

Tonnes of tags have been added and they have been of a quality that we’ve not experienced in our other tagging projects. I am firmly of the belief that the quality is a result of the Flickr environment (lets call it ‘culture’) and its userbase. [fresh + new(er) » Blog Archive » Commons on Flickr - one month later]

It will be interesting to see the promised three-month report. It seems to me that this shows the long-tail dynamic I have discussed elsewhere. A large part of the long tail effect is about better matching supply and demand by aggregating each in a network environment. Flickr aggregates supply: it provides a critical mass of pictures and community structure for sharing at the network level. It also aggregates demand by attracting large numbers of users, and creates value for them through its sharing structures. An individual institution has difficulty mobilizing this audience.

Related:

Dempsey, Lorcan. Libraries and the Long Tail: Some Thoughts about Libraries in a Network Age. D-Lib Magazine, April 2006, Volume 12 Number 4.


Serendipitous encounter through tags

 •  Categories: Knowledge organization and representation , Social networking

The University of Michigan has introduced a social bookmarking application, MTagger. Here is Ken Varnum:

More important than the tagging functionality itself is what MTagger will allow our faculty, staff, and students to do. MTagger brings a social component to research that we have not previously had. It will allow users to share knowledge about library resources with each other, to enable quick-and-dirty subject guides to be produced, and -- we hope -- to bring researchers together via their individual tag clouds. As research moves online, chance meetings in the stacks of researchers with overlapping interests become even more rare. Through tagging, we hope to be able to recreate some of those synergistic interactions as one researcher finds a tag of interest, and through that, the other researcher. [New Tagging Tool at University of Michigan Library (RSS4Lib)]

I very much like the way Ken describes the rationale for this initiative above, and the focus on social connection rather than retrieval. Scale and incentives are important for this type of behavior: it will be interesting to see how well they do.


Tags

 •  Categories: Knowledge organization and representation , Metadata

Stanford researchers collected data from del.icio.us and came to some pretty interesting conclusions about tagging. Of course, they are talking about tagging of web pages where the text of the tagged item is available for indexing.

Social bookmarking is a recent phenomenon which has the potential to give us a great deal of data about pages on the web. One major question is whether that data can be used to augment systems like web search. To answer this question, over the past year we have gathered what we believe to be the largest dataset from a social bookmarking site yet analyzed by academic researchers. Our dataset represents about forty million bookmarks from the social bookmarking site del.icio.us. We contribute a characterization of posts to del.icio.us: how many bookmarks exist (about 115 million), how fast is it growing, and how active are the URLs being posted about (quite active). We also contribute a characterization of tags used by bookmarkers. We found that certain tags tend to gravitate towards certain domains, and vice versa. We also found that tags occur in over 50 percent of the pages that they annotate, and in only 20 percent of cases do they not occur in the page text, backlink page text, or forward link page text of the pages they annotate. We conclude that social bookmarking can provide search data not currently provided by other sources, though it may currently lack the size and distribution of tags necessary to make a significant impact. [Heymann, Paul; Koutrika, Georgia; Garcia-Molina, Hector: Can Social Bookmarking Improve Web Search?]

In general they found that users thought that tags were objective and relevant. They highlight results throughout the paper. I thought the conclusion they drew from this result quite interesting:

Result 11: Domains are often highly correlated with particular tags and vice versa.

Conclusion: It may be more efficient to train librarians to label domains than to ask users to tag pages.
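
The page-text overlap figure quoted from the abstract above can be illustrated with a toy calculation. This is only a sketch of the general idea and not the method used in the paper: the function name `tag_in_text_rate`, the input shapes, and the simple word tokenization are all assumptions made for illustration.

```python
import re


def tag_in_text_rate(bookmarks, page_text):
    """Return the fraction of (bookmark, tag) pairs whose tag appears in the page text.

    bookmarks: list of (url, list_of_tags) pairs (hypothetical input shape).
    page_text: dict mapping url -> page text (hypothetical input shape).
    """
    total = 0
    hits = 0
    for url, tags in bookmarks:
        # Crude tokenization: lowercase runs of letters and digits.
        words = set(re.findall(r"[a-z0-9]+", page_text.get(url, "").lower()))
        for tag in tags:
            total += 1
            if tag.lower() in words:
                hits += 1
    return hits / total if total else 0.0
```

A real analysis would also need to look at backlink and forward link text, as the abstract notes, and would need a more careful treatment of multi-word tags.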


FRBR and Learning Objects (FLOR?)

 •  Categories: Knowledge organization and representation , Learning and research - systems and technologies , Metadata

Phil Barker looks at FRBR in the context of learning object metadata.

The proposed object model borrows from the scholarly works application profile (SWAP) application model, which in turn is based on the Functional Requirements for Bibliographic Records (FRBR) entity model. The rationale behind this was that, firstly, scholarly works may be considered learning materials in higher education, so any model for learning materials would have to describe scholarly works, secondly, the FRBR model is well-tested and seems generic enough to describe many other types of resource (e.g. musical scores and performances, images, online resources). [Learning Materials Application Profile Domain List]

Via Pete Johnston.


QOTD: the bibliographic archipelago

 •  Categories: Knowledge organization and representation , Metadata

Our bibliographic systems are like an archipelago. Scattered islands which need to be visited individually. In this context I was interested to read Bob Wolven:

Now, however, more radical change seems both possible and responsible in light of developments taking place outside library cataloging. The balkanized system that has characterized information retrieval to date—in which researchers use one tool to find books and journals, another to find journal articles, a third to track poems, and so forth—has allowed library cataloging practices to be evaluated in isolation. Rules and the data they generate are seen as more or less valuable in relation to their impact on the library OPAC; in turn, OPACs are seen as more or less effective for their ability to use and present cataloging data. Now, this hegemony is being challenged: metasearch tools bridge formerly separate search environments; search engines draw on multiple sources to present alternative interfaces to both popular and scholarly resources; full-text aggregations, Google Book Search, and Microsoft's Live Academic Search extend the reach of discovery into the content itself. [In Search of a New Model - 1/15/2008 - netConnect]

I sometimes puzzle over the emphasis on next generation catalogs. Of course, it is easy to understand, given the local control. But the catalog is only one island: an important one, yet still one destination among several. What about all the other databases?

What questions about the value of the controlled data in our catalog records (names, subjects, etc) will we ask as it begins to be merged more with data created in different regimes? We can already see this happening in the environments that Bob mentions, and in new integrated discovery environments like Primo, Encore and Worldcat Local.


Cataloging and standards

 •  Categories: Knowledge organization and representation , Metadata , Standards

Bob Wolven has an interesting piece in netConnect about cataloging. He mentions our approach to standards, among other things.

Perhaps worse, the kind of consensus we have demanded drives us toward complexity. Our libraries acquire a vast and wildly diverse set of resources, yet we insist on treating all of them by the same rules. We prize consistency over practicality. If some works, in some contexts, benefit from a precise transcription of statements of responsibility, or from detailed recording of pagination and illustrations, we apply those same principles to all. We apply the same level of subject analysis to the 20-page pamphlet and the 1000-page treatise. We do this not out of obduracy or short-sightedness, but because it's the only way we have found to build trust among what is, after all, a very large and diverse group. [In Search of a New Model - 1/15/2008 - netConnect]

We do sometimes treat standards activity as if the desired outcome were socially acceptable consensus. This has meant that we may allow optionality or discretion in how data is represented, or, for example, we may suggest that data go into notes. This may have been more acceptable when actual data exchange was not very frequent, or data was created for human display. However, as more of our services are supported by communicating applications, and as the volume and variety of data transfers increase, this approach is less useful. Think of how we want to process data for faceted display, or for clustering into works, or think about using data to manage flows into mass digitization or offsite storage where we want to track volumes through workflows. We want to make sure that the full intellectual effort that goes into description is available for re-use by applications.


On the record: report of the LC working group on the future of bibliographic control

 •  Categories: Knowledge organization and representation , Libraries - organization and services , Metadata

The final report of the LC Working Group on the Future of Bibliographic Control has been submitted and is now available on the LC website.

  • On the Record: Report of The Library of Congress Working Group on the Future of Bibliographic Control (January 9, 2008)

    Read final report [PDF, 442 KB]

  • [News and Press Releases - Working Group on the Future of Bibliographic Control (Library of Congress)]

    Note: I am a member of the Group.


    Library of Congress Working Group on the Future of Bibliographic Control

     •  Categories: Knowledge organization and representation , OCLC

    I am one of the two 'at large' members of the LC Working Group on the Future of Bibliographic Control. A draft final report for comment was released a while ago and today is the final day for responses.

    Karen Calhoun submitted a comment [PDF] on behalf of OCLC yesterday.

    Update: Prompted by a query: Cliff Lynch and I are 'at large' members. Other members were nominated by the following organizations: American Association of Law Libraries, the American Library Association, the Association of Research Libraries, the Special Libraries Association, the National Library of Medicine, the National Agricultural Library, Google, Microsoft Corporation and the Program for Cooperative Cataloging.


    DeweyBrowser II

     •  Categories: Knowledge organization and representation , OCLC

    A new version of the DeweyBrowser has appeared. This is a prototype system with some nice features. It is built using Solr and highlights the use of a classification system in retrieval:

    The DeweyBrowser, beta version 2.0, has a new interface and updated database. You can search for a topic or drill down through the summaries by clicking on a caption in the Dewey clouds. New features include the ability to filter search results by format, language of resource, and OCLC Audience Level. You can also search within a results set.
    The interface provides the option of displaying the captions in one of several languages. Available languages are English, French, German, Norwegian, Spanish, and Swedish.
    The prototype provides access to approximately 2.5 million records from the OCLC Worldcat database. The records are indexed and searched using Apache Solr. [About the DeweyBrowser ]


    Library of Congress Working Group on the Future of Bibliographic Control

     •  Categories: Knowledge organization and representation , Metadata

    The draft final report of the Working Group on the Future of Bibliographic Control has been made available [PDF] for public comment.

    Responses are being accepted by the group until December 15, 2007.

    Different communities of bibliographic practice have grown up around different resource types: library collections of books and journals, archives, journal articles, and museum objects and images. As these resources and others become increasingly accessible through the Web, separation of the communities of practice that manage them is no longer desirable, sustainable, or functional. Bibliographic control is increasingly a matter of managing relationships—among works, names, concepts, and object descriptions—across communities. Consistency of description within any single environment, such as the library catalog, is becoming less significant than the ability to make connections between environments: Amazon to WorldCat to Google to PubMed to Wikipedia, with library holdings serving as but one node in this web of connectivity. In today's environment, bibliographic control cannot continue to be seen as limited to library catalogs. [Report on the Future of Bibliographic Control PDF]


    Some notelets on Facebook and the social graph

     •  Categories: General - distributed environments , Knowledge organization and representation , Social networking

    Some holiday morning notelets ....

    1. The social graph in action. I felt a tremor in the social graph this week. A bundle of my Facebook befrienders attended the CETIS conference. I was suddenly aware of status lines, notes, imported blog entries. I had a sense of some of what was discussed and could follow up if I wanted. It happened in the background. It was like the weather: I had a sense of what was happening without having to do much investigation. Incidentally, CETIS have done a nice job in collecting some of the network amplification of the conference on the website: blog posts, del.icio.us bookmarks, and so on.

    2. The social graph, not. Facebook's flatness does not very well accommodate our layered and multidimensional social lives. A lot to talk about there, but this is still a holiday morning notelet .... To pick a simple and relatively straightforward example: what to do with an unwelcome invitation to be a 'friend' from your boss? I assume we will see a more nuanced way of managing the ways in which we present ourselves emerge over time. Which raises issues about how we port or share our represented identities, something that we do not do well now. The social graph is site-specific.

    3. Net, web, graph. Tim Berners-Lee gave the social graph expression a lift yesterday in a post about the evolution of our networked environment. He talks about a net/web/graph stack. The 'net' allowed us to address computers directly, abstracting away from the underlying connection paths. The 'web' allowed us to address documents, abstracting away from the machines on which they reside. In each case, new and unanticipated value was built on the navigable spaces the net and the web created. The 'graph', Tim Berners-Lee suggests, allows us to work with the things that documents are about (friends, flights, proteins, customers and so on), abstracted away from the documents or sites themselves. If represented appropriately, and he uses the example of FOAF, applications can combine and recombine data about things across multiple documents and sites. So, an application could combine what various sites know about me and my relationships. So yes, in these terms, the social graph meets the semantic web. Of course, we have yet to see whether Facebook believes that the social graph is actually greater than the Facebook graph.


    Worldcat Identities again

     •  Categories: Knowledge organization and representation , OCLC

    Thom has some more details about Worldcat Identities.

    What is returned is really XML with a reference to an XSL stylesheet to transform the XML into the HTML displayed by the browser. [Outgoing: Links to WorldCat Identities]

    He talks about how links from Worldcat work and also describes direct linking approaches ...

    http://worldcat.org/identities/find?fullName=colm+toibin

    http://worldcat.org/identities/lccn-n81-98944
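
The two direct-linking patterns above can be generated programmatically. What follows is a minimal Python sketch and not OCLC code: the helper names `name_search_url` and `lccn_url` are my own, and the sketch assumes the LCCN is already supplied in the hyphenated form shown in the example URL.

```python
from urllib.parse import urlencode

# Base path taken from the example URLs above.
BASE = "http://worldcat.org/identities"


def name_search_url(full_name: str) -> str:
    """Build a name-lookup link; spaces in the name are encoded as '+'."""
    return f"{BASE}/find?{urlencode({'fullName': full_name})}"


def lccn_url(lccn: str) -> str:
    """Build a direct link from an LCCN already in hyphenated form (assumption)."""
    return f"{BASE}/lccn-{lccn}"


print(name_search_url("colm toibin"))
print(lccn_url("n81-98944"))
```

Whether these URL patterns remain stable over time is not something the sketch can guarantee; it simply reproduces the two forms quoted here.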


    Worldcat Identities

     •  Categories: Knowledge organization and representation , OCLC

    Worldcat Identities went into production as part of Worldcat.org over the weekend. It is currently linked to from under the 'details' tab. We will be interested to see how it is used and review other integration options over time.

    One nice feature is that the Tag Cloud (these are FAST headings derived from Library of Congress Subject Headings in the records) sends searches back into Worldcat.

    [Image: tag cloud (lorcancloud.png)]

    Thom discusses the implementation over on Outgoing.


    Webified terminologies again

     •  Categories: Knowledge organization and representation

    My colleagues recently organized a meeting to discuss demand for and potential uses of webified terminologies. A strawman document [pdf] was produced to inform the meeting, which provides some use cases.

    A summary report of the meeting is now available.

    It seems increasingly clear that if the 'classic' terminologies used in our environment are to create value on the web then they themselves have to become web resources, URI-addressable at the concept (or name, or place-name, ...) level.


    Webified Dewey

     •  Categories: Knowledge organization and representation , OCLC

    My colleague Michael Panzer discusses issues involved in the 'webification' of Dewey in a recent presentation [ppt].

    The presentation will briefly introduce a series of major principles for bringing subject terminology to the network level. A closer look at one KOS in particular, the Dewey Decimal Classification, should help to gain more insight into the perceived difficulties and potential benefits of building taxonomy services out and on top of classic large-scale vocabularies or taxonomies. [DLIST - Towards the “webification” of controlled subject vocabulary: A case study involving the Dewey Decimal Classification]

    This is part of an ongoing investigation of what it means to release more of the value of "classic large-scale vocabularies" in a web environment.

    Here is the introductory slide:

    • Well-known problem, why does it resurface every other year without much change?
    • Large scale projects dealing with KO vocabularies are started without adhering to common fundamentals on the operational and strategic level
    • Project results are often unsustainable and do not outlive the specific use case (if any) that they were build to support
    • Currently, the DDC is facing such a challenge and chance for transition to the “network level”
      • “Network level”: Infrastructural improvements to make a KOS web-scale accessible, to make sharing, syndicating, leveraging of its data feasible
    • Main project goal: Improving accessibility and visibility of the scheme to stimulate association with resources

    Via Eric Childress.


    Names, names, names, .....

     •  Categories: Knowledge organization and representation

    Name authority files are often national in scope and will be created under different policy regimes. This is the rationale for VIAF (the Virtual International Authority File).

    Thom and colleagues have just made a prototype VIAF system available. Read more about VIAF on the project page:

    The Deutsche Nationalbibliothek, the Library of Congress, the Bibliothèque nationale de France, and OCLC are jointly conducting a project to match and link the authority records for personal names in the retrospective personal name authority files of the Deutsche Nationalbibliothek (dnb), the Library of Congress (LC), and the Bibliothèque nationale de France (BnF). [Virtual International Authority File [OCLC - Projects]]

    [Image: vine.png]

    Longer term, the future of authority control is interesting. Typically, the files will contain names associated with materials catalogued. Now, this is much better than not having any files. However, it will increasingly seem like a rather arbitrary slice of names. Think of what is not included: people who have only published in articles or people who have only deposited stuff in archives, for example. And think about the national slicing, which presents matching issues as we begin to move data around much more.


    Organized, internationally

     •  Categories: Books, movies and reading ... , Knowledge organization and representation

    There was some discussion a while ago in various places about the relative merits of bookstore and public library shelf organization.

    I was thinking of this as I was looking at music in Borders earlier today. Borders used to have Irish music in a section called World. Now they have a new section called International. And in International there is both Celtic and Irish/Scottish. On my brief perusal I could not immediately see which was used for which materials as it seemed to me that stuff under either heading could be put in the other. Now, maybe this move is informed by experience, as in the earlier organization Irish and Scottish stuff was occasionally mixed up. It could be that this new categorization aims to remove that opportunity for confusion ;-)

    However, I was interested to find in the Irish/Scottish material the wonderful, but, er, English, Kate Rusby. It seems that she is being assimilated to transatlantic folkiness. (Of course, if she were, say, a 'rock' artist, she would be in the general rock section, not in the 'international' section.) I was curious to see how Kate Rusby was tagged in (the US version of) Amazon.

    [Image: Amazon tags for Kate Rusby (rusbytags.png)]

    (There are no tags in the UK Amazon. I wonder whether or not this flows from a judgment that tagging is culturally or geographically specific. There may be a more prosaic reason of course.)

    Incidentally, when I first arrived at OCLC I tried - forlornly - to resist the use of international where what was really meant was non-US. Members' Council is indeed an international body - it has participation from different countries. However, a delegate from The Netherlands or South Africa is not 'international'. Similarly, Kate Rusby or Dolores Keane are no more or less international than Iris Dement. I notice that international is becoming more widely used in this sense. Perhaps it stems from a desire to avoid using the rather stark foreign in these cases?



    QOTD 2: identifiers again

     •  Categories: Knowledge organization and representation , Metadata

    Tony Hirst of the Open University says:

    In the days when this blog was dominated by library related concerns, I used to spend a lot of time working out how to use ISBNs as pivot points for various book related searches; (librarians, of course, don't rate ISBNs - they'd rather focus on the city a book was printed in...). [OUseful Info: OU Course Codes - A Web 2.OU Crown Jewel]

    Maybe a little harsh ;-) But it does highlight for me one of the major shifts that needs to take place in our thinking about bibliographic data. My sense is that a majority view in the library community is that bibliographic data is to support discovery and is for display to human users.

    However, increasingly important is the use of bibliographic data to support automated processes. Think of resolvers. Think of the growing need to link data from discovery environments (google scholar, next generation catalogs, worldcat, ...) back to various library fulfilment environments. Think of the analysis and collection comparison being done to support digitization or off-site storage. Think of the processing required to support richer discovery experiences (faceting and frbrization for example). And think of future higher level services which build on the relationships in the data (see the related links here and here for example, or Worldcat Identities itself for that matter which does major batch processing of data).

    This suggests that we need to be much more careful to facilitate this processing. The growing importance to us of identifiers is one - only one - example of this.
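
To make the point about automated processing a little more concrete, here is a minimal sketch of the kind of facet counting a discovery layer might do over bibliographic records. It is an illustration only: the record structure, the field names, and the function `facet_counts` are hypothetical and do not describe any particular system.

```python
from collections import Counter


def facet_counts(records, field):
    """Count how often each value of `field` occurs across a set of records.

    records: list of dicts (hypothetical record structure).
    field: name of the field to facet on; values may be a string or a list.
    """
    counts = Counter()
    for record in records:
        value = record.get(field)
        if value is None:
            # Data tucked away in a free-text note never reaches the facet.
            continue
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return counts


# Hypothetical sample records, for illustration only.
records = [
    {"title": "A", "language": "eng", "format": "book"},
    {"title": "B", "language": "eng", "format": ["book", "ebook"]},
    {"title": "C", "note": "Text in French"},
]
print(facet_counts(records, "language"))
print(facet_counts(records, "format"))
```

The third record shows the cost of putting data where applications cannot reliably find it: the language is recorded for a human reader in a note, but it does not contribute to the facet.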


    QOTD: URI patterns

     •  Categories: General - systems and technologies , Knowledge organization and representation , OCLC , User experience

    A quote about URIs:

    I propose that a resource and its URI ought to have an intuitive correspondence. …. URIs should have a structure. They should vary in predictable ways: you should not go to /search/Jellyfish for jellyfish and /i-want-to-know-about/Mice for mice. If a client knows the structure of the service’s URIs, it can create its own entry points into the service. ….. URIs do not technically have to have any structure or predictability, but I think they should. This is one of the rules of good web design, ….. [RESTful web services. Leonard Richardson and Sam Ruby. P. 83]

    And in Jon Udell’s review of the book:

    Lacking a Strunk and White Elements of Style for URI namespace, we’ve made a mess of it. It’s long past time to grow up and recognize the serious importance of principled design in this infinitely large namespace. [RESTful Web Services « Jon Udell]

    I was reminded of these while reading Michael Panzer's discussion of URI patterns and Dewey the other day.

    Although the Dewey Decimal Classification is currently available on the web to subscribers as WebDewey and Abridged WebDewey in the OCLC Connexion service and in an XML version to licensees, OCLC does not provide any “web services” based on the DDC. By web services, we mean presentation of the DDC to other machines (not humans) for uses such as searching, browsing, classifying, mapping, harvesting, and alerting.
    In order to build web-accessible services based on the DDC, several elements have to be considered. One of these elements is the design of an appropriate Uniform Resource Identifier (URI) structure for Dewey. [025.431: The Dewey blog: Designing identifiers for the DDC]

    Many organizations are probably having similar discussions, and this is certainly part of a general exploration of these issues within OCLC.
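
The argument in the Richardson and Ruby quote, that a client who knows the URI structure can create its own entry points, can be sketched in a few lines. This is an illustration under stated assumptions: the host `https://example.org` and the function `search_uri` are hypothetical, and the `/search/` path segment is borrowed from the quote and is not an actual Dewey or OCLC pattern.

```python
from urllib.parse import quote


def search_uri(base: str, term: str) -> str:
    """Build an entry point from one predictable pattern: <base>/search/<term>."""
    # Percent-encode the term so that spaces and slashes cannot break the path.
    return f"{base}/search/{quote(term, safe='')}"


for term in ["Jellyfish", "Mice", "Sea Turtles"]:
    print(search_uri("https://example.org", term))
```

Because every term goes through the same template, a client needs no per-resource knowledge; that would not be true if jellyfish and mice lived under unrelated paths.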


    Give us a subject heading ...

     •  Categories: Books, movies and reading ... , Knowledge organization and representation , Metadata , OCLC , Search

    I was interested to read the following in Susan Gibbons' The academic library and the Net Gen student.

    As gaming becomes a more mainstream pastime and an important element in popular culture, academic libraries should begin to develop collections of books and journals about gaming. To find some recent monographs, search OCLC's Worldcat using subject headings such as "Internet games-social aspects" and "Computer games-psychological aspects." [p. 38]

    Click for some results:

    Internet games -- social aspects

    Computer games -- psychological aspects

    Some things that occur ....

    First, we do not see subject headings or class numbers often used in that way in text, despite their pervasiveness in our library catalogs. Second, it will become increasingly clear that they apply only to a part of the library collection as we more and more pull together access across the whole collection. And third, how much more should we do with them? Wouldn't it be nice to have a selection of tags, or authors, or publishers, which are related to these headings in various ways? We are not making them work very hard ....


    Bibliographic fore-understanding

     •  Categories: Knowledge organization and representation , Metadata

    On the website of the LC Working Group on the Future of Bibliographic Control ... A webcast of the third meeting, the topic of which was economic and organizational issues. For those few who do not have the time or the inclination to experience the full range of presentations ;-), there seemed to be widespread agreement that Rick Lugg did a very good introductory overview of issues.

    Webcasts are provided for the complete meeting (timings are indicated for each part of the complete session below). [Webcasts for July 9, 2007 Meeting in Washington, D.C. - Economics and Organization of Bibliographic Data (Working Group on the Future of Bibliographic Control, Library of Congress)]

    In reading, and listening to, reports of discussion at this meeting and others in the series, I have been struck by how much folks' interests and prior understanding affect what they hear. This is very usual, of course, as we all tend to interpret things in relation to our own interests and experiences. We are prepared to hear what we have been prepared to hear. It seemed a little stronger here.

    Several things are at play, IMHO. There is a variety of perspectives (functional/user, bibliographic, technical, economic, managerial and so on) which are not always well integrated in discussion; there is disagreement about how much change is or is not required within any of these perspectives; and there are different views about the value of some existing practices, or of the potential impact of alternative approaches.

    LCWGFBC III

     •  Categories: Knowledge organization and representation

    The background paper for the third open meeting of the Library of Congress Working Group on the Future of Bibliographic Control has been released [pdf].

    Systemic change: CIC and Google

     •  Categories: Books, movies and reading ... , Digital asset management , Featured , Knowledge organization and representation , Libraries - organization and services , Metadata , OCLC , Search , ebooks and other e-resources

    Today Google and CIC announce an agreement to digitize ten million volumes across the CIC libraries. Google has been adding new partners since the first announcement was made about the Google 5. Some folks have wondered what rationale has governed selection of partner opportunities. We do not know, but they sure are moving fast! Here are some early thoughts.

    The CIC announcement is interesting for several reasons:

    • It is a shared effort across a major group of libraries with significant collections. There appears to be strong CIC institutional commitment. Of course, CIC has a history of collaboratively sourced activities, and this 'pooling' model makes increasing sense given the policy and service challenges that need to be addressed, both in this case and across a range of other issues that libraries face as they support changing research and learning behaviors in a reconfigured network environment. For some things, scale matters.
    • The libraries have a shared approach to managing the digital copies based on shared infrastructure at the University of Michigan, and serving them up to their user communities. An example of collaborative sourcing.
    • Google recently advertised for somebody to work on collection development and we seem to be seeing a stronger focus in this area. Collecting areas of importance within each library [pdf] have been identified for attention. Presumably, these decisions have also been influenced by the 'collective collection' of the full Google partnership.

    This initiative in turn prompts some more general thoughts about access:

    • One of the most valuable features of the Google initiative is that it digitizes book content, allowing fine-grained discovery over topics, people, places and so on. Of course this presents interesting questions about indexing, retrieval, ranking, and presentation but the advantage of having this access seems clear. It drives use and sales, and it supports enquiry. Without it, the book literature is less accessible than the web literature.
    • However, as we are beginning to see on Google Book Search, we are really going beyond 'retrieval as we have known it' in significant ways. Google is mining its assembled resources - in Scholar, in web pages, in books - to create relationships between items and to identify people and places. So we are seeing related editions pulled together, items associated with reviews, items associated with items to which they refer, and so on. As the mass of material grows and as approaches are refined this service will get better. And it will get better in ways that are very difficult for other parties to emulate.
    • Currently this material is made available within the Google destination site. Google is an advertising engine and its approach depends on aggregating attention for adverts. This approach may be difficult to deploy within a more 'data services' approach where others - especially the partners - have remixable access to content and services. However, the 'utility' value of this resource will be diminished if it is not made available in this way so that others can mobilize these resources within their own environments. How and if this gets done remains to be seen. (See the related discussion about the search API.)
    • This type of access seems especially important for the partner libraries. In the early days of this activity there was some discussion of the types of services which would be built on top of the digitized books by the libraries. However, it is difficult, and maybe not very sensible, for the libraries to individually invest in some types of service development. An important factor here is that they cannot benefit from the network effects tha

    LCWGFBC II

     •  Categories: Knowledge organization and representation , Metadata

    The report from the second open meeting of the Library of Congress Working Group on the Future of Bibliographic Control is now available. The topic was Structures and Standards for Bibliographic Data.

    Four sources of metadata about things

     •  Categories: Featured , Knowledge organization and representation , Libraries - organization and services , Metadata , Social networking

    I think it is useful to think of four sources of descriptive metadata in libraries. These are not mutually exclusive, and one of the interesting questions we have to address is how they will be mobilized effectively together.

    I don't have good names for these. How about: professional, contributed, programmatically promoted, and intentional?

    Professional

    The curatorial professions have made major investments in knowledge organization, through the development and application of cataloging rules, controlled vocabularies, authorities, gazetteers, and so on. One of our major challenges is releasing the value that has been created through those approaches in web environments. There is much to think about here, and many folks are thinking about it. Currently, these approaches do not tend to work well across silos, they are not made available as web resources themselves so that they can be part of the connected fabric of the web, they only work with the other approaches I mention in particular projects or services, their 'relating' power is underused, and higher level services based on data mining or statistical analysis are limited. Now, these types of issues are being addressed, but are some way from routine systemwide application. I believe that these approaches will continue, within a reconfigured system, and we need to make that data work harder. My personal view is that the curatorial professions need to invest more in the shared production of resources which identify and describe authors, subjects, places, time periods, and works.

    Contributed

    A major phenomenon of recent years has been the emergence of many sites which invite, aggregate and mine data contributed by users, and mobilize that data to rank, recommend and relate resources. These include, for example, Flickr, LibraryThing, and Connotea. These services have a different focus, and create real value in the way that they organize resources. They also have value in that they reveal relations between people. Libraries have begun to experiment with these approaches, but individual libraries may not have the scale to iron out local or personal idiosyncrasy or emphasis. This is another area which lends itself to shared attention. There are real advantages to be gained. So, for example, as we digitize photographic and other community collections, we will want to mobilize knowledge about those collections that does not exist within the library. Or, if you think about a service like Worldcat Identities, at some stage we will want to allow those 'identities' themselves to comment, augment, amend. What this means is that we will have to get rather more sophisticated about managing assertions about resources from different sources.

    Programmatically promoted

    We are handling more digital materials, where it is possible to programmatically identify and promote metadata from resources themselves or groups of resources. We will also do more to mine collections, including collections of metadata, to discern pattern and relations. We are increasingly familiar with clustering, entity identification, automatic classification and other approaches. Look at the home page for books that Google is creating to see a resource created from mining Scholar, Google Book Search, and big Google to deliver a range of related materials.

    Intentional

    I have used this term to refer to the data that we are collecting about use and usage. PageRank is based on aggregate linking choices. Amazon recommendations are based on aggregate purchase choices. We use holdings data, which aggregates the selection choices of libraries, in ranking algorithms. This type of data has emerged as a central factor in the major web presences as they seek to provide useful paths through massive amounts of data.
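
    As a rough illustration of holdings as intentional data, here is a minimal sketch that ranks works by how many distinct libraries hold them. The function name, the (library, work) pair format, and the sample identifiers are all assumptions made for the example; it does not describe how any particular catalog ranks results.

```python
from collections import Counter


def rank_by_holdings(holdings):
    """Order work ids by the number of distinct libraries holding them.

    `holdings` is an iterable of (library_id, work_id) pairs. This input
    shape is a hypothetical one chosen for the sketch.
    """
    # Count each library at most once per work, however many copies it has.
    distinct_pairs = set(holdings)
    counts = Counter(work for _library, work in distinct_pairs)
    # Most widely held first; ties are broken by work id for a stable order.
    return sorted(counts, key=lambda work: (-counts[work], work))


# Hypothetical sample data: libA reports two copies of w2.
sample_holdings = [
    ("libA", "w1"), ("libB", "w1"), ("libC", "w1"),
    ("libA", "w2"), ("libA", "w2"),
    ("libB", "w3"), ("libC", "w3"),
]
ranking = rank_by_holdings(sample_holdings)
```

    Counting distinct libraries rather than copies treats each library's selection decision as one 'vote', which is the aggregate choice the paragraph above has in mind.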

    To repeat, these approaches are not mutually exclusive and will increasingly be deployed alongside each other. For example, authority lists may support programmatic identification of personal or place names in large text resources. The shared interests revealed in social networking applications may be abstracted into a form of intentional data to drive recommendations or 'related work' services. Patterns of association and interaction will develop between tags and subject headings. And so on.
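
    To make the first of those examples concrete, here is a minimal sketch of using an authority list to spot names in text and map each variant form to a preferred heading. The function name and the tiny dictionary are illustrative assumptions, not an extract from a real authority file, and real entity identification would need much more than exact string matching.

```python
import re


def find_authority_names(text, authority):
    """Return (matched_text, preferred_heading, offset) for each hit.

    `authority` maps variant name forms to a preferred heading. The
    structure is a hypothetical simplification of an authority file.
    """
    if not authority:
        return []
    # Try longer variants first so "New South Wales" wins over "Wales".
    variants = sorted(authority, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(v) for v in variants) + r")\b"
    )
    return [
        (m.group(0), authority[m.group(0)], m.start())
        for m in pattern.finditer(text)
    ]


# Illustrative entries only.
sample_authority = {
    "Mark Twain": "Twain, Mark, 1835-1910",
    "Samuel Clemens": "Twain, Mark, 1835-1910",
    "Sydney": "Sydney (N.S.W.)",
    "New South Wales": "New South Wales",
    "Wales": "Wales",
}
hits = find_authority_names(
    "Samuel Clemens visited Sydney, New South Wales.", sample_authority
)
```

    Here the professionally maintained list supplies the headings and the program does the promotion, a small case of two of the four sources working together.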

    Much of our discussion pits these approaches against each other. This seems like the wrong approach. Clearly there will always be choices about where one invests effort, especially as the network continues to reconfigure what we do, but the starting point should be how we create better services and what approaches support that, and not a 'techeological' position around one or other approach which confuses ideology and technology.

    LC WGFBC

     •  Categories: Knowledge organization and representation

    A short background paper for the second open meeting of the Working Group on the Future of Bibliographic Control is now available on the LC website [pdf].

    WGFBC note of first open meeting

     •  Categories: Knowledge organization and representation , Metadata

    A quick note pointing to a summary of the first Open Meeting of the LC Working Group on the Future of Bibliographic Control by Nancy J Fallgren.

    Metadata across cultural domains

     •  Categories: GLAM , Knowledge organization and representation , Metadata

    An interesting article by Mary W. Elings and my colleague Günter Waibel on cross-domain metadata practices has just appeared.

    Integrating digital content from libraries, archives and museums represents a persistent challenge. While the history of standards development is rife with examples of cross-community experimentation, in the end, libraries, archives and museums have developed parallel descriptive strategies for cataloguing the materials in their custody. Applying in particular data content standards by material type, and not by community affiliation, could lead to greater data interoperability within the cultural heritage community. [Metadata for All: Descriptive Standards and Metadata Sharing across Libraries, Archives and Museums]

    The authors propose a framework within which to think about metadata across domains. I commented here on a blog entry in which Günter introduced the framework.

    In recent discussions, I have been struck by how the issue of authorities, gazetteers, and subject resources has come up as a shared interest across these curatorial traditions. Each community has an interest in establishing agreed ways of noting names, places and things, and has a variety of practices to support it. This seems like a fertile area for investigation of shared attention across communities.

    Related entry: