I mentioned Augustine of Hippo the other day, in the context of the interesting work that Google is doing to develop a contextual page for each book (resources about it, resources related to it, etc.; see this Penguin Classic with the nice cover, for example). Searching for the 'City of God' in Google Book Search pulls together a couple of editions. Notice in the picture that a couple of editions of the second result are also pulled together (the indented entries and the un-indented entry they follow are versions of the same work).

I was looking at this because Dan Clancy, of Google, mentioned at the Working Group on the Future of Bibliographic Control meeting the other day that they were pulling together members of work sets in this way. This is interesting and valuable. Presumably this is happening programmatically. They do not pull in The City of God Against the Pagans further down the page, another translation of the same work. They do not pull in De Civitate Dei, of which these are all translations. They do, however, pull together a couple of different selections of De Civitate Dei itself (and it might be reasonable not to pull together selections with the complete versions?). Nor do they pull in a French version.
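To see why a programmatic approach might group some editions and miss others, here is a minimal Python sketch of one naive technique: clustering records on a normalized author/title key. This is not Google's method, which is not described here; the functions `work_key` and `cluster` and the sample records are hypothetical, for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical list of leading articles to ignore when comparing titles.
LEADING_ARTICLES = ("the ", "a ", "an ")


def norm(s: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace, strip one leading article."""
    s = s.lower()
    s = re.sub(r"[^\w\s]", "", s)  # remove punctuation
    s = " ".join(s.split())        # collapse runs of whitespace
    for art in LEADING_ARTICLES:
        if s.startswith(art):
            s = s[len(art):]
            break
    return s


def work_key(author: str, title: str) -> tuple[str, str]:
    """Naive (hypothetical) work-level key built only from author and title strings."""
    return norm(author), norm(title)


def cluster(records):
    """Group records whose normalized author/title keys match exactly."""
    groups = defaultdict(list)
    for rec in records:
        groups[work_key(rec["author"], rec["title"])].append(rec)
    return dict(groups)


# Illustrative records only; not real catalog data.
records = [
    {"author": "Augustine", "title": "The City of God", "lang": "en"},
    {"author": "Augustine", "title": "City of God", "lang": "en"},
    {"author": "Augustine", "title": "The City of God Against the Pagans", "lang": "en"},
    {"author": "Augustine", "title": "De Civitate Dei", "lang": "la"},
    {"author": "Augustine", "title": "La Cité de Dieu", "lang": "fr"},
]

groups = cluster(records)
for key, members in groups.items():
    print(key, len(members))
```

Under this key the two 'City of God' records fall together, while 'The City of God Against the Pagans', 'De Civitate Dei' and the French version each form their own group, mirroring the gaps noted above. Bridging them needs evidence that a title string does not carry, such as the uniform titles recorded in library catalog data.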

Now, the entry for Augustine in Worldcat Identities says the following about The City of God: '369 editions published between 1466 and 2005 in 16 languages and held by 9,731 libraries worldwide'. Here are twenty-five.

I think that it is interesting to place an algorithmically generated (and still very partial) resource like the Google Book Search summary page alongside the 'expert' generated bibliographic data in library resources, and aggregated here in Worldcat and its derivative, Identities (and this is just the application to show up inconsistencies in the data!).

Suggesting that one approach is better than the other seems to me to be a fruitless direction. There is a lot to be said about each in many dimensions. They are complementary and can amplify, correct and refine each other. Over time the balance between them may change, but for the moment think how interesting it would be to have them working together.

Aside: In my previous GBS message, I mentioned how St Augustine was being recognized in A portrait of the artist as a young man and placed on the map in Florida. Looking at the text of The City of God, I was amused to see ads for real estate in Jacksonville and St Augustine scroll by at the bottom of the screen. That said, it seems to me that we are becoming increasingly tolerant of such errors as a reasonable price to pay where the value of a programmatic approach is visible?

Comments: 3

Mar 12, 2007
Karen Coyle


My understanding of Clancy's description was that they are doing "de-duping" not FRBR. Of course, there wasn't time to get into what they mean by "the same book" but it would be interesting to continue that conversation. I can imagine a calculation of sameness based on the digital version that would bring together instances of the same text, whether actual duplicates (because Google will be scanning the same book more than once) or ones we would consider reprints. This is probably fairly close to what a user would consider to be "same," and which I think would be the same Expression in FRBR. It would be interesting to consider "FRBR-izing" at this level as well, not just the work level.

Mar 12, 2007
Lorcan Dempsey

Karen, what I said is that they are pulling together members of work sets. This is different from 'de-duping' at the manifestation level, which is where we have usually thought about it. Maybe you could say that what I was talking about was where they are trying to de-dupe at the work level? Of course, they are interested in de-duping at all levels.

I think that it is useful to think about this in terms of managing similarity and difference. Sometimes you are interested in any version of Huck Finn. Sometimes you are interested in a particular version. To meet that requirement you need to manage similarity (what are the members of the work) and difference (what is sufficiently distinct to merit individual attention).

In fact, based on the above, I should generalize to say that you are interested in managing similarity and difference at the item, manifestation and work levels (putting expressions to one side).

The extent to which you do this is really a service choice, based on available evidence and your decision about what you are prepared to accept as equivalent for your purposes.

You are right about the interesting potential for machine inspection: they have more evidence on which to base decisions. In fact, Google can now make item-level distinctions (where an item carries annotations in the margin, for example). Or find small differences within manifestations (e.g. spelling corrections). And so on.

I confess that I tend to use 'frbrize' loosely to refer to something which tries to give both a work-based and a manifestation-based view. Of course, in the absence of any real specifications which encapsulate agreed implementation of the model, FRBR remains a vague, but useful, concept.

Mar 20, 2007
Jonathan Rochkind

The FRBR Model doesn't actually go into much detail about exactly where work boundaries occur: when two expressions/manifestations are the same work, and when they aren't.

It actually says this is subjective and context-specific. I think this is just right: it's the job of particular communities or implementors using the FRBR Model to make these decisions and/or create specifications and rules for their community. Perhaps certain 'best practices' will come out of people actually implementing FRBR-modelled data systems.

But I think it is both intentional and indeed the right decision that the FRBR Model doesn't go into detail about when 'things' are part of the same work and when they aren't. The FRBR Model defines that entities like Work, Expression and Manifestation exist, and defines the attributes and relationships that they have. How to actually capture 'the real world' in this model is an implementation decision which is subjective, context-dependent, and likely community-specific. If a community (like 'the AACR2/RDA-using community', such as it is) feels it needs strict guidelines for this, then they must be created. Logically, in RDA.