'Book' is a big word. It has a lot of power as it is intimately bound up with our intellectual and imaginative histories. More parochially, the book is also strongly bound up with the professional practice and identity of the library and librarians.
At a more prosaic (sic) level, the book is also interesting as we manage data about multiple formats. We use 'book' to mean a format (a set of bound pages, say), a type of creative work (continuous textual/pictorial narrative, say), a work (Don Quixote, say), and maybe other things. Because of its centrality when our professional practices were being formed, perhaps more was taken for granted about the book than has been about other formats (the phrase 'non-book formats' is telling here).
This is a bigger topic than a late-night blog entry will tackle. I am prompted to write this post by a conversation I had recently with my colleague Brian Lavoie about what a book is. This was in the context of data mining activity looking at counts of books in particular contexts (here and here for example). There are two questions here: 'what is a book?', and 'how do you operationalise that definition in relation to a particular data set?'. It may be that it is not possible to operationalise aspects of your definition, in which case you will not be able to count as you wish. For example, one sometimes sees this UNESCO definition of a 'book':
A book is a non-periodic publication of at least 49 pages exclusive of the cover pages, published in the country and made available to the public. [Revised Recommendation concerning the International Standardization of Statistics on the Production and Distribution of Books, Newspapers and Periodicals, 1 November 1985]
The document provides some other qualifications as to what is and what is not a book. So, is this unambiguous enough to be operationalisable in a database like, say, Worldcat? Well, the short answer is 'probably not completely' ;-) I would have to consult more knowledgeable colleagues, who would have to do some work to find out how much could be done. However, it is unlikely that one would be able to consistently identify all the included categories of materials and apply a limit of 49 pages.
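To make the operationalisation problem concrete, here is a minimal sketch in Python. It is an illustration only: the `Record` fields, the sample records and their page counts are all made up for the example and do not reflect how Worldcat or any real catalogue stores its data. It applies just two parts of the UNESCO definition (non-periodic, at least 49 pages) and ignores the rest, such as place of publication and public availability.

```python
from dataclasses import dataclass
from typing import Optional

MIN_PAGES = 49  # threshold from the UNESCO definition quoted above


@dataclass
class Record:
    # Hypothetical, simplified record; not a real catalogue structure.
    title: str
    is_periodical: Optional[bool]  # None = the record does not say
    pages: Optional[int]           # None = no usable page count


def classify(rec: Record) -> str:
    """Return 'book', 'not a book', or 'undetermined' for one record."""
    # Anything we can positively rule out is not a book.
    if rec.is_periodical is True:
        return "not a book"
    if rec.pages is not None and rec.pages < MIN_PAGES:
        return "not a book"
    # If a criterion cannot be checked, we cannot decide either way.
    if rec.is_periodical is None or rec.pages is None:
        return "undetermined"
    return "book"


def tally(records: list[Record]) -> dict[str, int]:
    """Count how many records fall into each of the three outcomes."""
    counts = {"book": 0, "not a book": 0, "undetermined": 0}
    for rec in records:
        counts[classify(rec)] += 1
    return counts


# Invented sample data, for illustration only.
sample = [
    Record("Long monograph", False, 320),
    Record("Short pamphlet", False, 12),
    Record("Monograph, extent not recorded", False, None),
    Record("Quarterly journal", True, None),
]

print(tally(sample))
```

The 'undetermined' bucket is the point of the sketch: wherever the data does not let you test a criterion, the definition cannot be applied, and any count you report is an approximation.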
Anyway, this is a prelude to these comments that Brian sent me about the issue:
"As a non-librarian who works with library data on a regular basis, I was surprised to learn that the commonplace object 'book' is not well-defined in traditional cataloging practice. This is all the more surprising when one considers that historically, libraries were built around aggregations of books. The difficulty is that there are no explicit bibliographic criteria for identifying something most people would recognize as a 'book'. So for example, consider a simple question like 'How many books are in WorldCat?' In the bibliographic universe, there is nothing explicitly defined as a 'book': there are monographs, or more narrowly, language-based monographs, but the items falling into these categories are not necessarily books as we might commonly perceive them. Is a government document a book? A dissertation? A technical report? A pamphlet of only a dozen pages? These kinds of materials, and more, get included when we use a construct like 'language-based monographs' as a proxy for 'books'.
"Why is this important? The concept of "books" is appearing in a variety of current discussions, most notably in the context of digitization issues like the Google book settlement. So we are often asked questions like, 'how many print books in WorldCat have been published after 1923?' We can provide answers to these questions, but only with a degree of approximation built in: i.e., we can calculate a number that reflects something along the lines of 'all language-based monographs in WorldCat, excluding dissertations and government documents'; we can even throw in a minimum page requirement (at least 49 pages, according to the UNESCO definition of a book). But we can't say exactly how many books are in WorldCat, because from a cataloging standpoint, we don't know what a book is. Libraries are grappling with difficult new questions these days, as collections and services transition from print to digital, from local to the network. But an old question still remains: what is a book?" [Personal communication from Brian Lavoie]