The JISC has made a report on digital preservation costs available. "This study has investigated the medium to long term costs to Higher Education Institutions (HEIs) of the preservation of research data". One aim was to provide a methodological foundation on research data costs for national and institutional initiatives.
Our case studies suggest that the service requirements for data collections and the best structure for organising relevant services locally will be more complex than many have thought previously. Both Cambridge and KCL are developing central repositories to work with departmental facilities and discussing federated local data repositories for research data preservation combining services and skills from central and departmental repositories. Costs for the central data repository component at Cambridge and KCL are an order of magnitude greater than that suggested for a typical institutional repository focused on e-publications alone. [Keeping research data safe : JISC]
The authors are Neil Beagrie (my former work colleague), Julia Chruszcz and Brian Lavoie (my current work colleague). Neil outlines the contents on his blog:
The report itself has chapters covering the Introduction, Methodology, Benefits of Research Data Preservation, Describing the Cost Framework and its Use, Key Cost Variables and Units, the Activity Model and Resources Template, Overviews of the Case Studies, Issues Universities Need to Consider, Different Service Models and Structures, Conclusions and Recommendations. There are also four detailed case studies covering the Universities of Cambridge, King’s College London, Southampton, and the Archaeology Data Service (University of York). [Neil Beagrie’s Blog]
I sense some renewed interest in digital preservation of late. For example, the following two reports came over my desk on the same day a couple of months ago.
There is a fascinating entry by Seb Chan of the Powerhouse Museum in Sydney documenting experiences one month into the museum's participation in the Commons on Flickr. The Powerhouse Museum has been alert to various ways of combining professional and audience metadata in its services. It was an early participant in the Commons, joining the Library of Congress there.
Our experiment with the Commons on Flickr continues and barring a few hours delay we have managed to keep to our promise of 50 new images a week. We’re up to 400 images now with the most recent 50 going live this morning. 158 of these have been geotagged. [fresh + new(er) » Blog Archive » Commons on Flickr - one month later]
A couple of things struck me about his note. First, the volume of activity:
And second, he talks about the volume and quality of tagging activity:
Tonnes of tags have been added and they have been of a quality that we’ve not experienced in our other tagging projects. I am firmly of the belief that the quality is a result of the Flickr environment (lets call it ‘culture’) and its userbase. [fresh + new(er) » Blog Archive » Commons on Flickr - one month later]
It will be interesting to see the promised three-month report. It seems to me that this shows the long-tail dynamic I have discussed elsewhere. A large part of the long tail effect is about better matching supply and demand by aggregating each in a network environment. Flickr aggregates supply: it provides a critical mass of pictures and community structure for sharing at the network level. It also aggregates demand by attracting large numbers of users, and creates value for them through its sharing structures. An individual institution has difficulty mobilizing this audience.
The New York Times carries an article this week about the economics of digital preservation. The context is the NSF Blue Ribbon Task Force on Sustainable Digital Preservation and Access co-chaired by my colleague Brian Lavoie.
All that work is going on, Dr. Lavoie said, but “that misses the point” that the task force was formed to examine: ensuring that the various technologies make economic sense. “You can have the most elegant technological solution to the digital-preservation problem, but if there’s no economics underpinning it, then there’s no solution at all,” he said. [In Storing 1’s and 0’s, the Question Is $ - New York Times]
Brian has an article outlining the work of the Task Force in the current issue of D-Lib Magazine.
I have mentioned before the tremor an event can sometimes cause in your communications fabric, as it pops up among your Facebook friends, in your RSS aggregator, and so on. So it was with Open Repositories 2008. A note about amplifying activities from a description of the event ...
Update: Sarah Shreeves left a comment which I thought useful to copy here:
And, of course, the use of hashtags for tracking tweets about the conference: http://hashtags.org/tag/or08/. This was a really fascinating way to follow reactions / thoughts during the conference.
Savas Parastatidis announces a platform from Microsoft to support 'research output repositories'.
At the Open Repositories 2008 conference, we will formally unveil our work in advance of its official release and initiate interactions/exchanges with the DSpace, EPrints, Fedora, and other players in the repository community. This is crucial to us because—like every other project our group undertakes—we are intensely focused on interoperability. [ - Microsoft and "Research-Output" Repositories]
Read the full post for a better sense of what they are working on, but here are some excerpts.
We are looking forward to sharing our efforts over the last few months. We have been working hard on a platform for building repository-related services and tools. Our goal is to abstract the use of underlying technologies and provide an easy-to-use development model, based on .NET and LINQ, for building repositories on top of robust technologies. [ - Microsoft and "Research-Output" Repositories]
I want to be very transparent here: our effort is intended to provide a repository option to those institutions/organizations that already license or have access to Microsoft software (including the free versions of the products, like SQL Server Express). Our platform is intended to sit on top of the existing Microsoft "stack". By providing this new research-output repository platform at no cost, we can offer added value for our existing (and future) customers in the academic and research space. It is critical to point out that we are making every effort to ensure our platform is optimized to make the best use of Microsoft technologies AND to also interoperate with all other existing systems and platforms in the repository ecosystem. We are actively seeking engagement and feedback from the community! [ - Microsoft and "Research-Output" Repositories]
We are already well into the process of developing a collection of tools and interfaces on top of the platform as tangible examples of how to use it. We already have implementations of OAI-PMH, BibTeX import/export, customized feed syndication service, ASP.NET controls providing access to the repository, and working on Search and a simple Web UI. We are also working on WPF and Silverlight tools for visualizing the relationships between the resources within our repository. [ - Microsoft and "Research-Output" Repositories]
This initiative comes from the Technical Computing area at Microsoft.
Welcome to Technical Computing at Microsoft, our company-wide effort to collaborate with the global scientific community. As modern science increasingly relies on integrated information technologies to collect, process, and analyze complex data, we believe that the Computer Science research community and Microsoft technologies can assist scientists make breakthrough discoveries. [Technical Computing @Microsoft - Vision]
One of the issues institutions have with current repository software is the difficulty of working with it 'out of the box'. It will be interesting to see whether this offering helps here.
Google Book Search: Document Understanding on a Massive Scale [PDF] is a brief treatment of issues faced by Google as they grow their corpus of digitized books and work to make it useful in various ways.
Luc Vincent of Google discusses OCR (issues of many languages occurring unpredictably in variously formatted volumes, at scale), and then focuses on issues of document understanding.
In addition to OCR, making these books easily accessible and useful on http://books.google.com has required developing a number of additional state-of-the-art systems. These include systems for automatically deskewing, cropping and cleaning-up scanned book pages, which is critical as pre-processing prior to OCR, but also to generate clean and small images for efficient web serving. While this may be a well understood problem for high-quality documents, doing this well on scanned century-old book pages is no small feat. Most of the advanced systems developed for Google Book Search however involve some form of Document Understanding and as such, come after OCR in the book processing pipeline. Systems that have been developed, are being developed or are being considered as interesting research challenges include: [Google Book Search: document understanding on a massive scale PDF]
These challenges include: page ordering; language identification; chapter identification; content linking (relating the table of contents to the appropriate boundaries, index entries to pages, ...); summarization; metadata extraction and cross-validation; topic identification; and book clustering and linking (creating relationships between volumes).
He also discusses ranking:
Specifically, how should books that match a particular query be ranked? The web is notorious for its rich graph of hyperlinks, famously exploited by Google’s PageRank algorithm [6]. This structure applies somewhat to technical publications, which typically contain numerous references to other technical publications. However the universe of books is different and most books (eg, novels) do not contain any references. Novel approaches therefore had to be developed, exploiting an array of new signals. Additionally, these techniques were recently extended to allow “blending” of book search results with web search results when appropriate. [Google Book Search: document understanding on a massive scale PDF]
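The link-analysis idea in this passage can be made concrete. What follows is a minimal, textbook-style power-iteration PageRank in Python, not Google's production ranking; the `pagerank` function, the damping factor of 0.85 and the sample citation graph are illustrative assumptions. It shows why the approach works for technical publications that cite one another, and why it says little about books such as novels that contain no references: if nothing links to them, they all end up with the same baseline score.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             max_iter: int = 100, tol: float = 1e-10) -> dict[str, float]:
    """Textbook PageRank by power iteration over a small link graph.

    `links` maps each item to the items it links to (cites). Items with
    no outgoing links ("dangling" nodes) spread their rank evenly over
    every item, so the scores always sum to 1.
    """
    # Collect every item, including ones that only appear as link targets.
    all_nodes = set(links)
    for targets in links.values():
        all_nodes.update(targets)
    nodes = sorted(all_nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(max_iter):
        # Total rank currently held by items that link to nothing.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        # Random-jump share plus the evenly spread dangling share.
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each item passes its damped rank equally to the items it links to.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical corpus: papers that cite each other, and two novels that cite nothing.
citations = {
    "paper_a": ["paper_b", "paper_c"],
    "paper_b": ["paper_c"],
    "paper_d": ["paper_c"],
    "novel_x": [],
    "novel_y": [],
}
scores = pagerank(citations)
for item, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{item}: {score:.3f}")
```

In this toy graph `paper_c`, cited three times, ranks highest, while the two uncited novels tie exactly: the link graph simply offers no way to tell them apart, which is why other signals are needed for most books.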
The paper outlines presentation options based on copyright status and also discusses how Google supports the document understanding community through the release of software and data sets.
I was interested to note that the paper does not discuss social features.
I find library strategy documents an interesting indicator of trends.
I do think that looking at how organizations present themselves and what is important to them in the documents they produce is revealing, documents such as annual reports, strategies, job ads, websites, org charts, and so on. It would be interesting to see more analysis of them. Of course, we would have to be cautious in assuming too much about what they do reveal! [Lorcan Dempsey's weblog: Self disclosure]
I recently came across the Emory 5 year strategy document (2008-2012). It is pretty interesting in its range and ambition.
Through the implementation of its strategic plan, the Emory University Libraries (the Library) will be recognized as a model research library that fosters courageous inquiry through the integration of print, digital, and multi-media resources. During the next five years, the Library will strengthen further its distinctive work in two areas: digital information technology and special collections. At the same time, leaders in specific areas throughout the Emory library system will work collaboratively with both internal and external partners to increase access to these exceptional tools, systems, and resources; support new modes of teaching, learning, research, and scholarly communication; and preserve, store, and manage traditional and digital materials for future generations. By fulfilling these objectives, the Library will play a central role in both the creation and dissemination of knowledge and serve as an intellectual bridge between communities at Emory and between Emory and the external world. ...
... The Library’s aggressive strategic plan, which will require roughly $100 million to implement, reflects the vision and priorities of Emory University. First, the plan leverages areas of particular strength within the library, namely advanced digital library technologies and renowned special collections, in much the same way the University’s strategic themes reflect areas of distinctive achievement and potential at Emory. Second, the plan proposes to mobilize leadership throughout the libraries to build a customer-centered organization and to increase access to resources for scholars both within and beyond Emory, just as the strategic initiatives look beyond our community. Third, the plan connects to the strategic themes by strengthening faculty distinction, preparing engaged scholars, reaching out to the external community, and increasing access to resources for scholarship in interdisciplinary fields. [Five Year Strategy for the Emory Libraries]
I was particularly struck here by two emphases which indicate a direction. The first is the very strong emphasis on "collaboration in production and dissemination of knowledge". The library aims to engage much more deeply with research behaviors, supporting faculty in their digital scholarship, and the creation and sharing of their research outputs. The second is the focus on the distinctive contribution of their special collections, "the laboratory of the humanities", on building these up and on connecting them to developing digital research environments.
The emphasis here is on institutional resources: the unique or rare materials that the library has acquired for its users, or the intellectual output of the university faculty. One of the interesting things to ponder is how the latter may be the "special collections" of the future, as the library takes these materials into curatorial care.
Network services have accustomed us to moving from the personal to the global. Think of iTunes. I have my own local library on my PC which I can synchronize with mobile devices. It is also tightly integrated with the global iTunes network. And the MiniStore uses aggregate buying patterns to make recommendations to me based on what I have in my 'library'.
Variations of this pattern are repeated everywhere. Flixster allows me to rate movies, and relates my ratings to those of my 'friends' and to the aggregate global network level (Flixster drives the Movies application in Facebook). del.icio.us, LibraryThing, Flickr: I can move from my own collection to a global resource in various ways, often assisted by navigational features based on shared attributes across collections and items.
Of course, the dynamic is different in different places. In LibraryThing, for example, the 'global' data level is made up from aggregate personal collections, and central to the service is the idea that connections between our collections are important connections between us. In iTunes, the 'global' data level is already provided as an indication of available purchases, and I do not get to see other people's collections, although, as already suggested, I benefit from 'hints' based on aggregate buying decisions. In this way, the balance between 'personal' and 'social' value varies across services.
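The 'hints' drawn from aggregate collections can be illustrated with a very simple co-occurrence scheme. This is a minimal Python sketch under stated assumptions, not how iTunes, LibraryThing or any other service actually computes recommendations: the hypothetical `recommend` function lets every other collection that overlaps with mine 'vote' for the items it holds that I lack, weighted by the size of the overlap.

```python
from collections import Counter


def recommend(my_library: set[str], other_libraries: list[set[str]],
              top_n: int = 3) -> list[str]:
    """Hypothetical co-occurrence recommender over personal collections.

    Each other library sharing at least one item with mine votes for every
    item it holds that I do not, weighted by how many items we share.
    """
    votes: Counter[str] = Counter()
    for library in other_libraries:
        overlap = len(my_library & library)
        if overlap == 0:
            continue  # no shared-taste signal from this collection
        for item in library - my_library:
            votes[item] += overlap
    # Sort by vote count (descending), then by name, for a deterministic order.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Illustrative data only: my collection plus three other people's.
mine = {"book_a", "book_b"}
others = [
    {"book_a", "book_c"},
    {"book_a", "book_b", "book_d"},
    {"book_e"},
]
print(recommend(mine, others))
```

Here `book_d` comes first because its collection shares two items with mine, and `book_e` is never suggested because its collection shares nothing with mine. Weighting by overlap is just one simple design choice; it makes the personal collection the starting point and the aggregate the source of hints.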
At the same time, we have seen a related interest in various ways of creating personal collections that may draw materials from many services. Look at Zotero or the work of the SIMILE project, for instance. These personal collections may or may not connect up to global or shared data layers.
Whatever the context, and whether or not the service has a social orientation, the idea of traversing from the personal to the global is becoming an important characteristic of our web experience. Yet another thing for libraries to think about as they work towards reconfiguring services for the web environment ...
Constant readers - there are a few ;-) - will have noticed several references to outputs from the Eduserv Foundation of late, as well as links to their blog, eFoundations. They are producing a nice body of work.
They have just released a new report "Snapshot study on the use of open content licences in the UK cultural heritage sector" [PDF].
Simply placing digital resources on a website, without any licensing information or terms and conditions, does not necessarily make these resources truly accessible to users of the resource. From the standpoint of the public, this content must be assumed to be fully covered by copyright and therefore permission from the rightsholder needed for use and re-use of the resource (subject to possible fair dealing defences). An image of a painting available on a museum’s website would not without a licence come with permission to place that image on your own website, use it in a presentation, or place it in a Virtual Learning Environment (VLE). ...
... Open content licensing is a way of generally granting a wide range of permission in copyright for use and re-use of the work via a copyright licence, whilst retaining a relatively small set of rights. As mentioned above, copyright operates so that permission is needed for any use except for a limited number of cases. In contrast, open content licensing reverses this default and grants permission for a very wide range of uses but asks that users seek permission only in a limited number of cases – often known as a ‘some rights reserved’ model. This style of licensing, like any other, can only be used on works by someone who owns the rights over the work or otherwise has permission to do so. ["Snapshot study on the use of open content licences in the UK cultural heritage sector" PDF.]
Those involved with the development of this Strategy believe that if we apply a combination of will, clarity of vision, collaborative effort across sectors and jurisdictions, and investment from both private and public sectors, we can make Canada the most information rich and information literate country in the world. If we are successful in identifying, valuing and preserving our digital information assets, we can use these assets to educate our youth, to foster a common cultural identity and pride in our accomplishments, and to create new knowledge and new products that advance our economy. If we provide ubiquitous and democratic information access for all Canadians, we will support our common goal to live in an inclusive and progressive society. [The Digital Strategy: Part IV: Conclusion - Canadian Digital Information Strategy - Library and Archives Canada]
Here is Grainne Conole, professor of e-learning at the Open University, writing about academic papers, conference papers, and blogging:
Coming back to the question of which represents academic discourse – to my mind it’s all three – in different ways writing a paper, giving a presentation and blogging all help me to formulate and take forward my thinking on a particular topic, a means of meaning making and transformation of the raw ‘data’ to new understandings – surely that’s one of the cornerstones of what being an academic means? [e4innovation.com]
And here is how she distinguishes between those modes of academic disclosure:
So the function and nature of the three media seems to be:
Academic paper: reporting of findings against a particular narrative, grounded in the literature and related work; style – formal, academic-speak
Conference presentation: awareness raising of the work, posing questions and issues about the work; style – entertaining, visual, informal
Blogging: snippets of the work, reflecting on particular issues; style – short, informal, reflective
Here is Dani Rodrik, a Professor of International Political Economy at Harvard, commenting on an earlier post of his where he queried whether the high opportunity costs of blogging (think of all those other things that could get done if you did not use the time blogging!) would drive out high quality economics blogs. No, he concludes:
And second, in my trip to Nottingham I was simply stunned by how many people reported reading my blog. Not only that, people actually remembered my posts--some going quite a while back. With this kind of positive feedback, along with others like this, it is hard to imagine closing the operation down.
Not so incidentally, one of the unexpected scholarly benefits of having a blog is that it is like keeping an intellectual journal. You get an idea, you jot it down in your blog. Some months later, you vaguely remember having had the idea and you google your own blog to recover it. I am not kidding: I google my own blog all the time...
And here is the evidence: the first third of my talk at Nottingham was based on a couple of blog posts from a few weeks back (this and this). So maybe that someone also over-stated the bit about opportunity costs...[Dani Rodrik's weblog]
It is interesting to see them both discuss blogging as an integral part of their academic lives. Their blogging is also an important record of their thinking about the academic problems they address, and an indication of their academic networks.
I regularly look at the blogs of several folks from the Open University: Tony Hirst's, John Naughton's, and now Grainne's (with whom I used to interact years ago when she was director of ILRT and I of UKOLN). I will occasionally land on Martin Weller's and am peripherally aware of Marc Eisenstadt's.
Ever since my (economist) colleague Brian Lavoie introduced me to Greg Mankiw's blog, I have intermittently followed it, as well as Rodrik's. They occasionally refer to the blog of their colleague George Borjas, another Harvard economics professor. Of course there are some pretty high-profile economics blogs, including blogs from the Freakonomics authors and, recently, Paul Krugman, both hosted by the New York Times. And there is the prolific Gary Becker, Nobel prize-winning economist, at the Becker-Posner blog. I have found Mankiw and Rodrik interesting because of the general mix of light material, commentary on their own and their colleagues' work, and their high-level and engaged policy perspectives. The general nature of the blog discourse in that community, to borrow Grainne's word, is absorbing to watch.
Rodrik notes that his blog material appears to have enduring appeal for colleagues. Indeed, given the intrinsic interest of the blog output of both the Open University and the Harvard bloggers, its relation to their academic work, and its value to their broader communities of interest, this is probably more generally true.
The blogging platforms used by these people vary. Sometimes they are institutionally based; more often they are on one of the main blog hosting sites. While these blogs may be of enduring interest, little thought has probably been given to their longer-term persistence.
Which brings me to my question. Universities and university libraries are recognizing that they have some responsibility for the curation of the intellectual outputs of their academics and students. So far, this has not generally extended to blogs. What, if anything, should the Open University or Harvard be doing to make sure that this valuable discourse is available to future readers as part of the scholarly record?
Quantity has a quality all its own. A focus on quality is one reason that libraries, archives and museums have not moved their collections in large quantities to the web. This reduces their visibility and impact as the web becomes central to research, learning and civic engagement. Scale matters, and fragmented small-scale activities do not map well onto behaviors in a web environment.
Our intricate attempts to describe and present a few choice collections have resulted in expensive, but little-used websites. And the rest of our collections remain largely invisible.
We need to stop thinking of our lovingly crafted sites, designed specifically for a particular collection, as the only way people will discover our content. While researchers value the description and organization that we bring to collections, they don’t want to have to consult dozens of specialized sites to find what they need. [Shifting Gears: Gearing Up to Get Into the Flow [PDF]]
These words are from a brief and provocative report [PDF] about special collections and digitization just written by my Programs colleagues Ricky Erway and Jen Schaffner. The report draws on speakers' suggestions and enthusiastic audience reaction at a forum convened in August as part of an RLG Programs project called Bringing special collections into the large-scale digitization milieu.
The report discusses how current practices will need to change if this activity is indeed to be scaled up, and presents key areas where assumptions must change if we are to make progress.
Scaling up digitization of special collections (here defined as non-book collections, such as photographs, manuscripts, pamphlets, minerals, insects, or maps) will compel us to temper our historical emphasis on quality with the recognition that large quantities of digitized special collections materials will better serve our users. This will require us to revisit our procedures and policies. Should we be digitizing for both preservation and access, or optimizing procedures primarily for access? How can our selection approaches help us maximize both throughput and impact? Have projects produced reusable infrastructures? What is the appropriate level of description for online materials? How can we make smart partnership agreements in order to build a collective collection that will be valued by a broad audience? [Shifting Gears: Gearing Up to Get Into the Flow [PDF]]
I find it convenient to think about current library systems activities in terms of support for three materials workflows: bought/print materials, licensed/electronic materials, and digital/digitized materials. This division is pragmatic rather than pure, and is open to challenge on many grounds. I have discussed these workflows at more length here, and suggested some ways in which they are developing. Development is in two directions: each area continues to develop in its own right, while at the same time there is a growing desire to find better ways of working across them (e.g. at the discovery layer, or in terms of a more unified approach to metadata creation/management).
Now, we have an agreed and well-understood set of processes around the first category. These are encapsulated in the integrated library system, and still quite strongly influence library organization. These include things like selection, acquisition, cataloging, circulation, catalog, and so on.
We have a less well agreed set of processes around the second area, and an emerging apparatus of systems support. This includes resolvers, ERM systems, A to Z lists, metasearch, and so on. A level of agreement is apparent in that substitutable systems are now available to support this activity. However, differences in the organizational structures supporting the area and low take-up of ERM systems suggest that it is still early days. One place where there is likely to be further evolution relates to the creation, management and sharing of the data used to drive these systems.
And we have a much less well agreed set of processes around the third area. Libraries are exploring repositories for digitized collections, they are creating institutional repositories, and building workflows for content preparation and ingest, metadata creation, and so on. In fact, there is no agreed level of service in this area: you do not naturally expect to find particular services here in the way, for example, that you expect to find a circulation system. Of course, this lack of agreement makes this a potentially expensive area. There is a lot of figuring out what to do, and routine off-the-shelf tools or services may not necessarily exist across the range of what you want to do.
This is an overly complex systems landscape, and it will have to be rationalized in coming years so that libraries can spend more time putting their systems to work in support of their users and less time actually getting their systems to work together at all.
Anyway, this is by way of prelude to an observation about repositories. A couple of repository launches have come over my horizon in recent weeks.
The first is the Digital Conservancy at the University of Minnesota, which I mentioned the other day. This aims to provide services in relation to two classes of material: faculty research outputs and university administrative materials that traditionally would have gone to the University Archives. As I suggest in my post, this makes a lot of sense: the repository aims to support the full range of institutionally produced intellectual outputs.
The second was the Open University's Open Research Online, "a repository of our research publications and other research outputs." In this case, the service aims to provide support for all the research outputs of OU academics. So you will find deposited open access materials. However, you will also find citations to books, journal articles, and so on, which are not actually available in the repository: you may be referred to a publisher site. The repository aims to provide a full record of research activity, not only the open access materials.
What we have here, then, are well-worked-through services which offer overlapping but different views onto their respective universities' intellectual outputs. This is not a major issue as universities work towards a view of what should be offered and what their constituencies value.
However, in the longer term, lack of agreement about services and supporting processes may be a barrier, on the management side where different systems support is needed, or on the user side where different services from different universities may lead to confusion, reducing the gravitational pull that familiarity supports.
Aside: Of course, in the longer run also, there are interesting questions about the relationship between these institutional services and network level services but that is a discussion for another day.
We have become used to managing collections of digital resources: images, music, citations. Zotero is one response to the question of how we will manage collections of scholarly resources. Raymond Yee's suggestive triple does good service describing the motivation: we want to be able to easily gather, create, and share resources. This general question has emerged strongly in library contexts recently.
The interesting Digital Lives project was advertised on various lists the other week.
As we move from cultural memory based on physical artifacts, to a hybrid digital and physical environment, and then increasingly shift towards new forms of digital memory, many fundamental new issues arise for research institutions such as the British Library that will be the custodians of and provide research access to digital archives and personal collections created by individuals in the 21st century. ...
... Digital Lives is a major research project focusing on personal digital collections and their relationship with research repositories. It brings together expert curators and practitioners in digital preservation, digital manuscripts, literary collections, web-archiving, history of science, and oral history from within the British Library (one of the world’s leading research libraries) with researchers in the School of Library, Archive and Information Studies at University College London, and The Centre for Information Technology and Law at the University of Bristol. [Digital Lives :: About]
The project blog notes the related work of RLG Programs in this area. Here is the scope:
Problem statement: Personal collection-building tools abound in the online environment, from social bookmarking sites (del.icio.us, PennTags, CiteULike, Zotero etc.) to iTunes and LibraryThing. As libraries seek to integrate their services into the flow of online scholarship and research and to build collections that mirror and support current scholarly practice, they must reexamine the place of personal collections in the research lifecycle. Are research libraries responsible for creating or supplying tools to support personal collection building? Are they responsible for acquiring and preserving the personal collections of the researchers, students, and faculty they serve? Little is known about how the range of available tools might be integrated in the library service environment, or what opportunities are available for collaborative sourcing of solutions that can meet the needs of libraries, archives, and museums. [Personal Research Collections program [OCLC - New modes of research, teaching & learning]]
Major 'memory organizations' face significant challenges as the volume and variety of what falls within their potential remit to collect grows. The digital turn has made it difficult to develop routine ways of capturing and curating digital materials in many contexts. An Australian colleague pointed me to a joint statement and request for additional funding by the National Film and Sound Archive, National Archives of Australia, and National Library of Australia.
Digital has become the preferred medium for Australian government agencies, authors, researchers, film makers, musicians and creators. Increasingly, the primary evidence of public administration is created in digital form. The vast majority of film and television works, and virtually all music and recorded sound created in Australia are now released in digital form.
Australia’s ability to maintain a permanent and accessible record of these activities is therefore linked to our preparedness to cope with this digital tidal wave of images and sounds. As the Collections Council of Australia noted in its background papers for the 2006 Summit on Digital Collections: “The growth of digital information and the need to store, manage and preserve access is an issue of truly global proportions.” [Australia’s Cultural Heritage – A Digital Future]
The situation becomes more serious as the scope of what such organizations must do grows, as digital curation needs to move into the mainstream, and as they have already cut back wherever they can.
We’ve already lost many of our important moments and many of our creative ideas and cultural expressions. There is a danger that in ten years’ time Australians will look back at today as a digital dark-age. [Australia’s Cultural Heritage – A Digital Future]
I might humbly suggest that digital libraries must adopt a theoretical stance. As I noted above, library science is devoid of theoretical foundations and of a knowledge-base that is relevant to the budding digital world. Archival science with its principles of uniqueness, provenance, arrangement and description, authenticity, appraisal, and its tool sets such as diplomatics, may offer us a framework for a theoretical foundation for digital libraries. [Digital preservation, archival science and methodological foundations for digital libraries. ECDL 2007 [PDF]]
We use a collections grid from time to time to help focus attention on particular collecting patterns in libraries. Its bottom right-hand corner represents materials that have not been highly stewarded and which are usually unique to a particular institution. The materials that belong there are research and learning outputs (e.g. preprints, data sets, learning objects) and institutional administrative records (e.g. annual reports).
These share some characteristics, and in some ways we can see them becoming the 'special collections' of the future as they move into more stewarded environments.
In this context I was interested to see the University of Minnesota's Digital Conservancy. Effectively, it is looking at stewarding the material in that quadrant: institutional research materials and administrative records.
The University Digital Conservancy is a program of the University of Minnesota Libraries that provides long-term open access to a wide range of University works in digital formats. It does so by gathering, describing, organizing, storing, and preserving that content.
Works produced or sponsored by the University of Minnesota faculty, researchers, staff, and students are appropriate for deposit in the UDC. Works might include pre- and post-prints, working papers, technical reports, conference papers and theses.
An interesting report [pdf] from the University of Minnesota Libraries looks at the behaviors of researchers in the sciences. It extends the earlier work done by the Libraries on researchers in the humanities and social sciences.
Unsurprisingly, there is a major focus on having resources available online and, once they are online, on reducing the number of clicks required to use them. There is also considerable discussion of research data issues, including the need for better ways of organizing and managing data outputs.
Scientists make heavy and regular use of library resources available electronically, but regard the physical library buildings as a place of last resort -- where you go when you have no other way to find something. Library buildings are places of “disclosure” rather than “discovery,” inasmuch as researchers go to libraries to retrieve what they have already identified. At the same time, many scientists speak nostalgically about the lost art of browsing and serendipitous discovery in libraries and depend on technology to provide browsing proxies. [Understanding Research Behaviors, Information Resources, and Service Needs of Scientists and Graduate Students: A Study by the University of Minnesota Libraries]
The findings appear consistent with the RIN report on researchers' use of libraries [pdf], released a short while ago, although the RIN report ranges over wider ground.
The NSF has set up a blue ribbon panel, with support from Mellon, to explore the economic sustainability of digital preservation activity. I am pleased to report that my colleague Brian Lavoie, who has a background in economics as well as an important record of work in digital preservation, is a co-chair of the panel.
Dr. Berman and Dr. Lavoie will convene an international group of prominent leaders to develop actionable recommendations on economic sustainability of digital information for the science and engineering, cultural heritage, public and private sectors. The Task Force is expected to meet over the next two years and gather testimony from a broad set of thought leaders in preparation for the Task Force's Final Report. [Task force on economic sustainability of digital data-Lavoie [OCLC]]
Looking through the presentations at the JISC digitisation conference, I was interested to read the following prediction from Peter Kaufman [ppt]:
Over the next 13 years:
an iPod or a device its size will be able to hold:
a year’s worth of video (8,760 hours) by 2012 (5 years from now)
all the commercial music ever created by 2015 (8 years), and
all the content ever created (in all media) by 2020 (13 years).
[Peter Kaufman. Online digital video – educational developments and opportunities. ppt]
We are now in an Internet phase of large informational hubs on the web: massive aggregations of content and services. It is quite a centralized environment. It will be interesting to see how patterns of use change as devices evolve in the way Peter describes. At some stage in the near future, I assume, we will be shipping large amounts of content to people on small devices. Could we give people their own library, which syncs from time to time with various services?
The conference content is 'published' as part of a wider set of materials about the JISC Digitization Program and the Strategic Content Alliance. Using a blog environment, with material organized by category, is also something we are seeing more of.
Despite its UK focus, this report should make interesting reading more widely. It provides a useful overview of practice and policy across a variety of stakeholders: funders, nationally funded data centers, other policy bodies, and local institutions. The UK environment is interesting because of its range of national data centers, a situation not replicated everywhere, and their relationship to research funding.
It provides some discussion of general issues and contains an interesting table which helpfully proposes a set of stakeholder roles and suggests associated rights, responsibilities and relationships.
Readers here may be interested in the following statement: "The polarisation of views regarding the role of institutional repositories for data was marked."
The report emphasizes service and policy fragmentation and notes outstanding technical issues. However, it also shows an environment which is organizationally well-developed enough to be able to have discussions between major players about better coordination in these areas.
The Digital Curation Centre now has a blog. Well worth following for discussion of policy, service and technical issues surrounding data curation and repositories generally. Here is a snippet:
Digital Curation is maintaining and adding value to a trusted body of digital information over the life cycle of scholarly and scientific materials, for current and future use. It is our belief in the DCC that the curation of digital data requires this whole of life approach. Critical decisions on the curation of data are taken before the data are even created, often at the time the associated project is conceived, or funding is sought. This is not least because curation requires resources that must be allowed for within the work plan. It is increasingly clear that for any project involving data of value, you should provide a data management plan within the project proposal (NSF, 2007).
Digital curation includes good management of data for current purposes, and also in many cases the preservation of those data for the long term. Long term preservation is not necessarily an essential part of curation in all cases, although it is usually a desirable aspect (subject to appraisal and selection decisions). So we can think of curation as having two important components, which we can label “data publication”, for the process of making current data available for use by other contemporaries, and “data preservation”, for the process of making those data available for futu