Social tools and science

In her report on Open science at webscale, Liz Lyon gives the following list of tools that researchers use to share their work. I was interested to see it.

Currently, researchers are using open science tools such as:
  • Connotea for reference management
  • Mendeley (which applies LastFM principles associated with music selections to journal articles)
  • Friendfeed (for threaded discussion and aggregation)
  • Scivee and YouTube (for sharing experimental methodologies and protocols)
  • SciLink and Nature Networks (for social networking)
  • myExperiment (for sharing workflows)
  • eyeLIMS (an open source Laboratory Information Management System)
  • LabLit.com (about science/laboratory culture in the literature and media)
  • ConceptWeb (from WikiProfessional and includes WikiPeople and WikiProteins)
[Open science at webscale - PDF]

Libraries and e-science

Emerging data-intensive e-science presents many support challenges for institutions, disciplines and national bodies to work through. The role of the academic library in this multiscale world is also an open question. Two recent reports discuss e-science (or 'cyberinfrastructure' or 'e-research') in general terms and repay reading.

Liz Lyon, the Director of UKOLN, and also a principal in the Digital Curation Centre, has focused on this area for several years now and has produced an interesting synthesising report for the JISC: Open science at web-scale: optimising participation and predictive potential: consultative report [Summary; Full report PDF]. An important theme of the report is 'data informatics', defined in this way: "library and information science methodologies which have been applied to research data".

The report is organized around six 'consultation challenges'. The first is 'scale, complexity and predictive potential'. Here is the summary:

Data-intensive science powered by contemporary computational hardware, software and research techniques, enables scientists to perform experiments and calculations at different orders of magnitude of scale and volume: research that was completed in a year can now be repeated in a weekend. Sustained growth in data modelling, complex simulations and visualisations, facilitate interpretation and analysis by humans and machines, leading to the development of predictive science scenarios in a wider range of disciplines. Examples of data intensive science at these extremes of scale, which enable forecasting and predictive assertions, have been described.
Assessments of the accuracy and robustness of predictions are linked to uncertainty quantification, the accuracy of the underlying model, and the integrity of the data. Key questions address community awareness and understanding of the potential implications and impact of (open) data-intensive science at new extremes of scale and complexity, and the service requirements for associated data curation and preservation. [Open science at web-scale: Optimising participation and predictive potential - summary]

To give some flavor of concerns, here are the other challenges: Continuum of openness; citizen science; credentials, incentives and rewards; institutional readiness and response; data informatics capacity and capability. A brief chapter is devoted to each.

The author is positive about the role of libraries and librarians, particularly in the data informatics section. That said, given the absence of routine service and organizational responses, the library role is still expressed in very general terms. What it might mean in practice is naturally less well developed.

The other publication is a collection of essays assembled in honor of Jim Gray:

In The Fourth Paradigm: Data-Intensive Scientific Discovery, the collection of essays expands on the vision of pioneering computer scientist Jim Gray for a new, fourth paradigm of discovery based on data-intensive science and offers insights into how it can be fully realized. [The fourth paradigm]

For Gray the first three paradigms are experimental, theoretical, and computational.

We said, "Look, computational science is a third leg." Originally, there was just experimental science, and then there was theoretical science, with Kepler's Laws, Newton's Laws of Motion, Maxwell's equations, and so on. Then, for many problems, the theoretical models grew too complicated to solve analytically, and people had to start simulating. These simulations have carried us through much of the last half of the last millennium. At this point, these simulations are generating a whole lot of data, along with a huge increase in data from the experimental sciences. People now do not actually look through telescopes. Instead, they are "looking" through large-scale, complex instruments which relay data to datacenters, and only then do they look at the information on their computers.
The world of science has changed, and there is no question about this. The new model is for the data to be captured by instruments or generated by simulations before being processed by software and for the resulting information or knowledge to be stored in computers. Scientists only get to look at their data fairly late in this pipeline. The techniques and technologies for such data-intensive science are so different that it is worth distinguishing data-intensive science from computational science as a new, fourth paradigm for scientific exploration [1]. [Jim Gray on escience - PDF.]

The collection of essays is divided into these sections: Earth and environment; Health and wellbeing; Scientific infrastructure; Scholarly communications. And there are opening and concluding sections. The contributions are readable and in the form of short essays rather than research papers. There is a contribution by Cliff Lynch on the changing scholarly record, by Timo Hannay on the impact of the network on the structure of science, and by Herbert Van de Sompel and Carl Lagoze on the enhancement of the scholarly record with actionable structure.

There is no specific contribution on libraries, and it is interesting to note that most of the occasional mentions of libraries point towards network-level digital libraries.

It is important for libraries to understand these changes. The reshaping impact of the network on learning and research behaviors is a more important factor for libraries than the direct impact of the network on library processes themselves.

Reputation enhancement redux

I wrote recently about the growing interest in reputation management on the web.

Reputation management on the web - individual and institutional - has become a more conscious activity for many, as ranking, assessment and other reputational measures are increasingly influenced by network visibility. In particular, it raises for academic institutions an issue that has become a part of many service decisions: what is it appropriate to do locally? What should be sourced externally? And what should be left to others to do? [Reputation enhancement]

This is a wide-ranging issue, pulling together in various ways overlapping issues such as individual and institutional disclosure of research and other outputs; emerging academic social networking practices; formal expertise and research output management; search engine optimization strategies; practices for improving citation, ranking and reputation measures; social reference/bibliography; and so on. I think that we will see some of this activity become more routine in organizational and operational terms over the next few years.

In this context, I was interested to see a presentation on research support by Rachel Cowan and Alex Hardman from the University of Manchester. They focus on reputation and network identity as important parts of overall research management.


The presentation has three strands: developing reputation through a digital identity, keeping on top of the literature, and extending research connections. Of these, the first and third relate broadly to reputation enhancement or management in a web environment.

They ask the audience if a personal Google search does a good job of showcasing their identity and research. (This reminds me of Tony Hirst's comment that our 'home page' is now the first page of Google results.) Then they talk through some of the ways in which people develop digital identities (blogs, twitter, ...). They also review some social networking and other tools of interest in an academic context.

Here is their overview of activities mapped onto services:

researcheridentity.png

QOTD: protocol-based time travel for the web

We are pleased that Herbert Van de Sompel will be talking about Memento, a joint project of Los Alamos National Laboratory and Old Dominion University, at OCLC later this month. We will make a webcast available; see the details here. If you are in Central Ohio, come by ....

Here is a recent paper describing the work:

The Web is ephemeral. Many resources have representations that change over time, and many of those representations are lost forever. A lucky few manage to reappear as archived resources that carry their own URIs. For example, some content management systems maintain version pages that reflect a frozen prior state of their changing resources. Archives recurrently crawl the web to obtain the actual representation of resources, and subsequently make those available via special-purpose archived resources. In both cases, the archival copies have URIs that are protocol-wise disconnected from the URI of the resource of which they represent a prior state. Indeed, the lack of temporal capabilities in the most common Web protocol, HTTP, prevents getting to an archived resource on the basis of the URI of its original. This turns accessing archived resources into a significant discovery challenge for both human and software agents, which typically involves following a multitude of links from the original to the archival resource, or of searching archives for the original URI. This paper proposes the protocol-based Memento solution to address this problem, and describes a proof-of-concept experiment that includes major servers of archival content, including Wikipedia and the Internet Archive. The Memento solution is based on existing HTTP capabilities applied in a novel way to add the temporal dimension. The result is a framework in which archived resources can seamlessly be reached via the URI of their original: protocol-based time travel for the Web. [Memento]
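
To make the idea of adding a temporal dimension to HTTP a little more concrete, here is a minimal sketch of how a client might express "give me this resource as it was on a given date". It assumes the Accept-Datetime request header as later standardized for Memento in RFC 7089, which may differ from the header name used in the proof-of-concept experiment described in the paper. The function name is hypothetical, and nothing is sent over the network: the sketch only builds the header.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_headers(when: datetime) -> dict:
    """Return request headers asking for a resource as it existed at `when`.

    Assumption: the server side understands Accept-Datetime (RFC 7089).
    """
    # HTTP dates are expressed in GMT, so normalize to UTC before formatting.
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return {"Accept-Datetime": http_date}


# Ask for the state of a resource at midnight UTC on 2 November 2009.
headers = memento_headers(datetime(2009, 11, 2, tzinfo=timezone.utc))
print(headers["Accept-Datetime"])  # Mon, 02 Nov 2009 00:00:00 GMT
```

A client would send these headers with an ordinary GET on the URI of the original resource (or of a service negotiating on its behalf) and expect to be directed to an archived copy close to the requested date. The point is that the original URI stays the entry point, and the date travels in the protocol rather than in a separate archive search.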

Libraries and the long tail: intro

Discussing grades of availability in my last post, I mentioned an article I wrote a few years ago on libraries and the long tail. Here is how it starts:

Discussions of the long tail that I have seen or heard in the library community strike me as somewhat partial. Much of that discussion is about how libraries contain deep and rich collections, and about how their system-wide aggregation represents a very long tail of scholarly and cultural materials (a system may be at the level of a consortium, or a state, or a country). However, I am not sure that we have absorbed the real relevance of the long tail argument, which is about how well supply and demand are matched in a network environment. It is not enough for materials to be present within the system: they have to be readily accessible ('every reader his or her book', in Ranganathan's terms), potentially interested readers have to be aware of them ('every book its reader'), and the system for matching supply and demand has to be efficient ('save the time of the user'). [Libraries and the long tail: some thoughts about libraries in a network age]

Incidentally, I think Ranganathan's 5 laws [Wikipedia entry] remain relevant in lots of ways to current discussions, as above.
