Tags

Stanford researchers collected data from del.icio.us and came to some pretty interesting conclusions about tagging. Of course, they are talking about tagging of web pages, where the text of the tagged item is available for indexing.

Social bookmarking is a recent phenomenon which has the potential to give us a great deal of data about pages on the web. One major question is whether that data can be used to augment systems like web search. To answer this question, over the past year we have gathered what we believe to be the largest dataset from a social bookmarking site yet analyzed by academic researchers. Our dataset represents about forty million bookmarks from the social bookmarking site del.icio.us. We contribute a characterization of posts to del.icio.us: how many bookmarks exist (about 115 million), how fast is it growing, and how active are the URLs being posted about (quite active). We also contribute a characterization of tags used by bookmarkers. We found that certain tags tend to gravitate towards certain domains, and vice versa. We also found that tags occur in over 50 percent of the pages that they annotate, and in only 20 percent of cases do they not occur in the page text, backlink page text, or forward link page text of the pages they annotate. We conclude that social bookmarking can provide search data not currently provided by other sources, though it may currently lack the size and distribution of tags necessary to make a significant impact. [Heymann, Paul; Koutrika, Georgia; Garcia-Molina, Hector: Can Social Bookmarking Improve Web Search?]
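To make the overlap figures in the abstract concrete, here is a minimal sketch of how such percentages could be computed. It is not the authors' method: the `Annotation` record, the `coverage` function, the simple lowercase word tokenizer, and the sample data are all assumptions made up for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Annotation:
    """One (tag, page) pair plus the text around it (hypothetical structure)."""
    tag: str
    page_text: str
    backlink_text: str = ""      # text of pages linking to this page
    forward_link_text: str = ""  # text of pages this page links to


def tokens(text):
    """Lowercase alphanumeric word tokens; an assumed, deliberately naive tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def coverage(annotations):
    """Return (share of tags found in the page text,
               share of tags found in none of the three texts)."""
    if not annotations:
        raise ValueError("need at least one annotation")
    in_page = 0
    in_none = 0
    for a in annotations:
        tag = a.tag.lower()
        if tag in tokens(a.page_text):
            in_page += 1
        elif tag not in tokens(a.backlink_text) | tokens(a.forward_link_text):
            in_none += 1
    total = len(annotations)
    return in_page / total, in_none / total


# Invented sample data, not taken from the del.icio.us dataset.
sample = [
    Annotation("python", "Learn Python programming"),
    Annotation("fruit", "Bananas are yellow", backlink_text="A list of fruit"),
    Annotation("toread", "An essay on search engines"),
    Annotation("search", "Web search basics", forward_link_text="more on search"),
]

page_share, none_share = coverage(sample)
print(f"in page text: {page_share:.0%}, in no nearby text: {none_share:.0%}")
```

In this toy sample, "toread" is the kind of tag that appears in none of the texts, and "fruit" is found only through a backlink page. A real measurement would need to decide how to handle multi-word tags, stemming, and punctuation, none of which this sketch attempts.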

In general, they found that users thought that tags were objective and relevant. They highlight results throughout the paper. I thought the conclusion they drew from this result was quite interesting:

Result 11: Domains are often highly correlated with particular tags and vice versa.



Conclusion: It may be more efficient to train librarians to label domains than to ask users to tag pages.

Comments

Mar 30, 2008
j trant

Lorcan,

These findings are in line with a number of other studies about the utility of folksonomy, including those of Margaret Kipp. She looked at user tags and professionally created metadata, and found classes of tags -- such as those that referred to methodology -- that weren't in librarian-created metadata. These were distinct from the tags about managing one's own method, such as @toread. In the steve.museum project, we're also finding real differences between curator-supplied descriptions of works of art and the tags that users supply.

jennifer

Two of Margaret's articles:

Kipp, M. E. (2007). Tagging Practices on Research Oriented Social Bookmarking Sites. Canadian Association for Information Science, Montreal, Quebec, Canada. Retrieved January 31, 2008 from http://dlist.sir.arizona.edu/2027/01/kipp%5F2007.pdf.

Kipp, M. E. (2006). Complementary or Discrete Contexts in Online Indexing: A Comparison of User, Creator and Intermediary Keywords. Canadian Association for Information Science, Toronto, Ontario, Canada. Retrieved January 31, 2008 from http://dlist.sir.arizona.edu/1533/01/mkipp-caispaper.pdf.

Apr 01, 2008
Jeff

The authors find that "tags occur in over 50 percent of the pages that they annotate, and in only 20 percent of cases do they not occur in the page text, backlink page text, or forward link page text of the pages they annotate."
Only 20%? Given that backlink and forward link pages were included to account for shifts in granularity (e.g. food-fruit-bananas), I would have thought that 20% is actually pretty high, suggesting that tagging is a subjective process open to idiosyncrasies.