Remind me again why we don't have a national academic/scholarly repository

This is how we do Thanksgiving at my house. Does anyone want to come over for dinner?

SSRN
A few weeks ago the front page of the Chronicle carried an article about the Social Science Research Network (SSRN). Actually, it was about the launch of the Humanities Network, but it’s all part of the same family. I’m not very well versed in the IR world, but it seems like this type of tool is more valuable than a local institutional repository. Why not support a national repository instead of hundreds of smaller ones? Why not also build in social features so scholars can talk, comment, and network? I hate to harp on the same old theme, but running hundreds of separate repositories seems like too much duplicated effort, when a cooperative process would benefit everyone. Hmm, maybe I’m a socialist after all?

I’m sure this topic has been covered in depth elsewhere, so please forgive my ignorance. It just makes sense that scholars should be able to go to a central place to share and discover research in their own and related disciplines. However, if we wait long enough I’m sure Google will get to it.

A note to all of you holiday shoppers—buy a Dell or buy a Mac, but don’t buy Hewlett Packard. I tried to save a little money and I’m paying for it by constantly losing data. It's tragic!

Finally, this one goes out to Mr. Bell.

WHAT GETS VIEWED? An exploratory study of large IR collections

In my work circle there has been a lot of talk about growing our institutional repository. There is a big push to add meaningful content. The thing that I always get hung up on though is usage. I’m very interested in what people find useful, and my feeling is that if I’m going to pitch this service to my faculty, then I need to prove to them that the stuff is actually being seen, rather than simply offering them a theoretical argument about why open access is good and big publishers are evil.

So I decided to do a mini study. I wanted to see what the most viewed items were across several universities. I used ROAR to identify DSpace collections in the US, and then sent emails to the libraries with the 10 largest collections. One library never responded, and another (MIT) shot me down with “I'm sorry to report that our staff is unable to provide that data at this time,” but all the others provided me with a list of their top 20 most viewed items. (Thanks!)
I should note that Georgia Tech and U of Oregon were the only organizations in this sample that allowed open access to their statistics.

The results were very eclectic, as expected, but definite themes emerged. For example, the U of Rochester included many musical scores, U of Michigan was heavy with engineering technical reports, Ohio State had numerous articles from The Ohio Journal of Science, U Oregon featured NewBreed Librarian articles as well as classic texts from Shakespeare, Milton and others, while Oregon State included several environmental topics. U Maryland had the most diverse materials and is unquestionably the most heavily used collection within this sample.

Someone should publish a scholarly article about this and perform a detailed synthesis of these collections, but in the meantime, here are the top viewed items from each of the collections:

  • Delivery of DNA and Recombinant Infectious Bursal Disease Virus Vaccines in Ovo (dissertation), 34,768 hits, U of Maryland
  • How Do I Do This in ArcGIS/Manifold?: Illustrating Classic GIS Tasks, 18,636 hits, Cornell
  • Relaxation studies in the muscular discriminations required for touch, agility and expression in pianoforte playing, 8,764 hits, U of Rochester
  • A study of the role of carbon in temper-embrittlement and the effect of temper-embrittlement on the fatigue properties of a 3140 steel, 7,155 hits, U of Michigan
  • Dragonflies Taken in a Week, 6,650 hits, Ohio State
  • Measurement of delignification diversity within kraft pulping (dissertation), 5,517 hits, Georgia Tech (current year only)
  • NewBreed Librarian ; Vol. 2, No. 4, 2,093 hits, U of Oregon
  • Estimating the weight of plywood, 500 hits, Oregon State

There is definitely a lot of long-tail action going on too. Most of the repositories featured one or two heavily used items, but then dropped off drastically.

[Figure: long-tail chart of item views in the U of Maryland IR sample]

Some questions:

  • Why is the U of Maryland IR used so heavily? Their top 3 items blow away everyone else (34,768 hits; 32,916 hits; and 32,214 hits respectively)
  • How are people finding this stuff? Google? Native Searches? Catalog Searches? Direct Links? We need to run an analytics program.
  • How many of these hits are from web crawlers or related software?
  • Why the long-tail? What makes those top few items so popular? And just how long is the tail? Could you say something like 90% of everything in our IR was viewed at least once over the past two years?
  • If you place your IR within your metasearch tool, will it pad your results?
  • Is there a big difference between views and downloads?
  • Why does the DSpace interface still look so mid-1990s?
  • How are items obtained? Is it piecemeal or more systematic? Are we building collections or is it random take-what-we-can-get?
  • What is the percentage of dissertations? (or, take away dissertations and what have you got left?)
  • What non-text items are collected (mp3, videos, jpg, etc)?
  • Leaving the big vision rhetoric aside, what is the goal of each IR?
  • How do you measure the success of an IR? Is it volume or downloads or something else?
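
A couple of these questions (how long the tail is, and what share of items was viewed at least once) come down to simple arithmetic once per-item view counts are exported. Here is a minimal Python sketch of that arithmetic. The function name `tail_summary`, the item IDs, and every number in it are made up for illustration; none of it comes from the repositories above, and it says nothing about crawler traffic.

```python
def tail_summary(view_counts, top_n=3):
    """Summarize how concentrated views are across a repository's items.

    view_counts maps an item ID to its number of views (hypothetical format).
    """
    counts = sorted(view_counts.values(), reverse=True)
    total_items = len(counts)
    total_views = sum(counts)
    viewed_items = sum(1 for c in counts if c > 0)
    top_views = sum(counts[:top_n])
    return {
        "items": total_items,
        # Fraction of items with at least one view
        "share_viewed_at_least_once": viewed_items / total_items if total_items else 0.0,
        # Fraction of all views that went to the top_n items
        "top_n_share_of_views": top_views / total_views if total_views else 0.0,
    }


# Hypothetical view counts, invented purely for illustration
sample_counts = {
    "item-a": 900,
    "item-b": 60,
    "item-c": 25,
    "item-d": 10,
    "item-e": 5,
    "item-f": 0,
    "item-g": 0,
    "item-h": 0,
    "item-i": 0,
    "item-j": 0,
}

summary = tail_summary(sample_counts, top_n=1)
print(summary)
```

With real exports, the same two numbers would give a rough answer to "how long is the tail?" and a starting point for comparing repositories of different sizes.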

(If this is your area and you want to work on something together, let me know. I'm devoted to ALA Editions right now, but I'd like to continue this project into 2008.)

About Brian
