“You looked at The Complete Essays by Montaigne; you might also consider The Renaissance in Europe: A Reader
edited by Whitlock.” Most of us are familiar with Amazon’s gently pushy way of
suggesting further purchases. If you’re a music fan, you may have tried
“scrobbling” [1] each song
you listen to into the massive Last.fm database of listener behaviour. In
return for this gift of your data, you get to explore the habits of others who
share some of your tastes, and you get a series of recommendations for other
music you might enjoy.
If it works for retail and leisure, might
this same approach also be applicable to libraries and learning? In
introducing this workshop, organised by the Association for Learning Technology
(ALT) with the JISC-funded TILE Project (Towards Implementation of Library 2.0
and the e-Framework), Ken Chad noted that, amid a rash of “crowd-sourcing”
ventures, the higher education sector has done relatively little so far to
exploit data from learners. Yet valuable data about learner behaviour certainly
exists in library systems. So could we, Ken asked, be approaching a tipping
point where data, methods and technologies combine to revolutionise the
learning potential of library services? Or, as the workshop title put it, are
we sitting on a goldmine?
If we could map the paths that learners
take between sets of resources, we would get new ways of viewing and
understanding the links between the items that make up a library. Of course,
the road well travelled is not always the most efficient or creative way of
learning – and certainly not the most original. But by bringing these patterns
into the open, we could enable a new level of reflection, judgement and
guidance about how to get the most out of a library.
The TILE workshop explored practical
(could we?) and policy (should we?) issues which relate to pursuing some
aspects of this broader approach. It explored the area at the intersection
between library and learning technology, virtual learning environments (VLEs)
and library systems. Navigating it successfully will require careful judgements
that balance the virtues of “top down” disciplines of traditional information
management with the “bottom up” folk wisdom of Web 2.0 techniques.
Scale and ambition
The model that the TILE project has
developed recognises two contrasting perspectives on the library domain: a
narrow definition concerned just with the management of a library’s own assets;
and a wider definition of the total set of processes required to help people
interact with learning “stuff” (content, metadata, reading lists, profiles).
While stressing the strengths of each perspective, David Kay, the TILE Project
Manager, left little doubt that the ambitions he harboured were on a larger
“web-scale” stage.
Along with Joy Palmer (Mimas’s Manager of Library and Archival Services) in a later session,
David cited Lorcan Dempsey’s call for a model of library use suited to the web
environment. “We need to think about library services in the context of the
full web of user experience,” writes Dempsey (2005). The size, and more
particularly the reach, of services like Google and Amazon gives them a
gravitational pull that draws learners towards them. As long as library
resources remain fragmented, they will never exert the same pull. David told us
how California State University’s library holdings now include user-generated
tags. With over a million students spread across the university’s many sites,
these tags constitute a genuine web-scale service. However, he was at pains
to point out that examples such as Amazon’s analysis of users’ clickstreams show
that you do not require users to generate content
explicitly in order to capture their context; you can get value from the
choices that are implicit in the tracks they leave.
Joy Palmer envisaged a range of personalisation tools and APIs (application
programming interfaces) that might build on library usage data. The Copac
catalogue – funded by JISC, with 54 contributing libraries – that Joy works on
will soon include a “my bibliography” feature, with a feed to allow it to be
shared; additional features might include automated recommendations and tagging
facilities. Again, services like Amazon and Last.fm provide further examples
through the way they enable third parties to build services on top of their
open APIs. Students could conceivably come to share reading lists on their
social network profiles the same way they share music playlists now.
Implementation and examples
David Kay outlined options for building
services on top of learners’ library data: as well as providing APIs, these
include building applications and liberating the data. The workshop touched
on several examples of these.
Mendeley (www.mendeley.com) is a free
social software application for managing and sharing research papers. By
aggregating the data from its users, it is also a Web 2.0 social network for
discovering research trends and connecting to like-minded academics. At present
it does not draw any data from academic institutions, but its growth plans
unquestionably overlap with the wider definition of the library domain.
Calling in by video
link, Dave Pattern, Library Systems Manager at the University of Huddersfield,
told how his university had experimented with mining its borrowing data. They started by generating
borrowing suggestions for students to see if these would be useful. They were
also able to generate keyword suggestions – a potentially very valuable link
between the vocabularies used by librarians and by learners – and also to identify
keywords that people enter most frequently that get zero results (e.g.
“newspapermen”, “ligament”).
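As a rough illustration of the kind of mining described here, the following Python sketch counts which titles are borrowed by the same people and tallies the search terms that return nothing. It is not Huddersfield's actual method: the loan records, the search log and the function names (co_borrowing_counts, suggest, zero_result_keywords) are all invented for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical loan records (borrower, title), invented for illustration.
loans = [
    ("u1", "Essays"), ("u1", "Renaissance Reader"),
    ("u2", "Essays"), ("u2", "Renaissance Reader"), ("u2", "Utopia"),
    ("u3", "Essays"), ("u3", "Utopia"),
    ("u4", "Renaissance Reader"),
]

# Hypothetical search log (term entered, number of results returned).
search_log = [
    ("newspapermen", 0), ("ligament", 0),
    ("Newspapermen", 0), ("history", 120),
]


def co_borrowing_counts(loans):
    """Count, for each pair of titles, how many borrowers took out both."""
    titles_by_borrower = defaultdict(set)
    for borrower, title in loans:
        titles_by_borrower[borrower].add(title)
    pair_counts = defaultdict(Counter)
    for titles in titles_by_borrower.values():
        for a, b in combinations(sorted(titles), 2):
            pair_counts[a][b] += 1
            pair_counts[b][a] += 1
    return pair_counts


def suggest(title, pair_counts, n=3):
    """Return up to n titles most often borrowed alongside `title`."""
    return [t for t, _ in pair_counts.get(title, Counter()).most_common(n)]


def zero_result_keywords(search_log, n=10):
    """Return the most frequent search terms that produced no results."""
    misses = Counter(term.lower() for term, hits in search_log if hits == 0)
    return misses.most_common(n)


pair_counts = co_borrowing_counts(loans)
suggestions = suggest("Renaissance Reader", pair_counts)
failed_terms = zero_result_keywords(search_log)
```

With this sample data, "Essays" is suggested ahead of "Utopia" for borrowers of "Renaissance Reader", and "newspapermen" tops the list of failed searches. A list of that kind is what would let a librarian map learners' vocabulary onto catalogue terms.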
On the day of the workshop, Dave Pattern
published usage data for two million transactions on 80,000 library titles,
broken down by year, academic school, and academic course (with relevant UCAS
codes), under a common data licence at library.hud.ac.uk/data/usagedata/.
The rights issues with this data are complex, but Dave was able to side-step
the complexities by using a very open licence, which also allows the data to be distributed, shared and used as
widely as possible. Thus he hopes to provide an open
resource that anyone can use as one of the foundations of further innovation.
Joy Palmer explained how Mimas is
interested in exploring the potential of the Huddersfield work in a national
context with services like Copac.
Presently Copac has data on two million searches per month, and 800,000 user sessions – but this does
not tell them a lot about learner behaviour. While Joy could see a clear need
for deep log analysis of attention data, circulation data provides an instant
snapshot of learner behaviour that could be profoundly useful for system-wide
services such as Copac. As well as supporting developments around adaptive
personalisation such as recommender functions, it could provide rich
opportunities for text-mining and improved search.
Mark van Harmelen from the TILE Project
described a web-service-based architecture to realise the project’s approach.
The architecture is in two parts: one gathers and aggregates anonymised library user
behaviour data from educational institutions, and the other searches a catalogue that is
enhanced with this data, so that searches return
recommended results. This architecture can be seen in Figure 1.

Figure 1: Web-service-based architecture
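To make the two-part idea concrete, here is a minimal Python sketch, offered as an assumption-laden toy rather than a description of the TILE architecture itself. The function names (anonymise, aggregate, search), the salted-hash approach and the sample catalogue are all hypothetical. Note also that salted hashing is strictly pseudonymisation and would need further safeguards in practice.

```python
import hashlib
from collections import Counter


def anonymise(records, salt):
    """Replace borrower identifiers with salted hashes before data leaves an institution."""
    return [
        (hashlib.sha256((salt + borrower).encode("utf-8")).hexdigest(), item)
        for borrower, item in records
    ]


def aggregate(institution_feeds):
    """Part one: merge the anonymised feeds into a loan count per catalogue item."""
    counts = Counter()
    for feed in institution_feeds:
        counts.update(item for _, item in feed)
    return counts


def search(catalogue, query, loan_counts):
    """Part two: match the query against titles, then rank by aggregated loans."""
    q = query.lower()
    matches = [item for item, title in catalogue.items() if q in title.lower()]
    # Most-borrowed first; ties are broken by item identifier.
    return sorted(matches, key=lambda item: (-loan_counts[item], item))


# Hypothetical catalogue and loan feeds from two institutions.
catalogue = {
    "b1": "Renaissance Essays",
    "b2": "Essays on Law",
    "b3": "Organic Chemistry",
}
feed_a = anonymise([("alice", "b1"), ("bob", "b1"), ("bob", "b2")], salt="inst-a")
feed_b = anonymise([("carol", "b1"), ("dave", "b3")], salt="inst-b")

loan_counts = aggregate([feed_a, feed_b])
results = search(catalogue, "essays", loan_counts)
```

Here a search for "essays" returns both matching items, with the more heavily borrowed one listed first. A real service would sit behind web-service interfaces and use a proper search engine rather than a substring match.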
Mark mentioned promising results emerging
from another JISC-funded project in which he is involved (the EIE Project) for
search performance using the Lucene search engine. He also demonstrated how
library records can be treated as social objects, in the same way as Flickr
treats photographs as social objects that enable communication and discussion
between users, and how those objects can be integrated into a learning
environment.
Questions and open issues
Mark Toole, Director
of Information Services at the University of Stirling,
raised the concern that data samples from different places might not “fit
together” – meaning that any analysis would be comparing apples and pears –
because different institutions organise their modules in very different ways.
Does this mean that their data cannot comfortably coexist in the same set?
Flickr has shown how it is possible to tease out clusters of semantically
different data – distinguishing, for example, “jaguar” the animal, the car and
the Apple operating system – by comparing the contextual tags. So all may not
be lost.
Clearly, capturing a learner’s context
adds another dimension to the data. Are you borrowing
this book about Napoleon to study the psychology of dictatorship or French
history? Often there may be a trade-off between collecting extra data and
keeping the collection process – for learners and library staff alike – as
simple and integrated into the workflow as possible.
Library specialists will have to get
to grips with some of the standard problems that affect recommendations based
on “collaborative filtering” of data. One such is the “Harry Potter Effect”:
because people with many different interests and dispositions have read a Harry
Potter book, the automated recommendations have a propensity to tell you,
“People who borrowed books on Renaissance literature also borrowed Harry
Potter.” The statement is statistically correct, but not finely tuned or
useful. Another is the “cold start problem”, which affects new additions to a
library: as no one has ever borrowed them at this point, they will not be
recommended by the system unless there is some editorial intervention in the
data links.
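Both problems can be seen in a small Python sketch. It uses one common remedy for the popularity bias, dividing each co-borrowing count by the square root of the product of the two titles' loan counts (a cosine-style normalisation), and falls back on hand-made links for titles with no loans. The data, the function names and the editorial_links table are hypothetical, and the workshop did not prescribe this particular technique.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical borrowing histories: three specialists plus five general readers.
baskets = {
    "r1": {"Renaissance Literature", "Montaigne: Essays", "Harry Potter"},
    "r2": {"Renaissance Literature", "Montaigne: Essays", "Harry Potter"},
    "r3": {"Renaissance Literature", "Harry Potter"},
    **{f"x{i}": {"Harry Potter"} for i in range(5)},
}


def build_counts(baskets):
    """Return loan counts per title and co-borrowing counts per title pair."""
    item_counts, pair_counts = Counter(), Counter()
    for titles in baskets.values():
        item_counts.update(titles)
        for pair in combinations(sorted(titles), 2):
            pair_counts[pair] += 1
    return item_counts, pair_counts


def rank(title, item_counts, pair_counts, normalise):
    """Rank co-borrowed titles by raw count or by popularity-normalised score."""
    scores = {}
    for other in item_counts:
        co = pair_counts[tuple(sorted((title, other)))]
        if other == title or co == 0:
            continue
        if normalise:
            scores[other] = co / sqrt(item_counts[title] * item_counts[other])
        else:
            scores[other] = co
    return sorted(scores, key=lambda t: (-scores[t], t))


def recommend(title, item_counts, pair_counts, editorial_links):
    """Use loan data where it exists; otherwise fall back on editorial links."""
    ranked = rank(title, item_counts, pair_counts, normalise=True)
    return ranked or editorial_links.get(title, [])


item_counts, pair_counts = build_counts(baskets)
raw_top = rank("Renaissance Literature", item_counts, pair_counts, normalise=False)[0]
tuned_top = rank("Renaissance Literature", item_counts, pair_counts, normalise=True)[0]
cold = recommend("New Acquisition", item_counts, pair_counts,
                 {"New Acquisition": ["Renaissance Literature"]})
```

On raw counts the top suggestion for "Renaissance Literature" is "Harry Potter"; after normalisation it is "Montaigne: Essays". The never-borrowed "New Acquisition" gets a suggestion only because of the editorial link.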
Joy Palmer reminded us of the challenges
of semantic context and “ontological drift” when user-generated commentary on
contentious subjects becomes too rich to be easily assimilated – for example,
consider the multiple sparring entries relating to the state of Israel on
Wikipedia. She questioned whether the library OPAC (Online Public Access
Catalogue) was too generic a system to support contextually and academically
meaningful personalisation, and this point was carried over to the break-out
discussions about whether users would be motivated to contribute content to
institutional OPACs.
User data always brings with it concerns
about privacy, rights, and ownership of data, together with the relative merits
of opt-in versus opt-out schemes. Where data is in aggregated
and anonymous form, there should in principle be no risk to personal data, but
security procedures need to be monitored carefully to ensure that anonymity
cannot be compromised.
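One simple safeguard, sketched below in Python, is to suppress any aggregated figure that rests on too few borrowers, since a count of one or two in a small course can point to an individual. The threshold k, the record layout and the function name are assumptions made for illustration, not a procedure discussed at the workshop.

```python
from collections import defaultdict


def aggregate_with_suppression(loans, k=5):
    """Count distinct borrowers per (course, title), dropping cells smaller than k."""
    borrowers = defaultdict(set)
    for borrower, course, title in loans:
        borrowers[(course, title)].add(borrower)
    return {cell: len(ids) for cell, ids in borrowers.items() if len(ids) >= k}


# Hypothetical loan records: (borrower, course, title).
course_loans = [
    ("u1", "History", "Napoleon"),
    ("u2", "History", "Napoleon"),
    ("u3", "History", "Napoleon"),
    ("u4", "Psychology", "Napoleon"),
]

published = aggregate_with_suppression(course_loans, k=3)
```

With k set to 3, the History figure is published while the single Psychology loan is withheld. Choosing the threshold is a policy judgement, and suppression alone does not guarantee anonymity.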
Conclusions
Perhaps unsurprisingly, the workshop
participants were optimistic and eager to roll up their sleeves and experiment
with pilot projects (any sceptics in the library profession probably stayed
away). One comment was that, “The idea is so different from what
we do now that we just have to try it, boldly.” And
there was consensus that there is not just one opportunity in this area,
but many.
The application of Web 2.0 ideas to
libraries in education is not a matter of trying to ape the features of
Facebook and MySpace, and it is emphatically not about stopping people from using
existing social networks, blogs or other services. Web 2.0 is not like that; it
is more likely to involve, say, the provision of library profile “widgets” that
learners can embed in their blogs – and their coursework.
The complexity when applying these
ideas in education stems from the need to retain some kinds of value judgement
that do not apply to Amazon’s retailing or Last.fm’s music discovery. Perhaps
analysis will reveal which resources the successful students use and how these
compare with those the poor students use. Does students’ selection of resources
influence their capability, or vice-versa? And many library professionals will
be wary of the herd tendency in basing recommendations on behaviour of other
students. Just because learners do not
follow the official reading lists to the letter does not mean that they should not be encouraged to do so.
Professionals will need to bring all their experience to bear in order to judge
how to moderate, and when to intervene in, the emergent behaviours of learners
as captured by usage data.
But one way or another, through HE
institutions or entrepreneurial social software such as Mendeley, learners will
access a whole new perspective on the universe of resources – including which
of them are used most often, which are most highly rated, and which seem to connect and
work best together. The best way to influence this new approach is to be part
of making it happen.
The
TILE Workshop “Sitting on a gold mine” was organised by the Association for
Learning Technology (ALT) and held at JISC's offices in London on 12 December
2008. Further information is available from www.sero.co.uk/jisc-tile.html
David
Jennings is author of Net, Blogs and Rock’n’Roll: How
Digital Discovery Works and What it Means for Consumers, Creators and Culture (Nicholas Brealey Publishing, 2007). David’s report was assisted by contributions
from Joy Palmer (Manager of Library and Archival Services at Mimas), Phil
Barker (Learning Technology Adviser at Heriot-Watt’s Institute for Computer
Based Learning) and Mark van Harmelen.
David Jennings
DJ Alchemi Ltd.
References
Dempsey, L. (2005) The User Interface that
Isn’t. Lorcan Dempsey’s Weblog, orweblog.oclc.org/archives/000667.html
[accessed 22.12.2008]