Monday, 28 May 2012

I think therefore I classify - next ISKO UK event - July 16

The next ISKO UK event (a joint event with the BCS Information Retrieval Specialist Group) will be on Monday July 16th, in London. It is a one-day seminar on the continuing need for classification, exploring how it is taught and how it is changing to meet the needs of an increasingly online world.

The event offers the chance to hear from leading speakers talking about the philosophy, teaching, and applications of classification, and how researchers, teachers, and practitioners can adapt to meet the new classification challenges posed by the Semantic Web. Participants will be able to join a number of breakout sessions to explore themes in more detail, as well as investigate automated classification systems in vendor demonstrations. Lunch is included in the cost (only £25 for members and £60 for non-members) and the day will end with a chance to network over wine and nibbles.

For full programme details, speaker biographies, and the booking form, see the main event page.

Wednesday, 11 April 2012

Call for nominations: 2012 UKeiG Tony Kent Strix Award

The UKeiG Tony Kent Strix Award is given in recognition of an outstanding practical innovation or achievement in the field of information retrieval. This could take the form of an application or service, or an overall appreciation of past achievements from which significant advances have emanated. The Award is open to individuals or groups from anywhere in the world. The deadline for nominations is Friday 24th August 2012.

Nominations should be for achievement that meets one or more of the following criteria:

· a major and/or sustained contribution to the theoretical or experimental understanding of the information retrieval process;
· development of, or significant improvement in, mechanisms, a product or service for the retrieval of information, either generally or in a specialised field;
· development of, or significant improvement in, easy access to an information service;
· a sustained contribution over a period of years to the field of information retrieval; for example, by running an information service or by contributing at national or international level to organisations active in the field.

Key characteristics that the judges will look for in nominations are innovation, initiative, originality and practicality.


Nominations for the 2012 Award are now invited.

Thursday, 29 March 2012

Review of On Location event

ISKO UK and the British Computer Society co-hosted an event all about location data.

The first speaker was Mike Sanderson of 1Spatial, who described how geospatial data can help to power a European knowledge economy. He spoke of the need for auditing data sources and establishing trust in them, particularly ways of mitigating poor-quality or untrustworthy data. At 1Spatial, they do this by comparing as many data sources as they can to establish confidence in the geodata they use. Confidence levels can then be associated with risk, and levels of acceptable risk agreed.
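The idea of building confidence by cross-checking sources can be illustrated with a small Python sketch. This is not 1Spatial's method; it is a hypothetical illustration under our own assumptions: each source reports a position for the same feature, confidence is the share of sources lying within a tolerance of the median position, and an agreed risk threshold decides whether the feature is accepted. The function names, the 10 m tolerance and the 0.75 threshold are all invented for the example.

```python
from math import hypot
from statistics import median


def confidence(positions, tolerance_m=10.0):
    """Hypothetical agreement score: the share of sources whose reported
    (easting, northing) lies within tolerance_m metres of the median position."""
    mx = median(x for x, _ in positions)
    my = median(y for _, y in positions)
    agreeing = sum(1 for x, y in positions if hypot(x - mx, y - my) <= tolerance_m)
    return agreeing / len(positions)


def acceptable(positions, min_confidence=0.75, tolerance_m=10.0):
    """Accept the feature only if confidence meets the agreed risk threshold."""
    return confidence(positions, tolerance_m) >= min_confidence


# Four sources report a position (in metres) for the same building;
# the last one disagrees badly with the others.
sources = [(530000.0, 180000.0), (530003.0, 180002.0),
           (529998.0, 180001.0), (530250.0, 180400.0)]
print(confidence(sources))   # 0.75
print(acceptable(sources))   # True
```

Using the median rather than the mean keeps a single badly wrong source from dragging the reference point towards itself; a stricter organisation would simply raise `min_confidence`.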

Alex Coley of INSPIRE talked about the UK Location Strategy, which aims to make data more interoperable, to encourage sharing, and to improve the quality of location knowledge. The Strategy is intended to promote re-use of public sector data, and is based on best practice in linking and sharing to support transparency and accessibility. Historically there has been a lot of isolated working in silos, and the aim is to bring all such data together and make it sharable. This should help organisations to cut technology support costs and reduce unnecessary duplication of work. Although some organisations need very specialised data, much remains common. Location data is present in all sorts of data sets, and can be re-used and repurposed, for example to help understand environmental issues. Location data can also be key to powering interesting mashups: for example, someone could link train timetables with weather information, so that train companies could offer day trips to the resorts most likely to be sunny that day.

The Location Strategy's standardisation of location data is effectively a Linked Data approach, but so far little work has been done to map between different location data sets.

Data that is not current is generally less useful than data that is maintained and kept up to date. Likewise, data sets that include information about their context and purpose are more useful than those that do not.

Jo Walsh of EDiNA showcased their map tools. EDiNA is trying to help JISC predict search needs and provide better search services. They are taking a Linked Data approach, but there is a need for core common vocabularies. EDiNA runs various projects to create tools that help open up data. Unlock is a text mining tool that picks out geolocation data from unstructured text. It could be used to add location data as part of digital humanities projects.

One aspect of Linked Data that is often overlooked is that it can "future proof" information resources. If a project, or department, is closed down, its classification schemes and data sets can become unusable, but if stable URIs have been added to classification schemes there is more chance that people in the future will be able to use them.

Matt Bull of the Health Protection Agency explained how geospatial data is useful for health protection, such as tracking infectious diseases or environmental hazards. In epidemiology, data is inherently social. Diseases are often linked to the environment (radon gas, social deprivation), and clinics, pharmacies, etc. have locations. This can be used to investigate treatment-seeking behaviour as well as patterns of infection, in order to plan resourcing. For example, people seeking treatment for sexually transmitted diseases often don't use their nearest clinic, especially in cities. Such behaviour makes interpretation of data tricky.

Geospatial data is also useful for emergency and disaster planning and monitoring the effects of climate change on the prevalence of infectious diseases.

Stefan Carlyle, of the Environment Agency (EA), talked about the use of geospatial data in incident management. One example was using geospatial information to model the risk of a dam failing and to plan an evacuation of the affected area. This is now a quick and easy operation, whereas in the past it would not have been possible without huge effort. Risk assessments of flood defences can now be based on geospatial data, and this can help prioritise asset management, such as the management of flood defences.

Implementing semantic interoperability is a key aim, as is promoting good quality data, and this includes teaching staff to be good "data custodians". Provenance is key to understanding the trustworthiness of data sets.

The EA is focussing on improving semantic interoperability by prioritising key data sets and standardising and linking those, rather than trying to do everything all at once. Transparency is another important aim of the EA's Linked and Open Data strategy and they provide search and browse tools to help people navigate their data sets. Big data and personal data are both becoming increasingly important, with projects to collect "crowd sourced" data providing useful information about local environments.

The EA estimates that its Linked Open Data approach produces about £5 million per year in benefits, from reduced duplication of work and other efficiencies such as unifying regional data silos, and from the sale of data to commercial organisations. The EA believes its location data will be at the heart of making it an exemplar of a pragmatic approach to Open Data and transparency.

Carsten Ronsdorf from the Ordnance Survey (OS) described various location data standards, how they interact, and how they are used. BS 7666 specifies that data quality information should be included, so that the accuracy of geospatial data is declared. Two key concepts for the OS are the Basic Land and Property Unit and the Unique Property Reference Number. Address data is heavily standardised to provide integration and facilitate practical use.

Nick Turner, also of the OS, then talked about the National Land and Property Gazetteer. The OS was instructed by the government to take over UK address data management because a number of public sector organisations were each trying to maintain separate address databases. The OS formed a consortium with them and created AddressBase.

AddressBase has three levels: AddressBase, which covers basic postal addresses; AddressBase Plus, which adds other data, including the addresses of buildings such as churches, temples, and mosques; and AddressBase Premium, which also includes details of buildings that no longer exist and buildings that are planned.

AddressBase is widely used because organisations can refer to it to verify and update their own records, or to extract other address information when they need it, rather than having to manage it all themselves.

Tuesday, 28 February 2012

ISKO UK event: On Location: organizing and using geospatial information

Thursday 29th March (14.00-18.00)
Wilkes Room - British Computer Society London Office

In this ISKO UK and BCS joint meeting, we will hear from experts about the current geospatial information landscape and its challenges, some of the standards and frameworks that have been put into place to ensure interoperability and the potential for linking data. We will also hear how some users of GIS systems have applied them in their own organizations.

The event is free to ISKO and BCS members and to full-time students. The fee for non-members is just £40, payable in advance. Registration opens at 1.45, immediately following the ISKO UK AGM, and we shall start promptly at 2 p.m. The programme will be followed by a chance to network, with wine and nibbles.

For full details and booking go to: http://www.iskouk.org/events/location_march2012.htm

Tuesday, 6 December 2011

Review of 'Innovations in Information Retrieval: Perspectives for Theory and Practice'

Edited by Allen Foster and Pauline Rafferty
2011, London: Facet, 224pp, £44.95
ISBN 978-1-85604-697-8

When I recently told a fellow librarian I was reviewing a book on information retrieval (IR) she denied that the concept had any relevance to librarianship any more – it’s now (allegedly) all about the fuzzier and friendlier ‘resource discovery’. IR has always been a particular interest of computer-science departments, but this book argues, against my colleague, for its continued wider relevance and validity.

Behind the catch-all title lies a deep vein of historical analysis and a wide range of perspectives on IR in practice. David Bawden asks what happens to browsability and serendipity when most information-seeking acts take place online. Aida Slavic gives an up-to-the-minute report from the overlapping borders of semantics, linked data and classification. Three chapters deal respectively with the retrieval of music information, fiction, and the usefulness of social tagging. The final two chapters investigate searchers’ interaction with information objects, and search engines through the lens of webometrics (i.e. the quantitative aspects of the Web).

There are gems here, but there are also obsolete data (studies of AltaVista and HotBot tell us nothing about how Google works today), careless mistakes (a reference to “Julie” Kristeva), irrelevancies (we are told that Yahoo! China has “particularly good coverage of China”), and text that could benefit from greater editorial intervention. The extensive research being carried out by the Goliaths of the internet is entirely absent, though Microsoft and Google have large research divisions. But the book makes the case for IR being an expansive area of study, and the academy-centred magpie approach is ideally suited to its defined target audience – master’s-level students in ILS wanting information and inspiration. The volume, in different ways, offers both.

Colin Higgins, Librarian, St Catharine's College Cambridge

Tuesday, 15 November 2011

Metadata for Digital Collections - book review

Review of Metadata for Digital Collections: A How-to-do-it Manual by Stephen J Miller, London: Facet Publishing, 2011. 364 pp
ISBN 978-1-85604-771-5 Price £54.95 Paperback
Metadata – a word that many in the library and information communities found quite frightening a few years ago, and maybe even today! We all knew what metadata was; its simple definition had been drummed into us and we could all remember it – “metadata is data about data”. But beyond that... there were a few books around, but they were mostly highly confusing, very technical, and demanded quite a deep understanding of computer programming.
And now, at last, the simple how-to-do-it manual has arrived. Metadata for Digital Collections is a practical guide for practical people. Read this well-illustrated book and all will become clear. Were it not pushing towards coffee-table size, albeit a paperback, I would say that it was suitable for reading on the bus or train to work.
Good-quality metadata is crucial for providing intellectual access to the ever-increasing number of digital collections being created by libraries, archives, museums, and other organisations. Without good metadata these digital resources would be under-used, because most potential users would not discover their existence. The book and its companion website seek to introduce readers to the fundamental concepts and practices, and the author has aimed it both at beginners and at experienced practitioners with little formal metadata training. Another advantage of this book is that it does not assume that the reader has previous cataloguing experience.
The book is divided into 11 chapters – Introduction to Metadata for Digital Collections; Introduction to Resource Description and Dublin Core; Resource Identification and Responsibility Elements; Resource Content and Relationship Elements; Controlled Vocabularies for Improved Resource Discovery; XML-Encoded Metadata; MODS: The Metadata Object Description Schema; VRA Core: The Visual Resources Association Core Categories; Metadata Interoperability, Shareability, and Quality; Designing and Documenting a Metadata Scheme; and Metadata, Linked Data and the Semantic Web. These chapters are followed by a very substantial bibliography and a thorough and comprehensive index. Each chapter opens with an introduction and overview, continues with excellent explanations and examples, and closes with a summary and appropriate references.
The book is laid out extremely well, with a good-sized, clear font and a very pleasing use of white space, often in columns. Some readers might want to use this space for their own annotations, but I find that it prevents the page from becoming overfull with technical explanation and, perhaps, off-putting to the newcomer to this subject. The book is full of illustrations (all black and white), digital images, screen shots, and tables. It is not hand-holding or patronising, but simply provides the information required in a straightforward and perfectly clear style. I have already indicated the likely readership for this book. This is not a criticism, but an individual reader whose institution is not buying it may find it somewhat costly; this is often the penalty of the specialist book. I have nothing but praise for this book: highly recommended!
Reviewer: Eric Jukes, Systems Librarian.
Posted by Fran Huckle on behalf of Eric Jukes

Tuesday, 1 November 2011

Library of Congress looks to Replace MARC

A Bibliographic Framework for the Digital Age

The new bibliographic framework project will be focused on the Web environment, Linked Data principles and mechanisms, and the Resource Description Framework (RDF) as a basic data model. The protocols and ideas behind Linked Data are natural exchange mechanisms for the Web that have found substantial resonance even beyond the cultural heritage sector. Likewise, it is expected that the use of RDF and other W3C (World Wide Web Consortium) developments will enable the integration of library data and other cultural heritage data on the Web for more expansive user access to information.
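To make the RDF data model concrete, here is a minimal Python sketch that uses plain tuples rather than any RDF library. It is an illustration under our own assumptions, not the Library of Congress's design: the `example.org` URIs and the two records are hypothetical, while the `dct:` property URIs are real Dublin Core terms. It shows why a shared data model helps integration: once library and museum statements use the same URIs, combining them is simply a union of triples (true for graphs without blank nodes), and one query spans both.

```python
# Each statement is a (subject, predicate, object) triple, as in RDF.
DCT = "http://purl.org/dc/terms/"   # Dublin Core terms namespace
EX = "http://example.org/"          # hypothetical namespace for illustration

# Hypothetical library record.
library_data = {
    (EX + "book/1", DCT + "title", "An example book"),
    (EX + "book/1", DCT + "subject", EX + "topic/metadata"),
}
# Hypothetical museum record that reuses the same subject URI.
museum_data = {
    (EX + "exhibit/7", DCT + "title", "An example exhibit"),
    (EX + "exhibit/7", DCT + "subject", EX + "topic/metadata"),
}


def merge(*graphs):
    """Combine graphs; for graphs without blank nodes this is a set union."""
    return set().union(*graphs)


def subjects_about(graph, topic):
    """Return every resource whose dct:subject is the given topic URI."""
    return sorted(s for s, p, o in graph if p == DCT + "subject" and o == topic)


combined = merge(library_data, museum_data)
print(subjects_about(combined, EX + "topic/metadata"))
# ['http://example.org/book/1', 'http://example.org/exhibit/7']
```

A production system would use a proper RDF toolkit and a query language such as SPARQL, but the principle of shared identifiers making data sets joinable is the same.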

Original source. Thanks to Tom Baker of the DC Architecture list for distributing this news.