Friday, 27 February 2015

Thesaurus Debate needs to move on



Surprise, surprise - last Thursday's debate on this proposition was a pushover for the opposition. To defeat any argument of the form “XXX has no place in YYY”, all you have to provide is one counter-example.
Just for starters:
  • The UK Data Archive, powered by the HASSET thesaurus
  • The FAO’s AGRIS database, searchable using AGROVOC, and
  • EUROVOC, used for searching publications of the EU institutions and others

were among 11 such examples that Leonard Will managed to cram on to one slide. He could have gone on to cite dozens more cases where a thesaurus provides sophisticated and indispensable search capabilities.
The “expert witness” Philip Carlisle backed him up by describing the nine vocabularies and related services that English Heritage built and maintains for the heritage community. Contributions from the floor drew attention to the power of a thesaurus to cross language boundaries, not to mention image searching, where indexing with a controlled vocabulary still outperforms all the other methods.  
But simply overthrowing the proposition misses the point – the role of the thesaurus in modern information retrieval has shrunk from what it once was. The high development and maintenance costs of an extensive controlled vocabulary deter most potential implementers. Most users simply do not want to know about such a complicated-looking beast, and so the shy thesaurus needs to perform discreetly but cost-effectively behind the scenes. Given a discerning team of developers, curators, IT support staff and indexers, this sophisticated tool can and should function interoperably alongside statistical algorithms, NLP techniques, data mining, clustering, latent semantic indexing, linked data, etc. Networking and collaboration, not rivalry, are the future.
As the professional body that has grown up around classification, indexing, use of thesauri and other knowledge organization systems, ISKO has a mandate to mark out that future. Follow-up activities could usefully explore:
  • the contexts in which the thesaurus is or is not a useful tool;
  • how to choose between a thesaurus and another type of knowledge organization system;
  • how to integrate a thesaurus with the other components of a modern information retrieval system;
  • how to adapt a standard thesaurus to the needs of special contexts;
  • features of the software needed for thesaurus management.


The knowledge organizer with a grasp of these topics is ideally placed to develop the hybrid vocabulary structures (e.g. a layer of thesaurus model hooked on to upper level ontologies and coated with taxonomy features) needed in today’s networked environments.

Tuesday, 3 February 2015

Call for Papers - International UDC Seminar 2015: CLASSIFICATION AND AUTHORITY CONTROL: Expanding Resource Discovery

DATE: 29-30 October 2015
VENUE: National Library of Portugal
Campo Grande 83
Lisbon, Portugal
WEB: http://seminar.udcc.org/2015/
CONTACT: seminar2015@udcc.org

Linked data practices and techniques have opened new possibilities in exploiting controlled vocabularies and improving resource discovery. Authority data held in library systems, including classification schemes, is finding new ways to expand its potential as a set of shared knowledge structures across the linked data environment.

The objective of this conference is to explore that potential, expanding the value and use of classification as authority controlled vocabulary, from the local perspective to the global environment.

We invite experts in authority control, classification schemes and linked data to provide overviews, illustrations and analysis of classification data management and exploitation. Contributions are welcome on high quality, innovative research and practice on the following topics:

•    Classification as a component of subject authority control
•    Classification authority data formats and modeling
•    Classification and multilingual subject access
•    Sharing classification data from authority files
•    Classification data in the open linked data context

CONTRIBUTIONS:

Two kinds of contributions are invited: conference papers and posters. Authors should submit a paper proposal in the form of an extended abstract (1000-1200 words, including references, for papers; and 500-600 words for posters). The submission form is provided on the conference website.

Proposals will be reviewed by the Programme Committee consisting of an international panel of experts. Each submission will undergo a blind review by at least three reviewers.

The Conference proceedings will be published by Ergon Verlag and will be distributed at the conference.

IMPORTANT DATES

    28 February 2015    Paper proposal submission deadline
    23 March 2015    Notification of acceptance & paper submission instructions
    15 May 2015    Paper submission (camera-ready copy)

ORGANIZER: Classification & Authority Control: Expanding Resource Discovery is the fifth biennial conference in a series of International UDC Seminars organized by the UDC Consortium (UDCC). UDCC is a not-for-profit organization, based in The Hague, established to maintain and distribute the Universal Decimal Classification and to support its use and development. UDC is one of the most widely used knowledge organization systems in the bibliographic domain.

Monday, 22 September 2014

Sharing expertise in support of Networked Knowledge Organization Systems

“Small is beautiful” said Ernst Schumacher in 1973. Despite the subsequent trends towards globalization, his note still strikes a chord, echoing strongly at the 13th NKOS Workshop held in London on 11-12 September. Addressing only 20 participants, each of the 9 speakers got the space and audience support to expose real issues arising from their current R & D projects, including practical obstacles such as the weaknesses of tools for handling KOS management and exploitation.
Ceri Binding, for example, had investigated six different products to help with establishing mappings between vocabularies, without finding one that was fully satisfactory. The size of the LOD (Linked Open Data) cloud always impresses - 1048 datasets, 302 vocabularies, and the numbers grow all the time – but problems have been reported with at least 58% of the datasets, such as “503: unavailable” or “404: not found”, and Ceri observed big variations in the quality of the links. Successful Linked Data projects with sustained value for users are plainly less common than our wishful thinking supposes.
All the speakers were clear and straightforward. As usual at the NKOS Workshops, I was very impressed at the amount of knowledge and expertise assembled in the room. If only there was some way of sharing that accumulated experience with all those who struggle in isolation to handle thesaurus or taxonomy development!
The intimate atmosphere made it realistic to engage everyone in discussion. I took the opportunity to report on outcomes from a workshop held jointly by ISKO UK, DCMI and BCS IRSG on 23 June, on “Vocabularies and the Potential for Linkage”, and asked again what could be done about the need for more tools and training. Animated conversation followed, overflowing into the pub afterwards and continuing through to the concluding session next day.
It was hard to find practical, feasible solutions, partly because the community engaged in vocabulary mapping is quite small. Not only that, the term “vocabulary” has different meanings for different groups. The DCMI definition, for example, includes datasets and metadata schemas as well as controlled vocabularies used for subject indexing. This leads to a very wide range of skills and specializations. This diversity of topics and the distance that separates us from co-workers make it unrealistic to set up affordable training days.
In the NKOS community, at least we can focus on KOSs (Knowledge Organization Systems) as the type of vocabulary to be linked (… though that still includes subject heading schemes, classification schemes, name authority lists and many taxonomies as well as thesauri.) That’s a useful focus for ISKO members too. If we can’t organize formal training sessions, at least we can make use of wiki space and perhaps other social media. In a wiki we could assemble all we know about tools and techniques – or at least pointers to where that knowledge can be found. Workshop participants left resolving to work together by email to make this happen. I shall report on progress via the ISKO-L and ISKO-UK lists.
Finally, remember - this workshop was small but beautiful. See the programme and all presentations at <https://at-web1.comp.glam.ac.uk/pages/research/hypermedia/nkos/nkos2014/programme.html>.
Stella Dextre Clarke
Chair, ISKO UK

Monday, 30 June 2014

Digital Asset Management Europe 2014 (DAMEU)

I attended this conference on 26th-27th June in London on behalf of ISKO UK, in place of Stella Dextre Clarke, who was unable to attend. The full programme of the meeting, with abstracts of papers, is given on the conference web site.

The scope of the concept labelled by the term “digital assets” was not defined, but it became clear during the conference that the main emphasis was on pictures, video and audio, prepared for marketing purposes in big organizations. These may amount to hundreds of thousands of items, including, for example, all the images of products included in printed and on-line catalogues, advertising images, sales and training videos, news items, and pictures of people. Marks and Spencer receive 1500 such digital assets per day, coming from many agencies who may be in competition with each other, so that the work of each one must not be visible to the others. DAM systems are also used in non-marketing applications, such as at Kew Gardens, which uses them for scientific purposes as well as for public awareness. The collection of North Wales Police includes fingerprints, scene of crime photographs, mug shots of criminals, CCTV recordings, recordings from video cameras worn by police officers and recordings of interviews, all of which are growing at the rate of 2500 new items per day.

DAM systems are basically information storage and retrieval systems, and their underlying functionality is similar to that of such systems used in other applications. A blog post by Elizabeth Keathley lists ten features of a DAM system, including the essential requirements of version control and workflow processing, recording where a resource has come from and where and when it was used.

These systems depend on metadata being attached to each item, either at the point of creation or later, and if this can be done automatically so much the better. Technical information may be embedded with a picture in EXIF format, for example, and the location may be recorded automatically if the capture device has GPS capability. Embedded information is sometimes lost when an item is edited or transferred to a different system, and it may be better retained in an associated file of metadata, which can also contain fields that are not supported by the embedded format.
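The idea of keeping metadata in an associated file can be illustrated with a minimal Python sketch. It is not taken from any system shown at the conference: the function names, the ".json" naming convention and the field names are all assumptions made for illustration only.

```python
import json
import tempfile
from pathlib import Path


def sidecar_path_for(asset_path: Path) -> Path:
    """Return the path of the metadata file kept next to the asset.

    Assumed convention (hypothetical): 'photo.jpg' -> 'photo.jpg.json'.
    """
    return asset_path.with_name(asset_path.name + ".json")


def write_sidecar(asset_path: Path, metadata: dict) -> Path:
    """Save the metadata as JSON beside the asset and return the sidecar path."""
    sidecar = sidecar_path_for(asset_path)
    sidecar.write_text(json.dumps(metadata, indent=2, sort_keys=True), encoding="utf-8")
    return sidecar


def read_sidecar(asset_path: Path) -> dict:
    """Load the metadata previously saved beside the asset."""
    return json.loads(sidecar_path_for(asset_path).read_text(encoding="utf-8"))


# Usage with made-up values: the record mixes technical data with fields
# (rights, usage history) that an embedded format might not support.
with tempfile.TemporaryDirectory() as tmp:
    asset = Path(tmp) / "photo_0001.jpg"
    asset.write_bytes(b"")  # empty stand-in for real image data
    record = {
        "title": "Example product shot",
        "source_agency": "Example Agency",
        "rights_note": "Internal use only",
        "used_in": ["Spring catalogue"],
    }
    sidecar = write_sidecar(asset, record)
    loaded = read_sidecar(asset)
```

Because the record lives in its own file, it survives when the image itself is edited or re-encoded, provided the two files are kept together.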

Several speakers mentioned the importance of having a “librarian” as part of the team, to manage the metadata and to help create and maintain controlled and structured vocabularies. Very little was said about the nature of these, however, and from the demonstration systems that were exhibited I got the impression that indexing vocabularies were often ad hoc creations without being based on sound principles such as those understood by ISKO.

Many of the presentations concerned the problems of procuring a DAM and introducing it into an organization, convincing management that it was needed and convincing potential users to use it. The advantages of a coherent centralised system were pointed out, but several speakers said that they found about forty different digital asset management systems in use in their organizations, and it was not easy to persuade people to give up their personal or departmental systems and transfer their data to a company-wide scheme. As usual, if one department could be convinced and realise the benefits, they could act as a “champion” to enthuse others. It was essential that the end users should take a full part in the procurement process, so that they could feel that the system was the one that best met their needs and that they “owned” it and participated in its development.

Several speakers said how important it was to engage a consultant to help with the procurement process, who could suggest a realistic list of suppliers from the large number of systems available, and work with users to define and prioritise their requirements. In such a fast-changing field, it was essential to choose a supplier with whom the client was confident of being able to work, and in one case the client and the supplier were asked to share “road maps” of how they saw their systems developing over the next five years, to check that they were in step. One consultant whose advice was mentioned as valuable by two of the speakers was Mark Davey, president and founder of the DAM Foundation. His presentation “DAM will morph into knowledge based platforms” was the only one to acknowledge that wider developments in KOS were applicable to DAM systems, mentioning schema.org and linked open data, for example.

There was a small exhibition of DAM software, including the open source system ResourceSpace. (Other open source packages are listed in “Review of available open source DAM software”.) One of these might be appropriate for ISKO UK’s increasing collection of presentations and recordings from its past meetings.

Thursday, 9 January 2014

Metadata Intersections: Bridging the Archipelago of Cultural Memory. Call for Participation.

The International Conference and Annual Meeting of DCMI, 8-11 October 2014 (DC-2014) requests submission of papers on the Conference theme:

Metadata is fundamental in enabling ubiquitous access to cultural and scientific resources through galleries, libraries, archives and museums (GLAM). While fundamental, GLAM traditions in documentation and organization lead to significant differences in both their languages of description and domain practices. And yet, the push is on for "radically open cultural heritage data" that bridges these differences as well as those across the humanities and the sciences. DC-2014 will explore the role of metadata in spanning the archipelago of siloed cultural memory in an emerging context of linked access to data repositories as well as repositories of cultural artifacts.

For further information, see the Conference website.

Open Access Metadata and Indicators

With the advent of Open Access initiatives, the need has arisen to annotate discrete works to indicate the conditions under which they may be accessed and/or re-used. In January 2013, the NISO Open Access Metadata and Indicators Working Group was charged with developing protocols and mechanisms for transmitting the access status of scholarly works, specifically to indicate whether a specific work is openly accessible (i.e., free-to-read by any user who can get to the work over the internet) and what re-use rights might be available.

NISO is currently seeking comments on the draft recommended practice Open Access Metadata and Indicators (NISO RP-22-201x).

“Use and re-use rights can be difficult to explain in metadata,” states Ed Pentz, Executive Director, CrossRef, and Co-chair of the NISO Open Access Metadata and Indicators Working Group. “By publishing URIs for applicable licenses and including these URIs in the metadata for the content, more detailed explanations of rights can be made available. The metadata can also be used to express how usage rights change over time or point to different licenses for particular time periods, for example when an embargo applies.”
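As a rough illustration of how dated license URIs could express rights that change over time, here is a minimal Python sketch. It does not reproduce the draft's actual encoding: the `LicenseRef` record, the `license_on` function and both example URIs and dates are assumptions invented for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class LicenseRef:
    """A license URI and the date from which it applies (hypothetical record shape)."""
    uri: str
    start_date: date


def license_on(refs: list[LicenseRef], day: date) -> Optional[str]:
    """Return the URI of the latest-starting license already in force on `day`.

    Returns None if no license has started yet.
    """
    in_force = [ref for ref in refs if ref.start_date <= day]
    if not in_force:
        return None
    return max(in_force, key=lambda ref: ref.start_date).uri


# Made-up example: a restrictive license during a one-year embargo,
# followed by an open license once the embargo ends.
refs = [
    LicenseRef("https://example.org/licenses/embargo", date(2014, 1, 1)),
    LicenseRef("https://creativecommons.org/licenses/by/4.0/", date(2015, 1, 1)),
]

during_embargo = license_on(refs, date(2014, 6, 1))
after_embargo = license_on(refs, date(2015, 6, 1))
before_publication = license_on(refs, date(2013, 6, 1))
```

With these values, a lookup during 2014 yields the embargo license, and a lookup from 1 January 2015 onwards yields the open license; the fuller explanation of each set of rights sits at the URI rather than in the metadata itself.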

The draft recommended practice is open for public comment through February 4, 2014. To download the draft or submit online comments, visit the Open Access Metadata and Indicators webpage.

Source:
Email dated 06/01/2014 to DC-GENERAL@JISCMAIL.AC.UK from:
Cynthia Hodgson
Technical Editor / Consultant
National Information Standards Organization
chodgson@niso.org

Saturday, 28 December 2013

99% Smiles, 1% Sweat, 100% Cotton

At our last two conferences, we have supplied delegate bags made from pure, eco-friendly cotton. These were manufactured by a social enterprise called Vandanamu, whose factory is located near Pondicherry on the coast of southern India. The bags are of very high quality and good value for money. They can be printed with a logo - or logos - of your choice and are available in a range of sizes.

Vandanamu was set up in response to the devastating Boxing Day tsunami which hit the whole region in 2004, with a view to providing a livelihood for some of those hit hardest by the disaster. Last year, the enterprise started a crowdfunding campaign to raise money for solar panels, which would have cut their electricity costs significantly and would have made their business far less vulnerable to rising energy costs. Unfortunately, they raised insufficient donations to qualify for the funding.

Nevertheless, Vandanamu continue to consolidate their enterprise by working towards gaining Fair Trade and environmental certifications, allowing them eventually to be featured in ethical suppliers' databases world-wide.

For more information on a venture well worth supporting, view their video on YouTube.