Friday, 29 May 2015

SEMANTiCS 2015 conference

The Semantic Web Company (SWC) has announced the 11th International Conference on Semantic Systems - SEMANTiCS 2015 - to be held in Vienna, September 15-17, 2015. Calls are now open for research papers, industry presentations, posters and demos. More information is available on the conference web site.

Tuesday, 26 May 2015

Macmillan Science and Education Publish RDF Ontologies

Macmillan Science and Education have published the RDF ontologies they use for content publishing.

They are sharing these in order to contribute to the wider linked data community and to provide a public reference for their data models.

"This May 2015 release further extends the number and size of our published data models. We've added two more domain models: relations and review-states. We've linked our subjects domain model to the NLM MeSH RDF Linked Data (beta) and provided Bio2RDF links as well. We've also displayed instance data for all of the domain models. On top of that we've grown the number of terms from our Core Ontology by more than 50% – see the bar chart below. And we've improved navigation on the core and domain model pages."

See the site for further information.

Wednesday, 6 May 2015

Standards, Interoperability and Dewey

Next time the Internet comes crashing down about your ears, spare a thought for the value of standards, starting with TCP/IP and HTTP. When you consider how the superhighway relies on precise implementation of an immense jigsaw of protocols and standards, it’s a miracle we ever find anything. But while we can’t get along without them, standards are also a pain. They push you into one-size-fits-all and clip the wings of dizzy free-fliers.

At ISKO-UK’s Great Debate [1] last February, the international standard ISO 25964 [2] collected a lot of flak from some speakers who wanted their thesauri to escape control, and conversely from others who urged greater discipline, in the style of an ontology. So it was refreshing to attend EDUG’s April workshop [3], where the ISO 25964 guidance on mapping received a grateful welcome.

EDUG is the European Dewey User Group, whose membership includes a great many national libraries and major university libraries. Their patrons want unfettered and uncluttered access, not just to the resources held locally, but to all the collections you can reach through the Internet. Given the multiplicity of different thesauri, subject headings and classification schemes used to index and/or classify the original material, and given the shrinking budgets for re-classifying new acquisitions, mappings between the various vocabularies have been seen as part of the solution.

But mappings are a challenge! Between one thesaurus and another, cases of exact equivalence between concepts are the exception not the rule. Between a thesaurus and a classification scheme there’s an additional complication – the precoordination built into most classes. Thesaurus concepts designed for postcoordinate indexing do not easily map to or from classmarks, originally developed for arranging books on shelves.
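To make the two difficulties above concrete, here is a minimal sketch in Python of how a mapping record might be modelled. It is illustrative only. The five relation names are the SKOS mapping properties; the `Mapping` class, the `summarize` helper, the concept labels and the class identifiers (`CLASS-A`, `CLASS-B`) are hypothetical placeholders, not real Dewey numbers or anything prescribed by ISO 25964. A mapping with more than one source concept stands for the case where several postcoordinate thesaurus concepts together correspond to one precoordinated class.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# The five mapping properties defined by SKOS.
SKOS_MAPPING_RELATIONS = {
    "exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch",
}


@dataclass(frozen=True)
class Mapping:
    """One mapping from thesaurus concept(s) to a class (hypothetical model)."""
    source_concepts: Tuple[str, ...]  # one or more thesaurus concepts
    target: str                       # placeholder class identifier
    relation: str                     # one of SKOS_MAPPING_RELATIONS

    def __post_init__(self):
        if not self.source_concepts:
            raise ValueError("a mapping needs at least one source concept")
        if self.relation not in SKOS_MAPPING_RELATIONS:
            raise ValueError(f"unknown mapping relation: {self.relation}")

    @property
    def is_compound(self) -> bool:
        # Several source concepts combined to reach one precoordinated class.
        return len(self.source_concepts) > 1


def summarize(mappings: Iterable[Mapping]) -> Dict[str, int]:
    """Count mappings by relation, keeping compound mappings separate."""
    counts: Dict[str, int] = {}
    for m in mappings:
        key = ("compound " if m.is_compound else "") + m.relation
        counts[key] = counts.get(key, 0) + 1
    return counts


# Invented sample data: labels and class identifiers are placeholders.
mappings = [
    Mapping(("bridges",), "CLASS-A", "exactMatch"),
    Mapping(("bridges", "maintenance"), "CLASS-B", "closeMatch"),
    Mapping(("viaducts",), "CLASS-A", "broadMatch"),
]
print(summarize(mappings))
```

Counting the relation types in a real mapping project would be one simple way to see how rarely exact equivalence turns up.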

While mapping has no easy answers, that does not mean we should give up trying. As Grete Seland quoted from Piet Hein:
Problems worthy of attack
Prove their worth by hitting back.

ISO 25964 sets out basic guidelines, starting with thesauri and reaching part of the way towards classification schemes and other types of Knowledge Organization System (KOS). For some years members of EDUG have been drawing upon these guidelines in projects such as MACS [4], Criss-Cross [5], Coli-conc [6], and a project to map the Norwegian thesaurus Humord to Dewey. Some are looking towards Semantic Web applications; others are simply trying to speed up cataloguing of resources already classified or indexed by a different KOS. In the EDUG forum a big concern is to build all the accumulated knowledge and guidance into WebDewey [7].

This workshop in Naples focused specifically on developing recommendations for best practice when mapping to the Dewey Decimal Classification System. Standards were greatly in demand. Speakers pointed out the limitations of both ISO 25964 and SKOS [8] in this context, but the general conclusion was to build on and extend these standards rather than casting them aside. Detailed conclusions of the four working groups are currently under discussion, and should be published on the website [3] by the end of June 2015.

So come back standards, all is forgiven… for the meantime. And as for the teams developing mappings to Dewey, even when supported by standards, wish them fortitude and a jar of paracetamol as they grapple with the intellectual challenges of mapping to a pre-coordinated scheme.



Friday, 27 February 2015

Thesaurus Debate needs to move on

Surprise, surprise - last Thursday's debate on this proposition was a pushover for the opposition. To defeat any argument of the form “XXX has no place in YYY”, all you have to provide is one counter-example.
Just for starters:
  • The UK Data Archive, powered by the HASSET thesaurus
  • The FAO’s AGRIS database, searchable using AGROVOC, and
  • EUROVOC, used for searching publications of the EU institutions and others

were among 11 such examples that Leonard Will managed to cram on to one slide. He could have gone on to cite dozens more cases where a thesaurus provides sophisticated and indispensable search capabilities.
The “expert witness” Philip Carlisle backed him up by describing the nine vocabularies and related services that English Heritage has built and maintains for the heritage community. Contributions from the floor drew attention to the power of a thesaurus to cross language boundaries, not to mention image searching, where indexing with a controlled vocabulary still outperforms all the other methods.
But simply overthrowing the proposition misses the point – the role of the thesaurus in modern information retrieval has shrunk from what it once was. The high development and maintenance costs of an extensive controlled vocabulary deter most potential implementers. Most users simply do not want to know about such a complicated-looking beast, and so the shy thesaurus needs to perform discreetly but cost-effectively behind the scenes. Given a discerning team of developers, curators, IT support staff and indexers, this sophisticated tool can and should function interoperably alongside statistical algorithms, NLP techniques, data mining, clustering, latent semantic indexing, linked data, etc. Networking and collaboration, not rivalry, are the future.
As the professional body that has grown up around classification, indexing, use of thesauri and other knowledge organization systems, ISKO has a mandate to mark out that future. Follow-up activities could usefully explore:
  • the contexts in which the thesaurus is or is not a useful tool;
  • how to choose between a thesaurus and another type of knowledge organization system;
  • how to integrate a thesaurus with the other components of a modern information retrieval system;
  • how to adapt a standard thesaurus to the needs of special contexts;
  • features of the software needed for thesaurus management.

The knowledge organizer with a grasp of these topics is ideally placed to develop the hybrid vocabulary structures (e.g. a layer of thesaurus model hooked on to upper level ontologies and coated with taxonomy features) needed in today’s networked environments.

Tuesday, 3 February 2015

Call for Papers - International UDC Seminar 2015: CLASSIFICATION AND AUTHORITY CONTROL: Expanding Resource Discovery

DATE: 29-30 October 2015
VENUE: National Library of Portugal
Campo Grande 83
Lisbon, Portugal

Linked data practices and techniques have opened new possibilities in exploiting controlled vocabularies and improving resource discovery. Authority data held in library systems, including classification schemes, is finding new ways of expanding its potential as shared knowledge structures across the linked data environment.

The objective of this conference is to explore this potential, expanding the value and use of classification as authority controlled vocabulary, from the local perspective to the global environment.

We invite experts in authority control, classification schemes and linked data to provide overviews, illustrations and analysis of classification data management and exploitation. Contributions are welcome on high quality, innovative research and practice on the following topics:

•    Classification as a component of subject authority control
•    Classification authority data formats and modeling
•    Classification and multilingual subject access
•    Sharing classification data from authority files
•    Classification data in the open linked data context


Two kinds of contributions are invited: conference papers and posters. Authors should submit a paper proposal in the form of an extended abstract (1000-1200 words, including references, for papers; and 500-600 words for posters). The submission form is provided on the conference website.

Proposals will be reviewed by the Programme Committee consisting of an international panel of experts. Each submission will undergo a blind review by at least three reviewers.

The Conference proceedings will be published by Ergon Verlag and will be distributed at the conference.


    28 February 2015    Paper proposal submission deadline
    23 March 2015    Notification of acceptance & paper submission instructions
    15 May 2015    Paper submission (camera-ready copy)

ORGANIZER: Classification & Authority Control: Expanding Resource Discovery is the fifth biennial conference in a series of International UDC Seminars organized by the UDC Consortium (UDCC). UDCC is a not-for-profit organization, based in The Hague, established to maintain and distribute the Universal Decimal Classification and to support its use and development. UDC is one of the most widely used knowledge organization systems in the bibliographic domain.

Monday, 22 September 2014

Sharing expertise in support of Networked Knowledge Organization Systems

“Small is beautiful” said Ernst Schumacher in 1973. Despite the subsequent trends towards globalization, his note still strikes a chord, echoing strongly at the 13th NKOS Workshop held in London on 11-12 September. Addressing only 20 participants, each of the 9 speakers got the space and audience support to expose real issues arising from their current R & D projects, including practical obstacles such as the weaknesses of tools for handling KOS management and exploitation.
Ceri Binding, for example, had investigated six different products to help with establishing mappings between vocabularies, without finding one that was fully satisfactory. The size of the LOD (Linked Open Data) cloud always impresses - 1048 datasets, 302 vocabularies, and the numbers grow all the time – but problems have been reported with at least 58% of the datasets, such as “503: unavailable” or “404: not found”, and Ceri observed big variations in the quality of the links. Successful Linked Data projects with sustained value for users are plainly less common than our wishful thinking supposes.
All the speakers were clear and straightforward. As usual at the NKOS Workshops, I was very impressed at the amount of knowledge and expertise assembled in the room. If only there was some way of sharing that accumulated experience with all those who struggle in isolation to handle thesaurus or taxonomy development!
The intimate atmosphere made it realistic to engage everyone in discussion. I took the opportunity to report on outcomes from a workshop held jointly by ISKO UK, DCMI and BCS IRSG on 23 June, on “Vocabularies and the Potential for Linkage”, and asked again what could be done about the need for more tools and training. Animated conversation followed, overflowing into the pub afterwards and continuing through to the concluding session next day.
It was hard to find practical, feasible solutions, partly because the community engaged in vocabulary mapping is quite small. Not only that, the term “vocabulary” has different meanings for different groups. The DCMI definition, for example, includes datasets and metadata schemas as well as controlled vocabularies used for subject indexing. This leads to a very wide range of skills and specializations. This diversity of topics and the distance that separates us from co-workers make it unrealistic to set up affordable training days.
In the NKOS community, at least we can focus on KOSs (Knowledge Organization Systems) as the type of vocabulary to be linked (… though that still includes subject heading schemes, classification schemes, name authority lists and many taxonomies as well as thesauri.) That’s a useful focus for ISKO members too. If we can’t organize formal training sessions, at least we can make use of wiki space and perhaps other social media. In a wiki we could assemble all we know about tools and techniques – or at least pointers to where that knowledge can be found. Workshop participants left resolving to work together by email to make this happen. I shall report on progress via the ISKO-L and ISKO-UK lists.
Finally, remember - this workshop was small but beautiful. See the programme and all presentations at <>.
Stella Dextre Clarke
Chair, ISKO UK

Monday, 30 June 2014

Digital Asset Management Europe 2014 (DAMEU)

I attended this conference on 26th-27th June in London on behalf of ISKO UK, in place of Stella Dextre Clarke, who was unable to attend. The full programme of the meeting, with abstracts of papers, is given on the conference web site.

The scope of the concept labelled by the term “digital assets” was not defined, but it became clear during the conference that the main emphasis was on pictures, video and audio prepared for marketing purposes in big organizations. These may amount to hundreds of thousands of items, including, for example, all the images of products included in printed and on-line catalogues, advertising images, sales and training videos, news items, and pictures of people. Marks and Spencer receive 1500 such digital assets per day, coming from many agencies who may be in competition with each other, so that the work of each one must not be visible to the others. DAM systems are also used outside marketing, for example by Kew Gardens, which uses them for scientific purposes as well as for public awareness. The collection of North Wales Police includes fingerprints, scene of crime photographs, mug shots of criminals, CCTV recordings, recordings from video cameras worn by police officers and recordings of interviews, and is growing at the rate of 2500 new items per day.

DAM systems are basically information storage and retrieval systems, and their underlying functionality is similar to that of such systems used in other applications. A blog post by Elizabeth Keathley lists ten features of a DAM system, including the essential requirements of version control and workflow processing, recording where a resource has come from and where and when it was used.

These systems depend on metadata being attached to each item, either at the point of creation or later, and if this can be done automatically so much the better. Technical information may be embedded with a picture in EXIF format, for example, and the location may be recorded automatically if the capture device has GPS capability. Embedded information is sometimes lost when an item is edited or transferred to a different system, and it may be better retained in an associated file of metadata, which can also contain fields which are not supported by the embedded format.
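As a rough illustration of the “associated file of metadata” idea, the sketch below writes a JSON sidecar file next to an asset. It is a minimal example under stated assumptions, not a description of how any DAM product works: the `.metadata.json` naming convention, the function names and the field names are all hypothetical, and the embedded values are passed in as a plain dictionary rather than read from the image.

```python
import json
from pathlib import Path


def sidecar_path(asset: Path) -> Path:
    """Return the sidecar location for an asset (hypothetical naming rule)."""
    return asset.with_name(asset.name + ".metadata.json")


def write_sidecar(asset: Path, embedded: dict, extra: dict) -> Path:
    """Store a copy of the embedded metadata plus extra fields beside the asset.

    `embedded` is a copy of whatever was embedded in the file, kept so that it
    survives editing or transfer; `extra` holds fields the embedded format
    does not support.
    """
    record = {"asset": asset.name, "embedded": embedded, "extra": extra}
    path = sidecar_path(asset)
    path.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return path


def read_sidecar(asset: Path) -> dict:
    """Load the sidecar record for an asset."""
    return json.loads(sidecar_path(asset).read_text(encoding="utf-8"))
```

Because the sidecar is a separate file, it is unaffected when the picture itself is re-saved, although the two then have to be moved and versioned together.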

Several speakers mentioned the importance of having a “librarian” as part of the team, to manage the metadata and to help create and maintain controlled and structured vocabularies. Very little was said about the nature of these, however, and from the demonstration systems that were exhibited I got the impression that indexing vocabularies were often ad hoc creations without being based on sound principles such as those understood by ISKO.

Many of the presentations dealt with the problems of procuring a DAM and introducing it into an organization, convincing management that it was needed and convincing potential users to use it. The advantages of a coherent centralised system were pointed out, but several speakers said that they found about forty different digital asset management systems in use in their organizations, and it was not easy to persuade people to give up their personal or departmental systems and transfer their data to a company-wide scheme. As usual, if one department could be convinced and realise the benefits, it could act as a “champion” to enthuse others. It was essential that the end users should take a full part in the procurement process, so that they could feel that the system was the one that best met their needs and that they “owned” it and participated in its development.

Several speakers stressed the importance of engaging a consultant to help with the procurement process, someone who could suggest a realistic list of suppliers from the large number of systems available and work with users to define and prioritise their requirements. In such a fast-changing field, it was essential to choose suppliers with whom the client was confident it could work, and in one case the client and the supplier were asked to share “road maps” of how they saw their systems developing over the next five years, to check that they were in step. One consultant whose advice was mentioned as valuable by two of the speakers was Mark Davey, president and founder of the DAM Foundation. His presentation “DAM will morph into knowledge based platforms” was the only one to acknowledge that wider developments in KOS were applicable to DAM systems, mentioning and linked open data, for example.

There was a small exhibition of DAM software, including the open source system ResourceSpace. (Other open source packages are listed in “Review of available open source DAM software”.) One of these might be appropriate for ISKO UK’s increasing collection of presentations and recordings from its past meetings.