Showing posts with label semantic web. Show all posts

Wednesday, 15 September 2010

Linked Data Conference - a very successful event

Linked Data: the future of knowledge organization on the Web

By Fran Alexander

ISKO UK events consistently manage to cram in about twice as much content as seems possible given the time. With enough material for at least a two-day conference, no fewer than nine speakers and two poster presenters made for a packed day that provided a pleasing mix of fine technical detail, practical advice, and some context-setting explanations of the evolution of Linked Data.

Keynote address - Government Linked Data: A Tipping Point for the Semantic Web
Professor Nigel Shadbolt gave the keynote address, pointing out that local government data is as useful and interesting as national data. He offered a rundown of the history of the Semantic Web, starting with the classic “layer cake” picture that had been prevalent some years ago, explaining that a lot of the research into Artificial Intelligence (AI) – natural language processing, entity extraction, intelligent reasoning over distributed databases – was very interesting but not particularly pragmatic. Much discussion was devoted to detailed technological issues for a highly specialised community. In the meantime, Linked Data emerged as a simpler, easier approach based on a few founding principles – resources should have a unique dereferenceable identifier, be expressed in open standards formats, and be interlinked.
Linked Data is now becoming established and is fractioning out into separate areas, with certain core nodes in various sectors being heavily linked. Although many questions remain about the differences between the “web of documents” and the “web of things”, the release of UK government data in Linked Data format should be seen as a gift for the web community. When trying to get Linked Data principles adopted in organisations, explaining to people the value of the decentralised model of the web is important.
Releasing government Linked Data also shifts responsibility for the use and interpretation of data away from the government to individual users. This can circumvent a lot of bureaucracy. For example, the Department for Transport held a lot of statistics about bicycle accidents, but it was only when this data was released that someone turned it into a map and started providing “safe route” information and various related apps aimed at cyclists. The Treasury was reluctant to release its COINS database, because it felt it was confusingly structured and hard to interpret, but once it was released people built navigation interfaces for it that are now being used by the Treasury itself. The release of data depended on the adoption of an open licence. The principle is: if you publish, the apps will come!
Public data is objective, factual, non-personal – accident rates, student degree numbers, etc. – and can be used to measure public service delivery. This sort of data is a straightforward proposition to release, but private data raises more difficult questions about privacy and trust. How much sharing of personal data is to the citizen’s benefit? To the government’s benefit? Should individuals be responsible for their own data, such as medical records? What role should the government play?
One of a number of Linked Data principles is that public bodies should maintain and publish inventories of their data holdings. It is important that we consider this data seriously, because we are not just assigning URIs to roads, streets, buildings, etc., we are building the digital infrastructure of the nation.

SKOS and Linked Data
Antoine Isaac talked about SKOS and Linked Data, from the perspective of the “web of culture data”. He explained that for many large cultural repositories, converting their classification schemes into ontologies is not practical due to the huge volumes of data involved. However, much rich semantic information can be extracted without the need for a fully formalised ontology system. SKOS enables classification data to be shared in a simple way that permits sharing of thesauruses and grouping of items by concept. It can also be a useful way of expressing annotations to documentation.
It is extensible, so more complex expressions can be included, but it has some basic constraints. For example, only one term can be the preferred term, and broader and narrower are assumed to be inverse relationships (which makes it easier to complete a graph). Although this is a limitation in some ways, in other ways it means that classifications can be expressed with minimum semantic commitment. SKOS is not intended to draw inferences beyond what is present in the core data.
It is a straightforward, web-oriented way of sharing content and descriptions and permits mapping across repositories (e.g. the MACS project).
The most interesting applications are the ones that cross-contextualise, so more work needs to be done to mix automatic and manual mapping methods.
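The point about broader and narrower being inverses can be sketched in a few lines of Python. This is only an illustration, not SKOS tooling: the concept names are made up, plain strings stand in for URIs, and the function name is hypothetical. Note that it only adds the missing inverse statements and does not chain them, in keeping with the minimal-inference approach described above.

```python
# Hypothetical concept identifiers; a real vocabulary would use URIs.
# A pair (a, b) in `broader` means "a has broader concept b".
broader = {
    ("cats", "mammals"),
}
# A pair (a, b) in `narrower` means "a has narrower concept b".
narrower = {
    ("animals", "mammals"),
}


def complete_hierarchy(broader, narrower):
    """Return both relations with every missing inverse statement added.

    Only direct inverses are added; nothing is inferred transitively.
    """
    full_broader = set(broader) | {(b, a) for (a, b) in narrower}
    full_narrower = set(narrower) | {(b, a) for (a, b) in broader}
    return full_broader, full_narrower


full_broader, full_narrower = complete_hierarchy(broader, narrower)

for child, parent in sorted(full_broader):
    print(f"{child} has broader concept {parent}")
```

After completion, each hierarchical link can be followed in either direction, whichever way round it was originally recorded.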

The Linked Data Journey

Richard Wallis of Talis gave an overview of his Linked Data journey, which began some 40 years ago when cataloguers and librarians managed rich data sets almost entirely manually. He has seen many innovators in the semantic field disappear, but some have persisted. Semantic technology has a reputation for being really wonderful until you add the second user, so it is important to make sure everything you do is scalable in the real world.
The limitations of the web at present are that documents are linked with unqualified links. It is very hard for machines to make any sense of the links without undertaking the vast amounts of work that Google has done, and even then Google connections are only speculative. There are other issues – for example, there are no negative links on the web, so a pressure group can’t link to the websites of organisations they are objecting to, because linking will only serve to boost traffic and enhance the reputation of the very organisations they are opposing.
Linked Data standards represent a very pragmatic approach to the Semantic Web, so we do not have to get caught up in science-fiction-like predictions of Artificial Intelligence leading us all into the “hive mind”.
The main difference between ordinary hypertext and Linked Data links is that Linked Data links are qualified. A surprising number of organisations are now entering the Linked Data world – for example Tesco, Walmart, and Best Buy. Linked Data connections can be hidden from the user, so many people don’t realise they are accessing Linked Data applications online.
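The difference between an unqualified and a qualified link can be sketched as follows. This is a toy model in Python rather than real RDF: all URIs and the "opposes" and "endorses" relations are hypothetical, invented for illustration.

```python
from collections import namedtuple

# A plain hypertext link only says "source points at target".
Link = namedtuple("Link", ["source", "target"])

# A qualified link also names the relationship, as in a
# subject-predicate-object statement.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

# All URIs below are hypothetical.
hypertext = [
    Link("http://example.org/campaign", "http://example.com/company"),
    Link("http://example.org/review", "http://example.com/company"),
]

linked_data = [
    Triple("http://example.org/campaign",
           "http://example.org/vocab/opposes",
           "http://example.com/company"),
    Triple("http://example.org/review",
           "http://example.org/vocab/endorses",
           "http://example.com/company"),
]


def subjects_with_relation(triples, predicate, obj):
    """Return the subjects that stand in the given relation to obj."""
    return [t.subject for t in triples
            if t.predicate == predicate and t.obj == obj]


opponents = subjects_with_relation(
    linked_data,
    "http://example.org/vocab/opposes",
    "http://example.com/company",
)
```

With the plain links, a machine sees two identical-looking pointers to the company. With the qualified links, it can tell the objector from the supporter without guessing.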
There is also a growing web of government data that is being used for all sorts of purposes; one example illustrated the UK’s “innovation hotspots” to encourage people to invest. The BBC has also undertaken a number of Linked Data projects, the best known being Wildlife Finder. The New York Times published a lot of data and was then criticised by the community, but responded by making amendments, demonstrating that it makes sense to engage with the wider web community and respond to feedback in order to improve data quality.
When your data is opened up, categorised, and made sharable, all sorts of serendipitous connections can be made and exciting new uses discovered. Everyone is experimenting to a certain extent, so it is worth looking out for “fellow travellers” and finding out what others have done.

The Knowledge Hub
Steve Dale approached Linked Data from a very human-focused, user-centric perspective. He posed the questions that can get lost amidst the technical details, such as what exactly is the problem we are trying to solve? Where do I find the information I need to do my job? Which networks or societies should I join?
The web is fragmenting interaction, so that conversations become more granular but also disaggregated. I took this to mean that the Web encourages us to communicate online with lots of specialists about niche areas, but to exchange only a few words or sentences with them, rather than building up a long dialogue with a few people over time. This means that it is hard to forge real connections over social websites. Linked Data can help to aggregate this knowledge and help it build into a core repository for communities of practice, rather than being widely dispersed.
Human intelligence is needed to interpret much data, but a great start would be just to get councils and other organisations to recognise the value of the data they hold. Some even hold data they don’t really know about.
If you use Linked Data to start to build a knowledge hub, you can start to release hidden data and encourage widespread collaboration and communication as others contribute what they think is useful to the hub. You can then also “push” the best or most relevant content to users, tailored and personalised to their selections. Federated search and real-time indexing can keep such a hub vibrant and responsive. Open authentication and open IDs can help smooth pathways for users to encourage them to use the site and services as part of their ordinary working lives, with the minimum of friction.

Afternoon keynote – Linked Data in E-commerce
Professor Martin Hepp talked about GoodRelations, the ontology he has been developing to serve online businesses. In 1920 there were only some 5,000 types of goods being traded – the number was so constrained it was possible to publish – presumably profitably – a dictionary of goods listing them all. Now it seems that every product is available in a huge array of varieties – there is even a type of muesli for horses!

This increased specificity makes search a far more complex problem to solve. The effort you need to make to get exactly what you want and to make sure it will do has increased hugely. You cannot just buy a nail, you have to buy a highly specialised electronic accessory.

The advent of the Internet was a huge boon to reducing this massive search effort. Individuals perform hundreds of Google searches every day. However, much of the business world runs on highly structured data which becomes unstructured when consumed by Google. Preserving the structure would retain as much – possibly more – useful information as preserving the links.
In order to make the most of the structure, it needs to be expressed in a standardised format and attention needs to be paid to getting the schema right. A schema that can’t be reused means data that can’t be reused, and in the rapidly changing world of e-commerce, data needs to be as up to date as possible. The GoodRelations ontology is aimed at providing a standard structure for expressing key e-commerce information.

Although Tim Berners-Lee urged everyone just to get their data out on to the web, putting some effort into rendering it in a reusable structure and form can make a huge difference to reuse rates and save much time in rationalising and standardising later. A balance needs to be struck between the level of detail and the time taken to populate data fields and how they can be processed. For example, separating house numbers from street names can make processing easier, but can be slower for customers to fill in forms.
Following good principles in ontology construction will also help your data be picked up and reused. You may need to mix structured and unstructured data, just putting the unstructured data in the best place you can find in an ontology designed with more structure in mind. It can work well to complement an ontology with a mechanism that provides vocabulary for structure if you have it, but allows you to attach unstructured data to a higher-level node if it is difficult to categorise it finely.
There are a number of known pitfalls in definitions – for example it is important not to confuse a product with an offer (otherwise your product will be on special offer all the time!) and a store is not a business entity – Tesco the retailer is not the same as any one individual Tesco store.
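These two distinctions can be sketched as a small data model. This is a minimal illustration in Python, not the GoodRelations vocabulary itself: the class names, field names, company, addresses, and prices are all hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class BusinessEntity:
    """The legal entity that trades, e.g. a retailer as a company."""
    name: str


@dataclass(frozen=True)
class Store:
    """One physical location operated by a business entity."""
    operator: BusinessEntity
    address: str


@dataclass(frozen=True)
class Product:
    """The thing itself, independent of who sells it or at what price."""
    name: str


@dataclass(frozen=True)
class Offer:
    """A proposal by a seller to supply a product on particular terms."""
    product: Product
    seller: BusinessEntity
    price: Decimal
    currency: str


# Hypothetical example data.
retailer = BusinessEntity("Example Retail Ltd")
stores = [
    Store(retailer, "1 High Street"),
    Store(retailer, "2 Market Square"),
]

muesli = Product("Horse muesli, 20 kg")
regular_offer = Offer(muesli, retailer, Decimal("24.00"), "GBP")
special_offer = Offer(muesli, retailer, Decimal("19.00"), "GBP")
```

Because the price lives on the offer rather than on the product, a special offer can start and end without the product description changing, and one retailer can operate any number of stores.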
Many people in business have been put off the Semantic Web by the artificial intelligence researchers who make it all sound like something from science fiction, but if you can show direct short-term financial gains for businesses – such as improvements in search engine results, clickthrough rates, and unified marketing – you are more likely to get buy-in.

Linked Data: the Long and Winding Road
Andy Powell described the history of the Dublin Core Metadata Initiative. He proposed that if Linked Data is the future, then RDF must be the future of the web.
Dublin Core was originally 12, then 15 metadata elements – which now would be called properties – that can be used to describe web resources. It took a librarian-centric, document-focused approach to resource discovery online. However, the metatag element was widely ignored by search engines and it rapidly became apparent that the whole web could not be categorised. As a method for transferring records and tracking provenance, it set the stage and had some benefits. It deliberately used broad semantics and flat-world modelling (“fuzzy buckets”), but also avoided thinking too much about issues – such as how to express the relationship between an image and a representation of an image or how an artist’s name could also be an attribute of a person – that became more pressing. Many people found it very difficult to grasp the difference between a thing in the world and a string (of characters) held as a representation of that thing, and there was comparatively little abstraction of the model from any underlying syntax. However, some of the thinking could be transferred to an RDF world, potentially with the benefit of avoiding the same mistakes.
Current problems include promoting an open-world view, promoting the view that everyone and no-one can be an expert, and the “strings and things” issue, which now relates to the difference between a resource and a web page representing that resource. One of the biggest challenges remains the need to get agreement on standardisation and for any model to gain a certain critical mass to give it traction within a community.

Linking to Geographic Data
John Goodwin from the Ordnance Survey, who has been involved in the Semantic Web for 10 years, explained some of the unique challenges of geographical data. The problems of common definitions embedded deep in the data were noted. One example was that subtly different definitions of houses-in-multiple-occupancy had been used by different government organisations, so that their data could not be usefully compared. Place names and boundaries change over time and people often call places by unofficial names. Names that no longer exist as official boundaries persist in common parlance. This can cause particular problems for the emergency services, as they need to make sure they go to the place the caller meant by the name. An example given was of children using the name for one park to mean a different one.
Geographic data is however a very useful route in to many other applications and can provide interesting and informative visualisations. By using geographic hierarchies, you can draw inferences to aggregate data up to broader levels – from county to region level, for example.
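Aggregating up a geographic hierarchy can be sketched as follows. This is a minimal Python illustration: the figures are invented, and the hard-coded county-to-region lookup stands in for containment relationships that would in practice be read from published geographic data.

```python
from collections import defaultdict

# Each county is contained in exactly one region (simplified lookup).
county_to_region = {
    "Hampshire": "South East",
    "Kent": "South East",
    "Devon": "South West",
}

# Invented figures, for illustration only.
accidents_by_county = {
    "Hampshire": 120,
    "Kent": 95,
    "Devon": 60,
}


def aggregate_to_region(values_by_county, parent_of):
    """Sum county-level values into totals for each containing region."""
    totals = defaultdict(int)
    for county, value in values_by_county.items():
        totals[parent_of[county]] += value
    return dict(totals)


accidents_by_region = aggregate_to_region(accidents_by_county, county_to_region)
print(accidents_by_region)
```

The same function works at any level of the hierarchy, provided each area has a single parent; areas that straddle boundaries would need a more careful model.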
Many people think that RDF is difficult and relational databases are easy, but John felt that for him certainly it was the other way around.
Much work still needs to be done on spatial predicate standardisation in RDF, as spatial descriptors are not yet as well standardised as temporal descriptors; Oracle and the OGC are working on this.

PoolParty: SKOS Thesaurus Management utilizing Linked Data


Andreas Blumauer described the PoolParty SKOS editing tool. He felt that SKOS could be the way to introduce web 2.0 mechanisms directly into the web of data. SKOS enables virtually any user to join in with their own knowledge organisation systems and PoolParty seeks to support knowledge organisation + network effects + collaboration + ontology evolution.
He stressed the significance of using Linked Data principles within the firewall of an organisation, with benefits such as improved collaboration and sharing that can still be useful without having to release data onto the public Web.
The Semantic Web can improve every aspect of information retrieval. If people move from free tagging with single words – which is fraught with problems such as ambiguity – to concept tagging with URIs, resource descriptions will become far more valuable and useful.
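The ambiguity problem with single-word tags, and how concept URIs avoid it, can be sketched in Python. The document names and URIs here are hypothetical, chosen only to illustrate the idea.

```python
# Free tagging: the same word is used for two different meanings.
free_tags = {
    "doc1": {"jaguar"},  # about the animal
    "doc2": {"jaguar"},  # about the car
}

# Concept tagging: each meaning has its own (hypothetical) URI.
JAGUAR_ANIMAL = "http://example.org/concepts/jaguar-animal"
JAGUAR_CAR = "http://example.org/concepts/jaguar-car"

concept_tags = {
    "doc1": {JAGUAR_ANIMAL},
    "doc2": {JAGUAR_CAR},
}


def find(resources, tag):
    """Return the names of all resources carrying the given tag, sorted."""
    return sorted(name for name, tags in resources.items() if tag in tags)


ambiguous_hits = find(free_tags, "jaguar")
precise_hits = find(concept_tags, JAGUAR_ANIMAL)
```

A search on the free-text tag returns both documents, while a search on the concept URI returns only the one about the animal.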

Porting terminologies to the Semantic Web

Bernard Vatant of Mondeca talked about porting terminologies to the Semantic Web. Much of his work is done within intranets for companies that need to make use of large external vocabularies. He explained the model underlying the new management system for EUROVOC, which is his latest project. This vocabulary presents itself as a thesaurus, but with extensions of expressivity at the terminological level. Bernard emphasised the importance of a semiotic approach to terminology in the Semantic Web framework, which is especially relevant in the multilingual context (evident in, for example, the lexvo.org initiative). He proposed a semiotic view of terminology in which every sign is a thing (signs are terms; resources are business objects) and reminded us of the semiotic triangle of terms, concepts, and objects (Saussure: sign - signifiant - signifié). He pointed out that shallow ontologies can be very effective when more complexity isn’t needed.

Panel and networking
A full and interesting day ended with a lively panel discussion, ranging across many topics and producing gems like “data is the new raw material” and “in a data-rich world, the scarce commodity is attention”, and, as always, an excellent drinks and networking reception to finish.

(Recordings and presentation files of the entire conference are promised in the following weeks.)

Wednesday, 17 September 2008

IVOA recommending SKOS

The International Virtual Observatory Alliance (IVOA) has published a proposed recommendation entitled "Vocabularies in the Virtual Observatory" for public review.

A few interesting excerpts from the document explaining the context and the rationale:

"Astronomical information of relevance to the Virtual Observatory (VO) is not confined to quantities easily expressed in a catalogue or a table. Fairly simple things such as position on the sky, brightness in some units, times measured in some frame, redshifts, classifications or other similar quantities are easily manipulated and stored in VOTables and can currently be identified using IVOA Unified Content Descriptors (UCDs). However, astrophysical concepts and quantities use a wide variety of names, identifications, classifications and associations, most of which cannot be described or labelled via UCDs.

There are a number of basic forms of organised semantic knowledge of potential use to the VO. Informal “folksonomies” are at one extreme, and are a very lightly coordinated collection of labels chosen by users. A slightly more formal structure is a “vocabulary”, where the label is drawn from a predefined set of definitions which can include relationships to other labels; vocabularies are primarily associated with searching and browsing tasks. At the other extreme are “ontologies”, where the domain is formally captured in a set of logical classes, typically related in a subclass hierarchy. More formal definitions are presented later in this document.

An astronomical ontology is necessary if we are to have a computer (appear to) “understand” something of the domain. There has been some progress towards creating an ontology of astronomical object types to meet this need. However there are distinct use cases for letting human users find resources of interest through search and navigation of the information space..."

"As the astronomical information processed within the Virtual Observatory becomes more complex, there is an increasing need for a more formal means of identifying quantities, concepts, and processes not confined to things easily placed in a FITS image (Flexible Image Transport System), or expressed in a catalogue or a table. We propose that the IVOA adopt a standard format for vocabularies based on the W3C's Resource Description Framework (RDF) and Simple Knowledge Organization System (SKOS). By adopting a standard and simple format, the IVOA will permit different groups to create and maintain their own specialised vocabularies while letting the rest of the astronomical community access, use, and combine them. The use of current, open standards ensures that VO applications will be able to tap into resources of the growing semantic web. Several examples of useful astronomical vocabularies are provided, including work on a common IVOA thesaurus intended to provide a semantic common base for VO applications."

Thursday, 11 September 2008

Call for Comments: SKOS Simple Knowledge Organization System Reference; SKOS Primer

The W3C Semantic Web Deployment Working Group is pleased to announce the publication of a Last Call Working Draft for the Simple Knowledge Organization System (SKOS) Reference: http://www.w3.org/TR/2008/WD-skos-reference-20080829/

Our Working Group has made its best effort to address all comments received to date, and we seek confirmation that the comments have been addressed to the satisfaction of the community, allowing us to move forward to W3C Candidate Recommendation following the Last Call process.

The Working Group solicits review and feedback on this draft specification. In particular, the Working Group would be keen to hear comments regarding any features identified at risk, and from those implementing (among others):


    * Editors: editors that either consume or produce SKOS;
    * Services: vocabulary services that provide access to vocabularies using SKOS;
    * Checkers: applications that check whether the constraints on SKOS vocabularies have been violated.

Comments are requested by 3 October 2008, at which time the Working Group intends to close Last Call. All comments are welcome and should be sent to public-swd-wg@w3.org; please include the text "SKOS comment" in the subject line. All messages received at this address are viewable in a public archive.

The Working Group intends to advance the SKOS Reference to W3C Recommendation after further review and comment. This Last Call Working Draft signals the Working Group's belief that it has met its design objectives for SKOS and has resolved all open issues.

The Working Group has also published an update of the companion SKOS Primer: http://www.w3.org/TR/2008/WD-skos-primer-20080829/

The Working Group expects to revise this Primer while the SKOS Reference is undergoing review and eventually publish the Primer as a Working Group Note. Please see also:
http://www.w3.org/TR/2008/WD-skos-reference-20080829/#status
http://www.w3.org/TR/2008/WD-skos-primer-20080829/#Status

Alistair Miles, Senior Computing Officer
Image Bioinformatics Research Group, Department of Zoology
University of Oxford
Web: http://purl.org/net/aliman

Sean Bechhofer
School of Computer Science,
University of Manchester
Web: http://www.cs.manchester.ac.uk/people/bechhofer

Friday, 4 July 2008

Announcement: Intercultural Knowledge Landscapes, Florence, 11-12 September 2008

A workshop, "Intercultural Knowledge Landscapes", organized by the Terminology and Intercultural Web Landscapes Working Group, will take place on 11-12 September 2008 in Florence (Italy).

Venue: L’Agenzia Nazionale per lo Sviluppo dell’Autonomia Scolastica (ex Indire), via Buonarroti 10 – 50122 Firenze.

Content:
- communication and interculturality;
- web accessibility and multilingual glossaries;
- online information in the domain of communication/knowledge/information;
- e-book on “Communicate differently”;
- training.

Workshop languages: Italian and English.

Preliminary Programme:

Thursday, the 11th September
10.00 Registration
10.30 Introduction,
Giovanni Biondi
10.40 Working group: criteria, methods and goals,
Paola Capitani
10.50 Web 2.0 applications for learning and libraries,
Lucia Bertini
11.00 [Title to be announced]
Daniele Montagnani
11.10 [Title to be announced]
Mario Rotta
11.20 UNI standards: starting point and goal of the Semantic Web,
Roberto Ravaglia
11.50 Working groups
14.00 Translation and intercultural landscapes,
Franco Bertaccini
14.20 Interculture, communication and…
Mela Bosch
14.40 Formalization of term relations in multilingual thesauri,
Piero Cavaleri
15.00 Artificial intelligence and semantic web,
Salem Badee
15.20 E-learning pills,
Umberto Amicucci
16.00 Working groups
17.00 Plenary session


Friday, the 12th September
10.00 OPAC experience on the public libraries’ user profile,
Gianfranco Bettoni
10.20 Subject to be defined,
Daniele Toulouse Cordier
11.00 [Title to be announced],
Claudio Todeschini
12.00 Working groups
13.00 Plenary session
13.30 Conclusions


For further information contact Paola Capitani (paola.capitani@gmail.com)

Monday, 16 June 2008

ISKO UK Event - Sharing Vocabularies on the Web via SKOS

We would like to invite you to the next ISKO UK event entitled Sharing Vocabularies on the Web via Simple Knowledge Organization System (SKOS) which will take place on 21 July 2008 at University College London.

Predictions for the Semantic Web are heavily dependent on the ability of computers to reason and communicate using controlled vocabularies. SKOS (Simple Knowledge Organization System) development aims to bring forward these capabilities.

SKOS names a family of standards being created to express the semantic structure of controlled vocabularies (thesauri, classifications, subject headings etc.) so that they can be accessed and interpreted by programs and services. As a draft Web standard, SKOS Reference provides a data model that can be used as a vehicle for the development, use and sharing of knowledge organization systems across information sectors and within the Semantic Web framework.

Aware of the growing importance of SKOS, ISKO UK, in cooperation with the School of Library, Archives and Information Studies at UCL, has invited a group of experts to introduce this standard and explain its status, potential and scope. Our speakers are involved in the development and application of SKOS and related standards and are hoping to provoke some interesting discussion.

Alistair Miles and Antoine Isaac, members of the W3C Semantic Web Deployment Working Group, and Bernard Vatant from Mondeca will explain the role of SKOS in the Semantic Web, the ideas behind SKOS and the way it is intended to function. Stella Dextre Clarke, convenor of BSI committee IDT/2/2/1, and her collaborators Leonard Will and Nicolas Cochard will discuss the data model of the recently developed BS 8723 standard known as DD8723-5, focusing on its relationship with SKOS and interoperability issues. Ceri Binding and Douglas Tudhope from the University of Glamorgan will present their AHDS-funded Semantic Technologies for Archaeological Resources project, raising issues for practical applications of SKOS and SKOS-based terminology web services.

This event, the third in ISKO UK's KOnnecting KOmmunities series, promises a fascinating glimpse of the future of controlled vocabularies. No one involved or interested in the development, management or implementation of controlled vocabularies can afford to miss it. Book your place on the event's page.

Saturday, 22 December 2007

Semantic Web Technologies

The December 2007 edition of this relatively new e-newsletter has recently been published. In November, I asked if ISKO UK members thought it worth monitoring. I am still undecided. It would be good to have your views.

Saturday, 15 September 2007

Semantic Web Technologies

A new online newsletter has recently been launched entitled Semantic Report. Some extracts from the corresponding web site:
"Each month SemanticReport will bring you news, interviews, analysis, presentations, case studies, white papers, and anything you tell us you are interested in receiving with regard to the broad range of technology falling under the domain of semantic technology."
"Our mission is to bring together information that helps focus on the business aspects of semantic technologies and applications. Simply stated, that means we are concerned about presenting the business application of the technology rather than the more academic information."

Monday, 11 June 2007

Semantic Web - an interview with Tim Berners-Lee

From ZDNet.co.uk (08 Jun 2007):

David Berlind interviews Sir Tim Berners-Lee (video), the Director of the World Wide Web Consortium, at the MITX (Massachusetts Innovation and Technology Exchange) Technology Awards held at the Four Seasons Hotel in Boston, Massachusetts. The inventor of the World Wide Web, Sir Tim Berners-Lee was awarded the organisation's 2007 Lifetime Achievement Award.

See also D. Berlind's item on the ZDNet blog "Web inventor Tim Berners-Lee Unplugged: Semantic Web better than APIs for data access"

Tuesday, 24 April 2007

CFP - Special issue of Library Review

Submissions are sought for a special issue of Library Review on the topic of 'Digital libraries and the Semantic Web: context, applications and research'.

This special themed issue of Library Review consolidates similarly themed conferences (e.g. the International Conference on Semantic Web and Digital Libraries - ICSD-2007) and aims to demonstrate the relevance and application of Semantic Web technologies to digital libraries, repositories, and the LIS community generally.

Submissions may comprise research papers, evaluation, case studies, and descriptions of innovative projects, theoretical expositions, or reviews.
For further submission details see: http://cdlr.strath.ac.uk/LibraryReview/