
Thursday, 29 March 2012

Review of On Location event

ISKO UK and the British Computer Society co-hosted an event all about location data.

The first speaker was Mike Sanderson of 1Spatial, who described how geospatial data can help power a European knowledge economy. He spoke of the need for auditing and trust of data sources, particularly ways of mitigating poor quality or untrustworthy data. At 1Spatial, they do this by comparing as many data sources as they can to establish confidence in the geodata they use. Confidence levels can then be associated with risk, and levels of acceptable risk agreed.

Alex Coley of INSPIRE talked about the UK Location Strategy, which is aiming to make data more interoperable, to encourage sharing, and to improve the quality of location knowledge. The Strategy is intended to promote re-use of public sector data, and is based on best practice in linking and sharing to support transparency and accessibility. Historically there has been a lot of isolated working in silos, and the aim is to try to bring all such data together and make it sharable. This should help organisations to cut costs in technology support and reduce unnecessarily duplicated working. Although some organisations need to have very specialised data, there remains much that is common. Location data is frequently present in all sorts of data sets, and can be re-used and repurposed, for example to help understand environmental issues. Location data can be a key to powering interesting mashups - for example someone could link train timetables with weather information, so train companies could offer day trips to resorts most likely to be sunny that day.

The Location Strategy's standardisation of location data is effectively a Linked Data approach, but so far little work has been done to map different location data sets.

Data that is not current is generally less useful than data that is maintained and kept up to date, so data sets that include information about their context and purpose are more useful.

Jo Walsh of EDINA showcased their map tools. EDINA is trying to help JISC predict search needs and provide better search services. They are trying to take a Linked Data approach, but there is a need for core common vocabularies. EDINA runs various projects to create tools to help open up data. Unlock is a text mining tool that helps pick out geolocation data from unstructured text. It could be used to add location data as part of digital humanities projects.
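As a rough illustration of the kind of task Unlock performs (this is not the real Unlock service or its API), a toy gazetteer-based geoparser might look like this in Python; the place names and coordinates are just examples:

```python
# A toy gazetteer-based geoparser: look up known place names in
# unstructured text and return their coordinates. The gazetteer here is
# invented for illustration.

# Toy gazetteer mapping place names to (latitude, longitude).
GAZETTEER = {
    "Edinburgh": (55.9533, -3.1883),
    "London": (51.5074, -0.1278),
}

def extract_locations(text: str) -> list[tuple[str, tuple[float, float]]]:
    """Return (place name, coordinates) pairs found in unstructured text."""
    return [(place, coords) for place, coords in GAZETTEER.items() if place in text]

print(extract_locations("The manuscript was written in Edinburgh in 1820."))
# -> [('Edinburgh', (55.9533, -3.1883))]
```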

One aspect of Linked Data that is often overlooked is that it can "future-proof" information resources. If a project or department is closed down, its classification schemes and data sets can become unusable, but if stable URIs have been added to classification schemes there is more chance that people in the future will be able to use them.
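As a sketch of what this looks like in practice, here is a minimal example of publishing a classification scheme as SKOS Linked Data with stable URIs, using the Python rdflib library; the base URI and the concept are hypothetical:

```python
# A minimal sketch of a classification scheme as SKOS Linked Data with
# stable URIs, using rdflib. The base URI and concept are hypothetical.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

SCHEME = Namespace("http://example.org/id/scheme/")  # assumed stable base URI

g = Graph()
g.bind("skos", SKOS)

scheme = SCHEME["projectClassification"]
g.add((scheme, RDF.type, SKOS.ConceptScheme))

concept = SCHEME["flood-defence"]
g.add((concept, RDF.type, SKOS.Concept))
g.add((concept, SKOS.prefLabel, Literal("Flood defence", lang="en")))
g.add((concept, SKOS.inScheme, scheme))

# As long as the URIs stay stable, the scheme remains usable even if the
# project that created it disappears.
print(g.serialize(format="turtle"))
```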

Matt Bull of the Health Protection Agency explained how geospatial data is useful for health protection, such as tracking infectious diseases or environmental hazards. Epidemiological data is inherently social. Diseases are often linked to the environment - radon gas, social deprivation - and clinics, pharmacies, etc. have locations. This can be used to investigate treatment-seeking behaviour as well as patterns of infection, in order to plan resourcing. For example, people often don't use their nearest clinic, especially in cities, when seeking treatment for sexually transmitted diseases. Such behaviour makes interpretation of data tricky.

Geospatial data is also useful for emergency and disaster planning and monitoring the effects of climate change on the prevalence of infectious diseases.

Stefan Carlyle, of the Environment Agency (EA), talked about the use of geospatial data in incident management. One example was using geospatial information to model the risk of failure of a dam and plan an evacuation of the relevant area. This is now a quick and easy operation that would not have been possible in the past without huge effort. Risk assessments of flood defences can now be based on geospatial data, and this can help prioritise asset management - e.g. management of flood defences.

Implementing semantic interoperability is a key aim, as is promoting good quality data, and this includes teaching staff to be good "data custodians". Provenance is key to understanding the trustworthiness of data sets.

The EA is focussing on improving semantic interoperability by prioritising key data sets and standardising and linking those, rather than trying to do everything all at once. Transparency is another important aim of the EA's Linked and Open Data strategy and they provide search and browse tools to help people navigate their data sets. Big data and personal data are both becoming increasingly important, with projects to collect "crowd sourced" data providing useful information about local environments.

The EA estimates that its Linked Open Data approach produces about £5 million per year in benefits from reduced duplication of work and other efficiencies, such as unifying regional data silos, and from sale of data to commercial organisations. The EA believes its location data will be at the heart of making it an exemplar of a pragmatic approach to Open Data and transparency.

Carsten Ronsdorf from the Ordnance Survey described various location data standards, how they interact, and how they are used. BS7666 specifies that data quality should be included, so the accuracy of geospatial data is declared. Two key concepts for the OS are the Basic Land and Property Unit and the Unique Property Reference Number. Address data is heavily standardised to provide integration and facilitate practical use.
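To make the idea concrete, here is a much-simplified sketch in Python of a BS7666-style record combining a UPRN with a declared positional accuracy; the field names are illustrative and the real standard defines many more attributes:

```python
# A much-simplified sketch of a BS7666-style record: a UPRN plus a
# declared positional accuracy. Field names are illustrative.
from dataclasses import dataclass

@dataclass
class LandPropertyUnit:
    uprn: int                     # Unique Property Reference Number
    address: str                  # standardised address text
    easting: float                # national grid coordinate
    northing: float               # national grid coordinate
    positional_accuracy_m: float  # declared data quality, as BS7666 requires

record = LandPropertyUnit(
    uprn=100012345678,            # hypothetical UPRN
    address="1 High Street, Anytown",
    easting=531000.0,
    northing=180000.0,
    positional_accuracy_m=1.0,
)
```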

Nick Turner, also of the OS, then talked about the National Land and Property Gazetteer. The OS was instructed by the government to take over UK address data management because a number of public sector organisations were trying to maintain separate address databases. The OS formed a consortium with them and created AddressBase.

AddressBase has three levels: AddressBase, with basic postal addresses; AddressBase Plus, which adds addresses of buildings such as churches, temples, and mosques, and other data; and AddressBase Premium, which includes details of buildings that no longer exist and buildings that are planned.

AddressBase is widely used because it allows organisations to verify, update, or extract the address information they need when they need it, rather than having to manage it all themselves.
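As a rough sketch of that workflow (the loader function below is hypothetical - real access is through licensed Ordnance Survey products, not this interface):

```python
# A sketch of verifying an address against a reference set such as
# AddressBase instead of maintaining a separate address database.
# `load_addressbase_extract` is a hypothetical stand-in.

def load_addressbase_extract() -> dict[int, str]:
    """Hypothetical loader returning {UPRN: standardised address}."""
    return {100012345678: "1 HIGH STREET, ANYTOWN"}

def verify(uprn: int, claimed_address: str) -> bool:
    """Check a claimed address against the reference copy for its UPRN."""
    reference = load_addressbase_extract()
    return reference.get(uprn, "").upper() == claimed_address.upper()

print(verify(100012345678, "1 High Street, Anytown"))  # True
```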

Thursday, 14 April 2011

Review of the Public Access to Information event

Public Access to Information? Challenges for Information Gatekeepers was a joint event between ISKO UK and Taxonomies in the Public Sector (TiPS). Michael Warner opened with a short description of TiPS, a discussion and networking group that seeks to influence government on information issues. The group was established four years ago and welcomes new members.

The Ins and Outs of Information Rights


The first speaker was Christopher Graham, the current UK Information Commissioner. He began by discussing the role of the Information Commissioner’s Office (ICO). It has to enforce regulations such as the Freedom of Information Act (FOIA), the Data Protection Act (DPA), the Environmental Information Regulations, and the Privacy and Electronic Communications Regulations. The ICO provides advice, guidance, and monitoring, and promotes best practice and compliance with the law. With a staff of 350, the ICO seeks to be the “authoritative arbiter of information rights” and a model of good regulation.
Practically everybody is a stakeholder for the ICO, from local authorities, to politicians, citizens, and consumers. The commissioner has the rights of a corporation - a huge responsibility - but can only be dismissed by the Queen with the assent of both Houses of Parliament, so it is a good job to have in a recession.
Freedom of Information and Data Protection legislation embody competing rights. The Freedom of Information Act was seen as a bit of a “bolt on” to Data Protection law, but it became clear that they are intertwined and both have to be considered together. Some people have called for the establishment of a “privacy commissioner” to make the case for privacy, but this would just defer the decision point, as someone else would have to take responsibility for deciding on the balance between private rights and public interest.
It is a very exciting time to be involved in information, with controversial issues such as the ethics of the creation of human DNA databases. Linked Open Data is also opening up exciting possibilities. Crime mapping is a classic case of balancing privacy and freedom of information. There is clearly a strong public interest in crime statistics, but publishing them could be detrimental to the rights of people living in high-crime areas. Too much anonymising, however, may destroy the usefulness of the data. Ironically, the Big Society could actually end up involving less accountability as information moves into private arenas that do not have the same responsibilities to be open.
Public attitudes towards information security are ambivalent. People like CCTV to protect them, but resent being spied on. They like their data to be secure but are less concerned about the amount of data that organisations collect and store.
Ten years after the introduction of the FOIA, there is still a mixed picture. Organisations should have publication schemes, offer rights of access, and have processes for handling requests, but the number of complaints to the ICO over FOIA requests is growing. As the current government’s information and open data agenda becomes more high profile, the public are likely to become more interested.
The FOIA should help to reduce inefficiency as it opens up public sector spending to scrutiny. Breaches of the Act are costly, and the ICO monitors organisations to make sure they reply to requests promptly. Answering requests is becoming more difficult for organisations with dwindling budgets. The ICO website is a rich resource of information and support, and the ICO tweets as @iconews.

The Checks and Balances of a Transparent Public Sector World of Information


Carol Tullo of the National Archives (NA) discussed the benefits of exploiting and re-exploiting public sector data, while avoiding unacceptable risks. Law, copyright, archives and information science all form part of Carol Tullo’s work. The NA is an information gatekeeper, even if it doesn’t think of itself like that. How do we give people access to public sector information, rather than just allowing people to get their own information? The default position is about proactive release of information and it is a very different world to the one of 20 years ago.
Nobody knows what transparency and accountability mean outside the information world, but we use the terms all the time. The NA is trying to explain the concepts. The principles of public data policy have come down to core issues, including releasing data under open licensing and open standards in re-usable form. If metadata standards are embedded in a PDF, the metadata is not locked up and not easily removed, so authorship, provenance, etc. are preserved. There are various strands of information management that are not creating standards and tools to publish this data in a sharable, reusable form. For example, staff structures and organisational charts of public institutions are of public interest, but are often not kept in reusable, sharable formats. The NA has helped develop a tool to standardise organisational charts to help institutions publish these usefully.
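As a small illustration of embedding such metadata, here is a sketch using the Python pypdf library; the metadata values are invented:

```python
# A sketch of embedding authorship and provenance metadata in a PDF with
# the pypdf library. The metadata values are invented for illustration.
from pypdf import PdfWriter

writer = PdfWriter()
writer.add_blank_page(width=612, height=792)  # stand-in for real content

# Document-level metadata travels with the file, preserving provenance.
writer.add_metadata({
    "/Author": "Example Department",
    "/Title": "Organisational Chart 2011",
    "/Subject": "Published under the Open Government Licence",
})

with open("chart.pdf", "wb") as f:
    writer.write(f)
```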
The government recognises the value of the data, and for ten years there has been an agenda to publish more. It is still slow, and there are moves to place obligations on institutions to publish. The NA is encouraging institutions to be proactive about releasing their data. The Open Government Licence, launched last September, aims to encourage this. Some 180 local authorities are now releasing their data under it, and the Ministry of Culture in South Korea and the government of British Columbia, Canada, have adopted it. It is the new default licence under the FOIA.
No-one has been sued for using opened-up data yet. The hack days and releases to encourage use of the data haven’t caused problems. However, people worry that they can’t trust the licence and that they will somehow get into trouble for using data they discover on government websites, so more encouragement than a simple link to the licence would help.
The issue of semi-private companies and contractors working for the public sector, and how much of their data should be made open, also needs thought. Knowing what data is available and how it is structured is also important. There are many inventories, taxonomies, asset registers, etc. that can help.
Structured data, open standards, and open formats are vital. Using standards, schemas, and APIs produces really rich metadata that allows sharing. The sheer volume of digital data, combined with limited resources, can render huge amounts of information inaccessible without any deliberate cover-up or conspiracy.
The NA helps shape legislation to come up with something that is fit for purpose and useful for public sector information workers. Ministers want officials to come up with solutions to problems created by legislation, so that people can easily get hold of the information they need.
Information managers need to see themselves as gamekeepers, rather than gatekeepers. They should make sure their information stock is healthy, but let it roam free for people to hunt down and use. This is the best way to support growth in an information economy.

What's Wrong with UK Information Law?


Charles Oppenheim gave a very entertaining presentation, opening with an anecdote about an early attempt by the Department of Trade and Industry to encourage publication and re-use of government data. The DTI published a list of names and phone numbers of civil servants to contact to ask for information. However, the first civil servant who was contacted immediately demanded to know where the caller had found his name, then declared that his name and number were official secrets and slammed the phone down. The situation has transformed since then.
There are many breaches of UK and EU information legislation. Personal data should not be transferred outside the European Economic Area unless there is adequate legal protection - such as rights to see and correct information. The USA fails the tests, so no-one can legally transfer data to the USA without some protections. Safe harbours are declared for companies that are deemed compliant with DPA principles. However, the USA’s PATRIOT Act demands rights for the authorities to see and access all sorts of data for anti-terrorism and security purposes. The owner of the data may not be told that the data is being inspected. Lockheed Martin is handling 2011 census data, but the company is subject to the PATRIOT Act. Despite the statement at the start of the census, a large number of bodies have access to freshly collected census data.
The DPA doesn’t address cloud computing and may be out of date. Computers might be in all sorts of places, such as the USA, that don’t have adequate data and privacy rights protection. Cloud computing suppliers resist attempts to make them abide by the rules and don’t put compliance in the contract. If you fail to impose safeguards on your cloud computing supplier, you are in breach of the DPA.
People have the right to sue if there has been a breach of the Act, but you can only sue for distress if the data has appeared in the media. So, if a company sends you threatening letters because of incorrect information, you have no redress for the distress this may cause. It is a criminal offence to unlawfully obtain personal data, and reckless loss is an offence, but reckless disregard of other data protection principles is not. In any case, the expense of court cases means that most people could not afford to sue.
The FOIA was declared by Tony Blair to be the biggest mistake of his prime ministerial career. There are many problems with applying it, and a lot of complaints end up at the ICO. Frivolous or arduous requests can be a problem for organisations. In Norway there is an exemption from the Norwegian freedom of information act if the person making the request is obviously drunk! One good effect is the obligation on public authorities to provide electronic datasets under the FOIA in a form that can be conveniently reused.

Meaningful, Linked Local Data


Paul Davidson, Chairman of the CTO Public Sector Information Domain Team, talked about Linked Data and data standards. He pointed out that data can become more open the more it is processed, as it becomes less personal. He contrasted data created at the operational level with the statistical, analytical, and political stages of processing: a record, such as a personal health care record, is combined with others to produce statistics, which are then analysed and then used as evidence for policies. He described what needs to be done to make data sharable: explaining the semantics - controlled vocabularies and descriptive terms so that the subject is understandable - providing quality assurances so that people know how reliable, accurate, and up to date the data is, stating any relevant rights and consents, and choosing the format it is published in. It is usually easier to provide such information for public data than for other types of data.
He stressed that public bodies should not require people to hunt around their websites and take the data in a single format, but should allow people to get to the raw data and take it away to use in their own applications. Standards are important, but it is better to publish the data in any format than to keep it locked up while standards are selected.
However, meaningful data is much more than just bland lists. It is very hard to assess value without context - for example, the figure for the total expenditure on hotel bills isn’t helpful unless you know whether it was a few people in an expensive hotel or lots in cheap ones. Knowing spending on roads by council is not useful unless you know the length of roads in each council area, so you can make a per-mile comparison.
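A toy calculation shows the point (all figures invented):

```python
# Toy figures showing why context matters: raw spending per council is
# only comparable once normalised by road length. All numbers invented.
spending = {"Council A": 2_000_000, "Council B": 3_000_000}  # pounds
road_miles = {"Council A": 100, "Council B": 600}

for council in spending:
    per_mile = spending[council] / road_miles[council]
    print(f"{council}: £{per_mile:,.0f} per mile")

# Council A: £20,000 per mile
# Council B: £5,000 per mile - the bigger spender is cheaper per mile
```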
Ontologies, URIs, reference lists, and data registries all need to be managed to support Linked Data. Aggregators are also needed so that data sets can be brought together and queried easily, and it is not clear whether the public sector should be providing such services or leaving such provision to the private sector to develop. Finally, end users who want the data are needed as well.
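As an illustration of what querying such an aggregated Linked Data set looks like, here is a sketch using the Python SPARQLWrapper library; the endpoint URL and the data behind it are hypothetical:

```python
# A sketch of querying an aggregated Linked Data set over SPARQL, using
# SPARQLWrapper. The endpoint and its contents are hypothetical.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/sparql")  # hypothetical endpoint
sparql.setQuery("""
    SELECT ?dataset ?title WHERE {
        ?dataset <http://purl.org/dc/terms/title> ?title .
    } LIMIT 5
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["title"]["value"])
```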

The afternoon ended with a lively panel session with all the speakers, followed by drinks and networking.

Short biographies of the speakers are available on the ISKO UK website.
This afternoon meeting was organised in co-operation with the UCL Department of Information Studies.