data.police.uk provides a complete snapshot of crime, outcome, and stop and search data, as held by the Home Office at a particular point in time.
The data itself is stored on S3 in the bucket policeuk-data and can be downloaded from a URL of the form
https://policeuk-data.s3.amazonaws.com/archive/20yy-mm.zip (where yy and mm are the two-digit year and month of the archive you want).
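As a small illustration of the URL pattern above, the following Python sketch builds the archive URL for a given year and month. The function name archive_url is made up for this example; only the URL pattern itself comes from the text.

```python
def archive_url(year: int, month: int) -> str:
    """Build the archive URL for one monthly snapshot (pattern taken from the text above)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    # Zero-pad the month so that January 2015 becomes 2015-01.
    return f"https://policeuk-data.s3.amazonaws.com/archive/{year:04d}-{month:02d}.zip"


print(archive_url(2015, 1))
```

The resulting string can be passed to wget or any HTTP client.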
The Structure:
All files are organized by year and month.
Each month has a ZIP file that contains CSV files.
The January 2015 file, 2015-01.zip, contains data for all months from 2010-12 to 2015-01.
[hadoop@ip-172-31-24-128 mnt]$ wget https://policeuk-data.s3.amazonaws.com/archive/2015-01.zip
[hadoop@ip-172-31-24-128 mnt]$ unzip 2015-01.zip
[hadoop@ip-172-31-24-128 mnt]$ ls
2010-12 2011-02 2011-04 2011-06 2011-08 2011-10 2011-12 2012-02 2012-04 2012-06 2012-08 2012-10 2012-12 2013-02 2013-04 2013-06 2013-08 2013-10 2013-12 2014-02 2014-04 2014-06 2014-08 2014-10 2014-12 2015-01.zip
2011-01 2011-03 2011-05 2011-07 2011-09 2011-11 2012-01 2012-03 2012-05 2012-07 2012-09 2012-11 2013-01 2013-03 2013-05 2013-07 2013-09 2013-11 2014-01 2014-03 2014-05 2014-07 2014-09 2014-11 2015-01
[hadoop@ip-172-31-24-128 mnt]$ cd 2011-04

##Example file-name structure##
2011-04-avon-and-somerset-street.csv
2011-04-cumbria-street.csv
2011-04-gloucestershire-street.csv
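The file names shown above appear to follow the pattern month, force name, dataset type. A minimal Python sketch for splitting such a name is given below. The pattern is inferred only from the three example names, and parse_filename is a hypothetical helper. It assumes the dataset type is the last hyphen-separated word, so a name whose dataset part itself contains hyphens would need a different rule.

```python
import re

# Inferred from the example names: YYYY-MM-<force>-<dataset>.csv
# Assumption: the dataset type is a single word with no hyphens (e.g. "street").
FILENAME_RE = re.compile(r"^(\d{4}-\d{2})-(.+)-([a-z]+)\.csv$")


def parse_filename(name: str) -> dict:
    """Split a CSV file name into month, force, and dataset type."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    month, force, dataset = match.groups()
    return {"month": month, "force": force, "dataset": dataset}


print(parse_filename("2011-04-avon-and-somerset-street.csv"))
```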
Contents of a sample file:
[hadoop@ip-172-31-24-128 2011-04]$ head -6 2011-04-avon-and-somerset-street.csv
Crime ID,Month,Reported by,Falls within,Longitude,Latitude,Location,LSOA code,LSOA name,Crime type,Last outcome category,Context
,2011-04,Avon and Somerset Constabulary,Avon and Somerset Constabulary,1.04177957823,52.0373951227,On or near The Street,E01029877,Babergh 005A,Other crime,,
,2011-04,Avon and Somerset Constabulary,Avon and Somerset Constabulary,-2.49436551493,51.4181694243,On or near Keynsham Road,E01014399,Bath and North East Somerset 001A,Anti-social behaviour,,
,2011-04,Avon and Somerset Constabulary,Avon and Somerset Constabulary,-2.49436551493,51.4181694243,On or near Keynsham Road,E01014399,Bath and North East Somerset 001A,Anti-social behaviour,,
,2011-04,Avon and Somerset Constabulary,Avon and Somerset Constabulary,-2.50993031962,51.4108734058,On or near Ludlow Close,E01014399,Bath and North East Somerset 001A,Anti-social behaviour,,
,2011-04,Avon and Somerset Constabulary,Avon and Somerset Constabulary,-2.5119272035,51.4094350194,On or near Harlech Close,E01014399,Bath and North East Somerset 001A,Anti-social behaviour,,
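The columns in the CSV files, as named in the header record of the sample file, are as follows:
Crime ID, Month, Reported by, Falls within, Longitude, Latitude, Location, LSOA code, LSOA name, Crime type, Last outcome category, Context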
The Challenge:
- The given data contains some inbuilt errors in the Easting, Northing, and Crime_type fields.
- The data is in CSV format, and some field values themselves contain commas.
- The CSV files contain column headers, i.e. the first record in each CSV file is a header record containing the column (field) names.
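The last two points above are the reason a plain split on "," is not enough. A minimal sketch using Python's standard csv module is shown below. It assumes that values containing commas are quoted in the usual CSV way, which should be checked against the real files. The second data row is a hypothetical variant of a sample row, altered only to show an embedded comma.

```python
import csv
import io

SAMPLE = (
    "Crime ID,Month,Reported by,Falls within,Longitude,Latitude,Location,"
    "LSOA code,LSOA name,Crime type,Last outcome category,Context\n"
    # Row copied from the sample file shown earlier.
    ",2011-04,Avon and Somerset Constabulary,Avon and Somerset Constabulary,"
    "-2.49436551493,51.4181694243,On or near Keynsham Road,E01014399,"
    "Bath and North East Somerset 001A,Anti-social behaviour,,\n"
    # Hypothetical row: the Location value is invented to contain a comma.
    ",2011-04,Avon and Somerset Constabulary,Avon and Somerset Constabulary,"
    '-2.50993031962,51.4108734058,"On or near Ludlow Close, Keynsham",E01014399,'
    "Bath and North East Somerset 001A,Anti-social behaviour,,\n"
)


def read_crimes(stream):
    """Read records as dicts; DictReader consumes the header record as field names."""
    return list(csv.DictReader(stream))


rows = read_crimes(io.StringIO(SAMPLE))
for row in rows:
    print(row["Month"], "|", row["Location"], "|", row["Crime type"])
```

For a real file, the same function can be given a handle from open(path, newline="") instead of the in-memory sample.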
What is unique?
- The same data can be accessed over an API. The API is implemented as a standard JSON web service using HTTP GET and POST requests. Full request and response examples are provided in the documentation.
- The response contains the ID of the crime, which may be unique and can be used as a hash key when storing and querying the data in a NoSQL store.
- The JSON response can also be used as an index document for Elasticsearch.
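To illustrate the hash key idea, here is a minimal sketch in which a plain Python dict stands in for a NoSQL table. The records are trimmed-down examples shaped like the API response, and index_by_id is a hypothetical helper. Because the ID is only said to be possibly unique, duplicates are collected rather than overwritten.

```python
def index_by_id(records):
    """Key each record by its id; return the store and any records with a repeated id."""
    store = {}
    duplicates = []
    for record in records:
        key = str(record["id"])  # string keys are a common choice for hash keys
        if key in store:
            duplicates.append(record)
        else:
            store[key] = record
    return store, duplicates


records = [
    {"id": 20599642, "category": "anti-social-behaviour", "month": "2013-01"},
    {"id": 20604632, "category": "burglary", "month": "2013-01"},
]
store, duplicates = index_by_id(records)
print(store["20604632"]["category"], len(duplicates))
```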
Example API call via REST: https://data.police.uk/api/crimes-street/all-crime?lat=52.629729&lng=-1.131592&date=2013-01
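The query string of the call above can be assembled with Python's standard library, as in this minimal sketch. The function name street_crime_url is made up for the example; the endpoint and the lat, lng, and date parameters are the ones in the example URL.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.police.uk/api/crimes-street/all-crime"


def street_crime_url(lat: str, lng: str, date: str) -> str:
    """Build the street-level crime query URL for a point and a YYYY-MM month."""
    # Coordinates are passed as strings so they appear exactly as typed.
    query = urlencode({"lat": lat, "lng": lng, "date": date})
    return f"{BASE_URL}?{query}"


print(street_crime_url("52.629729", "-1.131592", "2013-01"))
```

The URL can then be fetched with any HTTP client, for example urllib.request.urlopen, and the body decoded with the json module.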
Example response:
[
  {
    "category": "anti-social-behaviour",
    "persistent_id": "",
    "location_type": "Force",
    "location_subtype": "",
    "id": 20599642,
    "location": {
      "latitude": "52.6269479",
      "longitude": "-1.1121716",
      "street": {
        "id": 882380,
        "name": "On or near Cedar Road"
      }
    },
    "context": "",
    "month": "2013-01",
    "outcome_status": null
  },
  {
    "category": "burglary",
    "persistent_id": "aebd220e869a235ba92cde43f7e0df29001573b3df1b094bb952820b2b8f44b0",
    "location_type": "Force",
    "location_subtype": "",
    "id": 20604632,
    "location": {
      "latitude": "52.6271606",
      "longitude": "-1.1485111",
      "street": {
        "id": 882208,
        "name": "On or near Norman Street"
      }
    },
    "context": "",
    "month": "2013-01",
    "outcome_status": {
      "category": "Under investigation",
      "date": "2013-01"
    }
  },
  ...
]
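Before storing or indexing such a response, it can help to flatten the nested location and outcome fields. The sketch below is one possible shape, not a required one: the output field names are chosen for illustration, and flatten is a hypothetical helper. It converts the coordinate strings to floats and tolerates a null outcome_status, as seen in the first record.

```python
import json

RESPONSE = """
[
  {"category": "anti-social-behaviour", "id": 20599642, "month": "2013-01",
   "location": {"latitude": "52.6269479", "longitude": "-1.1121716",
                "street": {"id": 882380, "name": "On or near Cedar Road"}},
   "outcome_status": null},
  {"category": "burglary", "id": 20604632, "month": "2013-01",
   "location": {"latitude": "52.6271606", "longitude": "-1.1485111",
                "street": {"id": 882208, "name": "On or near Norman Street"}},
   "outcome_status": {"category": "Under investigation", "date": "2013-01"}}
]
"""


def flatten(record: dict) -> dict:
    """Turn one nested API record into a flat document."""
    location = record["location"]
    outcome = record.get("outcome_status") or {}  # outcome_status may be null
    return {
        "id": record["id"],
        "category": record["category"],
        "month": record["month"],
        "street": location["street"]["name"],
        "latitude": float(location["latitude"]),
        "longitude": float(location["longitude"]),
        "outcome": outcome.get("category"),
    }


docs = [flatten(r) for r in json.loads(RESPONSE)]
for doc in docs:
    print(doc)
```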
More details on API access can be found here: data.police.uk/docs/