data.police.uk provides a complete snapshot of crime, outcome, and stop and search data, as held by the Home Office at a particular point in time.

The data itself is stored on Amazon S3 in the bucket policeuk-data and can be downloaded from a URL of the form https://policeuk-data.s3.amazonaws.com/archive/YYYY-MM.zip, where YYYY is the four-digit year and MM the two-digit month of the archive.
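Filling in that URL pattern is easy to get wrong by hand (for example, forgetting to zero-pad the month). A minimal sketch follows; `archive_url` is a hypothetical helper name, and it does not check whether an archive for the requested month actually exists on S3.

```python
def archive_url(year: int, month: int) -> str:
    """Build the S3 URL of the monthly archive for the given year and month."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1..12, got {month}")
    # Zero-pad so that (2015, 1) becomes "2015-01", matching the archive names
    return f"https://policeuk-data.s3.amazonaws.com/archive/{year:04d}-{month:02d}.zip"


print(archive_url(2015, 1))
# https://policeuk-data.s3.amazonaws.com/archive/2015-01.zip
```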

The Structure:

All files are organized by YEAR and MONTH.

Each month has its own ZIP archive containing CSV files.

The January 2015 archive, 2015-01.zip, contains data for every month from 2010-12 to 2015-01, with one folder per month.
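Because one archive covers a whole range of months, it can be useful to check an extracted archive for missing month folders. The sketch below generates the expected YYYY-MM folder names for an inclusive range; `months_between` is a hypothetical helper, and it assumes the folder naming shown in the listing that follows.

```python
def months_between(start: str, end: str) -> list[str]:
    """Return every "YYYY-MM" label from start to end, inclusive."""
    year, month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    labels = []
    while (year, month) <= (end_year, end_month):
        labels.append(f"{year:04d}-{month:02d}")
        # Advance one month, rolling over into January of the next year after December
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return labels


months = months_between("2010-12", "2015-01")
print(len(months), months[0], months[-1])  # 50 2010-12 2015-01
```

Comparing this list with the folder names on disk (for example with a set difference) shows any month that is absent from the archive.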

[hadoop@ip-172-31-24-128 mnt]$ wget https://policeuk-data.s3.amazonaws.com/archive/2015-01.zip
[hadoop@ip-172-31-24-128 mnt]$ unzip 2015-01.zip
[hadoop@ip-172-31-24-128 mnt]$ ls

2010-12  2011-02  2011-04  2011-06  2011-08  2011-10  2011-12  2012-02  2012-04  2012-06  2012-08  2012-10  2012-12  2013-02  2013-04  2013-06  2013-08  2013-10  2013-12  2014-02  2014-04  2014-06  2014-08  2014-10  2014-12  2015-01.zip
2011-01  2011-03  2011-05  2011-07  2011-09  2011-11  2012-01  2012-03  2012-05  2012-07  2012-09  2012-11  2013-01  2013-03  2013-05  2013-07  2013-09  2013-11  2014-01  2014-03  2014-05  2014-07  2014-09  2014-11  2015-01  
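The `wget` and `unzip` steps above can be sketched in Python using only the standard library. `download_and_extract` is a hypothetical helper: it streams the archive to disk, unzips it into the destination folder, and returns the names of the directories it finds there, which for the January 2015 archive should be the month folders listed above (assuming the destination held no other directories beforehand).

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def download_and_extract(url: str, dest: Path) -> list[str]:
    """Download a monthly archive, unzip it into dest, and return the folder names in dest."""
    dest.mkdir(parents=True, exist_ok=True)
    zip_path = dest / url.rsplit("/", 1)[-1]  # e.g. dest / "2015-01.zip"
    # Stream the download to disk instead of holding the whole archive in memory
    with urllib.request.urlopen(url, timeout=60) as response, open(zip_path, "wb") as out:
        shutil.copyfileobj(response, out)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # Every directory now in dest, e.g. "2010-12" ... "2015-01"
    return sorted(p.name for p in dest.iterdir() if p.is_dir())
```

Calling it with `https://policeuk-data.s3.amazonaws.com/archive/2015-01.zip` and `Path("/mnt")` mirrors the shell session above.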

[hadoop@ip-172-31-24-128 mnt]$ cd 2011-04

##Example file-name structure##

2011-04-avon-and-somerset-street.csv  
2011-04-cumbria-street.csv 
2011-04-gloucestershire-street.csv
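The file names follow the pattern month, force, data kind, so they can be split back into those parts. In this sketch `parse_file_name` is hypothetical, and the kinds other than `street` (`outcomes`, `stop-and-search`) are an assumption about the other data types the archive publishes; a fixed list of kinds is needed because force names themselves contain hyphens.

```python
import re

# Assumed pattern: "<YYYY-MM>-<force-slug>-<kind>.csv", e.g. "2011-04-cumbria-street.csv".
# "outcomes" and "stop-and-search" are assumed kinds; only "street" appears in this post.
FILE_NAME = re.compile(
    r"^(?P<month>\d{4}-\d{2})-(?P<force>.+)-(?P<kind>street|outcomes|stop-and-search)\.csv$"
)


def parse_file_name(name: str) -> tuple[str, str, str]:
    """Split a data file name into (month, force slug, data kind)."""
    match = FILE_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised file name: {name}")
    return match["month"], match["force"], match["kind"]


print(parse_file_name("2011-04-avon-and-somerset-street.csv"))
# ('2011-04', 'avon-and-somerset', 'street')
```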

Contents of a sample file:

[hadoop@ip-172-31-24-128 2011-04]$ head -6 2011-04-avon-and-somerset-street.csv

Crime ID,Month,Reported by,Falls within,Longitude,Latitude,Location,LSOA code,LSOA name,Crime type,Last outcome category,Context
,2011-04,Avon and Somerset Constabulary,Avon and Somerset Constabulary,1.04177957823,52.0373951227,On or near The Street,E01029877,Babergh 005A,Other crime,,
,2011-04,Avon and Somerset Constabulary,Avon and Somerset Constabulary,-2.49436551493,51.4181694243,On or near Keynsham Road,E01014399,Bath and North East Somerset 001A,Anti-social behaviour,,
,2011-04,Avon and Somerset Constabulary,Avon and Somerset Constabulary,-2.49436551493,51.4181694243,On or near Keynsham Road,E01014399,Bath and North East Somerset 001A,Anti-social behaviour,,
,2011-04,Avon and Somerset Constabulary,Avon and Somerset Constabulary,-2.50993031962,51.4108734058,On or near Ludlow Close,E01014399,Bath and North East Somerset 001A,Anti-social behaviour,,
,2011-04,Avon and Somerset Constabulary,Avon and Somerset Constabulary,-2.5119272035,51.4094350194,On or near Harlech Close,E01014399,Bath and North East Somerset 001A,Anti-social behaviour,,
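Once a month folder is extracted, the header row lets each record be read as a dictionary keyed by column name. The sketch below counts records per `Crime type` across all street-level files in one folder; `count_crime_types` is hypothetical, and the `utf-8-sig` encoding (which also tolerates a leading byte-order mark) is an assumption about how the files are encoded.

```python
import csv
from collections import Counter
from pathlib import Path


def count_crime_types(folder: Path) -> Counter:
    """Count rows per "Crime type" across every *-street.csv file in one month folder."""
    counts: Counter = Counter()
    for csv_path in sorted(folder.glob("*-street.csv")):
        # newline="" is what the csv module expects for files it reads;
        # the encoding is an assumption, see the note above
        with open(csv_path, newline="", encoding="utf-8-sig") as handle:
            for row in csv.DictReader(handle):
                counts[row["Crime type"]] += 1
    return counts
```

Run as `count_crime_types(Path("/mnt/2011-04"))`, the five rows shown above would contribute four Anti-social behaviour records and one Other crime record to the totals.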

The columns in the CSV files are as follows:

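  • Crime ID: identifier of the crime (empty in the sample rows above)
  • Month: month of the record, as YYYY-MM
  • Reported by: the force that provided the record
  • Falls within: the force the record falls within (the same force in the sample rows)
  • Longitude, Latitude: approximate coordinates of the crime
  • Location: street-level description, such as "On or near Keynsham Road"
  • LSOA code, LSOA name: the Lower Layer Super Output Area containing the location
  • Crime type: category of crime, such as Anti-social behaviour
  • Last outcome category: latest recorded outcome, if any
  • Context: additional free-text information, if any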


The Challenge:

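  • The data contains some built-in errors in the location (Longitude, Latitude) and Crime type fields. For example, the first sample row above is reported by Avon and Somerset Constabulary, but its coordinates and LSOA (Babergh 005A) are in Suffolk.
  • Some values contain commas, so lines should be parsed with a quote-aware CSV parser rather than split on every comma.
  • Each CSV file contains column headers, i.e., the first record in the file is a header record holding the column (field) names, which must be skipped or used as field names when parsing.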
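These three issues can be handled together: `csv.DictReader` takes the field names from the header record and keeps a quoted value containing commas in one field, and each parsed row can then be sanity-checked. In this sketch `flag_suspect_rows` is hypothetical, and the longitude/latitude bounds are an illustrative rough box around the UK, not values published by data.police.uk. A simple box will not catch every error; the Suffolk row mentioned above lies inside it.

```python
import csv
import io

# Illustrative bounds only: a rough box around Great Britain and Northern Ireland.
LON_RANGE = (-8.7, 1.8)
LAT_RANGE = (49.8, 60.9)


def flag_suspect_rows(text: str) -> list[tuple[int, str]]:
    """Return (data row number, reason) for rows whose coordinates or crime type look wrong."""
    problems = []
    # DictReader uses the header record as field names and honours quoted fields,
    # so a value like "On or near High Street, Bath" stays a single field.
    reader = csv.DictReader(io.StringIO(text))
    for number, row in enumerate(reader, start=1):
        # Short rows leave missing fields as None, hence the "or" fallbacks
        if not (row.get("Crime type") or "").strip():
            problems.append((number, "missing crime type"))
        try:
            lon, lat = float(row.get("Longitude")), float(row.get("Latitude"))
        except (TypeError, ValueError):
            problems.append((number, "missing or non-numeric coordinates"))
            continue
        inside = LON_RANGE[0] <= lon <= LON_RANGE[1] and LAT_RANGE[0] <= lat <= LAT_RANGE[1]
        if not inside:
            problems.append((number, "coordinates outside the expected area"))
    return problems
```

Splitting a line such as `...,"On or near High Street, Bath",...` on commas would yield one field too many, which shifts every later column; the CSV reader avoids that.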

What is unique?

  • The same data can also be accessed through an API, implemented as a standard JSON web service using HTTP GET and POST requests. Full request and response examples are provided in the documentation.
  • Each record in the response contains a crime id, which may be unique and can be used as the hash key when storing and querying the data in a NoSQL database.
  • The JSON records can also be used as documents to index in Elasticsearch.

Example API call via REST: https://data.police.uk/api/crimes-street/all-crime?lat=52.629729&lng=-1.131592&date=2013-01
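The example call can be reproduced with the Python standard library. In this sketch the helper names are hypothetical, `category` defaults to the `all-crime` path segment used in the example URL, and `fetch_street_crimes` performs a live HTTP GET, so it needs network access.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://data.police.uk/api"


def street_crime_url(lat: float, lng: float, date: str, category: str = "all-crime") -> str:
    """Build a crimes-street request URL with lat, lng and date query parameters."""
    query = urllib.parse.urlencode({"lat": lat, "lng": lng, "date": date})
    return f"{API_BASE}/crimes-street/{category}?{query}"


def fetch_street_crimes(lat: float, lng: float, date: str) -> list[dict]:
    """GET the endpoint and decode the JSON array of crime records."""
    with urllib.request.urlopen(street_crime_url(lat, lng, date), timeout=30) as response:
        return json.load(response)
```

`street_crime_url(52.629729, -1.131592, "2013-01")` returns the example URL above.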

Example response:
[
    {
        "category": "anti-social-behaviour",
        "persistent_id": "",
        "location_type": "Force",
        "location_subtype": "",
        "id": 20599642,
        "location": {
            "latitude": "52.6269479",
            "longitude": "-1.1121716",
            "street": {
                "id": 882380,
                "name": "On or near Cedar Road"
            }
        },
        "context": "",
        "month": "2013-01",
        "outcome_status": null
    },
    {
        "category": "burglary",
        "persistent_id": "aebd220e869a235ba92cde43f7e0df29001573b3df1b094bb952820b2b8f44b0",
        "location_type": "Force",
        "location_subtype": "",
        "id": 20604632,
        "location": {
            "latitude": "52.6271606",
            "longitude": "-1.1485111",
            "street": {
                "id": 882208,
                "name": "On or near Norman Street"
            }
        },
        "context": "",
        "month": "2013-01",
        "outcome_status": {
            "category": "Under investigation",
            "date": "2013-01"
        }
    },
    ...
]
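Once the response is decoded into a list of dicts, the `id` field can serve as a NoSQL hash key and the records can be sent to Elasticsearch through its `_bulk` endpoint, whose body alternates an action line with the document and must end with a newline. This is a sketch: the helper names and the `crimes` index name are hypothetical, and because the id is only said to *may* be unique, `key_by_id` raises on a repeat instead of silently overwriting.

```python
import json


def key_by_id(records: list[dict]) -> dict[int, dict]:
    """Index API records by their "id" field, as a NoSQL hash key would."""
    keyed: dict[int, dict] = {}
    for record in records:
        # Fail loudly rather than let a repeated id overwrite an earlier record
        if record["id"] in keyed:
            raise ValueError(f"duplicate id {record['id']}")
        keyed[record["id"]] = record
    return keyed


def to_bulk_ndjson(records: list[dict], index: str = "crimes") -> str:
    """Render records as an Elasticsearch _bulk body: an action line, then the document."""
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index, "_id": str(record["id"])}}))
        lines.append(json.dumps(record))
    # The bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"
```

Using the crime id as the Elasticsearch `_id` means re-indexing the same month replaces existing documents rather than duplicating them.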

More details on API access can be found in the documentation at https://data.police.uk/docs/.