Can we trust the recording of crimes by the police?: a data analysis project

Have you ever wondered how often a crime reported to the police in England and Wales is recorded and then later marked as “no crime” after investigation? (An incident that is initially recorded as a crime can only be marked as no crime if there is “additional verifiable information” indicating that a crime did not take place.)

For my first personal data science project, I analysed the Home Office’s “No crimes” dataset for the financial year 2011/2012 using the R programming language. (The R code used at each step is included in the text.) The CSV dataset was read into a data frame called nocrimes.
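A minimal sketch of that loading step, assuming the CSV has the four columns used in the code below (Force_Name, Offence_Group, Force_Offences and Force_No_Crimes). The function name load_nocrimes and the file name in the usage line are hypothetical, not part of the Home Office release; the comma-stripping step is a precaution in case counts are stored as text such as "1,220".

```r
# Hypothetical helper: read the "No crimes" CSV and check the columns used later.
load_nocrimes <- function(path) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  required <- c("Force_Name", "Offence_Group", "Force_Offences", "Force_No_Crimes")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Counts may arrive as text with thousands separators; strip commas, convert to numbers.
  for (col in c("Force_Offences", "Force_No_Crimes")) {
    df[[col]] <- as.numeric(gsub(",", "", as.character(df[[col]])))
  }
  df
}

# Usage (the file name is hypothetical):
# nocrimes <- load_nocrimes("no-crimes-2011-12.csv")
```

Failing early on a missing column gives a clearer error than a later tapply() call silently working on NULL.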

Using length(table(nocrimes$Force_Name)), I worked out that the dataset covers 44 police forces in England and Wales, including the British Transport Police.

In total, almost 4 million crimes (3,992,353) were reported across the police forces, covering various kinds of criminal offence. The average number of reports for each offence group (the mean of Force_Offences within the group, rounded up with ceiling()) was stored in MeanOffences and is shown below:

MeanOffences <- sort(ceiling(tapply(nocrimes$Force_Offences, nocrimes$Offence_Group, mean)),
                     decreasing = TRUE)

Type of offence                  Average number of reports to police
Other theft offences             25,117
Violence against the person      17,330
Criminal damage                  14,346
Burglary                         11,388
Offences against vehicles        9,488
Drug offences                    5,207
Fraud and forgery                3,211
Robbery                          1,698
Other offences                   1,370
Total sexual offences            1,220
Rape                             365

The average proportion of reported incidents that were recorded as crimes and then later marked “no crime” (the mean of each row’s percentage within the offence group, rounded up) was stored in MeanNoCrimes:

MeanNoCrimes <- sort(ceiling(tapply((nocrimes$Force_No_Crimes / nocrimes$Force_Offences)*100,
nocrimes$Offence_Group, mean)), decreasing = TRUE)

Type of offence                  Average % of recorded crimes later marked “no crime”
Rape                             19%
Fraud and forgery                17%
Other offences                   13%
Total sexual offences            9%
Robbery                          6%
Drug offences                    4%
Offences against vehicles        4%
Other theft offences             4%
Violence against the person      4%
Burglary                         3%
Criminal damage                  3%
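MeanNoCrimes averages each row’s percentage, so every force counts equally regardless of how many crimes it recorded, and one force with very few reports can shift a group’s average noticeably. For comparison, the sketch below (an addition, not part of the original analysis) computes both that mean and a pooled percentage, total no crimes divided by total offences, for each offence group. It assumes the column names used above; the function name nocrime_rates is hypothetical.

```r
# Hypothetical helper comparing two ways of summarising "no crime" rates per offence group.
nocrime_rates <- function(df) {
  pct <- df$Force_No_Crimes / df$Force_Offences * 100
  # Mean of per-row percentages: each force weighted equally
  # (rows with zero offences give NaN or Inf and should be checked first).
  mean_pct <- tapply(pct, df$Offence_Group, mean)
  # Pooled percentage: forces weighted by their number of recorded offences.
  pooled_pct <- tapply(df$Force_No_Crimes, df$Offence_Group, sum) /
    tapply(df$Force_Offences, df$Offence_Group, sum) * 100
  out <- data.frame(mean_pct = as.vector(mean_pct),
                    pooled_pct = as.vector(pooled_pct),
                    row.names = names(mean_pct))
  # Largest pooled rate first
  out[order(out$pooled_pct, decreasing = TRUE), ]
}

# Usage:
# nocrime_rates(nocrimes)
```

Where the two columns differ a lot for a group, a few forces with small counts and extreme percentages are a likely cause.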

A glance at the two tables shows that the offence groups with the most and fewest reports are not the same as those with the highest and lowest proportions of “no crimes”, respectively. This can be checked by comparing the names at the head and tail of MeanOffences and MeanNoCrimes:

table(names(head(MeanOffences)) %in% names(head(MeanNoCrimes)))
table(names(tail(MeanOffences)) %in% names(tail(MeanNoCrimes)))

Each command finds only one offence group that appears in both vectors. Using the following commands,

names(head(MeanNoCrimes))
names(head(MeanOffences))

the common element is found to be “Drug offences”. This is expected: head() and tail() each return six of the 11 groups by default, so the sixth-ranked group appears in both the head and the tail of a vector, and Drug offences is sixth in both vectors (in MeanNoCrimes it is one of four groups at 4%, so its exact position there depends on how sort() orders tied values). This can be checked further by comparing the head of MeanOffences with the tail of MeanNoCrimes, and vice versa. The six most reported offence groups (head of MeanOffences) are exactly the six with the lowest proportion subsequently marked “no crime” (tail of MeanNoCrimes). Similarly, the six least reported groups (tail of MeanOffences) are exactly the six with the highest proportion marked “no crime” (head of MeanNoCrimes).
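These comparisons can be reproduced without the raw CSV. The sketch below rebuilds the two vectors from the rounded values in the tables above (the labels follow the tables and may not match the exact Offence_Group strings in the CSV). It repeats the head/tail checks and, as an addition not in the original analysis, computes a Spearman rank correlation between the number of reports and the “no crime” percentage. With only 11 groups and several tied values, this summarises the tables rather than testing a hypothesis.

```r
# Rounded values copied from the two tables above.
MeanOffences <- c("Other theft offences" = 25117, "Violence against the person" = 17330,
                  "Criminal damage" = 14346, "Burglary" = 11388,
                  "Offences against vehicles" = 9488, "Drug offences" = 5207,
                  "Fraud and forgery" = 3211, "Robbery" = 1698,
                  "Other offences" = 1370, "Total sexual offences" = 1220, "Rape" = 365)
MeanNoCrimes <- c("Rape" = 19, "Fraud and forgery" = 17, "Other offences" = 13,
                  "Total sexual offences" = 9, "Robbery" = 6, "Drug offences" = 4,
                  "Offences against vehicles" = 4, "Other theft offences" = 4,
                  "Violence against the person" = 4, "Burglary" = 3, "Criminal damage" = 3)

# head() and tail() return 6 of the 11 groups, so the sixth-ranked group is in both.
intersect(names(head(MeanOffences)), names(head(MeanNoCrimes)))  # "Drug offences"
intersect(names(tail(MeanOffences)), names(tail(MeanNoCrimes)))  # "Drug offences"

# Most reported groups vs lowest "no crime" proportions, and vice versa.
setequal(names(head(MeanOffences)), names(tail(MeanNoCrimes)))  # TRUE
setequal(names(tail(MeanOffences)), names(head(MeanNoCrimes)))  # TRUE

# Spearman correlation: align by name, rank both (ties get average ranks).
rho <- cor(MeanOffences, MeanNoCrimes[names(MeanOffences)], method = "spearman")
round(rho, 2)  # -0.8
```

A value of about -0.8 says that, across these 11 groups, fewer reports tend to go with a higher proportion of “no crime” outcomes.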

Is there anything about rape, sexual offences, fraud and forgery, robbery and “other offences” that makes an offence more likely to be reclassified as “no crime” than offences against vehicles, violence against the person, burglary, criminal damage and other theft offences? This question concerns incidents that were initially recorded as crimes and then classed as no crime after investigation, not incidents that were never recorded as crimes at all.

  • It may be that the person reporting the incident believed a crime had taken place, but the investigation produced evidence that it had not.
  • The person reporting the crime may have deliberately made a false allegation.
  • The nature of the offence may make it difficult to verify whether a crime took place, leading to a “no crime” decision; but that would go against the advice of HM Inspectorate of Constabulary.
  • Offence groups with fewer reports appear to have a higher proportion of “no crime” outcomes.

Further interrogation of the 2011/2012 dataset revealed another problem. Comparing, for each force, the average number of rapes marked “no crime” with the average number of rapes recorded:

NoRapePerForce <- tapply(nocrimes$Force_No_Crimes[nocrimes$Offence_Group == "Sexual offences which are rape"],
                         nocrimes$Force_Name[nocrimes$Offence_Group == "Sexual offences which are rape"],
                         mean)

RapePerForce <- tapply(nocrimes$Force_Offences[nocrimes$Offence_Group == "Sexual offences which are rape"],
                       nocrimes$Force_Name[nocrimes$Offence_Group == "Sexual offences which are rape"],
                       mean)

# Ratio of rapes marked "no crime" to rapes recorded, for each force, largest first
sort(NoRapePerForce / RapePerForce, decreasing = TRUE)
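The same calculation can be wrapped in a function that works for any offence group. This is a sketch: the name nocrime_ratio_by_force is hypothetical, and it divides summed counts per force rather than means, which gives the same ratio because both totals are taken over the same rows.

```r
# Hypothetical helper: "no crime" ratio per force for one offence group.
nocrime_ratio_by_force <- function(df, group) {
  rows <- subset(df, Offence_Group == group)  # subset() drops rows where the test is NA
  no_crimes <- tapply(rows$Force_No_Crimes, rows$Force_Name, sum)
  offences <- tapply(rows$Force_Offences, rows$Force_Name, sum)
  ratio <- no_crimes / offences
  # Plain named vector; a ratio above 1 means more "no crimes" than recorded offences.
  sort(setNames(as.numeric(ratio), names(ratio)), decreasing = TRUE)
}

# Usage:
# nocrime_ratio_by_force(nocrimes, "Sexual offences which are rape")
```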

It turns out that, for City of London Police, the number of rape reports subsequently marked “no crime” is greater than the number of rapes recorded: there are two reports of rape but five reports of rape marked “no crime” after investigation, a ratio of 250%.

tapply(nocrimes$Force_Offences[nocrimes$Offence_Group == "Sexual offences which are rape"],
       nocrimes$Force_Name[nocrimes$Offence_Group == "Sexual offences which are rape"],
       mean)["London, City of"]

tapply(nocrimes$Force_No_Crimes[nocrimes$Offence_Group == "Sexual offences which are rape"],
       nocrimes$Force_Name[nocrimes$Offence_Group == "Sexual offences which are rape"],
       mean)["London, City of"]

In fact, across the whole dataset there are two cases where the number of reported crimes is lower than the number subsequently marked “no crime”: “Sexual offences which are rape” at City of London Police and “Other offences” at Warwickshire Police.

# Count the rows where "no crimes" exceed recorded offences (the TRUE count should be 0)
table(nocrimes$Force_No_Crimes > nocrimes$Force_Offences)

# Show those rows
subset(nocrimes, Force_No_Crimes > Force_Offences)
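That check can be extended into a small sanity test to run before any analysis. A sketch under the same column-name assumptions; the function name check_nocrimes and its rules (missing counts, negative counts, more no crimes than offences) are illustrative choices, not validation rules published with the dataset.

```r
# Hypothetical sanity check: return the rows that break basic consistency rules.
check_nocrimes <- function(df) {
  counts <- df[, c("Force_Offences", "Force_No_Crimes")]
  has_na <- !complete.cases(counts)
  # Only test complete rows; FALSE & NA is FALSE, so incomplete rows are never flagged twice.
  negative <- !has_na & (df$Force_Offences < 0 | df$Force_No_Crimes < 0)
  excess <- !has_na & df$Force_No_Crimes > df$Force_Offences
  problem <- ifelse(has_na, "missing count",
             ifelse(negative, "negative count",
             ifelse(excess, "more no crimes than offences", NA)))
  flagged <- df[!is.na(problem), c("Force_Name", "Offence_Group",
                                    "Force_Offences", "Force_No_Crimes")]
  flagged$Problem <- problem[!is.na(problem)]
  flagged
}

# Usage:
# check_nocrimes(nocrimes)
```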

If the dataset shows what its description on the data.gov.uk website says it does, then there appear to be a couple of errors in the data. Both offence groups are in the head of MeanNoCrimes and the tail of MeanOffences, and because MeanNoCrimes averages each force’s percentage, a single figure such as City of London’s 250% can noticeably raise the average for its group. At the very least, these errors raise questions about the reliability of police crime data.
