SF drug geography

Data pulling and cleaning -

Let's pull in all of the SF crime data provided by SF Data and peek at the schema.
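A minimal sketch of this step with pandas. It assumes the export is a CSV with columns such as `Category`, `Descript`, `PdDistrict` and an `MM/DD/YYYY` `Date` column. The full column list and the file name are assumptions, and three toy rows stand in for the real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Toy rows standing in for the SF crime export (hypothetical values).
SAMPLE_CSV = """Category,Descript,PdDistrict,Date
NON-CRIMINAL,LOST PROPERTY,TENDERLOIN,01/01/2003
DRUG/NARCOTIC,POSSESSION OF BASE/ROCK COCAINE,TENDERLOIN,02/13/2015
VEHICLE THEFT,STOLEN AUTOMOBILE,SOUTHERN,02/13/2015
"""


def load_incidents(source):
    """Read the incident CSV and parse Date (assumed MM/DD/YYYY) into datetimes."""
    df = pd.read_csv(source)
    df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
    return df


# With the real export you would pass its path instead,
# e.g. load_incidents("sfpd_incidents.csv")  (hypothetical file name).
df = load_incidents(io.StringIO(SAMPLE_CSV))
df.info()  # column names, dtypes and non-null counts
print(df.head())
```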

It provides ~1M records.

Here's a nice map of the districts: http://sf-police.org/index.aspx?page=796

Let's create an easy handle (days) for timeseries analysis.
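One way to build such a handle, sketched under the assumption that `Date` has already been parsed to datetimes: count the days elapsed since the earliest record. The helper name `add_day_index` is hypothetical, and the toy dates only reuse the first and last dates reported for the real data.

```python
import pandas as pd


def add_day_index(df, date_col="Date"):
    """Return a copy with an integer 'days' column: days since the earliest record."""
    out = df.copy()
    out["days"] = (out[date_col] - out[date_col].min()).dt.days
    return out


toy = pd.DataFrame({"Date": pd.to_datetime(["2003-01-01", "2003-01-31", "2015-02-13"])})
toy = add_day_index(toy)
print(toy["Date"].min().date(), toy["Date"].max().date())  # first / most recent record
print(toy["days"].tolist())  # [0, 30, 4426]
```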

The first recorded incident is from 2003-01-01 and the most recent is from 2015-02-13. Nice.

Distributions and exploration -

Let's get a more detailed view by examining Descript, which is the particular crime type.

Since there are 912 different crime types, let's slice by percentile and peek at the top crime types in each PdDistrict.
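A sketch of one reading of "slice by percentile": count reports per `Descript`, keep only the types whose total is at or above a quantile cutoff, then cross-tabulate against `PdDistrict`. The helper name and the default `q=0.99` are assumptions; the notebook's actual cutoff is not shown. The toy data uses `q=0.5` so the effect is visible.

```python
import pandas as pd


def top_percentile_table(df, q=0.99, type_col="Descript", group_col="PdDistrict"):
    """Crime-type x district counts, restricted to crime types whose total count
    is at or above the q-th quantile of all crime-type totals."""
    totals = df[type_col].value_counts()
    cutoff = totals.quantile(q)
    keep = totals[totals >= cutoff].index
    subset = df[df[type_col].isin(keep)]
    return pd.crosstab(subset[type_col], subset[group_col])


toy = pd.DataFrame({
    "Descript": ["GRAND THEFT FROM LOCKED AUTO"] * 5 + ["BATTERY"] * 3 + ["ARSON"],
    "PdDistrict": ["SOUTHERN"] * 4 + ["BAYVIEW"] + ["BAYVIEW"] * 3 + ["PARK"],
})
table = top_percentile_table(toy, q=0.5)  # ARSON (1 report) falls below the median
print(table)
```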

Cluster the raw (non-normalized) counts of the top-percentile crime types across PdDistricts.
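A clustered heatmap is essentially a heatmap whose rows and columns are reordered by hierarchical clustering; seaborn's `clustermap` is one common way to do both at once. The sketch below shows only the reordering with SciPy, assuming rows are crime types and columns are districts. Average linkage on Euclidean distance is an assumption, not necessarily what the notebook used.

```python
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage


def cluster_order(table, method="average"):
    """Reorder rows and columns by hierarchical clustering (Euclidean distance),
    i.e. the ordering a clustered heatmap would display."""
    values = table.values.astype(float)
    row_order = leaves_list(linkage(values, method=method))
    col_order = leaves_list(linkage(values.T, method=method))
    return table.iloc[row_order, col_order]


# Toy counts: rows are crime types, columns are districts.
toy = pd.DataFrame(
    [[10, 0, 9], [0, 8, 1], [11, 1, 10]],
    index=["THEFT A", "NARCOTICS", "THEFT B"],
    columns=["SOUTHERN", "TENDERLOIN", "CENTRAL"],
)
print(cluster_order(toy))  # similar rows (THEFT A/B) and columns end up adjacent
```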

Normalize vertically across PdDistrict.

Normalize horizontally across crime types.
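Both normalizations are one-liners in pandas, assuming the table has crime types as rows and districts as columns. Vertical scaling makes each district's column sum to 1 (what fraction of a district's reports each crime type accounts for). Horizontal scaling makes each crime type's row sum to 1 (how a crime type is spread across districts). The function names are hypothetical.

```python
import pandas as pd


def normalize_vertical(table):
    """Scale each column (district) so it sums to 1."""
    return table / table.sum(axis=0)


def normalize_horizontal(table):
    """Scale each row (crime type) so it sums to 1."""
    return table.div(table.sum(axis=1), axis=0)


toy = pd.DataFrame([[3, 1], [1, 1]], index=["THEFT", "BATTERY"], columns=["SOUTHERN", "BAYVIEW"])
print(normalize_vertical(toy))    # SOUTHERN column: 0.75, 0.25
print(normalize_horizontal(toy))  # THEFT row: 0.75, 0.25
```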

(1) GTA is the most common crime in most PdDistricts.

  • Tenderloin is an outlier, enriched in base/rock crack and narcotics.

(2) For the distribution of crime across areas:

  • Southern: Theft, including theft from auto.
  • Tenderloin: Base/rock crack and narcotics.
  • Bayview: Violence and threats.

Now, let's drill down on a specific question -

Let's re-examine the crime types.

I'm interested in DRUG/NARCOTIC:

  • I think it will show some interesting dynamics.
  • I think different areas of the city will have different distributions.

We can reuse the analysis from above; we simply slice the input data on this category first.
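The slice itself is a boolean mask on `Category`; everything downstream (crosstabs, normalization, clustering) then runs unchanged on the smaller frame. A sketch with hypothetical toy rows and a hypothetical helper name:

```python
import pandas as pd


def slice_category(df, category="DRUG/NARCOTIC", col="Category"):
    """Keep only the records in one Category before re-running the analysis."""
    return df[df[col] == category]


toy = pd.DataFrame({
    "Category": ["DRUG/NARCOTIC", "VEHICLE THEFT", "DRUG/NARCOTIC"],
    "Descript": ["POSSESSION OF MARIJUANA", "STOLEN AUTOMOBILE", "POSSESSION OF BASE/ROCK COCAINE"],
    "PdDistrict": ["MISSION", "SOUTHERN", "TENDERLOIN"],
})
drugs = slice_category(toy)
print(pd.crosstab(drugs["Descript"], drugs["PdDistrict"]))
```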

Nice. We could study these for a while.

But here's the point:

I think we can simplify this by compressing the different drug types into groups.

Then we can examine both temporal and spatial profiles.

Drug dynamics -

We'll create a 30-day window.
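A sketch of the windowing, assuming the integer `days` column from earlier: integer-divide by the window length to get a window index, then count records per window and district. The helper name is hypothetical.

```python
import pandas as pd


def windowed_counts(df, window_days=30, by="PdDistrict"):
    """Bin records into fixed windows of `window_days` using the integer 'days'
    column, then count records per window for each value of `by`."""
    binned = df.assign(window=df["days"] // window_days)
    return binned.groupby(["window", by]).size().unstack(fill_value=0)


toy = pd.DataFrame({
    "days": [0, 5, 29, 30, 61],
    "PdDistrict": ["TENDERLOIN", "MISSION", "TENDERLOIN", "TENDERLOIN", "MISSION"],
})
print(windowed_counts(toy))
```

Windows with no records at all are absent from the index; `reindex(range(n_windows), fill_value=0)` fills them in if a gap-free series is needed.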

Let's group the drug categories to make this easier to examine.
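The notebook's actual grouping isn't shown, so the keyword map below is purely hypothetical. It illustrates the idea of collapsing many `Descript` strings into a few drug groups by substring matching, checked in order so that the more specific "BASE/ROCK" wins over the broader "COCAINE".

```python
import pandas as pd

# Hypothetical keyword -> group map; order matters (first match wins).
# This is a crude substring match: "METH" would also catch e.g. METHADONE.
DRUG_GROUPS = [
    ("BASE/ROCK", "crack"),
    ("COCAINE", "cocaine"),
    ("MARIJUANA", "marijuana"),
    ("METH", "meth"),
    ("HEROIN", "heroin"),
]


def drug_group(descript):
    """Map a Descript string to a coarse drug group, or 'other'."""
    text = descript.upper()
    for keyword, group in DRUG_GROUPS:
        if keyword in text:
            return group
    return "other"


toy = pd.Series([
    "POSSESSION OF BASE/ROCK COCAINE",
    "POSSESSION OF COCAINE FOR SALE",
    "POSSESSION OF MARIJUANA",
    "POSSESSION OF NARCOTICS PARAPHERNALIA",
])
print(toy.map(drug_group).tolist())  # ['crack', 'cocaine', 'marijuana', 'other']
```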

Let's add the real dates.
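Assuming the windows are integer indices counted from the first record (as in a `days // 30` scheme), each one maps back to a calendar start date by adding `window * 30` days to that first date. A hedged sketch with a hypothetical helper:

```python
import numpy as np
import pandas as pd


def window_start_dates(windows, first_date, window_days=30):
    """Convert integer window indices back to calendar start dates."""
    offsets = pd.to_timedelta(np.asarray(windows) * window_days, unit="D")
    return offsets + pd.Timestamp(first_date)


dates = window_start_dates([0, 1, 2], "2003-01-01")
print(dates)  # 2003-01-01, 2003-01-31, 2003-03-02
```

Assigning the result to a windowed count table's index (e.g. `counts.index = window_start_dates(counts.index, first_date)`) gives date-labelled time series.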

Let's iterate through each district.

We can also look at correlations between areas for different drugs.
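A sketch of the correlation step, assuming a long table with one row per (window, district, drug group) and a count column `n` (the column names are assumptions). For one drug group, pivot to one column per district and take pairwise Pearson correlations between the district time series.

```python
import pandas as pd


def district_correlations(long_counts, group):
    """Pearson correlation between districts of the windowed counts for one drug group.

    long_counts: one row per (window, PdDistrict, group) with a count column 'n'.
    """
    sub = long_counts[long_counts["group"] == group]
    wide = sub.pivot_table(index="window", columns="PdDistrict", values="n",
                           aggfunc="sum", fill_value=0)
    return wide.corr()


toy = pd.DataFrame({
    "window": [0, 1, 2] * 3,
    "PdDistrict": ["SOUTHERN"] * 3 + ["TENDERLOIN"] * 3 + ["BAYVIEW"] * 3,
    "group": ["crack"] * 9,
    "n": [1, 2, 3, 2, 4, 6, 3, 2, 1],
})
corr = district_correlations(toy, "crack")
print(corr.round(2))
```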

With this in mind, we can examine select timeseries data.

Spatial relationships -

Let's redo the analysis above, but rescale the data.
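The notebook doesn't say which rescaling it uses; min-max scaling per district is one plausible choice and is shown here as an assumption. It maps each column to [0, 1] so that a high-volume district and a low-volume one can share an axis and be compared by shape rather than size.

```python
import pandas as pd


def minmax_scale(counts):
    """Rescale each column to [0, 1] (assumes no column is constant)."""
    lo, hi = counts.min(), counts.max()
    return (counts - lo) / (hi - lo)


toy = pd.DataFrame({"TENDERLOIN": [100, 300, 200], "RICHMOND": [2, 4, 3]})
scaled = minmax_scale(toy)
print(scaled)  # both columns become 0.0, 1.0, 0.5
```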

We can now summarize this data using clustered heatmaps.

Mapping relationships -

Let's isolate all crack-related records.
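A sketch of the filter, assuming crack-related records are the ones whose `Descript` mentions "BASE/ROCK" (as in the base/rock crack descriptions noted earlier). A plain, case-insensitive substring match avoids treating "/" as regex syntax.

```python
import pandas as pd


def crack_records(df, col="Descript"):
    """Keep records whose description mentions base/rock cocaine."""
    mask = df[col].str.contains("BASE/ROCK", case=False, regex=False)
    return df[mask]


toy = pd.DataFrame({"Descript": [
    "POSSESSION OF BASE/ROCK COCAINE",
    "SALE OF BASE/ROCK COCAINE",
    "POSSESSION OF MARIJUANA",
]})
print(crack_records(toy))
```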

Plot the crack regimes.

Fold-difference in mean between the two regimes.
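A sketch of the fold-difference, assuming date-indexed counts with one column per district and a split date marking where the later regime begins. The split date is a hypothetical parameter, since the notebook's breakpoint is not stated here. A value below 1 means the district's mean dropped in the later regime.

```python
import pandas as pd


def fold_difference(counts, split):
    """Per column: mean of the later regime divided by mean of the earlier one.

    counts: DataFrame indexed by date; split: first date of the later regime.
    """
    split = pd.Timestamp(split)
    early = counts[counts.index < split].mean()
    late = counts[counts.index >= split].mean()
    return late / early


idx = pd.date_range("2010-01-01", periods=4, freq="30D")
toy = pd.DataFrame({"TENDERLOIN": [40, 40, 10, 10], "SOUTHERN": [4, 6, 5, 5]}, index=idx)
print(fold_difference(toy, idx[2]))  # TENDERLOIN 0.25, SOUTHERN 1.0
```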

Two regimes.

We can look at this spatially.

Use a shapefile of SF neighborhoods to overlay the data onto a map.

https://data.sfgov.org/Geographic-Locations-and-Boundaries/Neighborhoods/ejmn-jyk6
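To shade neighborhoods by incident counts, each incident's coordinates (assumed to be the `X`/`Y` longitude/latitude columns) must be assigned to a neighborhood polygon. Libraries such as Shapely are the usual tool. The dependency-free ray-casting sketch below only illustrates the test, assuming each polygon is a plain list of `(lon, lat)` vertices already extracted from the shapefile, and it ignores holes and multi-part polygons. Both helper names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: (x, y) is inside if a ray to the right crosses an odd number of edges."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # The edge straddles the horizontal line through y ...
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # ... and crosses it to the right of the point.
            if x < x_cross:
                inside = not inside
    return inside


def count_by_neighborhood(points, neighborhoods):
    """Count points falling in each named polygon (first match wins)."""
    counts = {name: 0 for name in neighborhoods}
    for x, y in points:
        for name, poly in neighborhoods.items():
            if point_in_polygon(x, y, poly):
                counts[name] += 1
                break
    return counts


toy_hoods = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
points = [(0.5, 0.5), (1.5, 0.5), (1.2, 0.8), (3, 3)]
print(count_by_neighborhood(points, toy_hoods))  # {'A': 1, 'B': 2}
```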

Basemap can be used to view this. I drew on some nice work at this link:

http://sensitivecities.com/so-youd-like-to-make-a-map-using-python-EN.html

We can use the Basemap library.
