by We You Toh

Ever stayed in a bookstore long enough to find yourself wanting a cup of coffee?

I'm usually able to find a coffee shop nearby pretty easily, say within a city block or two, whenever I'm visiting a bookstore. Perhaps there is something to say about bookstores and coffee shops.

Take this a little further. If I were a bookstore owner, knowing how often bookstores sit close to coffee shops would be useful information to me. So would knowing how often bookstores sit close to other bookstores. Intelligence about neighboring complementary and competing services matters to any business owner. After all, quoting Sun Tzu, "Know your enemy and know yourself, find naught in fear for 100 battles."

To get this started, we'll examine how common it is for bookstores to be near one another in my home base, the city of San Francisco. We'll also take a look at how common it is for bookstores to be near coffee shops. And while we're at it, I'll make a comparison between San Francisco and New York City, so that we may have some sense of whether the observations generalize beyond one city.

Here at Kyso, I'm able to host this interactive Python notebook and let you, the reader, view the Python code used in this project. Feel free to click on the upper right button to toggle between "Code Hidden", "Code Shown" and "Code and Output Shown" to find out more. I've also made use of the Folium library to create some interactive maps. You may pan and zoom to check on the locations I've mapped out, and click on each marker to get more information about the venue.

This project is also made possible by Foursquare, the source of the venue data. By making Foursquare API calls, I am able to build a "venue profile" for the bookstores in San Francisco and New York City. I'm excited to explore it, so let's get started!

Note: An accompanying Jupyter notebook is publicly shared in my GitHub repository at:

Feel free to download this file and re-run the whole notebook on your own. Foursquare API credentials are needed to extract the venue data. Visit to find out more about getting the credentials.

Tip: If you run the Jupyter notebook on a local host, set the notebook to 'Trusted' to enable JavaScript display; otherwise the Folium maps may not display properly.



For consistency in the data collection, I've elected to quantify the following parameters:

  1. How near is "near"?
    • We'll quantify this as a 250m radius (equivalent to about 1 to 2 city blocks).
  2. How far should the search cover?
    • To ensure consistency in the search process, we'll set the coverage to a 4000m radius, which should cover most of the city of San Francisco.

I managed to collect 131 bookstore venues and 183 coffee shops within 4000 meters of the San Francisco city center. In examining how common it is for a bookstore to be near another bookstore or a coffee shop, I've found that the proportion of bookstores within 250 meters of another bookstore is .687, and the proportion of bookstores within 250 meters of a coffee shop is .710. It sounds like I can trust my gut instinct: more than half the time I visit a bookstore, I would likely find a coffee shop nearby.
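To make the "within 250 meters" check concrete, here is a minimal sketch of one way to flag nearby venues and compute the proportion. It assumes great-circle (haversine) distance between latitude/longitude pairs; the function names and the three sample coordinates are hypothetical and are not taken from the notebook, which may compute distances differently.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def near_flags(venues, others, radius_m=250, same_set=False):
    """For each venue, True if any venue in `others` lies within radius_m.

    Set same_set=True when `venues` and `others` are the same list, so that
    a venue is not counted as being near itself.
    """
    flags = []
    for i, (lat, lon) in enumerate(venues):
        flags.append(any(
            haversine_m(lat, lon, olat, olon) <= radius_m
            for j, (olat, olon) in enumerate(others)
            if not (same_set and i == j)
        ))
    return flags


# Hypothetical coordinates for illustration only (not real venue data).
bookstores = [(37.7749, -122.4194), (37.7755, -122.4194), (37.8000, -122.4194)]
coffee_shops = [(37.8005, -122.4194)]

near_bookstore = near_flags(bookstores, bookstores, same_set=True)
near_coffee = near_flags(bookstores, coffee_shops)

prop_near_bookstore = sum(near_bookstore) / len(bookstores)
prop_near_coffee = sum(near_coffee) / len(bookstores)
print(prop_near_bookstore, prop_near_coffee)
```

With real data, the two lists would hold the venue coordinates returned by Foursquare, and the two proportions would correspond to the .687 and .710 figures quoted above.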

Heat Map of Bookstores in San Francisco

This heat map indicates to me that the bookstores are concentrated in certain areas within the city, particularly along Market Street. The map is interactive, so feel free to zoom in or pan around.

Marked Locations of Bookstores and Coffee Shops in San Francisco

This map gives me a visual indication that it is indeed common to find bookstores near coffee shops. The red dots indicate the locations of the coffee shops, while the blue dots show where the bookstores are. If you click on a blue dot, it will show the store name and whether the store is near another bookstore or a coffee shop.

Let's do this statistically though. At the 95% confidence level, the proportion of bookstores in San Francisco that are near another bookstore falls within the interval (.608, .766). Likewise, the interval is (.632, .788) for those near a coffee shop.

Both intervals lie entirely above .5. I now have a numeric basis for trusting my gut instinct about finding a coffee shop near a bookstore.
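As a rough check on these intervals, here is a minimal sketch using the normal-approximation (Wald) interval for a proportion. Two things are assumptions on my part: that this is the interval type used, and the success counts of 90 and 93 out of 131, which are back-computed from the rounded proportions .687 and .710 rather than read from the data.

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def wald_interval(successes, n, z=Z_95):
    """Normal-approximation confidence interval for a sample proportion."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    return p - z * se, p + z * se


# Counts are assumptions inferred from the rounded proportions in the text.
ci_bookstore = wald_interval(90, 131)  # near another bookstore
ci_coffee = wald_interval(93, 131)     # near a coffee shop

print([round(x, 3) for x in ci_bookstore])
print([round(x, 3) for x in ci_coffee])
```

Under those assumptions, the rounded results agree with the (.608, .766) and (.632, .788) intervals reported above.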



In the New York City dataset, we've collected 203 bookstore venues and 162 coffee shop venues. The proportion of bookstores near another bookstore is .749, while the proportion of bookstores near a coffee shop is .655. (Change the setting at the upper right side of the webpage to "Code and Output Shown" to reveal how the numbers are obtained.)

To me, the proportions collected for New York are pretty close to those collected for San Francisco. But I will conduct a couple of statistical tests next to tell us whether the differences in these proportion values are significant.

The two-sample proportion Z-test is used to compare the proportions of the bookstore data in the two cities. The following assumptions allow the statistical test to be carried out meaningfully:

  1. The samples are independent.
  2. Each sample includes at least 5 successes (i.e. 'True') and 5 failures (i.e. 'False').

Based on the two-sample proportion Z-test, using the Foursquare data collected with a coverage radius of 4000m, the difference between the proportion of bookstores within 250m of another bookstore in San Francisco (.687) and that in New York (.749) is not statistically significant, z = -1.233, p = .217.

Similarly, in the next two-sample proportion Z-test, using the same data, the proportion of bookstores within 250m of a coffee shop in San Francisco (.710) is also not significantly different from that in New York (.655), z = 1.044, p = .296.
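For readers who want to see the arithmetic behind these tests, here is a minimal sketch of one common form of the two-sample proportion Z-test, using a pooled standard error and a two-sided p-value. The success counts (90 and 93 of 131 for San Francisco, 152 and 133 of 203 for New York) are assumptions back-computed from the rounded proportions, and the function names are my own for illustration.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-sample Z-test for equal proportions (two-sided)."""
    # Rule-of-thumb check: at least 5 successes and 5 failures per sample.
    for x, n in ((x1, n1), (x2, n2)):
        if x < 5 or n - x < 5:
            raise ValueError("each sample needs >= 5 successes and failures")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return z, p_value


# Counts are assumptions inferred from the rounded proportions in the text.
z_book, p_book = two_proportion_ztest(90, 131, 152, 203)      # near a bookstore
z_coffee, p_coffee = two_proportion_ztest(93, 131, 133, 203)  # near a coffee shop

print(round(z_book, 3), round(p_book, 3))
print(round(z_coffee, 3), round(p_coffee, 3))
```

Under those assumed counts, the statistics come out close to the z = -1.233, p = .217 and z = 1.044, p = .296 reported above. Both p-values are well above .05, so neither difference is significant at the 5% level.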



This analysis has given me a brief look at the venue profile of bookstores in San Francisco. While the data collected are sample proportions, I would argue that we may use them as estimates of the probability of finding another bookstore or a coffee shop nearby. For business owners, I think the proportion values are useful indicators of:

  1. the level of competition from neighboring stores offering the same service.
  2. the level of complementary services around a target location.

Comparing San Francisco and New York, I learned that the venue profiles of bookstores in these two cities are similar. Their proportion values are not significantly different based on the statistical tests.

I will contend further that if we had profit/loss data and foot traffic data to go with the data collected, we could examine what level of competition is beneficial to a business, or what kinds of complementary services are good to have around. So, the data journey shall not end here. More data digging shall be on the way...

Thank you for checking out this blog. I hope you have found it enjoyable.

This blog is derived from my other Jupyter Python notebook. Check it out in my GitHub repository at: