GeoSocial Intelligence

I recently published a short essay in the IEEE Technology & Society Magazine about the opportunities and research challenges that social media present as a data source for understanding complex urban systems in informal settlements. The post below is a synopsis of the article posted on IdeaPod.

Social media, driven by the explosive uptake in mobile computing, has caused a systematic shift in personal communications on a global scale. From the Arab Spring to the Occupy Movement it is apparent that social media is becoming an integrated part of our global communication infrastructure. Critically, much of this information is underpinned by geographical content such as mobile GPS coordinates, which enable the user to tie their media to a specific location on the Earth’s surface. In this new paradigm, social media are effectively forming a human-powered sensor network.

PetaJakarta flood report

As world populations continue to grow, and we face the social, climatic and economic challenges of the 21st century, how can we leverage the potential of this new global network of intelligent sensors? How can we use this data to inform us about the urban system and adapt to global change?

Article originally published in IEEE Technology & Society Magazine, Spring 2014.

Spatial network modelling for sanitation planning in informal settlements

Previously, in this blog post, I discussed the ways in which we’re tackling the infrastructure challenges in developing nations using open data. Below are the slides I presented at the first International Symposium for Next Generation Infrastructure. The work presented is a proof-of-concept model using data from Map Kibera to optimise a road-based sewage network. The great thing about using this data is that, for the first time, we can gain insight into infrastructure provision in informal urban settlements and examine methods to improve it.

Developing nations – the network infrastructure challenge

The world is becoming more urban – so how do we ensure our infrastructure meets the social, economic and climatic challenges of the 21st century? This was the theme of the recent International Symposium for Next Generation Infrastructure (ISNGI), hosted by the SMART Infrastructure Facility at the University of Wollongong. The symposium highlighted some of the great research going on in Australia and around the world to understand how we can make our cities and their infrastructure sustainable for future generations.

IWA Poster

Sanitation Network Modelling Poster

But what about developing nations? How do we model infrastructure when there’s no data? Or when the system is changing so fast that traditional data collection techniques become obsolete? How do we quantify the infrastructure requirements of a slum when its population fluctuates by 800,000 people annually? More importantly, how do we engage with that community to understand their needs?

One solution is to use open data and open tools. The world is becoming more connected, and crowd-sourced data offer, for the first time, an insight into infrastructure in some of the world’s poorest cities and informal settlements, many of which have never been mapped before. The Map Kibera project is a great example of this. In collaboration with colleagues from the UK, we built a prototype model to demonstrate the utility of data from Map Kibera and OpenStreetMap for spatio-topological network modelling, to optimise road-based sanitation for Kibera. I presented this work at ISNGI, and Ruth recently presented a poster of this work at the International Water Association Congress and Exhibition in Nairobi. We’ve demonstrated it’s possible – the challenge now is to make it work in the real world.
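To give a flavour of what network modelling on a road graph can look like, here is a deliberately simple sketch. It is not the model presented at ISNGI. It treats the road network as an undirected graph of hypothetical road segments with lengths, uses Dijkstra's algorithm to build a shortest-path tree from a single assumed outfall, and takes a candidate sewer layout as the union of road segments on each household's route to that outfall. The node names, lengths and the single-outfall assumption are all illustrative.

```python
import heapq
from collections import defaultdict


def shortest_path_tree(edges, outfall):
    """Dijkstra from the outfall over an undirected road graph.

    edges: iterable of (node_a, node_b, length).
    Returns {node: parent}; following parents from any reachable node
    leads back to the outfall along the shortest road route.
    """
    graph = defaultdict(list)
    for a, b, length in edges:
        graph[a].append((b, length))
        graph[b].append((a, length))
    dist = {outfall: 0.0}
    parent = {outfall: None}
    heap = [(0.0, outfall)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry, a shorter route was already found
        for nbr, length in graph[node]:
            nd = d + length
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                parent[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return parent


def sewer_edges(parent, households):
    """Union of road segments used by the household-to-outfall routes."""
    used = set()
    for node in households:
        while parent.get(node) is not None:
            used.add(frozenset((node, parent[node])))
            node = parent[node]
    return used


# Hypothetical toy network: "A" is the outfall, "D" is a household.
roads = [("A", "B", 1.0), ("B", "C", 1.0), ("A", "C", 3.0), ("C", "D", 1.0)]
tree = shortest_path_tree(roads, "A")
print(sorted(tuple(sorted(e)) for e in sewer_edges(tree, ["D"])))
```

A real layout would also need to account for things like gradients, pipe capacity and cost; this sketch only captures the topological part of the problem.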

Creating spatial magic with GeoAlchemy2 and PostGIS

I recently discovered the GeoAlchemy2 project – a replacement for the original GeoAlchemy package, focused on providing PostGIS support for SQLAlchemy. The SQLAlchemy package is a “Python SQL Toolkit and Object Relational Mapper”. In a nutshell this means you can write Python classes and map them to PostgreSQL tables without the need to write SQL statements – pretty cool!

PostGIS is great for doing spatial stuff, but if you’re using it as the back end for a Python app, you can spend a lot of time writing Python wrappers around SQL statements, and even with the excellent Psycopg2 package this can be tricky. This is especially true if you’re using the OGR Python bindings to handle PostGIS reads and writes.

Enter GeoAlchemy2. I’ve been experimenting with it for a week, and in that time I’ve learnt this:

For developing geospatial Python apps with PostGIS, GeoAlchemy2 is nothing short of revolutionary.

You can call PostGIS functions in Python, which means you can use them (and the data) directly within your Python application logic. Here’s an example. The SQL statement below uses PostGIS to create a new line geometry between a point and the closest point on the nearest line.
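To make the geometry itself concrete, here is a dependency-free sketch of the same computation in planar coordinates, roughly what PostGIS's `ST_ClosestPoint` and `ST_MakeLine` do for a point and a set of line strings. It is an illustration only, not PostGIS code; the function names and the `roads` dictionary are hypothetical, and it assumes Python 3.8+ for `math.dist`.

```python
import math


def closest_point_on_segment(p, a, b):
    """Closest point to p on the segment a-b (2D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return a  # degenerate segment
    # Projection parameter, clamped to [0, 1] so the result stays on the segment
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return (ax + t * dx, ay + t * dy)


def closest_point_on_line(p, coords):
    """Closest point to p on a polyline; returns (point, distance)."""
    best, best_d = None, math.inf
    for a, b in zip(coords, coords[1:]):
        c = closest_point_on_segment(p, a, b)
        d = math.dist(p, c)
        if d < best_d:
            best, best_d = c, d
    return best, best_d


def link_to_nearest_line(p, lines):
    """Return (line_id, [p, closest point on the nearest line])."""
    best_id, best_pt, best_d = None, None, math.inf
    for line_id, coords in lines.items():
        c, d = closest_point_on_line(p, coords)
        if d < best_d:
            best_id, best_pt, best_d = line_id, c, d
    return best_id, [p, best_pt]


roads = {"a": [(0.0, 0.0), (10.0, 0.0)], "b": [(0.0, 5.0), (10.0, 5.0)]}
print(link_to_nearest_line((3.0, 1.0), roads))  # → ('a', [(3.0, 1.0), (3.0, 0.0)])
```

Doing this in Python is only sensible for small data; PostGIS can use spatial indexes, which is the point of pushing the work into the database.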

Now here’s a snippet from a Python script, performing the same process using GeoAlchemy2.

You’ll notice here that we’re actually calling our own Python function “make_link_line” during the query to create the new geometry. This exemplifies how we can move PostGIS objects around inside the script. Once the query runs we can access the returned data in our application from the row variable. Below is the complete script.
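The following is a sketch of the same idea, not the script from this post. It assumes SQLAlchemy 1.4+ (for the `select(*columns)` style), hypothetical `points` and `roads` tables with `id` and `geom` columns, and a hypothetical helper `link_line` that returns a PostGIS SQL expression rather than a computed value. Because the helper runs while the query is being built, the geometry work still happens in PostGIS. It uses plain `sqlalchemy.func`, and compiling against the PostgreSQL dialect prints the SQL without needing a database.

```python
from sqlalchemy import Integer, column, func, select, table
from sqlalchemy.dialects import postgresql

# Hypothetical lightweight table definitions (no database connection needed).
points = table("points", column("id", Integer), column("geom"))
roads = table("roads", column("id", Integer), column("geom"))


def link_line(point_geom, line_geom):
    """Hypothetical helper: build ST_MakeLine(point, ST_ClosestPoint(line, point))."""
    return func.ST_MakeLine(point_geom, func.ST_ClosestPoint(line_geom, point_geom))


def nearest_link_query(point_id):
    """Select the link line from one point to its nearest road."""
    return (
        select(
            points.c.id.label("point_id"),
            roads.c.id.label("road_id"),
            link_line(points.c.geom, roads.c.geom).label("link_geom"),
        )
        .where(points.c.id == point_id)
        .order_by(func.ST_Distance(points.c.geom, roads.c.geom))
        .limit(1)
    )


sql = str(
    nearest_link_query(1).compile(
        dialect=postgresql.dialect(), compile_kwargs={"literal_binds": True}
    )
)
print(sql)
```

Ordering by `ST_Distance` is the simplest way to express "nearest"; on large tables, PostGIS's `<->` distance operator is the usual index-assisted alternative.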

Nearest neighbour PostGIS and GeoAlchemy2 script:

The script above is just a simple example, but it shows how powerful GeoAlchemy2 is for embedding PostGIS objects and methods inside Python. I’m really looking forward to digging deeper into the functionality of GeoAlchemy2 and SQLAlchemy to integrate them within my own projects. Check out the official tutorials for more examples:

London land use map

As part of my PhD I had to produce a land cover map for the Greater London area. I derived a simple land cover classification from the UKMap Basemap (which I previously used to generate the 3D London map). Click on the image below to see a larger version (1.8 MB at 300 dpi).

London land cover map

Land cover in the Greater London area

I created the layers using PostGIS tables for each land cover type, based on the Basemap’s Feature Type Code (FTC), which classifies land use according to the National Land Use Database. Using separate tables also significantly improved rendering performance in Quantum GIS (QGIS), which I used for the cartography. I was impressed by QGIS’s ability to process and render such a detailed dataset (the Basemap contains about 11 million polygons for London).
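A minimal sketch of the one-table-per-class approach, assuming a source table called `basemap` with an integer FTC column `ftc` and a geometry column `geom` (all hypothetical names). The class names and codes in the example are placeholders, not real Basemap FTC values. The function only builds SQL strings, and because identifiers are interpolated directly it assumes trusted input. The GiST index on each new table is an addition not described above.

```python
def land_cover_sql(source_table, ftc_column, classes, geom_column="geom"):
    """Build CREATE TABLE ... AS and CREATE INDEX statements per land cover class.

    classes: dict mapping a class name (used as a table suffix) to FTC codes.
    """
    statements = []
    for name, codes in classes.items():
        if not name.isidentifier():
            raise ValueError(f"unsafe table suffix: {name!r}")
        code_list = ", ".join(str(int(code)) for code in codes)
        table = f"lc_{name}"
        statements.append(
            f"CREATE TABLE {table} AS SELECT * FROM {source_table} "
            f"WHERE {ftc_column} IN ({code_list});"
        )
        # Spatial index so map rendering and spatial queries can use it
        statements.append(
            f"CREATE INDEX {table}_{geom_column}_idx ON {table} "
            f"USING GIST ({geom_column});"
        )
    return statements


# Placeholder classes and codes, for illustration only.
for stmt in land_cover_sql("basemap", "ftc", {"woodland": [1, 2], "water": [3]}):
    print(stmt)
```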


Travelling bookcase

As I had a lot of spare time on my hands before Christmas, I decided to build a bookcase as a present for my girlfriend (an avid book reader and collector). The design is based on ‘nomad style’ furniture, which doesn’t require nails or glue – I really liked the idea that it can be dismantled and easily transported. I got my inspiration from Instructables.


The finished product

The bookcase dimensions are 90 × 65 × 21 cm, based on my measurements of existing bookshelves we had. The biggest challenge was finding a timber merchant who stocked plain square edged (PSE) wood deeper than 19 cm; I think anything less is a bit shallow for most books.

Apart from drilling holes to help chisel some of the joints, I managed to build it without any other power tools. The lid sits on four dowel pegs, and I finished the whole thing with a couple of coats of boiled linseed oil, which provided a lovely finish. I’m happy to say it was well received on Christmas day!

Here are some more photos I took during construction.

Searching for knot-free PSE.

Making the first cut

Is it straight?

Work in progress

Cutting the first shelf tenon joint

The finished shelves

Cutting the holes in the uprights

Testing the shelves for fit

A surveyor's dream.

Confidence intervals of simple linear regression

Plotting confidence intervals of linear regression in Python

After a friendly tweet from @tomstafford, who mentioned that this script was useful, I’ve re-posted it here in preparation for the removal of my Newcastle University pages.

This script calculates and plots confidence intervals around a linear regression for a series of new x-values. After I couldn’t find anything similar on the internet, I developed my own implementation based on Statistics in Geography by David Ebdon (ISBN: 978-0631136880).
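For reference, the band the script draws at each test value $x_0$ is the standard confidence interval for the mean response of a least-squares line, with $\hat{y}_0$ the fitted value at $x_0$ and $t_{0.975,\,n-2}$ the two-tailed 95% critical value. The numerator of the first fraction is the script's `s_err`, and the denominator of the last fraction is the corrected sum of squares of $x$. For a prediction interval for a single new observation, a 1 is added inside the large parentheses.

```latex
\hat{y}_0 \pm t_{0.975,\,n-2}\,
\sqrt{\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}{n-2}
\left(\frac{1}{n}+\frac{\left(x_0-\bar{x}\right)^2}{\sum_{i=1}^{n}x_i^2-n\bar{x}^2}\right)}
```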

Linear regression plot

Plot of linear regression with confidence intervals

# - example of confidence limit calculation for linear regression fitting.

# References:
# - Statistics in Geography by David Ebdon (ISBN: 978-0631136880)
# - Reliability Engineering Resource Website:
# -
# - University of Glasgow, Department of Statistics:
# -

import numpy as np
import matplotlib.pyplot as plt

# example data
x = np.array([4.0,2.5,3.2,5.8,7.4,4.4,8.3,8.5])
y = np.array([2.1,4.0,1.5,6.3,5.0,5.8,8.1,7.1])

# fit a curve to the data using a least squares 1st order polynomial fit
z = np.polyfit(x,y,1)
p = np.poly1d(z)
fit = p(x)

# get the end-point coordinates for the fitted line (works for any slope sign)
c_x = [np.min(x),np.max(x)]
c_y = p(c_x)

# predict y values of original data using the fit
p_y = z[0] * x + z[1]

# calculate the y-error (residuals)
y_err = y - p_y

# create series of new test x-values to predict for
p_x = np.arange(np.min(x),np.max(x)+1,1)

# now calculate confidence intervals for new test x-series
mean_x = np.mean(x)  # mean of x
n = len(x)  # number of samples in original fit
t = 2.447  # two-tailed 95% t value for n-2 = 6 degrees of freedom
s_err = np.sum(np.power(y_err,2))  # sum of the squares of the residuals

confs = t * np.sqrt((s_err/(n-2))*(1.0/n + (np.power((p_x-mean_x),2)/
    ((np.sum(np.power(x,2)))-n*(np.power(mean_x,2))))))

# now predict y based on test x-values (slope * x + intercept)
p_y = z[0]*p_x+z[1]

# get lower and upper confidence limits based on predicted y and confidence intervals
lower = p_y - abs(confs)
upper = p_y + abs(confs)

# set-up the plot
plt.xlabel('X values')
plt.ylabel('Y values')
plt.title('Linear regression and confidence limits')

# plot sample data
plt.plot(x,y,'bo',label='Sample observations')

# plot line of best fit
plt.plot(c_x,c_y,'r-',label='Regression line')

# plot confidence limits
plt.plot(p_x,lower,'b--',label='Lower confidence limit (95%)')
plt.plot(p_x,upper,'b--',label='Upper confidence limit (95%)')

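# set coordinate limits, padding the data and confidence limits by one unit
plt.xlim(np.min(x)-1,np.max(x)+1)
plt.ylim(min(np.min(y),np.min(lower))-1,max(np.max(y),np.max(upper))+1)

# configure legend (it must be created before it can be retrieved)
plt.legend(loc=0)
leg = plt.gca().get_legend()
ltext = leg.get_texts()
plt.setp(ltext, fontsize=10)

# show the plot
plt.show()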
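If you want the calculation without the plotting, here is a minimal refactor sketch. `regression_band` is a hypothetical name; the critical value `t` is passed in by the caller (2.447 for the 6 degrees of freedom of the example data), so only NumPy is needed. Setting `prediction=True` adds 1 inside the square root, which gives the wider prediction interval for a single new observation rather than the confidence interval for the mean response that the script above plots.

```python
import numpy as np


def regression_band(x, y, x_new, t, prediction=False):
    """Return (y_hat, lower, upper) for a least-squares line evaluated at x_new.

    prediction=False: confidence interval for the mean response.
    prediction=True: prediction interval for a single new observation.
    t: two-tailed critical value for n - 2 degrees of freedom.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    n = len(x)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    s2 = np.sum(residuals ** 2) / (n - 2)  # residual variance estimate
    sxx = np.sum((x - x.mean()) ** 2)  # equals sum(x^2) - n * mean(x)^2
    extra = 1.0 if prediction else 0.0
    half_width = t * np.sqrt(s2 * (extra + 1.0 / n + (x_new - x.mean()) ** 2 / sxx))
    y_hat = slope * x_new + intercept
    return y_hat, y_hat - half_width, y_hat + half_width


# Same example data as the script above
x = np.array([4.0, 2.5, 3.2, 5.8, 7.4, 4.4, 8.3, 8.5])
y = np.array([2.1, 4.0, 1.5, 6.3, 5.0, 5.8, 8.1, 7.1])
x_new = np.arange(x.min(), x.max() + 1, 1)
y_hat, lower, upper = regression_band(x, y, x_new, t=2.447)
print(np.round(lower, 2), np.round(upper, 2))
```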