Urban Analytics

Example projects from LIDA


Nick Malleson

School of Geography, University of Leeds &
Leeds Institute for Data Analytics (LIDA)

nickmalleson.co.uk

surf.leeds.ac.uk

These slides: http://surf.leeds.ac.uk/presentations.html

Overview

Urban Analytics - very broad!

Focus on population flows and the ambient population

Background - smart cities, the data deluge and the ambient population

Machine learning approaches

Tracking people (HABITS)

Towards a real-time city simulation ...

Data assimilation

Agent-Based modelling

Smart cities and the data deluge

Comparing census daytime populations to aggregate mobile phone estimates
Evaluating aggregate mobile phone population accuracy

Abundance of data about individuals and their environment

"Big data revolution" (Mayer-Schönberger and Cukier, 2013)

"Data deluge" (Kitchin, 2013a)

Smart cities

cities that "are increasingly composed of and monitored by pervasive and ubiquitous computing" (Kitchin, 2013a)

This talk presents some LIDA projects that aim to make use of these data

How many people are there in Trafalgar Square right now?

We need to quantify the ambient population and better understand urban flows:

Crime – how many possible victims?

Pollution – who is being exposed? Where are the hotspots?

Economy – can we attract more people to our city centre?

Health - can we encourage more active travel?

Modelling Footfall with Machine Learning

Aim: to better understand footfall in Leeds city centre, using computer models and footfall camera data

Methods

Data analysis & visualisation

Machine learning

Outcomes

Analysis of changes in footfall patterns over time

A model of footfall, able to quantify the success of events

Indications of most important drivers

Modelling Footfall with ML

Locations of CCTV cameras
Locations of the footfall cameras in Leeds

Footfall Data

8 cameras installed between 2007 and 2009

Track movement of people through their field of vision

Provide counts of number of passers-by per hour

Cover a relatively small area of the city centre, so they say little about wider city dynamics.

Calendar showing footfall patterns over a typical week

Modelling Footfall with ML

Side note: Temporal Change

Saw a change in the proportion of people using the city

Shift from daytime (8am to 6pm) to evening (6pm to 9pm)

Night time (9pm to 8am) unchanged

Change in proportions of people in the city at different times

Modelling Footfall with ML

People in the rain

Explanatory factors

Bank Holidays

What type of bank holiday?

School and University Holidays

The weather

Mean temperature, wind speed and rainfall

Day of week

Many others (not tested yet...)

Train prices, car parking availability, business opening times, etc.
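As a rough illustration of how the explanatory factors above might be turned into model inputs, the sketch below encodes one day as a flat set of numeric features. The function name, feature names and weather values are all invented for the example; it is not the encoding used in the project.

```python
from datetime import date

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def encode_day(day, mean_temp_c, rain_mm, wind_ms, bank_holiday, school_holiday):
    """Turn one day's explanatory factors into a dict of numeric features.

    All names here are hypothetical; the weather figures passed in below
    are made up for illustration.
    """
    features = {
        "mean_temp_c": float(mean_temp_c),
        "rain_mm": float(rain_mm),
        "wind_ms": float(wind_ms),
        "bank_holiday": int(bank_holiday),
        "school_holiday": int(school_holiday),
    }
    # One-hot encode the day of the week (date.weekday(): Monday is 0).
    for i, name in enumerate(DAYS):
        features[name] = int(day.weekday() == i)
    return features

# 5 July 2014 (the Tour de France date in these slides) with invented weather.
row = encode_day(date(2014, 7, 5), 18.2, 0.0, 3.1,
                 bank_holiday=False, school_holiday=False)
```

One-hot day columns like these would match the way individual days (Monday, Saturday, ...) show up as separate variables in a feature importance ranking.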

Modelling Footfall with ML

Errors from the random forest model

Machine Learning Methods

Attempted a number of different methods

Mainly neural networks and random forests

Random forest was most accurate

Modelling Footfall with ML

Feature Importance

Variable                  Relative Importance
Mean daily temperature    1142
Mean daily rainfall       383
Monday                    131
2013                      131
Saturday                  130
2016                      130
After Trinity opened      123
Thursday                  122
Tuesday                   116
School holiday            115

Modelling Footfall with ML

Predictive Analytics

Event             Date       Real Footfall  Prediction  Difference (%)
Tour de France    5-Jul-14   346,180        217,277     -37
Trinity Opening   21-Mar-13  279,473        187,381     -33
Xmas lights 2013  07-Nov-13  193,441        153,750     -21
Xmas lights 2015  12-Nov-15  175,126        160,105     -9
Light Night       06-Oct-16  225,660        198,025     -12

???               30-Dec-12  106,423        157,842     +48
???               12-Apr-16  134,808        173,734     +29
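The Difference (%) column appears to be the prediction's signed deviation from the real count, relative to the real count. That reading is an assumption, but it reproduces the figures in the table, as this short check shows (the function name is invented for the example):

```python
def percent_difference(real, predicted):
    """Signed difference of the prediction relative to the real count, in percent."""
    return (predicted - real) / real * 100

# (event, real footfall, model prediction), copied from the table above.
events = [
    ("Tour de France", 346_180, 217_277),
    ("Trinity Opening", 279_473, 187_381),
    ("Xmas lights 2013", 193_441, 153_750),
    ("Xmas lights 2015", 175_126, 160_105),
    ("Light Night", 225_660, 198_025),
]

diffs = {name: round(percent_difference(real, pred))
         for name, real, pred in events}
```

Under this reading, a negative value means the real footfall exceeded what the model expected for an ordinary day with those conditions, which is how the model can be used to quantify the success of an event.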

Modelling Footfall with ML

Conclusion

Still needs some refinement, but otherwise a useful tool

Potential for prospective analysis as well as forecasting

Not useful for more nuanced analysis

How should roads be re-configured to encourage pedestrians?

Where are most visitors coming from?

How have patterns of use in the city changed?

For this we need more detail about individual movements...

HABITS

Improved policy to mitigate pollutant and inactivity related health burdens through new big data

Aim: Take new 'Track and Trace' (T&T) data generated from mobile phones to support new policies to:

Reduce the disease burden of pollution

Encourage active travel

Led by the Institute for Transport Studies, in collaboration with Newcastle City Council and funded by the ESRC

A more nuanced measure of population flows?

HABITS

Go Smarter

Smart-phone app built in collaboration with Newcastle City Council

Tracks people's journeys

Detects when the user is moving and estimates mode of travel

Rewards for using active / sustainable modes of travel

Aim: Demonstrate how the linking of high-resolution location data and other databases / models can support better policy making

Source: Park, Yoo Min, and Mei-Po Kwan (2017). Individual Exposure Estimates May Be Erroneous When Spatiotemporal Variability of Air Pollution and Human Mobility Are Ignored. Health & Place 43: 85–94.
HABITS

Disease Burden of Pollution

Collaborating with the Newcastle Urban Observatory who are sensing the urban environment

Aim: use T&T data to model urban flows and identify the most serious pollution hotspots.

Source: Park, Yoo Min, and Mei-Po Kwan (2017). Individual Exposure Estimates May Be Erroneous When Spatiotemporal Variability of Air Pollution and Human Mobility Are Ignored. Health & Place 43: 85–94.
HABITS

Data Caveats

T&T data are

High resolution (spatio-temporal)

Abundant

BUT:

How representative of the wider population?

Abundant enough?

Other 'Big' Urban Flow Data

Social Media

Example activity spaces derived from Twitter messages
Malleson, N. and M. Birkin (2014) New Insights into Individual Activity Spaces using Crowd-Sourced Big Data. Paper presented at the 2014 BigData conference, Stanford, CA, USA, 27-31 May.

Urban Flow Data

Mobile Communications

Large population coverage

Comparing census daytime populations to aggregate mobile phone estimates

Private, unknown methodology, privacy concerns, coarse resolution (?)

Traces of movement from a smart-phone app

Urban Flow Data

Geo-Apps

Smart-phone apps that capture movement / location are becoming ubiquitous

Great potential for understanding (some) urban dynamics

Urban Flow Data

Difficulties

Skewness

Prolific users distort patterns

Representation

Online & public ≠ offline & private

Spatial accuracy

Bias

Participation inequality and the digital divide

Complicated!!

Messy, and "too big for Excel"

Benefits

Volume

Potential for large sub-samples

Velocity

Streaming / regularly updated

Potential for dynamic models

Need to consolidate diverse sources

Solution: Agent-Based Modelling ?

Agent-Based Modelling (ABM)

Autonomous, interacting 'agents'

Model phenomena from the 'bottom-up'

Advantages:

Modelling complexity, non-linearity, emergence

Natural description of a system

Bridge between verbal theories and mathematical models

History of the evolution of the system

ABM Example - Burglary

ABM Problems

1. Computationally Expensive

Not amenable to machine-led calibration

2. Data hungry

Need fine-grained information about individual actions and behaviours

3. Divergent

Models usually represent complex systems

Projections / forecasts quickly diverge from reality

3. Divergence

Complex systems

One-shot calibration

Nonlinear models predict the near future well, but diverge over time.

The process of calibration
Typical model development process

3. Divergence

Drawback with the 'typical' model development process

Waterfall-style approach is common

Calibrate until fitness is reasonable, then make predictions

But we can do better:

Better computers

More (streaming) data

Methodological gap

Diagram of dynamic data assimilation and an ABM

Dynamic Data Assimilation

Used in meteorology and hydrology to constrain models closer to reality.

Try to improve estimates of the true system state by combining:

Noisy, real-world observations

Model estimates of the system state

Should be more accurate than either the observations or the model estimates in isolation.
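For a single scalar quantity with independent Gaussian errors, the combination can be written down directly. The symbols are assumptions introduced here, not notation from the slides: $x_f$ is the model estimate with variance $\sigma_f^2$, $y$ is the observation with variance $\sigma_o^2$, and $x_a$ is the combined estimate.

```latex
\begin{align}
K &= \frac{\sigma_f^2}{\sigma_f^2 + \sigma_o^2} \\
x_a &= x_f + K\,(y - x_f) \\
\sigma_a^2 &= (1 - K)\,\sigma_f^2
            = \frac{\sigma_f^2\,\sigma_o^2}{\sigma_f^2 + \sigma_o^2}
\end{align}
```

Because $\sigma_a^2$ is no larger than either $\sigma_f^2$ or $\sigma_o^2$, the combined estimate is at least as certain as the model or the observation taken alone. For example, with $\sigma_f^2 = 100$ and $\sigma_o^2 = 25$, $K = 0.8$ and $\sigma_a^2 = 20$.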

Ensemble Kalman Filter - Basic Process

1. Forecast.

Run an ensemble of models (ABMs) forward in time.

Calculate ensemble mean and variance

2. Analysis.

New 'real' data are available

Integrate these data with the model forecasts to create an estimate of the model parameter(s)

Impact of new observations depends on their accuracy

3. Repeat
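The analysis step can be sketched for the simplest possible case: a single scalar quantity per ensemble member. This is a minimal illustration of one common EnKF variant (perturbed observations), not the implementation used in the experiments; the function name and all numbers are invented.

```python
import random
import statistics

def enkf_analysis(forecasts, observation, obs_variance, rng):
    """One EnKF analysis step for a scalar state, using perturbed observations.

    forecasts: one forecast value per ensemble member (at least two).
    observation / obs_variance: the new measurement and its error variance.
    """
    forecast_variance = statistics.variance(forecasts)  # sample variance
    # Kalman gain: close to 1 trusts the observation, close to 0 the forecast.
    gain = forecast_variance / (forecast_variance + obs_variance)
    obs_sd = obs_variance ** 0.5
    # Each member is nudged towards its own noisy copy of the observation.
    return [
        x + gain * (observation + rng.gauss(0.0, obs_sd) - x)
        for x in forecasts
    ]

rng = random.Random(42)
# Hypothetical ensemble forecast: about 100 people, fairly uncertain.
forecasts = [rng.gauss(100.0, 10.0) for _ in range(200)]
# Hypothetical camera count of 120 with a smaller error variance.
analysis = enkf_analysis(forecasts, observation=120.0, obs_variance=25.0, rng=rng)
```

Because the observation here is more accurate than the forecast, the analysis mean moves most of the way towards 120 and the ensemble spread shrinks. A parameter such as a bleedout rate could be updated in the same way if each ensemble member carries its own value of it.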

Ensemble Kalman Filter - Basic Process

Diagram of DDA assimilating data

Experiment with an EnKF

Very simple ABM

People walking along a street

Every hour, x people begin at point A

CCTV Cameras at either end count footfall

Some people can leave before they reach the end (bleedout rate)

Aim: Estimate the number of people who will pass camera B

Diagram of the model environment
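A model of this kind can be sketched in a few lines. The version below is a deliberately simplified stand-in, not the model used in the experiment: it assumes a fixed number of steps between the cameras, that each agent leaves independently with probability `bleedout_rate` at every step, and that everyone who starts in a given hour reaches camera B within that hour. All names and parameter values are invented.

```python
import random

def simulate_street(hours, arrivals_per_hour, bleedout_rate, steps, rng):
    """Return hourly counts at cameras A and B for a one-way street."""
    counts_a, counts_b = [], []
    for _ in range(hours):
        reached_b = 0
        for _ in range(arrivals_per_hour):
            # The agent reaches B only if it does not bleed out at any step.
            if all(rng.random() >= bleedout_rate for _ in range(steps)):
                reached_b += 1
        counts_a.append(arrivals_per_hour)  # camera A sees everyone who starts
        counts_b.append(reached_b)          # camera B sees those who remain
    return counts_a, counts_b

rng = random.Random(1)
counts_a, counts_b = simulate_street(
    hours=24, arrivals_per_hour=100, bleedout_rate=0.05, steps=10, rng=rng
)
```

Running the same function once with a chosen `bleedout_rate` gives a hypothetical 'truth' against which later estimates of that parameter can be compared.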

Hypothetical 'Truth' Data

Use the model to first generate a hypothetical reality

Results - counts at camera A and B

(Preliminary) Experimental Results

Kalman filter results over 5 days
Sequential parameter estimation under increased observational uncertainty

(Preliminary) Experimental Results

Forecast and analysis are barely distinguishable

Virtual observations are closer to 'truth' than the analysis :-(

This is probably due to the degree of randomness in the model

EnKF estimates the model parameter (bleedout rate) accurately :-)

Simulating Urban Flows (surf)

surf.leeds.ac.uk

Aim: Create an agent-based model capable of representing the human flows in a real city.

Calibrated using streaming data dynamically

Hoping for European Research Council funding to continue the work

Conclusion

A sample of LIDA Urban Analytics projects

Machine learning approaches

Tracking people (HABITS)

Agent-Based modelling

Towards a real-time city simulation (surf) ...
