Hull Digital Developer Group - November Meetup - 3 November 2011, Fudge, Princes Ave, Hull
(Full event details via the above link)
Just a short and rather late note (yet, yet again) about the November meeting.
Sue mainly acts as a bridge between teaching and technical.
Recent project for HERIB, new website created by Strawberry.
Assistive technology - anything to assist in physical use of technology.
Sue presented a number examples of Web Sites that when magnified dont show enough on the screen.
Many more people than you think have some form if impairment. Too many sites assume ME (Mouse and Eyes). We need to allow access to the digital world no matter how impaired an individual is.
Value from Public Data (the 15 minute plea for HELP!)
Eddie Coates-Madden - Hull City Council
Open Data - Direction from Central Government requires Local Government to make data "Open" not just the result of Freedom of Information Requests.
Open Data provision needs to be demand led, open and timely.
HCC already making data (unclean) via spread sheets.
How to make this data more accible? Need help with this - @OpenDataHull
P.S. - Some experiments with Hull City Council's published expense data.
Inspired by Eddie's presentation and last month's presentation on Business Intelligence and my own interests in Open Source, I decided to take a little time to look at combining the three to see what could be done cheaply, quickly and relatively simply. Oh, and I wanted to run it all on my MacBook Pro.
Five Steps from Web Site to Armchair Auditor.
Step 1 - Find anything that Hull City Council is providing to the Public in an OpenData format, i.e. "Machine Readable" (which really means, not a PDF). This involved downloading files published via the Hull City Council Web Site. For this I used "Site Sucker for Mac OS X" and left to run through the Web Site following links down a couple of levels. This gave me a whole bunch of files, tons of of them in fact.
Step 2 - Locate the data in the mass of files. This involved using a script to scan through the files and locate any Comma Separated Value (CSV) files in the mass of files I had downloaded. I used Pentaho Data Integration (PDI) for this as there is a Community Edition (freeware) version and as I also planned to use PDI to load and analyse the data too. PDI is too much of a subject to cover here. However, it does contain a nice visual development environment that lets you build "Transformations" to process data. In this case, I used it's folder and file handling steps to scan through all the files and look for those with the "CSV" extension and copy them to a folder for me to look at to see what I had.
Step 3 - Manually identify what I had. A quick scan through the files revealed that there were two sets of CSV files on the Web Site. The expenditure data, I was expecting plus a set of CSV files for Web Site performance. I concentrated on the expenditure data which was provided in a CSV file per month. I found that some 2010 data was in a different format from the rest so I left that out as the 2011 data was all in the same format, being a CSV but with two header records, the first giving the month and second giving the normal CSV field name header. I wrote a quick script to drop of the first record from each CSV (it actually performed a "tail -n +2 input file > outputfile" for each file. Leaving me with the CSV files I needed.
Step 4 - Load the data into a local MySQL database on my laptop. As I mentioned before, I planned to use a PDI Transformation for this, using its file handling steps to locate the "fixed" CSV files, loading them, calculating a new field to easily identify the month for reporting and loading them all into a common MySQL table for further analysis/reporting in PDI. There was a slight wrinkle as one of the CSV files held it's dates in a different format from the others, so I had to allow for that in the PDI Transformation. I then wrapped this up on a PDI "Job" that allowed me to re-run when I wanted by deleting (truncating) all the MySQL data first.
PDI Transformation Steps
Step 5 - The Armchair Auditor - Having loaded the data into a MySQL table on my laptop, I could then use the "Visualisation" feature of PDI to create Reports or graphics to analyse the expenditure data any way I wanted. Examples below (they all exclude a £30m refund from Riverside Group !). Just a bit of Fun!
Top 10 Service Area by Net Expenditure Jan-Sept 2011
Top 10 Procurement, ICT & Facilities Suppliers by Net Expenditure Jan-Sept 2011
Monthly Net Expenditure with Kingston Communications Jan-Sept 2011