Planet Dfey

April 25, 2011

Tim Dobson

Aaron Esses Wrote: And when i get back nick will take me...

Aaron Esses Wrote: And when i get back nick will take me to atm for you <(4+2+0)>

by tdobson at April 25, 2011 06:57 PM

@@number@@ Wrote: Aaron - Ill text u when im almost home...

@@number@@ Wrote: Aaron - Ill text u when im almost home <(4+2+0)>

by tdobson at April 25, 2011 06:56 PM

April 05, 2011

Ben Webb

GeoDjango + Crime Data

Recently I’ve been trying out Django as a web development framework. For previous opendata projects I’d tried using CouchDB as a database, as it is easier to use directly than traditional SQL-based databases, and was supposed to scale well. However, I found that for the kind of projects I used it for, with a large amount of data loaded at the start, queries took minutes to hours to build across the whole dataset, and took up a large amount of disk space – several times that of the original database, which added up to gigabytes. All in all it wasn’t as nice to develop with (at least for the purpose of large open datasets) as I’d hoped.

So, I decided to try out Django’s ORM. It had exactly what I was looking for – a nice pythonic interface (no need to write any SQL, not even to create the tables), but with the performance you would expect from a well-indexed SQL database – it has handled the hundreds of thousands of rows that I’ve thrown at it rather well.

I had shied away from using a full-blown framework such as Django, as I’d assumed it would be difficult or messy to use one part of the framework (the ORM) in scripts where I didn’t need to use all the other parts. However, Django is actually designed to be loosely coupled, making this much easier than I anticipated.

The Saturday before last, I was at National Hack the Government Day 2011, so decided to use my new Django skills to make an app around the relatively recently released Police data. Since this data is highly geographic, I thought I’d give GeoDjango a go.

As nice as GeoDjango is, there was somewhat of a learning curve for a one-day hackday, since in addition to learning GeoDjango itself, I had to transition from MySQL to PostgreSQL. GeoDjango relies on the database’s spatial extensions to store and query geographic data. Whilst this presumably has great performance benefits, it meant MySQL was a poor option, as it has limited spatial support (every operation assumes bounding boxes).
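To see why bounding-box-only spatial support is a problem for something like constituency lookups, here is a minimal pure-Python sketch. It does not use GeoDjango or any database; the triangle, the points and both function names are made up for illustration. It compares a bounding-box test with a proper point-in-polygon test (ray casting).

```python
def in_bounding_box(point, polygon):
    """True if the point lies inside the polygon's bounding rectangle."""
    x, y = point
    xs = [px for px, _ in polygon]
    ys = [py for _, py in polygon]
    return min(xs) <= x <= max(xs) and min(ys) <= y <= max(ys)


def in_polygon(point, polygon):
    """True if the point lies inside the polygon (ray-casting test)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count only crossings to the right of the point.
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical triangular "constituency" and two test points.
triangle = [(0, 0), (4, 0), (0, 4)]
near_corner = (3, 3)   # inside the bounding box, outside the triangle
well_inside = (1, 1)   # inside both

print(in_bounding_box(near_corner, triangle), in_polygon(near_corner, triangle))
print(in_bounding_box(well_inside, triangle), in_polygon(well_inside, triangle))
```

A point near the empty corner of the box passes the bounding-box test even though it is outside the shape, so a database that only compares bounding boxes could assign it to the wrong area.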

Fortunately, with the help of Tim Dobson’s HTML and JavaScript, I managed to have a mostly finished app to present at the end of the day – Crime4U (link to the project page on the RewiredState wiki, which contains more information about the app). This allows people to find the amount of crime in their constituency area, along with the relevant MP and political party (see Denton and Reddish for example).

Since the hackday I’ve been looking at how to visualise the police data on a map. Obviously, maps have already been done by the official police.uk site, but only for individual neighbourhoods. The amount of data makes creating larger maps somewhat difficult.

Naturally it makes no sense to display all 100,000 points on a map – not only would this need a lot of resources, it would also be visually meaningless. The best visual solution is to group the points in areas of high concentration, replacing them with a circle labelled with the number of points – clustering. For a medium amount of data, this can be done client-side in JavaScript, and there are libraries available, such as MarkerClusterer, that make this easy.

Above a certain number of datapoints this process of properly clustering takes too long, so I decided to implement a server-side solution that would try to approximate where these clusters would be. To do this, I split the visible area of the screen into a series of rectangles, and found the number of points in each. I then merged these rectangles together into clusters – having the position of the cluster favour the centroid of the rectangle with the most points. The clusters are then rendered client-side using the cluster drawing parts of the MarkerClusterer library.
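The grid approach described above can be sketched roughly as follows. This is an illustrative pure-Python version, not the Crime4U code: the post does not spell out the merge rule, so the sketch assumes a greedy one, in which the busiest cell absorbs its non-empty neighbours and the cluster is placed at the count-weighted mean of the cell centres, which pulls it towards the busiest rectangle. The names `grid_counts`, `merge_cells` and `Cluster` are hypothetical.

```python
from collections import namedtuple

Cluster = namedtuple("Cluster", "lat lng count")


def grid_counts(points, bounds, rows=7, cols=7):
    """Count the points falling in each cell of a rows x cols grid.

    bounds is (south, west, north, east); points are (lat, lng) pairs.
    Returns {(row, col): count} for non-empty cells only.
    """
    south, west, north, east = bounds
    cell_h = (north - south) / rows
    cell_w = (east - west) / cols
    counts = {}
    for lat, lng in points:
        if not (south <= lat <= north and west <= lng <= east):
            continue  # outside the visible area
        r = min(int((lat - south) / cell_h), rows - 1)
        c = min(int((lng - west) / cell_w), cols - 1)
        counts[(r, c)] = counts.get((r, c), 0) + 1
    return counts


def cell_centre(cell, bounds, rows, cols):
    """Return the (lat, lng) centre of a grid cell."""
    south, west, north, east = bounds
    r, c = cell
    return (south + (r + 0.5) * (north - south) / rows,
            west + (c + 0.5) * (east - west) / cols)


def merge_cells(counts, bounds, rows=7, cols=7):
    """Greedily merge each busy cell with its unclaimed neighbours."""
    remaining = dict(counts)
    clusters = []
    # Visit cells from busiest to quietest (ties broken by cell index).
    for seed in sorted(counts, key=lambda cell: (-counts[cell], cell)):
        if seed not in remaining:
            continue  # already absorbed by a busier neighbour
        r, c = seed
        members = [(r + dr, c + dc)
                   for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (r + dr, c + dc) in remaining]
        total = sum(remaining[m] for m in members)
        centres = [(cell_centre(m, bounds, rows, cols), remaining[m])
                   for m in members]
        lat = sum(pos[0] * n for pos, n in centres) / total
        lng = sum(pos[1] * n for pos, n in centres) / total
        for m in members:
            del remaining[m]
        clusters.append(Cluster(lat, lng, total))
    return clusters


# Made-up sample data: four points close together and one far away.
sample_bounds = (0.0, 0.0, 7.0, 7.0)
sample_points = [(0.5, 0.5), (0.5, 0.5), (0.5, 0.5), (1.5, 0.5), (6.5, 6.5)]
sample_clusters = merge_cells(grid_counts(sample_points, sample_bounds),
                              sample_bounds)
for cluster in sample_clusters:
    print(cluster)
```

Because the work is proportional to the number of points plus the number of cells, the server only has to send a few dozen clusters to the browser, whatever the size of the dataset.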

Using this method, I created a crimemap for the whole of the UK, and also a clustered map of all the bus stops (and other transport infrastructure) in Britain, another very large geographic dataset. The data takes a few seconds to be fetched, and the clustering is imperfect, but it seems a decent compromise for such a large amount of data.

To improve the clustering and responsiveness, I suspect the answer would be to perform finer clustering (at the moment the rectangles are from a 7×7 grid), or more traditional clustering, beforehand. This could then be cached, and returned promptly to the user. I do wonder whether such a system is already available – there are certainly proprietary solutions that I assume use this approach, but I was not able to find any FOSS libraries for it.
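One way the caching idea could look is sketched below, again in pure Python and under loud assumptions: the data is a made-up in-memory tuple rather than a database, the 0.5 degree step is arbitrary, and `snap`, `count_in_tiles` and `count_for_viewport` are hypothetical names. Each viewport is rounded outwards to a coarse grid so that nearby viewports share a cache key, and the expensive result is memoised with `functools.lru_cache`.

```python
import math
from functools import lru_cache

# Hypothetical sample data; a real app would query the database instead.
POINTS = ((53.45, -2.10), (53.46, -2.11), (51.50, -0.12))


def snap(bounds, step=0.5):
    """Round a viewport outwards to whole multiples of `step` degrees.

    Returns integer tile indices, which make stable cache keys.
    """
    south, west, north, east = bounds
    return (math.floor(south / step), math.floor(west / step),
            math.ceil(north / step), math.ceil(east / step))


@lru_cache(maxsize=256)
def count_in_tiles(tiles, step=0.5):
    """Expensive step (here just a count), cached per snapped viewport."""
    south, west, north, east = (index * step for index in tiles)
    return sum(1 for lat, lng in POINTS
               if south <= lat <= north and west <= lng <= east)


def count_for_viewport(bounds, step=0.5):
    """Answer for the snapped viewport, served from the cache if possible."""
    return count_in_tiles(snap(bounds, step), step)


# Two slightly different viewports snap to the same tiles,
# so the second call is a cache hit.
print(count_for_viewport((53.1, -2.4, 53.9, -2.05)))
print(count_for_viewport((53.2, -2.3, 53.8, -2.2)))
print(count_in_tiles.cache_info())
```

The trade-off is that the answer covers the snapped area, which is slightly larger than what the user is looking at; in a real system the cached value would be the list of clusters rather than a bare count.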

The source code to Crime4U and the maps is available under the AGPLv3 License. I hope to do more with transport data this coming Saturday, at the Lovely Data hack day in Manchester.

by Ben Webb at April 05, 2011 02:14 PM

April 02, 2011

Sean Whitton

Grokking Org-mode and putting it in charge

Some time ago I was looking for a decent outlining tool to take academic notes electronically, which got me into Org-mode and consequently Emacs, and away I went. I’ve chatted about this stuff before, but it’s only pretty recently that I’ve actually settled on a fairly complex Org-mode setup that suits my way of working. When I started out I adopted bits and pieces from all over, mainly from the excellent norang.ca doc (look at your scrollbar), but I didn’t know enough about the software and which parts are more significant than others, and I didn’t know my own working habits well enough. But I’ve got a better picture of those now and recently I started having ideas of how I could make things better. So I sat down and reworked everything and I have grown Org-mode up to my needs (crucially: not any higher).

There is a lot of stuff around online about productivity; there was a recent xkcd about the typical cynical view of all this. While I have read some of this stuff, and can see that people with less traditional working schedules than a student’s may find things like GTD allow them to make better use of their time, in general I tend to be rather cynical (wow! what a surprise!) about it all myself because it’s wonderfully easy to read about this stuff and feel better about yourself rather than actually do whatever it is you need to do. And it’s vital to recognise that these things might have a small motivational effect (setting yourself up properly to do something means you’re more likely to do it) but they’re not going to help motivate you in general. But as I intend to write about properly soonish, I do not have issues with motivation in a big way. My current issues are more focused than that and while a lack of success does feed back into my motivation to keep going and my tendency to procrastinate, it’s secondary to the issue itself.

So why do I spend a great deal of time setting up my organisational systems? My perfectionism is a factor, and as I have said there is some small motivational boost from having a list of things to tick off, as we are all familiar with. The two main reasons for me are that I don’t trust my memory, and that I want control, and this is rather directed and specific. The first reason is self-explanatory. Org-mode allows me to tie everything together electronically and does what I can’t trust my memory to do. I am slowly getting better at taking the decision not to trust my head and to leave it free to try and figure out how to study Philosophy again, and instead let the computer keep track of pretty much everything. While it might be more romantic to have nice notebooks or the ruled refill pad that screams conscientious-and-unpretentious (you should hear the conversations I have with myself on these things), it isn’t actually as good as storing things in a system one has built oneself that one understands, a system of plain text backed up and synced between computers (not “devices”, computers). I don’t need to remember what I’m supposed to be doing because Org-mode can tell me, and I don’t need to remember what’s going on because I read my e-mails/wrote things down and pumped them back into Org-mode (anywhere in my Org files, and they get brought together and organised automatically) and it tells me what I need to know. I exaggerate here. I still know what I’m doing and can tell you what’s important to me this week, and whether I’m on track, and I can give you an idea of what that e-mail said about that upcoming event. But I fall back to something that is complete and tailored to fit me and my life like a glove.

My second reason is about control, and it’s about control of my own time and life in the face of the distractions that hit at us from all sides in this world of the consumption of gratifying activities to fill the hours between sleeping. I am fortunate that I am already removed from cheap social gratification, choosing quality communication with friends over constant electronic connection via phones and social networking websites, so I avoid a certain amount of banal chatter, egoism, ranking of one’s life against others etc. Not being materialist I’m not surrounded with toys of various descriptions. But the Internet beckons, oh how it beckons. There are many fascinating websites out there and one can get a great deal out of browsing around the place, but the issue for me is more specific than just spending time reading because, unless one has something else to do, that’s fine. It’s very rare that I allow my browser to distract me from working on something in this way. Instead, I find myself possessed with a need to know or to make use of pieces of knowledge on specific areas of interest for me. Perhaps this will be best illustrated by examples relating to the present: Emacs, Org-mode and Gnus feature prominently. Page with some keybindings from Emacs, not all of which I know? Must spend time absorbing them. Page with a Gnus feature that I’m not aware of (happened today with tree mode)? Must evaluate and assimilate feature into workflow. Article on typography about how one should typeset footnotes? Must see if my LaTeX templates need updating right now. Article on a philosophical topic that I have a strong opinion on? Better read it now. And so on.

All of these things are valuable. I’m pursuing the things that interest me and learning more about how others see the same subjects and that’s great, but the issue is that when one goes off down the rabbit hole for a while one hands over control of what one thinks is important to one’s surroundings and less conscious inclinations. There is already too much in my life, and I can’t do everything. My Org-mode setup helps me with this in two ways. Firstly, it tells me what I’ve already decided is important to do today, and it tells me the projects I currently have in progress, and it reminds me that unless I want to make a decision to change my mind, this is what I’ve committed to and this is what the real Sean wants, not the temperamental Sean possessed by the excitement of the ability to join two lines and remove the indentation or whatever. Secondly, Org-mode keeps track of interesting things for me and allows me to bring them up. Not sure if I should be reading this but don’t feel comfortable just throwing it aside, and need to get it out of the way in order to focus in on the day’s tasks? No problem, hit a few keys and store it away in my Org files, tagged so that it can be brought up in a list with a few keystrokes.

The response to this, if you don’t like it, is to talk about how a certain flexibility and spontaneity is lost when one rigs oneself up to a schedule when one doesn’t strictly need to. Productivity in the sense of ticking things off on a list of tasks that are considered good doesn’t have to come first, and if you’re at a time in your life when you can be a little more free and perhaps achieve less then you should take advantage of this and float a little more. I don’t think any flexibility goes anywhere though, it’s merely made more thoughtful. If I decide that something else is genuinely more important, running things via my Org-based system forces me to evaluate my own inclinations of the moment critically against the other things I’ve said I’ll do. I can still decide to change things up in any way I like and Org is flexible enough to make this very easy to do. But I’m back in control, which is good; saying otherwise is probably just over-romanticising life in the modern world. And secondly, I am made very unhappy if I feel I am unproductive. With Org-mode I can see my productivity, am happier and thus more productive and indeed everything else goes better.

My goal right now is to take things to the extreme by rigging myself to Org-mode in all my dealings. For the next 30 days I’m forcing myself to make it almost an obsession, so that I can reap the full benefits. Then to regain some flexibility I will be able to slack off, but hopefully I’ll have figured out what level to go to in order to gain the above-described benefits.

I’ll end with a brief description of my system, since I keep referring to it and as I say I’ve put a good deal of time and effort and thought into it lately to grow it up to my needs and ways of working and the kind of things I do. I have a number of core Org files relating to various aspects of my life; the main ones are Academic.org for degree work and related, Oxford.org for all the other stuff I do during term time (so not got much going on at the moment), the almighty TechNotes.org which contains so many notes, links and plans for computer geek stuff and then my catch-all miscellaneous Sean.org which has errands, political notes, ideas for TV shows, films, music and books to look into and the like. Deep in my directory hierarchy there are things like ~/doc/work/philos/history/Hume.org which has all my notes and tasks on Hume. It’s hard to get the balance right between how much one needs to organise and separate one’s files (an interesting blog post on this is to be found here; this is amusing by the same author), but things are made easier because Org-mode is at its heart a piece of outlining software, and outlining models how you think, so a certain amount of organisation just happens automatically as long as you remember to use the keybinding that inserts headings as well as the keys that type text.

But the bigger reason why this doesn’t matter that much is the other component of the system which is Org’s agenda view. This thing is amazing, pulling together tasks from across your Org files, arranging them according to useful metrics such as tags, scheduled dates and deadlines, adding warnings for upcoming deadlines and the like, and then pulling in appointments from either Org-mode itself or an external calendar program, birthdays and wedding anniversaries from your address book and finally it even adds results from Google Weather if you have the right elisp. The key thing I’ve done recently, perhaps, has been realising the significance of the agenda and how building one’s system and customisations around that view rather than around the Org files themselves, which organise themselves as much as is necessary, is the key to success.

The word “agenda” doesn’t do this tool justice. I have four blocks to mine, and you can view something that looks a bit like it here. At the top I have a list of the tasks I’ve marked as in progress. This has two kinds of things in it: tasks that I am actually working on right now/today, and also so-called “stuck projects”, which come out in a different colour (not so on the above-linked export, unfortunately). Below that I have a list of tasks that are waiting on responses from other people. It’s important to look at these each day to see if people need reminding or can be relied upon to just get it done, and it wouldn’t be so good to have these show up as ordinary TODOs. Below that I have my appointments/calendar events, weather, scheduled tasks, daily “habits” or things I wish to accomplish regularly and repetitively, accompanied by coloured progress charts, and then at the very bottom I have a list of all undated TODO items.

Hidden from view are items marked as SOMEDAY. A SOMEDAY item is a task that, unlike a TODO, doesn’t actually need to be done, but that it would be nice to do; this is Org keeping track of interesting things for me. I bring these up in different categories with other agenda keybindings. And last of all there is my buffer of tasks to refile. These are links and notes I have shoved into Org-mode quickly and unceremoniously and without organisation, and once per day I move them into the appropriate .org files.

by Sean at April 02, 2011 11:43 PM