Datademo summer project retrospective

A few months ago I submitted my project idea to Datademo (http://datademo.fi/english/), a Finnish micro-funding program for Open Data projects. This post describes how my project went and what I learned.

The idea

My idea was to visualize connections between people and organizations in Finnish politics. This would require gathering the raw data, cleaning it up and then visualizing it. I wanted to build an interactive service, not just a plain visualization.

How did the project go?

Datademo submission and voting

I must say the Datademo process was very easy for participants. The barrier to entry was very low and I got good feedback from other participants. The voting process was very open: all participants had the chance to vote for their favorites.

My project was one of the winners selected for funding, thanks to everyone who voted.

However, I didn’t claim the funding, because I didn’t want to gamble with the unemployment office (the Finnish unemployment system almost certainly takes away your benefits if you do anything other than look for work).

Parsing the data

I started the project by scraping the official Parliament website to get the raw data into machine-readable form. This was messy because the Finnish Parliament pages are not designed with openness in mind, so everything had to be done by tedious scraping and manual cleaning.

I spent the first couple of weeks getting the data into a proper form (tab-separated values) for processing. The scraper was written in Clojure using the enlive HTML processing library. I cheated on the HTML fetching by saving all 200 parliament members’ pages to my dev machine, because I didn’t want to fight with the page structure.
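The actual scraper was written in Clojure with enlive. As a rough illustration of the same approach (parse pages saved locally, then export tab-separated values), here is a minimal Python sketch using only the standard library. The `<li class="affiliation">` markup, the `AffiliationParser` class, the file-name-as-member convention and the two-column (member, organization) layout are all hypothetical assumptions, not the real structure of the parliament pages.

```python
import csv
from html.parser import HTMLParser
from pathlib import Path


class AffiliationParser(HTMLParser):
    """Collects the text of every <li class="affiliation"> element.

    The class name "affiliation" is a hypothetical stand-in for whatever
    markup the real parliament member pages used (exact match only).
    """

    def __init__(self):
        super().__init__()
        self.in_item = False  # True while inside a matching <li>
        self.buffer = []      # text fragments of the current item
        self.items = []       # finished, whitespace-normalized items

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "li" and ("class", "affiliation") in attrs:
            self.in_item = True
            self.buffer = []

    def handle_data(self, data):
        if self.in_item:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self.in_item:
            # Collapse runs of whitespace left over from the HTML layout
            self.items.append(" ".join("".join(self.buffer).split()))
            self.in_item = False


def extract_rows(html_dir):
    """Yield (member, organization) pairs from locally saved *.html pages.

    The member identifier is taken from the file name (hypothetical convention).
    """
    for page in sorted(Path(html_dir).glob("*.html")):
        parser = AffiliationParser()
        parser.feed(page.read_text(encoding="utf-8"))
        parser.close()
        for organization in parser.items:
            yield page.stem, organization


def write_tsv(rows, out):
    """Write a header plus one tab-separated line per row to a text stream."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["member", "organization"])
    writer.writerows(rows)
```

For example, `write_tsv(extract_rows("saved_pages"), sys.stdout)` would print the TSV for a hypothetical `saved_pages` directory. Working from saved files keeps the parsing step repeatable without hitting the website again on every run.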

Writing the HTML parser was rewarding, but it took time because I was still learning Clojure when I started the project. Getting a working parser was a big win for me.

The parser automated the data extraction, but the raw data still contained plenty of errors that had to be corrected manually. I used a Google spreadsheet for storing and editing the data, and the manual editing phase took a few days. Finally, I exported the data as tab-separated value files.

Side note: having data for only the 200 members of parliament limits how useful the project is for practical investigation. It would be more useful with data for all municipal politicians, because they have closer ties to organizations and corporations.

Populating the database

I wanted to learn and use the Neo4j graph database. I got the basics working quite easily, as its query language (Cypher) is easy to pick up. At this point I implemented only read operations; the data was loaded in a batch run from the command line. I wrote some command-line Python code to convert the data from tab-separated values to Neo4j Cypher batch format.
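The original conversion script isn’t shown here. A minimal Python sketch of such a converter might look like the following, assuming a two-column TSV with a `member`/`organization` header; the `Person` and `Organization` labels and the `MEMBER_OF` relationship type are hypothetical names, not necessarily what the project used.

```python
import csv


def cypher_string(value):
    """Quote a value as a single-quoted Cypher string literal."""
    # Escape backslashes first so the quote escapes are not doubled
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def tsv_to_cypher(lines):
    """Yield one Cypher statement per (member, organization) TSV row.

    `lines` is any iterable of text lines, e.g. an open file. The first line
    must be the header "member<TAB>organization" (an assumed layout).
    """
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        person = cypher_string(row["member"])
        organization = cypher_string(row["organization"])
        # MERGE matches an existing node/relationship or creates it, so
        # rerunning the batch does not create duplicates
        yield (
            f"MERGE (p:Person {{name: {person}}}) "
            f"MERGE (o:Organization {{name: {organization}}}) "
            "MERGE (p)-[:MEMBER_OF]->(o);"
        )
```

Iterating over an open file such as a hypothetical `connections.tsv` (opened with `newline=""`, as the csv module recommends) and printing each statement produces a script that could be piped into a Cypher shell. Using `MERGE` rather than `CREATE` is a deliberate choice: a person or organization that appears on many rows becomes a single node.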

Building the UI

The UI was my biggest obstacle in the project.

At first I tried to implement the user interface with hiccup, a Clojure HTML templating library. I managed to create a text-only version which displayed the connections as lists.

For the second version I learned Om, a ClojureScript interface to Facebook’s React JavaScript library. I enjoyed working with Om, but it was quite hard to learn. With Om I managed to create working search and basic editing functionality.

To create apps in Om, you have to learn yet another technology stack: ClojureScript and core.async, both of which take time and effort to understand properly. Fortunately, this time I managed to learn enough ClojureScript; I had tried it before, but back then it was too immature for me. Unfortunately, I didn’t have enough time to learn core.async properly, and my project stalled.

I managed to add the D3.js visualization library to the project and verified that it worked with some very basic code.

The results?

The sources for the project are on my github project page at https://github.com/tfrisk/connections.

I couldn’t get the visualizations working, so in that sense the project fell short.

From a more personal perspective, the project was a success because I learned new technologies and made the learning process public. My Clojure skills are now much better than before, and I learned the basics of Neo4j, Om, React and core.async.

Quick root cause analysis

Why didn’t the project meet its original goals?

Too many things to learn at the same time

I realize I tried to learn too many things in a limited time. It took too much time to learn the basics of every new technology and this made the overall progress too slow. Almost the whole technology stack I selected was new to me.

But you have to learn new things to improve yourself, or your skills will become outdated. I don’t regret my design choices: now I know what these technologies are, and I’ll make better decisions in the future. Don’t be afraid to fail.

Anyway, if you plan to take part in a time-boxed project, be aware of the learning requirements. Using new technologies in a project is the norm, but progress is slower while you are learning them.

Technology choices

Because Om, ClojureScript and core.async are still in an alpha development phase, they lack the documentation that established technologies have. Fortunately, the community is very helpful and supportive.

Using cutting edge technology was a deliberate risk, but I really wanted to learn them.

Goal setup

Looking back, it’s evident I should have started with something smaller and then grown it further. I was too ambitious. It would be wise to verify your idea with existing tools before creating your own, although existing tools are often either hard to learn or expensive.

Real life commitments

The first half of the project period went fine and I had time to work on the project. That was when I made the best progress and my motivation was high.

During the second half I had other things going on in my life: family matters that were far more important than programming. These “real life distractions” were the ultimate reason my project didn’t finish as planned. I had to prioritize, and I chose real life.

Conclusion

I still think the project was fun and worth my effort. I didn’t manage to create a new awesome service, but I was able to grow my programming skills which will help me in the future.

Participating in Datademo was a good thing and I met nice people there.

I still have the dataset, which may be useful to someone.

If you want to share your thoughts on this, please comment or tweet.

Focus is hard in online work

Everyone knows that focusing on the work at hand is important; otherwise you’ll get nothing done. This is especially true of online work. I remember being more productive back when the internet wasn’t as ubiquitous as it is today.

How, then, can you focus better? Here are some approaches I have tried:

Be offline – it works if you don’t need online services to do your work.

Shut down the Social Media Sirens – many services we use are carefully designed to make us addicted to them and use them all day, every day. Shut them down while you are working. This applies to email as well.

Turn off notifications – they distract you every time they arrive.

Use a small computer screen – this may sound counterintuitive, but it helps you focus on the task at hand. See this excellent article for further info: http://mattgemmell.com/small-screen-productivity/

Listen to music – it really helps you focus. I have been listening to various web radio stations, but they always have some annoyance: short playlists, constant commercials or frequent talk. Just recently I found https://www.focusatwill.com, a service made for exactly this. I have used it for a week now and I like it so far. It doesn’t get in my way and the music selection is nice.

Remove people distractions – open offices are evil and ruin your productivity with a constant flow of people distracting you, whether they intend to or not. Your eyes notice movement and your ears listen for signs of danger. This reduces your focus on the actual task, because your body is busy watching for dangers as if you were in the wild. I’m not saying you have to move into solitary confinement, but moving to a less crowded place might help.

Block distracting web content – use ad blockers and script blockers such as NoScript in your browser while you work to filter out distracting web elements. As a side effect, your browsing will be more responsive. Another option is to block specific websites with apps like http://selfcontrolapp.com/.

Set yourself time limits and take breaks – if you work for long periods without breaks, your productivity will decrease over time. Breaks also let your mind process problems in the background, which is really important. One method for this is the Pomodoro Technique: https://en.wikipedia.org/wiki/Pomodoro_Technique.

This was a brief look at how to focus better at work. If you want to share your thoughts, please comment.