Most of us use some form of Torque in an academic environment to run scripts on a cluster. Typically I would have some form of script to generate the necessary qsub scripts for submission, but it wasn't ever generic enough. So this has been my working stab at making that happen.
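To make the idea concrete, here is a minimal sketch of what a generic submission-script generator could look like. This is not the tool described above: the function name `make_pbs_script` and its parameters are hypothetical, and it assumes a Torque setup that accepts the common `#PBS -N` and `#PBS -l` directives.

```python
def make_pbs_script(job_name, command, nodes=1, ppn=1, walltime="01:00:00"):
    """Return the text of a Torque/PBS submission script.

    Hypothetical helper for illustration; the parameter names are assumptions.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",               # job name shown in the queue
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # nodes and processors per node
        f"#PBS -l walltime={walltime}",      # wall-clock limit, HH:MM:SS
        "cd $PBS_O_WORKDIR",                 # start in the submission directory
        command,                             # the actual work to run
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a script for a made-up job; redirect it to a file to submit it.
    print(make_pbs_script("example_job", "python run_analysis.py", ppn=4))
```

Writing the returned text to a file (say `job.pbs`) and running `qsub job.pbs` would then submit it, assuming the cluster has no site-specific requirements beyond these directives.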
For whatever reason none of the Python packages have a function to calculate the Gini coefficient, which is a fairly standard metric for inequality used in economics circles. I wrote this function but I wanted to explain it first.
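As a rough illustration (not necessarily the function from the post), one common way to compute the Gini coefficient is the rank-weighted formula on the sorted values, G = 2·Σ(i·x_i) / (n·Σx_i) − (n + 1)/n, with ranks i starting at 1. The sketch below assumes non-negative values with a positive total; the name `gini` is just a placeholder.

```python
def gini(values):
    """Gini coefficient of a sequence of non-negative numbers.

    Illustrative sketch using the rank-weighted formula on sorted data.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive total")
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


if __name__ == "__main__":
    print(gini([1, 1, 1, 1]))  # everyone holds the same amount
    print(gini([0, 0, 0, 1]))  # one person holds everything
```

A perfectly equal distribution gives 0, and a distribution where one of n people holds everything gives (n − 1)/n, which approaches 1 as n grows.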
When I worked on metabolism I didn't have any need to plot data on an actual geographic area (although I always wished some form of coordinate system like that existed for my data). But in my switch to working with health data I now have tons of spatial data. Moreover, this spatial component has a fairly important effect on the patterns and behavior that I observe.
I started working in computational research with no meaningful experience. I spent two years in high school “programming” in C++ on a Windows 98 machine with an IDE that made the programs run (sometimes) through what must have been magic. The past five years have been a constant refinement of the computational research process for me and I've been wanting to write it all down for a while now so others could learn from it (for those who are curious, I've learned most of my tricks from blogs written by others).
There are a number of points that I want to hit on in the future but for now I want to focus on:
- Creating a standard library
- Record-keeping and on the fly analysis
A standard library
When I first started programming, libraries were so damn impressive. The capabilities and sheer amount of code hidden from me made them seem as monolithic as the programming language itself. After a little while it becomes obvious that libraries put on their coding pants one leg at a time just like we do (spoiler to newbies). What’s the point of this interlude? To convince you to start building your own general purpose “library”.
Well now you sort of can!
I just presented at AMIA on my methodology for using limited patient demographic/location data to estimate the number of patient cases at a smaller geographic locality. The methodology is split between a Monte Carlo simulation and GIS methods (Semi-variogram+Kriged Surface+Geographical Gaussian Simulation) to estimate patient cases. Currently the Monte Carlo part is already coded and freely available on my bitbucket. The latter half of the code is coming as soon as we can identify an appropriate open source GIS library to port the current code into.
Typically matplotlib produces plots that look, well, horrible. This can be a little bit of a pain, especially because I've switched my workflow to lean heavily on ipython notebook to maintain a lab notebook. This means removing xmgrace from my workflow except for manuscript figure preparation since it won't show up inline unless I make the plot, save it as a png, move the file to the notebook directory, and then link the file in a text section. Not exactly user friendly or an improvement to my workflow. Unfortunately, just because I want to make plots in python for exploration doesn't mean that I can tolerate ugly graphs and that godforsaken default font.
I would go back and tell my 10 year old self:
- Yes, you really don't need to have very good handwriting.
- It is okay to scribble in the margins. It means you're bored, so if someone takes offense at the amount of scribbling on work you turn in, then it's their fault.
- Your decision that "I don't need to be good at writing since I'm good at math" was, in fact, the stupidest decision possible. You really shot future you in the foot with that one. So, thanks!
That's a little bit of a misnomer. I didn't stop checking my email completely. I just started turning on "Do Not Disturb" on my laptop and phone all day. I only look at emails for about two hours every day and it's liberating. I highly suggest trying it.