January 21, 2007
Time sheet - parsing input with Python
We will start developing our application in Python.
For the examples you will need a simple text editor like vi or Notepad, or you can type the code directly into your Python interpreter console. My favorite plain text and code editor is SciTE. Mac geeks will take TextMate (their only mate ;-) ). Just kidding, I'm simply jealous, with my boring black PC.
First, let's design our input format. It should be text-oriented and easy to type and read; in other words, it should go in the direction represented by the wiki movement and DSLs (domain-specific languages).
# project "lookup" table
# keyword "project", shortcut, customer name, address - no spaces allowed
project N BigCustomer Musterstr.99,Duesseldorf
# multiple months per file possible
month 01 2007
# lines starting with a number represent entries for single days
# fields are separated by spaces or tabs
# field sequence: day_of_month project_id from to break remaining_fields_as_dictionary
1 N 9:00 17:30 0:30 comment:this_day_was_very_exhausting taxi:34.56
2 N 8:30 15:00 1:00
3 N 9:00 17:30 0:30
12 N 10:00 20:00 0:30
Opening a file in python is as easy as
f = open("/path/to/sample_input.txt")
going through all the lines and stripping the whitespace at the beginning and the end of every line is as easy as
for line in f:
    print(line.strip())
By the way, if your file is not on your local drive but somewhere on the internet, for example if you check it in with Subversion or some other WebDAV-based tool, you can use urlopen instead of open and get a file-like object, so the remaining code barely changes (in Python 3 the lines arrive as bytes and need decoding):
from urllib.request import urlopen  # Python 2: from urllib2 import urlopen

f = urlopen("http://mysvn.example.com/myrepository/my_time_sheet_data.txt")
for line in f:
    # decoding assumes the file is stored as UTF-8
    print(line.decode("utf-8").strip())
As a next step, let's split the line into tokens. Can you guess what the function is called?
tokens = line.split()
It uses any whitespace (spaces, tabs) as the delimiter and returns a list. A Python list is similar to Java's array, ArrayList or Collection, only more powerful. You can access an element of the list or a range of elements using square brackets:
print(tokens[2])    # third element, lists are zero-based
print(tokens[3:5])  # fourth to fifth element, the right border is not included
print(tokens[5:])   # remaining elements, after the fifth
A note about the brackets: Python has powerful built-in language concepts like tuples, lists and dictionaries. To initialize them, use parentheses, square brackets and curly brackets respectively.
# use tuples if the number and meaning of elements are fixed
invention1 = ("web", "Tim Berners-Lee", 1989)
invention2 = ("wheel", "anonymous", -3000)
# a list
my_breakfast = ["apple", "orange", "tea"]
numbers = list(range(10))
# dictionary, use key, colon, value for single elements
cities = {"New York": "USA", "Fuchu": "Japan", "Los Angeles": "USA"}
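Slicing and dictionaries together already cover the day lines of our input format. Here is a minimal sketch of how one such line could be taken apart; the function name parse_day_entry is my own invention for illustration, and it assumes every extra field has the form key:value.

```python
def parse_day_entry(line):
    """Split one day line into its fixed fields plus a dict of extras."""
    tokens = line.split()
    # The first five fields are fixed: day, project, from, to, break
    day, project_id, start, end, pause = tokens[:5]
    # Every remaining token is assumed to look like key:value;
    # split on the first colon only, so values may contain colons
    extras = dict(token.split(":", 1) for token in tokens[5:])
    return int(day), project_id, start, end, pause, extras


entry = parse_day_entry(
    "1 N 9:00 17:30 0:30 comment:this_day_was_very_exhausting taxi:34.56"
)
print(entry)
```

A line without extra fields simply yields an empty dictionary as its last element.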
Let's put the details of the project into a dictionary (something like a HashMap in other languages):
if tokens[0] == "project":
    projects[tokens[1]] = tokens
If we have the short name of the project, we can easily access, for example, the address in the following way:
projects[the_short_name][3]
Later we can define the container for the project details as a dictionary too, or as a class, so we can access the properties of the project more comfortably by name instead of by number.
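One possible shape for such a container, sketched with namedtuple from the standard library's collections module. The field names (keyword, short_name, customer, address) are my own labels for the four columns of a project line and are not fixed by the format.

```python
from collections import namedtuple

# Field names follow the "project" line: keyword, shortcut, customer, address
Project = namedtuple("Project", ["keyword", "short_name", "customer", "address"])

projects = {}
tokens = "project N BigCustomer Musterstr.99,Duesseldorf".split()
if tokens[0] == "project":
    # Unpack the four tokens into the four named fields
    projects[tokens[1]] = Project(*tokens)

print(projects["N"].address)  # same value as projects["N"][3]
```

Because a namedtuple is still a tuple, access by number keeps working next to access by name.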
Putting it all together
f = open("/home/vd/work/innoq/Sandbox/vd/TimeSheet/sample_input.txt")
projects = {}
employeeName = "nobody"
for raw_line in f:
    line = raw_line.strip()
    # skip empty lines and comment lines
    if len(line) > 0 and line[0:1] != "#":
        tokens = line.split()
        if tokens[0] == "employee":
            employeeName = tokens[1]
        elif tokens[0] == "project":
            projects[tokens[1]] = tokens
        elif tokens[0] == "month":
            print("month")
        elif tokens[0].isdigit():
            print(tokens[2] + " - " + tokens[3])
f.close()
print(projects)
If you run the source above on our data file, you will get an output like
>python -u "TimeSheet.py"
month
9:00 - 17:30
8:30 - 15:00
9:00 - 17:30
10:00 - 20:00
{'N': ['project', 'N', 'BigCustomer', 'Musterstr.99,Duesseldorf']}
>Exit code: 0
It took me less than 5 minutes to write this first version of the parsing code. How many lines of code and how many seconds of your life would you need in C++ / Java / Assembler for the parser of your first simple domain-specific language?
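A natural next step for a time sheet would be to turn the from, to and break fields into worked time. The following is only a sketch under my own assumptions: the helper names to_minutes and worked_minutes are made up for illustration, and all times are assumed to be well-formed H:MM strings within a single day.

```python
def to_minutes(clock):
    """Convert an 'H:MM' string such as '17:30' into minutes."""
    hours, minutes = clock.split(":")
    return int(hours) * 60 + int(minutes)


def worked_minutes(start, end, pause):
    """Time between start and end, minus the break, in minutes."""
    return to_minutes(end) - to_minutes(start) - to_minutes(pause)


for line in ["1 N 9:00 17:30 0:30", "2 N 8:30 15:00 1:00"]:
    tokens = line.split()
    minutes = worked_minutes(tokens[2], tokens[3], tokens[4])
    # divmod splits the total into whole hours and leftover minutes
    print(tokens[0], "%d:%02d" % divmod(minutes, 60))
```

For the two sample days this gives 8:00 and 5:30 of working time.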
P.S. I used "if / elif / else" in my implementation. I am sure there is a more elegant way to write a dispatcher in Python. We just need to find out whether there is a "switch" statement in Python. ;-) Stay tuned.
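One idiom that is often used for this kind of dispatching, with or without a switch-like statement, is a dictionary that maps each keyword to a handler function. Below is a minimal sketch of the same parser logic in that style; the handler names and the state dictionary are my own assumptions and not part of the original script.

```python
projects = {}
state = {"employee": "nobody", "month": None}


def handle_employee(tokens):
    state["employee"] = tokens[1]


def handle_project(tokens):
    projects[tokens[1]] = tokens


def handle_month(tokens):
    state["month"] = (tokens[1], tokens[2])


# Keyword -> handler; this dictionary plays the role of the "switch"
handlers = {
    "employee": handle_employee,
    "project": handle_project,
    "month": handle_month,
}


def dispatch(line):
    tokens = line.split()
    handler = handlers.get(tokens[0])
    if handler is not None:
        handler(tokens)
    elif tokens[0].isdigit():
        # Day entries have no keyword, so they are handled separately
        print(tokens[2], "-", tokens[3])


for line in ["project N BigCustomer Musterstr.99,Duesseldorf",
             "month 01 2007",
             "1 N 9:00 17:30 0:30"]:
    dispatch(line)
```

Adding a new keyword then only means writing one function and adding one entry to the dictionary.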
Posted by VladimirDobriakov at 8:29 PM
January 14, 2007
Time sheet - the use case
Somebody said "We developers have the great privilege of being able to develop tools for ourselves".
At our company we have to print out our time sheet every month, let the customer sign it and transfer it to the accounting department. At the end of the month we also hand in the travel costs report, which (for every day of the month) contains the name of the customer, the address of the customer's site (for computing the kilometer allowance), the working time and the particular costs.
By the way, later the tax inspector does some sort of consistency check, so it appears logical to create both documents from the same source.
I was using Excel sheets: two separate sheets, one for the time sheet and one for travel costs. The first one is on my desktop computer at the customer's office (if the project takes longer) and the second one is at home (I do not always carry my notebook with me, and I am not allowed to connect it to the customer's network anyway). I tried to combat synchronization problems with Google Docs. You upload your Excel sheet, edit docs online with some Ajax magic and, if you are lucky, you can continue editing from any other place. The idea is great and I love all the Google products anyway, but this one is a beta, you know. ;-)
I could continue with different scenarios involving a paper-based notebook and cross-media synchronization problems, but you already get the idea of what a mess it is.
The idea of programming some application, preferably a web application, is obvious. I've heard that some of our guys even started to implement this but abandoned it later. I can imagine why...
A developer can make his own tool, but the time and energy saved must outweigh the development effort. Unfortunately, that is not often the case, even for an experienced Java / EJB developer.
Let's have a look at how you can be much more successful with a better programming language and a better platform, which together feature clean and readable syntax, powerful language concepts, a well-thought-out, intuitive API and a lightweight, easy-to-install runtime environment.
I am not talking about Ruby on Rails, although it is pretty close to that.
After I've promised so much, stay tuned for the practical solution.
Posted by VladimirDobriakov at 9:09 PM
January 7, 2007
Blog name
I finally found the subtitle for my blog: “opinionated blog about non-opinionated software”.
Motivation:
I love opinionated blogs like Joel’s. They are sometimes polarizing, and I cannot always, or even often, agree with the author, but I like to hear and read different and unorthodox opinions.
In contrast, with software and especially source code I strive for elegance and consensus. We should end up with code that (possibly) everybody on the team loves, that complies with principles like DRY (don't repeat yourself), and that is short and easy to understand and maintain. The opposite of unmaintainable code.
Posted by VladimirDobriakov at 6:52 AM