So I agreed to do a presentation on Pywikipediabot for the Indonesian Python community. The only problem was that I didn’t really understand it yet. I also didn’t really trust my ability to write (and read) Python. Luckily there was still time (a month) before the presentation, which should have been plenty for preparation.
My first step was to take a quick refresher course, A Gentle Introduction to Python from Mechanical MOOC. Because I had been exposed to Python before, I didn’t follow the exact timetable. I picked some topics I didn’t really understand, mainly the object-oriented programming part of Python, and did the exercises.
Of course this didn’t cover the Pywikipediabot framework itself, which was rather intimidating to me. I had used it before in a limited way, and I found it rather easy to use (though not to read or modify). But some changes had complicated things (at least for me).
First was the move from Subversion to Git. I had read that Git was hard to use, and I rather dreaded the change. The Pywikipediabot manual had been updated to cover Git usage, but I ran into errors when I didn’t update daily (I only tried to update about a week after the initial installation). Subversion was rather more forgiving in this respect. In the end I just deleted my copy and cloned the repository again.
Second was the configuration (and installation). There are steps that need doing after downloading the whole framework that were not documented (the most important one is setting PYTHONPATH). Well, at least for the new version of Pywikipediabot. The documentation of the framework was for the old branch (the compat branch), and some things needed to be updated. It seems to be better now.
So for the last days before the presentation I decided to tackle the framework itself. I didn’t think I would really be able to master it, so I focused on one thing I really wanted to do: mass creation of articles. Admittedly this can be done with other tools (it seems to be easier with AutoWikiBrowser, for instance), but Pywikipediabot seems to be more flexible for other uses.
My strategy for creating articles was to put the necessary data in a csv (comma-separated values) file, then read the records from the file and patch together wiki-formatted text that can be uploaded to Wikipedia. To read the csv file I use the csv module from the Python standard library; Pywikipediabot is only needed to upload the text to Wikipedia.
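To make the strategy concrete, here is a minimal sketch of it, not the actual script from my presentation. The column names (`name`, `party`, `constituency`), the sample record, and the article template are all made up for illustration, and the `upload` function assumes a current, already configured pywikibot installation; it is defined but never called here.

```python
import csv
import io

# Hypothetical sample data; the column names are assumptions for illustration.
SAMPLE_CSV = """name,party,constituency
Example Person,Example Party,Example District
"""

# Hypothetical article template in wiki markup.
TEMPLATE = (
    "'''{name}''' is a member of parliament from {party}, "
    "representing {constituency}.\n\n"
    "[[Category:Members of parliament]]\n"
)


def read_records(csv_file):
    """Return the rows of an open csv file as dicts keyed by the header row."""
    return list(csv.DictReader(csv_file))


def build_wikitext(record):
    """Fill the article template with the fields of one csv record."""
    return TEMPLATE.format(**record)


def upload(title, text, summary="Create article from csv record"):
    """Save text to a wiki page (assumes pywikibot is installed and configured)."""
    import pywikibot  # imported here so the rest runs without pywikibot

    site = pywikibot.Site()
    page = pywikibot.Page(site, title)
    page.text = text
    page.save(summary=summary)


if __name__ == "__main__":
    # Print the generated text instead of uploading it.
    for record in read_records(io.StringIO(SAMPLE_CSV)):
        print(build_wikitext(record))
```

With a real file you would pass `open("mps.csv", newline="")` (a hypothetical file name) to `read_records` and call `upload(record["name"], build_wikitext(record))` for each record, preferably against a test wiki first.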
The script itself is quite simple. I’ve put my presentation online, where you can see the implementation of the strategy for yourself. I intended to create articles on our members of parliament, but I couldn’t find good structured data for them. So I entered the data manually for just five MPs, which was enough for the demonstration.
The difficulty of finding a good database of MPs led me to an interest in Open Data, and encouraged me to attend a meetup on that topic on 4 September. But that will be another blog post.
I am rather pleased that I agreed to do the presentation. First, I am much better motivated to program in Python now. I have also found some interesting projects to do, like learning to scrape data from websites. My interest in data journalism has also been rekindled.