Jul 12, 2014

GSoC 2014 - Updates for weeks 6 to 8

A look at what I have been up to during weeks 6, 7 and 8 of my IDLE Improvements project, as a part of Google Summer of Code 2014.
I have mainly been working on adding a feature to IDLE that lets one analyze the file being edited from within IDLE.

The Aim

The overall idea is as follows: the user should be able to analyze the file that is being edited from within IDLE. "Analyze" encompasses tools like pep8, pyflakes and pylint. It should also be capable of handling *any* other program (whether similar to the ones mentioned earlier or not). The benefits of such a feature include, but are not limited to, greater PEP 8 enforcement, the ability to detect syntax errors and common typos, and feedback on coding style.



The different Approaches...

Over the past three weeks, I have attempted two different approaches to implement the above feature.

1. Using subprocess.Popen
This approach assumes that the 3rd party program is callable from the command line.
./xyz filename.py <additional-args>
All programs which satisfy this criterion can be called using Popen. An advantage of this approach is that maintaining it is very easy: you only have to detect and correct bugs in the actual runner code, i.e. you don't have to worry about changes in the 3rd party program (see point 2 for the opposite situation).
In this approach, all the settings, such as the name of the program, its location in the filesystem, its config files and any additional arguments to be supplied to it, are stored in a single config file. The values are retrieved from this config file as and when required. The user is responsible for ensuring that the 3rd party program works as expected, keeping it updated to meet personal requirements, etc.
When the user requests to analyze a file in IDLE, subprocess.Popen is called with the required parameters and the result is displayed to the user.
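To make the flow above concrete, here is a minimal sketch of a config-driven runner. It is not the actual runner code: the section layout, the option names `command` and `args`, and the helper names `build_command` and `analyze` are all assumptions made for illustration.

```python
import configparser
import shlex
import subprocess

# Hypothetical config layout: one section per tool. The option names
# "command" and "args" are assumptions, not IDLE's real config keys.
EXAMPLE_CONFIG = """
[pyflakes]
command = pyflakes
args =
"""


def build_command(config, tool, filename):
    """Build the argument list: <command> <filename> <additional-args>."""
    section = config[tool]
    command = shlex.split(section["command"])
    extra = shlex.split(section.get("args", ""))
    return command + [filename] + extra


def analyze(config, tool, filename):
    """Run the configured tool on filename; return (exit code, output)."""
    proc = subprocess.Popen(
        build_command(config, tool, filename),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so only one pipe is read
        universal_newlines=True,
    )
    output, _ = proc.communicate()
    return proc.returncode, output


if __name__ == "__main__":
    config = configparser.ConfigParser()
    config.read_string(EXAMPLE_CONFIG)
    print(build_command(config, "pyflakes", "filename.py"))
```

Passing an argument list (rather than a shell string) avoids quoting problems with file names that contain spaces.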

The problems that I did face in this approach are mostly due to differing behavior between POSIX and non-POSIX systems. What worked flawlessly on my system (Linux) would either "hang" or become non-responsive on Windows systems. Debugging on Windows via a Virtual Machine (VM) is a rather time-consuming affair.
Then there was the issue of deadlocking when using PIPE for both stdout and stderr. Using both Popen.proc and Popen.communicate would deadlock when the output produced was large.
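For reference, the subprocess documentation warns that calling wait() on a child whose stdout or stderr is a PIPE can deadlock once the OS pipe buffer fills up, and it recommends communicate() to drain both pipes. Below is a minimal sketch of that documented pattern. It is not the actual runner code and may not cover the exact failure described above; `run_and_capture` is a hypothetical helper name.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd and collect stdout and stderr without blocking on a full pipe."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # communicate() reads both pipes until EOF and then waits for the child,
    # so the child never stalls on a full pipe buffer. It should not be
    # combined with manual reads from proc.stdout or proc.stderr.
    out, err = proc.communicate()
    return proc.returncode, out, err


if __name__ == "__main__":
    # A child that writes far more than a typical pipe buffer to both streams.
    child = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write('o' * 1000000); "
        "sys.stderr.write('e' * 1000000)",
    ]
    code, out, err = run_and_capture(child)
    print(code, len(out), len(err))
```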


2. Importing modules from a standard interface
In this approach, there is a generic API which will be bundled into IDLE. This API is responsible for searching for tool-specific APIs in predefined locations (the site-packages folder of the Python installation). These tool-specific APIs are to be installed by the end user using pip. IDLE will auto-detect all of them and configure them with default values. All tool-specific APIs share the same skeleton.
When the user requests to analyze a file in IDLE, the specific API is loaded and its methods are called in order (preprocess, process, obtain result, etc.).
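A rough sketch of what such an interface could look like follows. Everything here is an assumption made for illustration: the `idle_analyzer_` module prefix, the `Analyzer` base class, its method names and the toy `LineCounter` plugin are hypothetical and do not come from any existing IDLE API.

```python
import importlib
import pkgutil

# Hypothetical naming convention for installed tool-specific modules.
PLUGIN_PREFIX = "idle_analyzer_"


def discover_analyzers(prefix=PLUGIN_PREFIX):
    """Import every top-level module on sys.path whose name starts with prefix."""
    found = []
    for _finder, name, _ispkg in pkgutil.iter_modules():
        if name.startswith(prefix):
            found.append(importlib.import_module(name))
    return found


class Analyzer:
    """Skeleton that every tool-specific API would share."""

    def preprocess(self, filename):
        raise NotImplementedError

    def process(self):
        raise NotImplementedError

    def result(self):
        raise NotImplementedError


def run_analyzer(analyzer, filename):
    """Call the skeleton methods in the fixed order and return the result."""
    analyzer.preprocess(filename)
    analyzer.process()
    return analyzer.result()


class LineCounter(Analyzer):
    """Toy analyzer used only to exercise the skeleton."""

    def preprocess(self, filename):
        with open(filename) as source:
            self.lines = source.readlines()

    def process(self):
        self.count = len(self.lines)

    def result(self):
        return "%d lines" % self.count
```

Because every plugin exposes the same three methods, the generic side only needs `run_analyzer`; the cost is that each tool-specific wrapper has to track its tool's interface.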

Maintenance is an issue with this approach. We have to ensure that the generic API is bug free. Given that this is going to be part of idlelib in Python's stdlib, this is perfectly agreeable. But what might prove cumbersome in the future is that we would have to maintain tens (or probably hundreds) of tool-specific APIs. We would have to keep updating them whenever the underlying programs change their interfaces.
Keeping track of such changes is in itself a time-consuming and resource-sapping task.

What next?
We have agreed that the best way forward, at least initially, would be to continue using the subprocess.Popen approach. Once all the OS-specific differences related to subprocess.Popen have been understood and accounted for in the code, it *should* be smooth sailing.
I also plan to make the user experience better for programs which take a lot of time, by redirecting their output to a file and polling that file continuously from the result display window.
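The redirect-and-poll idea could be sketched roughly as below. This is only an illustration under assumptions: `run_with_polling` and its `on_output` callback are hypothetical names, and a real IDLE version would more likely schedule the polling with Tk's `after()` than block in `time.sleep()`.

```python
import os
import subprocess
import sys
import tempfile
import time


def run_with_polling(cmd, on_output, interval=0.1):
    """Send the child's output to a temp file and poll that file for new text."""
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    try:
        # A separate read handle keeps its own file offset, so reading
        # never disturbs the position the child is writing at.
        with open(path, "w") as sink, open(path, "rb") as reader:
            proc = subprocess.Popen(cmd, stdout=sink, stderr=subprocess.STDOUT)
            while True:
                # Check for exit first so one last read happens afterwards.
                finished = proc.poll() is not None
                chunk = reader.read()
                if chunk:
                    on_output(chunk.decode(errors="replace"))
                if finished:
                    break
                time.sleep(interval)
            return proc.returncode
    finally:
        os.remove(path)


if __name__ == "__main__":
    child = [sys.executable, "-c", "print('hello'); print('world')"]
    exit_code = run_with_polling(child, sys.stdout.write)
```

Since the child writes to a file rather than a pipe, it cannot block on a full pipe buffer, although many programs buffer their output when it is not a terminal, so text may arrive in bursts.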

