I just stumbled upon this article on Reddit. It echoes a number of things I may not have mentioned on this blog but have said elsewhere, in talks and when speaking with friends. The reasons cited are (of course, this is actually more than 10):
- Freedom
- Readability
- Documentation system
- High level vs. low level
- Standard library
- 3rd Party OSS for Science Work
- Data Structures
- Module system
- Project scalability
- Calling syntax
- Multiple Programming Paradigms
- Distribution system
Definitely worth looking over if you’re on the fence about switching away from something like MATLAB. Below I’ll discuss some of these points:
Freedom
One of the first and foremost advantages of using Python, NumPy, SciPy and related modules is that they’re available for free and under quite liberal licenses (MIT, BSD, etc., rather than GPL). This means you’re really free to do whatever you want with them, with very few exceptions.
Readability
Python syntax, in part by enforcing certain whitespace constraints, results in very readable code. As the linked article notes, this is by design.
Documentation
Python has a fabulous system for documentation. It is a language feature, not some strange form of comment that one has to rigidly structure so that something like Doxygen can scrape out the relevant details. A great example is the Python core documentation. Docstrings let you describe method and function behavior inline in the code in a fairly elegant fashion, and that description immediately becomes available through the documentation system. Additionally, Python docstrings can even incorporate unit tests (via the doctest module) which double as usage examples.
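As a minimal sketch of the docstring and embedded-test idea, here is a small function whose docstring carries its own usage examples. The function `scale` and its inputs are made up purely for illustration; the only real API used is the standard library `doctest` module.

```python
import doctest


def scale(values, factor):
    """Return a new list with each value multiplied by factor.

    The examples below are ordinary documentation, but doctest can
    also execute them and compare the results.

    >>> scale([1, 2, 3], 2)
    [2, 4, 6]
    >>> scale([], 10)
    []
    """
    return [v * factor for v in values]


# The docstring is an attribute of the function object itself;
# help(scale) renders this same text.
print(scale.__doc__)

# Collect the examples embedded in the docstring and run them.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(scale, "scale", module=False, globs={"scale": scale}):
    runner.run(test)

# summarize() returns a (failed, attempted) result tuple.
results = runner.summarize()
print(results)
```

In a real module you would more commonly call `doctest.testmod()` or run `python -m doctest yourfile.py`; the explicit finder and runner are used here only so the sketch works when pasted anywhere.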
High level vs low level
Python allows you to write easy-to-read, highly maintainable code, but what about performance? Python offers a number of approaches that strike a good tradeoff: the developer writes maintainable code first and then drops down to lower levels to optimize the spots where performance is less than desired. At the higher levels, using appropriate data structures and types, along with the functionality those types provide, can make up much of the performance gap with lower-level implementations in C or Fortran. Here I’m simply referring to what MATLAB frequently calls vectorization. If I want to take two matrices of equal shape and multiply them element by element, I could do this with a for loop in Python, but it will be much slower than desired. The functionality built into modules like NumPy allows you to do this by simply multiplying one matrix by the other in a single binary operation. Python also provides a number of great ways to integrate lower-level Fortran or C/C++ code when these Python-only efficiency gains are not enough. In my opinion, truly great abstractions acknowledge that they don’t cover every use case and provide workable ways to break out of the pure abstraction without completely tearing it down.
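To make the vectorization point concrete, here is a rough sketch comparing an explicit loop with a single NumPy operation. The function names, array sizes and repeat count are arbitrary choices for illustration, and the timings will vary from machine to machine. Note that in NumPy `a * b` is the element-wise product (what MATLAB writes as `a .* b`).

```python
import timeit

import numpy as np


def multiply_loop(a, b):
    """Element-by-element product using explicit Python loops."""
    out = np.empty_like(a)
    rows, cols = a.shape
    for i in range(rows):
        for j in range(cols):
            out[i, j] = a[i, j] * b[i, j]
    return out


def multiply_vectorized(a, b):
    """The same product as a single binary operation on whole arrays."""
    return a * b


# Two matrices of equal shape filled with random values (seeded for repeatability).
rng = np.random.default_rng(0)
a = rng.random((100, 100))
b = rng.random((100, 100))

# Both approaches compute the same result.
assert np.array_equal(multiply_loop(a, b), multiply_vectorized(a, b))

# Rough timing comparison; absolute numbers depend on the machine.
loop_time = timeit.timeit(lambda: multiply_loop(a, b), number=3)
vec_time = timeit.timeit(lambda: multiply_vectorized(a, b), number=3)
print(f"loop:       {loop_time:.4f} s")
print(f"vectorized: {vec_time:.4f} s")
```

The vectorized version is also the more readable one, which is the nice part: the higher-level code is the one that pushes the inner loop down into compiled code.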
Standard library
Python is batteries included. You get a huge variety of high-quality modules out of the box, without having to search the web for build instructions or precompiled binaries, for simple things like working with HTTP servers, compressing and decompressing files, parsing JSON and XML, threading and multiprocessing, etc. NumPy doesn’t come bundled with Python, but you won’t spend huge amounts of time pulling together modules from tons of sources just to get the basic functionality you need.
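As a small illustration of the batteries-included point, the sketch below serializes some data to JSON, compresses it, and reads it back using only standard library modules. The `record` dictionary is made-up sample data.

```python
import gzip
import json

# Made-up sample data, purely for illustration.
record = {"name": "example", "values": [1, 2, 3]}

# Serialize to JSON text, then compress the bytes with gzip.
text = json.dumps(record)
compressed = gzip.compress(text.encode("utf-8"))

# Reverse the steps: decompress, decode, parse.
restored = json.loads(gzip.decompress(compressed).decode("utf-8"))

print(restored == record)
```

Nothing here needed to be installed or built separately, which is the whole point.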
The other items mentioned in the article are certainly important as well, but those above are the ones that matter most to me.