Changes From Python 2 to 3 Making Code Refactoring Unavoidable
Having to delve into old code to get it running against a new language version is one of the things developers try to avoid at any cost. Old code is hard to understand, unexpected changes might introduce new bugs, performance will likely take a hit, and it is a tremendous effort for little payback. After all, you’re adding nothing new, just getting it to run as before. This post covers some examples of changes between Python 2 and 3 that make this code refactoring unavoidable.
So you have your Python 2 application running just as intended. It’s been working perfectly for a few years now, the development team ironed out all the wrinkles, and everyone is happy with how it performs. It gets the job done and stays out of the way when it’s not needed.
But, out of the blue, a new vulnerability affecting Python 2 is disclosed. Python 2 is out of official support, so security patches are no longer available, and your fantastic application, through no fault of its own, is now vulnerable and has become a security liability for the organization. Being the diligent IT manager you are, you task your development team with prioritizing a refactoring of the code to run against a secure version of Python 3.
Your team had other projects in hand, and all of that is now put on hold. It’s a costly decision, but it has to be done, right? It takes a few weeks to gather all the details about the necessary code changes and how they affect other sections of the code, some of them unexpected or long forgotten. A large part of the core logic has to be rewritten, as some Python 2 functionality was removed and its Python 3 replacement is not a drop-in substitute, so new unit tests have to be created too.
Another few weeks go by as the developers struggle to understand poorly documented code written by an ex-colleague who is no longer with the organization. There seem to be some sections of code that “just work,” but no one understands why anymore. Oh, and a few dependencies don’t support Python 3 and probably won’t any time soon – the projects seem to no longer be supported.
After months of hard work, the development team returns with a Frankenstein build of a new version of the application. It only loosely resembles the original, and some UI choices make it hard for operators to apply what they already know, so retraining is necessary.
Performance seems to take a hit any time someone applies a change to the database. No one is quite sure why, but the original version of one of the dependencies, which had to be rewritten from scratch, handled that better. And all the projects the team originally had in hand are now late and over budget too.
The team came back with some of the findings to keep as useful knowledge for future refactoring efforts:
Division:
Python 3: 5/2 == 2.5
Python 2: 5/2 == 2 (in Python 3, write 5//2 to get the old behavior)
Dividing two integers with "/" now returns a float instead of an int.
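As a small sketch of what the division change means in practice (the variable names are purely illustrative), this is how the two operators behave under Python 3. Note that "//" floors the result, so it rounds toward negative infinity rather than toward zero:

```python
# Python 3 semantics: "/" is true division, "//" is floor division.
true_quotient = 5 / 2       # float result, even though both operands are ints
floor_quotient = 5 // 2     # int result, matching what Python 2 gave for 5/2
exact_quotient = 4 / 2      # still a float, even when the division is exact
negative_floor = -5 // 2    # floors toward negative infinity, not toward zero

print(true_quotient, floor_quotient, exact_quotient, negative_floor)
```

Code that uses a quotient as a list index or a range() bound is where this tends to surface, because those places reject floats.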
Print:
Python 3: print("")
Python 2: print ""
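Because print is now an ordinary function, the special forms of the old statement map onto keyword arguments. The sketch below writes to an in-memory buffer (an illustrative choice, so the result can be inspected) and shows the Python 2 spellings only as comments:

```python
import io

buffer = io.StringIO()

# Python 2: print >>buffer, "status:", "ok"
print("status:", "ok", file=buffer)

# Python 2: print "no newline",   (trailing comma suppressed the newline)
print("no newline", end="", file=buffer)

# No direct Python 2 equivalent: a custom separator between arguments.
print("a", "b", "c", sep=",", file=buffer)

output = buffer.getvalue()
```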
Dictionary changes:
Python 3: dict methods keys(), items(), and values() return views. The iterkeys(), iteritems(), and itervalues() methods are no longer supported.
Python 2: those methods return lists.
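A view stays connected to its dictionary, which is the part most likely to surprise code written for Python 2 lists. A minimal sketch, using a made-up inventory dictionary:

```python
inventory = {"apples": 3, "pears": 5}

keys_view = inventory.keys()      # a live view, not a list
inventory["plums"] = 7            # the view reflects this later insertion
live_keys = sorted(keys_view)

# Views cannot be indexed the way Python 2 lists could.
try:
    keys_view[0]
    indexing_failed = False
except TypeError:
    indexing_failed = True

# Wrap the view in list() when a snapshot or indexing is needed.
snapshot = list(inventory.keys())

# Python 2: for name, count in inventory.iteritems(): ...
total = sum(count for name, count in inventory.items())
```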
Sorting:
Python 3: the cmp() function was removed, and sorted() and list.sort() no longer accept a cmp argument. Use the key argument instead.
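For sorting code built around comparison functions, there are two common routes: rewrite the comparison as a key function, or wrap the existing function with functools.cmp_to_key from the standard library. The comparison function below is a hypothetical stand-in for legacy code:

```python
from functools import cmp_to_key

def compare_by_length(left, right):
    # Old-style comparison: negative, zero, or positive, as cmp() used to return.
    return (len(left) > len(right)) - (len(left) < len(right))

words = ["pear", "fig", "banana"]

# Preferred in Python 3: express the ordering as a key function.
by_key = sorted(words, key=len)

# Stopgap for legacy code: adapt the old comparison function unchanged.
by_legacy_cmp = sorted(words, key=cmp_to_key(compare_by_length))
```

The (a > b) - (a < b) expression is the usual replacement for a bare cmp(a, b) call.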
Text and binary:
Text and binary data are now distinct types: str holds Unicode text and bytes holds binary data. The two can no longer be mixed implicitly, so operations that relied on Python 2 silently converting between 8-bit strings and Unicode now raise a TypeError and need an explicit encode() or decode().
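A short sketch of the text and binary split, assuming UTF-8 as the encoding (the right choice depends on where the data actually comes from):

```python
text = "caf\u00e9"                 # str: four Unicode characters
data = text.encode("utf-8")        # bytes: the accented letter takes two bytes

# Mixing the two types no longer converts implicitly.
try:
    text + data
    mixing_failed = False
except TypeError:
    mixing_failed = True

# Conversions have to be explicit, in both directions.
round_trip = data.decode("utf-8")
```

Files opened in binary mode, sockets, and many network libraries hand back bytes, so this is often where ported code first breaks.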
Tuples:
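Tuple parameters can no longer be unpacked in the function signature: a definition such as def f(a, (b, c)) is now a syntax error, so the tuple has to be unpacked inside the function body.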
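The usual fix is mechanical: accept the tuple as a single parameter and unpack it in the body. The function and data below are illustrative, with the Python 2 form kept as a comment:

```python
# Python 2 only:
#     def distance((x1, y1), (x2, y2)): ...
def distance(start, end):
    # Unpack inside the body instead of in the signature.
    x1, y1 = start
    x2, y2 = end
    return ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5

# Python 2 only:
#     sorted(pairs, key=lambda (name, count): count)
pairs = [("b", 2), ("a", 1)]
sorted_pairs = sorted(pairs, key=lambda pair: pair[1])

length = distance((0, 0), (3, 4))
```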
Several pieces of syntax were removed:
"<>" is gone; use "!=" instead.
Integer literals no longer accept a trailing l or L.
String literals lost their leading u or U prefix in Python 3.0 (it was accepted again from Python 3.3 onward to ease porting).
Importing everything from a module now only works at the module level, not inside a function.
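One way to see these removals in action is to ask the Python 3 compiler directly. The helper and snippet names below are hypothetical, and the check only detects syntax errors, so it will not find behavioral changes such as integer division:

```python
legacy_snippets = {
    "not_equal_operator": "1 <> 2",
    "long_suffix": "10L",
    "star_import_in_function": "def load():\n    from os import *\n",
    "modern_equivalent": "1 != 2",
}

def is_valid_python3(source):
    """Return True if the source compiles under the running Python 3."""
    try:
        compile(source, "<legacy>", "exec")
    except SyntaxError:
        return False
    return True

results = {name: is_valid_python3(src) for name, src in legacy_snippets.items()}
```

Every legacy snippet is rejected at compile time, before any code runs, which is why a Python 2 file that uses them cannot even be imported under Python 3.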
And many, many other changes (The complete list can be seen here: https://docs.python.org/3/whatsnew/3.0.html, which was used as a source for this article).
At the end of the process, the application works worse than before; no actual benefits were reaped from moving to a new language level – only constructs that replaced old functionality were used, and no new advanced features, as the goal was just to get it to run again. It was a costly task that affected many other organization-level activities and caused disruptions at all levels.
What could have been done instead?
A better alternative exists in the form of Extended Lifecycle Support for Python. Because it provides timely security fixes without introducing language-level changes, the problem could have been solved simply by deploying the service. The same code would have continued to run as-is, the security vulnerability would have been fixed, and all the other disruptions could have been avoided entirely.