
Evolution

Andrew J. Ko

Programs change. You find bugs, you fix them. You discover a new requirement, you add a feature. A requirement changes because the world changes, so you revise a feature. The simple fact about programs is that they're rarely stable; rather, they're constantly changing, living documents that shift as much as the world around them shifts.

Nowhere is this constant evolution more apparent than in our daily encounters with software updates. The apps on our phones are constantly updated to improve our experiences, while the websites we visit may change every time we load them, without us even noticing. These different models imply different notions of who controls changes to the user experience: should software companies decide when your experience changes, or should you? And for systems with significant backend dependencies, is it even possible to give users control over when things change?

To manage change, developers use all kinds of tools and practices.

One of the most common ways of managing change is to refactor code. Refactoring helps developers modify the architecture of a program while keeping its behavior the same, enabling them to implement or modify functionality more easily. For example, one of the most common and simplest refactorings is renaming a variable (renaming its definition and all of its uses). This doesn't change the architecture of a program at all, but it does improve readability. Other refactorings can be more complex. For example, consider adding a new parameter to a function: all calls to that function need to pass that new parameter, which means you need to visit each call and decide on a value to send from that call site, as the sketch below illustrates. Studies of refactoring in practice have found that refactorings can be big and small, that they don't always preserve the behavior of a program, and that developers perceive them as involving substantial costs and risks (Kim et al. 2012).
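To make this concrete, here is a minimal sketch in Python of an "add parameter" refactoring. The function and data here are hypothetical, invented purely for illustration; the point is that the behavior stays the same while the structure changes, and that every call site must make a decision.

```python
from dataclasses import dataclass

@dataclass
class Item:
    price: float

# Before the refactoring: the tax rate is hard-coded.
def total_price_v1(items):
    return sum(item.price for item in items) * 1.10

# After the "add parameter" refactoring: the constant becomes a
# parameter. Every call site must now be visited to decide what value
# to pass; a default argument preserves the old behavior at call sites
# that haven't been updated yet.
def total_price_v2(items, tax_rate=1.10):
    return sum(item.price for item in items) * tax_rate

cart = [Item(10.0), Item(5.0)]
print(total_price_v1(cart))                 # about 16.5
print(total_price_v2(cart))                 # same result: behavior preserved
print(total_price_v2(cart, tax_rate=1.0))   # an updated call site opts out of tax
```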

Another fundamental tool for managing change is the version control system. As you know, version control systems help developers track changes to code, allowing them to revert, merge, fork, and clone projects in a way that is traceable and reliable. While the most popular version control system today is Git, there are actually many kinds. Some are centralized, representing a single ground truth of a project's code, usually stored on a server; commits to a centralized repository become immediately available to everyone else on a project. Others, like Git, are distributed, placing a full copy of the repository on every local machine. Commits to these local copies don't automatically go to everyone else; rather, they are pushed to some central copy, from which others can pull updates.
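The following toy model sketches this distributed workflow in Python. None of these names come from any real version control tool, and it assumes a linear, non-diverging history (no branching or merging); it only illustrates the key idea that local commits reach collaborators only when explicitly pushed and pulled.

```python
class Repository:
    def __init__(self):
        self.history = []  # an ordered list of commit messages

    def commit(self, message):
        # A commit lands only in this copy of the repository.
        self.history.append(message)

    def push(self, remote):
        # Send the remote any commits it doesn't have yet
        # (assumes histories never diverge, unlike real tools).
        remote.history.extend(self.history[len(remote.history):])

    def pull(self, remote):
        # Fetch any commits we don't have yet.
        self.history.extend(remote.history[len(self.history):])

central = Repository()
alice, bob = Repository(), Repository()

alice.commit("Fix off-by-one in parser")  # local only; bob can't see it
alice.push(central)                       # now it's in the shared copy
bob.pull(central)                         # bob decides when to update
print(bob.history)                        # ['Fix off-by-one in parser']
```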

Research comparing centralized and distributed version control systems mostly reveals tradeoffs rather than a clear winner. Distributed version control, for example, appears to lead to commits that are smaller and more scoped to single changes, since developers can manage their own history of commits in their local repository (Brindescu et al. 2014). Google, however, uses one big centralized repository for all of its projects, because it offers a single source of truth, simplified dependency management, large-scale refactoring, and flexible team boundaries (Potvin & Levenberg 2016).

When code changes, you need to test it, which often means you need to build it, compiling source, data, and other resources into an executable format suitable for testing (and possibly release). Build systems can be as simple as nothing (e.g., loading an HTML file in a web browser interprets the HTML and displays it, requiring no special preparation) or as complex as hundreds or thousands of lines of build script code that compile, link, and manage files to prepare a system for testing, such as those used to build operating systems like Windows or Linux. To write these complex build procedures, developers use build automation tools like make, ant, gulp, and dozens of others, each helping to automate builds. In large companies, whole teams maintain build automation scripts to ensure that developers can always quickly build and test. In these teams, most of the challenges are social, not technical: teams need to manage role ambiguity, knowledge sharing, communication, trust, and conflict in order to be productive, just like other software engineering teams (Phillips et al. 2014).
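At its core, a tool like make just compares timestamps to decide what work is stale. Here is a minimal, make-like rule written in Python rather than make syntax; the file names and compile command are hypothetical, and a real build tool adds dependency graphs, parallelism, and much more.

```python
import os
import subprocess

def build(target, sources, command):
    """Rebuild target only if it's missing or older than any source."""
    def mtime(path):
        return os.path.getmtime(path)

    if (not os.path.exists(target)
            or any(mtime(source) > mtime(target) for source in sources)):
        print(f"building {target}")
        subprocess.run(command, check=True)  # stop if the build fails
    else:
        print(f"{target} is up to date")

# e.g., recompile a C program only when its source has changed:
build("app", ["app.c"], ["cc", "-o", "app", "app.c"])
```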

Perhaps the most modern form of build practice is continuous integration. This is the idea of completely automating not only builds, but also the running of a collection of tests, every time a bundle of changes is pushed to a central version control repository. The claimed benefit of continuous integration is that every major change is quickly built, tested, and ready for deployment, shortening the time between a change and the discovery of failures. This only works if builds are fast. For example, some large projects like Windows can take a whole day to build, making continuous integration of the whole operating system infeasible. When builds and tests are fast, continuous integration can accelerate development, especially in projects with large numbers of contributors (Vasilescu et al. 2015). Some teams even go further than continuous integration, building continuous delivery systems that ensure that complete builds are readily available for release (potentially multiple times per day for software on the web). Having a repeatable, automated deployment process is key for such processes (Chen 2015).
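A continuous integration job is, at heart, an automated pipeline of stages that halts at the first failure. The sketch below models that loop in Python; the stage commands are hypothetical stand-ins for a real project's build and test steps, which would normally live in a CI service's configuration rather than a script like this.

```python
import subprocess
import sys

# Hypothetical stages a CI server might run on every push.
STAGES = [
    ("build", ["python", "-m", "compileall", "src"]),
    ("test", ["python", "-m", "pytest", "tests"]),
]

def run_pipeline():
    for name, command in STAGES:
        print(f"--- {name} ---")
        result = subprocess.run(command)
        if result.returncode != 0:
            # Fail fast: the developer learns about breakage in
            # minutes, not weeks, shortening change-to-failure time.
            print(f"{name} failed; rejecting this change")
            return 1
    print("all stages passed; this build is ready to deploy")
    return 0

if __name__ == "__main__":
    sys.exit(run_pipeline())
```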

One last challenge of change is managing software releases. Good release management should archive new versions of software, automatically post them online, make them accessible to users, keep a history of who accesses each new version, and provide clear release notes describing changes from the previous version (van der Hoek et al. 1997). By default, all of this is quite manual, but many of these steps can be automated, streamlining how teams release changes to the world. You've probably encountered these steps most often in the form of software updates to applications and operating systems.
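As a rough illustration, here is a Python sketch automating two of the steps above: archiving a build and generating a release-notes stub from a list of change descriptions. The paths, version string, and change list are all placeholders, not a real team's release process.

```python
import shutil
import time
from pathlib import Path

def release(version, build_dir, changes):
    archive_root = Path("releases") / version
    archive_root.mkdir(parents=True, exist_ok=True)

    # 1. Archive the build so this exact version can be recovered later.
    shutil.make_archive(str(archive_root / "build"), "zip", build_dir)

    # 2. Generate release notes describing changes since the last version.
    notes = [f"Release {version} ({time.strftime('%Y-%m-%d')})", ""]
    notes += [f"- {change}" for change in changes]
    (archive_root / "RELEASE_NOTES.md").write_text("\n".join(notes))

release("2.1.0", "build/",
        ["Fix crash on empty input", "Add dark mode"])
```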


Further reading

Caius Brindescu, Mihai Codoban, Sergii Shmarkatiuk, and Danny Dig. 2014. How do centralized and distributed version control systems impact software changes? In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). ACM, New York, NY, USA, 322-333.

Lianping Chen. 2015. Continuous delivery: huge benefits, but challenges too. IEEE Software 32, 2, 50-54.

Miryung Kim, Thomas Zimmermann, and Nachiappan Nagappan. 2012. A field study of refactoring challenges and benefits. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering (FSE '12).

Shaun Phillips, Thomas Zimmermann, and Christian Bird. 2014. Understanding and improving software build teams. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). ACM, New York, NY, USA, 735-744.

Rachel Potvin and Josh Levenberg. 2016. Why Google stores billions of lines of code in a single repository. Communications of the ACM 59, 7, 78-87.

Bogdan Vasilescu, Yue Yu, Huaimin Wang, Premkumar Devanbu, and Vladimir Filkov. 2015. Quality and productivity outcomes relating to continuous integration in GitHub. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015). ACM, New York, NY, USA, 805-816.

André van der Hoek, Richard S. Hall, Dennis Heimbigner, and Alexander L. Wolf. 1997. Software release management. ACM SIGSOFT International Symposium on Foundations of Software Engineering, 159-175.

Podcasts

Software Engineering Daily, Continuous Delivery with David Rice.