Taming Bugs in Parallel Software
Beckman Fellow 2010-2011
Computing is shifting from a mostly sequential paradigm, where a computer program performs only one operation at a time, to a mostly parallel paradigm, where a computer program performs many operations at once. With parallel computing, software developers face a more challenging environment. First they must foresee not only what each operation will do by itself but also how multiple operations will affect each other when they execute at the same time. Then when it comes to testing the software, they face the difficulty of reconstructing a given interaction to locate and fix any bugs (also known as errors or faults) that are found.
Professor Marinov’s research project proposes to improve this testing process. On the theoretical side, his work will focus on the schedule (also known as the ordering or interleaving) of parallel operations. Specifying all possible schedules results in the most thorough test but can be prohibitively expensive or impossible. Specifying only one schedule lacks coverage but can be very useful in isolating a bug. Where does the best cost-benefit balance lie for testing that results in reliable, error-free computations?
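To make the trade-off between exhaustive and single-schedule testing concrete, here is a minimal sketch in Python. It is an illustration only, not part of the project described here: the two-thread counter, the helper names `interleavings` and `run`, and the "read"/"write" step labels are all assumptions chosen for the example. It enumerates every schedule of two threads that each read a shared counter and then write back the value plus one, and counts how many schedules lose an update.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's internal order."""
    n, m = len(a), len(b)
    for positions in combinations(range(n + m), n):
        chosen = set(positions)          # slots of the merged schedule given to `a`
        ai, bi = iter(a), iter(b)
        yield [next(ai) if i in chosen else next(bi) for i in range(n + m)]

def run(schedule):
    """Execute one schedule on a shared counter and return the counter's final value."""
    shared = 0
    local = {}                           # each thread's private copy of the counter
    for thread, op in schedule:
        if op == "read":
            local[thread] = shared
        else:                            # "write": store the stale-or-fresh copy plus one
            shared = local[thread] + 1
    return shared

# Hypothetical test subject: two threads, each doing a non-atomic increment.
T1 = [("T1", "read"), ("T1", "write")]
T2 = [("T2", "read"), ("T2", "write")]

all_schedules = list(interleavings(T1, T2))
failing = [s for s in all_schedules if run(s) != 2]
print(len(all_schedules), "schedules,", len(failing), "lose an update")
# prints: 6 schedules, 4 lose an update
```

Even this tiny example has six schedules, and the count grows combinatorially with the number of operations, which is why exhaustive exploration quickly becomes too expensive. Conversely, any single entry of `failing` is enough to replay the bug deterministically.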
First his research will develop a novel language that makes it easy to specify schedules. The basic entity here is an event that a test can raise at various points. Further work will address how to execute a test for a given set of schedules, how to generate schedules automatically for a given test, how to generate both the tests and their schedules automatically, how to select subsets of tests to run, and how to prioritize the order of tests to find bugs faster.
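The idea of a test that raises named events, with a schedule fixing the order in which those events may occur, can be sketched in Python as follows. This is not the language the project proposes, whose syntax is not described here. The `Schedule` class, its `step` method, the event names such as "T1.read", and the 5-second timeout are all hypothetical choices made for illustration.

```python
import threading

class Schedule:
    """Hypothetical helper: lets named events happen only in a fixed order."""

    def __init__(self, order):
        self._order = list(order)
        self._next = 0
        self._cond = threading.Condition()

    def step(self, name, action):
        """Wait until `name` is the next scheduled event, then run `action`."""
        with self._cond:
            reached = self._cond.wait_for(
                lambda: self._next < len(self._order)
                and self._order[self._next] == name,
                timeout=5,               # assumption: fail rather than hang forever
            )
            if not reached:
                raise RuntimeError(f"event {name!r} was never scheduled")
            action()                     # runs while no other event can proceed
            self._next += 1
            self._cond.notify_all()

def final_counter_under(order):
    """Run two incrementing threads under the given schedule; return the counter."""
    sched = Schedule(order)
    state = {"shared": 0}

    def worker(tid):
        local = {}
        sched.step(f"{tid}.read", lambda: local.update(v=state["shared"]))
        sched.step(f"{tid}.write", lambda: state.update(shared=local["v"] + 1))

    threads = [threading.Thread(target=worker, args=(t,)) for t in ("T1", "T2")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["shared"]

buggy = final_counter_under(["T1.read", "T2.read", "T1.write", "T2.write"])
safe = final_counter_under(["T1.read", "T1.write", "T2.read", "T2.write"])
print(buggy, safe)  # prints: 1 2
```

Because the schedule is an explicit list of event names, the same test body can be rerun under different orderings, so a failing interleaving becomes reproducible instead of depending on timing.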
Professor Marinov plans to evaluate the work using about two dozen software programs with documented bug data. The artifacts that are developed will be shared through the Software-artifact Infrastructure Repository and will undergo further refinement through collaborations with private industry.