Practical, Systematic Fuzz Testing for Securing Scientific Software
Funding Agency: National Science Foundation under the Cybersecurity Innovation for Cyberinfrastructure (CICI) Transition to Cyberinfrastructure Resilience (TCR) program.
Award: $1,200,000
Dates: 01-OCT-2024 through 30-SEP-2027
Collaborative proposal with University of Utah
 
Unfortunately, the semantic gaps between conventional and scientific computing leave fuzzing far less effective on scientific software. The lack of scalable, cross-language program analysis and instrumentation hinders the fuzzing of today’s complex, multi-language scientific applications. Worse yet, the intricate, highly structured data formats expected by scientific software are seldom formalized, which restricts even the world’s most powerful fuzzers to testing only surface-level code. These asymmetries prevent scientific software developers from thoroughly vetting their code, and they impede responsible vulnerability disclosure efforts for high-value targets such as the PETSc and SciPy scientific APIs. Thus, combating the ever-increasing threat of cyberattacks targeting critical scientific cyberinfrastructure demands that high-performance, systematic fuzzing techniques be transitioned to today’s scientific software ecosystem.
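The surface-level-code problem can be illustrated with a toy example. The sketch below assumes a hypothetical binary format (`SCI1` magic bytes, a one-byte payload length, the payload, then a one-byte checksum). It is not drawn from PETSc, SciPy, or any real scientific format. Blind random inputs almost never get past the magic-byte check, whereas a generator that knows the format reaches the accepting branch every time. The hand-written `generate_from_spec` function is a hypothetical stand-in for the kind of input specification that would otherwise have to be recovered from the program itself.

```python
import random

MAGIC = b"SCI1"  # hypothetical magic bytes for a toy scientific file format


def parse(data: bytes) -> str:
    """Validate a toy record laid out as MAGIC | length n | n payload bytes | checksum."""
    if data[:4] != MAGIC:
        return "rejected: bad magic"
    if len(data) < 5:
        return "rejected: truncated header"
    n = data[4]
    if len(data) != 6 + n:
        return "rejected: bad length"
    payload = data[5:5 + n]
    if data[5 + n] != sum(payload) % 256:
        return "rejected: bad checksum"
    # Only inputs that reach this point would exercise the program's core logic.
    return "accepted"


def random_input(rng: random.Random) -> bytes:
    """Structure-unaware input: random bytes of random length."""
    return bytes(rng.randrange(256) for _ in range(rng.randrange(40)))


def generate_from_spec(rng: random.Random) -> bytes:
    """Structure-aware input built from a hand-written (hypothetical) format spec."""
    payload = bytes(rng.randrange(256) for _ in range(rng.randrange(32)))
    checksum = sum(payload) % 256
    return MAGIC + bytes([len(payload)]) + payload + bytes([checksum])


def acceptance_counts(trials: int = 10_000, seed: int = 1) -> tuple[int, int]:
    """Count how many blind vs. spec-aware inputs pass every validation check."""
    rng = random.Random(seed)
    blind = sum(parse(random_input(rng)) == "accepted" for _ in range(trials))
    aware = sum(parse(generate_from_spec(rng)) == "accepted" for _ in range(trials))
    return blind, aware
```

Calling `acceptance_counts()` returns a pair whose second element always equals the number of trials, while the first is expected to be near zero, because a random blob matches the four magic bytes with probability of roughly one in 2^32.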
This project will transition research in cybersecurity, software engineering, and systems to bring thorough, systematic vetting to scientific software:
- Cross-Language Instrumentation: We will adapt state-of-the-art code instrumentation platforms to enable effective feedback-guided fuzzing of software written in multiple programming languages, as well as the tracking of cross-language program events to accelerate vulnerability discovery.
- Automatic Interface Harnessing: We will introduce mutation testing to automate the synthesis of fuzzing interfaces that inject fuzzer-generated test data directly into programs’ core functionality, and we will integrate these systems into key software development platforms to enable proactive, pre-release security vetting.
- Input Specification Extraction: We will repurpose program analysis techniques to reconstruct the structure and semantics of scientific software’s complex input data formats, and we will redesign existing tooling to leverage these recovered input specifications for thorough auditing of scientific application code.
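Feedback-guided fuzzing, which the cross-language instrumentation work aims to extend across language boundaries, rests on a simple loop: mutate an input, run the instrumented program, and keep the input only if it reached code not seen before. The following is a minimal, single-language sketch of that loop. It assumes a hand-instrumented toy `target` (a hypothetical function) whose branch labels play the role of compiler-inserted coverage counters. Production fuzzers such as AFL++ and libFuzzer use far richer mutation strategies and coverage maps.

```python
import random


def target(data: bytes) -> set[str]:
    """Toy program under test, instrumented by hand to report which branches ran.

    Real tools insert this bookkeeping automatically; here each branch simply
    adds a label to the returned coverage set.
    """
    cov = {"entry"}
    if len(data) > 0 and data[0] == ord("F"):
        cov.add("b0")
        if len(data) > 1 and data[1] == ord("U"):
            cov.add("b1")
            if len(data) > 2 and data[2] == ord("Z"):
                cov.add("b2")
                if len(data) > 3 and data[3] == ord("Z"):
                    cov.add("bug")  # stands in for a crash deep in the code
    return cov


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed_input: bytes, max_iters: int = 1_000_000, seed: int = 0):
    """Coverage-guided mutation loop; seed_input must be non-empty.

    Returns (bug-triggering input or None, iterations used, corpus size).
    """
    rng = random.Random(seed)
    corpus = [seed_input]
    seen = set(target(seed_input))
    for i in range(max_iters):
        candidate = mutate(rng.choice(corpus), rng)
        cov = target(candidate)
        if "bug" in cov:
            return candidate, i + 1, len(corpus)
        if not cov <= seen:  # reached a new branch: keep this input as a seed
            seen |= cov
            corpus.append(candidate)
    return None, max_iters, len(corpus)
```

Calling `fuzz(b"\x00" * 4)` should return `b"FUZZ"` together with the iteration count and corpus size. Because coverage feedback lets the loop discover one byte at a time, the expected cost is on the order of ten thousand iterations, whereas guessing all four bytes at once would take on the order of 2^32 attempts.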
 
