Viper Compiler – Language Processors Project

Overview

The Viper Compiler is a complete academic project built for the Language Processors course at Universidad Carlos III de Madrid. It implements and integrates the front-end compiler phases (lexical, syntactic, and semantic analysis) in a modular design, with error handling in every phase, a simplified grammar, and additional pre-processing capabilities.

Core Components & Highlights

Lexical Analysis

  • Supports scientific notation and significant digit management.
  • Unified token handling for numeric values (hexadecimal, binary, octal, decimal).
  • Improved character analysis with extended ASCII validation.
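
As an illustration of unified numeric token handling, the sketch below recognizes the four integer bases and scientific notation with a single pattern and normalizes them to one token shape. The 0x / 0b / 0o prefixes, the INT / FLOAT token names, and the lex_number helper are assumptions made for this example; the project's actual lexer is built with PLY and may use different rules.

```python
import re

# Hypothetical literal syntax: 0x / 0b / 0o prefixes, plus decimal integers
# and floats with an optional exponent. Not taken from the Viper sources.
NUMBER_RE = re.compile(
    r"""
    (?P<hex>0[xX][0-9a-fA-F]+)
  | (?P<bin>0[bB][01]+)
  | (?P<oct>0[oO][0-7]+)
  | (?P<float>\d+\.\d+(?:[eE][+-]?\d+)?|\d+[eE][+-]?\d+)
  | (?P<int>\d+)
    """,
    re.VERBOSE,
)

BASES = {"hex": 16, "bin": 2, "oct": 8}


def lex_number(text):
    """Return (token_type, value) for a numeric literal, or None if invalid."""
    match = NUMBER_RE.fullmatch(text)
    if match is None:
        return None
    kind = match.lastgroup  # name of the alternative that matched
    if kind in BASES:
        # Strip the two-character prefix and convert using the right base.
        return ("INT", int(text[2:], BASES[kind]))
    if kind == "float":
        return ("FLOAT", float(text))
    return ("INT", int(text))
```

Every integer base ends up as the same ("INT", value) pair, so later phases never need to know how the literal was written.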

Syntax Analysis

  • Simplified and refined LL grammar to reduce complexity and ambiguity.
  • Explicitly restricts nested function and type definitions for clear scope management.
  • Comprehensive rules for error detection and reporting, avoiding cascading parsing errors.

Semantic Analysis

  • Object-oriented Python hierarchy for semantic evaluation (Expression, literals, function calls).
  • Type inference method (infer_type) for dynamic semantic checks and automatic conversions.
  • Rigorous scope checking for function-local variable declarations to avoid naming conflicts.

Extra Features & Capabilities

Local Scope Management

  • Scope checks confine local variables to the function in which they are declared.
  • A global variable's name can be re-declared inside a function without causing a scope conflict.
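
One way to picture this behavior is a symbol table with a global scope and a single function-local scope (nested functions are not allowed, so one local level suffices). The SymbolTable class and its method names below are hypothetical and are not taken from the project's code.

```python
class SymbolTable:
    """Two-level symbol table: globals plus one optional function scope."""

    def __init__(self):
        self.globals = {}
        self.locals = None  # becomes a dict while inside a function body

    def enter_function(self):
        self.locals = {}

    def exit_function(self):
        self.locals = None

    def declare(self, name, type_name):
        # Declarations go to the innermost active scope only.
        scope = self.locals if self.locals is not None else self.globals
        if name in scope:
            raise NameError(f"'{name}' is already declared in this scope")
        scope[name] = type_name

    def lookup(self, name):
        # Local declarations take precedence over globals of the same name.
        if self.locals is not None and name in self.locals:
            return self.locals[name]
        return self.globals.get(name)
```

Because declare only checks the innermost scope, re-declaring a global name inside a function succeeds, while declaring the same name twice in one scope is reported as a conflict.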

Enhanced Error Recovery

  • Analysis continues after lexical, syntactic, or semantic errors, so a single pass reports as many errors as possible.
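
The source does not say which recovery strategy the compiler uses. A common one is panic-mode recovery, sketched below on a toy grammar of NAME = NUMBER ; statements: on an error the parser records a message, discards tokens up to the next ';', and resumes. The grammar, the parse_statements function, and the message format are all assumptions made for this example.

```python
def parse_statements(tokens):
    """Parse `NAME = NUMBER ;` statements and return (statements, errors)."""
    statements, errors = [], []
    i = 0
    while i < len(tokens):
        chunk = tokens[i:i + 4]
        well_formed = (
            len(chunk) == 4
            and chunk[0].isidentifier()
            and chunk[1] == "="
            and chunk[2].isdigit()
            and chunk[3] == ";"
        )
        if well_formed:
            statements.append((chunk[0], int(chunk[2])))
            i += 4
            continue
        # Record one error for the whole bad statement instead of one per token.
        errors.append(f"syntax error in statement starting at token {i}: {tokens[i]!r}")
        # Panic mode: skip to the next ';' (the synchronization token).
        while i < len(tokens) and tokens[i] != ";":
            i += 1
        i += 1  # step past the ';' itself
    return statements, errors
```

Resynchronizing at the statement terminator is what keeps one malformed statement from producing a cascade of follow-on errors, while the valid statements around it are still analyzed.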

Preprocessing Functionality

Inspired by the C preprocessor, the compiler includes a pre-processing step with two directives:

  • %append <path>: Inserts external file contents at specified points.
  • %supplant <old> <new>: Performs global replacements in the source file before lexical analysis.
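
The two directives could be handled roughly as follows. This is a sketch under several assumptions that the description above does not settle: each directive occupies its own line, arguments are whitespace-separated, %supplant performs plain text replacement (including inside appended content), and appended files are not scanned for further directives. The preprocess function and its read_file parameter are hypothetical.

```python
from pathlib import Path


def preprocess(source, read_file=lambda path: Path(path).read_text()):
    """Expand %append and %supplant directives and return the new source text."""
    output, replacements = [], []
    for line in source.splitlines():
        parts = line.split()
        if parts[:1] == ["%append"] and len(parts) == 2:
            # Insert the external file's contents at this point.
            output.append(read_file(parts[1]))
        elif parts[:1] == ["%supplant"] and len(parts) == 3:
            # Remember the replacement; it is applied globally afterwards.
            replacements.append((parts[1], parts[2]))
        else:
            output.append(line)
    text = "\n".join(output)
    for old, new in replacements:
        text = text.replace(old, new)
    return text
```

Passing read_file as a parameter keeps the sketch testable without touching the file system; the result is what would then be handed to the lexer.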

Testing & Automation

  • Developed an extensive automated testing suite managed by Bash scripts.
  • Automatic generation and comparison of .token, .symbol, .record, and .error files.
  • Ensured reproducibility and correctness by standardizing test environments on Linux-based UC3M Virtual Classrooms.

Professional Skills Developed

  • Compiler Construction: Deep understanding of lexical, syntactic, and semantic analysis; error recovery; and language parsing techniques.
  • Python Expertise: Advanced usage of PLY (Python Lex-Yacc), object-oriented design, and scripting for automation.
  • Software Engineering Principles: Modularity, code reuse, automated testing, and documentation.

Challenges & Lessons Learned

This project showed how difficult it is to build an integrated compiler that detects and handles errors reliably. It required a modular design, careful parser engineering, and automated validation tests to maintain quality across development phases.

Despite constraints such as concurrent academic responsibilities, the final product is cohesive and meets all initial goals and requirements.

Future Directions

Potential improvements include refining error messages for better clarity, expanding pre-processing capabilities (conditional compilation, advanced macros), and integrating with backend code generation tools.

Access & Resources

  • Source Code & Documentation: Available upon request or through the GitHub Repository.
  • Detailed Usage Instructions: Included in the project documentation.