Information Extraction

Pilot Schedule Optimization Tool
Client
Self Development
Year
2025
Category
Natural Language Processing (NLP)
Service
Optimizing Operations
Information Extraction
Tools / Languages Used
  • Python (pdfminer, Pandas, NumPy)
  • Excel for data exploration and filtering
  • Regex for text parsing
  • Jupyter Notebook for development
Technical Skills
  • PDF scraping and information extraction
  • Data cleaning and transformation
  • Combinatorial logic and constraint-based filtering
  • Automation and data structuring for decision support
Soft Skills
  • Translating a real-world manual process into an automated solution
  • Problem-solving and creative application of data skills
  • Clear documentation and user-oriented design
  • Collaboration and communication with the end user (pilot scheduling context)
Step 1: Exploratory Data Analysis
  • Reviewed monthly pilot bid packets provided in PDF format, each containing hundreds of possible flight pairings with dates, destinations, duty hours, and rest periods.
  • Identified challenges: the PDFs were inconsistently formatted, and the data was embedded in text blocks and not easily searchable.
  • Determined the need for a structured dataset to filter flights by date and hour constraints.
Step 2: Solution Design
  • Built a Python scraper using pdfminer and regex patterns to extract flight number, departure/arrival times, duty hours, and layover details from the PDFs.
  • Structured the extracted data into a clean, tabular format using Pandas, then exported to Excel for accessibility.
  • Designed logic to:
    • Exclude flights overlapping specific “off” dates.
    • Ensure compliance with FAA rest period rules (e.g., minimum 10 hours rest).
    • Meet required monthly duty hour minimums.
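The extraction and off-date filtering described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the line layout in SAMPLE_TEXT, the regex PAIRING_RE, and the helper names parse_pairings and overlaps_off_dates are all assumptions made for the example, and real bid packets will need their own patterns.

```python
import re
from datetime import date, datetime

# Hypothetical line layout; a real bid packet will differ and need its own pattern.
PAIRING_RE = re.compile(
    r"PAIRING\s+(?P<pairing>\d+)\s+"
    r"DEP\s+(?P<dep>\d{4}-\d{2}-\d{2} \d{2}:\d{2})\s+"
    r"ARR\s+(?P<arr>\d{4}-\d{2}-\d{2} \d{2}:\d{2})\s+"
    r"DUTY\s+(?P<duty>\d+(?:\.\d+)?)\s+"
    r"LAYOVER\s+(?P<layover>[A-Z]{3})"
)

TIME_FORMAT = "%Y-%m-%d %H:%M"


def parse_pairings(text):
    """Turn raw extracted text into a list of row dicts, one per matched pairing."""
    rows = []
    for m in PAIRING_RE.finditer(text):
        rows.append({
            "pairing": m["pairing"],
            "dep": datetime.strptime(m["dep"], TIME_FORMAT),
            "arr": datetime.strptime(m["arr"], TIME_FORMAT),
            "duty_hours": float(m["duty"]),
            "layover": m["layover"],
        })
    return rows


def overlaps_off_dates(row, off_dates):
    """True if any requested day off falls inside the pairing's date span."""
    return any(row["dep"].date() <= d <= row["arr"].date() for d in off_dates)


# Made-up sample text standing in for the text extracted from a PDF.
SAMPLE_TEXT = """
PAIRING 1042  DEP 2025-03-04 06:15  ARR 2025-03-06 18:40  DUTY 14.5  LAYOVER DEN
some unrelated header text
PAIRING 1077  DEP 2025-03-10 09:00  ARR 2025-03-11 21:30  DUTY 11.0  LAYOVER ORD
"""

rows = parse_pairings(SAMPLE_TEXT)
off_dates = {date(2025, 3, 5)}
available = [r for r in rows if not overlaps_off_dates(r, off_dates)]
print([r["pairing"] for r in available])  # ['1077']
```

In a real run, the text would likely come from pdfminer (for example `pdfminer.high_level.extract_text` in pdfminer.six), and the row dicts could be loaded with `pandas.DataFrame(rows)` and written out with `DataFrame.to_excel`. Named groups keep the pattern readable when a packet's layout changes and a field has to be adjusted.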
Step 3: Model Assessment
  • Validated extracted data against sample manual entries for accuracy (>95% match).
  • Tested multiple scenarios (e.g., different off-day combinations, partial weeks off).
  • Ensured results returned valid flight pairings that met all rest and hour requirements.
  • Added flags in Excel for quick visual validation (e.g., highlighting conflicts or insufficient hours).
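One way to compute a match rate like the one mentioned above is a field-by-field comparison between extracted rows and manually entered reference rows. The sketch below assumes that approach; the function names match_rate and mismatches and the sample records are hypothetical and invented for illustration, not taken from the project.

```python
def match_rate(extracted, manual, fields):
    """Share of manually entered field values that the extraction reproduced."""
    by_id = {row["pairing"]: row for row in extracted}
    total = 0
    matched = 0
    for ref in manual:
        got = by_id.get(ref["pairing"], {})
        for field in fields:
            total += 1
            if got.get(field) == ref[field]:
                matched += 1
    return matched / total if total else 0.0


def mismatches(extracted, manual, fields):
    """List (pairing, field) pairs where extraction and manual entry disagree."""
    by_id = {row["pairing"]: row for row in extracted}
    return [
        (ref["pairing"], field)
        for ref in manual
        for field in fields
        if by_id.get(ref["pairing"], {}).get(field) != ref[field]
    ]


# Made-up records: the second extracted row has a wrong layover code.
extracted = [
    {"pairing": "1042", "duty_hours": 14.5, "layover": "DEN"},
    {"pairing": "1077", "duty_hours": 11.0, "layover": "ORB"},
]
manual = [
    {"pairing": "1042", "duty_hours": 14.5, "layover": "DEN"},
    {"pairing": "1077", "duty_hours": 11.0, "layover": "ORD"},
]
fields = ["pairing", "duty_hours", "layover"]

rate = match_rate(extracted, manual, fields)
print(f"{rate:.1%}")                          # 83.3%
print(mismatches(extracted, manual, fields))  # [('1077', 'layover')]
```

Listing the individual mismatches alongside the overall rate points directly at the regex pattern or field that needs attention.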
Step 4: Results / How It’s Used
  • The final tool automatically outputs flight combinations that satisfy personal preferences and scheduling constraints.
  • Reduced the manual schedule selection process from several hours to just minutes.
  • Provided flexibility to test multiple “what-if” scenarios quickly (e.g., alternative days off).
  • The approach can be adapted for other pilots with similar scheduling constraints or even scaled into a user interface in the future.
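The combination search behind these results could look something like the brute-force sketch below. The write-up does not say how combinations are enumerated, so itertools.combinations is an assumption here, as are the names is_valid and valid_schedules, the sample pairings, and the 25-hour minimum. Only the 10-hour rest figure comes from the text above.

```python
from datetime import datetime, timedelta
from itertools import combinations

MIN_REST = timedelta(hours=10)  # rest figure quoted in the write-up


def is_valid(combo, min_monthly_hours):
    """Check rest gaps between consecutive pairings and the total duty hours."""
    ordered = sorted(combo, key=lambda p: p["dep"])
    for prev, nxt in zip(ordered, ordered[1:]):
        # A negative gap means the pairings overlap, which also fails this test.
        if nxt["dep"] - prev["arr"] < MIN_REST:
            return False
    return sum(p["duty_hours"] for p in ordered) >= min_monthly_hours


def valid_schedules(pairings, min_monthly_hours, max_size=None):
    """Yield every combination of pairings that satisfies the constraints."""
    max_size = max_size or len(pairings)
    for k in range(1, max_size + 1):
        for combo in combinations(pairings, k):
            if is_valid(combo, min_monthly_hours):
                yield combo


# Made-up pairings; A and B are only 8 hours apart, so they cannot be combined.
pairings = [
    {"pairing": "A", "dep": datetime(2025, 3, 4, 6), "arr": datetime(2025, 3, 5, 18), "duty_hours": 14.0},
    {"pairing": "B", "dep": datetime(2025, 3, 6, 2), "arr": datetime(2025, 3, 7, 12), "duty_hours": 12.0},
    {"pairing": "C", "dep": datetime(2025, 3, 8, 8), "arr": datetime(2025, 3, 9, 20), "duty_hours": 13.0},
]

schedules = [[p["pairing"] for p in combo] for combo in valid_schedules(pairings, 25)]
print(schedules)  # [['A', 'C'], ['B', 'C']]
```

Exhaustive enumeration grows exponentially with the number of pairings, so it is only practical after the off-date filter has narrowed the candidates or with a cap on max_size. A "what-if" scenario then amounts to re-running the search with a different set of off dates or a different hour minimum.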