GitHub Link: https://github.com/MrPumpkinsss/PolyU-Job-Post-Enhanced

PolyU-Job-Post-Enhanced

PolyU-Job-Post-Enhanced is a web project that crawls job posting data from the PolyU POSS website, enhances it with improved search functionality, and displays the results on my website: https://mrpumpkinsss.github.io/PolyU-Job-Post-Enhanced/.

Table of Contents

  • Project Overview
  • Features
  • Project Structure
  • Crawler Script (main.py)
  • How to Run main.py
  • Setup and Running
  • Deployment on GitHub Pages
  • Disclaimer
  • Dependencies and Resources
  • Contributing
  • License

Project Overview

This project extracts job posting information from the PolyU POSS website using a custom Python web crawler. After enhancing the search functionality (including multi-keyword search, case sensitivity options, and sortable columns), the job posting data is displayed in a user-friendly interface built with HTML, CSS, and JavaScript. The crawled data is saved in an Excel file (job_listings.xlsx) and is parsed on the client side using the SheetJS XLSX library.

Features

  • Data Collection
    The Python crawler logs into the PolyU POSS website, navigates through the job board, handles disclaimer pop-ups, and scrapes job posting data including company name, position, recruitment ID, post date, deadline, salary, job type, and detailed job information (see the record sketch after this list).

  • Global Search
    The web interface offers two search fields that enable live filtering of job listings, with an option for case-sensitive searches.

  • Job Type Filtering
    Users can filter job postings by job type via a dropdown menu (e.g., Graduate Position, Part-time Position, Temporary Position).

  • Sortable Data
    The summary table supports sorting by columns such as “Recruitment ID”, “Post Date”, and “Deadline”. Clicking a header toggles the sort order between ascending and descending.

  • Data Display
    • Left Panel: Displays a summary of job postings by merging key details (company name, position, salary, and job type) along with additional metadata like Recruitment ID, Post Date, and Deadline.
    • Right Panel: When a summary row is selected, detailed job information (formatted in HTML) is shown. The content is sanitized to remove images and certain SVG elements.

  • Disclaimer Modal
    Before accessing the job board data, users must acknowledge a disclaimer that outlines the data source (PolyU POSS), potential data delays or inaccuracies, and liability limitations.
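
A minimal sketch of what one scraped posting could look like as a plain Python dict. The keys mirror the fields named under Data Collection; the exact key spellings and the sample values are illustrative assumptions, not the script's actual schema:

    # One job posting as a dict (hypothetical field names and values;
    # they mirror the columns listed under "Data Collection").
    job = {
        "Company": "Example Engineering Ltd.",
        "Position": "Graduate Software Engineer",
        "Recruitment ID": "20250123",
        "Post Date": "2025-01-15",
        "Deadline": "2025-02-28",
        "Salary": "HK$18,000/month",
        "Job Type": "Graduate Position",
        "Details": "<p>About / Job Description / Skills & Requirements ...</p>",
    }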

Project Structure

PolyU-Job-Post-Enhanced/
├── index.html          # Main webpage that presents job data with search and filtering options.
├── main.py             # Python crawler script that scrapes job data from PolyU POSS and generates job_listings.xlsx.
├── job_listings.xlsx   # Excel file containing the scraped job data (generated by main.py).
└── README.md           # Project documentation file.

Crawler Script (main.py)

The main.py file is a Python script that uses Selenium WebDriver to crawl the PolyU POSS website. Its main functions, illustrated in the sketch after this list, include:

  1. Login Process:
    Navigates to the PolyU POSS login page (https://www40.polyu.edu.hk/poss/secure/login/loginhome.do) and submits user credentials.
    Note: Replace the placeholder username and password with your own credentials.

  2. Handling the Disclaimer:
    Waits for the disclaimer modal to appear, selects the necessary checkboxes, and clicks the “Continue” button to proceed.

  3. Scraping Job Details:
    Iterates over job postings (across multiple pages) to extract various details such as:
    • Company name, position, and recruitment ID.
    • Post date and deadline.
    • Salary and job type.
    • Additional job details (e.g., About, Job Description, Skills & Requirements, Application Methods, and Nature of Business).
  4. Data Saving:
    Collected job data is stored in a list and then saved to an Excel file (job_listings.xlsx) with adjusted column widths and row heights using pandas and xlsxwriter.
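
For orientation, here is a condensed sketch of that flow rather than the full script. The element locators (form field names, button text, checkbox selectors) are assumptions for illustration only and should be verified against the live POSS pages:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    import pandas as pd

    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 15)

    # 1. Login (the field names below are illustrative placeholders).
    driver.get("https://www40.polyu.edu.hk/poss/secure/login/loginhome.do")
    wait.until(EC.presence_of_element_located((By.NAME, "username"))).send_keys("YOUR_USERNAME")
    driver.find_element(By.NAME, "password").send_keys("YOUR_PASSWORD")
    driver.find_element(By.XPATH, "//input[@type='submit']").click()

    # 2. Disclaimer: tick every checkbox, then continue.
    for box in wait.until(EC.presence_of_all_elements_located(
            (By.CSS_SELECTOR, "input[type='checkbox']"))):
        if not box.is_selected():
            box.click()
    driver.find_element(By.XPATH, "//button[contains(., 'Continue')]").click()

    # 3. The real script iterates over postings and pages here, appending
    #    one dict per job (see the record sketch under Features) to `rows`.
    rows = []

    # 4. Save to Excel with adjusted column widths and row heights.
    df = pd.DataFrame(rows)
    with pd.ExcelWriter("job_listings.xlsx", engine="xlsxwriter") as writer:
        df.to_excel(writer, index=False, sheet_name="Jobs")
        ws = writer.sheets["Jobs"]
        ws.set_column(0, max(len(df.columns) - 1, 0), 30)  # widen every column
        ws.set_default_row(18)                             # taller default rows

    driver.quit()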

How to Run main.py

  1. Install Dependencies:

    Ensure you have the required Python packages installed:

    pip install selenium pandas xlsxwriter
    
  2. ChromeDriver:

    Make sure Chrome is installed and that your ChromeDriver version matches your browser version. ChromeDriver should be on your system PATH, or pointed to explicitly as in the sketch below.
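
    A minimal sketch of pointing Selenium at an explicit ChromeDriver binary when it is not on your PATH; the path below is an assumed example, and Selenium 4.6+ can also resolve a matching driver automatically:

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    # Example location only; adjust to wherever chromedriver lives on your machine.
    driver = webdriver.Chrome(service=Service("/usr/local/bin/chromedriver"))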

  3. Execute the Script:

    Run the script in your terminal:

    python3 main.py
    

    The script will log in to PolyU POSS, navigate through the job board, scrape job data, and save it as job_listings.xlsx.

Setup and Running

Local Setup for the Web Interface

  1. Clone the Repository:

    git clone https://github.com/MrPumpkinsss/PolyU-Job-Post-Enhanced.git
    cd PolyU-Job-Post-Enhanced
    
  2. Front-End Dependencies:
    The web interface loads its resources via CDN (e.g., SheetJS for parsing the Excel data), so no additional installation is required.

  3. Launch a Local Server:
    You can use any static HTTP server (e.g., Live Server in VS Code or http-server):

    npx http-server
    
  4. Access the Project:
    Open your browser and navigate to the provided URL (e.g., http://localhost:8080) to view the website.

Deployment on GitHub Pages

  1. Push the code to your GitHub repository.
  2. In the repository settings, activate GitHub Pages using either the main or gh-pages branch.
  3. The site will then be available at https://mrpumpkinsss.github.io/PolyU-Job-Post-Enhanced/.

Disclaimer

The information on this website is collected via web crawling from the PolyU POSS website; it is not provided directly by the original job posting institutions.

  • The data may be delayed or inaccurate and is intended for reference only.
  • Users should verify the original data or contact the publisher before making any decisions.
  • Neither this website nor its operators are liable for any inaccuracies or delays in the presented information.
  • Discrepancies due to updates on third-party websites are not the responsibility of this website.

Users must click the “Agree” button on the disclaimer modal before proceeding.

Dependencies and Resources

  • Selenium: drives Chrome for the Python crawler.
  • pandas and xlsxwriter: write the scraped data to job_listings.xlsx with adjusted column widths and row heights.
  • ChromeDriver: required by Selenium; must match your installed Chrome version.
  • SheetJS (XLSX): loaded via CDN to parse job_listings.xlsx on the client side.

Contributing

Contributions are welcome! If you have suggestions for improvements or encounter any issues, please feel free to open an issue or submit a pull request.

License

This project is open source and distributed under the MIT License.