2025 PolyU-Job-Post-Enhanced 
GitHub Link: https://github.com/MrPumpkinsss/PolyU-Job-Post-Enhanced
PolyU-Job-Post-Enhanced is a web project that crawls job posting data from the PolyU POSS website, processes and enhances the search functionality, and then displays the improved results on my website: https://mrpumpkinsss.github.io/PolyU-Job-Post-Enhanced/.
Table of Contents
- Project Overview
- Features
- Project Structure
- Crawler Script (main.py)
- Setup and Running
- Disclaimer
- Dependencies and Resources
- Contributing
- License
Project Overview
This project extracts job posting information from the PolyU POSS website using a custom Python web crawler. After enhancing the search functionality (including multi-keyword search, case sensitivity options, and sortable columns), the job posting data is displayed in a user-friendly interface built with HTML, CSS, and JavaScript. The crawled data is saved in an Excel file (job_listings.xlsx) and is parsed on the client side using the SheetJS XLSX library.
Features
- Data Collection
The Python crawler logs into the PolyU POSS website, navigates through the job board, handles disclaimer pop-ups, and scrapes job posting data including company name, position, recruitment ID, post date, deadline, salary, job type, and detailed job information.
- Global Search
The web interface offers two search fields that enable live filtering of job listings, with an option for case-sensitive searches.
- Job Type Filtering
Users can filter job postings via a dropdown menu by job type (e.g., Graduate Position, Part-time Position, Temporary Position).
- Sortable Data
The summary table supports sorting by columns such as “Recruitment ID”, “Post Date”, and “Deadline”. Clicking a header toggles between ascending and descending order.
- Data Display
- Left Panel: Displays a summary of job postings by merging key details (company name, position, salary, and job type) along with additional metadata like Recruitment ID, Post Date, and Deadline.
- Right Panel: When a summary row is selected, detailed job information (formatted in HTML) is shown. The content is sanitized to remove images and certain SVG elements.
- Disclaimer Modal
Before accessing the job board data, users must acknowledge a disclaimer that outlines the data source (PolyU POSS), potential data delays or inaccuracies, and liability limitations.
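The search and filtering features are implemented client-side in JavaScript in index.html. As a language-neutral illustration of the matching logic, here is a Python sketch; the function names, field names, and the all-keywords-must-match semantics are assumptions for illustration, not the site's exact behavior:

```python
def matches(listing, keywords, case_sensitive=False):
    """Return True if every keyword appears somewhere in the listing's fields.

    `listing` is a dict of field name -> text (company, position, etc.).
    All keywords must be found (multi-keyword AND search).
    """
    haystack = " ".join(str(v) for v in listing.values())
    if not case_sensitive:
        haystack = haystack.lower()
        keywords = [k.lower() for k in keywords]
    return all(k in haystack for k in keywords)


def filter_listings(listings, query, case_sensitive=False):
    """Live filter: split the query into keywords and keep matching rows."""
    keywords = query.split()
    return [row for row in listings if matches(row, keywords, case_sensitive)]
```

For example, `filter_listings(rows, "graduate engineer")` keeps only rows whose combined fields contain both words; an empty query keeps everything.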
Project Structure
PolyU-Job-Post-Enhanced/
├── index.html # Main webpage that presents job data with search and filtering options.
├── main.py # Python crawler script that scrapes job data from PolyU POSS and generates job_listings.xlsx.
├── job_listings.xlsx # Excel file containing the scraped job data (generated by main.py).
└── README.md # Project documentation file.
Crawler Script (main.py)
The main.py script uses Selenium WebDriver to crawl the PolyU POSS website. Its main functions include:
- Login Process:
Navigates to the PolyU POSS login page (https://www40.polyu.edu.hk/poss/secure/login/loginhome.do) and submits user credentials.
Note: Replace the placeholder username and password with your own credentials.
- Handling the Disclaimer:
Waits for the disclaimer modal to appear, selects the required checkboxes, and clicks the “Continue” button to proceed.
- Scraping Job Details:
Iterates over job postings across multiple pages to extract details such as:
  - Company name, position, and recruitment ID.
  - Post date and deadline.
  - Salary and job type.
  - Additional job details (e.g., About, Job Description, Skills & Requirements, Application Methods, and Nature of Business).
- Data Saving:
Collected job data is stored in a list and then saved to an Excel file (job_listings.xlsx) with adjusted column widths and row heights using `pandas` and `xlsxwriter`.
How to Run main.py
- Install Dependencies:
Ensure you have the required Python packages installed:
```
pip install selenium pandas xlsxwriter
```
- ChromeDriver:
Make sure Chrome is installed and that your ChromeDriver version is compatible with your browser version. ChromeDriver must be on your system PATH.
- Execute the Script:
Run the script from your terminal:
```
python3 main.py
```
The script will log in to PolyU POSS, navigate through the job board, scrape the job data, and save it as job_listings.xlsx.
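The final saving step (an Excel file with adjusted column widths and row heights) can be sketched as follows, assuming pandas and xlsxwriter are installed. The sample record and the width heuristic are illustrative; main.py's actual column set and sizing may differ:

```python
import pandas as pd

# Illustrative record in the shape the crawler collects.
jobs = [
    {"Company": "Example Ltd", "Position": "Graduate Engineer",
     "Recruitment ID": "R12345", "Post Date": "2025-01-01",
     "Deadline": "2025-02-01", "Salary": "HK$20,000",
     "Job Type": "Graduate Position"},
]

df = pd.DataFrame(jobs)
# Overwrites job_listings.xlsx in the current directory.
with pd.ExcelWriter("job_listings.xlsx", engine="xlsxwriter") as writer:
    df.to_excel(writer, index=False, sheet_name="Jobs")
    worksheet = writer.sheets["Jobs"]
    # Widen each column to fit its longest value (plus padding).
    for i, col in enumerate(df.columns):
        width = max(df[col].astype(str).str.len().max(), len(col)) + 2
        worksheet.set_column(i, i, width)
    worksheet.set_default_row(18)  # taller rows for readability
```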
Setup and Running
Local Setup for the Web Interface
- Clone the Repository:
```
git clone https://github.com/mrpumpkinsss/Job-Post-Enhanced.git
cd Job-Post-Enhanced
```
- Front-End Dependencies:
The web interface loads its resources via CDNs (e.g., SheetJS for parsing Excel data), so no additional installation is required.
- Launch a Local Server:
Use any static HTTP server (e.g., Live Server in VSCode or http-server):
```
npx http-server
```
- Access the Project:
Open your browser and navigate to the served URL (e.g., `http://localhost:8080`) to view the website.
Deployment on GitHub Pages
- Push the code to your GitHub repository.
- In the repository settings, enable GitHub Pages from either the `main` or `gh-pages` branch.
- The site will then be available at https://mrpumpkinsss.github.io/Job-Post-Enhanced/.
Disclaimer
The information on this website is collected via web crawling from the PolyU POSS website; it is not provided directly by the original job posting institutions.
- The data may be delayed or inaccurate and is intended for reference only.
- Users should verify the original data or contact the publisher before making any decisions.
- Neither this website nor its operators are liable for any inaccuracies or delays in the presented information.
- Discrepancies due to updates on third-party websites are not the responsibility of this website.
Users must click the “Agree” button on the disclaimer modal before proceeding.
Dependencies and Resources
- SheetJS XLSX Library:
Loaded via CDN: https://cdnjs.cloudflare.com/ajax/libs/xlsx/0.18.5/xlsx.full.min.js
- Google Fonts:
The project uses the Roboto font family, loaded from https://fonts.googleapis.com/css2?family=Roboto:wght@400;500;700&display=swap
- Selenium WebDriver:
Used in `main.py` to automate browser interactions on PolyU POSS.
Contributing
Contributions are welcome! If you have suggestions for improvements or encounter any issues, please feel free to open an issue or submit a pull request.
License
This project is open source and distributed under the MIT License.
