Foundit Jobs Scraper πŸ”

Foundit Jobs Scraper helps you collect structured job listing data from Foundit.in search results, including role details, company profiles, and recruiter metadata.
It’s built for teams who need reliable job market data at scaleβ€”great for analytics, research, and recruitment workflows.

Telegram Β  WhatsApp Β  Gmail Β  Website

Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you're looking for a foundit-jobs-scraper, you've just found your team. Let's Chat. πŸ‘†πŸ‘†

Introduction

This project scrapes job listings from Foundit.in (formerly Monster India) and returns clean, structured records per job.
It solves the problem of manually collecting and normalizing job data across multiple search result pages.
It’s designed for analysts, recruiters, founders, and developers who need repeatable job data extraction for reporting or monitoring.

Built for job market data workflows

  • Supports multiple Foundit search URLs in one run for batch collection.
  • Extracts job, company, and recruiter fields into a consistent schema.
  • Includes proxy support plus anti-blocking and retry controls for stability.
  • Streams results to storage as they’re processed to reduce memory pressure.
  • Lets you cap collection volume with a configurable maxItems limit.
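
The options above can be combined in a single run input. As a sketch, a Python dict using the `searchUrls` and `maxItems` fields described in this README might look like the following (the `validate_input` helper is a hypothetical illustration, not part of the project's API):

```python
import json

# Hypothetical run input: multiple Foundit search URLs in one batch,
# plus a maxItems cap to bound runtime and dataset size.
run_input = {
    "searchUrls": [
        "https://www.foundit.in/srp/results?query=ai&locations=Bengaluru",
        "https://www.foundit.in/srp/results?query=data+engineer&locations=Pune",
    ],
    "maxItems": 500,
}

def validate_input(data: dict) -> dict:
    """Basic sanity checks before a run starts (illustrative only)."""
    urls = data.get("searchUrls", [])
    if not urls:
        raise ValueError("searchUrls must contain at least one Foundit search URL")
    max_items = data.get("maxItems")
    if max_items is not None and max_items <= 0:
        raise ValueError("maxItems must be a positive integer")
    return data

print(json.dumps(validate_input(run_input), indent=2))
```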

Features

| Feature | Description |
| --- | --- |
| Multi-URL batch scraping | Provide multiple search result URLs and collect jobs across them in one run. |
| Structured output schema | Returns normalized fields for job details, company info, recruiter, and URLs. |
| Configurable item limit | Use maxItems to control runtime, cost, and dataset size. |
| Proxy rotation support | Runs with proxy configuration to reduce blocks and improve reach. |
| Anti-blocking safeguards | Adds rate limiting, retries, and resilient navigation logic. |
| Real-time processing | Writes results progressively to avoid large in-memory batches. |
| Error recovery | Retries failed pages and continues safely when partial failures occur. |

What Data This Scraper Extracts

| Field Name | Field Description |
| --- | --- |
| searchUrl | The Foundit search URL used to discover the job listing. |
| jobId | Unique job identifier from the platform. |
| title | Job title as displayed in the listing. |
| locations | Job location(s) (single or multiple cities/regions). |
| experience.min | Minimum years of experience required (when available). |
| experience.max | Maximum years of experience required (when available). |
| salary.currency | Salary currency (e.g., INR) when present. |
| salary.isConfidential | Indicates whether salary is hidden/confidential. |
| company.name | Company name posting the job. |
| company.profile | Company description/profile text (when available). |
| company.id | Company identifier on the platform. |
| postingDetails.createdAt | Original posting timestamp (ISO format when available). |
| postingDetails.updatedAt | Relative/absolute last update information (when available). |
| postingDetails.closedAt | Closing date/time when listed (if provided). |
| postingDetails.totalApplicants | Total applicants count shown on the listing (if available). |
| jobDetails.industries | Industry categories associated with the job. |
| jobDetails.functions | Functional categories (e.g., IT, Sales). |
| jobDetails.roles | Role categories/titles mapped by the platform. |
| jobDetails.employmentTypes | Employment type (e.g., Full time). |
| jobDetails.skills | Skills text/keywords (comma-separated or free-form). |
| jobDetails.designations | Designations associated with the role. |
| recruiter.id | Recruiter identifier (if provided). |
| recruiter.name | Recruiter name (if provided). |
| urls.jobUrl | Direct job listing URL. |
| urls.companyUrl | Company jobs/career page URL. |
| status.isUrgentHiring | Whether the listing is flagged as urgent hiring. |
| status.isHotJob | Whether the listing is flagged as a hot job. |
| status.quickApply | Whether quick apply is enabled. |
| status.activeJob | Whether the job appears active/open. |

Example Output

[
  {
    "searchUrl": "https://www.foundit.in/srp/results?query=ai&locations=Bengaluru+%2F+Bangalore&searchId=3c714d81-f4d1-4031-b35e-c86bf504caf8",
    "jobId": 28333095,
    "title": "Gen AI Developer",
    "locations": "Bengaluru, Hyderabad",
    "experience": { "min": 8, "max": 12 },
    "salary": { "currency": "INR", "isConfidential": false },
    "company": {
      "name": "Birlasoft Limited",
      "profile": "Birlasoft, a global leader at the forefront of Cloud, AI, and Digital technologies...",
      "id": 776562
    },
    "postingDetails": {
      "createdAt": "2024-04-12T11:05:25.000Z",
      "updatedAt": "6 days ago",
      "closedAt": "2025-04-18T18:30:00.000Z",
      "totalApplicants": 369
    },
    "jobDetails": {
      "industries": ["IT/Computers - Software"],
      "functions": ["IT"],
      "roles": ["Software Engineer/Programmer", "Team Leader/Technical Leader"],
      "employmentTypes": ["Full time"],
      "skills": "Gen AI Developer, Gen AI LLM Data science,Data Science, Machine Learning",
      "designations": ["Software Engineer/Programmer", "Team Leader/Technical Leader"]
    },
    "recruiter": { "id": 1191711, "name": "Nitu Kumari" },
    "urls": {
      "jobUrl": "https://www.foundit.in/job/gen-ai-developer-birlasoft-limited-bengaluru-bangalore-hyderabad-secunderabad-telangana-28333095",
      "companyUrl": "https://www.foundit.in/search/birlasoft-limited-776562-jobs-career"
    },
    "status": {
      "isUrgentHiring": false,
      "isHotJob": false,
      "quickApply": true,
      "activeJob": true
    }
  }
]
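
Exported records like the one above are plain JSON and easy to post-process. A minimal filtering sketch (the record shape follows the Example Output; nothing here is part of the scraper itself):

```python
import json

# One record shaped like the Example Output above (truncated for brevity).
records = json.loads("""[
  {
    "jobId": 28333095,
    "title": "Gen AI Developer",
    "salary": {"currency": "INR", "isConfidential": false},
    "status": {"quickApply": true, "activeJob": true}
  }
]""")

# Keep only active listings with a disclosed salary and quick apply enabled.
shortlist = [
    r for r in records
    if r["status"]["activeJob"]
    and r["status"]["quickApply"]
    and not r["salary"]["isConfidential"]
]
print([r["title"] for r in shortlist])
```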

Directory Structure Tree

foundit-jobs-scraper/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ main.py
β”‚   β”œβ”€β”€ runner.py
β”‚   β”œβ”€β”€ crawlers/
β”‚   β”‚   β”œβ”€β”€ foundit_search_crawler.py
β”‚   β”‚   └── foundit_job_crawler.py
β”‚   β”œβ”€β”€ extractors/
β”‚   β”‚   β”œβ”€β”€ job_parser.py
β”‚   β”‚   β”œβ”€β”€ company_parser.py
β”‚   β”‚   β”œβ”€β”€ recruiter_parser.py
β”‚   β”‚   └── schema_normalizer.py
β”‚   β”œβ”€β”€ net/
β”‚   β”‚   β”œβ”€β”€ http_client.py
β”‚   β”‚   β”œβ”€β”€ proxy_manager.py
β”‚   β”‚   └── rate_limiter.py
β”‚   β”œβ”€β”€ storage/
β”‚   β”‚   β”œβ”€β”€ dataset_writer.py
β”‚   β”‚   └── exporters.py
β”‚   β”œβ”€β”€ config/
β”‚   β”‚   β”œβ”€β”€ defaults.json
β”‚   β”‚   └── logging.yaml
β”‚   └── utils/
β”‚       β”œβ”€β”€ dates.py
β”‚       β”œβ”€β”€ text.py
β”‚       └── retry.py
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ inputs.sample.json
β”‚   └── output.sample.json
β”œβ”€β”€ tests/
β”‚   β”œβ”€β”€ test_job_parser.py
β”‚   β”œβ”€β”€ test_schema_normalizer.py
β”‚   └── fixtures/
β”‚       └── job_page_sample.html
β”œβ”€β”€ .env.example
β”œβ”€β”€ .gitignore
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ pyproject.toml
β”œβ”€β”€ LICENSE
└── README.md

Use Cases

  • Recruiters use it to track role availability across cities, so they can prioritize outreach and sourcing faster.
  • Market researchers use it to analyze hiring demand by industry and function, so they can publish trend insights with real numbers.
  • HR teams use it to benchmark titles and experience ranges, so they can calibrate job leveling and compensation discussions.
  • Founders and operators use it to monitor competitor hiring, so they can infer growth signals and team expansion plans.
  • Data teams use it to feed dashboards and reports, so they can keep hiring analytics up to date with minimal manual work.

FAQs

How do I provide multiple searches in one run?
Add multiple entries to searchUrls in the input. The scraper will iterate through each URL and merge results into one dataset, while still preserving the original searchUrl field per job for traceability.

What happens if some jobs don’t show salary or applicants?
The output schema remains consistent, but optional fields may be missing or set to null depending on availability. Salary often appears as confidential; this is captured using salary.isConfidential so you can filter those listings later.
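
Since those optional fields can be missing or null, downstream code should read them defensively. A sketch using `dict.get` (the record shape follows the Example Output; `salary_label` is a hypothetical helper):

```python
def salary_label(record: dict) -> str:
    """Render a human-readable salary note, tolerating missing/null fields."""
    salary = record.get("salary") or {}  # handles both absent and null salary
    if salary.get("isConfidential"):
        return "confidential"
    currency = salary.get("currency")
    return currency if currency else "not listed"

print(salary_label({"salary": {"currency": "INR", "isConfidential": False}}))  # INR
print(salary_label({"salary": {"isConfidential": True}}))  # confidential
print(salary_label({}))  # not listed
```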

How does it handle blocking, rate limits, or timeouts?
It uses configurable rate limiting, automatic retries with backoff, and proxy support to reduce blocks. If a page fails after retries, it logs the error and continues, preventing one bad page from stopping the full batch.
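
The retry-with-backoff behaviour described here can be sketched in a few lines. This helper is illustrative only (it is not the project's actual retry.py), assuming a caller-supplied `fetch` function:

```python
import random
import time

def fetch_with_retries(fetch, url, attempts=3, base_delay=1.0):
    """Call fetch(url), retrying with exponential backoff plus jitter.

    On repeated failure the error is re-raised so the caller can log it
    and continue with the next page instead of aborting the whole batch.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == attempts - 1:
                raise
            # Delays grow as base_delay * 1, 2, 4, ... with a little jitter
            # so concurrent workers don't retry in lockstep.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```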

Does it extract details from individual job pages or only from results pages?
It’s designed to collect comprehensive fields (job, company, recruiter, and status), which typically requires visiting job detail pages for completeness. If a field is not present on the page or is restricted, it will not be fabricated.


Performance Benchmarks and Results

Primary Metric: Average throughput of 18–35 job records per minute when scraping 1–3 search URLs with detail-page enrichment enabled and moderate rate limiting.

Reliability Metric: 96–99% successful job detail extraction on stable connections when using rotating proxies and automatic retries (3 attempts with exponential backoff).

Efficiency Metric: Memory footprint stays under ~250–450 MB for runs of 1,000 jobs due to streaming writes and minimal in-memory buffering.
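
Streaming writes of this kind are commonly implemented as append-only JSON Lines, so only one record is held in memory at a time. A minimal sketch (not the project's actual dataset_writer.py):

```python
import json

def write_jsonl(records, path):
    """Append records one per line so memory use stays flat for large runs."""
    with open(path, "a", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
```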

Quality Metric: Typical field completeness of 85–95% for core fields (title, jobId, locations, company, urls), with optional fields (salary, applicants, recruiter) varying based on listing availability.

Book a Call Watch on YouTube

Review 1

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time."

Nathan Pennington
Marketer
β˜…β˜…β˜…β˜…β˜…

Review 2

"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on."

Eliza
SEO Affiliate Expert
β˜…β˜…β˜…β˜…β˜…

Review 3

"Exceptional results, clear communication, and flawless delivery.
Bitbash nailed it."

Syed
Digital Strategist
β˜…β˜…β˜…β˜…β˜