LinkedIn Business Niches Scraper

LinkedIn Business Niches Scraper helps you discover businesses in specific cities and industries by combining targeted Google queries with LinkedIn result parsing. It turns messy search results into clean, structured business lead data you can filter, enrich, and use for outreach or market research. It is built for teams that need fast niche discovery without manual searching.

Created by Bitbash, built to showcase our approach to Scraping and Automation!

Introduction

This project finds and structures business leads by location + category using Google-to-LinkedIn discovery flows.
It removes the time wasted on manually searching, opening results, and copying details into spreadsheets.
It’s designed for growth teams, agencies, recruiters, and analysts who need reliable niche lead discovery at scale.

City + Industry Business Discovery

Generates search queries that combine city + niche keywords to surface relevant business profiles.
Visits and evaluates results with browser automation to handle dynamic pages and redirects.
Extracts consistent business identifiers (name, profile URL, website, category hints) for clean exports.
Supports pagination and throttling controls to keep runs stable and repeatable.
Produces structured datasets ready for enrichment, outreach, or analytics workflows.
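
The query-generation step above can be pictured with a minimal sketch. The helper names `buildQuery` and `buildSearchUrl` are hypothetical and are not taken from `queryBuilder.js`; the `site:linkedin.com/company` filter mirrors the sample output in this README, and Google's `start` offset is assumed to be how pagination is driven.

```javascript
// Hypothetical helper: compose a "niche in City" query limited to LinkedIn company pages.
function buildQuery(city, category) {
  const cleanCity = city.trim();
  const cleanCategory = category.trim().toLowerCase();
  return `${cleanCategory} in ${cleanCity} site:linkedin.com/company`;
}

// Hypothetical helper: turn a query into a Google search URL.
// `page` is zero-based; pages after the first add a `start` result offset.
function buildSearchUrl(query, page = 0, pageSize = 10) {
  const params = new URLSearchParams({ q: query });
  if (page > 0) params.set("start", String(page * pageSize));
  return `https://www.google.com/search?${params.toString()}`;
}

const query = buildQuery("Austin", "Marketing Agency");
const firstPageUrl = buildSearchUrl(query);
const thirdPageUrl = buildSearchUrl(query, 2);
```

Keeping query composition separate from URL construction makes it easy to unit-test each piece and to swap the site filter later.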

Features

Feature	Description
City + niche query builder	Automatically composes Google queries for specific industries/categories within a target city.
Headless browser crawling	Uses browser automation to navigate JavaScript-heavy pages and capture resolved URLs.
Result deduplication	Prevents repeated businesses across pages and keywords with normalized URL + name matching.
Configurable proxy support	Supports rotating proxies and session settings to reduce blocks and improve consistency.
Pagination controls	Limits pages per query and total results to keep runs within predictable bounds.
Structured exports	Outputs consistent JSON objects suitable for CSV/Sheets/CRM imports.
Error handling + retries	Retries transient failures and tracks per-request outcomes for reliability.
Performance tuning knobs	Concurrency, delays, and per-domain rate limits for balanced throughput.
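
As an illustration of the "Error handling + retries" row, here is a minimal sketch of retrying a transient failure with exponential backoff while recording per-attempt outcomes. The names `withRetries` and `backoffDelayMs`, the default delays, and the shape of the outcome log are all assumptions made for this example, not the project's actual implementation.

```javascript
// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run an async task up to `maxAttempts` times, logging the outcome of every attempt.
// `wait` is injectable so tests can skip real delays.
async function withRetries(task, { maxAttempts = 3, baseMs = 500, wait = sleep } = {}) {
  const outcomes = [];
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const value = await task(attempt);
      outcomes.push({ attempt, ok: true });
      return { value, outcomes };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      outcomes.push({ attempt, ok: false, error: message });
      if (attempt === maxAttempts - 1) {
        // Out of attempts: surface the failure together with the attempt log.
        throw Object.assign(new Error(`failed after ${maxAttempts} attempts: ${message}`), { outcomes });
      }
      await wait(backoffDelayMs(attempt, baseMs));
    }
  }
}
```

A caller would wrap each page fetch, for example `withRetries(() => fetchPage(url))`, where `fetchPage` stands in for whatever navigation function the crawler uses.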

What Data This Scraper Extracts

Field Name	Field Description
query	The generated search query (city + niche) used for discovery.
city	Target city used in the query.
category	Target industry/category keyword used in the query.
businessName	Business or organization name detected from result/title.
linkedinUrl	Resolved LinkedIn business/profile URL.
website	Business website if found on the profile or in result snippets.
industry	Industry label when available (from page text or metadata hints).
locationText	Location string found on the page (city/region when available).
followerCount	Follower count when available on the profile.
employeeCountRange	Employee size range when available.
description	Short description/about snippet if available.
sourceUrl	The page URL where the record was discovered (search result or profile).
resultRank	Rank/position in the search results page.
collectedAt	ISO timestamp when the record was collected.

Example Output

[
  {
    "query": "marketing agency in Austin site:linkedin.com/company",
    "city": "Austin",
    "category": "marketing agency",
    "businessName": "BrightPath Marketing",
    "linkedinUrl": "https://www.linkedin.com/company/brightpath-marketing/",
    "website": "https://brightpathmarketing.com",
    "industry": "Marketing Services",
    "locationText": "Austin, Texas, United States",
    "followerCount": 8421,
    "employeeCountRange": "11-50",
    "description": "Performance-focused growth partner for B2B and local brands.",
    "sourceUrl": "https://www.google.com/search?q=marketing+agency+in+Austin+site%3Alinkedin.com%2Fcompany",
    "resultRank": 3,
    "collectedAt": "2025-12-12T12:10:31.492Z"
  }
]
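
Records shaped like the sample above can be flattened for CSV or spreadsheet imports. The following is a minimal sketch, assuming a hypothetical `toCsv` helper and an arbitrary column selection; quoting follows the usual CSV convention of wrapping cells that contain commas, quotes, or newlines and doubling embedded quotes.

```javascript
// Illustrative column choice; any subset of the output fields works.
const COLUMNS = ["businessName", "linkedinUrl", "website", "city", "category", "followerCount", "employeeCountRange"];

// Render one cell: null/undefined become empty, risky characters trigger quoting.
function csvCell(value) {
  if (value === null || value === undefined) return "";
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build a CSV string with a header row followed by one row per record.
function toCsv(records, columns = COLUMNS) {
  const header = columns.join(",");
  const rows = records.map((record) => columns.map((col) => csvCell(record[col])).join(","));
  return [header, ...rows].join("\r\n");
}
```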

Directory Structure Tree

LinkedIn Business Niches/
├── src/
│   ├── main.js
│   ├── config/
│   │   ├── defaults.js
│   │   └── schema.json
│   ├── crawlers/
│   │   ├── googleCrawler.js
│   │   └── linkedinCrawler.js
│   ├── extractors/
│   │   ├── parseGoogleResults.js
│   │   ├── parseLinkedinProfile.js
│   │   └── normalize.js
│   ├── services/
│   │   ├── queryBuilder.js
│   │   ├── dedupeStore.js
│   │   ├── rateLimiter.js
│   │   └── logger.js
│   ├── storage/
│   │   ├── datasetWriter.js
│   │   └── stateStore.js
│   └── utils/
│       ├── validators.js
│       ├── urls.js
│       └── time.js
├── input/
│   ├── INPUT.schema.json
│   └── INPUT.example.json
├── data/
│   ├── sample.output.json
│   └── sample.keywords.txt
├── tests/
│   ├── normalize.test.js
│   ├── queryBuilder.test.js
│   └── parsers.test.js
├── .gitignore
├── package.json
├── package-lock.json
└── README.md

Use Cases

Growth agencies use it to build niche lead lists by city, so they can launch faster outbound campaigns with cleaner targeting.
B2B sales teams use it to discover local businesses in specific categories, so they can fill pipelines without manual prospecting.
Market researchers use it to map competitors in a region, so they can analyze category density and positioning.
Recruiters use it to find companies in certain niches, so they can identify targets for candidate outreach.
Local SEO consultants use it to collect business profiles at scale, so they can prioritize outreach and partnership opportunities.

FAQs

How do I control which cities and niches are searched?
Provide a list of cities and category keywords in the input. The tool generates combinations (or uses your explicit pairs) to create consistent query coverage. For tighter targeting, limit the number of categories per city and cap pages per query.
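
A minimal sketch of how the input might expand into search targets. The input keys `cities`, `categories`, `pairs`, and `maxCategoriesPerCity` are assumptions for illustration; check `INPUT.schema.json` for the real field names.

```javascript
// Expand input into { city, category } targets.
// Explicit `pairs` take precedence; otherwise use the cross product of
// `cities` x `categories`, keeping at most `maxCategoriesPerCity` per city.
function expandTargets({ cities = [], categories = [], pairs = [], maxCategoriesPerCity = Infinity } = {}) {
  if (pairs.length > 0) {
    return pairs.map(({ city, category }) => ({ city, category }));
  }
  const targets = [];
  for (const city of cities) {
    for (const category of categories.slice(0, maxCategoriesPerCity)) {
      targets.push({ city, category });
    }
  }
  return targets;
}

const targets = expandTargets({
  cities: ["Austin", "Denver"],
  categories: ["marketing agency", "law firm", "dentist"],
  maxCategoriesPerCity: 2,
});
```

With two cities and a cap of two categories, this yields four targets, which keeps run size predictable before pagination limits are applied.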

What if LinkedIn pages don’t show all fields (followers, employee range, website)?
Not every profile exposes the same fields. The scraper extracts what’s available and leaves missing fields as null. For enrichment, you can later merge outputs with your CRM or a separate website/contacts enrichment workflow.
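
To show what "leaves missing fields as null" could mean in practice, here is a minimal sketch that pads a partial record to the full field list from this README. The `normalizeRecord` name and the choice to treat empty strings as missing are assumptions.

```javascript
// Field list taken from the "What Data This Scraper Extracts" table.
const FIELDS = [
  "query", "city", "category", "businessName", "linkedinUrl", "website",
  "industry", "locationText", "followerCount", "employeeCountRange",
  "description", "sourceUrl", "resultRank", "collectedAt",
];

// Return a record with every field present; absent or empty values become null.
function normalizeRecord(partial) {
  const record = {};
  for (const field of FIELDS) {
    const value = partial[field];
    record[field] = value === undefined || value === "" ? null : value;
  }
  return record;
}

const padded = normalizeRecord({ businessName: "BrightPath Marketing", website: "" });
```

Emitting a fixed set of keys keeps downstream CSV columns and CRM mappings stable even when a profile hides some details.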

How do I reduce duplicates across keywords and pages?
Deduplication is applied using normalized LinkedIn URLs and cleaned business names. If you’re using very similar keywords, keep a shared dedupe store enabled so that a business found under multiple queries appears only once.
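
The sketch below illustrates one way such a store could work. It is an assumption-laden example rather than the contents of `dedupeStore.js`: it keys on the normalized LinkedIn URL when one is present and falls back to the cleaned business name otherwise, and the normalization rules (drop `www.`, query string, and trailing slash) are illustrative choices.

```javascript
// Reduce a LinkedIn URL to "host/path": no scheme, no "www.", no query, no trailing slash.
// Returns null when the input is not a parseable URL.
function normalizeLinkedinUrl(rawUrl) {
  try {
    const url = new URL(rawUrl);
    const host = url.hostname.toLowerCase().replace(/^www\./, "");
    const path = url.pathname.toLowerCase().replace(/\/+$/, "");
    return `${host}${path}`;
  } catch {
    return null;
  }
}

// Lowercase the name and collapse punctuation/whitespace runs to single spaces.
function normalizeName(name) {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

class DedupeStore {
  constructor() {
    this.seen = new Set();
  }

  // Returns true if the record is new (and remembers it), false for a duplicate.
  add(record) {
    const urlKey = record.linkedinUrl ? normalizeLinkedinUrl(record.linkedinUrl) : null;
    const key = urlKey
      ? `url:${urlKey}`
      : `name:${normalizeName(record.businessName || "")}`;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}
```

Sharing one `DedupeStore` instance across all queries in a run is what lets overlapping keywords collapse to a single record.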

How do I prevent blocks and improve reliability?
Use rotating proxies, set realistic delays, and keep concurrency moderate. If you see partial runs, reduce parallelism and increase wait times between pages to stabilize navigation and extraction.
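
The advice above (moderate concurrency plus delays between pages) can be sketched as a small worker-pool helper. `runThrottled` and its option names are hypothetical and do not reflect the project's `rateLimiter.js`; the defaults are placeholders, not recommended values.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Process `items` with at most `concurrency` jobs in flight.
// Each lane pauses for `delayMs` after finishing a job before taking the next one.
// `wait` is injectable so tests can skip real delays.
async function runThrottled(items, worker, { concurrency = 2, delayMs = 1000, wait = sleep } = {}) {
  const results = new Array(items.length);
  let next = 0;

  async function lane() {
    while (next < items.length) {
      const index = next++; // safe: JavaScript runs this synchronously on one thread
      results[index] = await worker(items[index], index);
      await wait(delayMs);
    }
  }

  const laneCount = Math.min(concurrency, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

If runs start failing partway through, lowering `concurrency` and raising `delayMs` in a helper like this is the code-level equivalent of "reduce parallelism and increase wait times".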

Performance Benchmarks and Results

Primary Metric: ~45–90 discovered business records per minute on typical city+niche queries with moderate concurrency and stable proxy routing.

Reliability Metric: 93–97% successful page processing rate when using rotation + conservative throttling; drops significantly under aggressive concurrency without proxies.

Efficiency Metric: ~250–450 MB peak memory usage during multi-page runs (browser sessions + routing), with throughput scaling linearly up to a practical concurrency ceiling.

Quality Metric: 85–95% of records include a valid LinkedIn URL and business name; 40–70% include secondary enrichment fields (website/industry/location) depending on profile visibility.

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time." Nathan Pennington Marketer ★★★★★	"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on." Eliza SEO Affiliate Expert ★★★★★	"Exceptional results, clear communication, and flawless delivery. Bitbash nailed it." Syed Digital Strategist ★★★★★
