ShivanshSGuppta/H-Clique-Densities-Algo-Application
This repository contains a C++ implementation of an exact algorithm to find the densest subgraph based on h-clique densities in an undirected graph. It supports any h ≥ 2 and is tested on standard datasets like AS733 and Netscience.
Densest Subgraph via h-Clique Densities
This repository contains a C++ implementation of an exact algorithm to find the densest subgraph based on h-clique densities in an undirected graph.
It supports any h ≥ 2 and is tested on standard datasets like AS733 and Netscience.
Description
Given an undirected graph G=(V,E), the goal is to find a subgraph S that maximizes the density defined by the number of h-cliques per node in S.
This project implements the exact algorithm based on a binary search framework combined with a flow-network construction to find the optimal densest subgraph.
The steps include:
- Finding all (h-1)-cliques.
- Constructing a specialized flow network.
- Running the Ford-Fulkerson algorithm to compute maximum flows and corresponding minimum cuts.
- Performing binary search over possible densities to find the best subgraph.
1. Requirements
- A C++17 compatible compiler (e.g., g++, clang++)
- Linux, MacOS, or Windows with a terminal
- (Optional) make for easier building
2. Compilation2. Compilation
g++ -std=c++17 -O2 -o densest_subgraph main.cpp
3. Running
./densest_subgraph <graph_file> , where
- <graph_file>: Path to the input graph file.
- : The size of cliques to consider (h ≥ 2).
Input Format
n m
u1 v1
u2 v2
...
um vm
Where:
- n = number of nodes
- m = number of edges
- Each line after gives an undirected edge between nodes u and v.
- Nodes are indexed from 0 to n-1.
Output
After execution, the program prints:
- Number of nodes and edges in the loaded graph.
- Details about the search process (binary search steps).
- The densest subgraph found:
- Number of nodes
- Number of h-cliques
- Clique-density
- List of nodes in the subgraph
- Total execution time in milliseconds.
Datasets Used
- AS733
- Netscience
Performance Notes
- The exact algorithm guarantees finding the optimal subgraph, but can be computationally heavy for large graphs or large values of h.
- Works efficiently for moderate-sized graphs (up to a few thousand nodes) and small h values (2, 3, or 4)
Dense Subgraph Discovery via Clique Density
This project analyzes real-world network datasets (specifically AS733 and Netscience) to find dense subgraphs based on h-clique density. It uses clique enumeration, core decomposition, and Dinic's Max-Flow algorithm to process the data and identify densest subgraphs for different clique sizes (h values).
Features
- Loads graph datasets in edge list format.
- Enumerates all (h-1)-cliques and h-cliques.
- Computes core decomposition for graph sparsification.
- Identifies the densest subgraphs based on h-clique density.
- Measures computation time and outputs density results for different values of h.
- Supports large sparse networks.
Requirements
- C++17 or later
- A C++ compiler like g++ or clang++
- Standard C++ libraries (iostream, vector, queue, algorithm, etc.)
Dataset
- as733.txt — An AS-level Internet topology graph (available online).
- netscience.txt — Collaboration network among scientists (you'll need to similarly adapt the load_as733 function if using a different dataset format).
Usage
- Clone the repository
git clone https://github.com/your-username/your-repo-name.git
cd your-repo-name - Place the dataset (e.g., as733.txt) in the project directory.
- Compile the code:
g++ -std=c++17 -O2 -o dense_clique dense_clique.cpp - Run the program:
./dense_clique
The program will:
- Load the dataset.
- For h = 2, 3, ..., 6, find the densest subgraph based on h-clique density.
- Output the density results into a file called density_vs_h.txt.
-
Output:
- Densities for each h are printed in the terminal.
- The file density_vs_h.txt contains:
h density
2 0.0456
3 0.0123
4 0.0051
...
where h is the clique size and density is the highest found density for subgraphs.
File Structure
dense_clique.cpp # Main source code
as733.txt # Input graph file (example)
density_vs_h.txt # Output file: density results
Notes
- The code uses binary search to efficiently check for edge existence.
- For large graphs, clique enumeration can be computationally expensive for large h; you may want to adjust h accordingly.
- Currently, the code assumes that node indices are zero-based and contiguous (0, 1, 2, ...).
Possible Improvements
- Extend to other datasets by modifying the graph loading function.
- Parallelize clique enumeration for large-scale graphs.
- Implement memory optimization for very large datasets.