Achieve 100x Speedups in Graph Analytics Using Nx-cugraph
Hey everyone! I recently passed the NVIDIA Data Science Professional Certification, and I’m thrilled to share some insights to help you on your journey. This is part of a series where I’ll break down key concepts and tools covered in the certification, focusing on how to leverage GPU acceleration for blazingly fast machine learning. I have included all the Colab notebooks I used so that you can quickly grasp the concepts by running them instantly on Google Colab. Let’s get started.
NetworkX
NetworkX is a powerhouse for graph analytics in Python, beloved for its ease of use and vast community. However, as graphs grow, its pure-Python nature can lead to performance bottlenecks. What if you could keep the familiar NetworkX API but get a massive speedup for larger datasets? Enter nx-cugraph, a RAPIDS backend that lets NetworkX leverage the power of NVIDIA GPUs.
This post dives into how nx-cugraph can significantly accelerate your NetworkX workflows, demonstrated with common graph algorithms like Betweenness Centrality and PageRank.
Click, Copy and Run the notebook.
Link to the Colab Notebook
What You Will Learn
- Why NetworkX, despite its popularity, can be slow for large graphs.
- How NetworkX 3.0+ allows for dispatching algorithms to accelerated backends.
- What nx-cugraph is and how it brings GPU acceleration to NetworkX.
- How to set up your environment to use nx-cugraph.
- See practical examples of speedups for Betweenness Centrality and PageRank algorithms on both small and large datasets.
- Understand the minimal code changes required to get these performance benefits.
The NetworkX Challenge: Performance at Scale
NetworkX is incredibly popular, downloaded millions of times. Its user-friendly API, extensive documentation, and easy installation make it a go-to for graph analysis. However, this ease comes with a trade-off: its Python implementation can struggle with the performance demands of larger, real-world graph datasets.
Accelerated NetworkX to the Rescue!
NetworkX 3.0 introduced a game-changing feature: the ability to dispatch algorithm calls to alternative, more performant backend implementations. This means you don’t have to abandon your existing NetworkX code to tap into serious performance gains, like those offered by GPUs.
The nx-cugraph library, part of the NVIDIA RAPIDS ecosystem, is one such backend. It allows NetworkX to offload computations to NVIDIA GPUs, dramatically speeding up graph algorithms.
Configuring NetworkX to Use cuGraph by Default
A neat feature of nx-cugraph (version 24.10+) is the NX_CUGRAPH_AUTOCONFIG environment variable. Setting this to True before importing NetworkX tells NetworkX to use the “cugraph” backend by default.
%env NX_CUGRAPH_AUTOCONFIG=True
import networkx as nx
print(f"using networkx version {nx.__version__}")
# This notebook uses a caching feature that might produce warnings for some users.
# The notebook uses recommended APIs, so we can safely ignore this specific warning.
nx.config.warnings_to_ignore.add("cache")
With this setup, most of your existing NetworkX algorithm calls will automatically be GPU-accelerated without any further code changes!
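The %env magic only exists inside a notebook. Here is a minimal sketch of the same setup for a plain Python script, which falls back to the CPU backend when nx-cugraph is not installed. The helper name configure_backend is hypothetical, and the sketch assumes the package is importable as the module nx_cugraph.

```python
import importlib.util
import os


def configure_backend() -> str:
    """Request the cugraph backend when nx-cugraph is installed.

    Hypothetical helper: returns the name of the backend that NetworkX
    is expected to use by default after this call.
    """
    # Assumption: nx-cugraph is importable as the module "nx_cugraph".
    if importlib.util.find_spec("nx_cugraph") is not None:
        # Must be set before networkx is imported for the first time.
        os.environ["NX_CUGRAPH_AUTOCONFIG"] = "True"
        return "cugraph"
    return "networkx"


backend = configure_backend()
print(f"expected default backend: {backend}")
# Import networkx only after this point so the variable is picked up.
```

Keeping the check in one function makes it easy to run the same script on machines with and without a GPU.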
Seeing is Believing: Algorithm Acceleration
Let’s look at how nx-cugraph speeds up a couple of popular algorithms.
A Simple Start: Zachary’s Karate Club
We’ll begin with the classic Zachary’s Karate Club graph (34 nodes, 78 edges).
G = nx.karate_club_graph()
G.number_of_nodes(), G.number_of_edges()
# Output: (34, 78)
Betweenness Centrality
This algorithm measures a node’s importance based on how many shortest paths pass through it.
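To make the definition concrete, here is a small pure-Python sketch of unnormalized betweenness centrality on an undirected, unweighted graph, using a Brandes-style accumulation over breadth-first searches. It is an illustration only, not the code that NetworkX or nx-cugraph run, and the function name betweenness is my own.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality for an undirected, unweighted graph.

    adj maps each node to a list of its neighbors.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2.0 for v, c in score.items()}


# Path graph a - b - c - d: only the inner nodes lie between other pairs.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(betweenness(path))
# {'a': 0.0, 'b': 2.0, 'c': 2.0, 'd': 0.0}
```

Node b sits on the shortest paths for the pairs (a, c) and (a, d), which gives its score of 2. Note that nx.betweenness_centrality normalizes scores by default, so its values will be scaled differently.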
With nx-cugraph (GPU accelerated, default due to NX_CUGRAPH_AUTOCONFIG):
%%time
nxcg_bc_results = nx.betweenness_centrality(G)
# CPU times: user 177 ms, sys: 70.1 ms, total: 247 ms
# Wall time: 762 ms
With default NetworkX (CPU): To explicitly use the original NetworkX implementation, we use the backend="networkx" argument.
%%time
nx_bc_results = nx.betweenness_centrality(G, backend="networkx")
# CPU times: user 191 ms, sys: 13.6 ms, total: 205 ms
# Wall time: 204 ms
For such a small graph, fixed GPU overheads such as kernel launches likely outweigh the computation itself, so the nx-cugraph run is slower here (762 ms vs 204 ms wall time). The real power shines with larger datasets. The notebook visualizes these results, showing that both backends produce the same centrality rankings.
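The %%time magic measures a single run, so any one-off startup cost ends up in the wall time. If you want to repeat these measurements outside Jupyter, a repeat-and-take-the-best timer is one option. This is a minimal sketch, not taken from the notebook; the helper name best_wall_time is hypothetical and the workload is a stand-in.

```python
import time


def best_wall_time(fn, *args, repeats=3, **kwargs):
    """Call fn repeatedly and return (best wall time in seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best, result


# Stand-in workload; in the notebook this would be a call such as
# nx.betweenness_centrality(G) or nx.betweenness_centrality(G, backend="networkx").
elapsed, total = best_wall_time(sum, range(1_000_000))
print(f"best of 3: {elapsed * 1e3:.2f} ms, result={total}")
```

Taking the best of several runs reduces the influence of warm-up effects, which matters most on tiny graphs like this one.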
PageRank
PageRank scores nodes based on their relative “importance” by analyzing links.
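For intuition, here is a pure-Python sketch of PageRank by power iteration on a tiny directed graph. It is an illustration under my own assumptions (a damping factor of 0.85 and a simple convergence check), not the implementation used by either backend; the function name and the toy graph are made up.

```python
def pagerank(adj, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank for a directed graph given as {node: [successors]}."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not adj[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in adj[v]:
                # Each node passes an equal share of its rank to every successor.
                new_rank[w] += damping * rank[v] / len(adj[v])
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Three pages: a and b both link to c, and c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(links)
print({v: round(s, 4) for v, s in scores.items()})
```

Page c collects links from both a and b, so it ends up with the highest score, while b, which nothing links to, keeps only the baseline share.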
With nx-cugraph (GPU accelerated):
%%time
nxcg_pr_results = nx.pagerank(G)
# CPU times: user 11.4 ms, sys: 10.8 ms, total: 22.2 ms
# Wall time: 68.2 ms
With default NetworkX (CPU):
%%time
nx_pr_results = nx.pagerank(G, backend="networkx")
# CPU times: user 3.8 ms, sys: 1.11 ms, total: 4.9 ms
# Wall time: 19.8 ms
Again, for tiny graphs, the CPU can be faster (19.8 ms vs 68.2 ms wall time here). However, the results are numerically very close, as shown by comparing them in a DataFrame:
%load_ext cudf.pandas
import pandas as pd
import pytest
from IPython.display import display, HTML
print("Do both results have the same values (within tolerance)? "
      f"{nxcg_pr_results == pytest.approx(nx_pr_results, rel=1e-6, abs=1e-11)}")
# Output: Do both results have the same values (within tolerance)? True
df = pd.DataFrame(
    columns=["nx node", "nxcg node", "nx PR", "nxcg PR"],
    data=[(a, c, b, d) for (a, b), (c, d) in zip(nx_pr_results.items(),
                                                 nxcg_pr_results.items())])
df.sort_values(by="nx PR", ascending=False, inplace=True)
print("\nTop 5 nodes based on PageRank")
display(HTML(df.head(5).to_html(float_format=lambda f: f"{f:.7g}")))
The output confirms the PageRank scores are essentially identical.
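If you would rather not pull in pytest just for the tolerance check, the standard library can do a similar comparison. This is a minimal sketch with a hypothetical helper name, results_match, and made-up scores. Note that math.isclose applies the relative tolerance a little differently from pytest.approx, so it is not an exact equivalent.

```python
import math


def results_match(a, b, rel_tol=1e-6, abs_tol=1e-11):
    """Return True if both dicts have the same keys and pairwise-close values."""
    if a.keys() != b.keys():
        return False
    return all(math.isclose(a[k], b[k], rel_tol=rel_tol, abs_tol=abs_tol) for k in a)


# Made-up scores standing in for the two PageRank result dicts.
cpu_scores = {0: 0.25, 1: 0.75}
gpu_scores = {0: 0.25 + 1e-9, 1: 0.75 - 1e-9}
print(results_match(cpu_scores, gpu_scores))
# True
```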
Betweenness Centrality on a Large Graph
For large graphs, calculating all-pairs shortest paths for Betweenness Centrality is often infeasible. We use the k
parameter to approximate by sampling k
nodes.
With default NetworkX (CPU), k=1 (larger k values are impractical):
%%time
bc_results_large_nx = nx.betweenness_centrality(G_large, k=1, backend="networkx")
# CPU times: user 2min 1s, sys: 4.02 s, total: 2min 5s
# Wall time: 2min 5s
With nx-cugraph (GPU), k=1:
%%time
bc_results_large_nxcg_k1 = nx.betweenness_centrality(G_large, k=1)
# CPU times: user 935 ms, sys: 200 ms, total: 1.14 s
# Wall time: 1.17 s
Over 100x speedup! (2min 5s vs 1.17s, roughly 107x)
With nx-cugraph, we can afford a much larger (and more accurate) k.
With nx-cugraph (GPU), k=100:
%%time
bc_results_large_nxcg_k100 = nx.betweenness_centrality(G_large, k=100)
# CPU times: user 26.7 s, sys: 658 ms, total: 27.3 s
# Wall time: 27.3 s
Running with k=100 on the GPU is still significantly faster (27.3s) than k=1 on the CPU (2min 5s).
A note on comparing betweenness_centrality with k: Since it’s an approximation based on random samples, results might differ slightly between NetworkX and nx-cugraph unless a common seed and sampling strategy are used, which is an area for future updates.
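To see why a shared seed matters, here is a tiny standard-library sketch of seeded sampling. It only illustrates the idea of drawing k source nodes reproducibly; it makes no claim about how NetworkX or nx-cugraph pick their samples, and the helper name sample_sources is hypothetical.

```python
import random


def sample_sources(nodes, k, seed):
    """Pick k distinct source nodes using a dedicated, seeded generator."""
    rng = random.Random(seed)
    return rng.sample(list(nodes), k)


nodes = range(1_000)
same_a = sample_sources(nodes, k=5, seed=42)
same_b = sample_sources(nodes, k=5, seed=42)
other = sample_sources(nodes, k=5, seed=7)
print(same_a == same_b)  # True: identical seed, identical sample
print(same_a, other)
```

Two runs only approximate the same quantity from the same sources when both the seed and the sampling procedure match, which is why results from different backends can differ.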
PageRank on a Large Graph
With default NetworkX (CPU):
%%time
nx_pr_results_large = nx.pagerank(G_large, backend="networkx")
# CPU times: user 1min 39s, sys: 5.02 s, total: 1min 44s
# Wall time: 1min 44s
With nx-cugraph (GPU):
%%time
nxcg_pr_results_large = nx.pagerank(G_large)
# CPU times: user 540 ms, sys: 293 ms, total: 834 ms
# Wall time: 877 ms
Another massive speedup: over 100x! (1min 44s vs 877ms, roughly 119x). The results remain consistent within tolerance.
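For reference, the speedup factors follow directly from the wall times quoted in this post. A quick sketch of the arithmetic (the dictionary layout is just my own way of organizing the numbers):

```python
# Wall times in seconds, copied from the %%time outputs quoted in this post.
wall_times = {
    "betweenness k=1": {"cpu": 2 * 60 + 5, "gpu": 1.17},
    "pagerank": {"cpu": 1 * 60 + 44, "gpu": 0.877},
}

speedups = {name: t["cpu"] / t["gpu"] for name, t in wall_times.items()}
for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
# betweenness k=1: 106.8x
# pagerank: 118.6x

# GPU with k=100 (27.3 s) compared against CPU with k=1 (125 s).
k100_vs_cpu_k1 = (2 * 60 + 5) / 27.3
print(f"{k100_vs_cpu_k1:.1f}x")
# 4.6x
```

These are single-run measurements from one Colab session, so expect the exact factors to vary with hardware and graph size.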
Key Takeaways for certification ✨
Migrating your NetworkX workflows to GPU acceleration with nx-cugraph offers substantial benefits, especially as your data grows:
- 🚀 Blazing Speed: Experience dramatic performance improvements (often >100x) for graph algorithms on large datasets by leveraging GPU power.
- 💻 Minimal Code Changes: Thanks to the backend system and NX_CUGRAPH_AUTOCONFIG, you can accelerate existing NetworkX code with little to no modification.
- 📊 Enhanced Scalability: Tackle much larger, real-world graph problems that were previously impractical with CPU-only NetworkX.
- 🛠️ Simple Setup: Easy installation via pip and straightforward configuration to enable the cugraph backend.
- 🤝 Familiar NetworkX API: Continue working with the well-known and loved NetworkX interface, minimizing the learning curve.
If you’re working with graphs that are pushing the limits of traditional NetworkX, nx-cugraph is a fantastic way to boost your productivity and unlock new possibilities in graph analytics.