Transforming Financial Statements into Knowledge Graphs Using Neo4j LLM Knowledge Graph Builder

Kshitij Kutumbe
4 min read · Dec 10, 2024

Financial statements are a treasure trove of information, yet they often remain buried in unstructured formats like PDFs, reports, and web pages. Neo4j LLM Knowledge Graph Builder bridges this gap by transforming such unstructured financial data into a knowledge graph that is structured, queryable, and insightful. Here’s how you can harness this tool to gain deeper insights into financial data.

Why Create a Financial Knowledge Graph?

Financial knowledge graphs help:

  1. Identify key entities and relationships in financial data, such as revenue, expenses, assets, liabilities, and their interdependencies.
  2. Visualize complex data structures, providing clarity on cash flow, profitability, and risk.
  3. Enable advanced querying and analytics for decision-making, fraud detection, and compliance reporting.

What is Neo4j LLM Knowledge Graph Builder?

The Neo4j LLM Knowledge Graph Builder is a tool that converts unstructured text into a structured graph using Large Language Models (LLMs) such as OpenAI's GPT models, Gemini, and Llama 3. It extracts nodes (entities) and relationships from financial statements and stores them in a Neo4j graph database for further analysis.

Key Features for Financial Use Cases

  • Multi-format Support: Process PDFs, text documents, web content, or even video transcripts.
  • Lexical Graphs: Organize document chunks and their embeddings.
  • Entity Graphs: Extract meaningful financial entities and relationships.
  • Query-Ready Graphs: Use techniques like GraphRAG or Text2Cypher to interact with the graph.

How to Create a Financial Knowledge Graph (Step-by-Step)

1. Prepare Your Neo4j Environment

  1. Create a Neo4j AuraDB Instance:
     • Sign up or log in to the Neo4j Console.
     • Set up a free AuraDB database for your graph storage.
  2. Download Credentials:
     • Save the database credentials file for secure access.
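
Once the credentials file is saved, a script can read it instead of hard-coding secrets. The sketch below assumes the file is plain text with one KEY=VALUE pair per line (for example NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD); check your downloaded file, since the exact keys are an assumption here. The sample values are placeholders.

```python
from pathlib import Path


def parse_credentials(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    creds: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '='.
        key, _, value = line.partition("=")
        creds[key.strip()] = value.strip()
    return creds


def load_credentials(path: str) -> dict[str, str]:
    """Read a credentials file from disk and parse it."""
    return parse_credentials(Path(path).read_text(encoding="utf-8"))


# Placeholder content for illustration only; these are not real credentials.
sample = """\
# Example credentials file (assumed layout)
NEO4J_URI=neo4j+s://example.databases.neo4j.io
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=change-me
"""
creds = parse_credentials(sample)
print(creds["NEO4J_URI"])
```

Keeping the parser separate from the file read makes it easy to test without touching disk.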

2. Neo4j LLM Knowledge Graph Builder

  • You can either run the Docker-based application yourself or use Neo4j's hosted application.

3. Upload Financial Data

Load unstructured financial data, such as:

  • PDFs: Balance sheets, income statements, and cash flow statements.
  • Documents: Company reports or earnings calls.
  • Cloud Sources: Files from AWS S3 or GCS buckets.

4. Extract Financial Entities and Relationships

Use the LLM Knowledge Graph Builder to extract:

  • Entities: Revenue, Expenses, Net Income, Assets, Liabilities.
  • Relationships:
     • “Net Income is derived from Revenue and Expenses.”
     • “Revenue contributes to Shareholder Equity.”
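
To make the extraction output concrete, here is a minimal sketch of how the two relationships above could be represented as triples and turned into Cypher MERGE statements. The FinancialItem label and the DERIVED_FROM and CONTRIBUTES_TO relationship types are hypothetical names chosen for illustration; the builder decides its own labels and types.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    source: str
    relation: str
    target: str


# The relationships from the text, using hypothetical relationship type names.
triples = [
    Triple("Net Income", "DERIVED_FROM", "Revenue"),
    Triple("Net Income", "DERIVED_FROM", "Expenses"),
    Triple("Revenue", "CONTRIBUTES_TO", "Shareholder Equity"),
]


def to_merge_statement(triple: Triple) -> tuple[str, dict[str, str]]:
    """Return a Cypher MERGE statement and its parameters for one triple."""
    # Relationship types cannot be passed as Cypher parameters, so the type
    # is validated as a plain identifier before being placed in the query.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", triple.relation):
        raise ValueError(f"unsafe relationship type: {triple.relation!r}")
    query = (
        "MERGE (s:FinancialItem {name: $source}) "
        "MERGE (t:FinancialItem {name: $target}) "
        f"MERGE (s)-[:{triple.relation}]->(t)"
    )
    return query, {"source": triple.source, "target": triple.target}


for t in triples:
    query, params = to_merge_statement(t)
    print(query, params)
```

MERGE is used rather than CREATE so that re-running the load does not duplicate nodes or relationships.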

5. Visualize the Knowledge Graph

  1. Use the in-app graph interface to explore relationships.
  2. Refine the schema to improve accuracy:
     • Define node types (e.g., Revenue, Asset).
     • Customize relationship types (e.g., “Impacts,” “Includes”).
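
The effect of a schema can be illustrated outside the tool. This sketch keeps only candidate relationships whose node types and relationship type appear in an allowed set. The sets reuse the examples above plus Expense and Liability, and the Candidate structure is a hypothetical stand-in, not the builder's internal format.

```python
from typing import NamedTuple

# Allowed types, taken from the examples in the text (plus two more entities).
NODE_TYPES = {"Revenue", "Expense", "Asset", "Liability"}
RELATIONSHIP_TYPES = {"IMPACTS", "INCLUDES"}


class Candidate(NamedTuple):
    source_type: str
    relation: str
    target_type: str


def filter_by_schema(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only candidates that conform to the allowed schema."""
    return [
        c
        for c in candidates
        if c.source_type in NODE_TYPES
        and c.target_type in NODE_TYPES
        and c.relation in RELATIONSHIP_TYPES
    ]


candidates = [
    Candidate("Revenue", "IMPACTS", "Asset"),
    Candidate("Asset", "INCLUDES", "Asset"),
    Candidate("Revenue", "MENTIONS", "Asset"),  # relationship type not in schema
    Candidate("Person", "IMPACTS", "Revenue"),  # node type not in schema
]
kept = filter_by_schema(candidates)
print(kept)
```

A tighter schema trades recall for precision: fewer spurious edges, but anything outside the allowed types is dropped.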

6. Query Financial Data Using RAG

Interact with the knowledge graph using Retrieval-Augmented Generation (RAG) techniques:

  • GraphRAG: Ask questions like:
     • “What factors contributed to a drop in net income?”
     • “How are current liabilities linked to operating expenses?”
  • Text2Cypher: Automatically generate Cypher queries to extract relevant insights.
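
For a sense of what a generated query might look like, here is a hand-written sketch for the question about how current liabilities are linked to operating expenses. It is not actual Text2Cypher output. The FinancialItem label and name property are hypothetical, and the helper assumes the official neo4j Python driver (version 5.8 or later) is installed.

```python
from typing import Any

# Hypothetical label and property names; adjust them to your graph's schema.
LINK_QUERY = (
    "MATCH (a:FinancialItem {name: $source}), (b:FinancialItem {name: $target}) "
    "MATCH p = shortestPath((a)-[*..4]-(b)) "
    "RETURN p"
)


def link_params(source: str, target: str) -> dict[str, str]:
    """Build the parameters for LINK_QUERY."""
    return {"source": source, "target": target}


def run_query(uri: str, user: str, password: str,
              query: str, params: dict[str, Any]) -> list[dict[str, Any]]:
    """Run a read query and return each record as a dictionary."""
    # Imported here so the rest of the sketch works without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        records, _summary, _keys = driver.execute_query(query, parameters_=params)
        return [record.data() for record in records]


# Example call (requires a reachable database and real credentials):
# rows = run_query(uri, user, password, LINK_QUERY,
#                  link_params("Current Liabilities", "Operating Expenses"))
```

Passing names as parameters rather than formatting them into the query string avoids injection issues and lets Neo4j reuse the query plan.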

7. Advanced Analysis with Neo4j Bloom

Use Neo4j Bloom for:

  • Visualization: Understand entity hierarchies and relationships.
  • Pattern Discovery: Detect anomalies or trends in financial flows.

Technical Workflow

1. Document Processing

  1. Documents are uploaded and converted into Document nodes.
  2. Content is loaded with LangChain loaders and split into smaller chunks.
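
The chunking idea can be shown with a few lines of plain Python. This is a simplified stand-in that splits on fixed character counts with an overlap; it is not the builder's own splitter, and the chunk_size and overlap values are arbitrary.

```python
def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character chunks that overlap slightly."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


statement = "Revenue rose while operating expenses fell, lifting net income for the year."
for i, chunk in enumerate(split_into_chunks(statement, chunk_size=30, overlap=5)):
    print(i, repr(chunk))
```

The overlap keeps a sentence that straddles a boundary partly visible in both neighboring chunks, which helps entity extraction later.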

2. Lexical Graph Creation

  1. Chunks are embedded and linked to their parent document.
  2. Similar chunks are connected using k-Nearest Neighbor (kNN) graphs.
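
To illustrate the kNN step, the sketch below links each chunk to its k most similar chunks by cosine similarity. The two-dimensional vectors are toy values; real embeddings have hundreds of dimensions, and the builder's own similarity settings may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_edges(embeddings: dict[str, list[float]], k: int) -> list[tuple[str, str, float]]:
    """Return (chunk, neighbor, score) edges to each chunk's k nearest neighbors."""
    edges: list[tuple[str, str, float]] = []
    for chunk_id, vector in embeddings.items():
        scored = [
            (other_id, cosine_similarity(vector, other_vector))
            for other_id, other_vector in embeddings.items()
            if other_id != chunk_id
        ]
        # Highest similarity first, then keep the top k.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        edges.extend((chunk_id, other_id, score) for other_id, score in scored[:k])
    return edges


# Toy embeddings for three chunks.
embeddings = {"c1": [1.0, 0.0], "c2": [0.9, 0.1], "c3": [0.0, 1.0]}
edges = knn_edges(embeddings, k=1)
for source, target, score in edges:
    print(source, "->", target, round(score, 3))
```

This brute-force version compares every pair, which is fine for a handful of chunks; larger graphs would rely on a vector index instead.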

3. Entity and Relationship Extraction

Entities and relationships are extracted using:

  • llm-graph-transformer (for general entities).
  • Custom Schema (for financial-specific relationships).

Extracted entities are linked back to their original text chunks.

4. Storing and Querying

  1. Nodes and relationships are stored in the Neo4j database.
  2. Embeddings are saved in both the graph and vector indices.
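
As a rough sketch of the vector index side, the Cypher below creates an index over chunk embeddings and queries it for the nearest chunks. The Chunk label, the embedding and text properties, the index name, and the 384 dimensions are all assumptions for illustration; match them to what your graph actually contains. The statements can be run with the Neo4j Python driver or in the Neo4j Browser.

```python
EMBEDDING_DIMENSIONS = 384  # assumption: depends on the embedding model in use

# Creates a vector index on the (assumed) Chunk.embedding property.
CREATE_VECTOR_INDEX = f"""
CREATE VECTOR INDEX chunk_embeddings IF NOT EXISTS
FOR (c:Chunk) ON (c.embedding)
OPTIONS {{indexConfig: {{
  `vector.dimensions`: {EMBEDDING_DIMENSIONS},
  `vector.similarity_function`: 'cosine'
}}}}
"""

# Returns the k chunks whose embeddings are closest to the query embedding.
VECTOR_SEARCH = """
CALL db.index.vector.queryNodes('chunk_embeddings', $k, $embedding)
YIELD node, score
RETURN node.text AS text, score
"""


def search_params(embedding: list[float], k: int = 5) -> dict[str, object]:
    """Validate and package the parameters for VECTOR_SEARCH."""
    if k <= 0:
        raise ValueError("k must be positive")
    if len(embedding) != EMBEDDING_DIMENSIONS:
        raise ValueError(
            f"expected {EMBEDDING_DIMENSIONS} dimensions, got {len(embedding)}"
        )
    return {"k": k, "embedding": list(embedding)}


params = search_params([0.0] * EMBEDDING_DIMENSIONS, k=3)
print(params["k"], len(params["embedding"]))
```

Checking the embedding length before querying catches a mismatch between the index and the embedding model early, which is a common source of confusing errors.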

Applications in Financial Data Analysis

1. Profitability Analysis

  • Visualize how revenue streams contribute to profitability.
  • Identify the largest expense contributors.

2. Fraud Detection

  • Map suspicious relationships between transactions and accounts.
  • Highlight outlier patterns in cash flow.

3. Risk Assessment

  • Trace interdependencies between liabilities and assets.
  • Understand exposure to financial risks.

Actionable Steps

  1. Set Up Neo4j AuraDB: Start with a free database instance.
  2. Deploy the LLM Knowledge Graph Builder: Run it locally with Docker or use Neo4j's hosted application.
  3. Upload Financial Data: Load your financial documents for processing.
  4. Define Your Schema: Customize entity and relationship types for precision.
  5. Analyze with Queries: Use RAG techniques to gain actionable insights.
  6. Visualize and Iterate: Leverage Neo4j Bloom for advanced exploration.

To connect with me on this and other AI related topics:

kshitijkutumbe@gmail.com

Conclusion

By using Neo4j LLM Knowledge Graph Builder, financial analysts and decision-makers can transform unstructured financial statements into structured knowledge graphs, enabling deeper insights, automated queries, and actionable intelligence. Whether it’s analyzing profitability, detecting fraud, or assessing risks, this tool empowers you to make data-driven decisions with confidence.

Stay tuned for more detailed blogs on Generative AI and Knowledge Graphs in other domains as well.

Written by Kshitij Kutumbe

Data Scientist | NLP | GenAI | RAG | AI agents | Knowledge Graph | Neo4j kshitijkutumbe@gmail.com www.linkedin.com/in/kshitijkutumbe/