Knowledge Graphs – Open Source Integrated AI and Semantic Tech

Flexible GraphRAG or RAG is flexing to the max: 8 Graph databases, 10 Vector databases, 3 search engines working (can docker compose all including dashboards), 13 data sources

X.com Steve Reiner @stevereiner LinkedIn Steve Reiner LinkedIn Posts

Flexible GraphRAG or Flexible RAG, an Apache 2.0 open source python platform, is now flexing to the max using LlamaIndex, in terms of supporting more databases and data sources: supports 8 graph databases, 10 vector databases, 3 search engines, and 13 data sources,. Also supports knowledge graph auto-building, schemas, LlamaIndex LLMs, Docling doc processing (LlamaParse coming soon), GraphRAG mode, RAG only mode, Hybrid search, and AI query / chat. Has React, Vue, and Angular frontends, and a FastAPI backend. React, Vue, Angular, Backend now work on Windows, Mac, Linux (standalone or in docker). Also has a FastMCP MCP server. Has a convenient docker compose that can include any of the databases (Vector, Graph, Search, Alfresco) and Dashboards / Consoles.

(Flexible GraphRAG AI chat shown with Hyland products web page(s) used with web pages data source to auto-generated a Neo4j graph).

Convenient docker compose: you can choose to include from all supported 10 vector and 8 graph databases, Elasticsearch, OpenSearch, and Hyland Alfresco Community. Include in the docker-compose.yaml by just removing the # comment in front of their includes. Dashboards / Consoles for these databases, as much as possible are also included in the docker compose choices (either in the yaml file for the database or for some a separate yaml file include).

You can run the docker with the databases with the backend and frontends (React, Angular, Vue) running stand alone in separate terminal windows. In addition to running the databases in docker, you can include the backend and frontends in the dock compose by including the app-stack.yaml and proxy.yaml includes. Now have no config duplication for standalone backend+frontends vs full docker mode: previously had to repeat all config in app-stack.yaml now use env_file: include standalone backend .env and overrides with include of docker.env (for configs that need host.docker.internal)

All 8 Graph database working: Neo4j, ArcadeDB, FalkorDB, archived Kuzu (LadybugDB fork todo), NebulaGraph, Memgraph, Amazon Neptune, Amazon Neptune Analytics

All 10 Vector databases working: Qdrant, Elasticsearch vector, OpenSearch vector, Neo4j vector, Milvus, Weaviate, Chroma (both http, embedded), Pinecone, PostgreSQL + pgvector, LanceDB

All 3 Search engines working: Elasticsearch, OpenSearch, LlamaIndex built-in BM25

New Data Sources: using LlamaIndex readers: 1. working ones that don’t use document processing: Web Pages, Wikipedia, Youtube, 2. working using document processing: S3, 3. ones using document processing still to test: Google Drive, Microsoft OneDrive, Azure Blob, GCS, Box, SharePoint.

Support for Docling document processing is currently available. Being able configure to use LlamaParse coming soon.

Original data sources with document processing that don’t use LlamaIndex readers: filesystem, Alfresco, CMIS. Hyland Alfresco Community can be included in the docker compose by taking the “#” comment off the beginning of its include.

LLMs: LlamaIndex LLMs (LlamaIndex has support for very many), Flexible GraphRAG currently has config for 1. tested, working: OpenAI, Ollama, 2. untested: Anthropic Claude, Google Gemini, Azure OpenAI.

Previous Flexible GraphRAG posts:

See Flexible GraphRAG Initial Version Blog Post

See New Tabbed UI for Flexible GraphRAG (and Flexible RAG)

See Flexible GraphRAG: Performance improvements, FalkorDB graph database support added

See Flexible GraphRAG: Supports ArcadeDB Graph Database with new LlamaIndex Integration

See Flexible GraphRAG: Amazon Neptune, Neptune Analytics, and Graph Explorer support added

Flexible GraphRAG: Amazon Neptune, Neptune Analytics, and Graph Explorer support added

Flexible GraphRAG on GitHub

X.com Steve Reiner @stevereiner LinkedIn Steve Reiner LinkedIn Posts

Amazon Neptune, and Amazon Neptune Analytics support is working and checked int0 the Flexible GraphRAG github.

Graph Explorer is supported and working with these graph databases and is also checked in. It runs in a docker and can be used to query and visualize with both Amazon Neptune and Amazon Neptune Analytics.

Gremlin and openCypher can be used in Amazon Neptune with Graph Explorer, while openCypher is the primary language for Neptune Analytics. SPARQL is available for graph queries in the general Neptune database, SPARQL support for Graph Explorer is officially on the AWS development roadmap, but no firm timeline has been announced as of October 2025

Note that for Neptune Analytics, Flexible GraphRAG had to put in a wrapper class to filter out vector queries from its LamaIndex integration that were causing errors in Neptune Analytics. This wasn’t an issue with regular Neptune.

Flexible GraphRAG or Flexible RAG , an Apache 2.0 open source python platform, supports 8 graph databases, 10 vector databases, 3 search engines, and 13 data sources,. Supports knowledge graph auto-building, schemas, LlamaIndex LLMs, Docling doc processing (LlamaParse coming soon), GraphRAG mode, RAG only mode, Hybrid search, and AI query / chat. Has React, Vue, and Angular frontends, and a FastAPI backend. React, Vue, Angular, and Backend now work on Windows, Mac, Linux (standalone or in docker). Has a convenient docker compose that can include any of the databases (vector, graph, search, alfresco) and dashboards / consoles. There is also a Flexible GraphRAG MCP server.

Previous Flexible GraphRAG posts:

See Flexible GraphRAG Initial Version Blog Post

See New Tabbed UI for Flexible GraphRAG (and Flexible RAG)

See Flexible GraphRAG: Performance improvements, FalkorDB graph database support added

See Flexible GraphRAG: Supports ArcadeDB Graph Database with new LlamaIndex Integration

Flexible GraphRAG: Supports ArcadeDB Graph Database with new LlamaIndex Integration

Flexible GraphRAG on GitHub

X.com Steve Reiner @stevereiner LinkedIn Steve Reiner LinkedIn Posts

Flexible GraphRAG added support for the ArcadeDB graph database using this new integration:

ArcadeDB LlamaIndex Integration and arcadedb-python available:

arcadedb-llama-index Github

arcadedb-python Github

ArcadeDB (Apache 2.0) is a next generation Multi-Model Database for Graphs, Documents, Key/Value and Time-Series. Supports SQL, Cypher, Gremlin and MongoDB queries

arcadedb.com

ArcadeDB Github

Flexible GraphRAG is open source python platform supporting Docling document processing, knowledge graph auto-building, schemas, 13 data sources, 10 Vector databases, 7 Graph databases, ElasticSearch and OpenSearch search engines, RAG, GraphRAG, hybrid search, and AI query / chat. Has React, Vue, and Angular frontends, and a FastAPI backend. Also has a FastMCP MCP server.

Previous Flexible GraphRAG posts:

See Flexible GraphRAG Initial Version Blog Post

See New Tabbed UI for Flexible GraphRAG (and Flexible RAG)

See Flexible GraphRAG: Performance improvements, FalkorDB graph database support added

Creating Knowledge Graphs automatically for GraphRAG: Part 2: with LLMs

And the winner is using LLMs to create knowledge graphs over using NLP. Can LLMs do a better job? The Neo4j LLM Graph Builder in particular, has shown they can. What about the cost of using OpenAI along with the loss of privacy of data by submitting? The answer is free and local LLM models (Llama3 versions are available thru ollama) work too with Graph Builder. I tested with OpenAI GPT-4o, llama3, llama3.1, llama3.2. I noticed gemma2 is also available thru ollama. With these local LLMs, you will need a high end Nvidia card to work best.

Neo4j Labs LLM Knowledge Graph Builder main info site

Short Youtube demo video

The Online LLM Graph Builder can be used. You need to provide it with your Aura Neo4j connection info (you can create an account for a free Aura DB). It only has Diffbot, OpenAI, and Gemini LLM models available.

Graph Builder can upload from local files, AWS S3, web pages, Wikipedia, and Youtube. Google GCS can be a source if configured.

First choose the LLM model to use. Then upload one or more files. Then choose generate graph. You can view the graphs with the basic viewer (which allows hiding chunk nodes, community nodes, so you can see the entities and relationships). The Bloom viewer is also available, which is more complicated.

You can also chat with the data using GraphRAG and your chosen LLM. Answers have a icon below them that when clicked, provides info on graph doc sources, what entities, and what chunks were used to answer.

LLM Graph Builder Github project (Apache 2.0 open source)

The online version doesn’t have the llama3 models. So you need to clone the github project and build locally. To add using Meta Llama3 models, you need to configure it. You use the example.env to create a .env file and then add an optional OpenAI key, LLM model configuration, and indicate you initial Neo4j database info. Neo4j connection info can also be provided in the UI. Then do docker compose up. I have a fork of the main branch in my LLM Graph Builder that has added: configuration for lllama3, llama3.1, llama3.2, and openai gpt-4 choices, some neo4j connection config examples, switched to 8090 to not conflict with Alfresco 8080, has an additional debug log to so you can check on model config. and has a sample files folder with space-station.txt.

Speaking of Alfresco, I could add to my Alfresco GenAI Semantic project to call the separable backend of Graph Builder to generate a knowledge graph of new or updated Alfresco documents that have a new custom aspect. The backend may only have support for sources coming for the app’s kinds of sources currently. Also note in terms of UI integration, Alfresco’s ADF components and the ACA client use Angular. Neo4j Graph Builder’s front end uses React (and so does some of their other software projects).

space-station.txt with OpenAI GPT-4o:

space-station.txt with Meta Llama3:

space-station.txt with Meta Llama3.1:

space-station.txt with smaller Meta Llama3.2:

OpenAI GPT-4o with Albert Einstein Wikipedia page (340 nodes, 230 relationships):

Meta Llama3 with Albert Einstein Wikipedia page (150 nodes, 150 relationships), not shown: Llama3.1 (had 161 nodes, 85 relationships), not shown Llama3.2 (125 nodes, 76 relationships)

Creating Knowledge Graphs automatically for GraphRAG: Part 1: with NLP

(next post Part 2: with LLM)

I first investigated how NLP could be used for both entity recognition and relation extraction for creating a knowledge graphs of content. Tomaz Bratanic’s Neo4j blog article used Relik for NLP along with LlamaIndex for creating a graph in Neo4j, and setting up an embedding model for use with LLM queries.

In my llama_relik github project, I used the notebook from the blog article and changed it to use fastcoref instead of coreferee. Fastcoref was mentioned in the medium article version of the Neo4j blog article in the comments. It’s supposed to work better. There is also a python file in this project than can be used instead of the notebook.

I submitted some fixes to Relik on Windows, but it performs best on Linux in general and was more able to use the GPU “cuda” mode instead of “cpu”.

Similar work has been done using Rebel for NLP by Neo4j / Tomaz Bratanic, Saurav Joshi, and Qrious Kamal

Note that Relik has closed information extraction (CIE) models that do both entity linking (EL) and relation extraction (RE) . It also has models focused on either EL or RE.

Below is a screenshot from Neo4j with a knowledge graph created with the python file from the llama_relik project using the “relik-cie-small” model with the spacy space station sample text (ignore chunk node and it’s mentions relations). Notice how it has separate entities for “ISS” and “International Space Station” .

The “relik-cie-large” model finds more relations in screenshot below. It also has separate entities for “ISS” and “International Space Station” (and throws in second “International Space Station”).