Flexible-GraphRAG: Performance improvements, FalkorDB graph database support added

See Flexible GraphRAG Initial Version Blog Post

See New Tabbed UI for Flexible GraphRAG (and Flexible RAG)

Flexible GraphRAG on GitHub

X.com: Steve Reiner @stevereiner | LinkedIn: Steve Reiner

  1. Improved the performance of flexible-graphrag
    • Added parallel Docling document conversion, which improved pipeline timing
    • Dropped the KeywordExtractor/SummaryExtractor steps, which also improved pipeline timing
    • Ollama parallel processing (requires OLLAMA_NUM_PARALLEL=4)
    • Async PropertyGraphIndex with use_async=True
    • Increased kg_batch_size from 10 to 20 chunks
    • Logging added for performance timing
  2. Added performance testing results to readme.md (6 docs with openai for each graph database: neo4j, kuzu, falkordb)
  3. Added docs/performance.md, which has performance testing results for each graph database with 2, 4, and 6 docs with openai and with 2 and 4 docs with ollama
  4. Added support for the FalkorDB graph database (https://www.falkordb.com/ and https://github.com/FalkorDB/falkordb). The abstractions of LlamaIndex, its support for FalkorDB, and the configurability of flexible-graphrag made this a relatively straightforward process.
  5. Added LlamaIndex DynamicLLMPathExtractor support (works on openai, not on ollama currently)
  6. Added config of kg extractor type (simple, schema, or dynamic) to set which LlamaIndex extractor to use (SimpleLLMPathExtractor, SchemaLLMPathExtractor, or DynamicLLMPathExtractor)
  7. Added config of MAX_TRIPLETS_PER_CHUNK and MAX_PATHS_PER_CHUNK
  8. Added readme.md info on system environment setup of ollama for performance and parallelism (OLLAMA_CONTEXT_LENGTH, OLLAMA_NUM_PARALLEL, etc.)
  9. Added new default schema with 35+ relationship combinations, more relations, and entity types: PERSON, ORGANIZATION, TECHNOLOGY, PROJECT, LOCATION
  10. Fixed file upload dialog performance in all 3 front ends: React, Angular, and Vue (chosen files now display quickly after clicking OK in the dialog)
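
The extractor selection in items 6 and 7 can be sketched as a small settings loader. This is a minimal illustration, not the project's actual code: the `KG_EXTRACTOR_TYPE` variable name, the `KGSettings` class, the `load_kg_settings` function, and the default values are assumptions. Only `MAX_TRIPLETS_PER_CHUNK`, `MAX_PATHS_PER_CHUNK`, the three type names, and the three LlamaIndex extractor class names come from the list above.

```python
from dataclasses import dataclass
from typing import Mapping

# Extractor type -> LlamaIndex extractor class name (names from the list above).
EXTRACTOR_CLASSES = {
    "simple": "SimpleLLMPathExtractor",
    "schema": "SchemaLLMPathExtractor",
    "dynamic": "DynamicLLMPathExtractor",
}


@dataclass(frozen=True)
class KGSettings:
    """Hypothetical settings holder; not the project's real config class."""
    extractor_type: str
    extractor_class: str
    max_triplets_per_chunk: int
    max_paths_per_chunk: int


def load_kg_settings(env: Mapping[str, str]) -> KGSettings:
    """Read extractor settings from an environment-like mapping."""
    # KG_EXTRACTOR_TYPE and the defaults below are assumptions for illustration.
    extractor_type = env.get("KG_EXTRACTOR_TYPE", "schema").strip().lower()
    if extractor_type not in EXTRACTOR_CLASSES:
        raise ValueError(
            f"Unknown extractor type {extractor_type!r}; "
            f"expected one of {sorted(EXTRACTOR_CLASSES)}"
        )
    return KGSettings(
        extractor_type=extractor_type,
        extractor_class=EXTRACTOR_CLASSES[extractor_type],
        max_triplets_per_chunk=int(env.get("MAX_TRIPLETS_PER_CHUNK", "20")),
        max_paths_per_chunk=int(env.get("MAX_PATHS_PER_CHUNK", "20")),
    )


if __name__ == "__main__":
    import os

    # Show which extractor the current environment would select.
    print(load_kg_settings(os.environ))
```

Validating the type up front means a typo in the config fails at startup rather than partway through ingestion. The resolved class name would then be used to pick the matching LlamaIndex extractor.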

Creating Knowledge Graphs automatically for GraphRAG: Part 1: with NLP

(next post Part 2: with LLM)

I first investigated how NLP could be used for both entity recognition and relation extraction when creating a knowledge graph of content. Tomaz Bratanic’s Neo4j blog article used Relik for NLP along with LlamaIndex to create a graph in Neo4j and to set up an embedding model for use with LLM queries.

In my llama_relik GitHub project, I used the notebook from the blog article and changed it to use fastcoref instead of coreferee. Fastcoref was mentioned in the comments on the Medium version of the Neo4j blog article, and it’s supposed to work better. There is also a Python file in this project that can be used instead of the notebook.
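
To illustrate what the coreference step contributes before extraction, here is a minimal sketch of the substitution it enables. The clusters are hard-coded character spans of the kind a coreference model produces. The `resolve_coreferences` helper and the sample sentence are hypothetical and are not taken from the llama_relik project or from the fastcoref API.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def resolve_coreferences(text: str, clusters: List[List[Span]]) -> str:
    """Replace every later mention in a cluster with the cluster's first mention."""
    replacements = []
    for cluster in clusters:
        ordered = sorted(cluster)
        head_start, head_end = ordered[0]
        head = text[head_start:head_end]
        for start, end in ordered[1:]:
            replacements.append((start, end, head))
    # Apply from right to left so that earlier offsets stay valid.
    for start, end, head in sorted(replacements, reverse=True):
        text = text[:start] + head + text[end:]
    return text


# Hypothetical input: one cluster linking "It" back to the full name.
text = "The International Space Station orbits Earth. It hosts a crew."
clusters = [[(0, 31), (46, 48)]]

if __name__ == "__main__":
    print(resolve_coreferences(text, clusters))
```

With the pronoun replaced by the full name, a relation extractor sees "The International Space Station hosts a crew" and can attach the relation to a named entity rather than to "It".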

I submitted some fixes for Relik on Windows, but in general it performs best on Linux, where it was better able to use the GPU (“cuda” mode) instead of “cpu”.

Similar work has been done using Rebel for NLP by Neo4j / Tomaz Bratanic, Saurav Joshi, and Qrious Kamal.

Note that Relik has closed information extraction (CIE) models that do both entity linking (EL) and relation extraction (RE). It also has models focused on either EL or RE.

Below is a screenshot from Neo4j with a knowledge graph created with the Python file from the llama_relik project using the “relik-cie-small” model with the spaCy space station sample text (ignore the chunk node and its mentions relations). Notice how it has separate entities for “ISS” and “International Space Station”.

The “relik-cie-large” model finds more relations, as shown in the screenshot below. It also has separate entities for “ISS” and “International Space Station” (and throws in a second “International Space Station”).