Flexible-GraphRAG: Performance improvements, FalkorDB graph database support added

See Flexible GraphRAG Initial Version Blog Post

See New Tabbed UI for Flexible GraphRAG (and Flexible RAG)

Flexible GraphRAG on GitHub

X.com: Steve Reiner @stevereiner. LinkedIn: Steve Reiner.

  1. Improved the performance of flexible-graphrag
    • Added parallel Docling document conversion, which improved pipeline timing
    • Stopped running KeywordExtractor/SummaryExtractor, which also improved pipeline timing
    • Ollama parallel processing (needs OLLAMA_NUM_PARALLEL=4)
    • Async PropertyGraphIndex with use_async=True
    • Increased kg_batch_size from 10 to 20 chunks
    • Logging added for performance timing
  2. Added performance testing results to readme.md (6 docs with openai on each graph database: neo4j, kuzu, falkordb)
  3. Added docs/performance.md: has performance testing results for each graph database with 2, 4, and 6 docs with openai and with 2 and 4 docs with ollama
  4. Added support for the FalkorDB graph database (https://www.falkordb.com/ and https://github.com/FalkorDB/falkordb). The abstractions of LlamaIndex, the LlamaIndex support for FalkorDB, and the configurability of flexible-graphrag made this a relatively straightforward process.
  5. Added LlamaIndex DynamicLLMPathExtractor support (works on openai, not on ollama currently)
  6. Added config of kg extractor type (simple, schema, or dynamic) to set which LlamaIndex extractor to use (SimpleLLMPathExtractor, SchemaLLMPathExtractor, or DynamicLLMPathExtractor)
  7. Added config of MAX_TRIPLETS_PER_CHUNK and MAX_PATHS_PER_CHUNK
  8. Added readme.md info on system environment setup of ollama for performance and parallelism (OLLAMA_CONTEXT_LENGTH, OLLAMA_NUM_PARALLEL, etc.)
  9. Added new default schema with 35+ relationship combinations, more relations, and entity types: PERSON, ORGANIZATION, TECHNOLOGY, PROJECT, LOCATION
  10. Fixed file upload dialog performance in all 3 front ends: React, Angular, and Vue (chosen files display quickly after dialog ok)
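
The extractor and limit settings in items 6 and 7 could be read from the environment along these lines. This is a minimal sketch, not the project's actual code: MAX_TRIPLETS_PER_CHUNK, MAX_PATHS_PER_CHUNK, and the three extractor class names come from the list above, while KG_EXTRACTOR_TYPE, KG_BATCH_SIZE, the KGSettings class, load_kg_settings, and all default values other than the batch size of 20 are assumptions made for illustration. It only resolves the class name as a string, so it runs without LlamaIndex installed.

```python
import os
from dataclasses import dataclass

# Extractor type -> LlamaIndex extractor class name (names from the list above).
EXTRACTOR_CLASSES = {
    "simple": "SimpleLLMPathExtractor",
    "schema": "SchemaLLMPathExtractor",
    "dynamic": "DynamicLLMPathExtractor",
}


@dataclass
class KGSettings:
    """Hypothetical container for knowledge graph extraction settings."""
    extractor_class: str
    max_triplets_per_chunk: int
    max_paths_per_chunk: int
    kg_batch_size: int
    use_async: bool


def load_kg_settings(env=None):
    """Build KGSettings from a mapping of environment variables."""
    env = os.environ if env is None else env
    # KG_EXTRACTOR_TYPE is an assumed variable name, not taken from the project.
    kind = env.get("KG_EXTRACTOR_TYPE", "schema").strip().lower()
    if kind not in EXTRACTOR_CLASSES:
        raise ValueError(f"unknown extractor type: {kind!r}")
    return KGSettings(
        extractor_class=EXTRACTOR_CLASSES[kind],
        # The numeric defaults below are illustrative assumptions.
        max_triplets_per_chunk=int(env.get("MAX_TRIPLETS_PER_CHUNK", "20")),
        max_paths_per_chunk=int(env.get("MAX_PATHS_PER_CHUNK", "20")),
        # 20 matches the new batch size mentioned in item 1.
        kg_batch_size=int(env.get("KG_BATCH_SIZE", "20")),
        use_async=True,
    )


if __name__ == "__main__":
    print(load_kg_settings({"KG_EXTRACTOR_TYPE": "dynamic"}))
```

Passing a plain dict instead of os.environ keeps the function easy to try out with different settings.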

Creating Knowledge Graphs automatically for GraphRAG: Part 2: with LLMs

And the winner is: using LLMs, rather than NLP, to create knowledge graphs. Can LLMs do a better job? The Neo4j LLM Graph Builder in particular has shown they can. What about the cost of using OpenAI and the loss of data privacy from submitting documents to it? The answer is that free, local LLM models (Llama3 versions are available through ollama) also work with Graph Builder. I tested with OpenAI GPT-4o, llama3, llama3.1, and llama3.2. I noticed gemma2 is also available through ollama. These local LLMs work best with a high-end Nvidia card.

Neo4j Labs LLM Knowledge Graph Builder main info site

Short YouTube demo video

The Online LLM Graph Builder can be used. You need to provide it with your Aura Neo4j connection info (you can create an account for a free Aura DB). It only has Diffbot, OpenAI, and Gemini LLM models available.

Graph Builder can upload from local files, AWS S3, web pages, Wikipedia, and YouTube. Google GCS can be a source if configured.

First choose the LLM model to use. Then upload one or more files. Then choose generate graph. You can view the graphs with the basic viewer (which allows hiding chunk nodes and community nodes so you can see the entities and relationships). The Bloom viewer is also available, but it is more complicated.

You can also chat with the data using GraphRAG and your chosen LLM. Answers have an icon below them that, when clicked, shows which graph doc sources, entities, and chunks were used to answer.

LLM Graph Builder GitHub project (Apache 2.0 open source)

The online version doesn’t have the llama3 models, so you need to clone the GitHub project and build it locally. To use the Meta Llama3 models, you need to configure them. Use example.env to create a .env file, then add an optional OpenAI key, the LLM model configuration, and your initial Neo4j database info. Neo4j connection info can also be provided in the UI. Then run docker compose up. I have a fork of the main branch in my LLM Graph Builder that adds configuration for llama3, llama3.1, llama3.2, and openai gpt-4 choices, adds some neo4j connection config examples, switches to port 8090 to avoid conflicting with Alfresco on 8080, adds a debug log so you can check the model config, and has a sample files folder with space-station.txt.

Speaking of Alfresco, I could extend my Alfresco GenAI Semantic project to call the separable backend of Graph Builder to generate a knowledge graph of new or updated Alfresco documents that have a new custom aspect. The backend may currently only support the kinds of sources the app itself offers. Also note, in terms of UI integration, that Alfresco’s ADF components and the ACA client use Angular, while Neo4j Graph Builder’s front end uses React (as do some of their other software projects).

space-station.txt with OpenAI GPT-4o:

space-station.txt with Meta Llama3:

space-station.txt with Meta Llama3.1:

space-station.txt with smaller Meta Llama3.2:

OpenAI GPT-4o with Albert Einstein Wikipedia page (340 nodes, 230 relationships):

Meta Llama3 with Albert Einstein Wikipedia page (150 nodes, 150 relationships). Not shown: Llama3.1 (161 nodes, 85 relationships) and Llama3.2 (125 nodes, 76 relationships).
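
One rough way to compare these runs is relationships per node. The short sketch below only does arithmetic on the counts reported above. The helper name is made up for illustration, and the ratio says nothing about the quality of the extracted entities, so treat it as a quick sanity check rather than a benchmark.

```python
# (nodes, relationships) reported above for the Albert Einstein Wikipedia page.
RESULTS = {
    "OpenAI GPT-4o": (340, 230),
    "Meta Llama3": (150, 150),
    "Meta Llama3.1": (161, 85),
    "Meta Llama3.2": (125, 76),
}


def relationships_per_node(nodes, relationships):
    """Return the average number of relationships per node."""
    return relationships / nodes


# Round to two decimals for display.
DENSITY = {
    model: round(relationships_per_node(nodes, rels), 2)
    for model, (nodes, rels) in RESULTS.items()
}

for model, value in DENSITY.items():
    print(f"{model}: {value} relationships per node")
```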

Alfresco GenAI Semantic project updated: now adds regular Alfresco tags, uses local Wikidata and DBpedia entity recognizers

The Alfresco GenAI Semantic GitHub project now adds regular Alfresco tags when auto tagging documents as it enhances them with links to Wikidata and DBpedia. Semantic entity linking info is kept in 3 parallel multi-value properties (labels, links, super type lists) in the Wikidata and DBpedia custom aspects. The labels values are used for the tag labels.

I switched to a local, private Wikidata recognizer. The spaCy-entity-linker Python library is used to get Wikidata entity links without having to call a public service API. It was created before spaCy had its own entity linking system, and it still has the advantage of not needing training. I had previously used the spaCyOpenTapioca library, which calls a public OpenTapioca web service API URL. Note that the URLs in the links properties still point to the public website wikidata.org if you use them in your application.

I also switched to a local, private DBpedia Spotlight entity recognizer running in a Docker container that is included in the docker compose setup. The local URL of this container is given to the DBpedia Spotlight for SpaCy library, which was previously using a public Spotlight web service API URL by default. Note that the URLs in the links properties still point to the public website dbpedia.org if you use them in your application.
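
To illustrate what pointing at a local Spotlight container means, here is a minimal sketch that builds a request for the Spotlight REST annotate endpoint directly with the standard library, instead of going through the spaCy wrapper library the project uses. The host, the port 2222, and the function name are assumptions. Use whatever URL your docker compose mapping exposes.

```python
import urllib.parse
import urllib.request

# Assumed local endpoint; the actual host port depends on the docker compose mapping.
LOCAL_SPOTLIGHT = "http://localhost:2222/rest"


def build_annotate_request(text, endpoint=LOCAL_SPOTLIGHT, confidence=0.5):
    """Build (but do not send) a GET request for Spotlight's annotate endpoint."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(
        f"{endpoint.rstrip('/')}/annotate?{query}",
        headers={"Accept": "application/json"},  # ask for JSON instead of HTML
    )


if __name__ == "__main__":
    req = build_annotate_request("Albert Einstein was born in Ulm.")
    print(req.full_url)
    # Sending it requires the container to be running:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

Because the request never leaves localhost, the document text stays private. Only the entity URLs in the response refer to the public dbpedia.org site.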

For documents with the Wikidata or DBpedia aspects added to them, tags will show up in the Alfresco clients (ACA, ADW, Share) after PDF rendition creation and after the alfresco-genai-semantic AI Listener gets responses from the REST APIs in the genai-stack. Shown below are tags in the ACA community content app:

Multi-value Wikidata aspect properties of a document are shown below in the expanded view details of the ACA client. The labels property repeats the labels of the tags. The links property has URLs to wikidata.org. The super types property has, for each entity, zero (an empty string “”), one, or multiple comma-separated Wikidata super types. These super types are Wikidata ids (they become links once you add “http://www.wikidata.org/wiki/” in front of the ids).
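
A client application could recombine the three parallel properties roughly as follows. This is a sketch under stated assumptions: the sample values are illustrative rather than actual recognizer output, and the function and key names are hypothetical. Only the parallel-list layout, the comma-separated super types, and the URL prefix come from the description above.

```python
WIKIDATA_PREFIX = "http://www.wikidata.org/wiki/"

# Illustrative sample values, not actual output from the project.
LABELS = ["Albert Einstein", "Berlin", "Douglas Adams"]
LINKS = [
    "https://www.wikidata.org/wiki/Q937",
    "https://www.wikidata.org/wiki/Q64",
    "https://www.wikidata.org/wiki/Q42",
]
# One string per entity: "" (none), one id, or several comma-separated ids.
SUPER_TYPES = ["Q5", "Q515,Q5119", ""]


def combine_entities(labels, links, super_types):
    """Zip the three parallel multi-value properties into one record per entity."""
    if not (len(labels) == len(links) == len(super_types)):
        raise ValueError("parallel properties must have the same length")
    entities = []
    for label, link, supers in zip(labels, links, super_types):
        ids = [s.strip() for s in supers.split(",") if s.strip()]
        entities.append({
            "label": label,  # also used as the tag label
            "link": link,
            # Super type ids become links once the prefix is added.
            "super_type_links": [WIKIDATA_PREFIX + i for i in ids],
        })
    return entities


if __name__ == "__main__":
    for entity in combine_entities(LABELS, LINKS, SUPER_TYPES):
        print(entity)
```

The length check matters because the three properties only make sense when the value at a given position in each list describes the same entity.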

The same style of DBpedia aspect multi-value properties is shown below in the ACA client. Note that the super types can be from Wikidata, DBpedia, Schema (schema.org), foaf, DUL (ontologydesignpatterns.org DUL.owl), etc.