Alfresco GenAI Semantic project updated: now adds regular Alfresco tags, uses local Wikidata and DBpedia entity recognizers

The Alfresco GenAI Semantic GitHub project now adds regular Alfresco tags when it auto-tags documents and enhances them with links to Wikidata and DBpedia. Semantic entity linking info is kept in three parallel multi-value properties (labels, links, and super type lists) in the Wikidata and DBpedia custom aspects. The label values are used as the tag labels.
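
To make the "parallel" layout concrete, here is a minimal Python sketch of how one list of linked entities could be flattened into three same-length lists, where index i in each list describes the same entity. The LinkedEntity class and the property names ("labels", "links", "superTypes") are hypothetical and only for illustration; they are not the project's actual content model names.

```python
from dataclasses import dataclass, field


@dataclass
class LinkedEntity:
    """Hypothetical record for one recognized entity."""
    label: str
    link: str
    super_types: list[str] = field(default_factory=list)


def to_parallel_properties(entities):
    """Flatten entities into three lists that share the same index."""
    labels, links, super_types = [], [], []
    for entity in entities:
        labels.append(entity.label)
        links.append(entity.link)
        # One string per entity: "" when there are no super types,
        # otherwise the ids joined with commas.
        super_types.append(",".join(entity.super_types))
    # Property names are assumptions made for this sketch.
    return {"labels": labels, "links": links, "superTypes": super_types}
```

Because the label list lines up with the other two, the tag labels can be taken straight from the "labels" list without losing the association to each entity's link and super types.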

I switched to a local, private Wikidata recognizer. The spaCy-entity-linker Python library is used to get Wikidata entity links without having to call a public service API. It was created before spaCy had its own entity linking system, and it still has the advantage of not needing any training. I had previously used the spaCyOpenTapioca library, which calls a public OpenTapioca web service API URL. Note that the URLs in the links properties still point to the public website wikidata.org if you use them in your application.
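
As a rough illustration of local Wikidata linking, here is a small Python sketch based on the spaCy-entity-linker README. It is not the project's actual code. The model name en_core_web_md and the row layout are assumptions, and the library's knowledge base has to be downloaded once beforehand (the README describes how).

```python
def build_wikidata_pipeline(model="en_core_web_md"):
    """Load a spaCy model and append the local Wikidata entity linker.

    spaCy is imported lazily so the helper below can be used without it.
    The model name is an assumption; any installed English model may work.
    """
    import spacy

    nlp = spacy.load(model)
    # "entityLinker" is the pipe name registered by spaCy-entity-linker.
    nlp.add_pipe("entityLinker", last=True)
    return nlp


def wikidata_rows(linked_entities):
    """Turn linked entities into plain dicts (label, link, super type ids)."""
    rows = []
    for entity in linked_entities:
        # Take each super entity's id from the tail of its URL, e.g. ".../Q5".
        super_ids = [
            parent.get_url().rsplit("/", 1)[-1]
            for parent in entity.get_super_entities()
        ]
        rows.append(
            {
                "label": entity.get_label(),
                "link": entity.get_url(),
                "super_types": ",".join(super_ids),
            }
        )
    return rows
```

With the library installed, usage would look something like doc = build_wikidata_pipeline()("some text") followed by wikidata_rows(doc._.linkedEntities). No web service is called during linking; only the resulting URLs point at wikidata.org.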

I also switched to a local, private DBpedia Spotlight entity recognizer that runs in a Docker container included in the Docker Compose setup. The local URL of this container is given to the DBpedia Spotlight for SpaCy library, which previously used a public Spotlight web service API URL by default. Note that the URLs in the links properties still point to the public website dbpedia.org if you use them in your application.
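
Pointing the DBpedia Spotlight for SpaCy library at a local container could look roughly like the Python sketch below, which follows that library's README. The endpoint http://localhost:2222/rest and the model name are assumptions for illustration; the project's compose file may expose a different host and port.

```python
def build_dbpedia_pipeline(endpoint="http://localhost:2222/rest",
                           model="en_core_web_sm"):
    """Load a spaCy model and add the DBpedia Spotlight pipe.

    The endpoint default is an assumed local Spotlight URL, not the
    project's actual setting. spaCy is imported lazily on purpose.
    """
    import spacy

    nlp = spacy.load(model)
    # Override the library's default public endpoint with the local one.
    nlp.add_pipe("dbpedia_spotlight",
                 config={"dbpedia_rest_endpoint": endpoint})
    return nlp


def dbpedia_rows(ents):
    """Turn spaCy entity spans annotated by Spotlight into plain dicts."""
    rows = []
    for ent in ents:
        raw = ent._.dbpedia_raw_result or {}
        rows.append(
            {
                "label": ent.text,
                "link": ent.kb_id_,  # DBpedia resource URI
                # Spotlight returns the types as one comma separated string.
                "super_types": raw.get("@types", ""),
            }
        )
    return rows
```

With a Spotlight container running, usage would be something like doc = build_dbpedia_pipeline()("some text") followed by dbpedia_rows(doc.ents).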

For documents with the Wikidata or DBpedia aspects added to them, tags show up in the Alfresco clients (ACA, ADW, Share) after the PDF rendition is created and the alfresco-genai-semantic AI Listener gets responses from the REST APIs in the genai-stack. Shown below are tags in the ACA community content app:

Multi-value Wikidata aspect properties of a document are shown below in the expanded view details of the ACA client. The labels property repeats the labels of the tags. The links property has URLs to wikidata.org. The super types property has one value per entity: either an empty string (“”) when there are no super types, or one or more comma-separated Wikidata super types. These super types are Wikidata ids (they become links once you add “http://www.wikidata.org/wiki/” in front of the ids).
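
Turning one super types value into links is just a matter of splitting on commas and adding the prefix. A minimal Python sketch, assuming the value is a plain comma-separated string of ids as described above (the function name is made up for this example):

```python
WIKIDATA_PREFIX = "http://www.wikidata.org/wiki/"


def super_type_links(value):
    """Expand one super types value ("", "Q5", "Q5,Q42") into URLs."""
    links = []
    for qid in value.split(","):
        qid = qid.strip()
        if qid:  # skip the empty string used for "no super types"
            links.append(WIKIDATA_PREFIX + qid)
    return links
```

For example, super_type_links("Q5") gives a one-item list with http://www.wikidata.org/wiki/Q5, and super_type_links("") gives an empty list.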

The same style of DBpedia aspect multi-value properties is shown below in the ACA client. Note that the super types can be from Wikidata, DBpedia, Schema (schema.org), foaf, or DUL (ontologydesignpatterns.org DUL.owl), etc.
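
If your application needs to tell these sources apart, one option is to group the super types by their prefix. The Python sketch below assumes the values look like what DBpedia Spotlight returns, such as "Wikidata:Q5,Schema:Person,DBpedia:Person", with some types (foaf, for example) given as full URIs. That format is an assumption about what ends up in the property, and the function name is made up.

```python
from collections import defaultdict


def group_super_types(value):
    """Group a comma separated super types string by source prefix."""
    groups = defaultdict(list)
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        # Full URIs are kept whole under the "uri" key.
        if item.lower().startswith(("http://", "https://")):
            groups["uri"].append(item)
            continue
        prefix, sep, name = item.partition(":")
        if sep:
            groups[prefix].append(name)
        else:
            # No prefix at all: keep it rather than drop it.
            groups["unknown"].append(item)
    return dict(groups)
```

The Wikidata group then holds plain ids that can be turned into wikidata.org links, while the other groups hold names from their own vocabularies.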

Alfresco GenAI Semantic Project

The Alfresco GenAI Semantic GitHub project is now available. It is a fork of the Alfresco GenAI project that, for now, adds entity linking to DBpedia and Wikidata using the spaCy NLP Python library.

The Alfresco GenAI project provides support for generative AI with local or cloud LLMs for Alfresco. This includes summarization, categorization, image description, and chat prompting about document content.

The Alfresco GenAI Semantic project adds named entity recognition (NER) / entity linking of documents in Alfresco to Wikidata and DBpedia. Currently, two custom aspects have multi-value properties for the links; Alfresco tags aren’t used yet.

The spaCy NLP Python library is used along with spaCy projects. The spaCyOpenTapioca project is used for getting Wikidata entity links. The DBpedia Spotlight for SpaCy project is used for getting DBpedia entity links. Note that both use external servers, which can be set up locally. NER can also be done with just spaCy. The spaCy-LLM Python package, which integrates Large Language Models (LLMs) into spaCy pipelines, is also available. The Alfresco GenAI Semantic project doesn’t use spacy-llm yet.

Below is what test\space-station.txt looks like in the expanded view details of the Alfresco ACA content app after upload and entity linking with the Entity Link Wikidata aspect:

Below is what test\space-station.txt looks like in the expanded view details of the Alfresco ACA content app after entity linking with the Entity Link DBpedia aspect:

Github

I added forks of all the old Integrated Semantics projects to github.com/stevereiner from people who used the Google Code export before Google took it away. Thanks to Richard Esplin (esplinr), pubsnow, and MaxTyutyunnikov.

I moved back to Miami, Florida from the SF Bay Area 2 1/2 years ago. Hopefully I will get to putting some new stuff in this GitHub area, despite being semi-retired. Steve

TypeScript for Alfresco and CMIS – Alfresco DevCon 2012 lightning talk slides and sample app

I also uploaded my slides to SlideShare from the second lightning talk presentation I made at Alfresco DevCon 2012 San Jose.

TypeScript for Alfresco and CMIS – Alfresco DevCon 2012 San Jose

This briefly covered some languages that can be translated to JavaScript (TypeScript, Dart, ActionScript, CoffeeScript) and used for developing HTML5/JS desktop and mobile web applications. TypeScript seems to be the best choice. The IDEs and editors currently supporting TypeScript were then listed.

Finally, my plans to support various Alfresco and CMIS things with TypeScript were covered: porting CMIS Spaces and FlexSpaces from Flex/AS3 to TypeScript, TypeScript wrappers for AlfJS and CMIS.JS, additional Alfresco and CMIS TypeScript libraries, a sample showing a Share dashlet written in TypeScript, and a TypeScript definition file for intellisense / compile time type checking for Alfresco WebScripts.

The small TypeScript app I started (the start of a repo browser) has definition wrappers for AlfJS and YUI3, a dummy tree (no real data yet), and a folder table (that displays data from Alfresco with AlfJS). It is included here: alf-yui-typescript-app1.zip (I will add it to GitHub later). The definition for YUI3 comes from what this gist had, with additions to get it to compile in Visual Studio 2012 with the TypeScript plugin.

Apache Stanbol Version of OpenCalais Integration – Alfresco DevCon 2012 lightning talk slides

I uploaded my slides to SlideShare from the first lightning talk presentation I made at Alfresco DevCon 2012 San Jose:

An Alfresco Apache Stanbol Integration (port of OpenCalais integration) – Alfresco DevCon 2012 San Jose

It covers the port of the OpenCalais Integration and its Share UI extension to work with Apache Stanbol. These integrations support auto-tagging, semantic tag clouds, and semantic geo-tagged maps. Both integrations are open source and available on Google Code.