Alfresco GenAI Semantic Project

The Alfresco GenAI Semantic GitHub project is now available. It is a fork of the Alfresco GenAI project that, for now, adds entity linking to DBpedia and Wikidata using the spaCy NLP Python library.

The Alfresco GenAI project provides support for generative AI with local or cloud LLMs for Alfresco. This includes summarization, categorization, image description, and chat prompting about document content.

The Alfresco GenAI Semantic project adds named entity recognition (NER) and entity linking of documents in Alfresco to Wikidata and DBpedia. Currently, two custom aspects hold the links in multi-value properties; Alfresco tags aren’t used yet.

The spaCy NLP Python library is used along with spaCy projects. The spaCyOpenTapioca project is used to get Wikidata entity links, and the DBpedia Spotlight for spaCy project is used to get DBpedia entity links. Note that both use external servers, which can also be set up locally. NER can also be done with spaCy alone. The spacy-llm Python package, which integrates Large Language Models (LLMs) into spaCy pipelines, is also available, but the Alfresco GenAI Semantic project doesn’t use it yet.
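As a rough illustration of the entity-linking step described above, the sketch below collects link URIs from a spaCy document into a de-duplicated list, the shape a multi-value property would need. It assumes the spaCyOpenTapioca and DBpedia Spotlight for spaCy packages register pipeline components named `opentapioca` and `dbpedia_spotlight` and expose the linked identifier on `ent.kb_id_`. The helper names and the Wikidata URI prefix are illustrative and are not taken from the project's code.

```python
from typing import Iterable, List

# Assumption: Wikidata links are stored as full entity URIs built from the QID.
WIKIDATA_PREFIX = "http://www.wikidata.org/entity/"


def collect_links(ents: Iterable, prefix: str = "") -> List[str]:
    """Return unique link URIs, in first-seen order, for linked entities.

    `ents` is any iterable of objects with a `kb_id_` attribute (as on
    spaCy spans). Entities with an empty `kb_id_` are skipped.
    """
    seen = set()
    links = []
    for ent in ents:
        kb_id = getattr(ent, "kb_id_", "")
        if not kb_id:
            continue
        # Full URIs are kept as-is; bare identifiers get the prefix.
        uri = kb_id if kb_id.startswith("http") else prefix + kb_id
        if uri not in seen:
            seen.add(uri)
            links.append(uri)
    return links


def link_text(text: str, component: str) -> List[str]:
    """Run a blank English pipeline with one entity-linking component.

    Assumption: `component` is "opentapioca" or "dbpedia_spotlight" and the
    matching package is installed. Both call an external server by default.
    """
    import spacy  # imported lazily so collect_links works without spaCy

    nlp = spacy.blank("en")
    nlp.add_pipe(component)
    doc = nlp(text)
    prefix = WIKIDATA_PREFIX if component == "opentapioca" else ""
    return collect_links(doc.ents, prefix)
```

Calling `link_text(text, "opentapioca")` and `link_text(text, "dbpedia_spotlight")` on the same document would give one list per aspect. Writing those lists to Alfresco is left out here.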

The following shows how test\space-station.txt looks, after upload and entity linking with the Entity Link Wikidata aspect, in the expanded view details of the Alfresco ACA content app:

The following shows how test\space-station.txt looks, after entity linking with the Entity Link DBpedia aspect, in the expanded view details of the Alfresco ACA content app:

TypeScript for Alfresco and CMIS – Alfresco DevCon 2012 lightning talk slides and sample app

I also uploaded my slides to SlideShare from the second lightning talk presentation I made at Alfresco DevCon 2012 San Jose.

TypeScript for Alfresco and CMIS – Alfresco DevCon 2012 San Jose

This briefly covered some languages that can be translated to JavaScript (TypeScript, Dart, ActionScript, CoffeeScript) and used for developing HTML5/JS desktop and mobile web applications. TypeScript seems to be the best choice. The IDEs and editors currently supporting TypeScript were then listed.

Finally, my plans to support various Alfresco and CMIS things with TypeScript were covered: a port of CMIS Spaces and FlexSpaces from Flex/AS3 to TypeScript, TypeScript wrappers for AlfJS and CMIS.JS, additional Alfresco and CMIS TypeScript libraries, a sample showing a Share dashlet written in TypeScript, and a TypeScript definition file for IntelliSense / compile-time type checking for Alfresco WebScripts.

The small TypeScript app I started (the start of a repo browser) has definition wrappers for AlfJS and YUI3, a dummy tree (no real data yet), and a folder table (that displays data from Alfresco with AlfJS). It is included here: alf-yui-typescript-app1.zip (will add to GitHub later). The definition for YUI3 comes from what this gist had, with additions to get it to compile in Visual Studio 2012 with the TypeScript plugin.

Apache Stanbol Version of OpenCalais Integration – Alfresco DevCon 2012 lightning talk slides

I uploaded my slides to SlideShare from the first lightning talk presentation I made at Alfresco DevCon 2012 San Jose:

An Alfresco Apache Stanbol Integration (port of OpenCalais integration) – Alfresco DevCon 2012 San Jose

It covers the port of the OpenCalais Integration and its Share UI extension to work with Apache Stanbol. These integrations support auto-tagging, semantic tag clouds, and semantic geo-tagged maps. Both integrations are open source and available on Google Code.

FlexibleShare updated

FlexibleShare extends FlexibleDashboard (dashboard framework, BI charting, reporting pods) with FlexSpaces doc management pods (Alfresco backend) and adds additional Flex pods for Share collaboration (Alfresco Share backend). All three of these projects are open source. FlexibleShare has been updated to use code from the latest versions of FlexSpaces and FlexibleDashboard, and the Share pods now have site selection drop-downs. An Alfresco Add-Ons page for FlexibleShare was also added.

The doc management portion now has support for Alfresco 4.0, and a new preferences dialog for easier setup of the server domain/port and of the API key for optional semantic auto-tagging with the OpenCalais Integration for Alfresco. The default config in flexibleShareAirPods.xml just has the combined multi-view FlexSpaces pod shown in the top left. This screenshot also shows the available search, tasks, and local files pods (the all-repository doc lib pod is not shown). In the AIR version, files from the local files pod can be copied into a doc lib view via drag/drop. Also in the AIR version, multiple selected files can be copied via drag/drop from the desktop into a doc lib view or copied out via drag/drop, and the native desktop clipboard can be used to copy/paste files between the desktop and a doc lib (AIR can do more than the HTML5 drag-in available in some browsers).

flexibleshareairbld4-33percent.png

The Share collaboration wiki, blog, discussions, and doclib Flex pods are now more usable out of the box, with added drop-downs to select the Share site to work with (instead of setting the Share site shortName in the pods XML file). More work is needed to hook up the calendar pod to load Share site calendar info (and an add event dialog is not available yet). Although the calendar pod is able to load iCalendar files, more work is needed to get it to work with the iCalendar data available from the Alfresco “slingshot” /calendar/eventList?site={shortName}&format=calendar webscript.

flexiblesharesharepods2.png
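For reference, the calendar webscript address mentioned above can be assembled as in the minimal sketch below. The function name and the base URL in the usage line are hypothetical, and authentication and fetching are left out.

```python
from urllib.parse import urlencode


def calendar_feed_url(base_url: str, short_name: str) -> str:
    """Build the eventList webscript URL for one Share site.

    `base_url` is the webscript service root of the Alfresco server
    (an assumption supplied by the caller); `short_name` is the Share
    site shortName.
    """
    # urlencode escapes the site name so it is safe in a query string.
    query = urlencode({"site": short_name, "format": "calendar"})
    return f"{base_url.rstrip('/')}/calendar/eventList?{query}"


# Hypothetical server address, for illustration only.
print(calendar_feed_url("http://localhost:8080/alfresco/service", "my-site"))
```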

Planned for FlexibleShare: calendar pod hookup with Share sites, multiple repository support, support for CMIS repositories, drag/drop copy between repositories, support for Alfresco Cloud repositories, multi-repository search, Solr facets search navigation, support for Apache Stanbol semantic auto-tagging / semantic search, mobile/touch?, and a port/translation to HTML5 / CSS / JavaScript (FlexSpaces, CMIS Spaces, FlexibleDashboard, FlexibleShare).

Steve Reiner
Integrated Semantics
@stevereiner on twitter

OpenCalais Integration updated for Alfresco 4.0 and added use of new Share config mechanism. Apache Stanbol Plans

The OpenCalais Integration for Alfresco was finally updated for Alfresco 4.0. Given the shift away from the Alfresco Forge to the Alfresco Add-Ons catalog site, the new home for the OpenCalais integration is a Google Code site pointed to by its Add-Ons page.

For Alfresco 4.0 with Solr enabled, an issue was fixed. Some code that needed to reach newly added top-level semantic tag categories right away had to change to use a CategoryService API instead of a search query, since there is an added delay in indexing. Changing the alfresco.cron value in solrcore.properties from 15 secs to 150 secs helped make an intermittent problem reproducible every time.

The Share Integration (semantic tag cloud dashlet, semantic geo-tagged map dashlet, auto-tag action menu in doc libraries and repository) was updated to use the new doclib action config mechanism added in 4.0. It’s much nicer to put an added action menu in a web-extension/share-config-custom.xml file than to set up modified versions of actions-common.get.head.ftl, documentlist.get.config.xml, etc. in web-extension. (Helpful ECM Stuff blog post on Share action config in Alfresco 4.0)

To use the free OpenCalais service, you need to get an API key from opencalais.com. This allows you to submit 50,000 documents a day. The paid version, called Calais, supports more requests than the free (but not open source) service, called OpenCalais. Note that document size per submission is limited to 100k bytes in all versions, and that the service retains extracted metadata (but not content). So it’s geared more toward news articles than large, sensitive documents. Calais has a test page where you can give it text and see what it extracts.
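To make the limits above concrete, here is a small sketch of a pre-submission check. It reads “100k bytes” as 100,000 bytes and assumes UTF-8 encoding. Both are assumptions, as are the names used, and nothing here calls the Calais service.

```python
MAX_SUBMISSION_BYTES = 100_000  # assumption: "100k bytes" read as 100,000
DAILY_DOCUMENT_LIMIT = 50_000   # free OpenCalais submissions per day


def can_submit(text: str, submitted_today: int, encoding: str = "utf-8") -> bool:
    """Return True if `text` fits the size limit and daily quota remains.

    Size is measured in encoded bytes, not characters, so non-ASCII text
    reaches the limit sooner than its character count suggests.
    """
    if submitted_today >= DAILY_DOCUMENT_LIMIT:
        return False
    return len(text.encode(encoding)) <= MAX_SUBMISSION_BYTES
```

A caller could skip, or split, any document for which this returns False before sending it for auto-tagging.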

To use the Share auto-tag action menu (used to do a one-time auto-tag on a document), you need a Calais API key set up in module\calais\module-context.xml (see readme.txt). Semantic tags are listed in the properties section of a Share document details page (not with the regular tag UI, since a different category content model / custom root category is used for semantic tags). You can also add one or more semantic tag cloud dashlets and a semantic geo-tagged map dashlet to Share dashboards (site and/or global) to navigate from semantic tags to documents. In Explorer, to do a one-time auto-tag you need to use the run-a-rule action on a doc and give the Calais key each time in the dialog. A rule to auto-tag documents in a folder can be set up in Explorer or Share (using the “Auto-tag with Calais” action, with the Calais API key given as a parameter).

FlexSpaces has support for the OpenCalais integration in all its versions (desktop AIR client, Flex in-browser, Mobile AIR). Like Share, it supports semantic tag clouds, a semantic geo-tagged map, and one-time auto-tagging. It has additional OpenCalais features: semantic tag suggestion, and adding / removing semantic tags on a document. You can set up a Calais API key (and Alfresco server info) in the FlexSpaces preferences dialog that was added in the 2012.02.08 version, and avoid having to do this in FlexSpacesConfig.xml. Info entered is sticky and per user on their local machine (stored in a Local Shared Object). So theoretically each user could submit 50,000 documents a day to OpenCalais if they each signed up for a key. FlexibleShare includes FlexSpaces and its semantic features, but hasn’t been updated with the preferences dialog or other recent FlexSpaces changes yet (update: the FlexibleShare 6/28/2012 version now has the preferences dialog and Alfresco 4.0 support too).

FlexSpaces Preferences Dialog

The plan is to have an Alfresco integration with Apache Stanbol on the same Semantics4Alfresco Google Code site as the OpenCalais integration. Apache Stanbol (derived from the IKS project) is fully open source, is a general stack of frameworks for semantic content management that can do more than content enhancement, can get around the drawbacks of OpenCalais, and gives you more flexibility to set up customized ontologies vs. the fixed support Calais has. Stanbol can also call other enhancement engines instead of the default OpenNLP, or even chain them together. Stanbol has an adapter for OpenCalais. For enhancing news, OpenCalais works better out of the box than OpenNLP. Zaizi has already done Stanbol integration work, although only a version for an old IKS version is currently open source. Integrated Semantics will leverage / extend any newer Stanbol integration that Zaizi makes available as open source. A Stanbol integration could extend Solr facets with semantic facets.