Apache Stanbol Version of OpenCalais Integration – Alfresco DevCon 2012 lightning talk slides

I uploaded to SlideShare the slides from the first lightning talk I gave at Alfresco DevCon 2012 in San Jose:

An Alfresco Apache Stanbol Integration (port of OpenCalais integration) – Alfresco DevCon 2012 San Jose

It covers the port of the OpenCalais Integration and its Share UI extension to work with Apache Stanbol. These integrations support auto-tagging, semantic tag clouds, and semantic geo-tagged maps. Both integrations are open source and available on Google Code.

OpenCalais Integration updated for Alfresco 4.0 and added use of new Share config mechanism. Apache Stanbol Plans

The OpenCalais Integration for Alfresco was finally updated for Alfresco 4.0. Given the shift away from the Alfresco Forge to the Alfresco Add+Ons catalog site, the new home for the OpenCalais integration is a Google Code site pointed to by its Add+Ons page.

For Alfresco 4.0 with Solr enabled, one issue was fixed. Some code that needed to get to newly added top-level semantic tag categories right away was changed to use a CategoryService API instead of a search query, since Solr indexing adds a delay before new categories show up in search results. (Changing the alfresco.cron value in solrcore.properties from 15 secs to 150 secs helped turn an intermittent problem into one that was reproducible every time.)

The Share Integration (semantic tag cloud dashlet, semantic geo-tagged map dashlet, auto-tag action menu in doc libraries and repository) was updated to use the new doclib action config mechanism added in 4.0. It’s much nicer to put an added action menu in a web-extension/share-config-custom.xml file than to set up modified versions of actions-common.get.head.ftl, documentlist.get.config.xml, etc. in web-extension. (Helpful ECM Stuff blog post on Share action config in Alfresco 4.0)

To use the free OpenCalais service, you need to get an API key from opencalais.com. This allows you to submit 50,000 documents a day. More requests are supported in the non-free version, called Calais, than in the free (but not open source) service, called OpenCalais. Note that document size per submission is limited to 100k bytes in all versions, and the service retains extracted metadata (it doesn’t retain content). So it’s geared more for news articles than large sensitive documents. Calais has a test page where you can give it text and see what it extracts.
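Since the size limit applies to every submission, it can help to check documents before sending them. Here is a minimal Python sketch of such a pre-check. It is not part of the integration; the function names are made up for illustration, and reading "100k bytes" as 100,000 bytes of UTF-8 text is an assumption.

```python
# Limits quoted in this post; both are assumptions that may have changed.
MAX_DOC_BYTES = 100_000   # "100k bytes" per submission, read here as 100,000
DAILY_QUOTA = 50_000      # submissions per day with a free key


def fits_size_limit(text, limit=MAX_DOC_BYTES):
    """Return True if the UTF-8 encoding of text is within the byte limit."""
    return len(text.encode("utf-8")) <= limit


def partition_by_size(docs, limit=MAX_DOC_BYTES):
    """Split a {name: text} dict into (names to submit, names too big)."""
    ok, too_big = [], []
    for name, text in docs.items():
        if fits_size_limit(text, limit):
            ok.append(name)
        else:
            too_big.append(name)
    return ok, too_big


if __name__ == "__main__":
    docs = {"news.txt": "short article", "manual.txt": "x" * 200_000}
    submit, skip = partition_by_size(docs)
    print("submit:", submit)
    print("skip:", skip)
```

The check counts encoded bytes rather than characters, so text with non-ASCII characters hits the limit sooner than its character count suggests.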

To use the Share auto-tag action menu (used to do a one-time auto-tag on a document), you need a Calais API key set up in module\calais\module-context.xml (see readme.txt). Semantic tags will be listed in the properties section of a Share document details page (not with the regular tag UI, since a different category content model / custom root category is used for semantic tags). You can also add one or more semantic tag cloud dashlets and a semantic geo-tagged map dashlet to Share dashboards (site and/or global) to navigate from semantic tags to documents. In Explorer, to do a one-time auto-tag you need to use the “run a rule” action on a doc and give the Calais key each time in the dialog. A rule to auto-tag documents in a folder can be set up in Explorer or Share (using the “Auto-tag with Calais” action; you need to give the Calais API key as a parameter to it).

FlexSpaces has support for the OpenCalais integration in all its versions (desktop AIR client, Flex in-browser, Mobile AIR). Like Share, it supports semantic tag clouds, a semantic geo-tagged map, and one-time auto-tagging. It has additional OpenCalais features: semantic tag suggestion and adding / removing semantic tags on a document. You can set up a Calais API key (and Alfresco server info) in the FlexSpaces preferences dialog that was added in the 2012.02.08 version, and avoid having to do this in FlexSpacesConfig.xml. Info entered is sticky and per user on their local machine (stored in a Local Shared Object). So theoretically each user could submit 50,000 documents a day to OpenCalais if they each signed up for a key. FlexibleShare includes FlexSpaces and its semantic features, but hasn’t been updated with the preferences dialog or other recent FlexSpaces changes yet (update: the FlexibleShare 6/28/2012 version now has the preferences dialog and Alfresco 4.0 support too).

FlexSpaces Preferences Dialog

The plan is to have an Alfresco integration with Apache Stanbol on the same Semantics4Alfresco Google Code site as the OpenCalais integration. Apache Stanbol (derived from the IKS project) is fully open source and is a general stack of frameworks for semantic content management that can do more than content enhancement. It can get around the drawbacks of OpenCalais and gives you more flexibility to set up customized ontologies vs. the fixed support Calais has. Stanbol can also call other enhancement engines instead of the default OpenNLP, or even chain them together. Stanbol has an adapter for OpenCalais. For enhancing news, OpenCalais works better out of the box than OpenNLP. Zaizi has already done Stanbol integration work, although only a version for an old IKS version is currently open source. Integrated Semantics will leverage / extend any newer Stanbol integration that Zaizi makes available as open source. A Stanbol integration could extend Solr facets with semantic facets.

Alfresco OpenCalais Integration Share UI

The Alfresco OpenCalais Integration now has UI (Spring Surf / HTML / JavaScript / YUI) for Alfresco Share in addition to the support in FlexSpaces (Flex/Flash). The Share UI has a semantic tag cloud dashlet, a geo-tagged (Google map based) semantic map dashlet, and an auto-tagging action. The Share UI is for Alfresco 3.3 and 3.4.

share-calais-dashlets-2.png

The dashlets will show semantic tags from all Share sites when added to the overall Share dashboard, and show site-specific semantic tags when added to site dashboards. Clicking on a tag in the semantic tag cloud or on a semantic tag map marker will take you to a search results list of documents with that semantic tag. The semantic tag cloud dashlet can be changed to show semantic tags for a specific category or all categories.

The semantic tag cloud dashlet is based on Will Abson’s tag cloud dashlet in the Alfresco Share Extras collection. Will now also has a Google map dashlet in this collection that shows the geo-location of photo files using the Tika-extracted metadata available in Alfresco 3.4.

share-calais-autotag-action-2.png

The added auto-tag action menu (in the more menu and on the details page) can be used to auto-tag the selected document with the OpenCalais service. This action is added to both the site document library and repository document library page menus. The auto-tagging action can also be set up in a content rule to auto-tag all documents in a folder, using the rule UI of Alfresco Explorer or of Share (choose to perform the action “Auto-tag with Calais”).

Note that semantic tags are implemented as categories under a custom root category. They won’t show up in the regular Alfresco tag or category UI. Currently only the Alfresco Explorer details page will list semantic tags (update 3/30/2011: they will now show up on the Share doc details page too in the 1.3.1 version of the OpenCalais integration).

FlexSpaces, in addition to having the semantic tag cloud, semantic map, and auto-tag action features found in the Share UI, also has support for suggesting semantic tags and for editing which semantic tags are assigned to a document. See the semantic features in action in this screen-cam of an older version of FlexSpaces.

FlexSpaces, CMIS Spaces, and FlexibleDashboard updates

FlexSpaces 0.95  (for Alfresco and for Adobe LiveCycle Content Services ES2)

CMIS Spaces (based on FlexSpaces, for content servers supporting the CMIS standard)

  • Build 17 Added multi-file drag out to desktop in AIR version (in addition to existing multi-file drag in)
  • Build 16 Added fixes to get navigation/browsing and upload to work on Day Software CRX 2.1 + CMIS package (issues remaining with Day CRX: upload doesn’t show up, 0 search results).
  • Tested with Alfresco 3.3g and Day CRX 2.1. Previously tested with Alfresco 3.2/3.3, EMC Documentum, IBM FileNet, Nuxeo. Haven’t tried it with Microsoft’s new CMIS support for SharePoint.
  • CMIS Spaces on Google Code
  • CMIS Spaces on Alfresco Forge

FlexibleDashboard

  • Build 2:  Added BIRT report viewer pod
  • Build 2: Added pivotable OLAP grid with XMLA datasource support (Mondrian, Pentaho, etc.)
  • FlexibleDashboard on Google Code

FlexSpaces with Adobe LiveCycle Content Services ES, Calais Integration works with ES2

FlexSpaces Easier To Use With LiveCycle Content Services ES

I finally updated FlexSpaces (version 0.931) to not need a recompile for a server URL change with LiveCycle Content Services ES (changes to FlexSpacesConfig.xml and to server side services-config.xml are still required).

Calais Integration

I tested with LiveCycle Content Services ES 8.2.1 and also with the LiveCycle ES2 M3 R3 beta. On ES2, I also tested with the Alfresco Calais Integration and it works fine via the FlexSpaces Calais UI (auto semantic tagging, tag suggestion, Google map geo-tagging). A nice thing about the ES2 installer is that it allows you to include custom AMP files. I used the turnkey install and selected the custom option when the Configuration Manager ran. I had the calaisIntegration.amp release 1.1 in c:\amps. When the config mgr is deploying content services, check the include custom amps checkbox, and browse to choose c:\amps.

Remaining Problems

Still have two remaining problems with FlexSpaces on LC Content Services. You get an authentication prompt on upload that can cause the first upload to not work on AIR/Windows (on Mac/AIR and Windows/Mac/browser you get errors on upload). Alfresco has an alf_ticket URL arg that makes it easy to authenticate. You can’t use this with LiveCycle Content Services; you have to use authentication headers. Flex doesn’t let you use headers with FileReference.upload() or navigateToURL() (used to view a file given an Alfresco download URL).

For navigateToURL, it works other than getting an authentication prompt the first time file viewing is used. For uploading, I think switching to upload to /remoting/lcfileupload instead of directly to a webscript URL will be part of the solution. This will get files into the LiveCycle “Repository”. Just need to get files from there to the LiveCycle Content Services repository. Unlike this ADC article, I don’t want to have to require the LC Process Mgt option. If anybody has some suggested APIs or sample code, let me know. Don’t think there is a workaround for the navigateToURL issue.
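To make the difference between the two authentication styles concrete, here is a rough Python sketch (not FlexSpaces code). The host, path, and ticket value are placeholders, and HTTP Basic is only assumed as an example of an authentication header; the header scheme your server expects may differ.

```python
import base64
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_ticket(url, ticket):
    """Append an alf_ticket query argument, keeping any existing arguments."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("alf_ticket", ticket))
    return urlunsplit(parts._replace(query=urlencode(query)))


def basic_auth_header(user, password):
    """Build an HTTP Basic Authorization header (assumed scheme)."""
    raw = "{}:{}".format(user, password).encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


if __name__ == "__main__":
    # Placeholder URL and ticket, for illustration only.
    url = "http://localhost:8080/alfresco/service/api/upload"
    print(with_ticket(url, "TICKET_abc"))
    print(basic_auth_header("admin", "admin"))
```

The ticket travels in the URL itself, so it works with APIs that only accept a URL. A header has to be attached to the request, which is exactly what FileReference.upload() and navigateToURL() don’t allow.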

Details

1. Changed to new up a ChannelSet with channel URLs coming from the FlexSpacesConfig.xml Spring ActionScript file instead of compiling in a services-config.xml.
2. See doc\livecycle\readmeLiveCycleContentServices.txt for FlexSpacesConfig.xml LC CS specific changes, and server side services-config.xml changes still required.
3. Note FlexSpaces needs its FlexSpacesConfig.xml configured with a Calais key and a Google Map API key to get the UI for the Calais Integration enabled (see doc\flexspacesAir\readmeFlexSpacesForAIR.txt).
4. For instructions on adding an AMP to an existing install (of 8.2.1; dir name and deploy areas are different on ES2) see Dr Flex & Dr LiveCycle. (Haven’t tried the Calais Integration on LC CS 8.2.1. It should work since it works with Alfresco 2.1.)