SOLR Security with ManifoldCF
This article explains how to implement Solr “document level security” using Apache ManifoldCF (Manifold Connectors Framework). ManifoldCF is an open source framework that pulls content out of a repository and sends it on to targets such as Solr through a plug-in, connector-based architecture. It includes connectors for numerous commercial and open source data sources, including Documentum, SharePoint, JDBC and RSS. ManifoldCF also defines a security model that lets target repositories enforce source-repository security policies.
The ManifoldCF security model is loosely based on the standard authorization concepts and hierarchies found in Microsoft’s Active Directory. Its central concept is the access token. In this model, an authority is responsible for providing the list of access tokens for a given searching user. Multiple authorities cooperate: each one can add to the list of access tokens that describes a given user’s security.
The sections below cover setting up ManifoldCF, using the ManifoldCF crawler, and configuring the ManifoldCF plugin with Solr.
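The way authorities contribute tokens can be illustrated with a small sketch. The Python below is a simplified illustration, not ManifoldCF's implementation: the two authorities, the token strings, and the functions `collect_tokens` and `is_visible` are all hypothetical. It assumes a document is visible when the user holds at least one of its allow tokens and none of its deny tokens.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical authority: a function from user name to a list of access tokens.
Authority = Callable[[str], List[str]]

def ad_authority(user: str) -> List[str]:
    # Illustrative data only; real tokens are opaque strings chosen by a connector.
    return {"User1": ["AD:group-sales"], "User2": ["AD:group-hr"]}.get(user, [])

def sharepoint_authority(user: str) -> List[str]:
    return {"User1": ["SP:site-members"]}.get(user, [])

def collect_tokens(user: str, authorities: Iterable[Authority]) -> Set[str]:
    """Union of the tokens that each cooperating authority reports for the user."""
    tokens: Set[str] = set()
    for authority in authorities:
        tokens.update(authority(user))
    return tokens

def is_visible(user_tokens: Set[str], allow: Set[str], deny: Set[str]) -> bool:
    """Assumed rule: at least one allow token matches and no deny token matches."""
    return bool(user_tokens & allow) and not (user_tokens & deny)

tokens = collect_tokens("User1", [ad_authority, sharepoint_authority])
print(sorted(tokens))
print(is_visible(tokens, allow={"AD:group-sales"}, deny=set()))
print(is_visible(tokens, allow={"AD:group-sales"}, deny={"SP:site-members"}))
```

Because each authority only adds tokens to the set, a new repository can be secured by adding one more authority without changing the visibility check.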
- Setup ManifoldCF
- Configuration of ManifoldCF with SOLR
This section explains how to set up ManifoldCF.
Download the ManifoldCF binary distribution from https://manifoldcf.apache.org/en_US/download.html and unzip it.
Open a command prompt and run start.bat to start ManifoldCF.
This will start ManifoldCF, get the required services running and the desired connection types properly registered.
- The ManifoldCF user interface is the crawler UI, which you open in a browser (by default at http://localhost:8345/mcf-crawler-ui/).
When you enter the Framework user interface for the first time, you will first be asked to log in.
- Enter the login user name and password for your system. By default, the user name is "admin" and the password is "admin".
- Create an output connection by selecting the "List Output Connections" option
- Enter the Name, the Description and click on the “Type” tab to select SOLR output connection and then click on "continue"
- Select the "Single server" option from Solr type, since we are setting up in a single box
- Select the "Server" tab to configure SOLR
- Select the "Schema" tab, enter the unique key (ID) field of the existing Solr collection, and save
- Authority Groups: Create an authority group by clicking on the "List Authority Groups" and "Add a new authority group" options
User Mapping Connections:
- Create a mapping connection by clicking on the "List User Mapping Connections" and "Add a new connection" options
- On the "Type" tab, select the regular expression mapper and save. If everything is configured correctly, the crawler displays "Connection working"
- Create an authority connection by clicking on the List Authority Connections link
- Create a new connection by clicking on Add new connection
- Enter a name and description, then click on the "Type" tab to select the authority type
- Select the authority group which was created before and save it.
Configuration of ManifoldCF plugin with SOLR:
This section provides a step-by-step process to configure the ManifoldCF plugin with Solr.
- Copy apache-manifoldcf-solr-X.X-plugin-2.2.JAR from $:\apache-manifoldcf-2.3\plugins\solr\solr-X.X\ to the lib directory of the Solr core
There are two ways to hook up security to Solr in this package. The first is using a Query Parser plugin. The second is using a Search Component. In both cases, the first step is to have ManifoldCF installed and running.
- Then, add fields to the Solr schema.xml file for document authorization information. These fields hold the ‘allow’ and ‘deny’ tokens of each document.
- The plugin requires these fields to have the default value "__nosecurity__", so do not forget to include it.
- Using the Query Parser Plugin: To set up the query parser plugin, modify your solrconfig.xml to add the query parser
- Start the Solr instance and post documents to Solr as XML. In each document, the token fields carry the user tokens that are allowed to access it
- A query that provides no user tokens returns only the documents whose token fields hold the default "__nosecurity__" value. In that case, Solr does not return documents that carry an access token
- A query that provides matching user tokens returns the documents secured with those tokens in addition to the unsecured results above.
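The schema, indexing, and query steps above can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the plugin's reference configuration: the field names (`allow_token_document` and its siblings) and the `UserTokens` request parameter follow the naming in the plugin's README as best recalled and should be checked against the README shipped with your plugin version. The `id` field, the core name `core1`, the Solr port, and the token `token1` are hypothetical. The sketch also assumes the security filter is already wired into the request handler in solrconfig.xml.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NO_SECURITY = "__nosecurity__"  # default token value required by the plugin

# Assumed field names: allow/deny tokens at document, parent, and share level.
TOKEN_FIELDS = [f"{kind}_token_{level}"
                for level in ("document", "parent", "share")
                for kind in ("allow", "deny")]

def schema_field_xml(name: str) -> str:
    """Render one <field> definition for schema.xml with the default token."""
    field = ET.Element("field", {
        "name": name, "type": "string", "indexed": "true", "stored": "false",
        "multiValued": "true", "required": "false", "default": NO_SECURITY,
    })
    return ET.tostring(field, encoding="unicode")

def add_document_xml(doc_id: str, allow_tokens: list) -> str:
    """Build a Solr XML update message for one document and its allow tokens."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", {"name": "id"}).text = doc_id  # "id" is assumed
    for token in allow_tokens:
        ET.SubElement(doc, "field", {"name": "allow_token_document"}).text = token
    return ET.tostring(add, encoding="unicode")

def query_url(base: str, q: str, user_tokens=()) -> str:
    """Build a select URL; UserTokens (assumed name) repeats once per token."""
    params = [("q", q)] + [("UserTokens", t) for t in user_tokens]
    return f"{base}/select?{urlencode(params)}"

SOLR_CORE = "http://localhost:8983/solr/core1"  # hypothetical core URL

for name in TOKEN_FIELDS:
    print(schema_field_xml(name))
print(add_document_xml("doc1", ["token1"]))
print(query_url(SOLR_CORE, "*:*"))              # no tokens: unsecured documents only
print(query_url(SOLR_CORE, "*:*", ["token1"]))  # also matches documents secured with token1
```

A document posted without any token field picks up the "__nosecurity__" default from the schema, which is why it remains visible to queries that carry no tokens.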
MCF Authority Service:
Access Token: ManifoldCF defines the concept of an access token. An access token, according to ManifoldCF, is a string which is meaningful only to a specific connector or connectors. This string describes the ability of a user to view (or not view) some set of documents. To see the access tokens for a user, request the following URL: http://localhost:8345/mcf-authority-service/UserACLs?username=User1
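The lookup can also be scripted. The sketch below builds the URL shown above and extracts tokens from a response body. It assumes the service answers with plain text lines in which each token is prefixed with `TOKEN:`. That response format, the sample body, the authority name `MyAuthority`, and the helper names are assumptions to verify against your own authority service output.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

AUTHORITY_BASE = "http://localhost:8345/mcf-authority-service"

def user_acls_url(base: str, username: str) -> str:
    """Build the UserACLs lookup URL for one user."""
    return f"{base}/UserACLs?{urlencode({'username': username})}"

def parse_tokens(body: str) -> list:
    """Collect the values of lines starting with 'TOKEN:' (assumed format)."""
    prefix = "TOKEN:"
    return [line[len(prefix):].strip()
            for line in body.splitlines()
            if line.startswith(prefix)]

def fetch_tokens(username: str, base: str = AUTHORITY_BASE) -> list:
    """Call the authority service; requires a running ManifoldCF instance."""
    with urlopen(user_acls_url(base, username)) as response:
        return parse_tokens(response.read().decode("utf-8"))

# Hypothetical response body, used here so the sketch runs without a server.
sample = "AUTHORIZED:MyAuthority\nTOKEN:MyAuthority:S-1-5-21\nTOKEN:MyAuthority:S-1-1-0\n"
print(user_acls_url(AUTHORITY_BASE, "User1"))
print(parse_tokens(sample))
```

The tokens returned by such a lookup are the values a search front end would pass to Solr on behalf of the user.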
Indexing data to SOLR:
Query data using SOLR Admin: