Data access and security management in the context of an Enterprise Search platform
Web search engines have to cope with scalability, reliability and, to some extent, relevancy, but not with access control: pages published on the Internet are, by nature, meant to be accessed and read by as many people as possible.
Modeled after web search experience and its ability to merge so many public data sources into a single answers list based on a simple query typed into a search bar, enterprise search platforms have been able to break down data silos and reproduce the same user experience as applied to enterprise content for some time.
However, Enterprise Search vendors (including Microsoft and Google and despite their vast experience in Web Search) have faced new challenges like data source variety, heterogeneity of document formats, complex metadata management, data governance, etc.
But the biggest obstacle to deploying an enterprise search solution is probably respecting security rules and, more precisely, user rights to access information.
The golden rule of security is: In an enterprise search system, the end user must not be able to access any piece of information they would not be allowed to access outside the enterprise search system.
In other words, no document, no sentence inside a document, no metadata, no numeric value from any data management system, and not even the existence of such information should be retrievable by an end user who wouldn’t be allowed to access that information if the enterprise search system were not there to help.
Examples are obvious. Information regarding your colleagues’ salaries, trade secrets, intellectual property, personal emails, detailed financials, strategic forecasting and so forth is stored in silos based on services, business units, hierarchical organizations, and even individual folders. It should not be accessible in any way in an enterprise search system, except by those who have explicit permission in the data source itself.
This is the basic truth of security management in enterprise search.
Now let’s explore this challenge further.
As previously mentioned, Enterprise Search is all about accessing information from multiple data sources via a single point of access while respecting user access rights.
Let’s consider this information landscape.
Each data source is made up of some type of storage and, in general, software that manages access to the content in that storage. We are used to talking about Content Management Systems (CMS) even if, from a broader perspective, we should also include file systems, databases, and any business application relying on software whose role is to host and manage content.
Assuming these systems are somehow secured, only a defined population of users is allowed to access the data according to specific security rules. Access rights (or denials) can be set up such that they depend on several factors. The most commonly encountered security model is called RBAC (Role-Based Access Control).
In short, each document can be seen by a given population that is determined by a defined set of users or groups of users.
It could be that simple, but unfortunately, these rules vary from one data source to another. Moreover, they are often defined using different vocabularies based on multiple user directories and a complex hierarchical organization. The first challenge is to reconcile user identities each time we index documents from multiple data sources.
To be clear, this identity reconciliation is needed to turn the many data source identities into a single identity so that when the end user logs into the enterprise search system, we know all the information that user should have access to based on all of their varied user and group affiliations.
We often talk about domain mapping to link user identities from different systems together when they correspond to the same person. This process may have to be done frequently if your IT system is diverse and made up of multiple types of data systems. But the benefit of domain mapping is knowing exactly who each end user is regardless of the number of identities that individual may have across data sources and user directories.
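To make domain mapping concrete, here is a minimal sketch in Python. The source names, account names, and group labels are all made up for illustration, and a real deployment would read them from the actual user directories rather than from hard-coded dictionaries.

```python
# Hypothetical mapping from (source system, local account) to one canonical person ID.
DOMAIN_MAP = {
    ("active_directory", "CORP\\jdoe"): "jane.doe",
    ("sharepoint", "jane.doe@example.com"): "jane.doe",
    ("plm", "JD-0042"): "jane.doe",
}

# Hypothetical group memberships, recorded per source identity.
GROUPS = {
    ("active_directory", "CORP\\jdoe"): {"ad:finance"},
    ("sharepoint", "jane.doe@example.com"): {"sp:budget-readers"},
    ("plm", "JD-0042"): {"plm:engineers"},
}


def principals_for(canonical_id):
    """Return every identity and group the canonical user holds across sources."""
    principals = set()
    for (source, account), person in DOMAIN_MAP.items():
        if person == canonical_id:
            # The source-level identity itself is a principal...
            principals.add(f"{source}:{account}")
            # ...and so is every group attached to that identity.
            principals |= GROUPS.get((source, account), set())
    return principals
```

Calling `principals_for("jane.doe")` yields the three source identities plus the three groups, which is the set the search system would use to decide what this person may see.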
Let’s now consider how an enterprise search query will retrieve only those documents a specific end user is allowed to access based on the various data sources permissions.
Two main methodologies coexist: ‘early binding’ and ‘late binding’. In early binding, the query is performed along with a security filter so only those documents that the end user is allowed to access in the data sources are retrieved by the search engine. To make this possible, all access rights are ‘indexed’ and attached to the indexed documents.
With early binding, searching for only documents the end user is allowed to access in the data sources is just like performing an advanced search composed of the search terms plus a strong, additional criterion preventing unauthorized documents from being retrieved from the index.
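As a rough illustration of early binding, the sketch below uses a toy in-memory index where each entry stores the access rights captured at indexing time. The index layout, the `allowed` field, and the principal labels are assumptions made for this example, not the API of any particular search engine.

```python
# Toy index: each entry carries the ACL captured at indexing time (assumed layout).
INDEX = [
    {"id": "doc1", "text": "quarterly budget forecast", "allowed": {"ad:finance"}},
    {"id": "doc2", "text": "budget for the cafeteria", "allowed": {"everyone"}},
    {"id": "doc3", "text": "salary budget per employee", "allowed": {"ad:hr"}},
]


def search_early_binding(terms, principals):
    """Match the search terms AND the security criterion in a single pass."""
    hits = []
    for doc in INDEX:
        words = doc["text"].split()
        matches_terms = all(term in words for term in terms)
        # Security criterion: the user holds at least one allowed principal.
        is_authorized = bool(doc["allowed"] & principals)
        if matches_terms and is_authorized:
            hits.append(doc["id"])
    return hits
```

Because the security criterion is part of the query itself, unauthorized documents never leave the index, so counts, pages, and facets computed from the hits only reflect what the user may see.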
In the case of ‘late binding’, the engine is asked to retrieve any relevant documents. But before presenting the answers to the end user, a filter is applied on the front-end to display only those documents that the end user is authorized to access in the data sources. This requires that each possible answer is checked against the user access rights in the various data sources before any answers are displayed.
While early binding relies mostly on smart indexing, late binding requires all of the work to be done at search time, which can lead to several performance and consistency issues.
For example: let’s assume an end user is searching for something very broad, causing the search engine to retrieve a long list of relevant documents organized into pages, facets, tabs, etc. It’s very possible that this end user will only have access to a few of these possible answers.
The post-filtering performed by the late-binding approach will have serious consequences on query time and will also affect the consistency of pagination, metadata counts, and many other navigation features like autosuggest, spelling correction, etc. Furthermore, this could result in displaying automatically suggested or corrected words that only belong to forbidden documents. (This could even result in a security breach by revealing confidential words!)
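The pagination problem is easy to reproduce with a small sketch. Everything here is hypothetical: the ranked hit list stands in for what an engine returns without a security filter, and `can_read` stands in for a per-document permission check against the data source.

```python
# Hypothetical ranked hits returned by the engine with no security filter.
RANKED_HITS = ["d1", "d2", "d3", "d4", "d5", "d6"]
# Hypothetical source-side permissions: this user may read only d2 and d6.
READABLE = {"d2", "d6"}


def can_read(doc_id):
    """Stand-in for a per-document permission check against the data source."""
    return doc_id in READABLE


def late_binding_page(page, page_size):
    """Fetch one page from the engine, then filter it on the front end."""
    start = page * page_size
    raw_page = RANKED_HITS[start:start + page_size]
    visible = [doc_id for doc_id in raw_page if can_read(doc_id)]
    return {
        "results": visible,                # what the user actually sees
        "engine_total": len(RANKED_HITS),  # count computed before filtering
        "checks": len(raw_page),           # permission checks paid at query time
    }
```

With a page size of three, each page costs three permission checks yet shows a single result, and the engine still reports six hits when the user can read only two. The same mismatch applies to facet counts and suggestions computed before filtering.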
Therefore, with the infrequent exception of very special cases where access rights are too dynamic and context-based, the early-binding method is the best option to ensure that permissions across all data sources are followed while providing the best end user experience at search time.
To conclude this section, it’s important to recognize that there are many additional benefits to early binding. This brief explanation only covers those found most often in a general use case.
Let’s now consider several cases where security management goes far beyond the general case described above. To make them clear and easy to understand, we will state them as a single sentence from the user perspective.
Enterprise Search systems can be purposely set up so that:
- As a user, I am allowed to identify the existence of a given document, but I’m not allowed to open it. This is a frequently requested use case, which basically corresponds to overriding the native security rule. Such behavior can be configured, but the administrator must be aware of possible side effects. Several display or navigation features might disclose parts of the document (e.g., extracted metadata or named entities displayed in facets, autosuggest, etc.).
- As a user, I am allowed to search and retrieve a document, but I’m not allowed to search and access specific metadata. For instance, I am allowed to read an employee information card, but I cannot see the employee’s salary, which is stored in a precise index column that is only visible to human resources and accounting services.
- As a user, I am only allowed to search and retrieve part of a document. The rest of the document’s content is protected so only a sub-population with higher privileges may search and access the full document.
- As a user, I am allowed to search and retrieve a document if and only if several conditions set on the document itself are all satisfied. This case is often the norm for investigation and intelligence services. To establish a strong security model, rules define the combination of attributes, such as role, project, resource type, action, etc., that are required in order for the end user to access a document. It is not enough for a user to have a specific role; the user’s access request must match all attributes defined in the rule for the end user to access the document. This security model is called ABAC (Attribute-Based Access Control), as opposed to the RBAC model described above.
- As a user, my access to a document is granted or forbidden after applying an ordered list of conditions. In general, the process stops as soon as a condition is applicable. This type of security model can be found in complex data management solutions like PLM (Product Lifecycle Management) systems.
Moreover, you can encounter a sophisticated combination from all of the above examples and many others.
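The last two cases in the list, attribute-based rules and ordered conditions, can be sketched in a few lines of Python. The attribute names, the rule contents, and the default decision are all assumptions chosen for illustration; real systems define their own vocabularies.

```python
def abac_allows(request, rule):
    """ABAC: every attribute required by the rule must match the request."""
    return all(request.get(attr) == value for attr, value in rule.items())


def first_match_decision(request, ordered_rules, default="deny"):
    """Ordered list: the first applicable condition decides, then evaluation stops."""
    for condition, decision in ordered_rules:
        if condition(request):
            return decision
    # No condition applied; fall back to the (assumed) default decision.
    return default


# Hypothetical ABAC rule: role alone is not enough, all three attributes must match.
RULE = {"role": "analyst", "project": "atlas", "action": "read"}

# Hypothetical ordered conditions, evaluated from top to bottom.
ORDERED_RULES = [
    (lambda r: r.get("status") == "suspended", "deny"),
    (lambda r: r.get("role") == "engineer", "allow"),
]
```

An analyst asking to read a document on another project fails `abac_allows` even though the role matches. With the ordered list, a suspended engineer is denied because the first condition applies before the second is ever evaluated.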
While there are several additional security points to consider like protection against intrusion, data in-flight encryption, data at-rest encryption, and many others, we’ve discovered here how challenging it can be to respect security when implementing an Enterprise Search platform. Failing to protect data while deploying an enterprise search solution is never an option.
In the past, the risk of opening a security breach in the enterprise while deploying an Enterprise Search solution at large scale was a major obstacle to such an initiative. Today, this risk has dramatically decreased. Even so, an Enterprise Search solution must evolve with ever-growing security management complexity.
My advice… don’t be afraid to perform a security audit at the staging phase and before go-live. A properly selected Enterprise Search solution must be ready for that.
And finally, the impact on performance of poorly implemented security management should never be overlooked. Performance and scalability are always another condition for adoption, even if your Enterprise Search system is well protected.