Intelligent Analytics Capabilities

Key features

Natural Language Processing (NLP) and machine learning power the advanced features of Sinequa's intelligent search platform. After more than 25 years of NLP research, we are experts at making sense of each piece of text, whatever its language. In addition, the platform embeds state-of-the-art deep learning frameworks to close the gap between classical enterprise search and the experience offered by today's web search engines. The proprietary index is optimized to cope with huge volumes and intensive usage.

Natural Language Text Processing

Sinequa's NLP semantically enriches content in any language and powers an intelligent employee experience for search and analytics. At indexing time, the NLP pipeline provides:

  • Automatic detection of document languages (more than 135 languages detected), with a language splitter to handle documents that switch from one language to another
  • In-depth multilingual analysis for many languages, including English, French, German, Arabic, Chinese (Simplified), Chinese (Traditional), Korean, Danish, Spanish, Finnish, Greek, Italian, Japanese, Dutch, Polish, Portuguese, Romanian, Russian, Swedish, Thai, and Norwegian
  • Part-of-speech tagging
  • Concept extraction for related terms
  • Language-specific processing where required, such as transliteration (Japanese), compound word splitting (German, for instance), model-based disambiguation, etc.
  • Semantic analysis
  • Content enrichment with variants, standard or custom rewritings, etc.
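
As an illustration of one of the language-specific steps listed above, here is a minimal sketch of dictionary-based compound word splitting. It is not Sinequa's implementation: the function name `split_compound`, the tiny lexicon, and the handling of the German linking "s" are all assumptions made for the example.

```python
def split_compound(word, lexicon, min_len=3):
    """Split a compound into known lexicon words; return None if no split exists."""
    word = word.lower()
    if word in lexicon:
        return [word]
    # Try every split point that leaves at least min_len characters on each side.
    for i in range(min_len, len(word) - min_len + 1):
        head, tail = word[:i], word[i:]
        if head not in lexicon:
            continue
        # Also try skipping a linking "s", as in "Arbeit-s-zeit".
        candidates = [tail]
        if tail.startswith("s"):
            candidates.append(tail[1:])
        for rest in candidates:
            parts = split_compound(rest, lexicon, min_len)
            if parts is not None:
                return [head] + parts
    return None


# Hypothetical lexicon, for illustration only.
LEXICON = {"daten", "bank", "system", "arbeit", "zeit"}

print(split_compound("Datenbanksystem", LEXICON))
print(split_compound("Arbeitszeit", LEXICON))
```

Indexing the parts alongside the full word lets a query for "Bank" or "System" reach a document that only contains the compound.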

Statistical Analysis

Sinequa uses advanced information retrieval techniques to provide relevant and contextualized results. It embeds a sophisticated variation of the well-known TF-IDF and BM25 algorithms, enhanced by multiple factors including:

  • Proximity: a measure of the "closeness" of multiple search terms in a document
  • Proximity to the head content
  • Fidelity to the original form with regard to all linguistic variants
  • Neural search footprint from document passages
  • Relevancy corrections based on document freshness, data source weighting, document ratings, and feedback models
  • Text part weighting
  • Business rules
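
To make the idea of a term-frequency score enhanced by proximity concrete, the sketch below computes textbook Okapi BM25 and multiplies it by a simple proximity boost. This is not Sinequa's ranking formula: the boost `1 + 1/span`, the parameter defaults, and the function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for the query terms."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


def min_span(query, doc):
    """Length of the smallest window containing every query term, or None."""
    terms = set(query)
    last_seen = {}
    best = None
    for i, tok in enumerate(doc):
        if tok in terms:
            last_seen[tok] = i
            if len(last_seen) == len(terms):
                span = i - min(last_seen.values()) + 1
                if best is None or span < best:
                    best = span
    return best


def rank(query, docs):
    """Rank document indexes by BM25 times a hypothetical proximity boost."""
    final = []
    for score, doc in zip(bm25_scores(query, docs), docs):
        span = min_span(query, doc)
        boost = 1.0 + 1.0 / span if span else 1.0
        final.append(score * boost)
    return sorted(range(len(docs)), key=lambda i: final[i], reverse=True)


docs = [
    "neural search engine".split(),
    "neural network for image search".split(),
    "cooking recipes".split(),
]
print(rank(["neural", "search"], docs))
```

Other factors from the list, such as freshness or source weighting, could be folded in the same way as additional multipliers on the base score.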

Semantic Extractors

Sinequa intelligently identifies and extracts entities for document classification and tagging.

  • Extensive entity identification and extraction capabilities, including geographic entities (such as countries, cities, and states), people's names, companies, numerals, dates, times, amounts, distances, quantities, units of measure, phone numbers, coordinates, URLs, e-mail addresses, hashtags, cashtags, at tags, date spans, time spans, and many other types of Personally Identifiable Information (PII)
  • Supplemental extractors integrated from third-party partner solutions such as SciBite, Refinitiv Intelligent Tagging, Linguamatics, or MS Azure Media Services
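
For a feel of what pattern-based extraction of a few of these entity types can look like, here is a small sketch using regular expressions. The patterns are deliberately simplified assumptions written for this example; they are not Sinequa's extractors, and production-grade patterns for e-mail addresses or URLs are considerably more involved.

```python
import re

# Simplified, illustrative patterns (assumptions, not Sinequa's rules).
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "url": re.compile(r"https?://[^\s<>\"']+"),
    "hashtag": re.compile(r"(?<!\w)#\w+"),
    "cashtag": re.compile(r"(?<!\w)\$[A-Z]{1,5}\b"),
    # The lookbehind keeps the "@" inside an e-mail address from matching.
    "at_tag": re.compile(r"(?<![\w.+-])@\w+"),
}


def extract_entities(text):
    """Return a dict mapping each entity type to the list of matches found."""
    found = {}
    for name, pattern in PATTERNS.items():
        # Strip sentence punctuation that a greedy pattern may have swallowed.
        found[name] = [m.group().rstrip(".,;:!?") for m in pattern.finditer(text)]
    return found


sample = (
    "Contact jane.doe@example.com or @sinequa about #NLP and $MSFT, "
    "see https://example.com/docs for details."
)
for kind, values in extract_entities(sample).items():
    print(kind, values)
```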

Text Mining with NLP Skills

Sinequa provides advanced capabilities to detect patterns in text, specifically for entity extraction, including:

  • Lists of named entities, co-occurrences, relationships
  • Complex patterns
  • Code-based extraction with inline C# custom developments

The Sinequa platform also includes a dedicated descriptive language that significantly simplifies the creation of complex extraction rules and extends the native capabilities.
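
The notion of co-occurrence between listed entities can be sketched in a few lines. The example below counts pairs of entities from a fixed list that appear in the same sentence. It is written in Python for illustration and does not use Sinequa's descriptive language or its C# extension points; the function name, the naive sentence splitter, and the entity list are assumptions.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrences(text, entities):
    """Count pairs of listed entities that appear in the same sentence."""
    counts = Counter()
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        found = sorted(
            e for e in entities
            if re.search(r"\b" + re.escape(e.lower()) + r"\b", lowered)
        )
        # Each unordered pair is stored once, in alphabetical order.
        counts.update(combinations(found, 2))
    return counts


text = (
    "Paris is the capital of France. "
    "Berlin is the capital of Germany. "
    "Paris and Berlin are both capitals."
)
pairs = cooccurrences(text, {"Paris", "France", "Berlin", "Germany"})
for pair, n in sorted(pairs.items()):
    print(pair, n)
```

Pairs that co-occur frequently across a corpus are natural candidates for relationship extraction rules.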


Data Classification

Easily classify documents using our embedded machine learning models and semantic techniques without being a data scientist. When content cannot be easily organized based on its location, existing metadata, or associated properties, dynamic classification can help surface structure out of the apparent chaos. Two technologies are combined to make this happen:

  • Deep-learning-based classification: the Sinequa platform lets administrators and subject matter experts manage the complete lifecycle of their classification projects, from inception to production, and monitor prediction accuracy over time (Active Learning) through a labeling application in which subject matter experts provide ongoing feedback. Neural networks power this technology, implemented with a transparently embedded TensorFlow framework and BERT transfer-learning language models.
  • Rule-based classification: classifiers are decision trees that determine whether a document would be retrieved by one of several queries. As an asynchronous post-processing task, documents are assigned to categories depending on whether they match simple or complex search criteria.
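
The rule-based approach can be pictured as a set of saved queries, one per category, evaluated against each document. The sketch below is a simplified assumption of that idea: the rule format (`all` and `any` term lists), the category names, and the `classify` function are hypothetical and do not reflect Sinequa's classifier configuration.

```python
import re

# Hypothetical rules: a document matches a category when it contains every
# "all" term and, if any are listed, at least one "any" term.
RULES = {
    "contracts": {"all": ["agreement"], "any": ["supplier", "vendor"]},
    "hr": {"all": [], "any": ["payroll", "recruitment"]},
}


def classify(text, rules):
    """Return the categories whose search criteria the text satisfies."""
    tokens = set(re.findall(r"\w+", text.lower()))
    labels = []
    for category, rule in rules.items():
        all_ok = all(term in tokens for term in rule.get("all", []))
        any_terms = rule.get("any", [])
        any_ok = not any_terms or any(term in tokens for term in any_terms)
        if all_ok and any_ok:
            labels.append(category)
    return labels


print(classify("Supplier agreement for 2023", RULES))
print(classify("Payroll schedule for March", RULES))
print(classify("Lunch menu", RULES))
```

Because the rules are plain search criteria, such a pass can run after indexing without retraining any model.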

Multi-layer Index Data Structure

Sinequa relies on its comprehensive and efficient index structure to deliver superior relevance from even the most extensive content and datasets without compromising performance.

If enterprise search were only about matching keywords, a single index would suffice. That may be enough for narrow applications over highly classified and structured data, but it fails on unstructured data. Because the vast majority of information is captured in everyday language, no single index can serve as an optimal measure of the information contained in a corpus, and there is no one “ideal” index for every potential query. The best results are achieved when multiple indexes are combined, each providing a different perspective or emphasis, to give a comprehensive view of the available information and the best possible understanding of the meaning it carries.

When indexing unstructured data, Sinequa automatically generates a variety of indexes to provide the most comprehensive assessment of the text content. Sinequa also provides the ability to tailor how the different indexes are used in a search (by changing their weightings), allowing search results to be fine-tuned for highly specialized contexts. The Sinequa index is therefore a dynamic combination of full-text, structured, and semantic indexes.
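
The effect of changing index weightings can be illustrated with a weighted sum of per-index scores. This is a minimal sketch under stated assumptions: the index names, the score values, and the linear combination are invented for the example and are not a description of how Sinequa merges its indexes internally.

```python
def combine_scores(per_index_scores, weights):
    """Rank documents by the weighted sum of their per-index scores."""
    combined = {}
    for index_name, scores in per_index_scores.items():
        weight = weights.get(index_name, 0.0)
        for doc_id, score in scores.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + weight * score
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores returned by three kinds of index for two documents.
scores = {
    "full_text": {"doc_a": 0.9, "doc_b": 0.4},
    "structured": {"doc_b": 1.0},
    "semantic": {"doc_a": 0.2, "doc_b": 0.7},
}

balanced = {"full_text": 1.0, "structured": 1.0, "semantic": 1.0}
text_heavy = {"full_text": 3.0, "structured": 0.0, "semantic": 0.5}

print([doc for doc, _ in combine_scores(scores, balanced)])
print([doc for doc, _ in combine_scores(scores, text_heavy)])
```

With balanced weights the structured match lifts doc_b to the top; with the text-heavy profile doc_a ranks first, which is the kind of tuning the weightings make possible.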

Sinequa can query any combination of indexes with different schemas, searching through all structured and unstructured data at once to offer the best data discovery modes.

Sinequa does not rely on any data structure derived from Apache Lucene. Multiple NLP layers process the raw text and enrich it at the lowest possible level, so that rich functionality does not degrade search performance through reliance on external or supplemental packages and libraries.

Sinequa indexes comprise full-text parts, typed columns to store and retrieve associated metadata or entities extracted at indexing time, columns dedicated to security aspects, etc.

The index data structure is strongly optimized to ensure elasticity. It mitigates competition between simultaneous updates and searches. It is also secured with safe transactions, redundancy, and internal reorganization capabilities.

Partnerships that give rise to innovative solutions

Aurexia
Atos
Accenture
Capgemini
Cognizant

Discover what Sinequa can do for your business

Sinequa’s Search Cloud brings organizations of all sizes the most complete enterprise search ever. Schedule a personalized demo to see how Sinequa can benefit your organization.

What our customers say

"Sinequa is simply great technology. We immediately saw its benefit watching it perform something we didn’t know was possible. It makes an exponential difference for our organization. We were also impressed with the number of smart connectors available out-of-the-box and Sinequa’s unique ability to develop new ones."

Oliver Thoennessen, Senior Manager Global IT Drug Development, UCB

Resources

Documentation

The Documentation Portal is the reference guide to all features and components of the Sinequa platform. In addition, the portal contains a number of tutorials and how-tos, as well as the extensive reference manuals of our development frameworks, toolkits, and APIs.

Download Center

The Download Center is where you can download the Sinequa platform, from the latest stable version to the most experimental one. You can also browse the release notes, search for older versions your environment might require, and securely upload files to exchange documents with the customer support team.

Support Service

The Sinequa support portal is accessible to customers and partners who have been trained and certified by Sinequa. It lets you submit cases and track their progress with our Customer Engineering team. The support portal is also a good entry point for submitting feature requests or questions that require quick answers.

©2023 Sinequa. All rights reserved.