Powerful ingestion pipeline

Key features

An intelligent search platform is only as good as the content it can access. With hundreds of pre-built Enterprise Search connectors, Sinequa connects to all sources and formats, allowing you to extract value from all your content, structured and unstructured. Because the platform is built on 25 years of research in natural language processing (NLP), employees get the most relevant answers to their questions. It also applies deep learning to enrich information, enabling continuous relevance improvement.

Guide to Natural Language Processing (NLP)

Natural language processing is the science behind machine comprehension. If you’re new to the concept or looking for an overview of what it is and how it’s used, then this guide is for you.

Standard search connectors

Sinequa natively offers a multitude of connectors to handle any enterprise data source, wherever it resides.

  • Off-the-shelf connection to most enterprise sources and systems, with hundreds of prepackaged connectors
  • New connectors added regularly to the standard catalog to better meet customers' expectations and to support emerging and modern content management systems
  • Connectors developed and maintained internally, ensuring full integration, optimized analysis, and ease of configuration
  • Full compliance with the sources' native security model
  • No code! Just fill in a few parameters that are specific to each connector type.

Custom search connectors

A complete toolkit for developing custom connectors to home-grown data sources.

  • Connector toolkit for incorporating content from internally built and legacy systems
  • Standard connectors (JSON, XML, Database, HTTP, and File System) that can be enhanced with plug-ins to address particular data sources
  • Any SaaS product can be indexed in a matter of hours with a custom connector, provided the product exposes a web services API
  • Custom connector code already available as open source (e.g., the Slack connector)

Supported document formats

Sinequa provides the ability to recognize and process more than 300 formats of structured and unstructured data.

  • Native support for more than 300 document formats
  • Text in any Unicode character set, including double-byte character sets such as Chinese and Japanese
  • Markup languages, such as HTML, XML
  • XMP-compatible formats, such as JPEG, TIFF, PSD, EPS
  • Microsoft Office documents, such as Word, Excel, PowerPoint, and RTF
  • Adobe PDF
  • Open Office document formats
  • Compressed archives (Zip, Rar, Tar, 7z, Pst, bz2, etc.), which can be indexed as single containers or recursively as folders containing files and other archives
  • Structured formats (JSON, CSV, SAS datasets, SAP IDOC, AutoCAD DWG, 3D parts, etc.)
  • Image formats, like BMP, JPEG, PNG, GIF
  • An extensive list of document converters to extract text from any document format
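
The recursive handling of compressed archives mentioned in the list above can be illustrated with a minimal sketch. It is not Sinequa's implementation: it uses only Python's standard zipfile module, covers only Zip, and assumes that nested archives can be recognized by a ".zip" extension. The helper names iter_archive_members and build_zip are hypothetical.

```python
import io
import zipfile


def iter_archive_members(data: bytes, prefix: str = ""):
    """Yield (path, content) for each file in a zip, descending into nested zips."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            path = prefix + info.filename
            content = archive.read(info)
            # Assumption: nested archives are identified by their extension.
            if path.lower().endswith(".zip"):
                # Treat the nested archive as a folder and recurse into it.
                yield from iter_archive_members(content, prefix=path + "/")
            else:
                yield path, content


def build_zip(files: dict) -> bytes:
    """Build an in-memory zip from a {name: bytes} mapping (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


# Demo: an archive that contains a file and another archive.
inner = build_zip({"notes.txt": b"inner note"})
outer = build_zip({"report.txt": b"quarterly report", "backup.zip": inner})
members = dict(iter_archive_members(outer))
```

Each leaf file is reported with a path that shows which archive it came from, which is one way to treat an archive as a folder rather than as a single container.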

The Sinequa platform also supports Optical Character Recognition (OCR). It reads the standard formats generated by OCR applications, enabling you to index and search vast amounts of paper documents that were not originally created in electronic form.

Data collection

Optimized or customized content ingestion, thanks to pre-parameterized templates and support for on-demand indexing.

  • On-demand mode: launched manually by the administrator whenever needed
  • Scheduled mode: automatic execution at pre-set intervals or according to a set calendar, using the built-in Sinequa scheduler or any third-party scheduler
  • Trigger mode: indexing triggered automatically by events (e.g., whenever a document is added to a particular location or after 1,000 records are added to a database)
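
As a rough illustration of the trigger mode described above, the sketch below starts an indexing run once a set number of new records has accumulated. It is a generic example, not the Sinequa scheduler or its API: the class RecordCountTrigger, its fields, and the callback are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RecordCountTrigger:
    """Call `on_fire` once at least `threshold` new records are pending."""
    threshold: int
    on_fire: Callable[[int], None]
    pending: int = 0

    def records_added(self, count: int) -> None:
        # Accumulate new records; fire once the threshold is reached.
        self.pending += count
        if self.pending >= self.threshold:
            batch = self.pending
            self.pending = 0
            self.on_fire(batch)  # hand the whole pending batch to the indexer


# Demo: the callback simply records the size of each triggered run.
runs: List[int] = []
trigger = RecordCountTrigger(threshold=1000, on_fire=runs.append)
trigger.records_added(400)  # 400 pending, below the threshold
trigger.records_added(700)  # 1,100 pending, so one run is triggered
```

After the second call, one run covering 1,100 records has been triggered and the pending counter is back to zero.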

Indexing can be complete or incremental, with real-time and collection-cache variants:

  • Complete indexing: the source is fully indexed (or re-indexed); used for initial indexing of a new data source or when a datastore is replaced rather than updated
  • Incremental indexing: only new or updated data is indexed
  • Real-time indexing: the data source itself indicates what has changed since the last indexing task, keeping the delay between the content source and the index short
  • Collection-cache mode: ability to re-index without accessing the data source, which is beneficial when new semantic extractors are set up to identify new concepts, named entities, or relations after NLP resources are updated
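
To make the difference between complete and incremental indexing concrete, here is a minimal sketch that selects only new or updated documents. It assumes, purely for illustration, that changes are detected by comparing content fingerprints stored from the previous run; the platform's actual change-detection mechanism is not described here, and fingerprint and plan_incremental_run are hypothetical names.

```python
import hashlib
from typing import Dict, List


def fingerprint(content: str) -> str:
    """Stable digest of a document's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def plan_incremental_run(source: Dict[str, str], indexed: Dict[str, str]) -> List[str]:
    """Return the ids of documents that are new or whose content has changed.

    `source` maps document id to current content; `indexed` maps document id
    to the fingerprint recorded during the previous indexing run.
    """
    return sorted(
        doc_id
        for doc_id, content in source.items()
        if indexed.get(doc_id) != fingerprint(content)
    )


# Demo: "a" is unchanged, "b" was updated, "d" is new.
previous = {"a": fingerprint("alpha"), "b": fingerprint("beta")}
current = {"a": "alpha", "b": "beta v2", "d": "delta"}
to_index = plan_incremental_run(current, previous)

# A complete indexing run corresponds to starting with no stored fingerprints.
full_run = plan_incremental_run(current, {})
```

With an empty fingerprint store, every document is selected, which matches the complete indexing case used for a new data source.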

Indexing process

Comprehensive document scanning and in-depth text analysis are foundational for a customizable indexing pipeline, including native integrations with SciBite, Azure Media Services, and others.

Deep text analysis at indexing time, including:

  • Recursive scan of multi-record documents (CSV, JSON, XML, PST files, compressed archives, etc.)
  • Text conversion to HTML format from any native binary format
  • NLP at indexing time (language detection, part-of-speech tagging, etc.)
  • Multi-layer full-text indexing from whole data content
  • Standard named entity extractions
  • Standard or tailored text mining
  • Data mapping from data source metadata and extracted entities to typed index columns
  • Multiple entry points via plug-ins to customize the indexing pipeline as needed
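
The idea of a customizable pipeline with plug-in entry points and mapping to typed columns can be sketched as a list of stages that each transform a document. This is a generic illustration under stated assumptions, not the Sinequa plug-in API: Pipeline, add_plugin, the stage functions, and the field names are all hypothetical.

```python
from datetime import date
from typing import Callable, Dict, List

Document = Dict[str, object]
Stage = Callable[[Document], Document]


def extract_text(doc: Document) -> Document:
    # Stand-in for a document converter: the "raw" field is already text here.
    return {**doc, "text": str(doc["raw"]).strip()}


def map_columns(doc: Document) -> Document:
    # Map string metadata from the source to typed columns (date, integer).
    return {
        **doc,
        "modified": date.fromisoformat(str(doc["modified"])),
        "size": int(str(doc["size"])),
    }


class Pipeline:
    def __init__(self, stages: List[Stage]) -> None:
        self.stages = list(stages)

    def add_plugin(self, after: Stage, plugin: Stage) -> None:
        """Insert a custom stage right after an existing one (an entry point)."""
        self.stages.insert(self.stages.index(after) + 1, plugin)

    def run(self, doc: Document) -> Document:
        for stage in self.stages:
            doc = stage(doc)
        return doc


def tag_confidential(doc: Document) -> Document:
    # Hypothetical plug-in: flag documents whose text mentions "confidential".
    return {**doc, "confidential": "confidential" in str(doc["text"]).lower()}


pipeline = Pipeline([extract_text, map_columns])
pipeline.add_plugin(after=extract_text, plugin=tag_confidential)
result = pipeline.run(
    {"raw": "  Confidential: Q3 plan ", "modified": "2022-03-01", "size": "2048"}
)
```

Keeping each stage a plain function makes it straightforward to insert custom logic between standard steps without modifying them.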

Discover what Sinequa can do for your business

Sinequa’s Search Cloud brings organizations of all sizes the most complete enterprise search ever. Schedule a personalized demo to see how Sinequa can benefit your organization.

What our customers say

"We are excited to work with Sinequa on this important contract. Using its knowledge management platform, we are helping NASA to access and utilize decades’ worth of information. By better connecting NASA’s workforce to digital content, we can help them deliver on critical space missions."

Bob Genter, Executive Vice President and General Manager of the Civilian Markets Customer Group at SAIC



The Documentation Portal is the reference guide to all features and components of the Sinequa platform. In addition, the portal contains a number of tutorials and how-tos, as well as the extensive reference manuals for our development frameworks, toolkits, and APIs.

Download Center

The Download Center is where you can download the Sinequa platform, from the latest stable version to the most experimental one. You can also go through the various release notes, search for older versions your environment might require, and securely upload files you want to share with the customer support team.

Support Service

The Sinequa support portal is accessible to customers and partners who have been trained and certified by Sinequa. It lets you submit cases and track all related changes with our Customer Engineering team. The support portal is also a good entry point for submitting feature requests or questions that require quick answers.

©2022 Sinequa. All rights reserved.