
Unlock the Value of Data in Life Sciences with SciBite and Sinequa

Posted by Charlotte Foglia


Organizations are racing to convert data into insight. Of the executives surveyed, 45.3 percent say they’re investing $50 million or more annually in big data and artificial intelligence (AI) initiatives, and another 16.7 percent say they’re investing $500 million annually. Healthcare and life sciences organizations are leading the pack, with their executives reporting that:

  • 40.9 percent have created a data-driven organization.
  • 66.7 percent are driving innovation with data.
  • 45.5 percent have achieved transformational outcomes with data.
  • 50 percent are increasing their investment in data.
  • 63.6 percent believe they are leaders in data and AI.

This success is even more impressive when you consider the complexity of the data the industry produces and manages. Life sciences organizations must glean insight from data that encompasses scientific literature; genetics, chemistry, and statistical data; real-world evidence; clinical trial findings; regulatory filings and approvals; manufacturing, marketing, and sales information; external publications and reports; competitive analyses; and the list goes on.

This data is spread throughout life sciences organizations and online. It encompasses both structured data from databases and a massive amount of unstructured data from multiple systems, silos, and external sources.

Life sciences organizations therefore face challenges of both complexity and volume when it comes to empowering users to extract knowledge, gain insights, and make decisions. Data must be findable, easily accessible, and interoperable, regardless of its content, format, or source. So how are life sciences leaders powering ahead to master their data?

Data Is the New Water

You can liken data to water: your business needs it to survive in today’s competitive landscape.

After all, once analyzed, data can help businesses in numerous ways. In banking, for instance, data analytics can pinpoint money laundering and other illegal activities. In healthcare, data can help drive down costs while helping patients avoid preventable diseases.

In business analytics, a Qlik/IDC study found that three out of four companies that invested in data management and analytics increased their revenue, operational efficiency, and profitability by more than 15%. The bottom line? Data helps you make better business decisions. More specifically, it helps you learn more about your operations, customers, and general business processes.

All that said, what types of data are most common?

  • Structured data is well-organized data found in spreadsheets, database tables, and CSV files. However, it may surprise you to learn that this type of data is not always easy to analyze and interpret.
  • Semi-structured data is organized to a degree and can be found in formats such as JSON, XML, and HTML files. It is often managed by a content management system (CMS).
  • Unstructured data is any information that doesn’t fit neatly into a database. It includes pictures, video, audio files, analog signals, and human speech. Not surprisingly, unstructured data can be difficult to interpret.

The Life Sciences Industry Is Turning to Semantic Analytics and Enterprise Search

Increasingly, these organizations are harnessing search capabilities that combine ontology-led semantic analysis with AI-driven search. Put more simply, life sciences companies are deploying powerful search tools designed by scientists and technologists, for scientists.

Sinequa and SciBite have combined forces to offer these essential capabilities. SciBite’s TERMite named entity recognition (NER) engine works in unison with VOCabs, SciBite’s hand-curated vocabularies, to scan scientific content in both structured and unstructured formats, recognizing and extracting key scientific concepts. TERMite can process about a million words a second, so it can be used across the enterprise for multiple use cases.
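To make vocabulary-driven entity recognition concrete, here is a minimal Python sketch of dictionary-based tagging: each synonym in a small vocabulary is matched in the text and mapped back to a concept ID. The vocabulary entries, concept IDs, and function names are hypothetical; this is not TERMite’s API or algorithm, and it leaves overlapping matches unresolved, which a production engine would have to handle.

```python
import re

# Hypothetical vocabulary: concept ID -> (entity type, preferred label, synonyms).
# Illustrative entries only; not SciBite VOCabs content.
VOCAB = {
    "GENE:BRCA1": ("GENE", "BRCA1", ["BRCA1", "breast cancer 1"]),
    "DRUG:ASPIRIN": ("DRUG", "aspirin", ["aspirin", "acetylsalicylic acid"]),
}


def build_patterns(vocab):
    """Compile one case-insensitive, word-bounded regex per synonym."""
    patterns = []
    for concept_id, (etype, _label, synonyms) in vocab.items():
        for syn in synonyms:
            regex = re.compile(r"\b" + re.escape(syn) + r"\b", re.IGNORECASE)
            patterns.append((regex, concept_id, etype))
    return patterns


def tag(text, patterns):
    """Return (start, end, surface text, concept ID, type) for every match, sorted by position."""
    hits = []
    for regex, concept_id, etype in patterns:
        for m in regex.finditer(text):
            hits.append((m.start(), m.end(), m.group(0), concept_id, etype))
    return sorted(hits)
```

For example, `tag("Acetylsalicylic acid and BRCA1 were studied.", build_patterns(VOCAB))` maps the synonym “Acetylsalicylic acid” to the same `DRUG:ASPIRIN` concept that “aspirin” would produce, which is what lets later search steps treat synonyms as one thing.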

Integrating TERMite into the Sinequa intelligent search platform via an application programming interface (API) improves the search experience, enhancing both recall (finding all the relevant documents) and precision (returning only relevant ones). Sinequa also applies natural language processing (NLP) to clean and enrich information.
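Why does concept tagging improve recall? Here is a minimal sketch, assuming documents have already been tagged with concept IDs (all data and names below are hypothetical). A plain keyword search for “aspirin” misses a document that only says “acetylsalicylic acid,” while a search over a concept index finds both. It illustrates the general technique, not the actual Sinequa and SciBite integration.

```python
from collections import defaultdict

# Hypothetical pre-tagged documents: doc ID -> (text, concept IDs found by an NER step).
DOCS = {
    "d1": ("Aspirin reduced inflammation.", {"DRUG:ASPIRIN"}),
    "d2": ("Acetylsalicylic acid was well tolerated.", {"DRUG:ASPIRIN"}),
    "d3": ("Ibuprofen was compared with placebo.", {"DRUG:IBUPROFEN"}),
}

# Hypothetical synonym table mapping query strings to concept IDs.
SYNONYMS = {
    "aspirin": "DRUG:ASPIRIN",
    "acetylsalicylic acid": "DRUG:ASPIRIN",
    "ibuprofen": "DRUG:IBUPROFEN",
}


def keyword_search(query):
    """Plain substring match: only finds documents that use the exact query wording."""
    return sorted(d for d, (text, _) in DOCS.items() if query.lower() in text.lower())


def build_concept_index(docs):
    """Inverted index from concept ID to the set of documents tagged with it."""
    index = defaultdict(set)
    for doc_id, (_text, concepts) in docs.items():
        for concept in concepts:
            index[concept].add(doc_id)
    return index


CONCEPT_INDEX = build_concept_index(DOCS)


def concept_search(query):
    """Map the query to a concept ID, then return every document tagged with that concept."""
    concept = SYNONYMS.get(query.lower())
    return sorted(CONCEPT_INDEX.get(concept, set())) if concept else []
```

Here `keyword_search("aspirin")` returns `["d1"]`, while `concept_search("aspirin")` returns `["d1", "d2"]`.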

The search engine then uses machine learning (ML) and AI to find similar content, classify it, and refine its queries based on usage. Over time, the results keep getting better, much as everyday users have seen search engines such as Google and Bing improve.
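Results that improve with usage can be illustrated with a simple form of click feedback. The sketch below is a generic, hypothetical re-ranker; the class name and boost parameters are assumptions, not Sinequa’s method. Each click on a document for a given query adds a capped boost to that document’s base score the next time the query runs.

```python
from collections import Counter


class ClickBoostRanker:
    """Hypothetical usage-based re-ranker (generic relevance feedback, not Sinequa's algorithm)."""

    def __init__(self, boost_per_click=0.1, max_boost=1.0):
        self.boost_per_click = boost_per_click
        self.max_boost = max_boost
        self.clicks = Counter()  # (normalized query, doc_id) -> click count

    def record_click(self, query, doc_id):
        self.clicks[(query.lower(), doc_id)] += 1

    def rank(self, query, base_scores):
        """base_scores: dict doc_id -> relevance score from the underlying engine."""

        def score(doc_id):
            # Counter returns 0 for unseen keys without inserting them.
            clicks = self.clicks[(query.lower(), doc_id)]
            return base_scores[doc_id] + min(clicks * self.boost_per_click, self.max_boost)

        return sorted(base_scores, key=score, reverse=True)
```

With base scores `{"a": 0.50, "b": 0.45}`, `rank("egfr", ...)` returns `["a", "b"]`; after one `record_click("egfr", "b")` it returns `["b", "a"]`. The cap keeps popular documents from overwhelming the engine’s own relevance signal.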

However, as in other fields, scientific terms are often ambiguous. Does GSK mean the company GlaxoSmithKline or the enzyme glycogen synthase kinase? Semantic analysis by the SciBite platform uses curated vocabularies and rule-based automated curation to make sense of terminology, ensuring that the results delivered are accurate, relevant, and clear to users.
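One common way to resolve an ambiguous abbreviation like GSK is to look at the words around it. The sketch below is a hypothetical rule-based disambiguator; the cue-word lists, window size, and function name are assumptions, not SciBite’s rules. It scores each candidate sense by how many of its cue words appear near the mention, and returns `None` rather than guessing when no cue is found.

```python
import re

# Hypothetical context cues for the ambiguous abbreviation "GSK".
SENSES = {
    "GlaxoSmithKline": {"pharmaceutical", "company", "shares", "acquisition", "announced"},
    "glycogen synthase kinase": {"kinase", "phosphorylation", "glycogen", "inhibitor", "signaling"},
}


def disambiguate(text, senses=SENSES, window=8):
    """Return one sense (or None) per 'GSK' mention, based on cue words within `window` tokens.

    Ties between senses go to the first sense listed in `senses`.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())  # "GSK-3" -> ["gsk", "3"]
    results = []
    for i, tok in enumerate(tokens):
        if tok == "gsk":
            context = set(tokens[max(0, i - window): i + window + 1])
            scores = {sense: len(cues & context) for sense, cues in senses.items()}
            best = max(scores, key=scores.get)
            results.append(best if scores[best] > 0 else None)
    return results
```

For example, “GSK announced an acquisition” resolves to GlaxoSmithKline, while “Inhibition of GSK-3 reduced tau phosphorylation” resolves to glycogen synthase kinase.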

Improving the User Experience with Easy, Fast Scientific Entity and Relationship Searching

Using SciBite and Sinequa capabilities to tag documents lets users ask questions or search in a simple way. For example, users could ask: “What genes are discussed in this research paper?” “What is the relationship between a particular drug and adverse events?” or “What research has our organization done on COVID-19 vaccines?”
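Once documents carry entity annotations, questions like the first two above become simple queries over tags. Here is a minimal sketch with hypothetical annotations and function names. The drug and adverse-event question is approximated as document-level co-occurrence, which flags associations worth reviewing rather than establishing causation.

```python
from collections import Counter

# Hypothetical annotations, as an NER step might emit them: doc ID -> [(entity type, label)].
ANNOTATIONS = {
    "paper-1": [("GENE", "TP53"), ("GENE", "EGFR"), ("DRUG", "gefitinib")],
    "paper-2": [("DRUG", "gefitinib"), ("ADVERSE_EVENT", "rash"), ("ADVERSE_EVENT", "diarrhea")],
    "paper-3": [("DRUG", "gefitinib"), ("ADVERSE_EVENT", "rash")],
}


def genes_in(doc_id):
    """'What genes are discussed in this paper?'"""
    return sorted({label for etype, label in ANNOTATIONS.get(doc_id, []) if etype == "GENE"})


def adverse_events_with(drug):
    """Count documents in which `drug` co-occurs with each adverse event (a signal, not proof)."""
    counts = Counter()
    for annotations in ANNOTATIONS.values():
        labels = set(annotations)
        if ("DRUG", drug) in labels:
            counts.update(label for etype, label in labels if etype == "ADVERSE_EVENT")
    return counts.most_common()
```

Here `genes_in("paper-1")` returns `["EGFR", "TP53"]`, and `adverse_events_with("gefitinib")` returns `[("rash", 2), ("diarrhea", 1)]`.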

For life sciences organizations, the ability to search easily and quickly is transformative. Researchers, who possess advanced degrees and specialized expertise, can spend their time on such important tasks as reviewing the right clinical research, speeding drug discovery work by building on past results, accessing the right experts, and executing clinical trials. They can use an intuitive interface to search for content using common scientific terms and synonyms and surface the results they need when they need them. That means less duplicative work, more time for research, and faster innovation.

Results are also personalized. A researcher working on clinical trials for a late-stage cancer drug will see different results than one researching how to vaccinate against the latest COVID-19 variants.
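Personalization can be sketched as a re-ranking step that boosts results overlapping with a user’s interests. In this hypothetical example, the profile is a set of concepts (for instance, derived from a researcher’s role), and the weight value is an arbitrary assumption, not a platform setting.

```python
def personalize(results, profile, weight=0.2):
    """Re-rank (doc_id, base_score, doc_concepts) tuples by overlap with a user's profile.

    `profile` is a hypothetical set of concepts the user works on; each shared concept
    adds `weight` to the document's base score.
    """

    def score(item):
        _doc_id, base, concepts = item
        return base + weight * len(concepts & profile)

    return [doc_id for doc_id, _, _ in sorted(results, key=score, reverse=True)]


RESULTS = [
    ("oncology-trial", 0.6, {"late-stage cancer", "clinical trial"}),
    ("variant-vaccine", 0.6, {"COVID-19", "vaccine"}),
]
```

With the same base scores, an oncology profile `{"late-stage cancer", "clinical trial"}` puts `oncology-trial` first, while a vaccine profile `{"COVID-19", "vaccine"}` puts `variant-vaccine` first.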

Improving Life Sciences Search with Vocabulary Editing to Create Custom Ontologies

But wait, it gets better! Science is changing all the time, and each life sciences firm has proprietary processes, technologies, and intellectual property that set it apart from its competitors. That’s why SciBite’s ontology management tool, CENtree, allows users to create and edit vocabularies to make them more relevant to their organization.

To make sure changes are both relevant and authorized, the platform employs a robust governance model. This democratizes vocabulary development and maintenance, allowing contributions from across the organization while keeping the appropriate safeguards in place.
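A governance model like this can be reduced to a propose-and-approve workflow. The sketch below is hypothetical; the class names, statuses, and curator check are assumptions, not CENtree’s data model. Anyone can propose a synonym, but the vocabulary changes only when a designated curator approves it.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    term: str
    synonym: str
    author: str
    status: str = "proposed"  # proposed -> approved | rejected


@dataclass
class GovernedVocabulary:
    """Hypothetical sketch: anyone may propose a synonym; only curators may approve it."""

    curators: set
    synonyms: dict = field(default_factory=dict)  # term -> set of approved synonyms
    requests: list = field(default_factory=list)

    def propose(self, term, synonym, author):
        req = ChangeRequest(term, synonym, author)
        self.requests.append(req)
        return req

    def review(self, req, reviewer, approve):
        if reviewer not in self.curators:
            raise PermissionError(f"{reviewer} is not a curator")
        if req.status != "proposed":
            raise ValueError("request has already been reviewed")
        req.status = "approved" if approve else "rejected"
        if approve:
            # Only approved changes ever reach the live vocabulary.
            self.synonyms.setdefault(req.term, set()).add(req.synonym)
```

A contributor’s proposal sits in `requests` with status `"proposed"` until a curator calls `review`, so contributions stay open while edits to the live vocabulary stay controlled.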

Your researchers can add new terms, add or remove synonyms, or even create an entirely new custom vocabulary for a specific purpose. The combined platform provides public ontologies that can be modified for particular use cases, as well as the ability to upload spreadsheets listing the entities to be extracted. You can also create “one ontology to rule them all,” an ontology that connects all of your other ontologies.
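Uploading a spreadsheet of entities typically comes down to parsing rows into vocabulary entries. This minimal sketch assumes a hypothetical CSV layout with `term`, `type`, and `synonyms` columns, with synonyms separated by `|`. The layout, the function names, and the simple “umbrella” index that links labels across vocabularies are illustrations, not the platform’s actual import format or ontology model.

```python
import csv
import io


def load_vocabulary(csv_text):
    """Parse hypothetical CSV rows (term, type, synonyms) into {term: {"type", "synonyms"}}."""
    vocab = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row.get("synonyms") or ""  # tolerate an empty or missing synonyms cell
        synonyms = [s.strip() for s in raw.split("|") if s.strip()]
        vocab[row["term"].strip()] = {"type": row["type"].strip(), "synonyms": synonyms}
    return vocab


def link_vocabularies(sources):
    """Build an umbrella index: lower-cased label -> set of (source name, term) pairs using it."""
    umbrella = {}
    for source_name, vocab in sources.items():
        for term, entry in vocab.items():
            for label in [term, *entry["synonyms"]]:
                umbrella.setdefault(label.lower(), set()).add((source_name, term))
    return umbrella


SAMPLE_CSV = "term,type,synonyms\nCompound X-101,COMPOUND,X101|X 101\nProject Falcon,PROJECT,\n"
```

Loading `SAMPLE_CSV` (hypothetical internal names) yields two entries, and linking it with a public vocabulary that also lists “X101” shows both sources under the single label `x101`.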

For example, in a recent webinar, we demonstrated how SciBite could be used to create a new ontology for COVID-19 vaccine research. Our presenters took the publicly available Coronavirus Infectious Disease Ontology, which had 7,000 classes and multiple branches, and quickly created a new ontology in CENtree specific to COVID-19 vaccine terms.

In the webinar, we also added relevant synonyms for these vaccines, improving named entity recognition and making the search tool much more powerful at delivering the full range of relevant results. Users can also add common misspellings of terms if they wish.

Our presenters also showed how to move and reclassify terms. For example, the Johnson & Johnson COVID-19 vaccine was not classified as an authorized COVID-19 vaccine in the original public ontology. So we simply changed the vaccine’s superclass so that it appeared in the authorized COVID-19 vaccine branch of the ontology. Because terminology and relationships can evolve continuously, the ontologies become increasingly robust and useful over time.
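Reclassifying a term, as in the Johnson & Johnson example, amounts to changing one parent link in the class hierarchy. After that change, every query over the new branch picks the term up. The hierarchy below is a hypothetical fragment, not actual Coronavirus Infectious Disease Ontology content, and the helper functions are illustrative. The cycle check guards against moving a class beneath one of its own descendants.

```python
# Hypothetical class hierarchy fragment: class -> superclass (None for the root).
PARENT = {
    "COVID-19 vaccine": None,
    "authorized COVID-19 vaccine": "COVID-19 vaccine",
    "Vaccine A": "authorized COVID-19 vaccine",
    "Johnson & Johnson COVID-19 vaccine": "COVID-19 vaccine",
}


def descendants(parent_map, root):
    """All classes below `root`, found by walking child links (assumes the hierarchy has no cycles)."""
    children = {}
    for child, parent in parent_map.items():
        children.setdefault(parent, []).append(child)
    found, stack = [], [root]
    while stack:
        for child in children.get(stack.pop(), []):
            found.append(child)
            stack.append(child)
    return sorted(found)


def set_superclass(parent_map, cls, new_parent):
    """Move `cls` (and, implicitly, its subtree) under `new_parent`, refusing moves that create a cycle."""
    if cls not in parent_map or new_parent not in parent_map:
        raise KeyError("unknown class")
    if new_parent == cls or new_parent in descendants(parent_map, cls):
        raise ValueError("move would create a cycle")
    parent_map[cls] = new_parent
```

Before the move, `descendants(PARENT, "authorized COVID-19 vaccine")` returns only `["Vaccine A"]`. After `set_superclass(PARENT, "Johnson & Johnson COVID-19 vaccine", "authorized COVID-19 vaccine")`, it includes the Johnson & Johnson class as well.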

Once the terms are recognized and classified, Sinequa gets to work. In addition to surfacing insights in a user-friendly interface, the platform enables analytics on content. Consider the example of a small molecule. Should it be researched further?

Sinequa can help you go beyond document search and use analytics to pick up early signals about a promising molecule. You can mine signals from unstructured natural language content suggesting that the new molecule may be effective at treating various conditions. That early insight can help you allocate research resources and investment accordingly.
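Mining early signals can be illustrated with a deliberately crude rule: count the sentences in which the molecule, a condition, and an efficacy cue word appear together. Everything here is a hypothetical assumption, not Sinequa’s analytics: the cue list, the sentence splitter, and the molecule name in the test data. A real system would use entity recognition and negation handling rather than substring matching.

```python
import re
from collections import Counter

# Hypothetical efficacy cue words.
EFFICACY_CUES = ("reduced", "improved", "inhibited", "effective")


def efficacy_signals(documents, molecule, conditions):
    """Count sentences in which `molecule`, a condition, and an efficacy cue co-occur.

    Uses substring matching, so e.g. "ineffective" would also match "effective";
    counts are leads to investigate, not evidence of efficacy.
    """
    signals = Counter()
    for doc in documents:
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            s = sentence.lower()
            if molecule.lower() in s and any(cue in s for cue in EFFICACY_CUES):
                for condition in conditions:
                    if condition.lower() in s:
                        signals[condition] += 1
    return signals
```

Ranking conditions by these counts over time is one simple way to see where a molecule keeps showing up with positive language.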

Data Value at Scale

So, how can you use these capabilities? You can empower your knowledge workers with intuitive search tools, create custom ontologies, develop new search use cases, and strip cost and waste out of processes. Because life sciences is an extremely competitive market, being faster to discover a new molecule’s capabilities, streamline clinical research, and move to trials might make the difference between being first to market and reaping the victor’s spoils, or coming second and taking much lower revenue.

Want to see how powerful enterprise search can be? Use the COVID-19 Intelligent Insight Portal, created by Sinequa, to research the latest scientific papers, clinical trial data, and analysis on COVID-19 and SARS-CoV-2. The portal provides access to 120,000 papers, with more added every day, aiding the international effort to scale effective therapies, vaccinate the world’s people, and reduce the terrible impact of this disease.