Sensitive Data: How Enterprise PII Discovery Enables GDPR Compliance

Posted by Charlotte Foglia

Personally identifiable information, also known as “PII data,” presents a problem for the enterprise. The more PII you collect about your customers, the better you can serve them. You’ll know their names when they call. You can run rich analytics on your customer database.

However, all these business advantages come with risks. Various laws impose strict rules on how PII can be stored and threaten fines if PII is revealed in a data breach.

To be compliant and avoid embarrassment to the brand, companies need to implement tight controls over PII. To do that, however, they must first find it, and PII can be hiding in unexpected places in an enterprise. Data discovery tools help businesses figure out where they are storing PII so they can manage and protect it according to the law and related policies.

Understanding PII and why it is important (and risky) for businesses

What is Personally Identifiable Information? PII is any piece of data that can be used to identify a person, either on its own or in combination with another piece of data. Examples of PII include a person’s name and address, credit card information, driver’s license or passport number, phone number and date of birth. Even a license plate number, bank account number or email address can be considered PII.

Businesses generally want to get as much PII from their customers as possible. PII has business value because the more PII a company has, the richer the customer data profile.

Email addresses and phone numbers make it possible to market to customers. Home addresses are good for direct mail marketing. Birth dates enable “happy birthday” greetings. Identifiers like license plate numbers can be useful for correlating a customer with other data sources, such as vehicle registrations, which can be used to market car loans, insurance and so forth.

PII creates a security risk, however. In fact, PII presents an unusual security paradox: It helps security while making an organization potentially more vulnerable at the same time. PII can be used to authenticate customers and system users. If a customer can present a driver’s license number as proof of identity, that will help prevent a malicious actor from gaining unauthorized access to accounts and applications.

At the same time, if PII is breached, it becomes easier for attackers to impersonate users. A PII data breach is also a financially costly security incident to remediate, and it can lead to severe reputational damage. Companies, therefore, need to control the data risk that results from storing PII.

In terms of compliance, PII is subject to many different laws, most of which deal with consumers’ rights to privacy. Because consumers also tend to be voters, the government takes consumer privacy quite seriously. In the EU, the General Data Protection Regulation (GDPR) mandates strict security protections for PII.

Companies can face significant fines under GDPR if they fail to keep PII private and secure. Under GDPR, individuals can also request that businesses storing their PII delete it. This is known as the “right to be forgotten.” The California Consumer Privacy Act (CCPA) has comparable terms.

Other laws that protect PII by restricting sharing and requiring breach notifications include:

  • the Health Insurance Portability and Accountability Act (HIPAA),
  • the Gramm-Leach-Bliley Act for financial information,
  • the Fair Credit Reporting Act (FCRA), which covers consumer credit information, and
  • the Family Educational Rights and Privacy Act (FERPA), which covers students.

Companies regularly undergo audits to verify whether they comply with these laws.

What is PII discovery?

All of these regulations have something in common. Companies that store PII need to know what they have and where it is in their systems.

There are several important reasons for this. For one thing, they need to understand where PII is stored to protect it against a breach. They also need to be able to find PII so they can delete it upon consumer request.

In reality, however, PII can be all over the place. It can be embedded in PDF documents, such as legal agreements or delivery receipts. It can be in backup volumes. It can be translated into foreign languages or stored in binary formats.

A PII discovery tool is software that’s designed to find PII wherever it may be hiding in an enterprise. It’s an essential part of a compliance program for GDPR, CCPA and the like. PII discovery must also be automated or capable of automated processes. There is too much PII to locate using manual methods.

How to become GDPR compliant with PII discovery

Becoming GDPR compliant can be challenging because PII might be contained in unstructured forms of data. For example, while isolating a phone number in a log file is a matter of simple pattern matching, it’s an entirely different level of effort to find people’s names, social security numbers or postal addresses inside a 200-page PDF document, written in Japanese and located inside a content management system (CMS).

The PDF, a warranty contract for example, might be one of a million such documents. Only a complete scan with a powerful PII discovery tool will be effective for GDPR compliance, because it identifies all the PII contained in the CMS and systems like it.
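
To make the “simple pattern matching” case concrete, here is a minimal Python sketch. It is an illustration only, not how any particular product works: the regular expressions, the `find_pii` helper and the sample log lines are all assumptions invented for this example, and the phone pattern covers only common North American formats.

```python
import re

# Hypothetical patterns for illustration only; real-world formats vary widely.
PATTERNS = {
    "phone": re.compile(
        r"(?<!\d)(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"
    ),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}


def find_pii(lines):
    """Return (line_number, kind, matched_text) for every pattern hit."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for kind, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                hits.append((number, kind, match.group()))
    return hits


# Made-up log lines; the phone number and address are placeholders.
sample_log = [
    "2024-01-05 10:02:11 INFO callback requested for 415-555-0132",
    "2024-01-05 10:02:15 INFO receipt sent to jane.doe@example.com",
    "2024-01-05 10:02:19 INFO order 88213 shipped",
]

for hit in find_pii(sample_log):
    print(hit)
```

Patterns like these work because phone numbers and email addresses have a predictable shape. Names and postal addresses have no such shape, which is why the unstructured case is so much harder.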

Intelligent search platform to find sensitive data

An intelligent search platform is well-suited to the PII data discovery workload. It is designed to find data in whatever form it takes. The Sinequa solution, for example, employs Artificial Intelligence (AI), Natural Language Processing (NLP) and text mining to identify any PII, whatever its type, format, spelling and language.

An enterprise-grade intelligent search platform also offers connectivity to virtually any data source and document format. This can be critical for effective PII discovery, especially if an enterprise has PII in old formats, zip files, multimedia documents and so forth. An intelligent search platform for PII discovery needs to be scalable, too. It should handle millions of documents in a limited time, with the ability to follow changes that occur in the data set.
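
One generic way to “follow changes” without rescanning everything is to remember a content digest per document and revisit only what is new or modified. The Python sketch below illustrates that idea under stated assumptions: the `changed_documents` helper, the in-memory dictionaries and the sample documents are hypothetical, and nothing here describes how the Sinequa platform tracks changes internally.

```python
import hashlib


def changed_documents(documents, seen_digests):
    """Yield ids of documents whose content is new or has changed.

    documents:    mapping of document id -> raw bytes (assumed input shape)
    seen_digests: mapping of document id -> SHA-256 hex digest recorded on
                  the previous pass; updated in place as documents are seen
    """
    for doc_id, content in documents.items():
        digest = hashlib.sha256(content).hexdigest()
        if seen_digests.get(doc_id) != digest:
            seen_digests[doc_id] = digest
            yield doc_id


seen = {}
first_pass = {
    "contract-001": b"Name: A. Sample",
    "receipt-002": b"Total: 40.00",
}
# On the first pass every document is new, so all ids are reported.
print(list(changed_documents(first_pass, seen)))

# On the second pass only the edited receipt needs a fresh PII scan.
second_pass = {**first_pass, "receipt-002": b"Total: 45.00"}
print(list(changed_documents(second_pass, seen)))
```

Only the ids this helper reports would need to go back through PII detection, which keeps repeat scans proportional to the amount of change rather than to the size of the repository.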


GDPR compliance requires fast, accurate PII discovery. The security and compliance risks are significant if an enterprise cannot master this task.

An intelligent enterprise search platform offers a compelling solution. It can identify all the PII spread across a company’s entire IT landscape and unify its monitoring and management. The platform thus enables ongoing, continuous compliance with GDPR and similar regulations.
