For most organizations, collecting sensitive data such as payment information or healthcare history is necessary for doing business. But in recent years, the rise of remote work and cloud computing has created seemingly impossible challenges for data security and compliance readiness. Sensitive data discovery tells organizations exactly where their sensitive or regulated data resides at all times. It is an essential component of data protection.
Sensitive data discovery is the process of finding and identifying all confidential and regulated information kept by an organization. The process focuses on anything related to individuals that is regulated by law. It also involves finding proprietary information and other confidential enterprise items.
Sensitive data discovery gives security teams the ability to protect this information. They can consistently make it available when needed and securely remove it when necessary.
It helps ensure that data security best practices are being followed. It is also crucial to ensure that the necessary controls are in place for regulatory compliance.
Ultimately, sensitive data discovery aims to streamline the identification and classification of protected information. This makes it possible to determine which data sources require protection, so that the organization can prevent data loss, recognize potential threats, and manage the fallout from any leaks.
Organizations should protect all sensitive information. But the amount of data protection needed depends on its sensitivity. The most rigorous levels of security, which consume the most resources, should be reserved for the most sensitive data. Less-restricted levels may not require the same security measures.
There is no universal standard for data classification, and your organization may use its own system. But in general, there are four primary levels of sensitivity, which are not mutually exclusive.
Public information is what can be found through publicly available media:
- Corporate LinkedIn account
- Twitter account
- Public GitHub, YouTube, and Vimeo repositories and their associated search
- By extension, everything that is made searchable through a web search engine like Google
This definition also applies to information about a company, a particular individual, or any other topic (such as an analyst report, market analysis, or financial information). If you can find it without using a repository with restricted access, a professional account, or any facility provided by your employer, you can consider it public information.
Classifying information as public is easy to manage using the previous definition and recommendations. Classifying the non-public information available to you in your professional environment as either internal or restricted is harder. Here, a precise definition is much more challenging to provide, since every piece of information has been created to be shared with a specific audience in a specific professional situation. Let’s consider the following definitions of internal information and restricted (confidential) information, which are not mutually exclusive.
Internal information is a piece of data, document, or knowledge created within a professional environment that is not explicitly meant to be disclosed to anybody outside your company. This simple definition may help in many circumstances. Of course, you will not always be able to determine whether a piece of information is intended for internal purposes only by looking at it.
Fortunately, information is typically given context by its location inside the company IT environment and by the workflow of the content management system that holds it, such as Salesforce, Confluence, Jira, Impact, or GitHub.
In such a case, the purpose of the corresponding application and the technical user permissions that restrict access may provide enough information to answer this simple question: Is the document internal only, or can it be shared with someone outside the company? Of course, internal does not mean shareable with every employee. Before sharing a document or communicating information with another employee, you should always check whether this employee has permission to find the same information by accessing the corresponding repository without your help.
Let’s now consider a particular kind of information, which is, by exclusion, neither purely internal nor fully public, based on the previous definitions. Either an external third party, such as a prospect, customer, analyst, or partner, has shared such information with you, or an employee or employee group created it to be shared with this third party.
In both cases, the information is restricted to a specific audience, and anybody having access to this information should be able to determine this. In such a case, and before sharing such information, the parties will generally expect to protect their assets against any unwanted disclosure using what is commonly called a Non-Disclosure Agreement or NDA.
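The decision logic described above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not a standard: the `Item` fields and the `classify` function are hypothetical names, and the sketch models only the three levels this section describes (public, internal, restricted).

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    RESTRICTED = "restricted"


@dataclass
class Item:
    """Hypothetical description of one piece of information."""
    name: str
    # True if it can be found without any restricted repository,
    # professional account, or employer-provided facility.
    findable_without_restricted_access: bool
    # True if a third party shared it with you, or it was created
    # to be shared with a specific third party.
    tied_to_third_party: bool


def classify(item: Item) -> Sensitivity:
    # Anything freely findable is public, whatever its origin.
    if item.findable_without_restricted_access:
        return Sensitivity.PUBLIC
    # Non-public information bound to a specific outside audience
    # is restricted (typically covered by an NDA).
    if item.tied_to_third_party:
        return Sensitivity.RESTRICTED
    # Everything else created in the professional environment is internal.
    return Sensitivity.INTERNAL
```

In practice the two boolean inputs would come from repository permissions and document metadata rather than being set by hand.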
Types of Sensitive Data
What is considered sensitive data? Cybercriminals frequently target specific types of information. Because of this, these types of information have specific legal protections. Here are the most common types your organization may handle and store.
1. Personally Identifiable Information (PII)
Defined as information that can identify a person, PII may include Social Security numbers, email addresses, phone numbers, or account numbers. PII is not exclusive to customers. Your organization may collect PII from anyone it interacts with, such as employees or contractors.
2. Protected Health Information (PHI)
Information that can be used to associate a person’s identity with their healthcare history. There are 18 specific identifiers of PHI as defined by the Health Insurance Portability and Accountability Act (HIPAA).
3. Payment Card Industry Data Security Standard (PCI-DSS)
Information about individual cardholders. This includes names, account numbers, expiration dates, verification codes, PINs, etc. Any organization that handles card payments must conform to PCI-DSS measures to protect that information whenever it is accepted, transmitted, stored, or processed.
4. Consumer Behavior Data
Information that could be used to identify a person or their household through their interaction with a website, application, or advertisement. This could include search or browsing history, products purchased, or geolocation. This category is specifically protected by the California Consumer Privacy Act (CCPA).
5. Nonpublic Enterprise Data
While the other types of information on this list have defined legal protections, this category is not governed by law. Instead, it refers to restricted information specific to your business. It could include trade secrets, confidential memos, industrial designs, or anything else that could harm your organization if leaked.
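Some of the data types listed above have a recognizable shape, which makes a first-pass detector easy to sketch. The example below is a minimal illustration, not a complete detector: the pattern names, the `find_sensitive` function, and the US-style SSN format are assumptions made for this example. Candidate card numbers are checked with the Luhn checksum to reduce false positives.

```python
import re

# Illustrative patterns only; real identifiers vary by country and format.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13 to 19 digits, optionally separated by single spaces or hyphens.
    "payment_card": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs found in `text`."""
    findings = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group()
            # Digit runs that fail the checksum are unlikely to be card numbers.
            if kind == "payment_card" and not luhn_valid(value):
                continue
            findings.append((kind, value))
    return findings
```

Pattern matching of this kind only covers well-structured identifiers. Categories such as consumer behavior data or nonpublic enterprise data have no fixed format and need other techniques.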
Industries with the Greatest Need for Sensitive Data Discovery
Every organization has data protection needs. But certain industries must also adhere to specific rules and regulations. These industries are subject to higher security standards. They are also more frequently audited. This makes it essential for organizations in these industries to maintain the highest standards of data security.
By its nature, the ecommerce industry carries a significant burden to protect the sensitive data of online consumers. This almost always includes payment card and consumer behavior information.
Commercial banks, credit unions, insurance companies, brokerage firms, accountants, and other financial services institutions deal with heavily regulated information.
Local, state, and federal government organizations collect and store volumes of information about individuals. Some of this information is a matter of public record, but sensitive data requires specific protection by law.
Healthcare organizations are responsible for ensuring the privacy and security of patient information. Electronic health information is rigorously protected by HIPAA, HITECH and the Omnibus Rule.
5. Higher Education
Universities, colleges, vocational schools and other higher education institutions collect a wide variety of sensitive data. This could include anything from student PII collected at enrollment to PHI collected at campus health clinics.
With the advent of the Internet of Things, devices are collecting greater volumes of data at an unprecedented rate. Manufacturers must ensure that any sensitive data they collect is protected per the law.
Companies that collect location information, payment data, voice recordings, and other sensitive data must take the necessary steps to comply with all rules and regulations governing that information.
The Challenges of Sensitive Data Discovery
Three key factors present enormous challenges to finding, authenticating, and protecting the confidentiality of sensitive data.
1. Data Volume
Today, organizations collect, store, and move ever-larger amounts of data. The sheer volume of information flowing through an organization makes it a tall order to track, secure, and purge information as required.
2. Scattered Data
In recent years, the adoption of cloud computing and remote work has become ubiquitous. Business processes are becoming increasingly interconnected. Content is often spread across different databases, applications, shared files, and other data sources. There is an ever-increasing number of paths that data can travel through and locations where it can end up.
3. Unstructured Data
As much as 80% of all data is unstructured, meaning it can't be easily searched, analyzed, or interpreted. This includes anything that doesn't fit neatly into a database, such as web pages, social media posts, images, audio, and video content. Mining these various formats to identify sensitive data embedded within requires advanced analysis.
How to Find and Identify Sensitive Data
Sensitive data discovery should follow five main steps.
First, all data, both sensitive and non-sensitive, needs to be found. This is true regardless of storage location or format. The location of all information, in both cloud and on-premises systems, should be documented to ensure compliance with regulations.
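As a rough sketch of what documenting locations can look like for file-based sources, the following Python walks a set of root folders and records every file it finds. The `InventoryEntry` fields, the `build_inventory` function, and the source labels are hypothetical. Databases, SaaS applications, and cloud object stores would each need their own connector.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class InventoryEntry:
    source: str       # hypothetical label, e.g. "on-prem-share"
    path: str         # path relative to the source root
    extension: str    # lowercased file extension, e.g. ".csv"
    size_bytes: int


def build_inventory(roots: dict[str, Path]) -> list[InventoryEntry]:
    """Record every file under each root, sensitive or not."""
    entries = []
    for source, root in roots.items():
        # Sorting keeps the inventory stable between runs.
        for p in sorted(root.rglob("*")):
            if p.is_file():
                entries.append(
                    InventoryEntry(
                        source=source,
                        path=p.relative_to(root).as_posix(),
                        extension=p.suffix.lower(),
                        size_bytes=p.stat().st_size,
                    )
                )
    return entries
```

The inventory deliberately makes no judgment about sensitivity. It only records what exists and where, so that the analysis step has a complete list to work from.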
The next step is to analyze every bit of collected data to determine its sensitivity. This requires data discovery tools capable of mining unstructured data and analyzing textual content through statistical and linguistic techniques that can handle the ambiguities of natural language, such as PII hidden inside long sentences.
Any data that has been identified as unnecessary should be discarded. Policies should be put in place to continuously purge information once it is found to be no longer necessary.
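To illustrate why context matters, here is a deliberately simple stand-in for the statistical and linguistic techniques mentioned above. It is not what a production discovery tool does. It flags nine-digit numbers as possible Social Security numbers and raises the score when nearby words suggest that meaning. The keyword list, window size, and score values are arbitrary assumptions chosen for the example.

```python
import re

# Nine digits, with or without the usual hyphens.
CANDIDATE = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")
CONTEXT_WORDS = ("ssn", "social security")  # assumed keyword list
WINDOW = 40  # characters of context to inspect on each side (assumption)


def score_candidates(text: str) -> list[tuple[str, float]]:
    """Return (candidate, score) pairs; higher means more likely an SSN."""
    lowered = text.lower()
    results = []
    for match in CANDIDATE.finditer(text):
        start = max(0, match.start() - WINDOW)
        context = lowered[start:match.end() + WINDOW]
        # Illustrative scores: supporting context raises confidence.
        has_context = any(word in context for word in CONTEXT_WORDS)
        results.append((match.group(), 0.9 if has_context else 0.3))
    return results
```

The same nine digits score differently in "her social security number is 123-45-6789" and in "reference ticket 123456789", which is the kind of ambiguity a purely pattern-based scan cannot resolve.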
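A purge policy of this kind can be expressed as a retention rule per data category. The sketch below is illustrative only: the `Record` fields, the category names, and the retention periods are hypothetical and are not legal guidance. Actual periods depend on the regulations that apply to your organization.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    record_id: str
    category: str
    last_needed: date  # last date the record served a business purpose


# Hypothetical retention periods, in days, per category.
RETENTION_DAYS = {
    "marketing_lead": 365,
    "invoice": 7 * 365,
}


def due_for_purge(records: list[Record], today: date) -> list[Record]:
    """Return the records whose retention period has expired."""
    due = []
    for record in records:
        limit = RETENTION_DAYS.get(record.category)
        if limit is None:
            # Unknown category: leave it for human review rather than delete.
            continue
        if today - record.last_needed > timedelta(days=limit):
            due.append(record)
    return due
```

Running a check like this on a schedule is one way to make purging continuous rather than a one-off cleanup.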
Security measures must be put in place to build an effective information protection strategy. Security should be a combination of physical measures (such as locked rooms with controlled entry) and digital measures (such as dynamic encryption and automatic confidentiality assessments).
Once your data has been collected, analyzed, and secured, some of it may be used, within the limits of privacy laws, to gain a competitive advantage. This knowledge can provide valuable insights across your organization. It enables you to serve your customers better, make smarter business decisions, and become more agile in an evolving marketplace.
Sensitive Data Discovery Reduces Risk and Helps Protect Data
When a breach exposes sensitive data, the damage can be considerable and long-lasting, not just in terms of regulatory fines and lawsuits but also in terms of competitive advantage and revenue loss. The harm to your organization's reputation can last for years.
As your organization collects information at an unprecedented rate, it's more crucial than ever to use data discovery tools to assess and manage your entire information protection landscape. Sensitive data discovery is essential to preventing data loss and staying in compliance. With it, you can develop and implement the necessary security measures to prevent the compromise of sensitive information and safeguard your organization against loss.