Data Quality - Standard, clean data is the foundation

Clean, reliable data is the foundation of any data application. Data quality issues can cause serious problems for a company: an inaccurate executive dashboard can lead to wrong business decisions, and bad data produces wrong business rule results and invalid output from data models. Clean data increases trust and productivity and reduces costs.

Quality Center Solution

InsightLake Quality Center is a Big Data based solution that enables companies to perform the following operations to create reliable data in both real-time and batch pipelines.

  • Profile data - technical & semantic
  • Validate data
  • Monitor data quality & perform quality analytics
  • Enhance or enrich data
  • Standardize and clean data
  • Perform de-duplication
  • Clean data elements like address, phone, email etc. using external sources/services
  • Create a single customer view
  • Integrate with metadata, lineage and data governance

Using integrated governance controls, Quality Center allows the creation of trusted, validated and governed data sets for the organization.

Data profiling

Data discovery and profiling are key to understanding the data and its elements, and profiling exposes data quality issues. Profiling a sample or the complete data set allows data administrators to see the following details:

  • Technical data types
  • Null or unsupported values
  • Mean (average), min, max
  • Sample values
  • Logical context types like currency, OS, geo information etc.
  • Sensitive data discovery, such as SSN, credit card, phone number, email etc.
  • Results from profiling rules
  • Relationships
  • Dashboard showing data element distributions, trends in automated profiling etc.
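As a rough illustration of the technical part of profiling (type, nulls, min, max, mean, sample values), the following is a minimal sketch in plain Python. It is not the Quality Center API; the function name `profile_column` and the set of null markers are assumptions made for this example.

```python
from statistics import mean

# Assumed markers that count as "null or unsupported" in raw text data.
NULL_TOKENS = {"", "null", "n/a", "na", "none"}

def profile_column(values, sample_size=3):
    """Return a small technical profile for one column of raw string values."""
    non_null = [v for v in values
                if v is not None and v.strip().lower() not in NULL_TOKENS]

    # Try to read every non-null value as a number to infer the type.
    numbers = []
    for v in non_null:
        try:
            numbers.append(float(v))
        except ValueError:
            pass
    is_numeric = bool(non_null) and len(numbers) == len(non_null)

    profile = {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "type": "numeric" if is_numeric else "string",
        "distinct": len(set(non_null)),
        "samples": non_null[:sample_size],
    }
    if is_numeric:
        profile.update(min=min(numbers), max=max(numbers), mean=mean(numbers))
    return profile

# Hypothetical "age" column with two unusable values.
ages = ["34", "41", "", "29", "N/A", "41"]
print(profile_column(ages))
```

Semantic profiling (currency, geo information, sensitive data such as SSN) would add pattern or dictionary checks on top of a profile like this one.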

Data standardization

Organizations build clean data sets, often called data marts or subject areas, that draw data from various sources. Data from different sources can arrive in different formats and with quality issues. These inconsistent data elements can be standardized easily using quality rules in Quality Center and its rich function library.

  • Dates - convert to standard date format
  • Geo Locations - convert to ISO codes
  • Currency conversion
  • Brand name standardization
  • Default or null value standardization
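The standardization rules listed above can be sketched as small functions. This is a minimal sketch, not the Quality Center function library: the function names, the accepted input date formats, the country lookup table and the null markers are all assumptions chosen for illustration.

```python
from datetime import datetime

# Assumed input formats; ambiguous dates such as 03/05/2021 are read month-first.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y")

# Tiny illustrative lookup from country names to ISO 3166-1 alpha-2 codes.
COUNTRY_CODES = {"united states": "US", "usa": "US", "germany": "DE", "india": "IN"}

# Assumed markers that should all collapse to one default value.
NULL_TOKENS = {"", "n/a", "null", "none", "-"}

def standardize_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def standardize_country(text):
    """Return the ISO code for a known country name, or None."""
    return COUNTRY_CODES.get(text.strip().lower())

def standardize_null(text, default=None):
    """Map every null-like marker to a single default value."""
    if text is None or text.strip().lower() in NULL_TOKENS:
        return default
    return text.strip()

print(standardize_date("03/05/2021"))
print(standardize_country(" Germany "))
print(standardize_null("N/A", default="UNKNOWN"))
```

Returning None for values that cannot be standardized keeps bad input visible, so a later quality rule can count and report it instead of silently passing it through.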

Data cleansing

Like data standardization, data cleansing corrects bad data. For example, if a customer record has the wrong city in the address and that address is used to send marketing information, the mailing will not reach the customer and the opportunity is lost. Cleaning data is essential for making operational business processes work effectively and for delivering accurate insights.

InsightLake Quality Center provides the following data cleansing features through its rich function library.

  • Address cleansing
  • Currency conversion
  • Reference lookup
  • De-duplication
  • Email, phone cleansing

Data quality monitoring

Quality Center enables quality monitoring using automated workflows. These workflows profile data periodically, run quality business rules, and produce alerts, dashboards and summarized reports.
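One run of such a workflow can be pictured as applying named rules to a batch of records and raising an alert when too many records fail. The sketch below is only an illustration under assumptions: `run_quality_rules`, the two example rules and the 10% alert threshold are hypothetical and are not taken from Quality Center.

```python
def run_quality_rules(records, rules, max_failure_rate=0.1):
    """Apply each rule to every record; return failure rates and alerts."""
    report, alerts = {}, []
    for name, check in rules.items():
        failures = sum(1 for record in records if not check(record))
        rate = failures / len(records) if records else 0.0
        report[name] = rate
        # Alert only when the failure rate exceeds the assumed threshold.
        if rate > max_failure_rate:
            alerts.append(f"{name}: {rate:.0%} of records failed")
    return report, alerts

# Hypothetical business rules, each returning True when a record passes.
RULES = {
    "email_present": lambda r: bool(r.get("email")),
    "age_in_range": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
}

batch = [
    {"email": "a@example.com", "age": 34},
    {"email": "", "age": 41},
    {"email": "c@example.com", "age": 29},
    {"email": "d@example.com", "age": 52},
]

report, alerts = run_quality_rules(batch, RULES)
print(report)
print(alerts)
```

A scheduler would call a function like this on each new batch, and the per-rule rates collected over time are what a trend dashboard or summary report would plot.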

Data enrichment

Quality Center allows data to be enriched in real-time or batch processes to increase its value. For example, an IP address can be enriched using the integrated Geo library to add country, city, state, region and zip code. The geo-enriched data can then be used for better location-based analytics.
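The IP enrichment idea can be sketched with Python's standard `ipaddress` module. The lookup table below stands in for a real Geo library; its network ranges are reserved documentation ranges and the location values are made up for illustration. The function name `enrich_with_geo` is likewise an assumption.

```python
import ipaddress

# Stand-in for a Geo library: documentation IP ranges with made-up locations.
GEO_TABLE = [
    (ipaddress.ip_network("203.0.113.0/24"),
     {"country": "US", "state": "CA", "city": "San Francisco"}),
    (ipaddress.ip_network("198.51.100.0/24"),
     {"country": "DE", "state": "BE", "city": "Berlin"}),
]

def enrich_with_geo(record, ip_field="ip"):
    """Return a copy of the record with geo fields added when the IP is known."""
    enriched = dict(record)
    try:
        ip = ipaddress.ip_address(record.get(ip_field, ""))
    except ValueError:
        return enriched  # invalid or missing IP: leave the record unchanged
    for network, geo in GEO_TABLE:
        if ip in network:
            enriched.update(geo)
            break
    return enriched

print(enrich_with_geo({"user": "u1", "ip": "203.0.113.7"}))
print(enrich_with_geo({"user": "u2", "ip": "not-an-ip"}))
```

Because the function returns a new record and never fails on bad input, the same code path can serve both a real-time feed (one record at a time) and a batch job (mapped over many records).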

Address Cleansing Service

Maintaining clean, valid and deliverable customer or vendor contact information reduces cost and improves productivity. The Quality Center address verification service cleans addresses using pluggable country-specific providers, such as USPS in the USA. By default, the Google geo service is used. Addresses can be cleaned in real-time feeds or in batch mode.

Data De-Duplication With Fuzzy Matching

Identify duplicate data using the de-dup feature of Quality Center. Plain (exact) matching or fuzzy matching can be used, with configurable thresholds, to identify matching elements.
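To show how a fuzzy match with a threshold works, here is a minimal sketch that uses `difflib.SequenceMatcher` from the Python standard library. The matching algorithm Quality Center actually uses is not described here, so this technique, the 0.85 threshold and the function names are assumptions for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())

def similarity(a, b):
    """Return a score between 0.0 (different) and 1.0 (identical)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def find_duplicates(names, threshold=0.85):
    """Return (index, index, score) for every pair at or above the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(names), 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((i, j, round(score, 2)))
    return pairs

names = ["John Smith", "Jon Smith", "Mary Jones", "john  smith"]
for i, j, score in find_duplicates(names):
    print(names[i], "<->", names[j], score)
```

Setting the threshold to 1.0 reduces this to plain matching on the normalized values. Comparing every pair is quadratic in the number of records, so a large data set would first group candidates (for example by zip code) before scoring them.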

Email, Phone Cleansing Service

Quality Center provides an email and phone validation service, which can correct format errors, standardize phone numbers, and verify emails and phone numbers.
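The format-level part of this cleansing can be sketched with regular expressions. This is a simplified illustration, not the Quality Center service: the function names are hypothetical, the email pattern is deliberately loose, lowercasing the whole address is an assumption, and phone numbers are assumed to be US numbers with country code 1. It checks format only and does not verify that an address or number is reachable.

```python
import re

# Loose shape check: something@something.something, with no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def clean_email(raw):
    """Trim and lowercase an email; return None if the format is invalid."""
    email = raw.strip().lower()
    return email if EMAIL_RE.match(email) else None

def clean_phone(raw, default_country_code="1"):
    """Return the number as +<country code><10 digits>, or None if unusable."""
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, parentheses
    if len(digits) == 10:
        digits = default_country_code + digits
    if len(digits) == 11 and digits.startswith(default_country_code):
        return "+" + digits
    return None

print(clean_email("  Jane.Doe@Example.COM "))
print(clean_phone("(415) 555-0132"))
print(clean_phone("555-0132"))
```

Verification beyond format, such as checking that a mailbox exists or a number is in service, needs an external provider and is outside what this sketch shows.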

Single Clean & Validated Customer View

Most companies have customer data spread across various systems. Keeping customer contact and master data clean and consistent is necessary for the organization to get a single, clean, validated customer view.

The fuzzy matching feature helps find duplicate customers, including customers who create fake or duplicate identities.

Quality Center enables organizations to maintain clean customer data.

InsightLake's Customer 360 Solution further enables companies to expand the single customer view to see all of a customer's interactions, attributes, trends and customer journey.