Understanding customer behavior is vital to the success of any digital marketing initiative. Few sources serve that goal better than a Unified Customer View, which provides a 360-degree picture of the customer covering demographics, buying behaviour, activity history, product preferences and more.
The problem many organizations face is that customer data sits in multiple systems, making it hard to derive meaningful insights from that valuable information. Mergers and acquisitions, incremental platform development and platform changes all increase data fragmentation, and depending on the size of the organization, different departments may run their own tools and platforms, scattering the data even further.
ICF NEXT recently implemented a unified customer view data lake solution, with Adobe AEM as the experience platform, for one of the leading health insurance companies in India. The client's key data sources are listed below:
- CRM & offline data sources: agent, member, proposer and policy data; demographic and health-related data.
- Transactional data: policy purchases, policy changes, addition of members and agents, etc.
- Metadata: product-related and branch-related data.
- User behavior data: derived from Adobe Analytics, which is implemented across all of the client's digital platforms.
- Chatbot: user profile and chat data gathered by the chatbot.
By merging information about prospects and customers with behavioral data into a single record, digital marketers can make informed decisions about the most appropriate marketing strategies, such as:
Personalization: With a powerful experience platform like Adobe Experience Manager, behavioral data from Adobe Analytics can be combined with offline data to deliver user-specific personalization journeys. Users are classified by life-cycle stage on the digital properties, and the policies, products and offers relevant to each category are displayed.
Campaign Strategy: By unifying offline data with online data gathered by Adobe Analytics, where success metrics are tracked at both the visit and the individual-user level, we can reach the right individual with personalized campaigns delivered via Adobe Campaign. This greatly improves campaign effectiveness, replacing the earlier hit-and-miss approach.
Analytics Strategy: Creating datasets for machine learning models becomes far less complex, since the customer/user attributes that serve as the feature set are easy to extract.
Data Enrichment: Additional user attributes can easily be added from data gathered by Adobe Analytics, the CRM system, business analysis tools and affiliate channels, which helps in launching new campaigns.
We designed and built the solution from the ground up, which involved:
- Identifying the candidate data platforms
- ETL tool and data ingress strategy
- Data democratization strategy
- Handling delta data
Below are the candidate data platforms identified after an extensive evaluation exercise, including multiple POCs, to compare the benefits offered by the various platforms.
SQL Server or Oracle relational database
The client already had SQL Server and Oracle databases that we could leverage as the data platform. This approach had the following advantages:
- Familiar, tried and tested platform
- License cost savings
- Ease of data movement
Over time, as the volume of data grows exponentially (including the additional website and app data sources, in the form of transactions), it becomes important to have a solution that can scale horizontally. Relational databases scale primarily vertically, and maintaining the growing dataset would become an extremely cumbersome task at some point in the future.
Apache Cassandra
Apache Cassandra is a distributed, masterless, peer-to-peer database that scales linearly for rapidly growing datasets such as the one in this scenario. Data is distributed across individual Cassandra instances called nodes.
Data modeling plays a key role in Cassandra. Since data is distributed, query patterns have to be identified before data modeling, as any new query pattern typically results in an additional table holding redundant, denormalized data.
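To make that trade-off concrete, here is a small sketch using the DataStax Java driver (the keyspace, table and column names are hypothetical) showing how largely the same policy data ends up duplicated as soon as a second query pattern has to be served:

```java
import java.net.InetSocketAddress;

import com.datastax.oss.driver.api.core.CqlSession;

public class QueryFirstModelSketch {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("127.0.0.1", 9042))
                .withLocalDatacenter("datacenter1")
                .withKeyspace("ucv")
                .build()) {

            // Query 1: "all policies for a member" -> table partitioned by member_id.
            session.execute("CREATE TABLE IF NOT EXISTS policies_by_member ("
                    + "member_id text, policy_id text, product text, premium decimal, "
                    + "PRIMARY KEY (member_id, policy_id))");

            // Query 2: "all policies sold by an agent" -> a second table carrying
            // largely the same data, now partitioned by agent_id.
            session.execute("CREATE TABLE IF NOT EXISTS policies_by_agent ("
                    + "agent_id text, policy_id text, member_id text, product text, "
                    + "PRIMARY KEY (agent_id, policy_id))");
        }
    }
}
```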
From a requirements perspective, the immediate need was a data platform that supports a flexible schema.
MongoDB
MongoDB is a document-oriented database that provides high performance along with high availability and automatic scaling. A document is composed of field-and-value pairs, where a value may itself be another document, an array, or an array of documents.
MongoDB's support for an embedded-document data model reduces I/O activity on the database system. Indexes support faster queries and can include keys from embedded documents and arrays; when the indexes fit in memory they are served from RAM, improving query times and application response.
MongoDB uses sharding to provide horizontal scalability, replica sets can be set up for high availability, and the JSON-like document model addresses the flexible-schema requirement. Indexes can be created as needed to improve query performance.
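As an illustration of this flexible, embedded-document model, here is a minimal sketch using the MongoDB Java driver; the collection, field names and sample values are hypothetical:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

import java.util.List;

public class UnifiedCustomerSketch {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> customers =
                    client.getDatabase("ucv").getCollection("customers");

            // One customer record with CRM, policy and web-behavior data embedded,
            // so a single read returns the full 360-degree view.
            Document customer = new Document("memberId", "M-1001")
                    .append("profile", new Document("name", "Asha Rao").append("city", "Pune"))
                    .append("policies", List.of(
                            new Document("policyId", "P-77").append("product", "Health Plus")))
                    .append("webBehaviour", new Document("lastVisit", "2020-05-10")
                            .append("pagesViewed", 14));
            customers.insertOne(customer);

            // Index the lookup key so channel queries stay fast.
            customers.createIndex(Indexes.ascending("memberId"));
        }
    }
}
```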
Self-managed MongoDB has multiple moving parts, and managing them under production load can become cumbersome.
MongoDB Atlas is a fully managed cloud platform that takes care of infrastructure maintenance, scaling and monitoring. This allowed us to concentrate on implementation details and development.
MongoDB Atlas also offers Triggers, powered by the Stitch framework, which can act as a powerful ELT (Extract, Load and Transform) tool: data being ingested into MongoDB can be transformed on the fly and streamed into other MongoDB collections or external systems.
ETL Tool
Pentaho DI Community Edition is used as the ETL layer to extract data from the source databases and load it into MongoDB. It provides a GUI-based interface for developing ETL processes, which aids rapid development and execution.
Putting it all together
Data Ingress Strategy
A bulk upload ETL pulls data from the source SQL Server tables for a specific window of record created date-time. The ETL automatically walks backward in time, based on created date-time, and pulls the complete record sets. The number of days per run is configurable so that the load on the source database stays optimized and data ingress happens in a controlled manner.
A delta ETL job runs every 20 minutes (configurable) to pull records from the source tables based on the modified date-time stamp. The number of records per query is also configurable, keeping the load on both the ETL server and the source database controlled and optimized.
A delta error-handler ETL runs every night to upsert the last two days' (configurable) records for each source table, ensuring that any missed records are picked up.
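The production jobs are built as Pentaho DI transformations, but the underlying pattern is simple enough to sketch in plain Java; the connection strings, table and column names below are hypothetical, and the bulk backfill follows the same windowed approach keyed on created date-time:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.ReplaceOptions;
import org.bson.Document;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class DeltaLoadSketch {
    public static void main(String[] args) throws Exception {
        Instant since = Instant.now().minus(20, ChronoUnit.MINUTES); // configurable window
        int batchSize = 500;                                         // configurable records per query

        try (Connection sql = DriverManager.getConnection(
                     "jdbc:sqlserver://crm-host;databaseName=crm", "etl_user", "secret");
             MongoClient mongo = MongoClients.create("mongodb://localhost:27017")) {

            MongoCollection<Document> customers =
                    mongo.getDatabase("ucv").getCollection("customers");

            // Pull only rows modified since the last run, capped per query.
            try (PreparedStatement ps = sql.prepareStatement(
                    "SELECT TOP (?) member_id, name, modified_at FROM dbo.members WHERE modified_at >= ?")) {
                ps.setInt(1, batchSize);
                ps.setTimestamp(2, Timestamp.from(since));

                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        Document doc = new Document("memberId", rs.getString("member_id"))
                                .append("name", rs.getString("name"))
                                .append("modifiedAt", rs.getTimestamp("modified_at"));
                        // Upsert so re-processed rows simply overwrite the stale copy.
                        customers.replaceOne(Filters.eq("memberId", doc.getString("memberId")),
                                doc, new ReplaceOptions().upsert(true));
                    }
                }
            }
        }
    }
}
```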
Data Democratization Strategy
Multiple REST APIs were created to serve data to various channels such as the mobile apps, Adobe Campaign, the chatbot and the web portals.
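A minimal sketch of one such read API, written as a Spring Boot controller backed directly by the MongoDB Java driver; the endpoint path, database and field names are assumptions:

```java
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
@RestController
public class CustomerApiSketch {

    private final MongoCollection<Document> customers =
            MongoClients.create("mongodb://localhost:27017")
                    .getDatabase("ucv").getCollection("customers");

    // GET /customers/M-1001 -> the unified record for one member, as JSON.
    @GetMapping(value = "/customers/{memberId}", produces = "application/json")
    public String getCustomer(@PathVariable String memberId) {
        Document doc = customers.find(Filters.eq("memberId", memberId)).first();
        return doc == null ? "{}" : doc.toJson();
    }

    public static void main(String[] args) {
        SpringApplication.run(CustomerApiSketch.class, args);
    }
}
```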
Using MongoDB Stitch Triggers, an ELT (Extract, Load and Transform) process transforms the ingested data into multiple structured, query-optimized collections. This keeps response times for channel queries low, improving application performance and the overall user experience.
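The trigger itself runs as a JavaScript function inside Atlas; purely as an illustration of the kind of reshaping it performs, here is a roughly equivalent aggregation executed through the Java driver, writing a flattened view into a purpose-built collection (collection and field names are hypothetical):

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

import java.util.Arrays;

public class ElTransformSketch {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("ucv");

            // Flatten the raw customer documents into a query-optimised collection,
            // one document per (member, policy) pair, so channel reads hit a small shape.
            db.getCollection("customers").aggregate(Arrays.asList(
                    new Document("$unwind", "$policies"),
                    new Document("$project", new Document("_id",
                            new Document("$concat",
                                    Arrays.asList("$memberId", ":", "$policies.policyId")))
                            .append("memberId", 1)
                            .append("policyId", "$policies.policyId")
                            .append("product", "$policies.product")),
                    // Idempotent write: re-running the transform replaces existing rows.
                    new Document("$merge", new Document("into", "policies_by_member")
                            .append("whenMatched", "replace")
                            .append("whenNotMatched", "insert"))
            )).toCollection();
        }
    }
}
```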
Using MongoDB Charts, graphical reports can be generated via the aggregation framework to visually represent the data.
Security
The nature of the client's business involves a significant amount of sensitive personal information for thousands of users, so ensuring data security is extremely important. Data flows from the source systems to MongoDB through VPC tunnels, keeping the pipelines secure.
OAuth with JWE is implemented in custom Spring Boot authorization servers. RSA encryption, using keys held in a JKS keystore, is used to generate authorization tokens and ensure secure API access.
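As a rough illustration of that token flow (the library choice, keystore path, alias and claims below are assumptions rather than the client's actual configuration), an authorization server could issue an RSA-encrypted JWE token from keys held in a JKS keystore like this:

```java
import com.nimbusds.jose.EncryptionMethod;
import com.nimbusds.jose.JWEAlgorithm;
import com.nimbusds.jose.JWEHeader;
import com.nimbusds.jose.crypto.RSAEncrypter;
import com.nimbusds.jwt.EncryptedJWT;
import com.nimbusds.jwt.JWTClaimsSet;

import java.io.FileInputStream;
import java.security.KeyStore;
import java.security.interfaces.RSAPublicKey;
import java.util.Date;

public class TokenIssuerSketch {
    public static void main(String[] args) throws Exception {
        // Load the RSA key pair from a JKS keystore (path, password and alias are hypothetical).
        KeyStore ks = KeyStore.getInstance("JKS");
        try (FileInputStream in = new FileInputStream("auth-keys.jks")) {
            ks.load(in, "changeit".toCharArray());
        }
        RSAPublicKey publicKey =
                (RSAPublicKey) ks.getCertificate("auth-token-key").getPublicKey();

        // Short-lived claims for the calling channel.
        JWTClaimsSet claims = new JWTClaimsSet.Builder()
                .subject("mobile-app")
                .issuer("ucv-auth-server")
                .expirationTime(new Date(System.currentTimeMillis() + 15 * 60 * 1000))
                .build();

        // Encrypt the token (JWE) with the RSA public key; only the API layer,
        // which holds the matching private key, can decrypt and honour it.
        EncryptedJWT jwt = new EncryptedJWT(
                new JWEHeader(JWEAlgorithm.RSA_OAEP_256, EncryptionMethod.A256GCM), claims);
        jwt.encrypt(new RSAEncrypter(publicKey));
        System.out.println(jwt.serialize());
    }
}
```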
Additional measures, such as encrypting the username and password in transit and automatically locking accounts after repeated failed login attempts, have also been implemented to guard against man-in-the-middle and brute-force attacks.
The Omni Channel Experience
Since anonymous user data, web and mobile behavior data, profile data and customer-journey data all come from the same data source, we are able to achieve individual-level personalization by integrating Adobe Experience Manager with the REST API layer. OSGi services were developed to consume the data through these APIs and drive personalization.
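A simplified sketch of such an OSGi service is shown below; the component interface, endpoint URL and HTTP client choice are assumptions:

```java
import org.osgi.service.component.annotations.Component;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Registered as an OSGi Declarative Services component so AEM components and
// Sling models can reference it when rendering personalized content.
@Component(service = CustomerProfileService.class)
public class CustomerProfileService {

    private final HttpClient http = HttpClient.newHttpClient();

    /** Returns the unified customer JSON used to drive personalization. */
    public String fetchProfile(String memberId) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://ucv-api.example.internal/customers/" + memberId))
                .header("Accept", "application/json")
                .GET()
                .build();
        return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```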
Beyond the website, the customer-facing and agent-facing mobile apps also consume and enrich data from the same data source.
Through the cross-channel personalization engine, we can show a personalized product or policy on the website to a user who has shown interest in, and generated a quote for, that product on the mobile app or chatbot.
Using the seamless integration with Adobe Analytics and Adobe Campaign through Adobe I/O, dynamic trigger points such as drop-offs from forms and the payment journey are used to send near-real-time campaigns.
Benefits realized so far
Within a month of implementation, the unified customer view had already started to show positive trends in the customer digital experience. With the Adobe Experience Manager powered experience layer and data-driven personalization, user engagement on the website improved significantly, with a 20% decrease in bounce rate.
The insights and availability of the unified data have opened up opportunities to run data-driven, personalized campaigns.