Models on Big Data

Companies store large amounts of data in Hadoop, where it is processed in both real-time and batch modes. To gain insights, create scores, predict values, detect patterns and perform scenario analysis, data scientists build complex models and apply machine learning.

Model solution

Insight Lake Model solution provides features to create and manage models. Features include:

  • Support for many ML algorithms such as SVD, Bayes, Random Forest and k-means
  • Machine learning model creation using drag and drop UI
  • Real time anomaly detection models
  • Monte Carlo simulation for scenario analysis
  • Import and export of machine learning models using PMML
  • Integration of Spark MLlib, Python and R based models
  • Model management and monitoring
  • Event pattern creation and detection
  • Scoring models
  • Forecasting models
  • Deep learning models & integration with third party tools

ML Models

The Model solution provides a range of clustering and classification models. Feature engineering is done through an intuitive UI in which data can be ingested and cleaned, and features selected.

Common model properties, such as the test population size and whether to save the trained model, can be set on each model.
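As a rough illustration of what such properties control, the sketch below holds out a configurable share of rows for testing and serializes the property set. The `ModelProperties` class, its field names and the `split_rows` helper are hypothetical and are not part of the Insight Lake product.

```python
import json
import random
from dataclasses import dataclass, asdict


@dataclass
class ModelProperties:
    """Hypothetical property set; field names are illustrative only."""
    algorithm: str = "kmeans"
    test_fraction: float = 0.2       # share of rows held out for testing
    save_trained_model: bool = True
    seed: int = 7


def split_rows(rows, props):
    """Shuffle rows reproducibly and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(props.seed).shuffle(shuffled)
    n_test = round(len(shuffled) * props.test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


props = ModelProperties(test_fraction=0.25)
train_rows, test_rows = split_rows(range(100), props)
config_json = json.dumps(asdict(props))  # properties stored as plain configuration
```

Fixing the seed keeps the split repeatable, so two runs with the same properties are evaluated on the same held-out rows.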

Model Lifecycle

The Model solution provides an interactive UI to build and version models. When data scientists are satisfied with a model, they can promote it to upper environments and include it in automated pipelines.

Models are stored as template configurations and can also be exported as PMML.

Models can also be imported in PMML format.


Scoring Models

A scoring model allows complex formulas with configurable weighted parameters. Scenario analysis helps the weights to be modeled appropriately.
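A weighted score can be sketched as a normalized weighted sum, with scenario analysis amounting to re-running the same formula under a different set of weights. The parameter names, values and weights below are invented for illustration; the actual formulas supported by the product may be more complex.

```python
# Hypothetical weighted scoring formula; names and weights are illustrative
# assumptions, not Insight Lake configuration.
def weighted_score(values, weights):
    """Return the weight-normalized sum of the given parameter values."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(values[name] * w for name, w in weights.items()) / total_weight


customer = {"recency": 0.8, "frequency": 0.5, "monetary": 0.2}

# Two weight scenarios applied to the same input parameters.
baseline = {"recency": 1.0, "frequency": 1.0, "monetary": 2.0}
scenario = {"recency": 2.0, "frequency": 1.0, "monetary": 1.0}

baseline_score = weighted_score(customer, baseline)  # about 0.425
scenario_score = weighted_score(customer, scenario)  # about 0.575
```

Comparing the two scores shows how sensitive the result is to the weighting, which is the kind of question scenario analysis is meant to answer.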

Anomaly Detection

Anomalies, or outliers, are often among the most important indicators of a significant change in pattern.

The Model solution supports anomaly detection that accounts for seasonality, so that expected seasonal swings are not flagged and only real outliers are detected.
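One simple way to make outlier detection season-aware is to compare each point with the typical value for its position in the cycle rather than with a global average. The sketch below does this with a per-position median and a robust (median absolute deviation) scale. It is an assumed technique chosen for illustration, not necessarily the method the product uses, and the function name and threshold are hypothetical.

```python
from statistics import median


def seasonal_anomalies(series, period, threshold=3.5):
    """Return indices whose deseasonalized residual is an outlier."""
    # Seasonal baseline: median of all points sharing a position in the cycle.
    baseline = [median(series[pos::period]) for pos in range(period)]
    residuals = [x - baseline[i % period] for i, x in enumerate(series)]
    # Robust spread of the residuals (median absolute deviation).
    centre = median(residuals)
    mad = median(abs(r - centre) for r in residuals)
    scale = 1.4826 * mad or 1e-9  # avoid division by zero on flat data
    return [i for i, r in enumerate(residuals)
            if abs(r - centre) / scale > threshold]


# Synthetic series with a repeating 4-step cycle and small deterministic noise.
pattern = [10, 20, 50, 20]
series = [pattern[i % 4] + (i % 3 - 1) for i in range(24)]
series[9] = 50  # a value that is normal at the peak but not at this position

anomalies = seasonal_anomalies(series, period=4)
print(anomalies)  # [9]
```

A global threshold would miss this point, because 50 is within the normal range of the seasonal peak. It stands out only once the expected value for its position in the cycle is taken into account.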

Monte Carlo Simulation

The Model solution supports scenario testing using Monte Carlo simulation. Data scientists can build scenario profiles, which include parameter distribution patterns, and perform variability testing.

Scenario analysis produces the distribution of the target output parameters, given the input variables and their distribution patterns.
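A minimal sketch of the idea: draw each input from its assumed distribution many times, compute the output for each draw, and summarize the resulting output distribution. The profit formula, the input distributions and their parameters are all invented for this example.

```python
import random
from statistics import mean, quantiles


def simulate_profit(rng):
    """One draw of a hypothetical scenario: profit = units * (price - cost)."""
    units = rng.gauss(1000, 100)          # demand: normal distribution
    price = rng.uniform(9.0, 11.0)        # unit price: uniform distribution
    cost = rng.triangular(5.0, 8.0, 6.0)  # unit cost: triangular (low, high, mode)
    return units * (price - cost)


def run_scenario(n_trials=10_000, seed=42):
    """Run the simulation and summarize the output distribution."""
    rng = random.Random(seed)
    outcomes = [simulate_profit(rng) for _ in range(n_trials)]
    cuts = quantiles(outcomes, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return {
        "mean": mean(outcomes),
        "p5": cuts[0],
        "p50": cuts[9],
        "p95": cuts[18],
    }


summary = run_scenario()
```

The spread between `p5` and `p95` gives a sense of the output's variability under the chosen input distributions. Changing a distribution in `simulate_profit` and re-running corresponds to testing a different scenario profile.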

Pattern Matching

The Model solution supports complex pattern matching and alerting in real time.

Operations teams can build pattern models and apply them in real time, or explore them against historical data.
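As an illustration, one common event pattern is "N events of a given type for the same key within a time window". The sketch below implements it as a generator, so the same code can consume a live stream or replay a historical list. The event layout `(timestamp, key, type)` and the function name are assumptions made for this example.

```python
from collections import defaultdict, deque


def detect_bursts(events, event_type, count, window_seconds):
    """Yield (key, timestamp) whenever `count` matching events for one key
    fall within `window_seconds` of each other.

    `events` is an iterable of (timestamp, key, type) tuples in time order.
    """
    recent = defaultdict(deque)
    for timestamp, key, kind in events:
        if kind != event_type:
            continue
        window = recent[key]
        window.append(timestamp)
        # Drop events that have fallen out of the time window.
        while window and timestamp - window[0] > window_seconds:
            window.popleft()
        if len(window) >= count:
            yield key, timestamp
            window.clear()  # start fresh so one burst raises one alert


history = [
    (0, "alice", "login_failed"),
    (10, "bob", "login_failed"),
    (20, "alice", "login_failed"),
    (30, "alice", "login_ok"),
    (45, "alice", "login_failed"),
    (200, "bob", "login_failed"),
    (210, "bob", "login_failed"),
]

alerts = list(detect_bursts(history, "login_failed", count=3, window_seconds=60))
print(alerts)  # [('alice', 45)]
```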


Forecasting Models

The Model solution supports the creation of forecasting models using machine learning or dedicated forecasting algorithms. These algorithms can apply seasonality adjustments with smoothing. In addition to classification ML algorithms, time-series algorithms such as Holt-Winters and autoregression are used.
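To make the seasonality-with-smoothing idea concrete, here is a compact sketch of additive Holt-Winters (triple exponential smoothing). The initialization scheme and the smoothing parameters are assumptions chosen for simplicity. A production implementation would normally fit `alpha`, `beta` and `gamma` to the data.

```python
def holt_winters_additive(series, period, alpha, beta, gamma, horizon):
    """Forecast `horizon` steps ahead with additive Holt-Winters smoothing."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasons of data")

    # Initial trend from the change in mean between the first two seasons.
    first = sum(series[:period]) / period
    second = sum(series[period:2 * period]) / period
    trend = (second - first) / period
    mid = (period - 1) / 2
    level = first + trend * mid  # estimated level at the end of season one
    # Initial seasonal offsets: first-season values minus the trend line.
    seasonal = [series[i] - (first + trend * (i - mid)) for i in range(period)]

    for t in range(period, len(series)):
        value = series[t]
        s = seasonal[t % period]
        prev_level = level
        level = alpha * (value - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (value - level) + (1 - gamma) * s

    n = len(series)
    return [level + (h + 1) * trend + seasonal[(n + h) % period]
            for h in range(horizon)]


# Synthetic data: linear trend plus a repeating 4-step seasonal offset.
offsets = [5, -3, -2, 0]
history = [10 + 2 * t + offsets[t % 4] for t in range(16)]
forecast = holt_winters_additive(history, period=4, alpha=0.5, beta=0.3,
                                 gamma=0.2, horizon=4)
```

On this noise-free series the forecast continues both the trend and the seasonal shape. Real data would add noise, and the smoothing parameters then control how quickly the level, trend and seasonal terms adapt.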

Deep Learning

The Model solution implements deep learning models, mostly for classification. The future road map is to integrate with third-party solutions such as TensorFlow.