Thursday, May 24, 2012

Current trends affecting predictive analytics

I was going through an article by Johan Blomme on predictive analytics and found it really interesting. Here is a little summary:

Traditionally, BI systems provided a retrospective view of the business by querying data warehouses containing historical data. In contrast, contemporary BI systems analyze real-time event streams in memory. In today's rapidly changing business environment, organizational agility depends not only on operational monitoring of how the business is performing but also on the prediction of future outcomes, which is critical for a sustainable competitive position.
Predictive analytics delivers actionable intelligence that can be integrated into operational processes.

Current trends affecting predictive analytics:

·         Standards for Data Mining and Model Deployment
·         Predictive Analytics in the Cloud
·         Structured and Unstructured Data Types
·         Advanced Database Technology (MPP, column-based, in-memory, etc.)

Standards for data mining and model deployment: CRISP-DM
o    A systematic approach to guide the data mining process has been developed by a consortium of vendors and users of data mining tools, known as the Cross-Industry Standard Process for Data Mining (CRISP-DM).
o    In the CRISP-DM model, data mining is described as an iterative process that is depicted in several phases (business understanding, data understanding, data preparation, modeling, evaluation, and deployment) and their respective tasks. Leading vendors of analytical software offer workbenches that make the CRISP-DM process explicit.

Standards for data mining and model deployment: PMML
o    To deliver a measurable ROI, predictive analytics requires a focus on decision optimization to achieve business objectives. A key element to make predictive analytics pervasive is the integration with commercial lines operations. Without disrupting these operations, business users should be able to take advantage of the guidance of predictive models.
o    For example, in operational environments with frequent customer interactions, high-speed scoring of real-time data is needed to refine recommendations in agent-customer interactions that address specific goals, e.g., improving retention offers. A model deployed for these goals acts as a decision engine by routing the results of predictive analytics to users in the form of recommendations or action messages.
o    A major development for the integration of predictive models in business applications is the PMML standard (Predictive Model Markup Language), which separates the results of data mining from the tools that are used for knowledge discovery.
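To make that separation concrete, here is a minimal Python sketch of tool-independent scoring, assuming the open source pypmml package; the model file name and input fields below are illustrative.

# A model exported as PMML by any compliant workbench (SAS, KNIME, R, ...)
# can be loaded and scored without the tool that built it.
from pypmml import Model

model = Model.load("churn_model.pmml")  # hypothetical exported model

# pypmml accepts a plain dict of input field values for a single record
record = {"tenure_months": 14, "monthly_spend": 42.5, "support_calls": 3}
print(model.predict(record))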

Structured and unstructured data types:
o    The field of advanced analytics is moving towards providing a number of solutions for the handling of big data. A defining characteristic of the new marketing data is its text-formatted content in unstructured data sources, which covers "the consumer's sphere of influence": analytics must be able to capture and analyze consumer-initiated communication.
o    By analyzing growing streams of social media content and sifting through sentiment and behavioral data that emanates from online communities, it is possible to acquire powerful insights into consumer attitudes and behavior. Social media content gives an instant view of what is taking place in the ecosystem of the organization. Enterprises can leverage insights from social media content to adapt marketing, sales and product strategies in an agile way.
o    The convergence between social media feeds and analytics also goes beyond the aggregate level. Social network analytics enhances the value of predictive modeling tools, and business processes benefit from the new inputs that are deployed. For example, the accuracy and effectiveness of predictive churn analytics can be increased by adding social network information that identifies influential users and the effects of their actions on other group members.
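As a hedged sketch of that last idea, the snippet below derives a simple influence feature from a graph of customer interactions, assuming the networkx package; the edges and threshold are illustrative, and in practice the scores would feed a churn model as additional inputs.

# Derive an "influence" feature from a social graph to enrich churn models.
import networkx as nx

# A toy graph of customer interactions (who communicates with whom)
G = nx.Graph()
G.add_edges_from([("alice", "bob"), ("alice", "carol"), ("bob", "dave")])

# PageRank as a simple proxy for each member's influence in the community
influence = nx.pagerank(G)

# Flag influential users whose churn could cascade to their contacts
influencers = {user for user, score in influence.items() if score > 0.25}
print(influence, influencers)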

Advances in database technology: big data and predictive analytics
o    As companies gather larger volumes of data, the need to execute predictive models against that data becomes more pressing.
o    A known practice is to build and test predictive models in a development environment that consists of operational data and warehousing data. In many cases analysts work with a subset of data through sampling. Once developed, a model is copied to a runtime environment where it can be deployed with PMML. A user of an operational application can invoke a stored predictive model by including user-defined functions in SQL statements (a sketch of this pattern follows below). This causes the RDBMS to mine the data itself without transferring the data into a separate file. The criteria expressed in a predictive model can be used to score, segment, rank or classify records.
o    An emerging practice to work with all data and directly deploy predictive models is in-database analytics. For example, Zementis (www.zementis.com) and Greenplum (www.greenplum.com) have joined forces to score huge amounts of data in parallel. The Universal PMML Plug-in developed by Zementis is an in-database scoring engine that fully supports the PMML standard to execute predictive models from commercial and open source data mining tools within the database.
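Here is a minimal Python sketch of the in-database invocation pattern, assuming a PostgreSQL-style database reachable via psycopg2; the SCORE_CHURN_MODEL user-defined function is hypothetical and stands in for whatever scoring UDF a vendor exposes.

# Invoke an in-database scoring UDF so the RDBMS mines its own data
# without exporting it to a separate file.
import psycopg2

conn = psycopg2.connect("dbname=crm user=analyst")  # illustrative connection
cur = conn.cursor()

# The stored model is applied row by row inside the database engine
cur.execute("""
    SELECT customer_id,
           SCORE_CHURN_MODEL(tenure_months, monthly_spend) AS churn_score
    FROM customers
    WHERE region = %s
    ORDER BY churn_score DESC
""", ("EMEA",))

for customer_id, churn_score in cur.fetchall():
    print(customer_id, churn_score)

cur.close()
conn.close()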

Predictive analytics in the cloud
o    While vendors implement predictive analytics capabilities into their databases, a similar development is taking place in the cloud. This has an impact on how the cloud can assist businesses to manage business processes more efficiently and effectively. Of particular importance is how cloud computing and SaaS provide an infrastructure for the rapid development of predictive models in combination with open standards. The PMML standard has already received considerable adoption, and combined with a service-oriented architecture for the design of loosely coupled systems, the cloud computing/SaaS model offers a cost-effective way to implement predictive models.
o    As an illustration of how predictive models can be hosted in the cloud, we refer to the ADAPA scoring engine (Adaptive Decision and Predictive Analytics, www.zementis.com). ADAPA is an on-demand predictive analytics solution that combines open standards and deployment capabilities. The data infrastructure to launch ADAPA in the cloud is provided by Amazon Web Services (www.amazonwebservices.com). Models developed with PMML-compliant software tools (e.g., SAS, KNIME, R, ...) can easily be uploaded into the ADAPA environment (a hedged sketch of this call pattern follows this list).
o    The on-demand paradigm allows businesses to use sophisticated software applications over the Internet, resulting in a faster time to production with a reduction of total cost of ownership.
o    Moving predictive analytics into the cloud also accelerates the trend towards self-service BI. The so-called democratization of data implies that data access and analytics should be available across the enterprise. Increasing data volumes, along with the growing need for insights from data, reinforce the trend towards self-guided analysis. The focus on the latter also stems from the often long development backlogs that users experience in the enterprise context. In contrast, cloud computing and SaaS enable organizations to make use of solutions that are tailored to specific business problems and complement existing systems.
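As promised above, here is a minimal Python sketch of scoring against a cloud-hosted engine over HTTP, assuming the requests package; the endpoint URL, credentials, and payload shape are illustrative assumptions, not ADAPA's documented API.

# Score a record against a hypothetical cloud-hosted scoring endpoint.
import requests

ENDPOINT = "https://scoring.example.com/engines/churn_model/score"  # hypothetical

payload = {"record": {"tenure_months": 14, "monthly_spend": 42.5}}
resp = requests.post(ENDPOINT, json=payload, auth=("user", "secret"), timeout=10)
resp.raise_for_status()

print(resp.json())  # e.g. {"probability_churn": 0.71} -- shape is illustrative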

Tuesday, May 22, 2012

Next Generation MDM (From TDWI)

What is MDM (Master Data Management)?
Master data management (MDM) is the practice of defining and maintaining consistent definitions of business entities (e.g., customer or product) and data about them across multiple IT systems and possibly beyond the enterprise to partnering businesses. MDM gets its name from the master and/or reference data through which consensus-driven entity definitions are usually expressed. An MDM solution provides shared and governed access to the uniquely identified entities of master data assets, so those enterprise assets can be applied broadly and consistently across an organization.
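The core idea is easier to see in miniature. Below is a hedged Python sketch of consolidating one customer entity, seen with conflicting attributes by several systems, into a single master record; the source records and the most-recent-wins survivorship rule are illustrative assumptions.

# Reconcile one entity seen differently by several IT systems into a
# single governed "golden record".
from datetime import date

crm     = {"email": "j.doe@example.com", "phone": None,       "updated": date(2012, 3, 1)}
erp     = {"email": "jdoe@old.example",  "phone": "555-0101", "updated": date(2011, 7, 9)}
billing = {"email": "j.doe@example.com", "phone": "555-0199", "updated": date(2012, 5, 2)}

def golden_record(*sources):
    """For each attribute, keep the value from the most recently updated source."""
    master = {}
    for attr in ("email", "phone"):
        candidates = [s for s in sources if s.get(attr)]
        master[attr] = max(candidates, key=lambda s: s["updated"])[attr]
    return master

print(golden_record(crm, erp, billing))
# {'email': 'j.doe@example.com', 'phone': '555-0199'}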

Top 10 Priorities for Next Generation MDM
1.      Multi-data-domain MDM. Many organizations apply MDM to the customer data domain alone, and they need to move on to other domains, such as products, financials, and locations. Single-data-domain MDM is a barrier to correlating information across multiple domains.
2.      Multi-department, multi-application MDM. MDM for a single application (such as ERP, CRM, or BI) is a safe and effective start. But the point of MDM is to share data across multiple, diverse applications and the departments that depend on them. It's important to overcome organizational boundaries if MDM is to move from being a local fix to being an infrastructure for sharing data as an enterprise asset.
3.      Bidirectional MDM. "Roach motel" MDM is when you extract reference data and aggregate it in a master database from which it never emerges (as with many BI and CRM systems). Unidirectional MDM is fine for profiling reference data, but bidirectional MDM is required to improve or author reference data in a central place and then publish it out to various applications.
4.      Real-time MDM. The strongest trend in data management today (and BI/DW, too) is toward real-time operation as a complement to batch. Real time is critical to verification, identity resolution, and the immediate distribution of new or updated reference data.
5.      Consolidating multiple MDM solutions. How can you create a single view of the customer when you have multiple customer-domain MDM solutions? How can you correlate reference data across domains when the domains are treated in separate MDM solutions? For many organizations, next generation MDM begins with a consolidation of multiple, siloed MDM solutions.
6.      Coordination with other disciplines. To achieve next generation goals, many organizations need to stop practicing MDM in a vacuum. Instead of MDM as merely a technical fix, it should also align with business goals for data. MDM should also be coordinated with related data management disciplines, especially data integration (DI) and data quality (DQ). A program for data governance or stewardship can provide an effective collaborative process for such coordination.
7.      Richer modeling. Reference data in the customer domain works fine with flat modeling, involving a simple (but very wide) record. However, other domains make little sense without a richer, hierarchical model, as with a chart of accounts in finance or a bill of materials in manufacturing. Metrics and key performance indicators, so common in BI today, rarely have proper master data in multidimensional models.
8.      Beyond enterprise data. Despite the obsession with customer data that most MDM solutions suffer from, almost none of them today incorporate data about customers from Web sites or social media. If you're truly serious about MDM as an enabler for CRM, next generation MDM (and CRM, too) must reach into every customer channel. In a related area, users need to start planning their strategy for MDM with big data and advanced analytics.
9.      Workflow and process management. Too often, development and collaborative efforts in MDM are mostly ad hoc actions with little or no process. For an MDM program to scale and grow, it needs workflow functionality that automates the proposal, review, and approval process for newly created or improved reference data (a minimal sketch follows this list). Vendor tools and dedicated applications for MDM now support workflows within the scope of their tools. For a broader scope, some users integrate MDM with BPM tools.
10.     MDM solutions built atop vendor tools and platforms. Admittedly, many user organizations find that homegrown and hand-coded MDM solutions provide adequate business value and technical robustness. However, these solutions are usually in simple departmental silos. User organizations should look into vendor tools and platforms for MDM and other data management disciplines when they need broader data sharing and more advanced functionality, such as real-time operation, two-way synchronization, identity resolution, event processing, service orientation, and process workflows or other collaborative functions.
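As referenced in item 9, here is a minimal Python sketch of a proposal/review/approval workflow for reference data changes, modeled as a small state machine; the states and transitions are illustrative assumptions, not any particular vendor's workflow engine.

# A tiny state machine for governing changes to reference data.
ALLOWED = {
    "proposed":  {"in_review"},
    "in_review": {"approved", "rejected"},
    "approved":  set(),          # terminal: published to consuming applications
    "rejected":  {"proposed"},   # can be revised and resubmitted
}

class ReferenceDataChange:
    def __init__(self, description):
        self.description = description
        self.state = "proposed"

    def transition(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot move from {self.state} to {new_state}")
        self.state = new_state

change = ReferenceDataChange("add 'EMEA-2' sales region code")
change.transition("in_review")
change.transition("approved")
print(change.state)  # approved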