Data Mining Algorithms

Don't use data mining as a black box. Get a deep understanding of how the data mining algorithms work. This knowledge is not only theoretical; it helps you develop better models in production.

Data mining is gaining popularity as one of the most advanced data analysis techniques. With modern data mining engines, products, and packages, such as SQL Server Analysis Services (SSAS), Excel, and R, data mining has become a black box: it is possible to use it without knowing how it works. However, not knowing how the algorithms work can lead to problems such as choosing the wrong algorithm for a task or misinterpreting the results.

Big Data is creating significant new opportunities for organizations to derive new value and create competitive advantage from their most valuable asset: information. For businesses, Big Data helps drive efficiency, quality, and personalized products and services, producing improved levels of customer satisfaction and profit. For scientific efforts, Big Data analytics enable new avenues of investigation with potentially richer results and deeper insights than previously available. In many cases, Big Data analytics integrate structured and unstructured data with real-time feeds and queries, opening new paths to innovation and insight.

In data mining, a mathematical model is normally constructed for the purpose of prediction or description. A model can be thought of as a virtual box that accepts a set of inputs and uses them to generate an output.

Prediction modeling algorithms use selected input attributes and a single selected output attribute from your dataset to build a model. The model, once built, is used to predict an output value based on input attribute values. The dataset used to build the model is assumed to contain historical data from past events in which the values of both the input and output attributes are known. The data mining methodology uses those values to construct a model that best fits the data.
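The build-then-predict workflow can be illustrated with a minimal sketch. It assumes a single numeric input attribute and a numeric output attribute, and it uses ordinary least squares as the fitting method purely for illustration; the text does not prescribe a particular algorithm. The dataset, the attribute names, and the helper functions fit_line and predict are hypothetical.

```python
# Hypothetical historical data: (advertising_spend, units_sold) pairs in which
# both the input and the output attribute are known.
history = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]


def fit_line(rows):
    """Build a model y = slope * x + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Use the built model to predict the output for a new input value."""
    slope, intercept = model
    return slope * x + intercept


# Build the model from historical data, then predict an unseen case.
model = fit_line(history)
print(predict(model, 5.0))
```

Here the model is just a pair of numbers, the slope and the intercept, chosen so that the line fits the historical data as closely as possible. Other prediction algorithms store different structures, such as trees or rules, but follow the same pattern of fitting to known cases first and predicting new ones afterward.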