Monday, 30 April, 2018

What is Data Mining?

Lambros Photios


Data mining is the process of discovering valuable information within large databases. As a commercial term, data mining is relatively young, a product of the large size of modern databases and the granularity of the data they store. However, the concept isn’t new: data science methodologies were identified as early as the 1700s, and the commercialisation of the field emerged in the early 2000s. Data science depends on storing high volumes of data from which conclusions can be drawn, and it is this capture of data that is referred to as data mining. With this movement, organisations introduced systems to store more information, particularly (but not exclusively) across their digital engagement, with the expectation that it would eventually be utilised. An article published by The Economist in May 2017 argued that the most valuable resource in the world is no longer oil, but data.

This article is an introduction to data mining, focusing on what defines it. It is part of a larger series intended to cover:

  • The Data Mining Process
  • Software for Data Mining
  • Data Storage (Data Lakes, Data Warehouses)
  • Data Ethics and Privacy
  • Uses of Data Science (Machine Learning, Statistics, and Analysis)
  • Business Use Cases

Business outcomes

Data mining should be leveraged primarily to enable business outcomes. As with all digital solutions, the outcomes should be mapped first, and the digital capabilities needed to produce them identified second. This contrasts with the false (but very frequently adopted) approach of determining the capabilities of a technology first, and then using these to define an outcome. It’s important not to be at the mercy of technology, but to utilise technology (for which there is a plethora of solutions) to enable business outcomes.

Available data

Once the key outcomes of the business have been determined, the next step is to identify the data available to the business. Walmart, a supercenter chain for consumer shopping in the US, gives a sense of the volume of data available to companies. With over US $35 million in sales processed per hour, data from sales alone includes product purchases, order total, time of payment, and the store visited. This is just a small snapshot of the data available to a company operating on Walmart's scale, and it is utilised to drive business decisions. For most companies, data is either not captured, or underutilised (fortunately, it’s often the latter).

Data is generally a huge enabler across marketing and sales, so we will use marketing as the example department. Within marketing divisions, companies are now storing data such as website page views, visit duration, and other interactions that previously weren’t tracked by tools such as Google Analytics. By harnessing this information, businesses are able to determine how users interact with a website, where they generally leave it, and what the cause is. This is all highly statistical, and means decisions no longer need to be derived from assumptions. When this approach is harnessed within sales, marketing, product, delivery, customer service, and management, the results can be instrumental in achieving the outcomes of the business more effectively.
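To make the idea of finding where visitors leave a website more concrete, here is a minimal sketch in Python. It assumes a hypothetical page view log of (session ID, timestamp, page path) records; the field layout, the sample data, and the function names `exit_pages` and `exit_rates` are illustrative assumptions rather than the export format of any particular analytics tool.

```python
from collections import Counter

# Hypothetical page view log: (session_id, timestamp_seconds, page_path).
# The layout and values are assumptions made for illustration only.
page_views = [
    ("s1", 0, "/"),
    ("s1", 40, "/pricing"),
    ("s2", 5, "/"),
    ("s2", 65, "/blog"),
    ("s2", 200, "/pricing"),
    ("s3", 10, "/"),
]


def exit_pages(views):
    """Count how often each page was the last one viewed in a session."""
    last_view = {}  # session_id -> (timestamp, page) of the latest view seen
    for session_id, timestamp, page in views:
        if session_id not in last_view or timestamp >= last_view[session_id][0]:
            last_view[session_id] = (timestamp, page)
    return Counter(page for _, page in last_view.values())


def exit_rates(views):
    """For each page, the share of its views that ended a session."""
    views_per_page = Counter(page for _, _, page in views)
    exits = exit_pages(views)
    return {page: exits[page] / total for page, total in views_per_page.items()}


if __name__ == "__main__":
    for page, rate in sorted(exit_rates(page_views).items()):
        print(f"{page}: {rate:.0%} of views ended the session")
```

A page with a high exit rate is only a starting point: the log shows where visitors leave, while the cause still has to be investigated, for example by comparing visit duration or the pages viewed beforehand.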

Benefits realised by companies

Companies have already realised huge benefits by adopting a data mining mindset, combined with data science methodologies. TAL Life Limited is a prime example of this, with an advanced data-based classification engine providing a more competitive underwriting experience to clients. This engine adapts to market conditions, enabling an intelligent response to trends. Execution of this strategy resulted in a 30 percent increase in new business within the first year, and a 40 percent reduction in the per-unit cost of their business operations, resourcing, and underwriting segments. This has been instrumental in producing a more desirable bottom line for TAL, driven by a changed mindset towards their information. Similarly, Farmers Edge Laboratories, a Canadian agriculture company, is using satellite, weather, and agricultural data to provide crop yield analysis, irrigation optimisation, and diagnostics for agricultural equipment. This enables more streamlined outcomes based on data, as opposed to trial and error.

Enhancing outcomes with data science

Data science enables the large databases (or datasets) acquired through data mining to be leveraged. It is worth noting that:

  1. Without data, business strategies are largely based on assumptions; and,
  2. Data mining only involves capturing information, not utilising it.


Data is being used in a variety of emerging digital fields such as statistical analysis, exploratory data analysis, machine learning (artificial intelligence), and pattern recognition. Although these fields are not revolutionary in themselves, they are becoming increasingly prominent as industries in their own right. For example, Cambridge Analytica (a company known publicly due to their relationship with the recent Facebook scandal) worked with a leading womenswear brand using data-based strategies to produce an overall Return On Advertising Spend (ROAS) of 2.29.
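For readers unfamiliar with the metric, ROAS is conventionally calculated as the revenue attributed to advertising divided by the amount spent on that advertising, so a ROAS of 2.29 means roughly $2.29 of revenue for every $1 of ad spend. The short sketch below shows the arithmetic; the revenue and spend figures are hypothetical values chosen only to reproduce that ratio, and are not figures from the campaign mentioned above.

```python
def roas(ad_revenue, ad_spend):
    """Return On Advertising Spend: attributed revenue per unit of ad spend."""
    if ad_spend <= 0:
        raise ValueError("ad_spend must be positive")
    return ad_revenue / ad_spend


if __name__ == "__main__":
    # Hypothetical figures, picked only so that the ratio comes out at 2.29.
    print(round(roas(ad_revenue=22_900.0, ad_spend=10_000.0), 2))
```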

Within this article, we’ve outlined some tangible benefits of data mining, and the subsequent analysis that can be performed. This series will focus not just on the strategies for mining this information, but also on those required to create high-performance outcomes for businesses. We will also work through prevalent cultural issues such as privacy, ethics, and the recent Facebook scandal.

If you have any questions, please don’t hesitate to get in contact with us here.

About the Author.

Lambros Photios

I embarked on my entrepreneurial journey six years ago with one goal: To build a culture and technology focused company. Working with industry leaders, I’ve had the honour of delivering challenging projects with intricate specifications, and within tight deadlines.