Wednesday, 20 January 2016

A Statistician's View on Big Data and Data Science

Abstract

There is no question that big data has hit the business, government and scientific sectors. The demand for data science skills is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms big data and data science. This presentation gives a professional statistician’s view on these terms and illustrates the connection between data science and statistics.

Data science can be applied in a wide range of areas, including predictive health analytics, business analytics, social science research, data-based journalism, advertising and marketing, intelligent transport, smart cities, education, retail and the creative industries, and public policy.



Data science is the art of processing large volumes of structured and unstructured data and extracting richer insights from them that can be put into operation to create commercial and social value.

Solving real-world problems with voluminous data requires expertise in statistical analysis, the business domain, IT and programming. Data scientists in an organization are tasked with getting real value out of the raw data available to them. They manipulate large data sets, learn from and adapt their models, and implement machine learning algorithms. Just as important, they must shape a business question into targeted queries that can be answered using the available data.
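To make "shaping a business question into a query" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table, its columns and the sample rows are hypothetical, invented only for illustration; the business question is "Which product category brings in the most revenue?"

```python
import sqlite3

def top_category_by_revenue(conn: sqlite3.Connection) -> tuple[str, float]:
    """Answer 'Which product category brings in the most revenue?' with one query."""
    row = conn.execute(
        """
        SELECT category, SUM(quantity * unit_price) AS revenue
        FROM orders
        GROUP BY category
        ORDER BY revenue DESC
        LIMIT 1
        """
    ).fetchone()
    return row[0], row[1]

# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (category TEXT, quantity INTEGER, unit_price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("books", 3, 12.0), ("games", 1, 60.0), ("books", 2, 15.0), ("music", 4, 9.5)],
)
print(top_category_by_revenue(conn))
```

With this toy data the query returns `('books', 66.0)`: the vague question becomes an aggregation (`SUM`), a grouping (`GROUP BY`) and a ranking (`ORDER BY ... LIMIT 1`).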

By analyzing data and identifying patterns, relationships and regularities in it, data scientists provide insights into the dynamics of the data that can be used to strengthen business intelligence. Given the volume of data companies must handle each day, data science offers a way forward, but specialized tools are needed to get all of this done.

How Does Data Science Work?

Data science is an iterative macro process that essentially involves data acquisition, data cleansing and data analysis. After multiple iterations, a data scientist can embed the resulting model into a dashboard that the organization can use to shape its strategy.
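The acquisition, cleansing and analysis stages can be sketched as a small pipeline. This is a minimal illustration under stated assumptions: the records, the field names `customer` and `spend`, and the cleansing rules are all hypothetical, and the "model" is just summary figures of the kind a dashboard might show. One call to `run_pipeline()` is one iteration; in practice, what the analysis reveals feeds back into new acquisition and cleansing rules.

```python
from statistics import mean

def acquire() -> list[dict]:
    # Hypothetical raw records standing in for files, APIs or database extracts.
    return [
        {"customer": "a", "spend": "120.5"},
        {"customer": "b", "spend": ""},       # missing value
        {"customer": "c", "spend": "80"},
        {"customer": "a", "spend": "120.5"},  # exact duplicate
    ]

def cleanse(records: list[dict]) -> list[dict]:
    # Drop records with a missing spend, remove exact duplicates and convert types.
    seen, clean = set(), []
    for r in records:
        key = (r["customer"], r["spend"])
        if not r["spend"] or key in seen:
            continue
        seen.add(key)
        clean.append({"customer": r["customer"], "spend": float(r["spend"])})
    return clean

def analyze(records: list[dict]) -> dict:
    # A deliberately simple "model": summary figures a dashboard could display.
    spends = [r["spend"] for r in records]
    return {"n": len(spends), "mean_spend": mean(spends)}

def run_pipeline() -> dict:
    # One pass of acquire -> cleanse -> analyze. In a real project the findings
    # from analyze() would change the acquisition or cleansing rules, and the
    # pass would be repeated.
    return analyze(cleanse(acquire()))

print(run_pipeline())
```

Keeping each stage in its own function makes it easy to change one stage between iterations without touching the others.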

Traditional data sampling methods are likely to miss rare events. Using significantly larger data sets increases the probability of capturing rare events, which bears directly on the predictive power of the model. Data science is key to extracting intelligence from data that comes in many forms, such as blogs, comments, emails, queries, charts, maps, podcasts, statistics, graphs and audio-video clips.
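The rare-event argument can be quantified. Assuming records are independent and an event occurs in each record with probability $p$, a simple random sample of $n$ records contains at least one occurrence with probability $1 - (1 - p)^n$, and the expected number of occurrences is $np$. The rate of one event per 10,000 records below is a hypothetical value chosen for illustration.

```python
def prob_at_least_one(p: float, n: int) -> float:
    """Probability that a sample of n independent records contains at least one
    rare event that occurs with per-record probability p."""
    return 1.0 - (1.0 - p) ** n

p = 1e-4  # hypothetical rate: one event per 10,000 records
for n in (1_000, 10_000, 100_000):
    print(f"n={n:>7,}: P(at least one) = {prob_at_least_one(p, n):.3f}, "
          f"expected count = {n * p:.1f}")
```

At this rate a sample of 1,000 records contains the event only about 10% of the time, 10,000 records give roughly a 63% chance, and 100,000 records make it almost certain. Even then, a model has only about ten examples to learn from, which is why larger data sets matter for rare events.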

A good data science solution is scalable and can be integrated with any model the company currently uses.

Data Science Features

  • Customer prediction: Know when a customer is likely to abandon a sale, leave or stay.
  • Information security: Identify potential hackers and detect anomalies in network information flows.
  • Customer categorization: Group customers with similar behavior and interests.
  • Fraud prevention: Identify fraudulent credit card transactions, insurance claims and the like in good time.
  • Flexible access: Get access to shared infrastructure, data patterns and iterative searches.
  • Attributes: Use attributes at a fine-grained level and track them over longer periods.
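Fraud prevention and information security both rest on anomaly detection. As one simple, well-known technique (not necessarily what any particular product uses), the sketch below flags values whose modified z-score, based on the median and the median absolute deviation (MAD), exceeds 3.5, a commonly recommended cut-off. The transaction amounts are hypothetical.

```python
from statistics import median

def flag_anomalies(amounts: list[float], threshold: float = 3.5) -> list[int]:
    """Return indices of values whose modified z-score exceeds `threshold`.

    The median and MAD are used instead of the mean and standard deviation
    because they are far less distorted by the very outliers we want to find.
    """
    med = median(amounts)
    mad = median([abs(x - med) for x in amounts])
    if mad == 0:
        return []  # no spread at all; this simple rule cannot separate outliers
    return [i for i, x in enumerate(amounts)
            if 0.6745 * abs(x - med) / mad > threshold]

# Hypothetical card transaction amounts for one customer.
amounts = [42.0, 38.5, 40.0, 45.2, 39.9, 41.3, 2150.0, 43.7]
print(flag_anomalies(amounts))
```

Here only the 2,150.00 transaction (index 6) is flagged. A production system would combine many attributes, not a single amount, but the principle of scoring each record against typical behavior is the same.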
