The leading trends in big data are reshaping the whole tech industry. The growth of big data means that processors must advance to handle all of the data being generated, and storage systems, networks, and approaches to data security and governance must improve alongside them. Moore's law holds that the number of transistors on a single chip doubles roughly every two years, and for several decades it has held: transistor counts rose from 68,000 in 1980 to more than 23 billion in 2018.
Listed below are the leading data trends in big data, as identified by GlobalData.
Data governance

Many big data vendors have had to contend with a growing market perception that data governance, security, and management have taken a back seat to accessibility and speed. In response, most companies are now accepting the challenge and openly prioritising data governance. This is expected to see multiple disparate solutions replaced by single data management platforms, leading to efficient scaling, collection, and distribution of data.
Data democratisation

The transformative value of data-driven business insights has led to market demands that data be made available to the widest applicable base of users, enabling them to draw insights through a self-service analytics model. This driver began with an emphasis on data consumers and has now expanded to producers, with new tools supporting data analysis and the creation of visualisations. The trend has already begun to transform the publishing industry, with data journalism going mainstream.
Smart data integration

Owing predominantly to market demands for data democratisation, enterprise buyers now need data integration and preparation tools that can retain access to disparate data sources without sacrificing data quality or security. Machine learning and AI-enabled smart data integration tools, such as SnapLogic's Intelligent Integration Platform, can replace manual extract, transform, load (ETL) processes and recommend the best solutions to an organisation's data scientists.
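The pipelines such tools automate follow the classic ETL pattern. As a rough illustration only (the source systems, field names, and schema below are invented, not any vendor's API), a hand-written ETL flow can be sketched in plain Python:

```python
# Minimal ETL sketch: extract rows from heterogeneous sources, normalise
# them into one schema, and load them into a target store. All names here
# are illustrative; a smart integration platform would automate this.

def extract(sources):
    """Pull raw records from each source; a real pipeline would read
    databases, APIs, or flat files instead of in-memory lists."""
    for source in sources:
        for record in source:
            yield record

def transform(record):
    """Normalise differing field names and string types into one schema."""
    return {
        "customer_id": int(record.get("id") or record.get("customer_id")),
        "revenue": float(record.get("revenue", 0.0)),
    }

def load(records, target):
    """Append cleaned records to the target store (a list stands in
    for a warehouse table here)."""
    for record in records:
        target.append(transform(record))
    return target

# Two "systems" with differing schemas, as the article describes.
crm_rows = [{"id": "1", "revenue": "120.5"}]
billing_rows = [{"customer_id": "2", "revenue": "75"}]

warehouse = load(extract([crm_rows, billing_rows]), [])
```

The point of AI-enabled integration tools is to infer mappings like the one in `transform` automatically instead of requiring engineers to code them per source.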
Data privacy and data protection
The misuse and mishandling of personal data is a hot topic. Increased regulation around the storage and processing of data is highly likely – it is already underway in Europe in the form of the General Data Protection Regulation (GDPR), which came into force in May 2018. Many technology areas are reliant on large data sets and any restrictions on their ability to use them could have significant consequences for future growth.
AI for data quality
One of the benefits of using AI is that it can improve data quality. This improvement is needed in any analytics-driven organisation, where the proliferation of personal, public, cloud, and on-premise data has made it nearly impossible for IT to keep up with user demand. Companies want to improve data quality by taking the advanced design and visualisation concepts typically reserved for the final product of a BI solution, namely dashboards and reports, and putting them to work at the very beginning of the analytics lifecycle. AI-based data visualisation tools, such as Qlik's Sense platform and Google Data Studio, help enterprises identify the critical data sets that need attention for business decision-making, reducing human workloads.
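As a hedged illustration of the kind of check such tools automate, a minimal completeness profile over tabular records might look like the sketch below; the scoring rule, threshold, and field names are invented for this example and are not Qlik's or Google's actual logic:

```python
# Score each column of a tabular data set by completeness (share of
# non-missing values) and flag the worst columns for attention.

def completeness(rows, column):
    """Fraction of rows where `column` is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return filled / len(rows)

def flag_columns(rows, threshold=0.9):
    """Return columns whose completeness falls below the threshold."""
    columns = {c for row in rows for c in row}
    return sorted(c for c in columns if completeness(rows, c) < threshold)

records = [
    {"name": "Acme", "region": "EU", "revenue": 120.5},
    {"name": "Globex", "region": "", "revenue": 75.0},
    {"name": "Initech", "region": "US", "revenue": None},
]

suspect = flag_columns(records)  # columns needing attention
```

Commercial tools go much further (type inference, outlier detection, lineage), but surfacing problems this early in the analytics lifecycle is the underlying idea.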
Pre-enriched data kits

In an effort to speed time-to-market for custom-built AI tools, technology vendors are introducing pre-enriched, machine-readable data sets specific to given industries. Intended to help data scientists and AI engineers, these kits include the data necessary to build AI models quickly. For example, the IBM Watson Data Kit for food menus includes 700,000 menus from 21,000 US cities and covers menu dynamics such as price, cuisine, and ingredients. This information could be used directly in an AI travel app that helps users locate nearby establishments catering to their specific dietary requirements.
Data as a service (DaaS)
Analytics and BI tools work best with data from a single, high-performance relational database, but most organisations have multiple solutions, formats, and sources of data. IT teams therefore typically apply custom ETL processes and proprietary tools to integrate data from various systems and make it accessible to analytics solutions. This approach brings numerous challenges, including increased infrastructure costs, inflexible architecture, greater data complexity, complex data governance, and longer times to move data between systems. DaaS – a cloud service that provides users with on-demand data access – helps enterprises address these challenges. It is typically deployed with data lakes: huge repositories of unstructured, semi-structured, and structured data. DaaS solutions store and manage enterprise data by compiling it into relevant streams. Technology firms such as Microsoft, Oracle, and MuleSoft (acquired by Salesforce in 2018) offer DaaS solutions.
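The DaaS idea can be sketched in a few lines: rather than every consumer running its own ETL against every system, one service registers disparate sources and exposes them as a single on-demand stream. The `DataService` class and its methods below are invented for illustration and do not represent any vendor's API:

```python
# Sketch of a DaaS-style access layer: a single service compiles records
# from disparate backing stores into one on-demand stream, tagging each
# record with its origin so consumers keep lineage information.

class DataService:
    def __init__(self):
        self._sources = {}

    def register(self, name, fetch):
        """Register a named source; `fetch` returns that source's records."""
        self._sources[name] = fetch

    def stream(self, names=None):
        """Yield records from the requested sources on demand."""
        for name in (names or self._sources):
            for record in self._sources[name]():
                yield {"source": name, **record}

service = DataService()
service.register("sensors", lambda: [{"id": 1, "temp": 21.5}])
service.register("billing", lambda: [{"id": 1, "amount": 42.0}])

records = list(service.stream())
```

Because sources are fetched lazily, consumers pay only for the data they actually request, which is the cost and flexibility argument usually made for DaaS over per-team ETL.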
This is an edited extract from the Big Data in Utilities – Thematic Research report produced by GlobalData Thematic Research.