
Tuesday, 28 February 2023

How to use Netezza Performance Server query data in Amazon Simple Storage Service (S3)


In this example, we demonstrate how current data in a Netezza Performance Server as a Service (NPSaaS) table can be combined with historical data in Parquet files to determine whether flight delays increased in 2022 due to the impact of the COVID-19 pandemic on the airline travel industry. The demonstration illustrates how Netezza Performance Server (NPS) can be extended to access data stored externally in cloud object storage (Parquet format files).

Background on the Netezza Performance Server capability demo


Netezza Performance Server (NPS) has recently added the ability to access Parquet files by defining a Parquet file as an external table in the database. This allows data that exists in cloud object storage to be easily combined with existing data warehouse data without data movement. The advantage to NPS clients is that they can store infrequently used data in a cost-effective manner without having to move that data into a physical data warehouse table.

To make it easy for clients to understand how to use this capability within NPS, a demonstration was created that uses flight delay data for all commercial flights from United States airports, collected by the United States Department of Transportation (Bureau of Transportation Statistics). This data is analyzed using Netezza SQL and Python code to determine whether flight delays in the first half of 2022 increased compared to earlier periods within the current data (January 2019 – December 2021).

This demonstration then compares the current flight delay data (January 2019 – June 2022) with historical flight delay data (June 2003 – December 2018) to understand if the flight delays experienced in 2022 are occurring with more frequency or simply following a historical pattern.

For this data scenario, the current flight delay data (2019 – 2022) is contained in a regular, internal NPS database table residing in an NPS as a Service (NPSaaS) instance within the U.S. East2 region of the Microsoft Azure cloud and the historical data (2003 – 2018) is contained in an external Parquet format file that resides on the Amazon Web Services (AWS) cloud within S3 (Simple Storage Service) storage.

All SQL and Python code is executed against the NPS database using Jupyter notebooks, which capture query output and graphing of results during the analysis phase of the demonstration. The external table capability of NPS makes it transparent to a client that some of the data resides externally to the data warehouse. This provides a cost-effective data analysis solution for clients that have frequently accessed data that they wish to combine with older, less frequently accessed data. It also allows clients to store their different data collections using the most economical storage based on the frequency of data access, instead of storing all data using high-cost data warehouse storage.

Prerequisites for the demo


The data set used in this example is a publicly available data set that is available from the United States Department of Transportation, Bureau of Transportation Statistics website at this URL: https://www.transtats.bts.gov/ot_delay/ot_delaycause1.asp?qv52ynB=qn6n&20=E

Using the default settings will return the most recent flight delay data for the last month of data available (for example, in late November 2022, the most recent data available was for August 2022). Any data from June 2003 up until the most recent month of data available can be selected.

The data definition


For this demonstration of NPS external tables capabilities to access AWS S3 data, the following tables were created in the NPS database.

Figure 1 – NPS database table definitions

The primary tables that will be used in the analysis portion of the demonstration are the AIRLINE_DELAY_CAUSE_CURRENT table (2019 – June 2022 data) and the AIRLINE_DELAY_CAUSE_HISTORY (2003 – 2018 data) external table (Parquet file). The historical data is placed in a single Parquet file to improve query performance versus having to join sixteen external tables in a single query.

The following diagram shows the data flows:

Figure 2 – Data flow for data analysis

Brief description of the flight delay data


Before the actual data analysis is discussed, it is important to understand the data columns tracked within the flight delay information and what the columns represent.

A flight is not counted as delayed unless it is more than 15 minutes late relative to its scheduled time.

There are five types of delays that are reported by the airlines participating in flight delay tracking:

◉ Air Carrier – the reason for the flight delay was within the airline’s control such as maintenance or flight crew issues, aircraft cleaning, baggage loading, fueling, and related issues.

◉ Extreme Weather – the flight delay was caused by extreme weather factors such as a blizzard, hurricane, or tornado.

◉ National Aviation System (NAS) – delays attributed to the national aviation system which covers a broad set of conditions such as non-extreme weather, airport operations, heavy traffic volumes, and air traffic control.

◉ Late arriving aircraft – a previous flight using the same aircraft arrived late, causing the present flight to depart late.

◉ Security – delays caused by an evacuation of a terminal or concourse, re-boarding of an aircraft due to a security breach, inoperative screening equipment, and/or screening lines exceeding 29 minutes.

Since a flight delay can result from more than one of the five reasons, the delays are captured in several columns. The first column, ARR_DELAY15, contains the number of minutes of the flight delay. Five further columns correspond to the flight delay types: CARRIER_CT, WEATHER_CT, NAS_CT, SECURITY_CT, and LATE_AIRCRAFT_CT. The sum of these five columns equals the value in the ARR_DELAY15 column.

Because multiple factors can contribute to a flight delay, each component column can carry a fractional portion of the overall delay. For example, an overall delay of 4.00 (ARR_DELAY15) might be composed of 2.67 for CARRIER_CT and 1.33 for LATE_AIRCRAFT_CT. This allows for further analysis to understand all factors that contributed to the overall flight delay.
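As a concrete illustration, here is a minimal sanity-check sketch one could run against the current-data table. It assumes the table and column names described above and in Figure 1, and allows a small tolerance for rounding in the source data; it is not a query from the original notebook.

-- Sanity check: the five delay-cause columns should add up to the total
-- reported in ARR_DELAY15 (allowing for rounding in the source data).
SELECT COUNT(*) AS mismatched_rows
FROM   AIRLINE_DELAY_CAUSE_CURRENT
WHERE  ABS(ARR_DELAY15 - (CARRIER_CT + WEATHER_CT + NAS_CT
                          + SECURITY_CT + LATE_AIRCRAFT_CT)) > 0.01;
-- A result of 0 means the five cause columns fully account for ARR_DELAY15.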

Here is an excerpt of the flight delay data to illustrate how the ARR_DELAY15 and flight delay reason columns interact:

Figure 3 – Portion of the flight delay data highlighting the column relationships

Flight delay data analysis


In this final section, the actual data analysis and results of the flight delay data analysis will be highlighted.

After the flight delay tables and external files (Parquet format files) were created and data loaded, there were several queries executed to validate that the data was for the correct date range within each table and that valid data was loaded into all the tables (internal and external).
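The validation queries themselves are not reproduced in the original post; the following is a minimal sketch of what such checks might look like. The YEAR column name is an assumption based on the BTS file layout rather than a column confirmed in the article.

-- Confirm the date range and row count of the internal (current) table.
SELECT MIN(YEAR) AS first_year, MAX(YEAR) AS last_year, COUNT(*) AS row_count
FROM   AIRLINE_DELAY_CAUSE_CURRENT;

-- Confirm the date range and row count of the external (historical) table.
SELECT MIN(YEAR) AS first_year, MAX(YEAR) AS last_year, COUNT(*) AS row_count
FROM   AIRLINE_DELAY_CAUSE_HISTORY;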

Once this data validation and table verification was complete, the data analysis of the flight delay data began.

The initial data analysis was performed on the data in the internal NPS database table to look at the current flight delay data (2019 – June 2022) using this query.

Figure 4 – Initial analysis on current flight delay data
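The query in Figure 4 appears only as a screenshot. As an approximation, a yearly aggregation of the delay counts might look like the sketch below; the YEAR column name is an assumption based on the BTS layout, and this is not the exact notebook query.

SELECT YEAR,
       SUM(ARR_DELAY15) AS delayed_flights
FROM   AIRLINE_DELAY_CAUSE_CURRENT
GROUP  BY YEAR
ORDER  BY YEAR;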

The data was displayed using a bar graph as well to make it easier to understand.

Figure 5 – Bar graph of current flight delay data (2019 – June 2022)

Looking at this graph, it appears that 2022 has fewer flight delays than the other recent years, with the exception of 2020 (the height of the COVID-19 pandemic). However, the 2022 data covers only six months (January – June) versus 12 months of data for each of the years 2019 through 2021. Therefore, the data must be normalized to provide a true comparison between the full years 2019 through 2021 and the partial-year data for 2022.

After the data is normalized by dividing the number of delayed flights by the total number of flights, it provides a valid comparison across the January 2019 through June 2022 time period.

Figure 6 – There is a higher ratio of delayed flights in 2022 than in the period from 2019 – 2021

As Figure 6 highlights, when looking at the number of delayed flights compared to the total flights for the period, the flight delays in 2022 have increased over the prior years (2019 – 2021).
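A sketch of the kind of normalization behind Figure 6 is shown below. It assumes an ARR_FLIGHTS column holding the total number of flights for each row, which is taken from the BTS file layout and is an assumption here rather than a column confirmed in the article.

SELECT YEAR,
       SUM(ARR_DELAY15)                    AS delayed_flights,
       SUM(ARR_FLIGHTS)                    AS total_flights,
       SUM(ARR_DELAY15) / SUM(ARR_FLIGHTS) AS delay_ratio
FROM   AIRLINE_DELAY_CAUSE_CURRENT
GROUP  BY YEAR
ORDER  BY YEAR;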

The next step in the analysis is to look at the historical flight delay data (2003 – 2018) to determine if the 2022 flight delays follow a historical pattern or if the flight delays have increased in 2022 due to the results of the pandemic period (airport staffing shortages, pilot shortages, and related factors).

Here is the initial query result on the historical flight delay data using a line graph output.

Figure 7 – Initial query using the historical data (2003 – 2018)

Figure 8 – Flight delays increased early in the historical years

After looking at the historical flight delay data from 2003–2018 at a high level, it was determined that the historical data should be separated into two separate time periods: 2003–2012 and 2013–2018. This separation was determined by analyzing the flight delays for each month of the year (January through December) and comparing the data for each of the historical years of data (2003–2018). With this flight delay comparison, the period from 2013–2018 had fewer flight delays for each month than the flight delay data for the period from 2003–2012.

The result of this query was output in a bar graph format to highlight the lower number of flight delays for the years from 2013–2018.

Figure 9 – Flight delays were lower during 2013 through 2018

The final analysis combines the historical and current flight delay data and illustrates the benefit of joining external AWS S3 Parquet data with local Netezza data: a monthly analysis of the 2022 flight delays (local Netezza) is graphed alongside the two historical periods (Parquet): 2003–2012 and 2013–2018.

Figure 10 – The query to calculate monthly flight delays for 2022
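The query itself is shown only as an image in Figure 10. A hedged sketch of how the internal table and the Parquet-backed external table might be combined for the monthly comparison follows; the MONTH, YEAR and ARR_FLIGHTS column names are assumptions based on the BTS layout.

SELECT period,
       MONTH,
       SUM(ARR_DELAY15) / SUM(ARR_FLIGHTS) AS delay_ratio
FROM (
    -- 2022 data from the internal NPS table
    SELECT '2022' AS period, MONTH, ARR_DELAY15, ARR_FLIGHTS
    FROM   AIRLINE_DELAY_CAUSE_CURRENT
    WHERE  YEAR = 2022
    UNION ALL
    -- Historical data from the external Parquet-backed table,
    -- split into the two periods identified earlier
    SELECT CASE WHEN YEAR <= 2012 THEN '2003-2012' ELSE '2013-2018' END,
           MONTH, ARR_DELAY15, ARR_FLIGHTS
    FROM   AIRLINE_DELAY_CAUSE_HISTORY
) combined
GROUP BY period, MONTH
ORDER BY MONTH, period;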

Figure 11 – Flight delay comparison of 2022 (red) with historical period #1 (2003-2012) (blue) and historical period #2 (2013-2018) (green)

As the graph indicates, flight delays for 2022 are higher for every month from January through June (remember, the 2022 data runs only through June) than in historical period #2 (2013–2018). Only the oldest historical data (2003–2012) had flight delays comparable to 2022. Since the earlier analysis of current data (2019–June 2022) showed that 2022 had more flight delays than 2019 through 2021, flight delays have increased in 2022 relative to the last 10 years of data. This suggests that the increase is driven by factors related to the COVID-19 pandemic's impact on the airline industry.

A solution for quicker data analysis


The capabilities of NPS, together with data analysis in Jupyter notebooks and integration with IBM Watson Studio as part of Cloud Pak for Data as a Service (which has a free tier of usage), allow clients to quickly analyze a data set that spans the data warehouse and external Parquet files in the cloud. This combination provides flexibility and cost savings by allowing clients to host data in a storage medium based on application performance requirements, frequency of data access, and budgetary constraints. By not requiring clients to move their data into the data warehouse, NPS can provide an advantage over other vendors such as Snowflake.

Supplemental section with additional details


The SQL used to create the native Netezza table with current data (2019-June 2022)
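The original post shows this SQL only as a screenshot. A sketch of a comparable CREATE TABLE statement is below; the column list is an assumption based on the BTS delay-cause file layout and the columns discussed earlier, and the DISTRIBUTE ON clause is simply one reasonable choice.

CREATE TABLE AIRLINE_DELAY_CAUSE_CURRENT (
    YEAR             INTEGER,
    MONTH            INTEGER,
    CARRIER          VARCHAR(10),
    CARRIER_NAME     VARCHAR(100),
    AIRPORT          VARCHAR(10),
    AIRPORT_NAME     VARCHAR(100),
    ARR_FLIGHTS      NUMERIC(10,2),
    ARR_DELAY15      NUMERIC(10,2),
    CARRIER_CT       NUMERIC(10,2),
    WEATHER_CT       NUMERIC(10,2),
    NAS_CT           NUMERIC(10,2),
    SECURITY_CT      NUMERIC(10,2),
    LATE_AIRCRAFT_CT NUMERIC(10,2)
)
DISTRIBUTE ON (AIRPORT);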

The SQL to define a database source in Netezza for the cloud object storage bucket
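This statement also appears only as an image in the original. The sketch below indicates the general shape of registering an S3 bucket as an external data source; the statement name and the option keywords (ACCESSKEYID, SECRETACCESSKEY, BUCKET, REGION) are assumptions and should be checked against the IBM Netezza documentation.

-- Illustrative only: keyword names are assumptions, not confirmed NPS syntax.
CREATE EXTERNAL DATASOURCE FLIGHT_HISTORY_S3
USING (
    ACCESSKEYID     '<aws-access-key-id>'
    SECRETACCESSKEY '<aws-secret-access-key>'
    BUCKET          '<s3-bucket-name>'
    REGION          'us-east-2'
);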

The SQL to create external table for 2003 through 2018 from parquet files
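Again shown only as an image in the original. The sketch below combines the classic Netezza external-table clauses (SAMEAS, DATAOBJECT) with Parquet- and data-source-related options whose exact keywords are assumptions to verify against the NPS documentation.

CREATE EXTERNAL TABLE AIRLINE_DELAY_CAUSE_HISTORY
SAMEAS AIRLINE_DELAY_CAUSE_CURRENT
USING (
    DATAOBJECT ('/airline_delay_cause_2003_2018.parquet')
    DATASOURCE  FLIGHT_HISTORY_S3   -- assumed option name
    FORMAT      'PARQUET'           -- assumed option name
);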

The SQL to ‘create table as select’ from the parquet file
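Finally, the ‘create table as select’ statement is also an image in the original. Its role, as described in the article, is to consolidate the historical Parquet data; a minimal sketch with illustrative table names follows.

-- Materialize the Parquet-backed external data into a native NPS table
-- (or, equivalently, consolidate several per-year external tables into one).
CREATE TABLE AIRLINE_DELAY_CAUSE_HISTORY_NATIVE AS
SELECT *
FROM   AIRLINE_DELAY_CAUSE_HISTORY;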

Source: ibm.com

Saturday, 25 February 2023

5 misconceptions about cloud data warehouses


In today’s world, data warehouses are a critical component of any organization’s technology ecosystem. They provide the backbone for a range of use cases such as business intelligence (BI) reporting, dashboarding, and machine learning (ML)-based predictive analytics that enable faster decision-making and insights.

The rise of cloud has allowed data warehouses to provide new capabilities such as cost-effective data storage at petabyte scale, highly scalable compute and storage, pay-as-you-go pricing and fully managed service delivery. Companies are shifting their investments to cloud software and reducing their spend on legacy infrastructure. In 2021, cloud databases accounted for 85% of the market growth in databases. These developments have accelerated the adoption of hybrid-cloud data warehousing; industry analysts estimate that almost 50% of enterprise data has been moved to the cloud.

What is holding back the other 50% of datasets on-premises? Based on our experience speaking with CTOs and IT leaders in large enterprises, we have identified the most common misconceptions about cloud data warehouses that cause companies to hesitate to move to the cloud.

Misconception 1: Cloud data warehouses are more expensive


When considering moving data warehouses from on-premises to the cloud, companies often get sticker shock at the total cost of ownership. However, a more detailed analysis is needed to make an informed decision. Traditional on-premises warehouses require a significant initial capital investment and ongoing support fees, as well as additional expenses for managing the enterprise infrastructure. In contrast, cloud data warehouses may have a higher annual subscription fee, but that fee absorbs the upfront investment and much of the ongoing operational overhead. Cloud warehouses also provide customers with elastic scalability, cheaper storage, savings on maintenance and upgrade costs, and cost transparency, which gives customers greater control over their warehousing costs. Industry analysts estimate that organizations implementing best practices around cloud cost controls and cloud migration see average savings of 21% when using a public cloud, and a 13x revenue growth rate for adopters of hybrid cloud through end-to-end reinvention.

Misconception 2: Cloud data warehouses do not provide the same level of security and compliance as on-premises warehouses


Companies in highly regulated industries such as finance, insurance, transportation and manufacturing have a complex set of compliance requirements for their data, often leading to an additional layer of complexity when it comes to migrating data to the cloud. In addition, companies have complex data security requirements. However, over the past decade, a vast array of compliance and security standards, such as SOC2, PCI, HIPAA, and GDPR, have been introduced and are now met by cloud providers. The rise of sovereign clouds and industry-specific clouds is addressing the concerns of governmental and industry-specific regulatory requirements. In addition, warehouse providers take on the responsibility of patching and securing the cloud data warehouse to ensure that business users stay compliant with regulations as they evolve.

Misconception 3: All data warehouse migrations are the same, irrespective of vendors


While migrating to the cloud, CTOs often feel the need to revamp and “modernize” their entire technology stack – including moving to a new cloud data warehouse vendor. However, a successful migration usually requires multiple rounds of data replication, query optimization, application re-architecture and retraining of DBAs and architects.

To mitigate these complexities, organizations should evaluate whether a hybrid-cloud version of their existing data warehouse vendor can satisfy their use cases, before considering a move to a different platform. This approach has several benefits, such as streamlined migration of data from on-premises to the cloud, reduced query tuning requirements and continuity in SRE tooling, automations, and personnel. It also enables organizations to create a decentralized hybrid-cloud data architecture where workloads can be distributed across on-prem and cloud.

Misconception 4: Migration to cloud data warehouses needs to be 0% or 100%


Companies undergoing cloud migrations often feel pressure to migrate everything to the cloud to justify the investment of the migration. However, different workloads may be better suited for different deployment environments. With a hybrid-cloud approach to data management, companies can choose where to run specific workloads, while maintaining control over costs and workload management. It allows companies to take advantage of the benefits of the cloud, such as scale and elasticity, while also retaining the control and security of sensitive workloads in-house. For example, Marriott International built a decentralized hybrid-cloud data architecture while migrating from their legacy analytics appliances, and saw a nearly 90% increase in performance. This enabled data-driven analytics at scale across the organization.

Misconception 5: Cloud data warehouses reduce control over your deployment


Some DBAs believe that cloud data warehouses lack the control and flexibility of on-prem data warehouses, making it harder to respond to security threats, performance issues or disasters. In reality, cloud data warehouses have evolved to provide the same control maturity as on-prem warehouses. Cloud warehouses also provide a host of additional capabilities such as failover to different data centers, automated backup and restore, high availability, and advanced security and alerting measures. Organizations looking to increase adoption of ML are turning to cloud data warehouses that support new, open data formats to catalog, ingest, and query unstructured data types. This functionality provides access to data by storing it in an open format, increasing flexibility for data exploration and ML modeling used by data scientists, facilitating governed data use of unstructured data, improving collaboration, and reducing data silos with simplified data lake integration.

Additionally, some DBAs worry that moving to the cloud reduces the need for their expertise and skillset. However, in reality, cloud data warehouses only automate the operational management of data warehousing such as scaling, reliability and backups, freeing DBAs to work on high value tasks such as warehouse design, performance tuning and ecosystem integrations.

By addressing these five misconceptions of cloud data warehouses and understanding the nuances, advantages, trade-offs and total cost of ownership of both delivery models, organizations can make more informed decisions about their hybrid-cloud data warehousing strategy and unlock the value of all their data.

Getting started with a cloud data warehouse


At IBM we believe in making analytics secure, collaborative and price-performant across all deployments, whether running in the cloud, hybrid, or on-premises. For those considering a hybrid or cloud-first strategy, our data warehousing SaaS offerings, including IBM Db2 Warehouse and Netezza Performance Server, are available across AWS, Microsoft Azure, and IBM Cloud and are designed to provide customers with the availability, elastic scaling, governance, and security required for SLA-backed, mission-critical analytics.

When it comes to moving workloads to the cloud, IBM’s Expert Labs migration services ensure 100% workload compatibility between on-premises workloads and SaaS solutions.

No matter where you are in your journey to cloud, our experts are here to help customize the right approach to fit your needs. See how you can get started with your analytics journey to hybrid cloud by contacting an IBM database expert today.

Source: ibm.com

Thursday, 2 February 2023

Data platform trinity: Competitive or complementary?


Data platform architecture has an interesting history. Toward the turn of the millennium, enterprises started to realize that reporting and business intelligence workloads required a different solution than transactional applications. A read-optimized platform that could integrate data from multiple applications emerged: the data warehouse.

Over the next decade, the internet and mobile started to generate data of unforeseen volume, variety and velocity. This required a different data platform solution. Hence the data lake emerged, handling both structured and unstructured data at huge volume.

Yet another decade passed, and it became clear that the data lake and the data warehouse are no longer enough to handle the business complexity and new workloads of enterprises. They are too expensive. The value of data projects is difficult to realize. Data platforms are difficult to change. Time demanded a new solution, again.

Guess what? This time, at least three different data platform solutions are emerging: Data Lakehouse, Data Fabric, and Data Mesh. While this is encouraging, it is also creating confusion in the market. The concepts and their value propositions overlap. At times, different interpretations emerge depending on who is asked.

This article endeavors to alleviate that confusion. The concepts will be explained, and then a framework will be introduced that shows how these three concepts may lead to one another or be used together.

Data lakehouse: A mostly new platform


Concept of lakehouse was made popular by Databricks. They defined it as: “A data lakehouse is a new, open data management architecture that combines the flexibility, cost-efficiency, and scale of data lakes with the data management and ACID transactions of data warehouses, enabling business intelligence (BI) and machine learning (ML) on all data.”

While traditional data warehouses made use of an Extract-Transform-Load (ETL) process to ingest data, data lakes instead rely on an Extract-Load-Transform (ELT) process. Extracted data from multiple sources is loaded into cheap BLOB storage, then transformed and persisted into a data warehouse, which uses expensive block storage.

This storage architecture is inflexible and inefficient. Transformation must be performed continuously to keep the BLOB and data warehouse storage in sync, adding costs. And continuous transformation is still time-consuming. By the time the data is ready for analysis, the insights it can yield will be stale relative to the current state of transactional systems.

Furthermore, data warehouse storage cannot support workloads like Artificial Intelligence (AI) or Machine Learning (ML), which require huge amounts of data for model training. For these workloads, data lake vendors usually recommend extracting data into flat files to be used solely for model training and testing purposes. This adds an additional ETL step, making the data even more stale.

Data lakehouse was created to solve these problems. The data warehouse storage layer is removed from lakehouse architectures. Instead, continuous data transformation is performed within the BLOB storage. Multiple APIs are added so that different types of workloads can use the same storage buckets. This is an architecture that’s well suited for the cloud since AWS S3 or Azure DLS2 can provide the requisite storage.

Data fabric: A mostly new architecture


The data fabric represents a new generation of data platform architecture. It can be defined as: A loosely coupled collection of distributed services, which enables the right data to be made available in the right shape, at the right time and place, from heterogeneous sources of transactional and analytical natures, across any cloud and on-premises platforms, usually via self-service, while meeting non-functional requirements including cost effectiveness, performance, governance, security and compliance.

The purpose of the data fabric is to make data available wherever and whenever it is needed, abstracting away the technological complexities involved in data movement, transformation and integration, so that anyone can use the data. Some key characteristics of data fabric are:

A network of data nodes

A data fabric is composed of a network of data nodes (e.g., data platforms and databases), all interacting with one another to provide greater value. The data nodes are spread across the enterprise’s hybrid and multicloud computing ecosystem.

Each node can be different from the others

A data fabric can consist of multiple data warehouses, data lakes, IoT/Edge devices and transactional databases. It can include technologies that range from Oracle, Teradata and Apache Hadoop to Snowflake on Azure, RedShift on AWS or MS SQL in the on-premises data center, to name just a few.

All phases of the data-information lifecycle

The data fabric embraces all phases of the data-information-insight lifecycle. One node of the fabric may provide raw data to another that, in turn, performs analytics. These analytics can be exposed as REST APIs within the fabric, so that they can be consumed by transactional systems of record for decision-making.

Analytical and transactional worlds come together

Data fabric is designed to bring together the analytical and transactional worlds. Here, everything is a node, and the nodes interact with one another through a variety of mechanisms. Some of these require data movement, while others enable data access without movement. The underlying idea is that data silos (and differentiation) will eventually disappear in this architecture.

Security and governance are enforced throughout

Security and governance policies are enforced whenever data travels or is accessed throughout the data fabric. Just as Istio applies security governance to containers in Kubernetes, the data fabric will apply policies to data according to similar principles, in real time.

Data discoverability

Data fabric promotes data discoverability. Here, data assets can be published into categories, creating an enterprise-wide data marketplace. This marketplace provides a search mechanism, utilizing metadata and a knowledge graph to enable asset discovery. This enables access to data at all stages of its value lifecycle.

The advent of the data fabric opens new opportunities to transform enterprise cultures and operating models. Because data fabrics are distributed but inclusive, their use promotes federated but unified governance. This will make the data more trustworthy and reliable. The marketplace will make it easier for stakeholders across the business to discover and use data to innovate. Diverse teams will find it easier to collaborate, and to manage shared data assets with a sense of common purpose.

Data fabric is an embracing architecture, where some new technologies (e.g., data virtualization) play a key role. But it allows existing databases and data platforms to participate in a network, where a data catalogue or data marketplace can help in discovering new assets. Metadata plays a key role here in discovering the data assets.

Data mesh: A mostly new culture


Data mesh as a concept was introduced by Thoughtworks. They defined it as: “…An analytical data architecture and operating model where data is treated as a product and owned by teams that most intimately know and consume the data.” The concept stands on four principles: Domain ownership, data as a product, self-serve data platforms, and federated computational governance.

Data fabric and data mesh as concepts have overlaps. For example, both recommend a distributed architecture – unlike centralized platforms such as the data warehouse, data lake, and data lakehouse. Both promote the idea of a data product offered through a marketplace.

Differences also exist. As is clear from the definition above, unlike data fabric, data mesh is about analytical data; it is narrower in focus than data fabric. Second, it emphasizes operating model and culture, meaning it goes beyond architecture alone, unlike data fabric. The nature of a data product can be generic in data fabric, whereas data mesh clearly prescribes domain-driven ownership of data products.

The relationship between data lakehouse, data fabric and data mesh


Clearly, these three concepts have their own focus and strength. Yet, the overlap is evident.

Lakehouse stands apart from the other two. It is a new technology, like its predecessors. It can be codified. Multiple products exist in the market, including Databricks, Azure Synapse and Amazon Athena.

Data mesh requires a new operating model and cultural change. Often such cultural changes require a shift in the collective mindset of the enterprise. As a result, data mesh can be revolutionary in nature. It can be built from ground up at a smaller part of the organization before spreading into the rest of it.

Data fabric does not have the prerequisites of data mesh. It does not require such a cultural shift. It can be built up using existing assets in which the enterprise has invested over a period of years. Thus, its approach is evolutionary.

So how can an enterprise embrace all these concepts?

Address old data platforms by adopting a data lakehouse

It can embrace adoption of a lakehouse as part of its own data platform evolution journey. For example, a bank may retire its decade-old data warehouse and deliver all BI and AI use cases from a single data platform by implementing a lakehouse.

Address data complexity with a data fabric architecture

If the enterprise is complex and has multiple data platforms, if data discovery is a challenge, if data delivery at different parts of the organization is difficult – data fabric may be a good architecture to adopt. Along with existing data platform nodes, one or multiple lakehouse nodes may also participate there. Even the transactional databases may also join the fabric network as nodes to offer or consume data assets.

Address business complexity with a data mesh journey

To address business complexity, if the enterprise embarks upon a cultural shift towards domain-driven data ownership, promotes self-service in data discovery and delivery, and adopts federated governance, it is on a data mesh journey. If the data fabric architecture is already in place, the enterprise may use it as a key enabler of its data mesh journey. For example, the data fabric marketplace may offer domain-centric data products – a key data mesh outcome. The metadata-driven discovery already established as a capability through the data fabric can be useful in discovering the new data products coming out of the mesh.

Every enterprise can look at its respective business goals and decide which entry point suits it best. But even though entry points or motivations may differ, an enterprise may easily use all three concepts together in its quest for data-centricity.

Source: ibm.com

Saturday, 21 January 2023

Four starting points to transform your organization into a data-driven enterprise


Due to the convergence of events in the data analytics and AI landscape, many organizations are at an inflection point. Regardless of size, industry or geographical location, the sprawl of data across disparate environments, the increase in the velocity of data and the explosion of data volumes have resulted in complex data infrastructures for most enterprises. Furthermore, a global effort to create new data privacy laws, and the increased attention on biases in AI models, has resulted in convoluted business processes for getting data to users. How do business leaders navigate this new data and AI ecosystem and make their company a data-driven organization? The solution is a data fabric.

A data fabric architecture elevates the value of enterprise data by providing the right data, at the right time, regardless of where it resides.  To simplify the process of becoming data-driven with a data fabric, we are focusing on the four most common entry points we see with data fabric journeys. In 2023, we have four entry points aligned to common data & AI stakeholder challenges.

We are also introducing IBM Cloud Pak for Data Express. These solutions are aligned to the data fabric entry points. IBM Cloud Pak for Data Express solutions provide new clients with affordable and high-impact capabilities to expeditiously explore and validate the path to becoming a data-driven enterprise. IBM Cloud Pak for Data Express solutions offer clients a simple on-ramp to start realizing the business value of a modern architecture.

Data governance


The data governance capability of a data fabric focuses on the collection, management and automation of an organization’s data. Automated metadata generation is essential to turn a manual process into one that is better controlled. It helps avoid human error and tags data so that policy enforcement can be achieved at the point of access rather than at individual repositories. This data-driven approach makes it easier for business users to find the data that best fits their needs. More importantly, this capability enables business users to quickly and easily find quality data that conforms to regulatory requirements. IBM’s data governance capability enables the enforcement of policies at runtime anywhere, in essence “policies that move with the data”. This capability also provides data users with visibility into the origin, transformations, and destination of data as it is used to build products. The result is more useful data for decision-making, less hassle and better compliance.

Data integration


The rapid growth of data continues to proceed unabated and is now accompanied by not only the issue of siloed data but a plethora of different repositories across numerous clouds. The reasoning is simple and well-justified with the exception of data silos; more data allows the opportunity to provide more accurate data-driven insights, while using multiple clouds helps avoid vendor lock-in and allows data to be stored where it best fits. The challenge, of course, is the added complexity of data management that hinders the actual use of that data for better decisions, analysis and AI.

As part of a data fabric, IBM’s data integration capability creates a roadmap that helps organizations connect data from disparate data sources, build data pipelines, remediate data issues, enrich data quality, and deliver integrated data to multicloud platforms. From there, it can be easily accessed via dashboards by data consumers or those building into a data product. The kind of digital transformation that an organization gets with data integration ensures that the right data can be delivered to the right person at the right time. With IBM’s data integration portfolio, you are not locked into just a single integration style. You can select a hybrid integration strategy that aligns with your organization’s business strategy to meet the needs of your data consumers wanting to access and utilize the data.

Data science and MLOps


AI is no longer experimental. These technologies are becoming mainstream across industries and are proving key drivers of enterprise innovation and growth, leading to more accurate, quicker strategic decisions. When AI is done right, enterprises are seeing increased revenues, improved customer experiences and faster time-to-market, all of which leads to revenue gains and improvements in their competitive positioning.

The data science and MLOps capability provides data science tools and solutions that enable enterprises to accelerate AI-driven innovation, simplify the MLOps lifecycle, and run any AI model with a flexible deployment. With this capability, not only can data-driven companies operationalize data science models on any cloud while instilling trust in AI outcomes, but they are also in a position to improve the ability to manage and govern the AI lifecycle to optimize business decisions with prescriptive analytics.

AI governance


Artificial intelligence (AI) is no longer a choice. Adoption is imperative to beat the competition, release innovative products and services, better meet customer expectations, reduce risk and fraud, and drive profitability. However, successful AI is not guaranteed and does not always come easy. AI initiatives require governance, compliance with corporate and ethical principles, laws and regulations.

A data fabric addresses the need for AI governance by providing capabilities to direct, manage and monitor the AI activities of an organization. AI governance is not just a “nice to have”. It is an integral part of an organization adopting a data-driven culture. It is critical to avoid audits, hefty fines or damage to the organization’s reputation. The IBM AI governance solution provides automated tools and processes enabling an organization to direct, manage and monitor across the AI lifecycle.

IBM Cloud Pak for Data Express solutions


As previously mentioned, we now provide a simple, lightweight, and fast means of validating the value of a data fabric. Through the IBM Cloud Pak for Data Express solutions, you can leverage data governance, ELT Pushdown, or data science and MLOps capabilities to quickly evaluate the ability to better utilize data by simplifying data access and facilitating self-service data consumption. In addition, our comprehensive AI Governance solution complements the data science & MLOps express offering. Rapidly experience the benefits of a data fabric architecture in a platform solution that makes all data available to drive business outcomes.

Source: ibm.com

Tuesday, 10 January 2023

Using a digital self-serve experience to accelerate and scale partner innovation with IBM embeddable AI


IBM has invested $1 billion into our partner ecosystem. We want to ensure that partners like you have the resources to build your business and develop software for your customers using IBM’s industry-defining hybrid cloud and AI platform. Together, we build and sell powerful solutions that elevate our clients’ businesses through digital transformation.

To that end, IBM recently announced a set of embeddable AI libraries that empower partners to create new AI solutions. In fact, IBM supports an easy and fast way to embed and adopt IBM AI technologies through the new Digital Self-Serve Co-Create Experience (DSCE).

The Build Lab team created the DSCE to complement its high-touch engagement process and provide a digital self-service experience that scales to tens of thousands of Independent Software Vendors (ISVs) adopting IBM’s embeddable AI. Using the DSCE self-serve portal, partners can discover and try the recently launched IBM embeddable AI portfolio of IBM Watson Libraries, IBM Watson APIs, and IBM applications at their own pace and on their schedule. In addition, DSCE’s digitally guided experience enables partners to effortlessly package and deploy their software at scale.


Your on-ramp to embeddable AI from IBM


The IBM Build Lab team collaborates with qualified ISVs to build Proofs of Experience (PoX) demonstrating the value of combining the best of IBM Hybrid Cloud and AI technology to create innovative solutions and deliver unique market value.

DSCE is a wizard-driven experience. Users respond to contextual questions and get suggested prescriptive assets, education, and trials while rapidly integrating IBM technology into products. Rather than manually searching IBM websites and repositories for potentially relevant information and resources, DSCE does the legwork for you, providing assets, education, and trial resources based on your development intent. The DSCE guided path directs you to reference architectures, tutorials, best practices, boilerplate code, and interactive sandboxes for a customized roadmap with assets and education to speed your adoption of IBM AI.

Embark on a task-based journey


DSCE works seamlessly for both data scientist and machine learning operations (ML-Ops) engineer personas.

For example, data scientist Miles wants to customize an emotion classification model to discover what makes customers happiest. His startup analyzes customer feedback for the retail e-commerce customers it serves. He wants to provide high-quality analysis of the most satisfied customers, so he chooses a Watson NLP emotion classification model that he can fine-tune using an algorithm that predicts ‘happiness’ with greater confidence than pre-trained models. This type of modeling can all be done in just a few simple clicks:

◉ Find and try AI ->
◉ Build with AI Libraries ->
◉ Build with Watson NLP ->
◉ Emotion classification ->
◉ Library and container ->
◉ Custom train the model ->
◉ Results Page


The bookmarkable Results Page gives a comprehensive set of assets for both training and deploying a model. For accomplishing the task of “Training the Model,” Miles can explore interactive demos, reserve a Watson Studio environment, copy a snippet from a Jupyter notebook, and much more.

If Miles, or his ML-Ops counterpart, Leena, wants to “Deploy the Model,” they can get access to the trial license and container of the new Watson NLP Library for 180 days. From there it’s easy to package and deploy the solution on Kubernetes, Red Hat OpenShift, AWS Fargate, or IBM Code Engine. It’s that simple!

Try embeddable AI now


Try the experience here: https://dsce.ibm.com/ and accelerate your AI-enabled innovation now. DSCE will be extended to include more IBM embeddable offerings, satisfying modern developer preferences for digital and self-serve experiences, while helping thousands of ISVs innovate rapidly and concurrently. If you want to provide any feedback on the experience, get in touch through the “Contact us” link on your customized results page.

Source: ibm.com

Saturday, 7 January 2023

Data architecture strategy for data quality


Poor data quality is one of the top barriers faced by organizations aspiring to be more data-driven. Ill-timed business decisions and misinformed business processes, missed revenue opportunities, failed business initiatives and complex data systems can all stem from data quality issues. Just one of these problems can prove costly to an organization. Having to deal with all of them can be devastating.

Several factors determine the quality of your enterprise data, such as accuracy, completeness and consistency, to name a few. But there’s another factor of data quality that doesn’t get the recognition it deserves: your data architecture.

How the right data architecture improves data quality


The right data architecture can help your organization improve data quality because it provides the framework that determines how data is collected, transported, stored, secured, used and shared for business intelligence and data science use cases.

The first generation of data architectures, represented by enterprise data warehouse and business intelligence platforms, was characterized by thousands of ETL jobs, tables, and reports that only a small group of specialized data engineers understood, resulting in an under-realized positive impact on the business. The next generation of big data platforms, with long-running batch jobs operated by a central team of data engineers, has often led to data lake swamps.

Both approaches were typically monolithic, centralized architectures organized around the mechanical functions of data ingestion, processing, cleansing, aggregation, and serving. This created a number of organizational and technological bottlenecks prohibiting data integration and scale along several dimensions: constant change of the data landscape, proliferation of data sources and data consumers, diversity of the transformation and data processing that use cases require, and speed of response to change.

What does a modern data architecture do for your business?


Modern data architectures like data mesh and data fabric aim to easily connect new data sources and accelerate the development of use-case-specific data pipelines across on-premises, hybrid and multicloud environments. Combined with effective data lifecycle management, which evolves into data-as-product management, a modern data architecture can enable your organization to:
 
◉ Allow data stewards to ensure data compliance, protection and security
◉ Enhance trust in data by getting visibility into where data came from, how it has changed, and who is using it
◉ Monitor and identify data quality issues closer to the source to mitigate the potential impact on downstream processes or workloads
◉ Efficiently adopt data platforms and new technologies for effective data management
◉ Apply metadata to contextualize existing and new data to make it searchable and discoverable
◉ Perform data profiling (the process of examining, analyzing and creating summaries of datasets)
◉ Reduce data duplication and fragmentation

Because your data architecture dictates how your data assets and data management resources are structured, it plays a critical role in how effective your organization is at performing these tasks. In other words, data architecture is a foundational element of your business strategy for higher data quality. The critical capabilities of a modern data quality management solution require an organization to:

◉ Perform data quality monitoring based on pre-configured rules
◉ Build data modeling lineage to perform root cause analysis of data quality issues
◉ Make a dataset’s value immediately understandable
◉ Practice proper data hygiene across interfaces

How to build a data architecture that improves data quality


A data strategy can help data architects create and implement a data architecture that improves data quality. Steps for developing an effective data strategy include:

1. Outlining business objectives you want your data to help you accomplish

For example, a financial institution may look to improve regulatory compliance, lower costs, and increase revenues. Stakeholders can identify business use cases for certain data types, such as running data analytics on real-time data as it’s ingested to automate decision-making to drive cost reduction.

2. Taking an inventory of existing data assets and mapping current data flows

This step includes identifying and cataloging all data throughout the organization into a centralized or federated inventory list, thereby removing data silos. The list should detail where each dataset resides and what applications and use cases rely on it. Next, select the data needed for your key use cases and prioritize the data domains that include it.

3. Developing a standardized nomenclature

A naming convention and aligned data format (data classes) for data used throughout the organization helps to ensure data consistency and interoperability across departments (domains) and use cases.

4. Determining what changes must be made to the existing architecture

Decide on the changes that will optimize your data for achieving your business objectives. Researching the different types of modern data architectures, such as a data fabric or a data mesh, can help you decide on the data structure most suitable to your business requirements.

5. Deciding on KPIs to gauge a data architecture’s effectiveness

Create KPIs and use advanced analytics that link the measure of your architecture’s success to how well it supports data quality.

6. Creating a data architecture roadmap

Companies can develop a rollout plan for implementing data architecture and governance in three to four data domains per quarter.

Data architecture and IBM


A well-designed data architecture creates a foundation for data quality through transparency and standardization that frames how your organization views, uses and talks about data.

As previously mentioned, a data fabric is one such architecture. A data fabric automates data discovery, governance and data quality management and simplifies self-service data access to data distributed across a hybrid cloud landscape. It can encompass the applications that generate and use data, as well as any number of data storage repositories such as data warehouses, data lakes (which store vast amounts of big data), NoSQL databases (which store unstructured data) and relational databases that utilize SQL.

Source: ibm.com

Saturday, 24 December 2022

How data, AI and automation can transform the enterprise


Today’s data leaders are expected to make organizations run more efficiently, improve business value, and foster innovation. Their role has expanded from providing business intelligence to management, to ensuring high-quality data is accessible and useful across the enterprise. In other words, they must ensure that data strategy aligns to business strategy. Only from this foundation can data leaders foster a data-driven culture, where the entire organization is empowered to take advantage of automation and AI technologies to improve ROI. These areas can transform the enterprise, from cost savings to revenue growth to opening new business opportunities.

Building the foundation: data architecture


Collecting, organizing, managing, and storing data is a complex challenge. A fit-for-purpose data architecture underpins effective data-driven organizations. Driven by business requirements, it establishes how data flows through the ecosystem from collection to processing to consumption. Modern cloud-based data architectures support high availability, scalability and portability; intelligent workflows, analytics and real-time integration; and connection to legacy applications via standard APIs. Your choice of data architecture can have a huge impact on your organization’s revenue and efficiencies, and the costs of getting it wrong can potentially be substantial.

The right data architecture can allow organizations to balance cost and simplicity and reduce data storage expenses, while making it easy for data scientists and line of business users to access trusted data. It can help eliminate siloes and integrate complex combinations of enterprise systems and applications to take advantage of existing and planned investments. And to increase your return on AI and automation investments, organizations should consider automated processes, methodologies, and tools that manage an organization’s use of AI through AI governance.

Taking advantage of automation for LOB and IT activities


You can use data to completely digitize your organization with automation and AI. The challenge is bringing it all together and implementing it across lines of business and IT.

For line-of-business functions, here are six key capabilities to consider:

1. Process mining to identify the best candidates for automation and scale your automation initiatives before investments are carried out

2. Robotic process automation (RPA) to automate manual, time-consuming tasks

3. A workflow engine to automate digital workflows

4. Operational decision management to analyze, automate, and govern rules-based business decisions

5. Content management to manage the growing volume of enterprise content that’s required to run your business and support decisions

6. Document processing to read your documents, extract data, and refine and store the data for use

Looking at the digitization of IT, here are three capability areas to evaluate:

1. Enterprise observability to improve application performance monitoring and accelerate CI/CD pipelines

2. Application resource management to proactively deliver the most efficient compute, storage, and network resources to your applications

3. AI to proactively identify potential risks or outage warning signs across IT environments

Help increase ROI on data, AI and automation investments by making data and AI ethics a part of your culture


But process and people can’t be ignored. If you don’t properly infuse AI into a major process in an organization, there may be no real impact. You should consider infusing AI into supply chain procurement, marketing, sales, and finance processes, and adapting processes accordingly. And since people run the processes, data literacy is pivotal for data-driven organizations so they can both take advantage of and challenge the insights an AI system provides. If data users don’t agree with or understand how to interpret their options, they might not follow the process. This risk is particularly high when you consider the implications for cultivating a culture of data and AI ethics and for complying with data privacy standards.

Building a data-driven organization is a multifaceted undertaking spanning IT, leadership, and line-of-business functions. But the dividends are unmistakable. It sets the stage for enterprise-wide automation across lines of business and IT. It can give organizations a competitive edge in their ability to quickly identify opportunities for cost savings and growth, and even unlock new business models.

Source: ibm.com

Monday, 7 November 2022

IBM named a leader in the 2022 Gartner® Magic Quadrant™ for Data Quality Solutions


Data is the new oil, and organizations of all stripes are tapping this resource to fuel growth. However, data quality and consistency are among the top barriers faced by organizations in their quest to become more data-driven. So, it is imperative to have a clear data quality strategy that relies on proactive data quality management as data moves from producers to consumers.

Unlock quality data with IBM


We are excited to share that Gartner recently named IBM a Leader in the 2022 Gartner® Magic Quadrant™ for Data Quality Solutions.


We believe this is a testament to IBM’s vision to empower data professionals with trusted information through data quality capabilities including data cleansing, data lineage, data observability, and master data management.

IBM recently expanded its data quality capabilities with the acquisition of Databand.ai and its leading data observability offerings. This complements IBM’s partnership with MANTA, which integrates MANTA’s automated data lineage capabilities with IBM Watson Knowledge Catalog on Cloud Pak for Data.

Why does data quality matter across the data lifecycle?


Data quality issues can have far-reaching consequences across the lifecycle of data:

1. Analytics and AI

When a sophisticated AI/ML model confronts bad-quality data, it is the latter that usually wins. As organizations increasingly rely on AI/ML for critical business decisions, the role of a trusted data foundation that delivers high-quality data is paramount. So, it is important to provide data consumers with a curated set of high-quality data and allow them to search for relevant data through a well-defined data catalog.

2. Data Engineering

A research survey points out that data engineers spend two days per week firefighting bad data. This could be because many current data quality approaches are reactive, triggered only when data consumers complain about data quality. Once poor-quality data moves from data sources into downstream processes, remediating quality issues becomes much harder. A smarter approach is to plug data quality issues upstream through active monitoring and automated data cleansing at the source, and data observability capabilities make these upstream checks possible.
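To make this concrete, here is a minimal sketch of what an upstream data quality check might look like in Python with pandas. The file name, column names, and thresholds are illustrative assumptions, not part of any IBM offering; the point is simply that the checks run at the source, before the load proceeds.

```python
import pandas as pd

# Illustrative source extract; file, columns, and thresholds are assumptions.
orders = pd.read_csv("orders_extract.csv")

checks = {
    # Completeness: no more than 1% missing customer IDs
    "customer_id_completeness": orders["customer_id"].isna().mean() <= 0.01,
    # Uniqueness: order IDs must not be duplicated
    "order_id_uniqueness": not orders["order_id"].duplicated().any(),
    # Validity: order amounts must fall in a plausible range
    "amount_validity": orders["amount"].between(0, 1_000_000).all(),
    # Freshness: newest record no older than two days
    "freshness": (pd.Timestamp.now() - pd.to_datetime(orders["order_date"]).max()).days <= 2,
}

failed = [name for name, passed in checks.items() if not passed]
if failed:
    # Stop the load and alert before bad data reaches downstream consumers
    raise ValueError(f"Data quality checks failed at source: {failed}")
```

In practice, an observability tool would track these metrics over time and alert on drift, rather than relying on hard-coded thresholds in a script.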

3. Data Governance

Ensuring data quality is critical for data governance initiatives. Increasingly, enterprise data is spread across multiple environments, which contributes to inconsistent data silos that complicate governance initiatives and create data integrity issues affecting business intelligence and analytics applications. Promoting a common business language across the enterprise is critical to breaking down these silos. One effective way to identify bad-quality data before it flows into downstream processes is to use active metadata, which fosters greater understanding of and trust in data and ensures that only high-quality data reaches data consumers. Equally important is the ability to understand data lineage by tracking the flow of data back to its source, which proves invaluable when remediating data quality issues.

IBM’s holistic approach to Data Quality


With a strong end-to-end data management experience combined with innovation in metadata and AI-driven automation, IBM differentiates itself by offering integrated quality and governance capabilities.

IBM Watson Knowledge Catalog, QualityStage, and Match360 services on Cloud Pak for Data offer a composable data quality solution with an easy way to start small and expand your data quality program across the full enterprise data ecosystem. Watson Knowledge Catalog serves as an automated, metadata-driven foundation that assigns data quality scores to assets and improves curation through automated data quality rules. The solution offers out-of-the-box automation rules to simplify addressing data quality issues.
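Watson Knowledge Catalog computes its scores internally, so the snippet below is not its actual method. It is only a generic sketch of the idea, assuming a quality score can be expressed as a weighted pass rate across data quality rules; the rule names, row counts, and weights are invented.

```python
# Hypothetical rule results for one data asset: each rule reports how many
# rows it checked and how many passed. Weights reflect business importance.
rule_results = {
    "not_null_customer_id": {"rows": 10_000, "passed": 9_950, "weight": 3},
    "valid_country_code":   {"rows": 10_000, "passed": 9_700, "weight": 2},
    "unique_order_id":      {"rows": 10_000, "passed": 10_000, "weight": 3},
}

def quality_score(results):
    """Weighted average pass rate across rules, on a 0-100 scale."""
    total_weight = sum(r["weight"] for r in results.values())
    weighted = sum(r["weight"] * r["passed"] / r["rows"] for r in results.values())
    return round(100 * weighted / total_weight, 1)

print(quality_score(rule_results))  # -> 99.1
```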

With the recent acquisition of Databand.ai, a leading provider of data observability solutions, IBM can elevate traditional DataOps by using historical trends to compute statistics about data workloads and data pipelines directly at the source, determining whether they are working and pinpointing where problems may exist. IBM’s partnership with MANTA for automated data lineage further strengthens its ability to help clients find, track, and prevent issues closer to the source, enabling a more streamlined operational approach to managing data.

IBM offers a wide range of capabilities necessary for end-to-end data quality management including data profiling (both at rest and in-flight), data cleansing, data monitoring, data matching (discovering duplicated records or linking master records), and data enrichment to ensure data consumers have access to high-quality data.
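As a rough illustration of the data matching idea, the sketch below flags potential duplicate customer records using simple normalization and a string similarity ratio from Python’s standard library. This is not how Match360 works; the records, fields, and threshold are invented, and a production matching engine would add blocking, per-attribute weighting, and survivorship rules.

```python
from difflib import SequenceMatcher

def normalize(record):
    """Lowercase, whitespace-trimmed concatenation of name and city."""
    return " ".join(str(record[k]).strip().lower() for k in ("name", "city"))

def similarity(a, b):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

customers = [
    {"id": 1, "name": "Acme Corp.",  "city": "New York"},
    {"id": 2, "name": "ACME Corp",   "city": "New York"},
    {"id": 3, "name": "Globex Inc.", "city": "Boston"},
]

# Pairwise comparison; a real matching engine would block/index records first
# to avoid comparing every pair.
threshold = 0.9
for i, a in enumerate(customers):
    for b in customers[i + 1:]:
        if similarity(a, b) >= threshold:
            print(f"Possible duplicate: {a['id']} and {b['id']}")
```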

Gartner does not endorse any vendor, product or service depicted in its research publications and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose. GARTNER and Magic Quadrant are registered trademarks and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and are used herein with permission. All rights reserved.

Source: ibm.com

Tuesday, 11 October 2022

How IBM Planning Analytics can help fix your supply chain



IBM Planning Analytics, or TM1 as it used to be known, has always been a powerful upgrade from spreadsheets for all kinds of planning and reporting use cases, including financial planning and analysis (FP&A), sales & operations planning (S&OP), and many aspects of supply chain planning (SCP). As far back as the 1990s and early 2000s, there were companies, like the one discussed in this podcast episode, that took advantage of TM1’s power to support full integration of their financial and supply chain planning processes.

Build planning models to improve supply chain management


The challenge faced by every company is matching supply with demand. In a perfect world you would know precisely how much of your product the market desires, and you would be able to produce and ship exactly that amount to every location where your customers would be waiting, ready to buy.

In lieu of a perfect world, what do you do? You plan. Plans help you explore the consequences of your decisions in advance so you can understand your hedging options: Do I build up inventory here? Do I need to find new suppliers there? Do I have enough cash to fund these investments while also covering day-to-day operations?

You also build planning models to capture relationships and constraints so that you can change your driver assumptions and immediately see the impact on resources and capacity over time. Having the ability to build and use models in this way is fundamental to managing supply chain and financial risk through activities like “what-if scenario planning”, as explained in this blog post. Time matters too: your models must be quick to run, so analysis can be done before the assumptions are out-of-date. As such, planning becomes a continuous rolling activity as the lines between “plan”, “budget” and “forecast” are blurred.
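As a toy illustration of driver-based what-if modeling, the Python sketch below recomputes a handful of outcomes whenever a driver changes. All drivers, relationships, and numbers are invented for this example; in IBM Planning Analytics the equivalent logic would live in cubes and rules rather than a function.

```python
def plan(units_demanded, units_per_worker, cost_per_worker,
         unit_material_cost, price, capacity_units):
    """Toy driver-based plan: change any driver and recompute the outcome."""
    units_sold = min(units_demanded, capacity_units)
    workers_needed = -(-units_sold // units_per_worker)  # ceiling division
    revenue = units_sold * price
    costs = workers_needed * cost_per_worker + units_sold * unit_material_cost
    return {
        "units_sold": units_sold,
        "workers_needed": workers_needed,
        "revenue": revenue,
        "margin": revenue - costs,
        "unmet_demand": units_demanded - units_sold,
    }

# Baseline vs. a what-if scenario with 20% more production capacity
baseline = plan(12_000, 100, 4_000, 25, 80, capacity_units=10_000)
what_if = plan(12_000, 100, 4_000, 25, 80, capacity_units=12_000)
print(baseline["margin"], "->", what_if["margin"], "| unmet:", what_if["unmet_demand"])
```

Raising the capacity driver by 20% immediately shows the knock-on effect on workforce, margin, and unmet demand, which is exactly the kind of rapid feedback loop described above.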

Since there are clear cross-functional correlations between demand and sales, and between supply costs and cost of goods sold, it’s not hard to argue for integrating supply chain and financial planning models across the Extended Planning and Analysis (xP&A) cycle. However, the reality is complicated by several factors (the first two of which are illustrated in the sketch after this list), including:

◉ Differing time horizons and cadences: Days/Weeks vs. Months/Quarters

◉ Differing levels of detail: SKUs/Products vs. Product Groups/Lines of Business

◉ The need to collaborate, share data, and agree on definitions across organizational boundaries and systems
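To make the first two complications concrete, here is a small pandas sketch that rolls a hypothetical SKU-level, weekly demand plan up to the monthly, product-group view used on the finance side. The column names and figures are invented, and rolling up by week start date is a deliberate simplification of how weeks that straddle two months would really be handled.

```python
import pandas as pd

# Hypothetical SKU-level weekly demand plan (supply chain granularity)
weekly = pd.DataFrame({
    "week_start": pd.to_datetime(["2022-09-05", "2022-09-12", "2022-10-03", "2022-10-10"]),
    "sku": ["A-100", "A-100", "A-200", "B-300"],
    "product_group": ["Widgets", "Widgets", "Widgets", "Gadgets"],
    "units": [500, 450, 610, 300],
})

# Roll up to the monthly, product-group view used in financial planning
monthly = (
    weekly
    .groupby([pd.Grouper(key="week_start", freq="MS"), "product_group"])["units"]
    .sum()
    .rename_axis(["month", "product_group"])
    .reset_index()
)
print(monthly)
```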

Choosing the right technology to support xP&A for your strategic goals


A growing number of forward-looking companies are successfully navigating these complexities using IBM Planning Analytics, a technology capable of supporting secure collaboration, fast automated data acquisition, driver-based and AI-powered predictive modeling, and, unique in the market, the handling of large amounts of detail at scale without sacrificing performance.

With the right technology foundation in place, it becomes easier to tackle the business alignment questions, starting with designing an end-to-end integrated business planning process that will lead efficiently to a consensus forecast (or plan).

The first step is always the unconstrained demand plan.

Even when supply constraints seem overwhelming, it’s still important to have this view, so you can take action to overcome the constraints in the future. Depending on the patterns of your business, predictive models can play a significant role in improving the accuracy of your demand plan, while also saving time through automation, as experienced by Arthrex, a global medical device company.

The next step is to start layering on constraints.

In a manufacturing, distribution or retail context, this is the supply plan. The supply plan is typically anchored in capacity and can combine manufacturing capacity, supply capacity and labor capacity.
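Here is a minimal sketch of what “layering on constraints” can look like, with invented monthly numbers: the supply plan is bounded by the tightest of the three capacities, and constrained demand and shortfall follow from it.

```python
# Hypothetical monthly figures; all numbers are invented for illustration.
unconstrained_demand = {"Oct": 11_000, "Nov": 13_500, "Dec": 16_000}
manufacturing_capacity = {"Oct": 12_000, "Nov": 12_000, "Dec": 12_000}
supplier_capacity = {"Oct": 14_000, "Nov": 11_000, "Dec": 15_000}
labor_capacity = {"Oct": 13_000, "Nov": 12_500, "Dec": 11_000}

for month in unconstrained_demand:
    # The supply plan is bounded by the tightest of the three capacities
    supply = min(manufacturing_capacity[month],
                 supplier_capacity[month],
                 labor_capacity[month])
    constrained = min(unconstrained_demand[month], supply)
    shortfall = unconstrained_demand[month] - constrained
    print(f"{month}: supply plan {supply}, constrained demand {constrained}, "
          f"shortfall {shortfall}")
```

In this made-up example, November and December show shortfalls of 2,500 and 5,000 units, which is precisely the kind of signal the dashboard view described next is meant to surface.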

Then, everything comes together.

With everything in the IBM Planning Analytics dashboard, it’s now possible to see where and when capacity shortfalls (or excesses) are imminent and explore options for mitigating those situations in accordance with strategic goals.

IBM Planning Analytics can help your teams modify assumptions such as production capacity and labor allocation across a variety of scenarios in real time, and immediately see the impact on all related metrics including constrained demand, inventory, sales, costs, and cash. QueBIT’s webinar includes a demonstration with IBM Planning Analytics of the interplay between all these components, beginning with the demand plan and ending with the impact on financial statements. You can also find a more nuanced explanation of the relationship between supply chain decisions and financial KPIs here.

I also encourage you to join the IBM Business Analytics live stream event on October 25th to hear more case studies on how businesses have used Planning Analytics to accelerate data-driven business decision making.

Source: ibm.com