Saturday 29 June 2024

Applying generative AI to revolutionize telco network operations


Generative AI is shaping the future of telecommunications network operations. The potential applications for enhancing network operations include predicting the values of key performance indicators (KPIs), forecasting traffic congestion, enabling the move to prescriptive analytics, providing design advisory services and acting as network operations center (NOC) assistants.

In addition to these capabilities, generative AI can revolutionize drive tests, optimize network resource allocation, automate fault detection, optimize truck rolls and enhance customer experience through personalized services. Operators and suppliers are already identifying and capitalizing on these opportunities.

Nevertheless, challenges persist in the speed of implementing generative AI-supported use cases, as well as avoiding siloed implementations that impede comprehensive scaling and hinder the optimization of return on investment.

In a previous blog, we presented the three-layered model for efficient network operations. The main challenges in the context of applying generative AI across these layers are: 

  • Data layer: Generative AI initiatives are data projects at their core, with inadequate data comprehension being one of the primary complexities. In telco, network data is often vendor-specific, which makes it hard to understand and consume efficiently. It is also scattered across multiple operational support system (OSS) tools, complicating efforts to obtain a unified view of the network. 
  • Analytics layer: Foundation models have different capabilities and applications for different use cases. The perfect foundation model does not exist because a single model cannot uniformly address identical use cases across different operators. This complexity arises from the diverse requirements and unique challenges that each network presents, including variations in network architecture, operational priorities and data landscapes. This layer hosts a variety of analytics, including traditional AI and machine learning models, large language models and highly customized foundation models tailored for the operator. 
  • Automation layer: Foundation models excel at tasks such as summarization, regression and classification, but they are not stand-alone solutions for optimization. While foundation models can suggest various strategies to proactively address predicted issues, they cannot identify the absolute best strategy. To evaluate the correctness and impact of each strategy and to recommend the optimal one, we require advanced simulation frameworks. Foundation models can support this process but cannot replace it. 

Essential generative AI considerations across the three layers


Instead of providing an exhaustive list of use cases or detailed framework specifics, we will highlight key principles and strategies. These focus on effectively integrating generative AI into telco network operations across the three layers, as illustrated in Figure 1.

Figure 1 - Generative AI in the three-layered model for future network operations

We aim to emphasize the importance of robust data management, tailored analytics and advanced automation techniques that collectively enhance network operations, performance and reliability. 

1. Data layer: optimizing telco network data using generative AI 


Understanding network data is the starting point for any generative AI solution in telco. However, each vendor in the telecom environment has unique counters, with specific names and value ranges, which makes it difficult to understand data. Moreover, the telco landscape often features multiple vendors, adding to the complexity. Gaining expertise in these vendor-specific details requires specialized knowledge, which is not always readily available. Without a clear understanding of the data they possess, telecom companies cannot effectively build and deploy generative AI use cases. 

We have seen that retrieval-augmented generation (RAG)-based architectures can be highly effective in addressing this challenge. Based on our experience from proof-of-concept (PoC) projects with clients, here are the best ways to leverage generative AI in the data layer: 

  • Understanding vendor data: Generative AI can process extensive vendor documentation to extract critical information about individual parameters. Engineers can interact with the AI using natural language queries, receiving instant, precise responses. This eliminates the need to manually browse through complex and voluminous vendor documentation, saving significant time and effort. (A minimal retrieval sketch follows this list.)
  • Building knowledge graphs: Generative AI can automatically build comprehensive knowledge graphs by understanding the intricate data models of different vendors. These knowledge graphs represent data entities and their relationships, providing a structured and interconnected view of the vendor ecosystem. This aids in better data integration and utilization in the upper layers. 
  • Data model translation: With an in-depth understanding of different vendors’ data models, generative AI can translate data from one vendor’s model to another. This capability is crucial for telecom companies that need to harmonize data across diverse systems and vendors, ensuring consistency and compatibility. 
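
To make this concrete, here is a minimal, hypothetical sketch of the retrieval step behind such a RAG workflow in the data layer. The counter names, document snippets and the `call_llm` placeholder are illustrative assumptions rather than real vendor data or a specific product API; a production system would typically use embedding models and a vector database instead of the simple TF-IDF scoring shown here.

```python
# Minimal RAG-style retrieval over vendor documentation (illustrative only).
# The in-memory snippets stand in for real vendor manuals, and `call_llm`
# is a placeholder for whichever LLM endpoint an operator uses.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

VENDOR_DOC_CHUNKS = [
    "Counter pmRrcConnEstabSucc: number of successful RRC connection establishments per cell.",
    "Counter pmPrbUtilDl: average downlink PRB utilization, reported as a percentage (0-100).",
    "Parameter qRxLevMin: minimum required RX level in the cell, range -140 to -44 dBm.",
]

def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the question (TF-IDF cosine)."""
    vectorizer = TfidfVectorizer().fit(chunks + [question])
    doc_vecs = vectorizer.transform(chunks)
    q_vec = vectorizer.transform([question])
    scores = cosine_similarity(q_vec, doc_vecs)[0]
    ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    return [chunks[i] for i in ranked[:top_k]]

def build_prompt(question: str, context: list[str]) -> str:
    """Ground the LLM on retrieved vendor documentation before it answers."""
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the engineer's question using only the vendor documentation below.\n"
        f"Documentation:\n{context_block}\n\nQuestion: {question}\nAnswer:"
    )

question = "What does pmPrbUtilDl measure and what is its value range?"
prompt = build_prompt(question, retrieve(question, VENDOR_DOC_CHUNKS))
# response = call_llm(prompt)  # hypothetical: send the grounded prompt to the LLM
print(prompt)
```

The same retrieve-then-prompt pattern underpins the knowledge-graph and data-model-translation use cases above: the model answers only from the vendor context it is given.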

Automating the understanding of vendor-specific data, generating metadata, constructing detailed knowledge graphs and facilitating seamless data model translation are key processes. Together, these processes, supported by a data layer with a RAG-based architecture, enable telecom companies to harness the full potential of their data. 

2. Analytics layer: harnessing diverse models for network insights 


At a high level, we can split network analytics use cases into two categories: those that focus on understanding the past and current network state, and those that predict the future network state. 

For the first category, which involves advanced data correlations and creating insights about the past and current network state, operators can leverage large language models (LLMs) such as Granite™, Llama, GPT, Mistral and others. Although these LLMs were not specifically trained on structured operator data, we can use them effectively in combination with multi-shot prompting. This approach brings additional knowledge and context to operator data interpretation. 
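
As an illustration of this approach, the sketch below assembles a multi-shot prompt from a few labeled KPI interpretations before asking the model to assess a new record. The KPI fields, thresholds and example assessments are invented for demonstration, and `call_llm` again stands in for whichever LLM endpoint an operator uses.

```python
# Illustrative multi-shot prompt for interpreting operator KPI records.
# The KPI rows and assessments are made-up examples; `call_llm` is a
# placeholder for the chosen LLM (Granite, Llama, GPT, Mistral, ...).
EXAMPLES = [
    ("cell=A1, prb_util_dl=92%, rrc_drop_rate=2.4%",
     "The cell is congested in the downlink and the drop rate is elevated; investigate capacity."),
    ("cell=B7, prb_util_dl=35%, rrc_drop_rate=0.2%",
     "The cell is lightly loaded and stable; no action needed."),
]

def build_multishot_prompt(new_record: str) -> str:
    """Prepend labeled examples so the LLM learns the expected interpretation style."""
    shots = "\n\n".join(
        f"Record: {rec}\nAssessment: {assessment}" for rec, assessment in EXAMPLES
    )
    return (
        "You are a network operations assistant. Assess each KPI record.\n\n"
        f"{shots}\n\nRecord: {new_record}\nAssessment:"
    )

prompt = build_multishot_prompt("cell=C3, prb_util_dl=88%, rrc_drop_rate=1.9%")
# assessment = call_llm(prompt)  # hypothetical LLM call
print(prompt)
```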

For the second category, which focuses on predicting the future network state, such as anticipating network failures and forecasting traffic loads, operators cannot rely on generic LLMs. This is because these models lack the necessary training to work with network-specific structured and semi-structured data. Instead, operators need foundation models specifically tailored to their unique data and operational characteristics. To accurately forecast future network behavior, we must train these models on the specific patterns and trends unique to the operator, such as historical performance data, incident reports and configuration changes. 

To implement specialized foundation models, network operators should collaborate closely with AI technology providers. Establishing a continuous feedback loop is essential, wherein you regularly monitor model performance and use the data to iteratively improve the model. Additionally, hybrid approaches that combine multiple models, each specializing in different aspects of network analytics, can enhance overall performance and reliability. Finally, incorporating human expertise to validate and fine-tune the model’s outputs can further improve accuracy and build trust in the system. 

3. Automation layer: integrating generative AI and network simulations for optimal solutions 


This layer is responsible for determining and enforcing optimal actions based on insights from the analytics layer, such as future network state predictions, as well as network operational instructions or intents from the operations team. 

There is a common misconception that generative AI handles optimization tasks and can determine the optimal response to predicted network states. However, for use cases of optimal action determination, the automation layer must integrate network simulation tools. This integration enables detailed simulations of all potential optimization actions using a digital network twin (a virtual replica of the network). These simulations create a controlled environment for testing different scenarios without affecting the live network.

By leveraging these simulations, operators can compare and analyze outcomes to identify the actions that best meet optimization goals. It is worth highlighting that simulations often leverage specialized foundation models from the analytics layer, like masked language models. These models allow manipulating parameters and evaluating their impact on specific masked parameters within the network context. 
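
The following sketch shows the shape of that decision step: a set of candidate optimization actions is scored against a digital-twin simulation of the predicted network state, and the best-scoring action is recommended. The action names, the predicted state and the `simulate_on_twin` scoring rule are all invented placeholders for a real simulation framework.

```python
# Minimal sketch of automation-layer action selection against a digital twin.
# `simulate_on_twin` stands in for a real network simulation framework; here
# it applies a made-up scoring rule so the example runs end to end.
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    params: dict

CANDIDATE_ACTIONS = [
    Action("increase_tilt", {"cell": "A1", "delta_deg": 2}),
    Action("activate_carrier", {"cell": "A1", "carrier": "n78"}),
    Action("load_balance", {"source": "A1", "target": "A2"}),
]

def simulate_on_twin(action: Action, predicted_state: dict) -> float:
    """Return a predicted congestion score for the action (lower is better).

    A real implementation would run the digital network twin, possibly backed
    by a masked-language-style foundation model; this toy rule is illustrative.
    """
    base = predicted_state["prb_util_dl"]
    relief = {"increase_tilt": 5, "activate_carrier": 20, "load_balance": 12}
    return base - relief.get(action.name, 0)

predicted_state = {"cell": "A1", "prb_util_dl": 93}
best = min(CANDIDATE_ACTIONS, key=lambda a: simulate_on_twin(a, predicted_state))
print(f"Recommended action: {best.name} with {best.params}")
```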

The automation layer leverages another set of use cases for generative AI, namely the automated generation of scripts for action execution. These actions, triggered by network insights or human-provided intents, require tailored scripts to update network elements accordingly. Traditionally, this process has been manual within telcos, but with advancements in generative AI, there’s potential for automatic script generation. Architectures that pair generic LLMs with RAG show good performance in this context, provided operators ensure access to vendor documentation and suitable methods of procedure (MOPs). 

Generative AI plays a significant role in future telco operations, from predicting KPIs to responding to network insights and user intents. However, addressing challenges such as efficient data comprehension, specialized predictive analytics and automated network optimization is crucial. IBM has hands-on experience in each of these areas, offering solutions for efficient data integration, specialized foundation models and automated network optimization tools.

Source: ibm.com

Friday 28 June 2024

Best practices for augmenting human intelligence with AI


Artificial intelligence (AI) should be designed to include and balance human oversight, agency and accountability over decisions across the AI lifecycle. IBM’s first Principle for Trust and Transparency states that the purpose of AI is to augment human intelligence. Augmented human intelligence means that the use of AI enhances human intelligence, rather than operating independently of, or replacing it. All of this implies that AI systems are not to be treated as human beings, but rather viewed as support mechanisms that can enhance human intelligence and potential.

AI that augments human intelligence maintains human responsibility for decisions, even when supported by an AI system. Humans therefore need to be upskilled—not deskilled—by interacting with an AI system. Supporting inclusive and equitable access to AI technology and comprehensive employee training and potential reskilling further supports the tenets of IBM’s Pillars of Trustworthy AI, enabling participation in the AI-driven economy to be underpinned by fairness, transparency, explainability, robustness and privacy. 

To put the principle of augmenting human intelligence into practice, we recommend the following best practices:

  1. Use AI to augment human intelligence, rather than operating independently of, or replacing it.
  2. In a human-AI interaction, notify individuals that they are interacting with an AI system, and not a human being.
  3. Design human-AI interactions to include and balance human oversight across the AI lifecycle. Address biases and promote human accountability and agency over outcomes by AI systems.
  4. Develop policies and practices to foster inclusive and equitable access to AI technology, enabling a broad range of individuals to participate in the AI-driven economy.
  5. Provide comprehensive employee training and reskilling programs to foster a diverse workforce that can adapt to the use of AI and share in the advantages of AI-driven innovations. Collaborate with HR to augment each employee’s scope of work.

For more information on standards and regulatory perspectives on human oversight, research, AI Decision Coordination, sample use cases and Key Performance Indicators, see our Augmenting Human Intelligence POV and KPIs below.

Source: ibm.com

Thursday 27 June 2024

Top 7 risks to your identity security posture


Detecting and remediating identity misconfigurations and blind spots is critical to an organization’s identity security posture, especially as identity has become the new perimeter and a key pillar of an identity fabric. Let’s explore what identity blind spots and misconfigurations are, detail why finding them is essential, and lay out the top seven to avoid.

What are the most critical risks to identity security? Identity misconfigurations and identity blind spots stand out as critical concerns that undermine an organization’s identity security posture.

An identity misconfiguration occurs when identity infrastructure and systems are not configured correctly. This can result from administrative error, or from configuration drift, which is the gradual divergence of an organization’s identity and access controls from their intended state, often due to unsanctioned changes or updates.

Identity blind spots are risks that are overlooked or not monitored by an organization’s existing identity controls, leaving undetected risks that threat actors might exploit.

Why is finding these risks important?


Traditionally, security measures focus on fortifying an organization’s network perimeter by building higher “walls” around its IT resources. However, the network perimeter has become less relevant with the adoption of cloud computing, SaaS services and hybrid work. In this new landscape, full visibility and control of the activities of both human and machine identities is crucial for mitigating cyberthreats.

Both research and real-world incidents where a compromised identity served as the attacker’s initial entry point validate the need to secure identities. The Identity Defined Security Alliance’s most recent research found that 90% of organizations surveyed have experienced at least one identity-based attack in the past year.

Meanwhile, the latest Threat Intelligence Index Report validated what many of us in the industry already knew: Identity has become the leading attack vector. The 2024 report showed a 71% increase in valid identities used in cyberattacks year-over-year. Organizations are just as likely to have a valid identity used in a cyberattack as they are to see a phishing attack. This is despite significant investments in infrastructure security and identity access and management solutions. Hackers don’t hack in; they log in.

One notable recent example of an identity-based attack is the Midnight Blizzard attack disclosed in January 2024. Based on what has been published about the attack, the malicious actors carried out a password spray attack to compromise a legacy nonproduction test tenant account. Once they gained a foothold through this valid account, they used its permissions to access a small percentage of the company’s corporate email accounts, from which they could exfiltrate sensitive information, including emails and attached documents.

What are the top seven risks to an organization’s identity security posture to avoid?


To stay one step ahead of identity-related attacks, identity and security teams should proactively improve their identity security posture by finding and remediating these common identity misconfigurations and blind spots. These are the key risks organizations should take steps to avoid:

Missing multi-factor authentication (MFA)

The US Cybersecurity and Infrastructure Security Agency (CISA) consistently urges organizations to implement MFA for all users and all services to prevent unauthorized access. Yet, achieving this goal can prove challenging in the real world. The complexity lies in configuring multiple identity systems, such as an organization’s Identity Provider and MFA system, along with hundreds of applications’ settings, to enforce MFA for thousands of users and groups. When these are not configured correctly, MFA might not be enforced due to accidental omission or gaps in session management.

Password hygiene

Effective password hygiene is crucial to an organization’s identity security posture, but common identity misconfigurations frequently undermine password quality and increase the risk of data breaches. Allowing weak or commonly used passwords facilitates unauthorized access through simple guessing or brute force attacks.

Default passwords, even strong ones, can make password spray attacks easier. Using outdated password hash algorithms like SHA-1, MD4, MD5, RC2 or RC4, which can be cracked quickly, further exposes user credentials. Also, inadequate salting of passwords weakens their defense against dictionary and rainbow table attacks, making them easier to compromise.
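
For contrast, the sketch below shows what modern password storage looks like: a unique random salt per user and a deliberately slow key-derivation function (scrypt from the Python standard library) instead of fast, outdated hashes such as MD5 or SHA-1. The cost parameters shown are illustrative; production values should follow current guidance for your environment.

```python
# Minimal sketch of per-user salted password hashing with a modern KDF
# (scrypt from the Python standard library). Parameter choices are illustrative.
import hashlib
import hmac
import secrets

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, hash); store both, never the plaintext password."""
    salt = secrets.token_bytes(16)  # unique random salt per user
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest

def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate, stored)

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("password123", salt, stored))                   # False
```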

Bypass of critical identity and security systems

Organizations deploy Privileged Access Management (PAM) systems to control and monitor access to privileged accounts, such as domain administrator and admin-level application accounts. PAM systems provide an extra layer of security by storing the credentials to privileged accounts in a secure vault and brokering access to protected systems via a proxy server or bastion host.

Unfortunately, PAM controls can be bypassed by resourceful admins or threat actors if not configured correctly, significantly reducing the protection they should provide. A similar problem can occur when users bypass zero trust network access (ZTNA) systems due to initial configuration issues or configuration drift over time.

Shadow access

Shadow access is a common blind spot in an organization’s identity security posture that can be difficult for organizations to discover and correct. Shadow access is when a user retains unmanaged access via a local account to an application or service for convenience or to speed up troubleshooting. Local accounts typically rely on static credentials, lack proper documentation and are at higher risk of unauthorized access. A local account with high privileges such as a super admin account is especially problematic.

Shadow assets

Shadow assets are a subset of shadow IT and represent a significant blind spot in identity security. Shadow assets are applications or services within the network that are “unknown” to Active Directory or any other Identity Provider. This means that their existence and access are not documented or controlled by an organization’s identity systems, and these assets are only accessed by local accounts. Without integration into Active Directory or any other Identity Provider, these assets do not adhere to an organization’s established authentication and authorization frameworks. This makes enforcing security measures such as access controls, user authentication and compliance checks challenging. Therefore, shadow assets can inadvertently become gateways for unauthorized access.

Shadow identity systems

Shadow identity systems are unauthorized identity systems that might fall under shadow assets but are called out separately given the risk they pose to an organization’s identity security posture. The most common shadow identity system is the use of unapproved password managers.

Given the scope of their role, software development teams can take things further by implementing unsanctioned secret management tools to secure application credentials and even standing up their own Identity Providers. Another risky behavior is when developers duplicate Active Directory for testing or migration purposes but neglect proper disposal, exposing sensitive employee information, group policies and password hashes.

Forgotten service accounts

A service account is a type of machine identity that can perform various actions depending on its permissions. This might include running applications, automating services, managing virtual machine instances, making authorized API calls and accessing resources. When service accounts are no longer in active use but remain unmonitored with permissions intact, they become prime targets for exploitation. Attackers can use these forgotten service accounts to gain unauthorized access, potentially leading to data breaches, service disruptions and compromised systems, all under the radar of traditional identity security measures.
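
A simple way to surface such accounts is to review authentication activity across the inventory. The sketch below assumes a hypothetical CSV export with `account`, `last_auth` and `enabled` columns and flags enabled accounts that have not authenticated in 90 days; the field names and threshold are illustrative and not tied to any particular identity product.

```python
# Illustrative check for forgotten service accounts, based on a hypothetical
# CSV export of the account inventory. Columns and threshold are made up.
import csv
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=90)

def find_stale_service_accounts(path: str, today: datetime) -> list[str]:
    """Return enabled accounts that have not authenticated within STALE_AFTER."""
    stale = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            last_auth = datetime.fromisoformat(row["last_auth"])
            if row["enabled"] == "true" and today - last_auth > STALE_AFTER:
                stale.append(row["account"])  # enabled but unused: review it
    return stale

print(find_stale_service_accounts("service_accounts.csv", datetime.now()))
```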

Adopt identity security posture management (ISPM) to reduce risk


Identity and access management (IAM) systems such as Active Directory, Identity Providers and PAM typically offer limited capabilities to find the identity misconfigurations and blind spots that lead to a poor identity security posture. These identity solutions typically don’t collect the telemetry needed to identify such issues, which requires collecting and correlating data from multiple sources, including identity system log data, network traffic, cloud traffic and remote access logs.

That is why identity and security teams implement ISPM solutions such as IBM® Verify Identity Protection to discover and remediate identity exposures before an attacker can exploit them. IBM can help protect all your identities and identity fabric by using logs already in your security information and event management (SIEM) solutions or deploying IBM Verify Identity Protection sensors. IBM delivers fast time to value with unmatched visibility into identity activities in the first hours after deployment.

Source: ibm.com

Tuesday 25 June 2024

Speed, scale and trustworthy AI on IBM Z with Machine Learning for IBM z/OS v3.2


Recent years have seen a remarkable surge in AI adoption, with businesses doubling down. According to the IBM® Global AI Adoption Index, about 42% of enterprise-scale companies surveyed (more than 1,000 employees) report having actively deployed AI in their business, and 59% of the surveyed companies that are already exploring or deploying AI say they have accelerated their rollout or investments in the technology. Yet, amid this surge, navigating the complexities of AI implementation, scalability issues and validating the trustworthiness of AI remain significant challenges for companies.

A robust and scalable environment is crucial to accelerating client adoption of AI. It must be capable of converting ambitious AI use cases into reality while enabling real-time AI insights to be generated with trust and transparency.  

What is Machine Learning for IBM z/OS? 


Machine Learning for IBM® z/OS® is an AI platform tailor-made for IBM z/OS environments. It combines data and transaction gravity with AI infusion for accelerated insights at scale with trust and transparency. It helps clients manage their full AI model lifecycles, enabling quick deployment co-located with their mission-critical applications on IBM Z without data movement and minimal application changes. Features include explainability, drift detection, train-anywhere capabilities and developer-friendly APIs. 

Machine Learning for IBM z/OS use cases


Machine Learning for IBM z/OS can serve various transactional use cases on IBM z/OS. Top use cases include:

1. Real-time fraud detection in credit cards and payments: Large financial institutions are increasingly experiencing more losses due to fraud. With off-platform solutions, they were only able to screen a small subset of their transactions. In support of this use case, the IBM z16™ system can process up to 228 thousand z/OS CICS credit card transactions per second with 6 ms response time, each with an in-transaction fraud detection inference operation using a Deep Learning Model.

Performance result is extrapolated from IBM internal tests running a CICS credit card transaction workload with inference operations on IBM z16. A z/OS V2R4 logical partition (LPAR) configured with 6 CPs and 256 GB of memory was used. Inferencing was done with Machine Learning for IBM z/OS running on Websphere Application Server Liberty 21.0.0.12, using a synthetic credit card fraud detection model and the IBM Integrated Accelerator for AI. Server-side batching was enabled on Machine Learning for IBM z/OS with a size of 8 inference operations. The benchmark was run with 48 threads performing inference operations. Results represent a fully configured IBM z16 with 200 CPs and 40 TB storage. Results might vary. 

2. Clearing and settlement: A card processor explored using AI to assist in determining which trades and transactions have a high-risk exposure before settlement to reduce liability, chargebacks and costly investigation. In support of this use case, IBM has validated that the IBM z16 with Machine Learning for IBM z/OS is designed to score business transactions at scale delivering the capacity to process up to 300 billion deep inferencing requests per day with 1 ms of latency.

Performance result is extrapolated from IBM internal tests running local inference operations in an IBM z16 LPAR with 48 IFLs and 128 GB memory on Ubuntu 20.04 (SMT mode) using a synthetic credit card fraud detection model exploiting the Integrated Accelerator for AI. The benchmark was running with 8 parallel threads, each pinned to the first core of a different chip. The lscpu command was used to identify the core-chip topology. A batch size of 128 inference operations was used. Results were also reproduced using a z/OS V2R4 LPAR with 24 CPs and 256 GB memory on IBM z16. The same credit card fraud detection model was used. The benchmark was run with a single thread performing inference operations. A batch size of 128 inference operations was used. Results might vary. 
 
3. Anti-money laundering: A bank was exploring how to introduce AML screening into their instant payments operational flow. Their current end-day AML screening was no longer sufficient due to stricter regulations. In support of this use case, IBM has demonstrated that the IBM z16 with z/OS delivers up to 20x lower response time and up to 19x higher throughput when colocating applications and inferencing requests versus sending the same inferencing requests to a compared x86 server in the same data center with 60 ms average network latency.

Performance results based on IBM internal tests using a CICS OLTP credit card workload with in-transaction fraud detection. A synthetic credit card fraud detection model was used. On IBM z16, inferencing was done with MLz on zCX. Tensorflow Serving was used on the compared x86 server. A Linux on IBM Z LPAR, located on the same IBM z16, was used to bridge the network connection between the measured z/OS LPAR and the x86 server. Additional network latency was introduced with the Linux “tc-netem” command to simulate a network environment with 5 ms average latency. Measured improvements are due to network latency. Results might vary. IBM z16 configuration: Measurements were run using a z/OS (v2R4) LPAR with MLz (OSCE) and zCX with APAR- oa61559 and APAR- OA62310 applied, 8 CPs, 16 zIIPs and 8 GB of memory. x86 configuration: Tensorflow Serving 2.4 ran on Ubuntu 20.04.3 LTS on 8 Skylake Intel® Xeon® Gold CPUs @ 2.30 GHz with Hyperthreading turned on, 1.5 TB memory, RAID5 local SSD Storage.  

Machine Learning for IBM z/OS with IBM Z can also be used as a security-focused on-prem AI platform for other use cases where clients want to promote data integrity, privacy and application availability. The IBM z16 systems, with GDPS®, IBM DS8000® series storage with HyperSwap® and running a Red Hat® OpenShift® Container Platform environment, are designed to deliver 99.99999% availability.

Necessary components include IBM z16; IBM z/VM V7.2 systems or above collected in a Single System Image, each running RHOCP 4.10 or above; IBM Operations Manager; GDPS 4.5 for management of data recovery and virtual machine recovery across metro distance systems and storage, including Metro Multisite workload and GDPS Global; and IBM DS8000 series storage with IBM HyperSwap. A MongoDB v4.2 workload was used. Necessary resiliency technology must be enabled, including z/VM Single System Image clustering, GDPS xDR Proxy for z/VM and Red Hat OpenShift Data Foundation (ODF) 4.10 for management of local storage devices. Application-induced outages are not included in the preceding measurements. Results might vary. Other configurations (hardware or software) might provide different availability characteristics. 

Source: ibm.com

Saturday 22 June 2024

How IBM and AWS are partnering to deliver the promise of responsible AI


The artificial intelligence (AI) governance market is experiencing rapid growth, with the worldwide AI software market projected to expand from USD 64 billion in 2022 to nearly USD 251 billion by 2027, reflecting a compound annual growth rate (CAGR) of 31.4% (IDC). This growth underscores the escalating need for robust governance frameworks that ensure AI systems are transparent, fair and comply with increasing regulatory demands. In this expanding market, IBM and Amazon Web Services (AWS) have strategically partnered to address the growing demand from customers for effective AI governance solutions.

A robust framework for AI governance


The combination of IBM watsonx.governance™ and Amazon SageMaker offers a potent suite of governance, risk management and compliance capabilities that streamline the AI model lifecycle. This integration helps organizations manage model risks, adhere to compliance obligations and optimize operational efficiencies. It provides seamless workflows that automate risk assessments and model approval processes, simplifying regulatory compliance.

IBM has broadened its watsonx™ portfolio on AWS to include watsonx.governance™, providing tools essential for managing AI risks and ensuring compliance with global regulations. This integration facilitates a unified approach to AI model development and governance processes, enhancing workflow streamlining, AI lifecycle acceleration and accountability in AI deployments.

Adhering to the EU AI Act


The partnership between IBM and Amazon is particularly crucial in light of the EU AI Act, which mandates strict compliance requirements for AI systems used within the European Union. The integration of watsonx.governance with Amazon SageMaker equips businesses to meet these regulatory demands head-on. It provides tools for real-time compliance monitoring, risk assessment and management specific to the requirements of the EU AI Act. This ensures that AI systems are efficient and aligned with the highest standards of legal and ethical considerations in one of the world’s most stringent regulatory environments.

Addressing key use cases with integrated solutions


Compliance and regulatory adherence

Watsonx.governance provides tools that help organizations comply with international regulations such as the EU AI Act. This is particularly valuable for businesses operating in highly regulated industries like finance and healthcare, where AI models must adhere to strict regulatory standards.

In highly regulated industries like finance and healthcare, AI models must meet stringent standards. For example, in banking, watsonx.governance integrated with Amazon SageMaker ensures that AI models used for credit scoring and fraud detection comply with regulations like the Basel Accords and the Fair Credit Reporting Act. It automates compliance checks and maintains audit trails, enhancing regulatory adherence.

Risk management

By integrating with Amazon SageMaker, watsonx.governance allows businesses to implement robust risk management frameworks. This helps in identifying, assessing and mitigating risks associated with AI models throughout their lifecycle, from development to deployment.

In healthcare, where AI models predict patient outcomes or recommend treatments, it is crucial to manage the risks associated with inaccurate predictions. The integration allows for continuous monitoring and risk assessment protocols, helping healthcare providers quickly rectify models that show drift or bias, thus ensuring patient safety and regulatory compliance.

Model governance

Organizations can manage the entire lifecycle of their AI models with enhanced visibility and control. This includes monitoring model performance, ensuring data quality, tracking model versioning and maintaining audit trails for all activities.

In the retail sector, AI models used for inventory management and personalized marketing benefit from this integration. Watsonx.governance with Amazon SageMaker enables retailers to maintain a clear governance structure around these models, including version control and performance tracking, ensuring that all model updates undergo rigorous testing and approval before deployment.

Operational efficiency

The integration helps automate various governance processes, such as approval workflows for model deployment and risk assessments. This speeds up the time-to-market for AI solutions and reduces operational costs by minimizing the need for manual oversight.

In manufacturing, AI-driven predictive maintenance systems benefit from streamlined model updates and deployment processes. Watsonx.governance automates workflow approvals as new model versions are developed in Amazon SageMaker, reducing downtime and ensuring models operate at peak efficiency.

Data security and privacy

Ensuring the security and privacy of data used in AI models is crucial. Watsonx.governance helps enforce data governance policies that protect sensitive information and ensure compliance with data protection laws like the General Data Protection Regulation (GDPR).

For governmental bodies using AI in public services, data sensitivity is paramount. Integrating watsonx.governance with Amazon SageMaker ensures that AI models handle data according to strict government standards for data protection, including access controls, data encryption and auditability, aligning with laws like the GDPR.

Broadening the market with IBM’s software on AWS


IBM also offers a wide range of software products and consulting services through the AWS Marketplace. This includes 44 listings, 29 SaaS offerings and 15 services available across 92 countries, featuring a consumption-based license for Amazon Relational Database Service (RDS) for Db2®, which simplifies workload management and enables faster cloud provisioning.

Looking forward


As the AI landscape evolves, the partnership between IBM and Amazon SageMaker is poised to play a pivotal role in shaping responsible AI practices across industries. By setting new standards for ethical AI, this strategic collaboration enhances the capabilities of both organizations and serves as a model for integrating responsible AI practices into business operations.

Discover the future of AI and data management: Elevate your business with IBM’s data and AI solutions on AWS. Learn how our integrated technologies can drive innovation and efficiency in your operations. Explore detailed case studies, sign up for expert-led webinars and start your journey toward transformation with a free trial today. Embrace the power of IBM and AWS to harness the full potential of your data.

Source: ibm.com

Thursday 20 June 2024

The recipe for RAG: How cloud services enable generative AI outcomes across industries


According to research from IBM, about 42 percent of enterprises surveyed have AI in use in their businesses. Of all the use cases, many of us are now extremely familiar with natural language processing AI chatbots that can answer our questions and assist with tasks such as composing emails or essays. Yet even with widespread adoption of these chatbots, enterprises are still occasionally experiencing some challenges. For example, these chatbots can produce inconsistent results as they’re pulling from large data stores that might not be relevant to the query at hand.

Thankfully, retrieval-augmented generation (RAG) has emerged as a promising solution to ground large language models (LLMs) on the most accurate, up-to-date information. As an AI framework, RAG works to improve the quality of LLM-generated responses by grounding the model on sources of knowledge to supplement the LLM’s internal representation of information. IBM unveiled its new AI and data platform, watsonx, which offers RAG, back in May 2023.

In simple terms, leveraging RAG is like having the model take an open-book exam: you are asking the chatbot to respond to a question with all the relevant information readily available. But how does RAG operate at an infrastructure level? With a mixture of platform-as-a-service (PaaS) services, RAG can run successfully and with ease, enabling generative AI outcomes for organizations across industries using LLMs.

How PaaS services are critical to RAG


Enterprise-grade AI, including generative AI, requires a highly sustainable, compute- and data-intensive distributed infrastructure. While the AI is the key component of the RAG framework, other “ingredients” such as PaaS solutions are integral to the mix. These offerings, specifically serverless and storage offerings, operate diligently behind the scenes, enabling data to be processed and stored more easily, which provides increasingly accurate outputs from chatbots.

Serverless technology supports compute-intensive workloads, such as those brought forth by RAG, by managing and securing the infrastructure around them. This gives time back to developers, so they can concentrate on coding. Serverless enables developers to build and run application code without provisioning or managing servers or backend infrastructure.

If a developer is uploading data into an LLM or chatbot but is unsure of how to preprocess the data so it’s in the right format or filtered for specific data points, IBM Cloud Code Engine can do all this for them—easing the overall process of getting correct outputs from AI models. As a fully managed serverless platform, IBM Cloud Code Engine can scale the application with ease through automation capabilities that manage and secure the underlying infrastructure.
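
As an illustration, the sketch below is the kind of small, self-contained preprocessing step that could be packaged into a container image and run as a serverless job. The input format, field names and filtering rules are invented for the example; the point is that the cleanup logic stays simple while the platform handles provisioning and scaling.

```python
# Illustrative preprocessing step of the kind that could be packaged as a
# container image and run as a serverless job. The JSON Lines input format,
# field names and filter rules are invented for this example.
import json
import sys

REQUIRED_FIELDS = {"doc_id", "title", "body"}

def clean_record(raw: dict) -> dict | None:
    """Keep only well-formed records, trimmed to the fields the LLM needs."""
    if not REQUIRED_FIELDS.issubset(raw):
        return None  # drop records missing required fields
    body = " ".join(raw["body"].split())  # collapse whitespace
    if len(body) < 50:
        return None  # drop fragments too short to be useful context
    return {"doc_id": raw["doc_id"], "title": raw["title"].strip(), "body": body}

def main(in_path: str, out_path: str) -> None:
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            cleaned = clean_record(json.loads(line))
            if cleaned is not None:
                dst.write(json.dumps(cleaned) + "\n")

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])  # e.g. python preprocess.py raw.jsonl clean.jsonl
```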

Additionally, if a developer is uploading the sources for LLMs, it’s important to have highly secure, resilient and durable storage. This is especially critical in highly regulated industries such as financial services, healthcare and telecommunications.

IBM Cloud Object Storage, for example, provides security and data durability to store large volumes of data. With immutable data retention and audit control capabilities, IBM Cloud Object Storage supports RAG by helping to safeguard your data from tampering or manipulation by ransomware attacks and helps ensure it meets compliance and business requirements.

With IBM’s vast technology stack including IBM Code Engine and Cloud Object Storage, organizations across industries can seamlessly tap into RAG and focus on leveraging AI more effectively for their businesses.

The power of cloud and AI in practice


We’ve established that RAG is extremely valuable for enabling generative AI outcomes, but what does this look like in practice?

Blendow Group, a leading provider of legal services in Sweden, handles a diverse array of legal documents—dissecting, summarizing and evaluating documents that range from court rulings to legislation and case law. With a relatively small team, Blendow Group needed a scalable solution to aid its legal analysis. Working with IBM Client Engineering and NEXER, Blendow Group created an innovative AI-driven tool that leverages the comprehensive capabilities of watsonx to enhance research and analysis and streamline the process of creating legal content, all while maintaining the utmost confidentiality of sensitive data.

Utilizing IBM’s technology stack, including IBM Cloud Object Storage and IBM Code Engine, the AI solution was tailored to increase the efficiency and breadth of Blendow’s legal document analysis.

The Mawson’s Huts Foundation is also an excellent example of leveraging RAG to enable greater AI outcomes. The foundation is on a mission to preserve the Mawson legacy, which includes Australia’s 42 percent territorial claim to the Antarctic, and to educate schoolchildren and others about Antarctica and the importance of sustaining its pristine environment.

With The Antarctic Explorer, an AI-powered learning platform running on IBM Cloud, Mawson is bringing children and others access to Antarctica from a browser wherever they are. Users can submit questions via a browser-based interface and the learning platform uses AI-powered natural language processing capabilities provided by IBM watsonx Assistant to interpret the questions and deliver appropriate answers with associated media—videos, images and documents—that are stored in and retrieved from IBM Cloud Object Storage.

By leveraging infrastructure-as-a-service offerings in tandem with watsonx, both the Mawson’s Huts Foundation and Blendow Group are able to gain greater insights from their AI models by easing the process of managing and storing the data behind them.

Enabling generative AI outcomes with the cloud


Generative AI and LLMs have already proven to have great potential for transforming organizations across industries. Whether it’s educating the wider population or analyzing legal documents, PaaS solutions within the cloud are critical for the success of RAG and running AI models.

At IBM, we believe that AI workloads will likely form the backbone of mission-critical workloads and ultimately house and manage the most-trusted data, so the infrastructure around it must be trustworthy and resilient by design. With IBM Cloud, enterprises across industries using AI can tap into higher levels of resiliency, performance, security, compliance and total cost of ownership.

Source: ibm.com

Tuesday 18 June 2024

Immutable backup strategies with cloud storage


Cyberthreats, once a mostly predictable risk limited to isolated incidents, are now pervasive. Attackers aided by advancements in AI and global connectivity are continually seeking out vulnerabilities in security defenses so they can access critical infrastructure and customer data. Eventually, an attack will compromise an administrative account or a network component, or exploit a software vulnerability, ultimately gaining access to production infrastructure. These inevitable attacks are why having immutable offsite backups for both application and customer data is critical to achieving a swift recovery, minimizing downtime and limiting data loss.

In an era characterized by digital interconnectedness, businesses must confront an ever-evolving array of cyberthreats, which present formidable challenges in defending against attacks. Some of the common challenges that enterprises face when protecting data are:

  • Maintaining data integrity and privacy amid the threat of potential data breaches and data leaks.
  • Managing IT budgets while dealing with increased cyberthreats and regulatory compliance.
  • Dealing with strains on resources and expertise to implement robust data protection measures, which leave businesses vulnerable to data loss and cyberattacks.
  • Contending with the new complexities of managing and securing sensitive data from massive information-producing workloads such as IoT, AI, mobile and media content workloads.

Use backups to protect your data


Backups serve as a foundational element in any robust data protection strategy, offering a lifeline against various threats, from cyberattacks to hardware failures to natural disasters. By creating duplicates of essential data and storing them in separate locations, businesses can mitigate the risk of permanent loss and ensure continuity of operations in the face of breaches or unforeseen catastrophes. Backups provide a safety net against ransomware attacks, enabling organizations to restore systems and data to a pre-incident state without succumbing to extortion demands.

Additionally, backups offer a means of recovering from human errors, such as accidental deletion or corruption, thereby preventing potentially costly disruptions and preserving valuable intellectual property and customer information. In essence, backups function as a fundamental insurance policy, offering peace of mind and resilience in an increasingly volatile digital landscape.

Workloads and patterns that benefit from a comprehensive backup strategy


There are some typical scenarios where having a backup strategy proves particularly useful.

Cloud-native workloads:

  • Applications that use virtual machines (VMs), containers, databases or object storage in AWS, Microsoft® Azure and other clouds should have a backup strategy. Storing these backups in a separate cloud environment such as the IBM Cloud® provides the best isolation and protection for backups.
  • Top cloud service providers: AWS, Microsoft Azure, IBM Cloud, Google Cloud and Oracle.

Virtual machines:

  • Most organizations run some applications in virtual environments either on premises or in the cloud. These virtual machines must be backed up to preserve their storage, configuration and metadata, ensuring rapid application recovery in the case of cyberattacks or disaster scenarios.
  • Key virtualization technologies: VMware®, Microsoft® Hyper-V, Red Hat® and Nutanix.

Enterprise applications and infrastructure:

  • Enterprise applications and infrastructure support critical business workloads and workforce collaboration. Ensuring quick application and data recovery in the case of cyberattacks is mission critical to avoid top line business impact.
  • Critical enterprise applications: Microsoft® Suite, Oracle Database, SAP and other database technologies.

SaaS applications:

  • Many customers are not aware of their responsibilities for backing up their data in SaaS applications. Even though the application is delivered as SaaS, customers can still benefit from a backup solution that prevents data loss if the SaaS service is compromised.
  • Common enterprise SaaS applications: Microsoft 365, Salesforce, ServiceNow and Splunk.

Back up data to the cloud for enhanced data protection


Effective disaster recovery (DR) practices mandate keeping usable business-critical backups offsite and immutable. Traditionally, this was achieved by sending backups to tape libraries in offsite locations. However, managing tape libraries became operationally complex due to the need to ensure that backups remained available for restoration in disaster scenarios. Restoring from tape libraries can also be slow and cumbersome, failing to meet recovery timelines crucial for critical application workloads.

Cloud storage offers a compelling offsite alternative to traditional tape backups. IBM Cloud® Object Storage is a fully managed cloud storage service with built-in redundancy, security, availability and scalability that is highly resilient to disaster events, ensuring data availability when needed. Accessible through APIs over the internet, cloud storage simplifies operational recovery procedures, which results in faster recovery times and lower data loss risks in cyberattack scenarios.

How IBM Cloud Object Storage protects backups


IBM Cloud Object Storage is a versatile and scalable solution that is crucial for storing and protecting data backups. It is used by clients across a wide range of industries and workloads to store hundreds of petabytes of backup data. Four of the top five US banks use IBM Cloud Object Storage to protect their data.

Clients can develop their own backup solutions targeting IBM Cloud Object Storage or opt for industry-leading data protection tools such as Veeam, Storage Protect, Commvault, Cohesity, Rubrik and others that natively support backups to IBM Cloud Object Storage.

Key benefits of using IBM Cloud Object Storage for backups


Immutable data protection: Native immutability features help prevent backups from being modified or deleted during the retention window. Immutability provides the ultimate data protection against ransomware by blunting its ability to overwrite backup data with encryption.
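
As a rough illustration of how a backup script might write an immutable object, the sketch below uses the S3-compatible API via boto3 with an object-lock retention date. The endpoint, bucket, credentials and retention window are placeholders, and the exact immutability mechanism differs by provider, so consult the IBM Cloud Object Storage documentation for the supported configuration.

```python
# Minimal sketch of writing a backup object with a retention lock through the
# S3-compatible API (boto3). Endpoint, bucket, credentials and retention window
# are placeholders; object-lock support and configuration vary by provider.
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.example-object-storage.net",  # placeholder endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

retain_until = datetime.now(timezone.utc) + timedelta(days=30)

with open("db-backup-2024-06-18.tar.gz", "rb") as backup:
    s3.put_object(
        Bucket="offsite-backups",             # bucket created with object lock enabled
        Key="db/db-backup-2024-06-18.tar.gz",
        Body=backup,
        ObjectLockMode="COMPLIANCE",          # retention cannot be shortened or removed
        ObjectLockRetainUntilDate=retain_until,
    )
```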

Reduced disaster recovery time: Because your backup data is stored in a secured and separate environment, you can be confident that the backups will remain unaffected by cyberattacks on production environments. These unaffected backups make it easier to restore data and recover quickly.

Lower cost of backing up: Object storage is a fully managed storage service available at very low costs, allowing organizations to keep backup operational costs low while ensuring continued protection.

Resilience and availability: IBM Cloud Object Storage is a globally accessible service backed by redundant storage zones and network technologies, so your backups always remain available.

IBM Cloud Object Storage’s robust architecture ensures durability, scalability and cost-effectiveness, making it suitable for organizations of all sizes. Moreover, its immutability feature adds an extra layer of protection by preventing accidental or malicious alterations to backup data, thus ensuring data integrity and compliance with regulatory requirements. This feature, combined with IBM’s stringent security measures and extensive data protection capabilities, makes IBM Cloud Object Storage a trusted choice for businesses looking to secure their backup data reliably. By using IBM Cloud Object Storage, organizations can mitigate risks, streamline backup processes, and maintain peace of mind by knowing their critical data is securely stored and protected against any unforeseen events.

Source: ibm.com

Saturday 15 June 2024

Types of central processing units (CPUs)


What is a CPU?


The central processing unit (CPU) is the computer’s brain. It handles the assignment and processing of tasks and manages operational functions that all types of computers use.

CPU types are designated according to the kind of chip that they use for processing data. There’s a wide variety of processors and microprocessors available, with new powerhouse processors always in development. The processing power CPUs provide enables computers to engage in multitasking activities. Before discussing the types of CPUs available, we should clarify some basic terms that are essential to our understanding of CPU types.

Key CPU terms


There are numerous components within a CPU, but these aspects are especially critical to CPU operation and our understanding of how they operate:

  • Cache: When it comes to information retrieval, memory caches are indispensable. Caches are storage areas whose location allows users to quickly access data that’s been in recent use. Caches store data in areas of memory built into a CPU’s processor chip to reach data retrieval speeds even faster than random access memory (RAM) can achieve. Caches can be created through software development or hardware components.
  • Clock speed: All computers are equipped with an internal clock, which regulates the speed and frequency of computer operations. The clock manages the CPU’s circuitry through the transmittal of electrical pulses. The delivery rate of those pulses is termed clock speed, which is measured in hertz (Hz) or megahertz (MHz). Traditionally, one way to increase processing speed has been to set the clock to run faster than normal.
  • Core: Cores act as the processor within the processor. Cores are processing units that read and carry out various program instructions. Processors are classified according to how many cores are embedded into them. CPUs with multiple cores can process instructions considerably faster than single-core processors. (Note: The term “Intel® Core™” is used commercially to market Intel’s product line of multi-core CPUs.)
  • Threads: Threads are the shortest sequences of programmable instructions that an operating system’s scheduler can independently administer and send to the CPU for processing. Through multithreading—the use of multiple threads running simultaneously—a computer process can be run concurrently. Hyper-threading refers to Intel’s proprietary form of multithreading for the parallelization of computations. (A small multithreading example follows this list.)
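
The short example below illustrates multithreading from a programmer's point of view using Python's standard library: it reports the logical CPU count the operating system exposes and runs several tasks on a small thread pool. It is purely illustrative; language-level threads are scheduled by the operating system and do not map one-to-one onto hardware threads.

```python
# Small multithreading example using the Python standard library.
# It reports the logical CPU count and runs several I/O-style tasks
# concurrently on a thread pool.
import os
from concurrent.futures import ThreadPoolExecutor

def fetch_status(device: str) -> str:
    # Stand-in for I/O-bound work (e.g., polling a device) that benefits
    # from running on multiple threads concurrently.
    return f"{device}: OK"

devices = [f"sensor-{i}" for i in range(8)]
print(f"Logical CPUs visible to the OS: {os.cpu_count()}")

with ThreadPoolExecutor(max_workers=4) as pool:
    for result in pool.map(fetch_status, devices):
        print(result)
```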

Other components of the CPU


In addition to the above components, modern CPUs typically contain the following:

  • Arithmetic logic unit (ALU): Carries out all arithmetic and logical operations, including math equations and logic-based comparisons. Both types are tied to specific computer actions.
  • Buses: Ensure proper data transfer and data flow between components of a computer system.
  • Control unit: Contains intensive circuitry that controls the computer system by issuing a system of electrical pulses and instructs the system to carry out high-level computer instructions.
  • Instruction register and pointer: Holds the location of the next instruction set to be executed by the CPU.
  • Memory unit: Manages memory usage and the flow of data between RAM and the CPU. The memory unit also supervises the handling of cache memory.
  • Registers: Provide built-in high-speed memory for constant, repeated data needs that must be handled regularly and immediately.

How do CPUs work?


CPUs use a type of repeated command cycle that’s administered by the control unit in association with the computer clock, which provides synchronization assistance.

The work a CPU does occurs according to an established cycle (called the CPU instruction cycle), which repeats the basic computing instructions as many times per second as the computer’s processing power allows.

The three basic computing instructions are as follows (a toy sketch of this cycle appears after the list):

  • Fetch: Fetches occur anytime data is retrieved from memory.
  • Decode: The decoder within the CPU translates binary instructions into electrical signals, which engage with other parts of the CPU.
  • Execute: Execution occurs when computers interpret and carry out a computer program’s set of instructions.
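
To see how these three steps fit together, here is a toy simulation of the instruction cycle for a made-up accumulator machine. It is purely illustrative and vastly simpler than a real CPU, but it shows the fetch, decode and execute phases repeating until the program halts.

```python
# Toy fetch-decode-execute loop for a made-up accumulator machine,
# purely to illustrate the instruction cycle described above.
PROGRAM = [
    ("LOAD", 5),    # put 5 into the accumulator
    ("ADD", 3),     # add 3 to the accumulator
    ("PRINT", None),
    ("HALT", None),
]

def run(program):
    accumulator = 0
    pc = 0  # instruction pointer: location of the next instruction
    while True:
        opcode, operand = program[pc]   # fetch
        pc += 1
        if opcode == "LOAD":            # decode + execute
            accumulator = operand
        elif opcode == "ADD":
            accumulator += operand
        elif opcode == "PRINT":
            print(accumulator)
        elif opcode == "HALT":
            break

run(PROGRAM)  # prints 8
```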

Basic attempts to generate faster processing speeds have led some computer owners to forgo the usual steps involved in creating high-speed performance, which normally require adding more memory or processor cores. Instead, these users adjust the computer clock so it runs faster on their machine(s). This “overclocking” process is analogous to “jailbreaking” smartphones to alter their performance. Unfortunately, like jailbreaking a smartphone, such tinkering is potentially harmful to the device and is roundly disapproved of by computer manufacturers.

Types of central processing units


CPUs are defined by the processor or microprocessor driving them:

  • Single-core processor: A single-core processor is a microprocessor with one CPU on its die (the silicon-based material to which chips and microchips are attached). Single-core processors typically run slower than multi-core processors, operate on a single thread and perform the instruction cycle sequence only once at a time. They are best suited to general-purpose computing.
  • Multi-core processor: A multi-core processor is split into two or more sections of activity, with each core carrying out instructions as if they were completely distinct computers, although the sections are technically located together on a single chip. For many computer programs, a multi-core processor provides superior, high-performance output.
  • Embedded processor: An embedded processor is a microprocessor expressly engineered for use in embedded systems. Embedded systems are small, designed to consume little power and keep data close to the processor for immediate access. Embedded processors include microprocessors and microcontrollers.
  • Dual-core processor: A dual-core processor is a multi-core processor containing two microprocessors that act independently from each other.
  • Quad-core processor: A quad-core processor is a multi-core processor that has four microprocessors functioning independently.
  • Octa-core: An octa-core processor is a multi-core processor that has eight microprocessors functioning independently.
  • Deca-core processor: A deca-core processor is an integrated circuit that has 10 cores on one die or per package.

Leading CPU manufacturers and the CPUs they make


Although several companies manufacture products or develop software that support CPUs, the number of companies that actually design and build the processors themselves has dwindled to just a few major players in recent years.

The two major companies in this area are Intel and Advanced Micro Devices (AMD). Both build their mainstream PC and server processors around the x86 instruction set architecture (ISA), a complex instruction set computer (CISC) design, while Arm-based processors (covered below) follow a reduced instruction set computer (RISC) approach.

  • Intel: Intel markets processors and microprocessors through four product lines. Its premium, high-end line is Intel Core. Intel’s Xeon® processors are targeted toward servers and workstations. Intel’s Celeron® and Intel Pentium® lines are considered slower and less powerful than the Core line.
  • Advanced Micro Devices (AMD): AMD sells processors and microprocessors through two product types: CPUs and APUs (accelerated processing units). APUs are CPUs equipped with proprietary Radeon™ graphics. AMD’s Ryzen™ processors are high-speed, high-performance microprocessors aimed at the video game market. The Athlon™ line was formerly considered AMD’s high-end offering, but AMD now positions it as a basic computing alternative.
  • Arm: Arm doesn’t manufacture chips itself; instead, it licenses its high-end processor designs and other proprietary technologies to companies that do make equipment. Apple, for example, no longer uses Intel chips in Mac® computers but makes its own customized processors based on Arm designs. Other companies are following this example.

Related CPU and processor concepts


Graphics processing units (GPUs)

While the term “graphics processing unit” includes the word “graphics,” the name doesn’t fully capture what GPUs are really about, which is speed. It is that speed, achieved through highly parallel processing, that accelerates computer graphics.


The GPU is a type of electronic circuit used in PCs, smartphones and video game consoles, the last of which was its original application. GPUs now also serve purposes unrelated to graphics acceleration, such as cryptocurrency mining and the training of neural networks.

Microprocessors

The quest for computer miniaturization continued when computer science created a CPU so small that it could be contained within a small integrated circuit chip, called the microprocessor. Microprocessors are designated by the number of cores they support.

A CPU core is “the brain within the brain,” serving as the physical processing unit within a CPU. A single microprocessor can contain multiple cores. Because all of the cores on a chip occupy a single socket, they share the same memory and computing environment.

Output devices

Computing would be a vastly limited activity without output devices to present the results of the CPU’s work. Such devices include peripherals, which attach to the outside of a computer and vastly increase its functionality.

Peripherals provide the means for the computer user to interact with the computer and get it to process instructions according to the computer user’s wishes. They include desktop essentials like keyboards, mice, scanners and printers.

Peripherals are not the only attachments common to the modern computer. Input/output devices, such as video cameras and microphones, are also in wide use; they both receive and transmit information.

Power consumption

Power consumption affects several issues. One is the amount of heat produced by multi-core processors and how to dissipate that excess heat so the processor remains thermally protected. For this reason, hyperscale data centers (which house and run thousands of servers) are designed with extensive air-conditioning and cooling systems.

There are also questions of sustainability, even when we’re talking about a few computers instead of a few thousand. The more powerful the computer and its CPUs (with clock speeds now measured in gigahertz, or GHz), the more energy is required to support its operation.

Specialized chips

The most profound development in computing since its origins, artificial intelligence (AI) is now impacting most if not all computing environments. One development we’re seeing in the CPU space is the creation of specialty processors that have been built specifically to handle the large and complex workloads associated with AI (or other specialty purposes):

  • Such equipment includes the Tensor Streaming Processor (TSP), which handles machine learning (ML) tasks in addition to AI applications. Other products equally suited to AI work are the AMD Ryzen Threadripper™ 3990X 64-Core processor and the Intel Core i9-13900KS Desktop Processor, which uses 24 cores.
  • For an application like video editing, many users opt for the Intel Core i7 14700KF 20-Core, 28-thread CPU. Still others select the Ryzen 9 7900X, which is considered AMD’s best CPU for video editing purposes.
  • In terms of video game processors, the AMD Ryzen 7 5800X3D features 3D V-Cache technology that helps it accelerate game graphics.
  • For general-purpose computing, such as running an OS like Windows or browsing multimedia websites, any recent-model AMD or Intel processor should easily handle routine tasks.

Transistors

Transistors are hugely important to electronics in general and to computing in particular. The term is a blend of “transfer” and “resistor” and typically refers to a semiconductor component used to control or amplify the electrical current flowing through a circuit.

In computing, transistors are just as elemental. The transistor is the basic building unit behind all microchips. Transistors make up the CPU, and their on/off switching is what physically represents the 0s and 1s of the binary language that computers use to carry out Boolean logic.
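
To illustrate the link between switching and Boolean logic, the short Python sketch below treats transistor-like on/off states as 1s and 0s, builds basic gates from them and combines those gates into a half adder. It is a conceptual illustration, not a circuit-level model.

# Conceptual sketch: Boolean gates on 1/0 values, combined into a half adder.
def AND(a, b): return a & b
def OR(a, b):  return a | b
def NOT(a):    return 1 - a
def XOR(a, b): return OR(AND(a, NOT(b)), AND(NOT(a), b))

def half_adder(a, b):
    """Add two single bits: returns (sum_bit, carry_bit)."""
    return XOR(a, b), AND(a, b)

for a in (0, 1):
    for b in (0, 1):
        sum_bit, carry_bit = half_adder(a, b)
        print(f"{a} + {b} -> sum={sum_bit}, carry={carry_bit}")
# 1 + 1 -> sum=0, carry=1, which is binary 10, i.e., decimal 2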

The next wave of CPUs


Computer scientists are always working to increase the output and functionality of CPUs. Here are some projections about future CPUs:

  • New chip materials: The silicon chip has long been the mainstay of the computing industry and other electronics. The new wave of processors (link resides outside ibm.com) will take advantage of new chip materials that offer increased performance. These include carbon nanotubes (which display excellent thermal conductivity through carbon-based tubes approximately 100,000 times smaller than the width of a human hair), graphene (a substance that possesses outstanding thermal and electrical properties) and spintronic components (which rely on the study of the way electrons spin, and which could eventually produce a spinning transistor).
  • Quantum over binary: Although current CPUs depend on the use of a binary language, quantum computing will eventually change that. Quantum computing derives its core principles from quantum mechanics, a discipline that has revolutionized the study of physics. Instead of bits that must be either 1 or 0, quantum bits (qubits) can exist in a superposition of both states at once, which lets certain classes of problems be explored in parallel. For suitable workloads, the upshot for the user will be a marked increase in computing speed and an overall boost in processing power.
  • AI everywhere: As artificial intelligence continues to make its profound presence felt—both in the computing industry and in our daily lives—it will have a direct influence on CPU design. As the future unfolds, expect to see an increasing integration of AI functionality directly into computer hardware. When this happens, we’ll experience AI processing that’s significantly more efficient. Further, users will notice an increase in processing speed and devices that will be able to make decisions independently in real time. While we wait for that hardware implementation to occur, chip manufacturer Cerebras has already unveiled a processor its makers claim to be the “fastest AI chip in the world” (link resides outside ibm.com). Its WSE-3 chip can train AI models with as many as 24 trillion parameters. This mega-chip contains four trillion transistors, in addition to 900,000 cores.

CPUs that offer strength and flexibility


Companies expect a lot from the computers they invest in. In turn, those computers rely on having CPUs with enough processing power to handle the challenging workloads found in today’s data-intensive business environment.

Organizations need workable solutions that can change as they change. Smart computing depends upon having equipment that capably supports your mission, even as that work evolves. IBM servers offer strength and flexibility, so you can concentrate on the job at hand. Find the IBM servers you need to get the results your organization relies upon—both today and tomorrow.

Source: ibm.com

Friday 14 June 2024

T-Mobile unlocks marketing efficiency with Adobe Workfront

T-Mobile unlocks marketing efficiency with Adobe Workfront

With 109 million customers and counting, “uncarrier” T-Mobile is one of the top mobile communications providers in the U.S. The company always puts the customer first, which it achieves by delivering the right experiences and content to the right customers at the right time. But with different sub-brands and business units, T-Mobile’s marketing and content workflows were complex—and often inefficient and disconnected.

Executive visibility is key for T-Mobile


To ensure the best customer experience, T-Mobile’s C-suite participates in all overarching marketing strategy and marketing campaign decisions. However, when critical decisions were pending, manual workflows and disjointed tools made it nearly impossible for senior leadership to see everything in one system or retrieve information efficiently.

The marketing operations team knew it needed to create a more seamless work management system to support the company’s content supply chain.

“We realized leadership didn’t have the right information at their fingertips to make decisions in the moment. We knew we needed to pull together a leadership dashboard to show all of the given campaigns in real-time.” Ilona Yeremova, Head of Marketing Tools, Operations and Analytics Team, T-Mobile

Like many other large companies with complex marketing organizations, T-Mobile turned to Adobe Workfront to streamline its content supply chain, help connect teams, plan and prioritize content creation, ensure compliance, and manage assets and customer data.

Scaling Adobe Workfront activation throughout the organization


T-Mobile started implementing Adobe Workfront on the creative side of the house. One of its 25 groups, T-Studios, was using Workfront, but it was siloed from the other 24 groups. “We quickly realized that work management has to happen centrally within the organization. Data has to connect, people need to connect and collaborate, and we need to start talking in the same language,” Yeremova said.

T-Mobile took inventory of what it wanted to accomplish with its customer-focused content marketing efforts and evaluated how it could orchestrate a seamless customer journey across the platform. The team started with the customer in mind and then worked backward to the technology, applications and processes. The key questions they asked themselves were:

  • What are those journeys we’re trying to orchestrate?
  • How are we trying to talk to those customers?
  • What are the trigger points?
  • How does that all come to life in a connected way in Adobe Workfront?

When T-Mobile first started using Adobe Workfront five years ago, it was a basic project management system for 60 employees. Today, it is regarded internally as a transformational business technology used by more than 6,000 employees as part of a content marketing strategy to achieve business objectives. Overall, T-Mobile has realized a 47% increase in marketing productivity since optimizing Adobe Workfront, without adding headcount.

IBM helps unlock more value from Adobe Workfront


Once T-Mobile achieved a more mature state with Adobe Workfront, they wanted to better understand the ROI realized with Adobe Workfront and how to connect with other platforms. That’s when T-Mobile turned to IBM.

“IBM, a primary partner for content supply chain strategy and enablement, helped augment the T-Mobile team in a very seamless way,” said Yeremova. “[They are helping us] accelerate the onboarding of teams and connecting platforms.”

IBM drove change management for several departments, including the Technology Customer Experience (TCX) and Career Development teams, two of the largest groups at T-Mobile, both of which were previously operating in Smartsheet.

“[We brought in IBM as an] outside party for change management because internally there’s just so much passion and inertia and you’ve got to take the passion out of it,” Yeremova said.

In addition to change management, IBM conducted a Value Realization assessment of Workfront for the two groups and found that the Career Development team realized a 90% decrease in time spent manually setting up and managing projects and a 93% decrease in time spent creating or managing reports. The TCX team saved 11 hours a week by eliminating unnecessary meetings and improving automated workflows. T-Mobile now has all 25 marketing groups operating in Workfront, an effort IBM supported by onboarding teams and assisting with configurations.

Yeremova says, “It’s all iterative. Tomorrow is going to be different than today. T-Mobile now has a fairly robust environment that is Adobe-centric, and everything is integrated within the platform.”

Looking forward to an AI-powered future


T-Mobile strongly believes that creating the right guardrails and building a strong foundation with Adobe Workfront has helped them prepare for the innovation that is happening today, as well as the AI-powered future of tomorrow.

“We are very diligent about governing the platform we have. And it’s critical for us to have clean data in the system.” Ilona Yeremova, Head of Marketing Tools, Operations and Analytics Team, T-Mobile

As her team ingests data, they are constantly studying and verifying it, because if the data is stale, nothing else will be accurate.

T-Mobile is currently focused on unifying taxonomies across the enterprise. Yeremova says, “The team did a lot of work and [the creative taxonomies] are like an A plus now – but next up, we’re focused on unifying taxonomies in the whole marketing organization and then even looking upwards to the enterprise.”

The combination of a mature work management strategy and a focus on change management, governance, and clean data sets T-Mobile up nicely to supercharge Workfront with new features and generative AI capabilities. “If you’re not onboard with AI, you’ll be left behind,” Yeremova says. IBM is currently helping T-Mobile evaluate different use cases where they can leverage generative AI, like enabling sales reps to make recommendations more quickly to customers in its more than 6,000 retail stores.

“We’re going to be much quicker at doing things and it’s exciting to envision that future where folks are happy doing more of the purposeful work and less of the manual busy work,” Yeremova said. “I watch how much administrative stuff that my team does, and I know that there’s a better way to do it. If we can have GenAI technologies like IBM® watsonx™ do some of those repetitive, mundane tasks for us, I bet we’ll incrementally gain that benefit of more meaningful work. My team is small but mighty and we are incredibly lucky to have partnership from our Adobe and IBM teams.”

Source: ibm.com

Thursday 13 June 2024

5 SLA metrics you should be monitoring

5 SLA metrics you should be monitoring

In business and beyond, communication is king. Successful service level agreements (SLAs) operate on this principle, laying the foundation for successful provider-customer relationships.

A service level agreement (SLA) is a key component of technology vendor contracts that describes the terms of service between a service provider and a customer. SLAs describe the level of performance to be expected, how performance will be measured and repercussions if levels are not met. SLAs make sure that all stakeholders understand the service agreement and help forge a more seamless working relationship.

Types of SLAs


There are three main types of SLAs:

Customer-level SLAs

Customer-level SLAs define the terms of service between a service provider and a customer. A customer can be external, such as a business purchasing cloud storage from a vendor, or internal, as is the case with an SLA between business and IT teams regarding the development of a product.

Service-level SLAs

Service providers who offer the same service to multiple customers often use service-level SLAs. Service-level SLAs do not change based on the customer, instead outlining a general level of service provided to all customers.

Multilevel SLAs

When a service provider offers a multitiered pricing plan for the same product, they often offer multilevel SLAs to clearly communicate the service provided at each level. Multilevel SLAs are also used when creating agreements among more than two parties.

SLA components


SLAs include an overview of the parties involved, services to be provided, stakeholder role breakdowns, performance monitoring and reporting requirements. Other SLA components include security protocols, redressing agreements, review procedures, termination clauses and more. Crucially, they define how performance will be measured.

SLAs should precisely define the key metrics—service-level agreement metrics—that will be used to measure service performance. These metrics are often related to organizational service level objectives (SLOs). While SLAs define the agreement between organization and customer, SLOs set internal performance targets. Fulfilling SLAs requires monitoring important metrics related to business operations and service provider performance. The key is monitoring the right metrics.

What is a KPI in an SLA?


Metrics are specific measures of an aspect of service performance, such as availability or latency. Key performance indicators (KPIs) are linked to business goals and are used to judge a team’s progress toward those goals. KPIs don’t exist without business targets; they are “indicators” of progress toward a stated goal.

Let’s use annual sales growth as an example, with an organizational goal of 30% growth year-over-year. KPIs such as subscription renewals to date or leads generated provide a real-time snapshot of business progress toward the annual sales growth goal.

Metrics such as application availability and latency help provide context. For example, if the organization is losing customers and not on track to meet the annual goal, an examination of metrics related to customer satisfaction (that is, application availability and latency) might provide some answers as to why customers are leaving.
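
As a simple numeric sketch of the distinction, the Python snippet below tracks progress toward that 30% growth goal; all of the figures are invented for illustration.

# Invented figures: tracking progress toward a 30% year-over-year growth goal.
last_year_sales = 1_000_000                 # USD
target_sales = last_year_sales * 1.30       # the 30% growth goal

sales_to_date = 1_150_000                   # KPI: revenue booked so far this year
renewals_to_date = 420                      # KPI: subscription renewals so far

progress = (sales_to_date - last_year_sales) / (target_sales - last_year_sales)
print(f"Renewals to date: {renewals_to_date}")
print(f"Progress toward the growth goal: {progress:.0%}")   # 50%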

What SLA metrics to monitor


SLAs contain different terms depending on the vendor, type of service provided, client requirements, compliance standards and more, and metrics vary by industry and use case. However, certain SLA performance metrics, such as availability, mean time to recovery, response time, error rates and security and compliance measurements, are commonly used across services and industries. These metrics set a baseline for operations and the quality of services provided.

Clearly defining which metrics and key performance indicators (KPIs) will be used to measure performance and how this information will be communicated helps IT service management (ITSM) teams identify what data to collect and monitor. With the right data, teams can better maintain SLAs and make sure that customers know exactly what to expect.

Ideally, ITSM teams provide input when SLAs are drafted, in addition to monitoring the metrics related to their fulfillment. Involving ITSM teams early in the process helps make sure that business teams don’t make agreements with customers that are not attainable by IT teams.

SLA metrics that are important for IT and ITSM leaders to monitor include:

1. Availability

Service disruptions, or downtime, are costly, can damage enterprise credibility and can lead to compliance issues. The SLA between an organization and a customer dictates the expected level of service availability or uptime and is an indicator of system functionality.

Availability is often measured in “nines on the way to 100%”: 90%, 99%, 99.9% and so on. Many cloud and SaaS providers aim for an industry standard of “five 9s” or 99.999% uptime.

For certain businesses, even an hour of downtime can mean significant losses. If an e-commerce website experiences an outage during a high traffic time such as Black Friday, or during a large sale, it can damage the company’s reputation and annual revenue. Service disruptions also negatively impact the customer experience. Services that are not consistently available often lead users to search for alternatives. Business needs vary, but the need to provide users with quick and efficient products and services is universal.

Generally, maximum uptime is preferred. However, providers in some industries might find it more cost effective to offer a slightly lower availability rate if it still meets client needs.
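
To put the “nines” in perspective, the short Python sketch below converts an availability target into the maximum downtime it allows per year (leap years ignored for simplicity).

# Convert availability targets ("nines") into allowed downtime per year.
MINUTES_PER_YEAR = 365 * 24 * 60

for availability in (0.90, 0.99, 0.999, 0.9999, 0.99999):
    downtime_minutes = (1 - availability) * MINUTES_PER_YEAR
    print(f"{availability:.3%} uptime allows about "
          f"{downtime_minutes:,.1f} minutes of downtime per year")
# Five 9s (99.999%) allows roughly 5.3 minutes of downtime per year.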

2. Mean time to recovery

Mean time to recovery measures the average amount of time that it takes to recover a product during an outage or failure. No system or service is immune from an occasional issue or failure, but enterprises that can quickly recover are more likely to maintain business profitability, meet customer needs and uphold SLAs.
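
The calculation itself is straightforward: total downtime divided by the number of incidents. A minimal Python sketch with invented incident durations:

# Invented incident durations (in minutes) over a reporting period.
incident_downtimes = [12, 45, 8, 30, 20]

mttr = sum(incident_downtimes) / len(incident_downtimes)
print(f"Mean time to recovery: {mttr:.1f} minutes")   # 23.0 minutes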

3. Response time and resolution time

SLAs often state the amount of time in which a service provider must respond after an issue is flagged or logged. When an issue is logged or a service request is made, the response time indicates how long it takes for a provider to respond to and address the issue. Resolution time refers to how long it takes for the issue to be resolved. Minimizing these times is key to maintaining service performance.

Organizations should seek to address issues before they become system-wide failures and cause security or compliance issues. Software solutions that offer full-stack observability into business functions can play an important role in maintaining optimized systems and service performance. Many of these platforms use automation and machine learning (ML) tools to automate the process of remediation or identify issues before they arise.

For example, AI-powered intrusion detection systems (IDS) constantly monitor network traffic for malicious activity, violations of security protocols or anomalous data. These systems deploy machine learning algorithms to monitor large data sets and use them to identify anomalous data. Anomalies and intrusions trigger alerts that notify IT teams. Without AI and machine learning, manually monitoring these large data sets would not be possible.
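
The sketch below is not how any particular IDS product works; it is a generic Python illustration of one common anomaly-detection idea, flagging values that deviate sharply from the baseline, using invented traffic figures.

# Generic anomaly-flagging sketch: values more than 2 standard deviations
# from the mean of the observed traffic are flagged (invented data).
import statistics

requests_per_minute = [820, 790, 805, 815, 798, 2600, 810, 795]

mean = statistics.mean(requests_per_minute)
stdev = statistics.stdev(requests_per_minute)

for minute, value in enumerate(requests_per_minute):
    if stdev and abs(value - mean) > 2 * stdev:
        print(f"Minute {minute}: {value} requests flagged as anomalous")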

4. Error rates

Error rates measure service failures and the number of times service performance dips below defined standards. Depending on your enterprise, error rates can relate to any number of issues connected to business functions.

For example, in manufacturing, error rates correlate to the number of defects or quality issues on a specific product line, or the total number of errors found during a set time interval. These error rates, or defect rates, help organizations identify the root cause of an error and whether it’s related to the materials used or a broader issue.

There is also a subset of customer-based metrics that monitor customer service interactions and feed into error rates (a short calculation sketch follows the list below).

◉ First call resolution rate: In the realm of customer service, issues related to help desk interactions can factor into error rates. The success of customer services interactions can be difficult to gauge. Not every customer fills out a survey or files a complaint if an issue is not resolved—some will just look for another service. One metric that can help measure customer service interactions is the first call resolution rate. This rate reflects whether a user’s issue was resolved during the first interaction with a help desk, chatbot or representative. Every escalation of a customer service query beyond the initial contact means spending on extra resources. It can also impact the customer experience.
◉ Abandonment rate: This rate reflects the frequency with which a customer abandons their inquiry before finding a resolution. Abandonment rate can also add to the overall error rate and helps measure the efficacy of a service desk, chatbot or human workforce.
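
As a quick illustration with invented help desk figures, both rates are simple ratios:

# Invented help desk figures for one month.
total_inquiries = 1_200
resolved_on_first_contact = 930
abandoned_before_resolution = 90

first_call_resolution_rate = resolved_on_first_contact / total_inquiries
abandonment_rate = abandoned_before_resolution / total_inquiries

print(f"First call resolution rate: {first_call_resolution_rate:.1%}")   # 77.5%
print(f"Abandonment rate: {abandonment_rate:.1%}")                        # 7.5%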

5. Security and compliance

Large volumes of data and the use of on-premises servers, cloud servers and a growing number of applications creates a greater risk of data breaches and security threats. If not monitored appropriately, security breaches and vulnerabilities can expose service providers to legal and financial repercussions.

For example, the healthcare industry has specific requirements around how to store, transfer and dispose of a patient’s medical data. Failure to meet these compliance standards can result in fines and indemnification for losses incurred by customers.

While there are countless industry-specific metrics defined by the different services provided, many of them fall under larger umbrella categories. To be successful, it is important for business teams and IT service management teams to work together to improve service delivery and meet customer expectations.

Benefits of monitoring SLA metrics


Monitoring SLA metrics is the most efficient way for enterprises to gauge whether IT services are meeting customer expectations and to pinpoint areas for improvement. By monitoring metrics and KPIs in real time, IT teams can identify system weaknesses and optimize service delivery.

The main benefits of monitoring SLA metrics include:

Greater observability

A clear end-to-end understanding of business operations helps ITSM teams find ways to improve performance. Greater observability enables organizations to gain insights into the operation of systems and workflows, identify errors, balance workloads more efficiently and improve performance standards.

Optimized performance

By monitoring the right metrics and using the insights gleaned from them, organizations can provide better services and applications, exceed customer expectations and drive business growth.

Increased customer satisfaction

Similarly, monitoring SLA metrics and KPIs is one of the best ways to make sure services are meeting customer needs. In a crowded business field, customer satisfaction is a key factor in driving customer retention and building a positive reputation.

Greater transparency

By clearly outlining the terms of service, SLAs help eliminate confusion and protect all parties. Well-crafted SLAs make it clear what all stakeholders can expect, offer a well-defined timeline of when services will be provided and which stakeholders are responsible for specific actions. When done right, SLAs help set the tone for a smooth partnership.

Understand performance and exceed customer expectations


The IBM® Instana® Observability platform and IBM Cloud Pak® for AIOps can help teams get stronger insights from their data and improve service delivery.

IBM® Instana® Observability offers full-stack observability in real time, combining automation, context and intelligent action into one platform. Instana helps break down operational silos and provides access to data across DevOps, SRE, platform engineering and ITOps teams.

IT service management teams benefit from IBM Cloud Pak for AIOps through automated tools that address incident management and remediation. IBM Cloud Pak for AIOps offers tools for innovation and the transformation of IT operations. Meet SLAs and monitor metrics with an advanced visibility solution that offers context into dependencies across environments.

IBM Cloud Pak for AIOps is an AIOps platform that delivers visibility into performance data and dependencies across environments. It enables ITOps managers and site reliability engineers (SREs) to use artificial intelligence, machine learning and automation to better address incident management and remediation. With IBM Cloud Pak for AIOps, teams can innovate faster, reduce operational cost and transform IT operations (ITOps).

Source: ibm.com