eBPF Unveiled: Transforming Kernel Operations for Modern Computing
https://www.devoteam.com/expert-view/unleashing-the-power-of-ebpf-transforming-linux-kernel-operations-for-enhanced-network-and-security-applications/
Thu, 11 Jan 2024 12:18:35 +0000

eBPF (extended Berkeley Packet Filter) represents a significant advancement in Linux kernel technology. This blog aims to demystify eBPF, presenting an unbiased exploration of its capabilities, applications, and the way it’s shaping system-level programming. From network optimisation to security improvement, we’ll navigate through the intricacies of eBPF, offering insights into its role in modern computing. Join us as we explore the multifaceted world of eBPF, understanding its impact and potential in the realm of system performance and security.

What is eBPF?

eBPF is a revolutionary technology in the Linux kernel. It allows developers to run sandboxed programs in a restricted virtual machine inside the kernel, without changing kernel source code or adding kernel modules. Initially designed for network packet filtering, eBPF’s capabilities have expanded dramatically, making it a versatile tool in system-level programming.

eBPF provides a high-performance, secure way to dynamically extend kernel capabilities. Its flexibility allows it to be used for a wide range of system-level tasks, revolutionising how monitoring, networking, and security functionalities are implemented within the kernel space. By operating inside the kernel, eBPF programs can efficiently handle high-throughput data, like network packets, making them crucial for performance-critical environments.

What does eBPF stand for?

eBPF stands for “extended Berkeley Packet Filter.” The “extended” signifies its evolution from the original BPF, reflecting its expanded capabilities beyond simple packet filtering. The Berkeley Packet Filter, originally part of the BSD Unix operating system, was designed for network packet capture and filtering. eBPF extends this model with a more robust instruction set, wider applicability, and enhanced performance, transforming it into a powerful tool that can safely and efficiently interact with kernel-level operations.

What is eBPF used for (use cases)?

eBPF has a diverse array of use cases, primarily because of its ability to safely and efficiently extend kernel functionality. Key applications include:

  • Network Functionality: Implementing custom network protocols, packet filtering, and routing without impacting kernel stability.
  • Security: Enhancing system security by dynamically implementing firewalls, access controls, and intrusion detection systems directly within the kernel.
  • Performance Monitoring: Real-time monitoring of system and application performance metrics, enabling detailed observability and troubleshooting.
  • Tracing and Profiling: Kernel and user-space tracing for debugging and performance analysis, offering insights into system behaviour without traditional overheads.
  • Load Balancing: Efficient load balancing in networking scenarios, crucial for high-traffic environments.

These use cases highlight eBPF’s versatility in enhancing, securing, and monitoring system performance at the kernel level.

What are the features of eBPF?

eBPF boasts several powerful features:

  • Safety: eBPF programs are verified for safety within the kernel, ensuring they don’t harm the system (e.g., by preventing infinite loops).
  • Performance: Runs directly inside the kernel, offering high efficiency, especially critical for networking and monitoring tasks.
  • Flexibility: Applicable to a wide range of kernel functions, from networking to security.
  • User-Kernel Space Interaction: Provides a safe interface for user-space applications to interact with kernel-space operations.
  • Event-Driven Programming: Can be attached to various kernel events (like system calls or network events), enabling responsive and dynamic system behaviour.

These features make eBPF a powerful tool for modern kernel-level programming, offering both flexibility and security.

How does eBPF work?

eBPF works by allowing developers to write programs that run within the kernel space, yet in a sandboxed environment. We want to keep things simple, so without taking too much of your time, here’s a quick overview:

  • Writing and Compiling: Developers write eBPF programs, usually in a high-level language like C, which are then compiled into eBPF bytecode.
  • Loading and Verification: The bytecode is loaded into the kernel, where it undergoes a verification process to ensure it’s safe and won’t harm the system.
  • JIT Compilation: Once verified, the kernel performs Just-In-Time compilation of the bytecode to native machine code for efficient execution.
  • Attaching to Kernel Events: eBPF programs are attached to specific kernel events, like network packet arrival or system call execution.
  • Execution: When the specified event occurs, the eBPF program is executed, allowing it to modify, redirect, or inspect the data associated with the event.

This process ensures that eBPF programs are both efficient and safe, providing powerful capabilities while safeguarding system integrity.

How are eBPF programs written?

Writing eBPF programs involves a few key steps:

  • Language Choice: Typically, eBPF programs are written in a restricted subset of C for ease of development and readability.
  • Compilation: The C code is compiled into eBPF bytecode using specialised compilers, like Clang/LLVM.
  • Loading into Kernel: The bytecode is loaded into the kernel using eBPF tools like BPF Compiler Collection (BCC).
  • Interaction with User Space: Often, eBPF programs are accompanied by a user-space application for control and data retrieval. This communication is facilitated through maps, which are data structures accessible both to the kernel and user space.
  • Debugging and Testing: Tools like bpftrace and various eBPF front ends assist in debugging and testing eBPF programs.

Writing eBPF programs requires understanding both the capabilities and limitations of the eBPF virtual machine and the kernel APIs it interacts with, making it a unique blend of kernel and application programming.
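To make these steps concrete, here is a minimal sketch using the BCC Python front end mentioned above. The embedded program, written in the restricted C subset, is compiled to eBPF bytecode, verified and JIT-compiled by the kernel, attached to a kprobe on the clone() system call, and shares its results with user space through a BPF map. The map and function names are illustrative, and running this assumes a Linux host with eBPF support, root privileges, and the bcc Python package installed.

```python
from bcc import BPF
from time import sleep

# Restricted-C eBPF program: count clone() calls per process in a BPF map.
bpf_source = r"""
BPF_HASH(clone_count, u32, u64);            // map shared with user space

int trace_clone(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    u64 zero = 0, *count;
    count = clone_count.lookup_or_try_init(&pid, &zero);
    if (count) {
        (*count)++;
    }
    return 0;
}
"""

# Compile to bytecode; the kernel verifies and JIT-compiles it on load.
b = BPF(text=bpf_source)

# Attach the program to the kernel function behind the clone() syscall.
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="trace_clone")

print("Counting clone() calls for 10 seconds...")
sleep(10)

# Read the map from user space and print per-PID counts.
for pid, count in b["clone_count"].items():
    print(f"pid={pid.value} clones={count.value}")
```

Tools like bpftrace offer an even quicker way to prototype this kind of one-off instrumentation before committing to a BCC- or libbpf-based program.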

Which Companies Use eBPF?

A variety of companies across different industries use eBPF for their operations. These companies range from tech giants to innovative startups, each leveraging eBPF’s capabilities in unique ways to enhance performance, security, and efficiency within their systems. The applications vary widely, demonstrating eBPF’s versatility and its growing importance in modern computing infrastructure. Here are a few examples:

Netflix
Netflix utilises eBPF for detailed network monitoring and performance analysis. They leverage eBPF-based flow logs to gain insights into network traffic patterns, helping them optimise network performance and troubleshoot issues effectively. This approach provides a scalable solution for analysing high volumes of data, enabling Netflix to maintain a robust and efficient streaming service for its users.

Meta
Meta (formerly Facebook) uses eBPF in its network load balancer, Katran. Katran leverages eBPF to enhance network performance and scalability. This tool assists in efficiently directing traffic across Meta’s massive network infrastructure, ensuring high availability and reliability of services. By using eBPF, Meta can handle the vast scale of network traffic with improved performance and flexibility.

Walmart
Walmart, through its L3AF project, use eBPF to enhance network visibility and control within its infrastructure. This approach allows Walmart to manage network policies effectively, ensuring robust security and performance. By leveraging eBPF, Walmart achieves a more dynamic and scalable network operation, crucial for handling their extensive and complex digital environment.

Is eBPF Right for Your Business?

eBPF offers intriguing capabilities for enhancing system performance, security, and monitoring, especially in Linux environments. However, its adoption depends on your business’s specific needs, technical expertise, and infrastructure. While it provides powerful tools for network and system optimisation, it also requires a deep understanding of kernel operations and advanced programming skills. Therefore, it’s vital to consider your organisation’s technical maturity and the complexity of the challenges you face before deciding if eBPF is the right fit for your business.

How can I learn more? 

This article is part of a larger series focusing on the technologies and topics found in the first edition of the TechRadar by Devoteam . To see what our community of tech leaders said about the current position of eBPF in the market, take a look at the most recent edition of the TechRadar by Devoteam.

SNOWDAY 2023: Your Guide to Snowflake’s Latest Innovations
https://www.devoteam.com/expert-view/snowday-2023-your-guide-to-snowflakes-latest-innovations/
Wed, 03 Jan 2024 10:35:05 +0000

During the dual online event on November 1st and 2nd, Snowflake unveiled a deluge of updates. Tailored for diverse global audiences, the event served as a platform for Snowflake to provide crucial updates on the development status of features announced earlier in the year at SUMMIT in Las Vegas and to introduce an array of new functionalities.

Christian Kleinerman, Senior Vice President of Product at Snowflake, took center stage to unveil a myriad of innovative features in the Snowflake ecosystem.  Let’s delve into each segment: 

Data Foundation: Elevating the Core of the Platform

The Data Foundation segment stood out for its implications in everyday platform use, introducing interface enhancements and features designed to simplify user interactions.

Unistore, for mixed OLTP and OLAP Workloads

Unistore, initially introduced as a new Workload at the SUMMIT, progressed to a Public Preview of HYBRID TABLES, scheduled for late 2023. This feature allows the fusion of OLTP tasks with traditional OLAP usage, enabling real-time analytics. For instance, an e-commerce company could leverage Unistore to record user interactions in real-time and fuel real-time analytics, enhancing the personalization of the user experience.

Furthermore, Unistore has the potential to streamline operations, reducing the number of databases and pipelines by allowing fast single-point operations in Snowflake. This, in turn, accelerates the development of new features and use cases.

More Data Lake versatility with managed or unmanaged Iceberg tables

Another highly anticipated feature is the integration with the Apache ICEBERG TABLES standard, entering Public Preview with two integration options: UNMANAGED and FULLY MANAGED ICEBERG TABLES. This integration provides additional versatility to all workloads within the platform, whether in Data Warehousing, Data Lakes, or Data Lakehouses.

Performance Improvements everywhere for traditional workloads

Focusing on corporate DATA LAKES, Snowflake announced the General Availability of features for handling semi-structured data and introduced Dynamic File Processing with Snowpark in Python and Scala. The improvements extend to traditional DATA WAREHOUSES with innovations like automatic clustering cost estimation, materialized view refresh enhancements, and support for INSERT statements for the query acceleration service, all in private preview.

These enhancements, often unnoticed by the casual user, contribute significantly to the SNOWFLAKE PERFORMANCE INDEX. This index, incorporated into the platform’s web interface since August 2022, indicates a 15% improvement in platform performance compared to a year ago.

Robust Snowflake Data Governance: Introducing SNOWFLAKE HORIZON

The highlight of this block revolves around the Data Governance framework named SNOWFLAKE HORIZON, addressing Compliance, Security, Privacy, Interoperability, and Access—the five pillars of Horizon. This framework aims to provide control over data across all available Snowflake platforms and regions, ensuring compliance with local and international standards.

Many announcements were made regarding the Security of the platform:

  • ENHANCED NETWORK SECURITY (Public Preview): Improvements in the management of blacklists and whitelists for IP access.
  • IMPROVED AUTHENTICATION (Public Preview coming soon): Enhancements in authentication mechanisms.
  • DATABASE ROLES (General Availability): Introduction of new roles at the database level for expanded Role-Based Access Control (RBAC) capabilities.

However, what stood out for me in terms of Security Governance was the announcement of CIS: SNOWFLAKE FOUNDATION BENCHMARK. This serves as a catalog of recommendations and best practices, ensuring a consistent and robust security policy. Coupled with the upcoming TRUST CENTER (Private Preview coming soon), and a new section in the graphical interface, Snowflake is bolstering its security and compliance features.

Privacy also takes center stage with the announcement (still in development) of DIFFERENTIAL PRIVACY POLICIES. These policies aim to add layers of “noise” to data, making it less identifiable as granularity increases, contributing to improved data privacy.
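As a rough illustration of the idea (a generic differential-privacy sketch, not Snowflake’s implementation, which was still in development at the time of the announcement), the “noise” is typically drawn from a Laplace distribution calibrated to the query’s sensitivity and a privacy budget:

```python
import numpy as np

def noisy_count(true_count, epsilon=1.0, sensitivity=1.0):
    """Return a differentially private version of a count.

    sensitivity: how much one individual's data can change the result
    (1 for a simple count). epsilon: the privacy budget; smaller values
    add more noise and give stronger privacy.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Example: report a privacy-protected count for a very small segment,
# where the exact value might otherwise help identify individuals.
print(noisy_count(42, epsilon=0.5))
```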

In terms of Interoperability, features already mentioned in other sections of the Keynotes focus on the platform’s compatibility with new cataloging standards such as external catalogs, Iceberg Tables catalogs, and access to Iceberg tables from Rest APIs for Snowpark.

The final pillar of Horizon, Access, was addressed through features like AUTO-CLASSIFICATION (still in development) and CUSTOM CLASSIFIERS (Private Preview), both leveraging the capabilities of LLM and AI to assign custom classifications to objects automatically.

SNOWFLAKE COPILOT (in Private Preview) caught my attention for its potential to bring natural language interaction features, akin to ChatGPT, through LLM. This promises to facilitate building SQL statements from natural language requests, ushering in a new era of interaction within Snowflake’s graphical environment.

Cost Management: Enhancing Financial Control

The introduction of the COST MANAGEMENT INTERFACE empowers administrators with tools for enhanced financial control. Notable features include COST INSIGHTS, offering real-world examples for optimization, and the BUDGETING view (in Public Preview on AWS), providing a comprehensive overview of budgeted and actual spending.

Visibility comes through charts and dashboards showing workload metrics per warehouse, useful for evaluating scaling or considering increased capacity through multi-clustering, along with cost-per-query charts that help identify queries that may need reengineering.

On the optimization side, the COST INSIGHTS section surfaces real cases in your account where an improvement opportunity has been detected, explains the best practice applicable to that specific case, and provides a guide on how to apply it. It is extremely useful and a lifesaver for many account administrators.

As a final point in this section, they revisited the BUDGETING view (in Public Preview on AWS), where users can preview and compare budgeted and actual spending over time. This budget tracking can be automated with email or message notifications when certain non-compliance thresholds are exceeded. Budgets themselves can be configured individually for resources (by Database, Schema, Table, or Warehouse).

From this point on, we delve into the trending topics of Artificial Intelligence, Large Language Models, and Machine Learning. We all know this trend is here to stay and has already become the paradigm shift that will impact all aspects of our lives.

Snowflake wasn’t going to be left behind, reacting months ago with the acquisition of Neeva and swiftly incorporating the expertise of its professionals into the core of the Data Cloud.

Machine Learning Integration: A Paradigm Shift

Snowflake’s foray into Machine Learning (ML) was marked by the imminent entry into the General Availability of the ML MODELING API. Key features such as FEATURE ENGINEERING, TRAINING, SNOWFLAKE MODEL REGISTRY, and SNOWFLAKE FEATURE STORE (in Private Preview) promise to redefine ML capabilities within the Snowflake platform.

To aid the transition for Data Analysts and Data Scientists accustomed to conducting their sampling and training from a notebook, Snowflake will introduce its own interface, SNOWFLAKE NOTEBOOKS (in Private Preview). Developed in Streamlit, it will provide the familiar cell-based development experience, with cells that can contain Python code, Streamlit, SQL, and Markdown.

For more complex ML applications where this API falls somewhat short in terms of functionality and power, they propose the use of SNOWPARK CONTAINER SERVICES (Public Preview soon). In this approach, the complete application developed with Snowpark is packaged as a container, much as it would be for a Kubernetes deployment, and runs within Snowflake, leveraging its optimized capabilities for flexible computing.

From my point of view, it’s fantastic to see how Snowflake incorporates all these functionalities, which traditionally carried a heavy burden of infrastructure management or required deep experience in configuring and deploying systems (installing compatible Python libraries, deploying containers, etc.), effortlessly, as a genuine SaaS platform should.

SNOWFLAKE CORTEX: Elevating AI Accessibility

Snowflake’s total commitment to bringing the adoption of Artificial Intelligence to its customers’ workloads (AI for Everyone) has materialized in the CORTEX engine.

Cortex comprises a set of serverless features and services managed by Snowflake that provide quick, easy access to cutting-edge LLM and AI models in the industry, helping democratize their use.

SPECIALIZED FUNCTIONS (In Private Preview): These functions aim to incorporate Translation, Sentiment Analysis, Summarization, and Extraction of Answers capabilities into queries and applications.

GENERAL FUNCTIONS, incorporating existing functions from leading LLM standards such as LLAMA2, were also introduced. Well-known functions like LLM Inference, Vector as a Native Data Type for Llama2, Complete, Txt2SQL, EMBED_TEXT, Vector_L2_Distance, etc., are part of this offering.

The demonstration of these features showcased Snowflake’s substantial effort in making CORTEX a differentiating element compared to its competitors.
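As a purely illustrative sketch of how such functions are typically invoked, the example below calls Cortex-style SQL functions from Snowpark Python. The function names (SNOWFLAKE.CORTEX.SENTIMENT, SNOWFLAKE.CORTEX.SUMMARIZE), the connection placeholders, and the customer_reviews table are assumptions; these features were in preview at the time, so names, signatures, and availability may differ in your account.

```python
from snowflake.snowpark import Session

# Placeholder connection parameters; replace with your own account details.
connection_parameters = {
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}

session = Session.builder.configs(connection_parameters).create()

# Sentiment score and summary for a hypothetical free-text review column.
df = session.sql("""
    SELECT
        review_text,
        SNOWFLAKE.CORTEX.SENTIMENT(review_text)  AS sentiment_score,
        SNOWFLAKE.CORTEX.SUMMARIZE(review_text)  AS review_summary
    FROM customer_reviews
    LIMIT 10
""")
df.show()
```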

Scale with Applications: Native Integration Evolution

As a follow-up to the announcements made at the SUMMIT regarding the integration of native applications in Snowflake, they provided a few more updates:

As part of the application development lifecycle, they announced DATABASE CHANGE MANAGEMENT (in Private Preview). This feature will allow the execution of scripts directly against our account from a Git repository (thanks to the integration with GitHub announced at the SUMMIT). It even supports DML statements, and it enables us to manage CI/CD for development and the underlying Snowflake data model with incremental change analysis, etc.

Another feature related to this application integration announced at the event was the NATIVE APP FRAMEWORK (in Private Preview, soon to be in General Availability on AWS). It’s an environment for deploying and consuming native applications within the Data Cloud itself. To accelerate its development, they announced a $100 million investment in startups to assist in the development of these Native Apps.

Conclusion: A Year of Unparalleled Innovation

As we eagerly anticipate further announcements at the year-end partner event, Snowflake has undeniably delivered robust features. These developments solidify Snowflake’s position as a trailblazer in the cloud data platform space. The platform’s unwavering focus on performance, governance, AI integration, and financial control positions Snowflake as a comprehensive solution for tech experts and organizations navigating the evolving landscape of data management.

Snowflake’s continuous innovation is transforming the data landscape, and we are committed to keeping you informed at every step of this transformative journey.

How to Empty a Datalake?
https://www.devoteam.com/expert-view/how-to-empty-a-datalake/
Tue, 02 Jan 2024 13:18:57 +0000

In an era where lakes are drying up and the climate emergency calls for a reevaluation of our consumption habits, we regularly clear our email inboxes but continue to fill servers with personal and professional data, whether temporary or permanent, valuable or transient, warm or cold. Given the resources and energy the world devotes to this storage and the importance of digital sobriety, how can we effectively empty a Datalake, clean up, and reduce its financial and environmental impacts?

While half of the world’s lakes and reservoirs, which play a crucial role in carbon storage, are shrinking due to rising temperatures, human activity, and reduced precipitation, 53% of the planet’s largest lakes saw a significant decline in water levels between 1972 and 2020 [1].

Simultaneously, with the proliferation of connected devices, cloud computing, and more, the volume of digital data created or replicated globally continues to soar at a breathtaking rate, multiplying by thirty between 2010 and 2020 and projected to grow at an annual rate of around +40% until 2025 [2]. “Dark data,” or unused, unknown, and unexploited data generated by users’ daily interactions, accounts for 52% of the world’s stored data [3].

Computer data lakes (Datalakes), which facilitate data governance strategies within companies, keep accumulating data. While they address the need to economically leverage rapidly expanding data volumes, Datalakes are energy-intensive. The primary challenge is the storage of unnecessary or obsolete data. In businesses, the volume of data they manage doubles every two years [4]. On average, between 60% and 73% of all a company’s data is not used for analysis purposes [5]. Business-generated cold data amounts to 1.3 billion gigabytes, equivalent to 1.3 billion high-definition DVDs! [6]

In the face of the environmental crisis, Net Zero commitments are no longer sufficient; we must identify and activate all the levers to reduce our environmental footprint as soon as possible. But in businesses, as well as in personal settings, while we understand why we must do it, we often don’t know how or where to start. The “we keep it because you never know!” attitude not only harms the environment but also has a radically detrimental impact on companies’ finances. For example, the IDC research firm estimates that “dark data” costs global businesses €2 billion each month.

How can we quantify the volume of dormant data in comparison to usable and utilized data? What data unnecessarily clogs up storage space, consumes energy in vain, and drastically increases costs?

One of the first steps is to address data management within IT:

  • Governance: Who is responsible for the data, who has the right to add or remove data, what is the distribution of responsibilities and associated rights? How can users be engaged at every link in the chain (training and awareness issues)? The percentage of large global companies with a Chief Data Officer (CDO) reached 27% in 2022, up from 21% the previous year. This role is particularly common in Europe, where over 40% of large European companies have appointed a CDO to manage data [7].
  • Skills: Ingesting less data involves optimizing resources while considering data usage context. Key roles in these efforts include Data Architects, Data Engineers, Data Scientists, and Data Platform Engineers, with a strong emphasis on raising awareness among all stakeholders for a clear understanding of the issues.
  • Corporate culture: How is the need qualification between business and IT managed? How are projects handled? Is there a culture of economy and sobriety? How do the company’s commitments reach the operational level? Is controlling environmental costs a sufficient lever for cleaning up, or is a purely financial approach preferable, even if it means counting environmental gains as a “bonus”?
  • Storage methods: They must consider both business use cases (updates, access, etc.) and regulatory (GDPR) and financial (cost limits) considerations. Cold data can be a lever to reduce impacts and costs, enabling the business to keep data accessible for longer. Deletion is not a habit; we have a collector’s reflex, like children with a bag of marbles! The cost difference between hot and archival data can vary significantly between storage providers (by a factor of 1 to 20).
  • Technologies: Unequal in efficiency, they can also encourage consumption. Projects are initiated, migrations are made, but do we know how to decommission? What is the impact of a data platform? What are the technical characteristics that enable optimization?
  • Continuous improvement through monitoring: We can track the absolute value of storage and its growth, raising questions about the decoupling or correlation between data storage growth and added value for the business. Or the ratio between stored and used data.
  • Technical debt as a strong constraint: Technical debt is estimated to represent between 10% and 20% of new project expenses [8]. Data accumulates over time, and tidying up means taking care of the “legacy.” It is entirely possible to integrate this constraint into the daily management of a datalake.
  • Costs: For most companies, this cold data was neglected – its storage cost seemed “reasonable.” But with soaring electricity prices, storage costs are rising, and this factor can no longer be underestimated or ignored. The balance now includes financial costs, environmental costs, and the ever-increasing cost of labor.

Once we have identified all the parameters that should be taken into account for responsible data management, we must add a vision of what data management should evolve into:

Absolute value decline: The growth of digital impacts (carbon emissions from digital technology in France could triple by 2050, source: Arcep 2023) cannot and should not be “infinite.” One of tomorrow’s challenges will be to provide digital services with an environmental impact that grows more slowly than the proposed uses. This may involve technological developments and a selection of use cases based on their potential environmental, social, or societal impacts.

Stable operating model: It is essential to define processes at the datalake’s entry, from data qualification to ingestion in the correct format for use (sorting between cold data that should remain cold, usable cold data, and cold data to be permanently deleted), providing a framework for suppliers and consumers, and establishing a storage technology watch.

Data lifecycle phases: 

  • Ingestion: once data is in the lake, it tends to stay there, so the ingestion mode is a lever to limit growth. 
  • Data retention and cleaning: cleaning temporary tables and defining a lifespan for each piece of data, after which it is automatically deleted. 
  • Data exposure (also known as data mesh): allows data to be exposed to other teams and fully leveraged, with optimized storage in one place. 
  • Automated cleaning: implementing automatic cleaning processes, with suitable monitoring and alert systems for tracking (a minimal sketch follows this list).
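As a minimal sketch of such an automated cleaning process, the snippet below deletes objects older than a retention period from an S3-compatible bucket using boto3. The bucket name, prefix and retention period are hypothetical, and in practice a managed lifecycle policy on the storage service, plus logging and alerting, is usually preferable to a hand-rolled job.

```python
from datetime import datetime, timedelta, timezone
import boto3

BUCKET = "my-datalake-bucket"      # hypothetical bucket name
PREFIX = "raw/temporary/"          # restrict cleanup to temporary/staging data
RETENTION = timedelta(days=90)     # lifespan after which data is deleted

s3 = boto3.client("s3")
cutoff = datetime.now(timezone.utc) - RETENTION

# Collect keys whose last modification date is older than the retention period.
expired = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        if obj["LastModified"] < cutoff:
            expired.append({"Key": obj["Key"]})

# Delete in batches of up to 1,000 keys (the S3 API limit per request).
for i in range(0, len(expired), 1000):
    s3.delete_objects(Bucket=BUCKET, Delete={"Objects": expired[i:i + 1000]})

print(f"Deleted {len(expired)} objects older than {RETENTION.days} days")
```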

CIOs have an increasingly significant role to play in achieving a company’s environmental goals, supporting the industrialization of CSR (corporate social responsibility) approaches while reducing the environmental impact of their own assets. Fortunately, many principles and best practices in tech exist to help CIOs reduce their impacts and empower CSOs.

Sources:

[1] According to the latest study published in Science

[2] Statista

[3] According to a study by “Le GreenIT”

[4] Study conducted by the Enterprise Strategy Group (ESG) for MEGA International on “The Strategic Role of Data Governance and Its Evolution,” October 2022

[5] Forrester, 2016

[7] Statista, March 2023 + Statista, February 2023

[8] McKinsey, July 2020

Source: https://www.linode.com/content/cloud-block-storage-benchmarks/

What Makes Dataiku a Must-Have Tool for Data Science and AI?
https://www.devoteam.com/expert-view/what-makes-dataiku-a-must-have-tool-for-data-science-and-ai/
Fri, 22 Dec 2023 16:04:06 +0000

Artificial Intelligence (AI) has quickly risen to the forefront of technological innovation as the fundamental driving force behind developments such as machine learning (ML), analytics, generative AI, and intelligent automation. As the use cases for these technologies continue to grow, so does the number of enterprises shifting their focus to implementing them as a mechanism for operational transformation. Recent findings from McKinsey confirm this shift, indicating a twofold increase in AI adoption since 2017.

Now, the once seemingly distant concept of “Everyday AI” is rapidly transitioning into a fully realised reality, as AI becomes more integrated within daily business practices – and technology companies like Dataiku are stepping up to help enterprises leverage the tech to reach new heights. 

What is Dataiku?

Dataiku DSS (Data Science Studio) is a collaborative data science software platform with French roots that consolidates ML and analytics to provide customers with a comprehensive platform for developing and deploying AI applications that prioritise data-driven decision-making on a fundamental level. Distinguished by its highly integrated and user-friendly design, the DSS platform is accessible to both seasoned and entry-level data scientists. Its ergonomic features enable users to effortlessly create models in just a few clicks, while simultaneously streamlining the entire processing chain. The key capabilities of the platform include data preparation, visualisation, machine learning, DataOps, MLOps, analytic apps, collaboration, governance, explainability, and architecture, while plug-ins enable additional capabilities. Today, over 500 companies use Dataiku, including many leading global enterprises such as the telecommunications giant Orange, which chose Dataiku as the solution to elevate its data science and ML initiatives. 

What are the top 10 use cases of Dataiku? 

  1. Model Deployment: Once trained, DSS enables users to seamlessly integrate models into production environments, including integrating them with business applications and systems.
  2. Time Series Analysis: For datasets featuring temporal components, Dataiku provides time series analysis, forecasting, and anomaly detection, which are essential for applications like demand prediction and fraud detection. 
  3. Predictive Maintenance: Within industrial contexts, Dataiku can predict machinery and equipment failure, enabling proactive maintenance strategies that reduce downtime and cost overall. 
  4. Feature Engineering: Enhance model performance by crafting new features from existing data. Dataiku supports techniques like scaling, encoding categorical variables, and generating derived features. 
  5. Customer Segmentation and Personalisation: Tailor marketing efforts and customer experiences by utilising Dataiku to segment customers based on behaviour, demographics, or other variables. 
  6. Collaborative Data Science: Dataiku can further help foster cross-functional teamwork and knowledge sharing by empowering teams to collaborate on data projects, share insights, and collectively tackle analysis and modeling tasks.
  7. Automated Machine Learning (AutoML): Users can automate feature selection, model training, and hyperparameter tuning with Dataiku’s AutoML capabilities, which helps simplify the process of building effective models. 
  8. Exploratory Data Analysis (EDA): Dataiku’s interactive visualisation capabilities enable users to visually explore data characteristics to uncover patterns, understand relationships, and gain insights. 
  9. Data Preparation and Cleaning: Dataiku provides tools for data wrangling, enrichment, and feature engineering, making it easy to clean, transform, and prepare data from diverse sources. 
  10. Machine Learning Model Development: Dataiku facilitates the creation of machine learning models using various algorithms, offering features for model training, hyperparameter tuning, and evaluation.

It’s important to highlight that Dataiku’s versatility extends beyond these use cases, allowing organisations to tailor the platform to their specific needs. This adaptability renders Dataiku a powerful tool for elevating data-driven decision-making and fostering innovation across diverse industries. 
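For teams that prefer code over the visual flow, the same preparation logic can be expressed with the dataiku Python package available inside DSS recipes and notebooks. Below is a minimal sketch; the dataset and column names are hypothetical.

```python
import dataiku
import pandas as pd

# Read a managed DSS dataset into a pandas DataFrame.
orders = dataiku.Dataset("raw_orders")
df = orders.get_dataframe()

# Simple preparation / feature engineering step.
df["order_date"] = pd.to_datetime(df["order_date"])
df["order_month"] = df["order_date"].dt.to_period("M").astype(str)
df = df.dropna(subset=["customer_id", "amount"])

# Write the prepared data back to another managed dataset.
output = dataiku.Dataset("orders_prepared")
output.write_with_schema(df)
```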

How does Dataiku compare to other competitors in the market?

Dataiku is just one of a myriad of market-leading data science platforms, therefore understanding the key differentiators of each is paramount to choosing the right solution for your business. That said, let’s dive into a brief overview of some of the top data science platforms on the market today: 

  • Dataiku stands out as a cross-platform desktop application, offering a comprehensive suite of tools, including notebooks (akin to Jupyter Notebook), workflow management (akin to Apache Airflow), and automated machine learning. Rather than simply integration, Dataiku aims to provide an all-in-one solution that can replace existing tools.  
  • Alteryx is positioned as an analytics-focused platform. Although Alteryx is comparable to dashboarding solutions like Tableau, it goes a step further by including integrated machine-learning components. It specialises in providing no-code alternatives to traditionally code-dependent tasks in machine learning and advanced analytics.
  • Databricks is mainly a managed Apache Spark environment that also includes integrations with tools like MLflow for seamless workflow orchestration.
  • Knime is functionally similar to Alteryx, however, it offers an open-source self-hosted option and its paid version is cheaper. Additionally, it features a modular design that integrates machine learning components and analytics, offering flexibility in workflow creation.
  • Datarobot is centered around automated machine learning. Users upload data in a spreadsheet-like format, and then Datarobot automatically identifies optimal models and parameters to predict specific columns. 
  • Sagemaker is focused on abstracting away the complexities of infrastructure needed to train and serve models. Recently the platform expanded its offering to include Autopilot (similar to Datarobot) and Sagemaker Studio (akin to Dataiku), to provide a more holistic environment for diverse machine-learning tasks. 

Is Dataiku the right solution for your business? 

Finding the right data solution for your business depends on several factors such as business objectives, team knowledge & expertise, budget, data requirements, and more. One standout feature of Dataiku in particular is its user-friendly design, making it accessible to teams with varying technical backgrounds. While a foundational level of technical knowledge is beneficial, it doesn’t necessitate a team primarily composed of software engineers. This flexibility in team composition is advantageous for businesses aiming to leverage data science capabilities without an exclusive reliance on highly technical roles.

Dataiku’s core strength, however, lies in offering a predefined, all-in-one solution, making it an attractive option for businesses seeking a comprehensive platform that consolidates various data science functionalities. This is especially beneficial for enterprises that may not have the resources or inclination to manage multiple tools for different stages of the data science workflow. With Dataiku, the need for extensive tool integration is minimised, simplifying the overall data processing operation and enabling a more direct throughline to actionable, data-driven insights. 

5 Key Takeaways: 

  • AI Driving Technological Frontiers: Artificial Intelligence (AI) stands at the forefront of technological innovations, impacting machine learning, analytics, generative AI, and intelligent automation, with AI adoption doubling since 2017 according to one McKinsey report. 
  • Dataiku’s Holistic Data Processing: Dataiku’s comprehensive suite of tools covers data preparation, visualisation, machine learning, DataOps, MLOps, analytic apps, collaboration, governance, explainability, and architecture, offering a holistic approach to data processing.
  • Dataiku Use Cases: Dataiku’s top 10 use cases span model deployment, time series analysis, predictive maintenance, feature engineering, customer segmentation, collaborative data science, AutoML, EDA, data preparation, and machine learning model development. 
  • Dataiku’s Predefined All-in-One Strength: Dataiku’s core strength lies in offering a predefined, all-in-one solution, simplifying data processing, and minimising the need for extensive tool integration.
  • Dataiku’s User-Friendly Versatility: Dataiku’s versatility, coupled with its user-friendly design, makes it an accessible and powerful tool for varied data science teams, irrespective of their technical composition.

How can I learn more? 

This article is part of a larger series focusing on the technologies and topics found in the first edition of the TechRadar by Devoteam . To see what our community of tech leaders said about the current position of Dataiku in the market, take a look at the most recent edition of the TechRadar by Devoteam.

Decoding the Power of SS&C Blue Prism in Business Automation
https://www.devoteam.com/expert-view/decoding-the-power-of-ssc-blue-prism-in-business-automation/
Mon, 18 Dec 2023 10:40:42 +0000

What does SS&C Blue Prism do?

Blue Prism is a tool used primarily to automate back office functions that would otherwise have to be completed by a human. Blue Prism can be used to create a ‘digital workforce’ that follows rule-based business processes to interact with systems and applications in exactly the same way as human users would. It is generally used to automate clerical and administrative tasks.

What does this look like?

SS&C Blue Prism can be used for almost any task. Existing customer deployments cover a range of use cases including: 

  • Automating manual customer service actions
  • Handling General Data Protection Regulation (GDPR) information access requests
  • Building back office transaction frameworks
  • Fixing quantity mismatches in planning software like SAP
  • Data entry, processing and transfer operations

Built on the Microsoft .NET Framework, SS&C Blue Prism automates any application and supports any platform – including legacy mainframes and web-based SaaS services.

SS&C Blue Prism coined the phrase robotic process automation (RPA) to describe this capability. And the system is widely recognised as a pioneer in the field of RPA.

Is the Blue Prism framework free?

No, SS&C Blue Prism is proprietary software. It can be purchased outright or as a Software as a Service (SaaS) subscription according to your company’s needs and preferences.

How difficult is Blue Prism to learn?

SS&C Blue Prism is code-free, allowing non-programmers to begin building and deploying their own automations – Blue Prism then allocates and executes the ‘digital workers’ required to complete them. According to SS&C, building a new automation is a simple, three-stage process:

  1. Modelling – Building an object by telling Blue Prism about an application you want it to use. This is as simple as showing Blue Prism the application and clicking on the various fields, buttons, etc. The object can then recognise user interface elements such as the ‘Login’ button.
  2. Initiating – This is the second stage of building an object. Use the drag-and-drop editor to build a workflow diagram for each elementary, reusable activity you want to carry out in the application, such as completing the ‘logging in’ action with the ‘Login’ button. 
  3. Assembling – Use the same drag-and-drop approach to build the process. Workflow steps can be calculations, decisions, logic activities, or a command to use an action from an object, e.g. trigger the ‘Export Website Report’ process using the ‘Logging in’ action.

These steps can all be completed without any coding or other technical skills – you simply model the task you want to automate. SS&C Blue Prism also offers a Center of Excellence and a BP University to deliver training, certifications and exams if required by your organisation. 

What are the key features and capabilities of Blue Prism?

  • Accelerate common tasks by deploying as many ‘robots’ as required, allowing your business to redeploy or reduce your workforce wherever needed.
  • Can be used with any application via its user interface. By mimicking a real user, this approach negates the need for APIs and custom development.
  • Automate human-like decisions with ML, increasing autonomy and further accelerating operations.
  • Automated process capture allows you to build processes as you work for rapid prototyping.
  • Document automation to quickly and reliably extract information from documents for capture and re-use in other systems.
  • SAP automations to improve data quality and accuracy within your SAP system and to accelerate automations by up to 90%.
  • Combine RPA and BPM in one tool to simplify business process automation and improve end-to-end operations.
  • Digital Exchange offers pre-built workflows and automations to enable even faster time to deployment and reduce initial overheads.
  • Generative AI connector to deliver high quality notes, documents and annotations as part of your processes.

Who uses SS&C Blue Prism?

Notable users of Blue Prism include:

  • Coca Cola
  • Boeing
  • Johnson & Johnson
  • Du Pont
  • Ericsson
  • Telefonica
  • Pfizer
  • Sony
  • Walgreens
  • National Health Service (NHS)
  • BNP Paribas 
  • EDF

The toolset is popular across a range of industry verticals including: 

  • Finance and Accounting
  • Human Resources
  • Logistics Management
  • Healthcare
  • Aerospace
  • Telecoms

How can I learn more?

This article and infographic are part of a larger series centred around the technologies and themes identified in the 2023 edition of the TechRadar by Devoteam report. To learn more about SS&C Blue Prism and other technologies you need to know about, please download the TechRadar by Devoteam.

Is Golang the Missing Piece for Cloud-Native Success?
https://www.devoteam.com/expert-view/is-golang-the-missing-piece-for-cloud-native-success/
Mon, 18 Dec 2023 09:01:43 +0000

When it comes to programming in the cloud, there are a myriad of programming languages to choose from – but some stand a cut above the rest. Golang (Go) is one such language. Announced by Google in 2009, Golang’s unique balance between simplicity and performance has contributed to its popularity, making it one of the premier programming languages in modern software development. In this article, we’ll take a closer look at Golang, highlighting its distinctive features, benefits, and drawbacks, and assess its role in the developer toolkit. 

Why was Golang created? 

The origin of Golang, more commonly referred to as Go, began in 2007 when Google was experiencing a period of rapid expansion. As the company grew, so did the code that was being used to manage its infrastructure, adding a layer of complexity that slowed down operations. Recognising the need for a fresh approach, cloud engineers Robert Griesemer, Rob Pike, and Ken Thompson embarked on crafting a new programming language designed around two key objectives: quick performance and simplicity. Thus, Go was born.

In 2012, Go became an open-source project, and version 1.0 was officially released to the public, where it rapidly garnered a surprising level of popularity among developers. Today, Go remains one of the leading modern programming languages, currently ranking at #10 in the TIOBE programming community index. 

What is Golang used for? 

Originally built with a focus on networking and infrastructure programs, Go was intended to serve as a successor to high-performance server-side languages like Java and C++. Today, Go is utilised in an array of domains, spanning cloud-based and server-side applications, automation in DevOps and site reliability, command-line tools, as well as emerging fields such as AI and data science. Surprisingly versatile, Go extends its reach to include microcontroller programming, robotics, and game development.

However, where Go truly stands out is in the infrastructure domain, where popular infrastructure tools such as Kubernetes, Docker, and Prometheus are all written in Go.

What makes Golang different from other programming languages? 

Go is just one in an ever-growing sea of programming languages. Yet despite the flood of options, many would be quick to argue that it’s a cut above other prominent languages like Perl, C++, Python, etc. Here’s why… 

4 benefits that give Go its edge:  

  • Simplicity. One of the most attractive attributes of Golang is its simplicity. Where other programming languages deliver complexity and a steep learning curve, Go is comparatively simple and easy to understand – particularly for those who already have basic programming knowledge. It’s a running joke within developer circles that users new to Go can read and digest the entire spec in a single afternoon. 
  • Speed. Go’s inherent simplicity and accessibility mean that once a user grasps its fundamentals, they can very quickly and efficiently apply the language. And because it is so fast, developers will likely want to use it for nearly everything they use command line interpreters for, ultimately replacing their bash scripts, Python sketches, and Java efforts with a quicker and easier solution. 
  • Versatile Performance. Part of the beauty of Go as a modern programming language is that it has been designed to align with the predominant environment developers use – scalable, cloud-based servers optimised for performance. Go’s versatility shines as it compiles seamlessly on nearly any machine, empowering developers. Designed for automation at a large scale, Go makes it relatively easy to write high-performing applications, and because it is compilable on nearly any machine, it can be used to create anything from robust web applications to efficient tools for data preprocessing and beyond.  
  • Innovation. Every six months a new version of Golang is released with many improvements to the language and standard library. With each release, the ecosystem of auxiliary libraries expands significantly, catering to a broader range of functionalities. This commitment to regular updates not only ensures that Golang remains at the forefront of technological advancements but also empowers developers with a continuously evolving toolkit.

What are the disadvantages of Golang? 

Golang is a powerful language that is well-suited for building high-performance, concurrent applications. However, like any language, it has both strengths and weaknesses – here are three disadvantages to consider: 

  1. Lack of libraries: Every new update of Go brings with it the expansion of auxiliary libraries, however, even with these dedicated efforts, Go lacks the extensive library support that more established languages like Java or Python provide. This can potentially make development more difficult as developers may have to rely on third-party libraries for certain tasks.
  2. Does not support inheritance: One potential disadvantage of Go is the absence of direct support for object-oriented programming (OOP). Go’s approach to OOP is based on composition, interfaces, and structs that function similarly to classes, but favors composition over inheritance. Additionally, the lack of familiar OOP keywords such as “extends” or “implements” typically used in languages like Java may pose initial challenges for programmers accustomed to OOP conventions. 
  3. Small (but growing) community: As a young language, the Go community is still relatively small when compared to the community support garnered by other programming languages. This can make it harder to find support or collaborate on projects. It is worth noting, however, that what the community lacks (for now) in size, it makes up for in dedication and enthusiasm. 

Is Golang the right programming language for your next project? 

Overall, Golang is one of the best choices for developing cloud-native applications, and is also a strong candidate for serverless approaches, edge computing, and frugal software development, thanks to its low hardware and energy consumption. Furthermore, due to its low resource consumption and fast startup behavior, it is also ideally suited for the development of sustainable applications as well as for serverless microservices.

Ultimately, Golang is an excellent choice for businesses that expect predictable growth and rely on fast response times from their servers. So if you’re interested in a high-performing, versatile programming language that offers a vibrant, growing community and is easy to learn, Go is well worth considering. 

5 Key Takeaways: 

  1. Golang’s Rise to Prominence: Introduced by Google in 2009, Golang has swiftly become a premier programming language, renowned for its unique blend of simplicity and performance.
  2. Versatile Applications: Initially designed for networking and infrastructure, Golang now finds applications in diverse fields, from cloud-based development to microcontroller programming.
  3. Golang’s Key Advantages: Golang’s simplicity, speed, versatile performance, and commitment to regular updates make it a standout programming language. 
  4. Challenges with Golang: Golang faces challenges such as limited libraries compared to more established languages, the lack of direct support for inheritance, and a relatively small (albeit growing) community. 
  5. Project Suitability: Golang is ideal for cloud-native applications, serverless approaches, and sustainable development, particularly for businesses with predictable growth and a need for fast server response times.

Want to assess Golang’s relevance and potential for your organisation?

Connect with one of our experts today and find out if Golang is the right solution for you.

This article is part of a larger series centred around the technologies and themes found within the 2023 edition of the TechRadar by Devoteam report. To learn more about Golang and other technologies you need to know about, please download the TechRadar by Devoteam.

End-to-end Data Observability with Monte Carlo
https://www.devoteam.com/expert-view/end-to-end-data-observability-with-monte-carlo/
Wed, 13 Dec 2023 13:35:42 +0000

What Does Data Observability Mean?

Data has become a critical component of digital services, products, and decision-making. Inaccurate or flawed data can lead to wasted resources, loss of revenue, and reduced trust in the organisation. Studies show that companies lose millions of dollars annually because of bad data, resulting in missed opportunities, misallocated budgets, inaccurate reporting, and customer churn.

According to the 2023 State of Data Quality survey, “The problem of erroneous and inaccessible data still exists today. In fact, data downtime nearly doubled year over year”. Over 50% of respondents stated that at least a quarter of their revenue was impacted by data quality issues. The average percentage of revenue affected by these issues increased from 26% in 2022 to 31% in 2023.

Data observability solves this problem by providing an end-to-end approach to ensuring data is accurate and reliable. It is about getting visibility over your entire data ecosystem, so that teams can monitor freshness, schema, volume, and quality, detect, resolve and prevent data incidents, and understand their environment deeply enough to stop the same issues from happening over and over again.

Similar to its DevOps equivalent, data observability makes use of automated monitoring, alerting, and triaging methods to detect, assess, and resolve issues concerning data quality and discoverability.

An Overview of the Monte Carlo Data Observability Platform

As a leader in data observability, Monte Carlo is dedicated to eliminating data downtime, i.e., periods of time when the data is incomplete, incorrect, or missing, and making sure that your data is reliable at every stage of the data pipeline.

Data observability is based on five pillars:

  1. Freshness: How recent is the data and how often are tables updated?
  2. Volume: Is there missing data or duplicate data? What counts as a large change to table size?
  3. Schema: Has the organization of data changed? If so, who made the changes to the schema and when?
  4. Quality: Does your data fall within the expected range?
  5. Lineage: If data breaks, which downstream assets were impacted and which upstream sources are contributing to the issue?

Monte Carlo uses a data collector deployed in its own secure environment to connect to data warehouses, data lakes and BI tools. It does not store or process the actual data. Only metadata, logs and statistics are extracted. Monte Carlo then learns about the data environment and the historical patterns and automatically monitors for abnormal behaviour, raising alerts when anomalies occur or when pipelines break.
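To illustrate the kind of check this enables (a conceptual sketch only, not Monte Carlo’s product code or API), a metadata-based freshness monitor can compare the time since a table’s last update with that table’s own historical update cadence:

```python
from datetime import datetime, timezone
from statistics import mean, stdev

def freshness_anomaly(update_times, now=None, threshold=3.0):
    """Flag a table whose current update gap is an outlier.

    update_times: historical update timestamps for one table (tz-aware).
    threshold: number of standard deviations above the mean historical
    gap that counts as an anomaly.
    """
    now = now or datetime.now(timezone.utc)
    times = sorted(update_times)
    # Historical gaps between consecutive updates, in seconds.
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    if len(gaps) < 2:
        return False  # not enough history to judge
    current_gap = (now - times[-1]).total_seconds()
    return current_gap > mean(gaps) + threshold * stdev(gaps)
```

Volume and schema checks follow the same pattern: learn a baseline from metadata, then alert when new observations deviate from it.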

By using query and BI tool reads to understand the importance of tables in data warehouses or lakehouses, Monte Carlo recognises Key Assets and assigns an Importance Score between 0-1 to every table, with 1 signifying higher importance.

It detects freshness, volume, and schema incidents out of the box, providing end-to-end coverage across the entire data stack. Notable features include ML-enabled data anomaly detection, data lineage for getting to the root of the problem, data quality insights, as well as integrations and interoperability with other data tools (databases, catalogues, BI tools, warehouse and lakehouse platforms).

In case of data incidents, a clear and concise context is presented in a structured format, including the table(s) involved, the incident owner, severity level, notification channels, as well as the number of affected users, queries and reports. Users can investigate the incident using the “Field Lineage” and “Table Lineage” tools and identify potential sources that live upstream or downstream.

The main benefits of the Monte Carlo Data Observability platform

The Monte Carlo Data Observability platform provides an extensive approach for achieving better data quality management at scale, surpassing the capabilities of traditional testing and monitoring solutions.

The main benefits include:

  • A centralised view of the data ecosystem, including schema, lineage, freshness, volume, users, queries, etc., to achieve a better understanding of data health over time;
  • Automatic monitoring, alerting, and root cause analysis for data incidents, without requiring significant configuration or threshold-setting;
  • Lineage tracking;
  • Custom and machine learning-generated rules;
  • Enhanced data reliability insights.

What can Monte Carlo be used for?

Several typical use cases stand to gain from the implementations of Monte Carlo, such as:

  • Data quality monitoring and testing

Automate data quality tests with out-of-the-box monitoring, powered by machine learning recommendations, to enable centralised monitoring of all production tables and customised monitoring for critical assets.

  • Data mesh and self-serve

Ensure reliable self-serve analytics with data quality and integrity, supporting data mesh and ownership of trustworthy data products. Create flexible domains to map data sources and consumers with efficient detection and resolution of data incidents and anomalies.

  • Report and dashboard integrity

Monte Carlo automatically detects impacted data consumers and affected BI reports and dashboards during an incident. Teams are provided with table and field-level lineage to understand relationships between upstream tables and downstream reports. 

  • Customer-facing data products

Monte Carlo’s integration into existing workflows shortens time-to-detection and time-to-resolution for data incidents, which is vital for keeping customer-facing data products reliable and trustworthy through end-to-end data observability.

In conclusion

Unprecedented volumes and sources of data are being used to drive everyday business decisions, which means that data downtime due to broken dashboards, ineffective ML, or inaccurate analytics can translate into millions of dollars of lost revenue for large companies. So it has become imperative for data to be accurate, current, reliable, accessible, and easily monitored. Monte Carlo offers end-to-end data observability delivered in a user-friendly product.

Want to assess Monte Carlo’s relevance and potential for your organisation?

Connect with one of our experts today and find out if Monte Carlo is the right solution for you.

This article is part of a larger series centred around the technologies and themes found within the 2023 edition of the TechRadar by Devoteam report. To learn more about Monte Carlo and other technologies you need to know about, please download the TechRadar by Devoteam.

Optimising End-to-End Data Management with Databricks https://www.devoteam.com/expert-view/optimising-end-to-end-data-management-with-databricks/ Tue, 05 Dec 2023 12:35:06 +0000 https://www.devoteam.com/?post_type=expert-view&p=27895 The history of Databricks

Databricks was founded in 2013 by the minds behind Apache Spark, Delta Lake, and MLflow. The platform was first made available for public use in 2015 as the world’s first lakehouse platform in the cloud. Databricks now combines the best of data warehouses and data lakes to offer an open and unified platform for data and Artificial Intelligence (AI), with a single security and governance model.

A lakehouse is an open architecture that combines the features and best practices of data warehouses and data lakes. It allows for transaction support, schema enforcement and governance, BI support, openness, decoupling of storage and compute, support for diverse data types and workloads, and end-to-end streaming. It provides a single system that supports enterprise-grade security and access control, data governance, data discovery tools, privacy regulations, retention, and lineage. A lakehouse is enabled by implementing data warehouse-like structures and data management features on top of low-cost cloud storage in open formats and provides an API for accessing the data directly.
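
As a rough illustration of what “data warehouse-like structures on top of open formats” looks like in practice, the PySpark sketch below writes a Delta table to object storage and then queries it with SQL. The storage path and column names are hypothetical, and the snippet assumes a Spark session with Delta Lake support (for example, a Databricks cluster or the open-source delta-spark package).

```python
from pyspark.sql import SparkSession

# Assumes a Spark environment with Delta Lake available; the storage path
# and schema below are purely illustrative.
spark = SparkSession.builder.appName("lakehouse-sketch").getOrCreate()

events = spark.createDataFrame(
    [("2024-01-10", "checkout", 42), ("2024-01-10", "search", 17)],
    ["event_date", "event_type", "clicks"],
)

# Write to low-cost object storage in the open Delta format.
path = "s3://example-bucket/lakehouse/events"  # hypothetical location
events.write.format("delta").mode("append").save(path)

# Query the same files directly with SQL, warehouse-style.
spark.read.format("delta").load(path).createOrReplaceTempView("events")
spark.sql(
    "SELECT event_type, SUM(clicks) AS total_clicks FROM events GROUP BY event_type"
).show()
```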

Databricks is built to handle and manage all types of data and is cloud agnostic, which means it can govern data storage wherever it is located. This platform is intended to support a range of data and artificial intelligence workloads, allowing team members to access the necessary data and co-create to drive innovation.

The features mentioned above are present in the architecture of the Databricks Lakehouse platform, including the use of the Delta Lake foundation to ensure reliability and performance, fine-grained governance for data and AI achieved through the Unity Catalog, and support for persona-based use cases.

An Overview of the Databricks Lakehouse Architecture

Databricks has two main planes: control and compute.

The control plane contains the backend services that Databricks manages. Notebook commands and other workspace configurations are stored and encrypted at rest in this plane.

Data processing is handled by the compute plane. For most computing tasks, Databricks uses the classic compute plane, comprising resources in the user’s AWS, Azure or Google Cloud Platform account. For serverless SQL warehouses and Model Serving, Databricks uses serverless compute resources that run in a compute plane within the user’s Databricks account.

The E2 architecture, released in 2020, offers features such as multi-workspace accounts via the Account API, customer-managed VPCs, secure cluster connectivity using private IP addresses, and customer-managed keys for encrypting notebook and secret data. Token management, IP access lists, cluster policies, and IAM credential passthrough are also included in the E2 architecture, making Databricks easier to manage and more secure in its operations.

Databricks supports Python, SQL, R, and Scala for data science, data engineering, data analysis, and data visualisation across a wide range of chart types (bar, line, area, pie, histogram, heatmap, scatter, bubble, box, combo, cohort analysis, counter, funnel, choropleth map, marker map, pivot table, Sankey, sunburst sequence, table, and word cloud).

In 2023, Databricks introduced an updated user interface that enhances the overall navigation experience and reduces the number of clicks it takes to complete a task. The improved UI features impact the home page, sidebar, and search functionality, including a streamlined tile system, unifying previously separate pages for data science and engineering, SQL, and machine learning. The sidebar has been revamped, giving direct access to universal resources like workspaces and compute resources. At the same time, the global search function now looks for all available assets, including notebooks, tables, dashboards, and more. Overall, these improvements ensure simpler navigation and discoverability of features for Databricks users.

The main benefits of the Databricks Lakehouse platform

Databricks has a unified approach that eliminates the challenges caused by previous data environments, such as data silos, complicated structures, and fractional governance and security structures.

As a lakehouse platform, Databricks includes all these key features and benefits: support for ACID (Atomicity, Consistency, Isolation, and Durability) transactions, schema enforcement and governance, openness, BI support, decoupled storage and compute, support for structured and unstructured data types, end-to-end streaming, and support for diverse workloads, including data science, ML, SQL and analytics.

The platform is:

  • Simple

Databricks unifies data warehousing and AI use cases on a single platform. It also employs natural language to offer a simplified user experience: the Data Intelligence Engine lets users discover and explore data using natural language.

  • Open

Based on open source and open standards, Databricks provides users with complete control of their data and avoids the use of proprietary formats and closed ecosystems.

  • Collaborative

Delta Sharing enables secure sharing of live data from your lakehouse to any computing platform without requiring complex ETL or data replication (see the sketch after this list).

  • Multi-cloud

Databricks Lakehouse runs on every major public cloud (Microsoft Azure, AWS and Google Cloud), tightly integrated with the security, compute, storage, analytics and AI services natively offered by the cloud providers.
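
To illustrate the recipient side of Delta Sharing mentioned above, here is a minimal sketch using the open-source delta-sharing Python connector. It assumes the data provider has issued a share profile file; the profile filename and the share, schema, and table names are hypothetical.

```python
import delta_sharing  # open-source connector: pip install delta-sharing

# Profile file issued by the data provider (hypothetical filename).
profile = "open-datasets.share"

# Discover what has been shared with this recipient.
client = delta_sharing.SharingClient(profile)
for table in client.list_all_tables():
    print(table)

# Load a shared table directly into pandas, with no ETL or replication.
# Address format: <profile-path>#<share>.<schema>.<table>
df = delta_sharing.load_as_pandas(f"{profile}#my_share.my_schema.trips")
print(df.head())
```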

What can Databricks be used for?

Databricks offers a suite of tools that enable users to aggregate their data sources on one platform and process, store, share, analyze, model, and monetize the datasets across a broad range of applications, from business intelligence (BI) to generative AI.

Use cases include:

  • Building an enterprise data lakehouse that combines the strengths of enterprise data warehouses and data lakes, providing a single source of truth for data;
  • ETL and data engineering, combining Apache Spark with Delta Lake and custom tools for data ingestion;
  • Machine Learning capabilities, extended by tools such as MLflow and Databricks Runtime for Machine Learning;
  • Large language models and generative AI, where Databricks Runtime for Machine Learning includes libraries like Hugging Face Transformers, allowing users to integrate existing pre-trained models;
  • Data warehousing, analytics, and BI, based on user-friendly UIs, scalable compute clusters, SQL query editor, and notebooks that support Python, R, and Scala;
  • Data Governance and secure data sharing through the Unity Catalog, allowing controlled access to data;
  • DevOps, CI/CD, and task orchestration, reducing duplicate efforts and out-of-sync reporting while providing common tools to manage versioning, automation, scheduling, and deployment for monitoring, orchestration, and operations;
  • Real-time and streaming analytics, using Apache Spark Structured Streaming (see the sketch following this list).
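
For the streaming use case in the last bullet, the following is a minimal Structured Streaming sketch that reads a stream of JSON files and appends it incrementally to a Delta table. The paths and schema are hypothetical; on Databricks the same pattern is commonly used with Auto Loader or Kafka sources instead of plain file directories.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

# Hypothetical schema for incoming JSON events.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("ts", LongType()),
])

# Incrementally read new JSON files as they land in the source directory.
stream = (
    spark.readStream
    .schema(schema)
    .json("s3://example-bucket/raw/events/")  # hypothetical input path
)

# Continuously append the stream to a Delta table, tracking progress in a
# checkpoint location so the query can restart safely after failures.
query = (
    stream.writeStream
    .format("delta")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
    .start("s3://example-bucket/lakehouse/events_stream/")
)

query.awaitTermination()
```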

AI and ML with Databricks

Databricks facilitates the full machine learning (ML) lifecycle on its platform with end-to-end governance throughout the ML pipeline. Several built-in tools support ML workflows:

  • Unity Catalog – data catalog and governance tool;
  • Lakehouse Monitoring – for tracking model prediction quality and drift;
  • Feature Engineering and Serving – for finding and sharing features;
  • Databricks AutoML – for automated model training;
  • MLflow – for model development tracking;
  • Databricks Model Serving – for low-latency, high-availability model serving;
  • Databricks Workflows – for automated workflows and production-ready ETL pipelines;
  • Databricks Repos – for code management with Git integration.

Databricks Runtime for Machine Learning includes tools such as the Hugging Face Transformers and LangChain libraries, which enable integration of pre-trained models or other open-source libraries into your workflow. The MLflow integration makes it possible to track transformer pipelines, models, and processing components with the MLflow tracking service.
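
As a rough sketch of how MLflow tracking fits into such a workflow, the example below logs parameters, a metric, and a trained scikit-learn model to an MLflow run. It uses plain open-source MLflow and scikit-learn with a synthetic dataset; on Databricks, the tracking server and experiment locations are managed for you, and the parameter values here are purely illustrative.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Illustrative synthetic dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="lr-baseline"):
    params = {"C": 0.5, "max_iter": 200}
    model = LogisticRegression(**params).fit(X_train, y_train)

    # Record what was trained and how well it performed.
    mlflow.log_params(params)
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))

    # Persist the model artefact so it can be registered and served later.
    mlflow.sklearn.log_model(model, artifact_path="model")
```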

Databricks also provides AI functions that data analysts can use to access LLM models, including OpenAI, directly within their data pipelines and workflows.

Databricks in Action at Devoteam

As a Databricks Consulting Partner, Devoteam can help organisations build, deploy or migrate to the Databricks Lakehouse Platform. With our team’s specialised experience and industry knowledge, we can assist in implementing complex data engineering, collaborative data science, full lifecycle ML and business analytics initiatives.

Want to know how we used Databricks to implement a new data platform that empowers Omgevingsdienst, an environmental service in the Netherlands, to gain more control over their data? Check out our success story here.

In conclusion

Over 9,000 organisations across the globe now consider Databricks their data intelligence platform of choice. With its ability to support large-scale data engineering, collaborative data science, comprehensive machine learning, and business analytics, Databricks is driving the mission to democratise data and AI and to help data teams solve challenging problems.

Want to assess Databricks’s relevance and potential for your organisation?

Connect with one of our experts today and find out if Databricks is the right solution for you.

This article and infographic are part of a larger series centred around the technologies and themes found within the 2023 edition of the TechRadar by Devoteam report. To learn more about Databricks and other technologies you need to know about, please download the TechRadar by Devoteam.

Snowflake Data Cloud: Unleash Your Data’s Potential https://www.devoteam.com/expert-view/snowflake-data-cloud-unleash-your-datas-potential/ Tue, 28 Nov 2023 08:59:58 +0000 https://www.devoteam.com/?post_type=expert-view&p=27791 What is Snowflake?

Snowflake is a Data Cloud platform designed to unify, integrate, analyse, and share data at scale and speed. It can process structured, semi-structured, and even unstructured data from various sources, such as databases, files, or streams of data. The Data Cloud is available on Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform, in more than 20 regions across the globe.

Since its launch, Snowflake has been dedicated to breaking down data silos by creating a unified and secure place for all data, with a single governance model for all use cases on the platform. Originally focused on analytics, Snowflake has expanded to include data engineering, data science, data-intensive applications, machine learning and artificial intelligence use cases. Snowflake’s goal is to collaborate on all aspects of data, leveraging its incredibly effective compute engine to provide rich and informative experiences to users.

Who is Snowflake for?

Snowflake enables organisations to exploit their data easily and quickly, in particular those operating in complex, international data environments or, conversely, those that lack in-house IT and DBA resources.

As a fully self-managed cloud service, there’s no need to select, install, configure, or manage any hardware, virtual or physical. You can expect virtually no software installation, configuration, or management. Ongoing maintenance, management, upgrades, and tuning are handled by Snowflake. Additionally, Snowflake runs entirely on cloud infrastructure, with all service components (excluding optional command line clients, drivers, and connectors) running in public cloud infrastructures.

What makes Snowflake architecture unique?

Snowflake has a central data repository for persisted data that can be accessed by all compute nodes in the platform. Queries are processed using massively parallel processing (MPP) compute clusters, with each node in the cluster storing a portion of the entire data set locally. This approach provides easy data management, while also delivering exceptional performance and scale-out advantages.

Snowflake’s unique architecture has three layers:

  • Database Storage

Snowflake reorganises, optimises and compresses the data loaded into the platform and stores it in a columnar format. Customers can access the stored data only through SQL queries run in Snowflake, since the underlying data objects are not directly visible or accessible.

  • Query Processing

Snowflake processes queries using virtual warehouses, which are MPP compute clusters consisting of multiple nodes allocated from cloud providers. Virtual warehouses are independent and don’t share compute resources, allowing one to operate without affecting the performance of the others.

  • Cloud Services

This layer is a collection of services that organise activities across Snowflake, including authentication, access control, query parsing and optimisation, metadata management and infrastructure management.

Unlike traditional data warehouses that rely on static partitioning of large tables, Snowflake uses a unique micro-partition format that delivers performance and scale without known limitations such as data skew and maintenance overhead. This means the platform allows concurrent reading and writing of data without locking or blocking, and rapid undos of deletes, inserts and edits by modifying pointers to data blocks. Managing data in Snowflake feels far more intuitive and far less risky thanks to drop/restore functionality and Time Travel on all data objects on the platform.

The Time Travel feature is based on Snowflake’s data versioning system, which allows users to query snapshots of data as it existed at a specific point in time, or as it existed up to a certain number of days ago. This feature can be useful for auditing, investigating, and correcting data issues – and the related cloning feature can be used to virtually eliminate the need for pre-production environments for testing data products.
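
To make Time Travel and cloning concrete, here is a minimal sketch using the snowflake-connector-python package. The connection parameters, table names, and time offset are hypothetical; the SQL follows Snowflake's documented AT(...) and CLONE clauses.

```python
import snowflake.connector

# Hypothetical connection parameters; in practice these come from a
# secrets manager or environment variables.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="ANALYTICS_WH",
    database="SALES",
    schema="PUBLIC",
)
cur = conn.cursor()

# Query the table as it looked one hour ago (Time Travel, offset in seconds).
cur.execute("SELECT COUNT(*) FROM orders AT(OFFSET => -3600)")
print("rows one hour ago:", cur.fetchone()[0])

# Zero-copy clone of that historical state, e.g. to investigate or roll back.
cur.execute("CREATE TABLE orders_backup CLONE orders AT(OFFSET => -3600)")

cur.close()
conn.close()
```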

Sharing data with Snowflake

You can share live, ready-to-query data across Snowflake accounts without any data movement, as long as the cloud region is the same. In addition to sharing data, you can also share business logic and services, ensuring your ecosystem has the data and tools it needs. If datasets and apps are published on the Snowflake Marketplace or a private exchange, they can even be distributed across clouds and regions, once native, secure replication of the content is in place.

Snowflake offers fine-grained governance and access control to ensure security and compliance with industry and region-specific data regulatory requirements.

How does Snowflake tap into machine learning (ML) and artificial intelligence (AI)?

Consolidating all your data in one location makes an ML or AI process significantly easier. Running ML or AI algorithms natively on Snowflake, close to where the data resides, lets businesses bypass the need for complex data pipelines and additional governance processes, accelerating data operations and time to market for enterprise data products. Through centralised and streamlined operations, organisations can natively bring AI applications to life using Snowflake.

Snowflake has additionally built Snowpark, a developer environment for apps and ML models based on the Anaconda distribution. It supports Java, Python, and Scala, making it easy to implement models and applications. Snowpark ML Modeling, currently available as an open preview feature, provides Python APIs for preprocessing data and training models.
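
Below is a minimal Snowpark for Python sketch showing the style of DataFrame code that runs inside Snowflake, close to the data. The connection parameters, table, and column names are hypothetical, and the Snowpark ML Modeling APIs are not shown here.

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, avg

# Hypothetical connection parameters; in practice these come from a
# configuration file or a secrets manager.
session = Session.builder.configs({
    "account": "my_account",
    "user": "my_user",
    "password": "my_password",
    "warehouse": "ANALYTICS_WH",
    "database": "SALES",
    "schema": "PUBLIC",
}).create()

# DataFrame operations are translated to SQL and pushed down to Snowflake,
# so the data never leaves the platform.
orders = session.table("ORDERS")
summary = (
    orders.filter(col("STATUS") == "SHIPPED")
          .group_by("REGION")
          .agg(avg(col("AMOUNT")).alias("AVG_AMOUNT"))
)
summary.show()

session.close()
```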

What are the main benefits of Snowflake?

  • Break down barriers of siloed data in your organisation

For decades, siloed data, both on-premises and in the cloud, has limited insights and created significant business challenges. Snowflake delivers unlimited scalability and concurrency, unique data-sharing capabilities, and a Data Marketplace for sharing data across departments, subsidiaries, geographies, and with your business partners.

  • Reduce query times from hours to seconds

Materialized views (available in the Enterprise Edition) are pre-computed datasets that can provide faster querying than querying against base tables. Materialized views excel when costly operations like aggregation, projection, and selection are frequently run on large datasets.

  • Built-in governance

Snowflake Horizon is a unified set of capabilities for compliance, security, interoperability, data access, and privacy within Snowflake’s Data Cloud. It includes features to safeguard your data, maintain business continuity, monitor data quality, and track data lineage.

  • Get value from data with AI

Snowflake’s recently launched features make it possible for any user to include LLMs in analytical processes. Developers can build GenAI-based applications and execute powerful workflows like fine-tuning foundation models on enterprise data.

In conclusion

By combining data warehouse, data lake, data engineering, data science, business apps and data sharing services, Snowflake spans the entire space between data sources and end users. Natively multi-cloud, multi-geography, and disarmingly easy, the Snowflake platform breaks down both the organisational and technical barriers of traditional data infrastructures to create a one-stop shop for data and democratises its use in the enterprise. Along with the unique architecture, the newly announced AI features enable analysts, data scientists, developers, and business leaders to unlock the full potential of data.

Want to assess Snowflake’s relevance and potential for your organisation?

Connect with one of our experts today and find out if Snowflake is the right solution for you.

This article and infographic are part of a larger series centred around the technologies and themes found within the 2023 edition of the TechRadar by Devoteam report. To learn more about Snowflake and other technologies you need to know about, please download the TechRadar by Devoteam.

UiPath: A Game-Changer for Operational Productivity https://www.devoteam.com/expert-view/uipath-a-game-changer-for-operational-productivity/ Thu, 23 Nov 2023 13:16:48 +0000 https://www.devoteam.com/?post_type=expert-view&p=27780 In the age of instant digital connectivity, rapid innovation, and ever-increasing demand, business success depends not simply on the quality of the goods or services provided, but on the overall efficiency of the entire enterprise. The emergence of Robotic Process Automation (RPA) began a revolution in operational efficiency, playing a pivotal role in improving process quality, speed, and productivity. Initially adopted exclusively by large enterprises due to high licensing costs, RPA has become an indispensable tool as the technology has become more accessible. Now, automation is a universal necessity, redefining tasks across industries and enabling businesses of all sizes to thrive.

With that in mind, it comes as no surprise that today’s business leaders are embracing automation not just for process optimisation but as a strategic asset. A recent PwC survey reveals that as many as 53% of CFOs say they plan to accelerate digital transformation using data analytics, AI, automation, and cloud solutions.

UiPath, a dominant player in the global RPA market (projected to reach $25.56 billion by 2027), leads this transformative wave. In this article, we’ll explore some of the features and benefits that make UiPath a strategic business automation tool of choice.

What is UiPath? 

Founded in Romania, UiPath is a global software company that provides a robust, secure, low-code RPA platform for end-to-end automation. The platform enables users with limited technical expertise to create software robots, or ‘bots’, capable of seamless interaction with their existing IT infrastructure, encompassing legacy applications, databases, and cloud-based services.

Some of UiPath’s core strengths include its intuitive workflow dashboard, recording functionalities, and visual programming tools. Integration is further streamlined through access to third-party APIs, which enables users to weave a network of automated workflows spanning multiple systems effortlessly. The platform’s user-friendly visual editor empowers individuals without coding experience to construct intricate automated workflows, using a simplified drag-and-drop approach to create unique automations with ease. 

Beyond basic automation, UiPath extends its capabilities to include machine learning and artificial intelligence tools, empowering users to create sophisticated automation by incorporating various models for natural language processing (NLP), intelligent document processing (IDP), and more.

What makes UiPath a Leading RPA Tool? 

In light of the growing trend towards automation, it’s no surprise that the RPA sector is among the most robust and competitive markets. In fact, the market is projected to keep expanding for the foreseeable future, growing at an astounding CAGR (compound annual growth rate) of 38.2% to approach $31 billion by 2030.

Within this explosive market, UiPath has undeniably established itself as a premier RPA tool, starting with its historic U.S. software IPO in 2021 (with UiPath stock reaching 68 USD by the close of its first trading day) as well as being named Leader in the 2022 Gartner Magic Quadrant for Robotic Process Automation. But what is it exactly that sets UiPath apart from its competitors? 

Here are 3 key differentiators that set UiPath apart: 

  • Ease of Use. The simplicity of the UiPath interface is among the most significant characteristics that define the platform. Using its intuitive drag-and-drop interface, users with little to no technical expertise are capable of creating sophisticated automations. At the same time, the platform enables developers to employ code to further customise and refine automations for specific use cases.
  • Continuous Innovation. UiPath presently offers the most comprehensive RPA platform on the market, catering to the entire automation lifecycle, from discovery to measurement. Recent UiPath enhancements include cloud RPA (SaaS) robots, API automation, point-and-click configuration of Machine Learning models, DevOps lifecycle control, and integration of UiPath Apps with RPA. Additionally, on UiPath’s near-term roadmap is the upcoming Studio Web, a web-based design environment that enables users to create automations in the cloud.
  • Customer Satisfaction and Community Support. UiPath excels in customer satisfaction, earning high ratings on platforms like G2 and Gartner’s Peer Insights. With a commitment to customer success through learning resources, online training, and an active developer community, UiPath stands out in providing continuous support, frequent updates, and crowdsourced knowledge sharing. 

The Top 6 Key Advantages UiPath Offers 

As a leader in business process automation, UiPath is widely considered to be the most robust RPA tool on the market today – and for good reason. Let’s take a closer look at the top 6 advantages that UiPath has to offer. 

  1. Interaction and Collaboration: UiPath breaks away from traditional automation limitations by enabling interaction and collaboration between digital and human workforces. This inclusive approach caters to business processes requiring human intervention to implement approvals or inputs, expanding the scope of automations in a cost-effective manner.
  2. Interconnectivity: UiPath’s platform excels in seamless interconnectivity with major enterprise products and applications. Featuring open APIs, integration with third-party analytics, and the ability to invoke code, UiPath enhances business process management by fostering compatibility with diverse systems (see the API sketch after this list).
  3. Machine Learning and Predictive Analytics: UiPath stands out with its Machine Learning (ML) models that are cost-effective, customisable, and adaptable to organisational needs. Offering out-of-the-box, drag-and-drop ML models, UiPath simplifies the implementation and maintenance of predictive analytics, providing a highly scalable solution for a wide range of businesses.
  4. Task Capture for Process Discovery: UiPath’s Task Capture tool transforms process documentation, automating the tedious and error-prone manual effort. Overall, task capture accelerates automation development and process improvements, resulting in fewer errors, lower costs, and reduced time spent on document-related tasks compared to manual processing.
  5. Process Document Understanding: UiPath Document Understanding merges RPA and AI to streamline end-to-end document processing. Capable of extracting and interpreting data from various documents, this tool handles diverse formats and recognises objects like tables, handwriting, and signatures, offering a comprehensive solution for document-related tasks.
  6. Test Automation with Test Suite: Traditional manual testing is a tedious, time-consuming task, often requiring numerous tools and steps to complete. By contrast, UiPath’s Test Suite addresses the inefficiencies of manual testing, providing an integrated bundle of continuous-delivery automation testing tools. By combining RPA technology with best-in-class testing practices, Test Suite enhances speed, coverage, and effectiveness in ensuring the quality of RPA and application deployments.
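
As a loose illustration of the “open APIs” point above, the sketch below starts an unattended job through the UiPath Orchestrator REST API using plain Python. The host, tenant, endpoint paths, release key, strategy value, and credential handling are all assumptions made for illustration; consult the Orchestrator API documentation for the exact routes and authentication flow used by your deployment (cloud tenants, for example, authenticate via OAuth rather than the endpoint shown here).

```python
import requests

# All values below are hypothetical placeholders.
BASE_URL = "https://orchestrator.example.com"
TENANT = "Default"

# 1. Authenticate (on-premises-style endpoint; assumed for illustration).
auth = requests.post(
    f"{BASE_URL}/api/Account/Authenticate",
    json={
        "tenancyName": TENANT,
        "usernameOrEmailAddress": "robot_admin",
        "password": "change-me",
    },
)
token = auth.json()["result"]
headers = {"Authorization": f"Bearer {token}"}

# 2. Start one job for a published process (release key is hypothetical).
start = requests.post(
    f"{BASE_URL}/odata/Jobs/UiPath.Server.Configuration.OData.StartJobs",
    headers=headers,
    json={"startInfo": {
        "ReleaseKey": "00000000-0000-0000-0000-000000000000",
        "Strategy": "JobsCount",
        "JobsCount": 1,
    }},
)
start.raise_for_status()
print(start.json())
```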

What Businesses can Benefit from UiPath? 

The pace at which the business landscape moves continues to accelerate year after year, meaning that increasing efficiency and productivity while reducing time to market and overall costs is crucial. Within this context, it’s clear that automation, and the tools that enable its implementation, are not merely elective technologies but strategic necessities for success.

As a market-leading RPA tool, UiPath is a sound solution for businesses across industries aiming to leverage automation to streamline their operations and enhance overall efficiency. In particular, its intuitive design and robust capabilities make it accessible to individuals with varying skill levels, while its overall affordability and flexible licensing model mean that businesses of all sizes can reap benefits including efficiency gains, improved employee satisfaction, reduced churn, and enhanced customer service quality.

It’s worth noting that organisations adopting UiPath often follow a hybrid approach, with business users tackling smaller tasks and the technical team handling more complex processes. This strategic balance ensures efficient automation deployment across different levels, preventing overwhelm and maximising returns on investment.

5 Key Takeaways

  • Automation’s Strategic Surge: Business leaders are increasingly focused on automation as a strategic asset, with 53% of CFOs saying they plan to accelerate digital transformation using data analytics, AI, automation, and cloud solutions. 
  • Key UiPath Differentiators: UiPath’s success is driven by a user-friendly interface, continuous innovation, and a strong focus on customer satisfaction, distinguishing it with accessible, efficient automation that evolves with market demands.
  • UiPath’s Comprehensive Advantages: UiPath’s robust platform offers six key advantages, including fostering interaction, seamless interconnectivity, and innovative features like machine learning models, task capture, process document understanding, and an efficient test suite.
  • Hybrid Automation Approach: UiPath’s adoption involves a hybrid approach, with business users handling smaller tasks and the technical team managing complex processes, ensuring effective automation deployment across various levels.
  • Broad Applicability of UiPath: Among the most versatile RPA solutions on the market, UiPath’s intuitive design, robust capabilities, affordability, and flexible licensing make it a strategic necessity for diverse businesses seeking efficiency gains and operational streamlining. 

This article and infographic are part of a larger series centred around the technologies and themes found within the 2023 edition of the TechRadar by Devoteam report. To learn more about UiPath and other technologies you need to know about, please download the TechRadar by Devoteam.
