Data & AI Summit 2025: Topics for data engineers at a glance

Why the Data & AI Summit is relevant for data engineers

In June 2025, the Data & AI Summit will once again bring together experts from data analysis, data engineering and artificial intelligence. The conference showcases the latest developments, tools and proven methods for building modern data platforms. Data engineers can discuss established concepts and the strategic direction of modern data architectures with peers from around the world. The event is regarded as an important source of inspiration for anyone who wants to actively shape technological change.

This year, the discussion will shift towards the integration of artificial intelligence into data pipelines, automation, and security and governance in data handling. Classic ETL processes and big data solutions will be less in the foreground than in previous years. Topics such as data lakehouses, the production use of large language models (LLMs) and automated data quality checks are increasingly shaping the programme. On site, data engineers will learn how these technologies can be integrated into existing tech stacks.

Focal points and trends at the Data & AI Summit 2025

The summit programme reflects key industry trends, with a focus on combining established data engineering concepts with innovative, AI-supported processes. Companies such as Databricks and Snowflake will present advances in lakehouse architectures that bring structured and unstructured data together in a unified environment. In practice, running batch and streaming workflows side by side has become standard. Numerous sessions and workshops will focus on the convergence of data lakes and data warehouses, a boundary that has long since blurred in the day-to-day work of many projects.

Delta Lake and competing table formats such as Apache Iceberg and Apache Hudi will be a particular focus. In best-practice sessions, data engineers learn how to ensure data quality, versioning and transactional consistency (ACID guarantees) in data pipelines over the long term. A typical example: streaming data from IoT applications is stored and processed with Delta Lake and made available for machine learning. Industry-leading companies demonstrate how they monitor billions of events every day in near real time. Manufacturing scenarios show how sensor values from production lines are continuously recorded and analysed with AI-based anomaly detection, so that deviations are detected early and processes remain stable.
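The anomaly detection mentioned above can take many forms. As a minimal sketch, the following Python code flags sensor readings whose rolling z-score exceeds a threshold. It is a simple statistical stand-in, not the AI-based models the talks refer to; the function name, window size and threshold are illustrative assumptions.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(readings, window=5, threshold=3.0):
    """Return (index, value) pairs that deviate strongly from the preceding window.

    Simple rolling z-score; used here as a stand-in for AI-based detection.
    """
    history = deque(maxlen=window)  # only the last `window` readings are kept
    anomalies = []
    for i, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # Skip the z-score when the window is perfectly flat (sigma == 0)
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append((i, value))
        history.append(value)
    return anomalies


sensor_values = [20.1, 20.3, 19.9, 20.0, 20.2, 20.1, 35.7, 20.0, 20.2]
print(detect_anomalies(sensor_values))  # → [(6, 35.7)]
```

Because the flagged value stays in the window, it inflates the standard deviation for the next few readings; a production detector might exclude confirmed anomalies from the history.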

Developments in feature stores, DataOps platforms and self-service BI are becoming equally important. Data engineers face the task of building effective interfaces between data sources and AI-based applications, and modular, scalable architectures provide platforms that meet these requirements. Practical examples also show how optimised data layouts and advanced caching strategies can reduce infrastructure costs.
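One way optimised data layouts reduce cost is partition pruning: if files are stored in directories keyed by commonly filtered columns, a query engine can skip whole directories instead of scanning every file. The sketch below assumes a Hive-style `key=value` directory layout; the paths and helper names are hypothetical, and table formats such as Delta Lake add further techniques (for example file-level statistics for data skipping) that are not modelled here.

```python
from datetime import date


def partition_path(base, event_date, region):
    # Hive-style layout: one directory per (event_date, region) combination
    return f"{base}/event_date={event_date.isoformat()}/region={region}"


def prune_partitions(paths, start, end, region=None):
    """Keep only partition directories that can contain rows matching the filter."""
    selected = []
    for path in paths:
        # Parse the "key=value" path segments back into a dict
        parts = dict(seg.split("=", 1) for seg in path.split("/") if "=" in seg)
        day = date.fromisoformat(parts["event_date"])
        if start <= day <= end and (region is None or parts["region"] == region):
            selected.append(path)
    return selected


all_paths = [
    partition_path("/lake/events", date(2025, 6, d), r)
    for d in (1, 2, 3)
    for r in ("eu", "us")
]
print(prune_partitions(all_paths, date(2025, 6, 2), date(2025, 6, 3), region="eu"))
# → ['/lake/events/event_date=2025-06-02/region=eu', '/lake/events/event_date=2025-06-03/region=eu']
```

Only two of the six directories are read. The benefit depends on partitioning by columns that queries actually filter on; partitioning by a high-cardinality column instead produces many small files and can increase cost.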

Best practices for efficient data pipelines

The summit will focus not only on technological trends but, above all, on concrete solutions for the day-to-day work of data engineers. User-oriented workshops and live demonstrations will show which proven methods leading companies worldwide are currently implementing. Approaches for automating data workflows with orchestrated pipelines in Apache Airflow, Dagster or Databricks Workflows are particularly in demand. Demonstrations will show how self-healing pipelines react automatically to errors and generate corresponding reports.
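A basic building block of self-healing pipelines is retrying transient failures while recording what happened. The following sketch wraps a task in retries with exponential backoff and returns a per-attempt report; `run_with_retries` and `TaskFailed` are hypothetical names. Orchestrators provide comparable settings out of the box (Airflow tasks, for instance, accept `retries` and `retry_delay`), so a hand-written wrapper like this mainly illustrates the mechanism.

```python
import time


class TaskFailed(Exception):
    """Raised when every attempt fails; carries the attempt report."""

    def __init__(self, report):
        super().__init__(f"task failed after {len(report)} attempts")
        self.report = report


def run_with_retries(task, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run task(), retrying on failure with exponential backoff.

    Returns (result, report); the report lists the outcome of every attempt
    and could be forwarded to an alerting or ticketing system.
    """
    report = []
    for attempt in range(1, max_attempts + 1):
        try:
            result = task()
        except Exception as exc:  # real pipelines should retry only transient errors
            report.append({"attempt": attempt, "status": "failed", "error": repr(exc)})
            if attempt == max_attempts:
                raise TaskFailed(report) from exc
            # With base_delay=1.0 the waits are 1s, 2s, 4s, ...
            sleep(base_delay * 2 ** (attempt - 1))
        else:
            report.append({"attempt": attempt, "status": "success"})
            return result, report


# Usage: a hypothetical extract step that fails twice before succeeding
calls = {"n": 0}


def flaky_extract():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("source not reachable")
    return ["row1", "row2"]


result, report = run_with_retries(flaky_extract, sleep=lambda seconds: None)
print(result)                                 # → ['row1', 'row2']
print([entry["status"] for entry in report])  # → ['failed', 'failed', 'success']
```

Injecting `sleep` keeps the function testable without real waiting; in a pipeline it would simply use the default `time.sleep`.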

A proven approach is to set up a CI/CD process for each pipeline that automatically runs data quality and performance checks. The following PySpark function shows how such a check can be implemented; it fails if any of the given columns contain null values:

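from pyspark.sql.functions import col


def check_nulls(df, columns):
    """Raise an error if any of the given columns contains NULL values."""
    for c in columns:
        # Count the rows in which column c is NULL; any hit fails the check
        if df.filter(col(c).isNull()).count() > 0:
            raise ValueError(f"Null values found in column {c}")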

Such checks are integrated directly into the CI/CD pipeline as jobs, for example with GitHub Actions or Jenkins. The Data & AI Summit will also highlight how companies use open source frameworks such as Great Expectations to establish automated data contracts. This way, data drift is detected quickly and reported to developers, which keeps the underlying data at a consistently high quality.
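To make the idea of a data contract concrete, the sketch below checks incoming records against a declared schema (column names, types, nullability) and returns readable violations that a CI job could report. It is plain Python with hypothetical names (`ColumnContract`, `validate_contract`), not the Great Expectations API, which offers far richer expectations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnContract:
    name: str
    dtype: type
    nullable: bool = False


def validate_contract(records, contract):
    """Return a list of human-readable violations (an empty list means the contract holds)."""
    violations = []
    for i, record in enumerate(records):
        for column in contract:
            if column.name not in record:
                violations.append(f"row {i}: missing column '{column.name}'")
                continue
            value = record[column.name]
            if value is None:
                if not column.nullable:
                    violations.append(f"row {i}: '{column.name}' must not be null")
            elif not isinstance(value, column.dtype):
                violations.append(
                    f"row {i}: '{column.name}' expected {column.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
    return violations


# Hypothetical contract for an orders feed
orders_contract = [
    ColumnContract("order_id", int),
    ColumnContract("amount", float),
    ColumnContract("coupon", str, nullable=True),
]
batch = [
    {"order_id": 1, "amount": 19.99, "coupon": None},
    {"order_id": "2", "amount": 5.0, "coupon": "SUMMER"},
]
print(validate_contract(batch, orders_contract))
# → ["row 1: 'order_id' expected int, got str"]
```

In a CI job, a non-empty result would fail the build, so a producer that silently changes a type (here, an ID sent as a string) is caught before the data reaches downstream consumers.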

Infrastructure as Code (IaC) is becoming increasingly important in this context. Tools such as Terraform or the Databricks CLI make it possible to provision complete data infrastructures reproducibly via scripts and to manage them consistently across environments. Experience reports at the summit will show how companies significantly accelerate the rollout of complex machine learning pipelines through targeted use of IaC.

Data security, observability and governance: challenges and solutions

As data platforms grow broader, the requirements for security, compliance and process monitoring increase. Panels and deep-dive talks on data security are among the most popular formats at the Data & AI Summit. Experts discuss how to secure data lakes effectively in practice, presenting techniques such as role-based access control, fine-grained access policies and data masking. Detailed solution scenarios explain the added value of tools such as Unity Catalog or Open Policy Agent, especially in multi-layered cloud environments.
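Role-based access control combined with data masking can be illustrated in a few lines of Python: a policy maps each role to the columns it may read in clear text, and everything else is masked. The policy table, roles and column names below are hypothetical; in Unity Catalog or Open Policy Agent the same rules would be declared centrally in the respective tool rather than in application code.

```python
# Hypothetical policy: columns each role may read in clear text
CLEAR_TEXT_COLUMNS = {
    "fraud_analyst": {"customer_id", "iban", "email", "balance"},
    "marketing": {"customer_id", "email"},
    "reporting": {"customer_id", "balance"},
}

MASK = "***"


def apply_policy(row, role):
    """Return a copy of row in which every column the role may not read is masked."""
    # Unknown roles get an empty set, so by default nothing is visible in clear text
    allowed = CLEAR_TEXT_COLUMNS.get(role, set())
    return {column: (value if column in allowed else MASK) for column, value in row.items()}


customer = {
    "customer_id": "C-1001",
    "iban": "DE89370400440532013000",
    "email": "jane.doe@example.com",
    "balance": 1520.50,
}
print(apply_policy(customer, "marketing"))
# → {'customer_id': 'C-1001', 'iban': '***', 'email': 'jane.doe@example.com', 'balance': '***'}
```

The key design choice is deny by default: a new column or an unknown role ends up masked unless the policy explicitly allows it.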

Companies from the financial sector will demonstrate how they protect sensitive customer data in Databricks with Unity Catalog and SQL-based access policies. Metadata is managed in an auditable, tamper-proof way, and every data access is continuously logged. Integration with SIEM systems provides comprehensive auditability without unnecessarily complicating the daily work of data engineers. Live demonstrations of AI-supported detection of access anomalies are a particular highlight.
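Tamper-evident access logs are often built by hash chaining: each entry stores the hash of its predecessor, so changing or removing an earlier entry invalidates every hash that follows. The sketch below shows this general technique with Python's `hashlib`; it illustrates the concept and is not a description of how Databricks or Unity Catalog store audit logs. The function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def append_entry(log, user, action, obj):
    """Append an access event whose hash also covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"user": user, "action": action, "object": obj, "prev_hash": prev_hash}
    # Canonical JSON (sorted keys) so the same entry always hashes identically
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry["hash"] = hashlib.sha256(payload).hexdigest()
    log.append(entry)


def verify_chain(log):
    """Recompute every hash; an edited or removed earlier entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev_hash:
            return False
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if hashlib.sha256(payload).hexdigest() != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_entry(audit_log, "alice", "SELECT", "finance.customers")
append_entry(audit_log, "bob", "SELECT", "finance.transactions")
print(verify_chain(audit_log))    # → True
audit_log[0]["user"] = "mallory"  # tamper with an earlier entry
print(verify_chain(audit_log))    # → False
```

Dropping the most recent entries is not detectable from the chain alone; periodically copying the latest hash to an external system, such as the SIEM mentioned above, closes that gap.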

Observability in data pipelines

Specialised tools such as Monte Carlo and Databand, as well as the open lineage standard OpenLineage, are increasingly used to monitor data pipelines. Detailed reports show the status of all data flows and record problems, and in the best case help prevent them proactively. In practice-oriented sessions, participants learn how to implement data observability as an integral part of their platforms. One suggestion, for example, is to derive business metrics directly from raw data and analyse them continuously, a practice that many data engineers have now firmly established in their toolkit.
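Observability setups typically start with a few generic checks per table. The sketch below implements two of them in plain Python: freshness (was the table updated recently?) and volume (is the row count within the expected range?). Thresholds, parameters and the function name are illustrative assumptions, not the behaviour of any of the tools named above.

```python
from datetime import datetime, timedelta, timezone


def check_table_health(last_updated, row_count, expected_rows,
                       max_age=timedelta(hours=1), tolerance=0.5, now=None):
    """Return a list of issues for one table (an empty list means healthy).

    - freshness: the table must have been updated within max_age
    - volume: row_count must lie within +/- tolerance of expected_rows
    """
    now = now or datetime.now(timezone.utc)
    issues = []
    age = now - last_updated
    if age > max_age:
        issues.append(f"stale: last update {age} ago (limit {max_age})")
    lower = expected_rows * (1 - tolerance)
    upper = expected_rows * (1 + tolerance)
    if not lower <= row_count <= upper:
        issues.append(f"volume: {row_count} rows, expected {lower:.0f}-{upper:.0f}")
    return issues


# Usage: a hypothetical table last loaded 2.5 hours ago with far too few rows
now = datetime(2025, 6, 10, 12, 0, tzinfo=timezone.utc)
issues = check_table_health(
    last_updated=datetime(2025, 6, 10, 9, 30, tzinfo=timezone.utc),
    row_count=1_200,
    expected_rows=10_000,
    now=now,
)
print(issues)
```

In practice, `expected_rows` would be derived from the table's own history (for example the median of recent loads) rather than hard-coded.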

AI integration and practical LLM scenarios for data engineers

As generative AI technologies mature, the tasks and daily processes of data engineering are changing as well. LLMs not only support the automatic extraction and analysis of metadata, they also make it possible to express complex queries in natural language. In one example application, a data engineer integrates an LLM-based analysis module into a data lakehouse: power users type questions in natural language into a chatbot, and the module translates them into SQL queries. The results appear immediately as a DataFrame visualisation, which greatly simplifies the work of users.
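The LLM call itself depends on the provider, but two parts of such a natural-language-to-SQL module can be sketched independently: a prompt that gives the model the table schema, and a guardrail that only lets read-only queries through to the lakehouse. The schema, prompt wording and function names below are hypothetical.

```python
import re

# Hypothetical table schema passed to the model as context
SCHEMA_HINT = "sales(order_id INT, region STRING, revenue DOUBLE, order_date DATE)"

# Keywords that indicate a write or DDL statement
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|ALTER|CREATE|TRUNCATE|GRANT)\b", re.IGNORECASE
)


def build_prompt(question, schema=SCHEMA_HINT):
    """Wrap a user question with the schema so the model references real columns."""
    return (
        "Translate the question into a single Spark SQL SELECT statement.\n"
        f"Available table: {schema}\n"
        f"Question: {question}\n"
        "Return only the SQL."
    )


def is_safe_query(sql):
    """Accept only a single read-only SELECT (or WITH ... SELECT) statement."""
    statement = sql.strip().rstrip(";")
    if ";" in statement:  # more than one statement
        return False
    if not re.match(r"(SELECT|WITH)\b", statement, re.IGNORECASE):
        return False
    return FORBIDDEN.search(statement) is None


print(is_safe_query("SELECT region, SUM(revenue) FROM sales GROUP BY region;"))  # → True
print(is_safe_query("DROP TABLE sales"))                                         # → False
print(is_safe_query("SELECT 1; DELETE FROM sales"))                              # → False
```

A keyword filter like this is deliberately conservative and only a first line of defence; running the generated queries under a read-only identity remains the more robust control.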

Another practical example illustrates the fully automated anonymisation of personal data. An LLM analyses sample datasets, recognises and classifies names, addresses and bank details, and anonymises them before the next processing step:

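from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable


def anonymize(text):
    # Send the raw text to the model together with an anonymisation instruction
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": f"Anonymise the following data: {text}"}],
    )
    return response.choices[0].message.content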
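Names and postal addresses are hard to capture with fixed patterns, which is where the LLM helps. Structured identifiers such as e-mail addresses and IBANs, however, can also be masked deterministically, for example before any text is sent to an external API at all. The following sketch shows such a pre-processing step; the two regular expressions are simplified assumptions and will not catch every valid format.

```python
import re

# Deliberately simple patterns; real PII detection needs broader coverage
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    # Two letters, two check digits, then blocks of 4 (optionally space-separated)
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){3,7}(?: ?[A-Z0-9]{1,4})?\b"),
}


def mask_pii(text):
    """Replace every match of each pattern with a placeholder such as <EMAIL>."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


print(mask_pii("Contact jane.doe@example.com, IBAN DE89 3704 0044 0532 0130 00."))
# → Contact <EMAIL>, IBAN <IBAN>.
```

Combining both steps keeps the deterministic part auditable and limits the LLM to the cases where pattern matching genuinely falls short.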

LLMs also simplify the generation of specific data quality rules: analysts formulate a rule such as "Values in the 'Revenue' field must not be negative", and the corresponding test code is generated automatically and integrated into Spark or SQL tests. The summit will present this approach as a pioneering blueprint for production DataOps processes.
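The LLM step is not reproduced here. Instead, the sketch below uses a fixed template as a deterministic stand-in to show the shape of the translation: a rule in plain language goes in, and an executable SQL check that counts violating rows comes out, which a CI job can then run. The rule grammar, table name and function name are hypothetical, and SQLite from the Python standard library stands in for Spark SQL.

```python
import re
import sqlite3

# Hypothetical rule grammar -> SQL condition that identifies violating rows
RULE_TEMPLATES = [
    (re.compile(r"^Values in the '(\w+)' field must not be negative$"), "{col} < 0"),
    (re.compile(r"^Values in the '(\w+)' field must not be null$"), "{col} IS NULL"),
]


def rule_to_sql(rule, table):
    """Translate a supported rule into a query that counts violating rows."""
    for pattern, condition in RULE_TEMPLATES:
        match = pattern.match(rule)
        if match:
            column = match.group(1).lower()
            return f"SELECT COUNT(*) FROM {table} WHERE {condition.format(col=column)}"
    raise ValueError(f"Unsupported rule: {rule}")


# Usage with an in-memory table containing one negative revenue value
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 120.0), (2, -15.5), (3, 40.0)])

sql = rule_to_sql("Values in the 'Revenue' field must not be negative", "sales")
print(sql)  # → SELECT COUNT(*) FROM sales WHERE revenue < 0
violations = conn.execute(sql).fetchone()[0]
print(violations)  # → 1
```

A CI job would fail whenever the count is greater than zero. With an LLM in place of the template, the same contract applies: the generated query should be reviewed or validated before it becomes part of the test suite.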

Recommendations for attendees and an outlook on the future of data

Participating data engineers benefit in particular from the summit's practice-oriented structure: hands-on workshops, deep dives and numerous networking opportunities characterise the event. Exchanging ideas with international colleagues provides valuable insights into real production environments and lets participants learn from proven best practices as well as common sources of error. It is worth selecting relevant sessions in advance, for example on streaming, AI, security and observability, and leaving enough time for informal conversations.

Both beginners and experienced professionals will benefit from the diverse programme: less experienced participants get to know proven architecture templates, while experienced data engineers can discuss architectural design patterns, improve existing implementations in a targeted way and turn the latest open source tools into concrete prototypes.

A look ahead shows that data engineering as a discipline is constantly evolving. AI-driven automation, self-service platforms and robust DataOps structures are increasingly becoming standard. The Data & AI Summit 2025 illustrates where the industry is heading and conveys the skills data engineers need to shape modern data platforms in the long term.