
Scaling Data as a Service (DaaS) with Platform Engineering

January 18, 2026
by Nan Zhu

AI Summary / Key Takeaways 

  • Platform engineering helps data companies scale DaaS by abstracting infrastructure complexity from product teams.
  • SafeGraph uses platform engineering to enable faster data delivery, higher reliability, and better developer experience.
  • Abstraction layers for data lakes, Spark workloads, and Kubernetes reduce operational friction as data products grow.


SafeGraph is a geospatial data company that curates high-precision data on millions of places around the globe. Our datasets provide detailed, accurate, and up-to-date information on points of interest and how people interact with those locations. 

To scale SafeGraph’s Data as a Service (DaaS) for a rapidly growing user base, we built a platform engineering team that serves as an enabler for other teams. In this blog post, we share our approach to platform engineering, how it is implemented at SafeGraph, and how it enables product development teams to deliver data products more efficiently. 

Why Platform Engineering Is Essential for Scaling Data Startups 

Why do startups need a platform engineering team? This is not an easy question to answer, especially for companies like SafeGraph. Large organizations such as Google, Facebook, Netflix, and Uber rely heavily on platform teams to build and operate their own infrastructure stacks, as they face challenges related to massive traffic volumes, complex use cases, and scale. 

These challenges may not appear to apply to smaller startups. Cloud vendors and open-source technologies have reduced, if not eliminated, the need to build infrastructure from scratch. However, as highlighted in prior discussions, platform engineering is still essential to integrate vendor solutions into a cohesive, cloud-based infrastructure that supports business needs. 

In addition, SafeGraph faced several challenges that platform engineering helped address: 

  • Product developers were occasionally blocked by errors or issues in tools such as Apache Spark and Kubernetes for data services, where vendors could not always provide adequate support due to limited domain context.
  • Introducing innovative technologies often creates friction by requiring changes to established workflows and working habits.
  • Selecting among multiple solutions for specific use cases involved complex trade-offs, including concerns around vendor lock-in.

The Role of the Platform Team in Solving Key Challenges 

The platform team serves as a domain expert responsible for unblocking product development teams from issues related to development tools and infrastructure. In addition to addressing immediate challenges, the platform team builds long-term infrastructure solutions that support the rapid growth of engineering teams as the business scales. 

Beyond technical enablement, the platform team also plays a key role in fostering a strong engineering culture by reducing operational overhead and minimizing interruptions to product development workflows. 

Platform Engineering at SafeGraph: Scaling Infrastructure to Support Data Growth 

In this section, we explain how SafeGraph fulfills the core missions of platform engineering.

Resolving Infrastructure Bottlenecks  

Unblocking other teams from immediate issues related to infrastructure and development tools is one of the most critical responsibilities of platform engineers. Many startups build their infrastructure on top of open-source technologies such as Kubernetes and Apache Spark. While open source offers flexibility, it does not always mean low cost and can introduce significant operational overhead or become a bottleneck in the product development pipeline. As a result, the platform engineering team must act as the domain expert for these open-source systems. 

Success in this role has a twofold impact on the platform team and the broader organization. 

Delivering high-quality products to customers in a timely manner is always a top priority for startups. In-house experts who deeply understand the technologies used in product development are especially valuable in challenging situations. Although platform engineering often involves short-term costs, such as temporary workflow interruptions or resource diversion, solving immediate problems builds trust across engineering teams and offsets these costs by reducing the time and effort required to resolve issues without sufficient domain expertise. 

One example of how the platform team at SafeGraph addressed immediate product development challenges involved improving the performance of our Spark-based data processing stack. A Spark job that processed a small volume of data could run for up to 24 hours, only to fail frequently and stall downstream consumers. 
This work is a practical example of Apache Spark optimization in production, where deep understanding of execution internals directly improved reliability and performance at scale.  

After investigation, the platform team identified two root causes: 

  • The single-threaded task serialization mechanism in Spark’s DAG Scheduler was overwhelmed by a multi-threaded job submission approach, leaving executors idle.
  • The Spark job implementation relied on Scala’s parallel collections, which imposed an expensive hash code calculation in the default fork-join pool. 

By batching job submissions and avoiding the default fork-join pool, the team reduced execution time from 24 hours to approximately 3 hours, meeting reliability expectations and restoring downstream stability. 
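The batching fix can be sketched independently of Spark itself. The helper below is hypothetical, not SafeGraph's actual code: it caps how many jobs are in flight at once by submitting them in fixed-size batches, so only a bounded number of submissions compete for a single-threaded scheduler resource at any time. In production the callables would trigger Spark actions rather than plain Python functions.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_batches(jobs, batch_size):
    """Run callables in fixed-size batches.

    Instead of submitting every job concurrently (which can overwhelm
    a single-threaded scheduler component, as Spark's DAG scheduler
    task serialization was in our case), only `batch_size` jobs run
    at a time. Results are returned in submission order.
    """
    results = []
    for start in range(0, len(jobs), batch_size):
        batch = jobs[start:start + batch_size]
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            results.extend(pool.map(lambda job: job(), batch))
    return results
```

The right batch size depends on how quickly the scheduler can serialize tasks relative to how long each job runs; it is a tuning knob rather than a fixed constant.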

We have many other examples of how platform teams can resolve immediate and unexpected issues to unblock product development teams and maintain focus on business priorities. 

Long-Term Infrastructure Solutions

Another key mission of the platform engineering team is to build long-term infrastructure solutions that support the company’s growth. For startups, building infrastructure often involves introducing the right technologies for specific purposes, such as managing service and data job configurations or simplifying service deployment. 

Introducing New Technology 

Introducing innovative technology also brings several challenges. New tools often require significant changes to existing workflows that engineers are already accustomed to, creating friction between short-term product delivery and long-term infrastructure benefits. This trade-off is a common challenge for many companies. 

Evaluating whether technology can serve long-term needs is equally difficult. This challenge is evident across areas such as data warehousing, stream processing, and messaging queues, where a wide range of technologies exist. These options differ significantly in design and capabilities and can impose high switching costs when business requirements change. 

To address these challenges, the platform engineering team focuses on building sustainable and efficient infrastructure solutions by: 

  • Minimizing operational overhead and easing the adoption of new technologies.
  • Enabling flexibility across different solutions while keeping switch-over costs low.

Both objectives can be achieved through effective infrastructure abstraction. 

Abstraction for Easier Adoption 

One example of addressing complexity through infrastructure abstraction is SafeGraph’s machine learning (ML) model management, deployment, and versioning system. This system effectively functions as an MLOps abstraction layer, shielding ML engineers from tooling complexity while preserving robust model governance.  

We have built multiple ML models to support the delivery of high-quality data products. As our customer base and user requirements have grown, the number of ML models has increased, creating challenges in managing them effectively. 

MLflow emerged as a promising solution to address these challenges. However, introducing MLflow at SafeGraph came with significant costs. Adoption required adding configuration files for each ML project and modifying existing workflows to include additional manual steps, such as running commands before committing models. 

One example of the low return on investment from this change was the effort required to display Git commit hashes in the MLflow Run UI. To leverage MLflow’s built-in functionality for this purpose, ML engineers were required to complete multiple steps, including: 

  • Adding a project description file to their project directory 
  • Changing their existing workflows, such as local Python runners or Jupyter notebooks, to use the MLflow command line to run projects

These requirements introduced friction, especially given that the underlying goal was simply to surface metadata in the UI. Similar challenges arose when displaying training data versions in the MLflow UI, where engineers had to manually log parameters using the MLflow API, often repeating the same steps across projects. 

Overall, while MLflow is a powerful MLOps tool, it introduced significant distractions for ML engineers, diverting attention from their primary goal of applying state-of-the-art techniques to improve data product quality. 

Rather than adopting MLflow strictly according to official documentation or vendor guidance, we built an internal library that exposes APIs for logging parameters and metrics, uploading models, and integrating with Git. These APIs automate previously manual steps and automatically capture metadata such as artifact versions and data read/write versions. As a result, ML engineers can focus on their core work while benefiting from robust MLOps capabilities without disrupting their existing workflows. 
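A minimal sketch of that wrapper idea, assuming a pluggable logging callback (the function names here are illustrative, not SafeGraph's actual internal API): metadata such as the Git commit is captured automatically, and the engineer makes one call instead of repeating manual `mlflow.log_param` steps in every project.

```python
import subprocess


def current_git_commit():
    """Best-effort lookup of the current Git commit hash."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def log_run_metadata(log_param, data_versions):
    """Log Git and dataset metadata through a single call.

    `log_param` is any (key, value) logging callback: in production it
    could be `mlflow.log_param`, while tests can pass a dict recorder.
    `data_versions` maps dataset names to the versions read or written.
    """
    log_param("git_commit", current_git_commit())
    for name, version in data_versions.items():
        log_param(f"data_version.{name}", version)
```

Because the MLflow client is injected rather than imported directly, the wrapper also keeps experiment-tracking tooling swappable, mirroring the abstraction theme of this section.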

Abstraction for Flexibility  

Another benefit of abstraction is maintaining SafeGraph’s flexibility in an uncertain and evolving technical landscape. This approach is an example of data lake format abstraction, allowing teams to work with versioned datasets without coupling to a specific storage technology. 

When we began building SafeGraph’s data lake, several technologies in the market, including Delta Lake, Apache Iceberg, and Apache Hudi, could serve as its foundation and provide key capabilities such as data versioning and time travel. We narrowed the decision to Delta Lake and Apache Iceberg, and the choice proved challenging for several reasons. 

Delta Lake is primarily developed by Databricks. Although it is open source, Databricks also offers versions with proprietary features within the Databricks Spark Platform. As a result, adopting Delta Lake can implicitly lock computing and storage layers to the Databricks ecosystem. 

Apache Iceberg, despite its highly active open-source community, was still relatively early at the time, and we encountered several usage constraints and bugs during evaluation. 

Additionally, differences in how Delta Lake and Iceberg implement similar functionality can result in high switching costs if a format change is required in the future. 

To address this dilemma, we built an internal library that provides APIs for common data lake operations, such as reading and writing versioned datasets and viewing dataset history. While these operations are implemented using Delta Lake or Apache Iceberg, product development teams do not need to be aware of the underlying format. This abstraction ensures that switching to alternative formats in the future would require minimal or no code changes. 
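The shape of such a library can be sketched as an interface plus interchangeable backends. The names below are illustrative assumptions, not SafeGraph's real API, and the toy in-memory backend stands in for a Delta Lake or Iceberg implementation; product code would depend only on the abstract interface.

```python
from abc import ABC, abstractmethod


class VersionedDatasetStore(ABC):
    """Format-agnostic API for versioned dataset access.

    Concrete backends (Delta Lake, Apache Iceberg, ...) are selected
    by configuration; callers never touch format-specific APIs.
    """

    @abstractmethod
    def write(self, name, df):
        """Write a dataset and return the new version number."""

    @abstractmethod
    def read(self, name, version=None):
        """Read a dataset at a version (latest when version is None)."""

    @abstractmethod
    def history(self, name):
        """List available versions, newest first."""


class InMemoryStore(VersionedDatasetStore):
    """Toy backend used here only to make the interface concrete."""

    def __init__(self):
        self._versions = {}

    def write(self, name, df):
        self._versions.setdefault(name, []).append(df)
        return len(self._versions[name]) - 1

    def read(self, name, version=None):
        versions = self._versions[name]
        return versions[-1 if version is None else version]

    def history(self, name):
        return list(range(len(self._versions.get(name, []))))[::-1]
```

Swapping Delta Lake for Iceberg then means adding a new backend class, not rewriting every pipeline that reads or writes versioned data.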

Engineering Culture Enabler 

A well-established and healthy engineering culture, which is essential for any strong technical organization, does not come without cost. These costs arise from changes in mindset and behavior and from unavoidable tooling overhead, even when there is alignment on adopting new practices. 

The platform team serves as an enabler of engineering culture by reducing these associated costs. At SafeGraph, we aim to build an engineering culture that values operational excellence in services. Operational excellence is driven by comprehensive monitoring, timely alerting, and other capabilities that help engineers improve service SLAs and debug issues efficiently. These capabilities are best built and maintained as part of platform engineering, rather than being delegated to individual product teams and requiring them to allocate limited resources to develop tools from scratch. 

The platform team is also well positioned to promote the desired engineering culture. Because the “products” delivered by the platform team are used across teams, cultural practices that lead to success become highly visible, and their benefits are easily shared and reinforced organization-wide. For example, the platform team at SafeGraph built solutions to minimize or eliminate manual steps in using Terraform, a tool that is widely known for its steep learning curve. As the user experience improved progressively across teams, the benefits became widely shared and reinforced a culture focused on minimizing unnecessary human intervention in processes. 

By resolving immediate challenges, building cost-effective long-term infrastructure, maintaining a future-oriented approach, and enabling strong engineering culture, platform engineering plays a critical role in optimizing engineering organizations for efficiency and sustainability. 

FAQs 

1. What is platform engineering in data companies? 

Platform engineering is the practice of building internal platforms that abstract infrastructure complexity, allowing data and product teams to focus on delivering reliable data products instead of managing systems. 

2. How does platform engineering support DaaS scalability? 

By standardizing data pipelines, compute workloads, and deployment processes, platform engineering enables data-as-a-service platforms to scale efficiently without increasing operational overhead. 

3. Why is platform engineering important for startups offering data services? 

Startups benefit from platform engineering because it reduces technical debt early, improves system reliability, and supports rapid growth without constant re-architecture. 

4. How does SafeGraph use platform engineering? 

SafeGraph uses platform engineering to manage large-scale location data pipelines, optimize Spark workloads, and ensure consistent, high-quality data delivery to customers. 

5. What technologies are commonly used in platform engineering for data platforms? 

Common technologies include Kubernetes, Apache Spark, data lake abstraction layers, and MLOps tooling to support scalable and resilient data services. 

Ready to build on high-precision data? Get a free sample of SafeGraph’s datasets and see how accurate, regularly refreshed data can support analytics, modeling, and decision making at scale.
