As point of interest (POI) information grows in importance for businesses (for both outward-facing applications and inward-facing dashboards), a key question many companies have to grapple with is whether to build or buy the data. Building a POI database affords greater customization and data control, but comes with many expenses and considerations, both initially and moving forward.
Your organization can mitigate some of these upfront costs by instead buying or licensing a pre-existing database as a starting point. However, you will still need to spend time, money, and engineering and data resources to correct any inaccuracies, duplications, and omissions. And you will still need to maintain the database over time to ensure the data stays fresh and precise.
There’s a happy medium in this dilemma, though: buy or license a database where the provider is committed to doing all the setup and maintenance work for you. That’s what we do here at SafeGraph. To show you why we think we’re the best option, we’re going to explore the pros and cons of each choice in the build vs. buy decision in the sections that follow.
Before getting into the specifics of each option, we’ll first talk a bit about why build vs. buy is such an important decision to get right in terms of acquiring a POI database.
Whether your organization needs POI data to power a consumer-facing map application, analyze a trade area, create a market forecast model, or fulfill some other use case, one thing is constant: the quality of your outputs is only as good as the quality of your inputs.
Let’s take the consumer-facing map application as an example. If a consumer looks for a place using your company’s map, but your POI data hasn’t been updated since last quarter, that place may have closed down or been turned into a different place entirely.
The pace at which POIs open, close, move, or change is faster than you might think. That’s why it’s very difficult to maintain a highly accurate, large-scale POI database. So for companies needing to do so, there are three general options: build a database from scratch, buy or license a database and maintain it in-house, or buy or license a database from a provider that also maintains it for you.
The decision to build or buy a database of POI information requires your business to consider a number of factors. The following sections outline many of these factors for each of the three main choices, so you’ll know what’s in store before you make your final selection.
Choosing to build your own pipeline of POI information has a few core benefits. First, your organization can customize the system to fit its exact use cases. Second, your company doesn’t have to pay or rely on an outside company to continually manage and update the database. And finally, on a related note, being able to own the data (or at least get relaxed licensing terms on it) gives your business greater freedom in terms of what it can use the data for.
However, your organization needs to make a lot of upfront commitments and decisions if it opts to build rather than buy POI data. Here are some aspects it will need to take into account.
To start, your business needs to decide what kinds of use cases it will likely use POI data for. Based on these, it can determine what capabilities it wants the POI database to have. We mentioned that this customization is an advantage of choosing to build rather than buy, but it’s not without its tradeoffs.
The database will need one or more software engineers dedicated to setting it up and managing it, and they aren’t cheap to hire. You’ll likely need upwards of $150,000 for each. Then there are the computing costs of sourcing, processing, cleaning, and maintaining the data infrastructure. How much your business needs will depend on the scope of the data you plan to cover (e.g. just a specific country, or a limited number of data attributes). But you should expect costs in the $100,000 to $600,000 range, for starters.
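As a rough illustration of how these figures add up, the sketch below combines the per-engineer and infrastructure numbers quoted above into a starting cost range. It assumes the $150,000 figure is a per-engineer floor and that the infrastructure range applies once; the function name and the team size in the usage note are hypothetical.

```python
# Figures quoted in the text above; both are starting points, not ceilings.
ENGINEER_COST_FLOOR = 150_000        # "upwards of $150,000 for each" engineer
INFRA_COST_RANGE = (100_000, 600_000)  # computing and data infrastructure


def build_cost_range(engineers):
    """Return a (low, high) starting cost estimate for an in-house build.

    Assumption: engineer cost is counted at its floor, so the real
    figure can only be higher than what this returns.
    """
    if engineers < 1:
        raise ValueError("at least one engineer is needed")
    staffing = engineers * ENGINEER_COST_FLOOR
    low, high = INFRA_COST_RANGE
    return staffing + low, staffing + high
```

For a hypothetical team of two engineers, `build_cost_range(2)` gives a range of $400,000 to $900,000 before any data acquisition or crawling costs are added.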
After setting the database up, your company will need to start populating it with data. One inexpensive option is to copy from open datasets, such as OpenStreetMap. But there are issues with relying solely on free POI data: it’s often incomplete, not updated regularly, difficult to download, or lacking documentation to explain how it works. Also be sure to read the licensing terms of any dataset your organization downloads. The provider may not allow the data to be used commercially, or for specific use cases.
Another method is to build a web crawling tool to scrape publicly available POI information. This is a faster way of doing things, but is still subject to many of the problems with using free data listed above. It’s also an added cost for your business, usually in a starting range of $100,000 to $600,000 depending on the breadth and depth of data you want to collect (e.g. a territory or country vs. a full continent).
Once you have some entry-level POI data, you’ll need to branch out into acquiring specialized datasets for your company’s specific needs. However, this involves much more work than simply fitting datasets together like puzzle pieces. There are all sorts of challenges with processes like matching entries between datasets that refer to the same place (and, by extension, verifying that place actually exists), as well as cleaning the data to eliminate duplicate entries within the same dataset. Varying attributes and data formats – especially for addresses – make these processes all the more difficult. That’s why they often require advanced machine learning techniques to do well.
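To give a sense of what matching entries across datasets involves, here is a minimal sketch of a rule-based matcher. It is not how any particular provider does it, and it is far simpler than the machine learning approaches mentioned above. The record fields (`name`, `address`, `lat`, `lon`), the abbreviation table, and both thresholds are assumptions made for illustration.

```python
import math
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real pipeline would need a far larger one.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "boulevard": "blvd",
                 "road": "rd", "suite": "ste"}


def normalize(text):
    """Lowercase, drop punctuation, and shorten common address words."""
    text = text.lower().replace("'", "").replace("\u2019", "")
    words = re.sub(r"[^a-z0-9 ]", " ", text).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    radius = 6_371_000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def same_place(a, b, max_distance_m=75.0, min_similarity=0.8):
    """Heuristic: two records match if they are close together and read alike."""
    if haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]) > max_distance_m:
        return False
    name_sim = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    addr_sim = SequenceMatcher(None, normalize(a["address"]), normalize(b["address"])).ratio()
    return (name_sim + addr_sim) / 2 >= min_similarity
```

Even this toy version has to make judgment calls (how far apart is too far, how different is too different), and each threshold trades missed matches against false merges. That is part of why the problem tends to outgrow hand-written rules.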
To illustrate, once a place is verified to exist by checking it across multiple datasets, it has to be geocoded, classified, and assigned attribute data. This can be tricky and time-consuming to do manually. For instance, which attributes should be prioritized? Are a place’s coordinates at its exact address or at the centroid of its building? Using a geocoding program or advanced machine learning algorithm can save your organization time here too, but you’ll have to weigh that against how much it will impact your bottom line.
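The centroid question is less trivial than it sounds. The sketch below computes the area-weighted centroid of a building footprint with the standard shoelace formula. It assumes the footprint is a simple (non-self-intersecting) polygon whose vertices are already in a flat, projected coordinate system; the function name is hypothetical.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as (x, y) vertices.

    Assumes planar coordinates; raw latitude/longitude would need to be
    projected first for anything larger than a single building.
    """
    area2 = 0.0  # twice the signed area of the polygon
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (3 * area2), cy / (3 * area2)
```

For an L-shaped footprint such as `[(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]`, the centroid lands at roughly (1.36, 1.36), which is outside the building itself. A coordinate chosen that way could place a pin in a parking lot rather than at the front door, which is one reason the address-versus-centroid decision matters.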
In any event, not cleaning and merging data properly can cost your company far beyond what it paid to purchase or license the data. If using it internally, incorrect data can throw off your organization’s analysis and modeling. And if using it in a consumer-facing capacity, customers may quickly leave your product and not recommend it to others, citing its unreliability.
One other cost to keep in mind when your company is mulling over build vs. buy decision criteria for a POI database is the resources that would otherwise be spent in other areas of your business. If your organization has geospatial data and applications at the core of its operations (or is planning to make this shift), then it might make sense to build. If not, then you need to ask a couple of key questions.
The first one is how long it will take before your business starts seeing a return on the investment of building a database. Building takes longer than the other options (possibly 6 months, if not more), so your company needs to be able to cover the expenses until the project gets off the ground. Also remember that your company will be responsible for maintaining and updating the database going forward. So be sure to factor that into the time and money spent as well.
The other, related question is what your organization would otherwise be doing with what it spends on building. That includes getting the database up and running, as well as maintaining it over time. Again, this comes back to what your company’s key value proposition is. If that isn’t geospatial data and applications, then it’s usually a better idea to start with a premade database instead of trying to reinvent the wheel. This frees up your organization to focus its time, money, and engineering resources on doing what it does best.
If your company is considering buying rather than building, it will need to find a dataset to download or license as a starting point. This is still a difficult and time-consuming process, not only to find potential providers, but also to evaluate their data quality. What geographies does the data cover? When was the data last updated, and how often is (or was) it reviewed? What contextual information does the data have, and how many records actually have this information filled in? These are all questions your company needs to ask.
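Two of these questions (how many records have an attribute filled in, and how recently records were updated) are easy to check on a sample before committing to a provider. The sketch below is one way to do that, assuming each record is a dictionary with a `last_updated` date; the field names and the 90-day staleness threshold are assumptions, not part of any provider’s schema.

```python
from datetime import date


def fill_rates(records, attributes):
    """Share of records with a non-empty value for each attribute."""
    total = len(records)
    rates = {}
    for attr in attributes:
        filled = sum(1 for r in records if r.get(attr) not in (None, ""))
        rates[attr] = filled / total if total else 0.0
    return rates


def stale_share(records, as_of, max_age_days=90):
    """Share of records whose last update is older than max_age_days."""
    if not records:
        return 0.0
    stale = sum(1 for r in records
                if (as_of - r["last_updated"]).days > max_age_days)
    return stale / len(records)
```

Running both functions over a provider’s sample file gives a quick, comparable summary: a low fill rate on an attribute you depend on, or a large stale share, suggests more cleanup work after purchase.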
A potential option, if your company’s a larger enterprise, is to use a database built by a competitor you acquire or merge with. This, of course, requires many other considerations, as there are a lot of costs, negotiations, and management involved with acquisitions and mergers. Another option is to use a free open-source database, such as OpenStreetMap. Again, though, there are numerous pitfalls with using geospatial data that’s contributed and updated primarily by volunteers, just to save on upfront costs.
In any case, there will still be work to do in terms of setting up the database’s infrastructure. That includes not only processing and storing the data, but also doing any necessary cleanup work. Depending on which provider you license from, the data may be missing entries, contain entries with incorrect attributes, or include duplicate entries that refer to the same place. The longer it has been since the data was last updated (and the less frequently it was updated), the more likely it is your organization will encounter these errors and omissions. Again, not dealing with these issues in a timely manner can cost your business down the road, in terms of inaccurate analysis/modeling or loss of consumer trust.
Your organization will require the help of data scientists and software engineers to do all of this work, both initially and going forward. As we discussed for the build option, this represents not only a monetary cost on top of what your organization pays to license the data, but also an opportunity cost in terms of what your business would otherwise be devoting time and engineering/data science resources to.
If your organization is set on buying rather than building but doesn’t want to deal with all the extra data scrubbing and maintenance work, there is a third option. It can buy or license a POI database from a third party that has already taken care of all the merging, cleaning, verifying, and updating tasks: namely, us here at SafeGraph.
This option is much less expensive – in terms of time, money, and engineering/data resources – than building a POI database from scratch. Additionally, the effort SafeGraph puts into making our Places dataset as complete, accurate, and fresh as possible ultimately saves your company resources over alternatives that may have lower upfront costs. Because our Places data is so precise and well-documented, your business can use it right out of the box. There’s no need to take up your engineers’ or data scientists’ time verifying whether or not a place exists, plotting exactly where it is on a map, deciding which classifications and attributes apply to it, or erasing/merging any duplicate entries.
All of this doesn’t just free up your organization to invest its time and human resources into the things it really wants to be doing. It also saves your company money because it eliminates the costs of software and talent acquisition associated with setting up and maintaining an in-house POI database. Meanwhile, your business can be confident that its internal analyses and models – and its consumer-facing applications – are powered by accurate data that allows both stakeholders and customers to make well-advised decisions.
As an example, the commercial real estate consultants at Avison Young previously used POI datasets that were often messy and limited to information about big brands. This made it difficult for them to deliver answers to their clients’ site selection questions in a timely manner, as analysts were spending up to 40% of a project just cleaning data and adding in missing information from open sources. Switching to the SafeGraph Places database has given them a much faster way to get a comprehensive overview of the business mixes in trade areas. This has allowed them to get insights to their clients sooner, so these businesses can act upon the advice before they lose the opportunity.
It’s important to consider the cost of building vs. buying a database of POI information, should you decide your organization needs one. And remember this cost is not just monetary; it also includes time, human resources, and opportunity.
That’s why you should do as much research as you can before you make your final decision. For example, check out a sample of SafeGraph’s POI data from our Places dataset. Or if you want to schedule a demo to get an in-depth look at how Places can work for your organization, get in touch with our sales team.