Let’s say you’re a mid-sized organization with a desire for more integrated and meaningful reports. You have four different software packages that are each tracking some aspect of the constituents you deal with – for instance, your donors, activists, several different types of clients, and volunteers. Each system is working pretty well by itself, but it’s very hard to look across the organization to get a full picture. How likely is it that a volunteer will donate? What is the full value of services you're providing to each client? What kind of constituent is most likely to be a lifelong donor? They’re interesting questions, but almost impossible to determine at the moment.
This is a situation in which many organizations find themselves. What can you do? You could consider using a single constituent relationship management system (CRM) – one software package that would replace all four of your current systems, to allow you a single, integrated view. This can be a compelling option, especially for smaller organizations. But for larger organizations that are happy with their existing software, and making a lot of use of the specialized feature sets, no single software is likely to address everything needed.
Enter data warehousing: a centralized repository of data that automatically pulls information from each of your systems in order to report on it or analyze it in a central place. Each of your existing software packages continues to work in exactly the same way, but you also have the ability to see all of their data together in one place.
This isn’t a low-cost or a low-impact solution – it’s likely to take someone experienced between 80 to 120 hours, or potentially far more, to plan out, set up, and build the data exchange processes for a warehouse. You’ll need internal IT staff or a strong continuing relationship with a consultant to support it. For larger nonprofits, however, it can be an interesting option to help understand how the organization is really performing (perhaps across similar types of programmatic function), identify opportunities to improve service to constituents, and understand data in a deeper and more meaningful way.
Components of Data Warehousing
A data warehouse is much more of a set of processes than an actual piece of software you can buy off the shelf. What will you need in order to create a data warehouse?
Defining Your Reporting Needs
Let's begin with the end in mind. What reports are you trying to create out of your data warehouse? What information should be included? What analysis is necessary? How flexible do reports need to be over time? Your goals in this area affect everything else. Take the time to carefully analyze your needs, and determine what systems you’ll need to pull data from, what fields to extract, and how you'd ultimately like to see and use the data. For example, if you want to identify your super-users from a specific region who may be ideal to help procure bigger donation sources, you may need to cross-reference multiple databases to see which supporters have given large donations and participated in a set number of action campaigns in the past year. Think of how you plan to use this data to enhance your work and plan backward from there.
Extracting Data from Your Existing Software
Once your needs are nailed down, you'll need to define a process to get data out of each of your existing systems, ideally in some sort of automated way. This would mean, for instance, the ability for a programmer to access the data via code with an API, to access the database directly, or to prompt the system to automatically export data into a file at the same time every week. If your software package doesn’t allow you to access the data, there’s unfortunately little recourse, which means that the ability to access data should be a core consideration when choosing any system. As a last resort, you could plan to manually export the data into a defined file format at the same time each week, but this would introduce a large possibility of human error into the process.
Creating a Data Repository
You’ll need somewhere to put the data that can handle all the different types of information you’ll need to store. It should be a structured database with some flexibility to define the fields that you want to track and the relationships between them. As a minimal solution, you might even be able to setup one of your existing systems to take in information from your other systems – perhaps using custom field functionality. Microsoft Access could also serve as a lightweight data repository.
You could also consider using a flexible constituent management tool with strong reporting for your data repository. SugarCRM or Microsoft Dynamics could be interesting options in this realm. Using one of these tools streamlines the reporting process, as you can use the same software package both to store the data and to create your reports.
However, if one of your key goals is to facilitate complex reporting and data mining, you may find that these lightweight solutions quickly slow down in response time if you have any considerable volume of data (tens of thousands of rows of data or more). More robust databases – like MySQL, Microsoft SQL Server, or Oracle – can provide more powerful tools to let you optimize the way your database works so as to provide reports more quickly. Amazon’s new SimpleDB could also provide an interesting virtual data repository, paid for based on the file space and bandwidth you use – but is likely to require considerable technology expertise to implement.
Matching and Mapping Data
With a definition of where the data is coming from, and where it’s going, you’ll need to work through the most complicated step: matching up the data across different systems, and defining the business rules to dictate how it should go into the repository.
Before you can even think about combining data from multiple systems, it’s critical to ensure that the data itself is defined and stored in standard formats. If a member ID is a six digit alphanumeric code in one system, and an eight digit number in another, no automated process is going to be able to match them. Similarly, if the data is messy and undependable (for instance, if the names captured in your volunteer management systems are often entered quickly and spelled incorrectly) trying to combine it will only lead to trouble. Organizations frequently have to spend considerable amounts of time standardizing and cleaning their data before they’re able to view it in a consolidated way.
The next step is to think through the mapping and matching process. Which field stores the member ID in each system? The email address? If the different systems have different email addresses for the same constituent, which should win? Define a “system of record” for each field, to define which is likely to have the best data. You’ll then need to define a set of business rules to specify how data should be transformed (for example, do you need to convert spelled out states to state abbreviations to make the data match?) and what data overrides what other data. These business rules are often very complex and the most time consuming part of a data warehousing project.
Once your business rules are defined, you’ll need to setup processes to actually transform the data, carry out the rules, and move the data from their starting systems into the repository. These data processing programs can be written by hand in a number of different programming languages, or you might be able to find some software tools that can help. This class of tool is typically called ETL software (ETL stands for Extract, Transform, and Load, but the area is better known by the acronym).
It’s worth looking at the package you’re going to use for your data repository to see what ETL functions it provides. For instance, SQL Server and SugarCRM both include some ETL support. There’s also stand-alone tools which can provide powerful (although also complicated) support – for instance, Talend Jitterbit, and Pentaho are all open-source ETL packages.
Creating Reports and Analyzing Data
Finally, consider the repository and reporting solution that’s most likely to meet your needs. It’s convenient to store the data in a package that has its own reporting tools – like one of your existing systems, Access, SugarCRM, or using Microsoft’s Reporting Services or Analysis Services that come bundled with SQL Server – but you’re likely to run into functional limitations with these solutions if you have more complex needs. A specialized reporting tool, like Crystal Reports or JasperReports, can sit on top of your data repository and provided more sophisticated reporting support, but will be more complicated to implement and for your users to learn how to use. Alternatively, if your reporting needs won’t change much over time, you could create reports as web pages that pull data directly from your repository.
There’s also a whole category of powerful and very expensive reporting and analysis tools primarily used by the corporate world. Tools like Business Objects and Cognos provide very flexible and powerful support with relatively user friendly interfaces, but can cost hundreds of thousands of dollars to license and implement.
Is a Data Warehouse for You?
Data warehousing is certainly not a solution for every organization. It’s not likely to be a great option for tiny organizations or those without any IT support structure. It’s a more advanced technique which will require technical skills and a considerable investment – a minimum of eighty hours or so and potentially ranging up to hundreds or even thousands of hours for very complex systems.
Consider whether you've defined your reporting needs carefully enough to really take advantage of a data warehouse. Are you sure you're doing everything you can with your existing systems? Have you thought about the data dashboards that help executives, board, and staff easily see how the organization is performing? If not, it makes sense to start there.
In addition, carefully think through whether your existing systems are working for you before considering a data warehouse. Building a warehousing and reporting infrastructure on top of your existing systems will increase your dependence on them and make it harder to swap out. In fact, many organizations, especially smaller ones, might well find more value in replacing all of their systems with one larger integrated software package that can handle all of their processes and constituents, rather than going down the data warehousing route.
But for an organization of 75 staff members or more, with an IT support structure, a well thought out reporting strategy, and a number of existing systems that that are working well for the organization, a data warehouse can be an interesting option. It’s a robust and often powerful way to support cross-organizational reporting and data analysis.
For More Information
Salvation Army CIO Uses IT to Support Nonprofit A fairly detailed case study of Salvation Army's data tracking infrastructure.
A Web-Based Data Warehouse A look at the data warehouse developed by the Elizabeth Glaser Pediatric AIDS Foundation.
Many thanks to the experts who contributed their time and expertise:
- Paul Hagen, a consultant with Kairos Strategies who provides business, marketing, sustainability, and technology strategy consulting to help organizations, with an emphasis on Constituent Relationship Management.
- Peter Davis, a consultant with Free Flow Data which helps organizations free up and simplify data.
- Eric Leland, a consultant with Five Paths, provides a wide variety of technology services to small to mid-sized nonprofits, including software selection and data management.
*This article is courtesy of Idealware, which provides candid information to help nonprofits choose effective software. For more articles and reviews, go to www.idealware.org.