This blog post is also available in German.
Time is a valuable asset, and it is easily consumed by the analysis and preparation of data. That alone may be good reason to come to grips with BI software.
This article takes an in-depth look at the topic of “data products” for teams. The article Why Your Team Needs Data Products dealt with data products more generally.
Just what are BI tools anyway?
BI stands for Business Intelligence.
It is quite a big topic, actually. Providers such as Microsoft, Amazon, IBM and others have much to offer in this space. However, I would like to focus on a specific kind of BI tool.
I will be considering software that has the following properties:
- Importing of data from a source
- Transformation of data
- Generation of analysis models (views of the data)
- Visualization of the data
Candidates that meet these requirements:
- Microsoft Power BI
- Google Looker
- Tableau
Tools for data teams and data analysts
“But those are all tools for data teams and data analysts. They don’t have anything to do with us as a development team. We only supply the data.”
That is a typical reaction. And if it applies in your case, please don’t stop reading just yet.
There are good reasons to engage with BI tools as a development team.
Examples:
- Visualization of the data comes “for free”.
- The option of sharing data also comes “for free”.
- Depending on the context, stakeholders may be able to work with the data themselves.
- Data from various sources can be linked up together.
- Depending on the solution, data can be “quickly” integrated and transformed.
Once again, we find that “one size fits all” does not apply.
It goes without saying that solutions can be purpose-built for the analysis of data. Or existing tools and frameworks can be combined. It depends on the context whether that is a reasonable decision. And yet my hypothesis is that the use of BI software often represents the faster route.
Preliminary thoughts
A few preliminaries are important to note here. It is worth evaluating whether BI software could be of use in a given situation.
- Is the data available in a relational format?
- Does a tool already exist? How about the required infrastructure?
- If not, are the procurement processes fast enough?
- Which data sources are to be integrated?
- Where is the data saved and what must be considered here?
- How current is the data?
This list is just a starting point, of course.
Example with Google Looker
Let’s examine a quick example with Google Looker. This example comes from a real project, one in which all reports were implemented by a single person who was not trained as a developer. Based on these experiences, my hypothesis is that developers might need 2–3 days of familiarization time. Then the first reports will start rolling in.
Setup
In our specific setup, Google Big Query is used as the “warehouse”. This means that all data products are stored there. Google Looker is already available as infrastructure, and procedures already exist for setting up new projects.
Of course, the starting situation is not always this good. But it is more often than you might think.
A little info about Looker
Looker is fully web-based. Looker works with various warehouses (e.g. Google Big Query). It works with relational data. In my estimation, it makes a good “one-stop-shop” solution for mid-sized companies. By this I mean that it may also be possible to carry out certain data transformations in Looker. In larger companies, it can be used for analysis and visualization of data products.
A few positive things from the perspective of a development team:
- Anyone who knows SQL is optimally positioned.
- The application-specific syntax (called LookML) is always used to generate SQL.
- The SQL can be viewed at any time, making debugging easier.
- Support for version management with Git, commits and deployments.
- Definitions are created in text form.
- The web editor validates definitions and also identifies dependencies.
There are also negative aspects, which may be more or less important, depending on the context.
- Native SQL syntax is not sufficiently checked.
- The web editor does not meet the standards of a modern code editor, not to speak of an IDE.
- Error information in the web editor could be better.
Structure of a Looker model
This article is not intended as a how-to or tutorial, but a few basics are required in order to see how fast (or slow) the work will go.
We start by defining a model. Every definition takes place in a file, and the files can be organized into folders. In the simplest case, the model only specifies which definitions are to be taken into account. This is done with the `include` statement.
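A minimal sketch of such a model file (the connection name and folder layout are assumptions, not taken from the project):

```lookml
# product_data.model.lkml -- the file name is illustrative
# The connection name must match a connection configured in Looker.
connection: "bigquery_warehouse"

# Pull in all view definitions from the views folder.
include: "/views/*.view.lkml"
```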
In the next step, the `views` are defined. Roughly speaking, these are the data tables we can work with. A view can simply correspond 1:1 to a table in the warehouse. But it is also possible to write larger SQL queries. For instance, multiple existing tables could be combined in this way. Careful consideration is required here. Is this a job for Looker? Or is this a job for a level prior to Looker?
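To illustrate, a view backed by a SQL query might look roughly like this (all table and column names are made up):

```lookml
view: product_sales {
  # A derived table: the view is backed by a SQL query
  # that combines two existing tables.
  derived_table: {
    sql: SELECT p.code, SUM(s.amount) AS total_amount
         FROM datamarts.products AS p
         JOIN datamarts.sales AS s ON s.product_code = p.code
         GROUP BY p.code ;;
  }

  dimension: code {
    sql: ${TABLE}.code ;;
  }

  dimension: total_amount {
    type: number
    sql: ${TABLE}.total_amount ;;
  }
}
```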
Views can be written completely manually. But Looker can also read existing tables. The result is then a proposed set of dimensions and measures. Dimensions are the properties. They can be displayed and filtered. Measures are generally calculations. The average of X. The total of Y.
It is also possible to store a `label` and a `description` for all data for greater clarity.
The effort involved in getting this far is relatively minimal.
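A sketch of what such a view definition could look like (the table name, the field types, the label and description texts, and the measure are illustrative assumptions):

```lookml
view: product_data {
  sql_table_name: `datamarts.product_data` ;;

  dimension: code {
    type: string
    sql: ${TABLE}.code ;;
  }

  dimension: short_description {
    type: string
    # label and description make the field easier to find and understand.
    label: "Short description"
    description: "Short description of the product"
    sql: ${TABLE}.short_description ;;
  }

  # Illustrative measure: a simple count of products.
  measure: product_count {
    type: count
  }
}
```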
This small excerpt defines a `view`. It is based on a table located in the dataset `datamarts`. Two dimensions are defined:
- the `code`, which defines a product, and
- the `short_description` of the product.
Now we have to make this data available for analysis. This is done with `explores`. A simple example:
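A sketch, using the names that appear later in this article:

```lookml
# Exposes the product_data view under the name product_data_explore.
explore: product_data_explore {
  view_name: product_data
}
```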
Now it is already possible to access the data via the web interface.
This has laid all the groundwork for carrying out the following activities:
- Filtering by dimensions
- Selecting the output values
- Visualizing data (diagrams)
- Exporting filtered data (e.g. as a CSV file)
- Creating custom dimensions and custom measures
- Saving filtered `explores` as `looks`
- Creating dashboards
From this point, other groups of people are able to work with the data. Even people who aren’t trained as developers.
Of course, it is rarely sufficient to just examine a `view`. That’s why it is possible to extend the `explore` via `join`.
In our example, we have a hierarchy of products. So let’s add the next level to our view. To do this, we add a definition. We just need to imagine that other `views` have already been defined.
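A sketch of the corresponding definition (the relationship type is an assumption based on the product hierarchy described above):

```lookml
explore: product_data_explore {
  view_name: product_data

  join: color_products {
    # Assumption: one root product has many color variants.
    relationship: one_to_many
    sql_on: ${product_data.code} = ${color_products.root_product_code} ;;
  }
}
```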
This integrates the view `color_products`. It receives the `root_product_code` as a foreign key. Now both tables are available in our explore `product_data_explore`.
More is always possible
A proper model requires more lines, of course. But the examples show that the syntax is not complicated. Plus, SQL can be employed in many places. If you have to make an adjustment to a dimension, for instance, this can be easily done with SQL. For example, `CAST` can be used to change a data type.
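A sketch with a hypothetical `price` column that is stored as a string in the warehouse:

```lookml
dimension: price {
  type: number
  # Assumption: the column arrives as STRING and is cast for calculations.
  sql: CAST(${TABLE}.price AS NUMERIC) ;;
}
```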
Nevertheless, this relatively small amount of work is enough to conveniently visualize the data.
Naturally, many other possibilities exist as well. It is worth taking a look at the documentation. Or find yourself an online course. There are a lot of options.
In our example, using Looker made sense because
- other teams had already published data in Big Query and
- the infrastructure was already available.
Alternatives
This is not the place for a detailed comparison, but I would like to briefly address two alternatives.
- Microsoft Power BI
- Good, old Excel
We carried out the first steps with Power BI before we found out who could set up our own Looker project for us. In contrast to Looker, the initial work is done in a graphical environment. Unfortunately, there is no proper version management. On the other hand, it’s possible to get started quickly. Unlike Looker, there is also a Windows client, which allows quite a few things to be done locally. Accessing databases is also possible, of course. But if you are interested in scaling up, a local installation is not sufficient. Still, small analyses and prototypes can be produced quickly this way.
When it comes to publishing reports or updating data in the cloud, you have to purchase the corresponding services from Microsoft.
Another tool is (good) old Excel. If the data is already available in the right format (e.g. in Big Query), many analyses can actually be done with Excel. Thanks in particular to the introduction of Power Query to Microsoft Excel, there are a ton of possibilities for analysis there. And Excel also supports visualizations.
Summary
In the end, it all depends on the context. But in many cases, a development team can save time by making use of BI software. And stakeholders can be involved at an earlier stage.
What positive or negative experiences have you had? Has this article stirred up any interesting thoughts? I look forward to your comments, questions and discussions.
Learn more about data mesh: we offer a two-day training.