I recently delivered a training session on data awareness. Strangely enough, data awareness is not common practice within organizations. To my great surprise, many people see the impact of data on their business purely as a matter of insights. That is a valid view, but it is not all there is to say about data. To start, let's look at the two places where data is stored and what that means for your daily work.
This division could be discussed in depth, but I distinguish only two kinds of data sources within organizations:
1. Personal data sources
2. Organizational data sources
So, let's have a look at the personal sources. These are files stored "somewhere": on your personal devices, on corporate servers, or in cloud-based services like SharePoint. They are produced by applications such as Microsoft Office or Google Docs.
The roots of this type of document go back to the 1970s. These documents found their place with the introduction of the personal computer, and in their basic form they have not changed much since. They are still among the most used business tools today. Although most Excel business users are able to think in a more structured (tabular) way, a spreadsheet remains a free-format model.
Organizational data sources are even older. In their original form they were stored on paper cards or tape reels. These days we know them better as tables in a Data Warehouse (DWH). A relatively new form of corporate data source is a set of tables placed inside a data lake. Data in these sources is usually organized in tables, with relationships between the tables. Because these data sources are managed by ICT professionals, they are normally well secured against misuse.
If you use a lot of personal data sources (like Excel) and extract data from corporate data sources, be aware of what can happen to your data. This is especially true if you use newer applications like Power Pivot in Excel and Power BI Desktop. These applications handle large data sets very quickly because they store the data in compressed form. In the picture below, a 134 MB text file with over 2 million rows is compressed into a mere 3.6 MB Excel file.
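To give an idea of why table data compresses so well, here is a minimal Python sketch of two techniques commonly associated with columnar engines such as the one behind Power Pivot: dictionary encoding and run-length encoding. It is a toy illustration under that assumption, not a description of how Excel actually stores its data; the function names and the sample "Country" column are hypothetical.

```python
from itertools import groupby


def dictionary_encode(column):
    """Replace each value with a small integer ID into a dictionary of distinct values."""
    dictionary = {}
    ids = []
    for value in column:
        # Assign the next free ID the first time a value is seen,
        # otherwise reuse the ID that value already has.
        ids.append(dictionary.setdefault(value, len(dictionary)))
    return dictionary, ids


def run_length_encode(ids):
    """Collapse consecutive identical IDs into (id, run_length) pairs."""
    return [(key, len(list(group))) for key, group in groupby(ids)]


# Hypothetical "Country" column: few distinct values, many repeats.
column = ["Netherlands"] * 4 + ["Belgium"] * 3 + ["Germany"] * 2 + ["Netherlands"]
dictionary, ids = dictionary_encode(column)
runs = run_length_encode(ids)
print(dictionary)  # {'Netherlands': 0, 'Belgium': 1, 'Germany': 2}
print(runs)        # [(0, 4), (1, 3), (2, 2), (0, 1)]
```

Ten text values shrink to a three-entry dictionary plus four runs. Over millions of rows with few distinct values per column, the savings become dramatic: for the file in the picture, 134 / 3.6 ≈ 37, so the Excel file is roughly 37 times smaller than the text file.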
Files this size are easily sent to anyone by email. But not everyone realizes that the complete table is included in that Excel file! If the data contains private information, you may not show that information in your visualization, but it can still be retrieved from the file. Because you can load several tables from corporate data sources into one Excel Power Pivot file, you effectively pull part of the corporate data from behind its security boundaries into a personal document of relatively small size. If you are a user who is not aware of the data behind the visualizations, you might be in for a surprise.
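If you want to check whether a workbook carries such a hidden table, keep in mind that an .xlsx file is a ZIP package. The Python sketch below lists the parts stored under xl/model/, which is where Excel 2013 and later typically keep the Power Pivot Data Model (usually as xl/model/item.data). Treat that location as an assumption rather than a guarantee for every Excel version; the function name data_model_parts is hypothetical.

```python
import zipfile

# Assumption: Excel 2013+ stores the Power Pivot Data Model as part(s)
# under this folder inside the .xlsx ZIP package (typically "xl/model/item.data").
MODEL_PREFIX = "xl/model/"


def data_model_parts(xlsx_file):
    """Return (part name, uncompressed size in bytes) for every Data Model part.

    xlsx_file may be a path or a binary file-like object.
    An empty list means no parts were found in the assumed location.
    """
    with zipfile.ZipFile(xlsx_file) as package:
        return [
            (info.filename, info.file_size)
            for info in package.infolist()
            if info.filename.startswith(MODEL_PREFIX)
        ]
```

For example, if data_model_parts("report.xlsx") (a hypothetical file name) returns a part of several megabytes, the workbook carries an embedded data model, even when its sheets only show a single chart. The check only tells you that a model is present and how large it is; it does not decode the data inside it.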
Data awareness is all about knowing the value of the data you use and the pitfalls of personal and corporate data sources. To end this blog: we all make mistakes, but some of them are rather costly.