Prepare data for analysis with Power BI
You'll learn how to
use Power Query to extract data from different data sources, choose a storage
mode, and connectivity type. You'll also learn to profile, clean, and load data
into Power BI before you model your data.
Introduction
Like most of us, you
work for a company where you're required to build Microsoft Power BI reports.
The data resides in several different databases and files. These data
repositories are different from each other, some are in Microsoft SQL Server,
some are in Microsoft Excel, but all the data is related.
In this module’s
scenario, you work for Tailwind Traders. You’ve been tasked by senior
leadership to create a suite of reports that are dependent on data in several
different locations. The database that tracks sales transactions is in SQL
Server, a relational database that contains what items each customer bought and
when. It also tracks which employee made the sale, along with the employee name
and employee ID. However, that database doesn’t contain the employee’s hire
date, their title, or who their manager is. For that information, you need to
access files that Human Resources keeps in Excel. You've been consistently
requesting that they use an SQL database, but they haven't yet had the chance
to implement it.
When an item ships,
the shipment is recorded in the warehousing application, which is new to the
company. The developers chose to store data in Cosmos DB, as a set of JSON
documents.
Tailwind Traders has
an application that helps with financial projections, so that they can predict
what their sales will be in future months and years, based on past trends.
Those projections are stored in Microsoft Azure Analysis Services. Here’s a
view of the many data sources you're asked to combine data from.
Before you can create
reports, you must first extract data from the various data sources. Interacting
with SQL Server is different from Excel, so you should learn the nuances of
both systems. After gaining understanding of the systems, you can use Power Query
to help you clean the data, such as renaming columns, replacing values,
removing errors, and combining query results. Power Query is also available in
Excel. After the data has been cleaned and organized, you're ready to build
reports in Power BI. Finally, you'll publish your combined semantic model and
reports to Power BI service. From there, other people can use your semantic
model and build their own reports or they can use the reports you’ve already
built. Additionally, if someone else built a semantic model you'd like to use,
you can build reports from that too!
This module will focus
on the first step of getting the data from the different data sources and importing it into Power BI by using Power Query.
By the end of this
module, you’ll be able to:
·
Identify and connect
to a data source
·
Get data from a
relational database, such as Microsoft SQL Server
·
Get data from a file,
such as Microsoft Excel
·
Get data from
applications
·
Get data from Azure
Analysis Services
·
Select a storage mode
·
Fix performance issues
·
Resolve data import
errors
Get data
from files
Power BI Desktop allows you to get data from many types of
files. You can find a list of the available options when you use the Get data feature in Power BI
Desktop. The following sections explain how you can import data from an Excel
file that is stored on a local computer.
Scenario
The Human Resources (HR) team at Tailwind Traders has prepared a
flat file that contains some of your organization's employee data, such as
employee name, hire date, position, and manager. They've requested that you
build Power BI reports by using this data, and data that is located in several
other data sources.
Flat file
location
The first step is to determine which file location you want to
use to export and store your data.
Your Excel files might exist in one of the following locations:
·
Local - You
can import data from a local file into Power BI. The file isn't moved into
Power BI, and a link doesn't remain to it. Instead, a new semantic model is
created in Power BI, and data from the Excel file is loaded into it.
Accordingly, changes to the original Excel file aren't reflected in your Power
BI semantic model. You can use local data import for data that doesn't change.
·
OneDrive
for Business - You can pull data from OneDrive for Business into Power BI.
This method is effective in keeping an Excel file and your semantic model,
reports, and dashboards in Power BI synchronized. Power BI connects regularly
to your file on OneDrive. If any changes are found, your semantic model,
reports, and dashboards are automatically updated in Power BI.
·
OneDrive
- Personal - You can use data from files on a personal OneDrive account, and
get many of the same benefits that you would with OneDrive for Business.
However, you'll need to sign in with your personal OneDrive account, and select
the Keep me signed in option.
Check with your system administrator to determine whether this type of
connection is allowed in your organization.
·
SharePoint
- Team Sites - Saving your Power BI Desktop files to SharePoint Team Sites is
similar to saving to OneDrive for Business. The main difference is how you
connect to the file from Power BI. You can specify a URL or connect to the root
folder.
Using a cloud option such as OneDrive or SharePoint Team Sites
is the most effective way to keep your file and your semantic model, reports,
and dashboards in Power BI in-sync. However, if your data doesn't change
regularly, saving files on a local computer is a suitable option.
Connect to data
in a file
In Power BI, on the Home tab,
select Get data. In the
list that displays, select the option that you require, such as Text/CSV or XML. For this example, you'll select Excel.
Tip
The Home tab contains quick access
data source options, such as Excel,
next to the Get data button.
Depending on your selection, you need to find and open your data
source. You might be prompted to sign into a service, such as OneDrive, to
authenticate your request. In this example, you'll open the Employee Data Excel workbook that
is stored on the Desktop (Remember, no files are provided for practice, these
are hypothetical steps).
Select the file
data to import
After the file has connected to Power BI Desktop, the Navigator window opens. This
window shows you the data that is available in your data source (the Excel file
in this example). You can select a table or entity to preview its contents, to
ensure that the correct data is loaded into the Power BI model.
Select the check box(es) of the table(s) that you want to bring
in to Power BI. This selection activates the Load and Transform Data buttons as shown
in the following image.
Now you can select the Load button
to automatically load your data into the Power BI model or select the Transform Data button to launch
the Power Query Editor, where you can review and clean your data before loading
it into the Power BI model.
We often recommend that you transform data, but that process
will be discussed later in this module. For this example, you can select Load.
Change the
source file
You might have to change the location of a source file for a
data source during development, or if a file storage location changes. To keep
your reports up to date, you'll need to update your file connection paths in
Power BI.
Power Query provides many ways for you to accomplish this task,
so that you can make this type of change
when needed.
1.Data
source settings
2.Query
settings
3.Advanced
Editor
Warning
If
you are changing a file path, make sure that you reconnect to the same file
with the same file structure. Any structural changes to a file, such as
deleting or renaming columns in the source file, will break the reporting
model.
For example, try changing the data source file path in the data
source settings. Select Data source settings in
Power Query. In the Data source settings window,
select your file and then select Change
Source. Update the File
path or use the Browse option
to locate your file, select OK, and then select Close.
No comments:
Post a Comment