Introduction to Microsoft Fabric: using a Notebook to load and transform data into a Lakehouse
What's up guys?!!!
Here is a post on Microsoft Fabric: a unified platform for data engineering, data warehousing, and business intelligence. It is essentially a one-stop, end-to-end analytics and data platform designed for enterprises that need a single solution covering data movement, ingestion, processing, transformation, real-time event routing, and report building. This blog will give you an idea of how to ingest and transform data from an external source (e.g. a website, an API, files, or even an Azure storage account) and load it into a Lakehouse.
Ok: let's begin by creating a free account for Microsoft Fabric. Start by signing up for a trial instance of D365: https://learn.microsoft.com/en-us/dynamics365/project-operations/environment/admin-trials
This will create a D365 CE/F&O instance, along with a number of utilities like:
a. Azure portal login: although it will be an entirely zero-credit instance, you can still create Entra IDs from here. This can help you connect to the F&O instance (if need be).
b. You can connect to Power Platform utilities: creating Power Apps, flows, and Power BI reports.
c. Free access to Microsoft Fabric, by visiting https://app.fabric.microsoft.com/ using the ID created from the above. This will create the necessary capacities for Microsoft Fabric, with enough provisions to let you do plenty of R&D for 60 days ✌✌✌
Once you log in, your landing page should look like this:
Look at the URL in the address bar. Note that I have purposely kept experience=data-engineering. You can change this to the Power BI experience if you wish to explore the Power BI capabilities of MS Fabric.
Once done, you can create a new Workspace as follows:
Next, let us create a notebook for your Lakehouse. Click on New and select Notebook. As usual, give a proper name to your Notebook >> Click Create. This will automatically create a Notebook under the Lakehouse created in the above step.
All set: let us now give you some background on what we will be doing. We need a dataset to demo how to bring data from an external source and load it as a file into the Lakehouse. For our demo, let us take it from Kaggle: a gigantic storehouse of data for data engineering. You can create an account here, and yes: it's free.
For our testing purposes, I will be using the Kaggle gym members exercise dataset, which can be found here:
https://www.kaggle.com/datasets/valakhorasani/gym-members-exercise-dataset
Click on your name icon >>
Under API, click on Create New Token:
This will create a new API token for your username and download it to your local computer as a kaggle.json file.
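With the token in hand, you can pull the dataset straight from a notebook cell. Below is a minimal sketch of what such a cell could look like, assuming you have installed the kaggle package in the session first (e.g. run %pip install kaggle in its own cell), that you paste in the username and key from your kaggle.json as environment variables, and that a default Lakehouse is attached to the notebook (which exposes its Files area at /lakehouse/default/Files/). The placeholder credentials and paths are assumptions to adapt to your own setup:

```python
import os

# Credentials come from the kaggle.json you downloaded above.
# The placeholders below are assumptions -- replace them with your own values.
os.environ["KAGGLE_USERNAME"] = "<your-kaggle-username>"
os.environ["KAGGLE_KEY"] = "<your-kaggle-key>"

from kaggle.api.kaggle_api_extended import KaggleApi

# Authenticate against Kaggle using the environment variables set above
api = KaggleApi()
api.authenticate()

# Download and unzip the gym members dataset directly into the Lakehouse Files area.
# /lakehouse/default/Files/ assumes a default Lakehouse is attached to this notebook.
api.dataset_download_files(
    "valakhorasani/gym-members-exercise-dataset",
    path="/lakehouse/default/Files/",
    unzip=True,
)
```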
Click on the highlighted Run button to start your Spark session and execute the command you have entered.
And just under Files (once you do a refresh), you will see the gym members exercise file has been uploaded too:
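If you prefer to verify this from code rather than the UI, a quick directory listing does the job. This small sketch assumes the same default Lakehouse mount used above:

```python
import os

# Quick programmatic check of the Lakehouse Files area (assumes the default Lakehouse mount)
for name in os.listdir("/lakehouse/default/Files/"):
    print(name)
```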
You can now inspect the content of the file by loading it into a DataFrame:
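Here is a minimal sketch of such a cell. The CSV file name below is an assumption (check the exact name under Files after the download); the spark session and display() helper are provided out of the box in Fabric notebooks:

```python
# Read the downloaded CSV from the Lakehouse Files area into a Spark DataFrame.
# The file name is an assumption -- use whatever name appears under Files in your Lakehouse.
df = (
    spark.read.format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("Files/gym_members_exercise_tracking.csv")
)

# Inspect the data
display(df)
df.printSchema()

# Optionally, persist it as a Delta table in the Lakehouse for downstream transformations
# df.write.format("delta").mode("overwrite").saveAsTable("gym_members_exercise")
```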