Terminology: High-Level and Low-Level

High-level and low-level are technical terms used to classify, describe and point to specific goals of a systematic operation. They are applied in a wide range of contexts, from computer science to business administration.

High-level describes operations that are more abstract in nature, where the overall goals and systemic features are concerned with the wider, macro system as a whole.

Low-level describes more specific individual components of a systematic operation, focusing on the details of rudimentary micro functions rather than macro, complex processes. Low-level classification is typically more concerned with individual components within the system and how they operate.

In computer science, software is typically divided into two types: high-level end-user application software (such as word processors, databases and video games) and low-level systems software (such as operating systems and hardware drivers).
As such, high-level applications typically rely on low-level software to function.
In terms of programming, a high-level programming language is one with a relatively high level of abstraction that manipulates conceptual functions in a structured manner.
A low-level programming language, such as assembly language, works directly with rudimentary microprocessor commands.
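The gap between the two levels can be felt even inside one high-level language. As a rough illustration (not tied to any particular system), here is the same computation expressed at a high level, and then with the explicit accumulator and index bookkeeping that lower-level code must manage itself:

```python
# High-level: the intent ("sum these numbers") is a single expression.
values = [3, 1, 4, 1, 5]
total_high = sum(values)

# Lower-level in spirit: explicit accumulator and index management,
# closer to the individual steps a machine actually performs.
total_low = 0
i = 0
while i < len(values):
    total_low += values[i]
    i += 1

assert total_high == total_low == 14
```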



SQL Interview Questions

-- Assumes the target table exists: CREATE TABLE TEST_NUMBER (IncrNum INT)
DECLARE @first INT, @step INT, @last INT
SET @first = 1
SET @step = 1
SET @last = 1000

-- Populate TEST_NUMBER with the numbers 1 to 1000
WHILE (@first <= @last)
BEGIN
    INSERT INTO TEST_NUMBER VALUES (@first)
    SET @first += @step
END

-- Select the numbers from 1 to 1000
SELECT TOP (1000) [IncrNum]
FROM TEST_NUMBER
WHERE [IncrNum] <= 1000

-- Select only the odd numbers between 1 and 1000
SELECT [IncrNum]
FROM TEST_NUMBER
WHERE [IncrNum] <= 1000
AND ([IncrNum] % 2 <> 0)
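For practising without a SQL Server instance, the same exercise can be sketched in Python with the standard-library sqlite3 module (the table name TEST_NUMBER is carried over from the T-SQL above; everything else here is illustrative):

```python
import sqlite3

# In-memory database standing in for the SQL Server table above
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TEST_NUMBER (IncrNum INTEGER)")

# Populate 1..1000 (a set-based insert instead of a WHILE loop)
conn.executemany("INSERT INTO TEST_NUMBER VALUES (?)",
                 [(n,) for n in range(1, 1001)])

# Odd numbers only, mirroring the modulo filter
odds = [r[0] for r in conn.execute(
    "SELECT IncrNum FROM TEST_NUMBER WHERE IncrNum % 2 <> 0")]

print(len(odds))   # 500 odd numbers between 1 and 1000
```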

What is Spark?

All the hype around Apache Spark over the last 2 years gives rise to a simple question: What is Spark, and why use it? 

Spark is an open source, scalable, massively parallel, in-memory execution environment for running analytics applications. Think of it as an in-memory layer that sits above multiple data stores, where data can be loaded into memory and analyzed in parallel across a cluster.


Spark consists of a number of components:

  • Spark Core: The foundation of Spark that provides distributed task dispatching, scheduling and basic I/O
  • Spark Streaming: Analysis of real-time streaming data
  • Spark Machine Learning Library (MLlib): A library of prebuilt analytics algorithms that can run in parallel across a Spark cluster on data loaded into memory
  • Spark SQL + DataFrames: Spark SQL enables querying structured data from inside Java-, Python-, R- and Scala-based Spark analytics applications using either SQL or the DataFrames distributed data collection
  • GraphX: A graph analysis engine and set of graph analytics algorithms running on Spark
  • SparkR: The R programming language on Spark for executing custom analytics

Big data processing

Spark distributes data across a cluster and processes that data in parallel. It works in memory, making it much faster at processing data than MapReduce, which shuffles files around on disk.

Spark also includes prebuilt machine-learning and graph-analysis algorithms written specifically to execute in parallel and in memory, and it supports interactive SQL query processing and real-time streaming analytics. You can write analytics applications against it in programming languages such as Java, Python, R and Scala.

These applications execute in parallel on partitioned, in-memory data in Spark. And they make use of prebuilt analytics algorithms in Spark to make predictions; identify patterns in data, such as in market basket analysis; and analyze networks—also known as graphs—to identify previously unknown relationships. You can also connect business intelligence (BI) tools to Spark to query in-memory data using SQL and have the query executed in parallel on in-memory data.

Spark can run on Apache Hadoop clusters, on its own cluster or on cloud-based platforms, and it can access diverse data sources such as data in Hadoop Distributed File System (HDFS) files, Apache Cassandra, Apache HBase or Amazon S3 cloud-based storage.


Scalable analytics applications can be built on Spark to analyze live streaming data or data stored in HDFS, relational databases, cloud-based storage and other NoSQL databases. Data from these sources can be partitioned and distributed across multiple machines and held in memory on each node in a Spark cluster. The distributed, partitioned, in-memory data is referred to as a Resilient Distributed Dataset (RDD).
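The partition-and-aggregate idea behind RDDs can be illustrated on a single machine with plain Python (this is only an analogy; real RDD partitions live in memory on different cluster nodes and are processed in parallel):

```python
# Toy illustration of partitioned processing: split data into chunks
# ("partitions"), process each independently, then combine the results.
data = list(range(1, 1001))
num_partitions = 4

partitions = [data[i::num_partitions] for i in range(num_partitions)]

# Each partition could be summed on a different node in parallel;
# here we simply map over them sequentially.
partial_sums = [sum(p) for p in partitions]
total = sum(partial_sums)

print(total)  # 500500, same answer as summing the unpartitioned data
```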

A key Spark capability offers the opportunity to build in-memory analytics applications that combine different kinds of analytics to analyze data. For example, you can read log data into memory, apply a schema to the data to describe its structure, access it using SQL, analyze it with predictive analytics algorithms and write the predictive results back to disk. The results can be in a columnar file format for use and visualization by interactive query tools.
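As a single-machine sketch of that pipeline, the standard-library sqlite3 module can stand in for Spark SQL (the log records, table and column names are invented for illustration):

```python
import sqlite3

# Raw "log" records, as they might be read from files into memory
logs = [
    ("2016-01-01", "/home", 200),
    ("2016-01-01", "/cart", 500),
    ("2016-01-02", "/home", 200),
]

# Apply a schema to the in-memory data...
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (day TEXT, page TEXT, status INTEGER)")
conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", logs)

# ...then access it using SQL, as Spark SQL does at cluster scale
errors = conn.execute(
    "SELECT COUNT(*) FROM logs WHERE status >= 500").fetchone()[0]
print(errors)  # 1
```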



IBM made a strategic commitment to using Spark in 2015. 
http://www.ibmbigdatahub.com/blog/what-spark by Mike Ferguson

Will Power BI Eliminate the Need for Data Warehouses?

Recently I’ve spoken to several peers and discussed with my mentor: why do we still need to build a data warehouse if we have Power BI? For some SMEs (especially in NZ) or departments, might Power BI well eliminate the need for a data warehouse?

As a fantastic self-service business intelligence tool, one of Power BI’s key strengths is the richness of its query and data modelling capabilities. These allow users to easily combine data from disparate data sources, add complex calculations, and generally model the data so that interactive reports and dashboards can be created in a fraction of the time.

The primary reason that data warehouses are created is to combine data from disparate sources into one accessible source for reporting. If that is the case and companies only need interactive reporting and dashboards, then Power BI will most likely eliminate the need for a data warehouse.

However, data warehouses exist to serve several purposes. Amongst others, here are some of the common reasons that data warehouses are implemented:

  1. Combining data from one or more disparate source systems (aggregating and standardising the data).
  2. Optimising data for reporting workloads (denormalisation).
  3. Reducing load on operational systems.
  4. Tracking historical changes in the data, allowing for point-in-time reporting, and handling large-scale datasets.
  5. Providing a single point of truth for all corporate reporting.

Power BI can, to some extent, cater for the first 3 points above. The 4th and 5th, however, cannot be achieved with Power BI so far. Historical change tracking is something we are unlikely to see in Power BI for a few years. In addition, Power BI cannot currently serve as the single point of truth, because not all reporting systems can use Power BI as the source for their reporting. Although Power BI is a great reporting tool, it does not yet cater for every scenario. There is still a need for other reporting tools, such as SQL Server Reporting Services, to cater for different business requirements.

Power BI has opened up a world of possibilities for organisations of any size. The small end of town now has the ability to do BI for a fraction of the cost they would have incurred until recently. However, for large-scale enterprise requirements, a well-designed data warehouse is still required.

Read more: https://www.linkedin.com/pulse/data-warehouses-do-we-still-need-them-craig-bryden/

Being a Professional is a State of Mind: Worker VS Professional and Why it Matters

Being a professional used to mean that you had a very specialised set of skills or qualifications, or that you had to meet particular ethical standards (such as those of doctors or lawyers, for example). More recently, it’s been used to mean upwardly mobile, white-collar office workers. I’d argue though, that being a professional means something broader than this – and much more empowering. It’s less about how you earn a living and more about the way you think and feel about earning a living.

When I think about our professional audience, I don’t just think in terms of people who have particular jobs or particular skills; I think of people who have a particular approach to growing and applying their skills in order to fulfill their aspirations; who see their working life as a journey where they are constantly learning and developing. This distinction matters because it means the professional mindset is often very different to the state of mind associated with just ‘being at work’.

Here are 3 differences that help to explain why, when it comes to influencing people’s choices and behaviour, it’s the professional mindset that you want to engage:

  • The professional mindset is open to new possibilities

When we look at the types of content that people engage with on LinkedIn we find them investing their time in becoming better at what they do. It’s not just a question of doing the job they have; there’s a strong appetite for continuous personal development that shines through in their interest in sharing ideas, acquiring knowledge and absorbing expert opinions. Professionals may find their next job on LinkedIn, but they spend even more time focused on becoming better at the job they’re already doing. This type of curious, self-improving mindset is hungry for new ideas – and derives significant personal benefit and satisfaction from engaging with value-adding content.

  • The professional mindset is very rarely idle

LinkedIn serves more than 9 billion content impressions every week, with 57% of those impressions involving people accessing relevant content on their mobile, often outside of office hours. It’s testament to the fact that professionals’ interest in their work goes beyond earning an income and beyond the 9 to 5 – they are interested, stimulated and highly engaged in what they do.

  • The professional mindset isn’t a passenger

In part, this is because the professional mindset is fundamentally aspirational. It wants to make a difference; it’s focused on achieving things and it’s emotionally invested in the work that it does. This comes out in LinkedIn’s regular Talent Drivers survey, in which we ask our members what they value most in a job opportunity. In the Talent Drivers survey for 2015, the most popular answers were working for a company with a clear vision, and being able to make an impact in their new role. Professionals don’t just want to be taken along for the ride; their work is an expression of who they are – and that’s why appealing to them on an emotional level is such a productive strategy for B2B marketers. Their hearts and brains are both engaged in what they do.

I personally believe that, although not everyone in a job inhabits a professional mindset, everyone who works has the potential to enter into this state of mind. The professional mindset grows with increasing confidence in your skills, a feeling of control over where your career is going, and a sense of how your working life intersects with your interests, values and the type of person you want to be.

The organisation that someone works for, and the way that it invests in its people, obviously have a huge influence on how professional they feel. Sharing inspiring content that appeals to your prospects as human beings with aspirations, not just as one-dimensional personas, is a great place to start. Empower an audience to embrace their professional mindset and you’ll be rewarded with higher engagement levels and an enduring influence over the choices they make.



Power BI Use Case: Energy Industry

This report focuses on demand forecasting within the energy sector. The views provide region-level energy usage status and forecasts of future usage for optimizing operations.

Case Study:

Storing energy is not cost-effective, so utilities and power generators need to forecast future power consumption so that they can efficiently balance supply with demand. During peak hours, short supply can result in power outages; conversely, too much supply can result in wasted resources. Advanced demand forecasting techniques detail hourly demand and peak hours for a particular day, allowing an energy provider to optimize the power generation process.

The ‘Energy Solution Forecast’ page shows the demand forecast results from an Azure Machine Learning model, along with different error metrics so users can assess the quality of the model. Temperature and its forecasts are used as a feature in the machine learning model.
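As a toy sketch of how temperature can serve as a feature (a simple ordinary-least-squares fit on made-up numbers, not the actual Azure Machine Learning model):

```python
# Toy ordinary-least-squares fit of demand on temperature, illustrating
# how temperature serves as a feature. The data are synthetic and exact,
# so the fit recovers the generating line perfectly.
temps = [10.0, 15.0, 20.0, 25.0, 30.0]    # observed temperatures
demand = [2 * t + 10 for t in temps]       # synthetic demand observations

n = len(temps)
mean_t = sum(temps) / n
mean_d = sum(demand) / n
slope = sum((t - mean_t) * (d - mean_d) for t, d in zip(temps, demand)) \
        / sum((t - mean_t) ** 2 for t in temps)
intercept = mean_d - slope * mean_t

forecast = intercept + slope * 22.0        # forecast demand at 22 degrees
print(slope, intercept, forecast)  # 2.0 10.0 54.0
```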


The ‘Energy Solution Status Summary’ page shows the overall status of energy demand in each region. Users can select a single region by clicking the filter on the left to investigate that region’s status.


Publish Report to Web from Power BI

With Power BI Publish to web, you can easily embed interactive Power BI visualizations online, such as in blog posts, websites, emails or social media, on any device.

You can also easily edit, update, refresh or un-share your published visuals.

How to use Publish to Web

Publish to web is available on reports in your personal or group workspaces that you can edit. You cannot use Publish to web with reports that were shared with you, or reports that rely on row level security to secure the data.

You can watch how this feature works in the following short video. Then, follow the steps below to try it yourself.

The following steps describe how to use Publish to web.

  1. On a report in your workspace that you can edit, select File > Publish to web.

  2. Review the content in the dialog, and select Create embed code as shown in the following dialog.

  3. Review the warning, shown in the following dialog, and confirm that the data is okay to embed in a public website. If so, select Publish.

  4. A dialog appears that provides a link that can be sent in email, embedded in code (such as an iFrame), or that you can paste directly into your web page or blog.

  5. If you’ve previously created an embed code for the report, the embed code quickly appears. You can only create one embed code for each report.

Tips and Tricks for View modes

When you embed content within a blog post, you typically need to fit it within a specific screen size. You can adjust the height and width in the iFrame tag as needed, but you may also need to ensure your report fits within the given area of the iFrame, so you should also set an appropriate View Mode when editing the report.

The following describes each View Mode and how it will appear when embedded.

Fit to page respects the page height and width of your report. If you set your page to ‘Dynamic’ ratios like 16:9 or 4:3, your content will scale to fit within the iFrame you provided. When embedded in an iFrame, using Fit to page can result in letterboxing, where a gray background is shown in areas of the iFrame after the content has scaled to fit within it. To minimize letterboxing, set your iFrame height/width appropriately.

Actual size ensures the report preserves its size as set on the report page. This can result in scrollbars being present in your iFrame. Set the iFrame height and width to avoid the scrollbars.

Fit to width ensures the content fits within the horizontal area of your iFrame. A border will still be shown, but the content will scale to use all the horizontal space available.

Tips and tricks for iFrame height and width

The embed code you receive after you Publish to web will look like the following:
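An illustrative embed code might look like this (the URL token and dimensions are placeholders, not real values):

```html
<iframe width="800" height="600"
        src="https://app.powerbi.com/view?r=YOUR_EMBED_TOKEN"
        frameborder="0" allowFullScreen="true"></iframe>
```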

You can edit the width and height manually to ensure it is precisely how you want it to fit onto the page into which you’re embedding it.

To achieve a more perfect fit, you can try adding 56 pixels to the height dimension of the iFrame. This accommodates the current size of the bottom bar. If your report page uses the Dynamic size, the table below provides some sizes you can use to achieve a fit without letterboxing.

Ratio Size Dimension (Width x Height)
16:9 Small 640 x 416 px
16:9 Medium 800 x 506 px
16:9 Large 960 x 596 px
4:3 Small 640 x 536 px
4:3 Medium 800 x 656 px
4:3 Large 960 x 776 px
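Each height in the table is just the width scaled by the aspect ratio, plus the 56-pixel bottom bar. A quick check (illustrative helper, not part of Power BI):

```python
# Heights in the table above = width * (ratio height / ratio width) + 56 px
# for the Publish to web bottom bar.
from fractions import Fraction

def embed_height(width, ratio_w, ratio_h, bar=56):
    """iFrame height that avoids letterboxing for a dynamic-ratio page."""
    return int(width * Fraction(ratio_h, ratio_w)) + bar

for width in (640, 800, 960):
    print(width, embed_height(width, 16, 9), embed_height(width, 4, 3))
# 640 416 536
# 800 506 656
# 960 596 776
```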

Managing embed codes

Once you create a Publish to web embed code, you can manage the codes you create from the Settings menu of the Power BI service. Managing embed codes includes the ability to remove the destination visual or report for a code (rendering the embed code unusable), or getting the embed code again.

  1. To manage your Publish to web embed codes, open the Settings gear and select Manage embed codes.

  2. The list of embed codes you’ve created appears, as shown in the following image.

  3. For each Publish to web embed code in the list, you can either retrieve the embed code, or delete the embed code and thus make any links to that report or visual no longer work.

  4. If you select Delete, you’re asked if you’re sure you want to delete the embed code.

Updates to reports, and data refresh

After you create your Publish to web embed code and share it, the report is updated with any changes you make. However, it’s important to know that it can take a while for updates to be visible to your users: updates to a report or visual take approximately one hour to be reflected in Publish to web embed codes.

When you initially use Publish to web to get an embed code, the link is immediately active and can be viewed by anyone who opens it. After that initial action, subsequent updates to the reports or visuals a link points to can take approximately one hour to become visible to your users.

To learn more, see the How it works section later in this article. If you need your updates to be immediately available, you can delete the embed code and create a new one.

Data refresh

Data refreshes are automatically reflected in your embedded report or visual. It can take approximately one hour for refreshed data to be visible from embed codes. You can disable automatic refresh by selecting Do not refresh on the schedule for the dataset used by the report.

Custom visuals

Custom visuals are supported in Publish to web. When you use Publish to web, users with whom you share your published visual do not need to enable custom visuals to view the report.


Publish to web is supported for the vast majority of data sources and reports in the Power BI service. However, the following are not currently supported or available with Publish to web:

  1. Reports using row level security.
  2. Reports using any Live Connection data source, including Analysis Services Tabular hosted on-premises, Analysis Services Multidimensional, and Azure Analysis Services.
  3. Reports shared to you directly or through an organizational content pack.
  4. Reports in a group in which you are not an edit member.
  5. “R” visuals.
  6. Exporting data from visuals in a report that has been published to the web.
  7. ArcGIS Maps for Power BI visuals.
  8. Reports containing report-level DAX measures.
  9. Secure, confidential or proprietary information.

Tenant setting

Power BI administrators can enable or disable the publish to web feature. They may also restrict access to specific groups. Your ability to create an embed code changes based on this setting.

Publish to web under the report’s File menu:
  • Enabled for entire organization: enabled for all.
  • Disabled for entire organization: not visible to anyone.
  • Specific security groups: visible only to authorized users or groups.

Manage embed codes under Settings:
  • Enabled for entire organization: enabled for all.
  • Disabled for entire organization: enabled for all.
  • Specific security groups: enabled for all, but the Delete option is available only to authorized users or groups; Get codes is enabled for all.

Embed codes within the admin portal:
  • Enabled for entire organization: status will reflect one of Active, Not supported or Blocked.
  • Disabled for entire organization: status will display Disabled.
  • Specific security groups: status will reflect one of Active, Not supported or Blocked; if a user is not authorized based on the tenant setting, status will display Infringed.

Existing published reports:
  • Enabled for entire organization: all enabled.
  • Disabled for entire organization: all disabled.
  • Specific security groups: reports continue to render for all.

Understanding the embed code status column

When viewing the Manage embed codes page for your Publish to web embed codes, a status column is provided. Embed codes are active by default, but you may encounter any of the states listed below.

Active: The report is available for Internet users to view and interact with.
Blocked: The content of the report violates the Power BI Terms of Service. It has been blocked by Microsoft. Contact support if you believe the content was blocked in error.
Not supported: The report’s data set is using row level security, or another unsupported configuration. See the Limitations section for a complete list.
Infringed: The embed code is outside of the defined tenant policy. This typically occurs when an embed code was created and then the Publish to web tenant setting was changed to exclude the user that owns the embed code. If the tenant setting is disabled, or the user is no longer allowed to create embed codes, existing embed codes will show the status of Infringed.

How to report a concern with Publish to web content

To report a concern related to Publish to web content embedded in a website or blog, use the Flag icon in the bottom bar, shown in the following image. You’ll be asked to send an email to Microsoft explaining the concern. Microsoft will evaluate the content based on the Power BI Terms of Service, and take appropriate action.

To report a concern, select the flag icon in the bottom bar of the Publish to web report you see.

Licensing and Pricing

You need to be a Microsoft Power BI user to use Publish to web. The consumers of your report (the readers, viewers) do not need to be Power BI users.

How it works (technical details)

When you create an embed code using Publish to web, the report is made visible to users on the Internet. It’s publicly available so you can expect viewers to easily share the report through social media in the future. As users view the report, either by opening the direct public URL or viewing it embedded in a web page or blog, Power BI caches the report definition and the results of the queries required to view the report. This approach ensures the report can be viewed by thousands of concurrent users without any impact on performance.

The cache is long-lived, so if you update the report definition (for example, if you change its View mode) or refresh the report data, it can take approximately one hour before changes are reflected in the version of the report viewed by your users. It is therefore recommended that you stage your work ahead of time, and create the Publish to web embed code only when you’re satisfied with the settings.



When you use Publish to web, the report or visual you publish can be viewed by anyone on the Internet. There is no authentication used when viewing these reports. Only use Publish to web with reports and data that anyone on the Internet (unauthenticated members of the public) should be able to see.


A Guide for Career Paths in Business Intelligence (BI)

The Foundational Skills of All BI Positions – Data Manipulation 

The more traditional BI route involves mastering relational concepts coupled with SQL (Structured Query Language), as well as mastering unstructured-data concepts involving Hadoop coupled with Python. This knowledge is used throughout any BI position, whether at junior or senior level.

A great place to start learning the basics of SQL for free is Codecademy. And here’s a great resource to start learning more about Hadoop.

If you’re reading this as a college student who’s trying to decide your major, Management Information Systems, Business Information Systems, Intelligence Studies, Statistics, and anything in Computer Science are excellent majors for a career in Business Intelligence & Analytics.

I personally double majored in Information Systems and Operations and Supply Chain Management. I also took related courses such as Statistics, Accounting, Economics and Operations Research; they were extremely helpful.

Two Different Paths 

There are two main paths to consider for a career in BI, and they are generally thought of as the “back-end” and “front-end.” Furthermore, there are plenty of BI professionals that take a shot at both of these paths over their career as a way to gain more knowledge, broaden their skillset, and generate more opportunities.

For back-end BI development, the foundational skills revolve around 3 primary capabilities.

  1. The ability to source data: This involves being able to collect data in whatever system, stream, location, and format it exists.
  2. The ability to manipulate data: The raw data you’re working with will need to be validated, cleansed, and integrated. And in some instances, business logic will need to be applied to the data. This is the first step in creating value from the data.
  3. The ability to create data structures and storage architecture: In order to make the data useful to an audience of users, it must be organized and structured in a way that makes it intuitive to the user audiences and responsive to requests and queries.

People typically call back-end work the “technical” side of BI. Back-end work is hardly noticed by the end-user. Working on the technical side of BI involves less design of what you’re delivering for your company/client and more hands-on development, programming, and coding of the solution. That’s not to say that if you’re a back-end developer you won’t ever be doing architecture and design work. But for the majority of your time, you will be doing more technical tasks like development, and less design.

The back-end skill set from a general perspective is known as “Extraction, Transformation, and Loading” (ETL). ETL concepts can be easily studied and understood, but hands-on work with data and developing the code to move the data around is what really gains you valuable experience.
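As a minimal sketch of the ETL pattern (the data, field names and target table are invented for illustration; Python standard library only):

```python
import csv, io, sqlite3

# Extract: raw source data, here a CSV string standing in for a source file
raw = "customer,amount\n alice ,100\nBOB,250\n"

# Transform: trim whitespace, standardise case, cast types
rows = [(r["customer"].strip().title(), int(r["amount"]))
        for r in csv.DictReader(io.StringIO(raw))]

# Load: write the cleansed rows into a target table
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (customer TEXT, amount INTEGER)")
db.executemany("INSERT INTO sales VALUES (?, ?)", rows)

total = db.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(rows)    # [('Alice', 100), ('Bob', 250)]
print(total)   # 350
```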

The foundational skill for “front-end” BI workers is the presentation of information, commonly in the role of a reporting analyst. Some common skills companies look for in front-end BI developers are a strong understanding of data visualization best practices, ETL experience, strong analytical and quantitative skills, and strong communication skills.

The goal of front-end work is to place information into a context that allows the consumer of the information (the user) to use it to make smart decisions. Many of these roles use SQL (or a software tool that creates SQL for you) to both query and manipulate that data into a context that takes the form of reports or dashboards for your customer (internal customer or external customer).

People typically call front-end development the “functional side” of BI. It is in this position where your ability to interact and listen to your customers’ needs is extremely valuable. The technical ability to develop the interfaces in order to provide the reporting and analytics they have requested is vital to have as well.

This work generates invaluable business insights and knowledge that would not otherwise be realized. These professionals also routinely provide support to users and ensure the proper configuration and management of the BI solutions they are responsible for.

Popular Job Titles in Business Intelligence

– Business Intelligence Developer/Analyst/Consultant/Specialist

– Data Warehouse Developer/Consultant/Specialist

– Database Applications Developer

– Big Data/Hadoop/ETL Developer

– Reporting/Data Analyst


Why Data Modelling is Important in BI

In my experience, I have mostly worked on reporting systems based on semantic models, aka “self-service BI”, but the principles are the same as in DSS (decision support systems), OLAP (on-line analytical processing), and the many other names we’ve seen over the years.

The basic idea is always the same: enable the user to navigate data without rewriting a new query every time. In order to do that, a description of the data model, with metadata that enriches the simple collection of tables and relationships, allows a generic client tool to offer a customized experience, as if it were designed for a specific business need.
Creating a good data model specific for the business analysis is important because the goal is different compared to a database that runs a transactional application. If you want to record sales, you need a database that is optimized to collect data in a safe and accurate way. If you want to analyze sales, you need a data model that is optimized to query and aggregate data in volume. These two requirements are different in a way that is much larger than a simple performance issue.

A normalized model for a transactional application can be challenging to query. How should missing data or a null value be interpreted? A data model optimized for analysis is simpler in nature, because the data have been transformed so that they are unambiguous.

Power BI shares the same data modeling experience as Power Pivot and Analysis Services Tabular. The easiest way to create a data model is to get the data “as is” from a data source, define relationships, and then start designing reports. However, this approach raises several issues when you have too many tables, or different granularities in tables representing the same entity from different data sources. Power Query and the M language are here to solve these problems. If you are lucky enough to read data from a well-designed data mart that follows star schema principles, you don’t need any transformation. But in all other cases, you should probably massage your tables before creating your reports.

The question is: what is the right data model for the business analysis?
The first simple answer is: the model should be designed to answer business questions, and not to simply represent the data as they come from the data source.
Yes, this answer is not a real answer, because it does not provide any practical guidance.
A better hint is: one entity, one table. This is a “user-friendly” way to describe a star schema, where each set of attributes describing a business entity is collected in a table called “dimension”, and each set of numeric metrics describing events and sharing the same granularity is saved in a table called “fact table”. But “one entity, one table” is simpler to understand.

A product is a table. Category, color, price, manufacturer: these are attributes. There is no added value in creating many relationships to just describe attributes of the same entity. Just collapse all these attributes in the same Product table.
A customer is a table. Country, city, address, number of employees are just attributes of the customer. It is useless to create a table with the countries.
Unless the country is a business entity in your analysis. If you are doing demographic research, chances are that the country is a business entity, and not just an attribute of a customer.
An invoice is a table. Date of the invoice, shipping cost, products sold, quantities… all these attributes should be in a table with a conformed granularity. And when this is not possible, but only then, you start considering several tables at different granularities.
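The “one entity, one table” idea can be sketched as a tiny star schema (table and column names are invented for illustration; sqlite3 stands in for a real warehouse):

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Dimension: one row per product, with its attributes collapsed in
db.execute("""CREATE TABLE DimProduct (
    ProductKey INTEGER PRIMARY KEY, Name TEXT, Category TEXT, Color TEXT)""")

# Fact: one row per sales event, at a single conformed granularity
db.execute("""CREATE TABLE FactSales (
    ProductKey INTEGER, SaleDate TEXT, Quantity INTEGER, Amount REAL)""")

db.executemany("INSERT INTO DimProduct VALUES (?, ?, ?, ?)",
               [(1, "Bike", "Sports", "Red"),
                (2, "Helmet", "Sports", "Blue")])
db.executemany("INSERT INTO FactSales VALUES (?, ?, ?, ?)",
               [(1, "2017-01-01", 2, 500.0),
                (2, "2017-01-01", 1, 50.0),
                (1, "2017-01-02", 1, 250.0)])

# A business question answered by joining the fact to the dimension
sales_by_product = db.execute("""
    SELECT d.Name, SUM(f.Amount)
    FROM FactSales f JOIN DimProduct d ON f.ProductKey = d.ProductKey
    GROUP BY d.Name ORDER BY d.Name""").fetchall()
print(sales_by_product)  # [('Bike', 750.0), ('Helmet', 50.0)]
```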

Design principles for data modelling

There is no right or wrong data model, just good or bad ones. Good data modelling is difficult, and the following design principles can be useful:

–Be faithful to the specification of the requirement

–Use common sense and make assumptions only where the specification fails to explain

–Avoid duplication and other redundant information

–The KISS principle

Customer: A customer may not yet have booked for a safari, or may have booked for and participated in several safaris. A company or an individual person may be a customer – but only individual persons are booked into scheduled safari trip occurrences. The information to be stored about a customer is:

  • For a company customer: name, address (first line address, second line address, city, postcode, country) and contact name, email address and phone number. A contact may change over time
  • For a person customer: first name, last name, address (first line address, second line address, city, postcode, country), email address, phone number and date of birth
  • A customer’s preferences (whether company or person) for time slots during which they wish to travel must be retained in the database. A customer may be able to travel in several time slots during the year. Tane wishes to retain this information so that he can target only those able to participate with publicity for scheduled trips