What is ☆.。.:*Spark*.。.:☆?

All the hype around Apache Spark over the last 2 years gives rise to a simple question: What is Spark, and why use it? 

Spark is an open source, scalable, massively parallel, in-memory execution environment for running analytics applications. Think of it as an in-memory layer that sits above multiple data stores, where data can be loaded into memory and analyzed in parallel across a cluster.


Spark consists of a number of components:

  • Spark Core: The foundation of Spark that provides distributed task dispatching, scheduling and basic I/O
  • Spark Streaming: Analysis of real-time streaming data
  • Spark Machine Learning Library (MLlib): A library of prebuilt analytics algorithms that can run in parallel across a Spark cluster on data loaded into memory
  • Spark SQL + DataFrames: Spark SQL enables querying structured data from inside Java-, Python-, R- and Scala-based Spark analytics applications using either SQL or the DataFrames distributed data collection
  • GraphX: A graph analysis engine and set of graph analytics algorithms running on Spark
  • SparkR: The R programming language on Spark for executing custom analytics

Big data processing

Spark distributes data across a cluster and processes that data in parallel. Because it works in memory, it can process data much faster than MapReduce, which shuffles files around on disk.

Spark also includes prebuilt machine-learning algorithms and graph analysis algorithms that are written specifically to execute in parallel and in memory. It also supports interactive SQL processing of queries and real-time streaming analytics. You can write analytics applications that use these capabilities in programming languages such as Java, Python, R and Scala.

These applications execute in parallel on partitioned, in-memory data in Spark. They use the prebuilt analytics algorithms in Spark to make predictions; to identify patterns in data, such as in market basket analysis; and to analyze networks (also known as graphs) to identify previously unknown relationships. You can also connect business intelligence (BI) tools to Spark to query in-memory data using SQL, with the query executed in parallel.

Spark can run on Apache Hadoop clusters, on its own cluster or on cloud-based platforms, and it can access diverse data sources such as data in Hadoop Distributed File System (HDFS) files, Apache Cassandra, Apache HBase or Amazon S3 cloud-based storage.


Scalable analytics applications can be built on Spark to analyze live streaming data or data stored in HDFS, relational databases, cloud-based storage and other NoSQL databases. Data from these sources can be partitioned and distributed across multiple machines and held in memory on each node in a Spark cluster. The distributed, partitioned, in-memory data is referred to as a Resilient Distributed Dataset (RDD).
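The idea of partitioning data, processing each partition in parallel and combining the partial results can be sketched in plain Python. This is only a single-machine illustration that uses a thread pool in place of cluster nodes; it is not Spark's API, and the helper names (partition, count_words, parallel_word_count) are made up for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_partitions):
    """Split records into num_partitions chunks, round-robin."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(chunk):
    """Work done on one partition: count the words in its lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, num_partitions=4):
    """Count words per partition in parallel, then merge the partial counts."""
    partitions = partition(lines, num_partitions)
    # Threads stand in for cluster nodes here (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(count_words, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


lines = ["spark runs in memory", "spark runs in parallel", "memory is fast"]
result = parallel_word_count(lines, num_partitions=2)
print(result["spark"])  # 2
```

A real RDD adds what this sketch leaves out: the partitions live on different machines, and lost partitions can be recomputed.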

A key Spark capability is that you can build in-memory analytics applications that combine different kinds of analytics to analyze data. For example, you can read log data into memory, apply a schema to the data to describe its structure, access it using SQL, analyze it with predictive analytics algorithms and write the predictive results back to disk. The results can be in a columnar file format for use and visualization by interactive query tools.
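The shape of that pipeline can be shown with a small standard-library sketch. Everything here is a stand-in chosen for illustration: the log layout is invented, an in-memory SQLite database plays the role of Spark SQL, a fixed threshold rule replaces a real predictive algorithm, and CSV text replaces a columnar file format.

```python
import csv
import io
import sqlite3

# Hypothetical log lines: day, time, request path, response time in ms.
RAW_LOG = """\
2015-06-01 10:00 /home 120
2015-06-01 10:01 /search 340
2015-06-01 10:02 /home 100
2015-06-01 10:03 /search 360
"""


def load_logs(raw):
    """Apply a schema (day, time, path, millis) to whitespace-separated lines."""
    rows = []
    for line in raw.strip().splitlines():
        day, time, path, millis = line.split()
        rows.append((day, time, path, int(millis)))
    return rows


def average_latency(rows):
    """Load the rows into an in-memory table and query them with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logs (day TEXT, time TEXT, path TEXT, millis INTEGER)"
    )
    conn.executemany("INSERT INTO logs VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        "SELECT path, AVG(millis) FROM logs GROUP BY path ORDER BY path"
    ).fetchall()
    conn.close()
    return result


def flag_slow(averages, threshold_ms=200):
    """Toy stand-in for a predictive step: flag paths above a threshold."""
    return [(path, avg, avg > threshold_ms) for path, avg in averages]


def to_csv(rows):
    """Serialise the results; CSV text stands in for a columnar file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["path", "avg_millis", "slow"])
    writer.writerows(rows)
    return buffer.getvalue()


averages = average_latency(load_logs(RAW_LOG))
report = to_csv(flag_slow(averages))
print(report)
```

In Spark the same steps would run on partitioned data across a cluster rather than in one process.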



IBM made a strategic commitment to using Spark in 2015. 
Read more: http://www.ibmbigdatahub.com/blog/what-spark by Mike Ferguson

Will Power BI Eliminate the Need for Data Warehouses?

Recently I’ve spoken to several peers, and discussed with my mentor, the question: why do we still need to build a data warehouse if we have Power BI? For some SMEs (especially in NZ) or departments, might Power BI well eliminate the need for a data warehouse?

Power BI is a fantastic self-service business intelligence tool, and one of its key strengths is the richness of its query and data modelling capabilities. These allow users to easily combine data from disparate data sources, add complex calculations, and generally model the data so that interactive reports and dashboards can be created in a fraction of the time it would otherwise take.

The primary reason that data warehouses are created is to combine data from disparate sources into one accessible source for reporting. If that is the case and companies only need interactive reporting and dashboards, then Power BI will most likely eliminate the need for a data warehouse.

However, data warehouses exist to serve several purposes. Amongst others, here are some of the common reasons that data warehouses are implemented:

  1. Combining data from one or more disparate source systems (aggregating and standardising the data).
  2. Optimising data for reporting workloads (denormalisation).
  3. Reducing load on operational systems.
  4. Tracking historical changes in the data, allowing for point-in-time reporting, and handling large-scale datasets.
  5. Providing a single point of truth to use for all corporate reporting.

Power BI can, to some extent, cater for the first 3 points above. The 4th and 5th, however, cannot be achieved with Power BI so far. Historical change tracking is something that we will not see in Power BI for a few years yet. In addition, Power BI cannot currently serve as the single point of truth, because not all reporting systems can use Power BI as the source for their reporting. Although Power BI is a great reporting tool, it does not cater for every scenario yet. There is still a need for other reporting tools, such as SQL Server Reporting Services, to cater for different business requirements.
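To make the 4th point concrete, here is a minimal sketch of how a warehouse table might keep history so that a report can ask what a value was on a given date. It assumes one common approach, versioned rows with a validity period; the table layout, the customer example and the function names are hypothetical.

```python
from datetime import date


def record_change(history, customer_id, city, effective):
    """Close the customer's current row (if any) and append a new version.

    Each row is (customer_id, city, valid_from, valid_to); a valid_to of
    None marks the current version.
    """
    for i, (cid, old_city, start, end) in enumerate(history):
        if cid == customer_id and end is None:
            history[i] = (cid, old_city, start, effective)
    history.append((customer_id, city, effective, None))


def city_as_of(history, customer_id, when):
    """Point-in-time lookup: the city that was valid on the given date."""
    for cid, city, start, end in history:
        if cid == customer_id and start <= when and (end is None or when < end):
            return city
    return None


history = []
record_change(history, 42, "Auckland", date(2014, 1, 1))
record_change(history, 42, "Wellington", date(2015, 7, 1))

print(city_as_of(history, 42, date(2015, 1, 1)))  # Auckland
print(city_as_of(history, 42, date(2016, 1, 1)))  # Wellington
```

An operational system would typically overwrite the old city, which is why this kind of history usually has to be built in the warehouse.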

Power BI has opened up a world of possibilities for organisations of any size. The small end of town can now do BI for a fraction of the cost it would have incurred until recently. However, for large enterprise requirements, a well-designed data warehouse is still required.

Read more: https://www.linkedin.com/pulse/data-warehouses-do-we-still-need-them-craig-bryden/

Being a Professional is a State of Mind: Worker VS Professional and Why it Matters

Being a professional used to mean that you had a very specialised set of skills or qualifications, or that you had to meet particular ethical standards (such as those of doctors or lawyers). More recently, it’s been used to mean upwardly mobile, white-collar office workers. I’d argue, though, that being a professional means something broader than this – and much more empowering. It’s less about how you earn a living and more about the way you think and feel about earning a living.

When I think about our professional audience, I don’t just think in terms of people who have particular jobs or particular skills; I think of people who have a particular approach to growing and applying their skills in order to fulfill their aspirations; who see their working life as a journey where they are constantly learning and developing. This distinction matters because it means the professional mindset is often very different to the state of mind associated with just ‘being at work’.

Here are 3 differences that help to explain why, when it comes to influencing people’s choices and behaviour, it’s the professional mindset that you want to engage:

  • The professional mindset is open to new possibilities

When we look at the types of content that people engage with on LinkedIn, we find them investing their time in becoming better at what they do. It’s not just a question of doing the job they have; there’s a strong appetite for continuous personal development that shines through in their interest in sharing ideas, acquiring knowledge and absorbing expert opinions. Professionals may find their next job on LinkedIn, but they spend even more time focused on becoming better at the job they’re already doing. This type of curious, self-improving mindset is hungry for new ideas – and derives significant personal benefit and satisfaction from engaging with value-adding content.

  • The professional mindset is very rarely idle

LinkedIn serves more than 9 billion content impressions every week, with 57% of those impressions involving people accessing relevant content on their mobile, often outside of office hours. It’s testament to the fact that professionals’ interest in their work goes beyond earning an income and beyond the 9 to 5 – they are interested, stimulated and highly engaged in what they do.

  • The professional mindset isn’t a passenger

In part, this is because the professional mindset is fundamentally aspirational. It wants to make a difference; it’s focused on achieving things and it’s emotionally invested in the work that it does. This comes out in LinkedIn’s regular Talent Drivers survey, in which we ask our members what they value most in a job opportunity. In the Talent Drivers survey for 2015, the most popular answers were working for a company with a clear vision, and being able to make an impact in their new role. Professionals don’t just want to be taken along for the ride; their work is an expression of who they are – and that’s why appealing to them on an emotional level is such a productive strategy for B2B marketers. Their hearts and brains are both engaged in what they do.

I personally believe that, although not everyone in a job inhabits a professional mindset, everyone who works has the potential to enter into this state of mind. The professional mindset grows with increasing confidence in your skills, a feeling of control over where your career is going, and a sense of how your working life intersects with your interests, values and the type of person you want to be.

The organisation that someone works for, and the way that it invests in its people, obviously have a huge influence on how professional they feel. Sharing inspiring content that appeals to your prospects as human beings with aspirations, not just as one-dimensional personas, is a great place to start. Empower an audience to embrace their professional mindset and you’ll be rewarded with higher engagement levels and an enduring influence over the choices they make.

