New to Image Recognition? Start with Interesting Face Recognition with Python!

by Allison Zhang

Image recognition is now a hot topic in machine learning, artificial intelligence, and data analysis. But what is image recognition?

Image recognition is the process of identifying and detecting objects, places, people, writing, and actions in images or videos. Once we have extracted this information from images, it becomes good source data for analysis.


Image recognition is closely tied to deep learning. Deep convolutional neural networks (CNNs) have proven to be a very good approach to object recognition in images. A CNN perceives an image as a three-dimensional object (width, height, and the RGB color channels) rather than as the flat canvas seen by the naked eye.

We will not go deep into what a CNN is or how it works, because this blog is supposed to be interesting!


Let’s use an interesting example to have a better understanding of deep learning. That is – face recognition!


First let’s see what will be used this time.

1. Python 3.6
2. Python site-packages: opencv-contrib-python (latest)
3. Haar Cascades from OpenCV3
4. WebCam

All the code is based on the latest version of Python and the OpenCV-Python package.


In this experiment, we will use an algorithm built into OpenCV called the Haar cascade. Haar cascades are a machine-learning-based approach in which a cascade function is trained on a large number of positive and negative images. It is then used to detect objects in other images.

For face detection, the algorithm trains the classifier on many images of faces and many images without faces, then calculates features from all these images. Some of the features are useful, like those around the nose, eyes, and hair. Others are irrelevant. The cascade of classifiers groups the features into stages that are applied one by one, and only the image regions that pass every stage are kept. What remains is the face region!

This is just a simple explanation of the Haar cascade classifier. Paul Viola and Michael Jones did the original work on this approach, and if you are interested, you can dig further!

Once we have all the packages we need, we can start our experiment. We will follow this process:
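To make the "stages applied one by one" idea concrete, here is a toy sketch in plain Python. It is not OpenCV's implementation: the two features, the 4x4 window, and both thresholds are made up for illustration, whereas a real Haar cascade learns many features and thresholds from the training images. The sketch only shows the mechanics: rectangle sums from an integral image, Haar-like features built from those sums, and early rejection as soon as one stage fails.

```python
from typing import Callable, List, Sequence

Image = Sequence[Sequence[int]]     # grayscale pixels, row by row
Integral = List[List[int]]          # summed-area table with a zero border
Stage = Callable[[Integral], bool]  # one stage of the cascade


def integral_image(img: Image) -> Integral:
    """Build a summed-area table so any rectangle sum needs only four lookups."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Integral, x: int, y: int, w: int, h: int) -> int:
    """Sum of the pixels in the w-by-h rectangle whose top-left corner is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def top_bottom_contrast(ii: Integral, x: int, y: int, w: int, h: int) -> int:
    """Two-rectangle Haar-like feature: bottom half minus top half."""
    half = h // 2
    return rect_sum(ii, x, y + half, w, half) - rect_sum(ii, x, y, w, half)


def left_right_difference(ii: Integral, x: int, y: int, w: int, h: int) -> int:
    """How different the left and right halves are (faces are roughly symmetric)."""
    half = w // 2
    return abs(rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h))


# Hypothetical stages with invented thresholds, cheapest test first.
STAGES: List[Stage] = [
    lambda ii: top_bottom_contrast(ii, 0, 0, 4, 4) > 500,
    lambda ii: left_right_difference(ii, 0, 0, 4, 4) < 100,
]


def passes_cascade(ii: Integral, stages: Sequence[Stage]) -> bool:
    """all() stops at the first failing stage, so most windows are rejected cheaply."""
    return all(stage(ii) for stage in stages)


# A window with a dark band above a bright band, and a featureless window.
FACE_LIKE = [[10] * 4, [10] * 4, [200] * 4, [200] * 4]
FLAT = [[100] * 4 for _ in range(4)]

print(passes_cascade(integral_image(FACE_LIKE), STAGES))
print(passes_cascade(integral_image(FLAT), STAGES))
```

The featureless window is thrown out by the first stage and never reaches the second one. That early exit is what lets a detector scan thousands of windows per frame.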

1. Face Detection: detect the face in the image or video
2. Data Gathering: capture the unique characteristics of the face
3. Data Training: train the computer to remember the face and differentiate it from other faces
4. Face Recognition


First, let’s test that our camera can detect our face. If you have more than one camera connected to your device, try cv2.VideoCapture(N) with different values of N to see which camera works for you.
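A minimal sketch of this camera test follows, assuming opencv-contrib-python is installed and that the pip package ships the frontal-face cascade under cv2.data.haarcascades. The function names, the window title, and the detectMultiScale settings (scaleFactor=1.3, minNeighbors=5) are illustrative choices, not values taken from the original experiment.

```python
from typing import Callable, List, Optional


def probe_cameras(max_index: int = 4, opener: Optional[Callable] = None) -> List[int]:
    """Return every index N below max_index for which VideoCapture(N) delivers a frame."""
    if opener is None:
        import cv2  # imported here so the probe logic can be tried without a camera
        opener = cv2.VideoCapture
    working = []
    for index in range(max_index):
        cap = opener(index)
        try:
            if cap.isOpened() and cap.read()[0]:
                working.append(index)
        finally:
            cap.release()
    return working


def show_face_boxes(camera_index: int = 0) -> None:
    """Draw a square around each detected face until Esc is pressed."""
    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    cap = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # The cascade works on grayscale images.
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
            for (x, y, w, h) in faces:
                cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
            cv2.imshow("Face detection", frame)
            if cv2.waitKey(1) & 0xFF == 27:  # 27 is the Esc key
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
```

To try it, call probe_cameras() to list the working indexes, then pass one of them to show_face_boxes().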



If you can see the square following your face, you have made it!



The next two steps, data gathering and data training, can be combined into one. Data training tells the machine what a person looks like, so we feed the machine essentially every angle of our face. The more images you feed the machine, the more accurate the result will be. In this experiment, though, we used only 10 camera captures to train the machine.



Finally, we can tell the machine who this is!
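The gathering, training, and recognition steps can be sketched as below. The post does not say which recognizer was used, so this is an assumption: the sketch uses the LBPH recognizer from the cv2.face module, which is available in opencv-contrib-python. The helper names and the max_distance cut-off of 70 are hypothetical and would need tuning on your own captures.

```python
from typing import Dict, List, Sequence, Tuple


def assign_label_ids(names: Sequence[str]) -> Dict[str, int]:
    """Give each person a stable integer id, because the recognizer trains on integer labels."""
    return {name: idx for idx, name in enumerate(sorted(set(names)))}


def train_recognizer(samples: List[Tuple[object, str]]):
    """Train on (grayscale face crop, person name) pairs collected from the camera."""
    import cv2
    import numpy as np

    ids = assign_label_ids([name for _, name in samples])
    faces = [face for face, _ in samples]
    labels = np.array([ids[name] for _, name in samples], dtype=np.int32)

    recognizer = cv2.face.LBPHFaceRecognizer_create()  # assumed choice of recognizer
    recognizer.train(faces, labels)
    id_to_name = {idx: name for name, idx in ids.items()}
    return recognizer, id_to_name


def name_for_prediction(
    label: int, distance: float, id_to_name: Dict[int, str], max_distance: float = 70.0
) -> str:
    """Turn a (label, distance) prediction into a name; for LBPH a lower distance is a closer match."""
    if distance > max_distance:
        return "unknown"
    return id_to_name.get(label, "unknown")
```

At recognition time, each detected face crop would go through recognizer.predict(gray_face), which returns a label and a distance that name_for_prediction can translate into a name to draw next to the square.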



What makes this even more fun is that OpenCV face recognition can run on a Raspberry Pi! It can also be extended to face tracking, a concept that applies to many real-life situations. This time we covered only one aspect of what you can do with images or video, but image recognition can be applied far more widely. When we talk about data, some people think only of numbers. We know data are more than that. As long as we can extract the data from images, we can do any analysis we need!

Why We Aren’t an Advanced Analytics Firm, and Why We Don’t Want to Be One!

by Ed Crowley – Chief Thought Leader – Virtulytix, Inc.

Advanced analytics is in vogue now and a lot of companies are calling themselves advanced analytics companies. From consulting firms to software companies, by calling themselves advanced analytics firms, companies hope to position themselves as being able to add value to customers’ businesses by helping them ‘sort through’ their data and turn it into meaningful business action. Gartner’s list of top advanced analytics firms demonstrates the software and technology centricity of the firms it identifies as leaders in advanced analytics (SAS, IBM, Dell, KNIME, and RapidMiner) and visionaries in advanced analytics (Microsoft, Alteryx, Predixion Software, and Alpine Data).

Each of these companies provides great technology and certainly provides advanced analytics ‘tools’. But to us, it’s all about application. Tools are great – but what use is a tool if it isn’t applied to solve a problem? The first person to pick up a sharp stone didn’t have a tool – they just had a sharp stone. But once they began ‘applying’ this tool by using it to dig, hunt, or scrape – it became a powerful tool! And over time, and with use, they refined the sharp rock into an even better tool – arrowheads, axe heads, you get the idea. The value is in the application – not the tool!

We think there are a ton of great tools out there. Finding tools isn’t the problem. In fact, I have spent this week at an IBM event (THINK) rubbing elbows with some of the leading developers in predictive analytics, prescriptive analytics, visual insights, and other parts of the IBM Watson Cognitive ecosystem. I am convinced there are tons of good analytics tools and we really like the tools from IBM. And we use these tools.  But I don’t think this makes us an advanced analytics firm.

There are also a number of firms like ours that have data scientists and data engineers. Now don’t get me wrong, data engineers and data scientists are hard to find. In fact, a study by Bain & Company found only 4% of firms are able to combine the people, tools, data, and organizational focus that they need to take advantage of their big data. Data engineers and data scientists are key to what we do and without them, we couldn’t successfully execute the great projects for our clients that we get to work on! We also have consultants with deep experience in specific industry segments, who understand processes, and who can help clients develop strategies and execute them. Firms we compete with like IBM Global Business Services (GBS), or Bain & Company, or Accenture have a few data scientists and data engineers as well as numerous consultants. Most of these companies call themselves management consultants or professional services firms. But we don’t consider ourselves just a management consulting or professional services firm. These firms might have (and this is a stretch) 1-2% of their work force that are data engineers or data science professionals. One demonstration of our focus is that 30% of our staff are data scientists and engineers and over 50% of our staff work in the analytics development and delivery group.

There are also firms that have programmers, project managers, and technical staff who can write Python code, or do system deployments, or integrate tools into clients’ on-premises systems or cloud-based environments like we do. They often call themselves system integrators, or solutions providers, and they derive a significant portion of their revenue from selling software. But we don’t think we are a system integrator or solution provider firm since our primary focus is not selling software or software tools.

So, what is Virtulytix? Virtulytix is an applied analytics firm focusing on solutions for the Industrial Internet of Things (Industrial IoT). What does that mean? It means we bring all of the tools and resources to the table to enable our clients to leverage their data to drive significant operational improvements. We specialize in the industrial IoT sector by creating manufacturing quality, maintenance / service optimization, and logistics optimization platforms using machine data. In order to develop these platforms, we bring together:

  • Management consulting and analytics to understand customers’ business processes, IoT infrastructure, and enablement barriers.
  • Processes, knowledge, and analysis skills to develop use cases that describe the financial value of overcoming these barriers and optimizing clients’ business processes.
  • Development and deployment of the industrial grade analytics decision making platforms which fit into your existing IoT and corporate IT infrastructure. We also develop pay-for-use analytics decision making platforms to allow clients to capture the benefits without the expense and time of developing an in-house solution.
  • Training, processes, and support to operate these platforms and ensure they will grow and evolve with the client organization’s needs.

As an applied advanced analytics firm, we will often leverage partners to provide cloud services, IoT sensor installation, gateway deployment, ERP systems, or other technology components which are important and necessary for our solutions, but not part of our core skill set. We outsource or partner for these services. We are big believers in selecting the best partners to deliver turn-key platforms rather than doing it all ourselves.

Some would argue that IBM’s GBS unit is an applied analytics firm – which would be hard to dispute. But they also have a significant management consulting practice, and they have a strong incentive to sell IBM software. So, I wouldn’t call them a ‘pure’ Applied Analytics firm. Also, while GBS is exceptional at working with massive, global deployments, they can’t scale down very well. If you’re looking for a plant level pilot or you are a firm that is ‘mid-market’ (generally considered to be firms with $100M to $3B in revenues), GBS will have a difficult time ‘scaling down’ to do proof of concept (POC) pilots or deployments. We live in the mid-market, and even develop platform solutions using a SaaS model for smaller businesses (our SuppliesIQ being a prime example).

We think it takes unique skills and abilities to be a true applied advanced analytics firm. So, we aren’t an advanced analytics firm, we aren’t a management consulting firm, and we aren’t a systems integrator. We are a little bit of all these – but by bringing these things together, we feel we are much more than any individual component. The graphic below shows a few of the major components (but not all) that we bring together to develop and deliver advanced analytics solutions.

Based on our discussions with clients, partners, and other members of the Industrial IoT ecosystem, I think we are a pretty unique type of company. What do you think? How would you label our firm? Do you know of other companies that offer similar abilities? How would you describe them? We would love to hear your thoughts.

CRISP-DM the Scrum Agile Way. Why Not!

by Nameeta Raj, Virtulytix, Data Scientist

Do you often find yourself in the middle of an infinite data preparation, modeling and testing loop? How about utilizing the rapid-delivery agile software development methodology for your analytics projects?

Figure 1: Phases of CRISP-DM

What is CRISP-DM?

The cross-industry standard process for data mining (CRISP-DM) is a framework used for creating and deploying machine learning solutions. The process involves six phases, shown in Figure 1: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.

There have been times when I found myself stuck in a never-ending data preparation, modeling, and testing phase, which left me pondering the minimum viable product concept of scrum agile.

What is Agile and What is Scrum?

Agile is an iterative software development methodology intended to reduce the time to market (the time from a product being conceived until it is available for sale). Scrum is one of many frameworks that can be used to implement agile development. In scrum agile, development is done in sprint cycles, and at the end of each sprint a minimum viable product is deployed. Typically, a sprint lasts anywhere from 1 to 4 weeks.

Extending Agile Software Development Approach to Analytics Projects

Figure 2: CRISP-DM the scrum agile way

Let us see how the merger can be accomplished. Any new requirement is prioritized and added to the product backlog by the product owner. The typical time-bound scrum meetings are listed below:

Product Backlog Refinement Meeting:

The meeting should take place a few days before the start of a new sprint. The aim of the meeting is to understand the basic business, analyze cost benefit, and check the data scope. Initial estimation and finalization of the definition of ready and the acceptance criteria are included in the meeting agenda. Business success criteria and data accessibility are some of the factors that can contribute to the definition of ready.

Sprint Planning Meeting:

The meeting should take place right before the start of a new sprint. By the end of this meeting, the team members have a thorough understanding of the requirement, which covers a substantial portion of the Business Understanding phase of CRISP-DM. Items in the product backlog are re-estimated if required. The lag of a few days between the backlog refinement meeting and the sprint planning meeting ensures that all activities required to meet the definition of ready have been completed. The acceptance criteria are finalized. The first sprint with a new requirement aims at creating a minimum model fit to be demonstrated at the end of the sprint, and each subsequent sprint includes further data preparation, data cleansing, and model enhancement activities. Taking the team’s past velocity into consideration, finalized requirements from the top of the product backlog are moved into the sprint backlog. The team is now committed to delivering the items on the sprint backlog and is ready to step into the next sprint.

Daily Scrum Meeting:

The 15-minute daily standup meeting is conducted to answer three main questions. What work was completed the previous day? What is the work planned for the day? Are there any issues obstructing progress?

Sprint Review / Customer Review / Demo Meeting:

The meeting is scheduled on the last day of the sprint. During this meeting the work committed by the team is compared to the work delivered. A brief demo of the completed work is done during this meeting. An overview of the data engineering activities along with the model created can be demonstrated to obtain feedback and new ideas from the team and stakeholders. These ideas can be implemented to improve the data engineering / modeling process in upcoming sprints. Any potential flaw in business understanding or irrelevant hypothesis testing can also be caught very early on during the demo session.

Sprint Retrospective Meeting:

The good, the bad, and the ugly of the completed sprint are discussed in this meeting.


I see a few probable advantages of using the scrum agile methodology. Those advantages include all stakeholders being well informed of the project progress right from the beginning. Potential never-ending modeling cycles can be eliminated, thus saving time. The sprint demo facilitates healthy team discussions and sharing of ideas. Technical bugs or mistakes in understanding the requirements can be detected very early during the lifecycle.


Will the Real Predictive Analytics Please Stand Up

by Scott Hornbuckle and Nameeta Raj

As an entrepreneur who focuses on utilizing leading edge technology to improve my clients’ businesses, I am often faced with people and companies using buzzwords carelessly, with little to no substance behind their claims. Predictive analytics along with big data, IoT, etc. are all the rage, but what is real, and what is just marketing fluff?

Let’s take predictive analytics as an example. Wikipedia defines predictive analytics as:

“Predictive analytics encompasses a variety of statistical techniques from predictive modelling, machine learning, and data mining that analyze current and historical facts to make predictions about future or otherwise unknown events.”

By this definition, predictive analytics is essentially the use of statistical algorithms combined with machine learning and data mining to predict a future event based on patterns discovered in historical data. Here’s the key: for a solution to be considered predictive analytics, it must include all of these components. Frequently when meeting with prospective clients, we are told that they are already using predictive analytics. When we probe a bit deeper, we discover that the client has created a spreadsheet that uses a simple linear regression equation, or that they are using the linear algorithm included in a SQL database. While this is all well and good, that’s just statistics, not predictive analytics.

Let’s go over an example from the office products industry. We developed a solution called SuppliesIQ to help printer/copier dealers reduce the cost of toner wasted when cartridges are changed out before they are empty. SuppliesIQ uses a time series modeling technique to ensure just in time (JiT) delivery of cartridges. SuppliesIQ is highly dynamic and chooses the best fitting model from a wide range of seasonal and non-seasonal time series models, not for each device, but for each cartridge within the device. An autoregressive integrated moving average (ARIMA) model forms the base model for SuppliesIQ. The models are created with the help of IBM’s Predictive Maintenance and Quality platform, which enables switching between ARIMA and exponential smoothing models to find the best fitting model for the toner cartridge. Historical data permits the model to identify quarterly, monthly, and weekly seasonality and adjust the predictions accordingly.
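The idea of picking the best fitting model per cartridge can be illustrated with a small sketch. This is not the SuppliesIQ implementation and it does not use ARIMA: two much simpler stand-ins (simple exponential smoothing and a weekly seasonal-naive forecast) compete on a holdout week, and the one with the lower error wins. All function names, the smoothing factor, and the sample usage figures are invented for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Forecaster = Callable[[Sequence[float], int], List[float]]


def ses_forecast(history: Sequence[float], horizon: int, alpha: float = 0.3) -> List[float]:
    """Simple exponential smoothing: a flat forecast at the last smoothed level."""
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return [level] * horizon


def seasonal_naive_forecast(history: Sequence[float], horizon: int, period: int = 7) -> List[float]:
    """Repeat the most recent full season (one week by default)."""
    last_season = history[-period:]
    return [last_season[i % period] for i in range(horizon)]


def mean_abs_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pick_best_model(
    series: Sequence[float], candidates: Dict[str, Forecaster], holdout: int = 7
) -> Tuple[str, Dict[str, float]]:
    """Fit each candidate on everything but the last `holdout` points and score it on them."""
    train, test = series[:-holdout], series[-holdout:]
    errors = {name: mean_abs_error(test, model(train, holdout)) for name, model in candidates.items()}
    return min(errors, key=errors.get), errors


CANDIDATES: Dict[str, Forecaster] = {
    "exponential_smoothing": ses_forecast,
    "seasonal_naive": seasonal_naive_forecast,
}

# Made-up daily toner usage (percent per day): busy weekdays, idle weekends.
daily_usage = [2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0] * 4
best, errors = pick_best_model(daily_usage, CANDIDATES)
print(best, errors)
```

On this strongly weekly pattern the seasonal model wins, while a cartridge with steady usage would be served just as well by the smoothing model. Running the selection separately for every cartridge is what "per cartridge, not per device" amounts to in this toy setting.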

The graph below shows SuppliesIQ in comparison to a basic linear regression model present in the market. The orange line represents the actual toner levels; the blue line represents the predicted toner levels by SuppliesIQ, and the green line represents the estimated empty date according to the linear algorithm. The SuppliesIQ model accurately captures the straightforward weekly seasonality and the graph is relatively flat on weekends.

Figure 1: SuppliesIQ vs Linear Regression prediction

This cartridge ran empty on 11/14/2017. The linear regression model predicted an empty date 6 days after the cartridge actually ran empty, whereas the SuppliesIQ model predicted a date one day after. Because the latest cycle was short, the linear regression model could not completely adapt to the increased printing behavior.
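Why a straight line lags when printing speeds up can be seen in a few lines of Python. The readings below are invented, not the data behind the figures: the cartridge drains slowly for five days and then five times faster. A least-squares line fitted to the whole history still carries the old slow slope, so it places the empty day far too late.

```python
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predicted_empty_day(days: Sequence[float], levels: Sequence[float]) -> Optional[float]:
    """Day on which the fitted line reaches 0% toner, or None if the level is not falling."""
    slope, intercept = fit_line(days, levels)
    if slope >= 0:
        return None
    return -intercept / slope


# Hypothetical toner readings (percent): 2% per day at first, then 10% per day.
days = [0, 1, 2, 3, 4, 5, 6, 7]
levels = [100, 98, 96, 94, 92, 82, 72, 62]

whole_history = predicted_empty_day(days, levels)
recent_only = predicted_empty_day(days[-3:], levels[-3:])
print(round(whole_history, 1), round(recent_only, 1))
```

The fit over the whole history lands almost a week later than the fit over the three most recent readings, which is the same kind of late call described above for the linear model.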

Figure 2: Six-month toner cycles

So, how do you determine whether or not a solution truly uses predictive analytics? Here are three key things to look for:

  1. More than just statistics: Advanced statistics are a key component of any predictive analytics solution. However, if one is simply using an out of the box linear algorithm from a tool like Excel, SQL, etc., I wouldn’t consider this to be predictive analytics. It can make fairly rudimentary predictions, but these are not the same.
  2. It’s dynamic and adjusts to changes in environment: This is one of the key components that separates true predictive analytics from the posers. Business environments are continuously in flux. This is due to business cycles, seasonality, scaling up/down, etc. An example of this is a school. If a printer is low on toner, when should the cartridge be shipped? Well, this depends on the context of the device. If it’s May, and the school is getting ready to dismiss students for break, the toner cartridge may be able to last until the start of the next term. A static model wouldn’t take this usage change into account. True predictive analytics looks at how each cartridge in each device is used and adapts to the user behavior. We will explore this topic further in a future blog.
  3. The model gets better over time: Machine learning is a key component in predictive analytics. Using machine learning enables the model to improve over time automatically. Static regression models must be updated manually and applied broadly. This quickly shows the benefit of true predictive analytics. Predictive analytics takes into account the device’s history and the accuracy of predictions made in the past, and adjusts accordingly. This dynamic improvement is essential in the rapidly evolving business environment we all work in.

In conclusion, there are a lot of companies claiming to offer predictive analytics. The technology is powerful and can enable companies to dramatically improve their businesses and evolve business models. However, the technology is complex, and the skill sets required to use the technology are in short supply. When you are looking to employ this technology, use the tips above to separate the real from the rest.

Is The Imaging Industry Under a “Positive Illusion”?

Edward Crowley

Investors tend to be overly optimistic and too confident when it comes to forecasting returns[1]. Newlywed spouses routinely hope and believe that their relationships will thrive, but research shows these hopes are often overly optimistic[2]. People are basically overly optimistic (okay – it’s a generalization and I certainly can name a few people whom I wouldn’t call optimistic!). We tend to overestimate the chances of something good happening to ourselves and underestimate the chances of negative events[3]. So what does this have to do with our industry? Perhaps more than we think.

A recent headline caught my attention: “More growth predictions for MFPs”. The article referred to a forecast by market analyst HTF Market Intelligence, which predicts growth of 3.35 percent for the MFP market through 2021. Given the impact of MPS in reducing fleet sizes (an average reduction of 60% in units during the first contract, based on Photizo Group research), the saturation of business markets worldwide, and the slowdown in the BRIC economies (or at least the BRC portion), it is really hard to imagine someone predicting growth.

So what gives? Two things. First, many OEMs are desperate for good news showing the market will grow, whether to back optimistic market projections, to support a favorite project, or to gain more funding for marketing spend. Second, there is the natural human bias toward believing that things are going to be more positive in the future than they really are. Bad stuff happens to other people (and companies) – right?

Photizo Group believes that, for a number of reasons, the market is entering a critical phase, one which is going to drive further consolidation of the market and declining unit and page volumes. We are being contrarian in this – our competitors IDC and InfoTrends are forecasting growth. However, we don’t really worry about being contrarian. In 2006 we began saying MPS would be a game changer. At the time, and for several years thereafter, major firms such as IDC and Gartner said the market was over-hyped, that it was a passing fad, that it wasn’t that big of a deal. We were contrarian then, and we were right.

Understanding the potential for market contraction is particularly important because it requires very different strategies than growth markets. It requires a focus on business model and cost restructuring (not just reduction – but radical restructuring). Down markets require investment in process and logistics optimization which will drive fundamental shifts down in operating costs.

Most of us at Photizo Group have spent our careers in this industry, so we would love to call for growing unit volumes, increasing pages, and increasing profits for the industry. But we can’t – this would be a disservice to our clients and even our own integrity. As a trusted advisor to our clients, we have to give the unvarnished truth – whether it is easy to accept or not. And our view of the truth for the future is that we expect unit and page volumes to decline. And this is going to drive a fundamental shift in the industry. Are you ready for the shift?

[1] Handbook of Contemporary Behavioral Economics, Routledge Taylor & Francis Group, © 2006, p. 713

[2] Justin A. Lavner, Benjamin R. Karney, and Thomas N. Bradbury, “Newlyweds’ Optimistic Forecast of their Marriage: For Better or Worse?”, Journal of Family Psychology, August 2013, 27(4): 531-540


Change is Here to Stay …

by Mario Diaz, VP Consulting Services, Photizo Group
January 30, 2017


The last few years have set a record pace for mergers and acquisitions. According to The Wall Street Journal, the value of mergers and acquisitions globally was over $4.3 trillion in 2015. Consolidation in the IT channel accelerated, as large solution and service providers acquired competitors and firms with strategic assets. The CRN Solution Provider 50, the fifty largest solution providers in North America, collectively executed more than 50 acquisitions in 2015.

Private equity (PE) is rapidly driving consolidation in the channel as PE-owned partners rapidly acquire smaller, regional players, and extend their reach in key geographies and industry verticals. Vendors will find that some of their largest partners are getting larger and gaining more channel power, which will have a significant impact on global and regional sales programs.

Business models in the channel are blurring. Leading solution providers are taking steps to capitalize on the growing demand for cloud services, software and application development, mobile solutions, and security. Innovative channel partners are investing to reduce dependency on declining technology resale margins and to develop higher-margin professional and managed services beyond print to support the SMB and Enterprise markets.

Along with solution providers, distributors, and service providers, some vendors are making strategic acquisitions as well. The Apex acquisition of Lexmark adds software, services, and an Enterprise-focused channel presence to their business. Konica Minolta is aggressively acquiring managed services providers in Europe and North America. These companies provide a portfolio of IT services and related services such as application management, desktop, mobility, communications, hosting, and cloud. Ricoh acquired several leading providers of managed IT, cloud, data center, and professional services to small and mid-sized organizations. These acquisitions are examples of the strategic investment by vendors to expand and deepen their services portfolio to target the global SMB and Enterprise markets.

As the pace of mergers and acquisitions changes the face of imaging technology and solutions delivery to the customer, channel partners will need to take the right steps to be competitive.  To help move in the right direction, it is critical to know the real level of performance when measured against the ‘best of breed’ competitors in the industry.

Manufacturers will need to know the composition of their channel partners by measuring them using objective, independent measurements (such as the Photizo Leader’s Index) rather than just sales revenue. Sales may be great today if that is the only measurement of success, but what if the partners are the ones who get displaced? Can manufacturers afford to depend on the partner’s subjective perception of how competitive they are?

Finally, Imaging Industry OEMs need to support channel partners by providing the proper tools, training, and high-value sales, marketing and customer success content in real-time. Along with hardware, channel partners are selling more complex software and services to maintain customer engagements as they position their companies as a trusted IT partner to grow revenue.



Is New York City emulating Barcelona?

Remember several years ago the movement to install Wi-Fi throughout cities? Apparently, the movement stopped mainly due to competing interests by telecom companies. News from the Big Apple might refire the kindling in that movement.

New York has started to install the fastest public Wi-Fi in the world in some of the unused phone booths located throughout the city. These “Wi-Fi hubs” will provide free gigabit downloads, free domestic VoIP phone calls, Internet browsing, and USB charging. The plan being implemented has been dubbed LinkNYC and will see a total of 7,500 installations scattered throughout the Bronx, Brooklyn, Queens, Staten Island, and Manhattan. (By the way, Barcelona is one city that has been trumpeted as a leader in providing free Wi-Fi for its citizens).

A Link at Third Avenue and 16th Street in Manhattan

The plan was to have about 4,550 hubs installed by July 2019 and 7,500 hubs by 2024. By 2024 LinkNYC is supposed to be the largest and fastest public, government-operated Wi-Fi network in the world.

How’s New York going to pay for this? Glad you asked – over 80 percent of the hotspots would have over 1,500 square inches of advertising displays built in. It has been estimated that this will generate over $500 million in revenue over the next 12 years. Besides the eyeballs that will see the advertising, the data collected will be a treasure trove of information that will be aggregated and anonymized and used to target ads for pedestrians.

Besides this project, New York is considering another new cellphone-signal-boosting technology (that is currently being used in Los Angeles) that would be installed on New York street lamps. Both of these undertakings could create a new economic business model that adds more choices, raises competition, and lowers the costs of Wi-Fi for consumers.

This is a new way of addressing the ‘digital divide’ in the U.S. This experiment is worth observing and supporting because most of the public libraries have lines of consumers waiting to get online because they don’t have access to broadband speed home Internet.


“The Golden Age of Paper,” The Rise of the World Wide Web (Internet), and Shift of Paper From Necessary to Convenience

Twenty-five years ago, in 1991, the World Wide Web was born. Yes, each time you type you are using the World Wide Web, the Internet! The rise of the Internet was in many ways a golden time for paper. Initially, email was predominantly used as a creation and ‘transport’ mechanism. People would write emails and send them from their desktop (or by the late 90’s, their portable), but most people still printed off emails to read them. In fact, during the 90’s one of the highest volumes of printed material was email.
