In the fall of 2008, I was working on my third startup, ReTel Technologies. Our goal was to analyze shopper behavior in grocery stores, and use that data to help stores and brands improve the customer experience and store profitability. But we had a challenge: how do you anonymously track hundreds of shoppers per day in a store? We thought we had the answer: active RFID tags on every shopping cart. We ponied up $25,000 to purchase a massive customized Oracle server and 50 active tags, and outfitted our first test store.
The results came in, revealing fascinating insights around where shoppers spent time, and what opportunities existed to change the store to improve the experience. The chain’s management were impressed, until we asked if they’d like to buy and implement this system. “Install all that equipment? There’s no way that will hold up in day-to-day use at our stores!”
We were dejected, but we were undeterred. We knew there had to be a better way to do this that fit within their existing stores’ operations, and that’s when we noticed that all of their locations were rife with security cameras. And we had just happened to come across a relatively new open source library that held the key to transforming those cameras from passive security devices to active data sources: OpenCV. Connect a camera to the cloud, use OpenCV’s motion detection algorithms to detect a shopper, and analyze the behavior: we were back in business. And, just like that, my career took a wild swing into perception. I haven’t looked back since.
Since that lucky discovery of OpenCV in the late aughts, I have spent over fifteen years working in perception and computer vision, so I feel qualified to share some observations about these transformative technologies and their place in the world. Let’s dive in!
Fifteen years ago, academic laboratories and early-stage startups focused on computer vision had their sights set on a completely different set of problems than those that are in vogue today. If you were working at a top vision lab in 2010 or 2012 at TUM or ETH, for instance, chances were that you were digging into a difficult challenge around 3D scene reconstruction, or multi-sensor calibration, or perhaps a new technique for SLAM. Outside of academia, excitement around virtual reality and augmented reality was reaching a fever pitch, leading engineers in industry to focus on vision tasks around positional tracking and screen refresh rates.
What you were less likely to be focused on was AI or machine learning. Certainly, at that point in time, there were companies and academics focused on extracting training data from sensor sources for AI-driven applications, but they were the exception, not the norm. Today, that script appears to have flipped.
While there are still many companies and research labs focused on solving or improving upon core computer vision and perception tasks (Tangram Vision and multimodal calibration, for instance!), there are now orders of magnitude more that are focused on AI-driven computer vision applications. While this isn’t necessarily a bad thing, it is important for those in both academia and industry to recognize that the core areas of perception and computer vision that were once popular still aren’t entirely solved, and there are rich veins of opportunity for curious engineers and researchers who want to move the field forward. And, yes, applying AI techniques and libraries to some of these areas could very well yield breakthrough results. That rising tide of AI should lift all computer vision and perception boats.
Related to the previous point, the tendency of researchers and engineers to focus on AI-driven applications of computer vision has benefited some areas of research much more than others, leaving many very interesting challenges less explored. Do we need more research into scene segmentation or facial feature tracking? Sure, but more so than that, the industry would likely benefit from more attention to vexing problems that have yet to be sufficiently attacked.
Examples that spring immediately to mind include feature detection along self-similar surfaces (a very tricky problem remaining for autonomous warehouse robots, for instance), or developing better algorithms for high dynamic range sensing in extreme lighting conditions (super important for agriculture automation). These seemingly mundane challenges may not generate attention-grabbing videos that go viral, but they can fundamentally shift what is possible in fields like robotics and automation by adding robustness to important tasks like navigation and obstacle avoidance.
Building a robot brings a completely different set of risks, timelines, and capital requirements than building a mobile app or a SaaS company. As a result, robotics companies have traditionally required relatively large amounts of capital and time to get to market. For those outside of the robotics world, it may be reasonable to assume that a robotics company that has raised $50M, or $100M, or $250M, or even $500M must now have a fleet of hundreds, if not thousands, of devices deployed to customers.
In rare cases, this is true, but it is the exception rather than the rule. Over the past couple of decades, a number of high-profile robotics startups have collapsed after raising hundreds of millions of dollars yet failing to ship more than a handful of units. Beyond simply needing more capital to build prototypes and deploy customer units, robotics companies have also suffered from a few self-owns. Let me explain more below…
So why would a robotics company raise tens or hundreds of millions of dollars, yet only deploy a couple dozen robots? I believe part of the answer lies in the propensity for some robotics founders to hew too closely to a “not built here” mentality. Armed with first principles thinking and a seemingly endless war chest of millions of dollars in venture capital funding, it can be tempting to want to build an entire robotics hardware and software stack from scratch, relying on third party vendors as little as possible. Put simply, this can be a fool’s errand.
Now, it’s fair to say that there are some robust open source tools (for instance, ROS) and well understood techniques that are so well developed and pervasive that it makes perfect sense to build on top of them instead of buying a solution from a vendor. In fact, I’d wager that most robotics and autonomy companies will have integrated some form of this into their stack, as well they should. In some cases, there simply is no other choice, as there is yet to be a well developed third party tool or system that could offload the effort otherwise required. So I applaud finding a reasonable balance, with an eye towards building quickly, cost-effectively, and with best practices.
Yet still…take sensor calibration, for instance. Even here at Tangram Vision, where we have amassed a team of some of the most talented perception engineers available, it has still taken years to develop and refine our multimodal calibration system to a point where it can support the deployments of thousands of robots. I continue to see well-funded robotics companies insist on taking this task on themselves, and I wish them the best of luck. If it took us three years, it will probably take them ten…which is nine years too many.
Conversely, we know that our best prospective customers come from companies where this is not the founding team’s first rodeo. They’ve made the mistake of trying to build the full stack from scratch at their previous robotics startup, and ended up falling short of their goals. In their second go-around, they’ve opted to work with companies like ours to get to market faster, and reserve precious internal engineering resources for core product roadmap items that truly differentiate their devices in the market.
Fifteen years ago, when I began my career in perception, software-addressable cameras and other sensors were relatively rare. Companies like Axis were introducing a new class of IP cameras that could be accessed via the internet, and whose footage could be processed with tools like OpenCV. Technological leaders in the automotive industry like Daimler-Benz were deploying the very first sensor-powered ADAS features on high-end models like the Mercedes-Benz S-Class. Hollywood studios were advancing the art of special effects with techniques like match moving.
Fast forward to today, and the world is awash in cameras, computer vision, and perception technology. A vast majority of the planet’s population carries a smartphone with an incredibly high quality camera, massive amounts of onboard compute, and the ability to access sophisticated computer vision and perception capabilities on a whim. Nearly every new car sold is equipped with multiple cameras and sensors that enable ADAS features to improve road safety and the driver experience. In the world of entertainment, cameras may no longer even be necessary, as AI-powered systems can generate entire cinematic pieces with little to no human intervention required, having been trained on millions of hours of pre-existing real-world imagery.
Perception and computer vision now find themselves as central participants in diverse industries that touch every aspect of the human experience: medicine (analysis of medical imaging), finance (visual interpretation of financial chart movements), defense (autonomous drones), food production (vision-powered weeding, fertilizing, and harvesting), transit (biometric passports), and the list goes on and on.
After a decade and a half in the perception and computer vision space, I only see it continuing to accelerate in its importance and in the richness of applications. That’s part of what makes Tangram Vision such an exciting company. While, for now, we’re focused largely on mobile robots and autonomy, we’re excited to see what other industries build and shape themselves around sensors and perception. If the past fifteen years are any indication, the possibilities are endless.