First There Was Open Source Scanning

By now, most software developers understand that open source is free as in “free speech”, but not as in “free beer”, i.e., you can use it, modify it, and redistribute it, but you have to abide by the conditions of the license under which it was granted to you. See more in this GNU Open Source Definition. Also, since open source software is code like any other, it comes with varying quality.

Many software developers became acquainted with the legal requirements of open source licenses through open source audits. These are often required by external parties, e.g., in the case of M&A (see here an article by legal expert Haim Ravia about open source legal requirements). In other cases, the requirement is internal, raised by a legal counsel, security auditor, or compliance officer. Increasingly, however, software development executives see the need for, and the value of, managing the open source components in their products. Since open source software is code like any other in your product, it comes with varying levels of quality and security. And since, according to analysts, open source typically makes up ~70% of the total code in a commercial software product, it only makes sense to manage it.

Today, most software development executives use some process to ensure (1) that they know what open source goes into their code; (2) that their developers choose open source components with proper security and quality; and (3) that their developers have not inadvertently included open source with licenses that endanger their own intellectual property.

Back in 2002, a startup named Black Duck Software pioneered an automated way to search for and identify open source code that was introduced by developers. In a nutshell, the method was based on scanning the code and identifying pieces of code (aka snippets) that resemble code that appears in known open source components. The user is then alerted to the similarity and should check each such instance. Soon, a few other vendors offered a code scanning solution to the open source discovery challenge (e.g., Protecode, Palamida, and Open Logic).
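To make the snippet idea concrete, here is a minimal sketch of how such matching could work in principle. It is not the algorithm of any particular vendor: the window size, the normalization step, and all function names are assumptions chosen for illustration. Each run of a few consecutive lines is hashed, and a scan reports every known component that shares at least one such hash with your code.

```python
import hashlib

WINDOW = 3  # consecutive normalized lines per snippet (assumed value)


def normalize(source: str) -> list[str]:
    """Strip whitespace and drop blank lines so formatting alone does not hide a match."""
    return [line.strip() for line in source.splitlines() if line.strip()]


def fingerprints(source: str, window: int = WINDOW) -> set[str]:
    """Hash every run of `window` consecutive normalized lines."""
    lines = normalize(source)
    return {
        hashlib.sha1("\n".join(lines[i:i + window]).encode("utf-8")).hexdigest()
        for i in range(len(lines) - window + 1)
    }


def build_index(known_components: dict[str, str]) -> dict[str, set[str]]:
    """Map each snippet fingerprint to the known components that contain it."""
    index: dict[str, set[str]] = {}
    for name, source in known_components.items():
        for fp in fingerprints(source):
            index.setdefault(fp, set()).add(name)
    return index


def scan(source: str, index: dict[str, set[str]]) -> set[str]:
    """Return the known components sharing at least one snippet with `source`."""
    matches: set[str] = set()
    for fp in fingerprints(source):
        matches |= index.get(fp, set())
    return matches
```

Note that a match here only means "these lines look alike". Whether the code was actually copied from the component is exactly what a human still has to check.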


5 Pitfalls of Scanner-Based Open Source Management Solutions

Over time, it became clear that scanning is not as easy and automated as one may think.

1)  Many False Positives

One big challenge is that open source code scanners often produce a large number of “false positive” alerts. False positives are seemingly matching snippets that, on closer inspection, turn out to be coincidental and not really part of an open source component. Such false positives will typically be flagged by the open source scanning solution, and then have to be ruled out by the developers.

This is where the problem arises. As long as the total number of open source components is small, the number of false positives may be manageable. However, over the years, and especially since open source scanning solutions were introduced, the number of open source components out there has grown exponentially. To give you an idea, the WhiteSource repository today contains about 2 million open source components in ecosystems such as Java, Ruby, Python, and NPM, and another 60 million open source files in languages such as C/C++, JavaScript, PHP, and Objective-C. Now, with so many open source components available to developers (and the number quickly growing), you are guaranteed to have some coincidental snippet matches. How many? Thousands! A friend who uses an open source scanner complained about being presented with no fewer than 70,000 (yes, tens of thousands) false positive alerts for a single product. Sifting through these false positives, especially in the days leading to a release or as part of an M&A due diligence process, can be quite tedious. If your developers use a scanner, I strongly recommend you check how many false positives it generates…

2) Scanning for Open Source Kills any Agile SDLC Process

Since scanning is time-consuming, it can almost never be done on a continuous basis. Today, many software development teams make an effort to make their SDLC increasingly agile. Many software vendors (especially SaaS) release new versions every month or even more frequently. Using a scanner to identify open source components brings this continuous process to a halt. It is not uncommon for the automated scanning process to take weeks to complete, and it must then be followed by a lengthy review of the alerts. Even if you are still using the waterfall development model, this can introduce significant delays to your release process.


3) Costly Tear and Replace

Things get much worse.

Consider your release process. Suppose that you have taken the time to scan your code for open source right before the release. What if you find that a developer used an open source component with a license that does not fit your policy, or with a severe security vulnerability, or even just a version that is not favored by your support team? You can forget about your planned release date. You will now have to remove the rogue component, and pray that you can find a decent alternative. If you don’t, your developers may have to sit down and develop those same capabilities from scratch. And if you do find a similar component with a more permissive license, you will still have to undo and redo a lot of the development effort around integrating that component. Did I mention that your release schedule is screwed?

And if you only care to do the scanning prior to a due diligence process… Well, be prepared to explain to your CEO and shareholders why their exit will not happen. Not now. Not with this buyer. Not at this price.

4) Timing is Absolutely Critical when it comes to Security

When a security vulnerability becomes known, it is critical to fix it as soon as possible because that is when potential attackers are best positioned to exploit it. This holds true for both proprietary and open source code. (See a previous post where I discuss software composition analysis from a security perspective.) Unfortunately, in a scanner-based paradigm, you will only learn about vulnerabilities the next time you perform a scan, which, as we already explained, can be months later. Even worse, if your solution is deployed on premise (rather than being provided as an always up-to-date cloud service), then you will also not know until your database has been updated. In contrast, a good continuous solution will alert you as soon as the vulnerability is known and then again as soon as a fix is available. Put differently, if your solution is not continuous, your customers will remain vulnerable much longer.
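The difference is easy to see in a small sketch. Assume the service already holds an inventory of the components in each of your products (all names and identifiers below are hypothetical, and real services match version ranges rather than exact version strings). When a new advisory arrives, it is matched against the stored inventories immediately, with no need to wait for the next scan:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    version: str


@dataclass(frozen=True)
class Advisory:
    advisory_id: str                  # hypothetical identifier format
    component: str
    affected_versions: frozenset[str]  # simplification: exact versions only


def affected_products(
    advisory: Advisory, inventories: dict[str, list[Component]]
) -> list[str]:
    """Return the products whose stored inventory contains an affected component."""
    hits = []
    for product, components in inventories.items():
        if any(
            c.name == advisory.component and c.version in advisory.affected_versions
            for c in components
        ):
            hits.append(product)
    return sorted(hits)


# Inventories recorded at build time (made-up data for illustration).
inventories = {
    "billing-app": [Component("libfoo", "1.2.0"), Component("libbar", "0.9")],
    "mobile-app": [Component("libfoo", "1.3.0")],
}

# A newly published advisory is checked the moment it arrives.
new_advisory = Advisory("EXAMPLE-0001", "libfoo", frozenset({"1.2.0", "1.2.1"}))
to_alert = affected_products(new_advisory, inventories)
```

The key design point is that the lookup runs against inventory data that already exists, so the time to alert depends on when the advisory is published, not on when you last scanned.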

5) Expensive to Buy and Operate

Proper deployment of a source code scanning solution requires a lot of expertise. Even more work is necessary if the solution is deployed on premise and needs customization. It is thus no coincidence that most scanner solutions are offered with hefty professional services packages. If you watch the work of the professional services engineer, you will quickly learn that it is not easy to tune the scanner to achieve high accuracy, i.e., to identify as many of the open source components as possible while also keeping the number of false positives down.

Actually running the scanner takes a substantial amount of time. Going through the list of false positive alerts is then a further drain on your developers’ expensive time. Both of these introduce substantial and costly delays into your release schedule. In fact, all these post-purchase costs typically add up to a total price tag that can only be sustained by large companies.

And this is before counting the cost of post-hoc replacements (the tear and replace mentioned above) or the cost of potential security issues due to late discovery.



New Agile Solutions make it Easier to Better Manage Open Source Components

With today’s technology, properly managing your open source components need not be such a nightmare. New solutions have replaced the scanner-based approach with proper integration into the tools that drive your software development lifecycle. For example, a simple plugin to the build tool will identify the open source components when they are first introduced and baked into your product. A back end server, typically a cloud-based service, will continuously track all open source components, and will provide a true-to-the-minute inventory report and license analysis. The same server will notify you, proactively, when a security vulnerability first becomes known, as well as when a patch becomes available. In other words, this new technology provides timely input, reduces risks, requires no effort, and is a lot less costly.
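How might a build plugin identify components without looking at snippets? One plausible way, shown here purely as an illustrative assumption and not as a description of any specific product, is to compute a digest of each whole library file that goes into the build and look it up in an index of known components. The index contents and all names below are made up:

```python
import hashlib
from typing import Optional

# (name, version, license) of a known open source component
ComponentInfo = tuple[str, str, str]


def artifact_digest(data: bytes) -> str:
    """Digest of the whole artifact; a match means the file is byte-for-byte identical."""
    return hashlib.sha1(data).hexdigest()


def identify(
    artifacts: dict[str, bytes], known: dict[str, ComponentInfo]
) -> dict[str, Optional[ComponentInfo]]:
    """Look up each build artifact by digest; unknown artifacts map to None."""
    return {
        path: known.get(artifact_digest(data))
        for path, data in artifacts.items()
    }


# Hypothetical index, as a back end service might hold it.
known_index = {
    artifact_digest(b"bytes of libfoo-1.2.0.jar"): ("libfoo", "1.2.0", "MIT"),
}

# Artifacts seen by the plugin during a build (stand-in byte strings).
build_artifacts = {
    "lib/libfoo-1.2.0.jar": b"bytes of libfoo-1.2.0.jar",
    "lib/in-house.jar": b"our own code",
}

inventory = identify(build_artifacts, known_index)
```

Because the lookup is an exact match on the whole file, a hit cannot be a coincidental resemblance, and the check is cheap enough to run on every build.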

As always, new approaches often result from personal pain and needs. We, the founders of WhiteSource, pioneered this new approach in 2011 after struggling with the usability of a scanner during the acquisition of our previous company.

Here are 8 benefits of the agile continuous integration approach:

  1. Zero work for developers. Installing the plugin takes minutes, and from that point on, all open source components are discovered, tracked, analyzed, and reported automatically.
  2. No false positives. Since the approach is not dependent on snippets, it scales to a large number of open source components without producing a growing number of false positives.
  3. Immediate results. Since there is no scanning to perform, and since there is no need to review false positives, you can have a full analysis of your open source components just minutes after installing the plugin.
  4. Always know what you have. The plugin will report the open source in each and every build, and you can analyze and report on it on a daily basis (vs. once in a long while when a scan is performed).
  5. Have only what you want. You can set a policy in the server, and the plugin will notify the developer or even fail the build if a newly introduced open source component does not meet your policy.
  6. No tear and replace. Potentially rogue open source is discovered “at the door” and vetted against your policy before you invest in its integration, not after you have already put much sweat into integrating it into your product.
  7. Security alerts are provided on time and into the future too. The service continues to match your open source inventory against newly discovered security vulnerabilities, proactively alerting you as needed. This continues to happen even long after your most recent build.
  8. Minuscule cost. Since it is fully automated, the cost of an agile solution is usually a small fraction of a comparable scanner-based solution. And this is before counting the burden on your developers and potential risks.
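Benefit 5 in the list above (fail the build when a new component does not meet your policy) can be sketched in a few lines. The policy format, the license identifiers, and the function names are assumptions made for this example; a real policy would likely cover more than licenses:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    version: str
    license: str


class PolicyViolation(Exception):
    """Raised to fail the build when a component breaks the policy."""


def find_violations(
    components: list[Component], denied_licenses: set[str]
) -> list[Component]:
    """Return the components whose license is on the deny list."""
    return [c for c in components if c.license in denied_licenses]


def enforce(components: list[Component], denied_licenses: set[str]) -> None:
    """Fail fast, at build time, before anyone invests in integrating the component."""
    violations = find_violations(components, denied_licenses)
    if violations:
        names = ", ".join(f"{c.name} {c.version} ({c.license})" for c in violations)
        raise PolicyViolation(f"Components not allowed by policy: {names}")
```

A build step would call `enforce(...)` with the inventory of the current build and stop on `PolicyViolation`, which is what catches a rogue component “at the door”.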


Move Forward to Agile Open Source Management

If you are still using a scanner-based solution to manage your open source, you should look into the new generation of agile solutions.

If you are using an agile development method, then you simply cannot continue to depend on a scanner-based approach. But even if you choose to stick to the waterfall model, the new tools require a lot less effort from your developers and DevOps team, provide a lot more functionality, and cost a small fraction of the old solutions.

WhiteSource pioneered the agile approach to open source management back in 2011. Many development teams worldwide, in all verticals, company sizes, and geographies, count on us to manage their open source, leaving them free to devote their time to their own innovation. If you are considering dumping your scanner, call us for a quick demonstration. In less than an hour we can show you a full analysis of one of your products.