In October 2021, a Fedora Linux user asked a question about licensing. Fedora Project Leader Matthew Miller left a response: “Since we don’t have a complete, exploded, searchable repository of all of the packages in Fedora, I don’t have a quick way to check.” 

He followed up with: “…or possibly pay Sourcegraph to do it for us. They seem like nice people.” He is correct: we (Sourcegraph) are nice people, but we don’t want your money. Instead, we wanted to team up with the Fedora community.

The Fedora community can now search its universe of open source code, currently over 34,000 repositories and counting.

Introduction to code search

For those who aren’t familiar with the concept, code search helps teams onboard to a new codebase, find answers faster, and identify security risks, among many other use cases. Sourcegraph has indexed over two million repositories across multiple code hosts such as GitHub and GitLab. This article focuses strictly on code search for src.fedoraproject.org. Sourcegraph provides both a web app and a command-line interface (CLI).

Use the web app

When using the Sourcegraph web app, start each search with repo:^src.fedoraproject.org before entering the rest of your query. Using this link to the web app will include this initial string as shown here:

Sourcegraph web app interface
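
If you would rather build such a link yourself, here is a minimal sketch of the idea in Python. It assumes the public instance at sourcegraph.com and its /search page with a q parameter; the helper name fedora_search_url is made up for this example and is not part of any Sourcegraph tooling.

```python
from urllib.parse import urlencode

# Assumption: the public Sourcegraph instance and its /search page.
SEARCH_PAGE = "https://sourcegraph.com/search"
# Repository filter used throughout this article.
FEDORA_FILTER = "repo:^src.fedoraproject.org/"


def fedora_search_url(pattern):
    """Prefix the pattern with the Fedora repo filter and return a search URL."""
    query = f"{FEDORA_FILTER} {pattern}"
    # urlencode percent-encodes characters such as ':' and '^' for us.
    return f"{SEARCH_PAGE}?{urlencode({'q': query})}"


print(fedora_search_url('"TODO"'))
```

Keeping the repo filter in one constant means every generated link stays scoped to Fedora's repositories.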

The following sections will provide some web app examples of searches that might be of interest.

Find repositories using popular OSI-approved licenses 

The following query searches the RPM spec files in all the repositories for a License: field that names one of several popular licenses approved under the “Open Source Definition” (OSD).

repo:^src.fedoraproject.org/ lang:"RPM Spec" License: ^.*apache|bsd|gpl|lgpl|mit|mpl|cddl|epl.*$
License search
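
To get a feel for what the license alternation matches before running it against 34k repositories, you can try it locally. The sketch below is only an approximation: it uses Python's re module rather than Sourcegraph's regex engine, re.IGNORECASE stands in for Sourcegraph's case-insensitive default, and the sample spec and the license_lines helper are invented for illustration.

```python
import re

# Alternation copied from the query above. Python's `re` is used here as a
# stand-in for Sourcegraph's engine, so treat the results as approximate.
LICENSE_PATTERN = re.compile(
    r"^.*apache|bsd|gpl|lgpl|mit|mpl|cddl|epl.*$", re.IGNORECASE
)

# Hypothetical spec file fragment, not taken from a real package.
SAMPLE_SPEC = """\
Name:           hello
Version:        2.10
License:        GPLv3+
Summary:        Example package
"""


def license_lines(spec_text):
    """Return the License: lines whose value matches the pattern."""
    hits = []
    for line in spec_text.splitlines():
        if not line.startswith("License:"):
            continue
        value = line.split(":", 1)[1]
        if LICENSE_PATTERN.search(value):
            hits.append(line)
    return hits


print(license_lines(SAMPLE_SPEC))
```

Short alternatives such as mit can also match inside longer words, so expect some false positives in real results.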

Find files with TODOs

The following query can find TODOs in 34k repositories. This is great for those looking to contribute to projects that need help.

repo:^src.fedoraproject.org/ "TODO"
Search for TODO

Find files being served via FTP

A co-worker of mine from back in the day told me “FTP is a dead protocol”. Is it? You can adapt this query to find other protocols, such as irc or https.

repo:^src.fedoraproject.org/ (?:ftp)://[A-Za-z0-9-]{0,63}(\.[A-Za-z0-9-]{0,63})+(:\d{1,4})?/*(/*[A-Za-z0-9\-._]+/*)*(\?.*)?(#.*)?
Search for protocol
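
Swapping the protocol is easier to see with a small local experiment. This sketch rebuilds the URL pattern with the regex escapes written out explicitly and makes the scheme a parameter. It runs on Python's re module, which is not the engine Sourcegraph uses, and the url_pattern and find_urls helpers and the sample text are hypothetical.

```python
import re


def url_pattern(scheme="ftp"):
    """Build the URL regex from the query for a given scheme."""
    return re.compile(
        r"(?:" + re.escape(scheme) + r")://"
        r"[A-Za-z0-9-]{0,63}(\.[A-Za-z0-9-]{0,63})+"  # host name
        r"(:\d{1,4})?"                                # optional port
        r"/*(/*[A-Za-z0-9\-._]+/*)*"                  # path segments
        r"(\?.*)?(#.*)?"                              # query string, fragment
    )


def find_urls(text, scheme="ftp"):
    """Return every URL in text that uses the given scheme."""
    return [m.group(0) for m in url_pattern(scheme).finditer(text)]


# Hypothetical spec file lines used only for this demonstration.
SAMPLE = (
    "Source0: ftp://ftp.gnu.org/gnu/bash/bash-5.1.tar.gz\n"
    "URL: https://www.gnu.org/software/bash\n"
)

print(find_urls(SAMPLE))           # ftp links
print(find_urls(SAMPLE, "https"))  # same pattern, different scheme
```

In the Sourcegraph query itself, the equivalent change is replacing ftp inside (?:ftp) with the protocol you are looking for.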

Find files with a vulnerable version of Log4j

This query finds files that are possibly vulnerable to CVE-2021-44228, the Log4j vulnerability also known as Log4Shell. False positives can happen, so review the results before acting on them. You can also search for other vulnerabilities that can then be reported to project maintainers.

repo:^src.fedoraproject.org/ org.apache.logging.log4j 2.((0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15)(.[0-9]+)) count:all
Search for log4j
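
Because false positives are possible, it can help to check which version strings the version part of the query accepts. The sketch below tests only that part, with the dots escaped, using Python's re module as an approximation of Sourcegraph's engine. The matches_query helper is invented for this example.

```python
import re

# Version part of the query above, with literal dots escaped.
# It accepts minor versions 0 through 15 followed by a patch number.
VERSION_PATTERN = re.compile(
    r"2\.((0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15)(\.[0-9]+))"
)


def matches_query(text):
    """Return True if the text contains a version the query would flag."""
    return VERSION_PATTERN.search(text) is not None


for candidate in ["2.14.1", "2.15.0", "2.17.0", "12.1.0"]:
    print(candidate, matches_query(candidate))
```

The last candidate shows one way a false positive can arise: the pattern is not anchored, so it also matches the tail of an unrelated version such as 12.1.0.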

Use the CLI

Sourcegraph also has a command-line tool called src, which can run all the searches shown above and adds other useful features, such as returning results as JSON for programmatic consumption.

src search -json 'repo:^src.fedoraproject.org/ lang:"RPM Spec" License: ^.*apache|bsd|gpl|lgpl|mit|mpl|cddl|epl.*$'

JSON output
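
For programmatic consumption, one option is to call src from a script and parse what it prints. The following is a rough sketch, assuming src is installed, on your PATH, and already configured for a Sourcegraph instance. The helper names are made up, and the code deliberately makes no assumptions about which fields the JSON contains.

```python
import json
import subprocess


def build_command(query):
    """Return the argument list for the src invocation shown above."""
    # Passing a list avoids shell quoting problems with characters like | and $.
    return ["src", "search", "-json", query]


def run_search(query):
    """Run src and return its parsed JSON output (requires src to be installed)."""
    completed = subprocess.run(
        build_command(query), capture_output=True, text=True, check=True
    )
    return json.loads(completed.stdout)


# Example usage, left as a comment because it needs a configured src:
# results = run_search('repo:^src.fedoraproject.org/ "TODO"')
# print(json.dumps(results, indent=2)[:500])
```

Printing the first part of the parsed output, as in the commented usage, is a simple way to discover the structure before writing code that depends on it.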

Search Syntax

The examples above are a starting point, not an exhaustive list. You can consult the full search query syntax and write your own queries as needed.

Conclusion

As you can see, with Sourcegraph, the Fedora Linux community can now quickly search all code hosted at src.fedoraproject.org, using anything from literal strings to complex regex queries.

I appreciate the Fedora Linux community being so helpful and welcoming. If you have anything you want to add or questions, my team and I will be in the comments section below. You can also join us on Slack.

Special thanks to Vanesa Ortiz for making this collaboration happen, Ben Venker for his help fixing my broken regex (multiple times), as well as Rebecca Dodd and Nick Moore for their help with editing.

Justin Dorfman

Justin Dorfman is Sourcegraph’s Open Source Program Manager and is responsible for fostering the adoption of universal code search in the open source community. Previously, he led similar initiatives for Curiefense (a CNCF project), Gitcoin, & MaxCDN. Justin has contributed to Bootstrap, Font Awesome, jQuery, Nginx, GNU Bash, Zsh, and many more. He also serves on the Selection Committee for Mozilla’s Open Source Support (MOSS) program and the Open Source Collective’s board of directors. In 2017, he co-founded SustainOSS, which hosts events and podcasts for Open Source Software Sustainers.
