About OSCAR
Project Mission
OSCAR is a project designed to quantify and analyze open source software contributions, specifically tracking GitHub activity across different technology companies.
Brief History
This project began in 2018 while I was working at Adobe, where one of my responsibilities was managing Adobe's Open Source Office. Matt Asay, my boss at the time, asked whether we could quantify the impact the Open Source Office had had on Adobe and roughly compare Adobe's open source activity to that of other technology companies. Inspired by Felipe Hoffa's "Top contributors to GitHub" work, the project slowly evolved over the years, and along the way I have collected a lot of data.
How It Works
OSCAR works on an hourly "event loop."
1. Data Collection
      The system downloads hourly GitHub activity archives from
      GitHub Archive, a public dataset that captures all public GitHub
      activity. We specifically track activity on repositories that were forked or watched in
      the previous 30 days, maintaining a rolling 30-day list of "popular" projects. Why do this?
      It acts as a low-pass filter: these "popular" projects end up accounting for about 15% of
      public GitHub git push events.
    
      Next, we scan all public GitHub git push events for pushes to these projects and try to
      find information about the users who made them.
    
2. User-Corporation Association
      For every user contributing to these "popular" projects, we query the
      GitHub API to retrieve the company field from their profile. These company associations
      are tracked over time, allowing us to detect when developers change employers or update
      their affiliations.
    
      There is some regular-expression matching going on to roughly map these company
      strings to specific corporations; we do our best, especially for known companies. It's not
      a perfect way to create these associations, but it's better than looking at the e-mail
      addresses associated with the commits (the approach most similar analyses take). Personally,
      I associate my personal e-mail with my git commits, so I wanted to try a different
      approach.
    
3. Analysis and Storage
      The user-company association data is exported to Google BigQuery for large-scale
      analysis. Every month, we generate comprehensive reports on corporate GitHub activity
      across monthly, quarterly, and yearly timeframes, providing insights into which
      organizations are most active in the open source ecosystem.
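One tick of the hourly loop in step 1 might look like the sketch below. It assumes GH Archive's published file naming (https://data.gharchive.org/YYYY-MM-DD-H.json.gz, in UTC, with the hour not zero-padded) and its gzip'd, newline-delimited JSON format. The function names previous_hour, archive_url, parse_events, and fetch_events are hypothetical and are not taken from OSCAR's code.

```python
import gzip
import json
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator

GHARCHIVE_BASE = "https://data.gharchive.org"


def previous_hour(now: datetime) -> datetime:
    """Truncate `now` to the hour (UTC) and step back one hour: the last complete hour."""
    now = now.astimezone(timezone.utc)
    return now.replace(minute=0, second=0, microsecond=0) - timedelta(hours=1)


def archive_url(hour: datetime) -> str:
    """Build the GH Archive URL for the hour starting at `hour`."""
    hour = hour.astimezone(timezone.utc)
    # Files are named YYYY-MM-DD-H.json.gz; the hour is NOT zero-padded.
    return f"{GHARCHIVE_BASE}/{hour:%Y-%m-%d}-{hour.hour}.json.gz"


def parse_events(lines: Iterable[bytes]) -> Iterator[dict]:
    """Decode newline-delimited JSON lines into event dicts, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def fetch_events(url: str) -> Iterator[dict]:
    """Stream one hourly archive over HTTP and yield its events (needs network access)."""
    with urllib.request.urlopen(url) as resp, gzip.open(resp) as gz:
        yield from parse_events(gz)
```

For example, fetch_events(archive_url(previous_hour(datetime.now(timezone.utc)))) would stream the most recent complete hour. An archive might not be published the moment its hour ends, so a real loop would likely need to retry or deliberately lag behind.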
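The rolling 30-day list of "popular" projects and the push filter from step 1 could be sketched as follows. It assumes GH Archive event records with type, repo.name, actor.login, and created_at fields, and it treats ForkEvent and WatchEvent (the event GitHub emits when a repository is starred) as the "forked or watched" signals; that mapping, and the names PopularRepos and pushers, are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable

WINDOW = timedelta(days=30)
POPULARITY_EVENTS = {"ForkEvent", "WatchEvent"}  # assumed "forked or watched" signals


def parse_ts(ts: str) -> datetime:
    """Parse a GH Archive timestamp such as "2024-03-05T09:12:34Z" as UTC."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


class PopularRepos:
    """Hypothetical rolling set of repos forked or watched within the last `window`."""

    def __init__(self, window: timedelta = WINDOW) -> None:
        self.window = window
        self.last_seen: dict[str, datetime] = {}

    def observe(self, events: Iterable[dict]) -> None:
        """Record the latest fork/watch time for each repo in this batch."""
        for ev in events:
            if ev.get("type") in POPULARITY_EVENTS:
                name = ev["repo"]["name"]
                ts = parse_ts(ev["created_at"])
                if name not in self.last_seen or ts > self.last_seen[name]:
                    self.last_seen[name] = ts

    def prune(self, now: datetime) -> None:
        """Drop repos whose last fork/watch is older than the window."""
        cutoff = now - self.window
        self.last_seen = {r: t for r, t in self.last_seen.items() if t >= cutoff}

    def __contains__(self, repo: str) -> bool:
        return repo in self.last_seen


def pushers(events: Iterable[dict], popular: PopularRepos) -> dict[str, set[str]]:
    """Map each popular repo to the set of logins that pushed to it in `events`."""
    out: dict[str, set[str]] = {}
    for ev in events:
        if ev.get("type") == "PushEvent" and ev["repo"]["name"] in popular:
            out.setdefault(ev["repo"]["name"], set()).add(ev["actor"]["login"])
    return out
```

Each hour, a caller would keep the batch as a list, call observe and then prune with the current time, and finally pass the same batch to pushers. The resulting logins are the users whose profiles get looked up in step 2.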
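The regular-expression matching in step 2 maps the free-form company string (the company field returned by GitHub's GET /users/{username} endpoint) onto canonical corporation names. The patterns below are illustrative only, and COMPANY_PATTERNS and normalize_company are hypothetical names. Returning unmatched strings trimmed but otherwise unchanged is an assumption about how unknown companies are handled.

```python
import re
from typing import Optional

# Illustrative patterns only (hypothetical, not the project's real rules): each pair
# maps a canonical corporation name to a case-insensitive regex for common spellings.
COMPANY_PATTERNS: list[tuple[str, re.Pattern[str]]] = [
    ("Adobe", re.compile(r"\badobe\b", re.IGNORECASE)),
    ("Microsoft", re.compile(r"\b(?:microsoft|msft)\b", re.IGNORECASE)),
    ("Google", re.compile(r"\b(?:google|alphabet)\b", re.IGNORECASE)),
]


def normalize_company(raw: Optional[str]) -> Optional[str]:
    """Map a raw profile company string to a canonical name when a pattern matches."""
    text = (raw or "").strip()
    if not text:
        return None  # missing or blank profile field
    for canonical, pattern in COMPANY_PATTERNS:
        if pattern.search(text):
            return canonical  # first matching pattern wins
    # Assumption: unknown companies are kept as the trimmed original string.
    return text
```

Because the first matching pattern wins, a string like "ex-Google, now Adobe" maps to Adobe only because Adobe is listed first. That is the kind of rough edge the text above acknowledges.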
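To detect when developers change employers, one option is a per-user history that appends an entry only when the normalized company differs from the last one recorded. AffiliationHistory is a hypothetical structure, not the project's storage schema, and it assumes observations arrive in chronological order.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class AffiliationHistory:
    """Hypothetical per-user list of (date first seen, company) entries."""

    entries: dict[str, list[tuple[date, Optional[str]]]] = field(default_factory=dict)

    def record(self, login: str, seen: date, company: Optional[str]) -> bool:
        """Append an entry if this is the first sighting or the company changed.

        Returns True when a new entry was added. Assumes calls are chronological.
        """
        history = self.entries.setdefault(login, [])
        if history and history[-1][1] == company:
            return False  # same affiliation as last time: nothing new to store
        history.append((seen, company))
        return True

    def company_on(self, login: str, day: date) -> Optional[str]:
        """Return the company the user listed as of `day`, or None if unknown."""
        current: Optional[str] = None
        for seen, company in self.entries.get(login, []):
            if seen > day:
                break
            current = company
        return current
```

Storing only changes keeps the history small while still answering "where did this user work on a given day?", which is what attributing a push to a company at the time it happened requires.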
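The monthly, quarterly, and yearly reports in step 3 are generated in BigQuery; the Python sketch below only illustrates the period bucketing locally. Counting distinct contributing users per company per period is an assumed metric, since the page does not specify what the reports measure, and period_key and active_contributors are hypothetical names.

```python
from collections import defaultdict
from datetime import date
from typing import Iterable


def period_key(day: date, granularity: str) -> str:
    """Bucket a date into a month ("2024-01"), quarter ("2024-Q1"), or year ("2024")."""
    if granularity == "month":
        return f"{day.year}-{day.month:02d}"
    if granularity == "quarter":
        return f"{day.year}-Q{(day.month - 1) // 3 + 1}"
    if granularity == "year":
        return str(day.year)
    raise ValueError(f"unknown granularity: {granularity!r}")


def active_contributors(
    pushes: Iterable[tuple[date, str, str]], granularity: str
) -> dict[str, dict[str, int]]:
    """Count distinct pushing users per company per period.

    `pushes` yields hypothetical (day, login, company) rows, i.e. push events
    already joined with the user-company associations.
    """
    users: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for day, login, company in pushes:
        users[period_key(day, granularity)][company].add(login)
    return {
        period: {company: len(logins) for company, logins in by_company.items()}
        for period, by_company in users.items()
    }
```

Counting distinct users rather than raw pushes keeps one very active account from dominating a company's total, though the real reports may weigh activity differently.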
Acknowledgments
This project is built on the shoulders of giants and would not be possible without the following open source technologies:
- Architect Framework - Serverless application framework powering our infrastructure
- Enhance.dev - Progressive web framework for the frontend
- GitHub Archive - Public dataset of GitHub activity
- Charts.css - CSS utility classes to style HTML tables as charts
Special thanks to Felipe Hoffa for pioneering GitHub data analysis techniques and inspiring this work.