Data Umbrella PyMC 2022 Open Source Report

Author: Reshama Shaikh

High Level Summary

Number of participants who:

  • Registered: 76
  • Attended: 38
  • Submitted >= 1 pull request: 24
  • Countries represented: 10

Background

The PyMC open source working sessions were organized by Data Umbrella to increase the participation of underrepresented persons in open source, Python and data science.

This report focuses on the summary, impact and lessons learned of the Data Umbrella PyMC Open Source Working Sessions.

Event Sessions

A series of 3 separate working sessions was organized, plus pre- and post-event office hours. Participants were paired with another person with whom they could work during the working sessions.

The office hours provided a casual, unstructured space for participants to introduce themselves and ask any questions.

The 3 working sessions were scheduled on different days of the week and at different times, to give community members with varying schedules options to attend.

The intention was that some participants would be able to attend multiple sessions to build experience in contributing. Some participants attended more than one session, and two participants attended all 3 sessions and both office hours.

Pre-Series Office Hours

Session 1

Session 2

Session 3

Post-Series Office Hours

Event Sponsors

This event was supported by:

Mariatta Wijaya of Google created a 3-minute video with inspirational tips on contributing to open source for the Data Umbrella community.

Schedule of Sessions

  • 02-Jul-2022: Pre-series Office Hours (13-14:00 UTC) (1 hr)
  • 09-Jul-2022: Session #1 (13-16:00 UTC) (3 hrs)
  • 22-Jul-2022: Session #2 (16-19:00 UTC) (3 hrs)
  • 4/5-Aug-2022: Session #3 (23-2:00 UTC) (3 hrs)
  • 18-Aug-2022: Post-series Office Hours (23-24:00 UTC) (1 hr)

We varied the schedule of working sessions to accommodate participants from different regions and time zones.
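To illustrate how the schedule maps onto different regions, here is a minimal sketch that converts the UTC start times listed above into local times. The four example regions and their fixed UTC offsets are assumptions chosen for illustration (they are not from the registration data), and the offsets hold only for the July and August 2022 dates of this series.

```python
from datetime import datetime, timedelta, timezone

# Session start times in UTC, taken from the schedule above.
SESSION_STARTS_UTC = {
    "Pre-series Office Hours": datetime(2022, 7, 2, 13, 0, tzinfo=timezone.utc),
    "Session #1": datetime(2022, 7, 9, 13, 0, tzinfo=timezone.utc),
    "Session #2": datetime(2022, 7, 22, 16, 0, tzinfo=timezone.utc),
    "Session #3": datetime(2022, 8, 4, 23, 0, tzinfo=timezone.utc),
    "Post-series Office Hours": datetime(2022, 8, 18, 23, 0, tzinfo=timezone.utc),
}

# Hypothetical example regions with fixed UTC offsets.
# Assumption: these offsets are valid for July-August 2022 only.
EXAMPLE_OFFSETS = {
    "New York (UTC-4)": timezone(timedelta(hours=-4)),
    "Nairobi (UTC+3)": timezone(timedelta(hours=3)),
    "Mumbai (UTC+5:30)": timezone(timedelta(hours=5, minutes=30)),
    "Sydney (UTC+10)": timezone(timedelta(hours=10)),
}


def local_start(session: str, region: str) -> datetime:
    """Return the start time of a session in the given example region."""
    return SESSION_STARTS_UTC[session].astimezone(EXAMPLE_OFFSETS[region])


if __name__ == "__main__":
    for session in SESSION_STARTS_UTC:
        print(session)
        for region in EXAMPLE_OFFSETS:
            start = local_start(session, region)
            print(f"  {region}: {start:%a %d-%b-%Y %H:%M}")
```

Under these assumed offsets, Session #3 starts on the morning of 5 August in Sydney, which is why the schedule lists it as "4/5-Aug-2022".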

Number of Attendees

Session                    Data Umbrella Organizers   PyMC Mentors   Community Contributors   Note
Pre-series Office Hours    3                          2              24
Session #1                 3                          4              20
Session #2                 3                          4              12
Session #3                 1                          4              6                        Asia-Pacific (a)
Post-series Office Hours   1                          3              4                        Asia-Pacific (a)

(a) Session 3 and post-series office hours were for the Asia-Pacific time zone.
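As a rough way to read the attendance table, the following sketch totals the people present at each session and computes the number of community contributors per PyMC mentor. The counts are copied from the table above; the function and variable names are illustrative only.

```python
# (organizers, mentors, contributors) per session, copied from the table above.
ATTENDANCE = {
    "Pre-series Office Hours": (3, 2, 24),
    "Session #1": (3, 4, 20),
    "Session #2": (3, 4, 12),
    "Session #3": (1, 4, 6),
    "Post-series Office Hours": (1, 3, 4),
}


def summarize(organizers: int, mentors: int, contributors: int) -> dict:
    """Total headcount and contributors-per-mentor ratio for one session."""
    return {
        "total": organizers + mentors + contributors,
        "contributors_per_mentor": contributors / mentors,
    }


if __name__ == "__main__":
    for session, counts in ATTENDANCE.items():
        stats = summarize(*counts)
        print(
            f"{session}: {stats['total']} people, "
            f"{stats['contributors_per_mentor']:.1f} contributors per mentor"
        )
```

The ratio drops from 5.0 contributors per mentor in Session #1 to 3.0 in Session #2 and 1.5 in Session #3.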

Event Participants

We used a Sphinx website whose source code was publicly available. We provided instructions on how participants could add their information to the website. Participants had the option to add their name, photo and other information to the event website as contributors. For some participants, adding their information was a milestone because they were working with Git, GitHub and Sphinx, and submitting a pull request, for the first time.

Contributions Statistics

The contributions during the working sessions were tracked in this PyMC OS-WS spreadsheet. Contributions included both submitting a pull request and opening an issue where observed.

We worked on a few different repositories for the PyMC project:

  1. video-timestamps: this is a beginner-friendly list of issues where contributors watch a video from the PyMCon 2020 conference and add timestamps
  2. pymc-data-umbrella: this is the event website. Contributors could submit PRs to fix typos or clarify the contributing guide, as well as add their information to the list of participants
  3. pymc-dev/pymc: this is the main code repository for PyMC
  4. pymc-dev/pymc-examples: this is the repo that holds notebook examples for PyMC

As of the date of this report (28-Aug-2022), these are the PR stats:

  • Open: 2
  • Merged: 56
  • Issues opened: 6

Timestamps

Timestamps were added for 16 videos.

Event website

A number of PRs were submitted to update contributor information.

Updating Jupyter Notebooks

Updating notebooks with consistent information for Sphinx rendering was a more intermediate task for new contributors.

PyMC documentation

These contributions were in the main code repository.

Demographics

Of the 76 people who registered, 38 attended. Of the 38 who attended, 24 submitted a pull request. This funnel graph shows the breakdown, by gender.

A total of 38 contributors attended at least one event of the working sessions, including office hours. 14 of 38 (37%) identified as she/her. 24 of 38 (63%) identified as he/him.
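The percentages above can be reproduced with a short calculation. This is a minimal sketch: the registration count of 76 is taken from the High Level Summary, the remaining counts come from this section, and the function names are illustrative.

```python
# Funnel counts: registered (from the High Level Summary), attended, submitted a PR.
FUNNEL = [
    ("Registered", 76),
    ("Attended", 38),
    ("Submitted >= 1 pull request", 24),
]

# Pronouns reported by the 38 attendees.
PRONOUNS = {"she/her": 14, "he/him": 24}


def step_rates(funnel):
    """Percentage of each funnel step relative to the step before it."""
    rates = []
    for (_, previous), (label, count) in zip(funnel, funnel[1:]):
        rates.append((label, round(100 * count / previous)))
    return rates


def shares(counts):
    """Percentage share of each group out of the group total."""
    total = sum(counts.values())
    return {group: round(100 * n / total) for group, n in counts.items()}


if __name__ == "__main__":
    for label, pct in step_rates(FUNNEL):
        print(f"{label}: {pct}% of previous step")
    for group, pct in shares(PRONOUNS).items():
        print(f"{group}: {pct}% of attendees")
```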

Contributors joined from 10 different countries. Country information was provided based on where participants were joining from.

  1. United States of America: 13
  2. India: 6
  3. Ghana: 4
  4. Kenya: 4
  5. Germany: 3
  6. United Kingdom: 2
  7. Canada: 2
  8. Brazil: 2
  9. Colombia: 1
  10. Ireland: 1

Returning Contributors

There were 3 “returning” contributors. These contributors had participated in a previous scikit-learn sprint.

Spoken Languages

The event was run in English. Participants were asked on their registration forms to indicate if they needed a translator. No translators were requested.

We had a channel for #espanol_chat which was utilized at a session when there was a Spanish-speaking mentor and participants from Latin America.

This barplot shows the primary spoken languages of the participants.

Impact Report for Data Umbrella PyMC Open Source Working Sessions

Non-measurable Impact

Aside from the number of PRs that were merged and issues that were opened, there is non-quantifiable impact of the open source working sessions. Some examples include:

  • learning to set up a virtual environment
  • using Git (fork, clone, branch, fetching another’s PR)
  • introduction to tests such as: flake8 (linting, formatting), pytest, “continuous integration”
  • learning about Sphinx and documentation
  • learning about numpydoc validation
  • navigating through the codebase structure of pymc
  • digging into functions, learning about errors
  • interacting with contributors on GitHub
  • learning, in general
  • networking, meeting people from around the world
  • building confidence (making a dent in “imposter syndrome”)
  • having fun

Finding out About the Working Sessions

We collaborated with a group of Community Partners to publicize the event series, and provided them with a Social Media Kit containing text to share on various platforms.

For those who attended the working sessions, this is how they learned of the event. The main avenues were by invitation from Data Umbrella, Meetup, Twitter, LinkedIn and their network (“word of mouth”).

Sessions Feedback

Feedback has been shared in a number of ways:

  • Event survey
  • Social media (Twitter, LinkedIn)
  • Casually, in conversation during the office hours and working sessions

Survey

We received 5 responses to the survey. The primary reason the response rate was so low is that these events were spread over a 7-week period and different people attended different events.

Overall, the feedback on the surveys was positive.

In response to the question “What are your favorite parts about the sessions?”:

  • Interacting with Mr. Christian and getting to know more about the community and workings.
  • Working with other people - a lot of time spent alone when learning usually so it’s a nice change and good to be exposed to other people’s ideas
  • Meeting core PyMC team and other contributors, networking, learning to contribute to open source project

Suggestions for Improvement

In response to the question “What could have worked better at the sessions?”:

  • I had (and still have) difficulty finding certain pages and links - between pymc contributing section and dataumbrella/pymc website I get confused, since the websites look similar but have different URLs
  • Call out need to fork both pymc and pymc-examples (or whichever one you plan to contribute to)

Pair Programming

Because there were 3 separate working sessions plus two office hours sessions, and attendance varied from session to session, pairing required some flexibility. We provided a spreadsheet where participants could add their names to be paired with a programming partner.

Challenges

Challenge 1: Emails going to spam

We communicated with registrants via email and Discord. For a number of people, the emails went to spam and they missed them. We do have a reminder on the registration form to keep an eye on their spam folder, but emails were still missed.

Challenge 2: Preparing by reading

The event had a comprehensive website, and the events were posted on Meetup with instructions. The process (join Discord, read through the event website, submit a registration form) was also described in multiple places (event website, Discord, newsletters, emails). Despite numerous reminders, a number of people did not join Discord, and some joined only at the start of the event, which might indicate they missed the reminders. Some participants did not submit a registration form, and some did not review the website.

It is important that participants submit a registration form for these reasons:

  • They have read and agreed to the code of conduct.
  • They understand how the event will go and how to prepare.
  • Many participants have anonymous Discord profiles, so this information is needed to track who is joining the event and to add them to the private channel.
  • We need to connect participants to their GitHub pull requests to track contributions.
  • We need participants' email addresses to communicate with them about the event.

Challenge 3: Discord

Some participants had technical issues with Discord. We have a 10-minute video on how to navigate Discord, though it is not apparent that all participants watched the video.

Perk: Mentorship

Working Sessions 2 & 3 had fewer participants, which allowed each pair programming group to have a mentor who could spend almost the full session with them. This was extremely beneficial and gave participants an opportunity to get to know the PyMC maintainers and ask many questions 2-on-1.

Perk: Organizers Contributing

The Data Umbrella team members are interested in contributing to open source too. At busy events, the organizers' time is mostly dedicated to administrative tasks. Since the groups for Sessions 2 and 3 were smaller, the organizers had some time to contribute as well. This is important because one of the challenges in community manager work is finding time to do coding work. Additionally, the more the organizing team learns, the more they can assist new contributors in the community.

What’s Next: Maintaining the Momentum

We have already seen a few event participants continue to contribute after the event.

We hope to maintain the momentum by holding casual monthly “study groups” to continue contributing to PyMC.

Sessions: Social Media Shares

Carlo of Brazil

Pablo of Brazil

Igor of USA

Dustin of USA

Prince of Ghana

Rowan of Tennessee, USA

Benjamin of USA

Zoe of USA

Chris Fonnesbeck, PyMC Team Member, of USA


Social Media Promotion

Below are some of the social media announcements on the open source working sessions.

Twitter (English)

LinkedIn (English)

LinkedIn announcement


Acknowledgments

We thank the Data Umbrella & PyMC organizers who created the website, created event documents, conducted outreach, marketing and so much more!

  • Reshama Shaikh
  • Beryl Kanali
  • Sandra Meneses
  • Sandy Weng
  • Cristina Mulas Lopez
  • Christian Luhmann
  • Oriol Abril Pla
  • Thomas Wiecki

We thank the PyMC team who mentored at the sessions and those who were online during the weekend afterwards to promptly review the submitted pull requests, particularly:

  • Christian Luhmann
  • Oriol Abril Pla
  • Ravin Kumar
  • Dan Phan
  • Chris Fonnesbeck
  • Alex Andorra
  • Michael Osthege
  • Fernando Irarrázaval

References

Addendum

  • [no addendums or updates at the time of publication]