A few months ago, I discovered that GitHub keeps track of trending repositories, and since then, I often take a look at it to see what’s up. So this month, I decided to share my thoughts on what I found; let’s get started!
DALL·E Mini
AI model that generates images from text.
The announcement of OpenAI’s DALL·E 2 took the community by storm, but given that it’s not publicly available, it’s no surprise that this project is seeing significant interest.
PaddleNLP
NLP library with pre-trained models.
PaddleNLP is a library for Natural Language Processing. It provides a comprehensive set of Chinese transformer models, and its design is based on Hugging Face’s Transformers library.
ColossalAI
A framework for large-scale Deep Learning parallel training.
As transformer architectures become the standard in many CV and NLP tasks, better performance comes with larger model sizes. ColossalAI aims to provide a simple API to train large models in parallel.
DeepFaceLive
A library to swap faces from a webcam or video.
DeepFaceLive lets you swap faces in real time or in a recording. Imagine hopping on a Zoom call and looking like Keanu Reeves. Crazy!
Label Studio
A data labeling tool for audio, text, images, videos, and time series via a UI.
Getting accurately labeled data is the first task in many ML projects. Label Studio supports many types of data and offers a graphical user interface for labeling them.
Intermission: Ploomber
Ploomber is a framework to develop pipelines interactively (Jupyter, VSCode) and deploy them to the cloud (K8s, Airflow, AWS, SLURM).
Interactive tools like Jupyter make it hard to develop maintainable projects; Ploomber allows data scientists to keep the interactive workflow they are used to but embrace best practices from software engineering to ease the transition to production.
DevOps Exercises
A collection of >2.2k DevOps interview questions.
The first non-AI repository on the list! This repository hosts more than 2.2k DevOps questions to help you prepare for your interview!
PaddleOCR
A library for creating Optical Character Recognition tools.
PaddleOCR supports many OCR-related algorithms to help users through data production, model training, compression, inference, and deployment.
DeepFaceLab
DeepFaceLab is a library to replace faces in videos.
Another deepfake library! According to the repository, more than 95% of deepfake videos are created with DeepFaceLab.
Ivy
Ivy aims to provide a single interface for ML frameworks.
With the explosion of computational frameworks such as JAX, TensorFlow, PyTorch, MXNet, and NumPy, it’s hard for practitioners to keep up and master them. Ivy aims to unify them so you can write once and export to any of them.
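To make the "write once" idea concrete, here is a toy sketch of backend dispatch in plain Python. It is not Ivy's API: `Backend`, `PlainBackend`, `PreciseBackend`, and `dot` are hypothetical names invented for this illustration, and the two "frameworks" are just two ways of summing numbers from the standard library.

```python
import math
from typing import Protocol, Sequence


class Backend(Protocol):
    """Hypothetical minimal interface that every backend must implement."""

    def multiply(self, a: Sequence[float], b: Sequence[float]) -> list: ...
    def total(self, a: Sequence[float]) -> float: ...


class PlainBackend:
    """Stand-in for one framework: uses the built-in sum."""

    def multiply(self, a, b):
        return [x * y for x, y in zip(a, b)]

    def total(self, a):
        return sum(a)


class PreciseBackend:
    """Stand-in for another framework: uses math.fsum for the reduction."""

    def multiply(self, a, b):
        return [x * y for x, y in zip(a, b)]

    def total(self, a):
        return math.fsum(a)


def dot(backend: Backend, a, b):
    """Written once against the interface; runs on any backend."""
    return backend.total(backend.multiply(a, b))


a, b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
plain_result = dot(PlainBackend(), a, b)
precise_result = dot(PreciseBackend(), a, b)
print(plain_result, precise_result)
```

The point is only that `dot` never mentions a specific backend; a real unifying library has to cover far more operations, devices, and data types than this.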
Airflow
Airflow is a platform to author, schedule, and monitor workflows.
Airflow is one of the most widely used platforms for managing workflows. It allows you to define workflows as directed acyclic graphs of tasks and schedule them.
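If "directed acyclic graph of tasks" sounds abstract, the sketch below shows the idea using only Python's standard library. It is not Airflow code: `run_workflow`, the task names, and the `upstream` mapping are all made up for this example, and it leaves out scheduling, retries, and everything else a real orchestrator does.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, upstream):
    """Run each task once, only after all of its upstream tasks have finished.

    tasks:    maps a task name to a callable
    upstream: maps a task name to the list of tasks it depends on
    """
    # static_order() yields dependencies before the tasks that need them
    # and raises graphlib.CycleError if the graph is not acyclic.
    order = list(TopologicalSorter(upstream).static_order())
    results = {}
    for name in order:
        inputs = [results[dep] for dep in upstream.get(name, ())]
        results[name] = tasks[name](*inputs)
    return order, results


# Hypothetical four-task workflow: extract -> (clean, count) -> report
tasks = {
    "extract": lambda: [3, 1, 2],
    "clean": lambda rows: sorted(rows),
    "count": lambda rows: len(rows),
    "report": lambda cleaned, n: f"{n} rows, max={cleaned[-1]}",
}
upstream = {
    "extract": [],
    "clean": ["extract"],
    "count": ["extract"],
    "report": ["clean", "count"],
}

order, results = run_workflow(tasks, upstream)
print(order)
print(results["report"])
```

Because the graph has no cycles, there is always a valid execution order: `extract` runs first, `report` runs last, and `clean` and `count` could run in either order (or in parallel, in a real orchestrator).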