Some hurdles I see:
- GitHub rate-limits API requests, so it doesn't seem possible to scrape all the source code on there. But maybe it could be crowdsourced like SETI@home, where 1,000 people each install a program and use their own quota to get around this.
- Training the model. I imagine this would be the hardest part, since it would need millions of dollars of compute? Is there a way around that, such as using free tools like Colab?
- Running the API. Once the model is trained, would it be possible to run it on an ordinary Lenovo-type laptop? I guess you need lots of VRAM to run it?
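For the VRAM question, a rough back-of-envelope check: the memory needed just to hold the weights is about parameter count × bytes per parameter. The sketch below assumes weights dominate and ignores activations, the KV cache and framework overhead, so real usage will be higher. The model sizes in the loop are illustrative only, not tied to any particular model.

```python
# Rough estimate of the memory needed just to hold a model's weights.
# Assumption: weights dominate; activations, KV cache and framework
# overhead are ignored, so actual VRAM use will be higher.

BYTES_PER_PARAM = {
    "fp32": 4,
    "fp16": 2,
    "int8": 1,
    "int4": 0.5,
}


def weight_memory_gib(num_params: float, dtype: str = "fp16") -> float:
    """Return approximate GiB needed to store num_params weights in the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    # Illustrative sizes (in billions of parameters), not specific models.
    for billions in (1, 7, 13):
        for dtype in ("fp16", "int4"):
            gib = weight_memory_gib(billions * 1e9, dtype)
            print(f"{billions}B params @ {dtype}: ~{gib:.1f} GiB")
```

By this estimate a 7B-parameter model needs roughly 13 GiB in fp16 just for weights, which is why quantizing to int8 or int4 matters on laptop-class GPUs.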
Final question: would a home-brewed version be just as good? What factors determine that?
Just curious how we could do it, as I imagine there are a lot of ML experts here.