Scraping open content is OK. Search engines have been doing that all along; it's their main job.
LLMs wouldn't exist without large inputs, hehe, and the internet is a good source for a big volume of language, most of which even makes sense.
I don't feel like Reddit should be against LLMs, setting aside their bogus claims. At least I hope GitHub doesn't share private and licensed repos.