no subject
Date: 2025-01-18 09:21 pm (UTC)
I've dabbled with training small numerical models on the desktop and running them on embedded hardware; they were surprisingly good, but they were little more than the "hello world" of ML. I'm not entirely sure what level of power is needed to efficiently train an LLM or image/video model, but in my book, my powers can only be used for good.
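For illustration, here is a minimal sketch of that desktop-to-embedded workflow, assuming TensorFlow/Keras on the desktop and a TensorFlow Lite (Micro) target; the comment doesn't name a toolchain, so treat the details as an example rather than what was actually used. The usual "hello world" is fitting a tiny dense network to sin(x) and exporting it as a .tflite buffer for a microcontroller:

    # Sketch only: train a tiny model on the desktop, export for embedded use.
    import numpy as np
    import tensorflow as tf

    # Training data: x in [0, 2*pi], y = sin(x) with a little noise.
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 2 * np.pi, size=(1000, 1)).astype(np.float32)
    y = (np.sin(x) + rng.normal(scale=0.05, size=x.shape)).astype(np.float32)

    # A deliberately tiny network: a few hundred parameters, small enough to
    # fit comfortably in a microcontroller's flash and RAM.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(1,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(x, y, epochs=100, batch_size=32, verbose=0)

    # Convert to a flat TensorFlow Lite buffer; this file is what gets baked
    # into the firmware image and run on-device by TensorFlow Lite Micro.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_model = converter.convert()
    with open("sine_model.tflite", "wb") as f:
        f.write(tflite_model)
    print(f"Wrote sine_model.tflite ({len(tflite_model)} bytes)")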
I read the other day that the "major" LLMs - ChatGPT and similar - have already been trained on the entire available corpus of humanity: everything they could get their virtual hands on, and I'm sure there would have been shady deals to get access to text that was not publicly online. I also fully expect that GitHub Copilot has been trained on everything that was in GitHub (including private repos? probably, who knows for sure) as well as everything they could scrape from Bitbucket, GitLab et al.