I'm working on processing our build logs to spot trends in the builds (failing often, taking longer, etc.). That requires me to pull job logs, among other things. Each log is approx. 1 MB. Currently I'm working on a Travis job log extractor and I'm wondering which approach would be better: I can either extract all the logs as-is, or I can process them directly within the component and output only metadata about the jobs.
What I particularly don't like about the raw log output is that the result would be a CSV file with a few normal columns and one 1 MB "wall of text" column that can contain arbitrary weird characters. That feels wrong, even though I don't think there are any limitations that would actually prevent me from doing it. So it feels like I should just parse the logs and extract the meaningful info right when I load them from the API.
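To make the parse-on-load option concrete, here's a minimal sketch of what the metadata extraction could look like. The regex and the status marker are hypothetical placeholders, not the actual Travis log format; you'd replace them with patterns matching the real logs:

```python
import csv
import re


def extract_metadata(job_id: str, raw_log: str) -> dict:
    """Reduce a ~1 MB raw job log to a small metadata row.

    The "Ran for N sec" pattern and the failure marker below are
    made up for illustration -- adapt them to the real log format.
    """
    duration = re.search(r"Ran for (\d+) sec", raw_log)
    status = "failed" if "The build has failed" in raw_log else "passed"
    return {
        "job_id": job_id,
        "status": status,
        "duration_sec": int(duration.group(1)) if duration else None,
        "log_bytes": len(raw_log.encode("utf-8")),
    }


# Output is a clean, narrow CSV instead of a wall-of-text column.
rows = [extract_metadata("42", "...\nRan for 317 sec\n...")]
with open("jobs.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```

The trade-off is exactly the one described below: anything not captured by these patterns is lost unless you re-download the logs from the slow API.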
On the other hand, having the raw logs stored in Keboola Connection is valuable for future use cases (especially because the API is very slow), and it makes the component flexible and reusable for our clients and other teams as well.
I'm also thinking about uploading the files to file storage and just referencing them in the output table by a unique name. But then my preferred workflow of doing all the transformations in SNFLK wouldn't work; I'd need to do some Python file parsing and juggle the logs back and forth instead of keeping everything inside the SNFLK database.
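For completeness, the file-storage variant could look roughly like this sketch, with a plain local directory standing in for whatever storage backend would actually be used (the naming scheme and paths are my own assumption):

```python
import csv
import hashlib
import pathlib


def store_log(out_dir: pathlib.Path, job_id: str, raw_log: str) -> str:
    """Write a raw log under a unique, collision-free name and return
    that name, to be used as the reference column in the output table."""
    digest = hashlib.sha256(raw_log.encode("utf-8")).hexdigest()[:12]
    name = f"job_{job_id}_{digest}.log"
    (out_dir / name).write_text(raw_log, encoding="utf-8")
    return name


out = pathlib.Path("logs_out")
out.mkdir(exist_ok=True)

# The output table stays narrow: one row per job, one reference column.
with open("jobs.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["job_id", "log_file"])
    writer.writerow(["42", store_log(out, "42", "example log contents")])
```

This keeps the table clean, but as noted above, the log contents then live outside the database, so any later parsing has to happen in Python rather than in SNFLK.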
How would you approach it?