Sun Aug 25, 2024 9:58pm PST
Show HN: What's HN Working On – A structured dataset
The latest "Ask HN: What are you working on" thread[1] just dropped. And to give my own answer: building structured datasets!

I wrote a quick scraper for the HN comments. It pulls every top-level comment along with its replies as a nested object.
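A rough sketch of what that nested shape might look like. The post doesn't say how the scraper fetches comments, so this assumes the public Algolia HN API (https://hn.algolia.com/api/v1/items/<id>), which returns an item with its replies under `children`. The names `fetch_item`, `to_nested` and `top_level_comments` are made up for illustration.

```python
import json
from urllib.request import urlopen

# Assumption: the Algolia HN API, not necessarily what the original scraper used.
API_URL = "https://hn.algolia.com/api/v1/items/{item_id}"


def fetch_item(item_id: int) -> dict:
    """Fetch one HN item (with its nested children) as a dict."""
    with urlopen(API_URL.format(item_id=item_id)) as resp:
        return json.load(resp)


def to_nested(item: dict) -> dict:
    """Keep only a few fields, recursing into replies."""
    return {
        "id": item.get("id"),
        "author": item.get("author"),
        "text": item.get("text"),
        "replies": [
            to_nested(child)
            for child in item.get("children", [])
            if child.get("text")  # skip deleted/empty comments
        ],
    }


def top_level_comments(thread: dict) -> list[dict]:
    """One nested object per top-level comment in the thread."""
    return [to_nested(c) for c in thread.get("children", []) if c.get("text")]
```

Something like `top_level_comments(fetch_item(41342017))` would then give one nested object per top-level comment in the thread linked at [1].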

This ended up pulling 642 top-level comments with about 458 replies. I created a Postgres DB with this original data set. The replies I just concatenated together in the order they came in (with an indent field to mark what level comment it was), then stringified the JSON array and added it to the DB.
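The flatten-and-stringify step could look roughly like this. It's a minimal sketch that assumes each comment is a dict with a `replies` list of the same shape, and that direct replies start at indent 1. The field names and function names are hypothetical, not the actual schema.

```python
import json


def flatten_replies(replies: list[dict], indent: int = 1) -> list[dict]:
    """Depth-first walk so replies stay in thread order; indent marks depth."""
    flat = []
    for reply in replies:
        flat.append(
            {
                "author": reply.get("author"),
                "text": reply.get("text"),
                "indent": indent,  # assumption: 1 = direct reply to the top-level comment
            }
        )
        flat.extend(flatten_replies(reply.get("replies", []), indent + 1))
    return flat


def replies_column(comment: dict) -> str:
    """Stringify the flattened replies so they fit in a single text column."""
    return json.dumps(flatten_replies(comment.get("replies", [])))
```

The string from `replies_column` is what would go into the row next to the top-level comment text.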

I generated the structured data using my own tool, of course (OmniAI[2]), and pulled out the following values:

- project_category - Enum - PERSONAL_PROJECT, STARTUP, SELF_IMPROVEMENT, OTHER
- is_open_source - Boolean
- github_link - String
- project_industry - Enum - SOFTWARE_DEVELOPMENT, HEALTHCARE, EDUCATION, TRANSPORTATION, etc.
- one_liner - String - A one line pitch for the project
- tech_stack - String[]
- reply_sentiment - Num - Sentiment between 0 and 2 for the comment replies
- demo_link - String
- ai_project - Boolean
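For anyone consuming the dataset, the fields above map fairly directly onto a typed record. This is only a sketch of one way to model them in Python, not how OmniAI represents its schema. The class names are made up, which fields are optional is a guess, and `project_industry` is kept as a plain string because the enum list above ends in "etc."

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ProjectCategory(Enum):
    PERSONAL_PROJECT = "PERSONAL_PROJECT"
    STARTUP = "STARTUP"
    SELF_IMPROVEMENT = "SELF_IMPROVEMENT"
    OTHER = "OTHER"


@dataclass
class ProjectRecord:
    project_category: ProjectCategory
    is_open_source: bool
    project_industry: str  # full enum not listed in the post, so left as a string
    one_liner: str
    ai_project: bool
    tech_stack: list[str] = field(default_factory=list)
    # Assumption: links and sentiment may be missing for some comments.
    reply_sentiment: Optional[float] = None
    github_link: Optional[str] = None
    demo_link: Optional[str] = None

    def __post_init__(self):
        # Accept either the enum member or its string name; unknown values raise ValueError.
        self.project_category = ProjectCategory(self.project_category)
        if self.reply_sentiment is not None and not 0 <= self.reply_sentiment <= 2:
            raise ValueError("reply_sentiment must be between 0 and 2")
```

Validating at construction time means a bad enum value or an out-of-range sentiment fails loudly when a row is loaded rather than later in analysis.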

[1] https://news.ycombinator.com/item?id=41342017
[2] https://getomni.ai/
