Hey everyone,
I'm a long-time lurker on this website, amazed by all the information and knowledge here. It's been really helpful in my personal sourdough, croissant and home-milling journey and I want to say thanks! Also, I'm a data scientist professionally, and recently, I completed a fun project based on the recipes on TFL :) Check it out on my blog - pratima.io
More details: I scraped some recipes from this website (thanks again to all the posters!) to train a neural network that could generate new bread formulas. It wasn't perfect, but one of its suggestions was a sourdough with cloves, yogurt, flax meal and spelt flour, so I baked exactly that. Apart from some flax meal clumps, it was actually super delicious, and my husband and I really enjoyed eating it. Here are some pictures of the loaves I baked.
Hope you enjoyed learning about this random but fun bread-related thing today :)
Wow, that is really cool. I am busy at the moment but I will read through your posts as soon as I can.
Thanks so much... really happy that TFL exists, I've learned so much here. And hope you enjoy the posts!
Ok, I've read them all now. What you did is fascinating.
For sure, using structured input would make things much simpler. Most recipe sites enforce that by splitting out the different sections of each recipe into distinct fields: intro, ingredients, instructions, time, quantity, etc. Or, yeah, even in a free-form blog, a single author will tend to follow a standard structure and set of conventions. That's not really the case here, though even in your small data set you were able to identify underlying topics and patterns.
There is a structured recipe content type here, actually, but, as you noticed, most recipes are in blog posts or some of the forum sections. That was an approach I modelled early on here, and it seems to be the one most community members have preferred. My motivation was that I didn't want a single post with the French Bread recipe -- 65% AP flour, 2% salt, 2% yeast or whatever -- to discourage longer conversations about all the nuances. Even if we're all baking from the same formula, there is a lot to talk about! As your word cloud picked up, words around methods like "stretch and fold" and time come up frequently, which makes sense given that our set of ingredients is actually pretty small compared to most recipe sites and our discussions tend to revolve around time, temperature, and technique.
A few years back I looked into some of the semantic web / RDF type stuff to try to give posts more structure that computers would pick up even if it wasn't directly visible to end users, but I didn't follow through. It also kinda seems like the buzz around the semantic web has cooled off, hasn't it? Maybe not, maybe I'm just not talking to the right people any more. Regardless, it is fascinating to me that machine learning tools have gotten mature enough that even on a small data set running on a laptop you can pick up patterns as well as you did.
Thank you so much for sharing your posts! And the code. I've starred your repos. I'd love to try running some of it locally when I have some time, maybe even come up with a new recipe of my own. ;^)
Yes definitely, one of the issues was that with such a small dataset, the differences in authors' styles affect the neural net results. But even so, I've seen a recipe predictor built on standardized data that knows the format and structure but doesn't know the difference between 1 tsp salt and 1 kg salt. It just seems like a hard problem to solve in general, and it requires the algorithm to know what each ingredient actually does. So maybe a combination of formulas and discussions about nuances of techniques is the way to go? With some tweaking, of course.
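For what it's worth, the 1 tsp vs. 1 kg salt problem is the kind of thing a simple post-check on generated formulas could catch, even if the model itself has no idea what salt does. Below is a minimal sketch in Python, not anything from the blog or the repos. The function names, the rough 6 g per teaspoon conversion, and the 1-3% salt range are all assumptions made up for illustration.

```python
# Hypothetical sanity check for a generated bread formula.
# Everything here is an illustrative assumption, not the blog's code:
# the unit table, the ~6 g per tsp of salt, and the 1-3% salt range.

GRAMS_PER_UNIT = {"g": 1.0, "kg": 1000.0, "tsp": 6.0}  # tsp of salt: rough guess


def to_grams(amount, unit):
    """Convert an amount in a known unit to grams."""
    return amount * GRAMS_PER_UNIT[unit]


def bakers_percent(ingredient_grams, flour_grams):
    """Ingredient weight as a percentage of total flour weight."""
    return 100.0 * ingredient_grams / flour_grams


def salt_is_plausible(salt_amount, salt_unit, flour_grams, low=1.0, high=3.0):
    """True if salt falls inside the assumed plausible range (in % of flour)."""
    pct = bakers_percent(to_grams(salt_amount, salt_unit), flour_grams)
    return low <= pct <= high


if __name__ == "__main__":
    # 500 g of flour with 1 tsp vs. 1 kg of salt
    print(salt_is_plausible(1, "tsp", 500))
    print(salt_is_plausible(1, "kg", 500))
```

A check like this only filters out absurd quantities after generation. It doesn't teach the model what an ingredient does, which is the harder problem.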
Hmm I don't know much about semantic web/RDF but now seems like the time to learn. Thanks for telling me about it!
I really enjoyed learning about the patterns in the data, especially because, as a human, I don't always pick up on them from reading a blog post or two. But it's good to know that what I think about and consider in the kitchen is what others on TFL think about as well :)
And you probably have access to more data than me, so I'm very interested to see what happens when you run the code locally. Do keep me posted, I'd love to know!