who are you?
we are Brooke Husic and Enrique Henestroza Anguiano. we have been constructing crossword puzzles since 2020 and 2019, respectively.
why did you make this wordlist?
through working with several new constructors, Brooke observed that free access to a relatively clean wordlist was a major bottleneck for her mentees. as a self-identified language data nerd, Enrique felt that a free resource like this should exist and realized the data was there to make it happen.
can i share your wordlist elsewhere?
yes. our license info is here.
can i modify your wordlist and then share it elsewhere?
yes, under some conditions. our license info is here, which explains that you need to attribute us, use the same license (if applicable), and can’t sell it. if you make modifications, we suggest keeping your edits in a separate file, so you can download our updates without losing your personal edits.
can i sell a crossword puzzle i made with your wordlist?
yes! the “noncommercial” part of our license doesn’t apply to puzzles you make with the wordlist. we would love to hear about any crosswords you publish or sell that were made with our wordlist and link to them in our gallery.
why is everything lowercase and whyaretherenospacesbetweenwords?
many of our sources do not include capitalization or spacing, so we would have to fill these in ourselves, which is not a task we’ve prioritized.
why did you choose a scoring system with only a few levels?
to some extent our choice was informed by current practice: we saw that many people set a minimum score of 40 or 50 when they construct, so we did not want our wordlist to be useful only with a minimum score lower than 40.
why are a bunch of proper nouns scored pretty high? i thought you weren’t supposed to cross proper nouns!
because they’re used in crosswords and are part of our world! our algorithms don’t have word meaning information, just frequency and source. you’re still going to need to do some manual work to ensure that your grid’s crosses are fair.
so then why aren’t the expanded crossword names database and similar resources included?
we are enormous fans of Erica’s expanded crossword names database and similar resources, and we strongly support constructors using multiple wordlists. because the expanded crossword names database wordlist is exclusively names, we didn’t want to dictate how it should be scored within our wordlist: maybe you want to score the names all very high and do a lot of manual work to ensure fair crosses, or maybe you want to score them lower so you aren’t consistently presented with corners full of names.
why are some good answers scored too low and some bad answers scored too high?
this is probably a function of frequency/rarity not aligning with goodness/badness. for example, a “partial” used a couple of times in a pinch might seem too common, while an answer with a lot of “scrabbly” letters might be scored low because it seems too rare (we are always looking for resources outside of crosswords to bump up common words). part of our goal is to generate the wordlist in an automated way so it can be continually regenerated without a lot of manual effort on our end, which would be unsustainable for us. we regularly check subsets of the wordlist to assess whether our algorithms are generally doing the right thing, but relying on automation means we don’t see every single answer.
so should i let you know about stuff that is misscored?
if an answer is egregiously misscored, please do let us know, especially if you see something offensive or harmful that shouldn’t be there. if you find a wonky plural or some crosswordese scored too high, we send our apologies but don’t want to add too much manual overhead to our algorithm.
but in that case i can make the changes i want to my own version, right?
of course! please make whatever changes you want. we recommend keeping a separate list of your personal edits and preferences and prioritizing it higher than our list in your construction software. this is so that when you download updated versions from us, you won’t lose your edits.
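as a rough illustration of what “prioritizing” a personal list means, here is a minimal python sketch. it assumes each wordlist line looks like `answer;score`, which is a common convention but is not stated in this faq, and the function names `parse_wordlist` and `effective_scores` are hypothetical. your construction software will have its own way of ordering lists, so treat this as a picture of the idea rather than a recipe.

```python
def parse_wordlist(text):
    """Parse lines of the assumed form 'answer;score' into a dict."""
    scores = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        answer, _, score = line.partition(";")
        scores[answer.lower()] = int(score)
    return scores


def effective_scores(personal, base):
    """Combine two lists; a personal score wins whenever both lists have the answer."""
    merged = dict(base)      # start from the downloaded list
    merged.update(personal)  # personal edits override it
    return merged


base = parse_wordlist("apple;50\nbanana;40\n")
personal = parse_wordlist("banana;60\ncherry;50\n")
print(effective_scores(personal, base))
```

because the personal edits live in their own file, replacing the downloaded list with a newer version leaves them untouched.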
why are some answers scored zero? why not just remove them?
we get answers from two different types of datasets: crossword puzzles and general language resources. the last stage of our algorithm screens our wordlist for answers that are in our “blocklist”. during screening, there are two cases: (1) if the screen finds a blocklist answer that comes from a crossword dataset, that answer’s score is set to zero; and (2) if the screen finds a blocklist answer that comes from a general language dataset, that answer is removed. the reason for policy (1) is so those answers will not show up when filling if they happen to be contained in lower-priority wordlists you are using. the reason for policy (2) is so we do not include unprecedented harmful answers in our list.
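the two policies can be sketched in a few lines of python. this is only an illustration under stated assumptions, not the actual pipeline: the function name `screen`, the `(score, source)` layout, and the labels `"crossword"` and `"general"` are all hypothetical, and the sketch assumes each answer comes from exactly one type of dataset.

```python
def screen(entries, blocklist):
    """Apply the two blocklist policies.

    entries maps answer -> (score, source), where source is the assumed
    label "crossword" or "general". Returns a dict of answer -> score.
    """
    screened = {}
    for answer, (score, source) in entries.items():
        if answer in blocklist:
            if source == "crossword":
                # policy (1): keep the answer but set its score to zero
                screened[answer] = 0
            # policy (2): a blocked answer from a general source is dropped
            continue
        screened[answer] = score
    return screened


entries = {
    "goodanswer": (50, "crossword"),
    "blockedfrompuzzle": (40, "crossword"),
    "blockedfromcorpus": (50, "general"),
}
blocklist = {"blockedfrompuzzle", "blockedfromcorpus"}
print(screen(entries, blocklist))
```

keeping the zero-scored entry is what lets a higher-priority list suppress the same answer if it appears in a lower-priority list.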
how do you decide what’s on the blocklist?
we’ve aggregated lists of slurs and derogatory terms, and of course these are blocked. we’ve also made the decision to block unpleasant words, broadly construed. because we are targeting newer constructors, we do not want to propagate the precedent that even somewhat harmful words are okay before a constructor has developed their personal policy. of course, you are welcome to use your personal edits to add back any words we’ve zeroed. to conclude, we can’t say it better than Paolo Pasco:
“[N]o bummers” hasn’t done me wrong — it never hurts to be conscious about what words you put in your puzzle, and to put in the work to avoid unnecessarily hurting anyone.
can we give you money?
we are accepting donations to fund our website fees! please email us at spreadthewordlist at gmail dot com if you are interested in donating.