Databases! Apparently they are useful. In a quest to better understand this gap in my knowledge, Rahul, Yu, Joshua, and I sat down.
There are so many tools out there. What you invest in learning is important because you don’t know what will still be around, especially the web-based ones.
SQL as a way of querying a relational database – pain to learn, as it’s a programming language. Things like ScraperWiki help make that easier.
Can deal with your data in much more subtle ways. Worth it for the expertise and/or if you need to slice and dice to find stories in data.
SQLite is basically a database file on your computer. Can speak SQL to it. It’s behind your mail, your interactive nametags, etc. But if you want to share the data, you have to mail the file to someone. Doesn’t merge changes – have to treat it like a file. But for a one-off-research thing, it’s great. 10s of thousands of rows. Great for Python scripts, there are some UI tools (Lita – Adobe Air app). Research one-off
ScraperWiki can mod a file into an SQL file.
MySQL and PostGreSQL allow for security measures. Web apps. Runs on a server – you have to send queries to it, it sends things back. There are shortcuts to making this happen.
Xampp installs PHP, PHPmyadmin, Apache.
PHPmyadmin lets you talk to MySQL in a nice web-based UI.
Every row has a number. Numbers are unique per table. This is how they reference each other. Never put information in two places (unless you really need to).
SQL To learn the language, Scraperwiki.com really helps with playing around.
CartoDB is great for mapping things.
Import.io sets you up for easy scraping. So if you wanted to scrape moma.org, you would teach it what you wanted, and then tweak what it came up with.
Open Refine as difficult to learn but gives a table of charts as a correlation that plots column to column – helps you see if there’s a correlation between variables. Can group things for you based on different spellings because Google.
NaviCat does data transfer from one database to another.
Non Relational Databases
Don’t know what the structure will be yet. No standard way to query it. Write code to query, there are libraries and examples to help. Need a quick way to store data and fetch it.
When pulling in information, it’s nice when it’s already structured (like when Civic pulls from the Globe, they’re including word count, author, date, etc – some scripts were written for it, but…
CouchDB defines what is present, then pull and push. Difficulty is in querying it – there is no standard for that.
MongoDB updates and syncs well.
Firebase lets you play with these structures easily.
Tableau does some pretty awesome things. If you just want a visualization of the data in a spreadsheet.
Statwing also does the statistical analysis of the data you input.