Null Editors and OSN Import 2

 
For the last year, I've been working on writing import code for OSNews.com. It's been a lengthy process because I only work on it when I can spare the time. Some weeks there's furious development, others none at all.

Some months ago I had the first complete version of the import code working. It was a completely custom module, relying only on core APIs to get the job done. I knew that the Migrate module existed and provided a lauded framework for importing content into Drupal. I dismissed it because I simply didn't want to fight Migrate at the same time I was trying to figure out how to import content into Drupal in the first place. That hunch proved to be correct. Importing content into Drupal from a custom database is a rather specialized task. Drupal 7 isn't exactly designed with a separation of GUI and backend. It's better than in previous versions, but there are numerous gotchas that result in much gnashing of teeth. I learned a lot in the process, leveling up my Drupal skills.

I would have left it at that if I hadn't started using Migrate for deninet. I've already written about migrating deninet from Drupal 6 to 7 and the use of the migrate_d2d module. Originally, I started writing the migration on a lark, but the D2D framework turned out to be so easy to use that I had a complete solution within a few days. This got me thinking, "Should I rewrite the OSNews migration module?"

There are a few very good reasons to do this. The Migrate module provides import rollback for free. This makes it much easier to test imports without a complete re-initialization of the database. Migrate also provides a command line interface through the Drush shell script. This results in a huge speed increase in imports, since an ETL (Extract-Transform-Load) task can run from the command line instead of going through the web interface. Lastly, a rewrite makes it easy to conduct a complete review of the import code.
When writing the first version of the OSN import code, my primary concern was to get it working. While I did spend a lot of time ensuring the accuracy of the result, there were several undocumented assumptions, one of which I've been fighting for the last few days.

At present, OSNews runs a custom content manager with its own database schema. This is generally referred to as OSN4. There's no RBAC (Role Based Access Control) on the website; thus, regular users are completely separate entities from "editors" who have the right to post articles to the site. For the most part, every editor has a corresponding user account. Given that Drupal has RBAC, it makes sense to just assume that the "source of truth" is the users table in the OSN database. When we import a user, we only need to check if they're an editor and then assign the appropriate role.

Notice I said "for the most part"; three editors have no corresponding user account. You can see this in the OSN editors table, where the uid column for those rows is NULL. This creates a huge problem for migrating content. Every migration has an association table in which the old row ID -- user ID in this case -- is linked to the new row ID. If the users table is considered the source of truth, all users get imported without a problem. All editors are imported too, with the exception of those three "null editors".

Why not just tack on those three null editors in the user migration? The problem is that this creates ambiguity in the association table. The user association table has an OSN4 user ID column and a Drupal 7 user ID column. The three null editors have no OSN4 user ID, only a unique editor ID. Editor IDs and user IDs are numbered independently, so user 20 could be a different person than editor 20. This means we can't just tack on the null editors in the user migration. In the first version of my import code, I just assumed that this wasn't important and assigned content created by the null editors to the Drupal anonymous user.
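
To make the ambiguity concrete, here is a minimal Python sketch. It is not Drupal or Migrate code; it only models an association table as a mapping from old ID to new ID. All names and ID values are invented for illustration, and `eid` is a hypothetical name for the editor ID column.

```python
# Hypothetical OSN4 rows; every ID and name here is made up for illustration.
users = [
    {"uid": 20, "name": "alice"},
    {"uid": 21, "name": "bob"},
]
editors = [
    {"eid": 7, "uid": 21, "name": "bob"},       # editor with a user account
    {"eid": 20, "uid": None, "name": "carol"},  # "null editor": no user account
]


def build_user_map(users, first_new_uid=100):
    """Model of the user association table: OSN4 user ID -> Drupal user ID."""
    return {u["uid"]: first_new_uid + i for i, u in enumerate(users)}


def tack_on_null_editors(user_map, editors):
    """Naively add null editors to the same table, keyed by editor ID.

    Returns the editor IDs that collide with an existing OSN4 user ID.
    """
    collisions = []
    next_uid = max(user_map.values()) + 1
    for editor in editors:
        if editor["uid"] is not None:
            continue  # already covered by the user import
        if editor["eid"] in user_map:
            collisions.append(editor["eid"])  # same key, different person
        else:
            user_map[editor["eid"]] = next_uid
            next_uid += 1
    return collisions


user_map = build_user_map(users)
collisions = tack_on_null_editors(user_map, editors)
print(collisions)
```

Here editor 20 (carol) collides with user 20 (alice), so a single table keyed by the old ID cannot hold both, which is why the first version fell back to the anonymous user for the null editors' content.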
I assumed I could fix the null editors' content later by finding each news item they posted and assigning it to a new user account I manually created for them after the import. With the rewrite, however, I wanted to fix this once and for all.

The decision, then, is which table is the source of truth -- the editors table or the users table.

If I chose the editors table, I would need to write a separate migration for editors. This in and of itself isn't a problem. Query the editors table, left join with users, create the account. Done. An OSN4 editor ID to Drupal user ID association table would be automatically created. Sadly, that's the easy part. When writing the user migration, I would need to find a way to exclude any row in the users table that is an editor. While writing the query isn't that complicated, translating that into Drupal APIs (the Migrate module does not accept raw queries) proved difficult. Even if I did succeed there, writing the comment import became a mess. No doubt editors posted comments, and comments in OSN4 use user IDs, not editor IDs. This means that a special case would need to be created in the comment import to handle editor comments so as to pull the right ID out of the editor association table, not the expected user association table. The Migrate module does *not* make this easy.

The other way is to assume that the users table is the source of truth. This is true in all cases except for those three null editors. Importing comments relies on the users association table, as editors without a user account (hopefully) cannot post comments. Importing news articles into Drupal nodes becomes a little more complicated, as I would need to left join each row in the news table with the editors table to get the old user ID. This is pretty easy, thankfully. When it comes to those three null editors, I would need to intercept the case where the OSN4 user ID comes back as NULL, and then ask the editor association table to give me their Drupal 7 user ID.
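
The second approach boils down to one left join and a two-step lookup. The Python sketch below models it with an in-memory SQLite database. It is not Migrate code, and apart from the editors' `uid` column, the table layout, column names (`eid`, `nid`), and all ID values are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical, heavily simplified OSN4 tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE editors (eid INTEGER PRIMARY KEY, uid INTEGER, name TEXT);
    CREATE TABLE news (nid INTEGER PRIMARY KEY, eid INTEGER, title TEXT);
    INSERT INTO editors VALUES (7, 21, 'bob'), (20, NULL, 'carol');
    INSERT INTO news VALUES (1, 7, 'Story by bob'), (2, 20, 'Story by carol');
""")

# Association tables modeled as dicts: old ID -> Drupal 7 user ID.
user_map = {20: 100, 21: 101}  # OSN4 user ID -> Drupal uid (user import)
null_editor_map = {20: 500}    # OSN4 editor ID -> Drupal uid (null editor import)


def resolve_author(old_uid, eid):
    """Pick the Drupal uid for a news item's author."""
    if old_uid is not None:
        return user_map[old_uid]  # the normal case: the editor has a user account
    return null_editor_map[eid]   # intercepted case: a null editor


# Left join each news row with the editors table to get the old user ID.
rows = conn.execute("""
    SELECT n.nid, n.eid, e.uid
    FROM news AS n
    LEFT JOIN editors AS e ON e.eid = n.eid
    ORDER BY n.nid
""").fetchall()

authors = {nid: resolve_author(uid, eid) for nid, eid, uid in rows}
print(authors)
```

Note that key 20 appears in both mappings and points to different Drupal accounts. Keeping the two lookups in separate tables is what removes the ambiguity, and it means the null editors need an association table of their own.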
Handling the null editors requires a weird special-case migration -- OSNewsNullEditorMigration. This migration only handles those three pesky editors. Writing that migration and the necessary interception code for nodes, however, is a straightforward task.

Over the course of a hair-pulling three days, I wrote the code with the users table as the source of truth. The user migration now assigns the editor role where appropriate, and the null editor migration is written and working. The node migration has yet to be modified, but it requires other modifications anyway in order to handle OSN4 "kinds" -- similar to Drupal content types.