Migrating path aliases into Drupal 8 redirects: Part 1

 

Drupal 8's new migration system offers a great deal of power and flexibility for importing your content. Indeed, a site migration gives you a unique opportunity to radically rethink your site's content and how it's organized. Often, this means some content types are removed, others created, and some content is transformed to different types. This often results in the canonical URL for a piece of content also being radically changed to fit the new organization. This is great, but it also creates a problem:

Broken links, useless bookmarks, bad search results, and a loss of "SEO juice".

While you could wait weeks or even years for search engines to re-index your site, you'll never be free of nagging 404s if your URL structure radically changed.

Unless...you migrate those too.

In this post, we'll set up a new Drupal 7 to Drupal 8 migration to import content path aliases as redirects. This way, visitors will be able to reach content, and search engines will be able to update their links accordingly.

Drupal doesn't provide "pretty" URLs out of the box. For most content, you get a utilitarian /node/<nodeID>. For a better user experience as well as improved search engine optimization (SEO), most Drupal sites use the Pathauto module. Pathauto allows you to define a URL pattern by which a piece of content can be accessed.

URL patterns used by Pathauto can be simple, or very complicated depending on your need. For my site, a typical pattern was:

https://deninet.com/blog/<nodeID>/<nodeTitleURLNormalized>

This worked at the time, but it also wasn't the best decision I could have made. By nature, the node ID is unique to each instance of a site. While you can migrate content with the same node ID, it's highly recommended to generate a new node ID for better flexibility when migrating. Instead, I should have defined a URL pattern that was less tied to Drupal, and more to the content itself:

https://deninet.com/blog/<year>/<month>/<day>/<nodeTitleURLNormalized>

This pattern doesn't include the node ID or any Drupal-specific information. Instead, it has the section of content to which it belongs (blogs), the date it was posted, and then the post title. If I were to migrate the site again, it would be far easier for me to regenerate the same URL using Pathauto.

Despite my earlier bad decision, this is actually an advantage for us writing migrations. The resulting URLs between the old and new patterns do not overlap, meaning we have a unique source URL and target URL to migrate and we won't have to worry about collisions.

Now that we have decided on our new URL patterns, we need to be sure that Drupal is capable of generating custom URLs. As in earlier versions of Drupal, we use the Pathauto module to generate new path aliases from a customizable pattern.
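
For reference, the date-based pattern could be expressed with Drupal's token syntax roughly as follows. This is only a sketch of the pattern key from a Pathauto pattern's exported configuration. The rest of that file is omitted, and the exact tokens you need may differ for your content types:

```yaml
# Sketch only: the pattern key of a Pathauto pattern for blog nodes.
# [node:created:custom:<format>] formats the node's creation date using
# a PHP date format character (Y = year, m = month, d = day).
pattern: 'blog/[node:created:custom:Y]/[node:created:custom:m]/[node:created:custom:d]/[node:title]'
```

Pathauto cleans up the title portion when it generates the alias, which is what produces the URL-normalized title at the end of the path.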

We also need another module to handle the changeover between our old and new URL scheme. The Redirect module allows us to specify a source path, and a target path to which to redirect users. It differs from Pathauto in a critical way. Pathauto creates URL aliases to existing content. In many ways, it's like defining a new canonical URL for a piece of content. Redirect doesn't define the canonical URL, but redirects you to that URL given a source path.

Search engines are designed to look for redirects. A redirect can be defined as either permanent (301) or temporary (302), and search engines treat the two differently. The Redirect module allows you to specify which kind of redirect to use for each source/target pair. In our case, the content is permanently moving homes to a new canonical URL, so all of our redirects will be permanent.

To start, we just need to download and install Pathauto and Redirect. This is easily accomplished using Composer and Drush:

$ cd /path/to/drupal/root
$ composer require drupal/pathauto --prefer-dist
$ composer require drupal/redirect --prefer-dist
$ drush en -y pathauto redirect

Before we can migrate path aliases into redirects, we need to set up the migration system and migrate our content. If you haven't set up a custom Drupal migration yet, you might want to stop here and read my migration series first. That will guide you on how to set up a Drupal to Drupal migration using the Migrate Tools and Migrate Plus modules.

Once you do have the migration system set up, we can create a new migration group. While in my migration series I suggested creating one group for all your migrations, using more than one has a distinct advantage. Creating one migration group for every source content type makes it much easier to run multiple related migrations on the command line. This makes it easy to run a content type's migration, the paragraphs for that type, as well as multilingual translations.

The easiest way to create a new migration group is to use the web UI:

  1. Log in to your site on your local development environment.
  2. Navigate to Admin > Structure > Migrations.
  3. Click Add migration group.
  4. Add a Label, and configure the Machine Name as necessary.
  5. Complete the rest of the form as you see fit. When finished, click Save.

With that finished, all that's needed is to export the group to our configuration directory. This can be easily done with drush:

$ cd /path/to/my/site
$ drush cex -y

This will export the migration group to the configuration directory with the following file name:

migrate_plus.migration_group.migrationGroupID.yml

Where:

  • migrationGroupID is the machine name of the migration group you entered earlier.

The downside to using multiple migration groups is that you need to associate the database connection information manually for each new group. This only takes a minute, but it's an easy-to-forget step and can really confuse you later when you try to run the migration.

Like in my migration series, I have a secondary database configured in settings.php identified by the key legacy. In order for our migration group to use the legacy database, we need to add a shared_configuration key, specifying the source database key:

langcode: en
status: true
dependencies: {  }
id: deninet_urls
label: 'deninet urls'
description: 'URL aliases for deninet'
source_type: 'Drupal 7'
module: null
shared_configuration:
  source:
    key: legacy

Yours will be slightly different depending on the id, description, and label you entered for your migration group. After adding the shared_configuration section, we can save the file and re-import the configuration using drush:

$ drush cim -y

With that done, we can create the migration itself. Unlike the group, there's no web UI for creating the migration. Instead, we need to find a migration template and copy that into our site's configuration directory.

Where do we find the template? Typically, we need to look for the module that is providing the source data for the migration. Inside that module's directory we will find a migrations/ (or migration_templates/) subdirectory that contains one or more templates we can use. Since we're migrating Drupal 7 URL aliases into Drupal 8 redirects, you'd think we would need to look to the Pathauto module, but that's not quite correct.

While we normally associate the Pathauto module with URL aliases, it doesn't actually provide URL aliases itself. It provides alias generation based on patterns as well as bulk updating. The URL aliasing mechanism itself comes from the Path module in Drupal core.

With this knowledge in mind, we can look in the Path module's directory, located under core/modules/path/. There, we find a migrations/ directory with two templates. Since we're migrating from a Drupal 7 to a Drupal 8 site, we copy the d7_url_alias.yml file to our site's configuration directory, and rename it to suit the pattern Migrate Plus expects:

migrate_plus.migration.migrationID.yml

Where:

  • migrationID is the unique machine name for your migration. 

When we open the copied, renamed file, we'll see something like this:

id: d7_url_alias
label: URL aliases
migration_tags:
  - Drupal 7
  - Content
source:
  plugin: d7_url_alias
  constants:
    slash: '/'
process:
  source:
    plugin: concat
    source:
      - constants/slash
      - source
  alias:
    plugin: concat
    source:
      - constants/slash
      - alias
  langcode: language
  node_translation:
    -
      plugin: explode
      source: source
      delimiter: /
    -
      # If the source path has no slashes return a dummy default value.
      plugin: extract
      default: 'INVALID_NID'
      index:
        - 1
    -
      plugin: migration_lookup
      migration: d7_node_translation
destination:
  plugin: url_alias

Next we update the id, label, and migration_tags sections to match our custom migrations. Then, we add the migration_group item to associate this migration with the migration group we created earlier. When finished, the initial part of the file should look something like this:

langcode: en
status: true
dependencies: {  }
id: deninet_node_redirects
migration_tags:
  - 'Drupal 7'
  - deninet
  - content
migration_group: deninet_urls
label: 'deninet node redirects'
...

With the basic migration file created, we can run drush cim to import the new configuration. Then, we can confirm everything is working by listing our migrations with drush ms (migrate-status). This will list the status of all of our migrations, but the part we're looking for is our new migration group:

 Group: deninet urls (deninet_urls)  Status  Total  Imported  Unprocessed  Last imported
 deninet_node_redirects              Idle    3607   0         3607

From the above we can tell several things. The group works, since it appears in the list. Our custom migration is also registered and detects items to import. If the group appears but is empty, we may have forgotten to change the id of our migration or to add the migration_group item. If the group itself doesn't appear, check the shared_configuration section of the migration group file.

Once we have the migration appearing in the output of drush ms, we can start writing the migration itself. First we look at the source section of our migration file:

...
source:
  plugin: d7_url_alias
  constants:
    slash: '/'
...

While there's not much in there, there are a few key pieces of information. We know the source plugin is d7_url_alias. There's also a string constant defined, slash, which provides a single forward-slash character (/).

So that seems fine, what about the destination section?

destination:
  plugin: url_alias

Ummmmmmm, that's not right. When we copied the d7_url_alias.yml file, the template assumed that we're migrating Drupal 7 URL aliases into Drupal 8 URL aliases. That's not what we want to do. Instead, we want to transform the URL aliases into Drupal 8 redirects.

Those redirects are provided by the Redirect module. Fortunately for us, the Redirect module also provides us a migration template. Open the Redirect module's directory (usually modules/contrib/redirect/) and open the migrations/ subdirectory. As you'd guess, there are templates for migrating from Drupal 6 or Drupal 7. Again, we open the Drupal 7 template, d7_path_redirects.yml:

id: d7_path_redirect
label: Path Redirect
migration_tags:
  - Drupal 7
source:
  plugin: d7_path_redirect
process:
  rid: rid
  uid: uid
  redirect_source/path: source
  redirect_source/query:
    plugin: d7_redirect_source_query
    source: source_options
  redirect_redirect/uri:
    plugin: d7_path_redirect
    source:
      - redirect
      - redirect_options
  language:
    plugin: default_value
    source: language
    default_value: und
  status_code: status_code
destination:
  plugin: entity:redirect

Well, the destination section is simple enough. Instead of plugin: url_alias, it's plugin: entity:redirect. We copy that into our migration file, migrate_plus.migration.migrationID.yml, and run drush cim to re-import it.
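
After that change, the end of our migration file should read something like this (a minimal sketch; everything above the destination section stays as it was):

```yaml
# Create Redirect entities instead of core URL aliases.
destination:
  plugin: entity:redirect
```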

Keep the d7_path_redirects.yml file open, though, we're not done with it yet.

There's one more section of our migration file for us to work on, the process section. This section provides a list of target fields, and maps them to source fields through one or more transformation steps. In a typical migration, the source and target fields usually have a one-to-one correspondence. We would only need to specify the field names:

target_field_name: source_field_name
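
As a rough illustration of what "one or more transformation steps" means, the shorthand above is equivalent to naming core's get process plugin explicitly, and a target field can also be fed through a chain of plugins, much like node_translation in the template we copied. The field names below are placeholders, not fields from our migration:

```yaml
process:
  # Shorthand: copy the source field to the target unchanged.
  target_field_name: source_field_name
  # The same kind of copy, written out with the get plugin.
  another_target_field:
    plugin: get
    source: another_source_field
  # A chain: each step receives the output of the step before it.
  chained_target_field:
    -
      # Split a path such as node/123 into its parts.
      plugin: explode
      source: some_source_path
      delimiter: /
    -
      # Keep only the second part (index 1).
      plugin: extract
      index:
        - 1
```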

Our migration, however, is different. We're not just migrating data to and from the same type. Instead, we are transforming the data to a completely different type in the migration. This makes things a bit of a challenge for us. 

Our migration was copied from the Path module's template. From it, we can see the d7_url_alias plugin used in the source provides us the following fields:

  • source, or the internal URL for the path. For nodes, this is node/nodeID.
  • alias, or the URL alias at which to provide the content.
  • langcode and node_translation both relate to multilingual redirects. For this post, we can ignore those fields.

When we look at the d7_path_redirects.yml migration template, however, we see the entity:redirect plugin expects a completely different set of fields:

  • rid
  • uid
  • redirect_source/path
  • redirect_source/query
  • redirect_redirect/uri
  • language
  • status_code

UGH! What now!? Well, we figure out what we need to transform the data.

We know from our migration that our destination is a redirect entity. It's also pretty likely that redirects are content entities, so they will be stored in one or more database tables. Migrations tend to be very database reliant even if the source or destination are entities. If we can find the primary table for the redirect entity, we can learn a lot about what we're expected to migrate.

Luckily, it's not very hard to find: the redirect table lays it all out very clearly. You can browse the structure of the table using phpMyAdmin, a native client like Sequel Pro, or even just a DESCRIBE redirect SQL command. Unfortunately, the structure alone isn't very instructive.

What we need is an example. So, we create a redirect under Admin > Config > Search > Redirect, and then look at the resulting row in the database:

MariaDB [deninet_drupal8]> select uid, language, redirect_source__path, redirect_source__query, redirect_redirect__uri, status_code from redirect limit 1;
+------+----------+---------------------------------------------------------------------+------------------------+------------------------+-------------+
| uid  | language | redirect_source__path                                               | redirect_source__query | redirect_redirect__uri | status_code |
+------+----------+---------------------------------------------------------------------+------------------------+------------------------+-------------+
|    2 | und      | blog/1615/building-custom-migration-drupal-8-part-1-getting-started | N;                     | entity:node/1048       |         301 |
+------+----------+---------------------------------------------------------------------+------------------------+------------------------+-------------+
1 row in set (0.00 sec)

This already gives us some clues:

  • rid is probably the redirect ID.
  • uid is the user who created the redirect.
  • redirect_source/path is the incoming URL to redirect from.
  • redirect_redirect/uri is the target URL to redirect to.
  • status_code is the HTTP status code for the redirect.

We don't need the rid, since our migration creates new redirects from whole cloth. While the uid is important, it's not really critical for us, so we can probably set it to a known user ID, such as the Drupal superuser, UID 1. Since our content will always be available at the new URLs, we use a 301 for the status_code, signifying a permanent redirect. Our target site also uses a single language, so we can set language to a static value of und to signify that no language was specified.

With this in mind, we can add the following items to the process section:

process:
  uid:
    plugin: default_value
    default_value: 1
  language:
    plugin: default_value
    source: language
    default_value: und
  status_code:
    plugin: default_value
    default_value: 301

That part was easy because it was all static, default values. Now we need to think about translating the URL alias data into a redirect.

From the database table above, we know the redirect_source/path field contains the incoming URL to redirect from. In our case, the source path is going to be the URL alias in Drupal 7. The d7_url_alias plugin provides this as the alias field. Both fields are simple text, so all we need to do is copy over the value:

redirect_source/path: alias

What about redirect_source/query? In our example above, this is a perplexing N;. What the heck is it? From the name, you would think it's a URL query string, or, the part of the URL after a question mark (?) that contains GET data. We wouldn't need this for our URLs, and we certainly didn't enter one in for our example redirect, so why isn't this column NULL?

Many database columns in Drupal aren't a single value, but contain flattened PHP data produced by the serialize() function. These fields are stored in MySQL as Binary Large Objects, or BLOBs. When you encode a NULL value with serialize(), you get -- you guessed it -- N;.

So really, we don't need to migrate that field at all! Huzzah!
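
Putting the pieces from this part together, the process section might look roughly like the sketch below. It assumes the process entries copied from the Path module's template (source, alias, langcode, and node_translation) have been removed, which is my assumption rather than a step covered above. redirect_source/query is left unmapped on purpose, and redirect_redirect/uri is absent because we haven't worked it out yet:

```yaml
process:
  # Attribute every redirect to the Drupal superuser.
  uid:
    plugin: default_value
    default_value: 1
  # Fall back to "und" when the source row has no language.
  language:
    plugin: default_value
    source: language
    default_value: und
  # All of our redirects are permanent.
  status_code:
    plugin: default_value
    default_value: 301
  # The old Drupal 7 alias becomes the path we redirect from.
  redirect_source/path: alias
```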

In this part, we created a new migration group and migration to import Drupal 7 URL aliases as Drupal 8 redirects. We began the process of mapping out our transformation of the aliases, but there's still one field left to go.

Next time, we'll discuss how to migrate the redirect_redirect/uri field. We'll use a custom source plugin to externalize the original Drupal 7 node ID, we'll map that through our existing node migrations, and finally run the migration.

This post was created with the support of my wonderful supporters on Patreon.

If you like this post, consider becoming a supporter at patreon.com/socketwench.

Thank you!!!