Building a custom migration in Drupal 8, Part 4: Files and Content

 

In the last post, we finally wrote and executed our first migrations. We performed a dependency mapping to determine we needed to first migrate roles, then our users. We created new migrations in *.yml directly by searching our Drupal core directory for useful migration_templates. We're four parts in, and we have yet to migrate any nodes! Argh! Can we just start migrating nodes already!?

File migrations

It's really tempting at this point to jump in and start writing a migration for some of your simpler node types such as Basic Page or Blog post. Often, these simpler types have a hidden dependency: Files. Even if you've avoided the Media module and relied on the File or Image field types alone, files still exist as a separate entity type within Drupal 7. Before we can migrate nodes, we have to migrate the files too.

Drupal 7 actually makes migrating files easier than if you needed to migrate from a Drupal 6 site. File management became a lot more standardized, so much so that we only need one migration template: core/modules/file/migration_templates/d7_file.yml. Like the user and role migrations we created earlier, we can leave most of the template as it is. Start by copying the template into your editor, changing the id and label, and adding the migration_group:

id: yoursite_file
migration_tags:
  - 'Drupal 7'
migration_group: yoursite
label: 'yoursite files'

The File entity in Drupal 7 records who originally uploaded the file in a field named uid. While we don't strictly need this field preserved in Drupal 8, it's certainly something we would like to keep if we can. Like we did for our user migration -- which depended on the role migration -- let's make our file migration depend on our user migration. Specify the dependency by adding it to the bottom of the file on a new line:

migration_dependencies:
  required:
    - yoursite_users

The last required thing we need to do is to tell the migration where to find our Drupal 7 file directory. The template expects us to have this on the same file system as our Drupal 8 site. Under the source section near the top of the template, you'll find a constant named source_base_path. This specifies the full directory path to the Drupal 7 file directory. While this can be any directory on the system hosting the Drupal 8 site, I found it easiest to put it in a subdirectory of my Drupal 8 files directory:

source:
  plugin: d7_file
  constants:
    source_base_path: /var/www/html/sites/default/files/migrate_files/

For reasons I still can't quite figure out, I needed to download the entire Drupal 7 files directory -- and not just the directory's contents -- to the directory I specified in source_base_path:

path/to/yoursite
└── sites
    └── default
        └── files
            └── migrate_files
                └── files
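
Assembled from the snippets above, the edited parts of the file migration might look like the following sketch. It only shows the keys this post changes. The process and destination sections of the d7_file.yml template are assumed to stay exactly as core ships them, and the path is just the example location used above.

```yaml
# Sketch of the file migration; only the keys edited in this post are shown.
id: yoursite_file
migration_tags:
  - 'Drupal 7'
migration_group: yoursite
label: 'yoursite files'
source:
  plugin: d7_file
  constants:
    # Example location only; point this at wherever you copied the
    # Drupal 7 files. Note the trailing slash.
    source_base_path: /var/www/html/sites/default/files/migrate_files/
# The process and destination sections from the template go here, unchanged.
migration_dependencies:
  required:
    - yoursite_users
```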

Once we have that set up, save it to a file in your sync directory named migrate_plus.migration.yoursite_file.yml where yoursite_file is the id of your file migration. Use drush to import the configuration and check the migration status:

$ drush cim -y
...
$ drush ms

Group: YourSite group (yoursite)    Status  Total  Imported  Unprocessed  Last imported
yoursite_role                       Idle    4      0         4            N/A
yoursite_file                       Idle    507    0         507          N/A

If the migrate-status (ms) command shows a set number of files to import, we know that we've set everything up properly. Now we can run the migration with the migrate-import (mi) command:

$ drush mi yoursite_file

The file migration can take quite some time to run, even though the files are local. This is because the migration copies every file from our source_base_path to the expected place in our Drupal 8 file directory. So when running the file migration, be sure you have plenty of disk space! When you've finished the file migration, you can actually delete the directory specified in source_base_path. We will no longer need it unless we intend to rerun the file migration.

Simple node migrations

With the file migration finished, we can -- finally! -- get to a simple node migration. Since this is our first node migration, we want to take special care to select a node type that has the minimum amount of complexity for us to tackle:

  • The source node type is to be migrated to only one target node type.
  • It only depends on migrations we've already written, such as the user and file migration.
  • It doesn't use node references, entity references, or Paragraphs.
  • Both the source and target node types only rely on fields provided by Drupal core, no contrib or custom types.

When I looked at my site, the simplest migration was one from the Picture content type to the Gallery type. Both types have the following fields:

  • Title
  • Body
  • uid (Author)
  • field_images, a multi-value core Image field

This means there was only one field that had a lot of complexity for us to worry about, while the remainder can be copied through. Next, we need a template. Like our other migrations, Drupal core provides a template for Drupal 7 nodes in the node module: core/modules/node/migration_templates/d7_node.yml. Customize the id and label, and add the migration_group as you did with all of our other migrations:

id: yoursite_gallery
migration_tags:
  - 'Drupal 7'
migration_group: yoursite
label: 'yoursite gallery'

Then, scroll down to the bottom of the node migration *.yml and add your user and file migrations as dependencies:

migration_dependencies:
  required:
    - yoursite_users
    - yoursite_file

Specifying the source type

If we look over the rest of the migration *.yml, we might find something...missing. For each of the migrations we've created so far, the source entity type on the Drupal 7 site has been the same as the one on the new Drupal 8 site. Users were migrated to users, roles to roles, and files to files. When migrating nodes, however, you need to specify both the source and destination content type. When you examine the source section of our node migration template, you won't even find a place to specify the type:

source:
   plugin: d7_node

You might think, "There's got to be a way to specify the node type there!" and you're right! Admittedly, the migration template doesn't make this obvious at all.

 Notice that under the source section we specify a plugin with the unique ID of d7_node. If we search the core directory for that plugin ID, we'll eventually find the following PHP class: core/modules/node/src/Plugin/migrate/source/d7/Node.php. This class queries the Drupal 7 database, pulling both built-in and custom node fields out of the database. It stands to reason that core would make this plugin configurable, so that we might tell it what node type to pull from the database. Looking at the plugin code, we find this:

if (isset($this->configuration['node_type'])) {
   $query->condition('n.type', $this->configuration['node_type']);
}

Ah-ha! The d7_node plugin can be passed a configuration parameter named node_type. When provided, the database query performed against the node table of the Drupal 7 database is restricted to the provided node type. You might hang your head wondering how to pass this value to something buried as deep in core as this plugin, but it's actually really easy!

source:
  plugin: d7_node
  node_type: 'picture'

Not too hard at all!

Customizing the field mapping

In our previous migrations, we largely ignored the bulk of the migration *.yml contained in the process section. This section specifies one or more field mappings between the target and source objects. The simplest is a direct mapping:

field_name_on_target: field_name_on_source

In a direct mapping, the source field value is copied into the target field. Notice that in the *.yml the mapping is written opposite what you might first expect. The field name on the Drupal 8 site is on the left, whereas the field name on the Drupal 7 site is on the right. For fields like the title, a direct mapping is good enough:

title: title

Having the target field name on the left and the source on the right seems silly at first, but it actually has a very good reason!  The purpose of the process section is more than just to specify mappings. As its name implies, the process section can be used to modify field values before storing them in your new site. Drupal 8 uses process plugins to allow you to modify field values during the migration. Several process plugins are included out of the box, but you can always write your own. The full format of a field mapping looks like this:

field_name_on_target:
   source: field_name_on_source
   plugin: the_process_plugin_id
   parameter: value
   parameter2: value2

When we use the short format, Drupal 8 falls back to the default process plugin, get. The get process plugin attempts to copy the source field entirely into the target field. This works fine for fields that hold a string or number like our title field.
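
To make that fallback concrete, here is a small sketch of the same title mapping written both ways. It assumes, as described above, that naming only a source field is shorthand for the get plugin. The long form is shown commented out so the fragment stays a single valid mapping.

```yaml
process:
  # Short form: just name the source field.
  title: title

# Long form of the same mapping, spelling out the default get plugin:
# process:
#   title:
#     plugin: get
#     source: title
```

In practice you would only write the long form when you need a plugin other than get.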

Migrating multi-part fields

For fields like the body, the get plugin isn't enough. The body field is actually a multi-part field, containing a complex structure we need to traverse to migrate correctly. The two parts that are important to us are the body's value part, which contains the actual text, and the format part. The format contains the text format of the value, usually plain_text, filtered_html, or full_html. There is also a third part, the summary, which was unused in my site, so I ignore it in my migration. In Drupal 8.3 or later, we can traverse each part of a multi-part field by using the iterator plugin:

  body:
    plugin: iterator
    source: body
    process:
      value: value
      format:
        plugin: default_value
        default_value: full_html

There's a lot going on here! First we specify the plugin and then the source field in the Drupal 7 site. The iterator plugin lets us specify how to migrate each part under its own process section. Since the body's value is just a string, we use a direct mapping and copy it over. For the format, however, we do something a little different.

In the case of my site, the filtered_html text format was a lot more permissive than I wanted it to be on Drupal 8. Since I didn't want to lose any potential formatting, I decided to migrate all my content setting the full_html format instead. To accomplish this, I employed yet another plugin, default_value. As its name implies, it allows you to specify a static, default value in place of copying a value from the source site. The plugin takes only one parameter, the static value to save to the target site.
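
If your site did use body summaries, a variation on the mapping above could carry them over as well. This is only a sketch: it assumes the summary part can be copied with a direct mapping in the same way as value, which I did not need to do on my own site.

```yaml
process:
  body:
    plugin: iterator
    source: body
    process:
      value: value
      # Copy the teaser text as-is; drop this line if summaries were never used.
      summary: summary
      format:
        plugin: default_value
        default_value: full_html
```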

Mapping a field value through a migration

That takes care of the title and body field, but we still have all the images to migrate! Like the body field, field_images is a multi-part field. We're interested in preserving the following parts:

  • target_id/fid specifies the file entity ID. It's called target_id in Drupal 8, and simply fid in Drupal 7.
  • alt and title which specify the image alt and title attributes respectively.
  • width and height which specify the image's dimensions.

Again, like the body, we're going to use the iterator plugin to handle each part:

  field_images:
    plugin: iterator
    source: field_picture_file
    process:
      target_id: fid
      alt: alt
      title: title
      height: height
      width: width

This works, but we've made a potentially dangerous assumption. The default migration template for file migrations is set up to preserve the original site's fid for each file entity. This is probably a safe assumption to make if we're migrating to a pristine and empty Drupal 8 site, but what if we weren't? What if we added content -- including attached images or files -- prior to running the migration? In that case it's possible that our file migration might fail when it encounters an fid that's already in use, or worse, overwrite the existing file entity. The solution for the file migration is to remove the following mapping:

fid: fid

If we do that, however, it creates another problem. In our Gallery migration, we only have the original Drupal 7 fid. If it's not the same as the fid in Drupal 8, how can we know what to set for the target_id? The answer is simple: ask the file migration! The migration_lookup process plugin lets us ask a previously run migration for the target ID when given the source ID. We can nest the call to the migration_lookup plugin inside our call to the iterator plugin:

  field_images:
    plugin: iterator
    source: field_picture_file
    process:
      target_id:
        plugin: migration_lookup
        migration: yoursite_file
        source: fid
      alt: alt
      title: title
      height: height
      width: width

Here we see that under the target_id mapping, we call the migration_lookup plugin, passing it the name of the migration to query as well as where to get the source ID. The migration_lookup plugin queries a table in our Drupal 8 database that preserves the ID mappings for each migration. You might wonder, however, how does the migration know what field/database column/thingawhatsit to use for the ID? That logic is actually contained in the migration's source plugin. If you look at our previous migrations, the source plugin was specific for the kind of entity we were migrating. So the d7_user plugin knows to use the uid, the d7_file plugin uses the fid, and the d7_node source plugin uses the nid. Nifty!
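
For completeness, dropping the fid mapping from the file migration might look something like the sketch below. The filename line is only there as an assumed example of a neighboring mapping; check your own copy of the d7_file template for the exact list.

```yaml
# Sketch of the file migration's process section with the fid mapping removed.
process:
  # fid: fid   <- removed, so Drupal 8 assigns each file a new ID
  filename: filename
  # ...the remaining mappings from the d7_file template stay as they are
```

With the mapping gone, any migration that references files should look up the new ID through migration_lookup rather than copying the old fid.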

Improving the author ID mapping

After all that work to make sure field_images is migrated correctly, you might be wondering if we need to reexamine the node author field. The node migration template out of the box assumes that the uid of the node in Drupal 7 will correspond to a previously migrated user in Drupal 8 with the same uid:

uid: node_uid

This is mostly a safe assumption, as we did configure our yoursite_users migration to preserve the original uid. We could improve it, however, by running it through the migration_lookup plugin:

uid:
   plugin: migration_lookup
   migration: yoursite_users
   source: node_uid

This works, but what if we wanted to be more selective about our migration? Let's say we want to skip migrating any gallery that belongs to the anonymous user (uid 0). If we look at the process plugin page, we find one that's perfect for our needs: skip_on_empty. When the source field value is NULL, false, or zero and the plugin's method parameter is set to row, the entire node (or "row" if you think like a database) is not migrated. Now we have a new problem: we want to use two process plugins for the same field. Fortunately, we can chain process plugins together using the *.yml list format:

uid:
   -
     plugin: skip_on_empty
     method: row
     source: node_uid
   -
     plugin: migration_lookup
     migration: yoursite_users

The format looks a little weird, but it does make it clear there are two process plugins in use. Drupal 8 executes chained process plugins from the top-down in the *.yml. So, the first plugin to run is skip_on_empty. The first process plugin in a chain always gets the source field name, so we specify it there. The next plugin to run, migration_lookup, uses our user migration. We don't need to specify the source field name on subsequent process plugins in a chain as Drupal is smart enough to pass the output of the previous process plugin as the source of the next one.

What about that destination type?

Speaking of source plugins... We know the d7_node source plugin knows what content type to retrieve data from because we specify it in the node_type parameter. Where do we specify the destination node type? You might think that we can do what we did before. We would look at the destination section of our migration template, find the plugin class in the core directory, and then specify a parameter for the destination type. When you look at the destination section, however, things look very, very different:

destination:
  plugin: entity:node

Huh? Well, if we dig around we do find out there's a plugin class with the ID entity:node, but it doesn't seem to take any parameters. What? This is also something about Drupal 8 node migrations that's rather counterintuitive. Instead of setting a plugin parameter, the destination node type is treated as a field. We need to specify the type in our field mapping, but there's a complication. Our source type name IS NOT the same as our target type name! Our source node type is named picture, but our target type is named gallery. Fortunately, we've already learned everything we need to solve this problem:

  type:
    plugin: default_value
    default_value: gallery

In our process section we add a new mapping for the type field (it's not in the template out of the box). Since we set the node_type in the source plugin already, we know that every piece of content we'll migrate here will be a picture. So, we only need to specify that the type is gallery each time. For that, we use the default_value plugin.

Why is the type a field mapping and not a parameter of the destination plugin? It turns out that the source plugin is the culprit. The source plugin does NOT require the node_type parameter! When it's not specified, the plugin defaults to all content types. In order to be sure all nodes are migrated properly, it's more versatile to treat the type as a field. That way, one node migration could slurp up all source nodes, and then create new Drupal 8 nodes with the correct content types. This is how Drupal 8's auto-generated migrations work for nodes. 
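
As a rough illustration of that versatility, a single migration that reads every node type could translate type names as it goes. The sketch below uses core's static_map process plugin. It assumes the d7_node source exposes the machine name in a field called type, and that every unmapped type already exists under the same name on the Drupal 8 site.

```yaml
source:
  plugin: d7_node
  # No node_type here, so every content type is read.
process:
  type:
    plugin: static_map
    source: type
    map:
      # Drupal 7 type: Drupal 8 type
      picture: gallery
    # Pass unmapped type names through unchanged.
    bypass: true
```

This only settles the type. The field mappings in such a catch-all migration would still have to make sense for every content type it reads, which is why this series writes one migration per type instead.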

Save your node migration to the sync directory like you did the others. Do a configuration import, check the migration status for errors, then run the migration.
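
Pulling the pieces of this post together, the finished gallery migration might look roughly like this sketch. It assumes your user migration's id is yoursite_users, and it leaves out the template mappings this post didn't touch (such as status and created), which should stay as core ships them. The method: row line reflects my understanding of what skip_on_empty needs in order to skip the whole node.

```yaml
# Sketch of the gallery node migration; untouched template mappings omitted.
id: yoursite_gallery
migration_tags:
  - 'Drupal 7'
migration_group: yoursite
label: 'yoursite gallery'
source:
  plugin: d7_node
  node_type: picture
process:
  type:
    plugin: default_value
    default_value: gallery
  title: title
  uid:
    -
      plugin: skip_on_empty
      # Assumption: method: row tells the plugin to skip the entire node.
      method: row
      source: node_uid
    -
      plugin: migration_lookup
      migration: yoursite_users
  body:
    plugin: iterator
    source: body
    process:
      value: value
      format:
        plugin: default_value
        default_value: full_html
  field_images:
    plugin: iterator
    source: field_picture_file
    process:
      target_id:
        plugin: migration_lookup
        migration: yoursite_file
        source: fid
      alt: alt
      title: title
      height: height
      width: width
destination:
  plugin: entity:node
migration_dependencies:
  required:
    - yoursite_users
    - yoursite_file
```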

Summary

Wow! We've come a long, long way in this post. We created a file migration and a simple node migration. We've leveraged the hierarchical nature of the migration system to keep the references between our entities intact even when their IDs differ between Drupal 7 and 8. We've finally explored the process section and learned to customize our field mappings. We've uncovered and dissected multi-part fields, as well as picked up some useful tools like the default_value plugin to further customize how we migrate our data. In Part 5, we start to explore even more complex node migrations involving the Paragraphs module.

Thanks to our sponsors!

This post was created with the support of my wonderful supporters on Patreon:

  • Alina Mackenzie
  • Karoly Negyesi
  • Chris Weber

If you like this post, consider becoming a supporter at patreon.com/socketwench.

Thank you!!!