Migrating ExpressionEngine to Drupal 7

I’ve been using ExpressionEngine on a project for almost two years. Although the end-users of the company like ExpressionEngine, my boss has never liked it and has wanted to move to Drupal for quite some time. We finally go the chance with the start of a new project. I have created websites using Drupal 6, but that was quite a few years ago. Now I was tasked with migrating ExpressionEngine 1.68 to Drupal 7. The entire script can be found on GitHub. Although the script is specific to ExpressionEngine, the steps required for migration can be used to migrate any content management system to Drupal.

I found a post that discussed migrating ExpressionEngine to Drupal 4.7. Although the post has a few short-comings (like no mention of migrating EE Structure data, I used the post as the basis of my migration script. I also found this post, (Guide to programmatic Node Creation in Drupal 7) to be quite useful.

Setup

To create the migration script, rename the index.php file located in your Drupal’s root directory. The create a new index.php file (this file will hold the migration script.

At the top of the file, create the DRUPAL_ROOT constant and create an array of database settings:

define('DRUPAL_ROOT', getcwd());
 
$ee['user'] = 'root'; // ExpressionEngine database user
$ee['pass'] = 'password'; // ExpressionEngine database password
$ee['host'] = 'localhost'; // ExpressionEngine database host
$ee['dbname'] = 'expeng'; // ExpressionEngine database name
$ee['prefix'] = 'exp_'; // ExpressionEngine database table prefix

Next, include Drupal bootstrap

include_once "includes/bootstrap.inc";
include "includes/common.inc";
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

Migrating User Data

The first step to migrating to Drupal 7 is migrating the user information. This is fairly straightforward.

function ConvertUsers($ee) {
    // Load all users from EE
    $sql = "SELECT username,password,email,url FROM ".$ee['prefix']."members order by member_id";
    $eeUsers = mysqlQuery($ee,$sql);
    foreach($eeUsers as $eeUser) {
        $drupalUser = user_load_by_name($eeUser['username']);
        if (!$drupalUser) {
            $user_array = array("name" => $eeUser['username'], "pass" => $eeUser['password'], "mail" => $eeUser['email'], "status" => 1);
            user_save($account, $user_array);
            echo "Added <i>$eeUser[username] <br />";
        } else {
           echo "$eeUser[username] already exists <br />";
        }
    }
}

Migrating Categories

EE categories will be migrated into Drupal’s taxonomy table.

function ConvertCategories($ee) {
  $sql = "SELECT cat_id,parent_id,cat_name from ".$ee['prefix']."categories";
  $eeCategories = mysqlQuery($ee,$sql);
  foreach ($eeCategories as $eeCategory) {
    $taxonomy = taxonomy_get_term_by_name($eeCategory['cat_name']);
    if (!$taxonomy) {
      $new_term = array(
        'vid' => 1,
        'name' => $eeCategory['cat_name'],
      );
      $new_term = (object) $new_term;
      taxonomy_term_save($new_term);
      unset($term);
      echo 'Imported category <strong>'. $eeCategory['cat_name'] . '</strong>'.'<br />';
    } else {
      echo "Category <b>$eeCategory[cat_name]</b> already exists as ".$taxonomy[0]->tid."<br />";
    }
  }
}

Migrating Posts

Migrating ExpressionEngine posts takes a lot of prep-work and data analysis. I found that there was a lot of bad data in our tables. For instance, I found that users created content types, but never used them.

Also EE’s main content table, exp_weblog_data, uses non-descript field names (i.e, field_id_4, field_id_5, field_id_6, etc). The content for each content type is stored in different fields. For instance, our careers content type is stored in field_id_19 (the teaser) and field_id_20 (the body), whereas our news content type is stored in field_id_9 (the teaser) and field_id_10 (the body).

To find the fields that belong to each content type, we first have to find the field_group for the content type. You will find each content type in the exp_weblog table. So for the careers content type, I found that the field_group is 4. I now use this information to query the exp_weblog_fields table:

SELECT * FROM exp_weblog_fields WHERE group_id = 4

The value for the field_id are the fields that are used for that content type. So for careers, content is entered in field_id_19 and field_id_20.

Note: The other rows in the query (21, 22, and 71) are not relevant for the migration.

Field Map

I created field map at the beginning of my post migration script that stores information about each ee field that is relevant for Drupal insertion. The field map array is an associative array whose key is the weblog_id. Each content type has a unique weblog_id (the values can be found in the exp_weblog table).

$weblog_field_map['3'] =  array('teaser' => 'field_id_19', 'body' => 'field_id_20'); //careers
$weblog_field_map['5'] =  array('teaser' => '', 'body' => 'field_id_203', 'start_time' => 'field_id_198', 'end_time' => 'field_id_199', 'start_date' => 'field_id_200', 'end_date' => 'field_id_201', 'location' => 'field_id_202'); //events
$weblog_field_map['6'] =  array('teaser' => 'field_id_9', 'body' => 'field_id_10'); //news
$weblog_field_map['8'] =  array('teaser' => '', 'body' => 'field_id_6'); //page
$weblog_field_map['29'] = array('teaser' => '', 'body' => 'field_id_154', 'company_title' => 'field_id_159', 'image_name' => 'field_id_160', 'sort_order' => 'field_id_163'); //leadership
$weblog_field_map['37'] = array('teaser' => 'field_id_9', 'body' => 'field_id_10'); //whitepapers
$weblog_field_map['44'] = array('teaser' => 'field_id_14', 'body' => 'field_id_190', 'buy_now_url' => 'field_id_226'); //products 
$weblog_field_map['47'] = array('teaser' => '', 'body' => 'field_id_227'); //faqs

Note: Prior to running the migration, make sure that you have created the same content types in Drupal (/admin/structure/type). If you need to create any Drupal modules that require content types, you don’t have to complete the entire module, you really only need to install the module so that the table structure is there prior to migration.

Iterating through ExpressionEngine Data

After creating the field map array, we are ready to begin the migration.

$sql = 'SELECT entry_id, weblog_id, title, author_id, url_title, entry_date, edit_date FROM '.$ee['prefix'].'weblog_titles WHERE status = "open" and (weblog_id NOT IN ( 12, 24, 26, 30, 49, 50, 51 )) order by entry_id';

The exp_weblog_titles table is similar to Drupal’s node table – it stores basic information about each post (such as title and id). I ordered the query by entry_id, so that they would be inserted into Drupal in the same order that they were created in EE.

Also note the ‘not in’. After some analysis, I determined that these content types were test data. Of course, you will have to determine for yourself which content types are relevant within your EE installation. In addition, some content types were created by custom EE modules and do not follow the standard pattern of content insertion, so these required special handling. I basically iterated over two types of queries – one for standard content types and other for special content types.

$msql = 'SELECT entry_id, weblog_id, title, author_id, url_title, entry_date, edit_date FROM '.$ee['prefix'].'weblog_titles WHERE status = "open"';
for ($i=0; $i<2; $i++) {
  $sql = $msql;
  if ($i == 0) {
    $sql .= ' and (weblog_id NOT IN ( 12, 24, 26, 30, 49, 50, 51 )) order by entry_id';
  } else {
    $sql .= ' and (weblog_id IN ( 49, 50, 51 )) ORDER BY weblog_id desc, entry_id';
  }
 
  $titles = mysqlQuery($ee,$sql);
  ...
}

Now I loop through the result. With each iteration, I collect the data that is necessary to be inserted into Drupal (content type, path, author_id, etc). I won’t go into the entire data collection here – for more information, view the script on GitHub.

foreach ($titles as $title) {
  // get the blog name, blog language, and field group
  $type = getWeblogInfo($ee, $title['weblog_id']);
 
  // get the content type
  $contentType = getContentType($type['blog_name']);
 
  // get the fields to query based on content type
  $fields = getEEFields($ee, $type['field_group']);
 
  // get the url title and parent url
  $path = getURLPath($ee, $title['entry_id'], $title['url_title'], $type['blog_name']);
 
  // get the author of the post
  $sql = 'SELECT username FROM '.$ee['prefix'].'members WHERE member_id='.$title['author_id'];
  $eeUser = mysqlQuery($ee,$sql, TRUE);
  $drUser = user_load_by_name($eeUser['username']);
 
  // make weblog_id a string
  $weblog_id = sprintf('%s', $title['weblog_id']);
 
  // run the query
  $sql = 'SELECT '. $fields . ' FROM '.$ee['prefix'].'weblog_data WHERE entry_id='.$title['entry_id'];
  $ee_content = mysqlQuery($ee,$sql, TRUE);
 
  ...
}

Migrating Images

Images are contained in some of the posts. In order to migrate them, I need to first find them. While still in the foreach loop above, I update the image path within each post:

$body = updateImagePath($contentType, $ee_content[$weblog_field_map[$weblog_id]['body']]);

I pass the content type and body content into the updateImagePath function. I use the field map array to retrieve the body content for a particular content type. So if weblog_id is equal to ‘6’

$weblog_field_map[$weblog_id]['body']

becomes

$weblog_field_map['6']['body']
//and the value is: field_id_10

//updateImagePath

function updateImagePath($contentType, $body) {
  // define the default Drupal file path for the images
  $file_path = 'sites/default/files/';
  $images = array();
  $new_body = $body;
 
  // find src= in the body and store in images array
  preg_match_all( '/src="(.*?)"/i', $body, $images ) ;
  $images = $images[1];	
  if (count($images) > 0) {
    // loop through each instance
    for ($i = 0; $i<count($images); $i++) {
      // we don't want to change external images
      if (substr($images[$i], 0, 4) != 'http') {				
        $parts = explode('/', $images[$i]);
	if (count($parts) > 0) {
	  // if there is only 1 level – folder/image name
	  if (count($parts) == 1) {
	    // images stored locally are prefixed with ee tags
	    // for instance: {filedir_1}myimage.jpg
	    // split the string, we only want the image name
	    $noslash = explode('}', $parts[0]);
	    $image = $noslash[1];
	  } else {
	    $image = $parts[count($parts) - 1];
	  }
	  $image_parts = explode('.', $image);
	  if (count($image_parts) > 0) {
	    // get the image type
	    $image_type = $image_parts[1];
	    if (in_array($image_type, array('jpg', 'JPG', 'jpeg', 'png', 'gif'))) {
	      $ee_path_prefix = explode('}', $parts[0]);
	      // we get original image path from the ee tag. So {filedir_1 = _res/img/
	      $original_path = getOriginalImagePath($ee_path_prefix[0]) . $ee_path_prefix[1]; 
	      for ($j=1; $j<count($parts); $j++) {
	        $original_path .= '/' . $parts[$j];
	      }
	      // I made the decision to create a folder for each content type to store images and PDF's.
	      $new_path = $file_path . $contentType . '/' . $image;
	      // replace the old image path with the new
	      $new_body = str_replace($images[$i], $new_path, $body);
	      // copy the files to the new location
	      if (!copy($original_path, $new_path)) {
	        echo "failed to copy $original_path...\n";
	      } else {
	        echo 'copy ' . $new_path . ' was successful <br /><br />';
	      }	
	    }
	  } // if count
	} // if count
      } // if not http
    } // end for $i
  }
  return $new_body;
}

Other Images

Some images are not contained within the body content. In these cases, the process is a bit different. In these instances, you want to insert information into Drupal’s file_managed (link) and file_usage (link) tables. Just make sure that you create the file_managed record before you call node_save (link). Again, you have to do the analysis before the migration to determine which content types are applicable:

if ($title['weblog_id'] == 29) {				
  $file = save_mris_leadership_file($ee_content[$weblog_field_map['29']['image_name']], $drUser->uid); 				
}

If the weblog_id is a specific content type, I want to save the image that is contained in the record to the file_managed table (save_mris_leadership_file).

function save_mris_leadership_file($imageName, $uid) {
  // set the path for the original image
  $img = '_res/img/photos/' . $imageName;
  $image = image_load($img);
  if ($image) {
    // scale the image
    image_scale($image, 112, 160);
 
    // copy the image to the relevant content type's location
    image_save($image, 'sites/default/files/mris_leadership/'. $imageName);
 
    $file = new stdClass;
    $file->uid = $uid;
    $file->filename = $imageName;
    $file->uri = 'public://mris_leadership/' . $imageName;
    $file->status = FILE_STATUS_PERMANENT;
    $file->filemime = mime_content_type($imageName);
 
    // store the image information in the file_managed table
    file_save($file);
 
    // return the image information
    return $file;
  } else {
    return NULL;
  }
}

After the node is saved, we want to add the image info to the file_usage table:

if($node = node_submit($node)) { // Prepare node for saving
  node_save($node);
 
  if ($title['weblog_id'] == 29) {
    if ($file) {
      file_usage_add($file, 'mris_leadership', 'node', $node->nid);
    }
  } 
}

Main Menu Migration

I collected the url information for each post while still in the foreach loop.

// get the url title and parent url
$path = getURLPath($ee, $title['entry_id'], $title['url_title'], $type['blog_name']);

After the node is saved, I store the url information in an associative array whose key is the node’s vid.

function createMainMenu($menuItems) {
  $proceed = createFooterMenu();
  $footerItems = array('Site Map', 'Feedback', 'Privacy Policy', 'Terms of Use', 'Copyright Notice', 'Disclaimer', 'Fair Housing');
  foreach ($menuItems as $key => $value) {
    $link = array();
    if (in_array($value['title'], $footerItems)) {
      $link['menu_name'] = 'menu-footer-menu';
    } else {
      $link['menu_name'] = 'main-menu';
    }
    $parentUrl = rtrim($value['parentUrl'], '/');
 
    // Public records has no menu link. 
    if (!in_array($parentUrl, array('public_records_county', 'public_records_state', 'public_records_federal'))) {
      $levels = explode('/', $parentUrl);			
      //I only want two levels for the main menu
      if (count($levels) == 1) {
        // create an array to store menu items
	// see menu_link_save
	// http://api.drupal.org/api/drupal/includes!menu.inc/function/menu_link_save/7
	$link['link_title'] = $value['title']; 
	$link['link_path'] = 'node/'. $key; 
	//$link['router_path'] = $value['path'];
	if ($value['parentUrl'] != '') {
	  $url_alias = db_query('SELECT source FROM {url_alias} WHERE alias = :pUrl', array(':pUrl' => $parentUrl))->fetchAssoc();
	  echo 'alias source: ' . $url_alias['source'] . '<br />';
	  $parent = db_query('SELECT mlid FROM {menu_links} WHERE link_path = :lnkPath', array(':lnkPath' => $url_alias['source']))->fetchAssoc();
	  $link['plid'] = $parent['mlid'];
	}
	// save the 
	menu_link_save($link);
	unset($link);
      } 
    }
  }
 
  // clear the menu cache
  menu_cache_clear_all();	
  return TRUE;
}

Well, that’s about it. Migrating ExpressionEngine to Drupal can be very frustrating, but it is really not that difficult. The entire script can be found on GitHub. Enjoy.

Be Sociable, Share!

Checkout My New Site - T-shirts For Geeks