Extensionless URLs with Adobe Experience Manager

Many SEO experts argue that, as a best practice, your website’s pages should have extensionless URLs. For example, www.mysite.com/page-about-dogs/ would be considered better for SEO than www.mysite.com/page-about-dogs.html. Discussing the merits of this rule is another post for another day (look for one later), but sometimes the requirement just comes up. It may seem a little daunting at first, given the importance of extensions in Apache Sling, but I’ve put together a fairly simple solution that will allow you to use extensionless URLs in Adobe Experience Manager (AEM). This solution also gives you more flexibility than the “out of the box” way to handle this in CQ 5.5: the Link Checker Transformer.

The solution consists of two parts: 1) a set of simple mod_rewrite rules to add to your Apache web server configuration, and 2) an implementation of a Sling Rewriter Transformer. The former translates extensionless URL requests into requests that Apache Sling can resolve, by adding an extension as the request passes through, unbeknownst to the site visitor. The latter works like the AEM Link Checker: it scans pages server-side as they are rendered, finds “href” attributes, and replaces the “.html” extensions within them with a forward slash. This way the internal links in your site will have the correct URL without the author having to be aware of the “extensionless” requirement.

For information on the mod_rewrite module and how to configure it, see the Apache HTTP Server mod_rewrite documentation. Assuming you understand the basics, here are the rewrite rules needed to consume extensionless links:

# Handle requests with no trailing slash and no extension
# (302 redirect to the same path plus a trailing slash)
RewriteCond %{REQUEST_URI} !^/content/dam/.*
RewriteCond %{REQUEST_URI} !.*\..*$
RewriteCond %{REQUEST_URI} !.*/$
RewriteRule ^(.*)$ $1/ [R,L,QSA]

# Handle requests to pages ending with .html
# (302 redirect, replacing ".html" with a trailing slash; the dot is escaped)
RewriteCond %{REQUEST_URI} !^/content/dam/.*
RewriteCond %{REQUEST_URI} \.html$
RewriteRule ^(.*)\.html$ $1/ [R,L,QSA]

# Handle requests to pages ending with a trailing slash, except the site root
# (internal pass-through to Sling with ".html" appended; no redirect)
RewriteCond %{REQUEST_URI} !^/content/dam
RewriteCond %{REQUEST_URI} .*/$
RewriteCond %{REQUEST_URI} !^/$
RewriteRule ^(.*)/$ $1.html [PT,L,QSA]

These rules do the following (respectively):

  • Redirect (302) an extensionless request with no trailing forward slash to the same URL with the forward slash.
  • Redirect (302) a request with a “.html” extension to the same URL with a forward slash instead of the extension.
  • Pass through (200) a request ending in a forward slash (other than the site root), adding the “.html” extension behind the scenes so Sling can resolve it. This rule must come last.
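To make the decisions of the three rules easy to check, here is a rough Java model of them. It is a sketch under stated assumptions, not how Apache evaluates rules: the class RewriteModel and its route() method are hypothetical names, and the model ignores query strings (QSA just carries them along) and URL decoding. Because the three sets of conditions are mutually exclusive (no dot and no trailing slash, ends in “.html”, ends in a slash), at most one rule can match a given path, so checking them in order is enough.

```java
// Hypothetical model of the three mod_rewrite rules; not an Apache API.
// Returns "302 <target>" for a redirect, "PT <target>" for an internal
// pass-through, or "NONE <uri>" when no rule applies. Query strings are ignored.
final class RewriteModel {

    private RewriteModel() { }

    static String route(String uri) {
        // Rule 1: outside the DAM, no dot anywhere, no trailing slash -> redirect to uri + "/"
        if (!uri.startsWith("/content/dam/") && !uri.contains(".") && !uri.endsWith("/")) {
            return "302 " + uri + "/";
        }
        // Rule 2: outside the DAM, ends in ".html" -> redirect with "/" in place of ".html"
        if (!uri.startsWith("/content/dam/") && uri.endsWith(".html")) {
            return "302 " + uri.substring(0, uri.length() - ".html".length()) + "/";
        }
        // Rule 3: outside the DAM, ends in "/" and is not the root -> pass through
        // to Sling with ".html" in place of the trailing slash
        if (!uri.startsWith("/content/dam") && uri.endsWith("/") && !uri.equals("/")) {
            return "PT " + uri.substring(0, uri.length() - 1) + ".html";
        }
        return "NONE " + uri;
    }
}
```

For example, route("/content/site/dogs.html") returns "302 /content/site/dogs/", and route("/content/site/dogs/") returns "PT /content/site/dogs.html": the visitor sees the slash form, while Sling still receives a request with an extension.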

With these rules, Apache handles requests that have no extensions, but it can do nothing about relative (internal) links that are generated within AEM (e.g., any link created with a dialog). Technically, all your links will still work because of the second rule above, but those links will still include “.html” in the page’s markup, and every click on one costs an extra 302 redirect. As of CQ 5.5, the Link Checker Transformer component exposes an OSGi configuration labeled “Strip HTML Extension”, but it removes the extension without replacing it with a forward slash, so with the rules above a stripped link such as /page-about-dogs still triggers the first rule’s redirect. If that configuration is appropriate for your setup, you can stop here. If you are using the rest of this custom solution, DO NOT check “Strip HTML Extension.”

To replace “.html” extensions with a forward slash, I implemented a Sling Rewriter Transformer, which uses the same plumbing as the aforementioned Link Checker Transformer; the Apache Sling Rewriter documentation explains how that pipeline works. In summary, my Transformer scans a page for “href” attributes. When it finds one and determines that it’s a relative (internal) link, it replaces “.html” with a forward slash, then passes the element on to the next component in the pipeline. The implementation uses Apache Cocoon’s SAX classes to manipulate the rendered HTML elements. Sling requires each Transformer to have an associated factory class, which becomes the actual OSGi service (the Transformer itself is just a class, not an OSGi service).
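The Gist builds on the Sling Rewriter and Cocoon SAX classes. As a self-contained illustration of the same startElement() idea, here is a minimal sketch that uses only the JDK’s SAX API: an XMLFilterImpl subclass that rewrites internal “.html” links as the elements stream past. It is not the author’s implementation; LinkRewritingFilter, rewriteHref() and hrefsAfterFilter() are hypothetical names. It assumes that “internal” means a root-relative path (starting with a single “/”) outside /content/dam, mirroring the rewrite rules, and it parses well-formed XML, whereas Sling’s htmlparser generator copes with ordinary HTML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical JDK-only sketch of an href-rewriting SAX filter. It is not the
// Sling Rewriter Transformer from the post; it only shows the startElement() idea.
class LinkRewritingFilter extends XMLFilterImpl {

    /** Turns "/path/page.html" (plus any ?query or #fragment) into "/path/page/". */
    static String rewriteHref(String href) {
        // Assumption: only root-relative links outside the DAM count as internal.
        if (href == null || !href.startsWith("/") || href.startsWith("//")
                || href.startsWith("/content/dam/")) {
            return href;
        }
        // Separate the path from any query string or fragment.
        int cut = href.length();
        int query = href.indexOf('?');
        int fragment = href.indexOf('#');
        if (query >= 0) {
            cut = Math.min(cut, query);
        }
        if (fragment >= 0) {
            cut = Math.min(cut, fragment);
        }
        String path = href.substring(0, cut);
        if (!path.endsWith(".html")) {
            return href;
        }
        return path.substring(0, path.length() - ".html".length()) + "/" + href.substring(cut);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        int index = atts.getIndex("href");
        if (index >= 0) {
            String original = atts.getValue(index);
            String rewritten = rewriteHref(original);
            if (!rewritten.equals(original)) {
                // Attributes are read-only, so copy them and change only the href value.
                AttributesImpl copy = new AttributesImpl(atts);
                copy.setValue(index, rewritten);
                atts = copy;
            }
        }
        // Hand the (possibly modified) element to the next handler in the chain.
        super.startElement(uri, localName, qName, atts);
    }

    /** Parses well-formed markup through the filter and returns the hrefs the next handler receives. */
    static List<String> hrefsAfterFilter(String markup) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        LinkRewritingFilter filter = new LinkRewritingFilter();
        filter.setParent(parser);
        List<String> seen = new ArrayList<>();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                String href = atts.getValue("href");
                if (href != null) {
                    seen.add(href);
                }
            }
        });
        filter.parse(new InputSource(new StringReader(markup)));
        return seen;
    }
}
```

For example, rewriteHref("/content/site/dogs.html?page=2") returns "/content/site/dogs/?page=2", while an absolute link such as "https://example.com/dogs.html" is left alone. Keeping the string logic in its own static method means the URL rule can be adjusted without touching the SAX plumbing.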

I’ve posted the implementation as a Gist. You will also have to create a rewriter configuration, scoped to your application, which tells Sling to hook the new Transformer into its rewriting pipeline. In the example below, the “transformerTypes” property corresponds to the “pipeline.type” property set in the factory class. Below is the XML version of the required nodes, which would reside beneath /apps/<myapp>/config:

<rewriter
    xmlns:jcr="http://www.jcp.org/jcr/1.0"
    xmlns:sling="http://sling.apache.org/jcr/sling/1.0"
    jcr:primaryType="sling:Folder">
    <mytransformer
        jcr:primaryType="nt:unstructured"
        contentTypes="[text/html]"
        enabled="{Boolean}true"
        generatorType="htmlparser"
        order="0"
        paths="[/content/myapp,/content/anotherapp]"
        serializerType="htmlwriter"
        transformerTypes="[mylinktransformer]"/>
</rewriter>

Now, the internal links generated by AEM will be extensionless, and because you already implemented the mod_rewrite rules, the requests can be resolved through the web server. This is a pretty brute-force change, but one that works automatically. The catch is that resolving extensionless links depends on the rewrite rules, so every environment that serves these pages needs a properly configured web server in front of it. Therefore, you need to be careful how you apply the configuration. You’ll never want to apply it to an author instance, and you may not want to apply it to local development publish instances. You can use run modes and the appropriate configuration nodes to scope the functionality to the environments that use a properly configured web server. Take a look at the code and borrow what you want. If it doesn’t suit your needs exactly, it is easy to tweak the URL transforming logic in LinkTransformer.startElement().
