I'm Michael Suodenjoki - a software engineer living in Kgs. Lyngby, north of Copenhagen, Denmark. This is my personal site containing my blog, photos, articles and main interests.


Integrating Offline XHTML Validation into FrontPage

Improving Standard Compliance with FrontPage

Updated 2011.01.23 17:22 +0100

Not available in Danish.

By Michael Suodenjoki, michael@suodenjoki.dk.

Version 1.2 August 2002.

Abstract

This article describes how you can improve the quality of your HTML/XHTML pages by integrating offline HTML/XHTML validation into Microsoft FrontPage using James Clark's SGML Conforming Parser (SP).

Related Article: Integrating HTML Tidy into Microsoft FrontPage.

Requirements

The code in this article requires:

Download Source

Download the validation.zip source file for this article.

The validation.zip file contains:

Contents

1 Introduction
    1.1 Introduction to Validation
    1.2 When to Validate
2 The Validator
3 Integration with FrontPage
    3.1 Customizing the VBA code
    3.2 Customizing the FrontPage menu
4 Conclusion

Appendix A: Integration with your ASP-based web server

1 Introduction

How often have you written web documents in editors or word processors that simply couldn't produce the underlying web language correctly? You may not be aware of it, but most of today's HTML editors are not very good at producing valid HTML. As an author of web documents you have an interest in writing your documents so that your pages can be read in any of the available browsers.

Most of the pages on my personal homepage are written as XHTML documents; XHTML is the emerging standard for web documents (see www.w3.org). It is an XML-based version of the HTML standard, with some important differences (among others):

If you want to be sure that your XHTML documents can be viewed in browsers that only support HTML, you may follow the guidelines in the XHTML specification, Appendix C, HTML Compatibility Guidelines.

For me it is most important that the code is "pretty", commented and valid with respect to the relevant standards. This should be true whether the code has been written by hand in a plain text editor (like Notepad) or generated by a WYSIWYG editor (like FrontPage).

This article describes how you can improve web documents written with the Microsoft FrontPage editor, focusing mainly on XHTML. FrontPage is just one of many editors in which you can create web documents or manage and edit entire webs (collections of web documents). It is a fairly decent editor that produces good-quality XHTML code; however, it is not perfect.

1.1 Introduction to Validation

Whenever you write a web page you do so in a language such as HTML or XHTML. There are actually many such "languages", which are more or less strict with respect to how you write your code. A web page is said to be valid when it conforms to the document type definition (DTD) of the language specified in its DOCTYPE declaration.

The DOCTYPE declaration, typically the first line of your web page, states which "language" the page is written in (or is supposed to be written in). There are a few different languages to choose from; they differ in maturity (version) and in the strictness of the syntax. Three variants are typically available for each language. For HTML 4.01, these are:

Transitional
The HTML 4.01 Transitional DTD includes everything in the strict DTD plus deprecated elements and attributes (most of which concern visual presentation). For documents that use this DTD, use this document type declaration:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
                    "http://www.w3.org/TR/html4/loose.dtd">
Strict
The HTML 4.01 Strict DTD includes all elements and attributes that have not been deprecated or do not appear in frameset documents. For documents that use this DTD, use this document type declaration:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
                    "http://www.w3.org/TR/html4/strict.dtd">
Frameset
The HTML 4.01 Frameset DTD includes everything in the transitional DTD plus frames as well. For documents that use this DTD, use this document type declaration:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
                   "http://www.w3.org/TR/html4/frameset.dtd">

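An SGML parser such as SP uses the public identifier in the DOCTYPE declaration (looked up through its catalog) to find the DTD to validate against. As an illustration, the following Python sketch extracts the root element, public identifier and system identifier with a regular expression and maps the three HTML 4.01 public identifiers above to their variants. The function names are hypothetical, and the pattern only covers the common PUBLIC form shown here, not every DOCTYPE syntax that SGML allows.

```python
import re
from typing import Optional, Tuple

# <!DOCTYPE root PUBLIC "public id" "system id">, system id optional.
# SGML keywords are case-insensitive, hence re.IGNORECASE.
DOCTYPE_PATTERN = re.compile(
    r'<!DOCTYPE\s+(\w+)\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?\s*>',
    re.IGNORECASE,
)

# Public identifiers of the three HTML 4.01 DTDs listed above.
HTML401_VARIANTS = {
    "-//W3C//DTD HTML 4.01//EN": "Strict",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "Transitional",
    "-//W3C//DTD HTML 4.01 Frameset//EN": "Frameset",
}


def read_doctype(text: str) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return (root element, public id, system id) of the first DOCTYPE."""
    match = DOCTYPE_PATTERN.search(text)
    if match is None:
        return None
    root, public_id, system_id = match.groups()
    return root, public_id, system_id


def html401_variant(public_id: str) -> Optional[str]:
    """Name the HTML 4.01 variant for a public identifier, if it is one."""
    return HTML401_VARIANTS.get(public_id)


if __name__ == "__main__":
    page = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"\n'
            '    "http://www.w3.org/TR/html4/loose.dtd">\n<html>...</html>')
    root, public_id, system_id = read_doctype(page)
    print(root, html401_variant(public_id), system_id)
```

A DOCTYPE without a system identifier, such as a bare XHTML one, yields None in the third position.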
The table below summarizes some of the languages. Most web pages today are written in HTML 4.0 or XHTML 1.0.

Language    Description
HTML 3.2    The first widespread HTML version, from 1997. Earlier versions were not as standardized.
HTML 4.0    Intermediate version.
HTML 4.01   Latest/current HTML version.
XHTML 1.0   HTML 4 rewritten as an XML application (W3C Recommendation, January 2000); followed by XHTML 1.1 in May 2001.

While most WYSIWYG web page editors today do not validate the pages they produce, I expect this to change in the future. Standardization is the right way forward, and validation is a key step toward it. Until the editors catch up with the standards, we as web page authors must validate our pages ourselves, either online using a validation web service or offline using external tools.

One such offline validator is James Clark's SGML Parser (SP for short), which is the validator used in this article.

1.2 When to Validate

Validation should occur as close to the actual editing of the web page as possible. In order of preference, validation can take place:

  1. When the author is writing the web page. The editor should preferably not be able to produce any invalid code. This will ensure that no invalid web page can be authored or saved.
  2. When the user chooses to validate the web document. It should be easy to see which errors there are and to locate the offending code in order to fix them.
  3. When the web page is being uploaded to a web server. Pages which are invalid should not be accepted by the web server.
  4. When the web page has been uploaded to a web server. The online W3C validation service can be used, or you can implement an "offline" validation service on your intranet.

Today most web pages are either not validated at all or validated by the author after they have been uploaded. This corresponds to level 4 above (depending on the quality of your web editor). By implementing the code in this article you can move up to level 2. While this is not perfect, it is at least a step in the right direction.

2 The Validator

As mentioned previously, the validator we will use is James Clark's SGML Conforming Parser (SP). But what is SGML? SGML is an acronym for Standard Generalized Markup Language and is a system for defining markup languages. Authors mark up their documents by representing structural, presentational, and semantic information alongside content. HTML is one example of a markup language. In fact, HTML up to version 4.01 is formally defined as an application of SGML, and XML, on which XHTML is based, is a simplified subset of SGML.

If you want to read more about SGML vs. HTML, see the first section of the SGML tutorial at http://www.w3.org/TR/html4/intro/sgmltut.html.

We use the SP validator's executable, named nsgmls.exe, to do the validation for us. It may be downloaded from http://www.jclark.com/sp/howtoget.htm. If you want to validate XHTML documents you must also retrieve the SGML definitions for XHTML 1.0 from the W3C at http://www.w3.org/TR/xhtml1/xhtml1.zip (you may unzip them into SP's pubtext directory).

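For our needs we must supply nsgmls.exe with a few arguments:

  -s         Suppress the parser's normal output; error messages are still reported.
  -c file    Use the given catalog file (e.g. pubtext\xhtml.soc) to map public identifiers to local DTD files.
  -f file    Write error messages to the given file instead of to the console.

nsgmls.exe takes many more arguments, which are not described here. Full documentation of the arguments can be found at http://www.jclark.com/sp/nsgmls.htm.

Note that nsgmls.exe will try to connect to the Internet to retrieve the DTD from the W3C if you do not have a local copy, which is usually kept in the pubtext subfolder.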

Example of a command-prompt call that validates an XHTML 1.0 file named myfile.xhtml:

  "c:\program files\validator\nsgmls" -s -c "c:\program files\validator\pubtext\xhtml.soc" -f outputfile.log myfile.xhtml

The paths are quoted because they contain a space. You may omit the -f option if you want the output in the console.

Likewise you may validate one of your HTML documents, assuming here that you're using HTML 4.01:

  "c:\program files\validator\nsgmls" -s -c "c:\program files\validator\pubtext\html4.soc" myfile.html
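When the validator is called from a program rather than typed at the prompt, the command line is best built as a list of arguments, so that paths containing spaces are quoted correctly. A minimal Python sketch, assuming the layout used above (nsgmls.exe and the pubtext folder in one directory); nsgmls_args is a hypothetical helper name, and the options are the -s, -c and -f options used in the examples.

```python
from pathlib import PureWindowsPath
from typing import List, Optional


def nsgmls_args(sp_dir: str, catalog: str, document: str,
                error_file: Optional[str] = None) -> List[str]:
    """Build the argument list for one nsgmls run.

    sp_dir is the folder holding nsgmls.exe and the pubtext subfolder;
    catalog is a catalog file name inside pubtext, e.g. "xhtml.soc".
    """
    base = PureWindowsPath(sp_dir)
    args = [
        str(base / "nsgmls.exe"),
        "-s",                                   # suppress output, keep error messages
        "-c", str(base / "pubtext" / catalog),  # catalog used to find the DTD
    ]
    if error_file is not None:
        args += ["-f", error_file]              # write messages to a file
    args.append(document)
    return args


if __name__ == "__main__":
    import subprocess
    args = nsgmls_args(r"C:\Program Files\Validator", "xhtml.soc",
                       "myfile.xhtml", "outputfile.log")
    # Show the command line Windows would receive; paths with spaces are quoted.
    print(subprocess.list2cmdline(args))
```

Passing such a list to subprocess.run on Windows lets Python add the quotes, which avoids the "Program Files" quoting pitfall altogether.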

At http://ktmatu.com/info/do-it-yourself-offline-html-validator/ there is an excellent description of how to easily incorporate validation into Windows Explorer.

SP's error messages are quite cryptic at first, but after a while they are not that hard to understand. They usually contain enough information to find the troublesome lines: numbers like "3:12" indicate that there is something wrong in line 3, column 12. It is easy to build a basic parser that extracts the line, column and error message, and I guess this is exactly what has been done at the W3C validation service.
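A basic parser of that kind only needs to locate the ":line:column:type:" part of each message. The Python sketch below assumes messages of the form program:file:line:column:type: text, where the type letter is, for example, E for an error. Because Windows paths contain a drive-letter colon, it searches for the first numeric line/column pair instead of splitting on every colon. The names SpMessage and parse_sp_message are illustrative.

```python
import re
from typing import NamedTuple, Optional

# The ":line:column:type:" part of an nsgmls message. The prefix (program
# and file name) may itself contain colons, e.g. "C:\...", so we search for
# the first numeric line/column pair instead of splitting on ":".
POSITION = re.compile(r":(\d+):(\d+):([A-Z]):\s?(.*)$")


class SpMessage(NamedTuple):
    line: int
    column: int
    kind: str   # message type letter, e.g. "E" for an error
    text: str


def parse_sp_message(raw: str) -> Optional[SpMessage]:
    """Parse one nsgmls message line; return None if it has no position."""
    match = POSITION.search(raw.rstrip("\r\n"))
    if match is None:
        return None
    line, column, kind, text = match.groups()
    return SpMessage(int(line), int(column), kind, text)


if __name__ == "__main__":
    sample = ('C:\\Program Files\\Validator\\nsgmls.exe:'
              'C:\\Program Files\\Validator\\input.tmp:3:12:E: '
              'element "foo" undefined')
    # prints SpMessage(line=3, column=12, kind='E', text='element "foo" undefined')
    print(parse_sp_message(sample))
```

Lines without a position, such as summary messages, simply yield None and can be shown as they are.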

Now let's integrate it with FrontPage...

3 Integration with FrontPage

Microsoft FrontPage 2000 and 2002 both support Visual Basic for Applications (VBA), which we can use to integrate the validation into the menu system, so that a single menu command validates the current web document.

I have made a single Visual Basic file available (Validate.bas) that essentially wraps the call to the validator in VBA. Section 3.2 describes how to incorporate it into FrontPage.

I will not comment on the code in detail, but the basic principle is this: the active FrontPage document is saved to a temporary file, nsgmls.exe is run on that file, and the results from nsgmls are presented in a dialog.

'
' Validate.bas - Integrating James Clark's SP in Microsoft FrontPage 2000/2002
'
'

Option Explicit

' Specifies path to where you have installed NSGML
Const NSGML_PATH = "C:\Program Files\Validator\"       ' Remember trailing backslash

' Specifies Path to the SP NSGML executable...
Const NSGML_PROGRAM_FILE = NSGML_PATH & "nsgmls.exe"

' Specifies path to a temporary file...
Const TEMP_INPUT_FILE = NSGML_PATH & "input.tmp"
Const TEMP_OUTPUT_FILE = NSGML_PATH & "output.tmp"

' Specifies the input files to SP (nsgmls.exe)
Const XHTML1_SOC_FILE = NSGML_PATH & "Pubtext\XHTML.soc"
Const HTML4_SOC_FILE = NSGML_PATH & "Pubtext\HTML4.soc"
'

'************************************
' VALIDATE_FILE
'
'
Sub Validate_File()

  ' Jump to the error handler at the end of the sub if anything fails
  On Error GoTo ValidationError

  Dim bFlipToHTMLSource As Boolean
  bFlipToHTMLSource = False

  If ActivePageWindow Is Nothing Then
    MsgBox "Please open a file in the FrontPage Editor.", _
    vbOKOnly Or vbCritical
    Exit Sub
  End If

  If Not ActivePageWindow.ViewMode = fpPageViewNormal Then
    bFlipToHTMLSource = True
    ActivePageWindow.ViewMode = fpPageViewNormal
  End If

  Dim doc As FPHTMLDocument
  Set doc = ActivePageWindow.Document

  Dim fs
  Set fs = CreateObject("Scripting.FileSystemObject")

  Dim ts
  Set ts = fs.CreateTextFile(TEMP_INPUT_FILE, True) ' True = overwrite the file from an earlier run

  ' Write the current FrontPage document into a temporary file
  ts.Write doc.DocumentHTML
  ts.Close

  Dim sSocFile ' As String
  Dim sLine ' As String
  Dim nLine ' As Integer

  ' Assume that we're going to validate XHTML 1.0
  sSocFile = XHTML1_SOC_FILE

  ' The following code tries (primitively) to find
  ' which DTD is specified in the input file, so
  ' that we can choose which SOC file to give nsgmls.exe
  Set ts = fs.OpenTextFile(TEMP_INPUT_FILE)
  nLine = 1
  While nLine <= 4 And Not ts.AtEndOfStream ' We're only looking at the first 4 lines
    sLine = ts.ReadLine()
    If InStr(sLine, "DTD HTML 4") > 0 Then
      sSocFile = HTML4_SOC_FILE
    End If
    nLine = nLine + 1
  Wend
  ts.Close

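  Dim strCmd As String

  ' Build the command line. The executable path is quoted as well,
  ' because NSGML_PATH may contain spaces (e.g. "C:\Program Files\").
  strCmd = """" & NSGML_PROGRAM_FILE & """ -s" & _
           " -c """ & sSocFile & """" & _
           " -f """ & TEMP_OUTPUT_FILE & """" & _
           " """ & TEMP_INPUT_FILE & """"

  'MsgBox strCmd

  ' Remove output left over from an earlier run, so that stale
  ' results are never shown if nsgmls fails to start
  If fs.FileExists(TEMP_OUTPUT_FILE) Then fs.DeleteFile TEMP_OUTPUT_FILE

  ' Execute the command line...
  ' For more information see
  '   <http://support.microsoft.com/support/kb/articles/Q129/7/96.asp>
  ExecCmd strCmd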

  If bFlipToHTMLSource Then
    ActivePageWindow.ViewMode = fpPageViewHtml
  End If

  Dim es, sOutput
  ' Read TEMP_OUTPUT_FILE and show its content in the output form
  Set es = fs.OpenTextFile(TEMP_OUTPUT_FILE, 1) ' 1=ForReading
  If es.AtEndOfStream Then
    sOutput = "Document successfully validated"
    If sSocFile = XHTML1_SOC_FILE Then
      sOutput = sOutput & " as XHTML 1.0"
    Else
      sOutput = sOutput & " as HTML 4"
    End If
    sOutput = sOutput & ". No errors reported."
  Else
    sOutput = es.ReadAll
  End If
  es.Close
  Form_tidy_output.TextBox_tidy_output.Text = sOutput
  Form_tidy_output.Caption = "Validation Result"
  Form_tidy_output.Show

  Exit Sub

ValidationError:
    MsgBox "Validation could not execute correctly. " & _
           "No changes have been carried out." & Chr(10) & _
           "Error # " & CStr(Err.Number) & " " & Err.Description, _
           vbOKOnly Or vbCritical
End Sub
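The DOCTYPE detection in Validate_File boils down to a small rule: assume XHTML 1.0 unless "DTD HTML 4" appears in one of the first four lines, as its comment states. Expressed as a stand-alone Python function (a hypothetical equivalent, with the catalog file names taken from the constants above), the rule can be tested without FrontPage:

```python
from typing import Iterable

XHTML1_CATALOG = "XHTML.soc"   # same file names as in Validate.bas
HTML4_CATALOG = "HTML4.soc"


def choose_catalog(lines: Iterable[str], max_lines: int = 4) -> str:
    """Pick the SP catalog for a page the way Validate_File does.

    Assume XHTML 1.0, but switch to HTML 4 if "DTD HTML 4" occurs in one
    of the first max_lines lines (the DOCTYPE is expected near the top).
    """
    for number, line in enumerate(lines, start=1):
        if number > max_lines:
            break
        if "DTD HTML 4" in line:
            return HTML4_CATALOG
    return XHTML1_CATALOG


if __name__ == "__main__":
    page = ['<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"',
            '    "http://www.w3.org/TR/html4/strict.dtd">']
    print(choose_catalog(page))   # HTML4.soc
```

Counting lines with enumerate(start=1) makes the "first four lines" limit explicit, which is exactly the part that is easy to get wrong in a hand-written While loop.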

3.1 Customizing the VBA code

The VBA code cannot be used as-is; it must be customized to the location where you have installed SP. Six string constants are defined. If you have installed everything in a single folder with the 'pubtext' subfolder beneath it, it should be enough to change the NSGML_PATH constant.

' Specifies path to where you have installed NSGML
Const NSGML_PATH = "C:\Program Files\Validator\"    ' Remember trailing backslash

' Specifies Path to the SP NSGML executable...
Const NSGML_PROGRAM_FILE = NSGML_PATH & "nsgmls.exe"

' Specifies path to a temporary file...
Const TEMP_INPUT_FILE = NSGML_PATH & "input.tmp"
Const TEMP_OUTPUT_FILE = NSGML_PATH & "output.tmp"

' Specifies the input files to SP (nsgmls.exe)
Const XHTML1_SOC_FILE = NSGML_PATH & "Pubtext\XHTML.soc"
Const HTML4_SOC_FILE = NSGML_PATH & "Pubtext\HTML4.soc"

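You may add an extra check of the error level after calling ExecCmd, which returns the exit code of the executed program. Note that the levels described for Tidy in the related article ("0" means "OK", "1" means "There are warnings", "2" means "There are errors") do not apply to nsgmls, which exits with a non-zero code when it has reported errors. Since those errors should still be displayed, one could simply add something like this:

If ExecCmd(strCmd) <> 0 Then
  ' nsgmls reported errors (or could not run)
  ...
End If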
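In a Python version of the same workflow, the check could look like the following sketch. It assumes that nsgmls signals errors through a non-zero exit code and writes its messages to the file given with -f; run_validator is a hypothetical helper. Output from an earlier run is deleted first, so it can never be mistaken for a new result, and a missing output file is treated as "no messages".

```python
import os
import subprocess
from typing import List, Tuple


def run_validator(args: List[str], error_file: str) -> Tuple[int, str]:
    """Run a validator command and return (exit code, reported messages).

    Assumes the command writes its messages to error_file (nsgmls -f)
    and exits with a non-zero code when it has reported errors.
    """
    # Remove output from an earlier run so stale messages are never shown.
    if os.path.exists(error_file):
        os.remove(error_file)
    completed = subprocess.run(args)
    try:
        # latin-1 decodes any byte sequence, so odd characters never raise.
        with open(error_file, encoding="latin-1") as handle:
            messages = handle.read()
    except FileNotFoundError:
        messages = ""  # nothing was written: no messages (or the run failed)
    return completed.returncode, messages
```

Usage would be along the lines of code, messages = run_validator(args, "outputfile.log"), followed by showing messages whenever code is non-zero.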

3.2 Customizing the FrontPage menu

This section shows you how to extend the FrontPage menu with an extra menu containing a call to our VBA macro.

How to guide:

  1. Open FrontPage.
  2. Activate the 'Tools|Macro|Visual Basic Editor' menu item. This should open up the VBA editor of FrontPage.
  3. Right-click the Modules folder, select 'Import File...' and import the ExecuteCmd.bas file. Choose 'Import File...' again and select the Validate.bas file. See figure below. You should now have at least two modules in the Modules folder: one named ExecuteCmd and one named Validate. If there is an empty Module1 module you can safely delete it.

    Figure illustrating the import file feature of VBA.

  4. Import the form file Form_output.frm, which defines the dialog used to display the (output) result from our call to the external program.
  5. Go into the Validate module and define the six constants as described in section 3.1.
  6. Close the VBA editor by choosing 'File|Close' or pressing Alt+Q. You're now back in FrontPage.
  7. Select 'Tools|Customize...' to open the Customize dialog.
  8. Select the Commands tab in the Customize dialog (see figure below).

    The Tools|Customize dialog.

  9. Select New Menu in the Categories list. The Commands list on the right-hand side should then contain at least one command named "New Menu".
  10. Drag the "New Menu" command up to the FrontPage main menu at a preferred location, e.g. after the Format menu. The insertion point is marked by a vertical bar.
  11. Right-click your newly inserted menu item to open a special customization context menu.
  12. The context menu contains a menu item called New Name, where you can specify a name for your menu item. I've used the name "E&xtras", where the ampersand indicates which letter acts as the shortcut key (and is therefore underlined).
  13. In the Customize dialog, select the Macro item in the Categories list. Two new commands should now be available on the right-hand side. Select "Custom Menu Item" and drag it to your new Extras menu.
  14. Right-click the new menu item to open the context menu again, and give it a name, e.g. "&Validate Document".
  15. In the same context menu, use the "Assign Macro" item to specify which macro to execute when the menu item is activated. Select the Validate_File macro (from the Validate VBA module).
  16. Close the Customize dialog.
  17. You should now be up and running. Test that everything works.

4 Conclusion

I have shown you how to integrate the SP validator into FrontPage and thereby easily improve the overall quality of your web documents.

There are of course things that could be improved. Among other things it would be nice to:

I would like to thank James Clark (jjc@jclark.com) for making the SGML Conforming Parser (SP) freely available to everybody.

Happy authoring.

Appendix A: Integration with your ASP-based web server

This appendix gives some hints that may be useful if you want to provide validation from an ASP-based web server such as Microsoft's Internet Information Server (IIS). This way you can offer validation of the pages on your local intranet.

First install the SP validator in a new folder under the root folder of your intranet web, for example named 'executables'. See the figure below for a basic folder layout.

Figure illustrating a possible folder layout of your web server.

Ensure that the new folder is set up as an executable folder, i.e. with execute permissions.

Remember to set up folder and file permissions for the IWAM_<server name> user, so that it can access the files to validate. Furthermore, the IWAM_<server name> user must have execute permissions on the folder where nsgmls.exe is located and, since validate.asp writes and deletes its temporary output file in that folder, permission to create and delete files there as well.

I'm not a web administrator myself, so be careful when changing the security setup of your web.

Create an ASP file named validate.asp and store it in another folder, e.g. in the 'validate' folder as indicated in the figure above. validate.asp may look something like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

  <head>
    <meta http-equiv="Content-Language" content="en-us" />
    <meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />

    <title>Validation</title>

    <meta name="author" content="Michael Suodenjoki" />
    <meta name="description" content="Validate Service" />
    <meta name="keywords" content="validate,validation,service,correctness,well-formedness" />
    <meta name="robots" content="index,follow" />

    <!--#include virtual="/include/styles.inc"-->
  </head>

<body>

<!--#include virtual="/include/header.inc"-->

<h1>Validation Service</h1>

<div class="box180">
  Useful links:
  <ul>
    <li><a href="http://www.htmlhelp.com/reference/html40/alist.html">
       HTML 4.0 Elements (Alphabetical list)</a>
    </li>
  </ul>
</div>

<p>The following page has been validated as XHTML:</p>

<%
Response.Write("<p>http://" & Request.ServerVariables("SERVER_NAME") & _
  Server.HTMLEncode(Request.QueryString("url")) & "</p>")

Dim sValidateFile

sValidateFile = Server.MapPath(Request.QueryString("url"))

' Debug messages (currently outcommented)
'Response.Write("<p>" & Request.QueryString("url") & "</p>")
'Response.Write("<p>" & sValidateFile & "</p>")
'Response.Write("<p>" & Request.ServerVariables("SCRIPT_NAME")& "</p>")
'Response.Write("<p>" & Server.MapPath(Request.ServerVariables("SCRIPT_NAME")) & "</p>")

Dim oWSH,sCmdLine

Set oWSH = Server.CreateObject("WScript.Shell")

' Debug message (currently outcommented)
'Response.Write("Executable: " & Server.MapPath("/executables/nsgmls.exe") & "<br/>" )

' Create a temporary file name (that is used for output)
Dim oFSO ' As Object - FileSystemObject.
Dim sTempFile ' As String - temporary file name

Set oFSO = Server.CreateObject("Scripting.FileSystemObject")

sTempFile = Server.MapPath("/executables") & "\" & oFSO.GetTempName()

' Build command line
'
' Example:
' C:\Validator\SP\bin\nsgmls -s -c C:\Validator\SP\pubtext\xhtml.soc 
'   -f %TEMP%\validation-results.txt %1
'
' Every path is quoted, since server paths may contain spaces
sCmdLine = """" & Server.MapPath("/executables/nsgmls.exe") & """ -s" & _
           " -c """ & Server.MapPath("/executables/pubtext/xhtml.soc") & """" & _
           " -f """ & sTempFile & """" & _
           " """ & sValidateFile & """"

' Debug output - write command line...
'Response.Write("<p>Command line: &quot;" & sCmdLine & "&quot;</p>" )

' Execute the command line
Call oWSH.Run(sCmdLine, 0, True) ' 0 = hidden window, True = wait until nsgmls has finished

'Call oWSH.Run("notepad.exe",5,False)
%>


<%
' Read the result and present it
Dim oFile
Set oFile= oFSO.OpenTextFile(sTempFile, 1) '1=ForReading

Dim sResult
If Not oFile.AtEndOfStream Then
  sResult = oFile.ReadAll()
End If 
'Response.Write("<div>" & sResult & "</div>" )
oFile.Close

Set oFile = oFSO.OpenTextFile(sValidateFile,1) '1=ForReading
Dim sSource
sSource = oFile.ReadAll()
oFile.Close

' Delete the temporary file...
oFSO.DeleteFile(sTempFile)

Set oFSO = Nothing
Set oWSH = Nothing 
%>

<h2>Validation Result</h2>

<%
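Dim sSourceLines
' Normalize line endings, so that files with Unix-style (LF only)
' line breaks are split into lines as well
sSourceLines = Split(Replace(sSource, vbCrLf, vbLf), vbLf)

Dim sLines, nLastLine
sLines = Split(Replace(sResult, vbCrLf, vbLf), vbLf)
nLastLine = ubound(sLines)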

'
' 'ReportError'
'
' This function parses an SGML error line and reports it as an HTML list item.
'
Function ReportError( sErrorLine )
  Dim sElems, nSourceLine

  sElems = Split(sErrorLine, ":" )

  ' 0 = sElems(0) = Drive of exe
  ' 1 = sElems(1) = File name of exe
  ' 2 = sElems(2) = Drive of web page
  ' 3 = sElems(3) = File name of web page file
  ' 4 = sElems(4) = Line number
  ' 5 = sElems(5) = Column
  ' 6 = sElems(6) = Type of error 'E' = error
  ' 7 = sElems(7) = Error Message
  '
  ' Messages without a position (e.g. when a file cannot be opened)
  ' carry the type letter at index 2 and the message at index 3.

  If UBound(sElems) < 3 Then Exit Function ' Not a message line we understand

  If sElems(2)="E" Then
    Response.Write( "<li>Error: " & Server.HTMLEncode(sElems(3)) )
    If 4 <= UBound(sElems) Then
      Response.Write( Server.HTMLEncode(":" & sElems(4)) )
    End If
    Response.Write( "</li>" )
  ElseIf UBound(sElems) >= 6 Then
    If sElems(6)="E" Then
      Response.Write( "<li>Line <a href='#" & sElems(4) & "'>" & sElems(4) & _
        "</a> Column " & sElems(5) & " - " )
      Response.Write( "Error: " )
      If 7 <= UBound(sElems) Then
        Response.Write( Server.HTMLEncode(sElems(7)) )
      End If
      ' Only show the source line if it exists in the validated file
      nSourceLine = CLng(sElems(4))
      If nSourceLine >= 1 And nSourceLine - 1 <= UBound(sSourceLines) Then
        Response.Write( "<blockquote class='syntax'>" )
        Response.Write( Server.HTMLEncode(sSourceLines(nSourceLine - 1)) & "<br/>" )
        Response.Write( Replace(Space(sElems(5))," ","&nbsp;") & _
          "<span style='color:red'>^</span>" )
        Response.Write( "</blockquote>" )
      End If
      Response.Write( "</li>" )
    End If
  End If
End Function

'
' 'ReportErrors'
'
' Reports all errors from the SGML output (assumed to be present
' in the sLines array) as an HTML list.
'
Function ReportErrors()
  Dim nLine
  Response.Write( "<ul>" )
  nLine = 0
  While nLine < nLastLine
    Call ReportError(sLines(nLine))
    'Response.Write nLine & ": " & sLines(nLine) & "<br/>"
    nLine = nLine + 1
  Wend
  Response.Write( "</ul>" )
End Function

If nLastLine > 0 Then
  Call ReportErrors()
Else
  Response.Write("<p>No Errors Found.</p>")
End If 

%>

<h2>Source Listing</h2>

<p>Below is the source input used for this validation:</p>

<pre class="syntax">
<%
nLastLine = ubound(sSourceLines)

' List every line, including a last line without a trailing line break
nLine = 0
While nLine <= nLastLine
  Response.Write "<a name='" & nLine+1 & "'>" & nLine+1 & "</a>: " & _
    Server.HTMLEncode(sSourceLines(nLine)) & "<br/>"
  nLine=nLine+1
Wend
%>
</pre>

<!--webbot bot="PurpleText" PREVIEW="Footer Included Here..." -->
<!--#include virtual="/include/footer.inc"-->
</body>
</html>

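The core of ReportError, turning a parsed message into an HTML list item with the offending source line and a caret under the reported column, can be sketched in Python as follows. render_error is a hypothetical name; the sketch HTML-escapes both the message and the source line, so characters such as & and < are shown literally, and replaces spaces with &nbsp; so the caret stays aligned when the stylesheet uses a monospaced font.

```python
from html import escape
from typing import List


def render_error(source_lines: List[str], line: int, column: int,
                 message: str) -> str:
    """Render one validator error as an HTML list item.

    line is the 1-based line number from the nsgmls message; column is
    used as the number of padding spaces before the caret, as in
    ReportError's Space(column).
    """
    parts = [f"<li>Line {line} Column {column} - Error: {escape(message)}"]
    if 1 <= line <= len(source_lines):
        # Escape <, > and &, then keep spaces so the caret stays aligned.
        code = escape(source_lines[line - 1]).replace(" ", "&nbsp;")
        caret = "&nbsp;" * column + "<span style='color:red'>^</span>"
        parts.append(f"<blockquote class='syntax'>{code}<br/>{caret}</blockquote>")
    parts.append("</li>")
    return "".join(parts)
```

As in the ASP version, an error that points outside the validated file (for example into a DTD) is listed without a source excerpt.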
To validate a web page, pass its path to validate.asp in a parameter named url, e.g.:

  http://www.myweb.com/validate/validate.asp?url=/files_to_check/mypage.html
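If you generate such links from a script, the page path should be URL-encoded before it is put into the query string. A minimal Python sketch using the standard urllib.parse.urlencode; validation_url is a hypothetical helper, and the server name and script path are parameters because they depend on where validate.asp is installed.

```python
from urllib.parse import urlencode


def validation_url(server: str, script: str, page_path: str) -> str:
    """Build the URL asking validate.asp (at `script`) to check page_path.

    urlencode percent-encodes the path, so characters such as spaces,
    '&' or '#' in a file name cannot break the query string.
    """
    return f"http://{server}{script}?{urlencode({'url': page_path})}"


if __name__ == "__main__":
    # prints http://www.myweb.com/validate/validate.asp?url=%2Ffiles_to_check%2Fmypage.html
    print(validation_url("www.myweb.com", "/validate/validate.asp",
                         "/files_to_check/mypage.html"))
```

Request.QueryString("url") decodes the value again, so validate.asp receives the original path.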

I hope you find it useful.
