
Ryan McCormick

MS Word Document text to String with VBA

November 23, 2015 by Ryan

I once had a project where I had to work with a large directory of unsorted Word documents. The main directory had a deep tree of subdirectories containing multiple versions of the same document under the same file name. It was a mess.

My job was to pick out each unique document file name, keep only its most recently modified version, and parse each of those documents for a unique key ID using regular expressions. I won’t get into all of the details, but I found that the easiest way to run regex searches was to convert each document to raw text and work from there.
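The post doesn’t show how the newest copy of each file was picked out. One way to do it is a recursive walk with the Scripting.FileSystemObject, remembering the path with the latest DateLastModified for each file name in a Scripting.Dictionary. This is a minimal sketch, not the code from the original project: it assumes files sharing a name are copies of the same document, that only .doc/.docx files matter, and that "newest" means the latest last-modified date. The names getLatestDocs and collectLatest are hypothetical. Like the late-binding function further down, it needs no references.

```vba
'----------------------------------------------------
' Newest Copy of Each Document (Late Binding)
'----------------------------------------------------
' HYPOTHETICAL helpers, not the original project
' code. Assumes same file name = same document and
' only .doc/.docx files matter. Returns a Dictionary:
' key = file name, item = full path of newest copy.
'----------------------------------------------------
Function getLatestDocs(ByVal rootPath As String) As Object
    Dim oFso As Object
    Dim latestPath As Object
    Dim latestDate As Object

    Set oFso = CreateObject("Scripting.FileSystemObject")
    Set latestPath = CreateObject("Scripting.Dictionary")
    Set latestDate = CreateObject("Scripting.Dictionary")

    ' Windows file names are case-insensitive
    latestPath.CompareMode = vbTextCompare
    latestDate.CompareMode = vbTextCompare

    collectLatest oFso, oFso.GetFolder(rootPath), latestPath, latestDate

    Set getLatestDocs = latestPath
End Function

Private Sub collectLatest(ByVal oFso As Object, ByVal oFolder As Object, _
                          ByVal latestPath As Object, ByVal latestDate As Object)
    Dim oFile As Object
    Dim oSub As Object
    Dim ext As String

    For Each oFile In oFolder.Files
        ext = LCase$(oFso.GetExtensionName(oFile.Path))
        ' Skip Word's "~$" owner/lock files
        If (ext = "doc" Or ext = "docx") And Left$(oFile.Name, 2) <> "~$" Then
            If Not latestPath.Exists(oFile.Name) Then
                latestPath.Add oFile.Name, oFile.Path
                latestDate.Add oFile.Name, oFile.DateLastModified
            ElseIf oFile.DateLastModified > latestDate(oFile.Name) Then
                ' Found a newer copy: replace the stored one
                latestPath(oFile.Name) = oFile.Path
                latestDate(oFile.Name) = oFile.DateLastModified
            End If
        End If
    Next oFile

    ' Walk every subfolder, however deep
    For Each oSub In oFolder.SubFolders
        collectLatest oFso, oSub, latestPath, latestDate
    Next oSub
End Sub
```

Each item in the returned Dictionary is a full path that could be passed to getWordDocText. Keeping the dates in a second Dictionary avoids reopening each stored file just to compare timestamps.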

For this post, I built both an early-binding version (which needs a reference to the Word object library) and a late-binding version of the function I used to extract the text from each Word document.

NOTE: If you don’t need all of the document content, the function can instead pull a fragment between a start and an end character position. To use it, comment out the line “docContent = oWdoc.Content” and uncomment the line “docContent = oWdoc.Range(0, 500)”. Adjust 0 (start) and 500 (end) to your needs.

Extract Text From MS Word Document with VBA – Early Binding Example

'----------------------------------------------------
' Get Text From MS Word Document (Early Binding)
'----------------------------------------------------
' NOTE: To use this code, you must reference
' the Microsoft Word object library. In the VBA
' editor, click Tools > References and check:
' Microsoft Word 14.0 Object Library in Word 2010
' Microsoft Word 15.0 Object Library in Word 2013
' Microsoft Word 16.0 Object Library in Word 2016+
' Click OK
'----------------------------------------------------
Function getWordDocText(ByVal iFile As String) As String
    Dim oWord As Word.Application
    Dim oWdoc As Word.Document
    Dim docHeader As String
    Dim docFooter As String
    Dim docContent As String
        
    ' Initialize Word Objects
    '---------------------------------
    Set oWord = New Word.Application
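    ' Open read-only so the source file is never changed
    Set oWdoc = oWord.Documents.Open(FileName:=iFile, ReadOnly:=True, AddToRecentFiles:=False)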
    
    ' Get Content From Document
    '---------------------------------
    ' Get primary header of the first section
    docHeader = oWdoc.Sections(1).Headers(wdHeaderFooterPrimary).Range.Text
    
    ' Get primary footer of the first section
    docFooter = oWdoc.Sections(1).Footers(wdHeaderFooterPrimary).Range.Text
    
    ' Get document content
    docContent = oWdoc.Content
    '---------------------------------
    ' To return only the first 500
    ' characters of the main content,
    ' comment out the line above and
    ' uncomment the line below. Adjust
    ' the positions as needed:
    '---------------------------------
    'docContent = oWdoc.Range(0, 500)
    '---------------------------------
        
    ' Return Document Content
    '---------------------------------
    getWordDocText = docHeader & vbNewLine & docContent & vbNewLine & docFooter
    
    ' Clean Up: close without saving,
    ' then quit Word
    '---------------------------------
    oWdoc.Close SaveChanges:=wdDoNotSaveChanges
    oWord.Quit
    Set oWdoc = Nothing
    Set oWord = Nothing
End Function

Extract Text From MS Word Document with VBA – Late Binding Example

'----------------------------------------------------
' Get Text From MS Word Document (Late Binding)
'----------------------------------------------------
' NOTE: This is the late binding version of the
' Get Text From MS Word Document code. No reference
' to Microsoft Word XX.0 Object Library is needed.
' Both versions are named getWordDocText, so keep
' only one of them in a project.
'----------------------------------------------------
Function getWordDocText(ByVal iFile As String) As String
    Dim oWord As Object
    Dim oWdoc As Object
    Dim docHeader As String
    Dim docFooter As String
    Dim docContent As String
    
    ' Initialize Word Objects
    '---------------------------------
    Set oWord = CreateObject("Word.Application")
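    ' Open read-only so the source file is never changed
    Set oWdoc = oWord.Documents.Open(FileName:=iFile, ReadOnly:=True, AddToRecentFiles:=False)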
    
    ' Get Content From Document
    '---------------------------------
    ' Get primary header of the first section
    ' (1 = wdHeaderFooterPrimary)
    docHeader = oWdoc.Sections(1).Headers(1).Range.Text
    
    ' Get primary footer of the first section
    docFooter = oWdoc.Sections(1).Footers(1).Range.Text
    
    ' Get All Main Document Content
    docContent = oWdoc.Content
    '---------------------------------
    ' To return only the first 500
    ' characters of the main content,
    ' comment out the line above and
    ' uncomment the line below. Adjust
    ' the positions as needed:
    '---------------------------------
    'docContent = oWdoc.Range(0, 500)
    '---------------------------------
    
    ' Return Document Content
    '---------------------------------
    getWordDocText = docHeader & vbNewLine & docContent & vbNewLine & docFooter
    
    ' Clean Up: close without saving,
    ' then quit Word
    '---------------------------------
    oWdoc.Close SaveChanges:=0   ' 0 = wdDoNotSaveChanges
    oWord.Quit
    Set oWdoc = Nothing
    Set oWord = Nothing
End Function
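Once a document is a plain string, the regex step is ordinary string work. The key-ID format from the original project isn’t described, so this is only a minimal sketch: a helper that returns the first match of a pattern in the text, using the Windows-only VBScript.RegExp COM object through late binding (no reference needed). The function name ExtractKeyId and the example pattern KEY-\d{6} are hypothetical placeholders.

```vba
'----------------------------------------------------
' First Regex Match in a Block of Text (Late Binding)
'----------------------------------------------------
' HYPOTHETICAL helper: the name and any pattern you
' pass in are illustrations, not the original code.
' Returns "" when nothing matches.
'----------------------------------------------------
Function ExtractKeyId(ByVal docText As String, ByVal keyPattern As String) As String
    Dim oRegex As Object
    Dim oMatches As Object

    Set oRegex = CreateObject("VBScript.RegExp")
    oRegex.Pattern = keyPattern
    oRegex.Global = False       ' stop at the first match
    oRegex.IgnoreCase = True

    Set oMatches = oRegex.Execute(docText)
    If oMatches.Count > 0 Then
        ExtractKeyId = oMatches.Item(0).Value
    Else
        ExtractKeyId = ""       ' no key ID found
    End If

    Set oMatches = Nothing
    Set oRegex = Nothing
End Function
```

For example, Debug.Print ExtractKeyId(getWordDocText("C:\Docs\report.docx"), "KEY-\d{6}") would print the first matching ID, or an empty line if there is none; the path is a placeholder. Because the header, body and footer are joined into one string, the search covers all three.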

As always, please comment with questions, issues, etc…

Filed Under: Microsoft Access, Microsoft Excel, Microsoft Word, VBA Tagged With: extract text, ms word document, vba

