Reading Word documents in PHP: How to Make Microsoft Word Documents with PHP

Date: 2021-10-08 16:46:39

As I pointed out in my previous article, "PHP and WMI – Dig deep into Windows with PHP", we PHP developers have to deal with the Windows operating system from time to time. WMI (Windows Management Instrumentation) is one such occasion, and Microsoft Office Interop is another, one that is even more important and more frequently used.

In this article, we will see a simple integration between Word and PHP: to generate a Microsoft Word document based on the inputs in an HTML form using PHP (and its Interop extension).

Preparations

First, please make sure a typical WAMP environment is set up on your Windows development machine. As Interop is purely a Windows feature, we have to host Apache and PHP under Windows. In this instance, I am using EasyPHP 14.1, which is quite easy to install and configure.

Next, we have to install Microsoft Office. The exact version is not that critical: I am using Office Pro, but other reasonably recent Office versions should work as well.

We then have to make sure that the libraries needed to develop an Interop application (called PIAs, Primary Interop Assemblies) are installed. To check, we can open Windows Explorer and navigate to <Windows Directory>\assembly, where the installed PIAs are listed.

Among them is a Microsoft.Office.Interop.Word entry (underlined in the screenshot). This is the PIA we use in this demo. Please pay special attention to its “Assembly Name”, “Version” and “Public Key Token”, as we will use them in our PHP script very soon.

In this directory, we can also see the other PIAs (covering the whole Office family) that are available for programming, not only from PHP but also from C# and other languages.

If the PIA list does not include the whole Microsoft.Office.Interop package, we either have to re-install Office with the PIA features included, or manually download the package from Microsoft and install it. Please consult the relevant MSDN page for detailed instructions.

NOTE: Only the Microsoft Office PIA Redistributable is available to download and install. The PIA version in this package is 14.0.0. Version 15 only comes with an Office installation.

Finally, we have to enable the PHP extension php_com_dotnet.dll in the php.ini file and restart the server.
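
Before going further, it can help to confirm that the extension really is loaded after the restart. The snippet below is a minimal sketch, assuming PHP 7 or later for the return type declaration; the function name interopAvailable is made up for illustration. The php.ini line it mentions is the one named above, extension=php_com_dotnet.dll.

```php
<?php
// Reports whether PHP's COM/.NET support is available in this runtime.
// interopAvailable() is a hypothetical helper name, not part of any library.
function interopAvailable(): bool
{
    return extension_loaded('com_dotnet') && class_exists('DOTNET');
}

echo interopAvailable()
    ? "com_dotnet is loaded.\n"
    : "com_dotnet is missing: add extension=php_com_dotnet.dll to php.ini and restart the server.\n";
```

On a non-Windows machine this is expected to print the second message, since the extension only exists on Windows.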

Now we can move on to the programming.

The HTML form

As the focus of this demo is on the back end processing, we will create a simple front end with a simple HTML form, which looks like the figure below:

We have a text field for “Name”, a radio button group for “Gender”, a range control for “Age” and a text area for “Message”; and finally, of course, a “Submit” button.

Save this file as “index.html” in a directory under the virtual host’s root directory so that we can access it with a URI like http://test/test/interop.
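
The form markup itself is not reproduced in this text, so here is a rough sketch of what it might look like. It is written as a PHP function only to stay in one language; the same markup could be saved as a static index.html. The field names (name, gender, age, msg) and the gender value m match what the back end reads below, while the action script name word.php, the value f, and the age range limits are assumptions.

```php
<?php
// Sketch of the form described above. $action is a hypothetical back-end
// script name; the field names match the keys the back end reads from $_POST.
function renderForm(string $action): string
{
    $action = htmlspecialchars($action, ENT_QUOTES);
    return <<<HTML
<form method="post" action="{$action}">
  <label>Name <input type="text" name="name"></label>
  <label><input type="radio" name="gender" value="m"> Male</label>
  <label><input type="radio" name="gender" value="f"> Female</label>
  <label>Age <input type="range" name="age" min="1" max="100"></label>
  <label>Message <textarea name="msg"></textarea></label>
  <button type="submit">Submit</button>
</form>
HTML;
}

echo renderForm('word.php');
```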

The back end

The back end PHP file is the focus of our discussion. I will first list the code of this file, and then explain it step by step.

<?php
// Values posted from the HTML form.
$inputs = $_POST;
// A dummy value to avoid a PHP notice, as we don't have "printdate" in the POST variables.
$inputs['printdate'] = '';

// Full signature of the Word PIA and the class to instantiate.
$assembly = 'Microsoft.Office.Interop.Word, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c';
$class = 'Microsoft.Office.Interop.Word.ApplicationClass';

$w = new DOTNET($assembly, $class);
$w->visible = true;

// Open the template that sits next to this script.
$fn = __DIR__ . '\\template.docx';
$d = $w->Documents->Open($fn);
echo "Document opened.<br><hr>";

$flds = $d->Fields;
$count = $flds->Count;
echo "There are $count fields in this document.<br>";
echo "<ul>";

$mapping = setupfields();

// Select each field in turn and type the matching form value over it.
foreach ($flds as $index => $f) {
    $f->Select();
    $key = $mapping[$index];
    $value = $inputs[$key];
    if ($key == 'gender') {
        if ($value == 'm')
            $value = 'Mr.';
        else
            $value = 'Ms.';
    }
    if ($key == 'printdate')
        $value = date('Y-m-d H:i:s');
    $w->Selection->TypeText($value);
    echo "<li>Mapping field $index: $key with value $value</li>";
}

echo "</ul>";
echo "Mapping done!<br><hr>";
echo "Printing. Please wait...<br>";

$d->PrintOut();
sleep(3); // Give the print job time to spool before Word quits.
echo "Done!";

$w->Quit(false); // false: do not save changes to the template.
$w = null;

// Manual mapping between field index and form field name.
function setupfields()
{
    $mapping = array();
    $mapping[0] = 'gender';
    $mapping[1] = 'name';
    $mapping[2] = 'age';
    $mapping[3] = 'msg';
    $mapping[4] = 'printdate';
    return $mapping;
}

After setting up the $inputs variable to hold the values posted from our form, and creating a dummy value for printdate (we will discuss why we need this later), we come across these four critical lines:

$assembly = 'Microsoft.Office.Interop.Word, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c';
$class = 'Microsoft.Office.Interop.Word.ApplicationClass';
$w = new DOTNET($assembly, $class);
$w->visible = true;

A COM manipulation in PHP requires the instantiation of a “class” within an “assembly”. In our case, we are operating on Word. Looking back at the first screenshot, we can construct the full signature of the Word PIA:

“Name”, “Version” and “Public Key Token” are all taken from the information displayed when we browse to “c:\Windows\assembly”.

“Culture” is always neutral.

The class to invoke is always the assembly’s name plus “.ApplicationClass”.

With these two parameters set, we can instantiate a Word object.
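
To make the composition of the two parameters explicit, here is a small sketch that assembles them from their parts. The helper name assemblySignature is hypothetical; the name, version and token values are the ones used in this article and will differ on a machine with another PIA version. It assumes PHP 7 or later for the type declarations.

```php
<?php
// Builds the assembly signature string passed as the first DOTNET argument.
// assemblySignature() is a hypothetical helper, not part of PHP.
function assemblySignature(string $name, string $version, string $token): string
{
    return sprintf(
        '%s, Version=%s, Culture=neutral, PublicKeyToken=%s',
        $name,
        $version,
        $token
    );
}

$name     = 'Microsoft.Office.Interop.Word';
$assembly = assemblySignature($name, '15.0.0.0', '71e9bce111e9429c');
$class    = $name . '.ApplicationClass';

// On a Windows host with Word and the PIA installed, one would then call:
// $w = new DOTNET($assembly, $class);
```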

This object can stay in the background, or we can bring it to the foreground by setting its visible attribute to true.

Next, we open the document to be processed and assign the “document” instance to the $d variable.

In that document, to create content based on the inputs from the HTML form, we have a few options.

The least favorable way is to hard code all the contents in PHP and then output them into the Word document. I strongly discourage this for the following reasons:

There will be no flexibility. Any change in the output will require modification of the PHP script.

It violates the separation between control and presentation.

It will drastically increase the lines of code if we are to apply styles to the document contents (alignment, font, style, etc). Programmatically changing styles is too cumbersome.

Another way is to do a “search-replace”. PHP has strong built-in capabilities for this. We can create a Word document and put special delimiters around the placeholder contents that are to be replaced. For example, we can create a document containing something like this:

{{name}}

and in PHP, we can simply replace this with the “Name” value we retrieved from the form submission.

This is straightforward and avoids all the disadvantages of the first option. We just need to find the right delimiter. In effect, we are doing template rendering, except that the template is now a Word document.
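
The replacement step of this second option can be sketched on a plain string. This is only an illustration of the idea: renderTemplate is a hypothetical helper, and getting the text out of and back into an actual Word document is a separate problem that is not shown here.

```php
<?php
// Replaces {{key}} placeholders in a plain-text template with form values.
// renderTemplate() is a hypothetical helper operating on strings only.
function renderTemplate(string $template, array $values): string
{
    $pairs = [];
    foreach ($values as $key => $value) {
        $pairs['{{' . $key . '}}'] = (string) $value;
    }
    // strtr() replaces all keys in a single pass over the template.
    return strtr($template, $pairs);
}

echo renderTemplate(
    'Dear {{name}}, you are {{age}}.',
    ['name' => 'Alice', 'age' => 30]
);
// prints "Dear Alice, you are 30."
```

Using strtr() rather than repeated str_replace() calls means a value that itself contains a delimiter is not substituted a second time.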

The third option is my recommendation and is an advanced topic in Word. We will use fields to represent the placeholders, and in our PHP code, we will directly update the fields with respective form values.

This approach is flexible, fast and conforms with Word’s best practices. It also avoids full text search in the documents, which helps performance. Note that this option has its drawbacks too.

Since its debut, Word has never supported named indexes for fields. Even though we gave names to the fields we created in the Word document, we still have to use numeric subscripts to access each one. This is why we need a dedicated function (setupfields) to map field indexes manually to the names of the form fields.
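
As an aside that is not part of the approach used in this article: for MERGEFIELD fields, the field code text carries the field name (Word exposes it through Field.Code.Text), so the name could in principle be parsed out instead of being mapped by position. The helper below is a hypothetical sketch that works on that text as a plain string; the sample strings are assumptions about what Word returns for a typical merge field.

```php
<?php
// Extracts the merge field name from a field code string such as
// ' MERGEFIELD  name  \* MERGEFORMAT '. Returns null for other field types.
// mergeFieldName() is a hypothetical helper, not part of Word or PHP.
function mergeFieldName(string $codeText): ?string
{
    // The name is either quoted (it may contain spaces) or a single token.
    if (!preg_match('/^\s*MERGEFIELD\s+(?:"([^"]+)"|(\S+))/i', $codeText, $m)) {
        return null;
    }
    return $m[1] !== '' ? $m[1] : $m[2];
}

var_dump(mergeFieldName(' MERGEFIELD  name  \* MERGEFORMAT '));
```

Inside the field loop, this would be called with the field's code text, for example mergeFieldName($f->Code->Text), and the result used directly as the key into $inputs.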

To learn how to insert fields in a Word document, please consult the relevant Word help topics and manuals (a ready-made version is also available). For this demo, we have a document with 5 MERGEFIELD fields. We placed the document in the same directory as the PHP script for easy access.

Please note that the field printdate does not have a corresponding form field. That is why we added a dummy printdate key to the $inputs array. Without it, the script still runs, but PHP raises a notice saying that the index printdate is not defined in the $inputs array.
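
An alternative to the dummy key is to fall back to an empty string when a key is missing. The sketch below assumes PHP 7 or later (for the ?? operator and type declarations); fieldText is a hypothetical helper that bundles the per-field rules from the script above, and the sample input values are made up.

```php
<?php
// Turns one form key into the text typed into the corresponding field.
// fieldText() is a hypothetical helper; the rules mirror the script above.
function fieldText(string $key, array $inputs): string
{
    // ?? yields '' for a missing key, so no notice is raised for printdate.
    $value = (string) ($inputs[$key] ?? '');
    if ($key === 'gender') {
        return $value === 'm' ? 'Mr.' : 'Ms.';
    }
    if ($key === 'printdate') {
        return date('Y-m-d H:i:s');
    }
    return $value;
}

// Positional mapping, as in setupfields(): field index => form key.
$mapping = ['gender', 'name', 'age', 'msg', 'printdate'];
// Sample values standing in for $_POST.
$inputs  = ['name' => 'Alice', 'gender' => 'f', 'age' => '30', 'msg' => 'Hello'];

$texts = [];
foreach ($mapping as $index => $key) {
    $texts[$index] = fieldText($key, $inputs);
}
```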

After updating the fields with form values, we will print the document using:

$d->PrintOut();

The PrintOut method has a few optional parameters, and we are using its simplest form. This will print one copy to the default printer connected to our Windows machine.

We can also choose to use PrintPreview to take a look at the output before we decide to print the document. In a purely automated environment, we will of course use PrintOut instead.

We have to wait a few seconds before we quit the Word application because the print job needs some time to be fully spooled. Without sleep(3), $w->Quit gets executed immediately and the print job gets killed too.
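
A fixed three-second pause is a guess about how long spooling takes. As a possible refinement that this article does not use, Word's Application object has a BackgroundPrintingStatus property that reports the number of queued background print jobs, so the script could poll it instead. The sketch below assumes PHP 7.2 or later; waitForSpooling is a hypothetical helper, and the timeout and polling interval are arbitrary choices.

```php
<?php
// Polls until Word reports no background print jobs, or the timeout expires.
// $app is expected to expose Word's Application.BackgroundPrintingStatus.
// waitForSpooling() is a hypothetical helper, not part of PHP or Word.
function waitForSpooling(object $app, int $timeoutSeconds = 30): bool
{
    $deadline = time() + $timeoutSeconds;
    while ((int) $app->BackgroundPrintingStatus > 0) {
        if (time() >= $deadline) {
            return false; // still spooling when we gave up
        }
        usleep(250000); // check four times per second
    }
    return true;
}
```

In the script above, a call such as waitForSpooling($w) would take the place of sleep(3) between $d->PrintOut() and $w->Quit(false).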

Finally, we call $w->Quit(false) to close the Word application invoked by our PHP script. The only parameter provided here specifies whether we want to save changes before quitting. We did make changes to the document, but we do not want to save them because we want to keep a clean template for other users’ input.
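
If an error occurs between opening Word and reaching Quit, the Word process could be left running on the server. One way to guard against that, sketched here under the assumption of PHP 7 or later, is to wrap the work in try/finally. The helper withWord and its $factory callable are hypothetical names introduced for this illustration.

```php
<?php
// Runs $work against a Word application object and always quits afterwards.
// $factory is a hypothetical callable that returns the application object.
function withWord(callable $factory, callable $work)
{
    $word = $factory();
    try {
        return $work($word);
    } finally {
        $word->Quit(false); // false: discard changes, keep the template clean
    }
}

// Possible usage on a Windows host with Word installed:
// withWord(
//     function () use ($assembly, $class) { return new DOTNET($assembly, $class); },
//     function ($w) { /* open the template, fill the fields, print */ }
// );
```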

After we complete the code, we can load the form page, input some values and submit the form. The images below show the output of the PHP script and the updated Word document:

Improving the coding speed and understanding more about PIA

PHP is a weakly typed language. A COM object is of type Object. While coding in PHP, there is no way to get meaningful code insight out of an object, be it a Word Application, a Document, or a Field. We don’t know what properties it has or what methods it supports.

This greatly slows down development. To make it faster, I would recommend developing the functions in C# first and then migrating the code back to PHP. A free C# IDE I would recommend is called “#develop”. I prefer it to the VS series because #develop is smaller, cleaner, and faster.

The migration of C# code to PHP is not scary at all. Let me show you some lines of C# code:

Word.Application w = new Word.Application();
w.Visible = true;
String path = Application.StartupPath + "\\template.docx";
Word.Document d = w.Documents.Open(path) as Word.Document;
Word.Fields flds = d.Fields;
int len = flds.Count;
foreach (Word.Field f in flds)
{
    f.Select();
    int i = f.Index;
    w.Selection.TypeText("...");
}

We can see that the C# code is almost identical to the PHP code we showed previously. C# is strongly typed, so we see a few type casting statements and we have to explicitly give our variables a type.

With variable types given, we can enjoy code insight and code completion, so development is much faster.

Another way to speed up our PHP development is to tap into Word macros. We perform the actions we need and record them with a macro. The macro is in Visual Basic, which can also be easily transformed to PHP.

Most importantly, Microsoft’s official documentation on Office PIA, especially the namespace documentation for each Office application, is always the most detailed reference material. The three most commonly used applications are:

Excel: /en-us/library/microsoft.office.interop.excel(v=office.15).aspx

Word: /en-us/library/microsoft.office.interop.word(v=office.15).aspx

PowerPoint: /en-us/library/microsoft.office.interop.powerpoint(v=office.15).aspx

Conclusion

In this article, we demonstrated how to populate a Word document using PHP COM libraries and Microsoft Office Interop capabilities.

Windows and Office are widely used in everyday life. Knowing the power of both Office/Windows and PHP is essential for any PHP + Windows programmer.

With PHP’s COM extension, the door to mastering this combination is opened.

If you are interested in this area of programming, please leave your comments and we will consider having more articles on this topic. I look forward to seeing more real world applications developed using this approach.

Translated from: /make-microsoft-word-documents-php/
