Through a Scanner Darkly:

A Personal Interim Report on the PACLIC Proceedings Digital Archive Project

At a PACLIC Steering Committee meeting held on December 8, 2004, during the lunch break on the first day of the PACLIC 18 conference at Waseda University in Tokyo, it was agreed that a digital archive of past PACLIC proceedings volumes would be desirable and in order. For some unknown reason, the Committee designated Prof. Yasunari Harada as Digital Archivist in charge of the project, most probably in a moment of absent-mindedness. He did not object at the time. He was quite unaware of the scope and volume of the project, then and for some time afterwards, because his hands and mind were more than fully occupied with the immediate tasks at hand: running PACLIC 18 and its two satellite workshops, and submitting financial and other reports to the university on the expenditures and achievements of the conference, as required by the funding he had secured for it. Later, he came to understand that he might have committed himself to a task he had no hope of fulfilling. He had none of the necessary equipment, no funding and no staff, and, worst of all, he had only three past PACLIC volumes at hand.

Support and help arrived soon. Prof. Akira Ikeya, now Honorary President of the Logico-Linguistic Society of Japan, supplied most of the past PACLIC proceedings volumes. Just as the materials arrived at his office, sometime in early 2005, however, Prof. Harada had to move his office to a newly completed school building that had been under construction for the previous three years. He discarded most of the books, documents and vintage PC equipment he had kept in his old office, packed the books, documents and storage media he could not bring himself to discard into 72 cardboard boxes, and everything naturally got mixed up. Almost a year later, shortly before another PACLIC Committee meeting was to be held at Academia Sinica in Taipei in December 2005, he was asked to report on the progress of the project, and his answer was very short and to the point: nil. Additional support and help arrived gradually. Other sources sent him additional copies or volumes of past PACLIC proceedings. His grant applications for various projects were accepted, which enabled him to buy scanners and OCR software. Waseda University Library informally agreed to put the materials on its institutional repository, D Space. The Research Group on Language and Information Sciences of the Media Network Center at Waseda University agreed to support the project, which is not very surprising, since the only active member of the group was Prof. Harada himself. Some of his former students agreed to work on the project "in their spare time".

The scanning itself was not a big deal once a volume had been destructively unbound into separate, consecutive sheets of paper: 100 sheets could be scanned in less than half an hour. The unbinding took three seconds once the volume was properly placed under the electric cutting machine located in the Waseda University Legal Documentation and Information Center, a branch of Waseda University Library three minutes away from his office. However, with nine classes per week and an average of four research projects to oversee each semester, Prof. Harada found it difficult to make time to actually unbind the past PACLIC proceedings volumes. Another difficulty was finding the right scanner configuration, and figuring out how to work with the OCR software was yet another. Most of the work can be done almost automatically, once you know how to set things up. Mr. Satoshi Ando, his former student at the undergraduate School of Law, was instrumental in the scanning and OCR process. In fact, Prof. Harada knows nothing about the proper settings, and he is wondering how he will process PACLIC 16 and PACLIC 17 after Mr. Ando graduates from the School of Law. Mr. Ando assures him that he will continue to help, even while traveling in South America and even after he starts his job.

So far, things were manageable. You can count the number of volumes, and if you have a volume, you can probably count the number of papers and pages. Producing metadata for each of those papers is a completely different ball game. You can spend your lifetime editing and amending one collection of papers. You can spend a thousand lifetimes when you are dealing with 15 volumes, each compiled in a different region by different editors from different organizations. Capitalization conventions for paper titles fluctuate even within a single volume, and it is difficult to tell which part of an author's name is the family name and which part is the given name. Table of contents entries do not always match the actual titles of the papers. Sometimes it is not even clear what the title of a given proceedings volume is. Ms. Mayumi Kawamura, who had completed her graduation thesis at the School of Letters under his guidance a few years earlier, took on all of this seemingly unresolvable chaos. Without her devotion and diligence, we would not have our PACLIC D Space Collection today.

After all those efforts, Prof. Harada found himself buried in piles of unbound sheets of paper with dangling cover pages. He had promised Prof. Ikeya and Dr. Choe, who had sent him the volumes, to rebind the material after scanning. He had intended to send the materials to local binders, who can be found in abundance around the university campus, but their work often turned out to be less than professional, so he delegated the rebinding to Ms. Kanako Maebo, another of his former students, from the Graduate School of Applied Japanese Linguistics. With a hot-melt binding machine installed in the Waseda University Legal Documentation and Information Center and a handful of everyday supplies, she was able to restore the unbound volumes to something resembling the originals.

The story does not end here. Amid the confusion and disorder inherent in his life at the university, Prof. Harada had forgotten about PACLIC 16 and PACLIC 17 until all the other volumes had been processed. He can unbind those volumes, but he does not know how to scan them. At the same time, he has begun to suspect that the organizers of recent PACLIC conferences expect him to do the same job for their conferences. He has tried to point out that the project was originally intended to be purely retrospective, but he is not yet sure they understand him.