Copy PDF metadata (title, year, author, etc.) into Excel cells

  • I know, and I am sure about that, but I posted it in case someone, like us Syrians, is blocked from even some science sites because of world politics.


    Adobe is forced to do that and loses some product sales, just as we are forced to do this even if it threatens our privacy and security.

    In the end, we are stuck living in 2011 while the world is trying everything new, and I will never accept that, whatever the way.

    Before 2011 we were able to buy all products; now bad people in the world are trying to use us against the regime by blocking everything they can.


    Thanks again and again, sir. I wish you all the best.

  • For what it is worth, here is the code from link #4, but with early binding.


  • The metadata of a PDF is shown in Acrobat under File > Properties > Description.

    All the metadata stored there is accessible with the previous codes.


    But some PDFs have something called XMP, which is similar to XML.
    I can see the XMP info in Acrobat under File > Properties > Description > Additional Metadata > Advanced.


    Sometimes the data is stored only in the XMP and not in File > Properties > Description; in that case the codes return nothing.


    Have you dealt with this kind of XMP before?


    I can extract the XMP file using Acrobat, and I found it is similar to XML.

    This is the XMP file:


    <?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
    <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.6-c017 91.164464, 2020/06/15-10:20:05 ">
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <rdf:Description rdf:about="doi:10.1016/j.progpolymsci.2010.01.006"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:prism="http://prismstandard.org/namespaces/basic/2.0/"
            xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"
            xmlns:xmp="http://ns.adobe.com/xap/1.0/"
            xmlns:xmpRights="http://ns.adobe.com/xap/1.0/rights/"
            xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
            xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
            xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/">
          <dc:format>application/pdf</dc:format>
          <dc:identifier>10.1016/j.progpolymsci.2010.01.006</dc:identifier>
          <dc:title>
            <rdf:Alt>
              <rdf:li xml:lang="x-default">Polymeric materials for bone and cartilage repair</rdf:li>
            </rdf:Alt>
          </dc:title>
          <dc:creator>
            <rdf:Seq>
              <rdf:li>D. Puppi; F. Chiellini; A.M. Piras; E. Chiellini</rdf:li>
            </rdf:Seq>
          </dc:creator>
          <dc:publisher>
            <rdf:Bag>
              <rdf:li>Elsevier Ltd</rdf:li>
            </rdf:Bag>
          </dc:publisher>
          <dc:subject>
            <rdf:Bag>
              <rdf:li>Biodegradable polymers</rdf:li>
              <rdf:li>Tissue engineering</rdf:li>
              <rdf:li>Polymeric scaffolds</rdf:li>
              <rdf:li>Bone</rdf:li>
              <rdf:li>Cartilage</rdf:li>
            </rdf:Bag>
          </dc:subject>
          <prism:aggregationType>journal</prism:aggregationType>
          <prism:publicationName>Progress in Polymer Science</prism:publicationName>
          <prism:copyright>© 2010 Elsevier Ltd. All rights reserved.</prism:copyright>
          <prism:issn>0079-6700</prism:issn>
          <prism:volume>35</prism:volume>
          <prism:number>4</prism:number>
          <prism:coverDisplayDate>April 2010</prism:coverDisplayDate>
          <prism:coverDate>2010-04</prism:coverDate>
          <prism:pageRange>403-440</prism:pageRange>
          <prism:startingPage>403</prism:startingPage>
          <prism:endingPage>440</prism:endingPage>
          <prism:doi>10.1016/j.progpolymsci.2010.01.006</prism:doi>
          <prism:url>http://dx.doi.org/10.1016/j.progpolymsci.2010.01.006</prism:url>
          <pdfaid:part>1</pdfaid:part>
          <pdfaid:conformance>B</pdfaid:conformance>
          <xmp:CreatorTool>Elsevier</xmp:CreatorTool>
          <xmp:ModifyDate>2010-03-04T18:53:25+05:30</xmp:ModifyDate>
          <xmpRights:Marked>True</xmpRights:Marked>
          <pdf:Producer>Acrobat Distiller 8.1.0 (Windows)</pdf:Producer>
          <pdf:Keywords>Biodegradable polymers; Tissue engineering; Polymeric scaffolds; Bone; Cartilage</pdf:Keywords>
          <xmpMM:DocumentID>uuid:480fb66d-2c5d-432d-84db-b61fadfc99bf</xmpMM:DocumentID>
          <xmpMM:InstanceID>uuid:f37f0421-2c40-4a64-88bc-b7974bc6200f</xmpMM:InstanceID>
          <pdfx:AuthoritativeDomain_x005B_1_x005D_>elsevier.com</pdfx:AuthoritativeDomain_x005B_1_x005D_>
          <pdfx:AuthoritativeDomain_x005B_2_x005D_>sciencedirect.com</pdfx:AuthoritativeDomain_x005B_2_x005D_>
          <pdfx:AuthoritativeDomain>
            <rdf:Bag>
              <rdf:li>elsevier.com</rdf:li>
              <rdf:li>sciencedirect.com</rdf:li>
            </rdf:Bag>
          </pdfx:AuthoritativeDomain>
        </rdf:Description>
      </rdf:RDF>
    </x:xmpmeta>
    <?xpacket end="w"?>
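    As a minimal sketch of reading a packet like this once it is available as text, here is a Python version (the thread works in VBA, so treat it as an illustration of the lookups rather than drop-in code). It uses the standard library's `xml.etree.ElementTree` with the namespace URIs declared in the packet. The helper name `read_citation` and the use of `prism:coverDate` for the year are assumptions that fit this Elsevier packet but may not fit other publishers.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared on the rdf:Description element of the packet.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "prism": "http://prismstandard.org/namespaces/basic/2.0/",
}


def read_citation(packet: str) -> dict:
    """Pull a few citation fields out of an Elsevier-style XMP packet."""
    # The <?xpacket ...?> lines are processing instructions; ElementTree
    # skips them and returns the x:xmpmeta element as the root.
    root = ET.fromstring(packet.strip())

    def text(path: str) -> str:
        node = root.find(path, NS)
        return (node.text or "").strip() if node is not None else ""

    # dc:creator is an rdf:Seq; this PDF packs all authors into one rdf:li.
    authors = [(li.text or "").strip()
               for li in root.findall(".//dc:creator/rdf:Seq/rdf:li", NS)]
    return {
        "Title": text(".//dc:title/rdf:Alt/rdf:li"),
        "Author": "; ".join(authors),
        "Journal": text(".//prism:publicationName"),
        "Year": text(".//prism:coverDate")[:4],  # "2010-04" -> "2010"
        "Volume": text(".//prism:volume"),
        "Issue": text(".//prism:number"),
        "Pages": text(".//prism:pageRange"),
    }
```

    Fields that are missing from a given packet come back as empty strings, so each result can still be written as one row of Excel cells.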



  • I know reading the data could be easy if the XMP file were separate from the PDF file, but in this case I can only view the data from Acrobat.


    What I am missing is the code that reaches the hidden XMP metadata.


    What do you think?
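    One way to reach the embedded XMP without Acrobat is to scan the raw PDF bytes for the `<?xpacket begin=` and `<?xpacket end=` markers, since the packet wrapper exists so that tools can find metadata by scanning a file. A minimal Python sketch, assuming the metadata stream is stored uncompressed; if it is Flate-compressed, the markers never appear in the raw bytes and this finds nothing. `find_xmp_packets` is a hypothetical helper name.

```python
import re

# Non-greedy match from a packet header to the nearest packet trailer.
XPACKET_RE = re.compile(rb"<\?xpacket begin=.*?<\?xpacket end=.*?\?>", re.DOTALL)


def find_xmp_packets(pdf_bytes: bytes) -> list[str]:
    """Return every XMP packet found in the raw bytes of a PDF, in file order."""
    # Assumption: the packet text is UTF-8 (the usual encoding for XMP).
    return [m.group(0).decode("utf-8", errors="replace")
            for m in XPACKET_RE.finditer(pdf_bytes)]
```

    For example, `packets = find_xmp_packets(open(r"e:\nn\puppi2010.pdf", "rb").read())`. A PDF can contain more than one packet (for example, XMP attached to embedded images), so check which one holds the document's `dc:title` before parsing it.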

  • In this link https://www.example-code.com/vb6/xmp_bag_seq_alt.asp

    they use Chilkat ActiveX, but for a JPG image.

    I installed Chilkat.

    I tried the code on a PDF, but got the error: Unrecognized file type


    Code




    Result in the Immediate window:

    ChilkatLog:
      LoadAppFile:
        DllDate: Feb 8 2021
        ChilkatVersion: 9.5.0.86
        UnlockPrefix: NONE
        Architecture: Little Endian; 32-bit
        Language: ActiveX
        VerboseLogging: 0
        path: e:\nn\puppi2010.pdf
        Auto-unlocking for trial mode...
        unlockCode: Auto unlock for 30-day trial
        regKeyUnlock:
          product: ChilkatBundle
          hcCurDate: Fri, 14 May 2021 15:08:23 +0300
          hcExpire: 5/2021
          Component successfully unlocked using trial key
        --regKeyUnlock
        xmpLoadFile:
          Unrecognized file type
          filename: e:\nn\puppi2010.pdf
        --xmpLoadFile
        Failed.
      --LoadAppFile
    --ChilkatLog

  • As I thought, it looks too involved in VBA for me. One would need to set Search.docXMP=True. I don't know how to set up the search object or how to query it to get what you want. The API reference shows the JavaScript code for working with the 3 copyright metadata fields.


    This thread has gotten so long that I forgot whether I discussed exiftool. It does not work with all applications, but its PDF support has the most features. I have used Shell() to run exiftool.exe with command-line switches. If that interests you, see: exiftool.org


    Here is an example that strips "all" metadata from a PDF file using Acrobat and exiftool.

  • Hello sir, and thanks again.

    I tried the code and it showed an error on the Shell line; anyway, the code led me to another tool, iTextSharp.

    And in this link https://stackoverflow.com/ques…ng-any-data-to-user-on-pd

    they used C#, I think:

    Code
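    using System.Text;
    using iTextSharp.text.pdf;

    public string ReadXmpMetadata(byte[] src) {
        PdfReader reader = new PdfReader(src);
        byte[] b = reader.Metadata;  // raw XMP packet bytes, or null if the PDF has none
        reader.Close();
        return b == null ? string.Empty : Encoding.UTF8.GetString(b, 0, b.Length);
    }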

    I am trying to install iTextSharp, but the file the site installs is not a DLL, an OCX, or any other type that can be added to References.

    I searched for it as a DLL, but after adding it the program stopped responding.


    I will try it; if it reads all the metadata, maybe I can store it and search it for the info I need:

    Year

    Page range

    Volume

    Issue

    Title

    Journal

    Author
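    Because different publishers put these seven values under different XMP property names, one approach is a lookup table that tries several candidate names per column, in order. A Python sketch, assuming the properties have already been collected into a dictionary keyed like `prism:volume`. The candidate lists are assumptions drawn from the Elsevier packet posted earlier plus common Dublin Core, PRISM, and XMP basic names, and will likely need extending; `CANDIDATES` and `pick_fields` are hypothetical names.

```python
# Hypothetical lookup table: for each wanted column, the XMP properties to try
# in order. Other publishers may use names that are not listed here.
CANDIDATES = {
    "Year": ["prism:coverDate", "prism:publicationDate", "dc:date", "xmp:CreateDate"],
    "Page range": ["prism:pageRange"],
    "Volume": ["prism:volume"],
    "Issue": ["prism:number", "prism:issueIdentifier"],
    "Title": ["dc:title"],
    "Journal": ["prism:publicationName"],
    "Author": ["dc:creator"],
}


def pick_fields(props: dict[str, str]) -> dict[str, str]:
    """Map XMP properties (keyed like 'prism:volume') onto the wanted columns."""
    row = {}
    for column, keys in CANDIDATES.items():
        # First candidate that exists and is non-empty wins.
        value = next((props[k] for k in keys if props.get(k)), "")
        if column == "Year":
            value = value[:4]  # dates look like 2010-04 or 2010-03-04T18:53:25+05:30
        row[column] = value
    # Build start-end pages when only prism:startingPage/endingPage are present.
    if not row["Page range"] and props.get("prism:startingPage"):
        end = props.get("prism:endingPage", "")
        row["Page range"] = props["prism:startingPage"] + ("-" + end if end else "")
    return row
```

    Keeping the names in a table rather than in code means a new publisher's naming only needs a new entry in `CANDIDATES`.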

  • If you attach a pdf file, and tell me the metadata fieldnames, I can better help.


    For the standard Acrobat GetInfo() fields, that is easy as you saw.


    For the XMP metadata fields, even iTextSharp does not return individual fields. It looks like it returns all the XMP metadata as a single string, so I suspect you would have to parse it out too. If you can generate the XMP file data as you did, parsing it might be doable. If that interests you, we can pursue that.
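    Parsing that whole XMP string into individual fields could look like the following Python sketch. It flattens every property under each `rdf:Description` into a `prefix:name` to text dictionary (array items joined with `; `), which is easy to store as one Excel row or search later. It is a sketch under assumptions: structured properties (for example `xmpMM:History`) are not handled properly, and `flatten_xmp` is a hypothetical name.

```python
import io
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def flatten_xmp(packet: str) -> dict[str, str]:
    """Flatten the properties under each rdf:Description into {'prefix:name': text}."""
    text = packet.strip()
    # Remember the prefix each namespace URI was declared with, so keys read
    # like 'prism:volume' rather than '{http://...}volume'.
    prefixes = {}
    for _event, (prefix, uri) in ET.iterparse(io.StringIO(text), events=("start-ns",)):
        prefixes.setdefault(uri, prefix)

    def key(name: str) -> str:
        if not name.startswith("{"):
            return name
        uri, local = name[1:].split("}", 1)
        return f"{prefixes.get(uri, uri)}:{local}"

    props = {}
    root = ET.fromstring(text)
    for rdf_node in root.iter(RDF + "RDF"):  # iter() also checks the root itself
        for desc in rdf_node.findall(RDF + "Description"):
            # Simple properties may be written as attributes; skip rdf:about.
            for attr, value in desc.attrib.items():
                if not attr.startswith(RDF):
                    props[key(attr)] = value
            for prop in desc:
                # rdf:Alt/Seq/Bag values are nested rdf:li items; join them.
                items = [(li.text or "").strip() for li in prop.iter(RDF + "li")]
                props[key(prop.tag)] = "; ".join(items) if items else (prop.text or "").strip()
    return props
```

    On the packet posted earlier, this would give keys such as `dc:title`, `prism:volume`, and `pdfx:AuthoritativeDomain`, each of which could become an Excel column header.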


    If I had an example PDF file with the known field names and the values to retrieve, I could check further. exiftool returns metadata, but whether that includes the XMP fields, I am not sure. Here is an example output to a CSV file. Row 2 does not align here in the pasted text, but I think you get it.

    Microsoft® Excel® 2010

    SourceFile ExifToolVersion FileName Directory FileSize FileModifyDate FileAccessDate FileCreateDate FilePermissions FileType FileTypeExtension MIMEType PDFVersion Linearized PageCount Language TaggedPDF Author CreateDate ModifyDate Producer Creator
    ken.pdf 12.17 . 2018:08:02 09:04:49-05:00 2021:05:15 11:31:26-05:00 PDF application/pdf 1.5 2 Yes 2018:08:02 09:04:49-05:00 Microsoft® Excel® 2010

    The command to generate it at a CMD prompt would be something like:

    Code
    exiftool -csv ken.pdf > outken.csv

    I can only guess at why your Shell() command failed, since you did not post the code. In the Shell() string for exiftool, use the full drive:\path\filename.ext for each file, wrapped in double quotes. Normally, I press Win+R, type CMD, press Enter, and then type D: and cd myfiles\exiftool. My ken.pdf was in that folder. I copied my exiftool(-k).exe to exiftool.exe so that I could use the command-line switches. If you click the -k file in File Explorer, the help text will show in a CMD window.
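    For comparison, the same run-and-read pattern in Python: `subprocess.run` stands in for VBA's `Shell()`, and `csv.DictReader` reads the `-csv` output that the command above produces. Passing the arguments as a list sidesteps the double-quote wrapping that a Shell() string needs for paths with spaces, and `subprocess.run` waits for exiftool to finish, which VBA's `Shell()` does not. The helper names are hypothetical, and the paths in the usage line are only examples based on the folder described above.

```python
import csv
import io
import subprocess


def run_exiftool(exe_path: str, pdf_path: str) -> str:
    """Run `exiftool -csv` on one file and return its CSV text."""
    # Each argument is passed separately, so no extra quoting is needed.
    done = subprocess.run([exe_path, "-csv", pdf_path], capture_output=True, check=True)
    return done.stdout.decode("utf-8")


def parse_exiftool_csv(text: str) -> dict[str, str]:
    """Return the first data row of exiftool's -csv output as {tag: value}."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return rows[0] if rows else {}
```

    Usage, with example paths: `info = parse_exiftool_csv(run_exiftool(r"D:\myfiles\exiftool\exiftool.exe", r"D:\myfiles\exiftool\ken.pdf"))`, then `info.get("Author")`.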


    Here is the -all option output for the same file from exiftool. As you can see, it may well get everything you need.


    For the most part, I think that itextsharp.dll is for those without Acrobat. I had a book on iTextSharp, but when I loaned it out, it was never returned. I had the University of Oklahoma do a research project for me to build a PDF using that DLL, and I posted a tutorial on how to use it with vb.net. Questions about iTextSharp might be better suited to a C# or vb.net forum; I have not used it in VBA. My tutorial applies to vb.net users, but the EXE that I created from it can be used by most programming languages. The link is at a WordPerfect forum. My old iTextSharp vb.net tutorial is at: https://www.wpuniverse.com/vb/…s-Parameters-5-iTextSharp

  • Hello sir

    Got it. I will organize my work and my goal, with PDF examples for each case.


    I saw how easily and helpfully GetInfo and the other methods work, but I was shocked to find that some PDF metadata can't be reached with the previous methods; that is where I saw XMP data. And yes, its arrangement differs from one PDF to another (what is called <year> in some PDFs is named differently in others).

    I saw how to search XML files and found XMP close to that, so this way I could search for keywords (year, author, title...) if I get it working. That is what I was thinking.
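    Searching XMP by keyword, as described here, can be sketched in Python by walking every element and matching the keyword against the tag's local name. On the packet posted earlier, a search for `date` would return `coverDisplayDate`, `coverDate`, and `ModifyDate`, while `year` matches nothing, which shows why a keyword search alone can miss fields that publishers name differently. Properties written as attributes rather than elements are not searched in this sketch, and `search_tags` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

RDF_LI = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}li"


def search_tags(packet: str, keyword: str) -> list[tuple[str, str]]:
    """Return (local tag name, value) for every element whose name contains keyword."""
    root = ET.fromstring(packet.strip())
    hits = []
    for elem in root.iter():
        local = elem.tag.split("}", 1)[-1]  # drop the {namespace-uri} part
        if keyword.lower() not in local.lower():
            continue
        # Array values (rdf:Alt/Seq/Bag) live in nested rdf:li elements.
        items = [(li.text or "").strip() for li in elem.iter(RDF_LI)]
        value = "; ".join(items) if items else (elem.text or "").strip()
        if value:
            hits.append((local, value))
    return hits
```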


    I will work on my next reply to explain my thoughts and needs clearly, so as not to waste your time, sir.


    I will also attach the error I got with exiftool.


    Finally,

    I know I am not good at this at all, but I aim to make this idea work and am trying hard to understand how it works.


    I am lucky I met you. Thanks.

  • Here is an example using exiftool that I worked up for you.

  • I don't know what to say, sir.


    Wish you all the best


    I ran the code and it returns the values into the Excel sheet.


    There are some fields with the value "saved"

    I will check if it refers to the values that are in pdf description

    Today I will attach pictures of the results, once the electricity is on.

    Sorry, we only have 6 hours of electricity a day.
