Mobile/Bot Parser

Re: Mobile/Bot Parser

by tyrence » Sat Jun 16, 2012 7:23 am

Awesome, thank you. Those changes work great :)

Re: Mobile/Bot Parser

by Khuri » Wed Jun 13, 2012 6:20 pm

First, thanks for the parser output. We've cleaned up the list a bit and added it to our "to-do" list ;)

The XML-parser is now updated as well.
Changes include:
- Added optional parameter "output" to parse in either BBCode or HTML
- Added optional parameter "fullitem" to display item-ID, icon-ID and item-name instead of only item-ID
- Added description for "goto"-BBCode
- Added Parser-Usage to count on Site Statistics
- Added link to the parser to the website main menu, Multimedia -> Tools -> Knowledge XML-Parser

See the Parser News for info on how to use the new features.

Re: Mobile/Bot Parser

by tyrence » Sun Jun 10, 2012 10:41 am

Btw, here's the output from parsing.

http://pastebin.com/QvVzv3uG

Received response code '404' for url 'X' -- this is the message indicating images that don't exist

Error parsing guide: Unrecognized node type 'X' at position N. -- this indicates a node that wasn't on the api, or was misspelled in the guide, or in a few cases the guide uses [ and ] to emphasize something

Error parsing guide: could not find closing tag for node 'X' at position N -- usually this is seen in conjunction with the error above due to a typo of the tag name

Re: Mobile/Bot Parser

by tyrence » Sun Jun 10, 2012 4:05 am

Khuri wrote:
tyrence wrote: How much work would it be to change the [item], [itemname], and [itemicon] tags to include the lowid, highid, imageid, and ql?
Simply said: right now impossible, because we don't have this information in our database at all. So we can't parse it. All we have is the item id you get, plus the item name and the image itself (which you get displayed when rendering the page normally, not on the parser; we could include that information though, if it would be of any help).

If you included the name and the image id (along with the item id), then we wouldn't have to do a lookup. The image id is part of the image url, so if you wanted to just include the image url, I could parse the image id from it (for instance, http://static.aodevs.com/icon/99180 -> 99180 is the image id).
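
A minimal sketch of that last step, assuming the icon URL always ends in the numeric image id as in the example above. The function name `image_id_from_icon_url` is made up for illustration and is not part of any bot or of the parser.

```python
from urllib.parse import urlparse


def image_id_from_icon_url(url: str) -> int:
    """Return the trailing numeric path segment of an icon URL."""
    path = urlparse(url).path                      # e.g. "/icon/99180"
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    try:
        return int(last_segment)
    except ValueError:
        # Assumption: anything non-numeric means the URL is not an icon URL.
        raise ValueError(f"no numeric image id in {url!r}") from None


print(image_id_from_icon_url("http://static.aodevs.com/icon/99180"))  # 99180
```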

Khuri wrote:
tyrence wrote: Also, it would make it easier to parse if the tags that didn't have a closing tag were self-closed (like in XML).
Sorry, I don't really understand this one. Do you have a parser now that already understands it? Then the change seems not required. If you don't... what language is your bot? Because filtering those 7 tags in total could basically be written with a small regex in a minute. That answer depends of course on how you handle the text.

Well I actually wrote a parser for it that parses the tags and builds a DOM tree, and then I take the DOM tree and traverse it to get AOML output. And I did this instead of just using regex replace because for some of the tags, particularly the item/itemicon/itemname tags, I have to do a lookup to know what to replace it with. It is possible to do with regex but it gets kinda messy and writing a parser wasn't too much work. Anyway, the reason this would help is because, for instance, this guide (http://www.ao-universe.com/mobile/parse ... bot&id=443) has a tag (goto) that isn't listed on the api page, so the parser doesn't know if it should wait for a closing tag or not. If the tags were self-closed when they didn't have an explicit closing tag then the parser could handle unknown tags more gracefully.
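
To illustrate why self-closing helps, here is a rough sketch of a tag-tree builder that needs no list of known tags. It is not the Budabot parser, just an assumption-laden example: the `Node` class, the `parse` function and the `[goto/]` spelling are all hypothetical, and stray closing tags are simply ignored.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Matches [name], [name=value], [/name] and the proposed self-closed [name/].
TAG = re.compile(r"\[(/?)([a-z*]+)(?:=([^\]]*?))?(/?)\]", re.IGNORECASE)


@dataclass
class Node:
    name: str
    value: Optional[str] = None
    children: List[Union["Node", str]] = field(default_factory=list)


def parse(text: str) -> Node:
    """Build a tree of Node objects and plain-text strings from tagged text."""
    root = Node("root")
    stack = [root]          # innermost open tag is stack[-1]
    pos = 0
    for m in TAG.finditer(text):
        if m.start() > pos:
            stack[-1].children.append(text[pos:m.start()])
        pos = m.end()
        closing, name, value, self_closed = m.groups()
        name = name.lower()
        if closing:
            # Only close the innermost tag; stray closers are ignored (assumption).
            if len(stack) > 1 and stack[-1].name == name:
                stack.pop()
        else:
            node = Node(name, value)
            stack[-1].children.append(node)
            if not self_closed:
                stack.append(node)   # wait for the matching [/name]
    if pos < len(text):
        stack[-1].children.append(text[pos:])
    return root


tree = parse("[ct]a[td/]b[/ct][goto/]tail")
print([c.name if isinstance(c, Node) else c for c in tree.children])
# ['ct', 'goto', 'tail']
```

Because `[goto/]` announces that it has no closing tag, the parser never has to guess whether to wait for `[/goto]`, even though it has never heard of the tag.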

Re: Mobile/Bot Parser

by Khuri » Tue Jun 05, 2012 12:31 pm

tyrence wrote: How much work would it be to change the [item], [itemname], and [itemicon] tags to include the lowid, highid, imageid, and ql?
Simply said: right now impossible, because we don't have this information in our database at all. So we can't parse it. All we have is the item id you get, plus the item name and the image itself (which you get displayed when rendering the page normally, not on the parser; we could include that information though, if it would be of any help).

tyrence wrote: Also, it would make it easier to parse if the tags that didn't have a closing tag were self-closed (like in XML).
Sorry, I don't really understand this one. Do you have a parser now that already understands it? Then the change seems not required. If you don't... what language is your bot? Because filtering those 7 tags in total could basically be written with a small regex in a minute. That answer depends of course on how you handle the text.

tyrence wrote: And then not really related to the api but more the guides themselves, there are some guides that have incorrectly nested tags which also make it harder to parse. I did find a workaround for that but it's hackish. See: http://www.ao-universe.com/mobile/parse ... serwebsite for an example of one of the guides that has this.
Yeah... this might happen in a few guides. The problem here is that I can't blame people for that. There's a lot of guides and most of them are simply text walls, so it can happen that an editor does not see it. If we find such problems we try to remove them.

tyrence wrote: And there are also guides that have missing images. I could run my parser over all the guides and give you a list if that would help.
That'd be most welcome.

Re: Mobile/Bot Parser

by Silvana » Tue Jun 05, 2012 10:16 am

Since this was already brought up in another thread, I think it would be best to keep it to one so that thoughts and ideas don't get split up :)

Re: Mobile/Bot Parser

by tyrence » Tue Jun 05, 2012 9:58 am

I know when we first talked I said I didn't care which format it was in, but now having written a parser for it I have a few suggestions.

How much work would it be to change the [item], [itemname], and [itemicon] tags to include the lowid, highid, imageid, and ql?

As Llie says, the additional item lookups slow down the bot. Also, in the case of Budabot, we don't have all items in the items database. In particular, implants and some of the tradeskill items are not included since it's rare that people need to look those up and it keeps the bot smaller and faster.

Also, it would make it easier to parse if the tags that didn't have a closing tag were self-closed (like in XML).

For instance

Code:

[br] -> [br/]
[ct]text[td]text[tr]text[td]text[/ct] -> [ct]text[td/]text[tr/]text[td/]text[/ct]
[list][*]text[*]more text[/list] -> [list][*/]text[*/]more text[/list]

And then not really related to the api but more the guides themselves, there are some guides that have incorrectly nested tags which also make it harder to parse. I did find a workaround for that but it's hackish. See: http://www.ao-universe.com/mobile/parse ... serwebsite for an example of one of the guides that has this.

Code:

[b][size=18][color=orange]How do I get an Apartment? [/b][/size][/color]
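
For what it's worth, a stack is enough to spot this kind of mis-nesting. The sketch below is not the workaround mentioned above, only an illustration: `find_misnested` is a hypothetical helper, and the `VOID` set lists just the non-closing tags that appear in this thread, so it is probably incomplete.

```python
import re

TAG = re.compile(r"\[(/?)([a-z*]+)(?:=[^\]]*)?\]", re.IGNORECASE)

# Tags without a closing tag, taken from the examples in this thread only.
VOID = {"br", "td", "tr", "*"}


def find_misnested(text):
    """Return (expected, found, position) for every closing tag that
    does not match the innermost open tag."""
    stack = []
    problems = []
    for m in TAG.finditer(text):
        closing, name = m.group(1), m.group(2).lower()
        if not closing:
            if name not in VOID:
                stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            expected = stack[-1] if stack else None
            problems.append((expected, name, m.start()))
            if name in stack:
                # Lenient recovery: drop the most recent matching open tag.
                idx = len(stack) - 1 - stack[::-1].index(name)
                del stack[idx]
    return problems


sample = "[b][size=18][color=orange]How do I get an Apartment? [/b][/size][/color]"
for expected, found, position in find_misnested(sample):
    print(f"expected [/{expected}] but found [/{found}] at position {position}")
```

On the line above this reports two problems: `[/b]` and `[/size]` both arrive while `[color]` is still the innermost open tag.
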
And there are also guides that have missing images. I could run my parser over all the guides and give you a list if that would help.

Re: Mobile/Bot Parser

by Morgo » Sat Feb 25, 2012 3:17 pm

XML is a complex language with thousands of different variants. There is no right or wrong; it's all about what is convenient in the given case. We of course always welcome feature proposals.

Re: Mobile/Bot Parser

by Llie » Fri Feb 24, 2012 5:18 pm

Well, I agree that having the folders organized hierarchically is probably a good idea, but it doesn't really matter to me one way or the other. I managed to get my Vhabot plug-in working so I'm happy. :D

Taking a second look at your proposed change, putting subfolders into a <folderlist> element would be easy enough for C#'s parser to handle. I went on the assumption that Khuri put the items in the flat folderlist in order from outside to inside, so in my working with the folders, I completely ignore the offset/parent elements, and I know exactly how many folders deep a particular guide is by getting the length of the array.

If the folders are nested, then I'd have to traverse the folders, which I suppose I could do with a "while" loop as part of the formatting of the output rather than just pulling it out by length and using a "for" loop to put together the folder hierarchy, so I guess it's not that big of a difference.

I'm not arguing for or against the way Khuri organized the folders. I'm just happy there's an XML interface at all, and that I'm able to slap together some code to parse it.

Re: Mobile/Bot Parser

by Morgo » Fri Feb 24, 2012 1:41 pm

It makes no sense to have a flat structure if you switch to XML, either. XSLT does indeed have some nice features: you can pass a full-size XML document through it, keep the structure, and transform only some of the document's content.

Re: Mobile/Bot Parser

by enlo » Fri Feb 24, 2012 12:40 pm

Actually, I only took note of the parser because I was considering updating the implementation in Budabot (which has not worked since the new aou design was released)
-> Budabot Issue 129


about the unknown levels: that's the whole point of folders, isn't it? :)

my point is that a flat layout does not help at all. To pick out anything useful you have to work through the complete output to figure out which of those X nodes are actually on the first level, and then go through the complete output again just to find which nodes belong to which first-level folder.
XSLT has some nice features to crawl through hierarchical data, so it's really easy to handle the unknown depth of levels.
and if I still need a flat list, it's a single XPath selector to get the hierarchy as flat list.
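
As a rough illustration of that single selector, here is a sketch in Python using the nested layout proposed in the first post. The `<folders>` wrapper element is an assumption, since the thread never shows the root element of the parser output.

```python
import xml.etree.ElementTree as ET

# Nested layout as proposed in this thread; the <folders> root is assumed.
NESTED = """
<folders>
  <folder>
    <id>1</id>
    <name>Classic AO</name>
    <items>0</items>
    <folderlist>
      <folder>
        <id>8</id>
        <name>Gameplay Guides</name>
        <items>53</items>
      </folder>
    </folderlist>
  </folder>
</folders>
"""

root = ET.fromstring(NESTED)
# ".//folder" selects every folder at any depth, in document order.
flat = [(f.findtext("id"), f.findtext("name")) for f in root.findall(".//folder")]
print(flat)  # [('1', 'Classic AO'), ('8', 'Gameplay Guides')]
```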

if you don't structure the data, every client will have to do it.
which is a pity, when your underlying data format (for once) is hierarchical

Re: Mobile/Bot Parser

by Llie » Thu Feb 23, 2012 5:05 pm

The main difference is the nesting of the folders, and that gets messy for XML parsers as you can't predict how many levels down a guide might be -- sometimes 2 sometimes 3.

The current layout hands everything out as a flat array which is easier to parse. It's not as good a representation of the underlying data as the one suggested at top, but it is easier to handle.

Re: Mobile/Bot Parser

by Khuri » Thu Feb 23, 2012 1:45 pm

Hi enlo, and thanks for your feedback.
Pseudo XML is a good word, i don't work with XML too much :)
Just made my mind up about what structure would make sense and "most easy" to use afterwards. The folder list is something i just added because i thought it might be useful. The parent/offset tag was added to make it as simple as possible to parse a hierarchy list ;) - but i have no idea if any of the chatbots make use of it at all yet, or if it's just "data trash".

But your suggestion is a good one, also with the version parameter.
Might also add another parameter to receive item names instead of the ID (as mentioned here).

But apart from that, I'd like to relay your question to the bot developers (if any of them is reading this). After all it's them who use this feature ;)

Mobile/Bot Parser

by enlo » Thu Feb 23, 2012 12:07 pm

Hey there
I just took a look at the sweet XML output parser and would like to make some suggestions for improvement

WARNING: code ahead :lol:
I work a lot with xml and one thing on that output really hurt my eye: The list view is some kind of pseudo xml.

Code:

<folder>
   <id>1</id>
   <name>Classic AO</name>
   <offset>0</offset>
   <items>0</items>
   <parent>0</parent>
</folder>
<folder>
   <id>8</id>
   <name>Gameplay Guides</name>
   <offset>1</offset>
   <items>53</items>
   <parent>1</parent>
</folder>
The items are named folder and use a parent tag to tell where they belong, instead of being nested in each other.
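
To show the extra work this pushes onto every client, here is a sketch that rebuilds the hierarchy from the flat list above. The `<folders>` wrapper element, the `build_tree` name and the reading of parent `0` as "top level" are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Flat layout from the sample above; the <folders> root is assumed.
FLAT = """
<folders>
  <folder>
    <id>1</id><name>Classic AO</name>
    <offset>0</offset><items>0</items><parent>0</parent>
  </folder>
  <folder>
    <id>8</id><name>Gameplay Guides</name>
    <offset>1</offset><items>53</items><parent>1</parent>
  </folder>
</folders>
"""


def build_tree(xml_text):
    """Turn the flat <folder> list into nested dicts using <parent>."""
    names = {}
    children = defaultdict(list)            # parent id -> list of child ids
    for folder in ET.fromstring(xml_text).findall("folder"):
        fid = folder.findtext("id")
        names[fid] = folder.findtext("name")
        children[folder.findtext("parent")].append(fid)

    def subtree(fid):
        return {
            "id": fid,
            "name": names[fid],
            "folders": [subtree(c) for c in children.get(fid, [])],
        }

    # Assumption: a parent of "0" marks a top-level folder.
    return [subtree(fid) for fid in children.get("0", [])]


print(build_tree(FLAT))
```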

why use xml, when it's not even hierarchic?


I know it's more work for you, but I still would suggest the following layout:

Code:

<folder>
   <id>1</id>
   <name>Classic AO</name>
   <items>0</items>
   <folderlist>
      <folder>
         <id>8</id>
         <name>Gameplay Guides</name>
         <items>53</items>
      </folder>
   </folderlist>
</folder>

you can drop the offset and parent tags, since the structure already tells you where a folder belongs.


I would also suggest adding a version parameter to your parser URL.
Each time you change the functionality in a way that would break clients (modifying or deleting anything), you simply add a new version. That's not really hard using multiple files and includes.
:top:


keep on rockin guys 8)
If you like it but are too busy to do that, I might have some spare time for a change or two on the parser ;)
