My project

For communicating with your fellow chronologists with off-topic stuff.

Moderators: Col_Fury, michel, Arthur, Somebody, StrayLamb

Kang
Hero
Posts: 75
Joined: Sun Apr 17, 2005 6:40 pm
Location: Kansas City, MO, USA

My project

Post by Kang »

Hi, obviously I've been active here lately after no posts for 17 years, so I thought I would share why. I have a lot of hobbies, and one of them was to look through the 90s indexes and read through the chronological entries for the characters. When I first found this site, I was blown away that all of the continuity was there. Eventually I moved on to other hobbies, but whenever I was curious about something I would pop back in here.

My son is six now, and he is enjoying comic books. I tried to read X-Men with him starting at the Bronze Age with the Claremont reboot since we had just finished the X-Men Animated Series, and several of the storylines were directly from that era. He enjoyed it, but then he kept asking to read "from the start in order" (definitely my son!). So first I tried to use a reading list, then I started diving into the MCP to create my own lists. I realized that what I was doing by searching the MCP could probably be done much more efficiently with a computer.

What is weird is that one of the first things I ever coded back when I was in school was a tool that would go through the MCP and give you a "last in" and "next in" for all of the characters in a given comic. That was probably in 2003, before you guys added the search feature to the site. I was planning on approaching you guys with it, but I got busy at my new job. Eventually, as I matured as a developer, I realized the code was crap, and then I saw it had been implemented much better on the site.

So back in the present, I got to work on a tool to help with my ordering problem. I wanted a tool that could determine where one comic sits in relation to another. I knew the data was not granular enough to do this in all cases (partly because the style guide requires you to keep the least information necessary, and partly because of things like time travel and mistakes), but for most issues it would be okay. I wrote something pretty simple in PowerShell to get me started: a short parsing script that goes through the pages and finds all of the pairs of entries where the first is immediately followed by the second in at least one character's chronology. I also have a heuristic to figure out who the most popular character is that has that pair in their chronology, and I write that out. Next I wanted to eliminate as many loops as possible from the data so that I can generate more authoritative answers. To do this I wrote a script to find loops of length N within that list. I will eventually write a tool to walk the tree unbounded, but I figure solving the smaller loops first would be helpful. When a loop shows up, it is indicative of one of four things:
  1. A story has a break in the middle, and some of the characters have not been separated to one side or the other of the split (usually just due to the style guide).
  2. There are multiple ambiguous flashbacks in the story that take place at different times (usually due to the style guide).
  3. There is a time traveler who has looped back on their own continuity (there isn't really a notation to handle this).
  4. There is a legitimate mistake in the MCP.
To fix these four cases, I pulled the MCP into a git repository, and I created my own branch. That allows me to update my copy in ways that would violate the style guide for the MCP, and still merge in changes that are made to the main MCP. I'm also testing out a syntax for the time travel stories so those won't lead to loops in the data. I do understand I have a long project ahead of me. There were a few hundred loops involving just two entries, and I've worked through a lot of those. Whenever I've found something that appears to be in error, or where the ambiguity could be removed without violating the style guide, I have been putting it on this board. I also have some cleanup items, found by making my regular expressions more strict, which I will put up. That way my research will be mutually beneficial.
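The pair extraction and the two-entry loop search described above can be sketched roughly as follows. This is only an illustration in Python, not the PowerShell scripts from the post: the `chronologies` data, the issue names, and both function names are hypothetical, and real MCP pages would first need to be parsed into per-character lists.

```python
from collections import defaultdict

def adjacent_pairs(chronologies):
    """Map each (earlier, later) pair of entries to the set of characters
    whose chronology lists `later` immediately after `earlier`."""
    pairs = defaultdict(set)
    for character, entries in chronologies.items():
        for earlier, later in zip(entries, entries[1:]):
            pairs[(earlier, later)].add(character)
    return pairs

def two_entry_loops(pairs):
    """Return every unordered pair {a, b} that appears in both directions,
    i.e. a before b for one character and b before a for another."""
    loops = set()
    for a, b in pairs:
        # An entry followed by itself is skipped here; it is not a two-entry loop.
        if a != b and (b, a) in pairs:
            loops.add(frozenset((a, b)))
    return loops

# Hypothetical sample data: two characters who disagree on the order
# of "Issue 1" and "Issue 2".
chronologies = {
    "Character A": ["Issue 1", "Issue 2", "Issue 3"],
    "Character B": ["Issue 2", "Issue 1", "Issue 4"],
}
pairs = adjacent_pairs(chronologies)
loops = two_entry_loops(pairs)
```

Keeping the set of characters for each pair makes it possible to report who is responsible for a conflicting pair, which is where a "most popular character" heuristic like the one mentioned above could plug in.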

I will say that maybe a git repository (or another change management system) would be a great upgrade to this site. Rather than submitting errors and changes to a message board, you could just allow all users access to a repository, and have some kind of approval system. You would be able to see the changes all in one place with the old and new side-by-side rather than doing all the bolded (<--) and (-->) stuff. You'd be able to have an explanation attached to the change, and with proper tags you'd be able to search for all changes associated with an issue to make sure you aren't violating some past research that was forgotten around that issue. When a change is approved, it could be pulled in directly, and it could even have style guidelines applied automatically, leading to fewer typos and less inconsistent formatting. All of the pending changes would be in a queue, and you'd be able to send changes back for reworking or reject them completely to get them out of your queue. The fact that each row is independent makes this a perfect use case for a change management system.

Anyway, that is just an idea. If you go to a change management system, I think it would be cool. Otherwise, I'll just keep giving updates on the message board and keep using my local repository.

It is good to be back! I'm glad this project is still going strong!
michel
Director
Posts: 3148
Joined: Tue Feb 24, 2004 1:00 pm
Location: France

Re: My project

Post by michel »

Very interesting post, Kang! Now I have some questions ^^

Correct me if I'm wrong, that's what I've understood: you're creating your own reading order from the listings (so your son can read all the comics from the start and in order), and to check it, you've coded a tool that compares the relative position of two listing entries (which one comes before the other one). And you're testing every pair "of entries where the first is immediately followed by the second in at least one character's chronology."

How are you creating your timeline exactly?
You must have an enormous number of pairs to test?! Do you test them all or only those involved in your timeline?

Then I understand you're focusing on the loops you're finding, and I agree with how you're checking if it's OK (split stories, ambiguous FBs, time travel) or if there's a mistake in the MCP.

I'm applauding the notion of using the building of your own timeline to check inconsistencies in the listings and to report them! That's something I'm trying to do too ^^ So keep on with your corrections, and thank you by the way!

Now about your idea of a git repository... that's an interesting idea that we will discuss between directors.
Kang
Hero
Posts: 75
Joined: Sun Apr 17, 2005 6:40 pm
Location: Kansas City, MO, USA

Re: My project

Post by Kang »

michel wrote: Thu Sep 08, 2022 4:13 am Correct me if I'm wrong, that's what I've understood: you're creating your own reading order from the listings (so your son can read all the comics from the start and in order), and to check it, you've coded a tool that compares the relative position of two listing entries (which one comes before the other one). And you're testing every pair "of entries where the first is immediately followed by the second in at least one character's chronology."
Yeah, that is the basic idea. It isn't very good yet, but the closer two stories are, the better it does. Once I work through more of the loops, the interval over which it works will also get better.
michel wrote: Thu Sep 08, 2022 4:13 am How are you creating your timeline exactly?
My timeline for my son was just X-Men affiliated stories. I basically just went through the relevant character appearances and did a lot of exploring in the MCP. The tool isn't able to create a timeline yet, but the extremely long term goal would be that it could just sort the comic entries in the MCP. Ideally, the way it would work is somebody would have a list of comics, then a tool could sort it based on what it knows (maybe mark the things that definitely come before or after vs. the ones which were placed arbitrarily). Then the user could take the arbitrary entries and place them based on things like the amount of time that passes in the stories. I think that is pretty far off, though.
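One way the long-term sorting idea above might look is a topological sort (Kahn's algorithm) that flags positions where more than one comic was eligible, meaning the placement was arbitrary. This is a sketch under stated assumptions, not the tool from the thread: the function name and sample data are hypothetical, and it only uses known pairs where both comics are in the list, so orderings implied through comics outside the list are ignored.

```python
def sort_with_known_order(comics, known_pairs):
    """Order `comics` so every known (earlier, later) pair is respected.
    Returns a list of (comic, arbitrary) tuples, where `arbitrary` is True
    when more than one comic could have been placed in that slot."""
    comics = list(dict.fromkeys(comics))  # drop duplicates, keep input order
    wanted = set(comics)
    successors = {c: set() for c in comics}
    indegree = {c: 0 for c in comics}
    for earlier, later in known_pairs:
        if earlier in wanted and later in wanted and later not in successors[earlier]:
            successors[earlier].add(later)
            indegree[later] += 1

    ready = [c for c in comics if indegree[c] == 0]
    ordered = []
    while ready:
        arbitrary = len(ready) > 1  # several candidates: the choice is a guess
        current = ready.pop(0)
        ordered.append((current, arbitrary))
        for nxt in comics:  # iterate in input order to keep the result stable
            if nxt in successors[current]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)

    if len(ordered) != len(comics):
        raise ValueError("known_pairs contain a loop among the listed comics")
    return ordered

# Hypothetical example: A before B before C is known; D is unconstrained.
result = sort_with_known_order(["C", "A", "B", "D"], [("A", "B"), ("B", "C")])
```

The entries flagged as arbitrary are the ones a reader would still need to place by hand, for example by judging how much time passes in the stories.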
michel wrote: Thu Sep 08, 2022 4:13 am You must have an enormous number of pairs to test?! Do you test them all or only those involved in your timeline?
I've been running them all through. It takes less than a minute to generate the list of pairs, and less than a minute to find the loops involving two entries. The number of loops of that size was not that big, so I've been working through them. As the loops get larger, it'll probably take longer to run, and eventually I'm going to have to just walk the resulting trees instead and stop when I find a loop. Right now I've been working through anything, not just entries related to my reading order. It has been fun to look at parts of the Marvel Universe I had been unfamiliar with.
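Walking the data and stopping at the first loop of any length is commonly done with a depth-first search that tracks which entries are on the current path. The following is a minimal sketch of that general technique, assuming the pairs are available as (earlier, later) tuples; the function name and sample data are hypothetical. It is written iteratively so that very long chronologies do not hit Python's recursion limit.

```python
def find_loop(pairs):
    """Return one loop as a list of entries (first entry repeated at the end),
    or None if the (earlier, later) pairs contain no loop."""
    graph = {}
    for earlier, later in pairs:
        graph.setdefault(earlier, []).append(later)
        graph.setdefault(later, [])

    UNSEEN, ON_PATH, DONE = 0, 1, 2
    state = {node: UNSEEN for node in graph}

    for start in graph:
        if state[start] != UNSEEN:
            continue
        path = [start]                     # entries on the current walk
        iterators = [iter(graph[start])]   # remaining successors per path entry
        state[start] = ON_PATH
        while path:
            for nxt in iterators[-1]:
                if state[nxt] == ON_PATH:
                    # Walked back onto the current path: that segment is a loop.
                    return path[path.index(nxt):] + [nxt]
                if state[nxt] == UNSEEN:
                    state[nxt] = ON_PATH
                    path.append(nxt)
                    iterators.append(iter(graph[nxt]))
                    break
            else:
                # All successors explored without finding a loop; backtrack.
                state[path.pop()] = DONE
                iterators.pop()
    return None

# Hypothetical example: A -> B -> C -> A is a loop of three entries.
loop = find_loop([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])
```

Each entry and each pair is visited at most once, so the run time grows linearly with the size of the pair list rather than with the loop length being searched for.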

Also, I will admit I get off on tangents, too. The Alpha Flight stuff I looked at was not from the tool. I was looking at that from where I was reading with my son.
michel wrote: Thu Sep 08, 2022 4:13 am I'm applauding the notion of using the building of your own timeline to check inconsistencies in the listings and to report them! That's something I'm trying to do too ^^ So keep on with your corrections, and thank you by the way!
Sure, thank you!