Fun with spreadsheets

I use spreadsheets to track my performance in online poker. I’ve learned a lot about spreadsheets by using them to track my personal performance, and as well to create standings for a league that I play in. Poker helps you build all sorts of life skills.

I think it’s nice to give back, so I wanted to share some of the things that I learned how to do, in case others would like to make use of it.

Tournament Points

I play tournaments almost exclusively, and the site I play on, Replay Poker, uses the following formula to calculate tournament points earned through participation in a tournament. The points aren’t chips, but they are used to track performance against other participants in the same tournament circuit, and are used to generate leaderboards and award bonus prizes to the top players.

The formula is as follows:

(1000 × ( √Runners)/√(Finishing Position))×(1+log10(buyin))

Replay Poker tournament points formula

Where Runners is the number of players who entered the tournament, Finishing Position is your place you finished, and Buyin is the entry fee in chips.

For the spreadsheet, that’s this:

=CEILING(1000 * ( SQRT(runners)) / SQRT(finish) * (1+log10(buyin)))

Easy, peasy.

Bank History Date to DateTime

Replay has a bank transactions page, where you can see every transaction against your account, positive and negative.

The columns are as follows:

|Date|Change|Type|Description|Total|

I just copy this data straight out of the web site and paste it into my spreadsheet, and it goes in pretty nicely. Well, for the most part.

I can do conditional sums on the Change column to get totals for the different transaction types, and use this to show a breakdown of how I get chips, whether by ring game wins, tournament wins, promotional chips, or other types of bonuses. It helps me to get a better feel for how well I’m really playing to filter out the chips that I’m not winning directly through Ring or Tournament play.

I use the bankroll data to generate charts that show my performance over time.

One thing that is annoying about the way Replay presents the data on their website, they don’t give you a button where you can just download your data as a .csv file. So the only way I can do it is to copy-paste from the page. The bank history loads a bit at a time, so I have to scroll to the very bottom in order for it all to display, and since I have several thousand transactions by now, this takes time. But once I get them copied over to my spreadsheet once, I don’t need to grab that transaction ever again, which is good.

But there’s still a problem with the data that I scrape from the site. The date field is irregular. By that I mean, when I paste it into a spreadsheet, the spreadsheet does not recognize it as a date format. As a result, it just treats the date as a string of text.

This causes problems when I create charts which incorporate the date info. The chart generator doesn’t know how to parse the date strings or do comparisons, so it only knows how to treat the text as a label for the data in the columnn next to it, and doesn’t know how to order the data correctly. Fortunately, the Bank page presents the data in order, but if not for that, I’d have a hell of a time organizing the data in order to do any meaningful analysis with it.

For the longest time, I didn’t bother trying to conver the date into a legitimate datetime format that spreadsheets can understand. But today I sat myself down and worked on it until I figured it out. It took a whole morning, and was pretty involved, so I thought I’d write it up in detail and share it.

First, let’s look at the format Replay uses when they present the data.

A typical transaction date/time looks like this:

Oct 21, 9:21pm

If you go back earlier than the current year, they add the year in as well:

Dec 31 2018, 9:20pm

Spreadsheets have sophtisticated functionality for date and time values. But there’s no easy way to convert this format into a datetime value. But we can build up a way to it using the text and date/time functions. It’s fairly complicated, so to do it, I attacked the problem by breaking it down into small pieces that I could handle.

I don’t know that this is the best way to do it, but it works. If you know of a better way to do any of this, I’d love to hear from you.

First, I created new columns in the sheet, where I could manipulate the string in my Date column and isolate the part of the string that I needed to convert the string into a datetime value. This gave me 6 columns:

|YYYY|MM|DD|hh|mm|AM/PM|

To isolate each part of the date from the string, I used the following functions. Assume that the date string is found in cell A2 for the following functions:

YYYY

=IF(FIND(",", A2)<=7, YEAR(NOW()), RIGHT(LEFT(A2, FIND(",", A2)-1), 4))

To get the year, we need to know if the date is from this year or an earlier year. If no year actaully appears in the string, we can assume that the date is from this year.

(Really, it means that the date was scraped from the site in the current year, not necessarily that this is still the year the data was collected. This will give us a problem next year if we don’t do something about that, but we’ll worry about that later.)

To figure out if the year appears in the string, we have to infer that by the position of the comma in the string. Month is always a 3 character abbreviation, then there’s a space, then the date is either one or two characters, then there’s the comma. Counting the characters, that’s MON(3) + SPACE(1) + DATE(1-2) + COMMA(1) . We start counting from 0. So COMMA will always be at 7 or 6 (if the date is a single digit).

Unless of course there’s the year, in which case there’s another space, then the four digit year, then the comma will appear 4 characters later, at position 10 or 11.

With me so far? Good.

So, the FIND() function tells us the position of the comma in the string. If FIND(",", Value) returns 7 or less, then we know that the YYYY is not found in the string, and we can therefore infer that the YYYY is equal to the year component of the output of the function NOW() . Which is to say, YEAR(NOW()) .

If the FIND() function returns 8 or more, then we know that the YYYY value is found in the string, and we can grab it by combining the LEFT() and RIGHT() substring functions as follows:

First, grab the substring up to just before the comma. LEFT(A2, FIND(",", A2)-1) .
Next, feed that string into the RIGHT() function to get the last 4 characters, which should be the YYYY value.

=RIGHT(LEFT(A2, FIND(",", A2)-1), 4)

It’s a pain, and very inelegant, but it works.

MM

=SWITCH(LEFT(A2, 3), “Jan”, 1, “Feb”, 2, “Mar”, 3, “Apr”, 4, “May”, 5, “Jun”, 6, “Jul”, 7, “Aug”, 8, “Sep”, 9, “Oct”, 10, “Nov”, 11, “Dec”, 12)

This is easy. The first 3 characters of the date string area always the month. We use a SWITCH() function to check the first three characters of the string, which we get by doing LEFT(A2, 3) , and then match up to the 12 known month abbreviations, and return the month number.

That wasn’t so bad, was it?

DD

=LEFT(MID(A2, FIND(" “, A2) + 1, FIND(”,", A2) - FIND(" ", A2) -1), 2)

What a mess! Can you understand that? I can’t! And I wrote it!

Let’s reformat it a bit, so we can see how the functions are working together. I have nested one function inside another, so that its return value becomes one of the input parameters of the next one. This makes it really hard to read. But we can re-write it, like this:

=LEFT
(
MID(A2, FIND(" “, A2) + 1,
FIND(”,", A2) - FIND(" ", A2) -1),
2
)

Getting the day of the month is pretty complicated, but we can do it using a bunch of FIND() and LEFT() and MID() functions working together with our knowledge of the string format.

Again, the presence or absence of the YYYY from the string throws us for a loop and makes this harder than it needed to be.

The MID() function works by returning the middle part of the string, starting at the position given in the first argument, and the next N characters as given in the 2nd argument. But to use MID() , I need to solve two problems:

  • Problem 1: Getting the starting position.
  • Problem 2: Knowing whether the date is 1 or 2 digits.

How I attack this problem:

  • I didn’t write this all at once. I did each piece one at a time, then put it together, then tested it, then fixed it, then tested some more, until it correctly handled every date in my data set properly.
  • I know that there is only 1 comma in the date string, and it is always immediately after the last digit of the date… OR the year, if the YYYY part is present. So FIND(",", A2)-1 is the position of the last character of DD or YYYY .
  • I know that for strings from the current year, the first space in the string occurs between the Month and the Day.
  • I want to grab a substring from the first SPACE character to the COMMA , but not including the space or the comma.

So, the return value of the calculation MID(A2, FIND(" ", A2) + 1, FIND(",", A2) - FIND(" ", A2) -1)

will always be either:

  • DD if it’s from a current-year string, or

  • DD YYYY if it’s from a previous-year string.

  • If the string is coming from a current-year value, that’s it.

  • If the string is from a previous year value, we need to grab the left 2 characters only from this substring, which we can do with LEFT() .

  • We can grab the first two characters of the substring either way, and if it’s only 2 characters long already because it’s from a this-year string, it won’t hurt anything.

That wasn’t so bad after all, now, was it?

hh

=MID(A2, FIND(",", A2)+2, FIND(":", A2) - FIND(",", A2) -2)

This looks like a mess too, and it’s not pretty. But, once again, it works. The approach taken is similar to the approach we took with getting DD .

The COMMA in the date string is our signpost. We know the first character of hh is always found 2 characters after the comma. So, FIND(",", A2) + 2 is the first character of hh . This would be much easier if hh was always 2 digits, but sometimes it’s a single digit, so we need to figure out where the end of hh is found. Fortunately, we have another signpost, the COLON . hh always ends one character prior to the COLON .

The MID() function wants to know three things:

  1. What string it is operating on: (for our example, it’s stored in cell A2 )
  2. Where to start in the string: 2 characters after the first space. We already figured that out, it’s FIND(",", A2) + 2
  3. How far to go from there: 1 or 2 characters — how do we know?

We figure it out by taking the index of the COLON , and subtracting from that the starting index (the position in the string where the COMMA occurs), and then subtract 2 from that .

FIND(":", A2) - FIND(",", A2) -2

It’s not as hard as it looks, once you break it down.

mm

=MID(A2, FIND(":", A2)+1, 2)

This one is easy. Fortunately for us, the mm part of the date string always occurs one position after the COLON , and is always two characters long.

AM/PM

=RIGHT(A2, 2)

Getting am or pm is even easier. It’s always the last two characters in the date string. So RIGHT(A2, 2) gets that for us very simply and reliably.

So now we have each component of our datetime nicely broken out into separate columns. Now it’s time to put it all together, re-assembling it into a proper datetime value.

To do this, we’ll need two functions: DATE() and TIME() .

=DATE(YYYY, MM, DD) + TIME(hh, mm)

Well, this isn’t quite it, but it’s very nearly correct. We have one more issue to handle, which is that the hh as we are deriving it above is given in 12 hour format. We need to provide it to DATE() in 24-hour format. So we will do this by checking the AM/PM value and add 12 if the string is “pm”.

So instead of putting TIME(hh, mm) , we do this instead:

TIME(IF(EXACT(ampm, “am”), hh, hh+12), mm)

The whole thing looks like this:

=DATE(YYYY, MM, DD) + TIME(IF(EXACT(ampm, “am”), hh, hh+12), mm)

The reason this works is, Date and Time values are both actually just numbers. So when you add them together, you get a DateTime .

And if you wanted to do all of this in a single cell, you could!

Given our original date string is found in cell A2 , the function looks like this:

=DATE(IF(FIND(",", A2)<=7, YEAR(NOW()), RIGHT(LEFT(A2, FIND(",", A2)-1), 4)), SWITCH(LEFT(A2, 3), “Jan”, 1, “Feb”, 2, “Mar”, 3, “Apr”, 4, “May”, 5, “Jun”, 6, “Jul”, 7, “Aug”, 8, “Sep”, 9, “Oct”, 10, “Nov”, 11, “Dec”, 12), LEFT(MID(A2, FIND(" “, A2) + 1, FIND(”,", A2) - FIND(" “, A2) -1), 2)) + TIME(IF(EXACT(RIGHT(A2, 2), “am”), MID(A2, FIND(”,", A2)+2, FIND(":", A2) - FIND(",", A2) -2), MID(A2, FIND(",", A2)+2, FIND(":", A2) - FIND(",", A2) -2)+12), MID(A2, FIND(":", A2)+1, 2), 0)

I prefer to keep things broken out into separate columns, because you never know when you might need just one of the components again, plus it’s much easier to troubleshoot if something is broken, which makes it easier to maintain and feel confident that it is correct.

Why go through all this?

DateTime values are sortable, and can be treated by the Chart builder as data, rather than simply as labels for data. This makes it more useful.

Once the hard work has been done (and I just did it, so thank me!) it’s easy to put it into place and just use it.

But that’s not all! DateTimes can be formatted using the formatting options for dates that are built into the spreadsheet. You can format the way the date is displayed in the cell, or in a chart that you make with data from those cells, in any way you like, which is awesome! You can also use DateTime values in DATE and TIME functions, and there’s a bunch of those. This makes working with DateTime values way more useful, functional, flexible, and powerful than working with those ugly text strings.

One last thing

When the current year is over, we start a new year. All of the 2019 date values that were pasted in during the year they were the current year will still be as they were on 1/1/2020; this will cause us a problem with the way our conversion works currently.

Can you see why? It’s because we make the assumption that if the YYYY component is missing from the date string, the date string is from the current year. But that won’t be true any longer for 2019 values once 2019 is no longer the current year.

How can we fix this?

Well, the easiest way to do it is to go back through the range of data values for all of 2019, and use Find/Replace to insert the YYYY value.

Be careful doing this, as you don’t want to insert YYYY into any cells that already have it.

How to do it safely:

  1. Select the range of cells that hold your current-year values.
  2. Press Ctrl+H to bring up the Find and Replace dialog.
  3. For Find , enter a comma ( , ).
  4. For Replace , enter " 2019," (don’t forget the space at the front!)
  5. For Search , select Specific Range , and confirm the range is set correctly.
  6. Click Replace All .
  7. Verify everything was replaced correctly.

Conclusion

Congratulation, a winner is you! Now you can convert Replay’s date strings to DateTime data values. No doubt you’ll be a big hit at parties from now on.

2 Likes